Linstor fix migration while node offline #8610

Merged

Contributor

@rp- rp- commented Feb 5, 2024

Description

If a Linstor node is down while a resource is being migrated, setting
allow-two-primaries fails because the downed node cannot be reached.
The property is still set on the other nodes, though, so the migration
should work. We now just report an error instead of failing completely.
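
The behavior described above could be sketched roughly as follows. This is only an illustration, not the plugin code: `AllowTwoPrimariesSketch`, `NodeReply`, and `collectWarnings` are hypothetical names, and the real Linstor client API is not used. The sketch assumes each node answers a property change individually, and it turns failed answers into warnings instead of throwing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the Linstor client
// or from the CloudStack Linstor plugin.
class AllowTwoPrimariesSketch {

    /** Hypothetical per-node answer to a property change request. */
    static final class NodeReply {
        final String node;
        final boolean ok;
        final String message;

        NodeReply(String node, boolean ok, String message) {
            this.node = node;
            this.ok = ok;
            this.message = message;
        }
    }

    /**
     * Returns one warning per node that did not apply the change.
     * Deliberately does not throw, so the caller can continue the migration.
     */
    static List<String> collectWarnings(List<NodeReply> replies) {
        List<String> warnings = new ArrayList<>();
        for (NodeReply reply : replies) {
            if (!reply.ok) {
                warnings.add(reply.node + ": " + reply.message);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<NodeReply> replies = Arrays.asList(
                new NodeReply("node-a", true, "property set"),
                new NodeReply("node-b", false, "node offline"));
        // Report the failures, then carry on instead of aborting.
        for (String warning : collectWarnings(replies)) {
            System.err.println("allow-two-primaries not applied on " + warning);
        }
        System.out.println("continuing with migration");
    }
}
```

The point of the design is that an unreachable node produces a logged error rather than an exception that aborts the migration.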

This PR also contains a cherry-picked commit from the main branch: f176e7d

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Nested Linstor cluster with a non hyperconverged setup and a node evicted.

How did you try to break this feature and the system with this change?

If a Linstor node is down while a resource is being migrated, setting
allow-two-primaries fails because the downed node cannot be reached.
The property is still set on the other nodes, though, so the migration
should work. We now just report an error instead of failing completely.
@rp- rp- force-pushed the linstor-fix-migration-while-node-offline branch from 83730d2 to 1f6954c on February 5, 2024 10:04
Contributor Author

rp- commented Feb 5, 2024

Commit: 1f6954c should also be applied to 4.19 and main
But I can make a separate PR for that?

}
}
return false;
Contributor

@rp- is there any case that returns false (on the error / exception above)?

Contributor Author

AFAICS every upstream caller ignores the return value anyway.
But we don't need to do any disconnect; removing the property is just a bit of tidying up, and nothing fatal happens if it fails.
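
A best-effort cleanup of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the code under review: `BestEffortCleanupSketch` and `tryCleanup` are hypothetical names, and the cleanup step is passed in as a plain `Runnable` rather than a real Linstor call.

```java
// Hypothetical sketch: names are illustrative and not taken from the plugin.
class BestEffortCleanupSketch {

    /**
     * Runs a tidy-up step. A failure is logged and reported as false,
     * but it is never propagated to the caller.
     */
    static boolean tryCleanup(Runnable cleanupStep) {
        try {
            cleanupStep.run();
            return true;
        } catch (RuntimeException e) {
            System.err.println("cleanup failed, ignoring: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for removing a property while a node is unreachable.
        boolean removed = tryCleanup(() -> {
            throw new IllegalStateException("node offline");
        });
        System.out.println("property removed: " + removed);
    }
}
```

Callers are free to ignore the boolean; it only records whether the tidy-up succeeded.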

Contributor

@sureshanaparti sureshanaparti left a comment

changes lgtm

@sureshanaparti
Contributor

Commit: 1f6954c should also be applied to 4.19 and main But I can make a separate PR for that?

The entire PR's changes can be forward-merged to 4.19 / main. I think the merge will skip any changes that are already in.

@DaanHoogland DaanHoogland added this to the 4.18.2.0 milestone Feb 6, 2024
Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@rohityadavcloud rohityadavcloud merged commit 56f0448 into apache:4.18 Feb 8, 2024
22 of 25 checks passed
@rohityadavcloud
Member

Merged based on smoketests. All changes in Linstor plugin.

dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Feb 20, 2024
* linstor: Add util method getBestErrorMessage from main

* linstor: failed remove of allow-two-primaries is no fatal error

* linstor: Fix failure if a Linstor node is down while migrating

If a Linstor node is down while migrating resource, allow-two-primaries
setting will fail because we can't reach the downed node. But it will
still set the property on the other nodes and migration should work.
We now just report an error instead of completely failing.
rp- added a commit to LINBIT/cloudstack that referenced this pull request May 16, 2024
* linstor: Add util method getBestErrorMessage from main

* linstor: failed remove of allow-two-primaries is no fatal error

* linstor: Fix failure if a Linstor node is down while migrating

If a Linstor node is down while migrating resource, allow-two-primaries
setting will fail because we can't reach the downed node. But it will
still set the property on the other nodes and migration should work.
We now just report an error instead of completely failing.