Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARTEMIS-3340 layered over ARTEMIS-2716 - activation sequence tracking to protect the journal #3646

Closed
wants to merge 24 commits into from

Conversation

gtully
Copy link
Contributor

@gtully gtully commented Jul 7, 2021

No description provided.

@gtully gtully marked this pull request as draft July 7, 2021 20:02
@gtully
Copy link
Contributor Author

gtully commented Jul 7, 2021

leaving as draft pending complete test run and some conversation. these are new policies so the behaviour is more restrictive as indicated in the tests. I think activationSequence captures the intent nicely but am open to suggestions for better naming.

there are some nodeManager behaviours that may need a revisit.

@gtully gtully changed the title Artemis-3340 layered over ARTEMIS-2716 - activation sequence tracking to protect the journal ARTEMIS-3340 layered over ARTEMIS-2716 - activation sequence tracking to protect the journal Jul 7, 2021
@gtully gtully force-pushed the ARTEMIS-3340 branch 6 times, most recently from 0f149b6 to 0ddd602 Compare July 9, 2021 09:24
@gtully gtully marked this pull request as ready for review July 15, 2021 12:03
@gtully
Copy link
Contributor Author

gtully commented Jul 15, 2021

added some more tests to validate activation tracking and extended 'primary' policy to allow explicitly setting the nodeID. Not sure why that is a UUID and not human readable, there is already a UUID identifier. In any event, this allows 'peer' primary brokers to coordinate with each other on a shared identity which can support multi primary or peer mode. The activation sequence ensures that they don't step in each other as only the server with the current activation sequence will activate/go live. In the event of both crashing while replicated, any restarted server can activate.

@gtully
Copy link
Contributor Author

gtully commented Jul 16, 2021

all the tests are good on this branch :-)

@gtully
Copy link
Contributor Author

gtully commented Jul 16, 2021

I don't want to squash, I think all of the commits franz has are mostly isolated.

clebertsuconic added a commit to clebertsuconic/activemq-artemis that referenced this pull request Jul 16, 2021
@gtully
Copy link
Contributor Author

gtully commented Jul 21, 2021

@franz1981 @michaelandrepearce bits I would like to improve:

  • have the nodeID be human readable, just a string. it is currently a UUID b/c it it typically auto generated. However to facilitate coordination among peers it needs to be preconfigured/predetermined and it makes sense to have that human readable. using NodeID ha-server-01, state in zk could then be:
       /prod/activemq-artemis/ha-server-01/lock
       /prod/activemq-artemis/ha-server-01/activation-squence

in place of: 
       /prod/activemq-artemis/locks/ha-server-01/lock
       /prod/activemq-artemis/activation-squences/ha-server-01/activation-sequence

If the values are collocated is is trivial to check and revert an activation-sequence. I think this will help with maintainability but it may have other ramifications. It may need some consideration.

  • the RepicationBackupActivation is shared between roles and does two things... Replicate and Replicate for failback. It would be good to separate out these two jobs and have two activations. That would really simplify the peer case.

@gtully gtully marked this pull request as draft July 22, 2021 16:29
@gtully gtully force-pushed the ARTEMIS-3340 branch 3 times, most recently from aa27651 to 7eaa6a2 Compare July 23, 2021 18:17
@franz1981
Copy link
Contributor

franz1981 commented Jul 24, 2021

Not a biggie, but I see that data rotation on backup start need to (re)set activation sequence to 0 or during a fail-back the broker risk to believe to have any meaningful data (because sequence > 0), while it's not; it should look like an empty backup (with activation sequence == 0).

In addition (but I can work on this as a separate PR) I would like to provide logs or some folder naming strategy to relate activation sequence with the data version, to help recovering crazy bugs (we haven't yet discovered really :P).

This PR is getting in a good shape, enough to merge this and move on, well done @gtully !!!

@franz1981
Copy link
Contributor

Going to send a PR to the @gtully branch with:

  • refactored common methods on activations
  • fixed tests/config/docs to deprecate useless configs

These changes are related to #3646 (comment)

franz1981 and others added 20 commits July 29, 2021 17:11
…g data when appropriate and have backup wait for activation before restarting as backup after failback which avoids a race
* ARTEMIS-2716 - speed up failback backup behaviour by ignoring existing data when appropriate and have backup wait for activation before restarting as backup after failback which avoids a race

* ARTEMIS-3340 Implemented reusable curator primitive abstraction

Co-authored-by: gtully <gary.tully@gmail.com>
…lidate lock release on check for in sync replica
@gtully gtully marked this pull request as ready for review July 29, 2021 16:47
@franz1981
Copy link
Contributor

I'm going to send a new PR after squashing all commits of the @gtully's branch

@franz1981
Copy link
Contributor

@clebertsuconic #3680 replace this PR, that can be marked as not to be merged

@clebertsuconic
Copy link
Contributor

this is supeceeded by Franz's PR.

@clebertsuconic
Copy link
Contributor

#3680 replaces this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants