
Feature: auto recover support repaired not adhering placement ledger #3359

Merged

Conversation

horizonzy
Member

@horizonzy horizonzy commented Jun 25, 2022

Descriptions of the changes in this PR:
Here is a user case:

  1. They have two zones and a rack-aware policy that ensures writes are spread across both zones
  2. They had some data on a topic with long retention
  3. They ran a disaster recovery (DR) test, during which they shut down one zone
  4. During the DR test, auto-recovery ran. Because only one zone was active, and because auto-recovery defaults to best-effort rack awareness, it recovered up to the expected number of replicas
  5. They stopped the DR test and all was well, but now that ledger was only in one zone
  6. They ran another DR test, this time effectively moving over to the other zone, but now data is missing because it all resides in only one zone

We should support a feature to cover this case.

@horizonzy horizonzy changed the title Feature: auto recover support repaired not adhering placement ledger [WIP] Feature: auto recover support repaired not adhering placement ledger Jun 25, 2022
@horizonzy horizonzy changed the title [WIP] Feature: auto recover support repaired not adhering placement ledger Feature: auto recover support repaired not adhering placement ledger Jun 27, 2022
@horizonzy horizonzy force-pushed the feature-auto-recover-match-placement branch from 64bc687 to 12582a7 on June 27, 2022 07:57
@horizonzy
Member Author

For this case, we already support detecting ledgers whose ensembles do not adhere to the placement policy.
In the Auditor, if the user configures auditorPeriodicPlacementPolicyCheckInterval, it starts a scheduled task that triggers placementPolicyCheck. In placementPolicyCheck, it records the count of ledger fragments that do not adhere to the placement policy.

void placementPolicyCheck() throws BKAuditException {

But it only records this in a stat; it does not recover data to make the ensemble adhere to the placement policy.

So we can add a config option, repairedPlacementPolicyNotAdheringBookieEnabled, to control whether to repair the data so that the ensemble adheres to the placement policy.

In the Auditor
It will mark the ledgerId as under-replicated (handing it to the under-replication manager) if the ensemble does not adhere to the placement policy.

In the ReplicationWorker
It will move data from the old bookie to a new bookie in a different network location so that the ensemble adheres to the placement policy. If there is no bookie with a different network location, it does nothing.

Attention
In the ReplicationWorker, we just poll an under-replicated ledger and then process it. So when we get an under-replicated ledger, we should check two cases: 1) whether the ledger fragments have lost data; 2) whether the ledger fragments do not adhere to the placement policy. A fragment may meet both cases at the same time. If so, we ignore case 2 and just repair the data loss. If the repaired result still does not adhere to the placement policy, the Auditor will mark it again.
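To make that decision order concrete, here is a minimal Java sketch. The helper names (hasDataLossFragments, repairDataLoss, adheresToPlacementPolicy, repairPlacementPolicy) are hypothetical stand-ins for illustration, not the methods introduced by this PR.

// Sketch only: the decision order a ReplicationWorker could follow for an under-replicated ledger.
class UnderReplicationDecisionSketch {
    private final boolean repairedPlacementPolicyNotAdheringBookieEnabled;

    UnderReplicationDecisionSketch(boolean repairEnabled) {
        this.repairedPlacementPolicyNotAdheringBookieEnabled = repairEnabled;
    }

    void processUnderReplicatedLedger(long ledgerId) {
        if (hasDataLossFragments(ledgerId)) {
            // Case 1 takes priority: re-replicate lost fragments first. If the repaired
            // ensemble still violates the placement policy, the Auditor will mark the
            // ledger under-replicated again on its next placementPolicyCheck run.
            repairDataLoss(ledgerId);
            return;
        }
        if (repairedPlacementPolicyNotAdheringBookieEnabled && !adheresToPlacementPolicy(ledgerId)) {
            // Case 2: move fragments to bookies in a different network location.
            repairPlacementPolicy(ledgerId);
        }
    }

    // Stubs standing in for the real fragment checks and repair paths.
    private boolean hasDataLossFragments(long ledgerId) { return false; }
    private boolean adheresToPlacementPolicy(long ledgerId) { return true; }
    private void repairDataLoss(long ledgerId) { }
    private void repairPlacementPolicy(long ledgerId) { }
}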

@horizonzy
Member Author

How to use it?

If we want to repair ledgers whose ensembles do not adhere to the placement policy, we should configure two parameters.

auditorPeriodicPlacementPolicyCheckInterval=3600
repairedPlacementPolicyNotAdheringBookieEnabled=true

In the Auditor
auditorPeriodicPlacementPolicyCheckInterval controls the interval of the placement policy check; repairedPlacementPolicyNotAdheringBookieEnabled controls whether to mark the ledgerId as under-replicated when a ledger ensemble is found that does not adhere to the placement policy.

In ReplicationWorker
repairedPlacementPolicyNotAdheringBookieEnabled control is to repaired the ledger which ensemble not adhere placement policy.

Attention

  1. We need to ensure repairedPlacementPolicyNotAdheringBookieEnabled=true is configured in both the Auditor and the ReplicationWorker at the same time.

  2. We also need the placement policy to be the same between the Auditor and the ReplicationWorker, because both use the placement policy during processing (see the configuration sketch below).
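For illustration, a minimal Java sketch of wiring both settings into one ServerConfiguration. The two property names come from this comment; the ensemblePlacementPolicy key, the policy class shown, and the wrapper class name are assumptions for the example.

import org.apache.bookkeeper.conf.ServerConfiguration;

public class AutoRecoveryPlacementRepairConfig {
    public static ServerConfiguration build() {
        ServerConfiguration conf = new ServerConfiguration();
        // Interval of the Auditor's placement policy check (3600, as in the example above).
        conf.setProperty("auditorPeriodicPlacementPolicyCheckInterval", 3600);
        // Mark and repair ledgers whose ensembles do not adhere to the placement policy.
        conf.setProperty("repairedPlacementPolicyNotAdheringBookieEnabled", true);
        // Keep the Auditor and the ReplicationWorker on the same placement policy,
        // e.g. the rack-aware policy (key and class shown as an assumption here;
        // check your deployment's configuration reference).
        conf.setProperty("ensemblePlacementPolicy",
                "org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy");
        return conf;
    }
}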

@hangc0276
Contributor

ping @merlimat @eolivelli @dlg99 @zymap @reddycharan Please help take a look at this PR, thanks.

@@ -402,6 +402,7 @@ public void registerLedgerMetadataListener(long ledgerId, LedgerMetadataListener
}
}
synchronized (listenerSet) {
listenerSet = listeners.computeIfAbsent(ledgerId, k -> new HashSet<>());
Contributor

Did you bring the previous change into this PR?

Member Author

Yes, I will remove it in this PR.

Map<String, List<BookieNode>> toPlaceGroup = new HashMap<>();
for (BookieId bookieId : ensemble) {
// If the bookie has shut down, put it in the inactive group.
BookieNode bookieNode = clone.get(bookieId);
Contributor

If the bookie shuts down, it will be removed from knownBookies immediately. That belongs to the DATA_LOSS type.

Member Author

@horizonzy horizonzy Jul 6, 2022

In theory, if the fragment is DATA_LOSS, it won't invoke this method; it will repair the data loss first.
But the bookie may shut down after the ledger check, so here we replace the shutdown bookie first.
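A rough, self-contained illustration of that ordering, using simplified stand-in types (a plain rack string instead of BookieNode); it is a sketch of the idea, not the exact code in this PR.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.bookkeeper.net.BookieId;

class ShutdownBookieFirstSketch {
    // knownRacks: bookie -> rack for bookies that are still alive.
    static void groupEnsemble(List<BookieId> ensemble, Map<BookieId, String> knownRacks) {
        List<BookieId> inactive = new ArrayList<>();
        Map<String, List<BookieId>> toPlaceGroup = new HashMap<>();
        for (BookieId bookieId : ensemble) {
            String rack = knownRacks.get(bookieId);
            if (rack == null) {
                // The bookie shut down after the ledger check: replace it first,
                // ahead of any purely policy-driven moves.
                inactive.add(bookieId);
                continue;
            }
            toPlaceGroup.computeIfAbsent(rack, k -> new ArrayList<>()).add(bookieId);
        }
        // ... replace the inactive bookies, then fix placement for the grouped ones ...
    }
}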

return bn;
}
}
throw new BKNotEnoughBookiesException();
Contributor

Can we log something if we reach this point?

Member Author

It already logs: in doReplaceToAdherePlacementPolicy, it catches BKNotEnoughBookiesException and logs it.
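For reference, a minimal sketch of that catch-and-log pattern. Only BKNotEnoughBookiesException and the catch-and-log behaviour come from the discussion; the wrapper class and the helper replaceNotAdheringPlacementPolicyBookie are illustrative.

import org.apache.bookkeeper.client.BKException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PlacementRepairSketch {
    private static final Logger LOG = LoggerFactory.getLogger(PlacementRepairSketch.class);

    void doReplaceToAdherePlacementPolicy(long ledgerId) {
        try {
            replaceNotAdheringPlacementPolicyBookie(ledgerId);
        } catch (BKException.BKNotEnoughBookiesException e) {
            // Not enough bookies in a different network location: log and skip; the Auditor
            // will flag the ledger again on its next placement policy check.
            LOG.warn("Not enough bookies to repair placement policy for ledger {}", ledgerId, e);
        }
    }

    private void replaceNotAdheringPlacementPolicyBookie(long ledgerId)
            throws BKException.BKNotEnoughBookiesException {
        // Stub standing in for the real replacement logic.
    }
}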

}
}

private int differBetweenBookies(List<BookieId> bookiesA, List<BookieId> bookiesB) {
Contributor

Nit: static?

Member Author

agree
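For illustration, a static version of such a helper might look like the sketch below; the counting logic (bookies in bookiesA missing from bookiesB) is an assumption about what the method does, not the PR's actual implementation.

import java.util.List;
import org.apache.bookkeeper.net.BookieId;

final class EnsembleDiffSketch {
    // Static helper: count how many bookies in bookiesA are not present in bookiesB.
    static int differBetweenBookies(List<BookieId> bookiesA, List<BookieId> bookiesB) {
        int differ = 0;
        for (BookieId bookieId : bookiesA) {
            if (!bookiesB.contains(bookieId)) {
                differ++;
            }
        }
        return differ;
    }
}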

int ackQuorumSize,
Set<BookieId> excludeBookies,
List<BookieId> currentEnsemble) {
throw new UnsupportedOperationException();
Contributor

What happens if you don't override this method?

Are we handling this exception in the code that calls this method?

My understanding is that we cannot provide a good default implementation.

In the calling code we could catch this exception, log something, and abort the operation gracefully.

Member Author

Yes, that's fine.
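A minimal sketch of the graceful abort discussed here: catch the UnsupportedOperationException, log it, and skip the repair. The surrounding class, interface, and names are illustrative stand-ins, not the PR's actual code.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class GracefulAbortSketch {
    private static final Logger LOG = LoggerFactory.getLogger(GracefulAbortSketch.class);

    // Stand-in for invoking replaceToAdherePlacementPolicy on the configured placement policy.
    interface PolicyCall {
        void replaceToAdherePlacementPolicy();
    }

    void repairIfSupported(long ledgerId, PolicyCall call) {
        try {
            call.replaceToAdherePlacementPolicy();
        } catch (UnsupportedOperationException e) {
            // The configured placement policy does not override the new method:
            // log and skip this ledger instead of failing the ReplicationWorker.
            LOG.warn("Placement policy does not support replaceToAdherePlacementPolicy,"
                    + " skipping placement repair for ledger {}", ledgerId, e);
        }
    }
}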

@eolivelli eolivelli self-requested a review August 7, 2022 14:07
@horizonzy
Member Author

@eolivelli ping, I have addressed the comments, could you review it again?

@equanz
Contributor

equanz commented Aug 8, 2022

And what about the default-rack shutdown bookies?
#2931 (review)

If it is being addressed, please let me know where it is.

@horizonzy
Member Author

And what about the default rack? #2931 (review)

If it is being addressed, please let me know where it is.

I mean that if we didn't handle the shutdown bookies, they would be handled as default-rack bookies; since the default rack differs from the other bookies' racks, they would never be replaced. Now we add the shutdown bookies to the exclude nodes so that they get replaced.
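A small sketch of that exclusion idea, with illustrative names (see the linked lines in RackawareEnsemblePlacementPolicyImpl for the actual code): shutdown bookies would otherwise resolve to the default rack and never look misplaced, so they are added to the exclude set, which forces the policy to pick replacements for them.

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.bookkeeper.net.BookieId;

final class ExcludeShutdownBookiesSketch {
    static Set<BookieId> withShutdownBookiesExcluded(Set<BookieId> requestedExcludes,
                                                     List<BookieId> currentEnsemble,
                                                     Set<BookieId> knownBookies) {
        Set<BookieId> excludeBookies = new HashSet<>(requestedExcludes);
        for (BookieId bookieId : currentEnsemble) {
            if (!knownBookies.contains(bookieId)) {
                // The bookie has shut down; exclude it so a replacement is chosen for it.
                excludeBookies.add(bookieId);
            }
        }
        return excludeBookies;
    }
}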

@equanz
Contributor

equanz commented Aug 8, 2022

@horizonzy
Member Author

Now we add the shutdown bookies to excludes nodes to replace it.

Just to confirm, do you mean here? https://github.com/horizonzy/bookkeeper/blob/6ef1e2aa8ac0780bd8199b360e840506cbc85e5d/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L1101-L1105

yes

Contributor

@equanz equanz left a comment

Thank you for your explanation.
LGTM.

@StevenLuMT
Contributor

Fix old workflow, please see #3455 for details.

@horizonzy horizonzy closed this Aug 28, 2022
@horizonzy horizonzy reopened this Aug 28, 2022
@horizonzy
Member Author

rerun failure checks

1 similar comment
@hangc0276
Contributor

rerun failure checks

@eolivelli eolivelli merged commit fc981ba into apache:master Sep 5, 2022
@hangc0276
Contributor

This PR is an enhancement for auto recovery, and the new interface has a default implementation, which is compatible with the old version. I suggest cherry-picking it to branch-4.14 and branch-4.15. Do you have any suggestions? @merlimat @eolivelli @dlg99 @rdhabalia @zymap

@eolivelli
Contributor

@hangc0276 please ask on dev@
I agree with you.
Also 4.15 and 4.16 have some problems that are blocking the adoption.

hangc0276 pushed a commit to streamnative/bookkeeper-achieved that referenced this pull request Oct 18, 2022
@rdhabalia
Contributor

yes, we should cherry-pick it.

hangc0276 pushed a commit to streamnative/bookkeeper-achieved that referenced this pull request Mar 28, 2023
hangc0276 pushed a commit that referenced this pull request May 30, 2023