-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table #572
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed. Makes sense to me.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshelser Mind taking a look at this one sir?
Yep. It's on my list -- Shardul pinged me already :) Thanks for asking, S. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks OK to solve this issue. I'm a little worried as this is holding on to objects in memory for a potentially long time, but I can't think of a different way to get around that easily.
I think this approach will break down after a restart of the Master. RegionSize reports are kept in memory, but after a table is disabled, we'll no longer get those reports coming into the Master. Thus, we'll flip the table back to active because we think we're missing reports and we have no way to re-create the old RegionSizes we had. This might be some follow-on work to do rather than take this back to the drawing board :)
long currentEntryTime = regionSizes.get(regionInfo).getTime(); | ||
boolean isInViolationAndPolicyDisable = isInViolationAndPolicyDisable(regionInfo.getTable()); | ||
// do not prune the entries if table is in violation and | ||
// violation policy is disable.prune entries older than time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you buff out this comment with an explanation as to why we need to hold onto RegionSize
s for tables in violation with a disable policy or include a reference to HBASE-22012, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
boolean isInViolationAndPolicyDisable = false; | ||
SpaceViolationPolicy policy = null; | ||
try { | ||
if (QuotaUtil.isQuotaEnabled(masterServices.getConfiguration())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unnecessary to re-check.
QuotaObserverChore would be calling pruneEntriesOlderThan(long)
but that chore is only scheduled if quotas are enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
if (QuotaUtil.isQuotaEnabled(masterServices.getConfiguration())) { | ||
// Get Current Snapshot for the given table | ||
SpaceQuotaSnapshot spaceQuotaSnapshot = | ||
QuotaUtil.getCurrentSnapshotFromQuotaTable(masterServices.getConnection(), tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use QuotaObserverChore#getTableQuotaSnapshot(TableName)
instead? This is going to hit hbase:quota
every time which might be costly.
Since we're inside the master already, we should be able to safely pull from the in-memory state in QuotaObserverChore
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using QuotaObserverChore#getTableQuotaSnapshot(TableName)
now.
if (spaceQuotaSnapshot != null) { | ||
// check if table in violation | ||
isInViolationAtTable = spaceQuotaSnapshot.getQuotaStatus().isInViolation(); | ||
Optional<SpaceViolationPolicy> policyAtNamespace = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just policy
? AtNamespace
doesn't seem to be appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
} | ||
} | ||
isInViolationAndPolicyDisable = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can just make this return policy == SpaceViolationPolicy.DISABLE && isInViolationAtTable;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do this, then you can also remove the variable isInViolationAndPolicyDisable
completely, lift the return isInViolationAndPolicyDisable
into the catch block and make it return false
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility(); | ||
Configuration conf = TEST_UTIL.getConfiguration(); | ||
conf.set(QUOTA_CONF_KEY, "false"); | ||
when(masterServices.getConfiguration()).thenReturn(conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you disabling the quota feature here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
public void testDisablePolicyQuotaAndViolate() throws Exception { | ||
TableName tableName = helper.createTable(); | ||
helper.setQuotaLimit(tableName, SpaceViolationPolicy.DISABLE, 2L); | ||
helper.writeData(tableName, SpaceQuotaHelperForTests.ONE_MEGABYTE * 3L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1MB should be enough to trigger the quota, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to 1MB.
MasterQuotaManager quotaManager = master.getMasterQuotaManager(); | ||
|
||
// Sufficient time for all the chores to run. | ||
Thread.sleep(5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the state you're actually expecting to happen? RegionSize reports to be sent in?
Same issue as one of your recent PRs. Can you please convert this into an wait+explicit check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. added a wait predicate.
|
||
// Check if disabled table region report present in the map after retention period expired. | ||
// It should be present after retention period expired. | ||
for (Map.Entry<RegionInfo, Long> entry : quotaManager.snapshotRegionSizes().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making this whole for-loop:
quotaManager.snapshotRegionSize().keySet()
.stream()
.filter(k -> k.getTable().equals(tableName))
.count() > 0`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@joshelser , thanks for the review..did all the changes and pushed. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One bug in the latest patch (thanks findbugs) and one minor readability request. Otherwise, I think this is fine.
// check namespace in violation | ||
isInViolationAtNamespace = namespaceQuotaSnapshot.getQuotaStatus().isInViolation(); | ||
Optional<SpaceViolationPolicy> namespacePolicy = | ||
tableQuotaSnapshot.getQuotaStatus().getPolicy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the NPE from findbugs. Should be namespaceQuotaSnapshot
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterQuotaManager.java
Show resolved
Hide resolved
|
||
// Check if disabled table region report present in the map after retention period expired. | ||
// It should be present after retention period expired. | ||
Assert.assertTrue(quotaManager.snapshotRegionSizes().keySet().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull the lambda out of the assertTrue
to make this a little more readable? e.g.
final int regionSizes = quotaManager.snapshotRegionSizes()...;
Assert.assertTrue(regionSizes > 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
💔 -1 overall
This message was automatically generated. |
Looks ok now. Let me try to get this in. |
Required some minor changes across branch-2.2/2 and more changes in branch-2.1. Re-running the UTs on branch-2.1 now. |
…lation policy Space quotas has a feature which intends to avoid enacting a space quota violation policy when only a subset of the Regions for that Table have reported their space usage (under the assumption that we cannot make an informed decision if we do not include all regions in our calculations). This had the unintended side-effect, when a table is disabled as a part of a violation policy, of causing the regions for that table to not be reported which disables the violation policy and enables the table. Need to make sure that when a table is disabled because of a violation policy that the code does not automatically move that table out of violation because region sizes are not being reported (because those regions are not open). Closes #572 Signed-off-by: Josh Elser <elserj@apache.org>
…lation policy Space quotas has a feature which intends to avoid enacting a space quota violation policy when only a subset of the Regions for that Table have reported their space usage (under the assumption that we cannot make an informed decision if we do not include all regions in our calculations). This had the unintended side-effect, when a table is disabled as a part of a violation policy, of causing the regions for that table to not be reported which disables the violation policy and enables the table. Need to make sure that when a table is disabled because of a violation policy that the code does not automatically move that table out of violation because region sizes are not being reported (because those regions are not open). elserj: Had to replace some usages of Optional that don't exist in branch-2.1 Closes #572 Signed-off-by: Josh Elser <elserj@apache.org>
…lation policy Space quotas has a feature which intends to avoid enacting a space quota violation policy when only a subset of the Regions for that Table have reported their space usage (under the assumption that we cannot make an informed decision if we do not include all regions in our calculations). This had the unintended side-effect, when a table is disabled as a part of a violation policy, of causing the regions for that table to not be reported which disables the violation policy and enables the table. Need to make sure that when a table is disabled because of a violation policy that the code does not automatically move that table out of violation because region sizes are not being reported (because those regions are not open). Closes #572 Signed-off-by: Josh Elser <elserj@apache.org>
…lation policy Space quotas has a feature which intends to avoid enacting a space quota violation policy when only a subset of the Regions for that Table have reported their space usage (under the assumption that we cannot make an informed decision if we do not include all regions in our calculations). This had the unintended side-effect, when a table is disabled as a part of a violation policy, of causing the regions for that table to not be reported which disables the violation policy and enables the table. Need to make sure that when a table is disabled because of a violation policy that the code does not automatically move that table out of violation because region sizes are not being reported (because those regions are not open). Closes apache#572 Signed-off-by: Josh Elser <elserj@apache.org>
…lation policy Space quotas has a feature which intends to avoid enacting a space quota violation policy when only a subset of the Regions for that Table have reported their space usage (under the assumption that we cannot make an informed decision if we do not include all regions in our calculations). This had the unintended side-effect, when a table is disabled as a part of a violation policy, of causing the regions for that table to not be reported which disables the violation policy and enables the table. Need to make sure that when a table is disabled because of a violation policy that the code does not automatically move that table out of violation because region sizes are not being reported (because those regions are not open). Closes apache#572 Signed-off-by: Josh Elser <elserj@apache.org> (cherry picked from commit 16d6a9f) Change-Id: I01cbb39ef7046fedf2feff558b9f414b6397a19e
Space Quota: Policy state is getting changed from disable to Observance after sometime automatically.
Steps:
1: Create a table with space quota policy as Disable
2: Put some data so that table state is in space quota violation
3: So observe that table state is in violation
4: Now wait for some time
5: Observe that after some time table state is changing to to Observance however table is still disabled
Solution: it is better to not remove the entries from cache regionSizes if table is in violation and has DISABLE policy set on it to avoid this repeat cycle of enable/disable. If the entries are not removed from the cache for disable policy and violation tables the table will not go from disable to enable even though regions of the table are offline.