Skip to content

HBASE-21596 Delete for a specific cell version can bring back version…#2009

Merged
wchevreuil merged 2 commits intoapache:masterfrom
wchevreuil:HBASE-21596
Jul 8, 2020
Merged

HBASE-21596 Delete for a specific cell version can bring back version…#2009
wchevreuil merged 2 commits intoapache:masterfrom
wchevreuil:HBASE-21596

Conversation

@wchevreuil
Copy link
Contributor

…s above VERSIONS limit

While going through some other delete marker issue, I had found out this issue is still present on master branch. Had rebased the original patch and pushed a PR.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 7s master passed
+1 💚 checkstyle 1m 16s master passed
+1 💚 spotbugs 2m 8s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 45s the patch passed
+1 💚 checkstyle 1m 15s the patch passed
+1 💚 whitespace 0m 1s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 21s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 15s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
36m 5s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2009
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 1006133a1a0a 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9ad16aa
Max. process+thread count 84 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments, overall seems good

throws IOException {
List<Cell> result = get(get, false);

void updateDeleteLatestVersionTimestamp(Cell cell, Get get, int count, int maxVersions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: private ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had not done previously because original method modified was package private. Doing now.

Comment on lines 3199 to 3200
List<Cell> result = new ArrayList<>();
result.addAll(deleteCells);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can wrap it with ArrayList constructor: new ArrayList<>(deleteCells);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

return 0;
});
List<Cell> cells = new ArrayList<>();
if (result.size() < count) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result sorting doesn't seem useful for this condition. Can we avoid sorting for this?

if (result.size() < count) {
..
..
  deleteCells.addAll(cells);
  return;
}

result.sort();

if (result.size() > count){
..
..
}else{
..
}
deleteCells.addAll(cells);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Comment on lines 3175 to 3187
if (coprocessorHost != null) {
if (!coprocessorHost.prePrepareTimeStampForDeleteVersion(mutation, cell,
byteNow, get)) {
updateDeleteLatestVersionTimestamp(cell, get, count, byteNow);
updateDeleteLatestVersionTimestamp(cell, get, count,
this.htableDescriptor.getColumnFamily(family).getMaxVersions(),
byteNow, deleteCells);

}
} else {
updateDeleteLatestVersionTimestamp(cell, get, count, byteNow);
updateDeleteLatestVersionTimestamp(cell, get, count,
this.htableDescriptor.getColumnFamily(family).getMaxVersions(),
byteNow, deleteCells);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer using a boolean to make single call to updateDeleteLatestVersionTimestamp?

          boolean updateDelTs=false;
          if (coprocessorHost != null) {
            if (!coprocessorHost.prePrepareTimeStampForDeleteVersion(mutation, cell,
                byteNow, get)) {
              updateDelTs=true;
            }
          } else {
            updateDelTs=true;
          }
          if(updateDelTs){
            updateDeleteLatestVersionTimestamp(cell, get, count,
              this.htableDescriptor.getColumnFamily(family).getMaxVersions(),
              byteNow, deleteCells);
          }

Only if you feel this is more readable :)

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 28s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 3s master passed
+1 💚 compile 1m 15s master passed
+1 💚 shadedjars 6m 33s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 46s hbase-server in master failed.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 33s the patch passed
+1 💚 compile 1m 13s the patch passed
+1 💚 javac 1m 13s the patch passed
+1 💚 shadedjars 6m 43s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 51s hbase-server in the patch failed.
_ Other Tests _
-1 ❌ unit 214m 10s hbase-server in the patch failed.
243m 24s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2009
Optional Tests javac javadoc unit shadedjars compile
uname Linux cd9b83d2672a 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9ad16aa
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/testReport/
Max. process+thread count 3042 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 12s master passed
+1 💚 compile 0m 59s master passed
+1 💚 shadedjars 6m 12s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 56s the patch passed
+1 💚 compile 0m 59s the patch passed
+1 💚 javac 0m 59s the patch passed
+1 💚 shadedjars 6m 3s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 36s the patch passed
_ Other Tests _
-1 ❌ unit 227m 47s hbase-server in the patch failed.
253m 39s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #2009
Optional Tests javac javadoc unit shadedjars compile
uname Linux 83d023340484 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9ad16aa
Default Java 1.8.0_232
unit https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/testReport/
Max. process+thread count 2997 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@wchevreuil
Copy link
Contributor Author

Addressed @virajjasani review comments. Fixed previous failed UTs, as those were validating the wrong behaviour.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 6m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 53s master passed
+1 💚 checkstyle 1m 16s master passed
+1 💚 spotbugs 2m 6s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 44s the patch passed
+1 💚 checkstyle 1m 14s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 20s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 16s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
41m 2s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2009
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 0cfce759e350 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e614b89
Max. process+thread count 84 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 31s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 12s master passed
+1 💚 compile 1m 4s master passed
+1 💚 shadedjars 5m 49s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 42s hbase-server in master failed.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 56s the patch passed
+1 💚 compile 1m 3s the patch passed
+1 💚 javac 1m 3s the patch passed
+1 💚 shadedjars 5m 43s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 39s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 129m 25s hbase-server in the patch passed.
155m 2s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2009
Optional Tests javac javadoc unit shadedjars compile
uname Linux f460b8d40baf 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e614b89
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/testReport/
Max. process+thread count 4620 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 6m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 4s master passed
+1 💚 compile 0m 57s master passed
+1 💚 shadedjars 6m 2s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 45s the patch passed
+1 💚 compile 0m 56s the patch passed
+1 💚 javac 0m 56s the patch passed
+1 💚 shadedjars 6m 2s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 35s the patch passed
_ Other Tests _
+1 💚 unit 201m 30s hbase-server in the patch passed.
232m 33s
Subsystem Report/Notes
Docker Client=19.03.12 Server=19.03.12 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #2009
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8036fa02d1c8 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / e614b89
Default Java 1.8.0_232
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/testReport/
Max. process+thread count 2858 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-2009/2/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

}

} else {
Cell getCell = result.get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if sorting of result is required here. If not required, rest looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed, because we don't have to worry about additional versions, we only need to put a single marker for current TS.

@wchevreuil wchevreuil merged commit 1db8977 into apache:master Jul 8, 2020
wchevreuil added a commit to wchevreuil/hbase that referenced this pull request Jul 8, 2020
@Apache9
Copy link
Contributor

Apache9 commented Jul 9, 2020

Oh, the PR has already been merged...

I skimmed the comments on jira and also the patch, so the approach here is when we delete a specific version, we will go through all the old versions and added a delete marker for them if needed.

This approach can solve a problem but will introduce other problems...
What if then user manually added a new cell with the version you deleted? In general, the user should see the new cell but actually, it depends on whether we have done a major compaction?

@wchevreuil
Copy link
Contributor Author

wchevreuil commented Jul 9, 2020

Oh, the PR has already been merged...

I can revert it back. This has been a recurring source of confusion and complains among our customer base, and having inconsistent results prior and after compaction doesn't seem right, maybe we should make HBASE-15968 solution default here?

I skimmed the comments on jira and also the patch, so the approach here is when we delete a specific version, we will go through all the old versions and added a delete marker for them if needed.

This approach can solve a problem but will introduce other problems...
What if then user manually added a new cell with the version you deleted? In general, the user should see the new cell but actually, it depends on whether we have done a major compaction?

Yeah, but it's the same current behaviour for manually deleted ones, right? I guess both are solved by HBASE-15968 solution?

@Apache9
Copy link
Contributor

Apache9 commented Jul 9, 2020

I guess both are solved by HBASE-15968 solution?

Theoretically, HBASE-15968 can solve the problem.
But HBASE-15968 itself is a PITA.
The implementation is much more complicated, and can impact performance. We used to also have customers wanted to solve the problem but AFAIK, we haven't actually enabled it on any of our production cluster(@infraio correct me if I'm wrong). So I do not think the feature is stable enough...
The problem is that, if you only use normal read/write/delete, you do not need this 'new behavior', everything is fine. The effort on changing the behavior will likely effect the performance...

For the customers who have special usage, for example, set keepDeletedCells, need more than one versions, want to delete a specific version, and which is the most important, set the timestamp manually and the timestamp is not always increasing, the 'new behavior' can solve the problem, but usually it could also be solved by the customers themselves, by changing their usage to avoid hittin the bad cases...

So if users meet the problem, I will suggest them to go to HBASE-15968, but usually, they will just change their usage...

@Apache9
Copy link
Contributor

Apache9 commented Jul 9, 2020

Yeah, but it's the same current behaviour for manually deleted ones, right?

No, for the scenario below they are not the same.

For example, you have t1 > t2 > t3, and you want to keep 2 versions. You delete t2, in the old behavior, t3 will come back, which is strange, so in your PR here, you also issue a delete for t3, to make sure it will not come back. But then if you put a new value with t3, in the old behavior, you can see the new value, but with the PR here, you can not see it, as there is a delete markar for t3...

wchevreuil added a commit that referenced this pull request Jul 9, 2020
@wchevreuil
Copy link
Contributor Author

So if users meet the problem, I will suggest them to go to HBASE-15968, but usually, they will just change their usage...

Ok, had reverted it from master.

For example, you have t1 > t2 > t3, and you want to keep 2 versions. You delete t2, in the old behavior, t3 will come back, which is strange, so in your PR here, you also issue a delete for t3, to make sure it will not come back. But then if you put a new value with t3, in the old behavior, you can see the new value, but with the PR here, you can not see it, as there is a delete marker for t3...

Yeah, got your point, what I meant was that currently you already have same problem for the deleted t2, in your example, where you can't bring it back until the delete marker is gone. And for both cases, keepDeletedCells is a problem.

There's another problem reported (not solved on this PR), when you put a Delete for a TS larger than all existing versions, it has actually no effects, not filtering out any of the older versions, because VERSION_DELETE requires delete TS equality, however, documentation states:

For example, let’s suppose we want to delete a row. For this you can specify a version, or else by default the currentTimeMillis is used. What this means is delete all cells where the version is less than or equal to this version.

It could be fixed in a similar way, by fabricating individual delete markers for all existing versions older than the specified delete timestamp, or we can just assume this is correct, then update documentation accordingly.

@Apache9
Copy link
Contributor

Apache9 commented Jul 9, 2020

The only way to completely solve all the related problems is HBASE-15968, where we also consider MVCC so earlier modification will not effect later modifications. All other solutions can lead to inconsistent results under some strange scenarios...

Agree that we should improve the document. I think we should provide some suggested ways to solve some strange behaviors. For example, the fix in this PR could also be done from client side right?

And on HBASE-15968, let's wait some time for the input from @infraio .

Thanks.

@infraio
Copy link
Contributor

infraio commented Jul 10, 2020

And on HBASE-15968, let's wait some time for the input from @infraio .

Not used in our production cluster now......

@infraio
Copy link
Contributor

infraio commented Jul 10, 2020

The only way to completely solve all the related problems is HBASE-15968, where we also consider MVCC so earlier modification will not effect later modifications.

Agreed. But we need more test and usage case for this feature. @wchevreuil Do you meet same case for this problem? You can take a try for it. It will be better if we can foucs on one same solution. And for the document, I am +1 to update it and make it more clearly.

@wchevreuil
Copy link
Contributor Author

The only way to completely solve all the related problems is HBASE-15968, where we also consider MVCC so earlier modification will not effect later modifications.

Agreed. But we need more test and usage case for this feature. @wchevreuil Do you meet same case for this problem? You can take a try for it. It will be better if we can foucs on one same solution. And for the document, I am +1 to update it and make it more clearly.

I agree we should aim towards a unique solution. Is HBASE-15968 not considered production ready because of performance concerns? And since "versions" feature behaves inconsistently without HBASE-15968, shouldn't we recall it as not production ready as well? I can work on documenting the inconsistencies caused by versions and related delete markers, meanwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants