
Conversation

@smengcl (Contributor) commented Nov 19, 2022:

What changes were proposed in this pull request?

This is a follow-up to #3824. See the JIRA HDDS-7508 description.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7508

How was this patch tested?

  • Integration test TestOMSnapshotDAG.

…Node SnapshotID field debugging functionality

Change-Id: If85e8a5d00b4f1a8575d44b48e2630d039133d7f
…ore `FileAlreadyExistsException` in listener.
Change-Id: Iab7915677387df77a80a1448464e03ec951b676f
Change-Id: I46aa460b19ed912cde90d48264ad471e636d31cc
Change-Id: I5b9877bc329ef97a9c6628f419b572121f8fe8b8
@kerneltime (Contributor) commented:

@GeorgeJahad @neils-dev

@kerneltime kerneltime added the snapshot label (https://issues.apache.org/jira/browse/HDDS-6517) Nov 21, 2022
@kerneltime kerneltime requested a review from neils-dev November 21, 2022 17:38
// The result is deterministic. Retrieved from a run.
// final List<String> expectedDiffList21 = Arrays.asList(
// "000066", "000080", "000087", "000073", "000095");
// Assertions.assertEquals(expectedDiffList21, actualDiffList21);
Contributor:

Is this test no longer valid?

@smengcl (Author), Nov 22, 2022:

I'm going to remove this. It appears that once I set the DB column family write buffer size to a small enough value like we have here (now 256 KB, was 128 MB), flush and compaction start to behave slightly differently across architectures/environments. It gives a different set of results in CI (x86, IIRC) than on my Mac (arm64).

This indicates some other factors might be impacting the list of SSTs generated here, now that write buffer size may no longer be the dominant factor triggering compactions. Could it be some default RocksDB tunables differing per platform? (Plausible, as each platform has its own native JNI lib inside the rocksdbjni jar.) Or simply that differences in detected system memory and CPU core count cause some compaction-relevant RocksDB defaults to differ.

At least it seems the result here is stable if the environment stays the same. Two runs from a previous commit:

https://github.com/smengcl/hadoop-ozone/actions/runs/3502672241/jobs/5867215890#step:5:3353

https://github.com/apache/ozone/actions/runs/3502681795/jobs/5867234494#step:5:3350

Error:    TestOMSnapshotDAG.testZeroSizeKey:218 expected: <[000066, 000080, 000087, 000073, 000095]> but was: <[000067, 000060]>
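
For reference, a minimal sketch (assuming the plain rocksdbjni API; the column family name is illustrative) of the knob in question: shrinking the column family write buffer size so memtable flushes, and hence compactions, fire far more often than with the old 128 MB setting.

  import java.nio.charset.StandardCharsets;
  import org.rocksdb.ColumnFamilyDescriptor;
  import org.rocksdb.ColumnFamilyOptions;

  ColumnFamilyOptions cfOpts = new ColumnFamilyOptions()
      .setWriteBufferSize(256L * 1024L);  // 256 KB, down from 128 MB

  ColumnFamilyDescriptor cfDesc = new ColumnFamilyDescriptor(
      "keyTable".getBytes(StandardCharsets.UTF_8), cfOpts);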

// works.
List<String> actualDiffList32 = differ.getSSTDiffList(snap3, snap2);
// final List<String> expectedDiffList32 = Arrays.asList("000105");
// Assertions.assertEquals(expectedDiffList32, actualDiffList32);
Contributor:

Is this test no longer valid?

@GeorgeJahad (Contributor) commented:

These methods don't appear to be used. Can they be removed from TestRocksDBCheckpointDiffer.java?

updateRocksDBInstance
testDefaultColumnFamilyOriginal
testCertainColumnFamily

@GeorgeJahad (Contributor) commented:

Are these methods still needed?:
traverseGraph
printMutableGraphFromAGivenNode
printMutableGraph

They seem like they might be useful tools, so it might be good to keep them, but they don't appear to be checked by any tests. So there is no incentive to keep maintaining them, or awareness of when they've stopped working.

@GeorgeJahad (Contributor) commented:

It seems like internalGetSSTDiffList() is the heart of the compaction diff code, but it's not clear the tests fully exercise it. It contains 5 or 6 different if statements to handle all the different possible cases, and I can't tell whether the tests hit each of those cases.
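
To make the shape of that logic concrete, here is a hypothetical sketch of one way such a diff walk can look. The graph representation, the CompactionNode getter, and the method shape are assumptions for illustration only (java.util collections assumed imported), not the actual internalGetSSTDiffList() body:

  // Walk the compaction DAG backwards from the newer snapshot's SSTs.
  // Files that also exist in the older snapshot are shared; files whose
  // history bottoms out without a match belong to the diff.
  Set<String> diffSketch(Set<CompactionNode> newerSstNodes,
                         Set<String> olderSstNames,
                         Map<CompactionNode, Set<CompactionNode>> backEdges) {
    Set<String> diff = new HashSet<>();
    Deque<CompactionNode> queue = new ArrayDeque<>(newerSstNodes);
    Set<CompactionNode> visited = new HashSet<>(newerSstNodes);
    while (!queue.isEmpty()) {
      CompactionNode node = queue.poll();
      if (olderSstNames.contains(node.getFileName())) {
        continue;  // shared ancestor; not part of the diff
      }
      Set<CompactionNode> inputs = backEdges.get(node);
      if (inputs == null || inputs.isEmpty()) {
        diff.add(node.getFileName());  // no older lineage: new data
        continue;
      }
      for (CompactionNode input : inputs) {
        if (visited.add(input)) {
          queue.add(input);
        }
      }
    }
    return diff;
  }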

@smengcl (Author) commented Nov 22, 2022:

These methods don't appear to be used. Can they be removed from TestRocksDBCheckpointDiffer.java?

updateRocksDBInstance testDefaultColumnFamilyOriginal testCertainColumnFamily

Thanks. Will remove those unused test helper methods.

Change-Id: Id995eea1270ec40ba9b05c2e761c748f5cdc5965
Change-Id: Idd994ea9d535453ab783adad640847b57524abce
Change-Id: I6a52aad6b6d4a941c6bc70f0468a85e3aa0083fc
@smengcl (Author) commented Nov 22, 2022:

Are these methods still needed?: traverseGraph printMutableGraphFromAGivenNode printMutableGraph

They seem like they might be useful tools, so it might be good to keep them, but they don't appear to be checked by any tests. So there is no incentive to keep maintaining them, or awareness of when they've stopped working.

Makes sense. Will move them to the test class.

… test class.

Change-Id: I44f93ada5ce66aa0ecf4899fff4d9e2c1c616a3d
Change-Id: Id16f497f83977a3b75dc6aa296aa46eb0da19a14
Change-Id: I060f464692af4f18bff9261a521bb88f2f40c921
@prashantpogde (Contributor) commented:

LGTM. But I suspect the result can differ if RocksDB behavior changes. We should have a way to directly call the RocksDB compaction listener and generate the deterministic DAG that we want; such a DAG traversal can give a constant output.
I guess the current PR can go in as it is, but this is the idea for a next PR that would be fully deterministic.
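
A minimal sketch of the listener hook being described, assuming the stock rocksdbjni event-listener API (DAG wiring elided):

  import java.util.List;
  import org.rocksdb.AbstractEventListener;
  import org.rocksdb.CompactionJobInfo;
  import org.rocksdb.RocksDB;

  class CompactionDagListener extends AbstractEventListener {
    @Override
    public void onCompactionCompleted(RocksDB db, CompactionJobInfo info) {
      List<String> inputs = info.inputFiles();    // SSTs consumed
      List<String> outputs = info.outputFiles();  // SSTs produced
      // Record one edge per (output, input) pair; replaying these edges
      // rebuilds the same DAG regardless of platform compaction timing.
    }
  }

Such a listener would be registered via options.setListeners(...) before the DB is opened.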

+ "C 000093,000090,000083:\n" // Deletion ?
+ "S 14980 e7ad72f8-52df-4430-93f6-0ee91d4a47fd\n" // Snapshot 2
+ "C 000098,000096,000085,000078,000071,000064,000060,000052:000099\n"
+ "C 000105,000095,000088:000107\n"
Contributor:

This line is right before the creation of snapshot 3, and compacts 3 sst files into "107". But in snapSST3 below I don't see 107 and do see the precompacted files. Am I misunderstanding something?

@smengcl (Author), Nov 29, 2022:

Hmm. I took the compaction log from TestOMSnapshotDAG where I set a breakpoint right before OM is restarted (cluster.restartOzoneManager()).

It looks like this compaction is indeed not captured in snapshot 3. When I checked the active DB, it seems the compaction happens after snapshot 3 is taken.

So it seems when the compaction log is being appended (even in onCompactionCompleted), RocksDB hasn't really completed the flush (and updated the manifest). Thus, the immediately following DB checkpoint operation wouldn't pick it up.
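
For reference, a sketch of the checkpoint call whose ordering is in question, using the stock rocksdbjni API (the path is illustrative). If the MANIFEST has not been updated by the time onCompactionCompleted fires, a checkpoint taken right after it can still reference the pre-compaction SST set:

  import org.rocksdb.Checkpoint;
  import org.rocksdb.RocksDB;
  import org.rocksdb.RocksDBException;

  void takeCheckpoint(RocksDB db) throws RocksDBException {
    try (Checkpoint checkpoint = Checkpoint.create(db)) {
      checkpoint.createCheckpoint("/path/to/db.snapshot/snapshot3");
    }
  }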

@smengcl (Author), Nov 29, 2022:

Though this (unexpected ordering) alone shouldn't affect the correctness of the differ since we currently read the whole compaction log and reconstruct the DAG upon OM restarts, I'm curious what is actually happening.

Change-Id: I541719b0d4eeb0ec9a9a428c46624291a9702fc9
Change-Id: I0313625e930d68461a9fb718553b28f1c5011e00
Change-Id: I1a891f80b707d20607891f01391461f36f0bcd4b
@GeorgeJahad (Contributor) commented:

lgtm (I made a few final comments above.)

@hemantk-12 (Contributor) left a comment:

General comment:
I don't understand the purpose of having debug logs in tests. The test name, scenarios, and comments (if needed) should be enough to tell what the test is doing and why. You can add log or print statements while writing a test, but they should be removed before sending it for review.

That's my understanding.

// Name of the SST file
private final String fileName;
// The last snapshot created before this node came into existence
private final String snapshotId;
@hemantk-12 (Contributor), Dec 1, 2022:

I'm skeptical about this. I don't understand its usage, and it might not provide correct information. Say node A gets added because of snapshot-1 and is used by snapshot-1 through snapshot-10. Now DAG pruning kicks in and deletes snapshot-1 to snapshot-5, but node A can't be deleted because it is still used by snapshot-6 to snapshot-10 (that's my understanding). What would be the SnapshotID of the node in this case, snapshot-1's ID or snapshot-6's? I don't think snapshot-1's ID is correct, because snapshot-1 no longer exists in the DAG.

@smengcl (Author), Dec 5, 2022:

Contributor:

How is it different from fileName if it can be any random string? Why can't we just use fileName?

@smengcl (Author), Dec 6, 2022:

Ah. Originally it was used by @prashantpogde to provide extra info when drawing the DAG for debugging. The SST file name alone doesn't tell which checkpoint captures the SST. I broke it with my last PR and am just restoring that here.

TBH we don't even need to fill in snapshotId when debugging is disabled. Though it shouldn't take too much memory to store that.

private final long totalNumberOfKeys;
private long cumulativeKeysReverseTraversal;

CompactionNode(String file, String ssId, long numKeys, long seqNum) {
Contributor:

Why is it package private?

@smengcl (Author):

It was an internal class in RocksDBCheckpointDiffer and is only used in the differ (for now). We can make it public later if it is used in other packages.

Is there a strong reason to make it public at the moment?

Contributor:

I don't have a very strong reason to make it public; you can keep it as it is.

To me it is a simple data class, so I don't see any issue with making it public.

@smengcl (Author):

done

this.cumulativeKeysReverseTraversal = cumulativeKeysReverseTraversal;
}

public void addCumulativeKeysReverseTraversal(long diff) {
Contributor:

Curious whether this needs to be thread-safe?

@smengcl (Author):

I think not, at least for now. It is only used in the test for traverseGraph.
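
For what it's worth, if concurrent traversals were ever added, a hypothetical thread-safe variant of the counter could use java.util.concurrent.atomic.AtomicLong:

  private final AtomicLong cumulativeKeysReverseTraversal = new AtomicLong();

  public void addCumulativeKeysReverseTraversal(long diff) {
    // addAndGet() makes the read-modify-write atomic across threads.
    cumulativeKeysReverseTraversal.addAndGet(diff);
  }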

* @param writeBufferSize Specify column family write buffer size.
* @return ManagedColumnFamilyOptions
*/
private ManagedColumnFamilyOptions getDefaultCfOptions(long writeBufferSize) {
Contributor:

  1. I think getDefaultCfOption() and getDefaultCfOptions(long writeBufferSize) are unnecessary one-liner functions. Also, the CfOptions returned by getDefaultCfOptions(long writeBufferSize) are not default anymore.

  2. This function is invoked multiple times unnecessarily, at line # 287 and then inside the loop at line # 297. The CfOptions can be stored in a variable and used in both places.

I'd prefer:

  private Set<TableConfig> makeTableConfigs() {
    Set<TableConfig> tableConfigs = new HashSet<>();

    ManagedColumnFamilyOptions columnFamilyOptions =
        Optional.ofNullable(defaultCfOptions)
            .orElseGet(defaultCfProfile::getColumnFamilyOptions);
    columnFamilyOptions.setWriteBufferSize(rocksDbCfWriteBufferSize);

    // If default column family was not added, add it with the default options.
    cfOptions.putIfAbsent(DEFAULT_COLUMN_FAMILY_NAME, columnFamilyOptions);

    for (Map.Entry<String, ManagedColumnFamilyOptions> entry:
        cfOptions.entrySet()) {
      String name = entry.getKey();
      ManagedColumnFamilyOptions options = entry.getValue();

      if (options == null) {
        LOG.debug("using default column family options for table: {}", name);
        tableConfigs.add(new TableConfig(name, columnFamilyOptions));
      } else {
        tableConfigs.add(new TableConfig(name, options));
      }
    }

    return tableConfigs;
  }

If you really want a function, a single one would do:

  private Set<TableConfig> makeTableConfigs() {
    Set<TableConfig> tableConfigs = new HashSet<>();

    ManagedColumnFamilyOptions columnFamilyOptions =
        getCfOptions(rocksDbCfWriteBufferSize);

    // If default column family was not added, add it with the default options.
    cfOptions.putIfAbsent(DEFAULT_COLUMN_FAMILY_NAME, columnFamilyOptions);

    for (Map.Entry<String, ManagedColumnFamilyOptions> entry:
        cfOptions.entrySet()) {
      String name = entry.getKey();
      ManagedColumnFamilyOptions options = entry.getValue();

      if (options == null) {
        LOG.debug("using default column family options for table: {}", name);
        tableConfigs.add(new TableConfig(name, columnFamilyOptions));
      } else {
        tableConfigs.add(new TableConfig(name, options));
      }
    }

    return tableConfigs;
  }

  /**
   * Get default column family options, but with column family write buffer
   * size limit overridden.
   */
  private ManagedColumnFamilyOptions getCfOptions(long writeBufferSize) {
    ManagedColumnFamilyOptions cfOptions = Optional.ofNullable(defaultCfOptions)
        .orElseGet(defaultCfProfile::getColumnFamilyOptions);
    cfOptions.setWriteBufferSize(writeBufferSize);
    return cfOptions;
  }

@smengcl (Author):

I'd prefer delaying any further refactoring of a shared class like DBStoreBuilder until we merge to master, to prevent headaches around merge conflicts. Would you like to file a JIRA for this so we can fix it after the merge?

It is still the "default" in the sense that we are still getting all the other default CF options, just with the write buffer size explicitly overridden, as mentioned in the javadoc. Similar to getDefaultDBOptions(tableConfigs) right below.

Contributor:

I will disagree that it is still the "default", but I'll leave it up to you.

I think the real confusion here is whether you are creating a default config or using the default config when none is provided. We are doing the latter: using the default config when none is provided.

@smengcl (Author):

Renamed to getCfOptions

// RDB compaction. Take another snapshot and do the diff again.
// Then restart OM, do the same diff again to see if DAG reconstruction
// works.
List<String> sstDiffList32 = differ.getSSTDiffList(snap3, snap2);
Contributor:

  1. I believe these test cases can be changed to parameterized tests.

  2. Why is the test package-private and not public? What is testZeroSizeKey actually testing? The name testZeroSizeKey doesn't give me any insight into what it is testing.
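
For illustration, the hypothetical shape of the parameterization being suggested (JUnit 5; the snapshot pairs and the differ call are placeholders, not the actual test):

  import java.util.stream.Stream;
  import org.junit.jupiter.params.ParameterizedTest;
  import org.junit.jupiter.params.provider.Arguments;
  import org.junit.jupiter.params.provider.MethodSource;

  private static Stream<Arguments> snapshotPairs() {
    return Stream.of(
        Arguments.of("snap1", "snap2"),
        Arguments.of("snap2", "snap3"),
        Arguments.of("snap1", "snap3"));
  }

  @ParameterizedTest
  @MethodSource("snapshotPairs")
  void testSnapDiff(String src, String dest) {
    // look up the snapshots, then assert on differ.getSSTDiffList(src, dest)
  }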

@smengcl (Author), Dec 6, 2022:

  1. I don't see how we would change the existing checks, each dependent on the previous one, into a parameterized test here.

  2. The testZeroSizeKey name was originally taken from an existing freon test:

Renamed. Key size stays zero as the test doesn't need to involve DNs.
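
For context, a zero-byte key in this kind of test can be created through the Ozone client and closed immediately, so no data ever reaches the DataNodes and only the OM keyspace changes (a minimal sketch; the bucket handle and key name are assumed):

  import org.apache.hadoop.ozone.client.OzoneBucket;
  import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

  try (OzoneOutputStream out = bucket.createKey("key-0", 0L)) {
    // intentionally write nothing: key size stays zero
  }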

Change-Id: I37916276a53bb6fa36021c72da5df5121738d945
Change-Id: I5b899e34f402b85b24ddaf73f2e2dddf17c9a7cc
Change-Id: I34dacd837e36f497c64ae90b7f9dee3c0e99abad
Change-Id: I0b009a14a3a89d748ad7eae3dd684b004821f201
@hemantk-12 (Contributor) left a comment:

LGTM.

Thanks @smengcl for the patch and addressing all the comments.

Conflicts:
hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/freon/TestOMSnapshotDAG.java

Change-Id: I92a09d4bad91bd91a59cbb6878d0417b417ff37c
@smengcl smengcl merged commit 2554ec2 into apache:HDDS-6517-Snapshot Dec 7, 2022
@smengcl (Author) commented Dec 7, 2022:

Thanks @GeorgeJahad @prashantpogde @hemantk-12 for reviewing this.
