HDDS-6312. Use KeyPrefixContainer table to accelerate the process of DELETE/UPDATE events #3082

smengcl merged 13 commits into apache:master
Conversation
@adoroszlai @avijayanhwx @ferhui Could you help to review this PR?

@JacksonYao287 Could you please help review?

@avijayanhwx Could you help to check this issue? In a small cluster, the connection from Recon to OM is quite normal, with a 10-minute interval. Please ignore the content in the following block; each request is from Recon to OM to get delta updates. But in a big cluster, since the process is too slow, the interval between these connections can be far longer than 10 minutes.

Thanks @symious for the patch and sorry for the long delay in review. Can you please resolve merge conflicts?

@adoroszlai Thanks for the review. Updated the patch, please have a check.

Thanks @symious for the patch. There's another patch causing minor conflicts. Would you merge the latest master into the PR branch again? I have resolved the conflict on my branch (I tried to push to your PR branch but got no permission), you can take this as a reference: https://github.com/smengcl/hadoop-ozone/blob/HDDS-6312/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java#L249-L275

@smengcl Updated the PR, please have a look.
```java
// When reading from byte[], we can always expect to have the key, version
// and containerId parts in the byte array.
byte[] keyBytes = ArrayUtils.subarray(rawData,
    0, rawData.length - Long.BYTES * 2 - 2);
```
We are encoding and decoding the DB entry value manually, with error-prone offset calculations here. But since ContainerKeyPrefixCodec was already doing this, this should be acceptable.

I wonder if it makes sense to just add a new proto message type, so we can leverage protobuf instead? If it is worth it I think we should, depending on how much more efficient (in space and time) it can be if we switch to protobuf.

Just trying to open up a discussion here. IMO it is fine to use manual encode/decode here to make it consistent with the containerKeyTable we already have (see the sketch after this thread). Maybe in a future jira we can switch both tables to use protobuf to serialize the persisted DB value and call those tables _V2.
Noted with thanks, I'll try to create a new ticket for this transition.
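Purely for context, here is a self-contained sketch of the kind of manual, offset-based decoding being discussed. The assumed value layout (keyPrefix bytes, a one-byte delimiter, an 8-byte version, another delimiter, an 8-byte containerId) and all class and method names are illustrative assumptions, not the exact codec in this patch:

```java
import java.nio.charset.StandardCharsets;

import com.google.common.primitives.Longs;
import org.apache.commons.lang3.ArrayUtils;

// Illustrative decoder, assuming
// rawData = keyPrefix | delim(1) | version(8) | delim(1) | containerId(8).
public final class KeyPrefixContainerDecodeSketch {
  public static void decode(byte[] rawData) {
    // Key prefix: everything before the two longs and two delimiter bytes.
    byte[] keyBytes = ArrayUtils.subarray(rawData,
        0, rawData.length - Long.BYTES * 2 - 2);
    String keyPrefix = new String(keyBytes, StandardCharsets.UTF_8);

    // Version: the 8 bytes between the two delimiters.
    byte[] versionBytes = ArrayUtils.subarray(rawData,
        rawData.length - Long.BYTES * 2 - 1,
        rawData.length - Long.BYTES - 1);
    long keyVersion = Longs.fromByteArray(versionBytes);

    // Container ID: the trailing 8 bytes.
    byte[] containerIdBytes = ArrayUtils.subarray(rawData,
        rawData.length - Long.BYTES, rawData.length);
    long containerId = Longs.fromByteArray(containerIdBytes);

    System.out.printf("keyPrefix=%s version=%d containerId=%d%n",
        keyPrefix, keyVersion, containerId);
  }
}
```

The fused offset arithmetic is exactly what makes this style error-prone, which motivates the protobuf idea above.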
…/spi/ReconContainerMetadataManager.java Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
…/spi/impl/ReconContainerMetadataManagerImpl.java Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
…/spi/impl/ReconContainerMetadataManagerImpl.java Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
@smengcl Updated the patch, please have a look.

@smengcl Thank you for the review.

@smengcl Do you have any suggestions on the failed unit test?
The failure looks related at a first glance: https://github.com/apache/ozone/runs/8008158105

```
Error: org.apache.hadoop.ozone.recon.recovery.TestReconOmMetadataManagerImpl.testUpdateOmDB  Time elapsed: 0.249 s  <<< FAILURE!
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:87)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertTrue(Assert.java:53)
	at org.apache.hadoop.ozone.recon.recovery.TestReconOmMetadataManagerImpl.testUpdateOmDB(TestReconOmMetadataManagerImpl.java:139)
```

But line 139 in TestReconOmMetadataManagerImpl.java doesn't really match any meaningful code on your branch, hmm. I ran the same test locally and it passed. Retriggering the CI.
Thanks @symious for the patch. Thanks @adoroszlai for the review.

@symious , question -- what is the reason for adding KeyPrefixContainer?

@szetszwo It is to solve the slowness of processing.

@symious , I understand that this is to fix the slowness. Well done! My question is why we cannot reuse ContainerKeyPrefix.
The slowness was coming from the operation of "getContainerForKeyPrefixes": since the existing containerKeyTable is keyed by containerId first, finding the containers for a given key prefix required scanning the whole table. With the help of "KeyPrefixContainer", the operation of "getContainerForKeyPrefixes" will be quite fast; see the sketch below.
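To make the mechanism concrete, here is a generic, runnable sketch using plain Java collections (not the Recon API; the string key layout and all names are invented). Once keyPrefix is the leading component of a sorted table's key, all matching entries are contiguous, so a seek plus a short range scan replaces the full-table scan:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Generic illustration of prefix-seek vs. full scan in a sorted table.
public final class PrefixSeekDemo {
  public static void main(String[] args) {
    // Pretend key layout: "<keyPrefix>/<version>/<containerId>" -> count.
    NavigableMap<String, Integer> keyContainerTable = new TreeMap<>();
    keyContainerTable.put("vol/bucket/key1/0/1001", 1);
    keyContainerTable.put("vol/bucket/key1/0/1002", 1);
    keyContainerTable.put("vol/bucket/key2/0/1003", 1);

    String prefix = "vol/bucket/key1/";
    // Seek to the first entry at or after the prefix, then stop once past it;
    // only the matching range is visited, never the whole table.
    keyContainerTable.tailMap(prefix, true).entrySet().stream()
        .takeWhile(e -> e.getKey().startsWith(prefix))
        .forEach(e -> System.out.println("matched: " + e.getKey()));
  }
}
```

RocksDB iterators support the same seek-then-scan pattern, which is what makes the new table's key ordering effective.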
@symious , sorry, I still cannot understand. I probably have not asked the right question. These two classes (shown below) look the same. Why do we need to add KeyPrefixContainer?

```java
public class ContainerKeyPrefix {
  private long containerId;
  private String keyPrefix;
  private long keyVersion = -1;
}
```

```java
public class KeyPrefixContainer {
  private String keyPrefix;
  private long keyVersion = -1;
  private long containerId = -1;
}
```

How does it help?
@szetszwo Oh, I got your question. I think I created a new class for readability. It is kind of confusing to have one class serve two tables with different key orderings. It's been a while since this change, but I remember I was using the same class for the implementation at first, and I was getting confused when dealing with the two tables with different orderings of containerId and keyPrefix. By using a different class, it's easier to understand and write the logic.
@symious , If the code is the same, it is better to reuse it.

```java
// SCMMetadataStoreImpl
private Table<BigInteger, X509Certificate> validCertsTable;
private Table<BigInteger, X509Certificate> validSCMCertsTable;
private Table<BigInteger, X509Certificate> revokedCertsTable;
```

For example, we won't create a different class for each of these tables.
Reasonable to me, also.
```java
@Override
public int hashCode() {
  return Objects.hash(containerId, keyPrefix, keyPrefix);
}
```
@symious , another question -- the repeated keyPrefix, keyPrefix looks like a typo. For fixing it, we will replace the last keyPrefix with keyVersion as below.

```java
return Objects.hash(containerId, keyPrefix, keyVersion);
```

However, the hash function will be changed. Now the question is -- is it an incompatible change? Has the hash value been used or written anywhere in the DB?
Oh, thanks for spotting that.

I didn't find any persistence of the hash value. The only place I found it being used is in the "getContainerForKeyPrefixes" function, which should not cause incompatibilities.
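For reference, a minimal sketch of the corrected hashCode, paired with a matching equals, for a class shaped like KeyPrefixContainer; the constructor and field set are assumptions based on the fields quoted earlier, not the exact class in the patch:

```java
import java.util.Objects;

// Sketch of KeyPrefixContainer with the typo fixed: hash keyVersion
// instead of repeating keyPrefix. Since the hash is only used in-memory
// (never persisted), changing it is a compatible fix.
public final class KeyPrefixContainer {
  private final String keyPrefix;
  private final long keyVersion;
  private final long containerId;

  public KeyPrefixContainer(String keyPrefix, long keyVersion, long containerId) {
    this.keyPrefix = keyPrefix;
    this.keyVersion = keyVersion;
    this.containerId = containerId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof KeyPrefixContainer)) {
      return false;
    }
    KeyPrefixContainer that = (KeyPrefixContainer) o;
    return containerId == that.containerId
        && keyVersion == that.keyVersion
        && Objects.equals(keyPrefix, that.keyPrefix);
  }

  @Override
  public int hashCode() {
    return Objects.hash(containerId, keyPrefix, keyVersion);
  }
}
```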
What changes were proposed in this pull request?
Recon stores the mapping of ContainerKeyPrefix in a local RocksDB. When Recon is applying DELETE or UPDATE events from OM, it searches the whole table for each to-be-deleted record.

In a big cluster, the record count in this table can be very large, and the search loop for each record is very slow. In our cluster there are 90 million records and each loop costs over 70 seconds; if a batch of delta OM events has 100 DELETE or UPDATE events, it takes about two hours to apply them.
This ticket is to accelerate the process with the help of a new local table.
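As a rough illustration of the scheme (plain Java maps, hypothetical names; the real patch uses RocksDB tables), the new reverse table turns a DELETE event into a direct lookup instead of a scan over every containerKeyTable entry:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical two-table sketch: a reverse index from keyPrefix to the
// containers holding it makes key DELETE events cheap.
public final class DeleteEventSketch {
  // Old direction: "containerId/keyPrefix" -> key count.
  static final Map<String, Integer> containerKeyTable = new HashMap<>();
  // New direction: keyPrefix -> containerIds holding that key.
  static final Map<String, Set<Long>> keyPrefixContainerTable = new HashMap<>();

  static void put(long containerId, String keyPrefix) {
    containerKeyTable.put(containerId + "/" + keyPrefix, 1);
    keyPrefixContainerTable.computeIfAbsent(keyPrefix, k -> new TreeSet<>())
        .add(containerId);
  }

  static void onDeleteEvent(String keyPrefix) {
    // Direct lookup replaces scanning the whole containerKeyTable.
    Set<Long> containers = keyPrefixContainerTable.remove(keyPrefix);
    if (containers != null) {
      for (long containerId : containers) {
        containerKeyTable.remove(containerId + "/" + keyPrefix);
      }
    }
  }

  public static void main(String[] args) {
    put(1001L, "vol/bucket/key1");
    put(1002L, "vol/bucket/key1");
    onDeleteEvent("vol/bucket/key1");
    System.out.println("remaining entries: " + containerKeyTable.size()); // 0
  }
}
```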
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6312
How was this patch tested?
Unit test.