
HDDS-6312. Use KeyPrefixContainer table to accelerate the process of DELETE/UPDATE events#3082

Merged
smengcl merged 13 commits into apache:master from symious:HDDS-6312
Aug 30, 2022
Conversation

@symious
Contributor

@symious symious commented Feb 14, 2022

What changes were proposed in this pull request?

Recon stores the mapping of ContainerKeyPrefix in a local RocksDB. When Recon applies DELETE or UPDATE events from OM, it searches the whole table for each to-be-deleted record.

In a big cluster, the record count in this table can be very large, and the search loop for each record is very slow. In our cluster there are 90 million records and each loop takes over 70 seconds; if a batch of delta OM events contains 100 DELETE or UPDATE events, it takes about two hours to apply these updates.

This ticket is to accelerate the process with the help of a new local table.
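To make the idea concrete, here is a minimal, hypothetical sketch (plain `TreeMap`s standing in for the RocksDB tables; the string key layout and method names are illustrative, not the actual Recon code). The existing containerKeyTable is keyed by (containerId, keyPrefix), so answering "which containers hold this key prefix?" requires a full-table scan; a second table keyed by (keyPrefix, containerId) turns the same question into a bounded range scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReverseIndexSketch {

    // Hypothetical sample of the reverse table: "keyPrefix|containerId" -> count.
    static TreeMap<String, Integer> sampleKeyContainerTable() {
        TreeMap<String, Integer> t = new TreeMap<>();
        t.put("/vol/bucket/key1|1", 1);
        t.put("/vol/bucket/key1|2", 1);
        t.put("/vol/bucket/key2|2", 1);
        return t;
    }

    // With keys sorted by keyPrefix first, finding every container that holds a
    // given key prefix is a bounded range scan instead of a full-table loop.
    static List<String> containersForKeyPrefix(TreeMap<String, Integer> table,
                                               String keyPrefix) {
        String lo = keyPrefix + "|";
        String hi = keyPrefix + "|" + Character.MAX_VALUE;
        NavigableMap<String, Integer> hits = table.subMap(lo, true, hi, false);
        List<String> containers = new ArrayList<>();
        for (String k : hits.keySet()) {
            containers.add(k.substring(lo.length()));
        }
        return containers;
    }

    public static void main(String[] args) {
        // A DELETE event for key1 now touches only the matching rows.
        System.out.println(containersForKeyPrefix(sampleKeyContainerTable(),
            "/vol/bucket/key1")); // [1, 2]
    }
}
```

In RocksDB the same effect is achieved by seeking an iterator to the key prefix and iterating until the prefix no longer matches, rather than scanning all 90 million rows per event.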

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6312

How was this patch tested?

Unit tests.

@symious
Contributor Author

symious commented Feb 15, 2022

@adoroszlai @avijayanhwx @ferhui Could you help review this PR?

@ferhui
Contributor

ferhui commented Feb 18, 2022

@JacksonYao287 Could you please help review?

@adoroszlai adoroszlai requested a review from avijayanhwx March 9, 2022 20:52
@symious
Contributor Author

symious commented Apr 6, 2022

@avijayanhwx Could you help check this issue?
This issue can be a problem if we want to retrieve updated information from OM.

In a small cluster, the connection from Recon to OM works normally, at a 10-minute interval. Please ignore the details in the following block; each log line corresponds to a request from Recon to OM to get delta updates.

2022-04-06 09:41:57,273 [Socket Reader #1 for port 9862] WARN org.apache.hadoop.ipc.Server: Connection Authentication from Recon:56662 for protocol org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol is failed for user ozone (auth:SIMPLE). 
2022-04-06 09:51:57,357 [Socket Reader #1 for port 9862] WARN org.apache.hadoop.ipc.Server: Connection Authentication from Recon:60932 for protocol org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol is failed for user ozone (auth:SIMPLE). 
2022-04-06 10:01:57,436 [Socket Reader #1 for port 9862] WARN org.apache.hadoop.ipc.Server: Connection Authentication from Recon:63884 for protocol org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol is failed for user ozone (auth:SIMPLE). 
2022-04-06 10:11:57,544 [Socket Reader #1 for port 9862] WARN org.apache.hadoop.ipc.Server: Connection Authentication from Recon:24028 for protocol org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol is failed for user ozone (auth:SIMPLE). 
2022-04-06 10:22:04,916 [Socket Reader #1 for port 9862] WARN org.apache.hadoop.ipc.Server: Connection Authentication from Recon:26968 for protocol org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol is failed for user ozone (auth:SIMPLE). 

But in a big cluster, since the processing is so slow, the gap between connections can be far longer than the 10-minute interval.

@adoroszlai adoroszlai requested review from kerneltime and smengcl July 24, 2022 17:20
@adoroszlai
Contributor

Thanks @symious for the patch and sorry for the long delay in review. Can you please resolve merge conflicts?

@symious
Contributor Author

symious commented Jul 25, 2022

@adoroszlai Thanks for the review. Updated the patch, please have a check.

@prashantpogde prashantpogde self-requested a review August 3, 2022 18:08
@smengcl
Contributor

smengcl commented Aug 23, 2022

Thanks @symious for the patch. There's another patch causing minor conflicts. Would you merge the latest master into the PR branch again?

I have resolved the conflict on my branch (I tried to push to your PR branch but got no permission), you can take this as a reference: https://github.com/smengcl/hadoop-ozone/blob/HDDS-6312/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/ContainerKeyMapperTask.java#L249-L275

@symious
Contributor Author

symious commented Aug 24, 2022

@smengcl Updated the PR, please have a look.

Contributor

@smengcl smengcl left a comment

Thanks @symious for the improvement.

Latest revision looks good overall. Minor comments inline.

// When reading from byte[], we can always expect to have the key, version
// and containerId parts in the byte array.
byte[] keyBytes = ArrayUtils.subarray(rawData,
0, rawData.length - Long.BYTES * 2 - 2);
Contributor

We are encoding and decoding the DB entry value manually, with error-prone offset calculations here. But since ContainerKeyPrefixCodec was already doing this, this should be acceptable.

I wonder if it makes sense to just add a new proto message type, so we can leverage protobuf instead? If it is worth it I think we should, depending on how much more efficient (in space and time) it would be if we switched to protobuf.

Just trying to open up a discussion here. IMO it is fine to use manual encode/decode here to make it consistent with the containerKeyTable we already have. Maybe in a future jira we can switch both tables to use protobuf to serialize the persisted DB value and call those tables _V2.
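For reference, here is a self-contained sketch of this kind of manual framing (the delimiter byte and field order are assumptions for illustration, not the actual ContainerKeyPrefixCodec layout). The fixed-size tail of two longs plus two delimiter bytes is what makes the offset arithmetic quoted above work:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ManualCodecSketch {
    private static final byte DELIMITER = '_'; // hypothetical delimiter byte

    // Assumed layout: keyPrefix bytes, delimiter, version (8 bytes),
    // delimiter, containerId (8 bytes) -- a fixed-size tail after the key.
    static byte[] encode(String keyPrefix, long version, long containerId) {
        byte[] key = keyPrefix.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(key.length + 2 + Long.BYTES * 2);
        buf.put(key).put(DELIMITER).putLong(version)
           .put(DELIMITER).putLong(containerId);
        return buf.array();
    }

    static String decodeKeyPrefix(byte[] rawData) {
        // Everything except the two longs and two delimiters is the key prefix,
        // matching the quoted subarray arithmetic.
        byte[] keyBytes = Arrays.copyOfRange(rawData,
            0, rawData.length - Long.BYTES * 2 - 2);
        return new String(keyBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = encode("/vol/bucket/key1", 0L, 42L);
        System.out.println(decodeKeyPrefix(raw)); // /vol/bucket/key1
    }
}
```

A protobuf message would replace this hand-rolled framing with generated, versionable serialization, at the cost of a migration for the existing table contents.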

Contributor Author

Noted with thanks, I'll try to create a new ticket for this transition.

symious and others added 4 commits August 24, 2022 21:21
…/spi/ReconContainerMetadataManager.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
…/spi/impl/ReconContainerMetadataManagerImpl.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
…/spi/impl/ReconContainerMetadataManagerImpl.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
@symious
Contributor Author

symious commented Aug 24, 2022

@smengcl Updated the patch, please have a look.

Contributor

@smengcl smengcl left a comment

+1 pending CI

@symious
Contributor Author

symious commented Aug 25, 2022

@smengcl Thank you for the review.

@symious
Contributor Author

symious commented Aug 25, 2022

@smengcl Do you have any suggestions on the failed unit test?

@smengcl
Contributor

smengcl commented Aug 25, 2022

@smengcl Do you have any suggestions on the failed unit test?

The failure looks related at a first glance: https://github.com/apache/ozone/runs/8008158105

Error:  org.apache.hadoop.ozone.recon.recovery.TestReconOmMetadataManagerImpl.testUpdateOmDB  Time elapsed: 0.249 s  <<< FAILURE!
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:87)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertTrue(Assert.java:53)
	at org.apache.hadoop.ozone.recon.recovery.TestReconOmMetadataManagerImpl.testUpdateOmDB(TestReconOmMetadataManagerImpl.java:139)

But Line 139 in TestReconOmMetadataManagerImpl.java doesn't really match any meaningful code on your branch, hmm.

I ran the same test locally and it passed.

Retriggering the CI.

@smengcl smengcl merged commit 09e89fd into apache:master Aug 30, 2022
@smengcl
Contributor

smengcl commented Aug 30, 2022

Thanks @symious for the patch. Thanks @adoroszlai for the review.

@szetszwo
Contributor

@symious , question -- what is the reason for adding KeyPrefixContainer instead of reusing ContainerKeyPrefix? I would like to see if we can get rid of one of them.

@symious
Contributor Author

symious commented May 30, 2023

In a big cluster, the record count in this table can be very large, and the search loop for each record is very slow. In our cluster there are 90 million records and each loop takes over 70 seconds; if a batch of delta OM events contains 100 DELETE or UPDATE events, it takes about two hours to apply these updates.

@szetszwo It is to solve the slowness of processing.

@szetszwo
Contributor

szetszwo commented May 30, 2023

@symious , I understand that this is to fix the slowness. Well done!

My question is why we cannot reuse ContainerKeyPrefix? How can KeyPrefixContainer fix the slowness but ContainerKeyPrefix cannot?

@symious
Contributor Author

symious commented May 30, 2023

My question is why we cannot reuse ContainerKeyPrefix? How can KeyPrefixContainer fix the slowness but ContainerKeyPrefix cannot?

The slowness came from the "getContainerForKeyPrefixes" operation. Since ContainerKeyPrefix maps "container -> keyPrefix", getting the containers for a specific keyPrefix means traversing the whole ContainerKeyPrefix table, which is time-consuming when the table is large.

With the help of "KeyPrefixContainer", the "getContainerForKeyPrefixes" operation becomes quite fast.

@szetszwo

@szetszwo
Contributor

szetszwo commented May 30, 2023

@symious , sorry, I still cannot understand. I probably have not asked the right question. These two classes (shown below) look the same. Why do we need to add KeyPrefixContainer instead of reusing ContainerKeyPrefix?

public class ContainerKeyPrefix {

  private long containerId;
  private String keyPrefix;
  private long keyVersion = -1;
public class KeyPrefixContainer {

  private String keyPrefix;
  private long keyVersion = -1;
  private long containerId = -1;

With the help of "KeyPrefixContainer", ...

How does it help?

@symious
Contributor Author

symious commented May 30, 2023

@szetszwo Oh, I got your question.

I think I created a new class for readability.

It is kind of confusing to have

  private Table<ContainerKeyPrefix, Integer> containerKeyTable;
  private Table<ContainerKeyPrefix, Integer> keyContainerTable;

It's been a while since this change. But I remember I was using the same class in the implementation at first, and I was getting confused dealing with the two tables with different orderings of containerId and keyPrefix. Using a different class makes the logic easier to understand and write.
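The readability argument can be sketched with two hypothetical record types (illustrative stand-ins, not the real Recon classes): giving each table key its own type makes the field ordering explicit in the type system, while a conversion method keeps the two representations in sync.

```java
public class KeyTypesSketch {
    // Illustrative stand-ins for the two table key classes.
    record ContainerKeyPrefix(long containerId, String keyPrefix, long keyVersion) {
        // Flip to the (keyPrefix, containerId) ordering for the reverse table.
        KeyPrefixContainer reversed() {
            return new KeyPrefixContainer(keyPrefix, keyVersion, containerId);
        }
    }

    record KeyPrefixContainer(String keyPrefix, long keyVersion, long containerId) {
        // Flip back to the (containerId, keyPrefix) ordering.
        ContainerKeyPrefix reversed() {
            return new ContainerKeyPrefix(containerId, keyPrefix, keyVersion);
        }
    }

    public static void main(String[] args) {
        ContainerKeyPrefix ckp = new ContainerKeyPrefix(42L, "/vol/bucket/key1", 0L);
        KeyPrefixContainer kpc = ckp.reversed();
        // The type of each table key now states its sort order, and a value
        // round-trips cleanly between the two representations.
        System.out.println(kpc.reversed().equals(ckp)); // true
    }
}
```

With a single shared class, both tables would be declared with the same key type and nothing in the signatures would say which ordering each table uses.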

@szetszwo
Contributor

szetszwo commented May 30, 2023

... for the readability.

@symious , if the code is the same, it is better to reuse the code.

//SCMMetadataStoreImpl
  private Table<BigInteger, X509Certificate> validCertsTable;
  private Table<BigInteger, X509Certificate> validSCMCertsTable;
  private Table<BigInteger, X509Certificate> revokedCertsTable;

For example, we don't create a different BigInteger and a different X509Certificate for the tables above.

@symious
Contributor Author

symious commented May 31, 2023

If the code is the same, it is better to reuse the code.

That sounds reasonable to me, too.

@szetszwo
Contributor

@symious , thanks for confirming it! Filed HDDS-8733.


@Override
public int hashCode() {
return Objects.hash(containerId, keyPrefix, keyPrefix);
Contributor

@symious , another question -- the repeated keyPrefix, keyPrefix looks like a typo. To fix it, we would replace the last keyPrefix with keyVersion as below.

    return Objects.hash(containerId, keyPrefix, keyVersion);

However, the hash function will be changed. Now the question is -- is it an incompatible change? Is the hash value used or written anywhere in the DB?
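The effect of the typo can be demonstrated in isolation (a minimal stand-in record, not the real class): with the repeated keyPrefix argument, keyVersion never contributes to the hash, so entries differing only in version always collide. That is still a legal hashCode(), just a less discriminating one, and fixing it only changes in-memory HashMap/HashSet behavior as long as the hash is never persisted.

```java
import java.util.Objects;

public class HashTypoSketch {
    // Minimal stand-in for the class under discussion (illustrative only).
    record Entry(long containerId, String keyPrefix, long keyVersion) {
        // The typo'd hash: keyVersion is never mixed in.
        int buggyHash() {
            return Objects.hash(containerId, keyPrefix, keyPrefix);
        }
        // The proposed fix: keyVersion participates.
        int fixedHash() {
            return Objects.hash(containerId, keyPrefix, keyVersion);
        }
    }

    public static void main(String[] args) {
        Entry v0 = new Entry(7L, "/vol/bucket/key1", 0L);
        Entry v1 = new Entry(7L, "/vol/bucket/key1", 1L);

        // With the typo, entries differing only in keyVersion always collide.
        System.out.println(v0.buggyHash() == v1.buggyHash()); // true

        // With the fix they generally do not; the hash value changes, which is
        // safe as long as it is never written to the DB.
        System.out.println(v0.fixedHash() == v1.fixedHash()); // false
    }
}
```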

Contributor Author

Oh, thanks for spotting that.

I didn't find the hash value being persisted anywhere. The only place I found it being used is the "getContainerForKeyPrefixes" function, so the change should not cause incompatibilities.

Contributor

@symious , thanks a lot for checking it! Let me change it to use the same hashCode() as ContainerKeyPrefix in HDDS-8733.
