HDDS-5126. Recon should check new containers of a container report with batch by JacksonYao287 · Pull Request #2172 · apache/ozone

JacksonYao287 · 2021-04-22T09:10:49Z

What changes were proposed in this pull request?

in my test environment, 400000 containers exist. when bootstrap a new recon, every container will be checked and added to recondb.but , for now , recon check all the containers in a container report one by one , and each check will take a rpc call to scm. this is too slow and in my test environment , it leads to recon oom, because too many containers to be consumed are waitting in the message queue . It is better for recon to check new containers of a container report with batch.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5126

Please replace this section with the link to the Apache JIRA)

How was this patch tested?

unit test

JacksonYao287 · 2021-04-23T02:01:32Z

@avijayanhwx @GlenGeng please take a review, thx!

bshashikant · 2021-04-23T06:04:56Z

@avijayanhwx , can you please take a look at this?

avijayanhwx

Thank you for working on this @JacksonYao287. Approach looks good to me, I have couple of comments inline.

But, I still think this is not a scalable solution. We have to boostrap Recon with the SCM DB on startup (HDDS-2852) to solve this problem as well as something like HDDS-4355.

hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java

...p-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/TestReconContainerManager.java

JacksonYao287 · 2021-04-24T16:08:49Z

@avijayanhwx thanks for your review!

But, I still think this is not a scalable solution. We have to boostrap Recon with the SCM DB on startup (HDDS-2852) to solve this problem as well as something like HDDS-4355.

yes, I agree. but this patch can Speed up processing the container report for recon. Batch is faster than one-by-one. when the number of containers in an ozone is not particularly large, i think this will have a Significant improvement.

BTW, I will try to complete HDDS-4177.Bootstrap Recon SCM Container DB later.

hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java

GlenGeng-awx · 2021-04-25T02:58:17Z

hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java

stream().filter() may be simpler and more straightforward here.

here , I want to divide "containerReplicaProtoList" into two parts , and both of the two parts will be used in the subsequent code。if i use "stream().filter()" instead, only one of the two parts got, so I have to call "stream().filter()" twice.

I think these codes can be simpler, seems too redundant for lambda here.

GlenGeng-awx · 2021-04-25T02:59:03Z

hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java

If we finally adopt the batch solution, can we drop this singular method ?

thanks for the review! maybe we can maintain it here for the present, and remove it later in another patch

I think the original check is kind of optimistic locking here?

@symious Thanks for the review! "addNewContainer" will call "ContainerStateMap#contains" ,and this is what "containerExist" exactly does, so no need here.

There are still some steps that need to be taken before addNewContainer calls ContainerStateMap$contains, with the contains check here, the performance would be better.
Besides, in ContainerStateManagerImpl, the writeLock would be required to check if the container exists, it's not reasonable to involve the lock here.

in my opinion ，"check the existence and add" should be an atomic operation( just like CAS operation), and actually this is what ContainerStateManagerImpl#addContainer does (as you mentioned that a write lock is involved)

lock.writeLock().lock(); try { if (!containers.contains(containerID)) { add it; } finally { lock.writeLock().unlock(); }

so, if we call "containerExist" before "addNewContainer", it seem like that we make a compare operation before an atomic CAS operation, and it seems redundant.

There are still some steps that need to be taken before addNewContainer calls ContainerStateMap$contains

yes, so the later we call ContainerStateMap$contains , the better result we may get, because the container existence maybe changed within the time window, in which these steps is executed.

"check the existence and add" isn't meant to be an atomic operation here, it's just used to avoid some unnecessary operations if the other clients have updated the containerId already.

...ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerReportHandler.java

…h batch

JacksonYao287 · 2021-04-29T16:28:58Z

@avijayanhwx @symious @GlenGeng please take a review.
I rewrite this pr , there a two big changes :
1 add a new SCM service "getExistContainerWithPipelinesInBatch" , which will ask scm that which containers in the given list can be found by scm and will never throw an exception at client-side. "ReconContainerManager#checkAndAddNewContainerBatch" now use this method instead of "getContainerWithPipelineBatch" , which may throw an exception and drop the whole reports in a heartbeat.

2 fix a bug. for now , a closed container may be reported by datanode, and the pipeline of this container may be closed , recon can not sync this pipeline info from scm. so recon should have this ability and a no-open container to it's containerStateMap without the corresponding pipeling info.

avijayanhwx

I am OK with getting this patch as it is now, but would like to revisit/remove it when we do SCM DB boostrap on Recon. Generally, I am not for adding ad hoc APIs in SCM to support Recon. Slowly, the SCM API layer will become harder to maintain. @symious / @GlenGeng if you are ok with this approach as well, I can get this in.

symious · 2021-05-04T16:00:34Z

@avijayanhwx Thanks for the review. I think we can keep the following check.

      if (!containerExist(containerID)) {
        addNewContainer(containerID.getId(), containerWithPipeline);
      }

Since the inner check @JacksonYao287 mentioned would involve additional write lock operations.

JacksonYao287 · 2021-05-07T08:50:27Z

@avijayanhwx Thanks for the review. I think we can keep the following check.
      if (!containerExist(containerID)) {
        addNewContainer(containerID.getId(), containerWithPipeline);
      }
Since the inner check @JacksonYao287 mentioned would involve additional write lock operations.

@avijayanhwx please take a look, what is your opinion?

avijayanhwx

I have added a TODO in HDDS-4177 to revisit this logic. For now, I am +1 with this approach since it is causing an OOM issue.

avijayanhwx · 2021-05-10T06:12:19Z

Thanks for working on this @JacksonYao287 and the reviews @symious, @GlenGeng.

JacksonYao287 force-pushed the HDDS-5126 branch 4 times, most recently from c442ab3 to ac893dd Compare April 22, 2021 12:00

avijayanhwx self-requested a review April 22, 2021 14:12

JacksonYao287 force-pushed the HDDS-5126 branch from ac893dd to c51bbfc Compare April 23, 2021 00:12

JacksonYao287 force-pushed the HDDS-5126 branch from c51bbfc to 9bfae1a Compare April 23, 2021 03:44

avijayanhwx reviewed Apr 23, 2021

View reviewed changes

hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java Outdated Show resolved Hide resolved

...p-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/TestReconContainerManager.java Outdated Show resolved Hide resolved

JacksonYao287 force-pushed the HDDS-5126 branch 2 times, most recently from ee5799a to 5a95af5 Compare April 25, 2021 02:59

GlenGeng-awx reviewed Apr 25, 2021

View reviewed changes

JacksonYao287 force-pushed the HDDS-5126 branch from 5a95af5 to 58a28de Compare April 25, 2021 03:03

JacksonYao287 closed this Apr 25, 2021

JacksonYao287 reopened this Apr 25, 2021

JacksonYao287 requested a review from avijayanhwx April 25, 2021 06:54

symious reviewed Apr 25, 2021

View reviewed changes

...ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerReportHandler.java Outdated Show resolved Hide resolved

JacksonYao287 requested a review from GlenGeng-awx April 26, 2021 06:49

JacksonYao287 force-pushed the HDDS-5126 branch 3 times, most recently from 6aae097 to beb21e0 Compare April 27, 2021 11:45

adoroszlai changed the title ~~HDDS-5126.Recon should check new containers of a container report wit…~~ HDDS-5126. Recon should check new containers of a container report with batch Apr 27, 2021

JacksonYao287 force-pushed the HDDS-5126 branch from beb21e0 to 18ed781 Compare April 29, 2021 12:35

JacksonYao287 closed this Apr 29, 2021

JacksonYao287 reopened this Apr 29, 2021

HDDS-5126.Recon should check new containers of a container report wit…

3363ed8

…h batch

JacksonYao287 force-pushed the HDDS-5126 branch from 18ed781 to 3363ed8 Compare April 29, 2021 16:13

JacksonYao287 closed this Apr 30, 2021

JacksonYao287 reopened this Apr 30, 2021

avijayanhwx reviewed May 4, 2021

View reviewed changes

JacksonYao287 requested a review from avijayanhwx May 9, 2021 03:24

avijayanhwx approved these changes May 10, 2021

View reviewed changes

avijayanhwx merged commit a16e648 into apache:master May 10, 2021

JacksonYao287 deleted the HDDS-5126 branch July 20, 2021 11:44

Conversation

JacksonYao287 commented Apr 22, 2021

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

JacksonYao287 commented Apr 23, 2021

Uh oh!

bshashikant commented Apr 23, 2021

Uh oh!

avijayanhwx left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

JacksonYao287 commented Apr 24, 2021

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JacksonYao287 Apr 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

JacksonYao287 commented Apr 29, 2021

Uh oh!

avijayanhwx left a comment

Choose a reason for hiding this comment

Uh oh!

symious commented May 4, 2021

Uh oh!

JacksonYao287 commented May 7, 2021

Uh oh!

avijayanhwx left a comment

Choose a reason for hiding this comment

Uh oh!

avijayanhwx commented May 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

JacksonYao287 Apr 26, 2021 •

edited

Loading