Skip to content

HDDS-10880. Duplicate Pipeline ID Detected in ReconContainerManager.#6742

Merged
devmadhuu merged 7 commits intoapache:masterfrom
ArafatKhan2198:HDDS-10880
Jun 13, 2024
Merged

HDDS-10880. Duplicate Pipeline ID Detected in ReconContainerManager.#6742
devmadhuu merged 7 commits intoapache:masterfrom
ArafatKhan2198:HDDS-10880

Conversation

@ArafatKhan2198
Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 commented May 28, 2024

What changes were proposed in this pull request?

Description:

This pull request proposes changes to handle the scenario where a duplicate pipeline ID is detected in Recon. The modifications ensure that when a pipeline with a duplicate ID is encountered, the system logs a warning and skips the addition, instead of throwing an exception.

Proposed Changes:

  1. Modified the addPipeline method in the ReconPipelineManager class to check if the pipeline already exists before attempting to add it.
  2. If a duplicate pipeline ID is detected, the system logs a warning and skips the addition.
  3. Implemented a test case to verify that no exception is thrown when attempting to add a duplicate pipeline.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10880

How was this patch tested?

A new test case was added to TestReconPipelineManager to verify that no exception is thrown when a duplicate pipeline is added. The test case involves:

@ArafatKhan2198
Copy link
Contributor Author

@devmadhuu @dombizita
Can you please take a look.

@ArafatKhan2198 ArafatKhan2198 requested a review from dombizita May 29, 2024 13:02
Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for working on this patch. LGTM +1

@raju-balpande
Copy link
Contributor

Skipping addition of pipeline with same Id shall not be the solution. This will mislead the caller that the pipeline is added successfully.
And hence shall keep throwing exception as it was implemented.
Instead, we shall correct the the logic of generation of pipeline Id so that it should not be duplicated Or,
we shall handle the exception indicating the duplicate pipeline and then we will retry with new pipeline Id.

@ArafatKhan2198
Copy link
Contributor Author

Thanks for the comment, @raju-balpande . Recon does not work the way you have described. It is not responsible for creating pipeline IDs. SCM (Storage Container Manager) is actually responsible for creating pipelines and their related information along with their UUIDs, which have been thoroughly tested and are robust. SCM will not create a duplicate pipeline ID. If it did, it would affect a lot of other things, not just Recon.

Pipelines are created by SCM and then registered in Recon. We have a PipelineSyncTask in Recon that runs every 5 minutes. This task makes an RPC call to SCM every 5 minutes to fetch the latest set of pipelines and adds or removes pipelines as needed. Additionally, the ReconContainerManager can add a new pipeline. If Recon receives a container report for a new container whose pipeline does not exist, it registers the new pipeline by making an RPC call to SCM, fetches the pipeline information, and then adds it.

The addPipeline method in ReconPipelineManager can be called from two entities. The reason for the duplicate pipeline error was that in the ReconContainerManager code, we first check if the pipeline exists. If it does not, we proceed to add it. However, as soon as the existence check passed, the PipelineSyncTask had already added the pipeline, causing the duplicate pipeline error.

    // Check if the pipeline is present in Recon
    if (!pipelineManager.containsPipeline(pipelineID)) {
      // Pipeline is not present, add it first.
      LOG.info("Adding new pipeline {} from SCM.", pipelineID);
      reconPipelineManager.addPipeline(containerWithPipeline.getPipeline());
    }

To address this, we added a final check inside the addPipeline method of ReconPipelineManager. Here, we acquire a write lock, check if the pipeline exists, and then add the pipeline if it does not. This ensures that the exception is not thrown and the appropriate log message is delivered.

  @VisibleForTesting
  public void addPipeline(Pipeline pipeline)
      throws IOException {
    acquireWriteLock();
    try {
      if (!containsPipeline(pipeline.getId())) {
        getStateManager().addPipeline(
          pipeline.getProtobufMessage(ClientVersion.CURRENT_VERSION));
      }
    } finally {
      releaseWriteLock();
    }
  }

Recon operates on eventual consistency, meaning it will eventually update and correct itself when state changes.

@raju-balpande
Copy link
Contributor

Thanks for the comment, @raju-balpande . Recon does not work the way you have described. It is not responsible for creating pipeline IDs. SCM (Storage Container Manager) is actually responsible for creating pipelines and their related information along with their UUIDs, which have been thoroughly tested and are robust. SCM will not create a duplicate pipeline ID. If it did, it would affect a lot of other things, not just Recon.

Pipelines are created by SCM and then registered in Recon. We have a PipelineSyncTask in Recon that runs every 5 minutes. This task makes an RPC call to SCM every 5 minutes to fetch the latest set of pipelines and adds or removes pipelines as needed. Additionally, the ReconContainerManager can add a new pipeline. If Recon receives a container report for a new container whose pipeline does not exist, it registers the new pipeline by making an RPC call to SCM, fetches the pipeline information, and then adds it.

The addPipeline method in ReconPipelineManager can be called from two entities. The reason for the duplicate pipeline error was that in the ReconContainerManager code, we first check if the pipeline exists. If it does not, we proceed to add it. However, as soon as the existence check passed, the PipelineSyncTask had already added the pipeline, causing the duplicate pipeline error.

    // Check if the pipeline is present in Recon
    if (!pipelineManager.containsPipeline(pipelineID)) {
      // Pipeline is not present, add it first.
      LOG.info("Adding new pipeline {} from SCM.", pipelineID);
      reconPipelineManager.addPipeline(containerWithPipeline.getPipeline());
    }

To address this, we added a final check inside the addPipeline method of ReconPipelineManager. Here, we acquire a write lock, check if the pipeline exists, and then add the pipeline if it does not. This ensures that the exception is not thrown and the appropriate log message is delivered.

  @VisibleForTesting
  public void addPipeline(Pipeline pipeline)
      throws IOException {
    acquireWriteLock();
    try {
      if (!containsPipeline(pipeline.getId())) {
        getStateManager().addPipeline(
          pipeline.getProtobufMessage(ClientVersion.CURRENT_VERSION));
      }
    } finally {
      releaseWriteLock();
    }
  }

Recon operates on eventual consistency, meaning it will eventually update and correct itself when state changes.

Thanks for the clarification, I got the point now.

@adoroszlai
Copy link
Contributor

adoroszlai commented Jun 11, 2024

@ArafatKhan2198 If we always want to check whether pipeline exists before addition, then the check should be performed inside addPipeline. The method should be changed to return whether the pipeline was added or not. The caller can then use this information to perform conditional action, e.g.

  • update pipeline state/timestamp (for existing pipeline whose addition was skipped)
  • log the action

Benefits:

  1. existence is checked atomically with addition
  2. existence is checked only once
  3. logging etc. are performed without lock

@ArafatKhan2198
Copy link
Contributor Author

@ArafatKhan2198 If we always want to check whether pipeline exists before addition, then the check should be performed inside addPipeline. The method should be changed to return whether the pipeline was added or not. The caller can then use this information to perform conditional action, e.g.

  • update pipeline state/timestamp (for existing pipeline whose addition was skipped)
  • log the action

Benefits:

  1. existence is checked atomically with addition
  2. existence is checked only once
  3. logging etc. are performed without lock

Thanks for the suggestion @adoroszlai
I have added the following changes :-

ReconContainerManager:

  • Simplified the addNewContainer method by using the return value of addPipeline to conditionally log the addition of new pipelines, removing redundant pipeline existence checks.

ReconPipelineManager:

  • Modified the initializePipelines method to use the return value of addPipeline for logging whether a new pipeline was added or already existed, ensuring updates are made to the state and creation time only if the pipeline already exists.

ReconPipelineReportHandler:

  • No changes were made. The existing early check for pipeline existence and fetching from SCM if it doesn't exist remains intact to ensure the necessary pipeline information is retrieved before attempting to add it.

@adoroszlai
Copy link
Contributor

Thanks @ArafatKhan2198 for updating the patch.

ReconPipelineReportHandler:

  • No changes were made. The existing early check for pipeline existence and fetching from SCM if it doesn't exist remains intact to ensure the necessary pipeline information is retrieved before attempting to add it.

It's not a big deal, but maybe this one could be changed to make logging conditional:

LOG.info("Adding new pipeline {} to Recon pipeline metadata.",
pipelineFromScm);
reconPipelineManager.addPipeline(pipelineFromScm);

similar to:

if (reconPipelineManager.addPipeline(containerWithPipeline.getPipeline())) {
LOG.info("Added new pipeline {} to Recon pipeline metadata.", pipelineID);
}

just in case some other process added the same pipeline in the meantime.

@ArafatKhan2198
Copy link
Contributor Author

Done with the comments @adoroszlai

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for improving the patch.

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for improving the patch. Few comments.

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 Thanks for improving the patch. Changes LGTM +1.

@devmadhuu devmadhuu merged commit 2aa5617 into apache:master Jun 13, 2024
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jun 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments