Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-9343. Shift sortDatanodes logic to OM #5391

Merged
merged 29 commits into from Mar 6, 2024

Conversation

tanvipenumudy
Copy link
Contributor

@tanvipenumudy tanvipenumudy commented Oct 5, 2023

What changes were proposed in this pull request?

Following HDDS-8300, the key metadata API within OM sorts datanodes allowing clients to optimize data I/O on the basis of locality, yet this currently involves OM making additional calls to SCM for sorting datanodes accounting for over 80% of getKeyInfo latency, reducing peak pure read ops/sec.

Motivation behind this change:

  • When OM has to call SCM then the performance of the objects on the read-path can affect SCM’s scaling requirements, so SCM needs to perform at the same level as that of OM for the read-path - harming the performance of OM by going to SCM for every read.
  • Instead, a more efficient approach would be to perform sorting within OM itself, eliminating the need to rely on SCM for every read.

The current patch does the following:

  1. Introduce a new SCM API for serving the SCM network topology clusterTree information (Type: InnerNode) to OM.
  2. Implement caching and periodic refreshing of this clusterTree information obtained from SCM at regular intervals.
  3. Refactor the NetworkTopology calculation and sorting logic to KeyManagerImpl.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9343

How was this patch tested?

@tanvipenumudy tanvipenumudy marked this pull request as ready for review October 9, 2023 09:09
Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @tanvipenumudy for the patch.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanvipenumudy Thanks for working over this, have few question over solution,
SCM and OM may be over different node or access for SCM deployed path may not be accessible by OM, So IMO, just passing filename may not be enough.
Also,
Please check how user update the conf and places the network topology file? can same be done at OM to simplify the solution?

@tanvipenumudy
Copy link
Contributor Author

SCM and OM may be over different node or access for SCM deployed path may not be accessible by OM, So IMO, just passing filename may not be enough.

This is a valid point! The current ideation is to utilize this file for generating the NetworkTopology object here, which will be used to call NetworkTopology#sortByDistanceCost (in KeyManagerImpl) for performing sorting.

I believe this can be solved using two different approaches:

  1. Serving the loaded SCM schema file content (NodeSchemaLoadResult object) to OM instead of the SCM schema file config.

Please check how user update the conf and places the network topology file? can same be done at OM to simplify the solution?

  1. As an alternative, a separate configuration can be introduced on the OM-side as suggested.

Copy link
Contributor

@kerneltime kerneltime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should serve the topology itself in the API using a schema. If the HDFS schema is well known we can continue using that with an explanation for how the client can parse the topology in the file.

@kerneltime
Copy link
Contributor

kerneltime commented Oct 18, 2023

SCM and OM may be over different node or access for SCM deployed path may not be accessible by OM, So IMO, just passing filename may not be enough.

This is a valid point! The current ideation is to utilize this file for generating the NetworkTopology object here, which will be used to call NetworkTopology#sortByDistanceCost (in KeyManagerImpl) for performing sorting.

I believe this can be solved using two different approaches:

  1. Serving the loaded SCM schema file content (NodeSchemaLoadResult object) to OM instead of the SCM schema file config.

Please check how user update the conf and places the network topology file? can same be done at OM to simplify the solution?

  1. As an alternative, a separate configuration can be introduced on the OM-side as suggested.

The goal here is that the admin only needs to configure the topology with the SCM once and if any clients OM or others would like to fetch the topology for that SCM they can fetch it. The question that then comes up is, is the topology scheme as defined the file passed to SCM good enough to serve directly or should SCM serialize the topology some other way.
Archiecturally, OM should be able to communicate with multiple SCM, so keeping the isolation of config files is important. We should not be reading from local FS.

@tanvipenumudy
Copy link
Contributor Author

tanvipenumudy commented Oct 25, 2023

Summarizing the discussion points with @ChenSammi:

  • The SCM network topology schema file is used only once when SCM starts for building the static information required by NetworkTopology in memory.
  • However, obtaining this information alone from SCM does not provide a complete solution for shifting sortDatanodes logic to OM.
  • Another aspect to consider is that the NetworkTopology's clusterTree can dynamically change when a new DN is registered with SCM or when a DN is removed (dead).
  • This information can be served to OM similarly by introducing a new SCM API - caching and periodically refreshing the clusterTree in case of updates.
  • As per discussion, we would be skipping serving the schema file from SCM to OM as the schema file should be present under OM’s etc/hadoop/ directory even when OM and SCM are deployed on different nodes. Hence there should be no need for OM to access SCM’s etc/hadoop directory.

@adoroszlai adoroszlai marked this pull request as draft November 18, 2023 17:30
@tanvipenumudy tanvipenumudy changed the title HDDS-9389. Introduce new API and cache refresh for serving network topology schema to OM HDDS-9343. Shift sortDatanodes logic to OM Dec 18, 2023
@tanvipenumudy tanvipenumudy marked this pull request as ready for review December 18, 2023 18:18
@kerneltime
Copy link
Contributor

I can approve and run the CI here once you this error is looked into https://github.com/tanvipenumudy/ozone/actions/runs/7252248660/job/19756622066

@ChenSammi
Copy link
Contributor

Currently, we do not have the data for this. But we intend to benchmark a scenario such as this one (with 1000 nodes) to see what would be the performance impact for a large cluster!

@tanvipenumudy , I would like to see the data first to make sure there is no major performance impact with this approach. There is no need to create a benchmark for this, a unit test with a 1000 nodes topology tree can get the data.

@tanvipenumudy
Copy link
Contributor Author

There is no need to create a benchmark for this, a unit test with a 1000 nodes topology tree can get the data.

I conducted benchmarking of OM fetching the network topology tree information from SCM using an integration test (MiniOzoneCluster). However, local limitations prevented me from spinning up more than 50 nodes at a time.

On an average, fetching the network topology tree information for 50 nodes took approximately 0.02 to 0.03 seconds. Even in a worst-case scenario, where we extrapolate to 1000 nodes, the estimated time would be 0.6 seconds (0.03 * 20), still less than a second for the refresh duration frequency of every one hour (default).

@tanvipenumudy
Copy link
Contributor Author

Thank you for the review @ChenSammi.
@kerneltime, @adoroszlai, @duongkame could you please take another look at the patch, thanks!

@@ -1673,6 +1698,18 @@ public void start() throws IOException {

keyManager.start(configuration);

try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: This can be fixed in a new PR, since KeyManager is dependent on scmTopologyClient, maybe we should start the client before the KeyManager

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, thank you for the review @kerneltime!

Ticket tracking this task: HDDS-10512.

@kerneltime
Copy link
Contributor

The change looks good to me overall. Few follow up work post merge should be

  1. Add documentation that OM needs to have access the topology config
  2. More metrics around cache refresh and payload received.

@kerneltime kerneltime merged commit 140c5de into apache:master Mar 6, 2024
35 checks passed
@smengcl
Copy link
Contributor

smengcl commented Mar 7, 2024

This is breaking build on master branch:

https://github.com/apache/ozone/actions/runs/8179843819/job/22366707752

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Mar 15, 2024
(cherry picked from commit 140c5de)
Change-Id: I2bf25ac0879b497639e0f69e3ef3d65e43c7c359
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Mar 15, 2024
(cherry picked from commit a145dd5)
Change-Id: I5ddc6d2c69d5eeb6386d0038e3166b8b951acdd8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants