
HDDS-5094. [FSO] Fail OM startup when turn on prefix layout with old buckets #2151

Merged
merged 11 commits into apache:HDDS-2939 on Apr 30, 2021

Conversation

rakeshadr
Contributor

@rakeshadr rakeshadr commented Apr 12, 2021

What changes were proposed in this pull request?

Starts an existing OM, which already contains old buckets, with the new PREFIX metadata layout configuration.

Scenario:

  1. Start cluster with configs,
    OZONE-SITE.XML_ozone.om.enable.filesystem.paths=true
    OZONE-SITE.XML_ozone.om.metadata.layout=simple

  2. create /vol1/bucket1/dir1/dir2/key1

  3. Stop OM

  4. Update config OZONE-SITE.XML_ozone.om.metadata.layout=prefix

  5. Start OM.

Output: OM fails to start because there is a bucket with the old layout format.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5094

How was this patch tested?

Added unit test case.

@rakeshadr rakeshadr changed the title HDDS-5094. [FSO] Existing keys become unavailable after turning on prefix layout HDDS-5094. [FSO] Fail OM startup when turn on prefix layout with old buckets Apr 12, 2021
@bharatviswa504
Contributor

bharatviswa504 commented Apr 12, 2021

@rakeshadr
I think with this approach, if a cluster is upgraded from an older version and someone wants to try the new layout format, they cannot use it. IMHO we should support read/write on older buckets and provide the capability for users to try out the new format on newly created buckets.

One idea here is to persist the layout format in bucket metadata and use that. We could also iterate the buckets during startup (even now we iterate buckets during startup) and set the layout of any bucket whose format is not set to the older format. (I think this approach was discussed during the initial meetings.)
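
A minimal sketch of that idea (hedged; listAllBuckets, bucketKey(), and the "SIMPLE" default are hypothetical, not code from this PR):

    // Hedged sketch: during the existing startup iteration over buckets,
    // stamp any bucket that has no recorded layout with the older format.
    for (OmBucketInfo bucket : listAllBuckets()) { // hypothetical helper
      Map<String, String> metadata = bucket.getMetadata();
      if (!metadata.containsKey(OMConfigKeys.OZONE_OM_METADATA_LAYOUT)) {
        metadata.put(OMConfigKeys.OZONE_OM_METADATA_LAYOUT, "SIMPLE");
        // persist the updated bucket info back to the bucket table
        metadataManager.getBucketTable().put(bucketKey(bucket), bucket);
      }
    }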

Contributor

@bharatviswa504 bharatviswa504 left a comment


I have one comment on the approach/solution taken to solve this issue.

@rakeshadr
Contributor Author

rakeshadr commented Apr 12, 2021

@rakeshadr
I think with this approach, if a cluster is upgraded from an older version and someone wants to try the new layout format, they cannot use it. IMHO we should support read/write on older buckets and provide the capability for users to try out the new format on newly created buckets.

One idea here is to persist the layout format in bucket metadata and use that. We could also iterate the buckets during startup (even now we iterate buckets during startup) and set the layout of any bucket whose format is not set to the older format. (I think this approach was discussed during the initial meetings.)

Thanks @bharatviswa504 for raising the point. I don't think I explained the purpose of this patch clearly; let me add more details here:

Yes, I have persisted the FSO layout in bucketInfo.getMetadata() (a bucket-level property) to differentiate older buckets from newer ones. Since OM persists the layout at the bucket level, the feature can be extended to support both older and newer buckets. Please see the rename and delete APIs; older and newer buckets are already handled in that fashion. I plan to support all the cases gracefully; I hope my comment in the jira conveys the same thoughts.

The purpose of this patch is to validate and fail during OM startup until all the cases are implemented incrementally (phase by phase); then this check will be removed. This means that in phase-1 the new feature is allowed only in a fresh new cluster. Later, in follow-up tasks, I agree this case should be supported so that existing old key(s) become visible in the prefix-based OM layout.
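
To illustrate that bucket-level dispatch (a hedged sketch with hypothetical surrounding code; the actual isFSOptimizedBucket check is quoted later in this thread):

    // Hedged sketch: a request handler branching on the per-bucket layout
    // persisted in bucket metadata, instead of on the cluster config alone.
    OmBucketInfo bucketInfo =
        metadataManager.getBucketTable().get(bucketKey); // bucketKey assumed resolved earlier
    if (isFSOptimizedBucket(bucketInfo.getMetadata())) {
      // prefix (FSO) bucket: rename/delete operate on directory and file tables
    } else {
      // simple-layout bucket: rename/delete operate on the flat key table
    }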

Member

@elek elek left a comment


Thanks for the quick fix, @rakeshadr.

  1. This patch seems to disable the usage of this feature in any existing cluster; it can be used only in a new cluster. That seems to be quite a huge limitation, and I would suggest mentioning it at least in the doc: once somebody creates at least one bucket, this value cannot be changed anymore.

  2. Can you please confirm whether this check works well when the metadata is empty? I may be wrong (I didn't test), but I think if I create buckets with the existing code (no metadata), update to this branch, and set the layout to 'simple', the cluster couldn't be started. Correct me if I am wrong.

  3. This validation won't work if somebody enables only the prefix layout without the fs path support. What is the expected behavior in this case? Do we need any validation to avoid such a case?

@@ -1115,6 +1118,9 @@ public void start() throws IOException {
getOMMetadataLayout();

metadataManager.start(configuration);

validatesBucketLayoutMismatches();
Member


Can it be checked before starting the Ratis/RPC servers? Seems safer IMHO...

Contributor Author


Can it be checked before starting the Ratis/RPC servers? Seems safer IMHO...

Thanks @elek for the comment. IMHO we should keep the existing startup order of all the services as it is; I don't want to impact existing OM startup behavior because of this temporary validation logic. The feature is disabled by default, so there is no impact to existing users/clusters. Like I said, the contributors (dev team) will be actively working to support older [simple] buckets in the prefix layout and will remove this validation logic later.

Member


impact existing OM startup behavior because of this temporary validation logic

Thanks for the answer, @rakeshadr. Can you please help me understand how moving this line up would affect OM start behavior?

As far as I understand, this check is only applied when OzoneManagerRatisUtils.isBucketFSOptimized is true. I just asked whether this new (!) check would be safer to do before the Ratis initialization, as additional actions may happen before this safety check stops the cluster.

Contributor Author


@elek Yes, it validates the bucket metadata only when OzoneManagerRatisUtils.isBucketFSOptimized is true. IIUC, you are suggesting a modification like the one below in the OzoneManager#start() function, right?

I tried this, but my test fails: the validation logic can read the bucket metadata successfully only if I keep it after the metadataManager.start(configuration) call.
The validation logic reads the bucket table data using iterator = metadataManager.getBucketTable().iterator();

    getOMMetadataLayout();
    validatesBucketLayoutMismatches();

    // Start Ratis services
    if (omRatisServer != null) {
      omRatisServer.start();
    }

    metadataManager.start(configuration);

Contributor Author


@elek I hope you agree to move the metadataManager.start(configuration) call upward, before the Ratis server start?

OzoneManager#start()

    LOG.info(buildRpcServerStartMessage("OzoneManager RPC server",
        omRpcAddress));

    metadataManager.start(configuration);
    getOMMetadataLayout();
    validatesBucketLayoutMismatches();

    // Start Ratis services
    if (omRatisServer != null) {
      omRatisServer.start();
    }

    startSecretManagerIfNecessary();

Contributor Author


@elek I've modified the startup steps in the latest (2nd) commit:

  1. Validate the invalid configuration (layout PREFIX and ozone.om.enable.filesystem.paths=false) at the very beginning; see the sketch after this comment.
  2. Moved validatesBucketLayoutMismatches before the Ratis server start.

Hope this looks fine to you. Thanks!
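
A rough sketch of what step 1 amounts to (hedged; the method name validateOMLayoutConfig and the "SIMPLE" default are assumptions, while the config keys are the real ones discussed in this thread):

    // Hedged sketch: fail fast when layout=PREFIX but fs paths are disabled.
    private void validateOMLayoutConfig(OzoneConfiguration conf) throws IOException {
      String layout = conf.getTrimmed(
          OMConfigKeys.OZONE_OM_METADATA_LAYOUT, "SIMPLE"); // default assumed
      boolean fsPathsEnabled = conf.getBoolean(
          OMConfigKeys.OZONE_OM_ENABLE_FILESYSTEM_PATHS, false);
      if (OMConfigKeys.OZONE_OM_METADATA_LAYOUT_PREFIX.equalsIgnoreCase(layout)
          && !fsPathsEnabled) {
        throw new IOException("Invalid configuration: ozone.om.metadata.layout=PREFIX"
            + " requires ozone.om.enable.filesystem.paths=true");
      }
    }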

@rakeshadr
Contributor Author

  1. This patch seems to disable the usage of this feature in any existing cluster; it can be used only in a new cluster. That seems to be quite a huge limitation, and I would suggest mentioning it at least in the doc: once somebody creates at least one bucket, this value cannot be changed anymore.

Sure, I will update the wiki page and the apache docs: "The feature is disabled by default and can be used only in a new cluster."

  2. Can you please confirm whether this check works well when the metadata is empty? I may be wrong (I didn't test), but I think if I create buckets with the existing code (no metadata), update to this branch, and set the layout to 'simple', the cluster couldn't be started. Correct me if I am wrong.

Your previous question contains the answer: "it can be used only in a new cluster". A new cluster will not have any volumes. The simple answer is: during cluster startup the validation logic will check all the existing buckets, if any, and will fail the OM startup if there are any older buckets [simple layout].

  3. This validation won't work if somebody enables only the prefix layout without the fs path support. What is the expected behavior in this case? Do we need any validation to avoid such a case?

Please refer to the isFSOptimizedBucket logic. It checks both properties, and FSO requires both to be set (ozone.om.metadata.layout=PREFIX and ozone.om.enable.filesystem.paths=true).
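
For illustration, a hedged sketch of what that startup validation boils down to (variable names such as clusterLevelFSO are hypothetical; the bucket-table iterator is the one mentioned earlier in this thread):

    // Hedged sketch: iterate every bucket at startup and fail fast if any
    // bucket's persisted layout disagrees with the cluster-level layout.
    try (TableIterator<String, ? extends Table.KeyValue<String, OmBucketInfo>>
        iterator = metadataManager.getBucketTable().iterator()) {
      while (iterator.hasNext()) {
        OmBucketInfo bucket = iterator.next().getValue();
        boolean bucketIsFSO = isFSOptimizedBucket(bucket.getMetadata());
        if (bucketIsFSO != clusterLevelFSO) { // clusterLevelFSO: derived from OM config
          throw new IOException("Failed to start OM: bucket /"
              + bucket.getVolumeName() + "/" + bucket.getBucketName()
              + " was created with a mismatching metadata layout");
        }
      }
    }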

@rakeshadr
Contributor Author

Thanks @elek for the review comments.

@bharatviswa504
Contributor

@rakeshadr
I think with this approach, if a cluster is upgraded from an older version and someone wants to try the new layout format, they cannot use it. IMHO we should support read/write on older buckets and provide the capability for users to try out the new format on newly created buckets.
One idea here is to persist the layout format in bucket metadata and use that. We could also iterate the buckets during startup (even now we iterate buckets during startup) and set the layout of any bucket whose format is not set to the older format. (I think this approach was discussed during the initial meetings.)

Thanks @bharatviswa504 for raising the point. I don't think I explained the purpose of this patch clearly; let me add more details here:

Yes, I have persisted the FSO layout in bucketInfo.getMetadata() (a bucket-level property) to differentiate older buckets from newer ones. Since OM persists the layout at the bucket level, the feature can be extended to support both older and newer buckets. Please see the rename and delete APIs; older and newer buckets are already handled in that fashion. I plan to support all the cases gracefully; I hope my comment in the jira conveys the same thoughts.

The purpose of this patch is to validate and fail during OM startup until all the cases are implemented incrementally (phase by phase); then this check will be removed. This means that in phase-1 the new feature is allowed only in a fresh new cluster. Later, in follow-up tasks, I agree this case should be supported so that existing old key(s) become visible in the prefix-based OM layout.

Thank you @rakeshadr for the clear explanation and the offline discussion. I am fine with this if we are planning to address them in further Jiras.

@elek
Member

elek commented Apr 19, 2021

The simple answer is: during cluster startup the validation logic will check all the existing buckets, if any, and will fail the OM startup if there are any older buckets [simple layout].

Correct me if I am wrong, but the current check doesn't handle the case when bucketMetadata is empty (old data). For example, if I explicitly set the layout to 'simple' (because I have old data), the cluster will fail to start even though it's a valid scenario.

+  public static boolean isFSOptimizedBucket(
+      Map<String, String> bucketMetadata) {
+    // layout 'PREFIX' represents optimized FS path
+    boolean metadataLayoutEnabled =
+        org.apache.commons.lang3.StringUtils.equalsIgnoreCase(
+            OMConfigKeys.OZONE_OM_METADATA_LAYOUT_PREFIX,
+            bucketMetadata
+                .get(OMConfigKeys.OZONE_OM_METADATA_LAYOUT));
+
+    boolean fsEnabled =
+        Boolean.parseBoolean(bucketMetadata
+            .get(OMConfigKeys.OZONE_OM_ENABLE_FILESYSTEM_PATHS));
+
+    return metadataLayoutEnabled && fsEnabled;
+  }

Please refer to the isFSOptimizedBucket logic. It checks both properties, and FSO requires both to be set (ozone.om.metadata.layout=PREFIX and ozone.om.enable.filesystem.paths=true).

Thanks, I will check it. For your information: during my test I was able to create buckets with layout=PREFIX and paths=false. I will try to reproduce the problem.

@elek
Member

elek commented Apr 19, 2021

I can confirm that I can create buckets where PREFIX is enabled but filesystem.paths is not:

bash-4.2$ ozone sh volume create /vol1
bash-4.2$ ozone sh bucket create /vol1/bucket1
bash-4.2$ ozone sh bucket info /vol1/bucket1     
{
  "metadata" : {
    "ozone.om.metadata.layout" : "PREFIX",
    "ozone.om.enable.filesystem.paths" : "false"
  },
  "volumeName" : "vol1",
  "name" : "bucket1",
  "storageType" : "DISK",
  "versioning" : false,
  "usedBytes" : 0,
  "usedNamespace" : 0,
  "creationTime" : "2021-04-19T10:06:25.059Z",
  "modificationTime" : "2021-04-19T10:06:25.059Z",
  "encryptionKeyName" : null,
  "sourceVolume" : null,
  "sourceBucket" : null,
  "quotaInBytes" : -1,
  "quotaInNamespace" : -1
}
bash-4.2$ cat /etc/hadoop/ozone-site.xml | grep paths
bash-4.2$ cat /etc/hadoop/ozone-site.xml | grep layout
<property><name>ozone.om.metadata.layout</name><value>PREFIX</value></property>
bash-4.2$ 

If it's a supported case, we need to improve the validation in this patch. If it's not, we should open a new issue to validate and deny this misconfiguration, as it can cause problems.

@rakeshadr
Contributor Author

I can confirm that I can create buckets where PREFIX is enabled but filesystem.paths is not:

bash-4.2$ ozone sh volume create /vol1
bash-4.2$ ozone sh bucket create /vol1/bucket1
bash-4.2$ ozone sh bucket info /vol1/bucket1     
{
  "metadata" : {
    "ozone.om.metadata.layout" : "PREFIX",
    "ozone.om.enable.filesystem.paths" : "false"
  },
  "volumeName" : "vol1",
  "name" : "bucket1",
  "storageType" : "DISK",
  "versioning" : false,
  "usedBytes" : 0,
  "usedNamespace" : 0,
  "creationTime" : "2021-04-19T10:06:25.059Z",
  "modificationTime" : "2021-04-19T10:06:25.059Z",
  "encryptionKeyName" : null,
  "sourceVolume" : null,
  "sourceBucket" : null,
  "quotaInBytes" : -1,
  "quotaInNamespace" : -1
}
bash-4.2$ cat /etc/hadoop/ozone-site.xml | grep paths
bash-4.2$ cat /etc/hadoop/ozone-site.xml | grep layout
<property><name>ozone.om.metadata.layout</name><value>PREFIX</value></property>
bash-4.2$ 

If it's a supported case, we need to improve the validation in this patch. If it's not, we should open a new issue to validate and deny this misconfiguration, as it can cause problems.

Thanks @elek for the confirmation with a test case. Instead of failing with an error, OM presently proceeds silently with the default metadata layout by setting the isBucketFSOptimized flag to false. I agree we should fail the startup and make it visible to everyone. I will include this check in this OM startup validation patch.

@rakeshadr
Contributor Author

The simple answer is: during cluster startup the validation logic will check all the existing buckets, if any, and will fail the OM startup if there are any older buckets [simple layout].

Correct me if I am wrong, but the current check doesn't handle the case when bucketMetadata is empty (old data). For example, if I explicitly set the layout to 'simple' (because I have old data), the cluster will fail to start even though it's a valid scenario.

+  public static boolean isFSOptimizedBucket(
+      Map<String, String> bucketMetadata) {
+    // layout 'PREFIX' represents optimized FS path
+    boolean metadataLayoutEnabled =
+        org.apache.commons.lang3.StringUtils.equalsIgnoreCase(
+            OMConfigKeys.OZONE_OM_METADATA_LAYOUT_PREFIX,
+            bucketMetadata
+                .get(OMConfigKeys.OZONE_OM_METADATA_LAYOUT));
+
+    boolean fsEnabled =
+        Boolean.parseBoolean(bucketMetadata
+            .get(OMConfigKeys.OZONE_OM_ENABLE_FILESYSTEM_PATHS));
+
+    return metadataLayoutEnabled && fsEnabled;
+  }

I'm listing the test scenarios explicitly with config values to understand this better; please feel free to add anything I missed. Thanks!

Scenario-1) Created a bucket with an older cluster. Here bucketMetadata is empty.

Test Step-1) Stop the OM server and update the configs ozone.om.metadata.layout=PREFIX and ozone.om.enable.filesystem.paths=true.

Test Step-2) Start OM; the validation runs isFSOptimizedBucket, and metadataLayoutEnabled will be false.

Test Result: OM startup will fail.

Scenario-2) Created a bucket with an older cluster. Here bucketMetadata is empty.

Test Step-1) Stop the OM server and update the configs ozone.om.metadata.layout=SIMPLE and ozone.om.enable.filesystem.paths=true.

Test Step-2) Start OM.

Test Result: OM won't validate the bucket metadata layout, as the cluster-level layout is SIMPLE. It will start successfully with the prefix feature disabled.

Please add your expectations, thanks @elek

@elek
Member

elek commented Apr 21, 2021

Instead of failing with an error, OM presently proceeds silently with the default metadata layout by setting the isBucketFSOptimized flag to false.

It might be better to throw an exception in case of misconfiguration. isBucketFSOptimized can be false when I configure PREFIX + fs.enabled=false, but the wrong metadata is still saved on the bucket:

bash-4.2$ ozone sh bucket info /vol1/bucket1     
{
  "metadata" : {
    "ozone.om.metadata.layout" : "PREFIX",
    "ozone.om.enable.filesystem.paths" : "false

Even better: it would be great to use just ozone.om.metadata.layout and always turn on ozone.om.enable.filesystem.paths implicitly, as it seems to be a strong requirement.
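
Something along these lines, as a hedged sketch (not the patch's code; the "SIMPLE" default is an assumption):

    // Hedged sketch: derive fs-path support implicitly from the layout so
    // operators only need to set ozone.om.metadata.layout.
    boolean prefixLayout = OMConfigKeys.OZONE_OM_METADATA_LAYOUT_PREFIX
        .equalsIgnoreCase(conf.getTrimmed(
            OMConfigKeys.OZONE_OM_METADATA_LAYOUT, "SIMPLE"));
    if (prefixLayout) {
      conf.setBoolean(OMConfigKeys.OZONE_OM_ENABLE_FILESYSTEM_PATHS, true);
    }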

@elek
Member

elek commented Apr 21, 2021

Scenario-2) Created a bucket with an older cluster. Here bucketMetadata is empty.

Test Step-1) Stop the OM server and update the configs ozone.om.metadata.layout=SIMPLE and ozone.om.enable.filesystem.paths=true.

Test Step-2) Start OM.

Test Result: OM won't validate the bucket metadata layout, as the cluster-level layout is SIMPLE. It will start successfully with the prefix feature disabled.

I think it's better to always validate, because we have this scenario, too:

Scenario-3) Created a bucket with the new cluster and layout=PREFIX. bucketMetadata is filled.

Test Step-1) Stop the OM server and update the configs ozone.om.metadata.layout=SIMPLE and ozone.om.enable.filesystem.paths=true.

Test Step-2) Start OM.

Test Result: OM should validate the layouts and fail, as SIMPLE and PREFIX can't be handled in the same system (today).

@elek
Member

elek commented Apr 21, 2021

BTW, can we test it without using MiniOzoneCluster? It seems way faster, which would enable us to test more scenarios. I may be wrong, but it seems easy, as we are testing a well-separated local check (we need the bucket metadata + the checker...)
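
For example, the checker itself could be exercised with plain maps and no cluster at all (a hedged sketch; the static import of isFSOptimizedBucket is an assumption):

    // Hedged sketch: unit-testing the isFSOptimizedBucket checker directly.
    Map<String, String> bucketMetadata = new HashMap<>();
    bucketMetadata.put(OMConfigKeys.OZONE_OM_METADATA_LAYOUT,
        OMConfigKeys.OZONE_OM_METADATA_LAYOUT_PREFIX);
    bucketMetadata.put(OMConfigKeys.OZONE_OM_ENABLE_FILESYSTEM_PATHS, "true");
    Assert.assertTrue(isFSOptimizedBucket(bucketMetadata));

    bucketMetadata.clear(); // an old bucket: empty metadata
    Assert.assertFalse(isFSOptimizedBucket(bucketMetadata));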

@rakeshadr
Contributor Author

Instead of failing with an error, OM presently proceeds silently with the default metadata layout by setting the isBucketFSOptimized flag to false.

It might be better to throw an exception in case of misconfiguration. isBucketFSOptimized can be false when I configure PREFIX + fs.enabled=false, but the wrong metadata is still saved on the bucket:

bash-4.2$ ozone sh bucket info /vol1/bucket1
{
  "metadata" : {
    "ozone.om.metadata.layout" : "PREFIX",
    "ozone.om.enable.filesystem.paths" : "false

Even better: it would be great to use just ozone.om.metadata.layout and always turn on ozone.om.enable.filesystem.paths implicitly, as it seems to be a strong requirement.

Thanks @elek for the clear information. IMHO we should throw an exception in case of an invalid configuration and fail OM startup. I've updated the PR with this behavior.

@rakeshadr
Contributor Author

Scenario-3) Created a bucket with the new cluster and layout=PREFIX. bucketMetadata is filled.

Test Step-1) Stop the OM server and update the configs ozone.om.metadata.layout=SIMPLE and ozone.om.enable.filesystem.paths=true.

Test Step-2) Start OM.

Test Result: OM should validate the layouts and fail, as SIMPLE and PREFIX can't be handled in the same system (today).

Thanks @elek for pointing out this case. I have added it in the latest commit.

@rakeshadr
Contributor Author

rakeshadr commented Apr 21, 2021

BTW, can we test it without using MiniOzoneCluster? It seems way faster, which would enable us to test more scenarios. I may be wrong, but it seems easy, as we are testing a well-separated local check (we need the bucket metadata + the checker...)

Thanks a lot @elek for the suggestion. In the meantime, I've modified the existing unit test case to cover the cases we discussed in this PR, but I am still using MiniOzoneCluster and would like your feedback on this. I start/stop only the OM to test the various cases. I have run this unit test multiple times (20 times) locally in my env, and each run took only <=5 secs. Hope this makes sense to you!

FYI, I also have future plans to create a MiniOMCluster by mocking all services except OM, especially for the PREFIX/SIMPLE-based integration tests (where applicable). IMHO that will save some additional cluster startup time. I will revisit this unit test along with that task and will raise a jira to track it in some time.

@rakeshadr
Contributor Author

Hi @elek, I've updated the PR based on your comments. Can you please review it again? Thanks!

Member

@elek elek left a comment


Thanks for the update, @rakeshadr. The validation logic looks good to me. 👍

I have a few very small nit comments (and I assume some of the unit test changes are supposed to be part of an integration-test-related patch, not this one).

I have run this unit test multiple times (20 times) locally in my env, and each run took only <=5 secs. Hope this makes sense to you!

It seems you have a powerful machine. On the CI server it's:

2021-04-26T03:50:12.2809634Z [INFO] Running org.apache.hadoop.ozone.om.TestOMStartupWithLayout
2021-04-26T03:50:45.3975735Z [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.113 s - in org.apache.hadoop.ozone.om.TestOMStartupWithLayout

But I think moving the check to the beginning helped, and I didn't consider that in most of the cases the MiniOzoneCluster is not required to start (as the startup should fail). So let's keep it as-is for now...

I also have future plans to create a MiniOMCluster by mocking all services except OM, especially for the PREFIX/SIMPLE-based integration tests

👍 It sounds like a great idea. I like it.

Contributor Author

@rakeshadr rakeshadr left a comment


Thanks again @elek for the review comments. Please find my replies; I will update the patch soon.

@rakeshadr
Contributor Author

I have one comment on the approach/solution taken to solve this issue.

Marked this as addressed based on @bharatviswa504's reply - comment link here

@rakeshadr rakeshadr dismissed bharatviswa504’s stale review April 29, 2021 03:06

Marked this as addressed based on @bharatviswa504's reply - #2151 (comment)

@rakeshadr
Contributor Author

@elek I've addressed your comments. Can you please review it again, thanks!

Member

@elek elek left a comment


+1, thanks for the (continuous) updates and patience.

@rakeshadr
Contributor Author

Thanks a lot @elek, @bharatviswa504, and @mukul1987 for the detailed reviews. After several CI runs, I finally got a clean build report. I'm merging it to the branch.

@rakeshadr rakeshadr merged commit 3c82503 into apache:HDDS-2939 Apr 30, 2021