HDDS-4102. Normalize Keypath for lookupKey. #1328

bharatviswa504 · 2020-08-13T17:33:04Z

What changes were proposed in this pull request?

When ozone.om.enable.filesystem.paths, OM normalizes path, and stores the Keyname.

Now when user tries to read the file from S3 using the keyName which user has used to create the Key, it will return error KEY_NOT_FOUND

The issue is, lookupKey also need to normalize path, when ozone.om.enable.filesystem.paths is enabled. This is a common API used by S3/FS.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4102

How was this patch tested?

Added a test.

elek · 2020-08-26T14:01:04Z

Thanks for the patch @bharatviswa504, I would suggest adding a related robot test, too, which would make it possible to test the AWS compatibility of the change.

What do you think?

elek · 2020-08-26T14:34:50Z

Tested and file/ and file are different:

$ aws s3api get-object --bucket=ozonetest --key=test1/file /tmp/www
{
    "AcceptRanges": "bytes",
    "LastModified": "Wed, 26 Aug 2020 14:24:06 GMT",
    "ContentLength": 79,
    "ETag": "\"6fcb0dbe8f0ef330b38f81a6a6a7a66e\"",
    "ContentType": "text/markdown",
    "Metadata": {}
}

aws s3api get-object --bucket=ozonetest --key=test1/file/ /tmp/www

An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

Just an idea: what do you think about normalizing the path on the client side? (o3fs/ofs). Would it make any problems?

bharatviswa504 · 2020-08-26T21:08:27Z

Tested and file/ and file are different:

$ aws s3api get-object --bucket=ozonetest --key=test1/file /tmp/www
{
    "AcceptRanges": "bytes",
    "LastModified": "Wed, 26 Aug 2020 14:24:06 GMT",
    "ContentLength": 79,
    "ETag": "\"6fcb0dbe8f0ef330b38f81a6a6a7a66e\"",
    "ContentType": "text/markdown",
    "Metadata": {}
}

aws s3api get-object --bucket=ozonetest --key=test1/file/ /tmp/www

An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

Just an idea: what do you think about normalizing the path on the client side? (o3fs/ofs). Would it make any problems?

Thank You @elek for the test.

Uploaded design doc has provided reasoning to perform on the OM side.

And also when used via ofs/o3fs, it uses path Object, and paths are normalized already.

bharatviswa504 · 2020-08-26T21:10:47Z

Thanks for the patch @bharatviswa504, I would suggest adding a related robot test, too, which would make it possible to test the AWS compatibility of the change.

What do you think?

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

elek · 2020-08-27T11:59:06Z

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

Thanks. I think we need additional s3 tests as the behavior what I tested above (and failing) is not part of the current test suite.

Uploaded design doc has provided reasoning to perform on the OM side.

Thanks to share it. Now I understood the motivation. But what is your opinion about the AWS compatibility? For me it seems to be a bigger problem compared to having the validation code in one at the OM or one place at the client side.

What is your opinion about AWS compatibility issues?

bharatviswa504 · 2020-08-27T16:55:39Z

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

Thanks. I think we need additional s3 tests as the behavior what I tested above (and failing) is not part of the current test suite.

Uploaded design doc has provided reasoning to perform on the OM side.

Thanks to share it. Now I understood the motivation. But what is your opinion about the AWS compatibility? For me it seems to be a bigger problem compared to having the validation code in one at the OM or one place at the client side.

What is your opinion about AWS compatibility issues?

When this flag is enabled, as the config explains, we consider paths as unix paths and provide file system semantics. So when this is enabled we cannot gurantee 100% AWS compatibility. As said we will make this bucket level config to provide ease to users to have both on a cluster.

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

Thanks. I think we need additional s3 tests as the behavior what I tested above (and failing) is not part of the current test suite.

Uploaded design doc has provided reasoning to perform on the OM side.

Thanks to share it. Now I understood the motivation. But what is your opinion about the AWS compatibility? For me it seems to be a bigger problem compared to having the validation code in one at the OM or one place at the client side.

What is your opinion about AWS compatibility issues?

Currently, the test is simulated in the Integration test. As when this flag is enabled user has chosen to consider key as unix paths. I will add a test suite in HDDS-4154. For now, all tests are in integration tests.

  <property>
    <name>ozone.om.enable.filesystem.paths</name>
    <tag>OZONE, OM</tag>
    <value>false</value>
    <description>If true, key names will be interpreted as file system paths.
      "/" will be treated as a special character and paths will be normalized
      and must follow Unix filesystem path naming conventions. This flag will
      be helpful when objects created by S3G need to be accessed using OFS/O3Fs.
      If false, it will fallback to default behavior of Key/MPU create
      requests where key paths are not normalized and any intermediate
      directories will not be created or any file checks happens to check
      filesystem semantics.
    </description>
  </property>

So, test/file/ will be normalized to test/file. This PR provides user ability to read with the same file name user has used during the creation of key "test/file/".

bharatviswa504 · 2020-08-31T17:01:26Z

@elek Any more comments?

elek · 2020-09-01T15:06:54Z

@elek Any more comments?

Looks like we don't have the consensus, yet, in HDDS-4097. It was discussed yesterday during the community meeting, and I felt that everybody agree more or less, but I am not sure if @arp7 fully agrees.

arp7 · 2020-09-01T15:14:35Z

What is your opinion about AWS compatibility issues?

What exactly is the compatibility issue? If the paths are not interpreted (default behavior), then there is full AWS compatibility.

bharatviswa504 · 2020-09-01T15:45:11Z

@elek
There is no issue with this, and I believe we have agreed on the part we need this like 100% HCFS with few changes to AWS S3 semantics. I don't see what is the problem in moving forward?.

As the other argument is when the flag is disabled to support compromised HCFS and 100% AWS.

Let me know if you have any concerns about this PR.

elek · 2020-09-02T12:34:12Z

I uploaded my proposal to https://issues.apache.org/jira/browse/HDDS-4097, please let me know your opinion.

I think the normalization should be optional and should be driven by a configuration which is independent of the intermediate directory creation (ozone.om.enable.filesystem.paths).

While intermediate directories are a must-have for providing services for both S3 and HCFS, normalization seems to be an optional (but very useful) feature.

bharatviswa504 · 2020-09-14T21:38:39Z

I have rebased this PR.

cc @elek @arp7

elek

+1 LGTM

Based on my understanding this is exactly what we discussed: normalization is turned on when OZONE_OM_ENABLE_FILESYSTEM_PATHS_DEFAULT is enabled, but not in any other cases.

* master: HDDS-4102. Normalize Keypath for lookupKey. (apache#1328) HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438) HDDS-4282. Improve the emptyDir syntax (apache#1450) HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383) HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443) HDDS-2660. Create insight point for datanode container protocol (apache#1272) HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442) HDDS-4324. Add important comment to ListVolumes logic (apache#1417) HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424) HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431) HDDS-2766. security/SecuringDataNodes.md (apache#1175) HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389) HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416) HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385) HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410) HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214) HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433) HDDS-4247. Fixed log4j usage in some places (apache#1426) HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)

* HDDS-4122-remove-code-consolidation: (21 commits) Restore files that had deduplicated code from master Revert other delete request/response files back to their original states on master HDDS-4102. Normalize Keypath for lookupKey. (apache#1328) HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438) HDDS-4282. Improve the emptyDir syntax (apache#1450) HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383) HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443) HDDS-2660. Create insight point for datanode container protocol (apache#1272) HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442) HDDS-4324. Add important comment to ListVolumes logic (apache#1417) HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424) HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431) HDDS-2766. security/SecuringDataNodes.md (apache#1175) HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389) HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416) HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385) HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410) HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214) HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433) HDDS-4247. Fixed log4j usage in some places (apache#1426) ...

bharatviswa504 mentioned this pull request Aug 13, 2020

HDDS-4102. Normalize Keypath for lookupKey. #1315

Closed

bharatviswa504 requested review from rakeshadr, arp7, mukul1987 and avijayanhwx August 13, 2020 17:35

bharatviswa504 force-pushed the HDDS-4102 branch from 6c58200 to 3cecfc1 Compare August 13, 2020 19:16

bharatviswa504 mentioned this pull request Aug 13, 2020

HDDS-4117. Normalize Keypath for listKeys. #1331

Closed

HDDS-4102. Normalize Keypath for lookupKey.

8147e54

bharatviswa504 force-pushed the HDDS-4102 branch from 8371c31 to 8147e54 Compare September 14, 2020 21:36

bharatviswa504 requested a review from elek September 14, 2020 21:39

elek approved these changes Sep 28, 2020

View reviewed changes

elek merged commit 004dd3f into apache:master Sep 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HDDS-4102. Normalize Keypath for lookupKey. #1328

HDDS-4102. Normalize Keypath for lookupKey. #1328

bharatviswa504 commented Aug 13, 2020

elek commented Aug 26, 2020

elek commented Aug 26, 2020

bharatviswa504 commented Aug 26, 2020 •

edited

bharatviswa504 commented Aug 26, 2020 •

edited

elek commented Aug 27, 2020

bharatviswa504 commented Aug 27, 2020 •

edited

bharatviswa504 commented Aug 31, 2020

elek commented Sep 1, 2020

arp7 commented Sep 1, 2020

bharatviswa504 commented Sep 1, 2020 •

edited

elek commented Sep 2, 2020

bharatviswa504 commented Sep 14, 2020

elek left a comment

HDDS-4102. Normalize Keypath for lookupKey. #1328

HDDS-4102. Normalize Keypath for lookupKey. #1328

Conversation

bharatviswa504 commented Aug 13, 2020

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

elek commented Aug 26, 2020

elek commented Aug 26, 2020

bharatviswa504 commented Aug 26, 2020 • edited

bharatviswa504 commented Aug 26, 2020 • edited

elek commented Aug 27, 2020

bharatviswa504 commented Aug 27, 2020 • edited

bharatviswa504 commented Aug 31, 2020

elek commented Sep 1, 2020

arp7 commented Sep 1, 2020

bharatviswa504 commented Sep 1, 2020 • edited

elek commented Sep 2, 2020

bharatviswa504 commented Sep 14, 2020

elek left a comment

Choose a reason for hiding this comment

bharatviswa504 commented Aug 26, 2020 •

edited

bharatviswa504 commented Aug 26, 2020 •

edited

bharatviswa504 commented Aug 27, 2020 •

edited

bharatviswa504 commented Sep 1, 2020 •

edited