Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-4102. Normalize Keypath for lookupKey. #1328

Merged
merged 1 commit into from Sep 28, 2020

Conversation

bharatviswa504
Copy link
Contributor

What changes were proposed in this pull request?

When ozone.om.enable.filesystem.paths, OM normalizes path, and stores the Keyname.

Now when user tries to read the file from S3 using the keyName which user has used to create the Key, it will return error KEY_NOT_FOUND

The issue is, lookupKey also need to normalize path, when ozone.om.enable.filesystem.paths is enabled. This is a common API used by S3/FS.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4102

How was this patch tested?

Added a test.

@elek
Copy link
Member

elek commented Aug 26, 2020

Thanks for the patch @bharatviswa504, I would suggest adding a related robot test, too, which would make it possible to test the AWS compatibility of the change.

What do you think?

@elek
Copy link
Member

elek commented Aug 26, 2020

Tested and file/ and file are different:

$ aws s3api get-object --bucket=ozonetest --key=test1/file /tmp/www
{
    "AcceptRanges": "bytes",
    "LastModified": "Wed, 26 Aug 2020 14:24:06 GMT",
    "ContentLength": 79,
    "ETag": "\"6fcb0dbe8f0ef330b38f81a6a6a7a66e\"",
    "ContentType": "text/markdown",
    "Metadata": {}
}
aws s3api get-object --bucket=ozonetest --key=test1/file/ /tmp/www

An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

Just an idea: what do you think about normalizing the path on the client side? (o3fs/ofs). Would it make any problems?

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Aug 26, 2020

Tested and file/ and file are different:

$ aws s3api get-object --bucket=ozonetest --key=test1/file /tmp/www
{
    "AcceptRanges": "bytes",
    "LastModified": "Wed, 26 Aug 2020 14:24:06 GMT",
    "ContentLength": 79,
    "ETag": "\"6fcb0dbe8f0ef330b38f81a6a6a7a66e\"",
    "ContentType": "text/markdown",
    "Metadata": {}
}
aws s3api get-object --bucket=ozonetest --key=test1/file/ /tmp/www

An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

Just an idea: what do you think about normalizing the path on the client side? (o3fs/ofs). Would it make any problems?

Thank You @elek for the test.

Uploaded design doc has provided reasoning to perform on the OM side.

image

And also when used via ofs/o3fs, it uses path Object, and paths are normalized already.

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Aug 26, 2020

Thanks for the patch @bharatviswa504, I would suggest adding a related robot test, too, which would make it possible to test the AWS compatibility of the change.

What do you think?

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

@elek
Copy link
Member

elek commented Aug 27, 2020

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

Thanks. I think we need additional s3 tests as the behavior what I tested above (and failing) is not part of the current test suite.

Uploaded design doc has provided reasoning to perform on the OM side.

Thanks to share it. Now I understood the motivation. But what is your opinion about the AWS compatibility? For me it seems to be a bigger problem compared to having the validation code in one at the OM or one place at the client side.

What is your opinion about AWS compatibility issues?

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Aug 27, 2020

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

Thanks. I think we need additional s3 tests as the behavior what I tested above (and failing) is not part of the current test suite.

Uploaded design doc has provided reasoning to perform on the OM side.

Thanks to share it. Now I understood the motivation. But what is your opinion about the AWS compatibility? For me it seems to be a bigger problem compared to having the validation code in one at the OM or one place at the client side.

What is your opinion about AWS compatibility issues?

When this flag is enabled, as the config explains, we consider paths as unix paths and provide file system semantics. So when this is enabled we cannot gurantee 100% AWS compatibility. As said we will make this bucket level config to provide ease to users to have both on a cluster.

Opened a jira HDDS-4154 to test S3 test suite with ozone.om.enable.filesystem.paths enabled.

Thanks. I think we need additional s3 tests as the behavior what I tested above (and failing) is not part of the current test suite.

Uploaded design doc has provided reasoning to perform on the OM side.

Thanks to share it. Now I understood the motivation. But what is your opinion about the AWS compatibility? For me it seems to be a bigger problem compared to having the validation code in one at the OM or one place at the client side.

What is your opinion about AWS compatibility issues?

Currently, the test is simulated in the Integration test. As when this flag is enabled user has chosen to consider key as unix paths. I will add a test suite in HDDS-4154. For now, all tests are in integration tests.

  <property>
    <name>ozone.om.enable.filesystem.paths</name>
    <tag>OZONE, OM</tag>
    <value>false</value>
    <description>If true, key names will be interpreted as file system paths.
      "/" will be treated as a special character and paths will be normalized
      and must follow Unix filesystem path naming conventions. This flag will
      be helpful when objects created by S3G need to be accessed using OFS/O3Fs.
      If false, it will fallback to default behavior of Key/MPU create
      requests where key paths are not normalized and any intermediate
      directories will not be created or any file checks happens to check
      filesystem semantics.
    </description>
  </property>

So, test/file/ will be normalized to test/file. This PR provides user ability to read with the same file name user has used during the creation of key "test/file/".

@bharatviswa504
Copy link
Contributor Author

@elek Any more comments?

@elek
Copy link
Member

elek commented Sep 1, 2020

@elek Any more comments?

Looks like we don't have the consensus, yet, in HDDS-4097. It was discussed yesterday during the community meeting, and I felt that everybody agree more or less, but I am not sure if @arp7 fully agrees.

@arp7
Copy link
Contributor

arp7 commented Sep 1, 2020

What is your opinion about AWS compatibility issues?

What exactly is the compatibility issue? If the paths are not interpreted (default behavior), then there is full AWS compatibility.

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Sep 1, 2020

@elek
There is no issue with this, and I believe we have agreed on the part we need this like 100% HCFS with few changes to AWS S3 semantics. I don't see what is the problem in moving forward?.

As the other argument is when the flag is disabled to support compromised HCFS and 100% AWS.

Let me know if you have any concerns about this PR.

@elek
Copy link
Member

elek commented Sep 2, 2020

I uploaded my proposal to https://issues.apache.org/jira/browse/HDDS-4097, please let me know your opinion.

I think the normalization should be optional and should be driven by a configuration which is independent of the intermediate directory creation (ozone.om.enable.filesystem.paths).

While intermediate directories are a must-have for providing services for both S3 and HCFS, normalization seems to be an optional (but very useful) feature.

@bharatviswa504
Copy link
Contributor Author

I have rebased this PR.

cc @elek @arp7

Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

Based on my understanding this is exactly what we discussed: normalization is turned on when OZONE_OM_ENABLE_FILESYSTEM_PATHS_DEFAULT is enabled, but not in any other cases.

@elek elek merged commit 004dd3f into apache:master Sep 28, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Sep 28, 2020
* master:
  HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  HDDS-4282. Improve the emptyDir syntax (apache#1450)
  HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  HDDS-4247. Fixed log4j usage in some places (apache#1426)
  HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)
errose28 added a commit to errose28/ozone that referenced this pull request Sep 28, 2020
* HDDS-4122-remove-code-consolidation: (21 commits)
  Restore files that had deduplicated code from master
  Revert other delete request/response files back to their original states on master
  HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  HDDS-4282. Improve the emptyDir syntax (apache#1450)
  HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  HDDS-4247. Fixed log4j usage in some places (apache#1426)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants