Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable bucket cache in v2 s3 proxy #17022

Merged
merged 17 commits into from Apr 18, 2023
Merged

Conversation

007DXR
Copy link
Contributor

@007DXR 007DXR commented Mar 7, 2023

What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR #16806.

Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency to deal with requests .

Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default.

@007DXR
Copy link
Contributor Author

007DXR commented Mar 7, 2023

@Jackson-Wang-7 @lucyge2022 could you please take a look?

add `overwrite` and `checkpath` tags to `copyFilePOptions` in `CopyObjectTask()`
@007DXR
Copy link
Contributor Author

007DXR commented Mar 12, 2023

@lucyge2022 This PR introduces v2 s3 proxy which relates to yours. Could you review it?

.expireAfterWrite(
Configuration.global().getMs(PropertyKey.PROXY_S3_BUCKETPATHCACHE_TIMEOUT_MS),
TimeUnit.MILLISECONDS)
.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keep boolean as value? if u delete the bucket why can't just remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also why keep alluxiouri as key? in S3RestUtils.checkPathIsAlluxioDirectory we anyways are using bucketPath to check right?

Copy link
Contributor Author

@007DXR 007DXR Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the first question, BUCKET_PATH_CACHE is a cache and there is a nearly perfect implement of cache in Alluxio while this cache is map type. So I set BUCKET_PATH_CACHE as a map type as well. boolean is most suitable as the second key of this map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the second question, S3RestUtils.checkPathIsAlluxioDirectory calls master with arg AlluxioURI(bucketPath) to check. so AlluxioURI(bucketPath) is better than bucketPath to be the first key of BUCKET_PATH_CACHE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the first question, BUCKET_PATH_CACHE is a cache and there is a nearly perfect implement of cache in Alluxio while this cache is map type. So I set BUCKET_PATH_CACHE as a map type as well. boolean is most suitable as the second key of this map.

isn't this guava cache instead of alluxio impl?

for the second question, S3RestUtils.checkPathIsAlluxioDirectory calls master with arg AlluxioURI(bucketPath) to check. so AlluxioURI(bucketPath) is better than bucketPath to be the first key of BUCKET_PATH_CACHE.

but in every call the use of S3RestUtils.checkPathIsAlluxioDirectory also uses bucketPath, so now its 1)conversion of bucketpath to alluxiouri 2) check cache with it 3) if not in cache checkPathIsAlluxioDirectory with bucketpath again and another conversion to alluxiouri and call master? u see my point? since from s3 api layer its already using string type bucketpath, why not cache the bucketpath str only?

Copy link
Contributor Author

@007DXR 007DXR Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this guava cache instead of alluxio impl?

It's guava cache.

also why keep alluxiouri as key? in S3RestUtils.checkPathIsAlluxioDirectory we anyways are using bucketPath to check right?

I changed the first key of bucketPathCache to str type. Is the change right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u delete the bucket why can't just remove it?

if we introduce negative cache in the future, setting false will work different from removing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also why keep alluxiouri as key? in S3RestUtils.checkPathIsAlluxioDirectory we anyways are using bucketPath to check right?

I changed the first key of bucketPathCache to str type. Is the change right?

yeah str is better to avoid multiple str<->alluxiouri conversion, just want to make sure that u r aware of the choices : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u delete the bucket why can't just remove it?

if we introduce negative cache in the future, setting false will work different from removing it.

Actually here in current way of using cache in checkPathIsAlluxioDirectory, the cache is not treated as negative cache, AKA its not acting back directly from the cache if 1)cache hit and 2) value is negative, it still went ahead to check master instead of throwing bucket-not-exist exception. And regarding proxy should stay stateless at all times, in order to prevent stale entires (e.g. delete bucket from this proxy but create same bucket on another proxy) , we should never use a negative cache here. So I would suggest : 1) keep the current way of check on master on cache miss 2) remove the boolean if u can find a way with guava cache or internal impl with a expiration, 3) or keep the boolean still but give it a comment of this boolean is simply a placeholder Does that make sense to u?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep the boolean still but give it a comment of the usage of boolean

@@ -733,6 +739,7 @@ public Response continueTask() {
.setWriteType(S3RestUtils.getS3WriteType())
.putAllXattr(xattrMap).setXattrPropStrat(XAttrPropagationStrategy.LEAF_NODE)
.setOverwrite(true)
.setCheckS3BucketPath(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this check needed if we have the checkPathIsAlluxioDirectory step? it will anyway prevent any s3 create obj to create bucket folder in the inode tree is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master didn't throw exception when the bucket doesn't exist without setCheckS3BucketPath(true)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but with checkPathIsAlluxioDirectory in front of all createobj involved call no one can create a file without creating the bucket folder first isnt it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. someone create bucket on this proxy , delete same bucket from another proxy. if we create obj on this proxy after that,checkPathIsAlluxioDirectory tells the bucket exists . but it's wrong. so setCheckS3BucketPath(true) is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucyge2022 can this PR be merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood your point, actually in this case I have another question, in this case, would other calls gets a incorrect error response, say I delete the bucket from another proxy while this proxy hasn't expire the bucket path, if I issue a call like putobjecttagging to this proxy, would it become a internal error or NoSuchKey bcos of filenotexistexception from master instead of NoSuchBucket error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. in this case proxy throws NoSuchKey exception. We can consider NoSuchBucket shows more details than NoSuchKey. but NoSuchKey is not wrong.

@007DXR 007DXR closed this Mar 16, 2023
@007DXR 007DXR reopened this Mar 16, 2023
@alluxio-bot alluxio-bot added the API Change Changes covering public API label Apr 13, 2023
@007DXR
Copy link
Contributor Author

007DXR commented Apr 14, 2023

@lucyge2022 I disabled the bucket path cache by default and added some description to PropertyKey.java file. Is it OK?

+ "Set 0min to disable the cache.")
+ "Set 0min to disable the cache. If enabling the cache, "
+ "be careful that Alluxio S3 API will behave differently from AWS S3 API"
+ " when coming an illegal request.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be "be careful that Alluxio S3 API will behave differently from AWS S3 API if bucketpath cache entries becomes stale"

@lucyge2022
Copy link
Contributor

some minor comment, otherwise LGTM. thanks

@007DXR
Copy link
Contributor Author

007DXR commented Apr 18, 2023

@Jackson-Wang-7

@jiacheliu3 jiacheliu3 added the type-feature This issue is a feature request label Apr 18, 2023
@jiacheliu3
Copy link
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit 46270b8 into Alluxio:master Apr 18, 2023
19 checks passed
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 18, 2023
### What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in  v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806.

### Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency  to deal with requests .

### Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API  when coming an illegal request. This bucket path cache is swithed off by default.

			pr-link: Alluxio#17022
			change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 19, 2023
### What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in  v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806.

### Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency  to deal with requests .

### Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API  when coming an illegal request. This bucket path cache is swithed off by default.

			pr-link: Alluxio#17022
			change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 19, 2023
### What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in  v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806.

### Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency  to deal with requests .

### Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API  when coming an illegal request. This bucket path cache is swithed off by default.

			pr-link: Alluxio#17022
			change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 19, 2023
### What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in  v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806.

### Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency  to deal with requests .

### Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API  when coming an illegal request. This bucket path cache is swithed off by default.

			pr-link: Alluxio#17022
			change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 19, 2023
### What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in  v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806.

### Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency  to deal with requests .

### Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API  when coming an illegal request. This bucket path cache is swithed off by default.

			pr-link: Alluxio#17022
			change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
zhezhidashi pushed a commit to zhezhidashi/alluxio-zhaozihao that referenced this pull request May 23, 2023
### What changes are proposed in this pull request?

Create 'BUCKET_CACHE' in  v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806.

### Why are the changes needed?

To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency  to deal with requests .

### Does this PR introduce any user facing changes?

If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API  when coming an illegal request. This bucket path cache is swithed off by default.

			pr-link: Alluxio#17022
			change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Change Changes covering public API area-s3-rest-api type-feature This issue is a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants