New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable bucket cache in v2 s3 proxy #17022
Conversation
@Jackson-Wang-7 @lucyge2022 could you please take a look? |
add `overwrite` and `checkpath` tags to `copyFilePOptions` in `CopyObjectTask()`
@lucyge2022 This PR introduces v2 s3 proxy which relates to yours. Could you review it? |
.expireAfterWrite( | ||
Configuration.global().getMs(PropertyKey.PROXY_S3_BUCKETPATHCACHE_TIMEOUT_MS), | ||
TimeUnit.MILLISECONDS) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why keep boolean as value? if u delete the bucket why can't just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also why keep alluxiouri as key? in S3RestUtils.checkPathIsAlluxioDirectory we anyways are using bucketPath to check right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the first question, BUCKET_PATH_CACHE
is a cache and there is a nearly perfect implement of cache in Alluxio while this cache is map type. So I set BUCKET_PATH_CACHE
as a map type as well. boolean
is most suitable as the second key of this map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the second question, S3RestUtils.checkPathIsAlluxioDirectory
calls master with arg AlluxioURI(bucketPath)
to check. so AlluxioURI(bucketPath)
is better than bucketPath
to be the first key of BUCKET_PATH_CACHE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the first question,
BUCKET_PATH_CACHE
is a cache and there is a nearly perfect implement of cache in Alluxio while this cache is map type. So I setBUCKET_PATH_CACHE
as a map type as well.boolean
is most suitable as the second key of this map.
isn't this guava cache instead of alluxio impl?
for the second question,
S3RestUtils.checkPathIsAlluxioDirectory
calls master with argAlluxioURI(bucketPath)
to check. soAlluxioURI(bucketPath)
is better thanbucketPath
to be the first key ofBUCKET_PATH_CACHE
.
but in every call the use of S3RestUtils.checkPathIsAlluxioDirectory
also uses bucketPath, so now its 1)conversion of bucketpath to alluxiouri 2) check cache with it 3) if not in cache checkPathIsAlluxioDirectory with bucketpath again and another conversion to alluxiouri and call master? u see my point? since from s3 api layer its already using string type bucketpath, why not cache the bucketpath str only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this guava cache instead of alluxio impl?
It's guava cache.
also why keep alluxiouri as key? in S3RestUtils.checkPathIsAlluxioDirectory we anyways are using bucketPath to check right?
I changed the first key of bucketPathCache to str type. Is the change right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u delete the bucket why can't just remove it?
if we introduce negative cache in the future, setting false
will work different from removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also why keep alluxiouri as key? in S3RestUtils.checkPathIsAlluxioDirectory we anyways are using bucketPath to check right?
I changed the first key of bucketPathCache to str type. Is the change right?
yeah str is better to avoid multiple str<->alluxiouri conversion, just want to make sure that u r aware of the choices : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u delete the bucket why can't just remove it?
if we introduce negative cache in the future, setting
false
will work different from removing it.
Actually here in current way of using cache in checkPathIsAlluxioDirectory, the cache is not treated as negative cache, AKA its not acting back directly from the cache if 1)cache hit and 2) value is negative, it still went ahead to check master instead of throwing bucket-not-exist exception. And regarding proxy should stay stateless at all times, in order to prevent stale entires (e.g. delete bucket from this proxy but create same bucket on another proxy) , we should never use a negative cache here. So I would suggest : 1) keep the current way of check on master on cache miss 2) remove the boolean if u can find a way with guava cache or internal impl with a expiration, 3) or keep the boolean still but give it a comment of this boolean is simply a placeholder Does that make sense to u?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep the boolean still but give it a comment of the usage of boolean
@@ -733,6 +739,7 @@ public Response continueTask() { | |||
.setWriteType(S3RestUtils.getS3WriteType()) | |||
.putAllXattr(xattrMap).setXattrPropStrat(XAttrPropagationStrategy.LEAF_NODE) | |||
.setOverwrite(true) | |||
.setCheckS3BucketPath(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this check needed if we have the checkPathIsAlluxioDirectory step? it will anyway prevent any s3 create obj to create bucket folder in the inode tree is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
master didn't throw exception when the bucket doesn't exist without setCheckS3BucketPath(true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but with checkPathIsAlluxioDirectory in front of all createobj involved call no one can create a file without creating the bucket folder first isnt it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. someone create bucket on this proxy , delete same bucket from another proxy. if we create obj on this proxy after that,checkPathIsAlluxioDirectory
tells the bucket exists . but it's wrong. so setCheckS3BucketPath(true)
is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucyge2022 can this PR be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood your point, actually in this case I have another question, in this case, would other calls gets a incorrect error response, say I delete the bucket from another proxy while this proxy hasn't expire the bucket path, if I issue a call like putobjecttagging to this proxy, would it become a internal error or NoSuchKey bcos of filenotexistexception from master instead of NoSuchBucket error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. in this case proxy throws NoSuchKey
exception. We can consider NoSuchBucket
shows more details than NoSuchKey
. but NoSuchKey
is not wrong.
@lucyge2022 I disabled the bucket path cache by default and added some description to PropertyKey.java file. Is it OK? |
+ "Set 0min to disable the cache.") | ||
+ "Set 0min to disable the cache. If enabling the cache, " | ||
+ "be careful that Alluxio S3 API will behave differently from AWS S3 API" | ||
+ " when coming an illegal request.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be "be careful that Alluxio S3 API will behave differently from AWS S3 API if bucketpath cache entries becomes stale"
some minor comment, otherwise LGTM. thanks |
alluxio-bot, merge this please |
### What changes are proposed in this pull request? Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806. ### Why are the changes needed? To keep the consistence between v2 s3 proxy and v1 s3 proxy. To speed up the Alluxio proxy efficency to deal with requests . ### Does this PR introduce any user facing changes? If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default. pr-link: Alluxio#17022 change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
### What changes are proposed in this pull request? Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806. ### Why are the changes needed? To keep the consistence between v2 s3 proxy and v1 s3 proxy. To speed up the Alluxio proxy efficency to deal with requests . ### Does this PR introduce any user facing changes? If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default. pr-link: Alluxio#17022 change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
### What changes are proposed in this pull request? Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806. ### Why are the changes needed? To keep the consistence between v2 s3 proxy and v1 s3 proxy. To speed up the Alluxio proxy efficency to deal with requests . ### Does this PR introduce any user facing changes? If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default. pr-link: Alluxio#17022 change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
### What changes are proposed in this pull request? Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806. ### Why are the changes needed? To keep the consistence between v2 s3 proxy and v1 s3 proxy. To speed up the Alluxio proxy efficency to deal with requests . ### Does this PR introduce any user facing changes? If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default. pr-link: Alluxio#17022 change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
### What changes are proposed in this pull request? Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806. ### Why are the changes needed? To keep the consistence between v2 s3 proxy and v1 s3 proxy. To speed up the Alluxio proxy efficency to deal with requests . ### Does this PR introduce any user facing changes? If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default. pr-link: Alluxio#17022 change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
### What changes are proposed in this pull request? Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR Alluxio#16806. ### Why are the changes needed? To keep the consistence between v2 s3 proxy and v1 s3 proxy. To speed up the Alluxio proxy efficency to deal with requests . ### Does this PR introduce any user facing changes? If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default. pr-link: Alluxio#17022 change-id: cid-a6ae6484aab704bc39ce3a02d22d70bd633cc6a6
What changes are proposed in this pull request?
Create 'BUCKET_CACHE' in v2 s3 proxy to reduce time cost of checking bucket path in the same way as PR #16806.
Why are the changes needed?
To keep the consistence between v2 s3 proxy and v1 s3 proxy.
To speed up the Alluxio proxy efficency to deal with requests .
Does this PR introduce any user facing changes?
If enabling the cache, alluxio will cache bucket path statistics for specified time period(configured in alluxio-site.properties file). Be careful to use this cache because Alluxio S3 API will behave differently from AWS S3 API when coming an illegal request. This bucket path cache is swithed off by default.