TAJO-2069: Implement finding the total size of all objects in a bucket with AWS SDK. #953
Here are my benchmark results.
Configuration
Contents summary time
Removed the dependency on TAJO-2063 (#952).
I wonder why the time taken by getTotalSize() is not proportional to the number of directories. Sometimes it is even faster with more directories.
There may be various reasons: the local network connection, the health of Amazon's servers, or the AWS SDK retry mechanism.
If those are the reasons, you can reduce their effect by running the test several times and averaging the results.
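To illustrate the suggestion above, here is a minimal sketch of a repeat-and-average timing helper. It is not code from this PR: the class name `BenchmarkAverager`, its methods, and the warm-up and run counts are all hypothetical, and the workload in `main` is a placeholder standing in for the real getTotalSize() call.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of this PR): times a task repeatedly and summarizes the runs. */
final class BenchmarkAverager {

    /** Arithmetic mean of the samples; returns 0.0 for an empty array. */
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    /** Median of the samples; less sensitive than the mean to one slow (e.g. retried) run. */
    static double median(long[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the task `warmup` times untimed, then `runs` times timed; returns elapsed ms per run. */
    static long[] timeMillis(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): replace with the getTotalSize() call under test.
        Runnable task = () -> Math.sqrt(System.nanoTime());

        long[] elapsed = timeMillis(task, 2, 10);
        System.out.println("runs (ms): " + Arrays.toString(elapsed));
        System.out.println("mean (ms): " + mean(elapsed));
        System.out.println("median (ms): " + median(elapsed));
    }
}
```

Reporting the median next to the mean helps show whether a single slow run, for example one that hit an SDK retry, is skewing the average.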
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
hadoop-aws is only included in Hadoop 2.6.0 and higher.
If you add hadoop-aws, we should discuss Hadoop compatibility.
@jihoonson @jinossy
I removed the hadoop-aws dependency and added the Amazon SDK dependency instead.
Here are my second benchmark results.
The test finished successfully, as follows:
This PR has been moved to #1024.
Unit test cases are not yet implemented, and this PR depends on TAJO-2063 (#952).