v0.0.1
Alpha release of Analytics Accelerator Library for Amazon S3, an open source library that accelerates data access to S3 for client applications, lowering processing times and compute costs for your data analytics workloads. See README for further details.
Note: This is an Alpha release and should not be used in production.
Please see GitHub for known issues.
What's Changed
- Initial commit by @CsengerG in #1
- Add simple CI building with ./gradlew build by @CsengerG in #2
- Add code coverage verification to builds by @CsengerG in #3
- Build: Streamlined dependency and plugin management by @oleg-lvovitch-aws in #4
- Add spotless checks by @CsengerG in #13
- Initial implementation of object client. by @ahmarsuhail in #14
- Implement very first version of seekable stream by @CsengerG in #15
- Eliminate open-range GET requests by @CsengerG in #16
- Adds in jar task by @ahmarsuhail in #17
- Implement first version of JMH microbenchmarks by @CsengerG in #18
- Implements read(buf[], offset, len) by @ahmarsuhail in #19
- Implement readTail() by @CsengerG in #20
- Make client singleton, add a sensible readAhead. by @ahmarsuhail in #21
- Fixes off by one error. by @ahmarsuhail in #23
- Start of the new configuration approach, with opportunistic changes required to support it by @oleg-lvovitch-aws in #25
- New `common` module by @oleg-lvovitch-aws in #26
- Factor implementation into Logical and Physical IO layers by @CsengerG in #27
- Github action to build and upload s3 seekable stream jars to s3 bucket by @radhisat in #24
- Fixing the command to copy jars to s3 in GitHub actions by @radhisat in #28
- Extend GitHub workflow to upload JARs for S3FileIO by @CsengerG in #31
- Fix copy-paste typo in build-upload script by @CsengerG in #33
- Implements initial parquet parsing. by @ahmarsuhail in #32
- Fix reference tests by @CsengerG in #30
- Simplify local development of integrations by @CsengerG in #29
- Adds in initial logic for build column maps. by @ahmarsuhail in #37
- [Preview] Implementing footer caching with introducing single cache option. by @IsaevIlya in #35
- Prefetching fixes and improvements by @radhisat in #41
- Add property based testing by @CsengerG in #34
- Moves to byte array instead of ByteBuffer. by @ahmarsuhail in #44
- This change fixes a bug in creating a prefetch block by @radhisat in #43
- Adding new logic to send user-agent in S3 Requests by @fuatbasik in #42
- Extending unit test coverage and fixing race condition by @IsaevIlya in #45
- Parquet aware prefetching. by @ahmarsuhail in #39
- Prefetch recent columns. by @ahmarsuhail in #46
- Add extra logs and throw Exception when seeing wrong range. by @IsaevIlya in #47
- Replace sleep call with waiting on task for caching data by @IsaevIlya in #49
- Handles multi row group parquet files by @ahmarsuhail in #48
- Adds in prefetch details to referrer header. by @ahmarsuhail in #50
- Update instructions on how to run micro-benchmarks by @CsengerG in #51
- Supporting nested parquet schema by @radhisat in #52
- Use S3CrtAsyncClient to prevent writing time out by @IsaevIlya in #54
- Shades parquet-format dependency. by @ahmarsuhail in #55
- Optimize block reading by @IsaevIlya in #53
- Updates logs to the right levels. by @ahmarsuhail in #57
- Adds some debug logs. by @ahmarsuhail in #59
- Fixing reference tests by @radhisat in #60
- [Refactor] Move Parquet awareness out of the PhysicalIO layer by @CsengerG in #56
- Adds in minimum confidence ratio to prevent over reading. by @ahmarsuhail in #61
- Parquet metadata parsing improvements by @radhisat in #62
- Prevent overreading per schema. by @ahmarsuhail in #63
- Reverting parquet metadata improvements by @radhisat in #64
- Implement sequential prefetching in PhysicalIO by @CsengerG in #65
- [Fix] Resolve dependency conflict in uber JAR by @CsengerG in #67
- Update part size to 8MB by @ahmarsuhail in #68
- Prefetch lengths were asking for one extra byte by @ahmarsuhail in #69
- Add Configuration Modification by @fuatbasik in #70
- First cut at Telemetry and initial instrumentation by @oleg-lvovitch-aws in #74
- Fixed the `IOplan.toString` NSE and changed the telemetry to turn off the console by @oleg-lvovitch-aws in #76
- Telemetry: added a concept of `level` and reduced noisiness by @oleg-lvovitch-aws in #77
- Further telemetry updates to manage verbosity and control perf by @oleg-lvovitch-aws in #78
- Added SpotBugs coverage and fixed the codebase by @oleg-lvovitch-aws in #82
- Refactoring: Move all `Object-Client` modeling to `common` and get rid of duplicate `Range`. by @oleg-lvovitch-aws in #87
- Remove IOUtils dependency by @CsengerG in #75
- Set appropriate groupId by @CsengerG in #91
- Change version from 1.0.0 to 0.0.1 by @CsengerG in #92
- Telemetry refactoring for `input-stream` by @oleg-lvovitch-aws in #90
- Adds in a Default logical IO to be used for all non parquet objects. by @ahmarsuhail in #85
- Added lifetime management/`flush` to `Telemetry` to set up support for reporter that allocate resources by @oleg-lvovitch-aws in #94
- Produce downstream S3A and S3FileIO artifacts to be used by benchmarks. by @shintaroonuma in #95
- Fix iceberg spark runtime jar path by @shintaroonuma in #97
- Telemetry for ObjectClient and related minor refactoring by @oleg-lvovitch-aws in #96
- Lookup iceberg spark runtime jar path by @shintaroonuma in #99
- Build all artifacts before uploading to S3 by @shintaroonuma in #100
- Rename cicd workflow iceberg artifact by @shintaroonuma in #102
- Telemetry: added support for metrics and simple aggregations by @oleg-lvovitch-aws in #103
- ConnectorConfiguration enhancements by @oleg-lvovitch-aws in #104
- Prevents over reading of columns. by @ahmarsuhail in #101
- Renames ParquetMetadataStore to ParquetColumnPrefetchStore. by @ahmarsuhail in #105
- S3SdkObjectClient: better lifetime controls on `close()` by @oleg-lvovitch-aws in #106
- Enabled Java `-XLint` warnings and made them errors. by @oleg-lvovitch-aws in #107
- Gradle test-logger plugin integration by @oleg-lvovitch-aws in #108
- Expose sequential prefetching constants as configurable by @CsengerG in #109
- Prepare repo for public by adding dependabot and PR/Issue Templates by @fuatbasik in #110
- Bump io.freefair.lombok from 8.6 to 8.10 by @dependabot in #116
- Bump net.jqwik:jqwik from 1.8.5 to 1.9.1 by @dependabot in #114
- Bump org.testcontainers:testcontainers from 1.16.2 to 1.20.2 by @dependabot in #118
- Bump aws-actions/configure-aws-credentials from 1.7.0 to 4.0.2 by @dependabot in #113
- Bump webfactory/ssh-agent from 0.7.0 to 0.9.0 by @dependabot in #115
- Bump com.adobe.testing:s3mock-testcontainers from 3.6.0 to 3.11.0 by @dependabot in #122
- Bump io.freefair.lombok from 8.10 to 8.10.2 by @dependabot in #120
- Adds in new Row_Group prefetch mode. by @ahmarsuhail in #119
- Remove asana links by @ahmarsuhail in #125
- [Bugfix] Sequential prefetching should not shrink ranges by @CsengerG in #126
- Integration tests and benchmarking refresh. by @oleg-lvovitch-aws in #127
- Add THIRD-PARTY-NOTICES.txt generation logic by @fuatbasik in #131
- Addressed feedback for the Benchmarking/Integration tests PR by @oleg-lvovitch-aws in #132
- Uses SLF4J. by @ahmarsuhail in #128
- Added the changelog and updated the PR template by @oleg-lvovitch-aws in #133
- Apply License Headers Requirements by @fuatbasik in #134
- Add THIRD-PARTY-NOTICES by @fuatbasik in #135
- Changed the package name to `software.amazon.s3.dataaccelerator` by @oleg-lvovitch-aws in #136
- Updates READMe by @ahmarsuhail in #138
- Various Document updates by @fuatbasik in #137
- Update Iceberg integration in CI/CD by @CsengerG in #142
- Launch integration tests on PRs and pushes by @oleg-lvovitch-aws in #147
- Change default of sequential prefetch base to 2.0 by @CsengerG in #144
- Track columns in merged ranges. by @ahmarsuhail in #153
- [Pure refactor] Add `TelemetryFormat` by @CsengerG in #154
- Add a JSON `TelemetryFormat` implementation by @CsengerG in #155
- Add telemetry to logical and physical reads by @CsengerG in #157
- Minor telemetry fixes by @oleg-lvovitch-aws in #159
- Suggested edits to README by @aws-docs-suej in #143
- Fix null pointer exception by @ahmarsuhail in #161
- Refactor package and module name by @matthaddaws in #160
- Change Project Name from root gradle settings by @fuatbasik in #167
- Remove mentions of old repository name by @CsengerG in #168
- Updated README.md by @aws-docs-suej in #171
- update readme by @ahmarsuhail in #172
- Add task publishing to Maven by @matthaddaws in #163
- Make signing optional for publishToMavenLocal by @matthaddaws in #174
- Trim Uber Jar to remove awssdk and slf4j and netty dependencies. by @fuatbasik in #169
- Update README.md by @aws-docs-suej in #177
- Minor fixes to README by @fuatbasik in #178
New Contributors
- @CsengerG made their first contribution in #1
- @oleg-lvovitch-aws made their first contribution in #4
- @ahmarsuhail made their first contribution in #14
- @radhisat made their first contribution in #24
- @IsaevIlya made their first contribution in #35
- @fuatbasik made their first contribution in #42
- @shintaroonuma made their first contribution in #95
- @dependabot made their first contribution in #116
- @aws-docs-suej made their first contribution in #143
- @matthaddaws made their first contribution in #160
Full Changelog: https://github.com/awslabs/analytics-accelerator-s3/commits/v0.0.1