Closed
Labels
good first issue (Good for newcomers)
Overview and Requirements
Hi everyone - help is wanted!
This is the official project plan tracking the work to refactor Delta's `LogStore` classes into a new artifact, `delta-storage`, written in Java (instead of Scala). The Delta `LogStore` is a general interface for all critical file system operations required to read and write the Delta log.
There are a variety of reasons for this initiative.
- Reduce code duplication. Currently, both the Delta Lake OSS and Delta Standalone libraries require access to this interface. However, without a separate `LogStore` artifact to depend on, any implementation needs to be duplicated across both of these repos. We'd like to avoid that.
- Remove the Apache Spark™ dependency. Currently, the `LogStore` interface that the `delta-core` and `delta-contribs` artifacts use is contained within `delta-core`. This means any downstream dependencies will inherently have to depend on Spark. As Delta Standalone is distinctly Spark-less, the current dependency hierarchy won't work.
- No redundant Scala cross publishing. These `LogStore` implementations don't use any fancy Scala language features, and by rewriting the relatively lightweight implementations in Java we can avoid the various headaches and overhead that supporting a cross-published Scala artifact can bring.
- This will enable us to support new lightweight and specific `LogStore` artifacts in the future. For example, for our goal to support S3 multi-cluster writes, we aim to have the `DynamoDBLogStore` (with its unique AWS SDK dependency) as its own artifact. This ensures that the specific AWS dependency isn't brought into other artifacts (e.g. `delta-contribs`).
How to Contribute
- For any of the `LogStore`s below, please comment on the issue letting us know you'd like to work on it.
- Leave the Scala file alone for now, and create the corresponding Java file inside of `storage/src/main/java/io/delta/storage`. Refactor the `LogStore` here.
- Add a new test suite to `core/src/test/scala/org/apache/spark/sql/delta/LogStoreSuite.scala`, much like `PublicHDFSLogStoreSuite`.
- Submit your PR for review.
- See this PR as an example.
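
To give a rough feel for what a Java port involves, here is a minimal sketch. It is not the `delta-storage` API: `SimpleLogStore` and `SimpleLocalLogStore` are hypothetical names, and the sketch uses `java.nio.file` in place of the Hadoop file system types so that it runs with only the JDK. It only illustrates the kind of operations a `LogStore` covers: reading a log file, writing one without silently clobbering an existing version, and listing files from a given version onward.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified stand-in for a LogStore contract.
// The real interface works with Hadoop types and differs in detail.
interface SimpleLogStore {
    List<String> read(Path path) throws IOException;

    void write(Path path, Iterator<String> actions, boolean overwrite) throws IOException;

    List<Path> listFrom(Path path) throws IOException;
}

// Hypothetical local-file-system implementation of the sketch interface above.
final class SimpleLocalLogStore implements SimpleLogStore {

    @Override
    public List<String> read(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // synchronized only guards writers inside this JVM; it is not a
    // cross-process or cross-cluster guarantee.
    @Override
    public synchronized void write(Path path, Iterator<String> actions, boolean overwrite)
            throws IOException {
        if (!overwrite && Files.exists(path)) {
            throw new FileAlreadyExistsException(path.toString());
        }
        List<String> lines = new ArrayList<>();
        actions.forEachRemaining(lines::add);

        // Write to a temp file in the same directory, then move it into place,
        // so readers do not observe a half-written target file.
        Path tmp = Files.createTempFile(path.toAbsolutePath().getParent(), ".tmp-", ".json");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            if (overwrite) {
                Files.move(tmp, path, StandardCopyOption.REPLACE_EXISTING);
            } else {
                Files.move(tmp, path);
            }
        } finally {
            Files.deleteIfExists(tmp); // no-op when the move succeeded
        }
    }

    // Returns files in the same directory whose names sort at or after the
    // given path's file name, in sorted order.
    @Override
    public List<Path> listFrom(Path path) throws IOException {
        Path abs = path.toAbsolutePath();
        String from = abs.getFileName().toString();
        try (Stream<Path> files = Files.list(abs.getParent())) {
            return files
                .filter(p -> p.getFileName().toString().compareTo(from) >= 0)
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

An actual port would extend the `LogStore` base type in `storage/src/main/java/io/delta/storage` and keep the semantics of the Scala class it replaces; the existing Scala implementation and the example PR remain the reference for the exact method signatures.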
Project Status
| LogStore | Issue | PR | Status |
|---|---|---|---|
| Initial setup. | N/A | #925 | DONE |
| HadoopFileSystemLogStore and HDFSLogStore | N/A | #933 | DONE |
| S3SingleDriverLogStore | #952 | #995 | DONE |
| AzureLogStore | #953 | #1003 | DONE |
| DelegatingLogStore | #954 | #1041 | DONE |
| LocalLogStore | #955 | #1002 | DONE |
| GCSLogStore | #956 | #1024 | DONE |
| S3DynamoDBLogStore | #339 | #1023 | DONE |