This is the official project plan tracking the work to refactor Delta's LogStore classes into a new artifact, delta-storage, written in Java (instead of Scala). The Delta LogStore is a general interface for all critical file system operations required to read and write the Delta log.
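To make the idea of a LogStore concrete, here is a minimal sketch of the kind of operations such an interface groups together: reading a log file, writing one with or without overwrite, and listing files from a starting path. The names `SketchLogStore` and `InMemorySketchLogStore` and their signatures are assumptions made for illustration; they are not the delta-storage API, which is defined in the project itself.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified interface for illustration only.
interface SketchLogStore {
    // Return the lines of the file at the given path.
    List<String> read(String path) throws IOException;

    // Write the given lines; when overwrite is false, fail if the file already exists.
    void write(String path, Iterator<String> actions, boolean overwrite) throws IOException;

    // List all paths that sort at or after the given path, in ascending order.
    List<String> listFrom(String path) throws IOException;
}

// In-memory stand-in for a file system, used only to show the contract.
final class InMemorySketchLogStore implements SketchLogStore {
    private final ConcurrentSkipListMap<String, List<String>> files =
        new ConcurrentSkipListMap<>();

    @Override
    public List<String> read(String path) throws IOException {
        List<String> lines = files.get(path);
        if (lines == null) {
            throw new FileNotFoundException(path);
        }
        return lines;
    }

    @Override
    public void write(String path, Iterator<String> actions, boolean overwrite)
            throws IOException {
        List<String> lines = new ArrayList<>();
        actions.forEachRemaining(lines::add);
        List<String> content = Collections.unmodifiableList(lines);
        if (overwrite) {
            files.put(path, content);
        } else if (files.putIfAbsent(path, content) != null) {
            // putIfAbsent is atomic, so only one of several concurrent writers succeeds.
            throw new FileAlreadyExistsException(path);
        }
    }

    @Override
    public List<String> listFrom(String path) {
        // tailMap on a sorted map yields keys >= path in ascending order.
        return new ArrayList<>(files.tailMap(path, true).keySet());
    }
}
```

In this sketch, a second non-overwriting write to the same path is rejected, which is the property that lets concurrent writers agree on who committed a given log entry.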
There are a variety of reasons for this initiative.
Reduce code duplication. Currently, both the Delta Lake OSS and Delta Standalone libraries require access to this interface. However, without a separate LogStore artifact to depend on, any implementation needs to be duplicated across both of these repos. We'd like to avoid that.
Remove the Apache Spark™ dependency. Currently, the LogStore interface that the delta-core and delta-contribs artifacts use is contained within delta-core. This means any downstream dependencies will inherently have to depend on Spark. As Delta Standalone is distinctly Spark-less, the current dependency hierarchy won't work.
No redundant Scala cross publishing. These LogStore implementations don't use any fancy Scala language features, and by rewriting the relatively lightweight implementations in Java we can avoid the various headaches and overhead that supporting a cross-published Scala artifact can bring.
This will enable us to support new lightweight and specific LogStore artifacts in the future. For example, for our goal to support S3 multi-cluster writes, we aim to have the DynamoDBLogStore (with its unique AWS SDK dependency) as its own artifact. This ensures that the specific AWS dependency isn't brought into other artifacts (e.g. delta-contribs).
How to Contribute
For any of the LogStores below, please comment on the issue letting us know you'd like to work on it.
Leave the Scala file alone for now, and create the corresponding Java file inside of storage/src/main/java/io/delta/storage. Refactor the LogStore here.
Add a new test suite to core/src/test/scala/org/apache/spark/sql/delta/LogStoreSuite.scala, much like PublicHDFSLogStoreSuite.
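As a rough illustration of what a ported Java implementation could look like, the sketch below implements the same three operations against the local file system using only `java.nio`. The class name `LocalFileSketchLogStore` and its method signatures are hypothetical; a real port would live in the `io.delta.storage` package and follow the project's actual LogStore base type. The sketch also ignores partial-write visibility, so treat it as a starting point rather than a production design.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class for illustration; not one of the project's LogStore classes.
final class LocalFileSketchLogStore {

    // Read the whole file as a list of lines.
    List<String> read(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Write the lines to the file. Without overwrite, an existing file is an error.
    void write(Path path, Iterator<String> actions, boolean overwrite) throws IOException {
        List<String> lines = new ArrayList<>();
        actions.forEachRemaining(lines::add);
        if (overwrite) {
            Files.write(path, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
        } else {
            // CREATE_NEW throws FileAlreadyExistsException if the file is already there.
            Files.write(path, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE);
        }
    }

    // List sibling files whose names sort at or after the given file name.
    // Assumes the path has a parent directory.
    List<Path> listFrom(Path path) throws IOException {
        Path dir = path.getParent();
        String start = path.getFileName().toString();
        try (Stream<Path> children = Files.list(dir)) {
            return children
                .filter(p -> p.getFileName().toString().compareTo(start) >= 0)
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

A test suite for such a class would typically check the same behaviors for every implementation: a round trip of write and read, rejection of a second non-overwriting write, and ordered listing from a given file name.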
Overview and Requirements
Hi everyone - help is wanted!
Project Status