[SPARK-42277][CORE] Use RocksDB for spark.history.store.hybridStore.diskBackend by default #39845

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-42277


@dongjoon-hyun dongjoon-hyun commented Feb 1, 2023

What changes were proposed in this pull request?

This PR aims to use RocksDB for spark.history.store.hybridStore.diskBackend by default instead of LevelDB.

LevelDB was last updated about 10 years ago.

Why are the changes needed?

RocksDB is well maintained and works in all environments that Apache Spark supports, while LevelDB is no longer maintained and does not work in some environments, as in the following example.

$ SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$HOME/data/history" sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/dongjoon/APACHE/spark-merge/logs/spark-dongjoon-org.apache.spark.deploy.history.HistoryServer-1-coffee.local.out
failed to launch: nice -n 0 /Users/dongjoon/APACHE/spark-merge/bin/spark-class org.apache.spark.deploy.history.HistoryServer
        at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
        at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
        at org.apache.spark.util.kvstore.LevelDB.<init>(LevelDB.java:88)
        at org.apache.spark.status.KVUtils$.open(KVUtils.scala:98)
        at org.apache.spark.status.KVUtils$.$anonfun$createKVStore$1(KVUtils.scala:144)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.status.KVUtils$.createKVStore(KVUtils.scala:127)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:136)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
        ... 7 more

Does this PR introduce any user-facing change?

Since spark.history.store.hybridStore.enabled is false by default, there is no behavior change.
If spark.history.store.hybridStore.enabled=true, the History Server will simply rebuild the hybrid store with the new backend.
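
For readers who want to see the settings side by side, here is a minimal sketch of History Server options that opt in to the hybrid store and name the disk backend explicitly. It follows the SPARK_HISTORY_OPTS style used above. The directory variables are hypothetical, and spark.history.store.path is not mentioned in this PR; it is included on the assumption that a disk-backed store needs a local directory.

```shell
# Hypothetical locations; adjust to your deployment.
HISTORY_LOG_DIR="$HOME/data/history"
HISTORY_STORE_DIR="$HOME/data/history-store"

# Event log directory, as in the PR description.
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$HISTORY_LOG_DIR"
# Assumption: local directory for the disk-backed store (not named in this PR).
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.store.path=$HISTORY_STORE_DIR"
# The hybrid store is off by default, so it has to be enabled explicitly.
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.store.hybridStore.enabled=true"
# Name the backend explicitly instead of relying on the default.
SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.store.hybridStore.diskBackend=ROCKSDB"
export SPARK_HISTORY_OPTS

# Show the assembled options; the server would then be started with
#   sbin/start-history-server.sh
echo "$SPARK_HISTORY_OPTS"
```

Spelling the backend out is redundant once RocksDB is the default, but it keeps the choice visible to whoever maintains the deployment.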

How was this patch tested?

Pass the CIs.

@dongjoon-hyun
Member Author

cc @mridulm and @tgravescs

Contributor

@LuciferYang LuciferYang left a comment

+1, LGTM

@dongjoon-hyun
Member Author

Thank you, @LuciferYang .

Could you review this too, @viirya and @sunchao ?

Member

@sunchao sunchao left a comment

LGTM

@dongjoon-hyun
Member Author

Thank you, @sunchao !

@dongjoon-hyun
Member Author

Thank you, @viirya !

dongjoon-hyun added a commit that referenced this pull request Feb 1, 2023
…diskBackend` by default

### What changes were proposed in this pull request?

This PR aims to use `RocksDB` for `spark.history.store.hybridStore.diskBackend` by default instead of `LevelDB`.

`LevelDB` was last updated about 10 years ago:
- `all except aarch64`: https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8 (Oct 17, 2013)
- `aarch64-only`: https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8 (Aug 28, 2019)

### Why are the changes needed?

RocksDB is well maintained and works in all environments that Apache Spark supports, while LevelDB is no longer maintained and does not work in some environments, as in the following example.
```
$ SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$HOME/data/history" sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/dongjoon/APACHE/spark-merge/logs/spark-dongjoon-org.apache.spark.deploy.history.HistoryServer-1-coffee.local.out
failed to launch: nice -n 0 /Users/dongjoon/APACHE/spark-merge/bin/spark-class org.apache.spark.deploy.history.HistoryServer
        at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
        at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
        at org.apache.spark.util.kvstore.LevelDB.<init>(LevelDB.java:88)
        at org.apache.spark.status.KVUtils$.open(KVUtils.scala:98)
        at org.apache.spark.status.KVUtils$.$anonfun$createKVStore$1(KVUtils.scala:144)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.status.KVUtils$.createKVStore(KVUtils.scala:127)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:136)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
        ... 7 more
```

### Does this PR introduce _any_ user-facing change?

Since `spark.history.store.hybridStore.enabled` is `false` by default, there is no behavior change.
If `spark.history.store.hybridStore.enabled=true`, the History Server will simply rebuild the hybrid store with the new backend.

### How was this patch tested?

Pass the CIs.

Closes #39845 from dongjoon-hyun/SPARK-42277.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 39c9945)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

@dongjoon-hyun
Member Author

Merged to master/3.4. Thank you all!

@dongjoon-hyun dongjoon-hyun deleted the SPARK-42277 branch February 1, 2023 22:59

@tgravescs
Contributor

So if a user is upgrading and wants to reuse the disk contents, they have to explicitly set the backend to leveldb, correct? It might be good to have release notes about this.

@dongjoon-hyun
Member Author

Yes, that's right. Thank you, @tgravescs . I also added the releasenotes tag to SPARK-42277.
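
To make the exchange above concrete, here is a rough sketch of how an upgrading deployment might pin the previous backend so that existing on-disk contents can be reused. It assumes options are passed through SPARK_HISTORY_OPTS as in the PR description, and that the accepted value is spelled LEVELDB (written as "leveldb" in the comment above).

```shell
# Append the backend override to any options that are already set.
# ${VAR:+...} adds the separating space only when the variable is non-empty.
SPARK_HISTORY_OPTS="${SPARK_HISTORY_OPTS:+$SPARK_HISTORY_OPTS }-Dspark.history.store.hybridStore.diskBackend=LEVELDB"
export SPARK_HISTORY_OPTS

# Show the assembled options; restart the History Server afterwards with
#   sbin/start-history-server.sh
echo "$SPARK_HISTORY_OPTS"
```

Without this override, an upgraded History Server with the hybrid store enabled would use RocksDB and rebuild the store rather than reuse the LevelDB contents.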

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023