Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server #39202

Closed
wants to merge 4 commits into from

Conversation

gengliangwang
Copy link
Member

What changes were proposed in this pull request?

Support configurable(spark.history.store.serializer) serializer for the KVStore in History server. The value can be JSON or PROTOBUF.

Why are the changes needed?

Support the fast and compact protobuf serializer in Spark history server(SHS), so that:

  • Improve the performance of KVStore IO
  • Make it possible for SHS to read the live UI rocksdb instance.

Does this PR introduce any user-facing change?

Yes, introduce a new configuration spark.history.store.serializer for setting the serializer for the KVStore in History server, which can be JSON or PROTOBUF

How was this patch tested?

UT

@gengliangwang
Copy link
Member Author

Copy link
Contributor

@techaddict techaddict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of queries, looks good

@@ -111,7 +122,7 @@ private[spark] object KVUtils extends Logging {
// The default serializer is slow since it is using JSON+GZip encoding.
Some(new KVStoreProtobufSerializer())
} else {
None
Some(serializerForHistoryServer(conf))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: If it is always non empty, change from Option ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just did some refactoring. Please take another look

<td>
Serializer for writing/reading in-memory UI objects to/from disk-based KV Store; JSON or PROTOBUF.
JSON serializer is the only choice before Spark 3.4.0, thus it is the default value.
PROTOBUF serializer is fast and compact, and it is the default serializer for disk-based KV store of live UI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users cannot leverage/access the live db right ? (Unlike history db, which is used across restarts).
Do we want to expose the impl details ? If we change it tomorrow to something else, it should be completely transparent to users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, updated

@gengliangwang gengliangwang changed the title [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server [WIP][SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server Dec 25, 2022
}

val LOCAL_STORE_SERIALIZER = ConfigBuilder("spark.history.store.serializer")
.doc("Serializer for writing/reading in-memory UI objects to/from disk-based KV Store; " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disk-based KV Store; , ; or :?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It follows

  val HYBRID_STORE_DISK_BACKEND = ConfigBuilder("spark.history.store.hybridStore.diskBackend")
    .doc("Specifies a disk-based store used in hybrid store; LEVELDB or ROCKSDB.")

@LuciferYang
Copy link
Contributor

Make it possible for SHS to read the live UI rocksdb instance.

Will this be supported in the future or has it been supported now?

@gengliangwang gengliangwang changed the title [WIP][SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server Dec 27, 2022
@gengliangwang
Copy link
Member Author

I think SHS can read the written RocksDB after this PR, if the file path/DB backend/serializer configurations are set properly.

.doc("Serializer for writing/reading in-memory UI objects to/from disk-based KV Store; " +
"JSON or PROTOBUF. JSON serializer is the only choice before Spark 3.4.0, thus it is the " +
"default value. PROTOBUF serializer is fast and compact, and it is the default " +
"serializer for disk-based KV store of live UI.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this config. SHS only reads the rocksdb files, but not write them. So it should support whatever format of the rocksdb values. Is there a way to encode the format in rocksdb as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHS writes RocksDB files during replaying event logs. The default serializer is JSON+GZIP, which is slower than the Protobuf serializer. Imagine that there are 300GB of event logs to replay...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see

@mridulm
Copy link
Contributor

mridulm commented Dec 29, 2022

+CC @thejdeep, @shardulm94

@gengliangwang
Copy link
Member Author

@techaddict @mridulm @LuciferYang @cloud-fan thanks for the review.
Merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants