Skip to content

[SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider#41410

Closed
anishshri-db wants to merge 2 commits into
apache:masterfrom
anishshri-db:task/SPARK-43902
Closed

[SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider#41410
anishshri-db wants to merge 2 commits into
apache:masterfrom
anishshri-db:task/SPARK-43902

Conversation

@anishshri-db
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

Why are the changes needed?

Small change to use lighter weight check for key existence

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran all relevant unit tests

  • RocksDBSuite
  • RocksDBStateStoreSuite
  • RocksDBStateStoreIntegrationSuite
[info] Run completed in 41 seconds, 621 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

[info] Run completed in 54 seconds, 675 milliseconds.
[info] Total number of tests run: 73
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 73, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

[info] Run completed in 14 seconds, 892 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

…s while tracking metrics using RocksDB state store provider
@anishshri-db anishshri-db changed the title [SPARK-43902] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider May 31, 2023
@anishshri-db
Copy link
Copy Markdown
Contributor Author

cc - @HeartSaVioR - PTAL - Thx

@anishshri-db
Copy link
Copy Markdown
Contributor Author

@siying - could you please take a look too ? From the comment it seems that keyMayExist is a lighter check than get. Is there more nuance here ?

https://github.com/facebook/rocksdb/blob/68a9cd21f2f6af4c6ab8724a19fbd4ea8ae89bdd/include/rocksdb/db.h#L851

@anishshri-db
Copy link
Copy Markdown
Contributor Author

Checked the results here and basically it seems like with high overwrite rate, the perf actually becomes worse

================================================================================================
put rows
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
putting 10000 rows (10000 rows to overwrite - rate 100):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
In-memory                                                            8              9           1          1.3         769.8       1.0X
RocksDB (trackTotalNumberOfRows: true)                              88             90           1          0.1        8804.6       0.1X
RocksDB (trackTotalNumberOfRows: false)                             19             20           0          0.5        1901.4       0.4X

OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
putting 10000 rows (5000 rows to overwrite - rate 50):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------------
In-memory                                                          8              9           1          1.3         775.3       1.0X
RocksDB (trackTotalNumberOfRows: true)                            63             65           1          0.2        6310.2       0.1X
RocksDB (trackTotalNumberOfRows: false)                           20             21           0          0.5        1982.8       0.4X

OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
putting 10000 rows (1000 rows to overwrite - rate 10):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------------
In-memory                                                          7              8           1          1.4         727.4       1.0X
RocksDB (trackTotalNumberOfRows: true)                            41             42           0          0.2        4113.9       0.2X
RocksDB (trackTotalNumberOfRows: false)                           20             20           0          0.5        1967.1       0.4X

OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
putting 10000 rows (0 rows to overwrite - rate 0):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------
In-memory                                                      7              8           1          1.4         722.2       1.0X
RocksDB (trackTotalNumberOfRows: true)                        35             36           0          0.3        3533.8       0.2X
RocksDB (trackTotalNumberOfRows: false)                       19             20           0          0.5        1929.7       0.4X

With low overwrite rate, the value is ~same as current. So we are not seeing the perf gain for that case either. Closing PR as not required. Current model seems to be the best, perf-wise. Thx

@HeartSaVioR
Copy link
Copy Markdown
Contributor

Thanks for the check and update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants