[SPARK] Allow existing spark2 JMH benchmarks to work with either Spark 2 or Spark 3 #2595
…ild.gradle to allow for running on spark3 and spark2
…ate running various JMH suites via copying commands
holdenk left a comment
One quick question, but thanks for working on this :)
@kbendick I got this error when I ran it on the PR: …
So, just like the current setup (as far as I'm aware), you have to specify the actual test to be run, something like the following, as mentioned in the existing code here: … I will test simply …
I tried running … When I run an individual test, as recommended in the comments of each test (or above), it runs to completion just fine. I think it's more an issue with Gradle's JVM settings and running out of memory or other resources, possibly because of the number of forked threads when running all of the benchmarks for all of the requested iterations. TL;DR: I don't think we should run …
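To illustrate why naming one benchmark keeps a run small, here is a hypothetical Java sketch of regex-based benchmark selection, roughly the way a JMH include pattern narrows a run to matching benchmark names. `BenchmarkFilter` and the benchmark names are made up for illustration, and treating the pattern as a partial match is an assumption of the sketch, not a statement about the Gradle plugin's or JMH's internals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: narrows a list of benchmark names to those matching an include regex.
class BenchmarkFilter {
  static List<String> select(List<String> allBenchmarks, String includeRegex) {
    Pattern pattern = Pattern.compile(includeRegex);
    return allBenchmarks.stream()
        // find() accepts a partial match, so a bare class name selects all of its methods
        .filter(name -> pattern.matcher(name).find())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical benchmark names, for illustration only
    List<String> all = Arrays.asList(
        "bench.ParquetReadBenchmark.readIceberg",
        "bench.ParquetReadBenchmark.readSpark",
        "bench.AvroWriteBenchmark.writeIceberg");
    // Running everything vs. a single suite
    System.out.println(all.size() + " benchmarks in total");
    System.out.println(select(all, "ParquetReadBenchmark"));
  }
}
```

Selecting a single suite this way runs only a fraction of the total iterations, which is the practical difference between running one benchmark and running the whole suite.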
Further follow-up: @RussellSpitzer and I were both able to get the tests to run to completion, but users should not run … This patch does not change whether the JMH test suite can run to completion compared with its previous state. It's still possible either way, just rather hard given the number of tests and the limitations of Gradle's JVM (even with the commonly used expanded memory parameters).
Hi @kbendick, sorry for the delayed review. Both the spark2 and spark3 JMH commands work well on my laptop with your PR. For Spark 3, I can run them with both JDK 8 and JDK 11. Looks good to me overall. One question: is it possible to let Gradle create the dirs (…
I'm admittedly not great with Gradle. I tried to figure out how to get this to work but couldn't (though I didn't spend much time on it and could probably get closer if I spent more time on it later). I could look into it further, but the current state in master is that if a user runs the commands as written, they get a folder-not-found error, so I figured this was still an upgrade. By default, if no output path is specified, the results get placed in a … So, given that this is a user-specified input that deviates from the default, it might make more sense to simply leave it as is (where the user gets an error and then creates the folder) rather than complicate the build files and the separation of tasks into … That said, I am partial to the gitignore, as it ensures nobody uploads results from their laptop or from some environment that might be very resource-constrained and not representative. Let me take another pass at it for a bit, but then maybe we can just revert those changes and keep the current behavior for the moment? That is, just failing if the user overrides the output file, which does provide a very clear … If we really want Gradle to create the folders (which I agree is a good idea in theory), would it be possible to simply remove the …
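For reference, a hypothetical sketch in plain Java (not Gradle) of what "creating the output folders" amounts to: create any missing parent directories of the output path before writing the results file. `ResultWriter`, `writeResults`, and the example path are illustrative names only; this PR itself relies on checked-in `.gitkeep` files rather than code like this.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: write benchmark results, creating missing parent folders first.
class ResultWriter {
  static void writeResults(Path outputPath, String results) throws IOException {
    Path parent = outputPath.toAbsolutePath().getParent();
    if (parent != null) {
      // Creates every missing directory; does nothing if they already exist
      Files.createDirectories(parent);
    }
    // Creates the file, or truncates it if it already exists
    Files.write(outputPath, results.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical output path, mirroring a spark3/benchmark folder
    Path out = Paths.get("spark3", "benchmark", "example-result.txt");
    writeResults(out, "placeholder results\n");
    System.out.println("wrote " + out.toAbsolutePath());
  }
}
```

Because `Files.createDirectories` is a no-op for directories that already exist, this approach would coexist with `.gitkeep`-tracked folders.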
I spoke to @flyrain offline, and he agrees that, given the age of this PR and the complexity it would add to the build file (and the discussion that would follow), we should change the build file to create the output directories in a subsequent PR. This discussion has already gotten really long, and the build file changes will almost certainly invite more. This is arguably not worse than the current state, though I'm happy to remove the …
Right now, we have JMH benchmarks that allow performance comparisons between the native Spark sources/sinks and the Iceberg sources/sinks in Spark.
However, these JMH test suites don't get registered for Spark 3.
I have moved the JMH test code to `spark/src/jmh/...` and then added that to the source directories for the spark2 and spark3 projects. Now, depending on which JDK you are using, you can run either the Spark 2 or Spark 3 benchmarks (with JDK 8) or the Spark 3 benchmarks (with JDK 11).
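A minimal sketch of the JDK-to-project mapping described above, assuming only what the description states (JDK 8: spark2 and spark3; JDK 11: spark3). `JmhProjects` is a hypothetical helper for illustration; the actual selection is done in the Gradle build, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper mirroring the JDK/project combinations stated in the PR description.
class JmhProjects {
  // Parses java.specification.version: "1.8" -> 8, "11" -> 11
  static int majorVersion(String specVersion) {
    if (specVersion.startsWith("1.")) {
      return Integer.parseInt(specVersion.substring(2));
    }
    return Integer.parseInt(specVersion);
  }

  // JDKs other than 8 and 11 are not covered by the description, so they map to no projects
  static List<String> runnableProjects(int javaMajor) {
    List<String> projects = new ArrayList<>();
    if (javaMajor == 8) {
      projects.add("spark2");
    }
    if (javaMajor == 8 || javaMajor == 11) {
      projects.add("spark3");
    }
    return Collections.unmodifiableList(projects);
  }

  public static void main(String[] args) {
    int major = majorVersion(System.getProperty("java.specification.version"));
    System.out.println("JDK " + major + " -> " + runnableProjects(major));
  }
}
```

Parsing `java.specification.version` instead of calling `Runtime.version()` keeps the sketch compilable on JDK 8, which predates that API.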
It would still be possible to add benchmarks specific to Spark 3 or Spark 2 by placing them in `spark[2|3]/src/jmh/...`. For now, all of the code is left as is and has been copied over (with the command to run updated in each file). I've also added a `.gitkeep` file in `spark2/benchmark` and `spark3/benchmark` so that users can run the commands as is without having to create the folders.
This closes #2590
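To make the layout concrete, a hypothetical sketch: each Spark project picks up the shared benchmarks under `spark/src/jmh` plus its own under `spark2/src/jmh` or `spark3/src/jmh`. Only those directory roots come from the description; the deeper layout (elided as `...` above), the `JmhSources` class, and the benchmark names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of shared plus version-specific JMH source directories.
class JmhSources {
  static final String SHARED_DIR = "spark/src/jmh";

  // Shared directory first, then the project's own directory
  static List<String> sourceDirs(String project) {
    return Arrays.asList(SHARED_DIR, project + "/src/jmh");
  }

  // Collects the benchmarks a project would compile, given benchmarks keyed by directory
  static List<String> benchmarksFor(String project, Map<String, List<String>> benchmarksByDir) {
    List<String> result = new ArrayList<>();
    for (String dir : sourceDirs(project)) {
      result.addAll(benchmarksByDir.getOrDefault(dir, Collections.emptyList()));
    }
    return result;
  }
}
```

In this model a benchmark placed in `spark/src/jmh` shows up for both projects, while one placed in `spark3/src/jmh` shows up only for spark3.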