PARQUET-1644: Clean up some benchmark code and docs. by RyanSkraba · Pull Request #672 · apache/parquet-java

RyanSkraba · 2019-08-29T16:36:35Z

Make sure you have checked all steps below.

Jira

My PR addresses the following Parquet Jira issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
- https://issues.apache.org/jira/browse/PARQUET-1644
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Tests

My PR adds the following unit tests OR does not need testing for this extremely good reason: The benchmarking module is used for test purposes

Commits

My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters (not including Jira issue reference)
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

Documentation

In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain Javadoc that explain what it does

nandorKollar · 2019-09-03T09:44:20Z

parquet-benchmarks/run.sh


 echo "Starting WRITE benchmarks"
-java -jar ${SCRIPT_PATH}/target/parquet-benchmarks.jar p*Write* "$@"
+java -jar ${SCRIPT_PATH}/target/parquet-benchmarks.jar org.apache.parquet.benchmarks.WriteBenchmarks "$@"


With this change, NestedNullWritingBenchmarks won't be executed in this benchmark suite, but I think that's fine. @gszadovszky do you agree?
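The effect of the change above can be checked outside JMH. This is a minimal sketch, assuming JMH treats its positional argument as a regular expression searched anywhere in the benchmark name; `grep -E` stands in for that matching, and the package of NestedNullWritingBenchmarks is assumed for illustration.

```shell
# Candidate benchmark class names; the package of the second one is assumed.
benchmarks='org.apache.parquet.benchmarks.WriteBenchmarks
org.apache.parquet.benchmarks.NestedNullWritingBenchmarks
org.apache.parquet.benchmarks.ReadBenchmarks'

# Old pattern: read as a regex, "p*Write*" matches any name containing "Writ".
old_matches=$(printf '%s\n' "$benchmarks" | grep -E 'p*Write*')

# New pattern: the fully qualified class name selects a single class.
new_matches=$(printf '%s\n' "$benchmarks" | grep -E 'org.apache.parquet.benchmarks.WriteBenchmarks')

printf 'old pattern:\n%s\n\nnew pattern:\n%s\n' "$old_matches" "$new_matches"
```

Under that assumption, the old pattern picks up both WriteBenchmarks and NestedNullWritingBenchmarks, while the new one picks up only WriteBenchmarks.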

gszadovszky · 2019-09-03

I agree. When this script was created, only ReadBenchmarks and WriteBenchmarks existed. However, it might make sense to give it a more descriptive name (e.g. runReadWrite.sh).

RyanSkraba · 2019-09-06

I combined the two existing run scripts into one with deliberately limited functionality: it can run predefined "suites" by keyword, or run all benchmarks.

What do you think?
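The idea of running predefined suites by keyword could be sketched roughly as follows. The keywords (`read`, `write`, `all`) and the helper names are hypothetical and are not taken from the merged run.sh; the command is only printed, not executed, so the sketch runs without the benchmark jar.

```shell
# Map a suite keyword to a benchmark selector. The keywords are hypothetical.
suite_pattern() {
  case "$1" in
    read)  echo 'org.apache.parquet.benchmarks.ReadBenchmarks' ;;
    write) echo 'org.apache.parquet.benchmarks.WriteBenchmarks' ;;
    all)   echo '.*' ;;
    *)     echo "Unknown suite: $1" >&2; return 1 ;;
  esac
}

# Build the command line for a suite; remaining arguments are passed through,
# as "$@" is in the original script.
build_command() {
  pattern=$(suite_pattern "$1") || return 1
  shift
  echo "java -jar target/parquet-benchmarks.jar $pattern $*"
}

build_command write -wi 5
```

Keeping the keyword lookup in one function means a new suite is one extra `case` branch, which fits the stated goal of not going overboard with bash.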

RyanSkraba · 2019-09-03T13:11:08Z

Thanks for taking a look! I'm going to make the run.sh script a bit more descriptive so that it runs "feature sets", giving a standard way to run all or some benchmarks. I don't want to go crazy with bash, but I think there's some value in having one entry point for running benchmarks in a standard way.

JMH isn't behaving as I would like for setting up and cleaning resources... I have a couple more fixes to make and I'll push.

RyanSkraba · 2019-09-06T09:39:48Z

For info, I made a change around general benchmarks setup/cleanup.

Before, the read files for ReadBenchmarks were generated and handled differently than the read files for PageChecksumReadBenchmarks. I tried to make them consistent across benchmarks using the following logic:

- When reading, ensure that any necessary file exists during @Setup, but don't do any cleanup during the benchmark run.
- When writing, ensure that the output file doesn't exist during @Setup.
- All cleanup needs to be done outside of the process running the benchmark.
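The three rules above can be illustrated with a small wrapper sketch. The file names, the scratch directory, and the `run_benchmark` stand-in are all hypothetical; in the PR itself the first two rules live in JMH @Setup methods, and only the last step belongs to the surrounding script.

```shell
# Hypothetical scratch locations; the real benchmarks use their own paths.
work_dir=$(mktemp -d)
read_file="$work_dir/read-input.dat"
write_file="$work_dir/write-output.dat"

# Read setup: create the input only if it is missing, so later runs reuse it.
setup_read() {
  [ -f "$read_file" ] || printf 'sample data\n' > "$read_file"
}

# Write setup: make sure no stale output exists before the run.
setup_write() {
  rm -f "$write_file"
}

# Stand-in for the benchmark process; it leaves both files behind.
run_benchmark() {
  cat "$read_file" > "$write_file"
}

setup_read
setup_write
run_benchmark
files_left=$(ls "$work_dir" | wc -l | tr -d ' ')
echo "files left by the benchmark: $files_left"

# Cleanup happens here, in the wrapper, not inside the benchmark.
rm -rf "$work_dir"
```

The point of the split is that the measured process never spends time deleting files; anything it leaves behind is removed by whatever launched it.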

As far as I can tell, this is the preferred way to use JMH when running "macro-benchmarks".

I didn't observe any actual functional differences after changing the setup strategy -- the total user time to run the PageChecksumReadBenchmarks went down, for example, but the measured operation time stayed the same.

gszadovszky

Thanks for working on this. I have some minor issues, but like it overall.

parquet-benchmarks/run.sh

parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/FilteringBenchmarks.java

parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/ReadBenchmarks.java

nandorKollar

LGTM, please address Gabor's comments.

RyanSkraba · 2019-09-24T09:14:54Z

Hello! I used git merge to bring master into the PR instead of rebasing. If I understand correctly, these commits will be squashed at merge time.

Thanks for the review, and my apologies for the late fixes!

Ryan Skraba added 2 commits August 29, 2019 18:26

PARQUET-1644: Clean up some benchmark code and docs.

eb38690

Fix typo in read benchmark.

e270c3a

nandorKollar reviewed Sep 3, 2019

Ryan Skraba added 3 commits September 5, 2019 18:09

Only set-up the required state for a benchmark.

98dd0b7

Do not clean up resources after a benchmark, leave them for the next run.

Add logger (same as parquet-cli).

dee9f88

Rewrite run.sh to run 'suites'.

4d8d2ec

Fix typo with extra arguments on clean.

7b02afd

gszadovszky requested changes Sep 9, 2019

nandorKollar approved these changes Sep 9, 2019

Ryan Skraba added 3 commits September 24, 2019 10:47

Merge remote-tracking branch 'origin/master' into PARQUET-1644-benchm…

d238aea

…ark-cleanup

Annotations on one line.

cd0073c

Auto-build and auto-clean around benchmark run.

d383c26

gszadovszky approved these changes Sep 24, 2019

gszadovszky merged commit 7c4d1ec into apache:master Sep 24, 2019

RyanSkraba deleted the PARQUET-1644-benchmark-cleanup branch September 24, 2019 12:29

RyanSkraba restored the PARQUET-1644-benchmark-cleanup branch September 24, 2019 12:30

gszadovszky merged 9 commits into apache:master from RyanSkraba:PARQUET-1644-benchmark-cleanup

3 participants
