Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8941] Implement simple DSL for load tests #10543

Merged

Conversation

pawelpasterz
Copy link
Contributor

@pawelpasterz pawelpasterz commented Jan 9, 2020

Implement simple DSL for load tests. Probably first iteration of this PR.

Changes:

  • config -- creates configuration map from given closure (example below)
  • templateConfig -- create reusable LoadTestConfig instance (no need to create whole map for every single test), specific settings can be changed for every configuration map made from template (title, input options etc)
  • fromTemplate -- creates configuration map from given template, specific settings can be changed
  • validation for every returned map is performed (ex Flink runner requires environmentConfig to be set, test can not be launched without runner etc)
def loadTestConfigurations = { datasetName ->
    final def template = templateConfig {
        test 'apache_beam.testing.load_tests.group_by_key_test:GroupByKeyTest.testGroupByKey'
        dataflow()
        pipelineOptions {
            python()
            project 'apache-beam-testing'
            tempLocation 'gs://temp-storage-for-perf-tests/loadtests'
            publishToBigQuery true
            metricsDataset datasetName
            numWorkers 5
            autoscalingAlgorithm "NONE"
            specificParameters([
                fanout: 1,
                iterations: 1
            ]}
        }
    }
    [
            fromTemplate(template) {
                title 'GroupByKey Python Load test 2GB of 10B records'
                pipelineOptions {
                    jobName 'load-tests-python-dataflow-batch-gbk-1-'
                    metricsTable 'python_dataflow_batch_gbk_1'
                    inputOptions {
                        numRecords 200000000
                        keySize 1
                        valueSize 9
                    }

                }
            },
            fromTemplate(template) {
                title 'GroupByKey Python Load test 2GB of 100kB records'
                pipelineOptions {
                    jobName 'load-tests-python-dataflow-batch-gbk-3-'
                    metricsTable 'python_dataflow_batch_gbk_3'
                    inputOptions {
                        numRecords 2000
                        keySize 100000
                        valueSize 900000
                    }

                }
            },
            fromTemplate(template) {
                title 'GroupByKey Python Load test fanout 4 times with 2GB 10-byte records total'
                pipelineOptions {
                    jobName 'load-tests-python-dataflow-batch-gbk-4-'
                    metricsTable 'python_dataflow_batch_gbk_4'
                    inputOptions {
                        numRecords 5000000
                        keySize 10
                        valueSize 90
                    }
                    specificParameters([
                        fanout: 5
                    ]}
                }
            }
    ]
}
//compatible with current, map based implementation
loadTestsBuilder.loadTests(delegate, CommonTestProperties.SDK.PYTHON, loadTestConfigurations(datasetName), "GBK", "batch")

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@pawelpasterz
Copy link
Contributor Author

@kamilwu @mwalenia
Let me know what do you think. Thanks!

@pawelpasterz pawelpasterz force-pushed the BEAM-8941-create-simple-dsl-for-load-test branch 2 times, most recently from 29804b0 to 57fd867 Compare January 10, 2020 07:30
Copy link
Contributor

@kamilwu kamilwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pawelpasterz for your contribution! I've left some comments. In general, I believe this is a huge step forward in making testing infrastructure more friendly and less error-prone. It's good that we can abort seed job in case of some simple errors: type mismatch and lack of a required parameter.

.test-infra/jenkins/LoadTestConfig.groovy Show resolved Hide resolved
.test-infra/jenkins/LoadTestConfig.groovy Outdated Show resolved Hide resolved
@pawelpasterz pawelpasterz force-pushed the BEAM-8941-create-simple-dsl-for-load-test branch 2 times, most recently from 58488af to 2561b48 Compare January 13, 2020 12:34
@kamilwu
Copy link
Contributor

kamilwu commented Jan 13, 2020

Please avoid squashing unreviewed changes and force-pushing the branch. Instead, create an additional "fixup" commit: https://beam.apache.org/contribute/#make-the-reviewers-job-easier

@pawelpasterz
Copy link
Contributor Author

pawelpasterz commented Jan 14, 2020

@kamilwu I've implemented changes you requested. I think all possible parameters for load tests are included now. Let me know what do you think.
And thank for for pointing me redundant squashing.

@kamilwu
Copy link
Contributor

kamilwu commented Jan 16, 2020

Thanks, LGTM. Let's ask @kkucharc if she can help with the merge.

@pawelpasterz pawelpasterz force-pushed the BEAM-8941-create-simple-dsl-for-load-test branch from c08a8d5 to 1c90664 Compare January 16, 2020 12:28
Copy link
Contributor

@kkucharc kkucharc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pawelpasterz for this refactor :)
I left my comments.
Besides of that I've got two additional remarks:

  • it would be great to see how SMOKE test would change with this refactor
  • I am missing adding '_PR' suffix in tests which are phase triggered (please compare with this ). Should it be here?


//runners
void dataflow() { setRunnerAndUpdatePipelineOptions(Runner.DATAFLOW)}
void spark() { setRunnerAndUpdatePipelineOptions(Runner.SPARK) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any load tests in Spark or Flink (non-portable)? I am asking because those two options can be added later. My main concern is here if we are sure if someone will actually run test with Spark/Flink option if it won't fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I looked at CommonTestProperties class and picked Runner values. Since I am not that much familiar with code yet, my implementation (the one above) can be wrong. Should I limit it to dataflow and portable? (at least for the time being)

.test-infra/jenkins/LoadTestConfig.groovy Outdated Show resolved Hide resolved
.test-infra/jenkins/LoadTestConfig.groovy Outdated Show resolved Hide resolved
.test-infra/jenkins/LoadTestConfig.groovy Outdated Show resolved Hide resolved
@pawelpasterz
Copy link
Contributor Author

Hey @kkucharc I've made some changes and implemented smoke test. Let me know what do you think.
Regarding your question related with _PR suffix -- this dsl provides only configuration map which is then passed to LoadTestBuilder (as it is now). So there is no need to add suffix here.

@kkucharc
Copy link
Contributor

Thanks @pawelpasterz Let's check how it behaves on Jenkins 😄

@kkucharc
Copy link
Contributor

run seed job

@kkucharc
Copy link
Contributor

Run Java Load Tests DSL Smoke

@pawelpasterz
Copy link
Contributor Author

Hey @kkucharc , I fixed typo, should work now :) Please rerun seed job and smoke test, thanks!

@kkucharc
Copy link
Contributor

run seed job

@kkucharc
Copy link
Contributor

Run Java Load Tests DSL Smoke

@kkucharc
Copy link
Contributor

run seed job

@kkucharc
Copy link
Contributor

Run Java Load Tests DSL Smoke

1 similar comment
@kkucharc
Copy link
Contributor

Run Java Load Tests DSL Smoke

@kkucharc
Copy link
Contributor

run seed job

@kkucharc
Copy link
Contributor

Run Java Load Tests DSL Smoke

@kkucharc
Copy link
Contributor

@pawelpasterz looks like everything is fine now 🎉 Can you squash commits?

@pawelpasterz pawelpasterz force-pushed the BEAM-8941-create-simple-dsl-for-load-test branch from bfcbf75 to 8a5bee2 Compare January 22, 2020 13:21
@pawelpasterz
Copy link
Contributor Author

@kkucharc it is done

@pawelpasterz pawelpasterz force-pushed the BEAM-8941-create-simple-dsl-for-load-test branch from 8a5bee2 to 56644b1 Compare January 27, 2020 08:42
[BEAM-8941] review changes

[BEAM-8941] add missing non-primitive copies

[BEAM-8941] implement map for non-required parameters

[BEAM-8941] update docs
@pawelpasterz pawelpasterz force-pushed the BEAM-8941-create-simple-dsl-for-load-test branch from 56644b1 to 801a6bb Compare January 27, 2020 12:50
@kkucharc kkucharc merged commit 1753b43 into apache:master Jan 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants