[BEAM-6908] Refactor Python performance test groovy file for easy configuration #8518

Merged · 4 commits · May 17, 2019
Changes from all commits
158 changes: 120 additions & 38 deletions .test-infra/jenkins/job_PerformanceTests_Python.groovy
@@ -18,46 +18,128 @@

import CommonJobProperties as commonJobProperties

// This job runs the Beam Python performance tests on PerfKit Benchmarker.
job('beam_PerformanceTests_Python'){
  // Set default Beam job properties.
  commonJobProperties.setTopLevelMainJobProperties(delegate)

  // Run job in postcommit every 6 hours, don't trigger every push.
  commonJobProperties.setAutoJob(
      delegate,
      'H */6 * * *')

  // Allows triggering this build against pull requests.
  commonJobProperties.enablePhraseTriggeringFromPullRequest(
      delegate,
      'Python SDK Performance Test',
      'Run Python Performance Test')

  def pipelineArgs = [
      project: 'apache-beam-testing',
      staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
      temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
      output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
  ]

class PerformanceTestConfigurations {
  // Name of the Jenkins job.
  String jobName
  // Description of the Jenkins job.
  String jobDescription
  // Phrase used to trigger this Jenkins job from a pull request.
  String jobTriggerPhrase
  // Frequency of the job build; defaults to every 6 hours.
  String buildSchedule = 'H */6 * * *'
  // A benchmark-defined flag, passed to the benchmark as "--benchmarkName".
  String benchmarkName = 'beam_integration_benchmark'
  // A benchmark-defined flag, passed to the benchmark as "--bigqueryTable".
  String resultTable
  // A benchmark-defined flag, passed to the benchmark as "--beam_it_class".
  String itClass
  // A benchmark-defined flag, passed to the benchmark as "--beam_it_module".
  // It's a Gradle project that defines the 'integrationTest' task. This task is executed by the
  // Perfkit Beam benchmark launcher and can be added by enablePythonPerformanceTest() defined in
  // BeamModulePlugin.
  String itModule
Contributor:
What is the meaning of this? I am confused by how this is actually used.

Contributor Author:
It's a benchmark flag defined here. Basically it's the path of the Gradle module.

Contributor:
How do we know how to set this correctly? It seems not intuitive...

Contributor Author:
We can rename it to something like itGradleModule if that helps.

Contributor:
Ok, but how do we know which Gradle module to select? I see that you used a different value for the Py2 and Py3 benchmarks; how did you pick those specific ones? How does a person writing a new benchmark decide how to fill this value?

Contributor Author:
I think people should know how Perfkit's beam_integration_benchmark works before configuring it in Jenkins. We probably need better documentation for that, and I'm also happy to sync with you offline for more details.

For your question: beam_integration_benchmark uses the Gradle task integrationTest, which can be enabled through enablePythonPerformanceTest. So beam_it_module is the Gradle project where integrationTest is located.

Contributor:
Per offline discussion, let's add a comment here:
Gradle project that defines 'runIntegrationTest' task. This task is executed by Perfkit Beam benchmark launcher.
This task can be added by enablePythonPerformanceTest() defined in BeamModulePlugin.

// A benchmark defined flag, will pass to benchmark as "--beam_python_sdk_location".
// It's the location of Python SDK distribution archive which is required for TestDataflowRunner.
String pythonSdkLocation = ''
// A benchmark defined flag, will pass to benchmark as "--beam_runner"
String runner = 'TestDataflowRunner'
// A benchmark defined flag, will pass to benchmark as "--beam_it_timeout"
Integer itTimeoutSec = 1200
// A benchmark defined flag, will pass to benchmark as "--beam_it_args"
Map extraPipelineArgs
}
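
For context on the itModule discussion above: the benchmark launcher runs the 'integrationTest' task of the referenced Gradle project, so that project has to opt in through BeamModulePlugin. A minimal sketch of what such a module's build.gradle might contain follows; the plugin application line and applyPythonNature() call are assumptions, only enablePythonPerformanceTest() is named in this PR.

// Hypothetical build.gradle of a module referenced by itModule.
apply plugin: org.apache.beam.gradle.BeamModulePlugin  // assumed plugin application
applyPythonNature()                                    // assumed helper from BeamModulePlugin
// Adds the 'integrationTest' task that the PerfKit beam_integration_benchmark launcher invokes.
enablePythonPerformanceTest()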

// Common pipeline args for Dataflow job.
def dataflowPipelineArgs = [
    project         : 'apache-beam-testing',
    staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
    temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
]


// Configurations of each Jenkins job.
def testConfigurations = [
    new PerformanceTestConfigurations(
        jobName           : 'beam_PerformanceTests_WordCountIT_Py27',
        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py27',
        jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
        resultTable       : 'beam_performance.wordcount_py27_pkb_results',
        itClass           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
        itModule          : 'sdks/python',
        extraPipelineArgs : dataflowPipelineArgs + [
            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
        ],
    ),
    new PerformanceTestConfigurations(
        jobName           : 'beam_PerformanceTests_WordCountIT_Py35',
        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py35',
        jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
        resultTable       : 'beam_performance.wordcount_py35_pkb_results',
        itClass           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
Contributor:
Is my understanding right that each performance test configuration can run only one IT? We might want to always include 'wordcount' in the configuration flags, or always omit it if the suite will later include other benchmarks.

Contributor Author:
No, it can run multiple ITs if you want. itClass is passed to the classname argument of this function and eventually to -Dtests of the Gradle invocation.

Contributor:
Ok, but we will not get one test reading per test; instead we will run both tests and get a total runtime, right?

Contributor Author:
Yes. This flag defines which tests run in a Gradle execution, and the benchmark measures the whole Gradle execution.

Contributor:
So in this case it's one test per configuration, so we may want to call it a "WordCount" benchmark instead of a generic 'Performance test'.
Also, where do we specify the input for the WC pipeline?

Contributor Author:
sg. done

        itModule          : 'sdks/python/test-suites/dataflow/py35',
        extraPipelineArgs : dataflowPipelineArgs + [
            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
        ],
    )
]
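
The review thread above notes that itClass can list more than one IT in a single run, with the benchmark timing the whole Gradle execution rather than reporting one reading per test. A hypothetical extra entry for testConfigurations along those lines might look like the sketch below; the comma-separated itClass format, the second test id, and the job/table names are assumptions for illustration, not part of this PR.

    new PerformanceTestConfigurations(
        jobName           : 'beam_PerformanceTests_ExamplesIT_Py27',      // hypothetical job name
        jobDescription    : 'Python SDK Performance Test - Run two ITs in Py27',
        jobTriggerPhrase  : 'Run Python27 Examples Performance Test',
        resultTable       : 'beam_performance.examples_py27_pkb_results', // hypothetical table
        // Assumed format: several nose-style test ids joined with a comma, all forwarded to
        // -Dtests of one Gradle invocation; the benchmark would report one total runtime.
        itClass           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it,' +
                            'apache_beam.examples.streaming_wordcount_it_test:StreamingWordCountIT.test_streaming_wordcount_it',
        itModule          : 'sdks/python',
        extraPipelineArgs : dataflowPipelineArgs + [
            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
        ],
    )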


for (testConfig in testConfigurations) {
  createPythonPerformanceTestJob(testConfig)
}


private void createPythonPerformanceTestJob(PerformanceTestConfigurations testConfig) {
  // This job runs the Beam Python performance tests on PerfKit Benchmarker.
  job(testConfig.jobName) {
    // Set default Beam job properties.
    commonJobProperties.setTopLevelMainJobProperties(delegate)

    // Run job in postcommit, don't trigger on every push.
    commonJobProperties.setAutoJob(
        delegate,
        testConfig.buildSchedule)

    // Allows triggering this build against pull requests.
    commonJobProperties.enablePhraseTriggeringFromPullRequest(
        delegate,
        testConfig.jobDescription,
        testConfig.jobTriggerPhrase)

    def argMap = [
        beam_sdk                : 'python',
        benchmarks              : testConfig.benchmarkName,
        bigquery_table          : testConfig.resultTable,
        beam_it_class           : testConfig.itClass,
        beam_it_module          : testConfig.itModule,
        // Python benchmarks don't need to prebuild the repo before running.
        beam_prebuilt           : 'true',
        beam_python_sdk_location: getSDKLocationFromModule(testConfig.pythonSdkLocation,
                                                           testConfig.itModule),
        beam_runner             : testConfig.runner,
        beam_it_timeout         : testConfig.itTimeoutSec.toString(),
        beam_it_args            : joinPipelineArgs(testConfig.extraPipelineArgs),
    ]

    commonJobProperties.buildPerformanceTest(delegate, argMap)
  }
}
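
For a concrete picture of what this wiring produces, the argMap for the Py27 configuration above works out to roughly the following, hand-traced from the code in this file and shown only for illustration.

// Resulting argMap for 'beam_PerformanceTests_WordCountIT_Py27' (illustrative, not generated output):
def py27ArgMap = [
    beam_sdk                : 'python',
    benchmarks              : 'beam_integration_benchmark',
    bigquery_table          : 'beam_performance.wordcount_py27_pkb_results',
    beam_it_class           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
    beam_it_module          : 'sdks/python',
    beam_prebuilt           : 'true',
    // getSDKLocationFromModule('', 'sdks/python') strips the module prefix and yields:
    beam_python_sdk_location: 'build/apache-beam.tar.gz',
    beam_runner             : 'TestDataflowRunner',
    beam_it_timeout         : '1200',
    // joinPipelineArgs(...) flattens the merged pipeline-args map into one comma-separated string:
    beam_it_args            : '--project=apache-beam-testing,' +
                              '--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it,' +
                              '--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it,' +
                              '--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
]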


// Helper function to join pipeline args from a map.
private static String joinPipelineArgs(Map pipelineArgs) {
  def pipelineArgList = []
  pipelineArgs.each({
    key, value -> pipelineArgList.add("--$key=$value")
  })
  return pipelineArgList.join(',')
}

// Removed lines from the old job body shown in this hunk:
def pipelineArgsJoined = pipelineArgList.join(',')

def argMap = [
    beam_sdk                 : 'python',
    benchmarks               : 'beam_integration_benchmark',
    bigquery_table           : 'beam_performance.wordcount_py_pkb_results',
    beam_it_class            : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
    beam_it_module           : 'sdks/python',
    beam_prebuilt            : 'true', // skip beam prebuild
    beam_python_sdk_location : 'build/apache-beam.tar.gz',
    beam_runner              : 'TestDataflowRunner',
    beam_it_timeout          : '1200',
    beam_it_args             : pipelineArgsJoined,
]

commonJobProperties.buildPerformanceTest(delegate, argMap)
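
For a quick check of joinPipelineArgs, the expected string follows directly from the loop in the helper above; the two-entry map here is made up for illustration.

// Flattens a pipeline-args map into the single comma-separated string expected by --beam_it_args.
def joined = joinPipelineArgs([project: 'apache-beam-testing', num_workers: '5'])
assert joined == '--project=apache-beam-testing,--num_workers=5'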


// Get the relative path of the SDK location based on itModule if an explicit location is not provided.
private static String getSDKLocationFromModule(String pythonSDKLocation, String itModule) {
  if (!pythonSDKLocation && itModule.startsWith("sdks/python")) {
    return (itModule.substring("sdks/python".length()) + "/build/apache-beam.tar.gz").substring(1)
  }
  return pythonSDKLocation
}
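
Worked examples for getSDKLocationFromModule, traced from the function above; the explicit 'dist/custom-beam.tar.gz' path is made up for illustration.

// When no explicit SDK location is given, the path is derived from the module:
assert getSDKLocationFromModule('', 'sdks/python') ==
    'build/apache-beam.tar.gz'
assert getSDKLocationFromModule('', 'sdks/python/test-suites/dataflow/py35') ==
    'test-suites/dataflow/py35/build/apache-beam.tar.gz'
// An explicitly provided location wins:
assert getSDKLocationFromModule('dist/custom-beam.tar.gz', 'sdks/python') ==
    'dist/custom-beam.tar.gz'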
63 changes: 0 additions & 63 deletions .test-infra/jenkins/job_Performancetests_Python35.groovy

This file was deleted.