[BEAM-6908] Refactor Python performance test groovy file for easy configuration #8518
Changes from all commits: bfb0bbe, 8cde119, 116ba25, f16c2fc
@@ -18,46 +18,128 @@
 import CommonJobProperties as commonJobProperties

-// This job runs the Beam Python performance tests on PerfKit Benchmarker.
-job('beam_PerformanceTests_Python'){
-  // Set default Beam job properties.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
-
-  // Run job in postcommit every 6 hours, don't trigger every push.
-  commonJobProperties.setAutoJob(
-      delegate,
-      'H */6 * * *')
-
-  // Allows triggering this build against pull requests.
-  commonJobProperties.enablePhraseTriggeringFromPullRequest(
-      delegate,
-      'Python SDK Performance Test',
-      'Run Python Performance Test')
-
-  def pipelineArgs = [
-      project: 'apache-beam-testing',
-      staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
-      temp_location: 'gs://temp-storage-for-end-to-end-tests/temp-it',
-      output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
-  ]
+class PerformanceTestConfigurations {
+  // Name of the Jenkins job.
+  String jobName
+  // Description of the Jenkins job.
+  String jobDescription
+  // Phrase that triggers this Jenkins job from a pull request.
+  String jobTriggerPhrase
+  // Cron schedule of the job build; defaults to every 6 hours.
+  String buildSchedule = 'H */6 * * *'
+  // A benchmark-defined flag, passed to the benchmark as "--benchmarkName".
+  String benchmarkName = 'beam_integration_benchmark'
+  // A benchmark-defined flag, passed to the benchmark as "--bigqueryTable".
+  String resultTable
+  // A benchmark-defined flag, passed to the benchmark as "--beam_it_class".
+  String itClass
+  // A benchmark-defined flag, passed to the benchmark as "--beam_it_module".
+  // It's a Gradle project that defines an 'integrationTest' task. This task is
+  // executed by the PerfKit Beam benchmark launcher and can be added by
+  // enablePythonPerformanceTest() defined in BeamModulePlugin.
+  String itModule
+  // A benchmark-defined flag, passed to the benchmark as "--beam_python_sdk_location".
+  // It's the location of the Python SDK distribution archive, which is required
+  // for TestDataflowRunner.
+  String pythonSdkLocation = ''
+  // A benchmark-defined flag, passed to the benchmark as "--beam_runner".
+  String runner = 'TestDataflowRunner'
+  // A benchmark-defined flag, passed to the benchmark as "--beam_it_timeout".
+  Integer itTimeoutSec = 1200
+  // A benchmark-defined flag, passed to the benchmark as "--beam_it_args".
+  Map extraPipelineArgs
+}
+
+// Common pipeline args for Dataflow jobs.
+def dataflowPipelineArgs = [
+    project         : 'apache-beam-testing',
+    staging_location: 'gs://temp-storage-for-end-to-end-tests/staging-it',
+    temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
+]
+// Configurations of each Jenkins job.
+def testConfigurations = [
+    new PerformanceTestConfigurations(
+        jobName           : 'beam_PerformanceTests_WordCountIT_Py27',
+        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py27',
+        jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
+        resultTable       : 'beam_performance.wordcount_py27_pkb_results',
+        itClass           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
+        itModule          : 'sdks/python',
+        extraPipelineArgs : dataflowPipelineArgs + [
+            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
+        ],
+    ),
+    new PerformanceTestConfigurations(
+        jobName           : 'beam_PerformanceTests_WordCountIT_Py35',
+        jobDescription    : 'Python SDK Performance Test - Run WordCountIT in Py35',
+        jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
+        resultTable       : 'beam_performance.wordcount_py35_pkb_results',
+        itClass           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
    [Review thread on the itClass line above]
    Reviewer: Is my understanding right that each performance test configuration can run only one IT? We might want to always include 'wordcount' in the configuration flags, or always omit it if the suite will later include other benchmarks.
    Author: No, it can run multiple ITs if you want.
    Reviewer: Ok, but we will not get one reading per test; instead we will run both tests and get a total runtime, right?
    Author: Yes. This flag defines which tests run in a Gradle execution, and the benchmark evaluates the whole Gradle execution.
    Reviewer: So in this case it's one test per configuration, so we may want to call it a "WordCount" benchmark instead of a generic "Performance test".
    Author: Sounds good, done.
    (A sketch of adding a further benchmark configuration appears after the job-creation loop below.)
+        itModule          : 'sdks/python/test-suites/dataflow/py35',
+        extraPipelineArgs : dataflowPipelineArgs + [
+            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
+        ],
+    )
+]
+
+for (testConfig in testConfigurations) {
+  createPythonPerformanceTestJob(testConfig)
+}
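Note: with this refactoring, adding a benchmark is a matter of appending one more entry to testConfigurations. Below is a minimal sketch of a hypothetical extra configuration; all the 'SomeOther' names are invented for illustration, and only itModule, extraPipelineArgs, and the class defaults come from this PR:

    // Hypothetical example, not part of this PR: a third entry for testConfigurations.
    new PerformanceTestConfigurations(
        jobName           : 'beam_PerformanceTests_SomeOtherIT_Py27',           // invented name
        jobDescription    : 'Python SDK Performance Test - Run SomeOtherIT in Py27',
        jobTriggerPhrase  : 'Run Python27 SomeOtherIT Performance Test',        // invented phrase
        resultTable       : 'beam_performance.some_other_py27_pkb_results',     // invented table
        itClass           : 'apache_beam.examples.some_other_it_test:SomeOtherIT.test_it',  // invented test
        itModule          : 'sdks/python',
        extraPipelineArgs : dataflowPipelineArgs + [
            output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output'
        ],
        // Unset fields keep the class defaults: benchmarkName, buildSchedule,
        // runner, itTimeoutSec, and pythonSdkLocation (derived from itModule).
    )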
+
+private void createPythonPerformanceTestJob(PerformanceTestConfigurations testConfig) {
+  // This job runs the Beam Python performance tests on PerfKit Benchmarker.
+  job(testConfig.jobName) {
+    // Set default Beam job properties.
+    commonJobProperties.setTopLevelMainJobProperties(delegate)
+
+    // Run job in postcommit, don't trigger on every push.
+    commonJobProperties.setAutoJob(
+        delegate,
+        testConfig.buildSchedule)
+
+    // Allows triggering this build against pull requests.
+    commonJobProperties.enablePhraseTriggeringFromPullRequest(
+        delegate,
+        testConfig.jobDescription,
+        testConfig.jobTriggerPhrase)
+
+    def argMap = [
+        beam_sdk                : 'python',
+        benchmarks              : testConfig.benchmarkName,
+        bigquery_table          : testConfig.resultTable,
+        beam_it_class           : testConfig.itClass,
+        beam_it_module          : testConfig.itModule,
+        beam_prebuilt           : 'true',  // Python benchmarks don't need to prebuild the repo before running.
+        beam_python_sdk_location: getSDKLocationFromModule(testConfig.pythonSdkLocation,
+                                                           testConfig.itModule),
+        beam_runner             : testConfig.runner,
+        beam_it_timeout         : testConfig.itTimeoutSec.toString(),
+        beam_it_args            : joinPipelineArgs(testConfig.extraPipelineArgs),
+    ]
+
+    commonJobProperties.buildPerformanceTest(delegate, argMap)
+  }
+}
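Note: tracing the Py35 configuration above through this function gives the following argMap. The values are hand-derived from this diff and shown only for illustration:

    def argMap = [
        beam_sdk                : 'python',
        benchmarks              : 'beam_integration_benchmark',   // class default
        bigquery_table          : 'beam_performance.wordcount_py35_pkb_results',
        beam_it_class           : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
        beam_it_module          : 'sdks/python/test-suites/dataflow/py35',
        beam_prebuilt           : 'true',
        // getSDKLocationFromModule('', 'sdks/python/test-suites/dataflow/py35'):
        beam_python_sdk_location: 'test-suites/dataflow/py35/build/apache-beam.tar.gz',
        beam_runner             : 'TestDataflowRunner',            // class default
        beam_it_timeout         : '1200',                          // class default
        // joinPipelineArgs(dataflowPipelineArgs + [output: ...]):
        beam_it_args            : '--project=apache-beam-testing,' +
                                  '--staging_location=gs://temp-storage-for-end-to-end-tests/staging-it,' +
                                  '--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it,' +
                                  '--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
    ]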
+
 // Helper function to join pipeline args from a map.
+private static String joinPipelineArgs(Map pipelineArgs) {
   def pipelineArgList = []
   pipelineArgs.each({
     key, value -> pipelineArgList.add("--$key=$value")
   })
-  def pipelineArgsJoined = pipelineArgList.join(',')
-
-  def argMap = [
-      beam_sdk                 : 'python',
-      benchmarks               : 'beam_integration_benchmark',
-      bigquery_table           : 'beam_performance.wordcount_py_pkb_results',
-      beam_it_class            : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-      beam_it_module           : 'sdks/python',
-      beam_prebuilt            : 'true', // skip beam prebuild
-      beam_python_sdk_location : 'build/apache-beam.tar.gz',
-      beam_runner              : 'TestDataflowRunner',
-      beam_it_timeout          : '1200',
-      beam_it_args             : pipelineArgsJoined,
-  ]
-
-  commonJobProperties.buildPerformanceTest(delegate, argMap)
+  return pipelineArgList.join(',')
 }
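Note: a quick, hand-derived check of the helper's behavior; num_workers is an invented example key, not one used in this PR:

    def args = joinPipelineArgs([project: 'apache-beam-testing', num_workers: 5])
    // Groovy map literals preserve insertion order, so this yields:
    assert args == '--project=apache-beam-testing,--num_workers=5'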
+
+// Get the relative path of the SDK location based on itModule if a location is not provided.
+private static String getSDKLocationFromModule(String pythonSDKLocation, String itModule) {
+  if (!pythonSDKLocation && itModule.startsWith("sdks/python")) {
+    return (itModule.substring("sdks/python".length()) + "/build/apache-beam.tar.gz").substring(1)
+  }
+  return pythonSDKLocation
+}
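Note: tracing the string arithmetic in this helper for the two modules used above (hand-derived; the explicit 'dist/my-sdk.tar.gz' path is invented):

    // Empty location: derive the tarball path from the module, dropping the leading 'sdks/python/'.
    assert getSDKLocationFromModule('', 'sdks/python') == 'build/apache-beam.tar.gz'
    assert getSDKLocationFromModule('', 'sdks/python/test-suites/dataflow/py35') ==
        'test-suites/dataflow/py35/build/apache-beam.tar.gz'
    // A non-empty location is passed through unchanged.
    assert getSDKLocationFromModule('dist/my-sdk.tar.gz', 'sdks/python') == 'dist/my-sdk.tar.gz'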
This file was deleted.
[Review thread on beam_it_module]
Reviewer: What is the meaning of this? I am confused by how this is actually used.
Author: It's a benchmark flag defined in PerfKit. Basically it's the path of the Gradle module.
Reviewer: How do we know how to set this correctly? It seems unintuitive...
Author: We can rename it to something like itGradleModule if that helps.
Reviewer: Ok, but how do we know which Gradle module to select? I see that you used different values for the Py2 and Py3 benchmarks; how did you pick those specific ones? How does a person writing a new benchmark decide how to fill in this value?
Author: I think people should know how the PerfKit beam_integration_benchmark works before configuring it in Jenkins. We probably need better documentation for that, and I'm happy to sync with you offline for more details. For your question: beam_integration_benchmark uses the Gradle task integrationTest, which can be enabled through enablePythonPerformanceTest(). So beam_it_module is the Gradle project where integrationTest is located.
Reviewer: Per offline discussion, let's add a comment here: "Gradle project that defines the 'runIntegrationTest' task. This task is executed by the PerfKit Beam benchmark launcher. This task can be added by enablePythonPerformanceTest() defined in BeamModulePlugin."
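Note: to make the conclusion of this thread concrete, here is a minimal sketch of how a Gradle module would expose the task the benchmark launcher invokes. The plugin application line and call site are assumptions based only on the names discussed above; BeamModulePlugin is the authority on the real wiring:

    // build.gradle of a candidate itModule, e.g. sdks/python/test-suites/dataflow/py35.
    // Hypothetical sketch only: wiring assumed from the names in this thread.
    apply plugin: org.apache.beam.gradle.BeamModulePlugin

    // Adds the 'integrationTest' task that the PerfKit beam_integration_benchmark
    // launcher executes when --beam_it_module points at this project.
    enablePythonPerformanceTest()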