New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-10675] Add Python GBK Load Tests for streaming on Dataflow #12612
Conversation
Run Seed Job |
Run Load Tests Python GBK Dataflow Batch |
Run Load Tests Python GBK Dataflow Streaming |
Run Load Tests Python GBK reiterate Dataflow Batch |
Run Load Tests Python GBK reiterate Dataflow Streaming |
e9c4510
to
f67712b
Compare
Run Seed Job |
Run Load Tests Python GBK Dataflow Batch |
Run Load Tests Python GBK Dataflow Streaming |
Run Load Tests Python GBK reiterate Dataflow Batch |
Run Load Tests Python GBK reiterate Dataflow Streaming |
Run Python PreCommit |
R: @tysonjh Could you take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Some minor comments.
] | ||
], | ||
[ | ||
title : 'GroupByKey Python Load test: 2GB of 100B records', | ||
test : 'apache_beam.testing.load_tests.group_by_key_test', | ||
runner : CommonTestProperties.Runner.DATAFLOW, | ||
pipelineOptions: [ | ||
job_name : 'load-tests-python-dataflow-batch-gbk-2-' + now, | ||
job_name : 'load-tests-python-dataflow-${mode}-gbk-2-' + now, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
job_name : 'load-tests-python-dataflow-${mode}-gbk-2-' + now, | |
job_name : "load-tests-python-dataflow-${mode}-gbk-2-${now}", |
def addStreamingOptions(test) { | ||
test.pipelineOptions << [streaming: null, experiments: 'use_runner_v2', | ||
enable_streaming_engine: null ] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment to explain what these settings are? It's unexpected to see that 'streaming: null' or 'enable_streaming_engine: null' somehow enables streaming, or why 'use_runner_v2' is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed --enable_streaming_engine
, since it is now being added automatically when using use_runner_v2
: #12585
def batchLoadTestJob = { scope, triggeringContext -> | ||
scope.description('Runs Python GBK reiterate load tests on Dataflow runner in batch mode') | ||
commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240) | ||
def addStreamingOptions(test) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, please add a comment.
batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) | ||
} | ||
CronJobBuilder.cronJob('beam_LoadTests_Python_GBK_reiterate_Dataflow_Batch', | ||
'H 14 * * *', this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the methodology for picking the time to trigger these? Is it documented anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, each test suite (GBK, ParDo, IO tests, etc.) should has its own, unique time in order no to flood Jenkins with many tests that are triggered at the same time. When adding a new test suite, a contributor has to take a look at what time slots are already occupied and avoid using them.
I think this is not documented. I'll add some information here: https://cwiki.apache.org/confluence/display/BEAM/Contribution+Testing+Guide#ContributionTestingGuide
@@ -22,119 +22,138 @@ import InfluxDBCredentialsHelper | |||
|
|||
def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC')) | |||
|
|||
def loadTestConfigurations = { datasetName -> | |||
// TODO(BEAM-10774): Skipping some cases because they are too slow. | |||
def STREAMING_TESTS_TO_SKIP = [1, 2, 4, 5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kkucharc made a good point here in PR#12435 about using indices for ignoring tests. I'm more inclined towards the approach @kkucharc is taking by excluding using the job_name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, an argument that @kkucharc made in PR#12435 is convincing. I'll exclude those cases by using the job_name.
Codecov Report
@@ Coverage Diff @@
## master #12612 +/- ##
==========================================
- Coverage 34.47% 34.33% -0.14%
==========================================
Files 684 692 +8
Lines 81483 81958 +475
Branches 9180 9264 +84
==========================================
+ Hits 28090 28144 +54
- Misses 52965 53391 +426
+ Partials 428 423 -5
Continue to review full report at Codecov.
|
Run Seed Job |
Run Load Tests Python GBK Dataflow Streaming |
Run Load Tests Python GBK Dataflow Batch |
Run Load Tests Python GBK reiterate Dataflow Streaming |
Run Load Tests Python GBK reiterate Dataflow Batch |
@tysonjh Thanks! Could you verify my fixes before I merge? |
…properly returned from Ungroup transform
b1c9673
to
e6bbfe2
Compare
enable_streaming_engine is now being added automatically when running with 'use_runner_v2'
e6bbfe2
to
11a1a3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Thanks! |
Add Python GBK Load Tests for streaming on Dataflow. Cases 1, 2, 4 and 5 are temporarily excluded because their execution time is too long.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.