Skip to content

Conversation

@kamilwu
Copy link
Contributor

@kamilwu kamilwu commented Jun 11, 2019


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 11, 2019

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 12, 2019

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 12, 2019

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 12, 2019

Run Python Load Tests Combine Dataflow Batch

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 12, 2019

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 12, 2019

Run Python Load Tests Combine Dataflow Batch

@kamilwu kamilwu marked this pull request as ready for review June 13, 2019 07:28
@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 13, 2019

R: @kkucharc

Copy link
Contributor

@kkucharc kkucharc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @kamilwu . I left some comments. Adding cc: @pabloem


self.top_count = self.pipeline.get_option('top_count')
if self.top_count is None:
self.fail('You should set \"--topCount\" option to use TOP combiners')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have typo here topCount -> top_count

self.top_count = self.pipeline.get_option('top_count')
if self.top_count is None:
self.fail('You should set \"--topCount\" option to use TOP combiners')
self.top_count = int(self.top_count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about changing this and above construction to just try int() and then catch Type and Value exception? (for cases it's None or not a number value)

input_options : '\'{"num_records": 20000000,' +
'"key_size": 10,' +
'"value_size": 90,' +
'"bundle_size_distribution_type": "const",' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundle_size_distribution_type, bundle_size_distribution_param and force_initial_num_bundles have default values -> we don't have to specify them here. WDYT about removing it from here?

publish_to_big_query: true,
metrics_dataset : 'load_test',
metrics_table : 'python_dataflow_batch_combine_5',
input_options : '\'{"num_records": 20000000,' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is also 20000000/8 = 2500000

publish_to_big_query: true,
metrics_dataset : 'load_test',
metrics_table : 'python_dataflow_batch_combine_4',
input_options : '\'{"num_records": 20000000,' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In proposal there is 20000000/4 = 5000000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's right. I didn't notice.

'"bundle_size_distribution_type": "const",' +
'"bundle_size_distribution_param": 1,' +
'"force_initial_num_bundles": 1}\'',
max_num_workers : 5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that max_num_workers and num_workers change autoscaling_algorithm to NONE, but maybe it would be good to be sure here and specify autoscaling_algorithm: "NONE" just in case, WDYT?

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 17, 2019

@kkucharc Thanks for review, I've just pushed my fixes.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 17, 2019

Run Python PreCommit

@kamilwu
Copy link
Contributor Author

kamilwu commented Jun 24, 2019

R: @iemejia Could you take a look?

@iemejia
Copy link
Member

iemejia commented Jul 1, 2019

@kamilwu sorry for the delayed review, it looks good, but can you please get the execution of the tests running. I will merge just afterwards.

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run seed job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run Python Load Tests Combine Dataflow Batch

@kamilwu kamilwu force-pushed the combine-jenkins branch from ae21bc2 to 439656b Compare July 2, 2019 07:48
@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run Seed Job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run Portable_Python PreCommit

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run Python Load Tests Combine Dataflow Batch

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run Python PreCommit

@kamilwu kamilwu force-pushed the combine-jenkins branch from 439656b to 884d1f2 Compare July 2, 2019 13:43
@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run seed job

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

Run Python Load Tests Combine Dataflow Batch

@kamilwu
Copy link
Contributor Author

kamilwu commented Jul 2, 2019

@iemejia It's fine, thanks for your review. PR is now ready to be merged.

Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iemejia iemejia merged commit 7f83599 into apache:master Jul 2, 2019
@iemejia
Copy link
Member

iemejia commented Jul 2, 2019

Thanks @kamilwu it was quite hard to make jenkins happy but finally.

@kamilwu kamilwu deleted the combine-jenkins branch July 2, 2019 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants