@stankiewicz stankiewicz commented Jan 28, 2020

Splitting a CassandraIO source into multiple sources is fast because it uses a single connection pool to the Cassandra cluster. After the split, however, dataflow.worker.WorkerCustomSources calls CassandraSource.getEstimatedSizeBytes for each resulting source, and each call sets up and tears down a connection to the Cassandra cluster just to compute the same table size. This change caches the size internally to avoid those additional queries.
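To illustrate the idea, here is a minimal sketch of size caching across splits. It is not the code from this PR: `SketchSource`, `sizeQuery`, and `cachedSizeBytes` are hypothetical names, and a `LongSupplier` stands in for the expensive Cassandra size query. The parent source computes the estimate once while splitting and hands the value to each child, so later size calls on the children do not query again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

public class CachedSizeSketch {

  /** Hypothetical stand-in for a splittable source; not the real CassandraSource. */
  static final class SketchSource {
    private final LongSupplier sizeQuery; // assumed expensive: opens and closes a connection
    private final Long cachedSizeBytes;   // null means the size has not been computed yet

    SketchSource(LongSupplier sizeQuery, Long cachedSizeBytes) {
      this.sizeQuery = sizeQuery;
      this.cachedSizeBytes = cachedSizeBytes;
    }

    /** Returns the cached estimate when present, otherwise runs the expensive query. */
    long getEstimatedSizeBytes() {
      return cachedSizeBytes != null ? cachedSizeBytes : sizeQuery.getAsLong();
    }

    /** Computes the table size once and passes it to every child source. */
    List<SketchSource> split(int numSplits) {
      long tableSizeBytes = getEstimatedSizeBytes(); // at most one query here
      List<SketchSource> splits = new ArrayList<>();
      for (int i = 0; i < numSplits; i++) {
        splits.add(new SketchSource(sizeQuery, tableSizeBytes));
      }
      return splits;
    }
  }

  public static void main(String[] args) {
    AtomicInteger queries = new AtomicInteger();
    LongSupplier expensiveQuery = () -> {
      queries.incrementAndGet(); // count how often the "cluster" is asked
      return 1_000_000L;
    };

    SketchSource root = new SketchSource(expensiveQuery, null);
    for (SketchSource child : root.split(4)) {
      child.getEstimatedSizeBytes(); // served from the cached value
    }
    System.out.println("size queries issued: " + queries.get()); // prints 1
  }
}
```

In this sketch every child reports the full table estimate; how the estimate is divided among splits is a separate concern.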



@stankiewicz stankiewicz requested a review from boyuanzz January 28, 2020 15:24
@boyuanzz

Retest it please

@boyuanzz

Retest it please

@boyuanzz

Retest it please

@boyuanzz boyuanzz left a comment

Thanks for taking care of this! Could you please also add unit tests to CassandraIOTest?

@boyuanzz

Retest it please

@stankiewicz stankiewicz left a comment

I've added tests and comments, and fixed the sizing logic (the sum of the split sizes now roughly equals the size of the original source).
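As a rough illustration of that sizing property, the sketch below divides a total size estimate across splits so that the parts add up to the original. The even division is an assumption made for illustration only, and `apportion` is a hypothetical helper; the formula actually used in this PR is not shown in this conversation.

```java
public class SplitSizeSketch {

  /**
   * Hypothetical helper: divides totalBytes across numSplits so the parts sum to totalBytes.
   * Assumes an even division; a real source could weight each split differently.
   */
  static long[] apportion(long totalBytes, int numSplits) {
    if (numSplits <= 0) {
      throw new IllegalArgumentException("numSplits must be positive");
    }
    long[] sizes = new long[numSplits];
    long base = totalBytes / numSplits;
    long remainder = totalBytes % numSplits;
    for (int i = 0; i < numSplits; i++) {
      // Spread the remainder over the first few splits so nothing is lost to rounding.
      sizes[i] = base + (i < remainder ? 1 : 0);
    }
    return sizes;
  }

  public static void main(String[] args) {
    long sum = 0;
    for (long size : apportion(1000L, 3)) {
      sum += size;
    }
    System.out.println("sum of split sizes: " + sum); // prints 1000
  }
}
```

If each split instead reported the full table size, a runner adding up the estimates would overcount by a factor of the number of splits.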

@boyuanzz

Retest it please

@stankiewicz

retest it please

@boyuanzz

retest it please

@boyuanzz

retest this please

@boyuanzz

retest this please

@boyuanzz

Java_Examples_Dataflow is broken, probably because of the Dataflow service.
Please fix the Spotless failure.

@stankiewicz

retest this please

@boyuanzz

retest this please

@boyuanzz

Run Spotless PreCommit

@boyuanzz

Run Spotless PreCommit

@boyuanzz

Run Spotless PreCommit

@boyuanzz

All tests passed. I'll go ahead and merge this PR.
Thanks for your contribution!

@boyuanzz boyuanzz merged commit 94ca187 into apache:master Jan 31, 2020