[BEAM-7577] Allow ValueProviders in Datastore Query filters by EDjur · Pull Request #8950 · apache/beam

EDjur · 2019-06-26T12:35:50Z

I have a use case where I need to supply Datastore Query filters at runtime. This PR allows ValueProviders to be used when constructing a Datastore Query, and converts them to the str values the client expects in _to_client_query() when the pipeline runs.

Related Jira ticket: https://issues.apache.org/jira/browse/BEAM-7577

I tested this by building Beam locally and passing that build via the sdk_location flag when running a Dataflow job on GCP.
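As a rough illustration of the idea described above (not the code in this PR), the sketch below resolves filter tuples whose elements may be deferred values. `StubValueProvider` and `resolve_filters` are hypothetical names used only for this example; the stub models nothing but a `get()` method and is not Beam's ValueProvider class.

```python
class StubValueProvider:
    """Hypothetical stand-in for a deferred value; only get() is modelled."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


def resolve_filters(filters):
    """Return filters as plain (property, operator, value) tuples.

    Each element may be a plain value or a StubValueProvider; providers are
    unwrapped with get(), plain values are passed through unchanged.
    """
    resolved = []
    for filter_tuple in filters:
        if len(filter_tuple) != 3:
            raise TypeError(
                'each filter needs 3 elements, got %r' % (filter_tuple,))
        resolved.append(tuple(
            item.get() if isinstance(item, StubValueProvider) else item
            for item in filter_tuple))
    return resolved


filters = [
    (StubValueProvider('status'), StubValueProvider('='),
     StubValueProvider('active')),
    ('region', '=', 'eu'),
]
resolved = resolve_filters(filters)
print(resolved)
```

The point of deferring the unwrapping to a single helper is that the rest of the query code can keep working with plain tuples.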

EDjur · 2019-06-26T15:36:16Z

R: @aaltay Appreciate any feedback and comments!

aaltay · 2019-06-26T18:53:40Z

R: @udim for reviewing related to Datastore
R: @azurezyq for ValueProvider related review.

Thank you @EDjur for sending this.

EDjur · 2019-06-27T13:45:28Z

Thanks for the feedback! Will push an update once I've run tests locally.

Edit: Looks like my local pylint didn't catch the same issues as the one in Jenkins. Should be fixed now.

EDjur · 2019-07-01T13:02:38Z

@udim Made some edits based on your comments, let me know what you think!

EDjur · 2019-07-03T14:03:56Z

I've noticed that a small change might be needed in datastoreio.py, or alternatively in query_splitter.py, in order to use this together with ReadFromDatastore. Specifically, the validate_split function in query_splitter.py causes issues when value providers are used as filters:

  for filter in query.filters:
    if filter[1] in ['<', '<=', '>', '>=']:
      raise SplitNotPossibleError('Query cannot have any inequality filters.')

Since this function runs before the query is converted to a client query via the _to_client_query method, filter here is of type ValueProvider, which does not support indexing, so filter[1] raises a TypeError.
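A minimal sketch of that failure mode, assuming a filter that is wrapped whole in a provider object. `StubValueProvider` and `has_inequality` are hypothetical names for this example only; the stub simply lacks `__getitem__`, which is what makes indexing fail.

```python
class StubValueProvider:
    """Hypothetical stand-in for a deferred value; it defines no __getitem__."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


def has_inequality(filters):
    """Mirror the shape of the check quoted above: index each filter at [1]."""
    return any(f[1] in ['<', '<=', '>', '>='] for f in filters)


plain = [('age', '>', 18)]
deferred = [StubValueProvider(('age', '>', 18))]

# Plain tuples can be indexed, so the check works.
print(has_inequality(plain))

# The wrapped filter cannot be indexed, so the same check raises TypeError.
try:
    has_inequality(deferred)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```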

I'm thinking that we should perhaps evaluate the values of our ValueProvider-filter before calculating the split. But this means we cannot evaluate in _to_client_query, which I thought was a neat solution that wasn't particularly hacky.

For context: the expand method in ReadFromDatastore calls SplitQuery before Read, and Read is what causes the _to_client_query method to be called.

The question is basically where the best place to evaluate these filters is.

@udim What's your take on this?

Edit: Will explore this again after fixing the other issue first.

udim · 2019-07-24T01:29:54Z

run python 2 postcommit

udim · 2019-07-24T01:30:09Z

run python 3.5 postcommit

udim · 2019-07-24T21:18:13Z

> I've noticed that a small change might be needed in datastoreio.py, or alternatively in query_splitter.py, in order to use this together with ReadFromDatastore. Specifically, the validate_split function in query_splitter.py causes issues when value providers are used as filters:
>
>   for filter in query.filters:
>     if filter[1] in ['<', '<=', '>', '>=']:
>       raise SplitNotPossibleError('Query cannot have any inequality filters.')
>
> Since this function runs before the query is converted to a client query via the _to_client_query method, filter here is of type ValueProvider, which does not support indexing, so filter[1] raises a TypeError.
>
> I'm thinking that we should perhaps evaluate the values of our ValueProvider-filter before calculating the split. But this means we cannot evaluate in _to_client_query, which I thought was a neat solution that wasn't particularly hacky.
>
> For context: the expand method in ReadFromDatastore calls SplitQuery before Read, and Read is what causes the _to_client_query method to be called.
>
> The question is basically where the best place to evaluate these filters is.
>
> @udim What's your take on this?
>
> Edit: Will explore this again after fixing the other issue first.

I would put

self.filters = self._set_runtime_filters(filters)

in Query.__init__. I believe that solves both issues.

udim

I accidentally approved this thinking the code was reverted to using Iterable[Tuple[ValueProvider, ValueProvider, ValueProvider]].

EDjur · 2019-07-25T11:17:32Z

> I accidentally approved this thinking the code was reverted to using Iterable[Tuple[ValueProvider, ValueProvider, ValueProvider]].

Yep noticed, I will revert the changes now :)

EDjur · 2019-07-25T12:51:11Z

> I would put
>
> self.filters = self._set_runtime_filters(filters)
>
> in Query.__init__. I believe that solves both issues.

Will this not raise an error due to calling .get() on a ValueProvider from a non-runtime context, since the Query is instantiated before the pipeline executes?

One solution could be to just check in the query splitter if it is a ValueProvider and then execute .get() on it if it is.
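A sketch of both points, under stated assumptions: `RuntimeOnlyProvider`, `operator_of`, and this `validate_split` are hypothetical stand-ins written for this example, not Beam's classes or the code that was merged. The stub's `get()` fails until a value has been supplied, which models calling `.get()` before the pipeline runs; a plain `ValueError` stands in for SplitNotPossibleError.

```python
class RuntimeOnlyProvider:
    """Hypothetical stand-in: get() fails until a runtime value is supplied."""

    def __init__(self):
        self._value = None
        self._available = False

    def set_runtime_value(self, value):
        self._value = value
        self._available = True

    def get(self):
        if not self._available:
            raise RuntimeError('get() not called from a runtime context')
        return self._value


INEQUALITY_OPERATORS = ('<', '<=', '>', '>=')


def operator_of(filter_tuple):
    """Return the operator, unwrapping it only if it is a provider."""
    op = filter_tuple[1]
    return op.get() if isinstance(op, RuntimeOnlyProvider) else op


def validate_split(filters):
    """Reject inequality filters; ValueError stands in for the real error."""
    for filter_tuple in filters:
        if operator_of(filter_tuple) in INEQUALITY_OPERATORS:
            raise ValueError('Query cannot have any inequality filters.')


op = RuntimeOnlyProvider()
filters = [('age', op, 18)]

# Before a runtime value exists, unwrapping fails (the concern raised above).
try:
    validate_split(filters)
    construction_error = None
except RuntimeError as exc:
    construction_error = str(exc)
print(construction_error)

# Once the value is available, the splitter check can inspect the operator.
op.set_runtime_value('=')
validate_split(filters)
```

The check in the splitter only helps if the splitter itself runs where values are available; in this sketch that is modelled by calling `set_runtime_value` first.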

EDjur · 2019-07-27T12:00:55Z

Run Python PreCommit

udim · 2019-08-02T00:34:31Z

> Will this not raise an error due to calling .get() on a ValueProvider from a non-runtime context, since the Query is instantiated before the pipeline executes?

Yes, you're right. (I haven't written code that uses ValueProviders and it's trickier than it seems.)

udim · 2019-08-02T00:35:23Z

run python 3.5 postcommit

EDjur · 2019-08-02T07:26:17Z

> (I haven't written code that uses ValueProviders and it's trickier than it seems.)

My thoughts exactly 😬

Cheers for the code duplication fix!

Elias added 4 commits June 26, 2019 14:29

[BEAM-7577] Allow ValueProviders in Datastore Query filters

2e9ab08

[BEAM-7577] Fixed types_test logic

993fbb9

[BEAM-7577] Removed unnecessary test case

f5aa919

[BEAM-7577] Fixed Py27 linting issues

32256e8

udim requested changes Jun 27, 2019

sdks/python/apache_beam/io/gcp/datastore/v1new/types.py

sdks/python/apache_beam/io/gcp/datastore/v1new/types.py

sdks/python/apache_beam/io/gcp/datastore/v1new/types_test.py

Elias added 3 commits June 27, 2019 16:13

[BEAM-7577] Simplified ValueProvider value extraction. Added documentation. Fixed tests

463f7f9

[BEAM-7577] Fixed linting issues

31cce7d

[BEAM-7577] Changed typoed print function to logging

6dc5fe8

udim approved these changes Jul 24, 2019

udim requested changes Jul 24, 2019

Elias added 2 commits July 26, 2019 14:06

Reverted changes. Added ValueProvider support for query splitter check

a475a7d

Fixed query splitter logic

539a4eb

reduced minor code duplication

749a8cc

udim approved these changes Aug 2, 2019

udim merged commit 7a0bc8d into apache:master Aug 2, 2019
