[BEAM-12790] Add projection pushdown to JDBC SchemaIO. #15373

Closed

@ibzib ibzib commented Aug 23, 2021

ibzib commented Aug 24, 2021

R: @apilloud

if (i > 0) {
  query.append(", ");
}
query.append(fieldsAccessed.get(i).getFieldName());
Member

What happens when the field name is FROM (or another reserved keyword)?

}
}
query.append(" FROM ");
query.append(getLocation());
Member

This is an existing bug, but what happens when the table name is WHERE?
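To make the two reserved-keyword questions above concrete, here is a minimal sketch that mirrors the string-appending approach from the diff. The class and method names (`UnquotedQueryDemo`, `buildSelect`) are hypothetical and are not part of the PR; the point is only that names are inserted verbatim.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Beam.
final class UnquotedQueryDemo {
  // Appends field and table names verbatim, as the code under review does.
  public static String buildSelect(List<String> fieldNames, String location) {
    StringBuilder query = new StringBuilder("SELECT ");
    for (int i = 0; i < fieldNames.size(); i++) {
      if (i > 0) {
        query.append(", ");
      }
      query.append(fieldNames.get(i));
    }
    query.append(" FROM ");
    query.append(location);
    return query.toString();
  }

  public static void main(String[] args) {
    // A column named FROM and a table named WHERE.
    // prints: SELECT id, FROM FROM WHERE
    System.out.println(buildSelect(Arrays.asList("id", "FROM"), "WHERE"));
  }
}
```

A database would likely reject the resulting statement as a syntax error, since nothing marks `FROM` and `WHERE` as identifiers rather than keywords.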

return String.format("SELECT * FROM %s", getLocation());
}

// Build query from field access descriptor.
Member

JdbcUtil.generateStatement does something similar for INSERT INTO. There is also a generateWriteStatement method below in this file. They all seem to have the same bugs. It would be nice if there was some code reuse across these.
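One way the code reuse suggested here could look is a single helper that builds both SELECT and INSERT statements from a shared column-list routine, with the identifier quoting passed in as a function. This is only a sketch: `SqlStatements` and its methods are hypothetical names, and the exact statement text is an assumption that does not necessarily match what `JdbcUtil.generateStatement` or `generateWriteStatement` produce.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical shared helper, not part of Beam.
final class SqlStatements {
  // Quotes every column with the supplied quoter and joins them with commas.
  public static String columnList(List<String> columns, UnaryOperator<String> quoter) {
    return columns.stream().map(quoter).collect(Collectors.joining(", "));
  }

  // Builds a SELECT; an empty column list falls back to "*".
  public static String select(String table, List<String> columns, UnaryOperator<String> quoter) {
    String projection = columns.isEmpty() ? "*" : columnList(columns, quoter);
    return "SELECT " + projection + " FROM " + quoter.apply(table);
  }

  // Builds an INSERT with one "?" placeholder per column.
  public static String insert(String table, List<String> columns, UnaryOperator<String> quoter) {
    String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
    return "INSERT INTO " + quoter.apply(table)
        + " (" + columnList(columns, quoter) + ") VALUES (" + placeholders + ")";
  }
}
```

Because quoting is injected, a fix for reserved-keyword names would only need to be made once, in the quoter, rather than in each statement builder.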

ibzib commented Sep 3, 2021

@apilloud Escaping strings is difficult, since JDBC has to work with various incompatible SQL implementations. The answers to this Stack Overflow question suggest that we can learn the database's quoting syntax via the JDBC Connection. The good news is that we can access the Connection through the PreparedStatement in JdbcIO's StatementPreparator. The bad news is that there's no way to change the query at that point, aside from setting parameters, which cannot be used for column and table names.

try (PreparedStatement statement =
    connection.prepareStatement(
        query.get(), ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
  statement.setFetchSize(fetchSize);
  parameterSetter.setParameters(context.element(), statement);
  try (ResultSet resultSet = statement.executeQuery()) {

So, to get this to work, I think we would have to add a method to JdbcIO that accepts a functional interface which takes a Connection and returns a fresh PreparedStatement.
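A rough sketch of what such a functional interface might look like follows. Everything named here (`StatementFactory`, `QuotedStatements`) is hypothetical and does not exist in JdbcIO. The only real JDBC calls used are `Connection.getMetaData()`, `DatabaseMetaData.getIdentifierQuoteString()`, and `Connection.prepareStatement(String, int, int)`. Doubling an embedded quote character to escape it is an assumption that may not hold for every dialect, and making the interface `Serializable` is an assumption about what a Beam transform would need.

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical interface: builds a fresh statement once a Connection is available.
@FunctionalInterface
interface StatementFactory extends Serializable {
  PreparedStatement create(Connection connection) throws SQLException;
}

// Hypothetical helper, not part of Beam.
final class QuotedStatements {
  // Wraps an identifier in the database's quote string, doubling embedded quotes
  // (an assumed escaping convention).
  public static String quote(String identifier, String quoteString) {
    // JDBC reports a single space when identifier quoting is unsupported;
    // in that case this sketch leaves the identifier unchanged.
    if (quoteString.trim().isEmpty()) {
      return identifier;
    }
    return quoteString + identifier.replace(quoteString, quoteString + quoteString) + quoteString;
  }

  // Defers query construction until the quoting syntax can be read from the connection.
  public static StatementFactory select(String table, List<String> columns) {
    return connection -> {
      String q = connection.getMetaData().getIdentifierQuoteString();
      String projection =
          columns.isEmpty()
              ? "*"
              : columns.stream().map(c -> quote(c, q)).collect(Collectors.joining(", "));
      String sql = "SELECT " + projection + " FROM " + quote(table, q);
      return connection.prepareStatement(
          sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    };
  }
}
```

The design choice is that the query text is not fixed when the pipeline is constructed; it is assembled inside the factory, where the Connection and therefore the database's quoting syntax are known.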

ibzib commented Nov 1, 2021

Closing this since string escaping has unresolved blockers.

@ibzib ibzib closed this Nov 1, 2021