Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-6769] b64 encode bytes in test when writing to bigquery #8047

Merged
merged 1 commit into from May 2, 2019

Conversation

@Juta
Copy link
Contributor

Juta commented Mar 13, 2019

This PR updates the way of writing bytes to bigquery by using base-64 encoded strings instead of bytes in the big_query_query_to_table_it_test.py

Discussion: https://issues.apache.org/jira/browse/BEAM-6769

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@Juta Juta force-pushed the Juta:bq-io branch from e396ed0 to b348403 Mar 13, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Mar 13, 2019

Run Python PostCommit

@Juta Juta force-pushed the Juta:bq-io branch 3 times, most recently from e7d3110 to dd230d9 Mar 13, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Mar 13, 2019

Run Python PostCommit

(b'dec=', datetime.date(3000, 12, 31),
datetime.time(23, 59, 59, 999999),),
(b'\xab\xac\xad', datetime.date(2000, 1, 1), datetime.time(0, 0),)]
NEW_TYPES_OUTPUT_EXPECTED = [(base64.b64encode(row[0]).decode('utf-8'),)

This comment has been minimized.

Copy link
@tvalentyn

tvalentyn Mar 13, 2019

Contributor

base64.b64encode here makes the test work around the limitation in IO. I think we need to understand first what is the intended behavior of the IO, we can continue the conversation on the associated Jira.

This comment has been minimized.

Copy link
@Juta

Juta Mar 14, 2019

Author Contributor

This was my mistake. The test now succeeds without encoding the NEW_TYPES_OUTPUT_EXPECTED

What the test does:

  • test setup: write data to bq using bq client using base-64
  • test: read this data and write it to a new table (reads data in base-64 and writes it in base-64, this is why the test can succeed)
  • matcher: queries data using bq client, this returns bytes and is than matched to the bytes in NEW_TYPES_OUTPUT_EXPECTED
@Juta Juta force-pushed the Juta:bq-io branch 3 times, most recently from 4bd7e5c to 2216384 Mar 14, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Mar 14, 2019

Run Python PostCommit

@Juta Juta force-pushed the Juta:bq-io branch from 2216384 to b486690 Mar 15, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Mar 15, 2019

Run Python PostCommit

@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Mar 18, 2019

Run Python PreCommit

@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 2, 2019

@tvalentyn can you take another look at this pr?

Copy link
Contributor

tvalentyn left a comment

Thank you, sorry for a delay on this. Left a few minor comments.

(datetime.date(2000, 1, 1),),
(datetime.date(2011, 1, 1),),
(datetime.date(3000, 12, 31),)]
(b'xyw=', datetime.date(2011, 1, 1), datetime.time(23, 59, 59, 999999),),

This comment has been minimized.

Copy link
@tvalentyn

tvalentyn Apr 6, 2019

Contributor

Let's remove the = sign after xyw and others here to avoid confusion that these strings need to be base64-encoded (odd-length base64 encoded strings are padded with =).
Also, let's replace one of the usecases with a utf-8-decodable byte-sequence. This use-case will help to make sure that this sequence doesn't get accidentally decoded somewhere. For example, we can use b'\xe4\xbd\xa0\xe5\xa5\xbd'.

]
for row in table_data:

This comment has been minimized.

Copy link
@tvalentyn

tvalentyn Apr 6, 2019

Contributor

Let's add a comment here that a bigquery client requires the user to base64-encode bytes sequences before passing them to BQ. If you have a link handy, please include it.
Thank you.

@Juta Juta force-pushed the Juta:bq-io branch from b486690 to 9bb45f4 Apr 10, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 10, 2019

Run Python PostCommit

1 similar comment
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 10, 2019

Run Python PostCommit

]
# the bigquery client expects byte values to be base-64 encoded

This comment has been minimized.

Copy link
@tvalentyn

tvalentyn Apr 10, 2019

Contributor

You could also clarify this to API Tools bigquery client and add that this is no longer necessary in google-cloud-bigquery client as you have identified.

@Juta Juta force-pushed the Juta:bq-io branch 2 times, most recently from 19d06f1 to aaab06d Apr 15, 2019
@Juta Juta force-pushed the Juta:bq-io branch from aaab06d to a53180e Apr 15, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 15, 2019

Run Python PostCommit

1 similar comment
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 16, 2019

Run Python PostCommit

@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 16, 2019

Run Python PreCommit

@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 16, 2019

Run Python PostCommit

2 similar comments
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 18, 2019

Run Python PostCommit

@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 23, 2019

Run Python PostCommit

@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 23, 2019

@tvalentyn PTAL

@tvalentyn

This comment has been minimized.

Copy link
Contributor

tvalentyn commented Apr 26, 2019

Thanks, @Juta. This PR only updates the test right? Let's modify the PR name and description accordingly.

@Juta Juta changed the title [BEAM-6769] write bytes to bigquery in python 2 [BEAM-6769] b64 encode bytes in test when writing to bigquery Apr 29, 2019
@Juta

This comment has been minimized.

Copy link
Contributor Author

Juta commented Apr 30, 2019

@tvalentyn PTAL

@tvalentyn

This comment has been minimized.

Copy link
Contributor

tvalentyn commented May 2, 2019

LGTM, thank you!

@tvalentyn

This comment has been minimized.

Copy link
Contributor

tvalentyn commented May 2, 2019

@pabloem could you please help merge? thank you!

@pabloem pabloem merged commit 615013b into apache:master May 2, 2019
6 checks passed
6 checks passed
Portable_Python ("Run Portable_Python PreCommit") SUCCESS
Details
Python ("Run Python PreCommit") SUCCESS
Details
Python SDK PostCommit Tests SUCCESS
Details
Python SDK PostCommit Tests on Python 3 SUCCESS
Details
Python_PVR_Flink ("Run Python_PVR_Flink PreCommit") SUCCESS
Details
RAT ("Run RAT PreCommit") SUCCESS
Details
@pabloem

This comment has been minimized.

Copy link
Member

pabloem commented May 2, 2019

Thanks y'all for adding this test case to make sure we handle bytes well : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.