New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-5186] Adding numeric support to BQ Sink #6255
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@@ -160,6 +161,13 @@ | |||
MAX_RETRIES = 3 | |||
|
|||
|
|||
def decimal_default_encoder(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call this 'default_encoder'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -160,6 +161,13 @@ | |||
MAX_RETRIES = 3 | |||
|
|||
|
|||
def decimal_default_encoder(obj): | |||
if isinstance(obj, decimal.Decimal): | |||
return str(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these string values properly get written to a BQ column of NUMERIC type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they do. We need to convert to string because neither Python nor JSON numbers support the precision that NUMERIC has. BQ is able to handle this.
@@ -1156,6 +1167,8 @@ def _convert_cell_value_to_dict(self, value, field): | |||
# when querying, the repeated and/or record fields are flattened | |||
# unless we pass the flatten_results flag as False to the source | |||
return self.convert_row_to_dict(value, field) | |||
elif field.type == 'NUMERIC': | |||
return decimal.Decimal(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the documentation in bigquery.py to include information regarding NUMERIC type (for sync and source).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some docs.
@@ -57,6 +58,21 @@ def test_row_as_dict(self): | |||
test_value = {'s': 'abc', 'i': 123, 'f': 123.456, 'b': True} | |||
self.assertEqual(test_value, coder.decode(coder.encode(test_value))) | |||
|
|||
def test_decimal_in_row_as_dict(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also try out BQ jobs that read from/write to a BQ column of NUMERIC type with Dataflow/Direct runners (two runners operate in slightly different paths).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this and it works well.
@@ -160,6 +161,13 @@ | |||
MAX_RETRIES = 3 | |||
|
|||
|
|||
def decimal_default_encoder(obj): | |||
if isinstance(obj, decimal.Decimal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also need to support following (bigquery.BigQueryWrapper._convert_cell_value_to_dict) to support DirectRunner.
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1122
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had already updated that (this is now in line 1170). - Sorry, I misunderstood. I've added that check.
LGTM. Thanks. Please self merge. |
r: @chamikaramj
LMK if you think i should add an e2e test for this.