Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-5186] Adding numeric support to BQ Sink #6255

Merged
merged 3 commits into from Aug 24, 2018

Conversation

pabloem
Copy link
Member

@pabloem pabloem commented Aug 20, 2018

r: @chamikaramj

LMK if you think i should add an e2e test for this.


Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- --- ---

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -160,6 +161,13 @@
MAX_RETRIES = 3


def decimal_default_encoder(obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just call this 'default_encoder'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -160,6 +161,13 @@
MAX_RETRIES = 3


def decimal_default_encoder(obj):
if isinstance(obj, decimal.Decimal):
return str(obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these string values properly get written to a BQ column of NUMERIC type ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they do. We need to convert to string because neither Python nor JSON numbers support the precision that NUMERIC has. BQ is able to handle this.

@@ -1156,6 +1167,8 @@ def _convert_cell_value_to_dict(self, value, field):
# when querying, the repeated and/or record fields are flattened
# unless we pass the flatten_results flag as False to the source
return self.convert_row_to_dict(value, field)
elif field.type == 'NUMERIC':
return decimal.Decimal(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the documentation in bigquery.py to include information regarding NUMERIC type (for sync and source).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some docs.

@@ -57,6 +58,21 @@ def test_row_as_dict(self):
test_value = {'s': 'abc', 'i': 123, 'f': 123.456, 'b': True}
self.assertEqual(test_value, coder.decode(coder.encode(test_value)))

def test_decimal_in_row_as_dict(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also try out BQ jobs that read from/write to a BQ column of NUMERIC type with Dataflow/Direct runners (two runners operate in slightly different paths).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this and it works well.

@@ -160,6 +161,13 @@
MAX_RETRIES = 3


def decimal_default_encoder(obj):
if isinstance(obj, decimal.Decimal):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need to support following (bigquery.BigQueryWrapper._convert_cell_value_to_dict) to support DirectRunner.
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1122

Copy link
Member Author

@pabloem pabloem Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had already updated that (this is now in line 1170). - Sorry, I misunderstood. I've added that check.

@chamikaramj
Copy link
Contributor

LGTM. Thanks.

Please self merge.

@pabloem pabloem merged commit 7eba10c into apache:master Aug 24, 2018
@pabloem pabloem deleted the bq-numeric-py branch August 24, 2018 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants