Bug: Support for BigNumeric BQ data type #161

Closed
dhaval-d opened this issue Dec 8, 2020 · 7 comments
Assignees
Labels
priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@dhaval-d
Contributor

dhaval-d commented Dec 8, 2020

Currently, the tool doesn't support the BigNumeric data type (which is not GA yet). However, customers are already using this data type because it is enabled for them through the alpha process.

Hence, we need this tool to support that data type.

Currently, I get the following exception:

Traceback (most recent call last):
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/bin/data-validation", line 8, in <module>
sys.exit(main())
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/main.py", line 248, in main
run(args)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/main.py", line 227, in run
run_validations(args, config_managers)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/main.py", line 203, in run_validations
run_validation(config_manager, verbose=args.verbose)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/main.py", line 192, in run_validation
validator.execute()
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/data_validation.py", line 85, in execute
self.validation_builder, process_in_memory=True
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/data_validation.py", line 160, in _execute_validation
source_query = validation_builder.get_source_query()
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/validation_builder.py", line 211, in get_source_query
query = self.source_builder.compile(**source_config)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/query_builder/query_builder.py", line 238, in compile
table = clients.get_ibis_table(data_client, schema_name, table_name)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/data_validation/clients.py", line 86, in get_ibis_table
return client.table(table_name, database=schema_name)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/bigquery/client.py", line 406, in table
t = super().table(name, database=database)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/client.py", line 115, in table
schema = self._get_table_schema(qualified_name)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/bigquery/client.py", line 430, in _get_table_schema
return self.get_schema(table, database=dataset)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/bigquery/client.py", line 526, in get_schema
return sch.infer(bq_table)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
return func(*args, **kwargs)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/bigquery/client.py", line 80, in bigquery_schema
return sch.schema(fields)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
return func(*args, **kwargs)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/expr/schema.py", line 174, in schema_from_mapping
return Schema.from_dict(d)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/expr/schema.py", line 101, in from_dict
return Schema(*zip(*dictionary.items()))
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/expr/schema.py", line 31, in __init__
self.types = list(map(dt.dtype, types))
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
return func(*args, **kwargs)
File "/Users/dhavaldurve/pso-td2bq/apps/data_validation_wrapper/venv/lib/python3.7/site-packages/ibis/expr/datatypes.py", line 1543, in from_string
'{!r} cannot be parsed as a datatype'.format(value)
ibis.common.exceptions.IbisTypeError: 'BIGNUMERIC' cannot be parsed as a datatype

@dhaval-d dhaval-d added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Dec 8, 2020
@dhercher
Collaborator

dhercher commented Dec 9, 2020

It looks like the BigQuery client had no issues. Ibis broke because the type is missing from its table of BQ -> Ibis type lookups:
https://github.com/ibis-project/ibis/blob/master/ibis/backends/bigquery/client.py#L30

A hacky fix is to add the following to data_validation/clients.py:

import ibis.expr.datatypes as dt
from ibis.bigquery.client import BigQueryClient, _DTYPE_TO_IBIS_TYPE
_DTYPE_TO_IBIS_TYPE["BIGNUMERIC"] = dt.Decimal(38, 9)

This is not a proper solution. Alternatively, we can map the type to a string, if that is the ideal final solution in Ibis:
_DTYPE_TO_IBIS_TYPE["BIGNUMERIC"] = dt.string

If we use a numeric type, Ibis treats the value as numeric and can apply sum, mean, ... normally.
If we use a string, there is no worry about precision, but the normal numeric functions can't be applied properly.
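The precision side of this trade-off can be illustrated with the standard library alone. This is only a minimal sketch: it assumes that a BIGNUMERIC value can carry up to 38 digits after the decimal point and that `Decimal(38, 9)` keeps 9 of them. The helper `to_scale` and the sample value are hypothetical, and nothing here calls Ibis or BigQuery.

```python
from decimal import Decimal, localcontext


def to_scale(value: str, scale: int) -> Decimal:
    """Round a decimal string to `scale` fractional digits (hypothetical helper)."""
    with localcontext() as ctx:
        ctx.prec = 100  # wide enough that quantize cannot overflow for these inputs
        return Decimal(value).quantize(Decimal(1).scaleb(-scale))


# Sample value with 30 fractional digits, more than a scale of 9 can hold.
raw = "1.123456789012345678901234567890"

as_scale_9 = to_scale(raw, 9)  # what a scale-9 decimal mapping could represent
as_string = raw                # a string mapping keeps every digit

print(as_scale_9)                       # digits past the 9th are rounded away
print(as_scale_9 == Decimal(raw))       # the rounded value no longer equals the original
print(Decimal(as_string) == Decimal(raw))
```

The string keeps the full value but has to be converted back to a decimal before any arithmetic, which is the cost described above.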

@dhercher
Collaborator

dhercher commented Dec 9, 2020

We are working on a temporary fix for the issue in #162.

However, since the numeric type is pulled from the source types, we are likely to still hit issues during execution.

@dhaval-d
Contributor Author

dhaval-d commented Dec 9, 2020

When we try the first approach mentioned [here](#161 (comment)), the following error is thrown:
google.api_core.exceptions.InvalidArgument: 400 request failed: BIGNUMERIC columns are not supported with the Arrow format

Full stack trace:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/main.py", line 263, in <module>
main()
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/main.py", line 252, in main
run(args)
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/main.py", line 231, in run
run_validations(args, config_managers)
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/main.py", line 207, in run_validations
run_validation(config_manager, verbose=args.verbose)
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/main.py", line 196, in run_validation
validator.execute()
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/data_validation.py", line 85, in execute
self.validation_builder, process_in_memory=True
File "/Users/dhavaldurve/professional-services-data-validator/data_validation/data_validation.py", line 166, in _execute_validation
source_df = self.source_client.execute(source_query)
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/ibis/client.py", line 215, in execute
result = self._execute_query(query_ast, **kwargs)
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/ibis/bigquery/client.py", line 421, in _execute_query
return query.execute()
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/ibis/bigquery/client.py", line 182, in execute
result = self._fetch(cur)
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/ibis/bigquery/client.py", line 171, in _fetch
df = cursor.query.to_dataframe()
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 3383, in to_dataframe
date_as_object=date_as_object,
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/cloud/bigquery/table.py", line 1727, in to_dataframe
create_bqstorage_client=create_bqstorage_client,
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/cloud/bigquery/table.py", line 1545, in to_arrow
bqstorage_client=bqstorage_client
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/cloud/bigquery/table.py", line 1434, in _to_page_iterable
for item in bqstorage_download():
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 672, in _download_table_bqstorage
max_stream_count=requested_streams,
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/cloud/bigquery_storage_v1/gapic/big_query_read_client.py", line 293, in create_read_session
request, retry=retry, timeout=timeout, metadata=metadata
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in call
return wrapped_func(*args, **kwargs)
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/Users/dhavaldurve/professional-services-data-validator/venv/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 request failed: BIGNUMERIC columns are not supported with the Arrow format

@tswast
Collaborator

tswast commented Dec 9, 2020

We had some discussion here googleapis/python-bigquery#367

pyarrow is about 2x as fast for concatenating multiple pages of results compared to concatenating data frames directly. I wonder if there's a Python object dtype for arrow that we can use in the meantime while we wait for the properly sized numeric columns to be available?

@tswast
Collaborator

tswast commented Dec 9, 2020

Try setting google.cloud.bigquery._pandas_helpers.BQ_TO_ARROW_SCALARS["BIGNUMERIC"] = pyarrow.string
Long-term, I'd want that to be a pyarrow.decimal256, but that's not available yet.
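If BIGNUMERIC columns do come back as strings, the values can still be handled exactly on the Python side by converting them to `decimal.Decimal` before doing arithmetic. The following is a rough sketch under that assumption: `sum_bignumeric_strings` and the sample column are hypothetical, and it uses only the standard library rather than any BigQuery or pyarrow call.

```python
from decimal import Decimal, localcontext
from typing import Iterable


def sum_bignumeric_strings(values: Iterable[str], precision: int = 100) -> Decimal:
    """Sum decimal strings exactly (hypothetical helper).

    The default Decimal context keeps 28 significant digits, which is fewer
    than a BIGNUMERIC value can hold, so a wider context is used here.
    """
    with localcontext() as ctx:
        ctx.prec = precision
        return sum((Decimal(v) for v in values), Decimal(0))


# Stand-in for a BIGNUMERIC column that was fetched as strings.
column = ["0.00000000000000000000000000000000000001", "2.5", "-1.5"]
total = sum_bignumeric_strings(column)
print(total)
```

With the default 28-digit context, the smallest term would be rounded away during the addition, which is why the sketch widens the precision first.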

@dhercher
Collaborator

Leaving this ticket open as a reminder to:

  • Update requirements to versions of Pandas, PyArrow, and BigQuery that support BigNumeric, once they exist
  • Push type changes upstream to Ibis

This is non-blocking, so I'm moving it off Beta and to p2.

@dhercher dhercher added priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) and removed Beta priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Jan 15, 2021
@dhercher
Collaborator

BigNumeric is now supported as of 1.1.7. Closing.
