feat: support RANGE in queries Part 2: Arrow #1868

Linchin · 2024-03-23T00:02:00Z

This PR adds support for returning RANGE values in query results as Arrow. It is the second part of the RANGE queries work and should not be merged until Part 1 is merged. Part 1 (#1884) supports RANGE as JSON.

Part of #1724🦕

Linchin · 2024-03-28T22:53:54Z

tests/unit/test_table.py

@@ -3503,7 +3503,10 @@ def test_to_dataframe_no_tqdm_no_progress_bar(self):
        user_warnings = [
            warning for warning in warned if warning.category is UserWarning
        ]
-        self.assertEqual(len(user_warnings), 0)
+        # Note: number of warnings is inconsistent across python versions


I removed the assertion on the number of warnings because the count differs across Python versions. The same applies to line 3540.

Linchin · 2024-03-28T22:54:08Z

tests/unit/test_table.py

@@ -3534,7 +3537,10 @@ def test_to_dataframe_no_tqdm(self):
        user_warnings = [
            warning for warning in warned if warning.category is UserWarning
        ]
-        self.assertEqual(len(user_warnings), 1)
+        # Note: number of warnings is inconsistent across python versions


I removed the assertion on the number of warnings because the count differs across Python versions.

Linchin · 2024-03-29T23:01:13Z

Adding a CSV file for the system test because RANGE is not supported in JSON with load jobs.

This reverts commit c46c65c.

shollyman

Thanks for putting this together. Mostly minor comments and questions.

shollyman · 2024-04-15T20:33:12Z

google/cloud/bigquery/_pandas_helpers.py

@@ -142,6 +142,12 @@ def bq_to_arrow_struct_data_type(field):
    return pyarrow.struct(arrow_fields)


+def bq_to_arrow_range_data_type(field):
+    element_type = field.element_type.upper()
+    arrow_element_type = _pyarrow_helpers.bq_to_arrow_scalars(element_type)()


do we need to do validation here? None-check?

Great point, I will add a None-check here

I added it, as well as the unit tests in test__pandas_helpers.py.
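For readers following the thread, here is a minimal sketch of what such a None-check could look like. It is not the code merged in this PR: the pyarrow lookup is left out so the example runs with the standard library only, and `RangeElementType`, `SUPPORTED_RANGE_ELEMENT_TYPES`, and `validated_range_element_type` are hypothetical names used for illustration. The sketch assumes RANGE elements are limited to DATE, DATETIME, and TIMESTAMP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeElementType:
    """Hypothetical stand-in for the field object passed to the helper."""

    element_type: str


# Assumption: these are the element types a RANGE column can hold.
SUPPORTED_RANGE_ELEMENT_TYPES = ("DATE", "DATETIME", "TIMESTAMP")


def validated_range_element_type(field: Optional[RangeElementType]) -> str:
    """Return the upper-cased element type, rejecting None and unknown types."""
    if field is None:
        # Fail early with a clear message instead of an AttributeError on
        # `field.element_type` further down.
        raise ValueError(
            "Range element type cannot be None, must be one of "
            + ", ".join(SUPPORTED_RANGE_ELEMENT_TYPES)
        )
    element_type = field.element_type.upper()
    if element_type not in SUPPORTED_RANGE_ELEMENT_TYPES:
        raise ValueError(f"Unsupported range element type: {field.element_type}")
    return element_type
```

With the check in place, a missing element type surfaces as a ValueError with an actionable message rather than as an attribute error inside the conversion helper.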

shollyman · 2024-04-15T20:36:27Z

google/cloud/bigquery/_pandas_helpers.py

@@ -274,6 +286,22 @@ def types_mapper(arrow_data_type):
        elif time_dtype is not None and pyarrow.types.is_time(arrow_data_type):
            return time_dtype

+        elif pyarrow.types.is_struct(arrow_data_type):


Do we need to handle structs more generally here, or is that logic elsewhere?

Good question! Indeed, our types mapper function doesn't seem to do any conversion for STRUCT or ARRAY. This function is used as the parameter types_mapper to pyarrow's Table.to_pandas(), allowing for customizable type mapping from pyarrow to pandas. I'm not entirely sure why we didn't provide struct/array mapping options, but I think it might be related to the fact that the fields of a struct can be of any type, so the conversion can become quite complicated.

shollyman · 2024-04-15T20:45:42Z

google/cloud/bigquery/table.py

+                # only supports upto pandas 1.3. If pandas.ArrowDtype is not
+                # present, we raise a warning and set range_date_dtype to None.
+                msg = (
+                    "Unable ro find class ArrowDtype in pandas, setting "


s/ro/to/ here and in the other two msgs below

Thanks for catching this!
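The fallback described in the code comment under review (warn and fall back to None when pandas.ArrowDtype is missing) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from table.py: `resolve_arrow_dtype_class` is a hypothetical helper, and the two `SimpleNamespace` objects stand in for an older and a newer pandas module so the example runs without pandas installed.

```python
import types
import warnings


def resolve_arrow_dtype_class(pandas_module):
    """Return `pandas_module.ArrowDtype`, or None (with a warning) if absent."""
    arrow_dtype = getattr(pandas_module, "ArrowDtype", None)
    if arrow_dtype is None:
        warnings.warn(
            "Unable to find class ArrowDtype in pandas, falling back to None.",
            UserWarning,
            stacklevel=2,
        )
    return arrow_dtype


# Hypothetical stand-ins for the imported pandas module.
old_pandas = types.SimpleNamespace()  # no ArrowDtype attribute
new_pandas = types.SimpleNamespace(ArrowDtype=object)  # attribute present
```

In real use the argument would be the imported pandas module. Looking the attribute up with `getattr` avoids hard-coding a pandas version number in the check.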

shollyman · 2024-04-15T20:49:04Z

google/cloud/bigquery_v2/types/model.py

@@ -103,6 +103,7 @@ class Model(proto.Message):

    class ModelType(proto.Enum):
        r"""Indicates the type of the Model."""
+


these formatting changes are benign, but it's not clear why they're in this PR. Maybe pull out unrelated cleanups into a separate PR?

Indeed, I will revert these changes

shollyman · 2024-04-15T20:54:37Z

tests/unit/test_table.py

            df = row_iterator.to_dataframe(create_bqstorage_client=False)

-        user_warnings = [


Not familiar with these tests, but are we losing some existing signal here? Or did test expectations change somehow? It's not clear how this relates to the PR's intent.

Indeed, we are losing some signal here: we no longer validate the number of warnings. With this PR, the number of warnings depends on the pandas version. I deleted the warnings check because I didn't want to hard-code pandas versions in our tests (should I be concerned about this?). An alternative I can think of is to check self.assertTrue(len(user_warnings) in {length_with_older_pandas, length_with_newer_pandas}); this way we still lose some signal, but less. What do you think?

If all the supported versions emit at least one UserWarning we could still do an assert on the count being nonzero, but if we don't think this is a useful validation I'm fine removing it as well.

shollyman

Left some followup comments and a very minor nit, otherwise LGTM. Thanks!

shollyman · 2024-04-17T19:24:04Z

google/cloud/bigquery/_helpers.py

    else:
-        raise ValueError(f"Unsupported range field type: {value}")
+        raise ValueError(f"Unsupported range field type: {field.element_type}")


nit: should we change this to indicate the element type is unsupported, rather than "field type"?

Indeed, I'll change it to be consistent with the name of the field.

shollyman · 2024-04-17T19:28:35Z

tests/unit/test_table.py

            df = row_iterator.to_dataframe(create_bqstorage_client=False)

-        user_warnings = [


If all the supported versions emit at least one UserWarning we could still do an assert on the count being nonzero, but if we don't think this is a useful validation I'm fine removing it as well.

Linchin · 2024-04-18T20:14:41Z

Indeed, I added the test back. It passes when len(warnings) == 0 or len(warnings) == 3, so both cases are covered. I added a comment that explains why and when each value is expected. (The same applies to the other warning-count check.)
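A rough sketch of the kind of check described here, accepting either of two warning counts, using only the standard library. The counts 0 and 3 come from the comment above. `count_user_warnings` and `fake_to_dataframe` are hypothetical helpers, and the placeholder warnings stand in for whatever the real call emits.

```python
import warnings

# Counts taken from the comment above; which one applies depends on the
# installed dependency versions.
ALLOWED_WARNING_COUNTS = {0, 3}


def count_user_warnings(func):
    """Call `func` and return how many UserWarnings it emitted."""
    with warnings.catch_warnings(record=True) as warned:
        # Record every warning, including repeats from the same line.
        warnings.simplefilter("always")
        func()
    return sum(1 for warning in warned if warning.category is UserWarning)


def fake_to_dataframe(n_warnings):
    """Hypothetical stand-in for a call that warns `n_warnings` times."""

    def run():
        for i in range(n_warnings):
            warnings.warn(f"placeholder warning {i}", UserWarning)

    return run


# Either count passes; any other count would fail the assertion.
assert count_user_warnings(fake_to_dataframe(0)) in ALLOWED_WARNING_COUNTS
assert count_user_warnings(fake_to_dataframe(3)) in ALLOWED_WARNING_COUNTS
```

Compared with dropping the assertion entirely, a membership check still catches an unexpected extra or missing warning while tolerating the known version-dependent difference.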

feat: support range in queries as dict

5dd6b24

Linchin requested a review from shollyman March 23, 2024 00:02

Linchin assigned shollyman Mar 23, 2024

Linchin requested review from a team as code owners March 23, 2024 00:02

product-auto-label bot added the size: m (pull request size is medium) and api: bigquery (issues related to the googleapis/python-bigquery API) labels Mar 23, 2024

Linchin added 3 commits March 25, 2024 18:55

fix sys tests

74fb1d3

lint

a67e1aa

add arrow support

75a9855

product-auto-label bot added the size: l (pull request size is large) label and removed the size: m label Mar 28, 2024

Linchin added 4 commits March 27, 2024 17:15

Merge branch 'main' into get-query-results-range

53635bc

Merge branch 'main' into get-query-results-range

5dfd65e

fix python 3.7 test error

73a5001

print dependencies in sys test

6a735ca

Linchin commented Mar 28, 2024

Linchin added 3 commits March 29, 2024 18:40

add unit test and docs

d54336a

fix unit test

8dc4ae5

add func docs

1b2d68f

Linchin added 8 commits March 30, 2024 00:01

add sys test for tabledata.list in arrow

6f93d8e

add sys test for tabledata.list as iterator

005d409

lint

839eafe

fix docs error

58a0e18

fix docstring

cc12e1b

fix docstring

691710c

fix docstring

6d5ce1b

docs

3ddfbf8

Linchin changed the title from "feat: support RANGE in queries in json and arrow" to "feat: support RANGE in queries Part 2: Arrow" Apr 3, 2024

Linchin added 3 commits April 3, 2024 16:06

Revert "move dtypes mapping code"

0be9fb6

This reverts commit c46c65c.

remove commented out assertions

b7f3779

Merge branch 'main' into get-query-results-range

edc8b5c

shollyman reviewed Apr 15, 2024

Linchin added 4 commits April 15, 2024 14:49

typo and formats

2a0d518

Merge branch 'main' into get-query-results-range

a0d01f7

add None-check for range_element_type and add unit tests

2c9782f

change test skip condition

40afa27

Linchin added the kokoro:force-run label (forces Kokoro to re-run the tests) Apr 15, 2024

Linchin added 2 commits April 16, 2024 10:35

fix test error

203e0c0

change test skip condition

bb17b3b

yoshi-kokoro removed the kokoro:force-run label Apr 16, 2024

change test skip condition

e58739a

Linchin added the kokoro:force-run label Apr 16, 2024

yoshi-kokoro removed the kokoro:force-run label Apr 16, 2024

Linchin added 2 commits April 16, 2024 12:38

change decorator order

c3db3c9

use a different way to construct test data

2211dd0

Linchin added the kokoro:force-run label Apr 16, 2024

yoshi-kokoro removed the kokoro:force-run label Apr 16, 2024

shollyman approved these changes Apr 17, 2024

Linchin added 3 commits April 18, 2024 12:25

fix error message and add warning number check

e2a9552

Merge branch 'main' into get-query-results-range

0357b6f

add warning number check and comments

4c20bd7

Linchin merged commit 5251b5d into googleapis:main Apr 18, 2024
18 checks passed

release-please bot mentioned this pull request Apr 18, 2024

chore(main): release 3.22.0 #1905

Merged

This was referenced May 14, 2024

May 13, 2024 kitta65/bq-extension-vscode#393

Closed

May 13, 2024 kitta65/prettier-plugin-bq#380

Closed

May 13, 2024 kitta65/bq2cst#362

Closed
