Add union to test_all_types, and arrow and json R/W #7701

Merged
merged 50 commits into from
Jul 12, 2023

Conversation

Mause
Member

@Mause Mause commented May 26, 2023

This started out adding support to the Python .description field for the union data type, but I realised that we didn't have it in the test_all_types table function.

  • Added support for R/W in Arrow, specifically as a Sparse Union. Reading Dense Unions will still fail.
  • Added read support for Unions in JSON. Write support was already there, so I just followed the existing pattern
  • Corrected a number of locations where the Union tag type was incorrect: a uint8_t instead of an int8_t
  • Parquet doesn't support unions (see https://issues.apache.org/jira/browse/PARQUET-756)
  • Added support for reading unions into numpy/pandas (as object type)
  • R support was left on the table; I'm not familiar enough with R to do this without guidance
  • Basic Java union read support is there, but only stringly typed. I'll add full support when I add Struct support (soon)
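
To make the Arrow and tag-type bullets above concrete, here is a minimal pure-Python sketch of a sparse-union-style layout. It is an illustration only, not DuckDB's or Arrow's code: `encode_sparse_union` and `decode_sparse_union` are hypothetical helper names, and the member names `num` and `str` are made up. It relies on two properties of the Arrow columnar format: union type ids are signed 8-bit integers, and in a sparse union every child array has the same length as the union itself (a dense union adds an offsets buffer and keeps the children compact).

```python
from array import array


def encode_sparse_union(values, members):
    """Encode values as (type_ids, children) in a sparse-union-style layout.

    members is an ordered list of (name, python_type) pairs (hypothetical
    schema); a pair's position in the list is used as its type id.
    """
    type_ids = array("b")  # 'b' is a signed 8-bit integer, like int8_t
    # Sparse layout: every child column is as long as the union column.
    children = {name: [None] * len(values) for name, _ in members}
    for row, value in enumerate(values):
        for tag, (name, py_type) in enumerate(members):
            # Exact type match, so that bool is not silently treated as int.
            if type(value) is py_type:
                type_ids.append(tag)
                children[name][row] = value
                break
        else:
            # NULL handling is left out of this sketch.
            raise TypeError(f"row {row}: {value!r} matches no union member")
    return type_ids, children


def decode_sparse_union(type_ids, children, members):
    """Rebuild the original values by following each row's tag."""
    names = [name for name, _ in members]
    return [children[names[tag]][row] for row, tag in enumerate(type_ids)]


members = [("num", int), ("str", str)]
values = [1, "two", 3]
type_ids, children = encode_sparse_union(values, members)
roundtrip = decode_sparse_union(type_ids, children, members)

# The same byte means different things depending on signedness, which is
# why declaring the tag as uint8_t instead of int8_t is a real bug.
raw = bytes([0xFF])
as_int8 = int.from_bytes(raw, "little", signed=True)
as_uint8 = int.from_bytes(raw, "little", signed=False)
```

Here `children["num"]` holds `[1, None, 3]` and `children["str"]` holds `[None, "two", None]`: the unused slots are what make the layout "sparse".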

@Mause Mause assigned Mytherin and lnkuiper and unassigned Mytherin and lnkuiper May 27, 2023
@Mause Mause requested review from lnkuiper and Mytherin May 27, 2023 11:14
Contributor

@lnkuiper lnkuiper left a comment

Thanks for the PR! I have left some comments about the JSONTransform implementation.

I'm also wondering how other systems map from JSON to UNION. Do you have any idea? Perhaps we're the only system out there that does this. I always thought that JSON like this:

{"value": 42}
{"value": [1, 2, 3]}

could be read using UNION(int INTEGER, int_list INT[]) without having to wrap the values in an object with the keys "int" or "int_list".

But now that I see your implementation, I'm not sure what makes the most sense, especially since I'm not a heavy JSON user myself.
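
The two mappings being compared can be sketched in plain Python. This is a hedged illustration of the trade-off, not the JSONTransform code from this PR: `from_tagged`, `from_untagged`, and the `MEMBERS` table are hypothetical names, and the tagged form is assumed from the description above (each value wrapped in an object whose single key names the union member).

```python
import json

# Hypothetical stand-in for UNION(int INTEGER, int_list INT[]).
MEMBERS = {"int": int, "int_list": list}


def from_tagged(obj):
    """Tagged form: {"int": 42}. The single key names the union member."""
    if not isinstance(obj, dict) or len(obj) != 1:
        raise ValueError(f"expected an object with exactly one key, got {obj!r}")
    ((tag, value),) = obj.items()
    if tag not in MEMBERS:
        raise ValueError(f"unknown union member {tag!r}")
    return tag, value


def from_untagged(value):
    """Untagged form: 42. The member is inferred from the JSON value's type."""
    for tag, py_type in MEMBERS.items():
        if type(value) is py_type:
            return tag, value
    raise ValueError(f"{value!r} matches no union member")


tagged_lines = ['{"value": {"int": 42}}', '{"value": {"int_list": [1, 2, 3]}}']
untagged_lines = ['{"value": 42}', '{"value": [1, 2, 3]}']

tagged = [from_tagged(json.loads(line)["value"]) for line in tagged_lines]
untagged = [from_untagged(json.loads(line)["value"]) for line in untagged_lines]
```

Both forms yield the same `(member, value)` pairs for this input. The untagged form is friendlier to existing JSON, but it becomes ambiguous as soon as two members share a JSON representation (for example two integer-typed members), whereas the tagged form always round-trips.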

extension/json/json_functions/json_transform.cpp (5 review comments, outdated and resolved)
Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks great.

As this is adding unions to the arrow conversion - could we add some union-specific arrow conversion tests in the Python client? In particular nested unions, unions with lists/structs inside them, unions inside lists/structs, all are not covered yet but could potentially cause problems.

@Mause Mause marked this pull request as draft June 22, 2023 07:47
@Mause Mause marked this pull request as ready for review June 22, 2023 07:47
@Mause Mause marked this pull request as draft June 22, 2023 09:06
@Mause Mause marked this pull request as ready for review June 22, 2023 09:07
@Mytherin Mytherin changed the base branch from feature to master July 4, 2023 13:16
@Mause Mause force-pushed the bugfix/python-union-description branch from d83de13 to 0791ba5 Compare July 5, 2023 14:08
@Mause Mause marked this pull request as draft July 5, 2023 14:10
@Mause Mause marked this pull request as ready for review July 6, 2023 07:50
@Mause
Member Author

Mause commented Jul 7, 2023

Failures don't look related to my changes?

@Mytherin
Collaborator

Mytherin commented Jul 7, 2023

Could you merge with master?

@github-actions github-actions bot marked this pull request as draft July 7, 2023 08:53
@Mytherin Mytherin marked this pull request as ready for review July 7, 2023 10:38
@Mytherin Mytherin merged commit a684c42 into duckdb:master Jul 12, 2023
53 checks passed
@Maxxen
Member

Maxxen commented Jul 12, 2023

👏👏👏

@Mause Mause deleted the bugfix/python-union-description branch July 13, 2023 03:55
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state

[Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning

Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz

Python TIMESTAMPTZ support

Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci

[CI] More CI reduction and clean-up

Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description

Add union to test_all_types, and arrow and json R/W
krlmlr pushed a commit to krlmlr/duckdb-r that referenced this pull request Sep 2, 2023
- Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state: [Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning

- Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz: Python TIMESTAMPTZ support

- Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci: [CI] More CI reduction and clean-up

- Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description: Add union to test_all_types, and arrow and json R/W

- Merge pull request duckdb/duckdb#8497 from samansmink/pending-execute-result-api-change: Add PendingExecutionResult::ALL_TASKS_BLOCKED
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 5, 2023
- Merge pull request duckdb/duckdb#8307 from Tishj/chunk_scan_state: [Arrow] Add ChunkScanState interface to preserve chunk-offset when scanning

- Merge pull request duckdb/duckdb#8089 from pdet/basepython_tz: Python TIMESTAMPTZ support

- Merge pull request duckdb/duckdb#8052 from Mytherin/evenlessci: [CI] More CI reduction and clean-up

- Merge pull request duckdb/duckdb#7701 from Mause/bugfix/python-union-description: Add union to test_all_types, and arrow and json R/W

- Merge pull request duckdb/duckdb#8497 from samansmink/pending-execute-result-api-change: Add PendingExecutionResult::ALL_TASKS_BLOCKED

5 participants