Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize nested columns as JSON strings #9007

Merged
merged 1 commit into from Jan 28, 2020

Conversation

robdiciuccio
Copy link
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

SQL Lab queries against databases with nested columns/results containing structs or maps are failing with the following error:

pyarrow.lib.ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: struct<...>

Serializing results to JSON is not the ideal solution, mainly for performance reasons, but there are pending fixes yet to be released in Arrow 1.0 that may improve the situation:
apache/arrow#6199

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

Ensure queries against databases containing nested columns (particularly arrays containing maps) are successful. Per #8978, the following query should succeed and produce the proper results:

SELECT id, json_agg(json_build_object('table_name',table_name,'database_id',database_id)) FROM (SELECT * FROM tables) AS tables GROUP BY id

ADDITIONAL INFORMATION

REVIEWERS

@lxhoang97 @graceguo-supercat @john-bodley @michellethomas @villebro

@codecov-io
Copy link

Codecov Report

Merging #9007 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9007   +/-   ##
=======================================
  Coverage   59.16%   59.16%           
=======================================
  Files         367      367           
  Lines       11680    11680           
  Branches     2863     2863           
=======================================
  Hits         6910     6910           
  Misses       4591     4591           
  Partials      179      179

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04be1d4...bad166a. Read the comment docs.

Copy link
Member

@nytai nytai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if pa.types.is_temporal(pa_data[i].type):
# TODO: revisit nested column serialization once Arrow 1.0 is released with:
# https://github.com/apache/arrow/pull/6199
# Related issue: #8978
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use url instead of issue #?

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@craig-rueda craig-rueda merged commit 2fc3d84 into apache:master Jan 28, 2020
@craig-rueda craig-rueda deleted the rd/arrow-nested-serialization branch January 28, 2020 23:50
etr2460 pushed a commit to etr2460/incubator-superset that referenced this pull request Feb 1, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants