[BUGFIX]: Check datatype of results before converting to DataFrame #4108

marcusianlevine · 2017-12-22T16:48:59Z

The merged hotfix #2412 resolved issue #2398, which affected Presto/Hive columns that are lists of dictionaries.

However, this introduced issue #3934 with traditional RDB datasources like MS SQL Server, which only return lists of lists. Wrapping such results in an outer list causes pandas to interpret the list as a single column.

This PR attempts to resolve that issue by inspecting the datatypes of the columns in the first row of the returned data. The type check may not be checking for exactly the right thing, but we likely need some sort of conditional to avoid wrapping a list of lists in an outer list.
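A minimal sketch of the conditional described above, for illustration only. The helper name `results_to_df` and its signature are assumptions, not the code in this PR; the only parts taken from the thread are the first-row check and the choice between `list(data)` and `np.array(data)`.

```python
import numpy as np
import pandas as pd


def results_to_df(column_names, data):
    """Build a DataFrame from DB-API rows (hypothetical helper name)."""
    if not data:
        # No rows: return an empty frame that still has the right columns.
        return pd.DataFrame(columns=column_names)

    # Inspect only the first row, as the PR description proposes.
    first_row = data[0]
    if any(isinstance(c, dict) for c in first_row):
        # Nested (dict) values: hand pandas a plain list of rows.
        df_data = list(data)
    else:
        # Flat rows: convert to a 2-D array, one array row per result row.
        df_data = np.array(data)
    return pd.DataFrame(df_data, columns=column_names)


flat = results_to_df(["a", "b"], [(1, 2), (3, 4)])
nested = results_to_df(["id", "props"], [(1, {"k": "v"})])
print(flat.shape, nested.shape)  # (2, 2) (1, 2)
```

Checking only the first row keeps the check cheap, at the cost of assuming every row has the same column types.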

xrmx · 2017-12-23T12:20:40Z

Could you please remove other people's commits from the branch?

xrmx

Better to add a test so we don't regress.

xrmx · 2017-12-23T12:20:55Z

superset/sql_lab.py

@@ -231,11 +231,12 @@ def handle_error(msg):

    # check whether the result set is comprised of lists or dict
    if data and len(data) > 0:


Yep, you're right, the len check is redundant.

xrmx · 2017-12-23T12:26:26Z

superset/sql_lab.py

-            df_data = np.array(data)
-        else:
+        first_row = data[0]
+        first_row_types = set([type(c) for c in first_row])


Is this more readable?

any([isinstance(c, dict) for c in first_row])

Yeah, I just took a stab at refactoring for clarity.
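For comparison, here is a rough sketch of the two styles of check discussed in this thread. How `first_row_types` was consumed is not shown in the diff excerpt, so the `dict in first_row_types` test below is an assumption, and both function names are made up for illustration. The practical difference is that an exact-type check misses `dict` subclasses, while `isinstance` accepts them.

```python
from collections import OrderedDict


def has_dict_by_type_set(first_row):
    """Exact-type check, in the style of the first revision (assumed usage)."""
    first_row_types = set(type(c) for c in first_row)
    return dict in first_row_types


def has_dict_by_isinstance(first_row):
    """isinstance check, in the style of the reviewer's suggestion."""
    return any(isinstance(c, dict) for c in first_row)


plain = (1, {"k": "v"})
ordered = (1, OrderedDict(k="v"))
flat = (1, "x")

print(has_dict_by_type_set(plain), has_dict_by_isinstance(plain))      # True True
print(has_dict_by_type_set(ordered), has_dict_by_isinstance(ordered))  # False True
print(has_dict_by_type_set(flat), has_dict_by_isinstance(flat))        # False False
```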

marcusianlevine · 2017-12-23T15:10:35Z

@xrmx how would you recommend testing this? AFAIK the only reason for using list(data) instead of np.array(data) would be when reading certain nested column types from Hive or Presto.

The existing sqllab_tests.py has queries in it and those tests seem to be passing, but we only test against MySQL, SQLite, and Postgres from what I can tell.

Any suggestions for how we might simulate an embedded dict column from the test suite?

xrmx · 2017-12-23T16:22:55Z

The easiest way to test that code would be to move it to a helper and unit test it.


marcusianlevine · 2017-12-24T00:54:39Z

OK, I moved the df conversion logic to a helper and wrote two simple unit tests.

Not sure the tests actually prevent regression because I'm mocking the query results...
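As a rough illustration of what such tests could look like, the sketch below feeds hand-built rows through a mocked cursor, including an embedded dict column. The names `cursor_to_df`, `fake_cursor`, and `CursorToDfTests` are hypothetical and are not the helper or tests added in this PR.

```python
import unittest
from unittest import mock

import numpy as np
import pandas as pd


def cursor_to_df(cursor):
    """Hypothetical helper under test: DB-API cursor -> DataFrame."""
    column_names = [col[0] for col in cursor.description]
    data = cursor.fetchall()
    if not data:
        return pd.DataFrame(columns=column_names)
    has_dict = any(isinstance(c, dict) for c in data[0])
    df_data = list(data) if has_dict else np.array(data)
    return pd.DataFrame(df_data, columns=column_names)


def fake_cursor(column_names, rows):
    """Stand-in for a driver cursor; no database connection is needed."""
    cursor = mock.Mock()
    # DB-API descriptions are 7-item sequences; only the name is used here.
    cursor.description = [(name,) + (None,) * 6 for name in column_names]
    cursor.fetchall.return_value = rows
    return cursor


class CursorToDfTests(unittest.TestCase):
    def test_flat_rows_keep_one_column_per_field(self):
        df = cursor_to_df(fake_cursor(["a", "b"], [(1, 2), (3, 4)]))
        self.assertEqual(df.shape, (2, 2))

    def test_dict_column_survives_conversion(self):
        rows = [(1, {"k": "v"}), (2, {"k": "w"})]
        df = cursor_to_df(fake_cursor(["id", "props"], rows))
        self.assertEqual(df.shape, (2, 2))
        self.assertEqual(df["props"][1], {"k": "w"})
```

Tests like these can be run with `python -m unittest`. They only pin down how the helper treats the row shapes handed to it; they do not confirm that a real Hive, Presto, or MS SQL Server driver returns those shapes.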


marcusianlevine · 2018-01-20T05:12:40Z

@xrmx do you think this is ready to merge?


marcusianlevine force-pushed the df-np-conversion branch from ca0f310 to 2d5e427 Compare December 22, 2017 21:04

marcusianlevine changed the title ~~Bugfix: Check datatype of results before converting to DataFrame~~ [BUGFIX]: Check datatype of results before converting to DataFrame Dec 23, 2017

xrmx suggested changes Dec 23, 2017


marcusianlevine force-pushed the df-np-conversion branch 2 times, most recently from 6959b35 to ea14efe Compare December 23, 2017 15:05

conditional check on datatype of results before converting to df

bd9aa75

fix type checking; fix conditional checks; remove trailing whitespace and fix df_data fallback def; actually remove trailing whitespace; generalized type check to check all columns for dict; refactor dict col check

marcusianlevine force-pushed the df-np-conversion branch 2 times, most recently from 5db8d3e to 29be9c2 Compare December 24, 2017 00:52

move df conversion to helper and add unit test

be2a850

add missing newlines; another missing newline; fix quotes; more quote fixes

marcusianlevine force-pushed the df-np-conversion branch from 8e44e76 to be2a850 Compare December 24, 2017 00:58

mistercrunch merged commit 4bc5fe5 into apache:master Jan 24, 2018

mistercrunch added a commit that referenced this pull request Jan 27, 2018

Revert "[BUGFIX]: Check datatype of results before converting to DataFrame (#4108)"

40eaf2f

This reverts commit 4bc5fe5.

marcusianlevine deleted the df-np-conversion branch February 3, 2018 03:10

mistercrunch added the 🏷️ bot and 🚢 0.23.0 labels Feb 27, 2024
