Issue in load_file with some datasets in Snowflake
#104
Labels: bug, priority/high, product/python-sdk
Version: astro==0.4.0
Problem
At the moment, we are unable to load the following dataset from Tate Gallery into Snowflake: https://github.com/tategallery/collection/blob/master/artwork_data.csv. The operation works using BQ and Postgres. I could not find any particular issue with the original dataset.
Exception:
How to reproduce
Download the dataset artwork_data.csv.
Update the tests/benchmark/config.json file to include a dataset similar to:
And a database that uses Snowflake.
From within the tests/benchmark folder, run:
Initial analysis
The first step of load_file is to load the CSV into a Pandas data frame. In the case of this particular dataset, Pandas automagically assigns the following types per column:
When analyzing the values within height, it is possible to see that there is a mixture of strings, floats, and nan:
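One way to confirm this kind of mixture is to count the Python types found in a column. The sketch below is only an illustration: the helper name count_value_types and the sample values are hypothetical and are not taken from artwork_data.csv or from the astro code base.

```python
import math
from collections import Counter


def count_value_types(values):
    """Count how many values of each Python type appear in a column."""
    counts = Counter()
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            # Missing values are stored as float nan, so report them separately.
            counts["nan"] += 1
        else:
            counts[type(value).__name__] += 1
    return counts


# Hypothetical sample mimicking a mixed column (strings, floats, and nan).
sample_height = ["394", 311.0, float("nan"), "343", 213.0]
type_counts = count_value_types(sample_height)
print(dict(type_counts))
```

The helper accepts any iterable, so it should also work on a data frame column, for example count_value_types(df["height"]), assuming df is the data frame loaded from the CSV.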
Why doesn't this happen for BQ & Postgres?
Because they are currently using a different strategy to write from the data frame into the table in the database:
https://github.com/astro-projects/astro/blob/4e63302bc5c69401b10568598c4ff738e21563f5/src/astro/utils/load_dataframe.py#L60-L95