No more data_types.extract_pyarrow_schema_from_pandas #181

JPFrancoia · 2020-04-15T07:54:48Z

I noticed that in the API for versions > 1.0, the data_types module has disappeared, which means it's no longer possible to get the schema of a DataFrame with data_types.extract_pyarrow_schema_from_pandas.

This function was quite handy: I normally call it to get the schema of a DataFrame (often imported from CSV) before calling wr.glue.create_table.

wr.glue.create_table is now called wr.glue.create_csv_table, and it has a columns_types parameter, which is the schema of the DataFrame (the schema format changed and is now a Dict[str, str]). But I can no longer generate this schema, which means my custom crawlers no longer work.

Of course I could get this schema with pyarrow, but I was wondering whether removing this feature was intentional, and whether it will be back at some point. Ideally, a function that returns the schema in the format that columns_types expects would be great.

Cheers


igorborgest · 2020-04-15T15:21:04Z

Hi @JPFrancoia!

The catalog.extract_athena_types function was developed to replace the old data_types.extract_pyarrow_schema_from_pandas.

You can extract the columns_types and the partitions_types from your DataFrame and then create your CSV table with them.
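As a rough sketch of that workflow, assuming the awswrangler 1.x keyword arguments for `wr.catalog.extract_athena_types` and `wr.catalog.create_csv_table` (check them against the version you have installed). The wrapper name `register_csv_table` is hypothetical and only used for illustration:

```python
from typing import Dict, List, Optional, Tuple


def register_csv_table(
    df,
    database: str,
    table: str,
    path: str,
    partition_cols: Optional[List[str]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper: infer Athena types from a DataFrame, then create a CSV table."""
    # Imported lazily so this module can be loaded without awswrangler installed.
    import awswrangler as wr

    # Assumed 1.x signature: returns (columns_types, partitions_types),
    # each mapping a column name to an Athena type string.
    columns_types, partitions_types = wr.catalog.extract_athena_types(
        df=df,
        index=False,
        partition_cols=partition_cols,
        file_format="csv",
    )

    # Register the table in the Glue catalog using the inferred types.
    wr.catalog.create_csv_table(
        database=database,
        table=table,
        path=path,
        columns_types=columns_types,
        partitions_types=partitions_types,
    )
    return columns_types, partitions_types or {}
```

A call might look like `register_csv_table(df, "my_db", "my_table", "s3://my-bucket/my-prefix/", partition_cols=["year"])`, where the database, table, and bucket names are placeholders.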

I just added a new tutorial to address this case. What do you think?

JPFrancoia · 2020-04-16T07:19:20Z

Ah perfect, sorry I missed the function in the docs. The tutorial is exactly the workflow I use. Also, the get_table_types function is nice: it can be used to compare the schema from previous crawler runs with the schema of the new files being crawled, which is useful for detecting new incoming schemas.
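A minimal sketch of that comparison, working on plain `Dict[str, str]` schemas. It assumes the previous schema comes from something like get_table_types and the incoming one from extract_athena_types; the `diff_schemas` helper and `SchemaDiff` type are hypothetical names, not part of the library:

```python
from typing import Dict, NamedTuple, Tuple


class SchemaDiff(NamedTuple):
    added: Dict[str, str]                # columns only in the incoming schema
    removed: Dict[str, str]              # columns only in the previous schema
    changed: Dict[str, Tuple[str, str]]  # column -> (previous type, incoming type)


def diff_schemas(previous: Dict[str, str], incoming: Dict[str, str]) -> SchemaDiff:
    """Compare two column-name -> type mappings and report the differences."""
    added = {col: typ for col, typ in incoming.items() if col not in previous}
    removed = {col: typ for col, typ in previous.items() if col not in incoming}
    changed = {
        col: (previous[col], incoming[col])
        for col in previous.keys() & incoming.keys()
        if previous[col] != incoming[col]
    }
    return SchemaDiff(added, removed, changed)


# Example with made-up schemas: one new column and one type change.
previous = {"id": "bigint", "name": "string", "price": "double"}
incoming = {"id": "bigint", "name": "string", "price": "string", "currency": "string"}
diff = diff_schemas(previous, incoming)
has_new_schema = bool(diff.added or diff.removed or diff.changed)
```

Here `has_new_schema` is true whenever the incoming files differ from the catalogued table, which a crawler could use as a signal to create a new table or raise an alert.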

Once again, thanks for your help!

JPFrancoia added the bug label Apr 15, 2020

JPFrancoia assigned igorborgest Apr 15, 2020

igorborgest added a commit that referenced this issue Apr 15, 2020

add CSV tutorials #181

1c36f0b

igorborgest mentioned this issue Apr 15, 2020

add CSV tutorials #181 #183

Merged

igorborgest added a commit that referenced this issue Apr 15, 2020

Merge pull request #183 from awslabs/csv-tutorial

fdb843d

add CSV tutorials #181

JPFrancoia closed this as completed Apr 16, 2020

igorborgest added the documentation and question labels and removed the bug label Apr 16, 2020
