No more data_types.extract_pyarrow_schema_from_pandas #181

Closed
JPFrancoia opened this issue Apr 15, 2020 · 2 comments
Labels
documentation Improvement or bugfixes on docs question Further information is requested

Comments

@JPFrancoia
Contributor

JPFrancoia commented Apr 15, 2020

I noticed that in the API for versions > 1.0, the data_types module disappeared. This means it's no longer possible to get the schema of a DataFrame with data_types.extract_pyarrow_schema_from_pandas.

This function was quite handy: I normally call it to get the schema of a DataFrame (often imported from CSV) before calling wr.glue.create_table.

wr.glue.create_table is now called wr.glue.create_csv_table, and it has a parameter columns_types which is the schema of the dataframe (the format of the schema changed, and is now a Dict[str, str]). But I can't generate this schema anymore, which means my custom crawlers don't work anymore.

Of course I could get this schema with pyarrow, but I was wondering whether removing this feature was intentional, and whether it will be back at some point. Ideally, a function that returns the schema in the format that columns_types expects would be great.

Cheers

@JPFrancoia JPFrancoia added the bug Something isn't working label Apr 15, 2020
igorborgest added a commit that referenced this issue Apr 15, 2020
igorborgest added a commit that referenced this issue Apr 15, 2020
@igorborgest
Contributor

Hi @JPFrancoia!

The catalog.extract_athena_types was developed to replace the old data_types.extract_pyarrow_schema_from_pandas.

You can extract the columns_types and the partitions_types from your DataFrame and then create your CSV table with them.
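A minimal sketch of that workflow, assuming the awswrangler 1.x `wr.catalog` namespace and that `extract_athena_types` returns a `(columns_types, partitions_types)` tuple. The wrapper name `register_csv_table` and all of its arguments are hypothetical, and the library is imported lazily only so the snippet can be loaded without AWS credentials:

```python
from typing import Dict, List, Optional, Tuple


def register_csv_table(
    df,  # a pandas DataFrame, e.g. loaded with pd.read_csv
    database: str,
    table: str,
    path: str,
    partition_cols: Optional[List[str]] = None,
) -> Tuple[Dict[str, str], Optional[Dict[str, str]]]:
    """Infer Athena types from `df` and create a CSV table with them.

    Hypothetical helper: only the two `wr.catalog` calls come from the library.
    """
    import awswrangler as wr  # lazy import, see the note above

    # Infer the column and partition types (both Dict[str, str]) from the DataFrame.
    columns_types, partitions_types = wr.catalog.extract_athena_types(
        df=df,
        index=False,
        partition_cols=partition_cols,
        file_format="csv",
    )

    # Register the table in the Glue catalog using the inferred types.
    wr.catalog.create_csv_table(
        database=database,
        table=table,
        path=path,
        columns_types=columns_types,
        partitions_types=partitions_types,
    )
    return columns_types, partitions_types
```

A call might look like `register_csv_table(df, "my_db", "my_table", "s3://my-bucket/prefix/", ["year"])`, where the database, table, and bucket names are placeholders.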

I just added a new tutorial to address this case. What do you think?

@JPFrancoia
Contributor Author

Ah perfect, sorry, I missed the function in the docs. The tutorial is exactly the workflow I use. Also, the get_table_types function is nice: it can be used to compare the schema from previous crawler runs with the schema of the new files being crawled, which is useful for detecting new incoming schemas.
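The schema comparison described above could be sketched roughly as follows. This assumes both schemas are plain `Dict[str, str]` mappings of column name to Athena type, as `columns_types` is; the names `SchemaDiff` and `diff_column_types` are hypothetical and not part of the library, and the two dictionaries are made-up sample data:

```python
from typing import Dict, NamedTuple, Tuple


class SchemaDiff(NamedTuple):
    """Differences between a previously registered schema and an incoming one."""
    added: Dict[str, str]                 # columns only in the incoming schema
    removed: Dict[str, str]               # columns only in the previous schema
    changed: Dict[str, Tuple[str, str]]   # column -> (previous type, incoming type)


def diff_column_types(previous: Dict[str, str], incoming: Dict[str, str]) -> SchemaDiff:
    """Compare two column-name -> type mappings."""
    added = {col: typ for col, typ in incoming.items() if col not in previous}
    removed = {col: typ for col, typ in previous.items() if col not in incoming}
    changed = {
        col: (previous[col], incoming[col])
        for col in previous.keys() & incoming.keys()
        if previous[col] != incoming[col]
    }
    return SchemaDiff(added=added, removed=removed, changed=changed)


# Sample data standing in for a stored table schema and a newly crawled file.
previous = {"id": "bigint", "name": "string", "price": "double"}
incoming = {"id": "bigint", "name": "string", "price": "string", "currency": "string"}

diff = diff_column_types(previous, incoming)
has_new_schema = bool(diff.added or diff.removed or diff.changed)
```

In a crawler, `previous` would come from the catalog and `incoming` from the types extracted from the new file; a non-empty diff signals a new incoming schema.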

Once again, thanks for your help!

@igorborgest igorborgest added documentation Improvement or bugfixes on docs question Further information is requested and removed bug Something isn't working labels Apr 16, 2020