Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make Table.to_dataframe create real sparse frames #809

Merged
merged 4 commits into from
Mar 7, 2019

Conversation

cdiener
Copy link
Contributor

@cdiener cdiener commented Mar 7, 2019

So this is a proposed fix to #808. It basically specifies the fill value for sparse data explicitly. I also added a test to check for that in the future.

Before that the the matrix data was first converted to individual pandas.SparseSeries and passed to pandas.SparseDataFrame. This is now done directly without the intermediate allocations since SparseDataFrame accepts any scipy sparse matrix. If Table.matrix_data can be something else than a numpy matrix or scipy sparse matrix this might not work but for now all tests seem to pass. Happy to change that back to the old behavior if that was on purpose.

As a side effect to_dataframe is now much faster on large sparse data sets. Takes about 20s on my machine for the American Gut biom (was more than 30m before).

@wasade
Copy link
Member

wasade commented Mar 7, 2019 via email

Copy link
Member

@wasade wasade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems reasonable, able to add a mention to the ChangeLog by chance?

@cdiener
Copy link
Contributor Author

cdiener commented Mar 7, 2019

Was introduced in pandas 0.20.0, so should be fine with the current version requirement (>=0.20.0).

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 86.132% when pulling bbf66b3 on cdiener:fix/sparse_to_dataframe into 2932695 on biocore:master.

@coveralls
Copy link

coveralls commented Mar 7, 2019

Coverage Status

Coverage remained the same at 86.438% when pulling f9676d8 on cdiener:fix/sparse_to_dataframe into 2932695 on biocore:master.

@ElDeveloper ElDeveloper merged commit ac2f835 into biocore:master Mar 7, 2019
@ElDeveloper
Copy link
Member

Looks great!

@wasade
Copy link
Member

wasade commented Mar 8, 2019

Thanks @cdiener and @ElDeveloper!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants