use the category type for performance wins #30

Merged: 1 commit, Jan 19, 2017

Eh2406 commented Dec 20, 2016

I just came across Categorical Data in pandas, and this blog post on how it dramatically improves performance on data with text categories.

This is not yet tested, but I think it makes sense. What do you think? Are there other places that need it?
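For readers unfamiliar with the dtype, here is a minimal sketch of the conversion being discussed. It is not the code from this pull request: the column values and variable names are made up for illustration, and only `Series.astype("category")`, `Series.memory_usage(deep=True)`, and the `.cat` accessor are real pandas API.

```python
import pandas as pd

# Hypothetical data: a text column with only a few distinct values,
# each repeated many times.
labels = pd.Series(["residential", "commercial", "industrial"] * 100_000)

# Store the same values as the category dtype: small integer codes
# plus a lookup table of the distinct strings.
as_category = labels.astype("category")

# Memory footprint in bytes, including the string data itself.
text_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print("text column:    ", text_bytes, "bytes")
print("category column:", category_bytes, "bytes")

# The lookup table and the per-row integer codes.
print(list(as_category.cat.categories))
print(as_category.cat.codes.head())
```

The values themselves are unchanged by the conversion; only the storage differs, which is where any memory or speed gain would come from.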

Eh2406 commented Dec 21, 2016

Travis is complaining because diff has been removed.

Our tests show that this works but does not make a huge difference.
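One way to check a claim like this is to time the same operation on a text column and on its categorical copy. The sketch below is an assumption about how such a comparison could look, not the test that was actually run here; the data frame, column names, and `total_by_kind` helper are hypothetical.

```python
import timeit

import pandas as pd

# Hypothetical frame: a low-cardinality text column and a numeric column.
df = pd.DataFrame({
    "kind": ["residential", "commercial", "industrial"] * 100_000,
    "value": range(300_000),
})

# Same data, with the text column stored as the category dtype.
df_cat = df.assign(kind=df["kind"].astype("category"))


def total_by_kind(frame):
    """Sum `value` per `kind`; observed=True skips unused categories."""
    return frame.groupby("kind", observed=True)["value"].sum()


# Time a few repetitions of the same group-by on each frame.
t_text = timeit.timeit(lambda: total_by_kind(df), number=5)
t_cat = timeit.timeit(lambda: total_by_kind(df_cat), number=5)
print(f"text: {t_text:.3f}s  category: {t_cat:.3f}s")

# Both versions should produce the same totals.
results_match = total_by_kind(df).tolist() == total_by_kind(df_cat).tolist()
print("results match:", results_match)
```

How large the gap is depends on the data size, the number of distinct values, and the operation, so a small difference on one workload is plausible.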

@coveralls

Coverage remained the same at 81.257% when pulling 5295cf0 on SEMCOG:use_category into 6d2f3dc on UDST:master.

@janowicz

Good to know about the category data type. I can definitely see this being useful. Does this particular change still make sense even though there's not much performance difference?

Eh2406 commented Jan 18, 2017

Sorry, just got back from a trip.

I don't think it matters much, but I think it makes sense: the point of the function is to make categories, so they may as well be "Categorical."

@janowicz janowicz merged commit 554788a into UDST:master Jan 19, 2017
@Eh2406 Eh2406 deleted the use_category branch January 19, 2017 16:56