rewritten cdap processor #118

Merged
merged 14 commits into from
Oct 6, 2016
Conversation

toliwaga
Contributor

@toliwaga toliwaga commented Oct 6, 2016

Completely rewrote the cdap processor following the strategy outlined in #116: cdap is now reimplemented as a series of batch vectorized calculations.

@fscottfoti
Contributor

Awesome! How much faster is it??

@toliwaga
Contributor Author

toliwaga commented Oct 6, 2016

@fscottfoti - benchmarking it now. I think it is faster. Also, since the previous version didn't implement the max hhsize 5 cutoff, it ran out of memory trying to handle households with lots of members. For a household with 14 members, there are 3^14 = 4,782,969 alternatives, and that many columns is a stretch for pandas!
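
To make the arithmetic concrete, here is a minimal sketch of how the number of alternatives grows with household size, and what a cap of 5 does to it. The pattern labels `M`, `N`, `H` and the helper names `n_alternatives` and `enumerate_alternatives` are assumptions for illustration only, not the processor's actual code; the only facts taken from this thread are three choices per person and the max hhsize of 5.

```python
from itertools import product

# Assumption: three per-person patterns, labelled here M, N, H for illustration.
PATTERNS = ("M", "N", "H")
# The max hhsize cutoff mentioned in this thread.
MAX_HHSIZE = 5


def n_alternatives(hhsize, max_hhsize=None):
    """Count joint alternatives for a household, optionally capping its size."""
    n = hhsize if max_hhsize is None else min(hhsize, max_hhsize)
    return len(PATTERNS) ** n


def enumerate_alternatives(hhsize, max_hhsize=MAX_HHSIZE):
    """List joint alternatives as strings, one character per modelled member."""
    n = min(hhsize, max_hhsize)
    return ["".join(combo) for combo in product(PATTERNS, repeat=n)]


if __name__ == "__main__":
    print(n_alternatives(14))              # uncapped: 3**14
    print(n_alternatives(14, MAX_HHSIZE))  # capped at 5 members: 3**5
    print(enumerate_alternatives(2))       # small enough to eyeball
```

With the cap, even a 14-person household is enumerated over at most 3^5 = 243 alternatives. How members beyond the fifth are handled is not covered by this sketch.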

@coveralls

Coverage Status

Coverage decreased (-0.9%) to 95.773% when pulling c826458 on cdap2 into 72034da on master.

@toliwaga toliwaga merged commit 83e4dad into master Oct 6, 2016
@bstabler bstabler deleted the cdap2 branch November 4, 2016 23:20