Speeding up Aalen Additive Regression #421
Comments
Four days!? That is far too long. I'll look into this for the next release
https://stats.stackexchange.com/questions/83272/fastest-way-to-run-ridge-regression-on-large-datasets-where-np makes me think that this is a problem with my BLAS/LAPACK. Do you want me to check what they are?
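A quick way to check which BLAS/LAPACK libraries back your installation (a minimal sketch; the exact output format depends on your numpy/scipy build):

```python
# Inspect which BLAS/LAPACK implementations numpy and scipy link against.
import numpy as np
import scipy

print("numpy", np.__version__, "| scipy", scipy.__version__)
np.show_config()      # prints the BLAS/LAPACK libraries numpy was built with
scipy.show_config()   # same information for scipy
```

If this reports a reference (unoptimized) BLAS rather than OpenBLAS, MKL, or ATLAS, the least-squares step can be dramatically slower.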
What sort of debugging information would you need from me? How can I help you?
The server I was running this on: I took the spec/versions I'm using from it. Let me know if that helps :)
In https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/fitters/aalen_additive_fitter.py#L188 is this correct? It seems to me it's predicting per row in the table, not per time event.
OK, I'll close that bugfix. Any idea how to speed this up, Cameron? 4 days isn't acceptable.
For now, just sample down to a smaller number of observations.
I've tried that, but it's not really suitable for my problem. Thanks though :) Let me know if I can help in any way :)
Looking at the profile of the code, most of the time is spent in solving the least-squares problem. I've found some modest performance increases there (~50% faster for tall datasets), slated for the next release.
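To illustrate why tall least-squares problems leave room for this kind of speedup (my own sketch, not the actual lifelines change): for a ridge problem with n >> p, solving the p x p normal equations is far cheaper than a QR-based `lstsq` on the full n x p matrix, and both give the same answer.

```python
# Ridge least squares two ways: normal equations vs. lstsq on the
# augmented system. For tall X the normal-equations route is much faster.
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n, p, alpha = 200_000, 6, 0.01
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.1, size=n)

# Route 1: solve (X'X + alpha*I) b = X'y, a tiny p x p SPD system.
A = X.T @ X + alpha * np.eye(p)
b = linalg.solve(A, X.T @ y, assume_a="pos")

# Route 2: lstsq on the ridge-augmented (n+p) x p system, for reference.
X_aug = np.vstack([X, np.sqrt(alpha) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
b_ref = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]

assert np.allclose(b, b_ref)  # identical solutions, very different cost
```

Forming X'X does square the condition number, so this trade-off only makes sense when the design matrix is well conditioned, as small-p survival designs usually are.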
Ahh cool. Could you use gradient boosting?
I don't know what gradient boosting would solve here. The LR step is part of the inference algorithm.
Just so I understand, are you saying that gradient boosting wouldn't work for the inference step?
That's correct.
Ahh, that makes sense. It looks like your solving-step change is the right approach. I suspect there are other performance improvements to be found. Looking forward to the new release :)
Hi, I've been working on a project for a few months now, and one problem I have is that it can take about 4 days to run on 340k rows with about 6 features.
I know lifelines isn't necessarily designed for this, and I've discovered that the ridge regression solve step is the biggest bottleneck: about 60% of the compute time happens there.
Are there alternative algorithms I could use instead of the ridge regression, say mini-batch methods?
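For context on the mini-batch idea: here is a minimal sketch of mini-batch ridge regression using scikit-learn's `SGDRegressor` with an L2 penalty. This only illustrates the general technique on synthetic data; it is not how lifelines' Aalen additive fitter performs its inference step.

```python
# Mini-batch ridge regression via stochastic gradient descent:
# stream the data in chunks and update the coefficients incrementally.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
n, p = 100_000, 6
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.1, size=n)

sgd = SGDRegressor(penalty="l2", alpha=1e-4, random_state=0)
batch = 10_000
for start in range(0, n, batch):
    # partial_fit updates the model from one mini-batch at a time,
    # so the full dataset never needs to fit in a single solve.
    sgd.partial_fit(X[start:start + batch], y[start:start + batch])

print(sgd.coef_)  # approximate ridge coefficients
```

The trade-off is that SGD gives an approximate solution and needs learning-rate tuning, whereas the direct least-squares solve inside the fitter is exact.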