
Speeding up Aalen Additive Regression #421

Closed
springcoil opened this issue Mar 8, 2018 · 15 comments · Fixed by #604


@springcoil

Hi, I've been working on a project for a few months now, and one problem I have is that fitting can take about 4 days on 340k rows with about 6 features.

I know lifelines isn't necessarily designed for this scale, and I've discovered that the ridge regression solve step is the biggest bottleneck: about 60% of the compute time is spent there.

Are there alternative algorithms I could use instead of the ridge regression, say something mini-batch?
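
For reference, here's roughly how I confirmed where `fit()` spends its time (a sketch; `df`, `"T"`, and `"E"` are placeholders for my actual dataframe and its duration/event columns):

```python
import cProfile
import pstats

from lifelines import AalenAdditiveFitter

# Sketch: profile the fit to see where the time goes. `df`, "T", and "E"
# stand in for the real dataframe and its duration/event columns.
aaf = AalenAdditiveFitter(coef_penalizer=0.5)
cProfile.runctx('aaf.fit(df, duration_col="T", event_col="E")',
                globals(), locals(), "fit.prof")
pstats.Stats("fit.prof").sort_stats("cumulative").print_stats(15)
```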

@CamDavidsonPilon
Owner

Four days!? That is beyond inappropriate. I'll look into this for the next release.

@springcoil
Author

https://stats.stackexchange.com/questions/83272/fastest-way-to-run-ridge-regression-on-large-datasets-where-np makes me think this might be a problem with my BLAS/LAPACK. Do you want me to check which ones I'm using?

@springcoil
Author

What sort of debugging information would you need from me? How can I help you?

@springcoil
Author

springcoil commented Mar 8, 2018

Here are the specs/versions from the server I was running this on. Let me know if that helps :)

NumPy version:     1.11.3
Python version:    2.7.12 | packaged by conda-forge | (default, Sep  8 2016, 14:22:31) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Platform:          linux2-x86_64
AMD/Intel CPU?     True
VML available?     False
Number of threads used by default: 8 (out of 24 detected cores)

>>> np.__config__.show() 

lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/net/DataScience_public/jupyter_virtual_envs/python2/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/net/DataScience_public/jupyter_virtual_envs/python2/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/net/DataScience_public/jupyter_virtual_envs/python2/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/net/DataScience_public/jupyter_virtual_envs/python2/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_mkl_info:
  NOT AVAILABLE

@springcoil
Author

Is https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/fitters/aalen_additive_fitter.py#L188 correct? It seems to me it's predicting per row in the table, not per time event.

@CamDavidsonPilon
Owner

> It seems to me it's predicting per row in the table, not per time event.

The for loop iterates over the times: https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/fitters/aalen_additive_fitter.py#L170
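
To make that concrete, here's a simplified sketch (not the exact lifelines code) of the step at each event time: the event indicators for the at-risk rows are regressed on their covariates with an L2 penalty, and the solution is the increment of the cumulative coefficients at that time.

```python
import numpy as np

def ridge_increment(X_t, y_t, penalizer=0.5):
    # Simplified sketch of one time step: regress the event indicator
    # y_t on the at-risk covariates X_t with an L2 penalty; the solution
    # is the increment of the cumulative regression coefficients at t.
    d = X_t.shape[1]
    A = np.dot(X_t.T, X_t) + penalizer * np.eye(d)
    b = np.dot(X_t.T, y_t)
    return np.linalg.solve(A, b)
```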

@springcoil
Author

OK, I'll drop that bug report. Any idea how to speed this up, Cameron? 4 days isn't adequate.

@CamDavidsonPilon
Owner

For now, just sample down to a smaller number of observations as a stopgap.
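
Something along these lines (a sketch; `df`, `"T"`, and `"E"` stand in for your dataframe and its duration/event columns):

```python
from lifelines import AalenAdditiveFitter

# Sketch: fit on a random subsample while the performance work lands.
# `df`, "T", and "E" are placeholders for your data and column names.
sub = df.sample(n=50000, random_state=42)
aaf = AalenAdditiveFitter(coef_penalizer=0.5)
aaf.fit(sub, duration_col="T", event_col="E")
```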

@springcoil
Author

springcoil commented Mar 12, 2018

I've tried that, but it's not really suitable for my problem. Thanks though :) Let me know if I can help in any way :)

@CamDavidsonPilon
Owner

CamDavidsonPilon commented Mar 27, 2018

Looking at the profile of the code, most of the time is spent solving the least-squares problem. I've found some modest performance increases there (~50% faster for tall datasets), slated for the next release.
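
For the curious: one way to speed up a penalized least-squares solve on tall matrices (n >> d) is to reduce it to a d x d solve via the normal equations, instead of running a generic least-squares routine on the full n x d system. A sketch of the idea (not necessarily the exact change that will ship):

```python
import numpy as np

def ridge_lstsq(X, y, penalizer):
    # Generic route: augment X with penalty rows and run lstsq on the
    # full (n + d) x d system. Cost scales with n.
    d = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(penalizer) * np.eye(d)])
    y_aug = np.concatenate([y, np.zeros(d)])
    return np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]

def ridge_normal_eq(X, y, penalizer):
    # Tall-data route: form the d x d Gram matrix once and solve.
    # Much faster when n >> d, at the cost of squaring the condition
    # number of the problem.
    d = X.shape[1]
    A = np.dot(X.T, X) + penalizer * np.eye(d)
    return np.linalg.solve(A, np.dot(X.T, y))
```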

@springcoil
Author

springcoil commented Mar 27, 2018 via email

@CamDavidsonPilon
Owner

I don't know what gradient boosting would solve here. The linear regression step is part of the inference algorithm itself.

@springcoil
Author

Just so I understand, are you saying that gradient boosting wouldn't work for the inference step?

@CamDavidsonPilon
Owner

That's correct.

@springcoil
Author

Ah, that makes sense. It looks like your change to the solving step is the right approach. I suspect there are other performance improvements to be found. Looking forward to the new release :)
