Parametric regression models may have been misreporting results pre-24.0 #931

Closed
CamDavidsonPilon opened this issue Jan 21, 2020 · 4 comments · Fixed by #932 or #939

Comments

@CamDavidsonPilon
Owner

CamDavidsonPilon commented Jan 21, 2020

TLDR: if you have been using a custom parametric regression model, or GeneralizedGammaRegressionFitter, you should update your code ASAP. Users of parametric AFT models are fine.


It was discovered today that the parameters estimated in a parametric regression model might not be associated with the correct label (i.e. variable name). In practice, when a user called print_summary, the estimates themselves were correct but could be displayed beside the wrong labels. This problem also extends to any prediction or plotting methods.

The root problem was a reordering of a dictionary in the autograd flatten function. Ex:

from autograd.misc import flatten
d = {'b': 1., 'a': 2., 'c': 0.}
array, unflatten = flatten(d)
print(array)
# ordered like a, b, c

lifelines should have performed an unflatten on the result to recover the original ordering, but it didn't.
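
To make the failure mode concrete, here is a minimal pure-Python sketch. `flatten_sorted` is a hypothetical stand-in that only mimics the key-sorted ordering shown above; it is not autograd's implementation, and the labelling code is not lifelines' actual code.

```python
def flatten_sorted(params):
    """Hypothetical stand-in: flattens a dict of floats in sorted-key order."""
    keys = sorted(params)
    vector = [params[k] for k in keys]

    def unflatten(vec):
        # Rebuild the dict using the same sorted key order used to flatten.
        return dict(zip(keys, vec))

    return vector, unflatten


params = {"b": 1.0, "a": 2.0, "c": 0.0}
vector, unflatten = flatten_sorted(params)

# Buggy labelling: assumes the vector follows the dict's insertion order.
mislabelled = dict(zip(params.keys(), vector))

# Safe labelling: let unflatten map each value back to its own name.
relabelled = unflatten(vector)

print(mislabelled)  # {'b': 2.0, 'a': 1.0, 'c': 0.0} -> 'a' and 'b' are swapped
print(relabelled == params)  # True
```

The values are all present in both cases; only the name-to-value pairing differs, which is why the summary looked plausible while being wrong.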

It was first discovered in the documentation's example of the three-parameter cure model (for which there are no tests, since it was only an example). Thankfully, almost all of the parametric regression fitters implemented in lifelines have tests against other survival libraries, and they are fine. However, GeneralizedGammaRegressionFitter does not have tests against other libraries, and could be reporting incorrect values.

@CamDavidsonPilon
Owner Author

I've added some peripheral tests, and an assert in the code, that should prevent this from happening again.

@CamDavidsonPilon
Owner Author

I think this problem has reappeared, but now with the .variance_matrix_, which will affect the std. errors reported.
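
For illustration only: if a variance matrix is computed in one label order (say, sorted) while the summary uses another, both its rows and its columns need the same permutation. The helper `reorder_matrix` and the numbers below are hypothetical, not lifelines' code or its actual fix.

```python
def reorder_matrix(matrix, current_labels, target_labels):
    """Permute rows and columns of a square matrix from one label order to another."""
    idx = [current_labels.index(label) for label in target_labels]
    return [[matrix[i][j] for j in idx] for i in idx]


# Made-up variance matrix whose rows/columns are in sorted-label order.
sorted_labels = ["a", "b", "c"]
variance = [
    [0.20, 0.01, 0.00],
    [0.01, 0.10, 0.00],
    [0.00, 0.00, 0.30],
]

# Order in which the parameters are reported to the user.
report_labels = ["b", "a", "c"]
aligned = reorder_matrix(variance, sorted_labels, report_labels)

# The diagonal now lines up with report_labels: var(b), var(a), var(c).
print([aligned[i][i] for i in range(3)])  # [0.1, 0.2, 0.3]
```

Without the permutation, the standard error shown for 'b' would actually belong to 'a', even if the point estimates were already labelled correctly.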

@CamDavidsonPilon CamDavidsonPilon changed the title Parametric regression models may have been misreporting results Parametric regression models may have been misreporting results pre-24.0 Jul 20, 2020
@StefanoBandera123

Hi Cameron, I am running your example on customer churn https://github.com/CamDavidsonPilon/lifelines/blob/master/examples/Customer%20Churn.ipynb .

It seems I get this type of error when I call the fitter:
cph = CoxPHFitter().fit(churn_data, 'tenure', 'Churn', strata=strata_cols)

"ValueError: could not convert string to float: 'Female' "

How should I change the code?

@CamDavidsonPilon
Owner Author

Hi @StefanoBandera123 - should be fixed now

@CamDavidsonPilon CamDavidsonPilon unpinned this issue Dec 9, 2020