Provide Initial Guess to Regression Fitters #661

Closed
bacalfa opened this issue Mar 5, 2019 · 12 comments · Fixed by #663

@bacalfa

bacalfa commented Mar 5, 2019

It would be great to be able to provide an initial guess point (warm-start) to the regression fitters, such as WeibullAFTFitter. I'm referring to this line:

init_values = np.zeros((n_params,))

I've been comparing this particular fitter to R's survreg, and for some datasets, their solutions don't agree at all. I'd like to provide the same initial values to both codes and hopefully get the same solution.
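
For readers following along, here is a minimal sketch of what a warm start could look like for an intercept-only Weibull model with right censoring. It is not lifelines' optimizer: the function names, the crude backtracking gradient ascent, and the `(log_lambda, log_rho)` parameterization are all assumptions made for illustration. The keyword name `initial_point` is borrowed from a later comment in this thread.

```python
import math


def log_likelihood(params, durations, events):
    """Right-censored Weibull log-likelihood; params = (log_lambda, log_rho)."""
    log_lam, log_rho = params
    try:
        rho = math.exp(log_rho)
        total = 0.0
        for t, observed in zip(durations, events):
            z = rho * (math.log(t) - log_lam)  # log of (t / lambda) ** rho
            if observed:
                total += log_rho + z - math.log(t)  # log hazard at t
            total -= math.exp(z)  # log survival at t
        return total
    except OverflowError:
        return -math.inf  # treat numerically wild points as infeasible


def gradient(params, durations, events):
    """Analytic gradient of log_likelihood with respect to (log_lambda, log_rho)."""
    log_lam, log_rho = params
    rho = math.exp(log_rho)
    n_events = sum(1 for observed in events if observed)
    sum_exp = sum_z_exp = sum_z_events = 0.0
    for t, observed in zip(durations, events):
        z = rho * (math.log(t) - log_lam)
        sum_exp += math.exp(z)
        sum_z_exp += z * math.exp(z)
        if observed:
            sum_z_events += z
    return [rho * (sum_exp - n_events), n_events + sum_z_events - sum_z_exp]


def fit(durations, events, initial_point=None, max_iter=2000, tol=1e-10):
    """Gradient ascent with step halving; starts at zeros unless initial_point is given."""
    params = [0.0, 0.0] if initial_point is None else [float(p) for p in initial_point]
    if len(params) != 2:
        raise ValueError("initial_point must have two entries: (log_lambda, log_rho)")
    current = log_likelihood(params, durations, events)
    if current == -math.inf:
        raise ValueError("log-likelihood is not finite at the starting point")
    for iteration in range(1, max_iter + 1):
        grad = gradient(params, durations, events)
        step = 1.0
        while step > 1e-12:
            candidate = [p + step * g for p, g in zip(params, grad)]
            value = log_likelihood(candidate, durations, events)
            if value > current:
                break  # accept the first step that goes uphill
            step /= 2.0
        else:
            return params, current, iteration  # no uphill step found
        improvement = value - current
        params, current = candidate, value
        if improvement < tol:
            return params, current, iteration
    return params, current, max_iter


# Made-up data: 1 = observed failure, 0 = suspension (right-censored).
durations = [5.0, 8.0, 12.0, 20.0, 33.0, 41.0, 60.0]
events = [1, 0, 1, 1, 0, 1, 0]
cold = fit(durations, events)  # starts from zeros
warm = fit(durations, events, initial_point=[math.log(40.0), 0.0])
```

Each call returns the final parameters, the log-likelihood reached, and the number of iterations used, so the two starting points can be compared directly. A production fitter would use a far better optimizer; the point here is only where a user-supplied start would enter.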

@CamDavidsonPilon
Owner

Yikes, I'm not pleased that survreg and lifelines give different results. Are you able to share an example dataset here (or privately over email)?

It's easy enough to let users provide initial values, I can add that to the next release shortly.

@bacalfa
Author

bacalfa commented Mar 5, 2019

Thanks for the quick response! Unfortunately, I can't share the data, either publicly or privately. But I'll be happy to report the status after I'm able to use initial points. :) By the way, I've had convergence issues with survreg before in some cases when I didn't provide an initial point. So achieving convergence can be sensitive to initial points.

@CamDavidsonPilon
Owner

Can you describe your dataset more?

  1. What are the dimensions of it?

Some information about the fit, too:

  1. Does lifelines provide a smaller log-likelihood than R survreg? (you can see the log-likelihood using .print_summary() in lifelines)
  2. Are any warnings displayed when .fit is called?
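
One way to make the log-likelihood comparison concrete is to evaluate a single Weibull log-likelihood at both solutions, since reported values can differ by convention between packages. The sketch below assumes an intercept-only model and relies on two parameterizations as I understand them from the respective docs: survreg reports an intercept equal to the log of the Weibull scale and a `scale` equal to one over the Weibull shape, while lifelines' `WeibullAFTFitter` reports log-scale and log-shape intercepts for `lambda_` and `rho_`. The helper names are hypothetical.

```python
import math


def weibull_log_likelihood(lam, rho, durations, events):
    """Right-censored Weibull log-likelihood with scale lam and shape rho."""
    total = 0.0
    for t, observed in zip(durations, events):
        z = rho * math.log(t / lam)  # log of (t / lam) ** rho
        if observed:
            total += math.log(rho) - math.log(t) + z  # log hazard at t
        total -= math.exp(z)  # log survival at t
    return total


def from_survreg(intercept, scale):
    """Map survreg's (Intercept, scale) to Weibull (lam, rho)."""
    return math.exp(intercept), 1.0 / scale


def from_lifelines(lambda_intercept, rho_intercept):
    """Map lifelines' lambda_ and rho_ intercepts to Weibull (lam, rho)."""
    return math.exp(lambda_intercept), math.exp(rho_intercept)


def compare(durations, events, survreg_params, lifelines_params):
    """Return both log-likelihoods on the same scale; the larger is the better fit."""
    ll_r = weibull_log_likelihood(*from_survreg(*survreg_params), durations, events)
    ll_py = weibull_log_likelihood(*from_lifelines(*lifelines_params), durations, events)
    return ll_r, ll_py
```

If the two numbers agree to several decimals, the packages found the same optimum and any remaining disagreement is in how the parameters are reported.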

@CamDavidsonPilon
Owner

flexsurvreg does some smarter initializations using summary statistics. From their docs:

If not specified, default initial values are chosen from a simple summary of the
survival or censoring times, for example the mean is often used to initialize scale
parameters. See the object flexsurv.dists for the exact methods used. If the
likelihood surface may be uneven, it is advised to run the optimisation starting
from various different initial values to ensure convergence to the true global
maximum.
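
The quote does not spell out the formulas, so the following is only a guess at the general idea and not flexsurv's actual method. It builds a starting point for a Weibull fit in a hypothetical `(log_lambda, log_rho)` parameterization: the scale starts at the mean duration, or, if event indicators are supplied, at total time divided by the number of observed events (the exponential MLE under right censoring), and the shape starts at 1.

```python
import math
from statistics import mean


def summary_initial_point(durations, events=None):
    """Return [log_lambda, log_rho] starting values from simple summaries."""
    if events is None:
        # Plain mean of the recorded times, ignoring censoring.
        scale_guess = mean(durations)
    else:
        # Exponential MLE: total time on test over the number of failures.
        n_events = sum(1 for observed in events if observed)
        scale_guess = sum(durations) / max(n_events, 1)  # guard against zero events
    return [math.log(scale_guess), 0.0]  # log_rho = 0 means shape = 1


# Made-up data with mostly censored observations.
start = summary_initial_point([5.0, 8.0, 12.0, 20.0], events=[1, 0, 0, 0])
```

With heavily censored data the two variants can differ a lot, because the plain mean treats censoring times as if they were failure times.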

@bacalfa
Author

bacalfa commented Mar 6, 2019

Thanks! There were some problems with the code I was using. Will report after the experiments are finished. But it's an intercept-only model (no covariates) with datasets that range from fewer than 10 to nearly 100 points (most are suspensions).

@CamDavidsonPilon
Owner

What about using the simpler WeibullFitter, which naturally takes no covariates? It will probably be faster too.

@bacalfa
Author

bacalfa commented Mar 6, 2019

That could be done. It's just that I wanted to compare the AFT fitter with survreg. :) I have other tests with covariates as well.

@CamDavidsonPilon
Owner

CamDavidsonPilon commented Mar 6, 2019

@bacalfa, if you update to 0.20.0 (on PyPI now), please try out the new defaults (i.e., don't provide initial_point) to compare against R.

(FYI, 0.20.0 is Python 3 only and has some updated dependencies too.)

@bacalfa
Author

bacalfa commented Mar 6, 2019

I'm using Anaconda, and the latest version seems to be 0.19.5.

On a related note, survreg offers the option to fix the scale parameter (e.g., scale=1) and to pass parameters to the optimizer via survreg.control. Those are interesting features as well. :)

@CamDavidsonPilon
Owner

I just updated conda

@bacalfa
Author

bacalfa commented Mar 19, 2019

By the way, WeibullAFTFitter was more robust than survreg on my test cases. :) Even without any special initialization. But having the ability to initialize the decision variables is important.

@CamDavidsonPilon
Owner

woohoo! Thanks for reporting back!
