LogNormalFitter has strange properties #622

Closed
CamDavidsonPilon opened this issue Jan 31, 2019 · 1 comment · Fixed by #624
CamDavidsonPilon commented Jan 31, 2019

I've spent far too much time trying to debug the LNF model. Trouble is, it's unstable for some inputs. Example:

  1. As the following script shows, the model is only stable for roughly 0.06 < sigma < 3.5. Anything outside this range will cause the dreaded "Desired error not necessarily achieved due to precision loss" error in minimize.
import numpy as np
import matplotlib.pyplot as plt

from lifelines import LogNormalFitter
from lifelines.utils import ConvergenceError

MU = np.linspace(-10, 10, 5)
SIGMA_ = np.linspace(0.0001, 6, 25)
R = np.zeros((5, 25))  # 1 = converged, 0 = raised ConvergenceError

N = 20000

for i, mu_ in enumerate(MU):
    for j, sigma_ in enumerate(SIGMA_):
        try:
            print(mu_, sigma_)

            # X: true event times, C: censoring times, both log-normal
            X, C = np.exp(sigma_ * np.random.randn(N) + mu_), np.exp(np.random.randn(N) + mu_)
            E = X <= C            # event observed before censoring
            T = np.minimum(X, C)  # observed duration

            LogNormalFitter().fit(T, E)
            R[i, j] = 1
        except ConvergenceError:
            R[i, j] = 0

plt.matshow(R)
plt.xticks(np.arange(25), SIGMA_)
plt.yticks(np.arange(5), MU)

[screenshot: plt.matshow heatmap of convergence successes over the MU × SIGMA_ grid]

  2. AFAIK, the log-likelihood and the gradients are computed correctly, though it would be useful to have a second set of eyes on them. Even scipy's check_grad seems to confirm (a self-contained version of this check is sketched after this list):
print(check_grad(_negative_log_likelihood, gradient_function, [0, 0], log(T), E))
  3. When there is no censoring, the model converges for all values...

  4. Adding a penalizer doesn't seem to help.

  5. When I power-transform the durations by the inverse standard deviation of log(T), this seems to help convergence; however, I can't get back the original parameters.

  6. Not passing the gradient function as jac seems to help! That is:

minimize(_negative_log_likelihood, init, args=(log(T), E), method='BFGS')

converges, but

minimize(_negative_log_likelihood, init, args=(log(T), E), method='BFGS', jac=gradient_function)

seems to fail.
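
For anyone who wants to reproduce the gradient check in point 2, here is a minimal self-contained sketch. The (mu, log(sigma)) parameterization, the function names, and the simulated data are my assumptions for illustration, not lifelines' internals:

import numpy as np
from scipy.optimize import check_grad
from scipy.stats import norm

def negative_log_likelihood(params, log_T, E):
    # params = (mu, log(sigma)); optimizing log(sigma) keeps sigma positive
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # events contribute the normal log-pdf of log(T); censored observations
    # contribute the log-survival function, evaluated stably via logsf
    ll = np.where(E, norm.logpdf(log_T, mu, sigma), norm.logsf(log_T, mu, sigma))
    return -ll.sum()

def gradient_function(params, log_T, E):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    Z = (log_T - mu) / sigma
    h = np.exp(norm.logpdf(Z) - norm.logsf(Z))  # hazard of the standard normal
    d_mu = np.where(E, Z / sigma, h / sigma)
    d_log_sigma = np.where(E, Z ** 2 - 1, h * Z)
    return -np.array([d_mu.sum(), d_log_sigma.sum()])

# simulate censored data as in the script above
N = 20000
mu_, sigma_ = 1.0, 0.75
X = np.exp(sigma_ * np.random.randn(N) + mu_)
C = np.exp(np.random.randn(N) + mu_)
E = X <= C
log_T = np.log(np.minimum(X, C))

# a value that is small relative to the gradient's magnitude suggests the
# analytic gradient agrees with the finite-difference approximation
print(check_grad(negative_log_likelihood, gradient_function, np.array([0.0, 0.0]), log_T, E))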

CamDavidsonPilon (Owner) commented:

I removed the scaling in the exp domain, and this seemed to really help. Only very small values of sigma will cause problems.
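
As an illustration of why staying out of the exp domain matters (my reading of the comment above; the authoritative change is in #624): the censored term of the log-likelihood is log S(t), and any exp/re-log round trip underflows deep in the tail, which is exactly the kind of precision loss minimize complains about:

import numpy as np
from scipy.stats import norm

z = 40.0                    # a standardized residual far into the tail
print(np.log(norm.sf(z)))   # -inf: sf(40) underflows to 0 in float64
print(norm.logsf(z))        # about -804.6: computed stably in the log domain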
