
BetaGeoBetaBinomFitter.fit() not converging #259

Open
cereusperuv opened this issue Mar 21, 2019 · 21 comments
Labels: version issues (code changing and breaking when versions change and so forth...)

Comments

@cereusperuv

Any ideas what might be going on? Or more likely, what I've done wrong... Main parts of the error below. Thanks!

Raised by BaseFitter._fit():

ConvergenceError: The model did not converge. Try adding a larger penalizer to see if that helps convergence.

print(output) in BaseFitter._fit():

      fun: nan
 hess_inv: array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
      jac: array([nan, nan, nan, nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 1
      nit: 0
     njev: 1
   status: 2
  success: False
        x: array([0.1, 0.1, 0.1, 0.1])

Autograd warnings:

C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in log
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in logaddexp
  return f_raw(*args, **kwargs)

Numpy warning:

C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\numpy\core\fromnumeric.py:83: RuntimeWarning: invalid value encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
@CamDavidsonPilon
Owner

Hey @cereusperuv, thanks for the report. Is it possible for you to send the dataset over (email is fine for me), or can you answer the following:

  1. what is the length of the dataset?
  2. What is the maximum value of frequency?
  3. Also, does adding a small penalizer help, e.g. BetaGeoBetaBinomFitter(penalizer_coef=0.001)?

@cereusperuv
Author

Hi @CamDavidsonPilon and thanks for your prompt reply. I will send you my dataset shortly, just have to edit it a tad first. Meanwhile, I can tell you that

  1. The pre-processed dataset, of the form (id, frequency, recency, T), has length 24308.
  2. The maximum frequency is 152.
  3. A small penalizer does not alter the situation. I have tried the values 0.001, 0.01, 0.1, 1.

@CamDavidsonPilon
Copy link
Owner

CamDavidsonPilon commented Mar 21, 2019

Thanks.

Try grouping values together, ex:

df_ = df.groupby(["frequency", "recency", "periods"]).size().reset_index()

BetaGeoBetaBinomFitter().fit(df_['frequency'], df_['recency'], df_['periods'], weights=df_[0])

@cereusperuv
Author

cereusperuv commented Mar 21, 2019

Grouping leads to a never-ending stream of new autograd warning messages of this type:

C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: overflow encountered in power
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in multiply
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in multiply
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in subtract
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in logaddexp
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\numpy\numpy_vjps.py:59: RuntimeWarning: invalid value encountered in multiply
  lambda ans, x, y : unbroadcast_f(x, lambda g: g * y * x ** anp.where(y, y - 1, 1.)),
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\scipy\special.py:14: RuntimeWarning: invalid value encountered in subtract
  lambda ans, a, b: unbroadcast_f(b, lambda g: g * ans * (psi(b) - psi(a + b))))
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\scipy\special.py:20: RuntimeWarning: invalid value encountered in double_scalars
  lambda ans, a, b: unbroadcast_f(b, lambda g: g * (psi(b) - psi(a + b))))
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\scipy\special.py:20: RuntimeWarning: invalid value encountered in subtract
  lambda ans, a, b: unbroadcast_f(b, lambda g: g * (psi(b) - psi(a + b))))

And here's the print(output):

      fun: nan
 hess_inv: array([[ 0.79984663,  0.03997259, -0.20695784,  0.07178192],
       [ 0.03997259,  2.34117064, -0.12746917,  1.19970841],
       [-0.20695784, -0.12746917,  0.80712602, -0.07767407],
       [ 0.07178192,  1.19970841, -0.07767407,  2.0667213 ]])
      jac: array([nan, nan, nan, nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 113
      nit: 2
     njev: 113
   status: 2
  success: False
        x: array([ 101.70315342, 1768.09221547, -118.58555294, 1572.85915566])

@CamDavidsonPilon
Owner

Yea, I can reproduce the error locally. I'm playing with it now. If you are stuck, try downgrading to lifetimes 0.10.1

@cereusperuv
Author

cereusperuv commented Mar 21, 2019

Grouping and penalizing (0.001) makes it work without any warnings, but the results are nonsensical. It also takes forever to fit (I didn't time it, but something like 20 minutes on my laptop), whereas BG-NBD fits in less than a second. Thanks, I'll try downgrading.

Edit: Changed "Pareto-NBD" above to "BG-NBD"

@cereusperuv
Author

@CamDavidsonPilon Sorry for taking so long to follow up on this, I have been busy with other work.

There seem to be stability issues with the BG-BB autograd optimizer. For some of my data it does not converge; for other data it converges, but to a nonsensical solution (alpha = beta = delta = gamma); and for some data it converges to a plausible solution that is nonetheless a bad fit, heavily under-estimating repeat buying (or rather over-estimating the number of non-returning customers). Note that the data I have used is real transaction data from my workplace where the only difference between cohorts is acquisition month. Note also that BetaGeoFitter() produces a great fit in a fraction of a second on these datasets.

First of all, I was able to make some progress by changing the periodicity from 'D' to 'W'. With 'D', I never observed any convergence at all, so everything I discuss here is based on weekly data.

Some patterns that I have observed:

  • The model converges better the newer the cohort is.
  • It converges better on ungrouped data (without using 'weights')
  • Adding a penalizer has an effect, and there seems to be a sweet spot where, if the fit converges, the result is the best obtainable, although still not a good fit.

I will now move on and build a simpler version of the BG-BB without the autograd optimizer, and also look into how to suitably parallelize the computations in our Hadoop system. I will follow this thread though and would of course be delighted if there is a resolution.

On another note, plot_period_transactions() generated an error for me when used with a fitted BG-BB model. The fix that worked was to modify line 169 in beta_geo_beta_binom_fitter.py from np.array(sum([n_] * n_cust for (n_, n_cust) in zip(n_periods, weights))) to np.array(sum([[n_] * n_cust for (n_, n_cust) in zip(n_periods, weights)], [])). But maybe I am not supposed to use plot_period_transactions() with the BG-BB model? Another issue is that, when used with grouped data, the weights are not accounted for in line 63.
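
A tiny illustration of why the extra [] in that fix matters (hypothetical values): sum() with its default start value of 0 cannot concatenate lists, whereas sum(list_of_lists, []) flattens them into a single list.

import numpy as np

n_periods = [2, 3]   # hypothetical values
weights = [2, 1]
# Without the inner brackets and the [] start value, sum() tries to add lists to the
# integer 0 and raises a TypeError; with them, the per-customer lists are concatenated.
flattened = np.array(sum([[n_] * n_cust for (n_, n_cust) in zip(n_periods, weights)], []))
print(flattened)   # [2 2 3]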

@psygo added the "version issues" label Jul 12, 2019
@amrishan
Contributor

I'm not facing the convergence issue when ModifiedBetaGeoFitter is used instead of BetaGeoFitter. Does adding 1 to freq in the equation fix this issue? I'd appreciate your thoughts.
(Screenshots attached comparing the ModifiedBetaGeoFitter and BetaGeoFitter fits.)
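
A minimal sketch of that comparison, assuming summary is an RFM dataframe with the usual frequency / recency / T columns as elsewhere in this thread (the penalizer value is only an example):

from lifetimes import BetaGeoFitter, ModifiedBetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])
print(bgf)   # fitted r, alpha, a, b

mbgf = ModifiedBetaGeoFitter(penalizer_coef=0.001)
mbgf.fit(summary["frequency"], summary["recency"], summary["T"])
print(mbgf)  # compare the fitted parameters and convergence behaviour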

@pallenmar

pallenmar commented Nov 20, 2019

A friend and I were working through the BG/BB model as well. We wrote a separate version of the log-likelihood which fixes some of the stability issues. I think the problem is in the original _loglikelihood, reproduced here:

    def _loglikelihood(params, x, tx, T):
        warnings.simplefilter(action="ignore", category=FutureWarning)

        """Log likelihood for optimizer."""
        alpha, beta, gamma, delta = params

        betaln_ab = betaln(alpha, beta)
        betaln_gd = betaln(gamma, delta)

        A = betaln(alpha + x, beta + T - x) - betaln_ab + betaln(gamma, delta + T) - betaln_gd

        B = 1e-15 * np.ones_like(T)
        recency_T = T - tx - 1

        for j in np.arange(recency_T.max() + 1):
            ix = recency_T >= j
            B = B + ix * betaf(alpha + x, beta + tx - x + j) * betaf(gamma + 1, delta + tx + j)

        B = log(B) - betaln_gd - betaln_ab
        return logaddexp(A, B)

The calculation of B at the very end can cause instability. In our case, very large parameter values allowed the log likelihood to grow without bound, i.e. larger and larger parameter values gave better and better log likelihood. We found that the following fixed this: we moved the subtraction of betaln_ab and betaln_gd into the for loop and took an exponential.

# Imports are assumed to match the autograd-based ones used elsewhere in lifetimes;
# plain numpy/scipy equivalents also work if gradients are not needed.
import autograd.numpy as np
from autograd.numpy import log, logaddexp
from autograd.scipy.special import betaln


def loglikelihood(params, x, tx, T):
    """Log likelihood for optimizer."""
    alpha, beta, gamma, delta = params

    betaln_ab = betaln(alpha, beta)
    betaln_gd = betaln(gamma, delta)
    A1 = betaln(alpha + x, beta + T - x)
    A2 = betaln(gamma, delta + T)
    A = A1 - betaln_ab + A2 - betaln_gd

    B = np.zeros_like(T, dtype=float)
    recency_T = T - tx - 1

    for j in np.arange(recency_T.max() + 1):
        ix = recency_T >= j
        # keep the betaln subtractions inside the loop and exponentiate at the end,
        # as described above
        B1 = betaln(alpha + x, beta + tx - x + j)
        B2 = betaln(gamma + 1, delta + tx + j)
        B = B + ix * np.exp(B1 - betaln_gd + B2 - betaln_ab)

    log_B = log(B)
    answer = logaddexp(A, log_B)

    return answer

In addition, I'm not 100% sure, but I think working with log_params in the negative log likelihood and then taking their exponent is what causes the overflow warnings in the printout.
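
A small illustration of that suspicion, with purely hypothetical numbers: if the optimizer works on log-parameters and the likelihood exponentiates them, a single overly large step is enough to overflow to inf.

import numpy as np

log_params = np.array([0.1, 0.1, 710.0, 0.1])  # hypothetical: one log-parameter pushed too high
params = np.exp(log_params)                    # RuntimeWarning: overflow encountered in exp
print(params)                                  # [1.105... 1.105... inf 1.105...]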

@cereusperuv
Author

cereusperuv commented Jan 2, 2020

@pallenmar Sorry for my inactivity, have been working on other projects. Today I tried moving the ab/gd-log-betas inside the for loop like you suggested but I saw no change in convergence with my data. The result is still totally dependent on the amount of regularization. Thanks for your help though!

@Trollgeir

What's the current status of this issue?

@cereusperuv
Author

cereusperuv commented May 14, 2020

@Trollgeir Still unresolved. Not working actively on it anymore.

copybara-service bot pushed a commit to GoogleCloudPlatform/cloud-for-marketing that referenced this issue Aug 20, 2021
Beta-Geometric/Beta-Bernoulli model is a special variant of the
beta-binomial model without the binomial coefficient. It is particularly
efficient for discrete-time analyses. It is considered "experimental" as
the associated "Fitter" in the lifetimes python package is buggy (see
CamDavidsonPilon/lifetimes#259). An override
is provided in lifetimes_ext which utilizes exponentials and natural
logarithms to reduce the observed convergence instability. See
custom_beta_geo_beta_binom_fitter.py for details.
BG/BB literature: https://brucehardie.com/papers/020/fader_et_al_mksc_10.pdf

Change-Id: Icc2701b42bc720635e5d96c9c809c48c0f93e945
GitOrigin-RevId: 9765a1b45087a8b0c05b7eb6070b32d64a81b017
@aniroxxsc

(Quoting the original error report and warnings from @cereusperuv above.)

Try using the summary_data_from_transaction_data function; refer to this page --> https://lifetimes.readthedocs.io/en/latest/Quickstart.html under the heading "Example using transactional datasets":

from lifetimes.datasets import load_transaction_data
from lifetimes.utils import summary_data_from_transaction_data

transaction_data = load_transaction_data()
print(transaction_data.head())
"""
date id
0 2014-03-08 00:00:00 0
1 2014-05-21 00:00:00 1
2 2014-03-14 00:00:00 2
3 2014-04-09 00:00:00 2
4 2014-05-21 00:00:00 2
"""

summary = summary_data_from_transaction_data(transaction_data, 'id', 'date', observation_period_end='2014-12-31')

print(summary.head())
"""
frequency recency T
id
0 0.0 0.0 298.0
1 0.0 0.0 224.0
2 6.0 142.0 292.0
3 0.0 0.0 147.0
4 2.0 9.0 183.0
"""

from lifetimes import BetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(summary['frequency'], summary['recency'], summary['T'])

<lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.85, alpha: 1.86, b: 3.18, r: 0.16>

@diligejy

diligejy commented Jul 13, 2022

Have you ever tested with a non-zero-recency dataframe?
I used to have the same issue; when I tried a dataframe with recency greater than 0, it was resolved.

@diligejy

(Quoting @aniroxxsc's suggestion above to use summary_data_from_transaction_data.)

I resolved it using this solution.

@alphaB787

alphaB787 commented Jul 25, 2022

Have you ever tested with a non-zero-recency dataframe? I used to have the same issue; when I tried a dataframe with recency greater than 0, it was resolved.

Hi,
Thank you. Your idea worked fine for me after I added 1 to the recency and T columns, but I had to add a penalizer as well.
The problem I'm facing now is that the plots aren't good.
The actual output is scaled to the right (the actual bars start from 1 but the predicted ones start from 0).

How did you overcome this problem, please? (Screenshot of the plot attached.)

@diligejy

(Quoting @alphaB787's reply above.)

Have you tried the summary_data_from_transaction_data function provided by the library? You should use that function first to get more accurate results, because there is additional processing logic inside it. If you cannot use it, it would be better to use the method I mentioned earlier.

@alphaB787

(Quoting @diligejy's reply above.)

Thank you for the reply.

No, I haven't used that function; I create the data frame manually.
Could you tell me which method you mentioned earlier?

Because the code you sent looks the same as the summary function provided by the library. Maybe I just didn't notice the differences :D

@diligejy

(Quoting @alphaB787's reply above.)

You can refer to this link.

If you look at the code in the method at this link, you can see the processing logic I mentioned.

@humbertaco

I stumbled upon this same issue some time ago. Downgrading to an earlier version (one that does not use autograd) seems to solve the issue.

Also, a search on Google brought me to this stack overflow page, where it is suggested to use the Nelder-Mead method in the minimize function for another problem. I ran a quick test with lifetimes and indeed the error is gone, although I have not tested the consequences of such a change. Is anyone aware of the implications that changing the minimize method could have?
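
For anyone who wants to try that idea outside of lifetimes' internals, here is a minimal sketch (not the library's own fitting code): it minimizes a negative log likelihood with scipy's Nelder-Mead, reusing the gradient-free loglikelihood function from @pallenmar's comment above, with a simple L2 penalty standing in for lifetimes' penalizer. x, tx and T are assumed to be numpy arrays of frequency, recency and n_periods.

import numpy as np
from scipy.optimize import minimize

# loglikelihood is the gradient-free version defined in @pallenmar's comment above
def negative_log_likelihood(params, x, tx, T, penalizer_coef=0.0):
    # Nelder-Mead is unconstrained, so reject invalid parameter values explicitly
    if np.any(np.asarray(params) <= 0):
        return np.inf
    penalty = penalizer_coef * np.sum(np.asarray(params) ** 2)  # simple L2 penalty (assumption)
    return -np.mean(loglikelihood(params, x, tx, T)) + penalty

result = minimize(
    negative_log_likelihood,
    x0=np.array([1.0, 1.0, 1.0, 1.0]),   # initial (alpha, beta, gamma, delta)
    args=(x, tx, T),
    method="Nelder-Mead",                 # gradient-free, so no NaN gradients from autograd
)
print(result.x, result.fun)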

@ColtAllen

Have you ever tested with a non-zero-recency dataframe? I used to have the same issue; when I tried a dataframe with recency greater than 0, it was resolved.

As previously pointed out, this is not recommended. Those zero-value customers may still be alive and it will throw off the statistical assumptions of the model if they are removed. In fact, it isn't recommended to filter on any of the RFM values except T (the length of the observation period).

@diligejy:

print(summary.head())
   frequency  recency      T
id
0        0.0      0.0  298.0
1        0.0      0.0  224.0
2        6.0    142.0  292.0
3        0.0      0.0  147.0
4        2.0      9.0  183.0

These are large T values. This paper goes into detail on the causes of these same numerical errors for the BG-NBD model and suggests an alternative time unit for T. I've encountered situations where a model trains just fine over shorter time periods but falls apart with the same data when trained on longer time horizons.

Run your data back through calibration_and_holdout_data or summary_data_from_transaction_data and change the freq parameter from 'D' to something like weeks or months, then retrain and let me know how that goes:

https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
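
For example, a minimal sketch of that change, reusing the transaction_data, 'id' and 'date' names from the quickstart example earlier in this thread, so that T ends up measured in weeks rather than days:

from lifetimes.utils import summary_data_from_transaction_data

summary_weekly = summary_data_from_transaction_data(
    transaction_data,
    customer_id_col="id",
    datetime_col="date",
    observation_period_end="2014-12-31",
    freq="W",   # weeks instead of the default 'D'
)
print(summary_weekly.head())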

That same paper also provides suggestions on how to reformulate the log-likelihoods for the BG-NBD and Pareto-NBD models to fix these issues. Lifetimes is no longer being actively maintained, but I've forked this library and have been rebuilding it on a new modeling backend, incorporating the suggestions in that paper. The library is currently in beta; if anyone is interested in contributing, let me know.
