
BetaGeoBetaBinomFitter.fit() not converging #259

Open
cereusperuv opened this issue Mar 21, 2019 · 21 comments
Labels: version issues (code changing and breaking when versions change and so forth...)

Comments

@cereusperuv

Any ideas what might be going on? Or more likely, what I've done wrong... Main parts of the error below. Thanks!

Raised by BaseFitter._fit():

ConvergenceError: The model did not converge. Try adding a larger penalizer to see if that helps convergence.

print(output) in BaseFitter._fit():

      fun: nan
 hess_inv: array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
      jac: array([nan, nan, nan, nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 1
      nit: 0
     njev: 1
   status: 2
  success: False
        x: array([0.1, 0.1, 0.1, 0.1])

Autograd warnings:

C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in log
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in logaddexp
  return f_raw(*args, **kwargs)

Numpy warning:

C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\numpy\core\fromnumeric.py:83: RuntimeWarning: invalid value encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
@CamDavidsonPilon
Owner

Hey @cereusperuv, thanks for the report. Is it possible for you to send the dataset over (email is fine for me), or can you answer the following:

  1. what is the length of the dataset?
  2. What is the maximum value of frequency?
  3. Also, does adding a small penalizer help, e.g. BetaGeoBetaBinomFitter(penalizer_coef=0.001)?

@cereusperuv
Author

Hi @CamDavidsonPilon and thanks for your prompt reply. I will send you my dataset shortly, just have to edit it a tad first. Meanwhile, I can tell you that

  1. The pre-processed dataset, of the form (id, frequency, recency, T), has length 24308.
  2. The maximum frequency is 152.
  3. A small penalizer does not alter the situation. I have tried the values 0.001, 0.01, 0.1, 1.

@CamDavidsonPilon
Copy link
Owner

CamDavidsonPilon commented Mar 21, 2019

Thanks.

Try grouping values together, ex:

df_ = df.groupby(["frequency", "recency", "periods"]).size().reset_index()

BetaGeoBetaBinomFitter().fit(df_['frequency'], df_['recency'], df_['periods'], weights=df_[0])

@cereusperuv
Author

cereusperuv commented Mar 21, 2019

Grouping leads to a never-ending stream of new autograd warning messages of this type:

C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: overflow encountered in power
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in multiply
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in multiply
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in subtract
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\tracer.py:48: RuntimeWarning: invalid value encountered in logaddexp
  return f_raw(*args, **kwargs)
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\numpy\numpy_vjps.py:59: RuntimeWarning: invalid value encountered in multiply
  lambda ans, x, y : unbroadcast_f(x, lambda g: g * y * x ** anp.where(y, y - 1, 1.)),
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\scipy\special.py:14: RuntimeWarning: invalid value encountered in subtract
  lambda ans, a, b: unbroadcast_f(b, lambda g: g * ans * (psi(b) - psi(a + b))))
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\scipy\special.py:20: RuntimeWarning: invalid value encountered in double_scalars
  lambda ans, a, b: unbroadcast_f(b, lambda g: g * (psi(b) - psi(a + b))))
C:\Tools\Anaconda3\envs\TestEnv\lib\site-packages\autograd\scipy\special.py:20: RuntimeWarning: invalid value encountered in subtract
  lambda ans, a, b: unbroadcast_f(b, lambda g: g * (psi(b) - psi(a + b))))

And here's the print(output):

      fun: nan
 hess_inv: array([[ 0.79984663,  0.03997259, -0.20695784,  0.07178192],
       [ 0.03997259,  2.34117064, -0.12746917,  1.19970841],
       [-0.20695784, -0.12746917,  0.80712602, -0.07767407],
       [ 0.07178192,  1.19970841, -0.07767407,  2.0667213 ]])
      jac: array([nan, nan, nan, nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 113
      nit: 2
     njev: 113
   status: 2
  success: False
        x: array([ 101.70315342, 1768.09221547, -118.58555294, 1572.85915566])

@CamDavidsonPilon
Owner

Yea, I can reproduce the error locally. I'm playing with it now. If you are stuck, try downgrading to lifetimes 0.10.1

@cereusperuv
Author

cereusperuv commented Mar 21, 2019

Grouping and penalizing (0.001) makes it work without any warnings, but the results are nonsensical. It also takes forever to fit (I didn't time it, but something like 20 minutes on my laptop), whereas BG-NBD fits in less than a second. Thanks, I'll try downgrading.

Edit: Changed "Pareto-NBD" above to "BG-NBD"

@cereusperuv
Author

@CamDavidsonPilon Sorry for taking so long to follow up on this, I have been busy with other work.

There seem to be stability issues with the BG-BB autograd optimizer. For some of my data it does not converge; for other data it converges, but to a nonsensical solution (alpha = beta = delta = gamma); and for some data it converges to a plausible solution that is nonetheless a bad fit, heavily under-estimating repeat buying (or rather over-estimating the number of non-returning customers). Note that the data I have used is real transaction data from my workplace where the only difference between cohorts is acquisition month. Note also that BetaGeoFitter() produces a great fit in a fraction of a second on these datasets.

First of all, I was able to make some progress by changing the periodicity from 'D' to 'W'. With 'D', I never observed any convergence at all, so everything I discuss here is based on weekly data.

Some patterns that I have observed:

  • The model converges better the newer the cohort is.
  • It converges better on ungrouped data (without using 'weights')
  • Adding a penalizer has an effect, and there seems to be a sweet spot where, if the fit converges, the result is the best obtainable, although still not a good fit.

I will now move on and build a simpler version of the BG-BB without the autograd optimizer, and also look into how to suitably parallelize the computations in our Hadoop system. I will follow this thread though and would of course be delighted if there is a resolution.

On another note, plot_period_transactions() generated an error for me when used with a fitted BG-BB model. The fix that worked was to modify line 169 in beta_geo_beta_binom_fitter.py from np.array(sum([n_] * n_cust for (n_, n_cust) in zip(n_periods, weights))) to np.array(sum([[n_] * n_cust for (n_, n_cust) in zip(n_periods, weights)], [])). But maybe I am not supposed to use plot_period_transactions() with the BG-BB model? Another issue is that, when used with grouped data, the weights are not accounted for in line 63.
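
A tiny illustration of why the extra [] in that fix matters (hypothetical values): sum() with its default start value of 0 cannot concatenate lists, whereas sum(list_of_lists, []) flattens them into a single list.

import numpy as np

n_periods = [2, 3]   # hypothetical values
weights = [2, 1]
# Without the inner brackets and the [] start value, sum() tries to add lists to the
# integer 0 and raises a TypeError; with them, the per-customer lists are concatenated.
flattened = np.array(sum([[n_] * n_cust for (n_, n_cust) in zip(n_periods, weights)], []))
print(flattened)   # [2 2 3]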

@psygo added the "version issues" label Jul 12, 2019
@amrishan
Contributor

I'm not facing the convergence issue when ModifiedBetaGeoFitter is used instead of BetaGeoFitter. Does adding 1 to freq in the equation fix this issue? I'd appreciate your thoughts.
(Screenshots attached comparing the ModifiedBetaGeoFitter and BetaGeoFitter fits.)
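
A minimal sketch of that comparison, assuming summary is an RFM dataframe with the usual frequency / recency / T columns as elsewhere in this thread (the penalizer value is only an example):

from lifetimes import BetaGeoFitter, ModifiedBetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])
print(bgf)   # fitted r, alpha, a, b

mbgf = ModifiedBetaGeoFitter(penalizer_coef=0.001)
mbgf.fit(summary["frequency"], summary["recency"], summary["T"])
print(mbgf)  # compare the fitted parameters and convergence behaviour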

@pallenmar

pallenmar commented Nov 20, 2019

A friend and I were working through the BG/BB model as well. We wrote a separate version of the log-likelihood which fixes some of the stability issues. I think the problem is in the original _loglikelihood, reproduced here:

    def _loglikelihood(params, x, tx, T):
        warnings.simplefilter(action="ignore", category=FutureWarning)

        """Log likelihood for optimizer."""
        alpha, beta, gamma, delta = params

        betaln_ab = betaln(alpha, beta)
        betaln_gd = betaln(gamma, delta)

        A = betaln(alpha + x, beta + T - x) - betaln_ab + betaln(gamma, delta + T) - betaln_gd

        B = 1e-15 * np.ones_like(T)
        recency_T = T - tx - 1

        for j in np.arange(recency_T.max() + 1):
            ix = recency_T >= j
            B = B + ix * betaf(alpha + x, beta + tx - x + j) * betaf(gamma + 1, delta + tx + j)

        B = log(B) - betaln_gd - betaln_ab
        return logaddexp(A, B)

The calculation of B at the very end can cause instability. In our case, very large parameter values allowed the log likelihood to grow without bound, i.e. larger and larger parameter values gave better and better log likelihood. We found that the following fixed this: we moved the subtraction of betaln_ab and betaln_gd into the for loop and took an exponential.

# Imports are assumed to match the autograd-based ones used elsewhere in lifetimes;
# plain numpy/scipy equivalents also work if gradients are not needed.
import autograd.numpy as np
from autograd.numpy import log, logaddexp
from autograd.scipy.special import betaln


def loglikelihood(params, x, tx, T):
    """Log likelihood for optimizer."""
    alpha, beta, gamma, delta = params

    betaln_ab = betaln(alpha, beta)
    betaln_gd = betaln(gamma, delta)
    A1 = betaln(alpha + x, beta + T - x)
    A2 = betaln(gamma, delta + T)
    A = A1 - betaln_ab + A2 - betaln_gd

    B = np.zeros_like(T, dtype=float)
    recency_T = T - tx - 1

    for j in np.arange(recency_T.max() + 1):
        ix = recency_T >= j
        # keep the betaln subtractions inside the loop and exponentiate at the end,
        # as described above
        B1 = betaln(alpha + x, beta + tx - x + j)
        B2 = betaln(gamma + 1, delta + tx + j)
        B = B + ix * np.exp(B1 - betaln_gd + B2 - betaln_ab)

    log_B = log(B)
    answer = logaddexp(A, log_B)

    return answer

In addition, I'm not 100% sure, but I think working with log_params in the negative log likelihood and then taking their exponent is what causes the overflow warnings in the printout.
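
A small illustration of that suspicion, with purely hypothetical numbers: if the optimizer works on log-parameters and the likelihood exponentiates them, a single overly large step is enough to overflow to inf.

import numpy as np

log_params = np.array([0.1, 0.1, 710.0, 0.1])  # hypothetical: one log-parameter pushed too high
params = np.exp(log_params)                    # RuntimeWarning: overflow encountered in exp
print(params)                                  # [1.105... 1.105... inf 1.105...]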

@cereusperuv
Author

cereusperuv commented Jan 2, 2020

@pallenmar Sorry for my inactivity, have been working on other projects. Today I tried moving the ab/gd-log-betas inside the for loop like you suggested but I saw no change in convergence with my data. The result is still totally dependent on the amount of regularization. Thanks for your help though!

@Trollgeir

What's the current status of this issue?

@cereusperuv
Author

cereusperuv commented May 14, 2020

@Trollgeir Still unresolved. Not working actively on it anymore.

copybara-service bot pushed a commit to GoogleCloudPlatform/cloud-for-marketing that referenced this issue Aug 20, 2021
Beta-Geometric/Beta-Bernoulli model is a special variant of the
beta-binomial model without the binomial coefficient. It is particularly
efficient for discrete-time analyses. It is considered "experimental" as
the associated "Fitter" in the lifetimes python package is buggy (see
CamDavidsonPilon/lifetimes#259). An override
is provided in lifetimes_ext which utilizes exponentials and natural
logarithms to reduce the observed convergence instability. See
custom_beta_geo_beta_binom_fitter.py for details.
BG/BB literature: https://brucehardie.com/papers/020/fader_et_al_mksc_10.pdf

Change-Id: Icc2701b42bc720635e5d96c9c809c48c0f93e945
GitOrigin-RevId: 9765a1b45087a8b0c05b7eb6070b32d64a81b017
@aniroxxsc

(Quoting the original error report and warnings from @cereusperuv above.)

Try using the summary_data_from_transaction_data function; refer to this page --> https://lifetimes.readthedocs.io/en/latest/Quickstart.html under the heading "Example using transactional datasets":

from lifetimes.datasets import load_transaction_data
from lifetimes.utils import summary_data_from_transaction_data

transaction_data = load_transaction_data()
print(transaction_data.head())
"""
date id
0 2014-03-08 00:00:00 0
1 2014-05-21 00:00:00 1
2 2014-03-14 00:00:00 2
3 2014-04-09 00:00:00 2
4 2014-05-21 00:00:00 2
"""

summary = summary_data_from_transaction_data(transaction_data, 'id', 'date', observation_period_end='2014-12-31')

print(summary.head())
"""
frequency recency T
id
0 0.0 0.0 298.0
1 0.0 0.0 224.0
2 6.0 142.0 292.0
3 0.0 0.0 147.0
4 2.0 9.0 183.0
"""

from lifetimes import BetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(summary['frequency'], summary['recency'], summary['T'])

<lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.85, alpha: 1.86, b: 3.18, r: 0.16>

@diligejy

diligejy commented Jul 13, 2022

Have you ever tested with a non-zero-recency dataframe?
I used to have the same issue; when I tried a dataframe with recency greater than 0, it was resolved.

@diligejy

(Quoting @aniroxxsc's suggestion above to use summary_data_from_transaction_data.)

I resolved it using this solution.

@alphaB787

alphaB787 commented Jul 25, 2022

Have you ever tested with a non-zero-recency dataframe? I used to have the same issue; when I tried a dataframe with recency greater than 0, it was resolved.

Hi,
Thank you. Your idea worked fine for me after I added 1 to the recency and T columns, but I had to add a penalizer as well.
The problem I'm facing now is that the plots aren't good.
The actual output is scaled to the right (the actual bars start from 1 but the predicted ones start from 0).

How did you overcome this problem, please? (Screenshot of the plot attached.)

@diligejy

(Quoting @alphaB787's reply above.)

Have you tried the summary_data_from_transaction_data function provided by the library? You should use that function first to get more accurate results, because there is additional processing logic inside it. If you cannot use it, it would be better to use the method I mentioned earlier.

@alphaB787

(Quoting @diligejy's reply above.)

Thank you for the reply.

No, I haven't used that function; I create the data frame manually.
Could you tell me which method you mentioned earlier?

Because the code you sent looks the same as the summary function provided by the library. Maybe I just didn't notice the differences :D

@diligejy

(Quoting @alphaB787's reply above.)

You can refer to this link.

If you look at the code in the method at this link, you can see the processing logic I mentioned.

@humbertaco

I stumbled upon this same issue some time ago. Downgrading to an earlier version (one that does not use autograd) seems to solve the issue.

Also, a search on Google brought me to this stack overflow page, where it is suggested to use the Nelder-Mead method in the minimize function for another problem. I ran a quick test with lifetimes and indeed the error is gone, although I have not tested the consequences of such a change. Is anyone aware of the implications that changing the minimize method could have?
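
For anyone who wants to try that idea outside of lifetimes' internals, here is a minimal sketch (not the library's own fitting code): it minimizes a negative log likelihood with scipy's Nelder-Mead, reusing the gradient-free loglikelihood function from @pallenmar's comment above, with a simple L2 penalty standing in for lifetimes' penalizer. x, tx and T are assumed to be numpy arrays of frequency, recency and n_periods.

import numpy as np
from scipy.optimize import minimize

# loglikelihood is the gradient-free version defined in @pallenmar's comment above
def negative_log_likelihood(params, x, tx, T, penalizer_coef=0.0):
    # Nelder-Mead is unconstrained, so reject invalid parameter values explicitly
    if np.any(np.asarray(params) <= 0):
        return np.inf
    penalty = penalizer_coef * np.sum(np.asarray(params) ** 2)  # simple L2 penalty (assumption)
    return -np.mean(loglikelihood(params, x, tx, T)) + penalty

result = minimize(
    negative_log_likelihood,
    x0=np.array([1.0, 1.0, 1.0, 1.0]),   # initial (alpha, beta, gamma, delta)
    args=(x, tx, T),
    method="Nelder-Mead",                 # gradient-free, so no NaN gradients from autograd
)
print(result.x, result.fun)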

@ColtAllen

Have you ever tested with a non-zero-recency dataframe? I used to have the same issue; when I tried a dataframe with recency greater than 0, it was resolved.

As previously pointed out, this is not recommended. Those zero-value customers may still be alive and it will throw off the statistical assumptions of the model if they are removed. In fact, it isn't recommended to filter on any of the RFM values except T (the length of the observation period).

@diligejy:

print(summary.head())
   frequency  recency      T
id
0        0.0      0.0  298.0
1        0.0      0.0  224.0
2        6.0    142.0  292.0
3        0.0      0.0  147.0
4        2.0      9.0  183.0

These are large T values. This paper goes into detail on the causes of these same numerical errors for the BG-NBD model and suggests an alternative time unit for T. I've encountered situations where a model trains just fine over shorter time periods but falls apart with the same data when trained on longer time horizons.

Run your data back through calibration_and_holdout_data or summary_data_from_transaction_data and change the freq parameter from 'D' to something like weeks or months, then retrain and let me know how that goes:

https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
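
For example, a minimal sketch of that change, reusing the transaction_data, 'id' and 'date' names from the quickstart example earlier in this thread, so that T ends up measured in weeks rather than days:

from lifetimes.utils import summary_data_from_transaction_data

summary_weekly = summary_data_from_transaction_data(
    transaction_data,
    customer_id_col="id",
    datetime_col="date",
    observation_period_end="2014-12-31",
    freq="W",   # weeks instead of the default 'D'
)
print(summary_weekly.head())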

That same paper also provides suggestions on how to reformulate the log-likelihoods for the BG-NBD and Pareto-NBD models to fix these issues. Lifetimes is no longer being actively maintained, but I've forked this library and have been rebuilding it on a new modeling backend, incorporating the suggestions in that paper. The library is currently in beta; if anyone is interested in contributing, let me know.
