any offsets? #173

ric70x7 · 2018-06-12T22:21:12Z

Hi,
First of all, this is a great package!
Is it possible to declare an offset or exposure variable? Meaning: a regressor with coefficient fixed to 1.

dswah · 2018-06-12T23:06:18Z

@ric70x7 thanks for using the package :)

unfortunately, not really. :/
only the Poisson model (PoissonGAM) has an optional exposure argument in the fit method.

could you tell me a bit about why an offset or exposure might be useful? I would like to figure out how to prioritize such a feature.
what kind of model are you making?

Thanks :)

ric70x7 · 2018-06-13T19:30:27Z

That is perfect. Having exposure in the Poisson model solves my current problem. I'm interested in Point Processes for epidemiology.
Thanks for the quick response.

dswah · 2018-06-14T15:55:32Z

@ric70x7 great. let me know if this works for you.

ric70x7 · 2018-06-19T20:09:13Z

It is working well so far. Thanks.

dswah · 2018-06-19T20:52:35Z

great!!

ric70x7 · 2018-06-27T01:06:29Z

Hi again. I just noticed that there is an issue with the exposure parameter. When I declare it different from a constant across all observations I get -infinite Log Likelihoods. The fitted values seem to make sense anyway. Here is an example:

import pygam

def some_rate(x):
return 1.2 + np.cos(x/10.)

X = np.random.uniform(0, 100, 50)
population = np.random.uniform(10, 20, X.size)
Y = np.hstack([np.random.poisson(lam=some_rate(xi) * popi, size=1) for xi,popi in zip(X, population)])
gam = pygam.PoissonGAM()
gam.fit(X, Y, exposure=population) # this raises the waring: RuntimeWarning:invalid value encountered in double_scalars
gam.summary() # Log Likelihood is -inf

This doesn't happen if I make the exposure constant, for example:

gam.fit(X, Y, exposure=20 * np.ones_like(population))

dswah · 2018-06-28T09:19:09Z

oh wow. this is a great example. im taking a look.

dswah · 2018-06-29T14:09:23Z

@ric70x7 i see now that the issue comes from using exposure vs offset and the necessity of the Poisson observations to be integer.

when using exposure, we divide the counts by the exposure and model that (and weight each sample by the exposure to compensate for the actual variance). This turns the counts into rates, which results in an equivalent model, but now the exact likelihood cannot be evaluated because Poisson requires integer observations.

pyGAM rescales the rate into counts when computing the log-likelihood, but i forgot to round and cast to integer. small numerical errors caused the pmf function to believe these rescaled counts were non-integers...

anyway, this could all be avoided if everything were modeled with offsets...
but alas that requires a little more machinery.

dswah · 2018-06-29T14:09:48Z

im going to burn a new version. you should see this issue fixed in that version.

dswah · 2018-06-29T14:37:31Z

cool, @ric70x7 can you try the new version?
pip install -U pygam

ric70x7 · 2018-06-30T03:24:09Z

Yep, this works. Thanks!

FeysJan · 2018-09-28T15:06:41Z

Hi! I'm currently needing to include exposure in a binomial model. Is that implemented (I couldn't find it) and, if not, is there a work-around?

ric70x7 · 2018-09-28T19:52:34Z

Hi FeysJan. I don't know if it is implemented like that, but you can fit a binomial model using a LogisticGAM and define the weights according to the exposure and number of positive cases. For example, if you have 3 positives and 5 trials, this would be passed as y=1 with a weight of 3/5 and y = 0 with a weight of 2/5.

I wrote a function that organizes the data for you. Here is the link:
https://github.com/disarm-platform/disarm-gears/blob/develop/disarm_gears/util/binomial_to_bernoulli.py
(Just remove what you don't need, because it has other things that are needed for my current project)

It works like this:

import numpy as np
import pygam

num_data = 20
n_trials = np.random.randint(10, 15, num_data)
X = np.random.normal(0, .8, num_data)
X.sort()
rate = 3 + 5 * X
n_positive = (1/(1+np.exp(-rate)) * n_trials).astype(int)

y, w, newX = binomial_to_bernoulli(n_positive=n_positive, n_trials=n_trials, X=X)
m = pygam.LogisticGAM()
m.fit(X=newX[:,None], y=y, weights=w)

m.predict_mu(X) # this returns the fitted rate

FeysJan · 2018-10-01T14:40:08Z

Thanks Ricardo, I appreciate the code. I’m looking at what you sent me to see if that solves my problems. I had a hunch I may be able to convert to a LogisticGAM. Jan

…

On Sep 28, 2018, at 3:52 PM, Ricardo Andrade ***@***.***> wrote: Hi FeysJan. I don't know if it is implemented like that, but you can fit a binomial model using a LogisticGAM and define the weights according to the exposure and number of positive cases. For example, if you have 3 positives and 5 trials, this would be passed as y=1 with a weight of 3/5 and y = 0 with a weight of 2/5. I wrote a function that organizes the data for you. Here is the link: https://github.com/disarm-platform/disarm-gears/blob/develop/disarm_gears/util/binomial_to_bernoulli.py <https://github.com/disarm-platform/disarm-gears/blob/develop/disarm_gears/util/binomial_to_bernoulli.py> (Just remove what you don't need, because it has other things that are needed for my current project) It works like this: import numpy as np import pygam Some toy data num_data = 20 n_trials = np.random.randint(10, 15, num_data) X = np.random.normal(0, .8, num_data) X.sort() rate = 3 + 5 * X n_positive = (1/(1+np.exp(-rate)) * n_trials).astype(int) Turn your binomial observations into bernoulli observations y, w, newX = binomial_to_bernoulli(n_positive=n_positive, n_trials=n_trials, X=X) m = pygam.LogisticGAM() m.fit(X=newX[:,None], y=y, weights=w) m.predict_mu(X) # this returns the fitted rate — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#173 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ao_HTp8vSu1533X3li2B_q3n30fxXefyks5ufn4FgaJpZM4UlPuW>.

ric70x7 closed this as completed Jun 19, 2018

ric70x7 reopened this Jun 27, 2018

dswah mentioned this issue Jun 29, 2018

Poisson bug #179

Merged

2 tasks

dswah closed this as completed Jun 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

any offsets? #173

any offsets? #173

ric70x7 commented Jun 12, 2018

dswah commented Jun 12, 2018

ric70x7 commented Jun 13, 2018

dswah commented Jun 14, 2018

ric70x7 commented Jun 19, 2018

dswah commented Jun 19, 2018

ric70x7 commented Jun 27, 2018

dswah commented Jun 28, 2018

dswah commented Jun 29, 2018

dswah commented Jun 29, 2018

dswah commented Jun 29, 2018

ric70x7 commented Jun 30, 2018

FeysJan commented Sep 28, 2018

ric70x7 commented Sep 28, 2018 •

edited

FeysJan commented Oct 1, 2018 via email

any offsets? #173

any offsets? #173

Comments

ric70x7 commented Jun 12, 2018

dswah commented Jun 12, 2018

ric70x7 commented Jun 13, 2018

dswah commented Jun 14, 2018

ric70x7 commented Jun 19, 2018

dswah commented Jun 19, 2018

ric70x7 commented Jun 27, 2018

dswah commented Jun 28, 2018

dswah commented Jun 29, 2018

dswah commented Jun 29, 2018

dswah commented Jun 29, 2018

ric70x7 commented Jun 30, 2018

FeysJan commented Sep 28, 2018

ric70x7 commented Sep 28, 2018 • edited

FeysJan commented Oct 1, 2018 via email

ric70x7 commented Sep 28, 2018 •

edited