Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

any offsets? #173

Closed
ric70x7 opened this issue Jun 12, 2018 · 14 comments
Closed

any offsets? #173

ric70x7 opened this issue Jun 12, 2018 · 14 comments

Comments

@ric70x7
Copy link

ric70x7 commented Jun 12, 2018

Hi,
First of all, this is a great package!
Is it possible to declare an offset or exposure variable? Meaning: a regressor with coefficient fixed to 1.

@dswah
Copy link
Owner

dswah commented Jun 12, 2018

@ric70x7 thanks for using the package :)

unfortunately, not really. :/
only the Poisson model (PoissonGAM) has an optional exposure argument in the fit method.

could you tell me a bit about why an offset or exposure might be useful? I would like to figure out how to prioritize such a feature.
what kind of model are you making?

Thanks :)

@ric70x7
Copy link
Author

ric70x7 commented Jun 13, 2018

That is perfect. Having exposure in the Poisson model solves my current problem. I'm interested in Point Processes for epidemiology.
Thanks for the quick response.

@dswah
Copy link
Owner

dswah commented Jun 14, 2018

@ric70x7 great. let me know if this works for you.

@ric70x7
Copy link
Author

ric70x7 commented Jun 19, 2018

It is working well so far. Thanks.

@ric70x7 ric70x7 closed this as completed Jun 19, 2018
@dswah
Copy link
Owner

dswah commented Jun 19, 2018

great!!

@ric70x7
Copy link
Author

ric70x7 commented Jun 27, 2018

Hi again. I just noticed that there is an issue with the exposure parameter. When I declare it different from a constant across all observations I get -infinite Log Likelihoods. The fitted values seem to make sense anyway. Here is an example:

import pygam

def some_rate(x):
return 1.2 + np.cos(x/10.)

X = np.random.uniform(0, 100, 50)
population = np.random.uniform(10, 20, X.size)
Y = np.hstack([np.random.poisson(lam=some_rate(xi) * popi, size=1) for xi,popi in zip(X, population)])
gam = pygam.PoissonGAM()
gam.fit(X, Y, exposure=population) # this raises the waring: RuntimeWarning:invalid value encountered in double_scalars
gam.summary() # Log Likelihood is -inf

This doesn't happen if I make the exposure constant, for example:

gam.fit(X, Y, exposure=20 * np.ones_like(population))

@ric70x7 ric70x7 reopened this Jun 27, 2018
@dswah
Copy link
Owner

dswah commented Jun 28, 2018

oh wow. this is a great example. im taking a look.

@dswah dswah mentioned this issue Jun 29, 2018
2 tasks
@dswah
Copy link
Owner

dswah commented Jun 29, 2018

@ric70x7 i see now that the issue comes from using exposure vs offset and the necessity of the Poisson observations to be integer.

when using exposure, we divide the counts by the exposure and model that (and weight each sample by the exposure to compensate for the actual variance). This turns the counts into rates, which results in an equivalent model, but now the exact likelihood cannot be evaluated because Poisson requires integer observations.

pyGAM rescales the rate into counts when computing the log-likelihood, but i forgot to round and cast to integer. small numerical errors caused the pmf function to believe these rescaled counts were non-integers...

anyway, this could all be avoided if everything were modeled with offsets...
but alas that requires a little more machinery.

@dswah
Copy link
Owner

dswah commented Jun 29, 2018

im going to burn a new version. you should see this issue fixed in that version.

@dswah
Copy link
Owner

dswah commented Jun 29, 2018

cool, @ric70x7 can you try the new version?
pip install -U pygam

@ric70x7
Copy link
Author

ric70x7 commented Jun 30, 2018

Yep, this works. Thanks!

@dswah dswah closed this as completed Jun 30, 2018
@FeysJan
Copy link

FeysJan commented Sep 28, 2018

Hi! I'm currently needing to include exposure in a binomial model. Is that implemented (I couldn't find it) and, if not, is there a work-around?

@ric70x7
Copy link
Author

ric70x7 commented Sep 28, 2018

Hi FeysJan. I don't know if it is implemented like that, but you can fit a binomial model using a LogisticGAM and define the weights according to the exposure and number of positive cases. For example, if you have 3 positives and 5 trials, this would be passed as y=1 with a weight of 3/5 and y = 0 with a weight of 2/5.

I wrote a function that organizes the data for you. Here is the link:
https://github.com/disarm-platform/disarm-gears/blob/develop/disarm_gears/util/binomial_to_bernoulli.py
(Just remove what you don't need, because it has other things that are needed for my current project)

It works like this:

import numpy as np
import pygam

num_data = 20
n_trials = np.random.randint(10, 15, num_data)
X = np.random.normal(0, .8, num_data)
X.sort()
rate = 3 + 5 * X
n_positive = (1/(1+np.exp(-rate)) * n_trials).astype(int)

y, w, newX = binomial_to_bernoulli(n_positive=n_positive, n_trials=n_trials, X=X)
m = pygam.LogisticGAM()
m.fit(X=newX[:,None], y=y, weights=w)

m.predict_mu(X) # this returns the fitted rate

@FeysJan
Copy link

FeysJan commented Oct 1, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants