Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor terms #582

Merged
merged 20 commits into from
Oct 31, 2022
Merged

Refactor terms #582

merged 20 commits into from
Oct 31, 2022

Conversation

tomicapretto
Copy link
Collaborator

@tomicapretto tomicapretto commented Oct 29, 2022

Generally speaking, this PR aims to make Bambi internals more modular and extensible. To do that, I removed any logic specific to a given object from another given object. For example, there was Family-specific logic within response terms or the PyMC backend. See

bambi/bambi/terms.py

Lines 52 to 57 in e1503d2

elif isinstance(spec.family, Bernoulli):
# We've already checked the values are all 0 and 1
self.success = 1
self.data = term.design_matrix
else:
self.data = term.design_matrix

bambi/bambi/terms.py

Lines 65 to 77 in e1503d2

if isinstance(spec.family, Categorical):
name = self.name + "_dim"
self.coords[name] = [level for level in term.levels if level != self.reference]
elif isinstance(spec.family, Multinomial):
name = self.name + "_dim"
labels = extract_argument_names(self.name, list(extra_namespace))
if labels:
self.levels = labels
else:
self.levels = [str(level) for level in range(self.data.shape[1])]
labels = self.levels[1:]
self.coords[name] = labels
# TBD: Continue here when we add general multivariate responses.

bambi/bambi/backend/pymc.py

Lines 213 to 214 in e1503d2

elif isinstance(spec.family, (Categorical, Multinomial)):
self.mu += coef * predictor[:, np.newaxis]

# Handle some special cases
if isinstance(self.family, Beta):
# Beta distribution in PyMC uses alpha and beta, but we have mu and kappa.
# alpha = mu * kappa
# beta = (1 - mu) * kappa
alpha = kwargs["mu"] * kwargs["kappa"]
beta = (1 - kwargs["mu"]) * kwargs["kappa"]
return dist(self.name, alpha=alpha, beta=beta, observed=kwargs["observed"])
if isinstance(self.family, Binomial):
successes = kwargs["observed"][:, 0].squeeze()
trials = kwargs["observed"][:, 1].squeeze()
return dist(self.name, p=kwargs["p"], observed=successes, n=trials)
if isinstance(self.family, Gamma):
# Gamma distribution is specified using mu and sigma, but we request prior for alpha.
# We build sigma from mu and alpha.
sigma = kwargs["mu"] / (kwargs["alpha"] ** 0.5)
return dist(self.name, mu=kwargs["mu"], sigma=sigma, observed=kwargs["observed"])
if isinstance(self.family, Multinomial):
n = kwargs["observed"].sum(axis=1)
return dist(self.name, p=kwargs["p"], observed=kwargs["observed"], n=n)

This approach has been working so far, but if we plan to extend Bambi according to #544 without making it messier, we should stop handling specific cases in the same way we've been doing so far.

In addition, I implemented a couple of extra features/improvements that I'm going to describe below.

Refactor

  • Moved the terms.py module to its own directory, called terms.
    • Here we have one Python file per term type. They all inherit from the BaseTerm class.
  • Many attributes that were computed at creation time, and handled in very specific cases, are not computed at access time. In many cases, the determination of the value is determined by the Family, and not the term (see for example here) This allows us to add more family-specific logic without having to modify how terms work.
  • Terms require fewer objects to be initialized. Previously we were passing redundant information. Now we pass the formulae term and the prior. Responses also require the family.
  • Removed manipulation of distribution parameters from the backend. This is delegated to the family. As result, we removed family-specific logic from the backend, and we can extend Bambi in a much richer way. See here and here.
    • As a consequence, families gained optional methods. When these are present, they modify how the backend manipulates the parameters of the likelihood function or the linear predictor.

Features/improvements

  • The softmax function from aesara is now called with axis=-1. There's a warning saying that one needs to call it explicitly in the future. I opened Make it possible to pass arguments to link functions in PyMC backend #574 because of this. This PR fixes it, and I don't see the need to handle more general use cases.
  • The Likelihood class now accepts custom distribution functions. Until now, what users could do was pass the name of a distribution that would be looked up in the PyMC namespace (here and here). Now, you can create a Likelihood instance with a custom dist that will be used as the likelihood function. See here and here.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@tomicapretto tomicapretto marked this pull request as ready for review October 29, 2022 18:22
@codecov-commenter
Copy link

codecov-commenter commented Oct 31, 2022

Codecov Report

Merging #582 (8dfed09) into main (e1503d2) will decrease coverage by 1.75%.
The diff coverage is 81.76%.

❗ Current head 8dfed09 differs from pull request most recent head e50dc6f. Consider uploading reports for the commit e50dc6f to get more accurate results

@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
- Coverage   87.23%   85.48%   -1.76%     
==========================================
  Files          29       36       +7     
  Lines        2429     2859     +430     
==========================================
+ Hits         2119     2444     +325     
- Misses        310      415     +105     
Impacted Files Coverage Δ
bambi/terms/utils.py 0.00% <0.00%> (ø)
bambi/terms/offset.py 66.66% <66.66%> (ø)
bambi/terms/base.py 72.88% <72.88%> (ø)
bambi/terms/response.py 77.04% <77.04%> (ø)
bambi/backend/terms.py 94.44% <86.66%> (-1.86%) ⬇️
bambi/terms/common.py 87.71% <87.71%> (ø)
bambi/families/multivariate.py 95.38% <91.11%> (-2.29%) ⬇️
bambi/families/univariate.py 88.61% <91.89%> (+1.40%) ⬆️
bambi/models.py 85.59% <94.11%> (-0.05%) ⬇️
bambi/terms/group_specific.py 95.40% <95.40%> (ø)
... and 10 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tomicapretto
Copy link
Collaborator Author

I've run all the examples and they work well. Just updated a couple of them to fix some latex issues. Will merge after test suit finishes.

@tomicapretto tomicapretto merged commit c74fb3d into bambinos:main Oct 31, 2022
@tomicapretto tomicapretto deleted the refactor_terms branch October 26, 2023 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants