Refactor terms #582

tomicapretto · 2022-10-29T18:22:30Z

Generally speaking, this PR aims to make Bambi internals more modular and extensible. To do that, I removed any logic specific to a given object from another given object. For example, there was Family-specific logic within response terms or the PyMC backend. See

bambi/bambi/terms.py

Lines 52 to 57 in e1503d2

    
           elif isinstance(spec.family, Bernoulli): 
        
               # We've already checked the values are all 0 and 1 
        
               self.success = 1 
        
               self.data = term.design_matrix 
        
           else: 
        
               self.data = term.design_matrix

bambi/bambi/terms.py

Lines 65 to 77 in e1503d2

    
           if isinstance(spec.family, Categorical): 
        
               name = self.name + "_dim" 
        
               self.coords[name] = [level for level in term.levels if level != self.reference] 
        
           elif isinstance(spec.family, Multinomial): 
        
               name = self.name + "_dim" 
        
               labels = extract_argument_names(self.name, list(extra_namespace)) 
        
               if labels: 
        
                   self.levels = labels 
        
               else: 
        
                   self.levels = [str(level) for level in range(self.data.shape[1])] 
        
               labels = self.levels[1:] 
        
               self.coords[name] = labels 
        
           # TBD: Continue here when we add general multivariate responses.

bambi/bambi/backend/pymc.py

Lines 213 to 214 in e1503d2

    
           elif isinstance(spec.family, (Categorical, Multinomial)): 
        
               self.mu += coef * predictor[:, np.newaxis]

bambi/bambi/backend/terms.py

Lines 273 to 295 in e1503d2

    
           # Handle some special cases 
        
           if isinstance(self.family, Beta): 
        
               # Beta distribution in PyMC uses alpha and beta, but we have mu and kappa. 
        
               # alpha = mu * kappa 
        
               # beta = (1 - mu) * kappa 
        
               alpha = kwargs["mu"] * kwargs["kappa"] 
        
               beta = (1 - kwargs["mu"]) * kwargs["kappa"] 
        
               return dist(self.name, alpha=alpha, beta=beta, observed=kwargs["observed"]) 
        
           if isinstance(self.family, Binomial): 
        
               successes = kwargs["observed"][:, 0].squeeze() 
        
               trials = kwargs["observed"][:, 1].squeeze() 
        
               return dist(self.name, p=kwargs["p"], observed=successes, n=trials) 
        
           if isinstance(self.family, Gamma): 
        
               # Gamma distribution is specified using mu and sigma, but we request prior for alpha. 
        
               # We build sigma from mu and alpha. 
        
               sigma = kwargs["mu"] / (kwargs["alpha"] ** 0.5) 
        
               return dist(self.name, mu=kwargs["mu"], sigma=sigma, observed=kwargs["observed"]) 
        
           if isinstance(self.family, Multinomial): 
        
               n = kwargs["observed"].sum(axis=1) 
        
               return dist(self.name, p=kwargs["p"], observed=kwargs["observed"], n=n)

This approach has been working so far, but if we plan to extend Bambi according to #544 without making it messier, we should stop handling specific cases in the same way we've been doing so far.

In addition, I implemented a couple of extra features/improvements that I'm going to describe below.

Refactor

Moved the terms.py module to its own directory, called terms.
- Here we have one Python file per term type. They all inherit from the BaseTerm class.
Many attributes that were computed at creation time, and handled in very specific cases, are not computed at access time. In many cases, the determination of the value is determined by the Family, and not the term (see for example here) This allows us to add more family-specific logic without having to modify how terms work.
Terms require fewer objects to be initialized. Previously we were passing redundant information. Now we pass the formulae term and the prior. Responses also require the family.
Removed manipulation of distribution parameters from the backend. This is delegated to the family. As result, we removed family-specific logic from the backend, and we can extend Bambi in a much richer way. See here and here.
- As a consequence, families gained optional methods. When these are present, they modify how the backend manipulates the parameters of the likelihood function or the linear predictor.

Features/improvements

The softmax function from aesara is now called with axis=-1. There's a warning saying that one needs to call it explicitly in the future. I opened Make it possible to pass arguments to link functions in PyMC backend #574 because of this. This PR fixes it, and I don't see the need to handle more general use cases.
The Likelihood class now accepts custom distribution functions. Until now, what users could do was pass the name of a distribution that would be looked up in the PyMC namespace (here and here). Now, you can create a Likelihood instance with a custom dist that will be used as the likelihood function. See here and here.

…oods now

review-notebook-app · 2022-10-29T18:22:35Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov-commenter · 2022-10-31T12:54:50Z

Codecov Report

Merging #582 (8dfed09) into main (e1503d2) will decrease coverage by 1.75%.
The diff coverage is 81.76%.

❗ Current head 8dfed09 differs from pull request most recent head e50dc6f. Consider uploading reports for the commit e50dc6f to get more accurate results

@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
- Coverage   87.23%   85.48%   -1.76%     
==========================================
  Files          29       36       +7     
  Lines        2429     2859     +430     
==========================================
+ Hits         2119     2444     +325     
- Misses        310      415     +105

Impacted Files	Coverage Δ
bambi/terms/utils.py	`0.00% <0.00%> (ø)`
bambi/terms/offset.py	`66.66% <66.66%> (ø)`
bambi/terms/base.py	`72.88% <72.88%> (ø)`
bambi/terms/response.py	`77.04% <77.04%> (ø)`
bambi/backend/terms.py	`94.44% <86.66%> (-1.86%)`	⬇️
bambi/terms/common.py	`87.71% <87.71%> (ø)`
bambi/families/multivariate.py	`95.38% <91.11%> (-2.29%)`	⬇️
bambi/families/univariate.py	`88.61% <91.89%> (+1.40%)`	⬆️
bambi/models.py	`85.59% <94.11%> (-0.05%)`	⬇️
bambi/terms/group_specific.py	`95.40% <95.40%> (ø)`
... and 10 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

tomicapretto · 2022-10-31T16:01:04Z

I've run all the examples and they work well. Just updated a couple of them to fix some latex issues. Will merge after test suit finishes.

tomicapretto added 14 commits October 22, 2022 16:06

Initial ideas

7b61d0a

Some ideas on group-specific terms

722ca8c

Alias for hyperpriors

ba1a604

Use read-only properties for most attributes

0efdbdc

Start response term.

4beecaa

Design how coords and levels are handled

385fcbb

Move specific methods to families

bf5de21

change backend transformations

28a1ea7

Move transformations to the family

b4704ff

Finish changes. Fix link function axis issue. Can have custom likelih…

0f7cc89

…oods now

Use the new terms. Tests are completely broken

39b9831

Some fixes

d15b5ec

Fix more issues

7788dc2

Fix tests

59f9881

tomicapretto marked this pull request as ready for review October 29, 2022 18:22

tomicapretto added 4 commits October 29, 2022 15:26

remove unused files

5ea998f

Use type(None)

367caad

Remove censored test

2f6fc29

Add test for custom likelihood distribution

8dfed09

tomicapretto added 2 commits October 31, 2022 11:47

Update some notebooks

b942fbf

add negative binomial

e50dc6f

tomicapretto merged commit c74fb3d into bambinos:main Oct 31, 2022

tomicapretto deleted the refactor_terms branch October 26, 2023 21:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor terms #582

Refactor terms #582

tomicapretto commented Oct 29, 2022 •

edited

review-notebook-app bot commented Oct 29, 2022

codecov-commenter commented Oct 31, 2022 •

edited

tomicapretto commented Oct 31, 2022

	elif isinstance(spec.family, Bernoulli):
	# We've already checked the values are all 0 and 1
	self.success = 1
	self.data = term.design_matrix
	else:
	self.data = term.design_matrix

	if isinstance(spec.family, Categorical):
	name = self.name + "_dim"
	self.coords[name] = [level for level in term.levels if level != self.reference]
	elif isinstance(spec.family, Multinomial):
	name = self.name + "_dim"
	labels = extract_argument_names(self.name, list(extra_namespace))
	if labels:
	self.levels = labels
	else:
	self.levels = [str(level) for level in range(self.data.shape[1])]
	labels = self.levels[1:]
	self.coords[name] = labels
	# TBD: Continue here when we add general multivariate responses.

	elif isinstance(spec.family, (Categorical, Multinomial)):
	self.mu += coef * predictor[:, np.newaxis]

	# Handle some special cases
	if isinstance(self.family, Beta):
	# Beta distribution in PyMC uses alpha and beta, but we have mu and kappa.
	# alpha = mu * kappa
	# beta = (1 - mu) * kappa
	alpha = kwargs["mu"] * kwargs["kappa"]
	beta = (1 - kwargs["mu"]) * kwargs["kappa"]
	return dist(self.name, alpha=alpha, beta=beta, observed=kwargs["observed"])

	if isinstance(self.family, Binomial):
	successes = kwargs["observed"][:, 0].squeeze()
	trials = kwargs["observed"][:, 1].squeeze()
	return dist(self.name, p=kwargs["p"], observed=successes, n=trials)

	if isinstance(self.family, Gamma):
	# Gamma distribution is specified using mu and sigma, but we request prior for alpha.
	# We build sigma from mu and alpha.
	sigma = kwargs["mu"] / (kwargs["alpha"] ** 0.5)
	return dist(self.name, mu=kwargs["mu"], sigma=sigma, observed=kwargs["observed"])

	if isinstance(self.family, Multinomial):
	n = kwargs["observed"].sum(axis=1)
	return dist(self.name, p=kwargs["p"], observed=kwargs["observed"], n=n)

Refactor terms #582

Refactor terms #582

Conversation

tomicapretto commented Oct 29, 2022 • edited

review-notebook-app bot commented Oct 29, 2022

codecov-commenter commented Oct 31, 2022 • edited

Codecov Report

tomicapretto commented Oct 31, 2022

tomicapretto commented Oct 29, 2022 •

edited

codecov-commenter commented Oct 31, 2022 •

edited