Use `bayeux` to access a wide range of samplers #775

GStechschulte · 2024-02-04T14:31:16Z

I have been following @ColCarroll's bayeux library and thought it would be interesting to see how Bambi could incorporate it to offer users a wider range of samplers (more than nuts_blackjax and nuts_numpyro).

Edit: Now I access the samplers programmatically using the inference_method arg. This removes previously needed code for nuts_blackjax and nuts_numpyro. If a user passes an MCMC inference method other than the PyMC MCMC sampler mcmc, Bambi will use bayeux to call that sampler.
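
A minimal sketch of that dispatch rule, for illustration only. The function name `choose_backend` and the `BAYEUX_METHODS` set are hypothetical stand-ins and not Bambi's actual code; in practice the available names would presumably come from bayeux itself.

```python
# Hypothetical stand-in for the MCMC method names bayeux would report.
BAYEUX_METHODS = {"blackjax_hmc", "blackjax_nuts", "numpyro_nuts"}


def choose_backend(inference_method: str) -> str:
    """Return which library would handle the requested MCMC method."""
    if inference_method == "mcmc":
        # PyMC's own sampler stays the default path.
        return "pymc"
    if inference_method in BAYEUX_METHODS:
        # Any other known MCMC method is delegated to bayeux.
        return "bayeux"
    raise ValueError(f"Unknown inference method: {inference_method!r}")


print(choose_backend("mcmc"))
print(choose_backend("blackjax_hmc"))
```

Keeping the lookup in one place means that a new sampler only needs to appear in the set of known names, not in a new code branch.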

import bambi as bmb

data = bmb.load_data("ANES")
clinton_data = data.loc[data["vote"].isin(["clinton", "trump"]), :]

model = bmb.Model("vote['clinton'] ~ party_id + party_id:age", clinton_data, family="bernoulli")
model.build()

idata = model.fit(inference_method="blackjax_hmc")

However, when cleaning the InferenceData, I am getting an xarray error

"name": "ValueError",
	"message": "('chain', 'draw') must be a permuted list of FrozenMappingWarningOnValuesAccess({'chain': 8, 'draw': 500, 'party_id_dim_0': 2, 'party_id:age_dim_0': 3}), unless `...` is included"

It seems xarray is not liking something that bayeux is doing with the InferenceData.

Another thought is that using bayeux with Bambi is so easy

model = bmb.Model()
model.build()

bx_model = bx.Model.from_pymc(model.backend.model)
bx_model.<some sampling func>

that maybe we just add documentation explaining how to use bayeux with Bambi to avoid overhead from Bambi's side?

To Do:

add additional tests in test_alternative_samplers.py
update docstrings referencing JAX based samplers
add docs on using alternative backends

ColCarroll · 2024-02-04T14:46:27Z

Really cool! A few suggestions --

bayeux could be invisible here, and you could access all the methods programmatically. That's done here, but i can factor that out into a function that gives methods instead of strings -- currently it only adds a method if the underlying library, e.g. optax, is installed. i'm not sure how to avoid using a string at some point. You could have an api like model.fit.bx that initializes the bayeux.Model?
I'm happy to add a from_bambi constructor on the bayeux side to make your second option even easier.
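
The "only add a method if the underlying library is installed" idea can be sketched with the standard library alone. This is an illustration of the pattern, not bayeux's implementation: the `REQUIREMENTS` mapping, the method names, and `installed_methods` are all hypothetical, and `json` merely stands in for an optional dependency that happens to be present.

```python
import importlib.util

# Hypothetical mapping: method name -> module that must be importable.
REQUIREMENTS = {
    "method_with_backend": "json",  # stdlib stand-in for an installed library
    "method_without_backend": "surely_not_installed_pkg_xyz",
}


def installed_methods(requirements):
    """Keep only the methods whose backing module can be found."""
    return sorted(
        name
        for name, module in requirements.items()
        if importlib.util.find_spec(module) is not None
    )


methods = installed_methods(REQUIREMENTS)
print(methods)
```

`find_spec` checks for a top-level module without importing it, so probing for heavy optional dependencies stays cheap.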

ahartikainen · 2024-02-04T15:01:56Z

What kind of API does bayeux have? Could we enable support for external samplers if we define the specific API we support (need)? (Users could create a class for external samplers if needed?)

Of course, that does not mean we could not also have text-based support for certain libraries.

ColCarroll · 2024-02-04T15:17:21Z

bayeux is inspired by arviz, in that it just provides a representation of a model that is general enough for most samplers, but it does make the decision that it is specialized to JAX-based models (most of the algorithms use autodiff, vectorization is baked in, and automatic function inverses are also used). If you've got a sampler that accepts a JAX-based log density, you could use bayeux with it (or contribute it to bayeux!)

GStechschulte · 2024-02-05T14:47:31Z

> Really cool! A few suggestions --
>
> bayeux could be invisible here, and you could access all the methods programmatically. That's done here, but i can factor that out into a function that gives methods instead of strings -- currently it only adds a method if the underlying library, e.g. optax, is installed. i'm not sure how to avoid using a string at some point. You could have an api like model.fit.bx that initializes the bayeux.Model?
>
> I'm happy to add a from_bambi constructor on the bayeux side to make your second option even easier.

Thanks for the suggestions! That makes sense. I am liking the second option, but I will run some ideas past the others first before asking for the feature. Thanks!

review-notebook-app · 2024-02-05T21:07:24Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

GStechschulte · 2024-02-09T05:37:14Z

In this example, the error is because bayeux is appending _0 to party_id_dim of the posterior dims. This results in Bambi discarding all posterior dims because the dims in the PyMC model are inconsistent with the dims of the InferenceData returned by bayeux.

For example:

print(bayeux_idata.posterior.dims, pymc_idata.posterior.dims)

(FrozenMappingWarningOnValuesAccess({'chain': 8, 'draw': 500, 'party_id_dim_0': 2, 'party_id:age_dim_0': 3}),
 FrozenMappingWarningOnValuesAccess({'chain': 4, 'draw': 1000, 'party_id_dim': 2, 'party_id:age_dim': 3}))

GStechschulte · 2024-02-10T09:24:50Z

Update: I have added logic in the cleaning of idata to: (1) identify bayeux idata and to remove the trailing numeric suffix from the _dims, and (2) to rename the posterior dims to be consistent with the PyMC model coords.

Although this works for simple models, I haven't tried this logic with more complex models in Bambi such as HSGP, or with models that have a large number of dims and/or factors. Since the idata contains very "important data", I also think it could be worthwhile to not clean idata when the user calls samplers from bayeux at the moment, in order to avoid unknown effects appearing in the inference data.
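
For readers following along, the suffix-stripping step could look roughly like the sketch below. It assumes the suffix is always an underscore followed by digits, and `build_dim_rename_map` is a hypothetical helper, not the function added in this PR. The dim names are the ones printed earlier in the thread.

```python
import re


def build_dim_rename_map(idata_dims, model_dims):
    """Map suffixed dims (e.g. 'party_id_dim_0') to the model's dim names."""
    rename = {}
    for name in idata_dims:
        # Drop a trailing "_<digits>" if present.
        stripped = re.sub(r"_\d+$", "", name)
        # Only rename when the stripped name is a dim the PyMC model knows.
        if stripped != name and stripped in model_dims:
            rename[name] = stripped
    return rename


bayeux_dims = ["chain", "draw", "party_id_dim_0", "party_id:age_dim_0"]
pymc_dims = ["party_id_dim", "party_id:age_dim"]

rename_map = build_dim_rename_map(bayeux_dims, pymc_dims)
print(rename_map)
```

A mapping like this could then be passed to xarray's `Dataset.rename` on each InferenceData group. Checking against the model's dims first avoids touching names that merely happen to end in a number.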

zwelitunyiswa · 2024-02-10T16:15:15Z

Would it be possible to allow access to the optimization methods from Bayeux as well via Bambi?

ColCarroll · 2024-02-11T02:55:17Z

I think you could -- @GStechschulte has a good outline here. If @tomicapretto thinks this is a reasonable idea in principle, I'd be happy to either collaborate on this to (programmatically) get the bayeux inference methods in, or send a follow-up that generalizes it a bit, and allows optimization and VI.

tomicapretto · 2024-02-18T18:12:55Z

I think this is really cool, thanks @GStechschulte and thanks @ColCarroll for bayeux. I'm not sure I am aware of all the details, but what is the reason why bayeux is appending _dim_0 to dimension names? As far as I remember that was an xarray thing. Or is it that bayeux is not receiving dimension names from the PyMC model and thus it appends _dim_0?

Another thing, I see we're replacing blackjax, jax, jaxlib, and numpyro with bayeux. However, as far as I know bayeux does not install these dependencies, so doing pip install bambi[jax] won't give users access to JAX based samplers, right? (I'm not very familiar with bayeux so I may be wrong).

ColCarroll · 2024-02-18T21:50:48Z

bayeux will pull those in, but i agree it is better to be explicit and require dependencies (in case bayeux makes weird decisions).

I'll double check on the naming conventions!

ColCarroll · 2024-02-19T15:35:16Z

Oh right, yes: bayeux has no concept of the dimensions from pymc. That would have to be implemented as a post-processing step to rename the arviz dimensions.

tomicapretto · 2024-02-19T16:51:50Z

> Oh right, yes: bayeux has no concept of the dimensions from pymc. That would have to be implemented as a post-processing step to rename the arviz dimensions.

Thanks for the answer, it makes much more sense now!

GStechschulte · 2024-02-19T17:07:39Z

@ColCarroll thanks for the information.

@tomicapretto I can apply this post processing step on Bambi's side.

tomicapretto · 2024-02-19T17:47:31Z

> @ColCarroll thanks for the information.
>
> @tomicapretto I can apply this post processing step on Bambi's side.

Sounds great, just let me know if you need help or a second opinion :)

GStechschulte · 2024-02-19T19:37:44Z

Two updates:

I added a processing step that runs when Bambi cleans the idata: it renames the idata dims and coordinates to match those of the underlying PyMC model.
I explicitly added JAX based sampler dependencies.

Regarding

> I'd be happy to either collaborate on this to (programmatically) get the bayeux inference methods in, or send a follow-up that generalizes it a bit, and allows optimization and VI.

@ColCarroll I'd be happy to collaborate and see how you would do this 👍🏼

GStechschulte · 2024-02-20T21:15:44Z

@ColCarroll thanks a lot for the review! I will incorporate these in the coming days.

GStechschulte · 2024-03-04T18:09:57Z

Ugh. pylint is making the CI fail. It says it cannot import bayeux. However, when I check the logs of the step "Install Bambi and all its dependencies", I can see that bayeux was installed.

ColCarroll · 2024-03-04T18:11:11Z

the package is named bayeux-ml -- i'm checking now that you got that one.

ColCarroll · 2024-03-07T02:21:46Z

Oh. Last failures are because bayeux uses Python 3.10-only features (union types with a pipe, like str | None). I can get rid of those, but also NEP 29 suggests that Python 3.9 get dropped in 30 days. I see at least three choices here:

1. I release a Python 3.9 compatible bayeux version, then we can merge this.
2. We disable testing on Python 3.9 in this PR and merge. Things are confusing for users on Python 3.9 who pip install bambi[jax], but I can't find a way to restrict python versions in optional dependencies.
3. We just wait 30 days, do a PR removing Python 3.9 support, then merge this.

Any preferences?
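
For context on the incompatibility: on Python 3.9, an annotation such as `str | None` is evaluated when the function is defined and raises a `TypeError`, whereas the `typing` spellings work on both old and new versions. A small sketch of the backward-compatible form follows; the function `describe` is purely illustrative and does not come from bayeux.

```python
from typing import Optional, Union


def describe(name: Optional[str] = None) -> str:
    """Optional[str] works on Python 3.9; `str | None` needs 3.10+ here."""
    return "anonymous" if name is None else name


# Optional[X] is shorthand for Union[X, None].
same = Optional[str] == Union[str, None]

print(describe(), describe("bayeux"), same)
```

Adding `from __future__ import annotations` defers annotation evaluation, which lets the pipe syntax appear in annotations on 3.9, but it does not help where the union is evaluated at runtime.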

tomicapretto · 2024-03-07T14:05:49Z

I'm in favor of following NumPy's deprecation cycle but at the same time I feel it moves faster than many users. For that reason, even when the pipe operator is a feature I really like, I have not been using it because of this (i.e. not working for Python <= 3.9).

According to PyPI stats, users are downloading Bambi using Python 3.7, 3.8, and 3.9. I know PyPI stats may not be a crystalline reflection of reality, but it's still useful. I don't want to break their installations.

So, we could drop official support for 3.9 as soon as NumPy does (this includes testing with Python >= 3.10 only), but I would not start using Python >= 3.10 specific features in the default installation of Bambi right now. I guess there will be a point where it'll become unavoidable (for example, when PyMC or PyTensor start to do so) and that's fine.

What do others think?

Edit: I think I didn't answer your questions straightforwardly.

I would wait, do a PR removing Python 3.9 support, and then merge this. I don't want to ask you to work on a Python 3.9 compatible version that will be discarded in the near future (unless you think it's worth it).

GStechschulte · 2024-03-08T07:57:31Z

Thanks @ColCarroll for the information, and I agree with @tomicapretto's thoughts. Looking forward to eventually merging this!

tomicapretto · 2024-03-08T12:16:11Z

> According to PyPI stats, users are downloading Bambi using Python 3.7, 3.8, and 3.9. I know PyPI stats may not be a crystalline reflection of reality, but it's still useful. I don't want to break their installations.

Just want to add that after some comms in our Slack it looks like we won't be breaking users' installations. They'll just get an older version of Bambi if they use Python < 3.10

GStechschulte · 2024-03-10T07:31:00Z

@tomicapretto so do we want to wait the $x$ days to drop support for Python 3.9, or to drop it now?

tomicapretto · 2024-03-14T14:45:53Z

> @tomicapretto so do we want to wait the x days to drop support for Python 3.9, or to drop it now?

sorry for the slow response, we can do it now if you agree on that

edit also, it won't impact most users until we make a release, so now i realize it's even less dangerous

codecov-commenter · 2024-03-19T17:29:53Z

Codecov Report

Attention: Patch coverage is 82.45614% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 90.14%. Comparing base (9a1387a) to head (9f9d769).
Report is 3 commits behind head on main.

Files	Patch %	Lines
bambi/backend/pymc.py	77.77%	10 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #775      +/-   ##
==========================================
+ Coverage   89.86%   90.14%   +0.27%     
==========================================
  Files          46       46              
  Lines        3810     3836      +26     
==========================================
+ Hits         3424     3458      +34     
+ Misses        386      378       -8
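
The percentages in the table follow from hits divided by lines. The short check below reproduces them from the counts above; note that the reported figures match truncation to two decimals rather than rounding, which is an observation from these numbers and not a statement about how Codecov is implemented. The helper `pct_truncated` is hypothetical.

```python
import math


def pct_truncated(ratio: float) -> float:
    """Express a ratio as a percentage truncated to two decimals."""
    return math.floor(ratio * 10_000) / 100


base_ratio = 3424 / 3810  # hits / lines on main
head_ratio = 3458 / 3836  # hits / lines on the PR head

base = pct_truncated(base_ratio)
head = pct_truncated(head_ratio)
delta = pct_truncated(head_ratio - base_ratio)

print(base, head, delta)
```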

GStechschulte · 2024-03-19T18:24:05Z

CI failing on 3.12 seems related to this PyTensor issue and corresponding PR.

tomicapretto · 2024-03-19T20:54:02Z

> CI failing on 3.12 seems related to this PyTensor issue and corresponding PR.

Ha! Should we avoid supporting 3.12 in the immediate future?

tomicapretto · 2024-03-22T13:15:41Z

I was giving this a shot and just found that the development version of PyMC now supports Python 3.12

https://github.com/pymc-devs/pymc/blob/61ce412aa599939eaf299a6328059c184e6c25db/setup.py#L33-L36

This is not the case in the latest release 5.11. I think we can just wait a bit until PyMC releases a new version to upgrade the PyMC version requirement in our pyproject.toml and then we're done.

Another option would be to pin PyMC to the current development version, but that will have to be updated as soon as PyMC releases the version supporting Python 3.12, so I think it's just better to wait a bit.

https://github.com/pymc-devs/pymc/blob/a06081e1e9649bd56e3528cb96380efdf6bb2dc0/setup.py#L33-L35

tomicapretto · 2024-03-28T21:35:15Z

There's a new version of PyMC supporting Python 3.12... 🤞

tomicapretto · 2024-03-28T22:40:00Z

@GStechschulte I let you have the honor to click the green button :D

GStechschulte · 2024-03-29T05:19:00Z

> @GStechschulte I let you have the honor to click the green button :D

Woooooo! Gracias! 🎉

ColCarroll · 2024-03-29T12:20:12Z

Thanks for all the persistence @GStechschulte!

GStechschulte added 2 commits February 4, 2024 15:13

use bayeux to access a wide range of samplers

44966d3

use bayeux to access a wide range of samplers

061a1b0

GStechschulte and others added 2 commits February 5, 2024 22:04

add notebook links to family table (bambinos#774)

8afe534

access methods programatically

9f1d9d1

GStechschulte added 4 commits February 10, 2024 09:49

clean bayeux idata to be consistent with pymc model coords

9b42fc2

rename alternative sampler args in tests

91ce2a0

change docstring to reflect bayeux sampler names

89a2aee

bayeux dependencies are numpyro/jax/jaxlib/blackjax

d6058ad

GStechschulte requested a review from tomicapretto February 10, 2024 09:25

GStechschulte added 2 commits February 19, 2024 20:21

rename idata coords and dims to PyMC model

722c8b5

add JAX based sampler dependencies

ccc2877

ColCarroll reviewed Feb 20, 2024

bambi/backend/pymc.py (outdated, resolved)

bambi/backend/pymc.py (outdated, resolved)

pyproject.toml (outdated, resolved)

ColCarroll reviewed Feb 28, 2024

bambi/backend/pymc.py (resolved)

black formatting and resolve pylint errors

dd278d4

ColCarroll reviewed Mar 4, 2024

pyproject.toml (outdated, resolved)

fix package name

b0e94a4

GStechschulte added 2 commits March 19, 2024 18:29

drop 3.9 and add 3.12 to testing matrix

65fd945

change Python versions in requires-python and target-version

4712f1a

remove python 3.11 black target-version

d508214

GStechschulte added 2 commits March 19, 2024 19:35

pin requires-python to <3.13

1d05684

pip upgrade setuptools

f06715e

Bump PyMC to 5.12

ef575d3

Upgrade black and pylint

9bf90a6

remove upgrading of setup tools

9f9d769

GStechschulte merged commit 714ccb7 into bambinos:main Mar 29, 2024
4 checks passed

This was referenced Mar 30, 2024

Type-hint and type-overload a few common Model methods #790

Open

Add convenient function to access list of inference methods #791

Closed
