Add support for numpyro and blackjax PyMC samplers #526

markgoodhead · 2022-06-08T10:37:05Z

This is to address #522 and #525 inspired by @zwelitunyiswa's example

I decided to add a single new argument to the fit() method which allows switching in the numpyro/blackjax samplers instead of the PyMC default. I decided against adding CPU/GPU flags because the device is mostly decided by whatever JAX can find, and the methods I saw to disable GPUs are quite hacky: they involve playing with your environment variables, which I felt is out of scope for a library to be fiddling with. I've just noted this in the documentation instead.
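For illustration only, here is a minimal sketch of the kind of environment-variable approach mentioned above. It assumes the standard CUDA variable CUDA_VISIBLE_DEVICES is what hides the GPUs; the helper name hidden_gpus is hypothetical and is not part of bambi, PyMC or JAX.

```python
import os
from contextlib import contextmanager


@contextmanager
def hidden_gpus():
    """Temporarily hide CUDA devices from libraries that initialise CUDA later.

    Hypothetical helper: it only has an effect if the GPU backend has not
    been initialised yet when the block is entered.
    """
    previous = os.environ.get("CUDA_VISIBLE_DEVICES")
    os.environ["CUDA_VISIBLE_DEVICES"] = ""  # an empty value hides every GPU
    try:
        yield
    finally:
        # Restore whatever the user had configured before.
        if previous is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = previous
```

Because this mutates process-wide state and depends on initialisation order, it is arguably better left to the user than done inside a library, which matches the reasoning above.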

I've tested the samplers locally and they work on one of my personal projects, but I'll try and knock up a simple example shortly which demonstrates them all.

One note: The PyMC 4 release blog post says:

These samplers live in a different submodule sampling_jax but the plan is to integrate them into pymc.sample(backend="JAX").

We should therefore expect the implementation here to change pretty soon, so I think it's worth keeping the implementation in bambi simple so it's easy to port over when this happens.

markgoodhead · 2022-06-08T11:56:55Z

OK so I've hit a snag I don't quite understand. My personal project works fine but my test example fails:

import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd
import time

az.style.use("arviz-darkgrid")
rng = np.random.default_rng(0)

size = 1000
x = rng.normal(size=size)
print(x)
data = pd.DataFrame(
    {
        "x": x,
        "y": rng.normal(loc=x, size=size)
    }
)
print(data)

bmb_model = bmb.Model("y ~ x", data)
bmb_model_numpyro = bmb.Model("y ~ x", data)
bmb_model_blackjax = bmb.Model("y ~ x", data)
t0 = time.time()
idata = bmb_model.fit()
t1 = time.time()
idata_numpyro = bmb_model_numpyro.fit(chains=4, tune=1000, draws=1000, sampler_backend="numpyro", chain_method="vectorized")
t2 = time.time()
idata_blackjax = bmb_model_blackjax.fit(chains=4, tune=1000, draws=1000, sampler_backend="blackjax", chain_method="vectorized")
t3 = time.time()

print(f"Default: {t1-t0} Numpyro: {t2-t1} Blackjax: {t3-t2}")

It fails on line

bambi/bambi/backend/pymc.py

Line 322 in 762f30a

idata.posterior[intercept_name] -= np.dot(X.mean(0), coefs).reshape(shape)

during the numpyro run because it appears that one of the idata attributes is read-only... must admit this is quite deep into the weeds of bambi/pymc internals and I'm a little stuck as to how to proceed. It seems like the InferenceData object being returned by numpyro's fit() is inconsistent with the standard one, which is most likely a PyMC bug?
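The failing line is an in-place -= on the posterior. As a NumPy-only sketch of that failure mode, assuming (this thread does not confirm it) that the arrays coming back from the JAX samplers are backed by read-only buffers, the following reproduces the error and shows one common workaround. The names samples and shifted are illustrative.

```python
import numpy as np

# Stand-in for a posterior array backed by a read-only buffer.
samples = np.zeros((2, 3))
samples.setflags(write=False)

# In-place arithmetic has to write into the existing buffer, so it fails.
try:
    samples -= 1.0
    in_place_failed = False
except ValueError:
    in_place_failed = True

# Out-of-place arithmetic allocates a new, writable array instead.
shifted = samples - 1.0
```

Rewriting x -= y as x = x - y avoids writing into the original buffer. Whether that is the right fix for bambi depends on where the read-only flag actually comes from.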

aloctavodia

@markgoodhead Thanks for your contribution. I have a few comments.


aloctavodia · 2022-06-08T12:44:37Z

This is weird, this runs for me with both numpyro and blackjax


markgoodhead · 2022-06-08T12:55:49Z

How odd! Perhaps my environment isn't set up correctly and I'm behind on the latest versions. What versions of pymc/jax/arviz/xarray etc. are you using?

I have modified my example script to work now with the new method arg approach (and fixed a bug in chains handling I spotted). Note: I'm not actually sure if I need to construct 3 models - call it paranoia at ruling out bugs in case bmb.Model was stateful between fits 😂

import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd
import time

az.style.use("arviz-darkgrid")
rng = np.random.default_rng(0)

size = 1000
x = rng.normal(size=size)
data = pd.DataFrame(
    {
        "x": x,
        "y": rng.normal(loc=x, size=size)
    }
)

bmb_model = bmb.Model("y ~ x", data)
bmb_model_numpyro = bmb.Model("y ~ x", data)
bmb_model_blackjax = bmb.Model("y ~ x", data)
t0 = time.time()
idata = bmb_model.fit()
t1 = time.time()
idata_numpyro = bmb_model_numpyro.fit(method="nuts_numpyro", chain_method="vectorized")
t2 = time.time()
idata_blackjax = bmb_model_blackjax.fit(method="nuts_blackjax", chain_method="vectorized")
t3 = time.time()

print(f"Default: {t1-t0} Numpyro: {t2-t1} Blackjax: {t3-t2}")


zwelitunyiswa · 2022-06-08T13:07:48Z

Vectorized does work for both blackjax and numpyro (https://www.pymc.io/projects/docs/en/stable/api/samplers.html), but I found vectorized was slower on CPU (and my GPU won't work since I have an M1).


markgoodhead · 2022-06-08T13:40:17Z

Yes vectorized is generally faster if you're on a single GPU, otherwise for multiple GPUs or multiple CPU cores I expect parallel would be better.

aloctavodia · 2022-06-08T13:47:21Z

Looks good, the only missing part is a test

markgoodhead · 2022-06-08T14:15:54Z

Hmm so I just tried modifying existing tests to also run the new fit methods, e.g.

def test_group_specific_categorical_interaction(crossed_data):
    crossed_data["fourcats"] = sum([[x] * 10 for x in ["a", "b", "c", "d"]], list()) * 3
    model = Model("Y ~ continuous + (threecats:fourcats|site)", crossed_data)
    model.fit(tune=10, draws=10)
    model.fit(tune=10, draws=10, method="nuts_numpyro")

However, I again get an import error on from bambi import math at the top of the test file (the file imports fine if I comment that line out), and my tests then fail with the same error my example code gives above 🤦 Do you get the same error when modifying that test, or does it work for you, given that the example also worked for you?

aloctavodia · 2022-06-08T15:46:43Z

Do you mind adding the test anyway?

markgoodhead · 2022-06-08T16:40:19Z

Tests added... Fingers crossed they actually work!

aloctavodia · 2022-06-08T17:11:17Z

Be sure to run black and pylint https://github.com/bambinos/bambi/blob/main/CONTRIBUTING.md#pull-request-checklist

markgoodhead · 2022-06-08T17:37:43Z

Done 👍

markgoodhead · 2022-06-08T18:13:52Z

One small issue is that pylint isn't happy with the import within the code itself - I assume you're happy to ignore the error here?

tomicapretto · 2022-06-08T18:26:05Z

You could add # pylint: disable=import-outside-toplevel next to the import, I think that should work.
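A rough sketch of what a lazy import with that pylint pragma can look like. The first function assumes PyMC 4's pymc.sampling_jax module and its sample_numpyro_nuts function; the second, import_optional, is a hypothetical helper (not bambi's code) that turns a missing optional dependency into a clearer error.

```python
import importlib


def nuts_numpyro_sampler():
    """Return PyMC's numpyro NUTS sampler, importing JAX only when needed."""
    # The import lives inside the function so that importing the package
    # does not require JAX to be installed.
    import pymc.sampling_jax  # pylint: disable=import-outside-toplevel

    return pymc.sampling_jax.sample_numpyro_nuts


def import_optional(module_name, install_hint):
    """Import an optional dependency on first use, with a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this sampler. {install_hint}"
        ) from err
```

Keeping the import inside the function means users who never request a JAX-based sampler never pay for, or need, the extra dependencies.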

markgoodhead · 2022-06-08T18:42:35Z

I just tried updating my version of xarray (which was 0.21.1 before) to the latest on pip (2022.3.0) and I still get the same xarray error... otherwise my versions are all compatible with the pymc 4.0.0 release on pip. Does anyone else get this error? If not, what versions of xarray etc are you using?

markgoodhead · 2022-06-08T20:48:05Z

Tests look to be failing due to Jax not being installed (ModuleNotFoundError: No module named 'jax') - @tomicapretto please can you assist? @aloctavodia requested that Jax be an optional install for the user so I guess we just need to add this to the github actions install only or something?

aloctavodia · 2022-06-08T21:50:27Z

We can add jax, numpyro, blackjax and any other necessary requirement for jax-based samplers to https://github.com/bambinos/bambi/blob/main/requirements-dev.txt

tomicapretto · 2022-06-08T23:04:20Z

I think we could have something like requirements-optional.txt like what you can find in ArviZ https://github.com/arviz-devs/arviz. I think requirements-dev.txt should be only for development dependencies.

Below, you will need to add another line saying pip install -r requirements-optional.txt.

bambi/.github/workflows/test.yml

Lines 35 to 36 in ecfdbf2

    
pip install -r requirements.txt
pip install -r requirements-dev.txt

aloctavodia · 2022-06-08T23:09:22Z

Agreed, that's cleaner.

canyon289 · 2022-06-09T02:34:56Z

Please also ensure the optional dependencies in setup.py are set up correctly for optional requirements

https://stackoverflow.com/a/43090648/414104
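As a sketch of that suggestion: setuptools accepts an extras_require mapping, and the list can be built from the optional requirements file. The extra name "jax", the helper parse_requirements and the unpinned package list below are assumptions for illustration; the thread does not show the actual file contents or versions.

```python
def parse_requirements(text):
    """Return requirement strings from requirements-file text.

    Blank lines and '#' comments are dropped. This simple parser is not
    suitable for URL requirements that contain a '#' fragment.
    """
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            requirements.append(line)
    return requirements


# Illustrative contents for requirements-optional.txt (versions left unpinned).
OPTIONAL_REQUIREMENTS = """\
# JAX-based samplers
jax
numpyro
blackjax
"""

extras_require = {"jax": parse_requirements(OPTIONAL_REQUIREMENTS)}

# In setup.py this would be passed as setup(..., extras_require=extras_require),
# so that `pip install bambi[jax]` pulls in the optional sampler dependencies.
```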

markgoodhead · 2022-06-09T08:50:25Z

I've added the optional requirements files and hopefully done the setup.py changes @canyon289 requested correctly (all a bit new to me so I could well have done it wrong!). I wasn't sure what versions to specify in the file so I tried to find the equivalents in pymc to align with what they have... and was a bit surprised when I couldn't find any! Perhaps something similar should be added to pymc and then by depending on a specific pymc version this would flow naturally upstream to bambi?

Another thing to note here is that if a user installs jax via this version I believe they won't get CUDA support by default - further downstream libraries like numpyro look to sort of copy the Jax installation instructions in their setup optional structure. I think the best solution overall would be for each part of the library hierarchy to depend on the correct optional install in the sub-library they depend on, e.g. bambi[gpu] would end up calling pymc[gpu] which would call numpyro[gpu] etc... perhaps this is a bit out of scope for this PR though as it requires a lot of co-ordination with other repos and this current solution is a reasonable intermediate step?

markgoodhead · 2022-06-09T09:56:24Z

OK it looks like the tests are failing for the same reason my local environment doesn't work which I've no idea how to fix! Anyone got any advice what I should try/do here?

2022-06-09T09:04:21.8572705Z bambi/models.py:265: in fit
2022-06-09T09:04:21.8572923Z     return self.backend.run(
2022-06-09T09:04:21.8573156Z bambi/backend/pymc.py:91: in run
2022-06-09T09:04:21.8573373Z     result = self._run_mcmc(
2022-06-09T09:04:21.8573593Z bambi/backend/pymc.py:288: in _run_mcmc
2022-06-09T09:04:21.8573880Z     idata = self._clean_mcmc_results(idata, omit_offsets, include_mean)
2022-06-09T09:04:21.8574228Z bambi/backend/pymc.py:363: in _clean_mcmc_results
2022-06-09T09:04:21.8574631Z     idata.posterior[intercept_name] -= np.dot(X.mean(0), coefs).reshape(shape)
2022-06-09T09:04:21.8575105Z /usr/share/miniconda/envs/test/lib/python3.8/site-packages/xarray/core/_typed_ops.py:290: in __isub__
2022-06-09T09:04:21.8575466Z     return self._inplace_binary_op(other, operator.isub)
2022-06-09T09:04:21.8575943Z /usr/share/miniconda/envs/test/lib/python3.8/site-packages/xarray/core/dataarray.py:3121: in _inplace_binary_op
2022-06-09T09:04:21.8576273Z     f(self.variable, other_variable)
2022-06-09T09:04:21.8576690Z /usr/share/miniconda/envs/test/lib/python3.8/site-packages/xarray/core/_typed_ops.py:480: in __isub__
2022-06-09T09:04:21.8577039Z     return self._inplace_binary_op(other, operator.isub)

codecov-commenter · 2022-06-09T14:33:09Z

Codecov Report

Merging #526 (478a5ed) into main (f9dc90d) will increase coverage by 0.14%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##             main     #526      +/-   ##
==========================================
+ Coverage   86.69%   86.84%   +0.14%     
==========================================
  Files          32       32              
  Lines        2586     2622      +36     
==========================================
+ Hits         2242     2277      +35     
- Misses        344      345       +1

Impacted Files	Coverage Δ
bambi/models.py	`88.65% <ø> (ø)`
bambi/backend/pymc.py	`80.88% <90.00%> (+0.60%)`	⬆️
bambi/tests/test_built_models.py	`99.03% <100.00%> (+0.08%)`	⬆️


canyon289 · 2022-06-09T16:21:29Z

bambi/backend/pymc.py

+                            **kwargs,
+                        )
+                    else:
+                        raise


Please raise a specific exception with a helpful message

Ah this was the code before I changed this function, it's just been moved around. To be honest I wondered about removing this whole error handling because I've seen pymc do the same thing internally anyway but I thought that might be out of scope for this PR - I'll do whatever is the consensus here 😄

If you're willing, editing here would be helpful, but you're right: if you just moved the code, it can be out of scope! My ask is just to open an issue to track and reference this discussion :)

canyon289 · 2022-06-09T16:23:22Z

bambi/tests/test_built_models.py

+    model.fit(method="nuts_blackjax", chain_method="vectorized")
+
+
+def test_regression_blackjax():


Nit: These two tests, test_regression_blackjax and test_regression_numpyro, could be parameterized to reduce the amount of code that needs to be read or maintained

I agree with @canyon289 here, but if you want, @markgoodhead, you can open an issue and fix this later.
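A sketch of what the parameterized version could look like with pytest.mark.parametrize. So that it runs without bambi or JAX, fit_stub is a hypothetical stand-in for building a model and calling model.fit(method=..., chain_method="vectorized"); the test name test_regression_jax is also made up.

```python
import pytest

SAMPLERS = ["nuts_numpyro", "nuts_blackjax"]


def fit_stub(method, chain_method):
    """Hypothetical stand-in for model.fit(); it only validates its input."""
    if method not in SAMPLERS:
        raise ValueError(f"Unknown method: {method}")
    return {"method": method, "chain_method": chain_method}


@pytest.mark.parametrize("method", SAMPLERS)
def test_regression_jax(method):
    # One test body replaces the two near-identical per-sampler tests.
    result = fit_stub(method=method, chain_method="vectorized")
    assert result["method"] == method
```

pytest reports each parameter as its own test case, so a failure still points at the specific sampler.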

canyon289 · 2022-06-09T16:24:29Z

@markgoodhead thanks for doing this! this is a great capability add for bambi

aloctavodia

LGTM! Thank you @markgoodhead

Add support for numpyro and blackjax PyMC samplers

05d0aec

aloctavodia requested changes Jun 8, 2022

Update bambi/models.py

5e6997f

Co-authored-by: Osvaldo A Martin <aloctavodia@gmail.com>

mark added 3 commits June 8, 2022 13:47

Lazily import jax sampling. Refactor sampler_backend to instead be part of the method argument

5698b75

Merge remote-tracking branch 'origin/numpyro_blackjax' into numpyro_blackjax # Conflicts: # bambi/models.py

bdd6f1f

Fix for chains bug in numpyro and blackjax backends

a62a115

Minor error message fix

21aa870

aloctavodia reviewed Jun 8, 2022

mark added 2 commits June 8, 2022 14:18

Rename mcmc-numpyro/blackjax to nuts_numpyro/blackjax

99cc5f0

Remove incorrect statement about chain_method as the default is now "parallel"

dc2609c

Extend tests to also cover numpyro/blackjax samplers

0dfc44c

Run black and pylint

9dbb1f9

Re-run black with latest version

1377b58

Add pylint error ignores for lazy imports

0f4bca6

Re-add math import on init. Lazy importing looks to have fixed circular imports and this is needed for tests to work

5a7c7fa

Add optional dependencies for Jax samplers & modify test setup accordingly

9ecdc01

aloctavodia and others added 4 commits June 9, 2022 09:24

Update pymc.py

324f838

Run black

0621b5b

Add new numpyro/blackjax only tests and revert old tests

f5dfd5e

Merge branch 'main' into numpyro_blackjax

478a5ed

# Conflicts: # bambi/tests/test_built_models.py

markgoodhead requested a review from aloctavodia June 9, 2022 15:00

canyon289 reviewed Jun 9, 2022

aloctavodia approved these changes Jun 9, 2022

canyon289 merged commit 91903c8 into bambinos:main Jun 10, 2022

canyon289 mentioned this pull request Jun 10, 2022

Merge two tests test_regression_blackjax and test_regression_nunpyro #530

Open

markgoodhead mentioned this pull request Jun 11, 2022

Remove retry logic in mcmc as pymc already does this #532

Closed

markgoodhead deleted the numpyro_blackjax branch June 27, 2022 12:25
