In [1]:
import pandas as pd
import numpy as np

import bambi as bmb


# define model data
data = pd.DataFrame(
    {
        "y": np.random.normal(size=50),
        "g": np.random.choice(["Yes", "No"], size=50),
        "x1": np.random.normal(size=50),
        "x2": np.random.normal(size=50),
    }
)



In [2]:
# define and fit model with MCMC
model = bmb.Model("y ~ x1 + x2", data, family="gaussian")
idata = model.fit()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [y_sigma, x2, x1, Intercept]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 33 seconds.


In [3]:
chain_n = len(idata.posterior.coords.get("chain"))
draw_n = len(idata.posterior.coords.get("draw"))

In [4]:
x1 = idata.posterior["x1"]
x1_stacked = x1.stack(samples=("chain", "draw"))
x1_stacked

Look at the MultiIndex above. We stacked in the order `("chain", "draw")`, which is equivalent to building the sample index like this:

In [5]:
samples = [None] * chain_n * draw_n
for i in range(chain_n):
    for j in range(draw_n):
        idx = i * draw_n + j
        samples[idx] = (i, j)

In [6]:
print(samples[:4])
print(samples[-4:])

[(0, 0), (0, 1), (0, 2), (0, 3)]
[(3, 996), (3, 997), (3, 998), (3, 999)]


In the terms of the [NumPy documentation](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html), this is `order="C"`: the last axis index (here, `draw`) changes fastest.
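As a small illustration of what "the last axis index changes fastest" means, the sketch below enumerates `(chain, draw)` pairs in both C and Fortran order. The sizes `chain_n = 2` and `draw_n = 3` are illustrative assumptions, not the sizes of the fitted model.

```python
import numpy as np

chain_n, draw_n = 2, 3  # small illustrative sizes, not those of the fitted model

# np.ndindex walks the indices in C order: the last index (draw) changes fastest
c_order = list(np.ndindex(chain_n, draw_n))
print(c_order)

# np.unravel_index maps flat positions back to (chain, draw) under a given order
flat = np.arange(chain_n * draw_n)
chains_c, draws_c = np.unravel_index(flat, (chain_n, draw_n), order="C")
chains_f, draws_f = np.unravel_index(flat, (chain_n, draw_n), order="F")

pairs_c = list(zip(chains_c.tolist(), draws_c.tolist()))
pairs_f = list(zip(chains_f.tolist(), draws_f.tolist()))
print(pairs_c)  # same sequence as the nested loop: draw changes fastest
print(pairs_f)  # Fortran order: chain changes fastest instead
```

The C-order pairs match the nested `for` loop used to build `samples`, while the Fortran-order pairs interleave the chains.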

If you take the values out of that `xarray.DataArray`, make sure that `.stack()` and `.reshape()` order the elements in the same way.

In this case, they both use `order="C"` and that's why the following recovers the original values.

In [7]:
print(x1_stacked.shape)
x1_stacked.values

(2000,)


array([0.09471466, 0.03987166, 0.1564112 , ..., 0.11219181, 0.14391421,
       0.02960541])

In [8]:
x1_stacked.values.reshape((chain_n, draw_n))

array([[ 0.09471466,  0.03987166,  0.1564112 , ...,  0.15913814,
         0.15913814,  0.09562304],
       [-0.06398758,  0.27575427, -0.11687804, ...,  0.11219181,
         0.14391421,  0.02960541]])

In [9]:
x1_stacked.values.reshape((chain_n, draw_n)) == x1.values

array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]])

In summary, it is worth testing that you are really applying a function and its inverse (in this case, `.stack()` and `.reshape()`).

If `.reshape()` ordered the elements differently, it would not be the inverse of `.stack()`.
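A minimal sketch of such a round-trip test, using plain NumPy and random data as a stand-in for `x1.values`. The sizes and the seed are assumptions made for illustration, and the C-order flattening only mimics what stacking `("chain", "draw")` produces.

```python
import numpy as np

rng = np.random.default_rng(0)
chain_n, draw_n = 2, 5  # illustrative sizes only
original = rng.normal(size=(chain_n, draw_n))  # stands in for x1.values

# Flatten in C order, mimicking the layout produced by stacking ("chain", "draw")
flat = original.reshape(-1, order="C")

# Reshape back with the same order and with a different one
roundtrip_c = flat.reshape((chain_n, draw_n), order="C")
roundtrip_f = flat.reshape((chain_n, draw_n), order="F")

print(np.array_equal(roundtrip_c, original))  # matching order recovers the values
print(np.array_equal(roundtrip_f, original))  # mismatched order scrambles them
```

`np.array_equal` reduces the comparison to a single boolean, which is easier to check than scanning a truncated array of `True` and `False` values.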

One case where the order of the dims needs attention is when you `.stack()` only a subset of the dimensions. Consider the following example.

In [10]:
y = idata.log_likelihood["y"]
y

In [11]:
y_stacked = y.stack(samples=("chain", "draw"))
y_stacked


Note that the `samples` dim was moved to the end of the dimensions. Compare the reshaped values with the original ones.

In [12]:
y_stacked.values.reshape((chain_n, draw_n, 50)) == y.values

array([[[ True, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False,  True]]])

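Almost all `False`! That's because `y_stacked` has shape `(50, 2000)`, with `samples` as the last dimension, so a C-order reshape to `(chain_n, draw_n, 50)` does not line up with `y.values`. We can fix it in several ways.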

One option seems to work, but its intent is not very clear:

In [13]:
y_stacked.values.T.reshape((chain_n, draw_n, 50)) == y.values

array([[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]]])
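To see why the transpose helps, the following NumPy-only sketch mimics the stacked layout (observations first, flattened `(chain, draw)` last) on a tiny array. The sizes and the use of `np.moveaxis` to imitate the stacked layout are assumptions for illustration; it is not how xarray implements `.stack()`.

```python
import numpy as np

chain_n, draw_n, obs_n = 2, 3, 4  # illustrative sizes only
y_values = np.arange(chain_n * draw_n * obs_n).reshape((chain_n, draw_n, obs_n))

# Mimic the stacked layout: observation axis first, flattened (chain, draw) last
stacked = np.moveaxis(y_values, -1, 0).reshape((obs_n, chain_n * draw_n))

# Reshaping directly reads the values in (obs, chain, draw) order: wrong layout
wrong = stacked.reshape((chain_n, draw_n, obs_n))

# Transposing first puts the flattened samples axis in front, so the C-order
# reshape splits it back into (chain, draw) and leaves obs as the last axis
right = stacked.T.reshape((chain_n, draw_n, obs_n))

print(np.array_equal(wrong, y_values))
print(np.array_equal(right, y_values))
```

The transpose works only because there are exactly two axes to swap; with more dimensions, `.T` reverses all of them and the intent becomes harder to read.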

A better solution is to transpose while naming the dimensions explicitly.

In [14]:
y_stacked = y.stack(samples=("chain", "draw")).transpose("samples", "y_dim_0")
y_stacked


In [15]:
y_stacked.values.reshape((chain_n, draw_n, 50)) == y.values

array([[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]]])
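The same check can be reproduced without fitting a model. The sketch below builds a small synthetic `xarray.DataArray` with the dimension names used above (the sizes and values are assumptions for illustration) and also shows `.unstack()` as an alternative that stays inside xarray instead of going through `.reshape()`.

```python
import numpy as np
import xarray as xr

chain_n, draw_n, obs_n = 2, 3, 4  # illustrative sizes only
y = xr.DataArray(
    np.arange(chain_n * draw_n * obs_n).reshape((chain_n, draw_n, obs_n)),
    dims=("chain", "draw", "y_dim_0"),
    coords={"chain": np.arange(chain_n), "draw": np.arange(draw_n)},
)

# Stacking a subset of the dims appends the new dim at the end
stacked = y.stack(samples=("chain", "draw"))
print(stacked.dims)

# Name the dims explicitly so that samples comes first, then reshape
ordered = stacked.transpose("samples", "y_dim_0")
recovered = ordered.values.reshape((chain_n, draw_n, obs_n))
print(np.array_equal(recovered, y.values))

# Alternative: let xarray undo the stacking, then restore the dim order by name
unstacked = stacked.unstack("samples").transpose("chain", "draw", "y_dim_0")
print(np.array_equal(unstacked.values, y.values))
```

Naming the dimensions in `.transpose()` keeps the code correct even if the array gains another dimension or the dims arrive in a different order.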