In [1]:
import arviz as az
import numpy as np

# Generate some fake data:
n_chains = 2
n_draws = 1000
n_params = 3

# Posterior samples for a parameter "theta" of shape (chains, draws, params)
theta = np.random.randn(n_chains, n_draws, n_params)
# Fake log-probabilities for the sample stats (shape: chains x draws)
log_prob = np.random.randn(n_chains, n_draws)

# Define dims: here, the non-sample dimension for theta is named "theta_id"
dims = {"theta": ["theta_id"]}

# Define coords: the coordinate "theta_id" has three values (one for each parameter)
coords = {"theta_id": ["alpha", "beta", "gamma"]}

# Define sample statistics:
sample_stats = {"lp": log_prob}

# Define some custom attributes (these can be any additional metadata)
attrs = {"model": "Example Mixture Model", "n_params": n_params}

# Create the InferenceData object using arviz.from_dict
idata = az.from_dict(
    posterior={"theta": theta},  # theta is a (chains x draws x params) array
    dims=dims,
    coords=coords,
    sample_stats=sample_stats,
    attrs=attrs
)

# Now you can access the posterior variable "theta"
print("Shape of theta:", idata.posterior.theta.values.shape)
# Expected output: Shape of theta: (2, 1000, 3)

# Coordinates are attached to each group (e.g. idata.posterior); the
# InferenceData container itself has no `coords` attribute:
print("Coordinates for 'theta_id':", idata.posterior.coords["theta_id"].values)


Shape of theta: (2, 1000, 3)
Coordinates for 'theta_id': ['alpha' 'beta' 'gamma']

In [2]:
idata

In [None]:

Normalizing flows provide a flexible framework for modeling complex, high-dimensional distributions. The central idea is to start with a simple base distribution, often a multivariate normal, and transform it into a more complicated target distribution using a smooth, invertible mapping with a smooth inverse (a diffeomorphism). In our formulation, a normalizing flow maps samples from the base distribution \(p(\vec{B}; \theta^b)\) to the target distribution \(p(\vec{Y})\) via a transformation \(\mathscr{f}\) with parameters \(\theta^\mathscr{f}\). Formally, if \(\vec{b}\) is a sample from the base distribution, then

\[
\vec{y} = \mathscr{f}(\vec{b}; \theta^\mathscr{f}), \quad \vec{b} \sim p(\vec{B}; \theta^b), \quad \vec{b} \in \mathcal{B}, \quad \vec{y} \in \mathcal{Y}.
\]

The transformation \(\mathscr{f}\) is a diffeomorphism: a smooth bijection whose inverse is also smooth. More precisely, we require that

\[
\begin{aligned}
\mathscr{f} &: \mathcal{B} \to \mathcal{Y}, \\
\mathscr{f}^{-1} &: \mathcal{Y} \to \mathcal{B}, \\
\mathscr{f} &\in C^\infty(\mathcal{B}, \mathcal{Y}), \quad \mathscr{f}^{-1} \in C^\infty(\mathcal{Y}, \mathcal{B}).
\end{aligned}
\]
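
As a concrete illustration of sampling and inversion, the sketch below uses an affine flow \(\mathscr{f}(\vec{b}) = \vec{\mu} + L\vec{b}\) with a lower-triangular, invertible \(L\). This simple map is an assumption chosen for clarity, not the transformation used in this work; `mu` and `L_mat` are hypothetical values of \(\theta^\mathscr{f}\), and the base distribution is assumed to be a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters theta^f = (mu, L_mat) of an affine flow f(b) = mu + L_mat b.
# A lower-triangular L_mat with a nonzero diagonal is invertible, so f is a
# diffeomorphism of R^3 onto itself whose inverse is also affine (hence smooth).
mu = np.array([1.0, -2.0, 0.5])
L_mat = np.array([[1.0, 0.0, 0.0],
                  [0.5, 2.0, 0.0],
                  [-0.3, 0.8, 0.5]])

def f(b):
    """Forward map: push base samples (rows of b) into the target space."""
    return mu + b @ L_mat.T

def f_inv(y):
    """Inverse map: solve L_mat b = y - mu for every row of y."""
    return np.linalg.solve(L_mat, (y - mu).T).T

b = rng.standard_normal((5000, 3))   # b ~ N(0, I), the assumed base distribution
y = f(b)                             # samples from the flow
b_back = f_inv(y)                    # the inverse recovers the base samples
```

Sampling only needs \(\mathscr{f}\), while the round trip through `f_inv` is what density evaluation relies on.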

Once we have defined the mapping, we can compute the probability density of the transformed variable using the change of variables formula. If \(\pmb{J}^\mathscr{f}(\vec{b})\) denotes the Jacobian matrix of \(\mathscr{f}\) at \(\vec{b}\), then

\[
p(\vec{y}) = p(\vec{b}) \cdot \left|\det\!\left(\pmb{J}^\mathscr{f}(\vec{b})\right)\right|^{-1},
\]
or equivalently,
\[
p(\vec{y}) = p\!\left(\mathscr{f}^{-1}(\vec{y})\right) \cdot \left|\det\!\left(\pmb{J}^{\mathscr{f}^{-1}}(\vec{y})\right)\right|.
\]
Here, \(K\) denotes the dimension of \(\vec{b}\) and \(\vec{y}\), and the Jacobian matrix is given by

\[
\pmb{J}^\mathscr{f}(\vec{b}) =
\begin{bmatrix}
\frac{\partial \vec{y}_1}{\partial \vec{b}_1} & \dots & \frac{\partial \vec{y}_1}{\partial \vec{b}_K} \\[1mm]
\vdots & \ddots & \vdots \\[1mm]
\frac{\partial \vec{y}_K}{\partial \vec{b}_1} & \dots & \frac{\partial \vec{y}_K}{\partial \vec{b}_K}
\end{bmatrix}.
\]
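
A quick numerical check of both forms of the change of variables formula, assuming a one-dimensional example (\(K = 1\)) with a standard-normal base and \(\mathscr{f}(b) = \exp(b)\), for which the resulting log-normal density is known in closed form:

```python
import numpy as np

LOG_2PI = np.log(2 * np.pi)

def base_log_pdf(b):
    """Log density of the 1-D standard normal base distribution."""
    return -0.5 * b**2 - 0.5 * LOG_2PI

# Example flow (assumed for illustration): y = f(b) = exp(b), so
# J^f(b) = exp(b) and J^{f^{-1}}(y) = d log(y) / dy = 1 / y.
def log_p_y_forward(b):
    # p(y) = p(b) |det J^f(b)|^{-1}  ->  log p(y) = log p(b) - log|exp(b)|
    return base_log_pdf(b) - b

def log_p_y_inverse(y):
    # p(y) = p(f^{-1}(y)) |det J^{f^{-1}}(y)|  ->  log p(log y) + log(1 / y)
    return base_log_pdf(np.log(y)) - np.log(y)

def lognormal_log_pdf(y):
    """Closed-form log density of a log-normal with mu = 0, sigma = 1."""
    return -np.log(y) - 0.5 * LOG_2PI - 0.5 * np.log(y) ** 2

b = np.linspace(-2.0, 2.0, 9)
y = np.exp(b)
forward = log_p_y_forward(b)
inverse = log_p_y_inverse(y)
reference = lognormal_log_pdf(y)
```

Both routes agree with the closed form because \(\log|\det \pmb{J}^\mathscr{f}(b)| = b = -\log|\det \pmb{J}^{\mathscr{f}^{-1}}(y)|\).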

To learn the target distribution by maximum likelihood, the density must be evaluated efficiently at the data points. This requires fast computation of both the inverse transformation \(\mathscr{f}^{-1}\) and the determinant of its Jacobian. Likewise, efficient sampling from the flow requires fast evaluation of \(\mathscr{f}\) itself. Many practical designs make the Jacobian triangular, so that its determinant reduces to the product of its diagonal entries and costs \(\mathcal{O}(K)\) operations instead of the \(\mathcal{O}(K^3)\) of a general determinant (see, e.g., \textcite{papamakarios_normalizing_2019} for a detailed discussion).
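
To see why a triangular Jacobian helps, here is a minimal sketch of an affine coupling layer in the style of RealNVP, one common way to obtain a triangular Jacobian (named here as an illustration, not as the construction used in this work). The functions `log_scale` and `shift` and the weights `W_S`, `W_T` are hypothetical stand-ins for neural networks. The check compares the cheap sum-of-log-scales shortcut against a dense determinant of a finite-difference Jacobian.

```python
import numpy as np

# Hypothetical stand-ins for the conditioner networks; any functions of b1 would do.
W_S = np.array([[0.5, -0.3], [0.2, 0.1]])
W_T = np.array([[1.0, 0.5], [-0.4, 0.3]])

def log_scale(x1):
    return np.tanh(x1 @ W_S)

def shift(x1):
    return np.sin(x1) @ W_T

def coupling_forward(b):
    """y1 = b1, y2 = b2 * exp(s(b1)) + t(b1); returns y and log|det J^f(b)|."""
    b1, b2 = b[:2], b[2:]
    s = log_scale(b1)
    y = np.concatenate([b1, b2 * np.exp(s) + shift(b1)])
    # The Jacobian is lower triangular with diagonal (1, 1, exp(s_1), exp(s_2)),
    # so log|det J| is simply the sum of the log-scales.
    return y, np.sum(s)

def coupling_inverse(y):
    """Closed-form inverse: no iterative solve is needed."""
    y1, y2 = y[:2], y[2:]
    return np.concatenate([y1, (y2 - shift(y1)) * np.exp(-log_scale(y1))])

def numerical_jacobian(func, x, eps=1e-6):
    """Central finite-difference Jacobian, used only to check the shortcut."""
    K = x.size
    J = np.empty((K, K))
    for j in range(K):
        e = np.zeros(K)
        e[j] = eps
        J[:, j] = (func(x + e) - func(x - e)) / (2 * eps)
    return J

b = np.array([0.3, -1.2, 0.7, 2.0])
y, logdet_fast = coupling_forward(b)
J = numerical_jacobian(lambda x: coupling_forward(x)[0], b)
sign, logdet_dense = np.linalg.slogdet(J)
```

Both the inverse and the log-determinant are available in closed form, which is exactly the pair of operations density evaluation needs.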

Typically, the base distribution is chosen to be a multivariate normal (MVN), whose support is all of \(\mathbb{R}^K\). This choice works well for many applications, but it creates difficulties when the target distribution naturally lives on a space with non-Euclidean geometry, such as a torus or a sphere \cite{rezende_normalizing_2020, gemici_normalizing_2016}, or on a compact subset of \(\mathbb{R}^K\), such as a closed ball or a convex polytope \(\mathcal{F} \subset \mathbb{R}^K\). In such cases, a diffeomorphism from the support of the base distribution onto the target space may not exist: diffeomorphisms preserve topological properties such as compactness, so no diffeomorphism maps the non-compact \(\mathbb{R}^K\) onto a sphere, a torus, or a closed polytope.

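A common way to increase the flexibility of the mapping is to compose several simpler diffeomorphisms. Because a composition of diffeomorphisms is itself a diffeomorphism, this adds expressiveness but does not, on its own, overcome the support mismatch described above. Instead of a single transformation \(\mathscr{f}\), we define a composite mapping as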

\[
\mathscr{f} = \mathscr{f}_L \circ \mathscr{f}_{L-1} \circ \cdots \circ \mathscr{f}_1,
\]
where each \(\mathscr{f}_\ell : \mathcal{H}_{\ell-1} \to \mathcal{H}_\ell\) is a diffeomorphism between intermediate spaces, with \(\mathcal{H}_0 = \mathcal{B}\) and \(\mathcal{H}_L = \mathcal{Y}\). The change of variables formula for the composite flow becomes

\[
p(\vec{y}) = p(\vec{b}) \prod_{\ell=1}^L \left|\det\!\left(\pmb{J}^{\mathscr{f}_\ell}(\vec{h}_{\ell-1})\right)\right|^{-1},
\]
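where \(\vec{h}_0 = \vec{b}\) and \(\vec{h}_\ell = \mathscr{f}_\ell(\vec{h}_{\ell-1})\). Taking logarithms turns the product into a sum, \(\log p(\vec{y}) = \log p(\vec{b}) - \sum_{\ell=1}^L \log\left|\det \pmb{J}^{\mathscr{f}_\ell}(\vec{h}_{\ell-1})\right|\), so each layer contributes only its own log-determinant. This layered construction enables us to model highly complex transformations by combining simpler, tractable ones.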
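
The composite change of variables formula can be evaluated in log space by accumulating one log-determinant per layer. The sketch below assumes a standard-normal base and two hypothetical elementwise layers, an affine map followed by \(\exp\), so that the result can be checked against a closed-form log-normal density.

```python
import numpy as np

LOG_2PI = np.log(2 * np.pi)

# Each layer maps a batch h_{l-1} to (h_l, log|det J^{f_l}(h_{l-1})| per row).
def make_affine(a, c):
    """Elementwise affine layer h -> a * h + c (hypothetical parameters, a != 0)."""
    def layer(h):
        logdet = np.full(h.shape[0], np.sum(np.log(np.abs(a))))
        return a * h + c, logdet
    return layer

def exp_layer(h):
    """Elementwise exp; its Jacobian is diag(exp(h)), so log|det| = sum(h)."""
    return np.exp(h), np.sum(h, axis=1)

def flow_sample_and_log_prob(layers, b):
    """Push b through the layers; log p(y) = log p(b) - sum_l log|det J^{f_l}|."""
    log_p = np.sum(-0.5 * b**2 - 0.5 * LOG_2PI, axis=1)  # standard-normal base
    h = b
    for layer in layers:
        h, logdet = layer(h)
        log_p = log_p - logdet
    return h, log_p

a = np.array([0.5, -2.0])
c = np.array([1.0, 0.0])
layers = [make_affine(a, c), exp_layer]   # f = f_2 o f_1

rng = np.random.default_rng(1)
b = rng.standard_normal((4, 2))
y, log_p = flow_sample_and_log_prob(layers, b)

# Closed form: each y_k is log-normal with mu_k = c_k and sigma_k = |a_k|.
sigma = np.abs(a)
log_ref = np.sum(-np.log(y) - np.log(sigma) - 0.5 * LOG_2PI
                 - (np.log(y) - c) ** 2 / (2 * sigma**2), axis=1)
```

Each layer only has to report its own log-determinant, which is what makes deep stacks of simple layers tractable.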

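Pushing this idea further, one may let the number of composed flows tend to infinity while each individual step becomes infinitesimally small, for instance with residual steps \(\mathscr{f}_\ell(\vec{h}) = \vec{h} + \Delta t\, \mathbf{g}(\vec{h}, t_\ell; \theta)\) and \(\Delta t \to 0\), where \(\mathbf{g}\) is the velocity field introduced below. In this continuous limit, the discrete sequence of transformations is replaced by a continuous evolution governed by an ordinary differential equation (ODE). This approach leads to **continuous normalizing flows (CNFs)**, which parameterize the transformation as a **neural ODE**. In the continuous formulation, we describe the transformation by a time-dependent state \(\vec{h}(t)\) that evolves according to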

\[
\frac{d\vec{h}(t)}{dt} = \mathbf{g}(\vec{h}(t), t; \theta), \quad \vec{h}(0) = \vec{b}, \quad \vec{h}(T) = \vec{y},
\]
where \(\mathbf{g}\) is an instantaneous velocity field typically modeled by a neural network. The corresponding evolution of the density is given by

\[
\frac{d\log p(\vec{h}(t))}{dt} = -\operatorname{Tr}\!\left(\frac{\partial \mathbf{g}}{\partial \vec{h}(t)}\right),
\]
and integration over time yields

\[
p(\vec{y}) = p(\vec{b}) \exp\!\left(-\int_0^T \operatorname{Tr}\!\left(\frac{\partial \mathbf{g}}{\partial \vec{h}(t)}\right) dt\right).
\]
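Compared with the discrete formulation, the log-determinant is replaced by the trace of the Jacobian of \(\mathbf{g}\). This places no triangular-structure constraint on \(\mathbf{g}\), and the trace can be computed with \(K\) derivative evaluations or estimated stochastically; the price is that evaluating the density, and sampling, requires numerically solving an ODE.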
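
The continuous formulation can be checked in a case with a known answer. The sketch below assumes a hypothetical linear, time-independent field \(\mathbf{g}(\vec{h}, t) = A\vec{h}\), integrates the state and the log-density together with a fixed-step RK4 solver, and compares the result with the exact Gaussian density of \(\vec{y} = M\vec{b}\). In practice \(\mathbf{g}\) would be a neural network whose trace has to be computed or estimated at every solver step.

```python
import numpy as np

LOG_2PI = np.log(2 * np.pi)

# Hypothetical linear velocity field g(h, t) = A h, chosen because its flow map is
# linear and easy to verify; Tr(dg/dh) = Tr(A) for every h and t.
A = np.array([[-0.5, 1.0],
              [-1.0, 0.2]])
T, n_steps = 1.0, 1000
dt = T / n_steps

def g(h):
    return h @ A.T                        # rows of h are states h(t)

def cnf_forward(b):
    """Integrate dh/dt = g(h) and d log p/dt = -Tr(dg/dh) from t = 0 to T with RK4."""
    h = b.copy()
    delta_log_p = np.zeros(b.shape[0])
    for _ in range(n_steps):
        k1 = g(h)
        k2 = g(h + 0.5 * dt * k1)
        k3 = g(h + 0.5 * dt * k2)
        k4 = g(h + dt * k3)
        h = h + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        delta_log_p -= np.trace(A) * dt   # exact here, since the trace is constant
    return h, delta_log_p

rng = np.random.default_rng(2)
b = rng.standard_normal((5, 2))
y, delta_log_p = cnf_forward(b)
log_p_b = np.sum(-0.5 * b**2 - 0.5 * LOG_2PI, axis=1)
log_p_y = log_p_b + delta_log_p           # CNF log density of the final states

# Check: the flow map is linear, y = M b, so y ~ N(0, M M^T).
M = cnf_forward(np.eye(2))[0].T           # columns of M are images of basis vectors
cov = M @ M.T
_, logdet_cov = np.linalg.slogdet(cov)
quad = np.sum(y * np.linalg.solve(cov, y.T).T, axis=1)
log_p_y_ref = -0.5 * quad - 0.5 * logdet_cov - LOG_2PI  # K = 2: (K/2) log(2 pi) = log(2 pi)
```

The agreement also illustrates Liouville's formula: for this field, \(\det M = \exp(T \operatorname{Tr} A)\).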

A recent and promising development in this area is **flow matching**. Maximum-likelihood training requires the Jacobian determinant for discrete flows and, for CNFs, integrating both the state and the trace term through an ODE solver for every training example, which becomes expensive in high dimensions. Flow matching instead regresses the model’s instantaneous velocity field \(\mathbf{g}(\vec{h}(t), t; \theta)\) onto a target vector field derived from the data along a prescribed probability path between the base and target distributions. Because the loss is a squared error between the two fields, training requires neither determinant or trace computations nor ODE simulation, which can simplify optimization and improve stability (see, e.g., \textcite{flow_matching_reference}).
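
A minimal flow-matching sketch, under strong simplifying assumptions: one dimension, a standard-normal base, data drawn from \(\mathcal{N}(3, 0.5^2)\), the straight-line conditional path \(h_t = (1-t)\,b + t\,y\) with target velocity \(y - b\) (one common choice, assumed here), and a velocity model that is linear in \(h\) with separate coefficients on a fixed time grid. The linear model is only adequate because base and target are both Gaussian; `m`, `s`, `K`, and `n` are hypothetical settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed): base N(0, 1), data from N(m, s^2), straight-line path
# h_t = (1 - t) b + t y with conditional target velocity u = y - b.
m, s = 3.0, 0.5
K, n = 100, 2000                  # number of time points, samples per time point
ts = np.arange(K) / K             # t_k = k / K on [0, 1)

# Velocity model g(h, t_k) = c0_k + c1_k * h: linear in h, one pair per time point.
coef = np.empty((K, 2))
for k, t in enumerate(ts):
    b = rng.standard_normal(n)             # base samples
    y = m + s * rng.standard_normal(n)     # data samples (independent pairing)
    h_t = (1 - t) * b + t * y              # points on the conditional paths
    u = y - b                              # regression target
    design = np.column_stack([np.ones(n), h_t])
    # The flow-matching loss is a mean squared error, so fitting is least squares.
    coef[k] = np.linalg.lstsq(design, u, rcond=None)[0]

def sample(n_samples):
    """Generate samples by Euler-integrating dh/dt = g(h, t) from t = 0 to t = 1."""
    h = rng.standard_normal(n_samples)
    for c0, c1 in coef:
        h = h + (1.0 / K) * (c0 + c1 * h)
    return h

samples = sample(100_000)
```

No determinant, trace, or ODE solve appears in the training loop; the ODE is only integrated afterwards, to draw samples from the learned field.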

In summary, normalizing flows transform a simple base distribution into a complex target distribution through smooth, invertible mappings. Composing multiple diffeomorphisms, taking the continuous limit to obtain neural-ODE-based flows, and training with methods such as flow matching all increase the flexibility and trainability of these models. Applying them to targets with non-Euclidean geometry or compact support, such as convex polytopes, additionally requires base distributions and transformations compatible with that support.
