---
title: Problem Formulation
math:
  '\abs': '\left\lvert #1 \right\rvert'
  '\Set': '\left\{ #1 \right\}'
  '\mc': '\mathcal{#1}'
  '\M': '\boldsymbol{#1}'
  '\R': '\mathsf{#1}'
  '\RM': '\boldsymbol{\mathsf{#1}}'
  '\op': '\operatorname{#1}'
  '\E': '\op{E}'
  '\d': '\mathrm{\mathstrut d}'
---

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline
SEED = 0

## Mutual information estimation

**How to formulate the problem of mutual information estimation?**

The problem of estimating the mutual information is:

::::{prf:definition} Mutual information estimation
:label: def:MIE

Given $n$ samples

$$(\R{X}_1,\R{Y}_1),\dots, (\R{X}_n,\R{Y}_n) \sim P_{\R{X},\R{Y}}^n$$ 

drawn i.i.d. from an *unknown* probability measure $P_{\R{X},\R{Y}}$, estimate the *mutual information (MI)*

$$
\begin{align}
I(\R{X}\wedge\R{Y}) &:= E\left[\log \frac{d P_{\R{X},\R{Y}}}{d (P_{\R{X}} \times P_{\R{Y}})}(\R{X},\R{Y}) \right],
\end{align}
$$ (eq:MI)

which is the expected log density ratio of the joint distribution of $(\R{X}, \R{Y})$ to the product of their marginal distributions.

::::
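
When the distribution is known, the expectation in the definition above can be checked numerically. The following is a minimal sketch, assuming a standard bivariate Gaussian with a known correlation `rho_demo` (a hypothetical value chosen for illustration, not a quantity taken from this notebook). In that case the MI has the closed form $-\frac12\log(1-\rho^2)$ nats, so the sample average of the log density ratio can be compared with it. The names `rho_demo`, `n_demo` and `log_density_ratio` are illustrative.

```python
import numpy as np

# Hypothetical known correlation and sample size, for illustration only
rho_demo, n_demo = 0.8, 100_000
demo_rng = np.random.default_rng(0)
cov_demo = [[1, rho_demo], [rho_demo, 1]]
xy_demo = demo_rng.multivariate_normal([0, 0], cov_demo, n_demo)
x_demo, y_demo = xy_demo[:, 0], xy_demo[:, 1]


def log_density_ratio(x, y, rho):
    """Log of the joint density over the product of the marginal densities
    for a zero-mean, unit-variance bivariate Gaussian with correlation rho."""
    log_joint = (
        -np.log(2 * np.pi)
        - 0.5 * np.log(1 - rho**2)
        - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))
    )
    log_marginals = -np.log(2 * np.pi) - 0.5 * (x**2 + y**2)
    return log_joint - log_marginals


mi_closed_form = -0.5 * np.log(1 - rho_demo**2)  # in nats
mi_sample_mean = log_density_ratio(x_demo, y_demo, rho_demo).mean()
print(f"closed form: {mi_closed_form:.4f}, sample mean: {mi_sample_mean:.4f}")
```

This only works because the density ratio is available in closed form. In the estimation problem, the distribution is unknown, so the density ratio itself has to be learned from the samples.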

Run the following code, which uses `numpy` to 
- generate i.i.d. samples from a multivariate Gaussian distribution, and
- store the samples in a NumPy array assigned to `XY`.

In [None]:
# Seeded random number generator for reproducibility
XY_rng = np.random.default_rng(SEED)

# Sampling from an unknown probability measure
rho = 1 - 0.19 * XY_rng.random()
mean, cov, n = [0, 0], [[1, rho], [rho, 1]], 1000
XY = XY_rng.multivariate_normal(mean, cov, n)

# The sample is plotted on a scatterplot
plt.scatter(XY[:, 0], XY[:, 1], s=2, label=r"$(x,y):=(\mathsf{X}_i, \mathsf{Y}_i)$")
plt.title(r"Random sample")
plt.xlabel(r"$x$")
plt.ylabel(r"$y$")
plt.legend()
plt.show()

::::{seealso}
:class: dropdown

The official documentation for [`multivariate_normal`][mn] and [`scatter`][sc].

[mn]: https://numpy.org/doc/stable/reference/random/generated/numpy.random.multivariate_normal.html
[sc]: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html


To get help in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html):

- **Docstring**: 
  - Move the cursor to the object and 
    - click `Help->Show Contextual Help` or
    - press `Shift-Tab` if you have limited screen space.

- **Directory**:
  - Right-click on a notebook and choose `New Console for Notebook`. 
  - Run `dir(obj)` for a previously defined object `obj` to see the available methods/properties of `obj`.

::::

::::{exercise}
:label: ex:sampling-distribution

 What is unknown about the above sampling distribution?

::::

YOUR ANSWER HERE

To show the data samples using `pandas`:

In [None]:
XY_df = pd.DataFrame(XY, columns=[r"$x$", r"$y$"])
XY_df

To plot the data using `seaborn`:

In [None]:
def plot_samples_with_kde(df, **kwargs):
    """
    This function creates a PairGrid plot of the input DataFrame with Seaborn,
    plotting a scatterplot on the lower triangle, kernel density estimates (KDE)
    on the upper triangle and the diagonals.

    Parameters:
    df (DataFrame): The input DataFrame to be plotted. Each column is a separate
                    data series that will be plotted against each other.

    **kwargs: Additional keyword arguments are passed down to the sns.PairGrid()
              function.

    Returns:
    p (PairGrid): A seaborn.PairGrid object for further customization.

    See also:
    PairGrid allows us to create a grid of subplots using the same plot type to
    visualize data.
    """
    p = sns.PairGrid(df, **kwargs)
    p.map_lower(sns.scatterplot, s=2)  # scatter plot of samples
    p.map_upper(sns.kdeplot)  # kernel density estimate (kde) for joint distribution
    p.map_diag(sns.kdeplot)  # kde for marginal distributions
    p.fig.subplots_adjust(top=0.9)
    return p


plot_samples_with_kde(XY_df)
plt.suptitle("Random sample with density estimates")
plt.show()

::::{exercise} sample normal distribution
:label: ex:sample-gaussian

Complete the following code so that `XY_ref` stores i.i.d. samples of $(\R{X}',\R{Y}')$, where $\R{X}'$ and $\R{Y}'$ are independent zero-mean Gaussian random variables with unit variance.

:::{hint}
:class: dropdown

```python
...
cov_ref, n_ = ..., n
XY_ref = XY_ref_rng.multivariate_normal(mean, ..., n_)
...
```
:::

::::

In [None]:
XY_ref_rng = np.random.default_rng(SEED)
# YOUR CODE HERE
raise NotImplementedError()
XY_ref_df = pd.DataFrame(XY_ref, columns=[r"$x$", r"$y$"])
plot_samples_with_kde(XY_ref_df)
plt.suptitle("Random sample for independent random variables")
plt.show()

## Divergence estimation

**Can we generalize the problem further?**

Estimating MI in [](#def:MIE) may be viewed as a special case of the following problem:

::::{prf:definition} Divergence estimation
:label: DE

For $P_{\R{Z}}\ll P_{\R{Z}'}$, estimate the Kullback-Leibler (KL) *divergence*

$$
\begin{align}
D(P_{\R{Z}}\|P_{\R{Z}'}) &:= E\left[\log \frac{d P_{\R{Z}}}{d P_{\R{Z}'}}(\R{Z}) \right]
\end{align}
$$ (eq:D)

using 
- a sequence $\R{Z}^n:=(\R{Z}_1,\dots, \R{Z}_n) \sim P_{\R{Z}}^n$ of i.i.d. samples from $P_{\R{Z}}$ if $P_{\R{Z}}$ is unknown, and
- a sequence ${\R{Z}'}^{n'}\sim P_{\R{Z}'}^{n'}$ of i.i.d. samples from $P_{\R{Z}'}$ if $P_{\R{Z}'}$ is unknown.

::::

The mutual information can be regarded as the KL divergence from the joint distribution to the product of the marginal distributions, i.e.,

$$
I(\R{X}\wedge \R{Y}) = D\left(P_{\R{X}, \R{Y}}\middle\|P_{\R{X}}P_{\R{Y}}\right).
$$
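
For finite alphabets, this identity can be verified directly. The sketch below assumes a hypothetical $2\times 2$ joint pmf `p_xy` (not taken from the notebook's data) and compares the KL divergence from the joint pmf to the product of its marginals with the familiar entropy expression $H(\R{X})+H(\R{Y})-H(\R{X},\R{Y})$. The helper names `kl_divergence` and `entropy` are illustrative.

```python
import numpy as np


def kl_divergence(p, q):
    """KL divergence (in nats) from pmf p to pmf q, assuming p << q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def entropy(p):
    """Shannon entropy (in nats) of a pmf."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


# Hypothetical joint pmf of (X, Y) on a 2x2 alphabet
p_xy = np.array([[0.4, 0.1], [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
p_product = np.outer(p_x, p_y)  # product of the marginals

mi_as_kl = kl_divergence(p_xy.ravel(), p_product.ravel())
mi_from_entropies = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
print(f"MI as KL: {mi_as_kl:.6f}, MI from entropies: {mi_from_entropies:.6f}")
```

Replacing `p_xy` by `p_product` gives a divergence of zero, which matches the fact that independent random variables have zero mutual information.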

::::{prf:remark}

- One may further consider the problem of estimating the *density ratio* $\frac{d P_{\R{Z}}}{d P_{\R{Z}'}}$, or of estimating the density $\frac{dP_{\R{Z}}}{d\mu}$ with respect to some reference measure $\mu\gg P_{\R{Z}}$.
- Although $\R{X}^n$ and $\R{Y}^n$ for MI estimation should have the same length, $\R{Z}^n$ and ${\R{Z}'}^{n'}$ for divergence estimation can have different lengths, i.e., $n \neq n'$. Why?[^different-block-length]

::::
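
To illustrate the second point, the following sketch estimates a divergence from two independently drawn samples of different sizes. It makes a strong assumption that is not part of the general problem: both distributions are taken to be univariate Gaussian, so that the sample means and variances can be plugged into the closed-form Gaussian KL divergence. The parameters, the sample sizes `n_z` and `n_z_ref`, and the helper `gaussian_kl` are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_kl(mu1, var1, mu2, var2):
    """Closed-form KL divergence (in nats) from N(mu1, var1) to N(mu2, var2)."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)


demo_rng = np.random.default_rng(0)
n_z, n_z_ref = 2000, 500  # the two sample sizes need not match

z = demo_rng.normal(1.0, 1.0, n_z)  # samples from P_Z = N(1, 1)
z_ref = demo_rng.normal(0.0, 2.0, n_z_ref)  # samples from P_Z' = N(0, 4)

# Plug-in estimate under the Gaussian assumption
kl_plug_in = gaussian_kl(z.mean(), z.var(ddof=1), z_ref.mean(), z_ref.var(ddof=1))
kl_true = gaussian_kl(1.0, 1.0, 0.0, 4.0)
print(f"plug-in estimate: {kl_plug_in:.4f}, true value: {kl_true:.4f}")
```

The estimate only uses each sample separately (through its own mean and variance), so no pairing between the two sequences is needed.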

[^different-block-length]: The dependency between $\R{Z}$ and $\R{Z}'$ does not affect the divergence.

Viewing the mutual information as a divergence from the joint distribution to the product of the marginal distributions, the problem can be further generalized to estimating other divergences such as the $f$-divergence:

For a strictly [convex](https://en.wikipedia.org/wiki/Convex_function) function $f:[0,\infty)\to(-\infty,\infty]$ with $f(1)=0$, the $f$-divergence from $P_{\R{Z}}$ to $P_{\R{Z}'}\gg P_{\R{Z}}$ is defined as

$$
\begin{align}
D_f(P_{\R{Z}} \| P_{\R{Z}'}) 
&:=
E\left[f\left(\frac{dP_{\R{Z}}}{dP_{\R{Z}'}}(\R{Z}')\right)\right].
\end{align}
$$ (eq:f-D)
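
For pmfs on a finite alphabet, the expectation in this definition becomes a finite sum $\sum_z q(z) f\big(p(z)/q(z)\big)$. The sketch below assumes two hypothetical pmfs `p` and `q` with `q > 0` everywhere, and evaluates the sum for two standard choices of $f$: the $\chi^2$-divergence with $f(u)=(u-1)^2$ and the squared Hellinger distance with $f(u)=(\sqrt{u}-1)^2$. The function names are illustrative.

```python
import numpy as np


def f_divergence(f, p, q):
    """f-divergence sum_z q(z) f(p(z) / q(z)) for pmfs p, q with q > 0 everywhere."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))


def chi_square(u):
    """Generator of the chi-square divergence: strictly convex with f(1) = 0."""
    return (u - 1) ** 2


def squared_hellinger(u):
    """Generator of the squared Hellinger distance: strictly convex with f(1) = 0."""
    return (np.sqrt(u) - 1) ** 2


# Hypothetical pmfs on a three-letter alphabet
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

d_chi2 = f_divergence(chi_square, p, q)
d_hellinger = f_divergence(squared_hellinger, p, q)
print(f"chi-square: {d_chi2:.4f}, squared Hellinger: {d_hellinger:.4f}")
```

Both values are positive here, and both vanish when `p` and `q` coincide, since $f(1)=0$.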

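::::{exercise}
:label: ex:D

Show that the KL divergence [](#eq:D) is a special case of the $f$-divergence [](#eq:f-D).

::::

YOUR ANSWER HERE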

::::{solution} ex:D
:class: dropdown

:::{prf:proof}
:nonumber: true

With $f(u) = u\log u$,

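\begin{align}
D_f(P_{\R{Z}} \| P_{\R{Z}'})
&= E\left[\frac{dP_{\R{Z}}}{dP_{\R{Z}'}}(\R{Z}')\log \frac{dP_{\R{Z}}}{dP_{\R{Z}'}}(\R{Z}')\right]\\
&= \int_{z\in \Omega_{\R{Z}}} \textcolor{gray}{d P_{\R{Z}'}(z)} \cdot \frac{d P_{\R{Z}}(z)}{\textcolor{gray}{d P_{\R{Z}'}(z)}} \log \frac{d P_{\R{Z}}(z)}{d P_{\R{Z}'}(z)}\\
&= E\left[\log \frac{dP_{\R{Z}}}{dP_{\R{Z}'}}(\R{Z})\right]\\
&= D(P_{\R{Z}}\|P_{\R{Z}'}),
\end{align}

where the third equality is a change of measure: the gray factors cancel because $dP_{\R{Z}'}(z)\cdot\frac{dP_{\R{Z}}}{dP_{\R{Z}'}}(z) = dP_{\R{Z}}(z)$, which is valid since $P_{\R{Z}}\ll P_{\R{Z}'}$, and the last equality is the definition of the KL divergence.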
:::

::::
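
The change of measure in the proof can also be checked by simulation. The sketch below assumes the hypothetical pair $P_{\R{Z}}=N(1,1)$ and $P_{\R{Z}'}=N(0,1)$, for which the log density ratio is $z-\frac12$ and the KL divergence is $\frac12$ nats. It averages $f(u)=u\log u$ of the ratio over samples of $\R{Z}'$ and averages the log ratio over samples of $\R{Z}$. The names `log_ratio`, `lhs` and `rhs` are illustrative.

```python
import numpy as np

demo_rng = np.random.default_rng(0)
n_mc = 200_000


def log_ratio(z):
    """log dP_Z/dP_Z' at z for P_Z = N(1, 1) and P_Z' = N(0, 1)."""
    return z - 0.5


z_ref = demo_rng.normal(0.0, 1.0, n_mc)  # samples from P_Z'
z = demo_rng.normal(1.0, 1.0, n_mc)  # samples from P_Z

# E[f(ratio(Z'))] with f(u) = u log u, averaged over samples of Z'
lhs = np.mean(np.exp(log_ratio(z_ref)) * log_ratio(z_ref))
# E[log ratio(Z)], averaged over samples of Z
rhs = np.mean(log_ratio(z))
print(f"f-divergence form: {lhs:.4f}, KL form: {rhs:.4f}, exact: 0.5")
```

The first average fluctuates more than the second because the density ratio is heavy-tailed under $P_{\R{Z}'}$, which is one reason the two forms can behave differently as estimators even though they agree in expectation.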

::::{exercise}
:label: ex:D:non-negativity

For the $f$-divergence to be called a divergence, it must satisfy the property that $D_f(P_{\R{Z}}\|P_{\R{Z}'})\geq 0$ with equality iff $P_{\R{Z}}=P_{\R{Z}'}$. Prove this using Jensen's inequality and the properties of $f$.

::::

YOUR ANSWER HERE