# LZ78 Usage Tutorial: Probability Source

## Prerequisites
1. Follow the setup instructions in `tutorials/README.md`
2. In the same Python environment as you used for that tutorial, run `pip install ipykernel`
3. Use that Python environment as the kernel for this notebook.

## Important Note
Sometimes Jupyter does not register that a cell calling into the `lz78` library has started running, so the cell appears to be queued until it has actually finished.
This is annoying for long-running operations and **can be remedied by putting `stdout.flush()` at the beginning of the cell**.

## Imports

In [None]:
from lz78 import DirichletLZ78Source, DiracDirichletLZ78Source, DiscreteThetaLZ78Source, mu_k
from sys import stdout
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gamma as GammaFunc
import scipy.special as sp

## LZ78 Probability Source

[The LZ78 Source](https://arxiv.org/abs/2503.10574) describes a probability source that draws symbols from the sequential probability assignment of [A Family of LZ78-based Universal Sequential Probability Assignments](https://arxiv.org/abs/2410.06589). The source starts from a tree containing only a root node and traverses (and grows) the tree as symbols are drawn.

[The LZ78 Source](https://arxiv.org/abs/2503.10574) proves results on:
1. the entropy rate of the source,
2. convergence of a realization's log probability to the entropy rate, and
3. the almost-sure convergence of any fixed-order empirical entropy (of a realization) to a quantity that exceeds the entropy rate by a "Jensen gap".

This tutorial walks through how these theoretical results manifest in practice.
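
Result 3 concerns fixed-order empirical entropies. As a point of reference, here is a minimal sketch of the standard plug-in order-$k$ empirical entropy (in bits per symbol), computed from how often each symbol follows each length-$k$ context. The function name `empirical_entropy` is hypothetical and not part of the `lz78` library; the library's own definition may differ in details such as how the first $k$ symbols are handled.

```python
from collections import Counter

import numpy as np


def empirical_entropy(seq, k):
    """Hypothetical helper: order-k plug-in empirical entropy of seq, in bits/symbol.

    Uses conditional frequencies N(context, a) / N(context) over the n - k
    positions that have a full length-k context (the first k symbols are skipped).
    """
    seq = list(seq)
    n = len(seq)
    if n <= k:
        raise ValueError("sequence must be longer than k")
    # Counts of (context, next symbol) pairs and of the contexts alone
    joint = Counter(tuple(seq[i - k:i + 1]) for i in range(k, n))
    contexts = Counter(tuple(seq[i - k:i]) for i in range(k, n))
    total_bits = 0.0
    for ctx_and_symbol, count in joint.items():
        ctx_count = contexts[ctx_and_symbol[:-1]]
        total_bits -= count * np.log2(count / ctx_count)
    return total_bits / (n - k)


# Usage: an i.i.d. fair-coin sequence should give values close to 1 bit/symbol
rng = np.random.default_rng(0)
coin_flips = rng.integers(0, 2, size=100_000)
print([round(empirical_entropy(coin_flips, k), 4) for k in range(4)])
```

For an LZ78 realization, result 3 says these values do not converge to the entropy rate but to something strictly larger.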

## 1. Types of Probability Sources Supported

The LZ78 source is equivalent to the following formulation:
1. Choose a prior distribution, $\Pi$, on the simplex of probability mass functions over your alphabet.
2. For every new node of the LZ78 tree, draw $\Theta \sim \Pi$.
3. At the current node of the LZ78 tree, draw the next symbol from the probability mass function $\Theta$ associated with that node.

So, an LZ78 source is characterized by the prior distribution, $\Pi$.
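
The three-step description above can be turned into a small simulation. This is a sketch under stated assumptions, not code from the `lz78` library: it assumes the standard LZ78 incremental-parsing rule for walking the tree (after drawing a symbol, move to the matching child if it exists; otherwise create that child, draw its $\Theta$ from $\Pi$, and return to the root), and it takes the prior as a hypothetical `draw_theta` callable so that any $\Pi$ can be plugged in.

```python
import numpy as np


def sample_lz78_source(n, draw_theta, alphabet_size, rng):
    """Hypothetical sketch: draw n symbols from an LZ78 source with prior Pi.

    draw_theta(rng) must return a pmf over the alphabet (one draw Theta ~ Pi).
    Returns the symbols and the number of nodes in the resulting LZ78 tree.
    """
    thetas = [draw_theta(rng)]   # thetas[node]: the Theta drawn for that node
    children = [{}]              # children[node][symbol] -> index of child node
    node = 0                     # start at the root
    symbols = []
    for _ in range(n):
        x = int(rng.choice(alphabet_size, p=thetas[node]))  # step 3
        symbols.append(x)
        if x in children[node]:
            node = children[node][x]             # keep traversing the tree
        else:
            children[node][x] = len(thetas)      # new node for this phrase...
            thetas.append(draw_theta(rng))       # ...gets its own Theta ~ Pi (step 2)
            children.append({})
            node = 0                             # phrase complete: back to the root
    return symbols, len(thetas)


# Usage: a Dirichlet(0.5, 0.5) prior over a binary alphabet
rng = np.random.default_rng(123)
symbols, num_nodes = sample_lz78_source(
    10_000, lambda r: r.dirichlet([0.5, 0.5]), alphabet_size=2, rng=rng
)
print(len(symbols), num_nodes)
```

Passing the prior as a callable keeps the tree logic identical for every $\Pi$, which mirrors the point that the prior alone characterizes the source.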

### 1.1 Supported Priors
There are three types of prior distributions supported:
1. **Dirichlet**: a Dirichlet($\gamma, \dots, \gamma$) prior, which corresponds to the prior used in the `LZ78SPA` class.
2. **Discrete** (only for binary alphabets): $\Pi$ is a probability mass function over finitely many points in the $[0, 1]$ interval (for a binary alphabet, each $\Theta$ is determined by a single number in $[0, 1]$).

    **Note:** the proofs of the theoretical results for the LZ78 source require $\Pi$ to have support on the full simplex, which does not hold for this distribution.

3. **Dirac-Dirichlet Mixture** (only for binary alphabets): this is a perturbation of the above discrete distribution such that $\Pi$ has support on the full $[0, 1]$ interval.
    
    This prior places weight `dirichlet_weight` on a Dirichlet($\gamma, \dots, \gamma$) distribution and weight `1-dirichlet_weight` on a distribution with equal-height point masses at `dirac_loc` and `1-dirac_loc`.
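
For a binary alphabet, each of these priors can be sampled by representing $\Theta$ as a single number $\theta \in [0, 1]$ (assumed here to be the probability of symbol 1). The functions below are hypothetical illustrations of the three priors, not the library's internals. They rely on the fact that, for two symbols, a Dirichlet($\gamma, \gamma$) draw reduces to a Beta($\gamma, \gamma$) draw for one coordinate.

```python
import numpy as np


def sample_dirichlet_theta(rng, gamma):
    # Binary Dirichlet(gamma, gamma): one coordinate is Beta(gamma, gamma)
    return rng.beta(gamma, gamma)


def sample_discrete_theta(rng, theta_values, probabilities):
    # Point masses at theta_values with the given probabilities
    return rng.choice(theta_values, p=probabilities)


def sample_dirac_dirichlet_theta(rng, gamma, dirichlet_weight, dirac_loc):
    # With probability dirichlet_weight, draw from the Dirichlet component;
    # otherwise pick dirac_loc or 1 - dirac_loc with equal probability.
    if rng.random() < dirichlet_weight:
        return sample_dirichlet_theta(rng, gamma)
    return dirac_loc if rng.random() < 0.5 else 1 - dirac_loc


# Usage: roughly 90% of draws should land on 0.2 or 0.8
rng = np.random.default_rng(0)
draws = np.array([sample_dirac_dirichlet_theta(rng, 1.0, 0.1, 0.2) for _ in range(100_000)])
print(np.mean(np.isclose(draws, 0.2) | np.isclose(draws, 0.8)))
```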

### 1.2 Dirichlet Prior Example

In [None]:
GAMMA = 0.5
lz78_source = DirichletLZ78Source(
    alphabet_size=2, gamma=GAMMA, seed=123
)

#### **Entropy Rate**
The entropy rate for a Dirichlet LZ78 source over a binary alphabet is available in closed form (courtesy of Mathematica).
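
Using the Legendre duplication formula $\Gamma(2\gamma) = 2^{2\gamma - 1}\,\Gamma(\gamma)\,\Gamma(\gamma + \tfrac{1}{2})/\sqrt{\pi}$, the expression implemented in `compute_lz78_dirichlet_entropy_rate` simplifies considerably. Write $H_x = \psi(x + 1) + \gamma_E$ for the generalized harmonic number (as in `harmonic_number`), where $\psi$ is the digamma function and $\gamma_E$ the Euler-Mascheroni constant:

$$
\frac{-2\,\Gamma(2\gamma)\,4^{-\gamma}\sqrt{\pi}\,\Gamma(\gamma)\,\left(H_\gamma - H_{2\gamma}\right)}{\Gamma(\gamma + \tfrac{1}{2})\,\Gamma(\gamma)^2 \ln 2}
= \frac{H_{2\gamma} - H_\gamma}{\ln 2}
= \frac{\psi(2\gamma + 1) - \psi(\gamma + 1)}{\ln 2}\ \text{bits}.
$$

This is the expected binary entropy of $\Theta \sim \text{Beta}(\gamma, \gamma)$ (the binary Dirichlet prior), the same "average entropy under the prior" form used for the discrete prior in Section 1.3. For example, $\gamma = 1$ gives $\frac{1}{2\ln 2} \approx 0.721$ bits and $\gamma = 0.5$ gives $2 - \frac{1}{\ln 2} \approx 0.557$ bits.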

In [None]:
SQRT_PI = np.sqrt(np.pi)
LOG2 = np.log(2)
EULER_MASCH = 0.57721566490153286060651209008240243104215933593992
def harmonic_number(n):
    # Generalized harmonic number: H_n = digamma(n + 1) + Euler-Mascheroni constant
    return sp.digamma(n + 1) + EULER_MASCH

def binary_entropy(p):
    if p == 0 or p == 1:
        return 0
    return -p * np.log2(p) - (1-p) * np.log2(1-p)

def compute_lz78_dirichlet_entropy_rate(a):
    """Entropy rate (in bits) of a binary LZ78 source with a Dirichlet(a, a) prior."""
    if a == 0:
        return 0
    return -2 * GammaFunc(2*a) * (
        4**(-a) * SQRT_PI * GammaFunc(a) * (
            harmonic_number(a) - harmonic_number(2*a)
        )
    ) / (
        GammaFunc(a + 1/2) * GammaFunc(a)**2 * LOG2
    )

In [None]:
entropy_rate = compute_lz78_dirichlet_entropy_rate(GAMMA)
entropy_rate
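
As an optional sanity check (not part of the `lz78` library), the closed form can be compared with a direct numerical integral of the expected binary entropy $\mathbb{E}[h(\Theta)]$ for $\Theta \sim \text{Beta}(\gamma, \gamma)$, using `scipy.integrate.quad` and `scipy.stats.beta`. The sketch re-implements the Mathematica expression so that it runs on its own; the helper names `closed_form_rate` and `integrated_rate` are hypothetical.

```python
import numpy as np
import scipy.special as sp
from scipy.integrate import quad
from scipy.stats import beta


def closed_form_rate(a):
    """Mathematica closed form from the cell above, in bits (requires a > 0)."""
    # H_x = psi(x + 1) - psi(1), since -psi(1) is the Euler-Mascheroni constant
    harmonic = lambda x: sp.digamma(x + 1) - sp.digamma(1)
    return -2 * sp.gamma(2 * a) * (
        4 ** (-a) * np.sqrt(np.pi) * sp.gamma(a) * (harmonic(a) - harmonic(2 * a))
    ) / (sp.gamma(a + 0.5) * sp.gamma(a) ** 2 * np.log(2))


def integrated_rate(a):
    """E[h(Theta)] in bits for Theta ~ Beta(a, a), by numerical integration."""
    def integrand(t):
        h = -t * np.log2(t) - (1 - t) * np.log2(1 - t)  # binary entropy h(t)
        return h * beta.pdf(t, a, a)
    value, _ = quad(integrand, 0, 1)
    return value


for a in [0.5, 1.0, 2.0]:
    print(a, closed_form_rate(a), integrated_rate(a))
```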

#### **Generating symbols and recording the scaled log probability at intervals**:

The `generate_symbols` instance method returns the total log probability of the symbols generated in that function call.

The `get_scaled_log_loss` method returns the scaled log probability of all symbols generated thus far.
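
To make these quantities concrete, here is a small pure-Python sketch of a binary Dirichlet LZ78 source that draws each symbol directly from a sequential probability assignment and accumulates the log loss. It assumes the Dirichlet SPA's per-node predictive rule $(N(a) + \gamma)/(N + |\mathcal{A}|\gamma)$, where $N(a)$ counts how often the node has previously emitted symbol $a$, and it assumes "scaled log loss" means $-\frac{1}{n}\log_2$ of the sequence probability. Drawing from this rule corresponds to drawing one $\Theta \sim$ Dirichlet per node and marginalizing it out. The function name is hypothetical; this is an illustration, not the library's implementation.

```python
import numpy as np


def dirichlet_lz78_log_losses(checkpoints, alphabet_size=2, gamma=0.5, seed=0):
    """Hypothetical sketch: sample from a Dirichlet LZ78 SPA and return the
    normalized log loss -(1/n) log2 P(x^n) at each n in `checkpoints`
    (in increasing order of n)."""
    checkpoints = set(checkpoints)
    rng = np.random.default_rng(seed)
    counts = [np.zeros(alphabet_size)]  # counts[node][a]: times node emitted a
    children = [{}]                     # children[node][a] -> child node index
    node, total_bits, results = 0, 0.0, []
    for n in range(1, max(checkpoints) + 1):
        c = counts[node]
        probs = (c + gamma) / (c.sum() + alphabet_size * gamma)
        x = int(rng.choice(alphabet_size, p=probs))
        total_bits -= np.log2(probs[x])   # add -log2 of this symbol's probability
        c[x] += 1
        if x in children[node]:
            node = children[node][x]
        else:                             # new phrase: grow the tree, return to root
            children[node][x] = len(counts)
            counts.append(np.zeros(alphabet_size))
            children.append({})
            node = 0
        if n in checkpoints:
            results.append(total_bits / n)
    return results


# Usage (pure Python is slow, so keep n modest)
print(dirichlet_lz78_log_losses([10, 100, 1_000, 10_000]))
```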

In [None]:
stdout.flush()
lz78_source = DirichletLZ78Source(alphabet_size=2, gamma=GAMMA, seed=123)
ns = [int(round(x)) for x in np.logspace(1, 7, 20)]

prev_n = 0
losses = []
for n in ns:
    lz78_source.generate_symbols(n - prev_n)
    prev_n = n
    losses.append(lz78_source.get_scaled_log_loss())


In [None]:
plt.figure(figsize=(12,4))
plt.plot(
    ns, np.ones(len(ns)) * entropy_rate,
    'k--', linewidth=4, label="Entropy Rate"
)
plt.plot(
    ns, losses, "-o", linewidth=2, markersize=4,
    color="red", label="Log Probabilities"
)
plt.xscale("log")
plt.legend(fontsize=12)
plt.grid(True)
plt.title(f"Log Probability of Sequences from LZ78 Source\n(Dirichlet Prior with $\\gamma=${GAMMA})", fontdict={"size": 18})
plt.xlabel("Number of Symbols", fontdict={"size": 15})
plt.ylabel("Log Probability", fontdict={"size": 15})
plt.tick_params(labelsize=12)
plt.show()

### 1.3 Discrete Prior Example

Let's consider the following prior:

In [None]:
THETA_VALUES = [0.1, 0.5, 0.8]
PROBABILITIES = [0.3, 0.5, 0.2]

plt.figure(figsize=(10,3))
plt.stem(
    THETA_VALUES, PROBABILITIES, "r"
)
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.grid(True)
plt.title(f"Prior Distribution", fontdict={"size": 16})
plt.xlabel("Theta", fontdict={"size": 12})
plt.ylabel("Probability", fontdict={"size": 12})
plt.show()

#### **Generating symbols and recording the scaled log probability at intervals**:

In [None]:
stdout.flush()
lz78_source = DiscreteThetaLZ78Source(THETA_VALUES, PROBABILITIES)
ns = [int(round(x)) for x in np.logspace(1, 7, 20)]

prev_n = 0
losses = []
for n in ns:
    lz78_source.generate_symbols(n - prev_n)
    prev_n = n
    losses.append(lz78_source.get_scaled_log_loss())


In [None]:
# Entropy rate: expected binary entropy of Theta under the discrete prior
entropy_rate = 0
for (theta, prob) in zip(THETA_VALUES, PROBABILITIES):
    entropy_rate += prob * binary_entropy(theta)
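
For the prior plotted above, writing $h(\theta) = -\theta\log_2\theta - (1-\theta)\log_2(1-\theta)$ for the binary entropy (the `binary_entropy` function), this sum works out to

$$
0.3\,h(0.1) + 0.5\,h(0.5) + 0.2\,h(0.8) \approx 0.3(0.4690) + 0.5(1) + 0.2(0.7219) \approx 0.785 \text{ bits},
$$

which is where the dashed entropy-rate line in the next plot sits.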

In [None]:
plt.figure(figsize=(12,4))
plt.plot(
    ns, np.ones(len(ns)) * entropy_rate,
    'k--', linewidth=4, label="Entropy Rate"
)
plt.plot(
    ns, losses, "-o", linewidth=2, markersize=4,
    color="red", label="Log Probabilities"
)
plt.xscale("log")
plt.legend(fontsize=12)
plt.grid(True)
plt.title(f"Log Probability of Sequences from LZ78 Source\n(Point Mass Prior)", fontdict={"size": 18})
plt.xlabel("Number of Symbols", fontdict={"size": 15})
plt.ylabel("Log Probability", fontdict={"size": 15})
plt.tick_params(labelsize=12)
plt.show()

### 1.4 Dirac-Dirichlet Mixture Example

Now, consider a prior distribution with weight 0.1 on a uniform distribution (a Dirichlet prior with $\gamma = 1$) and weight 0.9 on the distribution with equal point masses at 0.2 and 0.8.

In [None]:
stdout.flush()
lz78_source = DiracDirichletLZ78Source(gamma=1, dirichlet_weight=0.1, dirac_loc=0.2)
ns = [int(round(x)) for x in np.logspace(1, 7, 20)]

prev_n = 0
losses = []
for n in ns:
    lz78_source.generate_symbols(n - prev_n)
    prev_n = n
    losses.append(lz78_source.get_scaled_log_loss())

In [None]:
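def compute_lz78_dirac_dirichlet_entropy_rate(gamma, dirichlet_weight, dirac_loc):
    # Both point masses (dirac_loc and 1 - dirac_loc) have the same binary entropy,
    # so the rate is a weighted average of the two components' entropy rates.
    dirac_entropy = binary_entropy(dirac_loc)
    dirichlet_entropy = compute_lz78_dirichlet_entropy_rate(gamma)
    return (1-dirichlet_weight) * dirac_entropy + dirichlet_weight * dirichlet_entropy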
entropy_rate = compute_lz78_dirac_dirichlet_entropy_rate(1, 0.1, 0.2)
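
Numerically, with $\frac{1}{2\ln 2}$ being the Dirichlet-component rate at $\gamma = 1$ (the value `compute_lz78_dirichlet_entropy_rate(1)` returns),

$$
0.9\,h(0.2) + 0.1 \cdot \frac{1}{2\ln 2} \approx 0.9(0.7219) + 0.1(0.7213) \approx 0.722 \text{ bits}.
$$

Because $h(0.2) \approx 0.7219$ happens to be very close to the uniform-prior rate $\approx 0.7213$, the mixture's entropy rate is nearly identical to that of either component.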

In [None]:
plt.figure(figsize=(12,4))
plt.plot(
    ns, np.ones(len(ns)) * entropy_rate,
    'k--', linewidth=4, label="Entropy Rate"
)
plt.plot(
    ns, losses, "-o", linewidth=2, markersize=4,
    color="red", label="Log Probabilities"
)
plt.xscale("log")
plt.legend(fontsize=12)
plt.grid(True)
plt.title("Log Probability of Sequences from LZ78 Source\n(Dirac-Dirichlet Prior)", fontdict={"size": 18})
plt.xlabel("Number of Symbols", fontdict={"size": 15})
plt.ylabel("Log Probability", fontdict={"size": 15})
plt.tick_params(labelsize=12)
plt.show()