# Chapter 10 - Notes

## Set Up

### Packages

In [1]:
import os

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import seaborn as sns
from scipy import stats
from sklearn.preprocessing import StandardScaler



### Defaults

In [2]:
# seaborn defaults
sns.set(
    style="whitegrid",
    font_scale=1.2,
    rc={
        "axes.edgecolor": "0",
        "axes.grid.which": "both",
        "axes.labelcolor": "0",
        "axes.spines.right": False,
        "axes.spines.top": False,
        "xtick.bottom": True,
        "ytick.left": True,
    },
)

colors = sns.color_palette()

### Constants

In [3]:
DATA_DIR = "../data"
HOWELL_FILE = "howell.csv"
CHERRY_BLOSSOMS_FILE = "cherry_blossoms.csv"
WAFFLE_DIVORCE_FILE = "waffle_divorce.csv"
MILK_FILE = "milk.csv"

RANDOM_SEED = 42

In [4]:
def load_data(file_name, data_dir=DATA_DIR, **kwargs):
    """Read a CSV file from data_dir; extra kwargs are passed to pd.read_csv."""
    path = os.path.join(data_dir, file_name)
    return pd.read_csv(path, **kwargs)

## 10.1 Maximum entropy

### 10.1.1 Gaussian

Among all continuous distributions on the real line with a given (finite) variance, the Gaussian distribution has the highest entropy.
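As a quick numerical check of this claim, the sketch below compares the differential entropy of a few continuous distributions that are all rescaled to have the same variance. The choice of comparison distributions (Laplace, logistic, uniform) and the names `sigma`, `candidates` and `variance_matched_entropies` are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from scipy import stats

sigma = 1.0  # common standard deviation (arbitrary illustrative value)

# Each candidate is rescaled so that its variance equals sigma**2.
candidates = {
    # Normal: variance = scale^2
    "gaussian": stats.norm(scale=sigma),
    # Laplace: variance = 2 * scale^2
    "laplace": stats.laplace(scale=sigma / np.sqrt(2)),
    # Logistic: variance = scale^2 * pi^2 / 3
    "logistic": stats.logistic(scale=sigma * np.sqrt(3) / np.pi),
    # Uniform on [loc, loc + width]: variance = width^2 / 12
    "uniform": stats.uniform(loc=-sigma * np.sqrt(3), scale=2 * sigma * np.sqrt(3)),
}

# Differential entropy (in nats) of each variance-matched distribution.
variance_matched_entropies = {
    name: float(dist.entropy()) for name, dist in candidates.items()
}

for name, dist in candidates.items():
    print(f"{name:>8}: var={dist.var():.3f}  entropy={variance_matched_entropies[name]:.4f}")
```

With the variance held equal, the Gaussian should come out with the largest entropy of the four.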

### 10.1.2 Binomial

The Binomial distribution is the highest-entropy distribution for a count of events when each trial has two possible unordered outcomes and the expected value is fixed (and finite).
More precisely, it is the maximum entropy distribution among generalised Binomial distributions (distributions of the sum of $n$ independent but not necessarily identically distributed Bernoulli variables) with a fixed expected value.
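To make the "generalised Binomial" statement concrete, here is a minimal sketch that builds the distribution of a sum of independent Bernoulli variables by convolution and compares entropies at a fixed expected value. The helper `poisson_binomial_pmf` and the particular probability vectors are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats


def poisson_binomial_pmf(probs):
    """PMF of the sum of independent Bernoulli(p_i) variables, built by convolution."""
    pmf = np.array([1.0])
    for p in probs:
        # Adding one more Bernoulli trial convolves the PMF with [1 - p, p].
        pmf = np.convolve(pmf, [1 - p, p])
    return pmf


# Both settings have n = 4 trials and the same expected value (sum of p_i = 2).
equal_probs = [0.5, 0.5, 0.5, 0.5]    # ordinary Binomial(4, 0.5)
unequal_probs = [0.1, 0.4, 0.6, 0.9]  # generalised Binomial with unequal p_i

pmf_equal = poisson_binomial_pmf(equal_probs)
pmf_unequal = poisson_binomial_pmf(unequal_probs)

# Shannon entropy (in nats) of each count distribution.
entropy_equal = float(stats.entropy(pmf_equal))
entropy_unequal = float(stats.entropy(pmf_unequal))

print(f"equal p_i:   entropy = {entropy_equal:.4f}")
print(f"unequal p_i: entropy = {entropy_unequal:.4f}")
```

The equal-probability case coincides with `stats.binom.pmf(k, 4, 0.5)`, and its entropy should exceed that of the unequal-probability case.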

## 10.2 Generalized linear models

### 10.2.1 Meet the family

The Exponential distribution has maximum entropy among all non-negative continuous distributions with a given expected value.
It arises as the distribution of the distance (in time or space) between successive independent events, when those events occur at a constant rate.
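The maximum entropy property can be checked numerically by comparing the Exponential with other non-negative continuous distributions that share the same expected value. This is a sketch under assumed choices: the comparison distributions and the names `mean` and `mean_matched_entropies` are illustrative only.

```python
import numpy as np
from scipy import stats

mean = 1.0  # common expected value (arbitrary illustrative value)

# Non-negative distributions, each parameterised to have expected value `mean`.
mean_matched = {
    # Exponential: mean = scale
    "exponential": stats.expon(scale=mean),
    # Gamma with shape 2: mean = shape * scale
    "gamma(k=2)": stats.gamma(a=2, scale=mean / 2),
    # Half-normal: mean = scale * sqrt(2 / pi)
    "half-normal": stats.halfnorm(scale=mean * np.sqrt(np.pi / 2)),
    # Log-normal with s = 1: mean = scale * exp(s^2 / 2)
    "log-normal": stats.lognorm(s=1, scale=mean * np.exp(-0.5)),
    # Uniform on [0, 2 * mean]: mean = width / 2
    "uniform": stats.uniform(loc=0, scale=2 * mean),
}

# Differential entropy (in nats) of each mean-matched distribution.
mean_matched_entropies = {
    name: float(dist.entropy()) for name, dist in mean_matched.items()
}

for name, dist in mean_matched.items():
    print(f"{name:>12}: mean={dist.mean():.3f}  entropy={mean_matched_entropies[name]:.4f}")
```

With the mean held equal, the Exponential should have the largest entropy in this set.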

The Gamma distribution has maximum entropy among all non-negative continuous distributions with a given expected value and a given expected logarithm.
It arises as the distribution of a sum of independent, identically distributed Exponential variables, or in other words, as the distribution of the time it takes for $k$ successive independent events to occur at a constant rate.
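The link between the Gamma and Exponential distributions can be illustrated by simulation: draw Exponential gaps between events, add up $k$ of them, and compare the result with a Gamma distribution with shape $k$. The values of `rate`, `k` and `n_samples` below are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

rate = 2.0           # events per unit time (illustrative value)
k = 5                # number of events to wait for (illustrative value)
n_samples = 100_000  # number of simulated waiting times

# Each row holds k independent Exponential gaps between successive events.
gaps = rng.exponential(scale=1 / rate, size=(n_samples, k))

# The waiting time until the k-th event is the sum of the k gaps.
waiting_times = gaps.sum(axis=1)

# Theoretical counterpart: Gamma with shape k and scale 1 / rate.
gamma_dist = stats.gamma(a=k, scale=1 / rate)

print(f"simulated mean = {waiting_times.mean():.3f}, Gamma mean = {gamma_dist.mean():.3f}")
print(f"simulated var  = {waiting_times.var():.3f}, Gamma var  = {gamma_dist.var():.3f}")

# Kolmogorov-Smirnov distance between the simulated sample and the Gamma CDF.
ks_result = stats.kstest(waiting_times, gamma_dist.cdf)
print(f"KS statistic = {ks_result.statistic:.4f}")
```

The simulated mean and variance should be close to $k/\text{rate}$ and $k/\text{rate}^2$ respectively, and the KS statistic should be small.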

The Poisson distribution is the limit of the Binomial distribution as $n\to \infty$ and $p\to 0$ with the rate of events $\lambda=np$ held fixed.
It arises as the number of events in a fixed time period, where the events are independent and occur at a constant rate.
It is used for counts that never approach any theoretical maximum. More precisely, it has maximum entropy among generalised Binomial distributions with a fixed expected value, in the limit $n\to\infty$.
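The Binomial-to-Poisson limit can be seen numerically by fixing $\lambda$ and letting $n$ grow with $p=\lambda/n$. The sketch below tracks the total variation distance between the two PMFs and the entropy of the Binomial. The value of `lam`, the grid of `n` values and the truncation of the support at 60 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

lam = 3.0                # fixed expected number of events (illustrative value)
counts = np.arange(60)   # truncated support; Poisson(3) mass beyond 59 is negligible

poisson_pmf = stats.poisson.pmf(counts, lam)
poisson_entropy = float(stats.entropy(poisson_pmf))

n_values = [10, 100, 1_000, 10_000]
tv_distances = []
binomial_entropies = []

for n in n_values:
    # Binomial with the same expected value: n * p = lam.
    binom_pmf = stats.binom.pmf(counts, n, lam / n)
    # Total variation distance between the Binomial and Poisson PMFs.
    tv_distances.append(0.5 * float(np.abs(binom_pmf - poisson_pmf).sum()))
    # Shannon entropy (in nats) of the Binomial count distribution.
    binomial_entropies.append(float(stats.entropy(binom_pmf)))

for n, tv, h in zip(n_values, tv_distances, binomial_entropies):
    print(f"n={n:>6}: TV distance={tv:.5f}  entropy={h:.4f}")
print(f"Poisson entropy = {poisson_entropy:.4f}")
```

As $n$ grows, the total variation distance should shrink towards zero and the Binomial entropy should approach the Poisson entropy.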