# Hierarchical Models

Goals:
* Introduce ourselves to the use of nuisance parameters and structure in models.
* Practice devising models for more complex situations than we've seen so far.

## Nuisance parameters

A **nuisance parameter** is any model parameter that we are not particularly interested in at the end of the day.

This is an entirely subjective label - nuisance parameters are not formally treated any differently than other parameters.

### Nuisance parameters to encode uncertainty

Including nuisance parameters in a model provides a way to *explicitly account for and marginalize over systematic uncertainties*.

This can be as simple as assigning a prior distribution to some quantity that would otherwise remain fixed, or it can involve a more explicit expansion of the model.

Often nuisance parameters represent **latent variables** - things that are logically or physically part of a model, but which cannot be directly measured.

### Example: measuring flux with a background

Recall our simple measurement of a galaxy's flux based on a number of detected counts from the [Bayes Theorem chunk](bayes_theorem.ipynb):
* $N|\mu \sim \mathrm{Poisson}(\mu)$
* $\mu \sim \mathrm{Gamma}(\alpha,\beta)$
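
As a reminder of why this pairing is convenient, here is a minimal sketch of the conjugate update. It assumes the shape/rate convention for $\mathrm{Gamma}(\alpha,\beta)$, which the text does not specify, and the prior parameters and count are made-up illustration values.

```python
def gamma_posterior(alpha, beta, n_counts):
    """Posterior (shape, rate) for mu after observing one Poisson count.

    Assumes mu ~ Gamma(alpha, beta) with beta a *rate* parameter, so the
    posterior is Gamma(alpha + N, beta + 1).
    """
    return alpha + n_counts, beta + 1.0

# Hypothetical prior and data, for illustration only.
alpha_post, beta_post = gamma_posterior(alpha=1.0, beta=0.5, n_counts=5)

post_mean = alpha_post / beta_post        # mean of a Gamma(shape, rate)
post_sd = alpha_post ** 0.5 / beta_post   # standard deviation of the same
```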

### Example: measuring flux with a background

Let's explicitly include a constant conversion between counts and flux:

* $N|\mu \sim \mathrm{Poisson}(\mu)$
* $\mu = C\,F$
* $F \sim \mathrm{Gamma}(\alpha,\beta)$

Q: Why does it make sense to keep a Gamma prior on $F$ (with slightly redefined parameters)?

### Example: measuring flux with a background
The colored arrow here means "deterministically related".
<table>
    <tr>
        <td><img src="../graphics/hier_poissoneg_pgm1.png" width=100%></td>
    </tr>
</table>

Q: What secret message describing this class is encoded in the PGM?

### Example: measuring flux with a background

Now let's expand the model to account for the fact that we actually measure counts from both the galaxy and the background. Being good astronomers, we also remembered to take a second measurement of a source-free (background-only) patch of sky.

Subscripts: $b$ for background, $g$ for galaxy, $s$ for the science observation that includes both.

### Example: measuring flux with a background
<table>
    <tr>
        <td><img src="../graphics/hier_poissoneg_pgm2.png" width=100%></td>
    </tr>
</table>

### Example: measuring flux with a background
* $N_b|\mu_b \sim \mathrm{Poisson}(\mu_b)$
* $\mu_b = C_b\,F_b$
* $F_b \sim \mathrm{Gamma}(\alpha_b,\beta_b)$
* $N_s|\mu_s \sim \mathrm{Poisson}(\mu_s)$
* $\mu_s = C_s(F_g+F_b)$
* $F_g \sim \mathrm{Gamma}(\alpha_g,\beta_g)$

The background quantities could be regarded as nuisance parameters here - we need to account for our uncertainty in $F_b$, but measuring it isn't the point of the analysis.
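
One way to see what marginalizing over $F_b$ means in practice is a brute-force grid calculation. The sketch below assumes shape/rate Gamma priors, and every number in it (counts, conversion factors, prior parameters, grid limits) is hypothetical. It evaluates the joint posterior on a grid of $(F_g, F_b)$ and sums over $F_b$.

```python
import math

def log_poisson(n, mu):
    """Log of the Poisson probability of n counts given mean mu."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def log_gamma_pdf(x, alpha, beta):
    """Log of the Gamma density with shape alpha and rate beta (assumed convention)."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(x) - beta * x)

# Hypothetical data, constants and priors, for illustration only.
N_b, N_s = 20, 45        # background-only and science counts
C_b, C_s = 10.0, 10.0    # flux-to-counts conversions
a_b, b_b = 1.0, 0.1      # prior on F_b
a_g, b_g = 1.0, 0.1      # prior on F_g

# Midpoint grid shared by F_g and F_b; midpoints avoid evaluating at F = 0.
n_grid, F_max = 200, 8.0
dF = F_max / n_grid
grid = [(i + 0.5) * dF for i in range(n_grid)]

# Terms that depend on F_b only: background likelihood times the F_b prior.
log_bkg = [log_poisson(N_b, C_b * Fb) + log_gamma_pdf(Fb, a_b, b_b) for Fb in grid]

# Marginal posterior of F_g: sum the joint posterior over the F_b grid.
post_Fg = []
for Fg in grid:
    log_prior_g = log_gamma_pdf(Fg, a_g, b_g)
    total = 0.0
    for Fb, lb in zip(grid, log_bkg):
        total += math.exp(lb + log_prior_g + log_poisson(N_s, C_s * (Fg + Fb)))
    post_Fg.append(total)

# Normalize so the marginal density integrates to 1 over the grid.
norm = sum(post_Fg) * dF
post_Fg = [p / norm for p in post_Fg]
mean_Fg = sum(F * p for F, p in zip(grid, post_Fg)) * dF
```

A grid is only practical here because there are two free parameters. The same marginalization is normally done by sampling once the model grows.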

## Hierarchical models

Often, especially in physics, the model for a data set naturally takes a hierarchical structure.
* e.g. measurements of multiple sources inform us about a source *class*

In statistics, this is related to the concept of **exchangeability** - as far as we know, individual sources of a given class are equivalent (until we measure them).

### Hierarchical models

In practice, the hierarchy usually takes the form of *common priors* for the individual measured sources.
* The prior parameters describe the statistical properties of the source class, and are often what we're trying to measure.
* Those prior parameters are therefore left free, with priors of their own, aka hyperpriors.

### Hierarchical models

General form:
1. $P(x|\theta)$ describes the measurement process
2. $P(\theta)$ decomposes as $P(\theta|\phi_1)\,P(\phi_1)$
3. $P(\phi_1)$ decomposes as $P(\phi_1|\phi_2)\,P(\phi_2)$
4. $\ldots$ and so on, down to $P(\phi_n)$, which is usually taken to be "uninformative".
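
Chaining these factors together gives the joint posterior. This is just the product of the terms listed above, assuming (as the form $P(x|\theta)$ implies) that the data depend on the $\phi_i$ only through $\theta$:

$P(\theta,\phi_1,\ldots,\phi_n|x) \propto P(x|\theta)\,P(\theta|\phi_1)\,P(\phi_1|\phi_2)\cdots P(\phi_{n-1}|\phi_n)\,P(\phi_n)$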

### Example: galaxy luminosity function

Let's modify the previous example as follows:
* We're interested in luminosity rather than flux. If we know the distance to the target, this just means including another known factor in the galaxy's conversion constant, $C_g$, which now relates counts to $L$ instead of $F_g$.
* We'll measure $m>1$ galaxies, and are interested in constraining the luminosity function, traditionally modelled as

$n(x) = \phi^\ast x^\alpha e^{-x}; \quad x=\frac{L}{L^\ast}$

Here $n$ is the number density of galaxies.
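
To get a feel for this function, here is a minimal sketch that evaluates it and integrates it numerically above a luminosity threshold. It assumes $n(x)$ is a density per unit $x$, so that its integral over $x$ is a number of galaxies per unit volume. The function names, grid settings and parameter values are hypothetical.

```python
import math

def schechter(x, phi_star, alpha):
    """n(x) = phi* x^alpha exp(-x), with x = L / L*."""
    return phi_star * x ** alpha * math.exp(-x)

def number_above(x_min, phi_star, alpha, x_max=50.0, n_steps=20000):
    """Trapezoid-rule integral of n(x) from x_min to x_max.

    The grid is uniform in log(x), so dx = x dlog(x). The upper limit
    x_max stands in for infinity, where exp(-x) is negligible.
    """
    log_lo, log_hi = math.log(x_min), math.log(x_max)
    h = (log_hi - log_lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = math.exp(log_lo + i * h)
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoid end points
        total += weight * schechter(x, phi_star, alpha) * x * h
    return total

# Hypothetical parameter values, for illustration only.
n_bright = number_above(x_min=0.1, phi_star=0.01, alpha=-1.2)
```

For negative $\alpha$ the integrand grows steeply towards small $x$, which is why a lower luminosity limit matters for the total number of galaxies.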

### Example: galaxy luminosity function

* For simplicity, we'll assume that we've measured *every* galaxy above a given luminosity in some volume. This is not very realistic, but we'll tackle the issues raised by incomplete data sets some other time.
* Let's also assume that the same background applies to each galaxy measurement, and that we have a galaxy-free observation of it, as before.

Now - what does the PGM for our experiment look like?

### Example: galaxy luminosity function
Compressing the $L\rightarrow N$ and $F\rightarrow N$ conversions,
<table>
    <tr>
        <td><img src="../graphics/hier_poissoneg_pgm3.png" width=100%></td>
    </tr>
</table>
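
Reading a PGM from top to bottom gives a recipe for generating mock data. The sketch below follows that recipe under several assumptions that are not in the text: the expected science counts are taken to be $C_g L_i + C_b F_b$, the number of galaxies $m$ is fixed by hand (so $\phi^\ast$ does not appear), and all parameter values and helper names are hypothetical.

```python
import math
import random

def sample_x(alpha, x_min, rng):
    """Draw x = L/L* from x^alpha exp(-x), truncated to x >= x_min.

    Rejection sampling with a shifted exponential proposal; valid for alpha <= 0.
    """
    while True:
        x = x_min + rng.expovariate(1.0)         # proposal proportional to exp(-x)
        if rng.random() < (x / x_min) ** alpha:  # acceptance probability <= 1
            return x

def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)

# Hypothetical hyperparameters and constants, for illustration only.
alpha, L_star, x_min = -1.2, 1.0, 0.5
m = 5                     # number of galaxies, fixed by hand
C_g, C_b = 40.0, 10.0     # conversions from L and F_b to expected counts

# Background flux from its prior; random.gammavariate takes (shape, scale).
F_b = rng.gammavariate(4.0, 0.5)

# Background-only observation, shared by all galaxies.
N_b = sample_poisson(C_b * F_b, rng)

# One luminosity per galaxy from the common prior, then one count per galaxy.
L = [L_star * sample_x(alpha, x_min, rng) for _ in range(m)]
N_s = [sample_poisson(C_g * L_i + C_b * F_b, rng) for L_i in L]
```

Inference runs this recipe in reverse: given `N_b` and `N_s`, constrain `alpha` and `L_star` while marginalizing over the individual luminosities and the background.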

## Stuff