# Generative Models

Goals:
* Introduce probabilistic graphical models in the context of generating mock data
* Explore some of the practical aspects of simulating data
* Walk through a simple mock data generation example 


## Understanding by Modeling

<table><tr width=90%>
<td><img src="../graphics/tour_cluster_image.png" height=300></td>
<td><img src="../graphics/tour_cluster_image_zoom.png" height=300></td>
</tr></table>

Understanding the features in these images means _making a model that can "predict" them_

## When do we want to generate data?

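* Inference: we generate noise-free model predictions to compare with our noisy data

* Checking: we generate noisy mock data to investigate whether our model is a good description of the real data

* Testing: we generate mock data from known inputs and ask whether our analysis recovers the input model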
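As a minimal sketch of the "testing" use case, assume the simplest possible model: every pixel has the same expected count `mu_true` (a hypothetical value), with Poisson noise. For that toy model the sample mean is the maximum-likelihood estimate of the mean, so we can check whether the "analysis" recovers the input:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Input ("true") model parameter: expected counts per pixel.
# Hypothetical value chosen for illustration.
mu_true = 5.0

# Generate a mock dataset: 1000 pixels, each with Poisson counts.
n_pixels = 1000
counts = rng.poisson(lam=mu_true, size=n_pixels)

# Toy "analysis": for a constant-mean Poisson model, the maximum-likelihood
# estimate of mu is the sample mean.
mu_hat = counts.mean()

# Approximate standard error of the estimate: sqrt(mu / n_pixels).
sigma_mu = np.sqrt(mu_hat / n_pixels)

# Check: is the input recovered to within ~3 standard errors?
recovered = abs(mu_hat - mu_true) < 3 * sigma_mu
print(f"mu_hat = {mu_hat:.3f} +/- {sigma_mu:.3f}, recovered: {recovered}")
```

Repeating this with many random seeds and counting how often `recovered` is true is one way to check that the quoted uncertainty is well calibrated.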

## The Sampling Distribution

* The noisy, mock data are drawn from a PDF known as the "sampling distribution"

* For an X-ray image pixel containing mock counts $N_k$, this PDF is $P(N_k|\mu_k,H)$

* $\mu_k$ is a parameter, the expected number of counts in the $k^{th}$ pixel

* $H$ is the list of assumptions that defines our model
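To make the sampling distribution concrete, here is a sketch that assumes $H$ specifies Poisson counting statistics (a common choice for photon counts, but an assumption here) and evaluates $P(N_k|\mu_k,H)$ for a hypothetical $\mu_k = 3$ using `scipy.stats.poisson.pmf`:

```python
import numpy as np
import scipy.stats

# Assumption (part of H): pixel counts follow a Poisson distribution.
mu_k = 3.0  # expected counts in pixel k (illustrative value)

# Evaluate the sampling distribution P(N_k | mu_k, H) for N_k = 0..20
N_k = np.arange(0, 21)
prob = scipy.stats.poisson.pmf(N_k, mu=mu_k)

for n, p in zip(N_k[:8], prob[:8]):
    print(f"P(N_k={n} | mu_k={mu_k}) = {p:.4f}")

# Summed over all N_k the probabilities give 1; N_k = 0..20 captures almost all of it.
print("sum over N_k = 0..20:", prob.sum())
```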

## Conditional probability

* $P(N_k|\mu_k,H)$ is pronounced "the probability for $N_k$ given $\mu_k$ and $H$"

* This means that if you know the value of $\mu_k$, you can _draw_ a sample value of $N_k$ from $P$, because $H$ tells you the functional form of $P$.
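A minimal sketch of "given $\mu_k$, draw $N_k$" for every pixel of a tiny image at once, assuming (as part of $H$) a Poisson sampling distribution. The 3x3 array of $\mu_k$ values is hypothetical; NumPy's `Generator.poisson` accepts an array of means and returns one draw per element:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical expected counts mu_k for a tiny 3x3 "image".
mu = np.array([[0.5, 1.0, 0.5],
               [1.0, 8.0, 1.0],
               [0.5, 1.0, 0.5]])

# Given mu_k, and assuming H specifies a Poisson sampling distribution,
# draw one mock value N_k per pixel. The output has the same shape as mu.
N = rng.poisson(lam=mu)
print(N)
```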

## Sampling in practice

* In general, drawing samples from an arbitrary PDF is difficult

* For certain standard distributions, however, there are fast algorithms

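```python
import scipy.stats

# A Poisson distribution with mean mu = 3 (a "frozen" scipy.stats object)
P = scipy.stats.poisson(mu=3)

# Draw 10 independent samples from P (the values differ on each run)
P.rvs(size=10)
```

```
array([4, 4, 5, 1, 4, 5, 2, 4, 1, 4])
```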
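One example of a fast algorithm for a standard distribution is inverse-transform sampling, which works whenever the CDF $F$ can be inverted: if $U \sim \mathrm{Uniform}(0,1)$ then $F^{-1}(U)$ has CDF $F$. The sketch below applies it to an exponential distribution with a hypothetical mean `scale = 2` and checks the result against `scipy.stats.expon`. This illustrates the idea only; it is not a claim about which algorithm NumPy or SciPy use internally.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(seed=0)

scale = 2.0  # mean of the exponential distribution (illustrative)

# Inverse-transform sampling: if U ~ Uniform(0, 1), then F^{-1}(U) has CDF F.
# For the exponential, F(x) = 1 - exp(-x / scale), so F^{-1}(u) = -scale * ln(1 - u).
u = rng.uniform(size=100_000)
x = -scale * np.log1p(-u)

# Sanity checks: the sample mean should be close to `scale`, and a
# Kolmogorov-Smirnov test against the exact CDF should not reject.
print("sample mean:", x.mean())
ks = scipy.stats.kstest(x, scipy.stats.expon(scale=scale).cdf)
print("KS p-value:", ks.pvalue)
```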

## Sampling in practice

* `numpy.random` and `scipy.stats` are two useful libraries for drawing samples from PDFs

* You may have encountered some of these routines as "random number generators"

* Sampling from a PDF means generating random numbers that are distributed according to that PDF
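A short side-by-side sketch of the two libraries. A seeded `numpy.random.Generator` makes the "random numbers" reproducible, and a frozen `scipy.stats` distribution can both draw samples and evaluate the PMF; passing `random_state=rng` lets both share one generator. The seed and parameter values are arbitrary:

```python
import numpy as np
import scipy.stats

# A seeded Generator makes the "random" numbers reproducible.
rng = np.random.default_rng(seed=2024)

# numpy.random: draw from Poisson and Gaussian distributions directly.
poisson_draws = rng.poisson(lam=3.0, size=5)
gauss_draws = rng.normal(loc=0.0, scale=1.0, size=5)

# scipy.stats: a "frozen" distribution object that can also evaluate the PMF.
# random_state=rng makes it draw from the same generator.
P = scipy.stats.poisson(mu=3.0)
scipy_draws = P.rvs(size=5, random_state=rng)

print(poisson_draws, gauss_draws, scipy_draws)
print("P(N=3 | mu=3) =", P.pmf(3))
```

Re-creating the generator with the same seed reproduces the same sequence of draws, which is useful when a mock dataset needs to be regenerated exactly.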

## Choosing the input model parameters

* When testing, we often want to fix a set of input model parameters $\theta$ and then see what mock data they produce

* Testing at large scale might involve generating many datasets with different inputs. In this case we might want to draw the inputs from a *plausible distribution of input parameters*

* In practice: choose a particular standard probability distribution for $\theta$ and sample from it
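A minimal sketch of large-scale testing, under loudly hypothetical choices: $\theta$ is a single expected-count level per dataset, its "plausible distribution" is log-uniform between 1 and 100, and each dataset is 50 pixels of Poisson counts sharing that level:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

n_datasets = 100   # number of mock datasets to generate
n_pixels = 50      # pixels per dataset (hypothetical)

# Hypothetical "plausible distribution" for theta: a single expected
# count level per dataset, drawn log-uniformly between 1 and 100.
theta = 10 ** rng.uniform(low=0.0, high=2.0, size=n_datasets)

# For each theta, generate one mock dataset of Poisson counts.
# theta[:, None] has shape (n_datasets, 1) and broadcasts across pixels.
datasets = rng.poisson(lam=theta[:, None], size=(n_datasets, n_pixels))

print(datasets.shape)  # (100, 50)
```

Each row of `datasets` can then be analyzed and the recovered parameter compared with the corresponding entry of `theta`.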

## Deterministic relationships

* Often, our model provides a deterministic relationship between parameters: if you know $\theta$, then you know $\mu$

* In our X-ray image case, $\theta$ could be the set of parameters that describe the gas temperature and density profiles of a spherically symmetric cluster of galaxies with known centroid position

* $\mu(\theta)$ would then be some complicated function that takes the cluster parameters $\theta$ as input and predicts the expected counts at any pixel position.
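The lecture's $\mu(\theta)$ is left unspecified. As a stand-in, this sketch uses a much simpler, hypothetical choice: the standard isothermal beta-model surface-brightness profile, $S_0\,[1 + (r/r_c)^2]^{1/2 - 3\beta}$, plus a flat background $b$, evaluated at pixel centres (ignoring exposure maps and the PSF). The function name `expected_counts` and all parameter values are illustrative:

```python
import numpy as np

def expected_counts(theta, nx=64, ny=64):
    """Hypothetical mu(theta): expected counts in each pixel of an nx-by-ny image.

    Stand-in model (an assumption, not the lecture's model): an isothermal
    beta-model surface-brightness profile plus a flat background,
        mu(r) = S0 * (1 + (r / rc)**2) ** (0.5 - 3 * beta) + b,
    centred on a known position (x0, y0) and evaluated at pixel centres.
    """
    S0, rc, beta, b = theta["S0"], theta["rc"], theta["beta"], theta["b"]
    x0, y0 = theta["x0"], theta["y0"]

    # Pixel coordinates (row index y, column index x)
    y, x = np.mgrid[0:ny, 0:nx]
    r = np.hypot(x - x0, y - y0)

    # Deterministic: the same theta always gives the same mu
    return S0 * (1.0 + (r / rc) ** 2) ** (0.5 - 3.0 * beta) + b

theta = {"S0": 10.0, "rc": 5.0, "beta": 0.67, "b": 0.1, "x0": 32.0, "y0": 32.0}
mu = expected_counts(theta)

print(mu.shape, mu.max())
```

Because `mu` is a deterministic function of `theta`, a noisy mock image would then follow from one draw per pixel, e.g. `np.random.default_rng().poisson(mu)`.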

## Probabilistic Graphical Models

* This procedure for simulating a mock X-ray image dataset can be usefully illustrated with a _directed acyclic graph_ called a "Probabilistic Graphical Model" (PGM)

* In the present context these provide something like a "flow diagram" showing how we might draw a sample mock dataset from our model

* Let's look at the PGM for a simple X-ray image simulation 
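The "flow diagram" reading of a PGM can be sketched in code: store the graph as `{node: parents}`, visit nodes in a topological order (parents before children), and generate each node from its already-generated parents. This is usually called ancestral sampling. The three-node graph, the uniform prior, and the fixed 1D profile below are hypothetical simplifications; `graphlib` is in the Python standard library from version 3.9:

```python
import graphlib  # standard library, Python 3.9+
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical PGM for a toy X-ray image, written as {node: set of parents}.
pgm = {
    "theta": set(),   # model parameters (no parents: drawn from a prior)
    "mu": {"theta"},  # expected counts: deterministic function of theta
    "N": {"mu"},      # mock counts: drawn from the sampling distribution
}

# A topological order visits every parent before its children.
order = list(graphlib.TopologicalSorter(pgm).static_order())
print(order)  # ['theta', 'mu', 'N']

# Ancestral sampling: generate each node given its (already generated) parents.
values = {}
for node in order:
    if node == "theta":
        # Hypothetical prior: overall brightness level, uniform on [1, 10)
        values["theta"] = rng.uniform(1.0, 10.0)
    elif node == "mu":
        # Hypothetical deterministic mu(theta): a fixed 1D profile scaled by theta
        profile = np.array([0.1, 0.5, 1.0, 0.5, 0.1])
        values["mu"] = values["theta"] * profile
    elif node == "N":
        # Sampling distribution (assumed Poisson): one count per pixel
        values["N"] = rng.poisson(lam=values["mu"])

print(values["N"])
```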
