# Estimation

Often, we can assume that our data come from a random variable that is normally distributed, or that follows some other distribution with parameters. If we make this assumption, how do we find the values of those parameters?

We saw that the normal distribution has parameters $\mu$ (mean) and $\sigma$ (standard deviation). If we have data that we think came from a normal distribution, but we don't know exactly which normal distribution, then $\mu$ and $\sigma$ are unknown, and for all intents and purposes they are random variables with their own PDFs. In this section we will cover how to find the PDFs of these parameters.

Finding these PDFs is very useful! Say we have a bunch of cat heart weight measurements. We think they come from a normal distribution, but we want to know what the mean and standard deviation of that normal distribution are. There won't be an oracle that will tell us exactly, so the next best thing is to find the PDFs of these random variables. If we have the PDF of $\mu$, for example, we can find out things like what the most likely value of $\mu$ is, how certain we are that $\mu$ is close to that value, and the probability that $\mu$ is greater than 10 grams.

#### Bayes Rule

<font color='red'>Bayes rule</font> is a formula that helps us find the PDF of random variables in terms of the PDF of other random variables. Bayes rule is often used in the estimation of parameters. If we're interested in finding the distribution of $\mu$ given the data we saw, then Bayes rule states:

$$p(\,\mu\,|\,\mathrm{Data})=\frac{p(\,\mathrm{Data}\,|\,\mu)\,p(\mu)}{p(\mathrm{Data})} \propto p(\,\mathrm{Data}\,|\,\mu)\,p(\mu)$$

This formula is, of course, the same for any parameter, not just $\mu$. $p(\,\mathrm{Data}\,|\,\mu)$ is called a likelihood and $p(\mu)$ is called a prior. We will discuss what these mean.
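As a rough numerical illustration of the proportionality in Bayes rule, the sketch below multiplies a likelihood by a prior on a grid of candidate $\mu$ values and then normalizes. Everything specific here is an assumption made for illustration: the two data values, a normal likelihood with known standard deviation 2, the grid from 5 to 15, and a flat prior.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

data = [10.2, 10.4]  # illustrative measurements in grams (assumption)
sigma = 2.0          # standard deviation treated as known (assumption)

# Grid of candidate values for mu: 5.0, 5.1, ..., 15.0 (assumption)
step = 0.1
mu_grid = [round(5.0 + step * i, 1) for i in range(101)]

# Flat prior: every candidate value of mu is equally plausible beforehand (assumption)
prior = [1.0 for _ in mu_grid]

# Likelihood of each candidate mu: product of the densities of the independent data points
likelihood = [math.prod(normal_pdf(x, mu, sigma) for x in data) for mu in mu_grid]

# Bayes rule: posterior is proportional to likelihood times prior
unnormalized = [l * p for l, p in zip(likelihood, prior)]

# The sum over the grid stands in for p(Data), so the posterior integrates to 1
evidence = sum(unnormalized) * step
posterior = [u / evidence for u in unnormalized]

# Approximate probability that mu is greater than 10 grams
prob_gt_10 = sum(p for mu, p in zip(mu_grid, posterior) if mu > 10.0) * step
print(prob_gt_10)
```

Dividing by `evidence` only rescales the curve, which is why the proportional form of Bayes rule is usually enough to find the shape of the posterior.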

#### Likelihoods

A <font color='red'>likelihood</font> is a formula that tells us how likely a certain parameter value is for a certain set of data. Recall that the PDF of a normal distribution in terms of $\mu$ and $\sigma$ is

$$p(\,x\,|\,\mu, \sigma)= \frac{1}{\sqrt{2\pi\sigma^2}}\mathrm{exp}\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}$$

Say we have a single cat heart measurement of 10.2 grams, we assume it comes from a normal distribution, and an oracle tells us that the standard deviation, $\sigma$, is 2 (so the variance, $\sigma^2$, is 4). Then for this data the likelihood of $\mu$ is

$$p(\,\mathrm{Data}\,|\,\mu) = p(10.2\,|\,\mu,\,2) = \frac{1}{\sqrt{8\pi}}\mathrm{exp}\left\{-\frac{1}{2}\left(\frac{10.2-\mu}{2}\right)^2 \right\}$$

The most "likely" value of $\mu$ is 10.2, because the likelihood is at its maximum when $\mu=10.2$: the exponent is zero there and negative for every other value of $\mu$. In other words, the value of $\mu$ that best explains how we got that one data point is 10.2, the value of our single measurement.
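To check this numerically, here is a minimal sketch that evaluates the single-measurement likelihood on a grid of candidate $\mu$ values and picks the largest. The helper `normal_pdf` and the grid from 5 to 15 are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density p(x | mu, sigma) of a normal distribution, as in the formula above."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

x_obs = 10.2  # the single measurement from the text
sigma = 2.0   # standard deviation given by the oracle

# Candidate values of mu: 5.0, 5.1, ..., 15.0 (grid choice is an assumption)
mu_grid = [round(5.0 + 0.1 * i, 1) for i in range(101)]

# The likelihood is the same PDF, read as a function of mu with the data held fixed
likelihood = [normal_pdf(x_obs, mu, sigma) for mu in mu_grid]

# The candidate with the highest likelihood
best_mu = mu_grid[likelihood.index(max(likelihood))]
print(best_mu)
```

Note that the likelihood is a function of $\mu$ here, not of the data: the measurement stays fixed at 10.2 while $\mu$ varies.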

Now say we have a second, independent measurement of 10.4. Because the measurements are independent, the likelihood of $\mu$ for these two points is simply the product of their individual PDFs. That is

$$p(\,\mathrm{Data}\,|\,\mu)=p(10.2,\,10.4\,|\,\mu,\,2)=\frac{1}{\sqrt{8\pi}}\mathrm{exp}\left\{-\frac{1}{2}\left(\frac{10.2-\mu}{2}\right)^2 \right\} \cdot \frac{1}{\sqrt{8\pi}}\mathrm{exp}\left\{-\frac{1}{2}\left(\frac{10.4-\mu}{2}\right)^2 \right\}$$

If we simplified the above expression, we would see that the most likely value of $\mu$ is 10.3, i.e. the average of our two measurements 10.2 and 10.4. Intuitively this makes sense: the value of $\mu$ that best explains the two data points we saw is the average of the two points. In general, if we have $N$ points instead of two and we label each measurement $x_i$, then our likelihood is
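To spell out the two-measurement simplification: combining the two exponentials and completing the square in $\mu$ gives

$$p(10.2,\,10.4\,|\,\mu,\,2)=\frac{1}{8\pi}\mathrm{exp}\left\{-\frac{1}{8}\left[(10.2-\mu)^2+(10.4-\mu)^2\right]\right\}=\frac{1}{8\pi}\mathrm{exp}\left\{-\frac{1}{8}\left[2(\mu-10.3)^2+0.02\right]\right\}$$

The term $0.02$ does not depend on $\mu$, so the expression is largest when $(\mu-10.3)^2=0$, that is, when $\mu=10.3$. Returning to the general case of $N$ measurements, the likelihood is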

$$p(\,\mathrm{Data}\,|\,\mu)= \prod_{i=1}^N\frac{1}{\sqrt{8\pi}}\mathrm{exp}\left\{-\frac{1}{2}\left(\frac{x_i-\mu}{2}\right)^2 \right\}$$
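The product above can be sketched directly in code. The block below is a minimal illustration, assuming a made-up set of five measurements and a grid of candidate $\mu$ values. The names `normal_pdf` and `likelihood` are hypothetical helpers, not part of any library.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density p(x | mu, sigma) of a normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

def likelihood(mu, data, sigma=2.0):
    """Likelihood of mu: product of the densities of N independent measurements."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

# Hypothetical measurements in grams (made up for illustration)
data = [10.2, 10.4, 9.7, 11.1, 10.6]
sample_mean = sum(data) / len(data)

# Candidate values of mu: 8.00, 8.01, ..., 12.00 (grid choice is an assumption)
mu_grid = [round(8.0 + 0.01 * i, 2) for i in range(401)]

# The candidate that maximizes the likelihood
best_mu = max(mu_grid, key=lambda mu: likelihood(mu, data))
print(sample_mean, best_mu)
```

For large $N$ the product of many small densities can underflow to zero, so in practice one usually sums log densities instead; the maximizing value of $\mu$ is the same.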

Thus no matter how many measurements or realizations o

#### Priors

#### Estimation in PyMC3