### Estimation
Suppose we are given a random sample from a distribution and asked to find the *mean* of the distribution itself. One approach is to use the *mean* of the sample as an estimate of the *mean* of the distribution. This process is called **estimation**, and the statistic we use, here the sample mean, is called an **estimator**.

In [1]:
import pandas as pd
import numpy as np

randomSample = pd.Series([-0.441, 1.774, -0.101, -1.138, 2.975, -2.138])
randomSample.mean()

0.15516666666666667

However, what if the random sample contains outliers? Would the sample mean still be the best estimate of the distribution mean?

In [2]:
randomSample2 = pd.Series([-0.441, 1.774, -0.101, -1.138, 2.975, -213.8])
randomSample2.mean()

-35.121833333333335

In this case, a single outlier (-213.8) has dragged the sample mean far away from the other values, so it is a poor estimate of the distribution mean.

One option is to identify and discard outliers, then compute the sample mean of the remaining values. Here, we keep all the values within 4 units of 0 and discard the rest:

In [6]:
randomSample3 = randomSample2.where(np.abs(randomSample2) < 4)
randomSample3.mean()

0.6138
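
A note on how the filter above works: `Series.where` does not drop the rejected values, it replaces them with `NaN`, and `Series.mean` skips `NaN` by default. The sketch below shows an equivalent approach using boolean indexing, which removes the outlier from the Series entirely. The variable names `masked` and `filtered` are only for illustration, and the cutoff of 4 is the same hand-picked threshold as above.

```python
import numpy as np
import pandas as pd

randomSample2 = pd.Series([-0.441, 1.774, -0.101, -1.138, 2.975, -213.8])

# where() keeps the original length and puts NaN where the condition is False.
masked = randomSample2.where(np.abs(randomSample2) < 4)

# Boolean indexing keeps only the rows where the condition is True.
filtered = randomSample2[np.abs(randomSample2) < 4]

print(len(masked), masked.isna().sum())  # length is unchanged, one NaN
print(len(filtered))                     # outlier removed
print(masked.mean(), filtered.mean())    # both means ignore the outlier
```

Both versions give the same mean. The difference only matters if later code counts the values or uses a method that does not skip `NaN`.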

Another option is to use the median as an estimator, like so:

In [9]:
randomSample2.median()

-0.271

Picking the best estimator depends on the circumstances, such as whether there are outliers, and on what the goal is.

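If there are no outliers, the sample mean minimizes the **mean squared error** (MSE): the average, over many repeated samples, of the squared difference between the estimate and the true distribution mean. In code, it looks like this: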

In [None]:
def MeanSqrError(sample, distribMean):
    mean = sample.mean()
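
To see what the mean squared error measures in practice, here is a minimal simulation sketch. It assumes the distribution is a standard normal (mean 0, standard deviation 1) and a sample size of 6. Neither assumption comes from the examples above. The helper name `estimate_mse` and its parameters are made up for illustration. The idea is to draw many samples, apply an estimator to each one, and average the squared differences between the estimates and the known distribution mean.

```python
import numpy as np

def estimate_mse(estimator, mu=0.0, sigma=1.0, n=6, iters=10_000, seed=0):
    """Approximate the MSE of `estimator` on samples of size n from Normal(mu, sigma).

    `estimator` must accept an array and an `axis` argument,
    like np.mean or np.median. All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each row is one random sample of size n.
    samples = rng.normal(mu, sigma, size=(iters, n))
    # One estimate of the distribution mean per sample.
    estimates = estimator(samples, axis=1)
    # Average squared error of the estimates against the true mean.
    return np.mean((estimates - mu) ** 2)

mse_mean = estimate_mse(np.mean)
mse_median = estimate_mse(np.median)
print("MSE of sample mean:  ", mse_mean)
print("MSE of sample median:", mse_median)
```

Under these assumptions the MSE of the sample mean should come out close to sigma squared over n, which is 1/6 here, and below the MSE of the median. Because outlier-free normal data is assumed, this says nothing about which estimator does better on contaminated samples like `randomSample2`.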
