# Estimation

In [1]:
N = 1048 # population
mu = 37.72 # population mean
sigma = 16.04 # population standard deviation

### Distribution of sample mean

In [2]:
from math import sqrt

n = 35 # sample size
mean = mu # the mean of the sampling distribution of the sample mean equals the population mean (mu)
sd = sigma / sqrt(n) # standard deviation of the sampling distribution, i.e. the standard error (SE)
sd

2.711254849169081
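The formula `sigma / sqrt(n)` can be checked empirically. The sketch below is only an illustration: it assumes a normally distributed population (the notebook does not say what the real population looks like), and the number of trials and the seed are arbitrary choices. It draws many samples of size `n`, takes each sample's mean, and compares the spread of those means with the theoretical standard error.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma, n = 37.72, 16.04, 35  # values from the cells above
trials = 5000                     # hypothetical number of repeated samples

# Assumption: the population is normal with mean mu and sd sigma.
# Draw `trials` samples of size n and record each sample's mean.
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(trials)
]

theoretical_se = sigma / sqrt(n)    # standard error from the formula
empirical_se = stdev(sample_means)  # spread of the simulated sample means
print(theoretical_se, empirical_se)
```

The two printed values should be close but not identical, since the empirical figure depends on the random draws.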

#### Point Estimation 

In statistics, point estimation uses sample data to calculate a single value that serves as a "best guess" or "best estimate" of an unknown population parameter, such as the population mean. The value is called a point estimate because it identifies a single point in the parameter space.
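As a small illustration, the sample mean is the usual point estimate of the population mean. The sample values below are made up for this sketch and do not come from the notebook's data.

```python
from statistics import mean

# Hypothetical sample of 5 observations (made-up values, for illustration only).
sample = [31.0, 42.5, 38.0, 29.5, 44.0]

# The sample mean is a single number used as the estimate of the population mean.
point_estimate = mean(sample)
print(point_estimate)
```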

![images](https://cqeacademy.com/wp-content/uploads/2017/12/Point-Estimate-Interval-Estimate.jpg)

### Margin of Error

For an approximate 95% confidence level, the margin of error is two standard errors:

    2 * sigma / sqrt(n)


![images](https://www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/images/263/26343.png)

In [3]:
margin_of_error = 2 * sigma / sqrt(n)
margin_of_error

5.422509698338162
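The multiplier 2 is a rounded stand-in for the 95% critical value of the standard normal distribution, which is roughly 1.96. The sketch below compares the two margins. It uses the standard library's `statistics.NormalDist`, which is a choice made for this example and not something the notebook relies on.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 16.04, 35  # values from the cells above

# z with 97.5% of the standard normal area to its left (about 1.96)
z_exact = NormalDist().inv_cdf(0.975)

approx_margin = 2 * sigma / sqrt(n)       # rule-of-thumb multiplier
exact_margin = z_exact * sigma / sqrt(n)  # critical-value multiplier
print(approx_margin, exact_margin)
```

The rule-of-thumb margin is slightly wider, so the resulting interval is a little more conservative.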

### Confidence Interval Bounds

    sample_mean - 2*sigma / sqrt(n) < mu < sample_mean + 2*sigma / sqrt(n)

In [4]:
print(mean, sd)

37.72 2.711254849169081


In [5]:
def confidence_interval_bounds(mean, se):
    """
    Return the approximate 95% confidence interval (mean - 2*SE, mean + 2*SE).

    mean: sample mean
    se: standard error of the mean
    """
    left = mean - (2 * se)
    right = mean + (2 * se)

    return (left, right)

confidence_interval_bounds(40, 2.71)

(34.58, 45.42)

## Z-table

In [6]:
import scipy.stats as st

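# exact z-scores for the given cumulative probabilities
def exact_z_score(left, right):
    """Convert lower and upper cumulative probabilities to z-scores."""
    left = st.norm.ppf(left)
    right = st.norm.ppf(right)

    return (left, right)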

left, right = exact_z_score(.025, .975)
print(left, right)

-1.9599639845400545 1.959963984540054
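`st.norm.ppf` is the inverse of the normal CDF: it takes a cumulative probability and returns the z-score with that much area to its left. The round trip can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the same role for the normal distribution. Using it here in place of scipy is a choice made for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z = std_normal.inv_cdf(0.975)  # z with 97.5% of the area to its left
area = std_normal.cdf(z)       # back from the z-score to the cumulative area
central = std_normal.cdf(z) - std_normal.cdf(-z)  # area between -z and +z
print(z, area, central)
```

Cutting 2.5% from each tail leaves the central 95%, which is why the probabilities .025 and .975 give the 95% interval.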


#### 95% Exact CI

In [7]:
# 40 is the sample mean; sd is the standard error for n = 35
print(40 + (left * sd), 40 + (right * sd))

34.68603814271902 45.31396185728097


### CI for Larger Sample Size (n=250)

In [8]:
_sd = 1.01 # standard error for n = 250: sigma / sqrt(250), rounded to two decimals
print(40 + (left * _sd), 40 + (right * _sd))

38.020436375614544 41.979563624385456
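The value 1.01 is the standard error for n = 250, rounded to two decimals. A short sketch, reusing the population standard deviation 16.04 from the top of the notebook, shows how the standard error and the 95% margin shrink as the sample grows:

```python
from math import sqrt

sigma = 16.04  # population standard deviation from the first cell

# Standard error for each sample size
se_by_n = {n: sigma / sqrt(n) for n in (35, 250)}

for n, se in se_by_n.items():
    # 1.96 is the rounded 95% critical value of the standard normal
    print(n, round(se, 2), round(1.96 * se, 2))
```

Because the standard error scales with `1 / sqrt(n)`, a sample about seven times larger narrows the interval by a factor of about 2.7.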


### Z for 98% CI

In [9]:
left, right = exact_z_score(.01, .99)
print(left, right)

-2.3263478740408408 2.3263478740408408


### Find 98% CI

In [10]:
print(40 + (left * _sd), 40 + (right * _sd))

37.65038864721875 42.34961135278125
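The steps above (pick a confidence level, look up the z-score, scale by the standard error) can be collected into one helper. This is a minimal sketch: the function name `z_confidence_interval` and its parameters are hypothetical, and it uses the standard library's `statistics.NormalDist` in place of scipy.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a z-based interval around sample_mean."""
    tail = (1 - confidence) / 2         # area cut from each tail
    z = NormalDist().inv_cdf(1 - tail)  # critical value for this level
    margin = z * sigma / sqrt(n)        # z times the standard error
    return (sample_mean - margin, sample_mean + margin)

ci_95 = z_confidence_interval(40, 16.04, 35, 0.95)
ci_98 = z_confidence_interval(40, 16.04, 250, 0.98)
print(ci_95)
print(ci_98)
```

The 98% bounds differ slightly from the cell above because this helper uses the unrounded standard error, while the notebook uses the rounded value 1.01.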


### Engagement Ratio

The [Engagement Ratio](https://docs.google.com/spreadsheet/ccc?key=0Alo47BBiqLE0dFZ1cUhzYVZCbmNXZXoyUDdoampaRFE&usp=sharing) data is available as a Google spreadsheet. The cell below reads it from a local CSV file.

In [11]:
import pandas as pd

df = pd.read_csv('datasets/Engagement Ratio.csv')
df.head()

Unnamed: 0,Ratio
0,0.000149
1,0.032047
2,0.071611
3,0.120725
4,0.004766


In [12]:
df.Ratio.mean(), df.Ratio.std()

(0.07726584465256987, 0.10721572539079689)

### Standard Error

In [13]:
from math import sqrt

sigma = .107 # standard deviation of Ratio from the previous cell, rounded
n = 20 # sample size

SE = sigma / sqrt(n)
print(SE)

0.02392592735924775