# Finding a Confidence Interval for the Mean when the Population Variance is Unknown

In this example, we'll find a confidence interval for the mean salary of electrical engineers.
The population variance is unknown.
We have a sample of 9 observations, and we assume the data is normally distributed.

When the population variance is unknown, the t-statistic is used as the reliability factor (RF).
Without going into much detail, you can look up the t-value in a table such as http://www.ttable.org/.

The formula for the confidence interval is:
[SampleMean - RF * StandardError , SampleMean + RF * StandardError]

So, at the chosen confidence level, our population mean lies in the interval above. The standard error is explained later in this notebook.
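
The same formula can be written in symbols. The notation here is an assumption chosen for illustration, not taken from the text above: $\bar{x}$ is the sample mean, $s$ is the sample standard deviation (with $n-1$ in its denominator), $n$ is the sample size, and $t_{\alpha/2,\,n-1}$ is the reliability factor.

```latex
\left[\; \bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;,\;\; \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;\right]
```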

In [10]:
#importing libraries
import pandas as pd #pandas is great for working with datasets (not needed for this small example)
import numpy as np #the dataset is typed in manually for this example since it has only 9 samples
import math #for mathematical functions

In [11]:
#create our dataset
df = np.array([78000, 90000, 75000, 117000, 105000, 96000, 89500,
              102300, 80000])
df

array([ 78000,  90000,  75000, 117000, 105000,  96000,  89500, 102300,
        80000])

In [12]:
#we need sample mean
sample_mean = df.mean()
sample_mean

92533.33333333333

### How to find t-stat
First, decide on your confidence level. Let's say we want a 99% confidence level.
Confidence level = 1 - α
So, for a 99% interval, our alpha is 1%, which is equal to 0.01.
We are looking for a two-tailed t-value. (Short explanation: a test of whether a parameter equals some value is two-tailed; a test of whether it is greater (>) or less (<) than some value is one-tailed. A confidence interval for the mean has both a lower and an upper bound, so it is two-tailed.)

Now find the 99% confidence level in the t table (listed at the bottom), which corresponds to the two-tailed 0.01 column, and read the value in the row for n - 1 = 8 degrees of freedom.
It is 3.355.

In [13]:
t_99 = 3.355
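
If you would rather compute the value than read it from a table, the sketch below shows one way to do it. It assumes SciPy is installed, which this notebook does not otherwise use; `scipy.stats.t.ppf` returns the quantile of the t distribution, and the variable names are illustrative.

```python
from scipy import stats

confidence_level = 0.99
alpha = 1 - confidence_level
degrees_of_freedom = 9 - 1  # n - 1 for our 9 observations

# Two-tailed: alpha is split evenly between the two tails,
# so we need the quantile at 1 - alpha / 2
t_critical = stats.t.ppf(1 - alpha / 2, df=degrees_of_freedom)
print(round(t_critical, 3))  # 3.355
```

The rounded result agrees with the table value used above.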

### Standard Error
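Standard error = sample standard deviation divided by the square root of the sample size, where the sample standard deviation uses n - 1 in its denominator.
NumPy's `std()` divides by n by default, so the code below divides that value by the square root of n - 1 instead, which gives the same standard error.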

In [14]:
sample_count = len(df)
sample_count

9

In [15]:
#Standard deviation (std) of the sample; NumPy's default (ddof=0) divides by n
std = df.std()
std

13135.109862925057

In [16]:
#Standard error: std was computed with ddof=0, so dividing by sqrt(n-1) equals s/sqrt(n)
standard_error = std/math.sqrt(sample_count-1)
standard_error

4643.962627852305
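
As a quick sanity check, the sketch below compares the route used in this notebook with the textbook route (sample standard deviation with `ddof=1`, divided by the square root of n). The variable names are illustrative; the data is the same 9 salaries.

```python
import math
import numpy as np

salaries = np.array([78000, 90000, 75000, 117000, 105000, 96000, 89500,
                     102300, 80000])
n = len(salaries)

# Route used in this notebook: std with ddof=0, divided by sqrt(n - 1)
se_notebook = salaries.std() / math.sqrt(n - 1)

# Textbook route: std with ddof=1, divided by sqrt(n)
se_textbook = salaries.std(ddof=1) / math.sqrt(n)

print(math.isclose(se_notebook, se_textbook))  # True
```

Both expressions reduce to the square root of the sum of squared deviations divided by n(n - 1), which is why they agree.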

In [17]:
#let's define the interval that contains our mean with 99% confidence
interval_99 =((sample_mean - t_99 * standard_error) ,(sample_mean + t_99 * standard_error))
interval_99

(76952.83871688884, 108113.82794977781)

The result: we are 99% confident that the population mean is between 76952.84 and 108113.83 US dollars.
This interval is called a confidence interval.

We can and should define a function for future use of these calculations.

In [18]:
def conf_interval(count, t, std, mean):
    '''
    count = number of observations in the sample
    t = t value from the table
    std = standard deviation of the sample, computed with ddof=0 (NumPy's default)
    mean = mean of the sample
    '''
    margin = t * (std / math.sqrt(count - 1))  # reliability factor * standard error
    return (mean - margin, mean + margin)
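
A possible variant is sketched below. `conf_interval_from_data` is a hypothetical helper, not part of the notebook, that takes the raw observations and uses only Python's standard library. Since `statistics.stdev` already divides by n - 1, the standard error here is the standard deviation over the square root of n.

```python
import math
import statistics

def conf_interval_from_data(data, t):
    """Hypothetical helper: t-based confidence interval from raw observations."""
    n = len(data)
    mean = statistics.mean(data)
    # statistics.stdev uses n - 1 in the denominator, so divide by sqrt(n)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    margin = t * standard_error
    return (mean - margin, mean + margin)

salaries = [78000, 90000, 75000, 117000, 105000, 96000, 89500, 102300, 80000]
low, high = conf_interval_from_data(salaries, t=3.355)
print(round(low, 2), round(high, 2))  # 76952.84 108113.83
```

Passing the data instead of a precomputed standard deviation avoids having to remember which `ddof` convention was used upstream.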