# Finding a Confidence Interval for the Mean when the Population Variance Is Known

In this example, we'll find a confidence interval for the mean salary of software developers.
The population variance is known: the standard deviation of the population is $15,000.
Our sample contains 30 observations, and the data is normally distributed.

When the population variance is known, the z-statistic is used as the reliability factor (RF).
Without going into much detail, you can look up z in a standard normal table: https://en.wikipedia.org/wiki/Standard_normal_table

The formula for the confidence interval is:
[SampleMean - RF * StandardError, SampleMean + RF * StandardError]

So, at the chosen confidence level, our population mean lies in the interval above. The standard error is explained later in this notebook.

In [16]:
#importing libraries
import pandas as pd #pandas is great for working with datasets
import math #for mathematical functions

In [17]:
#let's read our data from the Excel file
path = "<path to salary_dataset in the repository>" #placeholder: point this at the salary_dataset file in the repository
df = pd.read_excel(path, header=0, usecols="B", skiprows=4)
df.head()

Unnamed: 0,Dataset
0,117313
1,104002
2,113038
3,101936
4,84560


In [18]:
#a very quick way to get some info about your dataset is the describe function
df.describe()

Unnamed: 0,Dataset
count,30.0
mean,100200.366667
std,11478.406127
min,80740.0
25%,90493.75
50%,101236.0
75%,111635.5
max,117313.0


In [19]:
#first, we need the sample mean
sample_mean = df['Dataset'].mean()
sample_mean

100200.36666666667

### How to find z-stat
First, decide on your confidence level. Let's say we want a 99% confidence level.
Confidence level = 1 - α
So, for a 99% interval, α is 1%, which equals 0.01.
We are looking for an interval that extends both above and below 0, the mean of the standard normal distribution, so α is split evenly between the two tails.
So α/2 = 0.005
1 - α/2 = 0.995

Now look for this value in the z table and add the corresponding row and column values.
Choose the closest value if there is no exact match.
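
If you would rather not read the value off a table, the same lookup can be sketched with Python's standard library. This is a minimal sketch, not part of the original notebook; the helper name `z_for_confidence` is made up for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Hypothetical helper: two-sided z value for a confidence level such as 0.99."""
    alpha = 1 - level
    # inv_cdf returns the z whose cumulative probability equals 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

z_99 = z_for_confidence(0.99)
print(round(z_99, 4))
```

For a 99% confidence level this gives a z value of roughly 2.576, which is what the table lookup for 0.995 should land near.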

Let's start with a 90% confidence interval.
α = 10% = 0.1
α/2 = 0.05
1 - α/2 = 0.950
In the cumulative table on Wikipedia (https://en.wikipedia.org/wiki/Standard_normal_table), we look for the value 0.950.
There is no exact 0.950 in the table; it falls between 0.94950 (z = 1.64) and 0.95053 (z = 1.65). We take 0.95053: its row value is 1.6 and its column value is 0.05, so our z-stat (reliability factor) is 1.65 for a 90% confidence interval.
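
The table entries around 0.950 can be cross-checked in code. The sketch below uses `statistics.NormalDist` from the standard library, which is an addition for illustration and not something the notebook itself relies on.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# cumulative probabilities for the two neighbouring table entries
p_164 = std_normal.cdf(1.64)
p_165 = std_normal.cdf(1.65)

# z value whose cumulative probability is exactly 0.95
z_exact = std_normal.inv_cdf(0.95)

print(round(p_164, 5), round(p_165, 5), round(z_exact, 4))
```

The exact z sits between 1.64 and 1.65, so using 1.65 from the table makes the interval very slightly wider than strictly necessary.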

In [20]:
z_90 = 1.65

### Standard Error
Standard error = standard deviation of the population divided by the square root of the sample size.

In [21]:
sample_count = df['Dataset'].count()
sample_count

30

In [22]:
#Standard deviation (std) of the population is given = $15000
std = 15000
standard_error = std/math.sqrt(sample_count)
standard_error

2738.6127875258308

In [23]:
#let's define the interval that contains the population mean with 90% confidence
interval_90 =((sample_mean - z_90 * standard_error) ,(sample_mean + z_90 * standard_error))
interval_90

(95681.65556724905, 104719.07776608429)

The result: with 90% confidence, our population mean is between 95681.66 and 104719.08 US dollars.
This interval is called a confidence interval.

We can and should define a function for future use of these calculations.

In [24]:
def conf_interval(count, z, std, mean):
    '''
    count = sample dataset count
    z = z value from the unit normal table
    std = standard deviation of population
    mean = mean of sample dataset
    '''
    margin = z * (std/math.sqrt(count)) #reliability factor times standard error
    return ((mean - margin), (mean + margin))
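
As a possible extension, the function could take the confidence level directly and compute z itself instead of requiring a table lookup. The sketch below assumes that design; the name `conf_interval_for_level` is hypothetical, and the inputs are the sample size, population standard deviation, and sample mean from this notebook, applied to the 99% level discussed earlier.

```python
import math
from statistics import NormalDist

def conf_interval_for_level(count, level, std, mean):
    """Hypothetical variant of conf_interval that takes a confidence level (e.g. 0.99) instead of z."""
    # two-sided z value: cumulative probability 1 - alpha/2, where alpha = 1 - level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * std / math.sqrt(count)  # reliability factor times standard error
    return (mean - margin, mean + margin)

# values taken from the notebook: 30 observations, std = 15000, sample mean from df
low, high = conf_interval_for_level(count=30, level=0.99, std=15000, mean=100200.36666666667)
print(round(low, 2), round(high, 2))
```

The 99% interval comes out wider than the 90% one, which is expected: a higher confidence level requires a larger reliability factor.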