# Statistical Inference - Parametric Tests Demonstration In Python

## Objectives

1. To demonstrate applications of parametric hypothesis tests using Python
2. To interpret the output and draw statistical inferences

### Import Libraries

```python
import pandas as pd
from scipy.stats import ttest_1samp
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
```

### 1. One sample t-test

### Background 
A large company is concerned about the time taken by employees to complete the weekly MIS report.


### The Objective of this case is:
To check whether the average time taken to complete the MIS report is more than 90 minutes (H0: μ = 90, H1: μ > 90).

### Data Description
| Columns     | Description                                      | Type     |
|--------------|--------------------------------------------------|----------|
| Time  | Time taken to complete MIS report      | Numeric  |


```python
data = pd.read_csv('ONE SAMPLE t TEST.csv')
# One-sided test: H1 is that the mean completion time exceeds 90 minutes
ttest_1samp(data.Time, popmean=90, alternative='greater')
```

```
TtestResult(statistic=1.9176218472595046, pvalue=0.04074043079962237, df=11)
```

#### Inference :
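- By default, `ttest_1samp` reports a two-sided p-value. Here `alternative='greater'` was passed, so the reported p-value (0.0407) is already the one-sided p-value for H1: μ > 90 and must not be halved again. (Halving applies only when a two-sided p-value is reported and the t statistic falls in the direction of H1, because the t distribution is symmetric.)
- Since p < 0.05, reject H0. The average time taken to complete the MIS report is more than 90 minutes.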
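The t statistic and the relationship between one-sided and two-sided p-values can be checked directly. The sketch below uses a small hypothetical sample (`times` is made up for illustration, not the contents of `ONE SAMPLE t TEST.csv`) and computes t = (x̄ − μ0) / (s / √n) by hand before comparing it with `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# Hypothetical completion times in minutes (illustrative only, not the case data)
times = np.array([95, 88, 102, 91, 97, 85, 99, 93, 90, 104, 87, 96], dtype=float)
mu0 = 90.0

# t = (sample mean - mu0) / (s / sqrt(n)), with s the sample std (ddof=1)
n = times.size
t_manual = (times.mean() - mu0) / (times.std(ddof=1) / np.sqrt(n))

# Upper-tail probability of the t distribution with n - 1 degrees of freedom
p_manual = stats.t.sf(t_manual, df=n - 1)

res_greater = stats.ttest_1samp(times, popmean=mu0, alternative='greater')
res_two = stats.ttest_1samp(times, popmean=mu0, alternative='two-sided')

print(t_manual, res_greater.statistic)
print(p_manual, res_greater.pvalue)
# When t > 0, the one-sided ('greater') p-value is half the two-sided one
print(res_greater.pvalue, res_two.pvalue / 2)
```

The last line shows why halving is only needed when the test was run two-sided in the first place.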

### 2. Independent samples t-test

### Background
The company is assessing the difference in the time taken to complete the MIS report between two groups of employees:
- Group I: Experience (0-1 years)
- Group II: Experience (1-2 years)

### The Objective of this case is:
 
To test whether the average time taken to complete the MIS report is the same for both groups.

### Data Description

| Columns     | Description                                      | Type     |
|--------------|--------------------------------------------------|----------|
| time_g1  | Time to complete MIS report by Group I     | Numeric  |
| time_g2   | Time to complete MIS report by Group II       | Numeric  |

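```python
data = pd.read_csv('INDEPENDENT SAMPLES t TEST.csv')
# Pooled-variance t-test; nan_policy='omit' ignores missing values
# (e.g. NaN padding if the two groups differ in size)
stats.ttest_ind(data['time_g1'], data['time_g2'], nan_policy='omit', equal_var=True)
```

```
Ttest_indResult(statistic=0.22345590920212569, pvalue=0.8250717960964378)
```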

#### Inference :
Since the p-value (0.825) is > 0.05, do not reject H0. There is no significant evidence of a difference in the average time taken to complete the MIS report between the two groups of employees.
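The pooled-variance statistic behind `equal_var=True` can be reproduced by hand. This is a minimal sketch on hypothetical data laid out like the case file, assuming the shorter group is padded with NaN; the values are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: Group II has one fewer observation, so its column ends in NaN
data = pd.DataFrame({
    'time_g1': [92, 88, 95, 90, 87, 93, 91, 89],
    'time_g2': [90, 94, 86, 92, 89, 91, 88, np.nan],
})

# Drop the NaN padding, as nan_policy='omit' does
g1 = data['time_g1'].dropna().to_numpy()
g2 = data['time_g2'].dropna().to_numpy()
n1, n2 = g1.size, g2.size

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (g1.mean() - g2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
# Two-sided p-value: twice the upper-tail probability of |t|
p_manual = 2 * stats.t.sf(abs(t_manual), df=df)

res = stats.ttest_ind(data['time_g1'], data['time_g2'], nan_policy='omit', equal_var=True)
print(t_manual, res.statistic)
print(p_manual, res.pvalue)
```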



### Note :

- Welch's t-test is used to test the equality of two means when the variances of the two groups cannot be assumed equal.
- To run Welch's test in Python, set `equal_var=False`:

```python
stats.ttest_ind(data['time_g1'], data['time_g2'], equal_var=False, nan_policy='omit')
```
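One common way to decide between the pooled and Welch versions is to check the equal-variance assumption first, for example with Levene's test (`scipy.stats.levene`). The sketch below is one possible workflow, not a rule from the original; the samples are hypothetical and chosen to have visibly different spreads, and the 0.05 cut-off is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with clearly different spreads (illustrative only)
g1 = np.array([90, 91, 89, 92, 90, 88, 91, 90], dtype=float)
g2 = np.array([75, 105, 82, 99, 70, 110, 95], dtype=float)

# Levene's test: H0 = the two population variances are equal
lev = stats.levene(g1, g2, center='median')
equal_var = lev.pvalue > 0.05  # assumed significance level

# Pooled t-test if variances look equal, Welch's t-test otherwise
res = stats.ttest_ind(g1, g2, equal_var=equal_var)
print(f"Levene p = {lev.pvalue:.4f}, equal_var = {equal_var}")
print(res)
```

Some practitioners skip the pre-test and use Welch's test by default, since it loses little power when the variances happen to be equal.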


### 3. Paired sample t-test

### Background
The company organized a training program to improve efficiency. The time taken to complete the MIS report before and after training was recorded for 15 employees.

### The Objective of this case is:
To test whether the average time taken to complete the MIS report decreased after training (H0: no difference in mean time; H1: mean time before training is greater than after).

### Data Description

| Columns     | Description                                      | Type     |
|--------------|--------------------------------------------------|----------|
| time_before  | Time to complete MIS report before training      | Numeric  |
| time_after   | Time to complete MIS report after training       | Numeric  |


```python
data = pd.read_csv('PAIRED t TEST.csv')
# One-sided paired test: H1 is that time_before exceeds time_after on average
stats.ttest_rel(data['time_before'], data['time_after'], alternative='greater')
```

```
TtestResult(statistic=8.22948711672449, pvalue=4.918935850301797e-07, df=14)
```

#### Inference :
Since the p-value is < 0.05, reject H0. The average time taken to complete the MIS report after training is significantly lower than before, which suggests that the training is effective.
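A paired t-test is equivalent to a one-sample t-test on the within-employee differences against 0. The sketch below illustrates this with hypothetical before/after times for 15 employees (made-up values, not the case file).

```python
import numpy as np
from scipy import stats

# Hypothetical before/after completion times for 15 employees (illustrative only)
before = np.array([100, 95, 110, 98, 105, 102, 99, 107, 96, 103, 101, 108, 97, 104, 100], dtype=float)
after = np.array([92, 90, 101, 94, 97, 95, 93, 99, 91, 96, 94, 100, 92, 97, 93], dtype=float)

paired = stats.ttest_rel(before, after, alternative='greater')
# Same test expressed as a one-sample t-test on the differences (H0: mean difference = 0)
diff_test = stats.ttest_1samp(before - after, popmean=0, alternative='greater')

print(paired)
print(diff_test)
```

Both calls use n - 1 = 14 degrees of freedom, which matches the `df=14` in the output above for 15 employees.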


### 4. One Way ANOVA

### Background
A large company is assessing the difference in 'Satisfaction Index' of employees in Finance, Marketing and Client-Servicing departments.

### The Objective of this case is:
To test whether the mean satisfaction index is equal for employees in the three departments (CS, Marketing, Finance).

### Data Description
| Columns  | Description         | Type      |
|-----------|---------------------|-----------|
| satindex  | Satisfaction Index  | Numeric   |
| dept      | Department          | Character |


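```python
data = pd.read_csv('One way anova.csv')

# Fit satindex on department treated as a categorical factor
model = ols('satindex ~ C(dept)', data=data).fit()
# typ=2: Type II sums of squares (identical to Type I with a single factor)
aov_table = sm.stats.anova_lm(model, typ=2)
aov_table
```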


|          | sum_sq      | df   | F        | PR(>F)   |
|----------|-------------|------|----------|----------|
| C(dept)  | 220.059945  | 2.0  | 2.308047 | 0.114836 |
| Residual | 1620.858974 | 34.0 |          |          |


#### Inference :
Since the p-value (0.115) is > 0.05, do not reject H0. There is no significant evidence of a difference in mean satisfaction index among the three departments.
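The F statistic in the ANOVA table can be reproduced from its definition, F = (SS_between / (k − 1)) / (SS_within / (N − k)), and cross-checked with `scipy.stats.f_oneway`. This sketch uses a small hypothetical long-format dataset shaped like the case file; the values are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical long-format data with the same columns as the case file (illustrative only)
data = pd.DataFrame({
    'satindex': [7, 8, 6, 9, 7, 5, 6, 7, 6, 5, 8, 9, 8, 7, 9],
    'dept': ['CS'] * 5 + ['Marketing'] * 5 + ['Finance'] * 5,
})

# One array of satisfaction scores per department
groups = [g['satindex'].to_numpy(dtype=float) for _, g in data.groupby('dept')]

k, N = len(groups), len(data)
grand_mean = data['satindex'].mean()
# Between-group SS: group size times squared deviation of group mean from grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: squared deviations of observations from their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

F_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(F_manual, k - 1, N - k)

f_res = stats.f_oneway(*groups)
print(F_manual, f_res.statistic)
print(p_manual, f_res.pvalue)
```

Applied to the case data, the same formula gives (220.06 / 2) / (1620.86 / 34) ≈ 2.308, the F value reported by `anova_lm`.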
