![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | Inferential statistics - T-test & P-value

### Instructions

1. Here is another simple example of a two-sample t-test with pooled variances (used when the population variances are equal). This time, however, the test is one-sided.

In a packing plant, a machine packs cartons with jars. A new machine is expected to pack faster, on average, than the machine currently in use. To test this hypothesis, the time each machine takes to pack ten cartons is recorded. The results, in seconds, are shown in the tables in the file `files_for_lab/machine.txt`.
Assuming the conditions for the t-test are met, do the data provide sufficient evidence that the new machine is faster than the old one?

2. An additional (optional) problem: here we cannot assume that the population variances are equal, so the variances cannot be pooled.
   Independent random samples of 17 sophomores and 13 juniors attending a large university yield the following data on grade point averages. The data are provided in the file `files_for_lab/student_gpa.txt`.
   At the 5% significance level, do the data provide sufficient evidence to conclude that the mean GPAs of sophomores and juniors at the university differ?

   Test statistics can be calculated as: [link to the image - Test statistics calculation for Unpooled Variance Case](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/7.04/7.04-unpooled_variances.png)

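   Use `(n1-1)+(n2-1)` degrees of freedom. This is a simplification: for unequal variances, `scipy.stats.ttest_ind(..., equal_var=False)` uses the Welch-Satterthwaite approximation instead, which gives a value no larger than this.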
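The linked image is assumed to show the standard unpooled (Welch) statistic, t = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2), in which each sample keeps its own variance. The following is a minimal sketch that computes this statistic by hand and checks it against `scipy.stats.ttest_ind` with `equal_var=False`. The two arrays are hypothetical numbers chosen only for illustration, not the lab data, and `unpooled_t` is a hypothetical helper.

```python
import numpy as np
import scipy.stats as st

# Hypothetical GPA-like samples, for illustration only (not the lab data)
a = np.array([3.1, 2.8, 3.4, 2.9, 3.6, 2.7])
b = np.array([2.6, 3.0, 2.5, 2.9, 2.4])


def unpooled_t(x, y):
    """Unpooled (Welch) t statistic: difference of means over the unpooled standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # var(ddof=1) is the sample variance s^2; each sample keeps its own variance
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    return (x.mean() - y.mean()) / se


t_manual = unpooled_t(a, b)
t_scipy = st.ttest_ind(a, b, equal_var=False).statistic
print(t_manual, t_scipy)  # the two values agree up to floating-point rounding
```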


In [1]:
import pandas as pd # data manipulation
import numpy as np # mathematical operations
import matplotlib.pyplot as plt # visualization
import scipy.stats as st # statistics
import chardet

In [2]:
file = 'machine.txt'

with open(file, 'rb') as f:
    result = chardet.detect(f.read())
    encoding_detected = result['encoding']

df = pd.read_csv(file, delimiter='\t', encoding=encoding_detected)


In [3]:
df.columns

Index(['New machine', '    Old machine'], dtype='object')
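The output shows that the second column name has four leading spaces, so later cells have to write `"    Old machine"` exactly. The following is a minimal sketch of normalizing the headers with `str.strip()`. It uses a hypothetical tab-separated string that mimics the file's header; the numbers in it are made up.

```python
import io

import pandas as pd

# Hypothetical tab-separated content mimicking a header with a padded column name
raw = "New machine\t    Old machine\n40.0\t42.3\n41.5\t41.9\n"
df_demo = pd.read_csv(io.StringIO(raw), delimiter="\t")
print(list(df_demo.columns))  # ['New machine', '    Old machine']

# Strip surrounding whitespace so the column can be addressed as "Old machine"
df_demo.columns = df_demo.columns.str.strip()
print(list(df_demo.columns))  # ['New machine', 'Old machine']
```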

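H0: the new machine is not faster on average (μ_new ≥ μ_old, where μ is the mean packing time). H1: the new machine is faster (μ_new < μ_old), which corresponds to `alternative='less'`.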

In [4]:
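# Pooled two-sample t-test (equal_var=True is the default); alternative='less' tests H1: mean(new) < mean(old)
t_statistic, p_value = st.ttest_ind(df["New machine"], df["    Old machine"], alternative='less')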
t_statistic, p_value

(-3.3972307061176026, 0.0016055712503872579)
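`ttest_ind` with its default `equal_var=True` uses the pooled variance s_p² = ((n1-1)s1² + (n2-1)s2²) / (n1+n2-2) and n1+n2-2 degrees of freedom. The following is a minimal sketch that reproduces the pooled statistic and the left-tailed p-value by hand. The packing times are hypothetical values for illustration, not the contents of `machine.txt`.

```python
import numpy as np
import scipy.stats as st

# Hypothetical packing times in seconds (illustration only, not machine.txt)
new = np.array([40.0, 41.5, 39.8, 40.7, 41.2])
old = np.array([42.3, 41.9, 43.0, 42.6, 41.7])

n1, n2 = new.size, old.size
# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * new.var(ddof=1) + (n2 - 1) * old.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (new.mean() - old.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2
# Left-tailed p-value, matching alternative='less' (H1: mean(new) < mean(old))
p_manual = st.t.cdf(t_manual, dof)

res = st.ttest_ind(new, old, alternative="less")  # pooled by default
print(t_manual, res.statistic)
print(p_manual, res.pvalue)
```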

In [5]:
if t_statistic < 0 and p_value < 0.05:
    print("Reject the null hypothesis. The new machine is faster than the old machine.")
else:
    print("We cannot reject the null hypothesis.")


Reject the null hypothesis. The new machine is faster than the old machine.
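The same decision can be made with a critical value instead of a p-value. In a left-tailed test, H0 is rejected when t falls below the α-quantile of the t distribution. The following is a minimal sketch that assumes each column holds the ten recorded times, giving 10 + 10 - 2 = 18 degrees of freedom. It reuses the statistic printed by the `ttest_ind` cell above.

```python
import scipy.stats as st

alpha = 0.05
dof = 18  # assumption: n1 + n2 - 2 with ten cartons per machine
# Left-tailed test: reject H0 when t falls below the alpha quantile of t(dof)
t_crit_left = st.t.ppf(alpha, dof)  # about -1.734

t_observed = -3.3972307061176026  # statistic reported by the ttest_ind cell above
print(t_crit_left)
print("Reject H0" if t_observed < t_crit_left else "Fail to reject H0")
```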


In [6]:
file2 = 'student_gpa.txt'
with open(file2, 'rb') as f:
    result = chardet.detect(f.read())
    codificacion_detectada = result['encoding']
df2 = pd.read_csv(file2, delimiter='\t', encoding=codificacion_detectada)
df2

|   | Sophomores | Juniors |
|---|---|---|
| 0 | 3.04 | 2.56 |
| 1 | 1.71 | 2.77 |
| 2 | 3.3 | 2.7 |
| 3 | 2.88 | 3.0 |
| 4 | 2.11 | 2.98 |
| 5 | 2.6 | 3.47 |
| 6 | 2.92 | 3.26 |
| 7 | 3.6 | 3.2 |
| 8 | 2.28 | 3.19 |
| 9 | 2.82 | 2.65 |


In [7]:
df2["Sophomores"]

0     3.04
1     1.71
2     3.30
3     2.88
4     2.11
5     2.60
6     2.92
7     3.60
8     2.28
9     2.82
10    3.03
11    3.13
12    2.86
13    3.49
14    3.11
15    2.13
16    3.27
Name: Sophomores, dtype: float64
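Sophomores has 17 values, while the problem statement gives only 13 juniors. This means pandas presumably pads the remaining rows of the Juniors column with NaN. The following is a minimal sketch, on a hypothetical frame, of why `len()` overstates the sample size in that case while `count()` or `dropna()` does not.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the shorter column is padded with NaN, as happens when
# two samples of different sizes are read from one table
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, np.nan, np.nan]})

n_len = len(demo["B"])              # 4: len() counts the NaN padding
n_count = demo["B"].count()         # 2: count() counts only non-missing values
n_dropna = demo["B"].dropna().size  # 2: same result after dropping NaN
print(n_len, n_count, n_dropna)
```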

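H0: the mean GPAs of sophomores and juniors are equal. H1: they differ (two-sided test with unequal variances).

In [12]:
# Welch's t-test (equal_var=False); nan_policy="omit" drops the NaN padding in the shorter Juniors column
t_statistic, p_value = st.ttest_ind(df2["Sophomores"], df2['  Juniors'], equal_var=False, nan_policy="omit", alternative="two-sided")
t_statistic, p_value

(-0.9231495630900278, 0.3642180675348571)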

In [9]:
p_value

0.3642180675348571
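A two-sided p-value equals twice the one-sided p-value taken in the direction of the observed statistic. When t is negative, that is the `alternative="less"` value. The following is a minimal sketch checking this relation with scipy on hypothetical samples, not the lab data.

```python
import numpy as np
import scipy.stats as st

# Hypothetical samples (illustration only); mean(x) < mean(y), so t < 0
x = np.array([2.9, 3.1, 2.4, 3.3, 2.7, 3.0])
y = np.array([3.2, 3.4, 2.9, 3.5, 3.1])

two_sided = st.ttest_ind(x, y, equal_var=False).pvalue
less = st.ttest_ind(x, y, equal_var=False, alternative="less").pvalue
# With a negative statistic, the two-sided p-value is twice the left-tailed one
print(two_sided, 2 * less)
```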

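In [13]:
alpha = 0.05
# Use count() rather than len(): the Juniors column is padded with NaN, and len() would count those rows too
dof = (df2["Sophomores"].count() - 1) + (df2['  Juniors'].count() - 1)  # Degrees of freedom (n1-1)+(n2-1) = 16 + 12 = 28

In [17]:
t_critical_value = st.t.ppf(1 - alpha / 2, dof)  # two-sided critical value
round(t_critical_value, 4)

2.0484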
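The lab uses `(n1-1)+(n2-1)` degrees of freedom. Welch's test, which scipy runs when `equal_var=False`, instead uses the Welch-Satterthwaite approximation ν = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1-1) + (s2²/n2)²/(n2-1)]. This value always lies between min(n1, n2) - 1 and n1 + n2 - 2. The following is a minimal sketch on hypothetical samples that computes ν and rebuilds scipy's two-sided p-value from it; `welch_dof` is a hypothetical helper.

```python
import numpy as np
import scipy.stats as st


def welch_dof(x, y):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    return (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))


# Hypothetical samples (illustration only)
x = np.array([2.9, 3.1, 2.4, 3.3, 2.7, 3.0])
y = np.array([3.2, 3.4, 2.9, 3.5, 3.1])

nu = welch_dof(x, y)
res = st.ttest_ind(x, y, equal_var=False)
# Two-sided p-value from the Welch statistic and the approximate dof
p_manual = 2 * st.t.sf(abs(res.statistic), nu)
print(nu, x.size + y.size - 2)  # Welch dof vs (n1-1)+(n2-1)
print(p_manual, res.pvalue)
```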

In [11]:
if p_value < alpha:
    print("Reject the null hypothesis. The mean GPAs of sophomores and juniors differ.")
else:
    print("Fail to reject the null hypothesis. The data do not show that the mean GPAs differ.")

Fail to reject the null hypothesis. The data do not show that the mean GPAs differ.


In [22]:
if abs(t_statistic) > t_critical_value:
    print("Reject the null hypothesis. The mean GPAs of sophomores and juniors differ.")
else:
    print("Fail to reject the null hypothesis. |t| does not exceed the critical value.")

Fail to reject the null hypothesis. |t| does not exceed the critical value.
