# Lab | Inferential statistics - T-test & P-value

### Instructions

1. *One-tailed t-test* - In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file `files_for_lab/machine.txt`.
   Assuming the conditions for a t-test are met, does the data provide sufficient evidence that one machine is better than the other?

2. *Matched Pairs Test* - In this challenge we will compare dependent samples of data describing our Pokemon (file `files_for_lab/pokemon.csv`). Our goal is to see whether there is a significant difference between each Pokemon's defense and attack scores. Our null hypothesis is that the defense and attack scores are equal. Compare the two columns to see if there is a statistically significant difference between them and comment on your result.

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns

In [13]:
# 1. One-tailed t-test (task text is given in the instructions above).
# H0: the new machine's mean packing time is >= the old machine's (new is not faster).
# H1: the new machine's mean packing time is < the old machine's (new is faster).

df = pd.read_csv('machine.txt', encoding="utf-16", sep="\t", names=["new", "old"], header=0)
df

| | new | old |
|---|---|---|
| 0 | 42.1 | 42.7 |
| 1 | 41.0 | 43.6 |
| 2 | 41.3 | 43.8 |
| 3 | 41.8 | 43.3 |
| 4 | 42.4 | 42.5 |
| 5 | 42.8 | 43.5 |
| 6 | 43.2 | 43.1 |
| 7 | 42.3 | 41.7 |
| 8 | 41.8 | 44.0 |
| 9 | 42.7 | 44.1 |


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   new     10 non-null     float64
 1   old     10 non-null     float64
dtypes: float64(2)
memory usage: 288.0 bytes


In [15]:
sample1 = df['new']
sample2 = df['old']

In [28]:
print('mean of the new machine =', sample1.mean())
print('mean of the old machine =', sample2.mean())
print('std new machine =', sample1.std(ddof=1))
print('std old machine =', sample2.std(ddof=1))

# The new machine's mean packing time is smaller, so it looks faster; the t-test checks whether the gap is significant.


mean of the new machine = 42.14
mean of the old machine = 43.230000000000004
std new machine = 0.6834552736727638
std old machine = 0.7498888806572157
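
The summary statistics above are all that Welch's unequal-variance t-test needs. As a minimal sketch, the statistic can be worked out by hand with only the standard library. The data are copied from the table printed earlier; the helper name `welch_t` is hypothetical and introduced only for illustration.

```python
import math
import statistics

# Packing times in seconds, copied from the table printed earlier.
new = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]


def welch_t(a, b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # Sample variance (ddof=1) divided by the sample size, per group.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    # Difference of means over the combined standard error.
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


t_manual, dof_manual = welch_t(new, old)
print(f"t = {t_manual:.4f}, dof = {dof_manual:.2f}")
```

The sign of `t` follows the argument order: passing `new` first gives a negative value when the new machine's mean time is lower.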


In [39]:
t_statistic, p_value = st.ttest_ind(df['new'], df['old'], equal_var=False)
print("P-value = {:.4f}".format(p_value))
print("T-statistic = {:.4f}".format(t_statistic))
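# ttest_ind returns a two-sided p-value by default. The t-statistic is negative (new mean < old mean),
# so the one-sided p-value for H1 "new machine is faster" is half of it: 0.0032 / 2 = 0.0016.
# That is < 0.05, so we reject H0 (new machine is not faster) and conclude that the new machine is faster.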

P-value = 0.0032
T-statistic = -3.3972
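
Since the exercise asks for a one-tailed test, the one-sided p-value can also be requested directly instead of halving the two-sided one. The sketch below assumes SciPy 1.6 or newer, where `ttest_ind` accepts the `alternative` argument; the data are copied from the table printed earlier.

```python
import scipy.stats as st

# Packing times in seconds, copied from the table printed earlier.
new = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]

# Default behaviour: two-sided Welch test (H1: the means differ).
t_two, p_two = st.ttest_ind(new, old, equal_var=False)

# One-sided Welch test (H1: mean(new) < mean(old), i.e. the new machine is faster).
# The `alternative` argument is assumed to be available (SciPy >= 1.6).
t_one, p_one = st.ttest_ind(new, old, equal_var=False, alternative="less")

print("two-sided p-value = {:.4f}".format(p_two))
print("one-sided p-value = {:.4f}".format(p_one))
```

With a negative t-statistic and `alternative="less"`, the one-sided p-value is half the two-sided value, so both routes lead to the same decision here.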


In [45]:
# 2. Matched pairs test (task text is given in the instructions above).
# H0: each Pokemon's Defense and Attack scores are equal on average (mean difference = 0).
# H1: the mean difference between Defense and Attack is not 0.

pokemon = pd.read_csv('pokemon.csv')
pokemon

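| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |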


In [47]:
# Keep only the two columns being compared
attack_def = pokemon[['Attack', 'Defense']]
attack_def

| | Attack | Defense |
|---|---|---|
| 0 | 49 | 49 |
| 1 | 62 | 63 |
| 2 | 82 | 83 |
| 3 | 100 | 123 |
| 4 | 52 | 43 |
| ... | ... | ... |
| 795 | 100 | 150 |
| 796 | 160 | 110 |
| 797 | 110 | 60 |
| 798 | 160 | 60 |


In [49]:

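t_statistic1, p_value1 = st.ttest_ind(attack_def['Defense'], attack_def['Attack'], equal_var=False)

print("P-value = {:.4f}".format(p_value1))
print("T-statistic = {:.4f}".format(t_statistic1))


# Conclusion: the P-value is < 0.05, so we reject H0 and conclude that attack and defense
# are not equal (the negative t-statistic means the mean attack is higher).
# Caveat: ttest_ind treats the two columns as independent samples. Both scores belong to the
# same Pokemon, so the matched-pairs test st.ttest_rel is the appropriate one for this task.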

P-value = 0.0012
T-statistic = -3.2418
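
The task asks for a matched pairs test, which compares the two scores within each Pokemon rather than treating the columns as independent groups. A minimal sketch of that approach with `st.ttest_rel` follows. It uses only the nine (Attack, Defense) rows visible in the preview above as an illustrative subset, so its numbers are not the result for the full `pokemon.csv`.

```python
import scipy.stats as st

# Illustrative subset only: the nine (Attack, Defense) rows visible in the
# preview above, NOT the full pokemon.csv.
attack = [49, 62, 82, 100, 52, 100, 160, 110, 160]
defense = [49, 63, 83, 123, 43, 150, 110, 60, 60]

# Paired (dependent-samples) t-test: H0 is that the mean of the
# per-Pokemon differences Defense - Attack equals 0.
t_rel, p_rel = st.ttest_rel(defense, attack)

# Equivalent formulation: a one-sample t-test on the differences against 0.
diffs = [d - a for d, a in zip(defense, attack)]
t_diff, p_diff = st.ttest_1samp(diffs, 0.0)

print("paired:      t = {:.4f}, p = {:.4f}".format(t_rel, p_rel))
print("differences: t = {:.4f}, p = {:.4f}".format(t_diff, p_diff))
```

On the full data the same call would be `st.ttest_rel(attack_def['Defense'], attack_def['Attack'])`. Pairing removes the between-Pokemon variation from the standard error, which is why it can give a different p-value than `ttest_ind` on the same columns.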


In [None]:
# Optional part: ANOVA test (not completed)