# Lab | Inferential statistics - T-test & P-value

In [3]:
import numpy as np
import scipy.stats as st
import math
import statistics

1. In a packing plant, a machine packs cartons with jars. It is believed that a new machine packs faster, on average, than the machine currently in use. 
To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file files_for_lab/machine.txt. 
Assuming the conditions for a t-test are met, do the data provide sufficient evidence to show that one machine is better than the other?

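$ H_0: \mu_{before} = \mu_{after} $

$ H_1: \mu_{before} > \mu_{after} $

Here $\mu_{before}$ is the mean packing time of the current (old) machine and $\mu_{after}$ that of the new machine. Times are in seconds, so a faster machine has a smaller mean time.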

- We assume that the population variances are equal but unknown, which is why we can use the pooled-variance t-test.
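For reference, the pooled-variance statistic can be written as follows, with index 1 for the old machine and index 2 for the new one. It is the same formula the cells below implement with `pooled_sample_std` and `statistic`:

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

Under $H_0$, and assuming roughly normal populations with equal variances, $t$ follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom.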

In [4]:
# Old machine values
sample1 = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1]

n1 = len(sample1)

sample_mean_n1 = np.mean(sample1)

sample_std_n1 = statistics.stdev(sample1)

print('Sample mean for old machine times: {:.2f} \nSample standard deviation for old machine times: {:.2f} \nSample size for old machine times: {}'.format(sample_mean_n1, sample_std_n1, n1))

Sample mean for old machine times: 43.23 
Sample standard deviation for old machine times: 0.75 
Sample size for old machine times: 10


In [5]:
# New machine values
sample2 = [42.1, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]

n2 = len(sample2)

sample_mean_n2 = np.mean(sample2)

sample_std_n2 = statistics.stdev(sample2)

print('Sample mean for new machine times: {:.2f} \nSample standard deviation for new machine times: {:.2f} \nSample size for new machine times: {}'.format(sample_mean_n2, sample_std_n2, n2))

Sample mean for new machine times: 42.14 
Sample standard deviation for new machine times: 0.68 
Sample size for new machine times: 10


In [6]:
pooled_sample_std = math.sqrt(((n1-1)*sample_std_n1**2 + (n2-1)*sample_std_n2**2)/(n1+n2-2))
pooled_sample_std

0.7174414416676962

In [7]:
statistic = (sample_mean_n1-sample_mean_n2)/(pooled_sample_std*math.sqrt((1/n1)+(1/n2)))
statistic

3.3972307061176026
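As a cross-check, here is a minimal sketch that wraps the same pooled computation in a function and compares it with SciPy's `st.ttest_ind(..., equal_var=True)`. The function name `pooled_t_statistic` is hypothetical, chosen only for illustration; the data are the two samples listed above.

```python
import math
import statistics

import scipy.stats as st


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance; assumes equal population variances."""
    n_a, n_b = len(a), len(b)
    # Sample variances (divide by n - 1), as statistics.stdev does in the cells above
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, n_a + n_b - 2  # statistic and degrees of freedom


old_machine = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1]
new_machine = [42.1, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]

t, df = pooled_t_statistic(old_machine, new_machine)
scipy_t = st.ttest_ind(old_machine, new_machine, equal_var=True).statistic
print(f"t = {t:.4f}, df = {df}, SciPy t = {scipy_t:.4f}")
```

Both values should agree with the `statistic` cell above up to floating-point rounding.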

In [11]:
critical_value = st.t.ppf(0.025, n1+n2-2)
critical_value

-2.10092204024096

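- The statistic is $t \approx 3.40$ with $n_1 + n_2 - 2 = 18$ degrees of freedom. The cell above returns the lower 2.5% quantile, which is the two-sided critical value at $\alpha = 0.05$. Since the lab's alternative is one-sided (the new machine is faster, i.e. $\mu_{before} > \mu_{after}$), the relevant critical value is the upper 5% quantile, $t_{0.95,18} \approx 1.734$. Because $3.40 > 1.734$ (and $3.40$ even exceeds the two-sided bound $2.10$), we reject the null hypothesis: at the 5% significance level, the data provide evidence that the new machine packs faster than the old one.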
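A minimal sketch of the one-sided decision rule, assuming $\alpha = 0.05$ and reusing the statistic and degrees of freedom printed above (copied in by hand so the snippet runs on its own). `st.t.ppf` gives the upper critical value and `st.t.sf` gives the upper-tail p-value.

```python
import scipy.stats as st

t_stat = 3.3972307061176026  # pooled statistic from the cell above (old minus new)
df = 18                      # n1 + n2 - 2
alpha = 0.05

# H1: mu_before > mu_after (old machine slower) is an upper-tailed test
upper_critical = st.t.ppf(1 - alpha, df)
p_value = st.t.sf(t_stat, df)  # P(T >= t_stat) under H0

reject_h0 = t_stat > upper_critical
print(f"critical value = {upper_critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The critical-value comparison and the p-value comparison are two views of the same decision, so they always agree.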

-----------------------

In [9]:
st.ttest_ind(a = sample1, b = sample2, alternative = "less") 

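Ttest_indResult(statistic=3.3972307061176026, pvalue=0.9983944287496127)

- With `a = sample1` (old machine) and `b = sample2` (new machine), `alternative = "less"` tests $\mu_{before} < \mu_{after}$, the opposite of the lab's hypothesis, which is why the p-value is close to 1. The matching argument is `alternative = "greater"`, whose p-value is the complement, $1 - 0.9984 \approx 0.0016 < 0.05$, consistent with rejecting $H_0$.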
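The following sketch runs both one-sided alternatives on the same samples to show how the direction of `alternative` interacts with the order of `a` and `b`. It assumes a SciPy version recent enough for `ttest_ind` to accept the `alternative` keyword, as the notebook's own cells already do.

```python
import scipy.stats as st

old_machine = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1]
new_machine = [42.1, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]

# a = old, b = new: "greater" tests mu_before > mu_after, "less" tests the opposite direction
greater = st.ttest_ind(old_machine, new_machine, alternative="greater")
less = st.ttest_ind(old_machine, new_machine, alternative="less")

print(f'alternative="greater": p = {greater.pvalue:.4f}')
print(f'alternative="less":    p = {less.pvalue:.4f}')
```

The two one-sided p-values add up to 1, so picking the wrong direction turns a small p-value into one close to 1.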

2. An additional problem (not mandatory): here we can't assume that the population variances are equal, so we cannot pool the variances. 
Independent random samples of 17 sophomores and 13 juniors attending a large university yield the following data on grade point averages. Data is provided in the file files_for_lab/student_gpa.txt. 
At the 5% significance level, do the data provide sufficient evidence to conclude that the mean GPAs of sophomores and juniors at the university differ?

$ H_0: \mu_{sophomores} = \mu_{juniors} $

$ H_1: \mu_{sophomores} \neq \mu_{juniors} $
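Because the variances are not pooled, the statistic uses each sample's own variance (the hypothesized difference is 0), and its degrees of freedom are usually approximated with the Welch-Satterthwaite formula. Index 1 is sophomores and index 2 is juniors:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad \nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} $$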

In [16]:
# Sophomores data
sample1 = [3.04, 1.71, 3.3, 2.88, 2.11, 2.6, 2.92, 3.6, 2.28, 2.82, 3.03, 3.13, 2.86, 3.49, 3.11, 2.13, 3.27]

n1 = len(sample1)

sample_mean_n1 = np.mean(sample1)

sample_std_n1 = statistics.stdev(sample1)

In [17]:
# Juniors data
sample2 = [2.56, 2.77, 2.7, 3, 2.98, 3.47, 3.26, 3.2, 3.19, 2.65, 3, 3.39, 2.58]

n2 = len(sample2)

sample_mean_n2 = np.mean(sample2)

sample_std_n2 = statistics.stdev(sample2)

In [19]:
statistic = (sample_mean_n1-sample_mean_n2-0)/math.sqrt((sample_std_n1**2/n1)+(sample_std_n2**2/n2))
statistic

-0.9231495630900276

In [20]:
critical_value_lower = st.t.ppf(0.025, n1+n2-2)
critical_value_upper = st.t.ppf(1-0.025, n1+n2-2)
critical_value_lower, critical_value_upper

(-2.048407141795244, 2.048407141795244)
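The critical values above use $n_1 + n_2 - 2 = 28$ degrees of freedom, which belongs to the pooled test. A minimal sketch of the unpooled version, assuming the Welch-Satterthwaite approximation for the degrees of freedom (the helper name `welch_t_and_df` is hypothetical):

```python
import math
import statistics

import scipy.stats as st


def welch_t_and_df(a, b):
    """Unpooled (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = statistics.variance(a) / len(a)  # squared standard error of the mean of a
    se2_b = statistics.variance(b) / len(b)  # squared standard error of the mean of b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


sophomores = [3.04, 1.71, 3.3, 2.88, 2.11, 2.6, 2.92, 3.6, 2.28, 2.82, 3.03, 3.13, 2.86, 3.49, 3.11, 2.13, 3.27]
juniors = [2.56, 2.77, 2.7, 3, 2.98, 3.47, 3.26, 3.2, 3.19, 2.65, 3, 3.39, 2.58]
alpha = 0.05

t_stat, welch_df = welch_t_and_df(sophomores, juniors)
lower, upper = st.t.ppf([alpha / 2, 1 - alpha / 2], welch_df)  # two-sided critical values
reject_h0 = t_stat < lower or t_stat > upper

print(f"t = {t_stat:.4f}, Welch df = {welch_df:.2f}")
print(f"critical values = ({lower:.3f}, {upper:.3f}), reject H0: {reject_h0}")
```

The Welch degrees of freedom cannot exceed 28, so these critical values are at least as far from zero as ±2.048 and the conclusion does not change here.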

In [18]:
st.ttest_ind(sample1, sample2, alternative = "two-sided") 

Ttest_indResult(statistic=-0.864325455323425, pvalue=0.39475359666695975)

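- Since $|t| \approx 0.92$ lies inside $(-2.048, 2.048)$ and the p-value is greater than 0.05, we fail to reject the null hypothesis: at the 5% significance level, the data do not provide sufficient evidence that the mean GPAs of sophomores and juniors differ. Note that `st.ttest_ind` pools the variances by default (`equal_var=True`), which is why its statistic (-0.864) differs from the manual unpooled one (-0.923); passing `equal_var=False` runs Welch's test, which matches the manual statistic.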
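A short sketch contrasting SciPy's default pooled test with Welch's test on the same data; only `equal_var` changes between the two calls.

```python
import scipy.stats as st

sophomores = [3.04, 1.71, 3.3, 2.88, 2.11, 2.6, 2.92, 3.6, 2.28, 2.82, 3.03, 3.13, 2.86, 3.49, 3.11, 2.13, 3.27]
juniors = [2.56, 2.77, 2.7, 3, 2.98, 3.47, 3.26, 3.2, 3.19, 2.65, 3, 3.39, 2.58]
alpha = 0.05

# equal_var=True (the default) pools the variances; equal_var=False runs Welch's test
results = {
    "pooled": st.ttest_ind(sophomores, juniors, equal_var=True),
    "welch": st.ttest_ind(sophomores, juniors, equal_var=False),
}

for name, res in results.items():
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{name}: t = {res.statistic:.4f}, p = {res.pvalue:.4f} -> {decision}")
```

For this problem the Welch call is the one that matches the stated assumption of unequal variances; here both versions lead to the same decision.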