An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of moisture the shingles contain when they are packaged. Customers may feel that they have purchased a product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases, excessive moisture can cause the granules attached to the shingles for texture and colouring purposes to fall off, resulting in appearance problems. To monitor the amount of moisture present, the company conducts moisture tests: a shingle is weighed, dried, and then reweighed, and from the amount of moisture removed, the pounds of moisture per 100 square feet is calculated. The company would like to show that the mean moisture content is less than 0.35 pound per 100 square feet.

The file `A & B shingles.csv` (loaded below as `A+&+B+shingles.csv`) contains 36 measurements (in pounds per 100 square feet) for shingle A and 31 for shingle B.


In [1]:
# Loading the necessary packages

# Packages that are required for basic computation
import numpy as np
import pandas as pd

# Packages for graph
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(color_codes=True)  # apply seaborn's default theme (grey grid background) and its colour codes

# Packages for statistics
from scipy import stats
from scipy.stats import ttest_1samp

# Packages to ignore warnings
from warnings import filterwarnings
filterwarnings("ignore")

In [2]:
mydata = pd.read_csv('A+&+B+shingles.csv')

In [3]:
mydata.head()

|   | A | B |
|---|---|---|
| 0 | 0.44 | 0.14 |
| 1 | 0.61 | 0.15 |
| 2 | 0.47 | 0.31 |
| 3 | 0.3 | 0.16 |
| 4 | 0.15 | 0.37 |


In [4]:
mydata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       36 non-null     float64
 1   B       31 non-null     float64
dtypes: float64(2)
memory usage: 704.0 bytes


In [5]:
mydata.isnull().sum()

A    0
B    5
dtype: int64
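Column B shows 5 missing values because it holds 31 measurements against 36 for column A, so the shorter column is padded with `NaN` when both share one row index. As a result `len(mydata)` is 36 regardless of column; the per-column sample size is what `count()` returns. A minimal sketch with a small made-up frame (the values are illustrative, not the shingle data):

```python
import numpy as np
import pandas as pd

# Made-up illustrative frame: column B is shorter, so its tail is padded with NaN,
# mimicking how the shorter B column sits alongside A in one DataFrame.
df = pd.DataFrame({
    "A": [0.44, 0.61, 0.47, 0.30],
    "B": [0.14, 0.15, np.nan, np.nan],
})

n_rows = len(df)             # counts every row, including the NaN padding
n_per_column = df.count()    # non-null observations per column
b_sample = df["B"].dropna()  # the actual B sample with the padding removed

print(n_rows)
print(n_per_column)
print(b_sample.tolist())
```

Applied to the notebook's data, `mydata.count()` reports 36 for A and 31 for B, matching the non-null counts in the `info()` output above.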

In [6]:
mydata.describe(include = 'all').T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| A | 36.0 | 0.316667 | 0.135731 | 0.13 | 0.2075 | 0.29 | 0.3925 | 0.72 |
| B | 31.0 | 0.273548 | 0.137296 | 0.1 | 0.16 | 0.23 | 0.4 | 0.58 |


In [7]:
mydata.columns

Index(['A', 'B'], dtype='object')

In [8]:
mydata['A'].min()

0.13

In [9]:
mydata['A'].max()

0.72

In [10]:
mydata['B'].min()

0.1

In [11]:
mydata['B'].max()

0.58

### 3.1 Do you think there is evidence that the mean moisture contents in both types of shingles are within the permissible limits? State your conclusions clearly, showing all steps.


### Shingle A

##### Step 1: Define null and alternative hypotheses

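$H_0$: $\mu \geq 0.35$

$H_A$: $\mu < 0.35$

The company wants to show that the mean moisture content is below 0.35 pound per 100 square feet, so that claim is the alternative hypothesis and the test is lower-tailed.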

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 2: Decide the significance level

Here we select $\alpha$ = 0.05.

In [12]:
print("The sample size for this problem is", mydata['A'].count())  # non-null values in column A

The sample size for this problem is 36


#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 3: Identify the test statistic

The population standard deviation is unknown, so we use the one-sample $t_{STAT}$ test statistic, which follows a t distribution with $n - 1 = 35$ degrees of freedom (n = 36).
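For reference, `ttest_1samp` computes the standard one-sample statistic below, where $\bar{X}$ is the sample mean, $S$ the sample standard deviation and $\mu_0 = 0.35$. Plugging in the rounded mean and standard deviation of A from the `describe()` output reproduces the t statistic (the last digits differ slightly because of rounding):

$$t_{STAT} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \qquad \text{df} = n - 1$$

$$t_{STAT} = \frac{0.316667 - 0.35}{0.135731/\sqrt{36}} = \frac{-0.033333}{0.022622} \approx -1.4735, \qquad \text{df} = 35$$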

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 4: Calculate the p-value and test statistic

In [13]:
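# ttest_1samp returns a two-sided p-value. Since HA: mu < 0.35 and t < 0 here,
# the one-sided (lower-tail) p-value is half of the two-sided value.
t_statistic_A, p_value_A = ttest_1samp(mydata['A'], 0.35)

print('One sample t test for shingle A \n\nt statistic: {0} \n\np value: {1} '.format(t_statistic_A, p_value_A/2))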

One sample t test for shingle A 

t statistic: -1.4735046253382782 

p value: 0.07477633144907513 
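`ttest_1samp` reports a two-sided p-value by default, which is why the cell above halves it. Halving gives the lower-tail p-value only when the t statistic is negative, as it is here; when t is positive the lower-tail p-value is one minus half the two-sided value. In SciPy 1.6 and later, the `alternative` argument computes the one-sided p-value directly. The sketch below assumes SciPy 1.6 or newer and, to stay self-contained, uses only the five rows shown by `mydata.head()` (an illustration, not a substitute for the full-sample test):

```python
import numpy as np
from scipy import stats  # the `alternative=` argument needs SciPy >= 1.6

# Illustration only: the five rows shown by mydata.head(), not the full samples.
samples = {
    "A (head)": np.array([0.44, 0.61, 0.47, 0.30, 0.15]),  # mean 0.394, above 0.35
    "B (head)": np.array([0.14, 0.15, 0.31, 0.16, 0.37]),  # mean 0.226, below 0.35
}

results = {}
for name, x in samples.items():
    t_stat, p_two = stats.ttest_1samp(x, 0.35)                  # two-sided (default)
    _, p_less = stats.ttest_1samp(x, 0.35, alternative="less")  # HA: mu < 0.35
    results[name] = (t_stat, p_two, p_less)
    print(f"{name}: t = {t_stat:.3f}, p_two / 2 = {p_two / 2:.4f}, p_less = {p_less:.4f}")
```

For the A head rows the sample mean is above 0.35, so t > 0 and halving the two-sided p-value would give the wrong one-sided answer, while `alternative="less"` handles both signs. On the full data the equivalent call is `ttest_1samp(mydata['A'], 0.35, alternative='less')`, which agrees with the halved value printed above because t < 0.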


#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 5: Decide whether to reject the null hypothesis

In [19]:
alpha_value = 0.05  # level of significance
print('Level of significance: %.2f' % alpha_value)
# Lower-tail test (HA: mu < 0.35): reject H0 only if t < 0 and the one-sided p-value < alpha
if t_statistic_A < 0 and p_value_A/2 < alpha_value:
    print('\nWe reject the null hypothesis since p value < level of significance')
else:
    print('\nWe fail to reject the null hypothesis since p value >= level of significance')

print("\nOne-sample t-test for shingle A, one-sided p-value =", p_value_A/2)

Level of significance: 0.05

We fail to reject the null hypothesis since p value >= level of significance

One-sample t-test for shingle A, one-sided p-value = 0.07477633144907513
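The halved p-value printed above is the lower-tail p-value, i.e. the one for $H_A: \mu < 0.35$. As a cross-check, the same decision can be reached with the critical-value approach: reject $H_0$ if $t_{STAT}$ falls below the $\alpha$ quantile of the t distribution with $n - 1 = 35$ degrees of freedom. A minimal sketch, with the t statistic copied from the output above:

```python
from scipy import stats

alpha = 0.05
n_A = 36
t_stat_A = -1.4735046253382782  # t statistic printed for shingle A above

# Lower-tail critical value: the alpha quantile of t with n - 1 degrees of freedom
t_crit = stats.t.ppf(alpha, df=n_A - 1)
reject_h0 = t_stat_A < t_crit  # reject H0: mu >= 0.35 only if t falls below it
print(f"critical value = {t_crit:.4f}, reject H0: {reject_h0}")
```

The critical value is about -1.69, and -1.47 does not fall below it, which matches the p-value decision.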


#### Therefore, at the 5% significance level, there is insufficient evidence to conclude that the mean moisture content of shingle A is less than 0.35 pound per 100 square feet

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Shingle B

##### Step 1: Define null and alternative hypotheses

$H_0$: $\mu \geq 0.35$

$H_A$: $\mu < 0.35$
#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 2: Decide the significance level
Here we select $\alpha$ = 0.05.

In [15]:
print("The sample size for this problem is", mydata['B'].count())  # non-null values in column B

The sample size for this problem is 31


#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 3: Identify the test statistic

The population standard deviation is unknown, so we use the one-sample $t_{STAT}$ test statistic with $n - 1 = 30$ degrees of freedom (n = 31, since the 5 missing entries in column B are excluded).

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 4: Calculate the p-value and test statistic

In [16]:
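# nan_policy='omit' drops the NaN padding in column B; p_value_B is two-sided,
# so with t < 0 the lower-tail p-value is p_value_B / 2
t_statistic_B, p_value_B = ttest_1samp(mydata['B'], 0.35, nan_policy='omit')
print('One sample t test for shingle B \n\nt statistic: {0} \n\np value: {1} '.format(t_statistic_B, p_value_B/2))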

One sample t test for shingle B 

t statistic: -3.1003313069986995 

p value: 0.0020904774003191826 
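The t statistics and one-sided p-values for both shingles can also be reproduced from the summary statistics alone, which is a useful sanity check on the `ttest_1samp` calls. This sketch uses the rounded means and standard deviations from the `describe()` output, so the last digits differ slightly from the printed results:

```python
import math
from scipy import stats

# Summary statistics copied from the mydata.describe() output (rounded to 6 d.p.)
summary = {
    "A": {"n": 36, "mean": 0.316667, "std": 0.135731},
    "B": {"n": 31, "mean": 0.273548, "std": 0.137296},
}
mu0 = 0.35  # hypothesised mean, pounds per 100 square feet

results = {}
for name, s in summary.items():
    se = s["std"] / math.sqrt(s["n"])             # standard error of the mean
    t_stat = (s["mean"] - mu0) / se               # one-sample t statistic
    p_less = stats.t.cdf(t_stat, df=s["n"] - 1)   # lower-tail p-value for HA: mu < 0.35
    results[name] = (t_stat, p_less)
    print(f"Shingle {name}: t = {t_stat:.4f}, one-sided p = {p_less:.4f}")
```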


#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 5: Decide whether to reject the null hypothesis

In [20]:
alpha_value = 0.05  # level of significance
print('Level of significance: %.2f' % alpha_value)
# Lower-tail test (HA: mu < 0.35): reject H0 only if t < 0 and the one-sided p-value < alpha
if t_statistic_B < 0 and p_value_B/2 < alpha_value:
    print('\nWe reject the null hypothesis since p value < level of significance')
else:
    print('\nWe fail to reject the null hypothesis since p value >= level of significance')

print("\nOne-sample t-test for shingle B, one-sided p-value =", p_value_B/2)

Level of significance: 0.05

We reject the null hypothesis since p value < level of significance

One-sample t-test for shingle B, one-sided p-value = 0.0020904774003191826


#### Therefore, at the 5% significance level, there is sufficient evidence to conclude that the mean moisture content of shingle B is less than 0.35 pound per 100 square feet
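An equivalent way to read both one-sided tests is through a 95% upper confidence bound for each mean, $\bar{X} + t_{0.95,\,n-1}\, S/\sqrt{n}$: the lower-tail test rejects $H_0: \mu \geq 0.35$ at $\alpha = 0.05$ exactly when this bound falls below 0.35. A sketch using the rounded summary statistics from `describe()`:

```python
import math
from scipy import stats

# (n, mean, std) copied from the mydata.describe() output, rounded to 6 d.p.
summary = {"A": (36, 0.316667, 0.135731), "B": (31, 0.273548, 0.137296)}

bounds = {}
for name, (n, mean, sd) in summary.items():
    t_95 = stats.t.ppf(0.95, df=n - 1)               # one-sided 95% t quantile
    bounds[name] = mean + t_95 * sd / math.sqrt(n)   # upper confidence bound for mu
    print(f"Shingle {name}: 95% upper bound = {bounds[name]:.4f} lb per 100 sq ft")
```

The bound for A lies just above 0.35 and the bound for B well below it, consistent with the one-sided p-values printed above (about 0.075 for A and 0.002 for B).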

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### 3.2 Do you think that the population means for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumption do you need to check before the test for equality of means is performed?

##### Step 1: Define null and alternative hypotheses

$H_0$: $\mu_A = \mu_B$

$H_A$: $\mu_A \neq \mu_B$
#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 2: Decide the significance level
Here we select $\alpha$ = 0.05.

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 3: Identify the test statistic

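The population standard deviations are unknown, so we use the two-sample (independent, unpaired) t-test. By default `ttest_ind` pools the two sample variances (`equal_var=True`), so before running it we should check that the two samples are independent, that the population variances are approximately equal, and that each population is approximately normal (with $n_A = 36$ and $n_B = 31$ the t-test is reasonably robust to moderate non-normality). The sample standard deviations in the `describe()` output above (0.1357 and 0.1373) are very close, which supports pooling.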
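These assumptions can be checked before running the test. The sketch below shows one common approach, not the only one: the Shapiro-Wilk test (`scipy.stats.shapiro`) for approximate normality of each sample and Levene's test (`scipy.stats.levene`, which centres on the median by default) for equal variances. The helper name `check_two_sample_assumptions` is hypothetical, and the demonstration arrays are the five rows shown by `mydata.head()`, used only to make the sketch runnable:

```python
import numpy as np
from scipy import stats


def check_two_sample_assumptions(a, b):
    """Hypothetical helper: p-values for common pre-checks of a pooled t-test.

    NaNs are dropped first because the shorter column is padded with NaN.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]
    _, p_shapiro_a = stats.shapiro(a)  # H0: sample a comes from a normal population
    _, p_shapiro_b = stats.shapiro(b)  # H0: sample b comes from a normal population
    _, p_levene = stats.levene(a, b)   # H0: the two population variances are equal
    return {"shapiro_a": p_shapiro_a, "shapiro_b": p_shapiro_b, "levene": p_levene}


# Demonstration on the five rows shown by mydata.head(), with a NaN appended to B
a_demo = [0.44, 0.61, 0.47, 0.30, 0.15]
b_demo = [0.14, 0.15, 0.31, 0.16, 0.37, np.nan]
pvals = check_two_sample_assumptions(a_demo, b_demo)
print(pvals)

# Quick descriptive check from the describe() output: ratio of sample variances
variance_ratio = (0.135731 / 0.137296) ** 2
print(f"variance ratio s_A^2 / s_B^2 = {variance_ratio:.3f}")
```

On the notebook's data the call would be `check_two_sample_assumptions(mydata['A'], mydata['B'])`; large p-values give no evidence against normality or equal variances. If the equal-variance assumption looks doubtful, Welch's test (`ttest_ind(..., equal_var=False)`) drops it.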

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 4: Calculate the p-value and test statistic

In [72]:
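from scipy.stats import ttest_ind

# Pooled-variance two-sample t-test (equal_var=True is the default);
# nan_policy='omit' drops the NaN padding in column B
t_statistic_AB, p_value_AB = ttest_ind(mydata['A'], mydata['B'], nan_policy='omit')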

print('Two sample t test for shingle A & B \n\nt statistic: {0} \n\np value: {1} '.format(t_statistic_AB, p_value_AB))

Two sample t test for shingle A & B 

t statistic: 1.2896282719661123 

p value: 0.2017496571835306 
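The two-sample result can be reproduced from the summary statistics with `scipy.stats.ttest_ind_from_stats`, which also makes it easy to compare the pooled test with Welch's test (no equal-variance assumption). A sketch using the rounded `describe()` figures (standard deviations with ddof=1), so the last digits differ slightly from the output above:

```python
from scipy import stats

# Rounded summary statistics from the mydata.describe() output
stats_A = dict(mean1=0.316667, std1=0.135731, nobs1=36)
stats_B = dict(mean2=0.273548, std2=0.137296, nobs2=31)

# Pooled-variance t-test, the same model as the ttest_ind default
res_pooled = stats.ttest_ind_from_stats(**stats_A, **stats_B, equal_var=True)
# Welch's t-test, which does not assume equal population variances
res_welch = stats.ttest_ind_from_stats(**stats_A, **stats_B, equal_var=False)

print(f"pooled: t = {res_pooled.statistic:.4f}, p = {res_pooled.pvalue:.4f}")
print(f"Welch:  t = {res_welch.statistic:.4f}, p = {res_welch.pvalue:.4f}")
```

Because the two sample standard deviations are nearly equal, the pooled and Welch results are very close, and neither rejects $H_0$ at the 5% level.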


#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##### Step 5: Decide whether to reject the null hypothesis

In [75]:
# p_value < 0.05 => reject H0: the means differ at the 5% significance level
print("two-sample t-test p-value=", p_value_AB)

alpha_level = 0.05

if p_value_AB < alpha_level:
    print('\nWe have enough evidence to reject the null hypothesis in favour of the alternative hypothesis')
    print('\nWe conclude that the mean moisture contents of shingles A and B differ')
else:
    print('\nWe do not have enough evidence to reject the null hypothesis in favour of the alternative hypothesis')
    print('\nWe cannot conclude that the mean moisture contents of shingles A and B differ')

two-sample t-test p-value= 0.2017496571835306

We do not have enough evidence to reject the null hypothesis in favour of the alternative hypothesis

We cannot conclude that the mean moisture contents of shingles A and B differ


#### Therefore, at the 5% significance level, there is insufficient evidence to conclude that the population mean moisture contents of shingles A and B differ (failing to reject $H_0$ does not prove that they are equal)

#### ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------