In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import scipy.stats as stats
from scipy.stats import ttest_1samp, shapiro, ttest_ind, levene
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of moisture the shingles contain when they are packaged. Customers may feel that they have purchased a product lacking in quality if they find moisture and wet shingles inside the packaging. In some cases, excessive moisture can cause the granules attached to the shingles for texture and coloring purposes to fall off the shingles, resulting in appearance problems. To monitor the amount of moisture present, the company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed, and based on the amount of moisture taken out of the product, the pounds of moisture per 100 square feet are calculated. The company would like to show that the mean moisture content is less than 0.35 pounds per 100 square feet. The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A shingles and 31 for B shingles.

# 1. Do you think there is evidence that the mean moisture contents of both types of shingles are within the permissible limits? State your conclusions clearly, showing all steps.

In [69]:
df3=pd.read_csv('A+&+B+shingles.csv')
df3.head()

Unnamed: 0,A,B
0,0.44,0.14
1,0.61,0.15
2,0.47,0.31
3,0.3,0.16
4,0.15,0.37


In [70]:
df3.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
A,36.0,0.316667,0.135731,0.13,0.2075,0.29,0.3925,0.72
B,31.0,0.273548,0.137296,0.1,0.16,0.23,0.4,0.58


In [71]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       36 non-null     float64
 1   B       31 non-null     float64
dtypes: float64(2)
memory usage: 704.0 bytes


# A Shingles

In [72]:
df3['A'].mean()

0.3166666666666666

In [73]:
df3['A'].std()

0.13573082605973166

- **Given: n=36, mu (hypothesized population mean)=0.35, xbar=0.317, std=0.136, assuming alpha = 0.05**

## Step 1

- **Defining the null and alternative hypotheses.**

- H0 : mean moisture content <=0.35
- HA : mean moisture content > 0.35

## Step 2
#### Identifying the test statistic

- It is a one-sample t-test.

## Step 3
#### As the population standard deviation is not known, we cannot use a z-test, so we use a one-sample t-test and calculate the t-statistic and p-value.

In [74]:
t_statistic, p_value = ttest_1samp(df3['A'], 0.35, alternative='greater' )
print('One sample t test \nt statistic: {0} p value: {1} '.format(t_statistic, p_value))

One sample t test 
t statistic: -1.4735046253382782 p value: 0.9252236685509249 
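
The t-statistic reported above can be cross-checked by hand from the summary statistics in the `describe()` output, using t = (xbar - mu) / (std / sqrt(n)) with n - 1 degrees of freedom. The following is a minimal sketch; the variable names are illustrative, and the inputs are the rounded summary values, so the result should agree with `ttest_1samp` only to about four decimal places.

```python
import math
from scipy import stats

# Summary statistics for A shingles, copied from the describe() output above.
n, xbar, s = 36, 0.316667, 0.135731
mu0 = 0.35  # hypothesized mean (the permissible limit)

# Standard error of the mean and the one-sample t-statistic.
standard_error = s / math.sqrt(n)
t_manual = (xbar - mu0) / standard_error

# Upper-tail p-value, which corresponds to alternative='greater'.
p_upper = stats.t.sf(t_manual, df=n - 1)

print(f"t = {t_manual:.4f}, upper-tail p = {p_upper:.4f}")
```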


## Step 4 

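#### Decide whether to reject or fail to reject the null hypothesis.

- Using ttest_1samp, we found that the p-value is greater than alpha: 0.925 > 0.05.
- Therefore we fail to reject the null hypothesis H0: there is not enough evidence that the mean moisture content of A shingles exceeds 0.35 pounds per 100 square feet.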

# B Shingles

In [75]:
df3['B'].mean()

0.2735483870967742

In [76]:
df3['B'].std()

0.13729647694185443

- **Given: n=31, mu (hypothesized population mean)=0.35, xbar=0.274, std=0.137, assuming alpha = 0.05**

## Step 1

- **Defining the null and alternative hypotheses.**

- H0 : mean moisture content <=0.35
- HA : mean moisture content > 0.35

## Step 2
#### Identifying the test statistic

- It is a one-sample t-test. Column B has only 31 measurements, so the missing values are dropped before testing.

## Step 3
#### Calculating the test statistic and p-value

In [77]:
t_statistic, p_value = ttest_1samp(df3['B'].dropna(), 0.35, alternative='greater' )
print('One sample t test \nt statistic: {0} p value: {1} '.format(t_statistic, p_value))

One sample t test 
t statistic: -3.1003313069986995 p value: 0.9979095225996808 


## Step 4 
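#### Decide whether to reject or fail to reject the null hypothesis.

- Using ttest_1samp, we found that the p-value is greater than alpha: 0.998 > 0.05.
- Therefore we fail to reject the null hypothesis H0: there is not enough evidence that the mean moisture content of B shingles exceeds 0.35 pounds per 100 square feet.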
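
The tests above use `alternative='greater'`, so they ask whether the mean exceeds 0.35. Because the company wants to show that the mean is below 0.35, a reader may also want the lower-tailed version (H0: mu >= 0.35, HA: mu < 0.35). The sketch below is not part of the original analysis: it recomputes the test from the summary statistics reported earlier, and `lower_tail_test` is a hypothetical helper name. Since the t distribution is symmetric, each lower-tail p-value is simply one minus the upper-tail p-value printed above (roughly 0.075 for A and 0.002 for B).

```python
import math
from scipy import stats


def lower_tail_test(xbar, s, n, mu0=0.35):
    """One-sample t-test of H0: mu >= mu0 against HA: mu < mu0,
    computed from summary statistics (hypothetical helper)."""
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    p_lower = stats.t.cdf(t_stat, df=n - 1)  # area to the left of t
    return t_stat, p_lower


# (mean, std, n) for each shingle type, taken from the describe() output.
summary = {"A": (0.316667, 0.135731, 36), "B": (0.273548, 0.137296, 31)}

results = {name: lower_tail_test(*values) for name, values in summary.items()}
for name, (t_stat, p_lower) in results.items():
    print(f"{name}: t = {t_stat:.4f}, lower-tail p = {p_lower:.4f}")
```

Under this framing and alpha = 0.05, the B sample would give evidence that its mean is below 0.35, while the A sample would not. With the full data, the same test should be obtainable by passing `alternative='less'` to `ttest_1samp`.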

# 2. Do you think that the population means for shingles A and B are equal? Form the hypothesis and conduct the test of the hypothesis. What assumptions do you need to check before the test for equality of means is performed?

## Step 1
- (A shingles) Given: n=36, xbar=0.317, std=0.136, assuming alpha = 0.05
- (B shingles) Given: n=31, xbar=0.274, std=0.137, assuming alpha = 0.05

## Step 2
- Defining the null and alternative hypotheses.

#### H0: muA = muB
#### HA: muA != muB


## Step 3
- We are comparing the means of two samples whose population standard deviations are unknown, so a two-sample t-test is appropriate.
- The two samples are independent, so we use ttest_ind. Column B has missing values, which are omitted with nan_policy='omit'.

In [78]:
t_statistic, p_value = ttest_ind(df3['A'],df3['B'], nan_policy='omit')
print('Two sample t test \nt statistic: {0} p value: {1} '.format(t_statistic, p_value))

Two sample t test 
t statistic: 1.2896282719661123 p value: 0.2017496571835306 


- The p-value (0.202) is greater than alpha (0.05), so we fail to reject the null hypothesis.
- There is not enough evidence to conclude that the population means of A and B shingles differ.
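
As a cross-check, the two-sample test can be reproduced from the summary statistics alone with `scipy.stats.ttest_ind_from_stats`. The sketch below, which is an addition and not part of the original notebook, runs both the pooled-variance test (the default used by `ttest_ind`) and Welch's test, which does not assume equal variances. The inputs are the rounded values from `describe()`, so small differences from the output above are expected.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the describe() output.
a = {"mean": 0.316667, "std": 0.135731, "nobs": 36}
b = {"mean": 0.273548, "std": 0.137296, "nobs": 31}

# Pooled-variance t-test (assumes equal population variances).
pooled = ttest_ind_from_stats(
    a["mean"], a["std"], a["nobs"],
    b["mean"], b["std"], b["nobs"],
    equal_var=True,
)

# Welch's t-test (does not assume equal population variances).
welch = ttest_ind_from_stats(
    a["mean"], a["std"], a["nobs"],
    b["mean"], b["std"], b["nobs"],
    equal_var=False,
)

print(f"Pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Because the two sample standard deviations are nearly identical, the two versions are expected to give very similar results here.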

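## Checking the assumptions: normality and equality of variances

- The two-sample t-test assumes that the samples are independent, that each comes from an approximately normal population, and (for the pooled version) that the population variances are equal.
- For the Shapiro test, H0 is that the sample comes from a normal distribution.
- HA is that the sample does not come from a normal distribution.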

In [79]:
shapiro(df3['A'])

ShapiroResult(statistic=0.9375598430633545, pvalue=0.042670514434576035)

In [80]:
shapiro(df3['A'].dropna())

ShapiroResult(statistic=0.9375598430633545, pvalue=0.042670514434576035)

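### Using the Shapiro test on sample A, the p-value is less than alpha, i.e. 0.0427 < 0.05.
### So we reject the hypothesis of normality: sample A shows some departure from a normal distribution. With n = 36 the t-test is usually reasonably robust to a mild departure of this kind.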
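
The normality check applies to each sample separately. Below is a minimal sketch of running the Shapiro test on every column of a DataFrame after dropping missing values. `normality_report` is a hypothetical helper, and the `demo` frame is randomly generated placeholder data that only mimics the shape of the shingles file (36 values for A, 31 for B); it is not the real data, so its results say nothing about the actual shingles.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def normality_report(frame, alpha=0.05):
    """Run the Shapiro-Wilk test on each column (hypothetical helper)."""
    report = {}
    for column in frame.columns:
        values = frame[column].dropna()  # B has fewer measurements than A
        statistic, p_value = shapiro(values)
        report[column] = {
            "statistic": float(statistic),
            "p_value": float(p_value),
            "reject_normality": bool(p_value < alpha),
        }
    return report


# Placeholder data only: random values, NOT the shingles measurements.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A": rng.normal(0.32, 0.14, size=36),
    "B": np.append(rng.normal(0.27, 0.14, size=31), [np.nan] * 5),
})

report = normality_report(demo)
for column, result in report.items():
    print(column, result)
```

With the real file loaded as `df3`, the call would be `normality_report(df3[['A', 'B']])`.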

In [86]:
levene(df3['A'],df3['B'].dropna())

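LeveneResult(statistic=0.23808965111555147, pvalue=0.6272312061867605)

- For the Levene test, H0 is that the variances of A and B are equal.
- The p-value is greater than alpha (0.627 > 0.05), so we fail to reject H0. The equal-variance assumption behind the pooled two-sample t-test is therefore reasonable.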
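
One common way to use the Levene result is to let it decide between the pooled and Welch versions of the two-sample t-test. The sketch below illustrates that workflow; `compare_means` is a hypothetical helper, and the arrays are random placeholder data, not the shingles measurements. Choosing the test from a preliminary variance check is one convention among several, so treat this as an illustration of the idea and not as the required procedure.

```python
import numpy as np
from scipy.stats import levene, ttest_ind


def compare_means(x, y, alpha=0.05):
    """Levene check followed by a two-sample t-test (hypothetical helper)."""
    # Convert to float arrays and drop missing values.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x[~np.isnan(x)], y[~np.isnan(y)]

    # H0 for Levene: the two groups have equal variances.
    _, levene_p = levene(x, y)
    equal_var = bool(levene_p > alpha)

    # Pooled t-test if equal variances are plausible, Welch's test otherwise.
    result = ttest_ind(x, y, equal_var=equal_var)
    return {
        "levene_p": float(levene_p),
        "equal_var": equal_var,
        "t_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
    }


# Placeholder data only: random values, NOT the shingles measurements.
rng = np.random.default_rng(1)
sample_a = rng.normal(0.32, 0.14, size=36)
sample_b = np.append(rng.normal(0.27, 0.14, size=31), [np.nan] * 5)

outcome = compare_means(sample_a, sample_b)
print(outcome)
```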