# Hypothesis Testing

## Import Libraries and Dataset

In [1]:
import numpy as np
import pandas as pd

from scipy import stats

In [2]:
aqi = pd.read_csv('c4_epa_air_quality.csv')

## Data Exploration

In [3]:
aqi.head()

| | Unnamed: 0 | date_local | state_name | county_name | city_name | local_site_name | parameter_name | units_of_measure | arithmetic_mean | aqi |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-01-01 | Arizona | Maricopa | Buckeye | BUCKEYE | Carbon monoxide | Parts per million | 0.473684 | 7 |
| 1 | 1 | 2018-01-01 | Ohio | Belmont | Shadyside | Shadyside | Carbon monoxide | Parts per million | 0.263158 | 5 |
| 2 | 2 | 2018-01-01 | Wyoming | Teton | Not in a city | Yellowstone National Park - Old Faithful Snow ... | Carbon monoxide | Parts per million | 0.111111 | 2 |
| 3 | 3 | 2018-01-01 | Pennsylvania | Philadelphia | Philadelphia | North East Waste (NEW) | Carbon monoxide | Parts per million | 0.3 | 3 |
| 4 | 4 | 2018-01-01 | Iowa | Polk | Des Moines | CARPENTER | Carbon monoxide | Parts per million | 0.215789 | 3 |


In [4]:
aqi.describe(include='all')

| | Unnamed: 0 | date_local | state_name | county_name | city_name | local_site_name | parameter_name | units_of_measure | arithmetic_mean | aqi |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 260.0 | 260 | 260 | 260 | 260 | 257 | 260 | 260 | 260.0 | 260.0 |
| unique | | 1 | 52 | 149 | 190 | 253 | 1 | 1 | | |
| top | | 2018-01-01 | California | Los Angeles | Not in a city | Kapolei | Carbon monoxide | Parts per million | | |
| freq | | 260 | 66 | 14 | 21 | 2 | 260 | 260 | | |
| mean | 129.5 | | | | | | | | 0.403169 | 6.757692 |
| std | 75.199734 | | | | | | | | 0.317902 | 7.061707 |
| min | 0.0 | | | | | | | | 0.0 | 0.0 |
| 25% | 64.75 | | | | | | | | 0.2 | 2.0 |
| 50% | 129.5 | | | | | | | | 0.276315 | 5.0 |
| 75% | 194.25 | | | | | | | | 0.516009 | 9.0 |


In [5]:
aqi['state_name'].value_counts()

California              66
Arizona                 14
Ohio                    12
Florida                 12
Texas                   10
New York                10
Pennsylvania            10
Michigan                 9
Colorado                 9
Minnesota                7
New Jersey               6
Indiana                  5
North Carolina           4
Massachusetts            4
Maryland                 4
Oklahoma                 4
Virginia                 4
Nevada                   4
Connecticut              4
Kentucky                 3
Missouri                 3
Wyoming                  3
Iowa                     3
Hawaii                   3
Utah                     3
Vermont                  3
Illinois                 3
New Hampshire            2
District Of Columbia     2
New Mexico               2
Montana                  2
Oregon                   2
Alaska                   2
Georgia                  2
Washington               2
Idaho                    2
Nebraska                 2
R

## Statistical Tests

Procedure:
1. Formulate the null hypothesis and the alternative hypothesis.
2. Set the significance level.
3. Determine the appropriate test procedure.
4. Compute the p-value.
5. Draw your conclusion.
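
As a rough illustration of these five steps, here is a minimal sketch on made-up numbers. The arrays `group_a` and `group_b` are hypothetical and are not drawn from the AQI dataset; the choice of a two-sided Welch test is only one possible answer to step 3.

```python
import numpy as np
from scipy import stats

# Hypothetical samples standing in for two groups of AQI readings.
group_a = np.array([9, 12, 14, 11, 13, 10, 15, 12], dtype=float)
group_b = np.array([5, 7, 6, 8, 4, 7, 6, 5], dtype=float)

# Step 1: H0 says the two population means are equal; HA says they differ.
# Step 2: set the significance level.
alpha = 0.05

# Step 3: two independent samples, so use a two-sample t-test.
# Step 4: compute the p-value (equal_var=False does not assume equal variances).
result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)

# Step 5: draw the conclusion by comparing the p-value with alpha.
decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}: {decision}")
```

The same comparison of `pvalue` against `alpha` drives each conclusion in the three hypotheses below.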

### Hypothesis 1: ROA is considering a metropolitan-focused approach. Within California, they want to know if the mean AQI in Los Angeles County is statistically different from the rest of California.

In [6]:
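# Restrict to California first, then split into Los Angeles County and all other counties.
california = aqi[aqi["state_name"] == "California"]
Los_Angeles_County = california[california["county_name"] == "Los Angeles"]
Other_County = california[california["county_name"] != "Los Angeles"]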

#### Formulate hypothesis:

**Formulate null and alternative hypotheses:**

*   $H_0$: There is no difference in the mean AQI between Los Angeles County and the rest of California.
*   $H_A$: There is a difference in the mean AQI between Los Angeles County and the rest of California.

#### Set the significance level:

significance level = 0.05

#### Determine the appropriate test procedure:

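Because I am comparing the means of two independent samples, I will use a **two-sample 𝑡-test**. Passing `equal_var=False` selects Welch's version of the test, which does not assume that the two groups have equal variances.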

#### Compute the P-value

In [8]:
stats.ttest_ind(a=Los_Angeles_County["aqi"],b=Other_County["aqi"],equal_var=False)

Ttest_indResult(statistic=2.1107010796372014, pvalue=0.049839056842410995)
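
To make clear what `equal_var=False` computes, the following sketch reproduces Welch's 𝑡-statistic and its two-sided p-value by hand and checks them against `stats.ttest_ind`. The arrays `x` and `y` are hypothetical values chosen for illustration, not the Los Angeles or California samples.

```python
import numpy as np
from scipy import stats

# Hypothetical AQI-like samples of unequal size; not taken from the dataset.
x = np.array([9, 12, 14, 11, 13, 10, 15, 12], dtype=float)
y = np.array([5, 7, 6, 8, 4, 7, 6, 5, 9, 3], dtype=float)

# Estimated variance of each sample mean (sample variance uses ddof=1).
vx = x.var(ddof=1) / len(x)
vy = y.var(ddof=1) / len(y)

# Welch's t-statistic and the Welch-Satterthwaite degrees of freedom.
t_manual = (x.mean() - y.mean()) / np.sqrt(vx + vy)
df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))

# Two-sided p-value: probability mass in both tails of the t distribution.
p_manual = 2 * stats.t.sf(abs(t_manual), df)

result = stats.ttest_ind(a=x, b=y, equal_var=False)
print(np.isclose(t_manual, result.statistic), np.isclose(p_manual, result.pvalue))
```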

#### **Conclusion**

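I reject the null hypothesis because the p-value (≈ 0.0498) is less than my significance level of 0.05. The margin is narrow, so the evidence for a difference in mean AQI between Los Angeles County and the rest of California is only borderline at this level.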

### Hypothesis 2: With limited resources, ROA has to choose between New York and Ohio for their next regional office. Does New York have a lower AQI than Ohio?

In [9]:
ny= aqi[aqi["state_name"]=="New York"]
ohio= aqi[aqi["state_name"]=="Ohio"]

#### Formulate hypothesis:

**Formulate null and alternative hypotheses:**

*   $H_0$: The mean AQI of New York is greater than or equal to that of Ohio.
*   $H_A$: The mean AQI of New York is **below** that of Ohio.

#### Significance Level

Remains at 5%

#### Determine the appropriate test procedure:

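Because I am comparing the means of two independent samples in one direction, I will use a one-sided **two-sample 𝑡-test**. Setting `alternative="less"` tests whether the mean of the first sample (`a`) is below the mean of the second (`b`).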

#### Compute the P-value

In [11]:
stats.ttest_ind(a=ny["aqi"], b=ohio["aqi"], alternative="less", equal_var=False)

Ttest_indResult(statistic=-2.025951038880333, pvalue=0.03044650269193468)
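
The `alternative` argument changes only the p-value, not the statistic. The sketch below, using hypothetical arrays `low` and `high` rather than the New York and Ohio samples, shows how the one-sided p-value relates to the two-sided one when the statistic is negative.

```python
import numpy as np
from scipy import stats

# Hypothetical samples; `low` has the smaller mean by construction.
low = np.array([3, 5, 4, 6, 2, 5, 4, 3], dtype=float)
high = np.array([6, 8, 7, 9, 5, 7, 8, 6], dtype=float)

two_sided = stats.ttest_ind(a=low, b=high, equal_var=False)
less = stats.ttest_ind(a=low, b=high, alternative="less", equal_var=False)
greater = stats.ttest_ind(a=low, b=high, alternative="greater", equal_var=False)

# All three calls share the same (negative) t-statistic.
print(two_sided.statistic, less.statistic, greater.statistic)

# With a negative statistic, the "less" p-value is half the two-sided one,
# and the "greater" p-value is its complement.
print(np.isclose(less.pvalue, two_sided.pvalue / 2))
print(np.isclose(greater.pvalue, 1 - less.pvalue))
```

This is also why a test run in the direction opposite to the observed difference returns a p-value close to 1.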

#### **Conclusion**

I reject the null hypothesis because the p-value (≈ 0.0304) is less than my significance level of 0.05. At the 5% level, the data support the claim that New York has a lower mean AQI than Ohio.

###  Hypothesis 3: A new policy will affect those states with a mean AQI of 10 or greater. Can you rule out Michigan from being affected by this new policy?

In [12]:
michigan = aqi[aqi['state_name']=='Michigan']

**Formulate null and alternative hypotheses:**

*   $H_0$: The mean AQI of Michigan is less than or equal to 10.
*   $H_A$: The mean AQI of Michigan is greater than 10.

#### Significance Level

Remains at 5%

#### Determine the appropriate test procedure:

Here, I am comparing one sample mean against a fixed value in one direction. Therefore, I will use a **one-sample 𝑡-test**.

#### Compute the P-value

In [14]:
stats.ttest_1samp(michigan['aqi'], 10, alternative='greater')

TtestResult(statistic=-1.7395913343286131, pvalue=0.9399405193140109, df=8)
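
For reference, the one-sample statistic is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ with $n - 1$ degrees of freedom. The sketch below recomputes it by hand on a hypothetical array `sample` (not the Michigan data) and compares it with `stats.ttest_1samp`; `threshold` plays the role of $\mu_0$.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of nine readings; not the Michigan data.
sample = np.array([7, 9, 8, 11, 6, 10, 9, 8, 7], dtype=float)
threshold = 10  # the value the mean is compared against (mu_0)

n = len(sample)
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - threshold) / standard_error

# For alternative="greater", the p-value is the upper-tail probability.
p_manual = stats.t.sf(t_manual, df=n - 1)

result = stats.ttest_1samp(sample, threshold, alternative="greater")
print(np.isclose(t_manual, result.statistic), np.isclose(p_manual, result.pvalue))
```

Because the sample mean here sits below the threshold, the statistic is negative and the upper-tail p-value exceeds 0.5, the same pattern seen in the Michigan result.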

#### **Conclusion**

With a p-value of about 0.94, which is greater than the 5% significance level, and a negative t-statistic (-1.74), I fail to reject the null hypothesis.

Therefore, I cannot conclude at the 5% significance level that Michigan's mean AQI is greater than 10. This suggests that Michigan would not be affected by the new policy.