In [1]:
df = read.csv('data_final.csv')


## Does South East Asia Region have more NO<sub>2</sub> emissions than the European Region in 2019?

$H_0$: The NO<sub>2</sub> emissions in the South East Asia Region and European Region are similar.    
$H_1$: The NO<sub>2</sub> emissions in the South East Asia Region is greater than in the European Region.    

Method: Welch Two Sample t-test
  
Conclusion: p-value was less than 0.05, which rejects the null hypothesis.  


In [2]:
df2019 <- df[df$Measurement.Year==2019, ]

sea2 <- subset(df2019, select=NO2....g.m3.,
                      subset=WHO.Region=="South East Asia Region", drop=T)

eur2  <- subset(df2019, select=NO2....g.m3.,
                      subset=WHO.Region=="European Region", drop=T)

print(t.test(sea2, eur2, alt="greater"))


	Welch Two Sample t-test

data:  sea2 and eur2
t = 6.1356, df = 246.22, p-value = 1.682e-09
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 4.023139      Inf
sample estimates:
mean of x mean of y 
 23.53045  18.02611 



## Given that South East Asian has greater NO<sub>2</sub> emissions than Europe, additional testing was performed on France and India. 

## Does France have less NO<sub>2</sub> emissions in 2016 than in 2010? 

Method: Two-sample boostrap test for the ratio of means of NO<sub>2</sub>.

$H_0$: France did not decrease NO<sub>2</sub> emissions in 2016 compared to 2010.  
$H_1$: France did decrease NO<sub>2</sub> emissions in 2016 compared to 2010.

The bootstrap tests also shows there is 95% confidence in the alternative hypothesis.  

Yes, France has decreased their NO<sub>2</sub> emissions in 2016.  


In [3]:
df_france <- df[df$WHO.Country.Name=="France", ]

france2010 <- subset(df_france, select=NO2....g.m3.,
                      subset=Measurement.Year=="2010", drop=T)

france2016  <- subset(df_france, select=NO2....g.m3.,
                      subset=Measurement.Year=="2016", drop=T)

t.test(france2016, france2010, alt="less")


	Welch Two Sample t-test

data:  france2016 and france2010
t = -4.3778, df = 597.82, p-value = 7.075e-06
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -2.275993
sample estimates:
mean of x mean of y 
 19.30578  22.95500 


In [4]:
df_france <- df[df$WHO.Country.Name=="France", ]
df_france_2010 <- df_france[df_france$Measurement.Year == "2010", ]
df_france_2016 <- df_france[df_france$Measurement.Year == "2016", ]
N <- 100000

ratiomeans<- numeric(N)

for (i in 1:N)
{
  fr_sample_2010 <- sample(df_france_2010$NO2....g.m3.[df_france_2010$Measurement.Year == '2010'], length(df_france_2010), replace = TRUE)
  fr_sample_2016 <- sample(df_france_2016$NO2....g.m3.[df_france_2016$Measurement.Year == '2016'], length(df_france_2016), replace = TRUE)
  ratiomeans[i] <- mean(fr_sample_2010)/mean(fr_sample_2016)
}

quantile(ratiomeans, c(0.025, 0.975))

fr_mean_2010 <- mean(df_france_2010$NO2....g.m3.[df_france_2010$Measurement.Year == '2010'])
fr_mean_2016 <- mean(df_france_2016$NO2....g.m3.[df_france_2016$Measurement.Year == '2016'])
ratio_actual <- fr_mean_2010/fr_mean_2016
bias= round(mean(ratiomeans)-mean(ratio_actual), digits = 3) #estimated bias

print(paste("The estimated bias is", bias))

[1] "The estimated bias is 0.033"


## Does India have less NO<sub>2</sub> emissions in 2018 than in 2014?  

Method: t-test

$H_0$: India did not decrease NO<sub>2</sub> emissions in 2018 compared to 2014.  
$H_1$: India did decrease NO<sub>2</sub> emissions in 2018 compared to 2014.

The p-value is greater than 0.05, which fails to reject the null hypothesis.

R library: t-test[^1]

[^1]: T.test: Student’s t-test. RDocumentation. (n.d.).
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test

In [5]:
df_india <- df[df$WHO.Country.Name=="India", ]

india2014 <- subset(df_india, select=NO2....g.m3.,
                      subset=Measurement.Year==2014, drop=T)

india2018  <- subset(df_india, select=NO2....g.m3.,
                      subset=Measurement.Year==2018, drop=T)

print(t.test(india2018, india2014, alt="less"))



	Welch Two Sample t-test

data:  india2018 and india2014
t = 1.0464, df = 268.02, p-value = 0.8518
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf 4.302835
sample estimates:
mean of x mean of y 
 23.35289  21.68338 



## WHO Nitrogen Dioxide Guidelines

According to the existing WHO air quality guidelines, an annual average indoor nitrogen dioxide guideline of 40 μg/m3 is recommended. In the WHO dataset, a new column was added labeling NO<sub>2</sub> above and within 40 μg/m3[^1]. 

Objective: To assess if global compliance with recommended NO<sub>2</sub> is dependent on year.

Method: Chi-squared test  


[^1]: Jarvis DJ, Adamkiewicz G, Heroux ME, et al. Nitrogen dioxide. In: WHO Guidelines for Indoor Air Quality: Selected Pollutants. Geneva: World Health Organization; 2010. 5. Available from: https://www.ncbi.nlm.nih.gov/books/NBK138707/


In [6]:
df$WHO.Guideline <- ifelse(df$NO2....g.m3. >= 41.0, "above", "within")


In [7]:
df_global <- df[(df$Measurement.Year>=2013 & df$Measurement.Year<=2019),]

table_year<- table(df_global$Measurement.Year, df_global$WHO.Guideline)
table_year

print(chisq.test(table_year))

      
       above within
  2013   119   1347
  2014   111   1520
  2015   103   1869
  2016   118   2056
  2017   104   2149
  2018    91   2317
  2019    53   2409


	Pearson's Chi-squared test

data:  table_year
X-squared = 94.367, df = 6, p-value < 2.2e-16



## Is global NO<sub>2</sub> dependent on Year?

Therefore, compliance with recommended NO<sub>2</sub> differs between the years. 

$H_0$: The compliance with recommended NO<sub>2</sub> are independent from years. The compliance with recommended NO<sub>2</sub> does not differ between years.

$H_1$: The compliance with recommended NO<sub>2</sub> are dependent on the year. The compliance with recommended NO<sub>2</sub> differs between the years.

