# NO2 Prediction Using Machine Learning Regression Analysis in Google Earth Engine


## **Machine learning can build a model that predicts a specific value from an existing data set of dependent and independent variables.**

## **Introduction**
### **Nitrogen Dioxide (NO2) air pollution**
The World Health Organization estimates that air pollution kills 4.2 million people every year.  
The main effect of breathing in raised levels of NO2 is the increased likelihood of respiratory problems. NO2 inflames the lining of the lungs, and it can reduce immunity to lung infections.
Respiratory diseases are also linked to more severe, and more often deadly, outcomes when people are exposed to viruses.

##### ***Sources of NO2***:
Rapid population growth and fast urbanization increase NO2 emissions through:
*   Industrial facilities
*   Fossil fuels (coal, oil and gas)
*   Increased transportation (80 %)



NO2 air pollution affects population health and global warming.


## **Objective**
The goal of this project is to create a model that predicts a specific value (NO2) for past years, based on an existing data set of Landsat and Sentinel-5P (TROPOMI) images for 2019. These predictions can be used for monitoring and statistical analysis of how NO2 develops over time.

## **DataSet:**
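The Sentinel-5P satellite carries the TROPOspheric Monitoring Instrument (TROPOMI), which provides high spatial resolution (7 x 3.5 km) across all of its spectral bands for registering the level of NO2.  
TROPOMI data are available only from October 13, 2017 onward.
The first Landsat satellite was launched in 1972, and Landsat images are available for more than 40 years. A model trained on 2019 image pairs can therefore use Landsat to estimate NO2 for years that TROPOMI never observed.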
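As a rough sketch of how the two data sets might be paired in Earth Engine for 2019, the snippet below builds one image with a Landsat-derived NDVI band and a TROPOMI NO2 band. The collection IDs, band names, the yearly mean/median composites, and the choice of NDVI as the Landsat predictor are assumptions for illustration (the notebook itself only names NDVI on a plot axis), and `build_2019_stack` is a hypothetical helper, not part of the original workflow.

```python
# Assumed Earth Engine catalog IDs and band names; check them against the catalog.
S5P_NO2_COLLECTION = "COPERNICUS/S5P/OFFL/L3_NO2"
S5P_NO2_BAND = "tropospheric_NO2_column_number_density"
LANDSAT8_TOA_COLLECTION = "LANDSAT/LC08/C02/T1_TOA"


def ndvi(nir, red):
    """Plain-Python NDVI: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def build_2019_stack():
    """Hypothetical helper: return an ee.Image with 'NDVI' and 'NO2' bands for 2019."""
    import ee  # imported here so the rest of the snippet runs without Earth Engine

    # Requires prior authentication; some accounts also need a project argument.
    ee.Initialize()

    # Mean tropospheric NO2 column over 2019 from Sentinel-5P (TROPOMI)
    no2 = (
        ee.ImageCollection(S5P_NO2_COLLECTION)
        .select(S5P_NO2_BAND)
        .filterDate("2019-01-01", "2020-01-01")
        .mean()
        .rename("NO2")
    )

    # Median Landsat 8 top-of-atmosphere composite over the same year
    landsat = (
        ee.ImageCollection(LANDSAT8_TOA_COLLECTION)
        .filterDate("2019-01-01", "2020-01-01")
        .median()
    )

    # For Landsat 8, B5 is near infrared and B4 is red
    ndvi_band = landsat.normalizedDifference(["B5", "B4"]).rename("NDVI")

    return ndvi_band.addBands(no2)


# The per-pixel index itself is simple arithmetic:
example_ndvi = ndvi(0.5, 0.1)
print(example_ndvi)
```

Sampling this stacked image over a study area would give (NDVI, NO2) pairs that could be exported as the CSV file used in the regression cells.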

## **Concept:**
Regression: 
A regression model learns the relationship between x and y from the training data and can then make generalizations about new data: given an x-value from the test data, the model predicts the corresponding y-value. By fitting a line (or, more generally, a function) to the training data, we learn a model that can generalize to data it has not seen.
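The idea of "learn from training data, then predict y for a new x" can be illustrated with a tiny least-squares line fit. This is only a sketch with made-up numbers; it uses simple linear regression rather than the Random Forest used later in this notebook, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns the pair (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(model, x):
    """Predict y for a new x using the fitted (a, b)."""
    a, b = model
    return a + b * x


# Made-up training data that follows y = 1 + 2x exactly
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)

# x = 4.0 was not in the training data; the model generalizes to it
new_y = predict(model, 4.0)
print(new_y)  # 9.0
```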

## 1._ Install libraries

In [None]:
!pip install earthengine-api


## 2._ Establish connection

In [None]:
!earthengine authenticate

**`Complete End to End Python code for Random Forest Regression:`**

In [None]:
# Import necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import rasterio as rio
from rasterio.plot import show

# Import the data (CSV format)
data = pd.read_csv('name_of_file.csv')
data.head()

In [None]:
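# Store the data as independent (X) and dependent (y) variables
X = data.iloc[:, 0:1].values
y = data.iloc[:, 1].values
# Hold out a test set (the 80/20 split is an assumption, not from the original)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)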

In [None]:
# Import the Random Forest Regressor
from sklearn.ensemble import RandomForestRegressor

# Create a Random Forest Regressor object from the RandomForestRegressor class
RFReg = RandomForestRegressor(n_estimators = 100, random_state = 0)

# Fit the random forest regressor with Training Data represented by X_train and y_train
RFReg.fit(X_train, y_train)

In [None]:
# Predict NO2 for the test dataset with the Random Forest Regressor
y_predict_rfr = RFReg.predict(X_test)

# Model evaluation using R-squared for Random Forest Regression
from sklearn import metrics
r_square = metrics.r2_score(y_test, y_predict_rfr)
print('R-squared score of the Random Forest Regression:', r_square)

In [None]:
# Visualise the Random Forest Regression over a range of values from the
# minimum to the maximum of X_train, in steps of 0.01
X_val = np.arange(X_train.min(), X_train.max(), 0.01)

# Reshape into a len(X_val) x 1 array so that X_val becomes a single column
X_val = X_val.reshape((len(X_val), 1))

# Set the size of the plot before drawing so it applies to this figure
# (8 x 6 inches is an arbitrary choice)
plt.figure(figsize=(8, 6))

# Define a scatter plot for training data
plt.scatter(X_train, y_train, color = 'blue')

# Plot the predicted data
plt.plot(X_val, RFReg.predict(X_val), color = 'red')

# Define the title
plt.title('NO2 prediction using Random Forest Regression')

# Define X axis label
plt.xlabel('NDVI')

# Define Y axis label
plt.ylabel('Level of NO2')

# Draw the plot
plt.show()

In [None]:
# Predict NO2 from a single input value using Random Forest Regression
# (41 is a placeholder; NDVI lies between -1 and 1, so use a value from your own data)
no2_pred = RFReg.predict([[41]])
print("Predicted NO2:", no2_pred[0])

**Model Evaluation**

In [None]:
# Model evaluation using Mean Squared Error (MSE)
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_predict_rfr))

In [None]:
# Model evaluation using Root Mean Squared Error (RMSE)
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_predict_rfr)))

In [None]:
# Model evaluation using Mean Absolute Error (MAE)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_predict_rfr))

In [None]:
# Model evaluation using R-squared
from sklearn import metrics
r_square = metrics.r2_score(y_test, y_predict_rfr)
print('R-squared:', r_square)

In [None]:
# Model evaluation using adjusted R-squared, which penalizes R-squared
# for the number of independent variables in the model.
# Here n = no. of observations and p = no. of independent variables

n = len(y_test)
p = X_test.shape[1]
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - p - 1)
print('Adjusted R-squared:', adj_r_square)
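To make the evaluation metrics above less of a black box, here is a small sketch that computes MSE, RMSE, MAE, R-squared and adjusted R-squared by hand, using the standard definitions. The numbers are made up for illustration, and `regression_metrics` is a hypothetical helper, not a scikit-learn function.

```python
import math


def regression_metrics(y_true, y_pred, p):
    """Compute common regression metrics; p is the number of independent variables."""
    n = len(y_true)
    errors = [t - q for t, q in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mae = sum(abs(e) for e in errors) / n   # mean absolute error

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot

    # Adjusted R-squared penalizes R-squared for the number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}


# Made-up values: four perfect predictions and one that is off by 2
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 7.0]

results = regression_metrics(y_true, y_pred, p=1)
for name, value in results.items():
    print(name, round(value, 4))
```

Because squared errors weight large misses more heavily, the single bad prediction pulls RMSE (about 0.89) well above MAE (0.4) in this example.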