## Background

The environment and its changes form one of the most complex systems. It is widely accepted that climatic changes are strongly affected by various environmental factors. Microclimate aims to identify the factors that affect the spread of COVID-19. We use PM2.5 as the key contributor, look for correlations with the other features, and predict its future values.

## The dataset

**Dataset file name:** modified_data_for_prediction.csv


**Features and labels:**

1.   Date
2.   Time
3.   Location
4.   Value
5.   New_Value (float): PM2.5 values

In [None]:
import pandas as pd
import numpy as np

#read the file
df = pd.read_csv('modified_data_for_prediction.csv')
df.tail()

In [None]:
# Modifying data for our requirements
df["Modified_Date"] = df["Date"] + " " + df["Time"]
new = df["Modified_Date"].str.split("/",n=1, expand =True)
new_1 = new[1].str.split("/",n=1, expand =True)
df['Month'] = new_1[0]
df['Month'] = df['Month'].astype('int')
df.head(5)
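
The string splitting above takes the month from the text between the first and second `/`. A minimal sketch of an alternative that parses the timestamp once and reads the month from it. It assumes `Date` is day-first (`DD/MM/YYYY`, consistent with the `dayfirst=True` used later) and that `Time` looks like `HH:MM:SS`; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from modified_data_for_prediction.csv
sample = pd.DataFrame({
    "Date": ["01/03/2020", "15/04/2020", "31/12/2020"],
    "Time": ["00:00:00", "13:00:00", "23:00:00"],
})

# Parse once with an explicit (assumed) format, then derive the month
sample["Modified_Date"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%d/%m/%Y %H:%M:%S"
)
sample["Month"] = sample["Modified_Date"].dt.month
print(sample[["Modified_Date", "Month"]])
```

If the real file uses a different time format, the `format` string would need to change accordingly.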

In [None]:
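# Keep only the columns needed for modelling; copy so later assignments do not touch df
df1 = df[["Modified_Date", "Location", "New_Value", "Month"]].copy()
df1['Modified_Date'] = pd.to_datetime(df1['Modified_Date'], dayfirst=True)
# Index by timestamp so that plots use time on the x-axis
df1.index = df1['Modified_Date']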

In [None]:
# Checking the data types
df1.dtypes

In [None]:
# Plots by month with respect to locations
# (sns.factorplot was renamed to catplot; kind="point" keeps the old default plot type)
import seaborn as sns
sns.catplot(data=df1, x="Month", y="New_Value", hue="Location", row="Location", kind="point")

In [None]:
# Replacing the Locations by numbers
labels = df1['Location'].unique().tolist()
mapping = dict( zip(labels,range(len(labels))))
df1.replace({'Location': mapping},inplace=True)
mapping
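
The integer codes depend on the order in which locations first appear in the data. A small sketch of the same encoding on made-up rows, plus an inverse dictionary for turning a code back into a name; the `inverse` helper is not part of the notebook.

```python
import pandas as pd

# Hypothetical sample; the location names are taken from the notebook
locations = pd.Series(["Brighton", "Footscray", "Brighton", "Box Hill"], name="Location")

# Same idea as above: number each location in order of first appearance
labels = locations.unique().tolist()
mapping = dict(zip(labels, range(len(labels))))
encoded = locations.map(mapping)

# Inverse mapping (illustrative helper): code -> location name
inverse = {code: name for name, code in mapping.items()}
print(mapping)
print(encoded.map(inverse).tolist())
```

Because the codes are arbitrary labels, they are best read as identifiers rather than as quantities.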

In [None]:
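# Correlation of the numeric features with New_Value
# (the datetime column is excluded, since Pearson correlation needs numeric data)
df1.select_dtypes(include="number").corrwith(df1["New_Value"], method="pearson")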
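
For reference, Pearson's r is the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand on made-up numbers and compares the result with `corrwith`; the `toy` frame is hypothetical and only meant to show what the call returns.

```python
import numpy as np
import pandas as pd

# Hypothetical numbers, for illustration only
toy = pd.DataFrame({
    "Month": [1, 2, 3, 4, 5],
    "New_Value": [10.0, 12.0, 9.0, 15.0, 14.0],
})

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy centred on the mean
dx = toy["Month"] - toy["Month"].mean()
dy = toy["New_Value"] - toy["New_Value"].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Same quantity through pandas
r_pandas = toy[["Month"]].corrwith(toy["New_Value"], method="pearson")["Month"]
print(r_manual, r_pandas)
```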

In [None]:
# Adding Stages of Lockdown to the data (Future scope)

In [None]:
# To analyse a different location, set the code in the filter below to one of:
#'Brighton': 0
#'Footscray': 1
#'Box Hill': 2
#'Macleod': 3
#'Brooklyn': 4
#'Alphington': 5
#'Melbourne CBD': 6
#'Campbellfield': 7

df_location = df1[df1['Location']==7].copy()  # 7 = Campbellfield; copy so columns can be added safely
df_location.tail(5)

In [None]:
#to plot within notebook
import matplotlib.pyplot as plt
%matplotlib inline

#setting figure size
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20,10


plt.style.use('bmh')
#plot
plt.plot(df_location["New_Value"])

In [None]:
df_location.dtypes

In [None]:
# Number of records to predict into the future.
# Assuming 24 hourly records per day, 1050 records is about 43.75 days (1050 / 24).
future_days = 1050
df = df_location

# Create the target column: New_Value shifted 'future_days' records up
df['Predictions'] = df['New_Value'].shift(-future_days)
# Show the last rows (the final 'future_days' targets are NaN)
df.tail(5)
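
A toy illustration of what the negative shift does, using a horizon of 2 in place of `future_days`; the values are made up. Each row's target becomes the reading two records later, and the last two rows have no target, which is why the feature and label arrays are trimmed with `[:-future_days]`.

```python
import pandas as pd

# Hypothetical readings, for illustration only
toy = pd.DataFrame({"New_Value": [5.0, 6.0, 7.0, 8.0, 9.0]})
horizon = 2  # stands in for future_days

# Target for row i is the value at row i + horizon
toy["Predictions"] = toy["New_Value"].shift(-horizon)
print(toy)

# Drop the trailing rows whose target is NaN
X_toy = toy[["New_Value"]].to_numpy()[:-horizon]
y_toy = toy["Predictions"].to_numpy()[:-horizon]
print(X_toy.ravel(), y_toy)
```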

In [None]:
X = np.array(df.drop(columns=['Predictions','Modified_Date','Location','Month']))[:-future_days]
print(X)

In [None]:
y = np.array(df['Predictions'])[:-future_days]
print(y)

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
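
By default `train_test_split` shuffles the rows, so test records end up interleaved in time with training records. If time order should be preserved, one option is `shuffle=False`, which holds out the final 25% of the rows. This is a sketch on made-up arrays, not the notebook's chosen approach.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical, time-ordered toy data
X_toy = np.arange(8, dtype=float).reshape(-1, 1)
y_toy = X_toy.ravel() + 2

# shuffle=False keeps the original order: the first rows train, the last rows test
x_train, x_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.25, shuffle=False
)
print(x_train.ravel(), x_test.ravel())
```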

In [None]:
#Create the decision tree regressor model
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor().fit(x_train, y_train)
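
The held-out rows from the split can be used to score a fitted tree. A minimal sketch using mean absolute error on synthetic data; the data, the `model` name, and the choice of metric are assumptions for illustration, not results from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target is the feature plus a little noise
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 50, size=(200, 1))
y_toy = X_toy.ravel() + rng.normal(0, 1, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)

# Fit on the training rows, then score on rows the model has not seen
model = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)
mae = mean_absolute_error(y_test, model.predict(x_test))
print(f"MAE on held-out rows: {mae:.2f}")
```

The same two lines (`predict` on `x_test`, then `mean_absolute_error`) would apply to the `tree` fitted above.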

In [None]:
# Get the feature data:
# all rows from the original data set except the last 'future_days' records
x_future = df.drop(columns=['Predictions','Modified_Date','Location','Month'])[:-future_days]
#Get the last 'x' rows
x_future = x_future.tail(future_days) 
#Convert the data set into a numpy array
x_future = np.array(x_future)
x_future

In [None]:
#Show the model tree prediction
tree_prediction = tree.predict(x_future)
print( tree_prediction )
print()

In [None]:
#Visualize the data
predictions = tree_prediction
# Plot the data
valid = df[X.shape[0]:].copy()  # the last 'future_days' records
valid['Predictions'] = predictions  # column holding the predicted PM2.5 values
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date',fontsize=18)
plt.ylabel('PM 2.5 Values',fontsize=18)
plt.plot(df['New_Value'])
plt.plot(valid[['New_Value','Predictions']])
plt.legend(['Train', 'Val', 'Predictions' ], loc='lower right')
plt.show()

In [None]:
valid

In [None]:
df_location

In [None]:
df_location.to_csv("Campbellfield.csv", index = False)