# 2D Design Project

<b>Problem Statement</b>: We aim to predict Singapore's GDP growth during the COVID-19 pandemic from the pandemic-related factors listed below. Comparing the predicted growth rate with the actual growth rate then gives a measure of how effective Singapore's COVID-19 coping strategies have been.
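
The comparison above can be made concrete as a growth-rate gap. This is a minimal sketch that assumes GDP levels are available as a pandas Series. The columns shown below include no GDP figures, so the input series and both helper names are hypothetical. Growth is the percentage change from the previous period, and a positive gap means actual growth exceeded the prediction.

```python
import pandas as pd


def growth_rate(gdp: pd.Series) -> pd.Series:
    """Period-on-period growth in percent: (GDP_t - GDP_{t-1}) / GDP_{t-1} * 100.

    `gdp` is a hypothetical series of GDP levels ordered by period.
    """
    return gdp.pct_change() * 100


def growth_gap(actual_growth: pd.Series, predicted_growth: pd.Series) -> pd.Series:
    """Actual minus predicted growth; positive values mean growth beat the prediction."""
    return actual_growth - predicted_growth
```

The first value returned by `growth_rate` is `NaN` because the first period has no predecessor to compare against.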

Factors/Variables to consider (from most to least important):
- Time/date (time series data)
- Vaccination rate
- Daily active cases
- Hospitalised cases
- Recovered cases
- Government grants/funding
- Phases (Circuit Breaker, Phase 1, etc.)
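
Two of these factors are not plain numbers: the date and the phase. The sketch below shows one way to turn them into regression inputs. It assumes a DataFrame with a `Date` column and a `Phase` column of text labels. `Phase` is hypothetical because it does not appear in the dataset shown below, and the example rows are illustrative only. The date becomes a count of days since the first observation. The phase is one-hot encoded with one dummy dropped, which avoids the dummy variable trap.

```python
import pandas as pd


def encode_time_and_phase(df: pd.DataFrame) -> pd.DataFrame:
    """Replace Date with a day index and one-hot encode the (hypothetical) Phase column."""
    out = df.copy()
    dates = pd.to_datetime(out["Date"])
    # Days elapsed since the earliest date in the data
    out["Day"] = (dates - dates.min()).dt.days
    out = out.drop(columns=["Date"])
    # drop_first=True keeps k-1 dummies for k phases, avoiding the dummy variable trap
    return pd.get_dummies(out, columns=["Phase"], drop_first=True)


# Illustrative rows: the start dates of the Circuit Breaker, Phase 1 and Phase 2
example = pd.DataFrame({
    "Date": ["2020-04-07", "2020-06-02", "2020-06-19"],
    "Phase": ["Circuit Breaker", "Phase 1", "Phase 2"],
})
print(encode_time_and_phase(example))
```

The dropped category ("Circuit Breaker" here, the first in sorted order) becomes the baseline that the remaining phase coefficients are measured against.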

## Data Preprocessing

In [10]:
# Installing dependencies (pandas needs openpyxl to read .xlsx files)

# !pip install openpyxl

Collecting openpyxl
  Downloading openpyxl-3.0.9-py2.py3-none-any.whl (242 kB)
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9


In [13]:
# Loading the COVID-19 dataset

import numpy as np
import pandas as pd

# Importing the dataset
datasets = pd.read_excel('Covid-19 SG.xlsx')

# Inspecting the last five rows
datasets.tail()

| Unnamed: 0 | Date | Daily Confirmed | False Positives Found | Cumulative Confirmed | Daily Discharged | Passed but not due to COVID | Cumulative Discharged | Discharged to Isolation | Still Hospitalised | Daily Deaths | ... | Cumulative Individuals Vaccination Completed | Perc population completed at least one dose | Perc population completed vaccination | Sinovac vaccine doses | Cumulative individuals using Sinovac vaccine | Doses of other vaccines recognised by WHO | Cumulative individuals using other vaccines recognised by WHO | Unnamed: 34 | Unnamed: 35 | Unnamed: 36 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 634 | 2021-10-18 | 2553 | 0.0 | 150731 | 3071 | 0 | 125058 | 23742 | 1684 | 6 | ... | 4559408.0 | 0.85 | 0.84 |  |  | 226702.0 | 118214.0 |  |  |  |
| 635 | 2021-10-19 | 3994 | 0.0 | 154725 | 2535 | 0 | 127593 | 25170 | 1708 | 7 | ... | 4562307.0 | 0.85 | 0.84 |  |  | 227800.0 | 118598.0 |  |  |  |
| 636 | 2021-10-20 | 3862 | 0.0 | 158587 | 2715 | 0 | 130308 | 26319 | 1688 | 18 | ... | 4563906.0 | 0.85 | 0.84 |  |  | 229117.0 | 119071.0 |  |  |  |
| 637 | 2021-10-21 | 3439 | 0.0 | 162026 | 2606 | 0 | 132914 | 27241 | 1583 | 16 | ... | 4567666.0 | 0.85 | 0.84 |  |  | 230651.0 | 119711.0 |  |  |  |
| 638 | 2021-10-22 | 3637 | 0.0 | 165663 | 2825 | 0 | 135739 | 28043 | 1579 | 14 | ... | 4568842.0 | 0.85 | 0.84 |  |  | 234168.0 | 120594.0 |  |  |  |
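
The factor list asks for daily active cases, but the columns above do not report them directly. This sketch uses one common approximation, which is assumed here rather than taken from the dataset: active = cumulative confirmed - cumulative discharged - cumulative deaths. "Cumulative Confirmed" and "Cumulative Discharged" appear in the table. Cumulative deaths are rebuilt by summing "Daily Deaths", which assumes the rows are sorted by date and start at the beginning of the outbreak. The helper name `active_cases` is hypothetical.

```python
import pandas as pd


def active_cases(df: pd.DataFrame) -> pd.Series:
    """Approximate active cases as confirmed minus discharged minus deaths (an assumption).

    Cumulative deaths are rebuilt from "Daily Deaths", assuming the rows are sorted by
    date and begin at the start of the outbreak.
    """
    cumulative_deaths = df["Daily Deaths"].fillna(0).cumsum()
    return df["Cumulative Confirmed"] - df["Cumulative Discharged"] - cumulative_deaths
```

Applied to the notebook's DataFrame, `active_cases(datasets)` would give a series that could be added as a feature column. Columns such as "Discharged to Isolation" or "Passed but not due to COVID" may need extra adjustments, depending on how the source defines active cases.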


In [None]:
# Multiple Linear Regression

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Selecting the dependent and independent variables
# (`datasets` is the DataFrame loaded in the previous cell)

# Dependent variable: column 4 of the sheet, "Cumulative Confirmed"
Y = datasets.iloc[:, 4]

# Independent variables: the other numeric columns. Date is non-numeric,
# "Unnamed: 0" appears to be a saved row index, and empty or constant columns
# carry no information, so they are excluded. The target itself is dropped
# so that it cannot leak into the features.
X = (datasets.select_dtypes(include=[np.number])
     .drop(columns=[Y.name, 'Unnamed: 0'], errors='ignore')
     .dropna(axis=1, how='all'))
X = X.fillna(0)  # assumption: a missing value (e.g. before vaccination began) means zero
X = X.loc[:, X.nunique() > 1]

# Encoding categorical data
# All selected columns are numeric, so no encoding is needed here. If a categorical
# factor such as the COVID-19 phase is added, one-hot encode it with
# pd.get_dummies(..., drop_first=True) to avoid the dummy variable trap.

# Splitting the dataset into the Training set and Test set
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Fitting the Multiple Linear Regression to the Training set
regressor = LinearRegression()
regressor.fit(X_Train, Y_Train)

# Predicting the Test set results
Y_Pred = regressor.predict(X_Test)

# Building the optimal model using Backward Elimination: refit OLS and drop the
# feature with the highest p-value until every remaining p-value is <= SL
SL = 0.05
X_Optimal = sm.add_constant(X, has_constant='add')  # intercept column named 'const'
while True:
    regressor_OLS = sm.OLS(endog=Y, exog=X_Optimal).fit()
    p_values = regressor_OLS.pvalues.drop('const')
    if p_values.empty or p_values.max() <= SL:
        break
    X_Optimal = X_Optimal.drop(columns=p_values.idxmax())
print(regressor_OLS.summary())

# Fitting the Multiple Linear Regression to the Optimal Training set
# (same train/test rows as above, restricted to the surviving features)
optimal_features = X_Optimal.columns.drop('const')
regressor.fit(X_Train[optimal_features], Y_Train)

# Predicting the Optimal Test set results
Y_Optimal_Pred = regressor.predict(X_Test[optimal_features])
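
Because the data is a daily time series, a random `train_test_split` lets the model train on days that come after the test days. The sketch below shows a chronological alternative and scores the predictions with R² and RMSE. The helper names `chronological_split` and `evaluate` are hypothetical, and the demo uses synthetic data rather than the COVID-19 dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def chronological_split(X, y, test_size=0.2):
    """Hold out the last `test_size` fraction of rows; assumes rows are sorted by date."""
    # shuffle=False keeps the original row order, so the test set is the most recent period
    return train_test_split(X, y, test_size=test_size, shuffle=False)


def evaluate(model, X_test, y_test):
    """Return (R^2, RMSE) of the model's predictions on the held-out rows."""
    y_pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
    return r2_score(y_test, y_pred), rmse


# Synthetic demo: a linear trend plus noise over 100 "days"
rng = np.random.default_rng(0)
days = np.arange(100).reshape(-1, 1)
target = 3.0 * days.ravel() + 5.0 + rng.normal(0, 1, 100)

X_tr, X_te, y_tr, y_te = chronological_split(days, target)
model = LinearRegression().fit(X_tr, y_tr)
r2, rmse = evaluate(model, X_te, y_te)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

With the notebook's variables, the same idea would mean passing `X` and `Y` sorted by `Date` to `chronological_split` in place of the random split. The scores then reflect how well the model forecasts the latest weeks rather than how well it interpolates between them.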
