# CPSC322 Project
## The Statistical Analysis of United States Airlines During Covid-19

Data Sources:
- Covid Information: ourworldindata.org/covid-cases
- Stocks Data: Yahoo Finance

In [32]:
import importlib
import os
import numpy as np

import mysklearn.myutils
importlib.reload(mysklearn.myutils)
import mysklearn.myutils as myutils

import mysklearn.mypytable
importlib.reload(mysklearn.mypytable)
from mysklearn.mypytable import MyPyTable 

import mysklearn.myevaluation
importlib.reload(mysklearn.myevaluation)
from mysklearn.myevaluation import accuracy_score, confusion_matrix

import mysklearn.myclassifiers
importlib.reload(mysklearn.myclassifiers)
from mysklearn.myclassifiers import MyKNeighborsClassifier, MyNaiveBayesClassifier, MyDecisionTreeClassifier

In [33]:
projectName = os.path.join("input_data", "322-Predict-Table.csv")
projectTable = MyPyTable().load_from_file(projectName)

### Analysis of JetBlue Airways
We took the average increase per market day and discretized it into a class label: `TRUE` if the stock price went up by the end of the market day, and `FALSE` otherwise.
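The labeling rule described above can be sketched as follows. This is a minimal illustration, assuming that "went up" means the closing price is higher than the opening price; the function name `went_up` and the sample prices are hypothetical and are not taken from the project data.

```python
def went_up(open_price, close_price):
    """Return "TRUE" if the close is above the open, otherwise "FALSE"."""
    return "TRUE" if close_price > open_price else "FALSE"

# Hypothetical (open, close) pairs, not real JBLU prices
sample_days = [(12.10, 12.45), (12.45, 12.30), (12.30, 12.30)]
labels = [went_up(open_price, close_price) for open_price, close_price in sample_days]
print(labels)  # ['TRUE', 'FALSE', 'FALSE']
```

Under this reading, a day with no price change is labeled `FALSE`, which matches the "`FALSE` otherwise" wording.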

We used the following attributes:
- new-world-cases
- new-world-cases-increase-(%)
- total-world-cases
- total-world-cases-increase-(%)
- new-usa-cases
- new-usa-cases-increase-(%)
- total-usa-cases
- total-usa-cases-increase-(%)
- new-world-vaccinated
- new-world-vaccinated-increase-(%)
- total-world-vaccinated
- total-world-vaccinated-increase-(%)
- new-usa-vaccinated
- new-usa-vaccinated-increase-(%)
- total-usa-vaccinated
- total-usa-vaccinated-increase-(%)
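This notebook does not define how the `increase-(%)` attributes were computed. One plausible reading, shown here purely as an assumption, is the day-over-day percentage change of the matching count column. The helper name `percent_increase` and the sample counts are hypothetical.

```python
def percent_increase(values):
    """Percentage change of each value relative to the one before it.

    Returns one entry per consecutive pair; None where the previous
    value is zero and the change is undefined.
    """
    changes = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            changes.append(None)
        else:
            changes.append((current - previous) / previous * 100)
    return changes

# Hypothetical daily case counts, not taken from the data set
new_cases = [100, 150, 75]
print(percent_increase(new_cases))  # [50.0, -50.0]
```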

In [34]:
y = projectTable.get_column(col_identifier="JBLU-Went-Up?")

col_identifiers = ["new-world-cases", "new-world-cases-increase-(%)", "total-world-cases", "total-world-cases-increase-(%)", "new-usa-cases", "new-usa-cases-increase-(%)", "total-usa-cases", "total-usa-cases-increase-(%)", "new-world-vaccinated", "new-world-vaccinated-increase-(%)", "total-world-vaccinated", "total-world-vaccinated-increase-(%)", "new-usa-vaccinated", "new-usa-vaccinated-increase-(%)", "total-usa-vaccinated", "total-usa-vaccinated-increase-(%)"]
X = myutils.get_multiple_cols(table=projectTable.data, header=projectTable.column_names, col_identifiers=col_identifiers)

In [35]:
np.random.seed(50)
indexes = np.random.randint(0, len(projectTable.data), 30)
X_train = [X[index] for index in range(len(X)) if index not in indexes]
y_train = [y[index] for index in range(len(y)) if index not in indexes]
X_test = [X[index] for index in indexes]
y_sol = [y[index] for index in indexes]
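
Note that `np.random.randint` draws with replacement, so the 30 indexes may contain repeats; in that case the test set holds duplicate rows and fewer than 30 distinct rows are removed from the training set. A possible alternative, sketched here with the standard library rather than the code used for the results in this notebook, samples unique indexes. The function name `holdout_split` and the toy data are hypothetical, and switching to it would change the reported numbers.

```python
import random

def holdout_split(X, y, test_size, seed):
    """Split X and y into train/test parts using unique random test indexes."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no index appears twice
    test_indexes = rng.sample(range(len(X)), test_size)
    held_out = set(test_indexes)
    X_train = [X[i] for i in range(len(X)) if i not in held_out]
    y_train = [y[i] for i in range(len(y)) if i not in held_out]
    X_test = [X[i] for i in test_indexes]
    y_test = [y[i] for i in test_indexes]
    return X_train, X_test, y_train, y_test

# Toy data standing in for the project table (hypothetical values)
X_demo = [[i] for i in range(100)]
y_demo = ["TRUE" if i % 2 == 0 else "FALSE" for i in range(100)]
X_tr, X_te, y_tr, y_te = holdout_split(X_demo, y_demo, test_size=30, seed=50)
print(len(X_tr), len(X_te))  # 70 30
```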

#### kNN Classifier

In [36]:
kNN_clf = MyKNeighborsClassifier(n_neighbors=5)
kNN_clf.fit(X_train=X_train, y_train=y_train)

y_pred = kNN_clf.predict(X_test=X_test)

acc_scr = accuracy_score(y_sol, y_pred)
con_max = confusion_matrix(y_sol, y_pred, ["TRUE", "FALSE"])

With a random seed of 50 and 5 neighbors, we achieved an accuracy of `40%`.

##### Confusion Matrix
|Actual / Predicted|TRUE|FALSE|TOTAL|
|:-:|:-:|:-:|:-:|
|TRUE|5|9|14|
|FALSE|9|7|16|
|TOTAL|14|16|30|
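
The reported accuracy can be cross-checked against the matrix: the correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the number of test instances. The sketch below uses the counts from the table above; the helper name `accuracy_from_matrix` is hypothetical and is not part of `mysklearn`.

```python
def accuracy_from_matrix(matrix):
    """Accuracy = correctly classified instances / all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# JBLU kNN counts, label order TRUE then FALSE
jblu_knn = [[5, 9],
            [9, 7]]
print(round(accuracy_from_matrix(jblu_knn), 2))  # 0.4
```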

#### Naive Bayes Classifier

In [37]:
nb_clf = MyNaiveBayesClassifier()
nb_clf.fit(X_train=X_train, y_train=y_train)

y_pred = nb_clf.predict(X_test=X_test)

acc_scr = accuracy_score(y_sol, y_pred)
con_max = confusion_matrix(y_sol, y_pred, ["TRUE", "FALSE"])

With a random seed of 50 and a test set of 30 instances, we achieved an accuracy of `53%`.

##### Confusion Matrix
|Actual / Predicted|TRUE|FALSE|TOTAL|
|:-:|:-:|:-:|:-:|
|TRUE|0|14|14|
|FALSE|0|16|16|
|TOTAL|0|30|30|
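
The matrix shows that every test instance was predicted as `FALSE`, so this accuracy equals the share of `FALSE` labels in the test set (16 of 30). A majority-class baseline of this kind can be computed as in the following sketch; the helper name `majority_baseline_accuracy` is hypothetical, and the label list is rebuilt from the row totals above rather than read from the data.

```python
from collections import Counter

def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)

# Test labels rebuilt from the row totals: 14 TRUE, 16 FALSE
y_true = ["TRUE"] * 14 + ["FALSE"] * 16
print(round(majority_baseline_accuracy(y_true), 2))  # 0.53
```

Comparing a classifier against this baseline helps show whether it has learned anything beyond the class distribution.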

### Analysis of United Airlines

In [38]:
y = projectTable.get_column(col_identifier="UAL-Went-Up?")
y_train = [y[index] for index in range(len(y)) if index not in indexes]
y_sol = [y[index] for index in indexes]

#### kNN Classifier

In [39]:
kNN_clf = MyKNeighborsClassifier(n_neighbors=5)
kNN_clf.fit(X_train=X_train, y_train=y_train)

y_pred = kNN_clf.predict(X_test=X_test)

acc_scr = accuracy_score(y_sol, y_pred)
con_max = confusion_matrix(y_sol, y_pred, ["TRUE", "FALSE"])

With a random seed of 50 and 5 neighbors, we achieved an accuracy of `47%`.

##### Confusion Matrix
|Actual / Predicted|TRUE|FALSE|TOTAL|
|:-:|:-:|:-:|:-:|
|TRUE|5|9|14|
|FALSE|7|9|16|
|TOTAL|12|18|30|

#### Naive Bayes Classifier

In [40]:
nb_clf = MyNaiveBayesClassifier()
nb_clf.fit(X_train=X_train, y_train=y_train)

y_pred = nb_clf.predict(X_test=X_test)

acc_scr = accuracy_score(y_sol, y_pred)
con_max = confusion_matrix(y_sol, y_pred, ["TRUE", "FALSE"])

With a random seed of 50 and a test set of 30 instances, we achieved an accuracy of `53%`.

##### Confusion Matrix
|Actual / Predicted|TRUE|FALSE|TOTAL|
|:-:|:-:|:-:|:-:|
|TRUE|0|14|14|
|FALSE|0|16|16|
|TOTAL|0|30|30|

### Analysis of American Airlines

In [41]:
y = projectTable.get_column(col_identifier="AAL-Went-Up?")
y_train = [y[index] for index in range(len(y)) if index not in indexes]
y_sol = [y[index] for index in indexes]

#### kNN Classifier

In [42]:
kNN_clf = MyKNeighborsClassifier(n_neighbors=5)
kNN_clf.fit(X_train=X_train, y_train=y_train)

y_pred = kNN_clf.predict(X_test=X_test)

acc_scr = accuracy_score(y_sol, y_pred)
con_max = confusion_matrix(y_sol, y_pred, ["TRUE", "FALSE"])

With a random seed of 50 and 5 neighbors, we achieved an accuracy of `57%` (17 of 30 test instances classified correctly).

##### Confusion Matrix
|Actual / Predicted|TRUE|FALSE|TOTAL|
|:-:|:-:|:-:|:-:|
|TRUE|6|7|13|
|FALSE|6|11|17|
|TOTAL|12|18|30|

#### Naive Bayes Classifier

In [43]:
nb_clf = MyNaiveBayesClassifier()
nb_clf.fit(X_train=X_train, y_train=y_train)

y_pred = nb_clf.predict(X_test=X_test)

acc_scr = accuracy_score(y_sol, y_pred)
con_max = confusion_matrix(y_sol, y_pred, ["TRUE", "FALSE"])

With a random seed of 50 and a test set of 30 instances, we achieved an accuracy of `57%`.

##### Confusion Matrix
|Actual / Predicted|TRUE|FALSE|TOTAL|
|:-:|:-:|:-:|:-:|
|TRUE|0|13|13|
|FALSE|0|17|17|
|TOTAL|0|30|30|