# **Notebook X.** Summarizing Classification Outcomes
----


In the earlier notebooks, we tested several classification models.

Here, we provide a comparison of all of the models as well as an in-depth analysis of the "best performing" model.

# X.1. Preamble: Load Packages
---

In [1]:
# General Packages #
import os
import pandas as pd
import numpy as np

# Load TQDM to Show Progress Bars #
from tqdm import tqdm
from tqdm.notebook import tqdm as tqdm_notebook

# Sklearn Packages #
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer, TfidfVectorizer
from sklearn.metrics import mean_squared_error, r2_score, classification_report
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, f1_score, confusion_matrix

# Plotting Packages #
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Turn off warnings to avoid pesky messages that might cause confusion here #
# Remove when testing your own code #
import warnings
warnings.filterwarnings("ignore")

In [7]:
# Mount Your Personal Google Drive in the Colab Runtime -- You have to follow the link to log in #
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# X.2. Compare Classification Results
----------------

We saved each model's classification results to Google Drive. Here, we load those results and compare the models side by side.

In [8]:
# Change to Working Directory with Training Data #
os.chdir("/content/drive/MyDrive/Power-data-main/")

# Read Every Results File and Stack Them into One Table #
Frames = []
for root, dirs, files in os.walk("./Output/Model Performance/"):
  for file in files:
    # Join with root so that files inside subfolders are found as well
    Frames.append(pd.read_csv(os.path.join(root, file)))
Combined = pd.concat(Frames)


In [9]:
# See the Output with All Models #

Combined.drop("Unnamed: 0", axis = 1).sort_values("Accuracy", ascending = False)

Unnamed: 0,Name,Type,Share,True-Positives,False-Negatives,False-Positives,True-Negatives,Accuracy,AUC,Precision,Recall,F1
0,XLNet (Chinese),Transformer,0.423,0.927,0.936,0.073,0.064,0.932,0.93,0.927,0.915,0.921
1,BERT-wwm (Chinese),Transformer,0.438,0.909,0.944,0.091,0.056,0.928,0.928,0.909,0.926,0.917
3,BERT (Chinese),Transformer,0.441,0.904,0.945,0.096,0.055,0.927,0.927,0.904,0.928,0.916
2,ELECTRA (Chinese),Transformer,0.432,0.911,0.938,0.089,0.062,0.927,0.926,0.911,0.918,0.915
0,Multi Layer Perceptron,Embedding Vectors,0.436,0.898,0.935,0.102,0.065,0.919,0.918,0.898,0.914,0.906
4,MacBERT (Chinese),Transformer,0.445,0.891,0.941,0.109,0.059,0.919,0.92,0.891,0.924,0.908
1,Support Vector Classifier (RBF),Embedding Vectors,0.45,0.879,0.94,0.121,0.06,0.913,0.914,0.879,0.923,0.901
0,Random Forest,Bag of Words,0.444,0.882,0.932,0.118,0.068,0.91,0.91,0.882,0.912,0.897
1,Support Vector Classifier (RBF),Bag of Words,0.425,0.898,0.918,0.102,0.082,0.909,0.907,0.898,0.89,0.894
0,Power data Word2Vec,CNN,0.46,0.861,0.94,0.139,0.06,0.904,0.906,0.861,0.925,0.892
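
The Precision, Recall, and F1 columns above are linked: F1 is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes F1 for three rows from the rounded values in the table. The names `rows`, `f1_from`, and `gaps` are ours for illustration and do not appear in the earlier notebooks. Because the inputs are rounded to three decimals, small differences in the last digit can occur.

```python
# (name, precision, recall, reported F1), copied by hand from the table above
rows = [
    ("XLNet (Chinese)", 0.927, 0.915, 0.921),
    ("BERT-wwm (Chinese)", 0.909, 0.926, 0.917),
    ("BERT (Chinese)", 0.904, 0.928, 0.916),
]


def f1_from(precision, recall):
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Absolute gap between the reported and the recomputed F1 for each row
gaps = []
for name, precision, recall, reported_f1 in rows:
    recomputed = f1_from(precision, recall)
    gaps.append(abs(recomputed - reported_f1))
    print(f"{name}: reported {reported_f1:.3f}, recomputed {recomputed:.3f}")
```

If a gap were much larger than the rounding error, that would suggest that a results file was written with mismatched columns.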


In [10]:
# Best Performing Model #

Combined.drop("Unnamed: 0", axis = 1).sort_values("Accuracy", ascending = False).head(1)

Unnamed: 0,Name,Type,Share,True-Positives,False-Negatives,False-Positives,True-Negatives,Accuracy,AUC,Precision,Recall,F1
0,XLNet (Chinese),Transformer,0.423,0.927,0.936,0.073,0.064,0.932,0.93,0.927,0.915,0.921
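
Sorting by accuracy gives one overall winner, but it can also help to see the strongest model within each model family. The sketch below is an assumption on our part and not a step from the earlier notebooks: it rebuilds a few rows of the table by hand in a hypothetical `results` frame and picks the most accurate model per `Type`. On the real `Combined` table the index labels repeat after `pd.concat`, so we would call `reset_index(drop=True)` first, as done here, so that each label returned by `idxmax` points at exactly one row.

```python
import pandas as pd

# A few rows copied by hand from the table above (hypothetical stand-in for Combined)
results = pd.DataFrame({
    "Name": [
        "XLNet (Chinese)", "BERT-wwm (Chinese)", "Multi Layer Perceptron",
        "Support Vector Classifier (RBF)", "Random Forest", "Power data Word2Vec",
    ],
    "Type": [
        "Transformer", "Transformer", "Embedding Vectors",
        "Embedding Vectors", "Bag of Words", "CNN",
    ],
    "Accuracy": [0.932, 0.928, 0.919, 0.913, 0.910, 0.904],
})

# Make the index unique so that each idxmax label identifies a single row
results = results.reset_index(drop=True)

# Row label of the highest accuracy within each model type
best_rows = results.groupby("Type")["Accuracy"].idxmax()

# Look the winning rows up and order them from most to least accurate
best_per_type = results.loc[best_rows].sort_values("Accuracy", ascending=False)
print(best_per_type)
```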


In [11]:
# Save the Combined Results #
Combined.to_csv("./Output/All Model Performance Combined.csv")