# Production Models

The models below were chosen for production based on their performance on the validation set. The selection criteria were:
- Performance compared to the baseline model and to the other candidate models
- Interpretability
- Training time

The chosen models are:
- NLP Classification Models:
    - TF-IDF + Logistic Regression
    - TF-IDF + Gradient Boosting Classifier
- Sentiment Analysis Models:
    - TextBlob
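
To make the "TF-IDF + classifier" naming concrete, here is a minimal sketch of such a model. It assumes scikit-learn, which the notebook does not name explicitly, and uses a handful of hypothetical toy posts rather than project data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy posts and labels, for illustration only (not project data).
posts = [
    "soldiers finished basic training at the fort",
    "the battalion deployed with its armor brigade",
    "marines completed the crucible at parris island",
    "the corps celebrated its birthday with the commandant",
]
labels = ["Army", "Army", "USMC", "USMC"]

# TF-IDF turns each post into a sparse vector of weighted term counts;
# logistic regression then fits a linear classifier on those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(posts, labels)

# Predict the class of an unseen (also hypothetical) post.
prediction = model.predict(["marines at parris island"])[0]
print(prediction)
```

Swapping `LogisticRegression` for `sklearn.ensemble.GradientBoostingClassifier` would give the second model family in the list under the same assumption.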

The performance of each model is shown below.

In [39]:
##Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Military Classification and Sentiment Analysis

### Classification Models

#### TF-IDF + Logistic Regression (Baseline Model)

In [40]:
#Import Data
base_lr = pd.read_csv('./files/classification_report_base_lr.csv')

#Clean DataFrame
base_lr.rename(columns={'Unnamed: 0':'Class'}, inplace=True)
base_lr.set_index('Class', inplace=True)
base_lr = base_lr.round(2)
base_lr.loc['accuracy', ['precision', 'recall', 'support']] = pd.NA
base_lr.replace(np.nan, '', regex=True, inplace=True)

base_lr

| Class | precision | recall | f1-score | support |
|---|---|---|---|---|
| Army | 0.76 | 1.0 | 0.86 | 253.0 |
| USMC | 1.0 | 0.11 | 0.2 | 90.0 |
| accuracy | | | 0.77 | |
| macro avg | 0.88 | 0.56 | 0.53 | 343.0 |
| weighted avg | 0.82 | 0.77 | 0.69 | 343.0 |


![Logistic Regression Baseline Confusion Matrix](./imgs/base_lr.png)
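
The same load-and-clean steps are repeated for every report in this notebook. One way to avoid the repetition is a small helper; the sketch below is an assumption about how that could look, and the name `load_report` is hypothetical. It mirrors the cell above but uses `fillna` to blank out missing values.

```python
import numpy as np
import pandas as pd


def load_report(path):
    """Load a saved classification report CSV and tidy it for display."""
    report = pd.read_csv(path)
    # The first, unnamed CSV column holds the class and summary row labels.
    report = report.rename(columns={"Unnamed: 0": "Class"}).set_index("Class")
    report = report.round(2)
    # Accuracy is a single number, so only its f1-score cell is kept.
    report.loc["accuracy", ["precision", "recall", "support"]] = np.nan
    # Cast to object so blanks and numbers can share a column, then blank NaNs.
    return report.astype(object).fillna("")
```

With this in place, each report cell would reduce to a single call such as `load_report('./files/classification_report_base_lr.csv')`.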

#### TF-IDF + Logistic Regression (Best Model)

In [42]:
#Import Data
best_lr = pd.read_csv('./files/classification_report_best_lr.csv')

#Clean DataFrame
best_lr.rename(columns={'Unnamed: 0':'Class'}, inplace=True)
best_lr.set_index('Class', inplace=True)
best_lr = best_lr.round(2)
best_lr.loc['accuracy', ['precision', 'recall', 'support']] = pd.NA
best_lr.replace(np.nan, '', regex=True, inplace=True)

best_lr

| Class | precision | recall | f1-score | support |
|---|---|---|---|---|
| Army | 0.86 | 0.89 | 0.88 | 253.0 |
| USMC | 0.66 | 0.6 | 0.63 | 90.0 |
| accuracy | | | 0.81 | |
| macro avg | 0.76 | 0.74 | 0.75 | 343.0 |
| weighted avg | 0.81 | 0.81 | 0.81 | 343.0 |


![Logistic Regression Best Confusion Matrix](./imgs/best_lr.png)

#### TF-IDF + Gradient Boosting Classifier (Baseline Model)

In [43]:
#Import Data
base_gb = pd.read_csv('./files/classification_report_base_gb.csv')

#Clean DataFrame
base_gb.rename(columns={'Unnamed: 0':'Class'}, inplace=True)
base_gb.set_index('Class', inplace=True)
base_gb = base_gb.round(2)
base_gb.loc['accuracy', ['precision', 'recall', 'support']] = pd.NA
base_gb.replace(np.nan, '', regex=True, inplace=True)

base_gb

| Class | precision | recall | f1-score | support |
|---|---|---|---|---|
| Army | 0.8 | 0.98 | 0.88 | 253.0 |
| USMC | 0.85 | 0.31 | 0.46 | 90.0 |
| accuracy | | | 0.8 | |
| macro avg | 0.82 | 0.65 | 0.67 | 343.0 |
| weighted avg | 0.81 | 0.8 | 0.77 | 343.0 |


![Gradient Boosting Baseline Confusion Matrix](./imgs/base_gb.png)

#### TF-IDF + Gradient Boosting Classifier (Best Model)

In [44]:
#Import Data
best_gb = pd.read_csv('./files/classification_report_best_gb.csv')

#Clean DataFrame
best_gb.rename(columns={'Unnamed: 0':'Class'}, inplace=True)
best_gb.set_index('Class', inplace=True)
best_gb = best_gb.round(2)
best_gb.loc['accuracy', ['precision', 'recall', 'support']] = pd.NA
best_gb.replace(np.nan, '', regex=True, inplace=True)

best_gb

| Class | precision | recall | f1-score | support |
|---|---|---|---|---|
| Army | 0.81 | 0.98 | 0.88 | 253.0 |
| USMC | 0.86 | 0.33 | 0.48 | 90.0 |
| accuracy | | | 0.81 | |
| macro avg | 0.83 | 0.66 | 0.68 | 343.0 |
| weighted avg | 0.82 | 0.81 | 0.78 | 343.0 |


![Gradient Boosting Best Confusion Matrix](./imgs/best_gb.png)
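
The four classification reports above are easier to compare side by side. The sketch below copies their headline numbers into one table and ranks the models. Ranking by macro F1 is an assumption made here because the classes are imbalanced (253 Army vs. 90 USMC posts); the notebook itself does not name a single deciding metric.

```python
import pandas as pd

# Values copied from the four classification reports above.
summary = pd.DataFrame(
    {
        "accuracy": [0.77, 0.81, 0.80, 0.81],
        "macro_f1": [0.53, 0.75, 0.67, 0.68],
        "usmc_recall": [0.11, 0.60, 0.31, 0.33],
    },
    index=["LR baseline", "LR best", "GB baseline", "GB best"],
)

# Macro F1 weights the minority USMC class equally with the Army class.
ranked = summary.sort_values("macro_f1", ascending=False)
print(ranked)
```

Both tuned models reach 0.81 accuracy, but they differ sharply on USMC recall (0.60 vs. 0.33), which is what the macro F1 ranking reflects.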

### Sentiment Analysis Models

#### TextBlob

## Mental Health Classification and Sentiment Analysis

### Classification Model

#### TF-IDF + Gradient Boosting Classifier

In [None]:
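#Import Data
#NOTE: this unexecuted cell still loads the military logistic regression baseline report,
#not a gradient boosting report for the mental health data
base_lr = pd.read_csv('./files/classification_report_base_lr.csv')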

#Clean DataFrame
base_lr.rename(columns={'Unnamed: 0':'Class'}, inplace=True)
base_lr.set_index('Class', inplace=True)
base_lr = base_lr.round(2)
base_lr.loc['accuracy', ['precision', 'recall', 'support']] = pd.NA
base_lr.replace(np.nan, '', regex=True, inplace=True)

base_lr

### Sentiment Analysis Models

#### TextBlob

## Additional Notes and Comments

In production, post content will not be displayed, in order to protect user privacy. The models will show only the overall sentiment of the posts. Analysts must therefore validate the sentiment analysis models manually against the post content.
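
One way to report overall sentiment without exposing post content is to keep only per-post polarity scores and publish their label shares. The sketch below is an illustration under assumptions: the function names and the `0.05` neutral threshold are hypothetical, and `polarity_of` relies on TextBlob's `sentiment.polarity` score, which ranges from -1 to 1.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse label (threshold is hypothetical)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def summarize_sentiment(polarities):
    """Return the share of each label; no post text is stored or returned."""
    counts = Counter(label_polarity(p) for p in polarities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in LABELS}


def polarity_of(text):
    """Score one post with TextBlob (imported lazily, so the helpers above run without it)."""
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


# Hypothetical scores standing in for polarity_of(post) applied to real posts.
example_scores = [0.4, -0.3, 0.0, 0.8]
shares = summarize_sentiment(example_scores)
print(shares)
```

Only the aggregated shares would be displayed; the manual validation step described above would still need access to the underlying posts.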