# NLP Movie Genre Classification - Pickling the Best Model

by Andrew Alarcon

## Introduction

In this final notebook, I re-create the best-performing Logistic Regression model from notebook 3.1 and export it with joblib, which serializes the fitted model using pickle.

## Importing the Libraries

In [9]:
import pandas as pd
import numpy as np
import joblib

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MaxAbsScaler
from sklearn.metrics import accuracy_score


## Loading in the Data

In [2]:
X_train = pd.read_csv('data/X_train.csv')
X_test = pd.read_csv('data/X_test.csv')

y_train = pd.read_csv('data/y_train.csv')
y_test = pd.read_csv('data/y_test.csv')

In [5]:
y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

y_train = y_train.ravel()
y_test = y_test.ravel()
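
The conversion above matters because `pd.read_csv` returns a single-column DataFrame of shape `(n, 1)`, while scikit-learn estimators expect a 1-D target of shape `(n,)`. A minimal sketch on made-up labels (the column name `genre` and its values are illustrative assumptions, not the project's data):

```python
import pandas as pd

# Hypothetical stand-in for y_train.csv: a single column of labels
y_df = pd.DataFrame({"genre": ["drama", "comedy", "horror"]})

y_2d = y_df.to_numpy()   # column vector, shape (3, 1)
y_1d = y_2d.ravel()      # flattened to a 1-D array, shape (3,)

print(y_2d.shape, y_1d.shape)
```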

## Scaling the Data

In [3]:
abs_scaler = MaxAbsScaler()
abs_scaler.fit(X_train)

X_train_scaled_abs = abs_scaler.transform(X_train)
X_test_scaled_abs = abs_scaler.transform(X_test)
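
`MaxAbsScaler` divides each feature by its largest absolute value in the training data, so every training column lands in [-1, 1] and zero entries stay zero. Fitting on the training set only, as above, keeps test-set statistics out of the scaling. A small sketch on made-up numbers (the arrays are illustrative, not the project's features):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy feature matrices (illustrative values only)
X_tr = np.array([[1.0, 0.0, 4.0],
                 [2.0, 5.0, 0.0]])
X_te = np.array([[4.0, 10.0, 2.0]])

scaler = MaxAbsScaler().fit(X_tr)

# Equivalent manual computation: divide by the per-column max |value|
manual = X_tr / np.abs(X_tr).max(axis=0)

X_tr_scaled = scaler.transform(X_tr)
# Test values can exceed 1 because the scale comes from the training set
X_te_scaled = scaler.transform(X_te)

print(np.allclose(X_tr_scaled, manual))
print(X_te_scaled)
```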

## Fitting and Evaluating the Model

In [7]:
# Instantiate model
log_reg = LogisticRegression(solver='liblinear', C=10, max_iter=2000, random_state=42)

# Fit the model
result = log_reg.fit(X_train_scaled_abs, y_train)

# Predict on the model
y_pred_train = log_reg.predict(X_train_scaled_abs)
y_pred_test = log_reg.predict(X_test_scaled_abs)

# Get the accuracy on the training and testing sets
preds_acc_train = accuracy_score(y_train, y_pred_train)
preds_acc_test = accuracy_score(y_test, y_pred_test)

print(f'Logistic Regression accuracy on the training set = {preds_acc_train}')
print(f'Logistic Regression accuracy on the testing set = {preds_acc_test}')

Logistic Regression accuracy on the training set = 0.5150206165324955
Logistic Regression accuracy on the testing set = 0.49867478158437223


In [10]:
joblib.dump(log_reg, 'models/best_model.pkl')

['models/best_model.pkl']
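
The saved file contains only the classifier, so any code that loads it later must also reproduce the `MaxAbsScaler` step. One option, sketched below on synthetic data, is to wrap both steps in a scikit-learn `Pipeline` and dump that single object. The temporary directory, the file name `pipeline.pkl`, and the random binary-labelled data are assumptions for illustration only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

# Synthetic stand-in for the project's features and labels (binary for simplicity)
rng = np.random.default_rng(42)
X = rng.random((60, 5))
y = rng.integers(0, 2, size=60)

# Scaler and classifier travel together in one object
pipe = make_pipeline(
    MaxAbsScaler(),
    LogisticRegression(solver='liblinear', C=10, max_iter=2000, random_state=42),
)
pipe.fit(X, y)

# Round trip through a temporary file (hypothetical path)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pipeline.pkl')
    joblib.dump(pipe, path)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions
same = np.array_equal(pipe.predict(X), restored.predict(X))
print(same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.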

## Conclusion

I have successfully saved my best-performing Logistic Regression model!

There are a few possible next steps now that the model is saved. One would be to deploy it in a web app that lets anyone enter a movie or show description and have it classified by this Logistic Regression model.
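
As a rough idea of what such an app would call behind the scenes, here is a minimal sketch of a text-to-genre function. It is not the project's pipeline: the four descriptions, the two genres, the `TfidfVectorizer` step, and the helper name `classify_description` are all assumptions for illustration. A real deployment would reuse the vectorizer fitted in the earlier notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

# Tiny made-up corpus standing in for the real training data
descriptions = [
    "A detective hunts a killer through the city",
    "Police investigate a murder in a small town",
    "Two friends go on a hilarious road trip",
    "A clumsy waiter causes funny chaos at a wedding",
]
genres = ["crime", "crime", "comedy", "comedy"]

# Vectorizer (assumed), scaler, and classifier chained into one estimator
text_clf = make_pipeline(
    TfidfVectorizer(),
    MaxAbsScaler(),
    LogisticRegression(solver='liblinear', C=10, random_state=42),
)
text_clf.fit(descriptions, genres)


def classify_description(text: str) -> str:
    """Return the predicted genre for one free-text description."""
    return str(text_clf.predict([text])[0])


print(classify_description("A detective investigates a murder"))
```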

Another possible step would be to create an **even better** model using Recurrent Neural Networks or a Transformer-based language model such as OpenAI's GPT-3. That model could then be served in production through a web app like the one described above.
