# <u><center>Project 2 - Part 7 (Core)</center></u>
- Authored By: Eric N. Valdez
- Date: 04/19/2024

### For this project, you will create a Streamlit app to get predictions from your best model.

# Part 1: Preparing Best Models for Streamlit
- ### Create a new "Part 7 - Preparing for Streamlit" notebook.

### `In the new notebook`
- ### Define a filepaths dictionary and save it to config/filepaths.json. It should include a file path for each component you will save (`review below`).
- ### Copy your best models from Part 6 into the new notebook.
    - #### Update your code to define the final public-facing class labels. 
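One way to define the public-facing labels is a small lookup that maps the internal target values to display strings. The sketch below assumes the internal labels are the `High_Rating`/`Low_Rating` groups created by `create_groups` later in this notebook. The display strings and the `to_public_label` helper are hypothetical placeholders.

```python
# Hypothetical mapping from internal target values to public-facing labels.
# Keys assume the "High_Rating"/"Low_Rating" groups used in this notebook;
# the display strings are placeholders to replace with your own wording.
TARGET_LOOKUP = {
    "High_Rating": "High rating (rating >= 9)",
    "Low_Rating": "Low rating (rating <= 4)",
}


def to_public_label(internal_label: str) -> str:
    """Translate a model's raw prediction into the label shown to app users."""
    # Fall back to the raw value so an unexpected class is still visible.
    return TARGET_LOOKUP.get(internal_label, internal_label)


print(to_public_label("High_Rating"))  # High rating (rating >= 9)
```

Saving this dictionary alongside the model lets the Streamlit app reuse the same labels.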

In [12]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns',100)

In [1]:
from pprint import pprint
FPATHS = dict(
    data={
        "raw": {
            "full": "Data-NLP/movie_reviews_v2.csv",  # (This is the original full dataframe we already have)
            "eda": "Data-NLP/movie_reviews_v2.csv" # We haven't saved this yet
        },
        "ml": {
            "train": "Data-NLP/training-data.joblib",  # (X_train,y_train) We haven't saved this yet
            "test": "Data-NLP/testing-data.joblib",  # (X_test,y_test) We haven't saved this yet
        },
    },
    models={
        "linear_regression": "Models/linear_regression/linreg.joblib", # We haven't saved this yet
        "random_forest": "Models/random_forest/rf_reg.joblib", # We haven't saved this yet
    },
    images={
        "banner": "Images/IMDB.png", # We haven't saved this yet
    },
)
pprint(FPATHS)

{'data': {'ml': {'test': 'Data-NLP/testing-data.joblib',
                 'train': 'Data-NLP/training-data.joblib'},
          'raw': {'eda': 'Data-NLP/movie_reviews_v2.csv',
                  'full': 'Data-NLP/movie_reviews_v2.csv'}},
 'images': {'banner': 'Images/IMDB.png'},
 'models': {'linear_regression': 'Models/linear_regression/linreg.joblib',
            'random_forest': 'Models/random_forest/rf_reg.joblib'}}


In [2]:
# Save the filepaths 
import os, json
os.makedirs('config/', exist_ok=True)
FPATHS_FILE = 'config/filepaths.json'
with open(FPATHS_FILE, 'w') as f:
    json.dump(FPATHS, f)

In [3]:
import os
def create_directories_from_paths(nested_dict):
    """OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com 
    Recursively create directories for file paths in a nested dictionary.
    Parameters:
    nested_dict (dict): The nested dictionary containing file paths.
    """
    for key, value in nested_dict.items():
        if isinstance(value, dict):
            # If the value is a dictionary, recurse into it
            create_directories_from_paths(value)
        elif isinstance(value, str):
            # If the value is a string, treat it as a file path and get the directory path
            directory_path = os.path.dirname(value)
            # If the directory path is not empty and the directory does not exist, create it
            if directory_path and not os.path.exists(directory_path):
                os.makedirs(directory_path)
                print(f"Directory created: {directory_path}")

# Use the function on your FPATHS dictionary
create_directories_from_paths(FPATHS)
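Several entries in `FPATHS` point at files that have not been saved yet. A small helper that walks the same nested dictionary can report which files are still missing. This is a hypothetical addition: `flatten_paths` and `report_missing_files` are not part of `movie_functions`. They reuse the recursion pattern of `create_directories_from_paths`.

```python
import os


def flatten_paths(nested, prefix=""):
    """Yield (dotted_key, path) pairs from a nested dict of file paths."""
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted key
            yield from flatten_paths(value, dotted)
        elif isinstance(value, str):
            yield dotted, value


def report_missing_files(nested):
    """Return the dotted keys whose files do not exist on disk yet."""
    return [key for key, path in flatten_paths(nested) if not os.path.exists(path)]


# Example with a small stand-in for FPATHS
example = {
    "data": {"raw": {"full": "Data-NLP/movie_reviews_v2.csv"}},
    "models": {"random_forest": "Models/random_forest/rf_reg.joblib"},
}
print(list(flatten_paths(example)))
```

Running `report_missing_files(FPATHS)` before building the app is a quick way to confirm that every component has been saved.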

In [4]:
# We can access a file using our dictionary
FPATHS['data']['raw']['full']

'Data-NLP/movie_reviews_v2.csv'

In [5]:
# We can access a file using our dictionary
FPATHS['models']['random_forest']

'Models/random_forest/rf_reg.joblib'

In [6]:
# Confirm the images is in the correct location
from IPython.display import display, Markdown
Markdown(f"<img src='{FPATHS['images']['banner']}'>")

<img src='Images/IMDB.png'>

In [7]:
import os, sys
%load_ext autoreload 
%autoreload 2
import movie_functions as fn

In [8]:
# %reload_ext autoreload

In [9]:
# Load the processed data saved as a joblib file in Part 5 of the project
df = fn.joblib.load('Data-NLP/processed_data.joblib')
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8650 entries, 843 to 575264
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   original_title  8650 non-null   object 
 1   review          8650 non-null   object 
 2   rating          7454 non-null   float64
 3   ratings         2419 non-null   object 
 4   tokens          8650 non-null   object 
 5   lemmatized      8650 non-null   object 
 6   tokens-joined   8650 non-null   object 
 7   lemmas-joined   8650 non-null   object 
dtypes: float64(1), object(7)
memory usage: 608.2+ KB


movie_id,original_title,review,rating,ratings,tokens,lemmatized,tokens-joined,lemmas-joined
843,花樣年華,"This is a fine piece of cinema from Wong Kar-Wai that tells us a story of two people whom circumstance throws together - but not in a way you might expect. We start with two couples who move into a new building. One a newspaper man with his wife, the other a business executive and his wife. The ...",7.0,,"[fine, piece, cinema, wong, kar, wai, tell, story, people, circumstance, throw, way, expect, start, couple, new, building, newspaper, man, wife, business, executive, wife, businessman, rarely, home, journalist, wife, leave, increasingly, loose, end, long, friendship, develop, usually, noodle, en...","[fine, piece, cinema, wong, kar, wai, tell, story, people, circumstance, throw, way, expect, start, couple, new, building, newspaper, man, wife, business, executive, wife, businessman, rarely, home, journalist, wife, leave, increasingly, loose, end, long, friendship, develop, usually, noodle, en...",fine piece cinema wong kar wai tell story people circumstance throw way expect start couple new building newspaper man wife business executive wife businessman rarely home journalist wife leave increasingly loose end long friendship develop usually noodle entirely platonic relationship solid tru...,fine piece cinema wong kar wai tell story people circumstance throw way expect start couple new building newspaper man wife business executive wife businessman rarely home journalist wife leave increasingly loose end long friendship develop usually noodle entirely platonic relationship solid tru...
7443,Chicken Run,"A guilty pleasure for me personally, as I love both 'The Great Escape' and most of the works I have seen, over the years, from this rightfully-esteemed British animation company. Highly recommended both for children and for adults who enjoy animation.",9.0,High_rating,"[guilty, pleasure, personally, love, great, escape, work, see, year, rightfully, esteem, british, animation, company, highly, recommend, child, adult, enjoy, animation]","[guilty, pleasure, personally, love, great, escape, work, see, year, rightfully, esteem, british, animation, company, highly, recommend, child, adult, enjoy, animation]",guilty pleasure personally love great escape work see year rightfully esteem british animation company highly recommend child adult enjoy animation,guilty pleasure personally love great escape work see year rightfully esteem british animation company highly recommend child adult enjoy animation
7443,Chicken Run,"Made my roommate who hates stop-motion animation watched this in 2018 and even he had a good time. It's maybe not as great as I remember thinking it was when I was a little kid, but it still holds up to some degree.\r\n\r\n_Final rating:★★★ - I liked it. Would personally recommend you give it a ...",6.0,,"[roommate, hate, stop, motion, animation, watch, 2018, good, time, maybe, great, remember, think, little, kid, hold, degree, final, rating, ★, ★, ★, like, personally, recommend]","[roommate, hate, stop, motion, animation, watch, 2018, good, time, maybe, great, remember, think, little, kid, hold, degree, final, rating, ★, ★, ★, like, personally, recommend]",roommate hate stop motion animation watch 2018 good time maybe great remember think little kid hold degree final rating ★ ★ ★ like personally recommend,roommate hate stop motion animation watch 2018 good time maybe great remember think little kid hold degree final rating ★ ★ ★ like personally recommend
7443,Chicken Run,"A very good stop-motion animation!\r\n\r\n<em>'Chicken Run'</em>, which I watched a crap tonne when I was little but not for a vast number of years now, is an impressive production given it came out in 2000. Despite a pretty simple feel to the film, it's a very well developed concept.\r\n\r\nThe...",8.0,,"[good, stop, motion, animation, <, em>'chicken, run'</em, >, watch, crap, tonne, little, vast, number, year, impressive, production, give, come, 2000, despite, pretty, simple, feel, film, develop, concept, admittedly, short, run, time, truly, fly, course, look, relatively, terrific, impress, pac...","[good, stop, motion, animation, <, em>'chicken, run'</em, >, watch, crap, tonne, little, vast, number, year, impressive, production, give, come, 2000, despite, pretty, simple, feel, film, develop, concept, admittedly, short, run, time, truly, fly, course, look, relatively, terrific, impress, pac...",good stop motion animation < em>'chicken run'</em > watch crap tonne little vast number year impressive production give come 2000 despite pretty simple feel film develop concept admittedly short run time truly fly course look relatively terrific impress pacing clean cast julia sawalha definite s...,good stop motion animation < em>'chicken run'</em > watch crap tonne little vast number year impressive production give come 2000 despite pretty simple feel film develop concept admittedly short run time truly fly course look relatively terrific impress pacing clean cast julia sawalha definite s...
7443,Chicken Run,"Ok, there is an huge temptation to riddle this review with puns - but I'm just going to say it's a cracking little family adventure. It's seemingly based on a whole range of classic movies from the ""Great Escape"", ""Star Trek"" to ""Love Story"" with a score cannibalised from just about any/everythi...",7.0,,"[ok, huge, temptation, riddle, review, pun, go, crack, little, family, adventure, seemingly, base, range, classic, movie, great, escape, star, trek, love, story, score, cannibalise, write, messrs., korngold, williams, bernstein, add, super, stop, motion, animation, ray, harryhausen, proud, flock...","[ok, huge, temptation, riddle, review, pun, go, crack, little, family, adventure, seemingly, base, range, classic, movie, great, escape, star, trek, love, story, score, cannibalise, write, messrs., korngold, williams, bernstein, add, super, stop, motion, animation, ray, harryhausen, proud, flock...",ok huge temptation riddle review pun go crack little family adventure seemingly base range classic movie great escape star trek love story score cannibalise write messrs. korngold williams bernstein add super stop motion animation ray harryhausen proud flock chicken relentlessly exploit egg evil...,ok huge temptation riddle review pun go crack little family adventure seemingly base range classic movie great escape star trek love story score cannibalise write messrs. korngold williams bernstein add super stop motion animation ray harryhausen proud flock chicken relentlessly exploit egg evil...


## `Saving Your Models`
- ### For your Machine Learning Model:
    - Save your training data ([X_train, y_train])
    - Save your test data ([X_test, y_test])
    - Save your target_lookup dictionary and/or your label encoder
    - Save your best model
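The four items above can be saved with `joblib`, one file per entry in a filepaths dictionary. The sketch below makes three assumptions. First, it uses two hypothetical keys that the `FPATHS` dictionary in this notebook does not define yet: `['data']['ml']['target_lookup']` and `['models']['best_clf']`. Second, a CountVectorizer + MultinomialNB pipeline stands in for your best model. Third, it writes to a temporary folder so it can run anywhere.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def save_ml_artifacts(fpaths, X_train, y_train, X_test, y_test, model, target_lookup):
    """Persist the train/test splits, label lookup, and fitted model with joblib.

    Assumes fpaths mirrors FPATHS plus the hypothetical keys
    fpaths['data']['ml']['target_lookup'] and fpaths['models']['best_clf'].
    """
    targets = {
        fpaths["data"]["ml"]["train"]: [X_train, y_train],
        fpaths["data"]["ml"]["test"]: [X_test, y_test],
        fpaths["data"]["ml"]["target_lookup"]: target_lookup,
        fpaths["models"]["best_clf"]: model,
    }
    for path, obj in targets.items():
        # Make sure the parent folder exists before writing
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        joblib.dump(obj, path)


# Demo in a temporary folder with toy data (not the project's reviews)
tmp = tempfile.mkdtemp()
demo_paths = {
    "data": {"ml": {
        "train": os.path.join(tmp, "Data-NLP", "training-data.joblib"),
        "test": os.path.join(tmp, "Data-NLP", "testing-data.joblib"),
        "target_lookup": os.path.join(tmp, "Data-NLP", "target-lookup.joblib"),
    }},
    "models": {"best_clf": os.path.join(tmp, "Models", "clf_pipe.joblib")},
}
X_tr = ["loved it", "great fun", "awful plot", "boring mess"]
y_tr = ["High_Rating", "High_Rating", "Low_Rating", "Low_Rating"]
pipe = Pipeline([("vectorizer", CountVectorizer()), ("clf", MultinomialNB())]).fit(X_tr, y_tr)
save_ml_artifacts(demo_paths, X_tr, y_tr, X_tr[:2], y_tr[:2], pipe,
                  {"High_Rating": "High", "Low_Rating": "Low"})

# Loading mirrors load_Xy_data: joblib returns the saved [X, y] list
X_loaded, y_loaded = joblib.load(demo_paths["data"]["ml"]["train"])
```

In the notebook, you would add the new keys to `FPATHS`, re-save `config/filepaths.json`, and pass in your real splits and model.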

In [14]:
# Load the raw data using the filepaths dictionary
fpath = FPATHS['data']['raw']['full']
df = pd.read_csv(fpath)
df = df.set_index("movie_id")
# Define columns to use
columns_to_use = ['original_title', 'review', 'rating']
df = df[columns_to_use]
df.head()

movie_id,original_title,review,rating
843,花樣年華,"This is a fine piece of cinema from Wong Kar-Wai that tells us a story of two people whom circumstance throws together - but not in a way you might expect. We start with two couples who move into a new building. One a newspaper man with his wife, the other a business executive and his wife. The ...",7.0
7443,Chicken Run,"A guilty pleasure for me personally, as I love both 'The Great Escape' and most of the works I have seen, over the years, from this rightfully-esteemed British animation company. Highly recommended both for children and for adults who enjoy animation.",9.0
7443,Chicken Run,"Made my roommate who hates stop-motion animation watched this in 2018 and even he had a good time. It's maybe not as great as I remember thinking it was when I was a little kid, but it still holds up to some degree.\r\n\r\n_Final rating:★★★ - I liked it. Would personally recommend you give it a ...",6.0
7443,Chicken Run,"A very good stop-motion animation!\r\n\r\n<em>'Chicken Run'</em>, which I watched a crap tonne when I was little but not for a vast number of years now, is an impressive production given it came out in 2000. Despite a pretty simple feel to the film, it's a very well developed concept.\r\n\r\nThe...",8.0
7443,Chicken Run,"Ok, there is an huge temptation to riddle this review with puns - but I'm just going to say it's a cracking little family adventure. It's seemingly based on a whole range of classic movies from the ""Great Escape"", ""Star Trek"" to ""Love Story"" with a score cannibalised from just about any/everythi...",7.0


In [15]:
# Check for null values
df.isna().sum()

original_title       0
review               0
rating            1196
dtype: int64

In [17]:
def create_groups(x):
    """Group a numeric rating: >= 9 is high, <= 4 is low, anything else (including NaN) is None."""
    if x >= 9:
        return "High_Rating"
    elif x <= 4:
        return "Low_Rating"
    else:
        return None

In [18]:
# Should return high
create_groups(9)

'High_Rating'

In [19]:
# Should return low
create_groups(4)

'Low_Rating'

In [22]:
# Use the function to create a new "ratings" column with the rating groups
df['ratings'] = df['rating'].map(create_groups)
df['ratings'].value_counts(dropna=False)

None           6231
Low_Rating     1224
High_Rating    1195
Name: ratings, dtype: int64

In [23]:
# Define X and y
X = df['review']
y = df['ratings']

X.head()

movie_id
843     This is a fine piece of cinema from Wong Kar-Wai that tells us a story of two people whom circumstance throws together - but not in a way you might expect. We start with two couples who move into a new building. One a newspaper man with his wife, the other a business executive and his wife. The ...
7443                                                    A guilty pleasure for me personally, as I love both 'The Great Escape' and most of the works I have seen, over the years, from this rightfully-esteemed British animation company. Highly recommended both for children and for adults who enjoy animation.
7443    Made my roommate who hates stop-motion animation watched this in 2018 and even he had a good time. It's maybe not as great as I remember thinking it was when I was a little kid, but it still holds up to some degree.\r\n\r\n_Final rating:★★★ - I liked it. Would personally recommend you give it a ...
7443    A very good stop-motion animation!\r\n\r\n<em>'Chicken Run'
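As written, `X` and `y` still contain the 6,231 reviews whose `ratings` value is `None`, meaning no rating or a rating strictly between 4 and 9. `value_counts` hides them because it drops missing values by default, and the split below covers all 8,650 rows (6,055 + 1,297 + 1,298). A minimal sketch of dropping the unlabeled rows before defining `X` and `y` follows. It uses a made-up toy frame that reuses this notebook's column names and grouping rule.

```python
import numpy as np
import pandas as pd


def create_groups(x):
    """Same grouping rule as above: >= 9 is high, <= 4 is low, else None."""
    if x >= 9:
        return "High_Rating"
    elif x <= 4:
        return "Low_Rating"
    return None


# Toy stand-in for the review DataFrame (values are illustrative only)
toy = pd.DataFrame({
    "review": ["loved it", "it was fine", "awful", "no score given"],
    "rating": [9.0, 6.0, 3.0, np.nan],
})
toy["ratings"] = toy["rating"].map(create_groups)

# Keep only rows that received a group label before defining X and y
labeled = toy.dropna(subset=["ratings"])
X_toy, y_toy = labeled["review"], labeled["ratings"]
print(y_toy.tolist())  # ['High_Rating', 'Low_Rating']
```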

In [24]:
y.value_counts(normalize=True)

Low_Rating     0.505994
High_Rating    0.494006
Name: ratings, dtype: float64

In [55]:
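# Split data into train, validation, and test sets (70/15/15)
# NOTE: X_train/y_train come from a separate 80/20 split. If fn.train_test_split is
# scikit-learn's, the shared random_state means those rows overlap X_val/X_test below.
X_train, X_test, y_train, y_test = fn.train_test_split(X, y, test_size=0.2, random_state=42)
X_train_full, X_test, y_train_full, y_test = fn.train_test_split(X, y, test_size=.3, random_state=42)
X_val, X_test, y_val, y_test = fn.train_test_split(X_test, y_test, test_size=.5, random_state=42)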
(len(X_train_full), len(X_val), len(X_test))

(6055, 1297, 1298)
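A two-stage split can keep the training, validation, and test sets fully disjoint, so that no validation or test review is ever seen during training. The sketch below uses scikit-learn's `train_test_split`. The `three_way_split` helper and the `stratify` arguments are additions, not part of the original notebook, and the toy arrays are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, test_size=0.3, random_state=42):
    """Split into train / val / test with no shared rows.

    test_size is the total fraction held out; it is then halved into
    validation and test sets, mirroring the 70/15/15 split above.
    """
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=0.5, random_state=random_state, stratify=y_hold)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Toy data: 20 rows with balanced labels (illustrative only)
X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.array(["High_Rating", "Low_Rating"] * 10)
X_tr, X_va, X_te, y_tr, y_va, y_te = three_way_split(X_demo, y_demo)
print(len(X_tr), len(X_va), len(X_te))  # 14 3 3
```

Using `X_train`/`y_train` from the first stage for every model fit keeps the validation and test scores honest.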

In [26]:
# Check class balance
y_train_full.value_counts(normalize=True)

Low_Rating     0.509953
High_Rating    0.490047
Name: ratings, dtype: float64

In [56]:
y_train.value_counts()

Low_Rating     998
High_Rating    946
Name: ratings, dtype: int64

In [37]:
## Instantiate CountVectorizer
countvector = fn.CountVectorizer()  # min_df=3, ngram_range=(1,2)
countvector.fit(X_train_full)

# Transform X_train_full to see the result (for demo only)
countvector.transform(X_train_full)

<6055x42617 sparse matrix of type '<class 'numpy.int64'>'
	with 809689 stored elements in Compressed Sparse Row format>

In [None]:
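# NOTE: count_pipe is not defined in this notebook; recreate your best Part 6 pipeline before running this cell
fn.evaluate_classification(count_pipe, X_train, y_train, X_val, y_val)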
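If your best Part 6 model was a vectorizer plus a classifier, it can be rebuilt as a scikit-learn `Pipeline` and assigned to `count_pipe`. The sketch below uses `MultinomialNB` as a stand-in classifier; swap in whichever model won in Part 6. It also uses scikit-learn's `classification_report` in place of `fn.evaluate_classification`, whose implementation is not shown here. The training texts are toy examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Stand-in pipeline: MultinomialNB is an assumption, not necessarily the Part 6 winner
count_pipe = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# Toy labeled reviews (illustrative only)
X_toy = ["great fun loved it", "wonderful great acting",
         "terrible boring awful", "awful plot hated it"]
y_toy = ["High_Rating", "High_Rating", "Low_Rating", "Low_Rating"]
count_pipe.fit(X_toy, y_toy)

preds = count_pipe.predict(["great acting"])
print(preds)  # ['High_Rating']
print(classification_report(y_toy, count_pipe.predict(X_toy)))
```

Because the vectorizer lives inside the pipeline, the saved model accepts raw text directly, which keeps the Streamlit app simple.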

- ### For your Deep NLP model:
    - Save your training data (train_ds)
    - Save your test data (test_ds)
    - Save your best neural network.
        - `Reminder:` use save_format='tf' to save the model in a folder of repo-friendly files.

In [57]:
import json
with open('config/filepaths.json') as f:
    FPATHS = json.load(f)

FPATHS.keys()

dict_keys(['data', 'models', 'images'])

In [58]:
FPATHS

{'data': {'raw': {'full': 'Data-NLP/movie_reviews_v2.csv',
   'eda': 'Data-NLP/movie_reviews_v2.csv'},
  'ml': {'train': 'Data-NLP/training-data.joblib',
   'test': 'Data-NLP/testing-data.joblib'}},
 'models': {'linear_regression': 'Models/linear_regression/linreg.joblib',
  'random_forest': 'Models/random_forest/rf_reg.joblib'},
 'images': {'banner': 'Images/IMDB.png'}}

In [59]:
def load_data(fpath):
    df = pd.read_csv(fpath)
    df = df.set_index("movie_id")
    return df

In [60]:
import joblib
def load_Xy_data(fpath):
    """Load a saved [X, y] pair, e.g. FPATHS['data']['ml']['train']."""
    return joblib.load(fpath)

# Part 2: Streamlit App

### You will create a Streamlit app to get model predictions for user-entered text. 

### You may select `either your best machine learning model or deep NLP model.` **Note:** for portfolio purposes, it would be best to eventually create an app for both.
- #### Create a Streamlit app for getting predictions for user-entered text from your loaded model.
- #### (Optional but recommended): Include a LIME `LimeTextExplainer` explanation for the prediction.
- #### Include the ability to load the training and test data to evaluate the model.
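The core of the app is a function that takes the user's text, runs it through the loaded model, and returns the public-facing label. The sketch below keeps that logic free of Streamlit so it can be tested on its own. The `make_prediction` name, the toy pipeline, and the lookup strings are assumptions. In the app script, this function would typically be called with the text from `st.text_area` when an `st.button` is clicked, and the result shown with `st.write` or `st.markdown`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def make_prediction(text, model, lookup=None):
    """Return the public-facing label for one user-entered review.

    model: any fitted scikit-learn pipeline that accepts raw text.
    lookup: optional dict mapping raw class labels to display labels.
    """
    raw_label = model.predict([text])[0]  # predict expects a list of documents
    if lookup is None:
        return raw_label
    return lookup.get(raw_label, raw_label)


# Toy model standing in for the model loaded from FPATHS (illustrative only)
model = Pipeline([("vectorizer", CountVectorizer()), ("clf", MultinomialNB())])
model.fit(["loved it, great fun", "awful and boring"], ["High_Rating", "Low_Rating"])

label = make_prediction("great fun", model,
                        {"High_Rating": "High rating", "Low_Rating": "Low rating"})
print(label)  # High rating
```

In the real app, the model would come from `joblib.load(FPATHS['models'][...])`, and the same loaded training and test data could be passed to an evaluation function behind a separate button.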