# Project 2 - Part 7 (Core)

For this project, you will create a Streamlit app that gets predictions from your best model.

### Deliverables:
- New notebook file for preparing for Streamlit
- Models and data saved
- app.py in main folder of repo

## Imports

In [31]:
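# packages (pandas, matplotlib, and sklearn.metrics are imported in the Part 1 imports cell)
import joblib
import tensorflow as tf
import streamlit as st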

In [34]:
# functions

# Define the load train or test data function with caching
@st.cache_data
def load_Xy_data(fpath):
    return joblib.load(fpath)
    
@st.cache_resource
def load_model_ml(fpath):
    return joblib.load(fpath)

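# Keras models are shared, non-data resources, so cache them with cache_resource
# (cache_data stores a serialized copy of the return value)
@st.cache_resource
def load_network(fpath):
    model = tf.keras.models.load_model(fpath)
    return model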

@st.cache_data
def load_lookup(fpath=FPATHS['Data']['ml']['target_lookup']):
    return joblib.load(fpath)

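def predict_decode_deep(X_to_pred, network, lookup_dict,
                        return_index=True):
    """Predict the class name for a single text (or a batch) with the neural network.
    Note: return_index is currently unused.
    """
    # Wrap a single string in a list so the network receives a batch
    if isinstance(X_to_pred, str):
        X = [X_to_pred]
    else:
        X = X_to_pred

    # Predicted probabilities for each class
    pred_probs = network.predict(X)

    # Convert probabilities to integer class labels
    pred_class = fn.convert_y_to_sklearn_classes(pred_probs)

    # Decode the first prediction into its class name
    class_name = lookup_dict[pred_class[0]]

    return class_name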


def classification_metrics_streamlit(y_true, y_pred, label='',
                           figsize=(8,4),
                           normalize='true', cmap='Blues',
                           colorbar=False,values_format=".2f",
                                    class_names=None):
    """Modified version of classification metrics function from Intro to Machine Learning.
    Updates:
    - Reversed raw counts confusion matrix cmap  (so darker==more).
    - Added arg for normalized confusion matrix values_format
    """
    # Get the classification report
    report = classification_report(y_true, y_pred,target_names=class_names)
    
    ## Save header and report
    header = "-"*70
    final_report = "\n".join([header,f" Classification Metrics: {label}", header,report,"\n"])
        
    ## CONFUSION MATRICES SUBPLOTS
    fig, axes = plt.subplots(ncols=2, figsize=figsize)
    
    # Create a confusion matrix  of raw counts (left subplot)
    ConfusionMatrixDisplay.from_predictions(y_true, y_pred,
                                            normalize=None, 
                                            cmap='gist_gray_r',# Updated cmap
                                            display_labels = class_names, # Added display labels
                                            values_format="d", 
                                            colorbar=colorbar,
                                            ax = axes[0]);
    axes[0].set_title("Raw Counts")
    
    # Create a confusion matrix with the data with normalize argument 
    ConfusionMatrixDisplay.from_predictions(y_true, y_pred,
                                            normalize=normalize,
                                            cmap=cmap, 
                                            values_format=values_format, #New arg
                                            display_labels = class_names, # Added display labels
                                            colorbar=colorbar,
                                            ax = axes[1]);
    axes[1].set_title("Normalized Confusion Matrix")
    
    # Adjust layout and show figure
    fig.tight_layout()

    return final_report, fig

def classification_metrics_streamlit_tensorflow(model,X_train=None, y_train=None, 
                                                label='Training Data',
                                    figsize=(6,4), normalize='true',
                                    output_dict = False,
                                    cmap_train='Blues',
                                    cmap_test="Reds",
                                    values_format=".2f", 
                                                class_names = None,
                                    colorbar=False):
    
    ## Check if X_train is a dataset
    if hasattr(X_train,'map'):
        # If it IS a Dataset:
        # extract y_train and y_train_pred with helper function
        y_train, y_train_pred = fn.get_true_pred_labels(model, X_train)
    else:
        # Get predictions for training data
        y_train_pred = model.predict(X_train)

    ## Pass both y-vars through helper compatibility function
    y_train = fn.convert_y_to_sklearn_classes(y_train)
    y_train_pred = fn.convert_y_to_sklearn_classes(y_train_pred)
    
    # Call the helper function to obtain the classification report and confusion matrices
    report, conf_mat = classification_metrics_streamlit(y_train, y_train_pred, 
                                                        figsize=figsize,
                                         colorbar=colorbar, cmap=cmap_train, 
                                                        values_format=values_format,label=label,
                                                       class_names=class_names)
    return report, conf_mat

X_train, y_train = load_Xy_data(fpath=FPATHS['Data']['ml']['train'])

X_test, y_test = load_Xy_data(fpath=FPATHS['Data']['ml']['test'])

def get_X_to_predict():
    X_to_predict = pd.DataFrame({'bedrooms': selected_beds,
                             'bathrooms': selected_baths, 
                             'sqft_lot': selected_lot},
                               index=['House'])
    return X_to_predict
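
`predict_decode_deep` above relies on `fn.convert_y_to_sklearn_classes`, which comes from a helper module not shown here. As a rough illustration of what that kind of step typically does (not the helper's actual code), the sketch below turns predicted probabilities into integer class indices and decodes the first one with a lookup dictionary. The function names `probs_to_class_index` and `decode_prediction` are hypothetical.

```python
import numpy as np


def probs_to_class_index(pred_probs):
    """Convert predicted probabilities to integer class indices.

    Assumes shape (n_samples, n_classes) for softmax output, or
    (n_samples, 1) / (n_samples,) for a single sigmoid unit.
    """
    pred_probs = np.asarray(pred_probs)
    if pred_probs.ndim == 2 and pred_probs.shape[1] > 1:
        # Multiclass: pick the most probable column for each row
        return np.argmax(pred_probs, axis=1)
    # Binary: threshold the single probability at 0.5
    return (pred_probs.ravel() >= 0.5).astype(int)


def decode_prediction(pred_probs, lookup_dict):
    """Map the first sample's predicted index to its class name."""
    class_index = int(probs_to_class_index(pred_probs)[0])
    return lookup_dict[class_index]
```

Casting the index to a plain `int` keeps the dictionary lookup independent of the NumPy integer type returned by the model.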



# Part 1: Preparing Best Models for Streamlit

- Define a filepaths dictionary that includes a file path for each component you will save (review below), and save it to config/filepaths.json.

- Copy your best models from part 6 into the new notebook. 

- Update your code to define the final public-facing class labels. 

## Saving Your Models

### For your Machine Learning model:

- Save your training data ([X_train, y_train])

- Save your test data ([X_test, y_test])

- Save your target_lookup dictionary and/or your label encoder

- Save your best model
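
The machine learning items above can all be saved the same way with joblib. The snippet below is a minimal sketch, assuming each object is picklable; `save_artifact` is a hypothetical helper name, and the paths in the usage comments are the ones defined in the `FPATHS` dictionary later in this notebook.

```python
import os

import joblib


def save_artifact(obj, fpath):
    """Dump any picklable object with joblib, creating the parent folder if needed."""
    folder = os.path.dirname(fpath)
    if folder:
        os.makedirs(folder, exist_ok=True)
    joblib.dump(obj, fpath)
    return fpath


# Usage in the notebook (variables come from the Part 6 modeling code):
# save_artifact([X_train, y_train], FPATHS["Data"]["ml"]["train"])
# save_artifact([X_test, y_test], FPATHS["Data"]["ml"]["test"])
# save_artifact(target_lookup, FPATHS["Data"]["ml"]["target_lookup"])
# save_artifact(best_clf, FPATHS["Models"]["clf"])   # best_clf is a placeholder name
```

Saving the train and test splits as `[X, y]` lists matches how `load_Xy_data` unpacks them (`X_train, y_train = load_Xy_data(...)`).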

### For your Deep NLP model:

- Save your training data (train_ds)

- Save your test data (test_ds)

- Save your best neural network.

Reminder: use save_format='tf' to save the model in a folder of repo-friendly files.

## Imports

In [None]:
# %load_ext autoreload
# %autoreload 2

In [11]:
# packages
import pandas as pd
import joblib
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
import spacy
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
from sklearn.metrics import ConfusionMatrixDisplay, classification_report
import tensorflow as tf
from pprint import pprint
# Increase column width
pd.set_option('display.max_colwidth', 250)

In [12]:
# functions

# ChatGPT function from LP
import os
def create_directories_from_paths(nested_dict):
    """OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com 
    Recursively create directories for file paths in a nested dictionary.
    Parameters:
    nested_dict (dict): The nested dictionary containing file paths.
    """
    for key, value in nested_dict.items():
        if isinstance(value, dict):
            # If the value is a dictionary, recurse into it
            create_directories_from_paths(value)
        elif isinstance(value, str):
            # If the value is a string, treat it as a file path and get the directory path
            directory_path = os.path.dirname(value)
            # If the directory path is not empty and the directory does not exist, create it
            if directory_path and not os.path.exists(directory_path):
                os.makedirs(directory_path)
                print(f"Directory created: {directory_path}")

#### Define a filepaths dictionary that includes a file path for each component you will save (review below), and save it to config/filepaths.json.

In [24]:
FPATHS = dict(

    Data={
        "ml":{
            "train":"Data/training_data_ml.joblib",
            "test":"Data/test_data_ml.joblib",
            "target_lookup":"Data/target_lookup.joblib",
            "label_encoder":"Data/label_encoder.joblib",
         },
        "tf":{
            "train_tf":"Data/train_ds.joblib",
            "test_tf":"Data/test_ds.joblib",
        },
    },

    Models={
        "clf":"Models/clf.joblib",
        "gru":"Models/gru.joblib"}
)
pprint(FPATHS)

{'Data': {'ml': {'label_encoder': 'Data/label_encoder.joblib',
                 'target_lookup': 'Data/target_lookup.joblib',
                 'test': 'Data/test_data_ml.joblib',
                 'train': 'Data/training_data_ml.joblib'},
          'tf': {'test_tf': 'Data/test_ds.joblib',
                 'train_tf': 'Data/train_ds.joblib'}},
 'Models': {'clf': 'Models/clf.joblib', 'gru': 'Models/gru.joblib'}}


In [25]:
# Use the function on your FPATHS dictionary
create_directories_from_paths(FPATHS)

In [26]:
## Save the filepaths
import os, json
os.makedirs('config/', exist_ok=True)
FPATHS_FILE = 'config/filepaths.json'
with open(FPATHS_FILE, 'w') as f:
    json.dump(FPATHS, f)
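
Once the JSON file is written, app.py can read it back instead of redefining the dictionary. The following is a small sketch of that round trip using only the standard library; `load_filepaths` and `find_missing_paths` are hypothetical helper names, the latter added as an optional check that every saved component exists before the app starts.

```python
import json
import os


def load_filepaths(fpath="config/filepaths.json"):
    """Read the nested filepaths dictionary back from JSON."""
    with open(fpath) as f:
        return json.load(f)


def find_missing_paths(nested_dict):
    """Recursively collect every path string that does not exist on disk."""
    missing = []
    for value in nested_dict.values():
        if isinstance(value, dict):
            # Nested section (e.g. "Data" -> "ml"): recurse into it
            missing.extend(find_missing_paths(value))
        elif isinstance(value, str) and not os.path.exists(value):
            missing.append(value)
    return missing


# Usage in app.py:
# FPATHS = load_filepaths()
# print(find_missing_paths(FPATHS))
```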

#### Copy your best models from part 6 into the new notebook. 

#### Update your code to define the final public-facing class labels. (Updated in Functions)

In [29]:
FPATHS

{'Data': {'ml': {'train': 'Data/training_data_ml.joblib',
   'test': 'Data/test_data_ml.joblib',
   'target_lookup': 'Data/target_lookup.joblib',
   'label_encoder': 'Data/label_encoder.joblib'},
  'tf': {'train_tf': 'Data/train_ds.joblib',
   'test_tf': 'Data/test_ds.joblib'}},
 'Models': {'clf': 'Models/clf.joblib', 'gru': 'Models/gru.joblib'}}

# Part 2: Streamlit App

- Select either your best machine learning model or your deep NLP model.

- Create a Streamlit app that gets predictions from your loaded model for user-entered text.

- Include the ability to load the training and test data to evaluate the model.
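
One possible shape for app.py, using the machine learning model, is sketched below. It assumes the saved classifier is a scikit-learn pipeline that accepts raw text and predicts encoded labels, and that `target_lookup` maps those labels to class names. The names `predict_label` and `run_app`, the page title, and the widget labels are all hypothetical.

```python
import json


def predict_label(text, model, lookup):
    """Return the public-facing class name predicted for a single text."""
    # scikit-learn text pipelines expect a list of documents
    encoded = model.predict([text])[0]
    # Fall back to the raw prediction if it is not in the lookup
    return lookup.get(encoded, encoded)


def run_app(fpaths_file="config/filepaths.json"):
    """Build the Streamlit page. Imports are local so predict_label works without Streamlit."""
    import joblib
    import streamlit as st
    from sklearn.metrics import classification_report

    with open(fpaths_file) as f:
        fpaths = json.load(f)

    @st.cache_resource
    def load_model(fpath):
        return joblib.load(fpath)

    @st.cache_data
    def load_data(fpath):
        return joblib.load(fpath)

    model = load_model(fpaths["Models"]["clf"])
    lookup = load_data(fpaths["Data"]["ml"]["target_lookup"])

    st.title("Text Classification")
    text = st.text_area("Enter text to classify:")
    if st.button("Get prediction") and text.strip():
        st.markdown(f"Predicted class: **{predict_label(text, model, lookup)}**")

    # Optional evaluation on the saved test split
    if st.checkbox("Evaluate on test data"):
        X_test, y_test = load_data(fpaths["Data"]["ml"]["test"])
        st.text(classification_report(y_test, model.predict(X_test)))


# In app.py, finish with a call to run_app(); it is left commented out here
# so the block can be imported without launching the page.
# run_app()
```

Keeping the prediction logic in a plain function makes it easy to check with a stub model before wiring it to the widgets. With the call enabled, the app would be started from the repo's main folder with `streamlit run app.py`.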