# TAD_WEEK_11_Broker_Carl

### 1) Describe how each of the following metrics can be used to assess a Naive Bayes or other types of classifiers:

(Where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.)

A. Accuracy

<img src="../images/accuracy.png">

    source: https://developers.google.com/machine-learning/crash-course/classification/accuracy

    The Accuracy metric is useful because it gives us an objective measure of the model's overall performance: the fraction of all predictions, positive and negative, that were correct.

B. Precision

<img src="../images/precision.png">

    source: https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall

    The Precision metric is also useful because it gives us a way to measure the proportion of positive identifications that were actually correct. 

C. Recall

<img src="../images/recall.png">

    source: https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall

    The Recall metric measures the proportion of actual positives that were identified correctly.
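
The three definitions above can be written directly in terms of the four counts. This is only a minimal sketch: the function name `basic_metrics` and the example counts are made up for illustration and are not taken from the homework data.

```python
def basic_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from the four confusion counts."""
    total = tp + tn + fp + fn
    # accuracy: share of all predictions that were right
    accuracy = (tp + tn) / total if total else 0.0
    # precision: of everything predicted positive, how much really was positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # recall: of everything actually positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# hypothetical counts, chosen only to show that the three metrics can disagree
acc, pre, rec = basic_metrics(tp=8, tn=80, fp=2, fn=10)
print(f'accuracy={acc:.2%} precision={pre:.2%} recall={rec:.2%}')
```

With these made-up counts the accuracy looks high while recall is low, which is why it helps to report all three.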

### 2) Give an example for each of the three metrics above where you believe one would be better suited than the other two.

A. In video game reviews, it's hard to determine whether someone is trolling/memeing. This (fictitious) model claims to detect trolling/memeing with an accuracy of 4%.

B. If I raise my trolling/memeing classifier's threshold, how do I determine whether the adjustment was useful?

C. If I raise my trolling/memeing classifier's threshold and want to monitor the number of false positives.
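
To make the threshold scenarios concrete, here is a small sketch of how precision and recall move when the decision threshold is raised. The scores, labels, and the helper `precision_recall_at` are all hypothetical; they do not come from the game review data.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when score >= threshold is predicted positive."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# made-up "troll probability" scores and hand labels (True = trolling)
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, False, True, False, True, False]

for t in (0.5, 0.75):
    p, r = precision_recall_at(scores, labels, t)
    print(f'threshold={t}: precision={p:.2f} recall={r:.2f}')
```

On this toy data, raising the threshold removes the one false positive (precision goes up) but also misses more real trolls (recall goes down).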

### 3) Calculate accuracy, precision and recall for your Week 7 or 9 (Naive Bayes or SVM) homework.

In [1]:
import numpy as np
import pandas as pd

from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

df_game_reviews = pd.read_csv(r'../data/processed/game_reviews_processed.csv', low_memory=False)

# clean
# subset data that's hand-coded
def prune_cols(df):
    # keep only the needed cols, and only rows that have both a Sentiment
    # label (the hand-coded, English rows) and non-null review text
    cols = ['recommendationid', 'Sentiment', 'Emotion', 'review']
    keep = df['Sentiment'].notnull() & df['review'].notnull()
    return df.loc[keep, cols].copy()

df_game_reviews = prune_cols(df_game_reviews)

# clean
# encode emotions as an integer code and as one-hot cols

def code_cols(df):
    # integer code: Joy=1, Anger=2, Fear=3, anything else=4
    codes = {'Joy': 1, 'Anger': 2, 'Fear': 3}
    df['emotion_code'] = df['Emotion'].map(codes).fillna(4).astype('int64')

    # one-hot cols derived from the code (code 4 is treated as sadness)
    df['emotion_joy'] = (df['emotion_code'] == 1).astype('int64')
    df['emotion_fear'] = (df['emotion_code'] == 3).astype('int64')
    df['emotion_anger'] = (df['emotion_code'] == 2).astype('int64')
    df['emotion_sadness'] = (df['emotion_code'] == 4).astype('int64')

    return df

code_cols(df_game_reviews)

#vectorize
vectorizer = CountVectorizer(stop_words='english')

#build features
all_features = vectorizer.fit_transform(df_game_reviews['review'])


#split up dataset
X_train, X_test, y_train, y_test = train_test_split(all_features, df_game_reviews.emotion_code, 
                                                    test_size=0.3, random_state=42)

#call model 
classifier = MultinomialNB()

#fit model
classifier.fit(X_train, y_train)

MultinomialNB()

### Accuracy, Precision, Recall

In [2]:
#calculate TP, FP, TN, FN
y_pred = classifier.predict(X_test)
y_true = list(y_test)
y_hat = list(y_pred)


def perf_measure(y_true, y_hat):
    # class 1 (Joy) counts as positive; TN and FN only count label 0,
    # which emotion_code (1-4) never takes
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_true[i]==y_hat[i]==1:
            TP += 1
        if y_hat[i]==1 and y_true[i]!=y_hat[i]:
            FP += 1
        if y_true[i]==y_hat[i]==0:
            TN += 1
        if y_hat[i]==0 and y_true[i]!=y_hat[i]:
            FN += 1

    return(TP, FP, TN, FN)

perf_measure(y_true, y_hat)

(48, 34, 0, 0)
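
Because `emotion_code` takes the values 1 to 4, the `== 0` comparisons in `perf_measure` never match, which is why TN and FN come out as 0. One way to get all four counts in a multiclass setting is to treat a single class as positive and every other class as negative (one-vs-rest). The helper name `one_vs_rest_counts` and the toy labels below are hypothetical, a sketch rather than the method used in this homework.

```python
def one_vs_rest_counts(y_true, y_hat, positive):
    """Count TP, FP, TN, FN with `positive` as the positive class
    and every other label treated as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_hat):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1      # predicted positive, actually another class
        elif truth == positive:
            fn += 1      # actually positive, predicted another class
        else:
            tn += 1      # neither predicted nor actually positive
    return tp, fp, tn, fn


# toy labels (made up): 1 = Joy, 2 = Anger, 3 = Fear, 4 = Sadness
toy_true = [1, 1, 2, 3, 4, 1, 2]
toy_hat = [1, 2, 1, 3, 1, 1, 2]
print(one_vs_rest_counts(toy_true, toy_hat, positive=1))
```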

In [3]:
# return count of correctly predicted docs
nr_correct = (y_test == classifier.predict(X_test)).sum()
print(f'{nr_correct} documents classified correctly')

# return incorrectly predicted docs
nr_incorrect = y_test.size - nr_correct
print(f'{nr_incorrect} documents classified incorrectly')

# return accuracy
acc = nr_correct / (y_test.size)
print(f'The (testing) Accuracy of the model is {acc:.2%}')

# return precision
pre = 48/(48+34)
print(f'The (testing) Precision of the model is {pre:.2%}')

# return Recall
rec = 48/(48+0)
print(f'The (testing) Recall of the model is {rec:.2%}')

51 documents classified correctly
37 documents classified incorrectly
The (testing) Accuracy of the model is 57.95%
The (testing) Precision of the model is 58.54%
The (testing) Recall of the model is 100.00%
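
Instead of typing the counts in by hand, the same metrics can be computed with `sklearn.metrics`. The sketch below uses made-up toy labels rather than the homework's `y_test`, and passing `average=None` to get one score per class is just one possible choice.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# toy labels (made up): 1 = Joy, 2 = Anger, 3 = Fear, 4 = Sadness
toy_true = [1, 1, 2, 3, 4, 1, 2]
toy_pred = [1, 2, 1, 3, 1, 1, 2]
class_labels = [1, 2, 3, 4]

acc = accuracy_score(toy_true, toy_pred)
# average=None returns one score per class, in the order of `class_labels`;
# zero_division=0 avoids a warning for classes that are never predicted
pre = precision_score(toy_true, toy_pred, labels=class_labels,
                      average=None, zero_division=0)
rec = recall_score(toy_true, toy_pred, labels=class_labels,
                   average=None, zero_division=0)

print(f'accuracy: {acc:.2%}')
for label, p, r in zip(class_labels, pre, rec):
    print(f'class {label}: precision={p:.2%} recall={r:.2%}')
```

In the notebook itself, `toy_true` and `toy_pred` would be replaced by `y_test` and `classifier.predict(X_test)`.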


### 4) Vary input data in at least three ways

(Eg: Drop/don't drop stop words, keep only common words, label more data) and compute accuracy, precision and recall each time. 

I varied the random seed used for the train/test split (42, 15, 87).

In [4]:
#ver 2:

#split data w/different random seed
X_train, X_test, y_train, y_test = train_test_split(all_features, df_game_reviews.emotion_code, 
                                                    test_size=0.3, random_state=15)

#fit new model
classifier.fit(X_train, y_train)

# calculate TP, FP, TN, FN
y_true = []
y_pred = classifier.predict(X_test)
y_hat = []

for item in y_test:
    y_true.append(item)
    
for item in y_pred:
    y_hat.append(item)
perf_measure(y_true, y_hat)

(53, 25, 0, 0)

In [5]:
# return count of correctly predicted docs
nr_correct = (y_test == classifier.predict(X_test)).sum()
print(f'{nr_correct} documents classified correctly')

# return incorrectly predicted docs
nr_incorrect = y_test.size - nr_correct
print(f'{nr_incorrect} documents classified incorrectly')

# return accuracy
acc = nr_correct / (y_test.size)
print(f'The (testing) Accuracy of the model is {acc:.2%}')

# return precision
pre = 53/(53+25)
print(f'The (testing) Precision of the model is {pre:.2%}')

# return Recall
rec = 53/(53+0)
print(f'The (testing) Recall of the model is {rec:.2%}')

58 documents classified correctly
30 documents classified incorrectly
The (testing) Accuracy of the model is 65.91%
The (testing) Precision of the model is 67.95%
The (testing) Recall of the model is 100.00%


In [6]:
#ver 3:

#split data w/different random seed
X_train, X_test, y_train, y_test = train_test_split(all_features, df_game_reviews.emotion_code, 
                                                    test_size=0.3, random_state=87)

#fit new model
classifier.fit(X_train, y_train)

# calculate TP, FP, TN, FN
y_true = []
y_pred = classifier.predict(X_test)
y_hat = []

for item in y_test:
    y_true.append(item)
    
for item in y_pred:
    y_hat.append(item)
perf_measure(y_true, y_hat)

(50, 28, 0, 0)

In [7]:
# return count of correctly predicted docs
nr_correct = (y_test == classifier.predict(X_test)).sum()
print(f'{nr_correct} documents classified correctly')

# return incorrectly predicted docs
nr_incorrect = y_test.size - nr_correct
print(f'{nr_incorrect} documents classified incorrectly')

# return accuracy
acc = nr_correct / (y_test.size)
print(f'The (testing) Accuracy of the model is {acc:.2%}')

# return precision
pre = 50/(50+28)
print(f'The (testing) Precision of the model is {pre:.2%}')

# return Recall
rec = 50/(50+0)
print(f'The (testing) Recall of the model is {rec:.2%}')

57 documents classified correctly
31 documents classified incorrectly
The (testing) Accuracy of the model is 64.77%
The (testing) Precision of the model is 64.10%
The (testing) Recall of the model is 100.00%


What worked the best? 

ver2

Why do you think it worked better?

With this seed, the class frequencies in the training and test sets are more evenly matched than with the other two random seeds.
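
One way to check this kind of claim is to put the class shares of the two splits side by side. The sketch below uses a made-up stand-in for `df_game_reviews.emotion_code`, and adds `stratify=` to `train_test_split`, which is an assumption on my part (the homework did not use it) that keeps the class shares similar regardless of the seed.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# toy stand-in for df_game_reviews.emotion_code (made-up values)
codes = pd.Series([1] * 12 + [2] * 4 + [3] * 2 + [4] * 2)
features = pd.DataFrame({'dummy': range(len(codes))})

# stratify=codes asks for roughly the same class shares in both splits
X_train, X_test, y_train, y_test = train_test_split(
    features, codes, test_size=0.3, random_state=15, stratify=codes)

# share of each class in train and test, one row per emotion code
dist = pd.DataFrame({
    'train': y_train.value_counts(normalize=True),
    'test': y_test.value_counts(normalize=True),
}).fillna(0).sort_index()
print(dist)
```

Running the same comparison without `stratify` for seeds 42, 15, and 87 would show how far apart the train and test shares drift for each seed.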

### 5) Super Bonus Question: Create and explain a confusion matrix for one of your model results.

In [9]:
confusion_matrix(y_true, y_hat)

array([[50,  0,  0,  1],
       [15,  4,  0,  1],
       [ 2,  1,  1,  0],
       [11,  0,  0,  2]], dtype=int64)

For binary classification the matrix is easy to read, but for my four-class model I couldn't wrap my head around interpreting it.
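
One possible reading, as a sketch: `confusion_matrix` puts the true class on the rows and the predicted class on the columns, in sorted label order, so here row/column 1 to 4 correspond to emotion codes 1 to 4. The class names below follow the coding in `code_cols` (with code 4 labelled Sadness); the per-class counts are derived one-vs-rest from the matrix printed above.

```python
import numpy as np

# the matrix printed above: rows = true class, columns = predicted class
cm = np.array([[50, 0, 0, 1],
               [15, 4, 0, 1],
               [2, 1, 1, 0],
               [11, 0, 0, 2]])
names = ['Joy', 'Anger', 'Fear', 'Sadness']  # emotion codes 1, 2, 3, 4

tp = np.diag(cm)              # diagonal: predicted class equals true class
fp = cm.sum(axis=0) - tp      # rest of the column: predicted as this class, actually another
fn = cm.sum(axis=1) - tp      # rest of the row: actually this class, predicted as another
tn = cm.sum() - tp - fp - fn  # everything not involving this class

for i, name in enumerate(names):
    precision = tp[i] / (tp[i] + fp[i]) if (tp[i] + fp[i]) else 0.0
    recall = tp[i] / (tp[i] + fn[i]) if (tp[i] + fn[i]) else 0.0
    print(f'{name}: TP={tp[i]} FP={fp[i]} FN={fn[i]} TN={tn[i]} '
          f'precision={precision:.2%} recall={recall:.2%}')
```

Reading the first row: of the 51 reviews hand-coded as Joy, 50 were predicted as Joy and 1 as Sadness. Reading the first column: 78 reviews were predicted as Joy but only 50 really were, so the model leans heavily toward predicting Joy. The Joy column also matches the TP and FP counts (50 and 28) returned by `perf_measure` for the third run.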