## How to aggregate results across different systems

We are going to load the classifiers that we saved earlier, use them to make predictions on some texts, and aggregate the results in a pandas data frame to get an overview across systems.

In [2]:
import pickle
import pandas as pd
import lab5_util as util

In [4]:
# some utterances
some_chat = ['That is sweet of you',
             'You are so funny',
             'Are you a man or a woman?',
             'Chatbots make me sad and feel lonely.',
             'Your are stupid and boring.',
             'Two thumbs up',
             'I fell asleep halfway through this conversation',
             'Wow, I am really amazed.',
             'You are amazing.',
             'I feel so low being in isolation',
             'People dumping waste are horrible',
             'Its awful that you cannot stop smoking',
             'Dogs scare me',
             'I am afraid I will get sick at work',
             'I run away when I see a dog',
             'When do you start your job?'
             ]

some_chat_gold_labels = ['joy', 'joy', 'neutral', 'sadness', 'anger', 'joy', 'anger', 'surprise', 'joy', 'sadness', 'disgust', 'disgust', 'fear', 'fear', 'fear', 'neutral']

### Loading a BoW model

In [6]:
# the countvectorizer
filename_vectorizer = '../lab3.machine_learning/models/utterance_vec.sav'
# the tfidf transformer
filename_transformer = '../lab3.machine_learning/models/utterance_transf.sav'
# the classifier
filename_classifier = '../lab3.machine_learning/models/svm_linear_clf_bow.sav'

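# Load each pickled object; the context manager closes the file afterwards
with open(filename_classifier, 'rb') as f:
    loaded_bow_classifier = pickle.load(f)
with open(filename_vectorizer, 'rb') as f:
    loaded_vectorizer = pickle.load(f)
with open(filename_transformer, 'rb') as f:
    loaded_transformer = pickle.load(f)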
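
The `.sav` files loaded here were written in an earlier lab that is not shown in this notebook. As a minimal sketch of the save and load round trip with `pickle`, the example below uses a plain dictionary as a stand-in for a fitted model and a temporary directory instead of the lab paths; both are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

# Stand-in object; in the lab this would be a fitted vectorizer or classifier
stand_in_model = {"labels": ["joy", "anger"], "kind": "stand-in for a fitted model"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.sav")

    # Save: open in binary write mode and dump the object
    with open(path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # Load: open in binary read mode and restore the object
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in_model)
```

Only unpickle files that come from a source you trust, since loading a pickle can execute arbitrary code.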

### Converting the test data to get the vector representations

In [7]:
counts_from_loaded_model = loaded_vectorizer.transform(some_chat)
some_chat_tfidf = loaded_transformer.transform(counts_from_loaded_model)

### Getting the predictions

In [9]:
pred_from_bow_classifier = loaded_bow_classifier.predict(some_chat_tfidf)
for chat, gold, pred in zip(some_chat, some_chat_gold_labels, pred_from_bow_classifier):
    print(chat, gold, pred)


That is sweet of you joy joy
You are so funny joy joy
Are you a man or a woman? neutral neutral
Chatbots make me sad and feel lonely. sadness sadness
Your are stupid and boring. anger anger
Two thumbs up joy neutral
I fell asleep halfway through this conversation anger neutral
Wow, I am really amazed. surprise surprise
You are amazing. joy joy
I feel so low being in isolation sadness anger
People dumping waste are horrible disgust neutral
Its awful that you cannot stop smoking disgust neutral
Dogs scare me fear anger
I am afraid I will get sick at work fear neutral
I run away when I see a dog fear neutral
When do you start your job? neutral neutral
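
The printed comparison can be summarised by counting the matches per gold label. The sketch below hard-codes the gold labels and the BoW predictions from the output above so that it runs on its own; the variable names `label_totals` and `label_hits` are illustrative and not part of the lab code.

```python
from collections import Counter

gold = ['joy', 'joy', 'neutral', 'sadness', 'anger', 'joy', 'anger', 'surprise',
        'joy', 'sadness', 'disgust', 'disgust', 'fear', 'fear', 'fear', 'neutral']
bow_pred = ['joy', 'joy', 'neutral', 'sadness', 'anger', 'neutral', 'neutral', 'surprise',
            'joy', 'anger', 'neutral', 'neutral', 'anger', 'neutral', 'neutral', 'neutral']

# How often each gold label occurs, and how often it was predicted correctly
label_totals = Counter(gold)
label_hits = Counter(g for g, p in zip(gold, bow_pred) if g == p)

for label in sorted(label_totals):
    print(f"{label}: {label_hits[label]}/{label_totals[label]} correct")

# Overall accuracy: the fraction of utterances with a matching prediction
accuracy = sum(label_hits.values()) / len(gold)
print(f"accuracy: {accuracy:.2f}")
```

On these 16 utterances the BoW classifier gets half of the labels right, and none of the `disgust` or `fear` examples.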


## Embedding model

### Loading an embedding model

In [11]:
# the classifier
filename_classifier = '../lab3.machine_learning/models/svm_linear_clf_embeddings.sav'
# the frequent keywords
filename_freq_keywords = '../lab3.machine_learning/models/frequent_keywords.sav'
loaded_embedding_classifier = pickle.load(open(filename_classifier, 'rb'))
loaded_frequent_keywords = pickle.load(open(filename_freq_keywords, 'rb'))

### Representing the test data

In [12]:
from gensim.models.word2vec import Word2Vec
import gensim.downloader as api

wordembeddings = "glove-twitter-25"
# This model has 25 dimensions, so we set the number of features to 25
num_features = 25

word_embedding_model = api.load(wordembeddings)
print(num_features)

25


In [13]:
import nltk

def tokenize_data(text):
    # Tokenize each utterance into a list of word tokens
    text_tokens = []
    for utterance in text:
        text_tokens.append(nltk.tokenize.word_tokenize(utterance))
    return text_tokens

In [15]:
import nltk
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
index2word_set = set(word_embedding_model.index_to_key)


some_chat_tokens = tokenize_data(some_chat)
some_chat_embedding_vectors = util.getAvgFeatureVecs(
    some_chat_tokens, loaded_frequent_keywords, stop_words,
    word_embedding_model, index2word_set, num_features)

Shape of our matrix is: (16, 25)
Review 0 of 16
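
The implementation of `util.getAvgFeatureVecs` is not shown in this notebook. The printed shape `(16, 25)` suggests one 25-dimensional vector per utterance, and a common way to obtain such a vector is to average the embeddings of the known tokens. The sketch below illustrates that idea with a toy 3-dimensional embedding table; `average_vector` and `toy_embeddings` are hypothetical names, and the stop word and keyword filtering of the lab utility is left out.

```python
import numpy as np

# Toy 3-dimensional embeddings standing in for a real model such as glove-twitter-25
toy_embeddings = {
    "dogs": np.array([1.0, 0.0, 2.0]),
    "scare": np.array([3.0, 2.0, 0.0]),
}

def average_vector(tokens, embeddings, num_features):
    """Average the vectors of all tokens found in `embeddings` (hypothetical helper)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No token is in the vocabulary: fall back to a zero vector
        return np.zeros(num_features)
    return np.mean(known, axis=0)

# "me" is not in the toy vocabulary, so only two vectors are averaged
vec = average_vector(["dogs", "scare", "me"], toy_embeddings, num_features=3)
print(vec)
```

Tokens missing from the vocabulary are skipped, so an utterance made up entirely of unknown words ends up as a zero vector in this sketch.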


### Making the predictions

In [17]:
pred_from_embedding_classifier = loaded_embedding_classifier.predict(some_chat_embedding_vectors[0])

for chat, gold, pred in zip(some_chat, some_chat_gold_labels, pred_from_embedding_classifier):
    print(chat, gold, pred)


That is sweet of you joy joy
You are so funny joy surprise
Are you a man or a woman? neutral neutral
Chatbots make me sad and feel lonely. sadness neutral
Your are stupid and boring. anger neutral
Two thumbs up joy neutral
I fell asleep halfway through this conversation anger neutral
Wow, I am really amazed. surprise neutral
You are amazing. joy joy
I feel so low being in isolation sadness neutral
People dumping waste are horrible disgust neutral
Its awful that you cannot stop smoking disgust neutral
Dogs scare me fear neutral
I am afraid I will get sick at work fear neutral
I run away when I see a dog fear neutral
When do you start your job? neutral neutral
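
A quick way to see how the embedding classifier behaves on these utterances is to count how often each label is predicted. The sketch below copies the predictions from the output above so that it runs on its own.

```python
from collections import Counter

embedding_pred = ['joy', 'surprise', 'neutral', 'neutral', 'neutral', 'neutral',
                  'neutral', 'neutral', 'joy', 'neutral', 'neutral', 'neutral',
                  'neutral', 'neutral', 'neutral', 'neutral']

# Count the predicted labels, most frequent first
label_counts = Counter(embedding_pred)
for label, count in label_counts.most_common():
    print(label, count)
```

The counts show that 13 of the 16 utterances are labelled `neutral`, which matches the many mismatches with the gold labels in the printed comparison.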


## Putting the results together in a pandas frame

In [18]:
result_frame = pd.DataFrame()

# Add columns for the chat text, the gold labels and the predictions of both systems
result_frame['Chat']=some_chat
result_frame['Gold']=some_chat_gold_labels

result_frame['Bow Prediction']=pred_from_bow_classifier
result_frame['Embedding Prediction']=pred_from_embedding_classifier

result_frame

| | Chat | Gold | Bow Prediction | Embedding Prediction |
|---|---|---|---|---|
| 0 | That is sweet of you | joy | joy | joy |
| 1 | You are so funny | joy | joy | surprise |
| 2 | Are you a man or a woman? | neutral | neutral | neutral |
| 3 | Chatbots make me sad and feel lonely. | sadness | sadness | neutral |
| 4 | Your are stupid and boring. | anger | anger | neutral |
| 5 | Two thumbs up | joy | neutral | neutral |
| 6 | I fell asleep halfway through this conversation | anger | neutral | neutral |
| 7 | Wow, I am really amazed. | surprise | surprise | neutral |
| 8 | You are amazing. | joy | joy | joy |
| 9 | I feel so low being in isolation | sadness | anger | neutral |
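
Once the predictions sit side by side in one frame, a per-system overview takes only a few lines of pandas. The sketch below rebuilds the label columns from the printed outputs of both classifiers (all 16 utterances) so that it runs on its own; the names `frame`, `summary` and `disagreements` are illustrative and not part of the notebook.

```python
import pandas as pd

frame = pd.DataFrame({
    'Gold': ['joy', 'joy', 'neutral', 'sadness', 'anger', 'joy', 'anger', 'surprise',
             'joy', 'sadness', 'disgust', 'disgust', 'fear', 'fear', 'fear', 'neutral'],
    'Bow Prediction': ['joy', 'joy', 'neutral', 'sadness', 'anger', 'neutral', 'neutral',
                       'surprise', 'joy', 'anger', 'neutral', 'neutral', 'anger',
                       'neutral', 'neutral', 'neutral'],
    'Embedding Prediction': ['joy', 'surprise', 'neutral', 'neutral', 'neutral', 'neutral',
                             'neutral', 'neutral', 'joy', 'neutral', 'neutral', 'neutral',
                             'neutral', 'neutral', 'neutral', 'neutral'],
})

system_columns = ['Bow Prediction', 'Embedding Prediction']

# Accuracy per system: the share of rows where the prediction equals the gold label
summary = pd.Series(
    {col: (frame[col] == frame['Gold']).mean() for col in system_columns},
    name='Accuracy',
)
print(summary)

# Rows on which the two systems disagree with each other
disagreements = frame[frame['Bow Prediction'] != frame['Embedding Prediction']]
print(len(disagreements))
```

Adding another system then only requires a new prediction column and one more entry in `system_columns`.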


## End of notebook