# Test Plan
Every algorithm needs test cases that it is expected to pass. For an algorithm that generates summaries, designing them can be difficult. For systems that translate from one known language to another there are established evaluation techniques available, but our project needs something different. Our solution is to generate test cases "on the fly" and prompt a user to reverse the given summary for comparison. We do this by generating (or reading in) some sentences, summarizing these sentences, and then comparing the human's guess as to what the emojis mean against the input sentence. The general flow for this process is as follows:

   1. Generate (or read) sentences
   2. Summarize each of the sentences
   3. Take the top 30 sentences, sorted by the certainty score
   4. For each machine translated sentence:
       1. Provide the user with the emojis
       2. Provide the user with an approximate sentence length
       3. Prompt the user to translate the emojis into a sentence
   5. For each machine translated sentence-user translated sentence pair:
       1. Calculate the distance between the two sentences using sent2vec (might need another metric)

After we have the list of numerical scores for the translations we can do some analysis on how the algorithm actually performs.
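
As a rough sketch of what that analysis could look like, the snippet below computes a few summary statistics over a list of distance scores while skipping NaN values. The function name `summarize_scores` and the example values are hypothetical and are not results from this project.

```python
import math
import statistics

def summarize_scores(scores):
    """Return count, mean, median and standard deviation of the valid scores."""
    # Drop NaN values, which can appear when a distance is undefined
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return {"n_valid": 0, "n_skipped": len(scores)}
    return {
        "n_valid": len(valid),
        "n_skipped": len(scores) - len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        # stdev needs at least two data points
        "stdev": statistics.stdev(valid) if len(valid) > 1 else 0.0,
    }

# Illustrative values only, not results from the experiment
example_scores = [0.5, 0.75, float("nan"), 1.0]
stats = summarize_scores(example_scores)
print(stats)
```

Reporting the number of skipped pairs alongside the mean makes it visible when many user guesses could not be scored.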

### Sentence Generation
The sentences are gathered from the [Stanford NLP research group's NMT dataset](https://nlp.stanford.edu/projects/nmt/data/iwslt15.en-vi/tst2012.en). All of these sentences will be loaded into memory, filtered based on length, and cleaned. 

In [67]:
# Load the sentences, one per line, dropping the trailing newline
file_path = "data/tst2012.en"
with open(file_path, "r", encoding="utf-8") as sents:
    testing_sentences = [sent.strip() for sent in sents]

38 sentences in dataset


In [None]:
# Filter the sentences based on an upper and lower bound for the token count
# (word_tokenize counts punctuation marks as tokens)
from nltk import word_tokenize
word_limit_lower = 5
word_limit_upper = 5
testing_sentences = [sent for sent in testing_sentences
                     if word_limit_lower <= len(word_tokenize(sent)) <= word_limit_upper]

In [None]:
# Some of the sentences have "&apos;" instead of "'" but our algorithm doesn't handle that so replace with
# regular "'"
testing_sentences = [testing_sentence.replace("&apos;", "'") for testing_sentence in testing_sentences]
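
If the dataset turns out to contain other HTML entities besides `&apos;` (an assumption, this has not been checked here), a more general cleaning step could use the standard library's `html.unescape`. The helper name `clean_sentence` and the sample string are hypothetical.

```python
import html

def clean_sentence(sentence):
    """Decode HTML entities and trim surrounding whitespace."""
    return html.unescape(sentence).strip()

# Made-up sample line, not taken from the dataset
raw = "I don&apos;t know &amp; I can&apos;t say\n"
cleaned = clean_sentence(raw)
print(cleaned)
```

This would replace the single `.replace("&apos;", "'")` call with one that also covers entities such as `&amp;` and `&quot;`.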

In [None]:
# Query how many sentences are in the current dataset
print(f"{len(testing_sentences)} sentences in dataset")

### Sentence Summarization
To do this we use NaiveEmojiTranslation_V1, which is simply the NaiveEmojiTranslation notebook exported to a .py file. Each sentence is summarized using the best currently known parameters (based on limited observation), and the summaries are then scored with the scoring function.

In [80]:
import warnings; warnings.simplefilter('ignore')               # cosine distance gives warnings when div by 0
from NaiveEmojiTranslation_V1 import summarize, lemmatizerNLTK # Exported NaiveEmojiTranslation to Python file as of October 24th

# Sort the sentences by their uncertainty scores. This is imported as a generic scoring
# function so that it can be swapped in and out easily
from NaiveEmojiTranslation_V1 import score_summarization_result_average as scoring_function

# JUST FOR TESTING ONLY USE TEN
testing_sentences = testing_sentences[:10]

# Summarize each testing sentence with the current best known parameters
summarized_sentences = []
for sentence in testing_sentences:
    summarized_sentences.append(summarize(sentence, keep_stop_words=True, 
                                  lemma_func=lemmatizerNLTK.lemmatize, scoring_func=scoring_function))
    
# Sort the list by the scoring function (sorted already returns a new list)
summarized_sentences_sorted = sorted(summarized_sentences, key=scoring_function)

# Choose only the top 30 summaries
testing_summaries = summarized_sentences_sorted[:30]
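
The sort-then-slice step above can be illustrated on toy data. This is only a sketch: `FakeSummary` and `fake_score` are hypothetical stand-ins for `EmojiSummarizationResult` and the imported scoring function, and it assumes the score is an uncertainty, so lower values are better.

```python
from dataclasses import dataclass
import heapq

@dataclass
class FakeSummary:
    """Hypothetical stand-in for EmojiSummarizationResult."""
    emojis: str
    uncertainty: float

def fake_score(summary):
    """Hypothetical scoring function: lower means more certain."""
    return summary.uncertainty

summaries = [
    FakeSummary("A", 0.9),
    FakeSummary("B", 0.2),
    FakeSummary("C", 0.5),
]

# Ascending sort puts the lowest (most certain) scores first
ranked = sorted(summaries, key=fake_score)
top_two = ranked[:2]

# heapq.nsmallest selects the same items without sorting the whole list
same_top_two = heapq.nsmallest(2, summaries, key=fake_score)
print([s.emojis for s in top_two])
```

If the real scoring function instead returns a certainty (higher is better), the sort would need `reverse=True` to keep the best summaries.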

### User Input

In [None]:
from NaiveEmojiTranslation_V1 import EmojiSummarizationResult
from dataclasses import dataclass

@dataclass
class UserSummarization:
    """
    Struct-esque data structure that stores the machine's summarization and the user's guess in one object.
    This is just syntactic sugar for a python object with some default values and type checking.
    """
    machine_summarization: EmojiSummarizationResult
    user_guess: str = ""
    difference: float = -1

In [None]:
# Loop through the selected top summaries
user_summaries = []
for summary in testing_summaries:
    # Give the user the emoji summary and the input sentence length to shoot for in summary
    print(f"Emoji Sequence: {summary.emojis}")
    print("Input sentence Length: {}".format(len(word_tokenize(" ".join(summary.n_grams)))))
    
    # Prompt the user for their translation
    translation = input("What's your translation?")
    
    # Append a new UserSummarization object with the machine's summary and the user's translation to the list
    user_summaries.append(UserSummarization(summary, translation))
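
One possible refinement of the prompt loop, sketched here under assumptions, is to re-ask when the user submits an empty answer, since an empty guess cannot be scored meaningfully. The helper `prompt_for_translation` and its `ask` and `max_attempts` parameters are hypothetical. Passing the input function as an argument lets the sketch run without a real user.

```python
def prompt_for_translation(prompt, ask=input, max_attempts=3):
    """Ask until a non-empty answer is given or the attempts run out."""
    for _ in range(max_attempts):
        answer = ask(prompt).strip()
        if answer:
            return answer
    # Give up and return an empty guess after max_attempts tries
    return ""

# Scripted answers stand in for a real user so the sketch runs unattended
scripted = iter(["", "   ", "thank you so much"])
result = prompt_for_translation("What's your translation? ", ask=lambda _: next(scripted))
print(result)
```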

### Scoring
The scoring method is likely to change soon, so this section is kept minimal.
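
For reference, the cosine distance used in the cells below is one minus the cosine similarity of the two vectors, so it ranges from 0 (same direction) to 2 (opposite direction), and a value above 1 means the similarity is negative. The pure-Python sketch below shows the arithmetic. It is not the SciPy implementation, and returning NaN for a zero-length vector is a choice made here to mirror the `nan` values that appear in the output.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v), or NaN if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined for a zero vector
        return float("nan")
    return 1.0 - dot / (norm_u * norm_v)

print(cosine_distance([1, 0], [1, 0]))   # same direction
print(cosine_distance([1, 0], [0, 1]))   # orthogonal
print(cosine_distance([1, 0], [-1, 0]))  # opposite direction
print(cosine_distance([0, 0], [1, 0]))   # zero vector
```

A plausible reason for a NaN score, though not verified here, is that the model returns an all-zero embedding for a guess whose words are all outside its vocabulary.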

In [86]:
# Initialize the sent2vec model
import sent2vec
s2v = sent2vec.Sent2vecModel()
s2v.load_model('../models/wiki_unigrams.bin') # https://drive.google.com/open?id=0B6VhzidiLvjSa19uYWlLUEkzX3c

In [140]:
from scipy.spatial.distance import cosine # Distance between sentence and emoji in sent2vec vector space
import numpy as np

for user_summary in user_summaries:
    user_emb, mach_emb = s2v.embed_sentences([user_summary.user_guess, " ".join(user_summary.machine_summarization.n_grams)])
    user_summary.difference = cosine(user_emb, mach_emb)
    print("Emojis: {}\nUser guessed: {}\nSummary Input: {}\nDifference: {}".format(user_summary.machine_summarization.emojis, user_summary.user_guess, " ".join(user_summary.machine_summarization.n_grams), user_summary.difference))
    print()
    
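print()
# Average over the valid (non-NaN) differences only, so that skipped pairs do not shrink the mean
valid_differences = [user_summary.difference for user_summary in user_summaries
                     if not np.isnan(user_summary.difference)]
print("Average cosine difference ", sum(valid_differences) / len(valid_differences))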

Emojis: 🙏
User guessed: You're the best
Summary Input: thank you very much
Difference: 1.0038026724942029

Emojis: 🙏
User guessed: Thank you so much
Summary Input: thank you very much
Difference: 0.5063760280609131

Emojis: 🚥🌎
User guessed: stop and go see the world
Summary Input: go home to where
Difference: 0.7362043857574463

Emojis: 😶
User guessed: he was very quiet
Summary Input: he retreated into silence
Difference: 0.6807082295417786

Emojis: ♓
User guessed: the fish is good
Summary Input: there were still fish
Difference: 0.5206278562545776

Emojis: 🆘
User guessed: save our souls help
Summary Input: so what happens here
Difference: 0.8867711946368217

Emojis: 👤
User guessed: I
Summary Input: something stiffened inside me
Difference: nan

Emojis: 💯🛅
User guessed: a good trip
Summary Input: after only one trip
Difference: 0.46960216760635376

Emojis: 👪
User guessed: family
Summary Input: he is my grandfather
Difference: 0.7951625138521194

Emojis: 🚯🇧🇪
User guessed: dont be 
Summa