# Neural Network with Bag of Words

In [1]:
from src.nn import main

## Quantitative Analysis

In [2]:
texts, classes, preds = main(n_epochs=100)

	[ 1] Train Loss: 0.603 | Train Prec: 86.66% | Train Rec: 75.94% | Train Fscore: 80.54%
	[ 1]   Val Loss: 0.541 |   Val Prec: 83.92% |   Val Rec: 79.67% |   Val Fscore: 81.57%
	[ 2] Train Loss: 0.486 | Train Prec: 86.46% | Train Rec: 83.93% | Train Fscore: 85.00%
	[ 2]   Val Loss: 0.478 |   Val Prec: 83.92% |   Val Rec: 81.71% |   Val Fscore: 82.64%
	[ 3] Train Loss: 0.426 | Train Prec: 87.02% | Train Rec: 86.10% | Train Fscore: 86.36%
	[ 3]   Val Loss: 0.444 |   Val Prec: 83.32% |   Val Rec: 82.96% |   Val Fscore: 82.94%
	[ 4] Train Loss: 0.390 | Train Prec: 87.63% | Train Rec: 87.01% | Train Fscore: 87.19%
	[ 4]   Val Loss: 0.424 |   Val Prec: 83.87% |   Val Rec: 82.93% |   Val Fscore: 83.24%
	[ 5] Train Loss: 0.364 | Train Prec: 88.04% | Train Rec: 87.97% | Train Fscore: 87.80%
	[ 5]   Val Loss: 0.410 |   Val Prec: 84.01% |   Val Rec: 82.98% |   Val Fscore: 83.33%
	[ 6] Train Loss: 0.344 | Train Prec: 88.85% | Train Rec: 88.36% | Train Fscore: 88.43%
	[ 6]   Val Loss: 0.400 |   Val 

The training loop runs for up to 100 epochs, with early stopping to prevent overtraining: training stops once the validation loss no longer decreases from one epoch to the next. The performance metrics are reported after every epoch for both the training and the validation set. Finally, the metrics are calculated on the test set.

Inspecting the training loss, we observe that it keeps decreasing with each epoch up to the 17th epoch. In the 18th epoch the validation loss no longer improves (it is the same as in the previous epoch), which is when early stopping is invoked.
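The stopping rule described above can be sketched as follows. This is a minimal illustration, not the code in `src.nn`: the function name `stop_epoch` and the list `val_losses` are hypothetical, and the `>=` comparison is an assumption chosen so that an unchanged validation loss also triggers the stop, as reported for the 18th epoch.

```python
def stop_epoch(val_losses, max_epochs=100):
    """Return the 1-based epoch at which training would stop.

    `val_losses` is a hypothetical stand-in for the validation loss that a
    real training loop would compute after each epoch.
    """
    previous = float("inf")
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        # A real loop would train for one epoch here before validating.
        if loss >= previous:
            # Validation loss did not decrease: stop early.
            return epoch
        previous = loss
    # No early stop: training ran out of epochs (or of recorded losses).
    return min(len(val_losses), max_epochs)


# Toy losses: the third epoch repeats the second, so training stops there.
print(stop_epoch([0.541, 0.478, 0.478, 0.400]))
```

A stricter variant would use `>` so that only a genuine increase stops training; which of the two the notebook uses is not visible from the log.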

With that, we obtain a precision of 84.44% and a recall of 82.88% on the test set.
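For reference, precision and recall for the sarcastic class can be computed from the returned `classes` and `preds` roughly as below. This is a sketch under assumptions: the helper `precision_recall_f1` is hypothetical, label 1 is taken to mean "sarcastic", and `main` may aggregate its metrics differently (for example per batch or per class), so the numbers need not match the log exactly.

```python
def precision_recall_f1(classes, preds, positive=1):
    """Compute precision, recall and F1 for the `positive` label."""
    pairs = list(zip(classes, preds))
    tp = sum(1 for c, p in pairs if c == positive and p == positive)
    fp = sum(1 for c, p in pairs if c != positive and p == positive)
    fn = sum(1 for c, p in pairs if c == positive and p != positive)

    # Guard against empty denominators.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
toy_classes = [1, 1, 0, 0, 1]
toy_preds = [1, 0, 1, 0, 1]
print(precision_recall_f1(toy_classes, toy_preds))
```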

## Qualitative Analysis

In [3]:
missclassified_idxs = [idx for idx, (c, p) in enumerate(zip(classes, preds)) if c != p]

In [4]:
false_positives = []
false_negatives = []

In [5]:
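for idx in missclassified_idxs:
    if classes[idx] == 0:
        # True class is non-sarcastic, but the model predicted sarcasm.
        false_positives.append(texts[idx])
    else:
        # True class is sarcastic, but the model predicted no sarcasm.
        false_negatives.append(texts[idx])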

In [6]:
len(false_positives)

362

In [7]:
len(false_negatives)

398

In [8]:
false_negatives

['fist-pumping jared kushner leaves jerusalem embassy refreshed and ready to solve next global crisis',
 'australian parliament gathers to discuss dwindling hemsworth reserves',
 'supreme court upholds bill of rights in 5-4 decision',
 "biden calls dibs on qaddafi's clothes",
 "leno's voicemail message pauses for laughter",
 'item individually wrapped for no reason',
 'fat kid just wants to watch you guys play',
 "mayor daley's son appointed head of illinois nepotist party",
 'underwear worn out of respect for the dead',
 'rookie justice gorsuch assigned to supreme court overnight shift',
 'supporters praise trump for upholding traditional american value of supporting murderous dictators for political gain',
 'giuliani puts odds of trump-mueller interview at 50-65',
 'philip morris lawyers deny cigarettes are cylindrical',
 'rwandan refugees angered over lack of aol access',
 'israel builds new settlement to host palestinian peace talks',
 'woman angered when veiled anger expressed as 

In [9]:
false_positives

['20 struggles every tall girl knows to be true',
 'new york times editorial board endorses john kasich for gop nomination',
 'morocco cracks down on journalists',
 'venezuela hunts for rogue helicopter attackers',
 'interview with louise munson, playwright of luigi',
 'early apple computer sells for almost $1 million at auction',
 "massive filament snakes across sun's surface",
 'fcc votes to undo key roadblocks to media company consolidation',
 'long-shot push to force senate to confirm merrick garland fails in federal court',
 'determined cat goes through a lot to wrestle with stuffed tiger',
 'rebel grandma sneaks out of care home to get a tattoo',
 'damon albarn gets carried off stage in denmark after 5-hour set',
 'man confronts n.j. officer searching van apparently without permission',
 'woman allegedly blows up pee sample in a 7-eleven microwave',
 'kansas city-area waiter gets world series ticket as a tip',
 'attention: sports fans!',
 'leonard cohen at 80',
 'a reset button',

We collect the misclassified texts and inspect each type of misclassification separately. 

First, we look at the false negatives, i.e. the headlines that contain sarcasm but are not classified as sarcastic. These headlines do not contain any words that are particularly indicative of sarcasm; they can only be understood as sarcastic from the context of the sentence and the social context. A simple bag-of-words representation does not suffice to capture such phenomena.
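To illustrate why context is lost, the sketch below builds a plain word-count representation with `collections.Counter`. The helper `bag_of_words` is hypothetical and only approximates whatever vectorisation `src.nn` uses; the point is that two sentences with the same words in a different order become indistinguishable.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to its word counts, discarding word order."""
    return Counter(text.lower().split())


first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")

# Different meanings, identical representation.
print(first == second)  # True
```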

On the other hand, the false positives, i.e. the headlines marked as sarcastic but containing no sarcasm, are often relatively hard to distinguish from sarcastic ones. Since we use only a bag-of-words representation, the words that occur in them might be more characteristic of sarcastic headlines, such as swear words or superlatives.
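One way to probe this hypothesis would be to list the tokens that appear in the most misclassified headlines. The following is a rough sketch, assuming simple whitespace tokenisation; `top_tokens` is a hypothetical helper and not part of the notebook.

```python
from collections import Counter


def top_tokens(headlines, k=10):
    """Return the k tokens that occur in the most headlines."""
    counts = Counter()
    for headline in headlines:
        # Use a set so each headline counts a token at most once.
        counts.update(set(headline.lower().split()))
    return counts.most_common(k)


# Toy input; in the notebook this could be called on `false_positives`.
toy_headlines = ["a reset button", "a new button", "a plan"]
print(top_tokens(toy_headlines, k=1))
```

Comparing the result for `false_positives` with the one for `false_negatives` would show whether particular words dominate either group.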