# Homework 2

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"

This homework covers material from the sentiment classification unit. The primary value of doing the work is that it provides more hands-on experience with the dataset and the models we explored. All the code you write has potential value in the bake-off for this unit as well.

Submission URL: https://canvas.stanford.edu/courses/83399/quizzes/50657

## Questions 1–4: Reproducing Socher et al.'s Naive Bayes baseline [4 points]

[Socher et al. 2013](http://www.aclweb.org/anthology/D/D13/D13-1170.pdf) compare against a Naive Bayes baseline with bigram features. See how close you can come to reproducing the performance of this model on the binary, root-only problem (values in the rightmost column of their Table 1, rows 1 and 3).

Specific tasks:

1. Write a bigrams feature function, on the model of `unigrams_phi`. Call this `bigrams_phi`. In writing this function, ensure that each example is padded with start and end symbols (say, `<S>` and `</S>`), so that these contexts are properly reflected in the feature space.

1. Write a function `fit_nb_classifier_with_crossvalidation` that serves as a wrapper for `sklearn.naive_bayes.MultinomialNB` and searches over these values for the smoothing parameter `alpha`: `[0.1, 0.5, 1.0, 2.0]`, using 3-fold cross-validation.

1. Use `sst.experiment` to run the experiments, assessing against `dev_reader`.
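The `alpha` being searched over is the additive (Lidstone) smoothing constant. scikit-learn's `MultinomialNB` documentation gives the smoothed likelihood of feature i in class c as (N_ci + alpha) / (N_c + alpha * n), where n is the number of features. The pure-Python sketch below implements that estimator over `Counter`-style feature dicts, for intuition only. `train_multinomial_nb` and `predict_multinomial_nb` are hypothetical helpers, not part of `sst` or scikit-learn.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(feature_dicts, labels, alpha=1.0):
    """Hypothetical helper: class log-priors and alpha-smoothed log-likelihoods.

    Requires alpha > 0 so that no feature gets probability zero.
    """
    vocab = {f for d in feature_dicts for f in d}
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # class -> feature -> summed count
    for d, label in zip(feature_dicts, labels):
        feat_counts[label].update(d)
    n = len(labels)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Denominator: total feature count in class c plus alpha per feature.
        denom = sum(feat_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {f: math.log((feat_counts[c][f] + alpha) / denom)
                      for f in vocab}
    return log_prior, log_lik


def predict_multinomial_nb(model, d):
    """Return the class with the highest posterior score for feature dict d."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for f, count in d.items():
            if f in log_lik[c]:  # features never seen in training are skipped
                score += count * log_lik[c][f]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy usage with bigram-count features.
X_toy = [Counter({('<S>', 'great'): 1, ('great', 'film'): 1}),
         Counter({('<S>', 'dull'): 1, ('dull', 'film'): 1})]
y_toy = ['positive', 'negative']
model = train_multinomial_nb(X_toy, y_toy, alpha=1.0)
print(predict_multinomial_nb(model, Counter({('<S>', 'great'): 1})))  # positive
```

A smaller `alpha` trusts the observed counts more. A larger `alpha` pulls every likelihood toward uniform. The grid `[0.1, 0.5, 1.0, 2.0]` trades these off.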

__To submit:__

1. Your `bigrams_phi`
1. Your `fit_nb_classifier_with_crossvalidation`
1. Your call to `sst.experiment` 
1. The average F1 score that `sst.experiment` reported.

__A note on performance__: in our experience, the bigrams Naive Bayes model gets an average F1 of around 0.75. It's fine to submit answers with comparable numbers; the Socher et al. baselines are very strong. We're not evaluating how good your model is; we want to see your code, and we're interested to see what the range of F1 scores is across the whole class.

In [3]:
from collections import Counter
from sklearn.linear_model import LogisticRegression
import scipy.stats
from sgd_classifier import BasicSGDClassifier
from tf_shallow_neural_classifier import TfShallowNeuralClassifier
import sst
import utils
import sklearn.naive_bayes
import tf_shallow_neural_classifier
import os
import numpy as np

import pandas as pd
import random
from rnn_classifier import RNNClassifier
from sklearn.metrics import classification_report
import tensorflow as tf
from tf_rnn_classifier import TfRNNClassifier
from tree_nn import TreeNN
import vsm



In [50]:
def bigrams_phi(tree):
    """The basis for a bigrams feature function.
    
    Parameters
    ----------
    tree : nltk.tree
        The tree to represent.
    
    Returns
    -------
    Counter
        A map from bigram tuples to their counts in `tree`. The leaves
        are padded with a start symbol `<S>` and an end symbol `</S>`
        so that sentence-initial and sentence-final contexts become
        features of their own.
    
    """
    # `tree.leaves()` returns a fresh list, so padding it is safe.
    leaves = ['<S>'] + tree.leaves() + ['</S>']
    # Pair each token with its successor and count the pairs.
    return Counter(zip(leaves, leaves[1:]))
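Because `tree.leaves()` is just a list of token strings, the padding-and-pairing logic can be checked without loading SST. Here is a minimal sketch that uses a hypothetical `bigram_counts` helper on a plain token list in place of an `nltk.Tree`:

```python
from collections import Counter


def bigram_counts(tokens, start='<S>', end='</S>'):
    """Hypothetical helper: padded bigram counts for a list of tokens."""
    padded = [start] + list(tokens) + [end]
    # zip pairs each token with its successor; Counter tallies repeats.
    return Counter(zip(padded, padded[1:]))


feats = bigram_counts(['not', 'bad', 'not', 'bad'])
print(feats[('not', 'bad')])                          # 2
print(feats[('<S>', 'not')], feats[('bad', '</S>')])  # 1 1
```

A sequence of n tokens always yields n + 1 padded bigrams. Even a one-word phrase therefore gets two features.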

In [51]:
train_dataset = sst.build_dataset(
    reader=sst.train_reader,
    phi=bigrams_phi,
    class_func=sst.binary_class_func,
    vectorizer=None)

print("Train dataset with bigram features has {:,} examples and {:,} features".format(
        *train_dataset['X'].shape))

Train dataset with bigram features has 6,920 examples and 75,406 features


In [52]:
dev_dataset = sst.build_dataset(
    reader=sst.dev_reader,
    phi=bigrams_phi,
    class_func=sst.binary_class_func,
    vectorizer=train_dataset['vectorizer'])

print("Dev dataset with bigram features has {:,} examples and {:,} features".format(
        *dev_dataset['X'].shape))

Dev dataset with bigram features has 872 examples and 75,406 features


In [53]:
import sklearn.naive_bayes

def fit_nb_classifier_with_crossvalidation(X, y):
    basemod = sklearn.naive_bayes.MultinomialNB()
    cv = 3 #3-fold cross validation
    param_grid = {'alpha': [0.1, 0.5, 1.0, 2.0]}
    return sst.fit_classifier_with_crossvalidation(X, y, basemod, cv, param_grid)
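`sst.fit_classifier_with_crossvalidation` handles the search here. The stdlib-only sketch below shows what exhaustive k-fold grid search does in general. Each candidate value is scored on each of k held-out folds, the fold scores are averaged, and the best mean wins. `kfold_indices`, `grid_search_kfold`, and the toy scorer are hypothetical and not necessarily how `sst` implements the search. Library implementations such as scikit-learn's `GridSearchCV` use stratified folds for classifiers by default. This sketch uses contiguous folds for simplicity.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_kfold(n_examples, candidates, fit_and_score, k=3):
    """Return (best candidate, its mean held-out score, all mean scores).

    fit_and_score(candidate, train_idx, test_idx) must train on train_idx
    and return a score (higher is better) on test_idx.
    """
    folds = kfold_indices(n_examples, k)
    means = {}
    for cand in candidates:
        scores = []
        for i, test_idx in enumerate(folds):
            # Every fold except the i-th one is used for training.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(fit_and_score(cand, train_idx, test_idx))
        means[cand] = sum(scores) / k
    best = max(candidates, key=lambda c: means[c])
    return best, means[best], means


# Toy scorer that ignores the data and peaks at alpha = 0.5.
best, best_mean, _ = grid_search_kfold(
    30, [0.1, 0.5, 1.0, 2.0],
    lambda alpha, tr, te: 1.0 - abs(alpha - 0.5))
print(best, best_mean)  # 0.5 1.0
```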

In [54]:
_ = sst.experiment(
    bigrams_phi,
    fit_nb_classifier_with_crossvalidation, 
    assess_reader=sst.dev_reader, 
    class_func=sst.binary_class_func)

Best params {'alpha': 0.5}
Best score: 0.725
Accuracy: 0.748
             precision    recall  f1-score   support

   negative      0.761     0.708     0.734       428
   positive      0.736     0.786     0.760       444

avg / total      0.749     0.748     0.747       872
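
In the scikit-learn version that produced this report, the `avg / total` row of `classification_report` is the support-weighted mean of the per-class scores. Newer releases label this row `weighted avg`. The reported average F1 can be recomputed from the rounded per-class values:

```python
# Per-class F1 and support, copied from the report above.
f1 = {'negative': 0.734, 'positive': 0.760}
support = {'negative': 428, 'positive': 444}

total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total
print(total, round(weighted_f1, 3))  # 872 0.747
```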



## Questions 5–6: A more powerful vector-summing baseline [4 points]

In the section [Distributed representations as features](sst_03_neural_networks.ipynb#Distributed-representations-as-features), we looked at a baseline for the binary SST problem in which each example is modeled as the sum of the 50-dimensional GloVe representations of its words. A `LogisticRegression` model was used for prediction. A neural network might do better here, since there might be complex relationships between the input feature dimensions that a linear classifier can't learn.
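
The standard illustration of such an interaction is XOR. No single linear threshold on two binary inputs separates {(0,1), (1,0)} from {(0,0), (1,1)}, but one hidden layer of two ReLU units computes XOR exactly. The sketch below picks the weights by hand for illustration; nothing in it is trained.

```python
def relu(v):
    return max(0.0, v)


def xor_net(x1, x2):
    """Two ReLU hidden units with hand-set weights, then a linear output."""
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    return h1 - 2.0 * h2      # 0, 1, 1, 0 on the four input pairs


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
```

Whether summed GloVe dimensions actually interact this way for sentiment is an empirical question, and the experiment in this question tests it.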

To address this question, rerun the experiment with `tf_shallow_neural_classifier.TfShallowNeuralClassifier` as the classifier. Specs:
* Use `sst.experiment` to conduct the experiment. 
* Using 3-fold cross-validation, exhaustively explore this set of hyperparameter combinations:
  * The hidden dimensionality at 50, 100, and 200.
  * The hidden activation function as `tf.nn.tanh` or `tf.nn.relu`.
* (For all other parameters to `TfShallowNeuralClassifier`, use the defaults.)
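
Exhaustive search means every combination is tried. Three hidden sizes times two activations gives 6 settings, and 3-fold cross-validation turns those into 18 model fits in the search itself. The sketch below enumerates the grid, using activation names as strings in place of `tf.nn.tanh` and `tf.nn.relu`:

```python
from itertools import product

param_grid = {'hidden_dim': [50, 100, 200],
              'hidden_activation': ['tanh', 'relu']}  # stand-ins for tf.nn.*

# One dict per combination of values.
keys = sorted(param_grid)
settings = [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]

n_folds = 3
print(len(settings), len(settings) * n_folds)  # 6 18
print(settings[0])  # {'hidden_activation': 'tanh', 'hidden_dim': 50}
```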

__To submit:__

* Your average F1 score according to `sst.experiment`. 
* The optimal hyperparameters chosen in your experiment. (You can just paste in the dict that `sst.experiment` prints.)

No need to include your supporting code. 

We're not evaluating the quality of your model. (We've specified the protocols completely, but there will still be a lot of variation in the results.) However, the primary goal of this question is to get you thinking more about this strikingly good baseline feature representation scheme for SST, so we're sort of hoping you feel compelled to try out variations on your own.

In [46]:
vsmdata_home = 'vsmdata'

glove_home = vsmdata_home

glove_lookup = utils.glove2dict(
    os.path.join(glove_home, 'glove.6B.50d.txt'))

def vsm_leaves_phi(tree, lookup, np_func=np.sum):
    """Represent tree as a combination of the vector of its words.
    
    Parameters
    ----------
    tree : nltk.Tree   
    lookup : dict
        From words to vectors.
    np_func : function (default: np.sum)
        A numpy matrix operation that can be applied columnwise, 
        like `np.mean`, `np.sum`, or `np.prod`. The requirement is that 
        the function take `axis=0` as one of its arguments (to ensure
        columnwise combination) and that it return a vector of a 
        fixed length, no matter what the size of the tree is.
    
    Returns
    -------
    np.array
        A vector with the same dimensionality as the vectors in `lookup`.
            
    """
    dim = len(next(iter(lookup.values())))    
    allvecs = np.array([lookup[w] for w in tree.leaves() if w in lookup])
    if len(allvecs) == 0:
        feats = np.zeros(dim)
    else:       
        feats = np_func(allvecs, axis=0)      
    return feats

def glove_leaves_phi(tree, np_func=np.sum):
    return vsm_leaves_phi(tree, glove_lookup, np_func=np_func)
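
The next example applies the same combination step to a hypothetical two-dimensional lookup to show what `np_func` changes. The real GloVe vectors here are 50-dimensional. As in `vsm_leaves_phi`, out-of-vocabulary words are dropped before combining.

```python
import numpy as np

# Hypothetical toy embeddings; the real lookup comes from glove.6B.50d.txt.
toy_lookup = {'good': np.array([1.0, 2.0]),
              'film': np.array([0.5, -1.0])}
tokens = ['a', 'good', 'film']  # 'a' is out of vocabulary here

# Stack the in-vocabulary vectors as rows, then combine columnwise.
allvecs = np.array([toy_lookup[w] for w in tokens if w in toy_lookup])
summed = np.sum(allvecs, axis=0)     # elementwise sum of the two rows
averaged = np.mean(allvecs, axis=0)  # elementwise mean of the two rows
print(summed, averaged)
```

The scale of the sum depends on how many in-vocabulary words a sentence has. The scale of the mean does not, which makes `np.mean` one natural variation to try.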

In [47]:
def fit_nn_classifier_with_crossvalidation(X, y):
    basemod = tf_shallow_neural_classifier.TfShallowNeuralClassifier()
    cv = 3 #3-fold cross validation
    param_grid = {'hidden_dim': [50, 100, 200], 
                  'hidden_activation' : [tf.nn.tanh, tf.nn.relu]} 
    return sst.fit_classifier_with_crossvalidation(X, y, basemod, cv, param_grid)


In [48]:


_ = sst.experiment(
    glove_leaves_phi,
    fit_nn_classifier_with_crossvalidation,
    class_func=sst.binary_class_func,
    vectorize=False)  # Tell `experiment` that we already have our feature vectors.

Iteration 100: loss: 2.8306755423545837

Best params {'hidden_activation': <function relu at 0x10a659268>, 'hidden_dim': 200}
Best score: 0.690
Accuracy: 0.724
             precision    recall  f1-score   support

   negative      0.697     0.736     0.716       979
   positive      0.752     0.714     0.732      1097

avg / total      0.726     0.724     0.725      2076



## Questions 7–8: Bidirectional RNN [2 points]

The auxiliary notebook `tensorflow_models.ipynb` [subclasses TfRNNClassifier with a bidirectional RNN](tensorflow_models.ipynb#A-bidirectional-RNN-Classifier). In this model, the RNN is run in both directions and the concatenation of the two final states is used as the basis for the classification decision. Evaluate this model against the SST dev set. You can set up the model however you wish for this.
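
Conceptually, the bidirectional model reads the sequence once left-to-right and once right-to-left, then concatenates the two final hidden states. The classifier therefore sees a vector of size 2 × `hidden_dim`, which is why the class sets `self.feat_dim = self.hidden_dim * 2`. The NumPy sketch below shows only this data flow. It uses a plain tanh RNN cell rather than the LSTM cell the class uses, with random untrained weights.

```python
import numpy as np


def rnn_final_state(xs, W_xh, W_hh, b_h):
    """Run a simple tanh RNN over the rows of xs and return the last state."""
    h = np.zeros(W_hh.shape[0])
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
    return h


def bidirectional_final_states(xs, fw_params, bw_params):
    """Concatenate the final forward state and the final backward state."""
    h_fw = rnn_final_state(xs, *fw_params)
    h_bw = rnn_final_state(xs[::-1], *bw_params)  # reversed sequence
    return np.concatenate([h_fw, h_bw])


rng = np.random.default_rng(0)
embed_dim, hidden_dim, seq_len = 4, 3, 5


def init_params():
    return (rng.normal(size=(embed_dim, hidden_dim)),
            rng.normal(size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))


xs = rng.normal(size=(seq_len, embed_dim))  # one embedded, unpadded example
feats = bidirectional_final_states(xs, init_params(), init_params())
print(feats.shape)  # (6,)
```

In the TensorFlow version, passing `sequence_length` to `bidirectional_dynamic_rnn` makes it ignore padding. The backward pass therefore starts from each example's last real token rather than from the padding.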

__To submit:__

* Your call to `TfBidirectionalRNNClassifier` (so that we can see the hyperparameters you chose).
* Your average F1 score according to a `classification_report` on the `dev` set.

As above, we will not evaluate you based on how good your F1 score is. You just need to submit one. __There is even value in seeing what really doesn't work__, so low scores have interest!

In [4]:
class TfBidirectionalRNNClassifier(TfRNNClassifier):
    
    def build_graph(self):
        self._define_embedding()

        self.inputs = tf.placeholder(
            tf.int32, [None, self.max_length])

        self.ex_lengths = tf.placeholder(tf.int32, [None])

        # Outputs as usual:
        self.outputs = tf.placeholder(
            tf.float32, shape=[None, self.output_dim])

        # This converts the inputs to a list of lists of dense vector
        # representations:
        self.feats = tf.nn.embedding_lookup(
            self.embedding, self.inputs)

        # Same cell structure as the base class, but we have
        # forward and backward versions:
        self.cell_fw = tf.nn.rnn_cell.LSTMCell(
            self.hidden_dim, activation=self.hidden_activation)
        
        self.cell_bw = tf.nn.rnn_cell.LSTMCell(
            self.hidden_dim, activation=self.hidden_activation)

        # Run the RNN:
        outputs, finals = tf.nn.bidirectional_dynamic_rnn(
            self.cell_fw,
            self.cell_bw,
            self.feats,
            dtype=tf.float32,
            sequence_length=self.ex_lengths)
      
        # `finals` is a (forward, backward) pair of `LSTMStateTuple` objects.
        # Each is itself a pair (c, h) of Tensors, where c is the cell state
        # and h is the hidden (output) state; see
        # https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMStateTuple
        # We want h, the second member of each pair:
        last_fw, last_bw = finals
        last_fw, last_bw = last_fw.h, last_bw.h
        
        last = tf.concat((last_fw, last_bw), axis=1)
        
        self.feat_dim = self.hidden_dim * 2               

        # Softmax classifier on the final hidden state:
        self.W_hy = self.weight_init(
            self.feat_dim, self.output_dim, 'W_hy')
        self.b_y = self.bias_init(self.output_dim, 'b_y')
        self.model = tf.matmul(last, self.W_hy) + self.b_y    

        

In [45]:
def setup():
    X_rnn_train, y_rnn_train = sst.build_binary_rnn_dataset(sst.train_reader)
    utils.sequence_length_report(X_rnn_train)

    X_rnn_dev, y_rnn_dev = sst.build_binary_rnn_dataset(sst.dev_reader)
    sst_train_vocab = sst.get_vocab(X_rnn_train, n_words=3000)

    tf_rnn = TfBidirectionalRNNClassifier(
        sst_train_vocab,
        embed_dim=100,
        hidden_dim=100,
        max_length=52,
        hidden_activation=tf.nn.tanh,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        train_embedding=True,
        max_iter=500,
        eta=0.1)

    _ = tf_rnn.fit(X_rnn_train, y_rnn_train)

    tf_rnn_dev_predictions = tf_rnn.predict(X_rnn_dev)
    print(classification_report(y_rnn_dev, tf_rnn_dev_predictions))

setup()

Max sequence length: 52
Min sequence length: 2
Mean sequence length: 19.30
Median sequence length: 19.00
Sequences longer than 50: 6 of 6,920


Iteration 500: loss: 0.9666926264762878

             precision    recall  f1-score   support

   negative       0.72      0.65      0.68       428
   positive       0.69      0.76      0.72       444

avg / total       0.71      0.70      0.70       872



In [41]:
def fit_Bidirectional_RNN_classifier_with_crossvalidation(X, y):
    """Grid-search `embed_dim` for a `TfBidirectionalRNNClassifier`
    using 3-fold cross-validation.

    Constructor arguments are documented in `tf_model_base.py` and
    `tf_rnn_classifier.py`.
    """
    # Build the vocabulary from the training examples passed in.
    vocab = sst.get_vocab(X)

    basemod = TfBidirectionalRNNClassifier(
        vocab,
        embed_dim=50,
        hidden_dim=50,
        max_length=52,
        hidden_activation=tf.nn.tanh,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        train_embedding=True,
        max_iter=500,
        eta=0.05)

    cv = 3  # 3-fold cross-validation
    param_grid = {'embed_dim': [20, 30]}
    return sst.fit_classifier_with_crossvalidation(X, y, basemod, cv, param_grid)