# Bake-off: Stanford Sentiment Treebank

In [2]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"

## Contents

0. [Overview](#Overview)
0. [Bake-off submission](#Bake-off-submission)
0. [Methodological note](#Methodological-note)
0. [Set-up](#Set-up)
0. [Baseline](#Baseline)
0. [TfRNNClassifier wrapper](#TfRNNClassifier-wrapper)
0. [TreeNN wrapper](#TreeNN-wrapper)
0. [Uni + Bi-Gram Logistic Regression](#Uni-+-Bi-Gram-Logistic-Regression)
0. [Bi-Directional RNN](#Bi-Directional-RNN)

## Overview

The goal of this in-class bake-off is to __achieve the highest average F1 score__ on the SST development set, using the binary class function (`sst.binary_class_func`).

The only restriction: __you cannot make any use of the subtree labels__.

## Bake-off submission

1. A description of the model you created.
1. The value of `f1-score` in the `avg / total` row of the classification report.

Submission URL: https://docs.google.com/forms/d/1R41Zxxils7lOPzuThMdv2p1TKmFEy8c0DyUg-YkzTa0/edit

## Methodological note

You don't have to use the experimental framework defined below (based on `sst`). However, if you don't use `sst.experiment` as shown below, make sure that you train only on `train`, evaluate on `dev`, and report your results with

```
from sklearn.metrics import classification_report
print(classification_report(y_dev, predictions))
```
where `y_dev = [y for tree, y in sst.dev_reader(class_func=sst.binary_class_func)]` and `predictions` holds your model's predicted labels for the same examples, in the same order.
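
In the scikit-learn versions that print an `avg / total` row, the `f1-score` in that row is the per-class F1 averaged with weights proportional to class support. The sketch below recomputes that quantity in plain Python on toy labels. The helper names `f1_for_class` and `weighted_f1` are hypothetical and are not part of `sst` or scikit-learn.

```python
from collections import Counter

def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights equal to class support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, label) * n / total
        for label, n in support.items())

# Toy gold labels and predictions (made up for illustration).
y_true = ['negative', 'negative', 'positive', 'positive', 'positive']
y_pred = ['negative', 'positive', 'positive', 'positive', 'negative']
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

Applied to the baseline report in the Baseline section, the same weighting gives (0.761 × 428 + 0.782 × 444) / 872 ≈ 0.772.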

## Set-up

See [the first notebook in this unit](sst_01_overview.ipynb#Set-up) for set-up instructions.

In [45]:
from collections import Counter
from rnn_classifier import RNNClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.decomposition import IncrementalPCA
import sst
import tensorflow as tf
from tf_rnn_classifier import TfRNNClassifier
from tree_nn import TreeNN
import numpy as np
from sklearn.metrics import classification_report


## Baseline

In [3]:
def unigrams_phi(tree):
    """The basis for a unigrams feature function.
    
    Parameters
    ----------
    tree : nltk.tree
        The tree to represent.
    
    Returns
    -------    
    Counter
        A map from strings to their counts in `tree`. (Counter maps a 
        list to a dict of counts of the elements in that list.)
    
    """
    return Counter(tree.leaves())
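
To make the feature representation concrete, here is a small sketch of what `unigrams_phi` returns. `ToyTree` is a hypothetical stand-in that only provides a `leaves()` method; the real inputs are nltk trees produced by the `sst` readers.

```python
from collections import Counter

class ToyTree:
    """Hypothetical stand-in for an nltk tree: only `leaves()` is provided."""

    def __init__(self, words):
        self._words = list(words)

    def leaves(self):
        return list(self._words)

def unigrams_phi(tree):
    # Same body as the feature function above: count each leaf token.
    return Counter(tree.leaves())

feats = unigrams_phi(ToyTree("a great , great film".split()))
print(feats["great"])   # 2
print(feats["boring"])  # 0: a Counter reports zero for unseen keys
```

Each distinct token becomes one feature, which is why the vectorized training matrix has one column per word type seen in training.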

In [26]:
def fit_maxent_classifier(X, y):
    mod = LogisticRegression(fit_intercept=True)
    mod.fit(X, y)
    return mod

In [10]:
_ = sst.experiment(
    unigrams_phi,                      # Free to write your own!
    fit_maxent_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.

(6920, 16282)
Accuracy: 0.772
             precision    recall  f1-score   support

   negative      0.783     0.741     0.761       428
   positive      0.762     0.802     0.782       444

avg / total      0.772     0.772     0.772       872



By the way, with some informal hyperparameter search on a GPU machine, I found this model
```
tf_rnn_glove = TfRNNClassifier(
    sst_glove_vocab,
    embedding=glove_embedding, ## 100d version
    hidden_dim=300,
    max_length=52,
    hidden_activation=tf.nn.relu,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=5000,
    batch_size=1028,
    eta=0.001)
```
which finished with almost identical performance to the above:
    
```
             precision    recall  f1-score   support

   negative       0.78      0.75      0.76       428
   positive       0.77      0.80      0.78       444

avg / total       0.77      0.77      0.77       872
```

## TfRNNClassifier wrapper

In [11]:
def rnn_phi(tree):
    return tree.leaves()    

In [12]:
def fit_tf_rnn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfRNNClassifier(
        vocab, 
        eta=0.05,
        batch_size=2048,
        embed_dim=50,
        hidden_dim=50,
        max_length=52, 
        max_iter=500,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.tanh,
        train_embedding=True)
    mod.fit(X, y)
    return mod
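
The `n_words=3000` argument caps the vocabulary size. The exact behavior of `sst.get_vocab` is not shown in this notebook, so the following is only a sketch under the assumption that it keeps the most frequent training tokens and adds a `$UNK` entry for everything else. `top_n_vocab` and `map_to_vocab` are hypothetical names.

```python
from collections import Counter

def top_n_vocab(token_lists, n_words=None, unk="$UNK"):
    """Keep the `n_words` most frequent tokens (all if None), plus `unk`."""
    counts = Counter(w for toks in token_lists for w in toks)
    kept = [w for w, _ in counts.most_common(n_words)]
    return sorted(set(kept) | {unk})

def map_to_vocab(tokens, vocab, unk="$UNK"):
    """Replace out-of-vocabulary tokens with `unk`."""
    known = set(vocab)
    return [w if w in known else unk for w in tokens]

# Toy corpus (made up): 'good' and 'film' occur twice, the rest once.
toy = [["good", "film"], ["good", "plot"], ["bad", "film"]]
vocab = top_n_vocab(toy, n_words=2)
mapped = map_to_vocab(["bad", "film"], vocab)
print(vocab)
print(mapped)
```

Under this assumption, a smaller `n_words` means fewer embedding rows to train but more tokens collapsed into `$UNK`.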

In [13]:
_ = sst.experiment(
    rnn_phi,
    fit_tf_rnn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)

Iteration 500: loss: 2.5067093968391423

Accuracy: 0.654
             precision    recall  f1-score   support

   negative      0.683     0.549     0.609       428
   positive      0.634     0.755     0.689       444

avg / total      0.658     0.654     0.650       872



## TreeNN wrapper

In [14]:
def tree_phi(tree):
    return tree

In [25]:
def fit_tree_nn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TreeNN(
        vocab, 
        embed_dim=100, 
        max_iter=100)
    mod.fit(X, y)
    return mod

In [11]:
_ = sst.experiment(
    rnn_phi,  # Note: this passes leaf lists; `tree_phi` above would pass the full trees.
    fit_tree_nn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)

Finished epoch 100 of 100; error is 0.8351342778738807

Accuracy: 0.510
             precision    recall  f1-score   support

   negative      0.501     0.498     0.499       428
   positive      0.519     0.523     0.521       444

avg / total      0.510     0.510     0.510       872



## Uni + Bi-Gram Logistic Regression

In [121]:
X_train, y_train = sst.build_binary_rnn_dataset(sst.train_reader)
sst_train_vocab = sst.get_vocab(X_train, n_words=None)
vocab_dict = dict(zip(sst_train_vocab, range(len(sst_train_vocab))))
X_dev, y_dev = sst.build_binary_rnn_dataset(sst.dev_reader)

In [122]:
def build_matrix(lists_of_words, vocab_dict):
    X = np.zeros((len(lists_of_words), len(vocab_dict)))
    for idx, words in enumerate(lists_of_words):
        for w in words:
            if w in vocab_dict:
                X[idx, vocab_dict[w]] += 1
            else:
                X[idx, vocab_dict['$UNK']] += 1
    return X


X_train = build_matrix(X_train, vocab_dict)
X_dev = build_matrix(X_dev, vocab_dict)
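
A toy run shows how `build_matrix` fills the count matrix and where unknown words end up. The three-word vocabulary and the two example sentences are made up for illustration, and the function is restated with `dict.get` so that the block runs on its own.

```python
import numpy as np

def build_matrix(lists_of_words, vocab_dict):
    # One row per example, one column per vocabulary entry.
    X = np.zeros((len(lists_of_words), len(vocab_dict)))
    for idx, words in enumerate(lists_of_words):
        for w in words:
            # Out-of-vocabulary words all share the '$UNK' column.
            X[idx, vocab_dict.get(w, vocab_dict['$UNK'])] += 1
    return X

toy_vocab = {'$UNK': 0, 'good': 1, 'bad': 2}  # hypothetical vocabulary
toy_X = build_matrix(
    [['good', 'good', 'film'], ['bad', 'plot', 'twist']],
    toy_vocab)
print(toy_X)
```

Because the training vocabulary here is built with `n_words=None`, every training token has its own column, so only tokens that first appear in the dev set fall into `$UNK`.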

In [125]:
%%time
# mod = LogisticRegression()
mod = GradientBoostingClassifier(learning_rate=1, max_depth=5, verbose=True)
mod.fit(X_train, y_train)
print(classification_report(y_dev, mod.predict(X_dev)))

      Iter       Train Loss   Remaining Time 
         1           1.3050           20.97m
         2           1.2673           20.57m
         3           1.2272           20.76m
         4           1.1978           21.25m
         5           1.1642           21.97m
         6           1.1418           21.96m
         7           1.1180           21.88m
         8           1.1003           22.48m
         9           1.0863           22.80m
        10           1.0689           22.82m
        20           0.9375           21.77m
        30           0.8405           18.18m
        40           0.7751           15.80m
        50           0.7076           13.12m
        60           0.6573           10.47m
        70           0.6156            7.83m
        80           0.5868            5.21m
        90           0.5557            2.60m
       100           0.5298            0.00s
             precision    recall  f1-score   support

   negative       0.74      0.61      0.67   

In [124]:
# mod = LogisticRegression()
mod = GradientBoostingClassifier(max_depth=5, verbose=True)
mod.fit(X_train, y_train)
print(classification_report(y_dev, mod.predict(X_dev)))

      Iter       Train Loss   Remaining Time 
         1           1.3702           19.28m
         2           1.3584           19.85m
         3           1.3484           19.94m
         4           1.3395           20.12m
         5           1.3306           20.23m
         6           1.3223           19.43m
         7           1.3157           19.19m
         8           1.3097           19.18m
         9           1.3024           18.73m
        10           1.2972           18.58m
        20           1.2516           18.65m
        30           1.2148           15.95m
        40           1.1857           13.17m
        50           1.1590           10.62m
        60           1.1384            8.34m
        70           1.1166            6.12m
        80           1.0964            4.07m
        90           1.0788            2.04m
       100           1.0636            0.00s
             precision    recall  f1-score   support

   negative       0.75      0.54      0.63   

In [103]:
# mod = LogisticRegression()
mod = GradientBoostingClassifier(verbose=True)
mod.fit(X_train, y_train)
print(classification_report(y_dev, mod.predict(X_dev)))

      Iter       Train Loss   Remaining Time 
         1           1.3545            8.25m
         2           1.3302            8.07m
         3           1.3097            7.64m
         4           1.2927            7.37m
         5           1.2785            7.37m
         6           1.2661            7.33m
         7           1.2554            7.21m
         8           1.2464            7.14m
         9           1.2383            7.12m
        10           1.2317            7.06m
        20           1.1895            6.68m
        30           1.1625            5.95m
        40           1.1421            5.13m
        50           1.1240            4.27m
        60           1.1076            3.37m
        70           1.0945            2.50m
        80           1.0812            1.67m
        90           1.0688           49.92s
       100           1.0588            0.00s
             precision    recall  f1-score   support

   negative       0.54      0.79      0.64   

In [23]:
%%time
def fit_gdbt_classifier(X, y):
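    # Dimensionality reduction: keep the 3000 columns with the largest
    # total counts. The selection happens only here, at fit time, so
    # `predict` later receives the full feature matrix (see the error below).
    X = X[:, np.array(X.sum(axis=0)).ravel().argsort()[-3000:][::-1]]
    mod = LogisticRegression()
    # mod = GradientBoostingClassifier()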
    mod.fit(X, y)
    return mod

def uni_bi_gram_phi(tree):
    leaves = tree.leaves()
    # Adjacent word pairs (bigrams), wrapped in '<S>'...'</S>' markers:
    d = Counter(
        '<S>' + w1 + ' ' + w2 + '</S>'
        for w1, w2 in zip(leaves, leaves[1:]))
    d.update(Counter(leaves))
    return d

_ = sst.experiment(
    uni_bi_gram_phi,            
    fit_gdbt_classifier,
    train_reader=sst.train_reader,
    assess_reader=sst.dev_reader,
    class_func=sst.binary_class_func)

ValueError: X has 89960 features per sample; expecting 3000
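
The error arises because the model was fit on 3000 selected columns but is asked to predict from all 89960. One possible way around it is to remember the selected column indices and apply them again at prediction time. The sketch below does this with a hypothetical wrapper class, `TopKColumnsModel`; `MajorityModel` is a toy stand-in for a scikit-learn estimator, used only so the block runs without other dependencies.

```python
import numpy as np

class TopKColumnsModel:
    """Hypothetical wrapper: keep the k highest-count columns chosen at fit time."""

    def __init__(self, model, k=3000):
        self.model = model
        self.k = k

    def fit(self, X, y):
        col_totals = np.asarray(X.sum(axis=0)).ravel()
        # Indices of the k columns with the largest totals, largest first.
        self.cols_ = col_totals.argsort()[-self.k:][::-1]
        self.model.fit(X[:, self.cols_], y)
        return self

    def predict(self, X):
        # Reuse the columns chosen during fit so the shapes match.
        return self.model.predict(X[:, self.cols_])

class MajorityModel:
    """Toy stand-in for an estimator: always predicts the majority label."""

    def fit(self, X, y):
        self.n_features_ = X.shape[1]
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[counts.argmax()]
        return self

    def predict(self, X):
        assert X.shape[1] == self.n_features_, "feature count mismatch"
        return np.array([self.label_] * X.shape[0])

X_toy = np.array([[5, 0, 1, 2], [4, 1, 0, 3], [6, 0, 0, 1]])
y_toy = np.array(['pos', 'neg', 'pos'])
wrapped = TopKColumnsModel(MajorityModel(), k=2).fit(X_toy, y_toy)
preds = wrapped.predict(X_toy)
print(wrapped.cols_, preds)
```

In the notebook, `fit_gdbt_classifier` could then return `TopKColumnsModel(LogisticRegression(), k=3000).fit(X, y)`. This assumes that `sst.experiment` only needs `predict` from the returned object, which is not verified here.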

In [106]:
def uni_bi_gram_phi(tree):
    leaves = tree.leaves()
    # Adjacent word pairs (bigrams), wrapped in '<S>'...'</S>' markers:
    d = Counter(
        '<S>' + w1 + ' ' + w2 + '</S>'
        for w1, w2 in zip(leaves, leaves[1:]))
    d.update(Counter(leaves))
    return d

_ = sst.experiment(
    uni_bi_gram_phi,            
    fit_maxent_classifier,
    train_reader=sst.train_reader,
    assess_reader=sst.dev_reader,
    class_func=sst.binary_class_func)

Accuracy: 0.775
             precision    recall  f1-score   support

   negative      0.786     0.745     0.765       428
   positive      0.766     0.804     0.785       444

avg / total      0.776     0.775     0.775       872
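
For a sense of what the combined feature function produces, here is the same logic applied to a plain token list. `uni_bi_gram_features` is a hypothetical variant that takes the leaves directly instead of a tree, so the block needs no nltk or `sst` imports.

```python
from collections import Counter

def uni_bi_gram_features(leaves):
    # Bigrams: each adjacent pair, wrapped in the '<S>'...'</S>' markers.
    bigrams = [
        '<S>' + w1 + ' ' + w2 + '</S>'
        for w1, w2 in zip(leaves, leaves[1:])]
    feats = Counter(bigrams)
    # Unigrams: add the individual token counts to the same Counter.
    feats.update(leaves)
    return feats

feats = uni_bi_gram_features(['not', 'very', 'good'])
for name, count in sorted(feats.items()):
    print(repr(name), count)
```

A sentence with n tokens contributes n unigram tokens and n - 1 bigram tokens, so the feature space grows quickly, as the 89960 features in the error message above indicate.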



## Bi-Directional RNN

In [15]:
class TfBidirectionalRNNClassifier(TfRNNClassifier):
    
    def build_graph(self):
        self._define_embedding()

        self.inputs = tf.placeholder(
            tf.int32, [None, self.max_length])

        self.ex_lengths = tf.placeholder(tf.int32, [None])

        # Outputs as usual:
        self.outputs = tf.placeholder(
            tf.float32, shape=[None, self.output_dim])

        # This converts the inputs to a list of lists of dense vector
        # representations:
        self.feats = tf.nn.embedding_lookup(
            self.embedding, self.inputs)

        # Forward and backward cells. Note that `LSTMCell` is hard-coded
        # here (the `cell_class` argument is ignored); the state unpacking
        # below assumes LSTM states:
        self.cell_fw = tf.nn.rnn_cell.LSTMCell(
            self.hidden_dim, activation=self.hidden_activation)
        
        self.cell_bw = tf.nn.rnn_cell.LSTMCell(
            self.hidden_dim, activation=self.hidden_activation)

        # Run the RNN:
        outputs, finals = tf.nn.bidirectional_dynamic_rnn(
            self.cell_fw,
            self.cell_bw,
            self.feats,
            dtype=tf.float32,
            sequence_length=self.ex_lengths)
      
        # finals is a pair of `LSTMStateTuple` objects, which are themselves
        # pairs of Tensors (c, h), where c is the cell state and h is the
        # hidden (output) state, according to
        # https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMStateTuple
        # Thus, we want the second member of these pairs:
        last_fw, last_bw = finals          
        last_fw, last_bw = last_fw[1], last_bw[1]
        
        last = tf.concat((last_fw, last_bw), axis=1)
        
        self.feat_dim = self.hidden_dim * 2               

        # Softmax classifier on the final hidden state:
        self.W_hy = self.weight_init(
            self.feat_dim, self.output_dim, 'W_hy')
        self.b_y = self.bias_init(self.output_dim, 'b_y')
        self.model = tf.matmul(last, self.W_hy) + self.b_y    
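
The shape bookkeeping in `build_graph` can be checked outside TensorFlow. The NumPy sketch below uses random arrays as stand-ins for the final forward and backward hidden states; the batch size of 4 is arbitrary, and nothing here trains a model.

```python
import numpy as np

batch_size, hidden_dim, output_dim = 4, 50, 2
rng = np.random.RandomState(0)

# Stand-ins for the final hidden states of the forward and backward cells.
last_fw = rng.randn(batch_size, hidden_dim)
last_bw = rng.randn(batch_size, hidden_dim)

# Concatenate along the feature axis, mirroring tf.concat(..., axis=1).
last = np.concatenate((last_fw, last_bw), axis=1)

# The classifier weights therefore need 2 * hidden_dim input rows.
W_hy = rng.randn(2 * hidden_dim, output_dim)
b_y = np.zeros(output_dim)
logits = last.dot(W_hy) + b_y

print(last.shape)    # (4, 100)
print(logits.shape)  # (4, 2)
```

This is why `feat_dim` is set to `hidden_dim * 2` before `W_hy` is created: each example is represented by both directions' final states side by side.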

In [20]:
def fit_tf_bidirectional_rnn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfBidirectionalRNNClassifier(
        vocab, 
        eta=0.05,
        batch_size=2048,
        embed_dim=50,
        hidden_dim=50,
        max_length=52, 
        max_iter=500,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.tanh,
        train_embedding=True)
    mod.fit(X, y)
    return mod

_ = sst.experiment(
    rnn_phi,
    fit_tf_bidirectional_rnn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)

Iteration 500: loss: 2.4356548786163334

Accuracy: 0.653
             precision    recall  f1-score   support

   negative      0.646     0.647     0.646       428
   positive      0.659     0.658     0.658       444

avg / total      0.653     0.653     0.653       872



In [22]:
def fit_tf_bidirectional_rnn_classifier_relu(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfBidirectionalRNNClassifier(
        vocab, 
        eta=0.05,
        batch_size=2048,
        embed_dim=50,
        hidden_dim=50,
        max_length=52, 
        max_iter=500,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.relu,
        train_embedding=True)
    mod.fit(X, y)
    return mod

_ = sst.experiment(
    rnn_phi,
    fit_tf_bidirectional_rnn_classifier_relu, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)

Iteration 500: loss: 2.4546531438827515

Accuracy: 0.636
             precision    recall  f1-score   support

   negative      0.636     0.607     0.621       428
   positive      0.637     0.664     0.650       444

avg / total      0.636     0.636     0.636       872

