# Intro. to Snorkel: Extracting Spouse Relations from the News

## Part V: Training our End Extraction Model

In this final part of the tutorial, we'll use the noisy training labels generated in the previous part to train our end extraction model.

For this tutorial, we will train a simple but fairly effective logistic regression model. More generally, however, Snorkel works with many ML libraries, including [TensorFlow](https://www.tensorflow.org/), making it easy to use almost any state-of-the-art model as the end extractor!

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import os

# TO USE A DATABASE OTHER THAN SQLITE, USE THIS LINE
# Note that this is necessary for parallel execution amongst other things...
# os.environ['SNORKELDB'] = 'postgres:///snorkel-intro'

import numpy as np
from snorkel import SnorkelSession
session = SnorkelSession()

We repeat our definition of the `Spouse` `Candidate` subclass:

In [2]:
from snorkel.models import candidate_subclass
Spouse = candidate_subclass('Spouse', ['person1', 'person2'])

Then we reload our **noise-aware training labels** (or _training marginals_) from the previous notebook:

In [3]:
from snorkel.annotations import load_marginals
train_marginals = load_marginals(session, split=0)

## 1. Automatically Creating Features

First, we create features over the candidates in the training set. These features characterize the text and dependency path information related to the two person mentions in the candidate. **Note that we define the set of features based on the training set here.** This operation may take 5-10 minutes; for large sets, use parallelism by switching to a database such as Postgres and setting the `parallelism` keyword argument of `apply`:
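As a rough illustration of what "features over a candidate" can look like, the sketch below turns one tokenized sentence and two person-mention spans into a set of string-valued features. The helper `mention_pair_features`, its feature names, and the `window` parameter are hypothetical; Snorkel's `FeatureAnnotator` generates its own, much larger feature set, including the dependency-path features that this purely lexical sketch leaves out.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one person mention


def mention_pair_features(tokens: List[str], m1: Span, m2: Span, window: int = 2) -> Set[str]:
    """Hypothetical lexical features for a mention pair (not Snorkel's featurizer)."""
    # Order the two mentions so that `left` comes first in the sentence.
    left, right = sorted([m1, m2])
    between = tokens[left[1]:right[0]]
    feats = {f"BETWEEN_WORD[{w.lower()}]" for w in between}
    feats.add(f"NUM_BETWEEN[{len(between)}]")
    # A few words of context on either side of the pair.
    for w in tokens[max(0, left[0] - window):left[0]]:
        feats.add(f"LEFT_WORD[{w.lower()}]")
    for w in tokens[right[1]:right[1] + window]:
        feats.add(f"RIGHT_WORD[{w.lower()}]")
    return feats


tokens = "Barack Obama and his wife Michelle Obama attended".split()
print(sorted(mention_pair_features(tokens, (0, 2), (5, 7))))
```

Words such as "wife" between the two mentions are exactly the kind of signal a linear model can learn to weight heavily.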

In [4]:
from snorkel.annotations import FeatureAnnotator
featurizer = FeatureAnnotator()

In [5]:
%time F_train = featurizer.apply(split=0)
F_train

Clearing existing...
Running UDF...

CPU times: user 9min 29s, sys: 3.01 s, total: 9min 32s
Wall time: 9min 40s


<4780x118064 sparse matrix of type '<type 'numpy.float64'>'
	with 281252 stored elements in Compressed Sparse Row format>
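The `F_train` output above shows why these matrices are stored sparsely. Using only the three numbers it reports (4780 rows, 118064 columns, 281252 stored elements), the arithmetic below compares a dense `float64` array with a CSR layout. The 4-byte index width assumed for CSR is an illustration; SciPy chooses the index dtype itself.

```python
n_rows, n_cols, nnz = 4780, 118064, 281252  # from the F_train output above

density = nnz / (n_rows * n_cols)
nnz_per_row = nnz / n_rows
dense_bytes = n_rows * n_cols * 8  # one float64 per cell
# CSR: float64 values + int32 column indices + int32 row pointers (assumed index width)
csr_bytes = nnz * 8 + nnz * 4 + (n_rows + 1) * 4

print(f"density     = {density:.2e}")
print(f"nnz per row = {nnz_per_row:.1f}")
print(f"dense size  = {dense_bytes / 1e9:.2f} GB")
print(f"CSR size    = {csr_bytes / 1e6:.2f} MB")
```

This works out to roughly 59 active features per candidate (a density of about 0.05%), and about 4.5 GB for a dense array versus about 3.4 MB in CSR form.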

Next, we **apply the feature set we just got from the training set to the dev and test sets** by using `apply_existing`: 
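The point of reusing the training feature set is that dev and test candidates land in the same columns as the training candidates, so a model trained on `F_train` can score `F_dev` and `F_test` directly. A minimal sketch of that idea, assuming features are strings: the hypothetical `fit_vocabulary` assigns column indices from the training split only, and `to_matrix` builds a binary CSR matrix over that fixed vocabulary, dropping any feature never seen in training. This illustrates the concept; it is not Snorkel's implementation.

```python
from typing import Dict, Iterable, List, Set

import numpy as np
import scipy.sparse as sp


def fit_vocabulary(train_feature_sets: Iterable[Set[str]]) -> Dict[str, int]:
    """Assign a column index to every feature seen in the training split."""
    vocab: Dict[str, int] = {}
    for feats in train_feature_sets:
        for f in sorted(feats):  # sorted so the column order is deterministic
            vocab.setdefault(f, len(vocab))
    return vocab


def to_matrix(feature_sets: List[Set[str]], vocab: Dict[str, int]) -> sp.csr_matrix:
    """Binary CSR matrix over a fixed vocabulary; unseen features are dropped."""
    rows, cols = [], []
    for i, feats in enumerate(feature_sets):
        for f in feats:
            j = vocab.get(f)
            if j is not None:  # features absent from training get no column
                rows.append(i)
                cols.append(j)
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(len(feature_sets), len(vocab)))


train_sets = [{"a", "b"}, {"b", "c"}]
vocab = fit_vocabulary(train_sets)
F_dev_toy = to_matrix([{"b", "z"}], vocab)  # "z" never appeared in training
print(vocab, F_dev_toy.toarray())
```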

In [6]:
%%time
F_dev  = featurizer.apply_existing(split=1)
F_test = featurizer.apply_existing(split=2)

Clearing existing...
Running UDF...

Clearing existing...
Running UDF...

CPU times: user 52.4 s, sys: 447 ms, total: 52.9 s
Wall time: 53.3 s


If the features have already been computed, we can instead load them directly:

In [7]:
F_train = featurizer.load_matrix(session, split=0)
F_dev   = featurizer.load_matrix(session, split=1)
F_test  = featurizer.load_matrix(session, split=2)

## 2. Training the Discriminative Model
We use the training marginals to train a discriminative model that classifies each `Candidate` as a true or false mention. We'll use a random hyperparameter search, evaluated on the development set labels, to find the best hyperparameters for our model. To run a hyperparameter search, we need labels for a development set. If they aren't already available, we can manually create labels using the Viewer.
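To make "training on marginals" concrete, here is a minimal NumPy/SciPy sketch of logistic regression fit to probabilistic labels. Each candidate contributes its expected cross-entropy under its marginal, so the gradient with respect to the logit is simply `p - marginal`. The function `train_noise_aware_lr` and its full-batch proximal gradient updates are our own illustration, not Snorkel's `SparseLogisticRegression`; only the `lr`, L1, and L2 knobs mirror the hyperparameters searched over below.

```python
import numpy as np
import scipy.sparse as sp


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_noise_aware_lr(X, marginals, lr=0.1, l1=0.0, l2=0.0, n_epochs=100):
    """Full-batch proximal gradient descent for logistic regression on soft labels.

    X:         (n, d) scipy.sparse CSR feature matrix
    marginals: (n,) array with P(y = 1) for each candidate
    Minimizes mean expected cross-entropy + (l2 / 2) * ||w||^2 + l1 * ||w||_1.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_epochs):
        p = sigmoid(X @ w + b)  # predicted P(y = 1)
        err = p - marginals     # d(expected cross-entropy) / d(logit)
        w = w - lr * (X.T @ err / n + l2 * w)
        b = b - lr * err.mean()
        # Proximal step for the L1 penalty: soft-threshold the weights toward zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w, b
```

Hard labels are the special case where every marginal is exactly 0 or 1; with soft marginals, candidates the labeling functions are unsure about pull less strongly on the weights.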

In [8]:
from snorkel.learning import SparseLogisticRegression
disc_model = SparseLogisticRegression()




Now we set up and run the hyperparameter search, training our model with different hyperparameters and keeping the best model configuration. We set the random seed to ensure reproducibility.

Note that we fit our model's parameters to the training labels generated by our labeling functions, while we pick hyperparameters based on the score over the development set labels, which we created by hand.

In [9]:
from snorkel.learning.utils import MentionScorer
from snorkel.learning import RandomSearch, ListParameter, RangeParameter

# Search over the learning rate and the L1/L2 penalties, each from 1e-6 to 1e-2 on a log10 scale
rate_param = RangeParameter('lr', 1e-6, 1e-2, step=1, log_base=10)
l1_param  = RangeParameter('l1_penalty', 1e-6, 1e-2, step=1, log_base=10)
l2_param  = RangeParameter('l2_penalty', 1e-6, 1e-2, step=1, log_base=10)

searcher = RandomSearch(session, disc_model, F_train, train_marginals, [rate_param, l1_param, l2_param], n=20)

Initialized RandomSearch search of size 20. Search space size = 125.
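The reported search space size of 125 follows from the three `RangeParameter`s: with `step=1` on a base-10 log scale, each covers the five values 1e-6, 1e-5, 1e-4, 1e-3 and 1e-2, giving 5 * 5 * 5 = 125 combinations. The sketch below enumerates that grid and draws 20 distinct configurations with Python's `random` module. It is a standalone illustration: we assume, without checking Snorkel's source, that `RandomSearch` likewise samples without replacement, and it will not reproduce the exact trials in the log below.

```python
import itertools
import random

# Five log-spaced values per hyperparameter: 1e-6, 1e-5, ..., 1e-2
values = [10.0 ** k for k in range(-6, -1)]
grid = list(itertools.product(values, values, values))  # (lr, l1_penalty, l2_penalty)

rng = random.Random(1701)
trials = rng.sample(grid, 20)  # 20 distinct configurations, drawn without replacement

print(f"search space size = {len(grid)}, sampled = {len(trials)}")
```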


Next, we'll load in our dev set labels. We will pick the optimal result from the hyperparameter search by testing against these labels:

In [10]:
from snorkel.annotations import load_gold_labels
L_gold_dev = load_gold_labels(session, annotator_name='gold', split=1)

Finally, we run the hyperparameter search, training the end extraction model once for each sampled configuration:

In [11]:
np.random.seed(1701)
searcher.fit(F_dev, L_gold_dev, n_epochs=50, rebalance=0.5, print_freq=25)

[1] Testing lr = 1.00e-02, l1_penalty = 1.00e-03, l2_penalty = 1.00e-04
[SparseLR] lr=0.01 l1=0.001 l2=0.0001
[SparseLR] Building model
[SparseLR] Training model  #epochs=50  batch=100
[SparseLR] Epoch 0 (0.11s)	Avg. loss=0.673512	NNZ=118050
[SparseLR] Epoch 25 (0.73s)	Avg. loss=0.558851	NNZ=118045
[SparseLR] Epoch 49 (1.40s)	Avg. loss=0.555625	NNZ=117906
[SparseLR] Training done (1.40s)
[SparseLR] Model saved. To load, use name
		SparseLR_0
[2] Testing lr = 1.00e-04, l1_penalty = 1.00e-06, l2_penalty = 1.00e-03
[SparseLR] lr=0.0001 l1=1e-06 l2=0.001
[SparseLR] Building model
[SparseLR] Training model  #epochs=50  batch=100
[SparseLR] Epoch 0 (0.16s)	Avg. loss=0.692559	NNZ=118064
[SparseLR] Epoch 25 (0.84s)	Avg. loss=0.662954	NNZ=118064
[SparseLR] Epoch 49 (1.41s)	Avg. loss=0.647119	NNZ=118064
[SparseLR] Training done (1.41s)
[3] Testing lr = 1.00e-03, l1_penalty = 1.00e-05, l2_penalty = 1.00e-05
[SparseLR] lr=0.001 l1=1e-05 l2=1e-05
[SparseLR] Building model
[SparseLR] Training model 

| Trial (0-based) | lr | l1_penalty | l2_penalty | Prec. | Rec. | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 4 | 0.01 | 0.0001 | 1e-05 | 0.6 | 0.428571 | 0.5 |
| 0 | 0.01 | 0.001 | 0.0001 | 1.0 | 0.285714 | 0.444444 |
| 17 | 0.01 | 1e-05 | 1e-06 | 1.0 | 0.285714 | 0.444444 |
| 7 | 0.01 | 1e-05 | 0.01 | 0.666667 | 0.285714 | 0.4 |
| 19 | 0.01 | 1e-05 | 0.0001 | 0.666667 | 0.285714 | 0.4 |
| 18 | 1e-05 | 0.01 | 1e-06 | 0.085714 | 0.428571 | 0.142857 |
| 16 | 1e-06 | 0.01 | 0.001 | 0.044248 | 0.714286 | 0.083333 |
| 15 | 1e-05 | 1e-06 | 0.0001 | 0.042553 | 0.285714 | 0.074074 |
| 9 | 1e-06 | 1e-05 | 0.001 | 0.019048 | 0.285714 | 0.035714 |
| 6 | 1e-06 | 0.001 | 0.01 | 0.013333 | 0.285714 | 0.025478 |
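The results above are sorted by development-set F1, where F1 = 2PR / (P + R), which suggests the search keeps the top row; we assume that here rather than having confirmed it in Snorkel's source. As a small check, the snippet below recomputes F1 from each row's precision and recall (values copied from the results above) and picks the best trial. It only re-derives the reported numbers and makes no calls into Snorkel.

```python
# (trial, lr, l1_penalty, l2_penalty, precision, recall), copied from the results above
results = [
    (4, 0.01, 1e-4, 1e-5, 0.6, 0.428571),
    (0, 0.01, 1e-3, 1e-4, 1.0, 0.285714),
    (17, 0.01, 1e-5, 1e-6, 1.0, 0.285714),
    (7, 0.01, 1e-5, 1e-2, 0.666667, 0.285714),
    (19, 0.01, 1e-5, 1e-4, 0.666667, 0.285714),
    (18, 1e-5, 1e-2, 1e-6, 0.085714, 0.428571),
    (16, 1e-6, 1e-2, 1e-3, 0.044248, 0.714286),
    (15, 1e-5, 1e-6, 1e-4, 0.042553, 0.285714),
    (9, 1e-6, 1e-5, 1e-3, 0.019048, 0.285714),
    (6, 1e-6, 1e-3, 1e-2, 0.013333, 0.285714),
]


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


best = max(results, key=lambda row: f1(row[4], row[5]))
print(f"best trial = {best[0]}, lr = {best[1]}, F1 = {f1(best[4], best[5]):.3f}")
```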


_Note that to train a model without tuning any hyperparameters (at your own risk) just use the `train` method of the discriminative model. For instance, to train with 20 epochs and a learning rate of 0.001, you could run:_
```
disc_model.train(F_train, train_marginals, n_epochs=20, lr=0.001)
```

## 3. Evaluating on the Test Set

In this last section of the tutorial, we'll get the score we've been after: the performance of the extraction model on the blind test set (`split` 2). First, we load the gold test set labels we made in Part III.

In [12]:
from snorkel.annotations import load_gold_labels
L_gold_test = load_gold_labels(session, annotator_name='gold', split=2)

Now, we score using the discriminative model:

In [13]:
_, _, _, _ = disc_model.score(session, F_test, L_gold_test)

Scores (Un-adjusted)
Pos. class accuracy: 0.429
Neg. class accuracy: 0.993
Precision            0.6
Recall               0.429
F1                   0.5
----------------------------------------
TP: 3 | FP: 2 | TN: 270 | FN: 4
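The summary statistics follow directly from the confusion counts on the last line of the output. Recomputing them by hand is a quick sanity check, and it shows that "Pos. class accuracy" is the same quantity as recall, TP / (TP + FN), while "Neg. class accuracy" is TN / (TN + FP):

```python
tp, fp, tn, fn = 3, 2, 270, 4  # counts reported by disc_model.score above

precision = tp / (tp + fp)
recall = tp / (tp + fn)  # identical to "Pos. class accuracy"
neg_class_accuracy = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"neg_acc={neg_class_accuracy:.3f} f1={f1:.3f}")
```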



Note that if this is the final test set on which you will report results, you should not inspect individual examples from it, to avoid biasing those results. However, you can run the model on your _development set_ and, as we did in the previous part with the generative labeling function model, inspect examples to do error analysis.

##### More importantly, you've now completed the introduction to Snorkel! Give yourself a pat on the back!