# Intro. to Snorkel: Extracting Spouse Relations from the News

## Part III: Creating or Loading Evaluation Labels

Although one of the main purposes of Snorkel is to enable training of state-of-the-art machine learning models _without_ the burden of hand-labeling training data, it is still critical to have a **small** amount of labeled data to help us develop and evaluate our application.

In particular, we will generally need _two_ small labeled sets:
* A **development set**, which can be a subset of our training set, and which we use to guide us when writing _labeling functions_ (see the next part of the tutorial)
* A **test set**, against which we evaluate our final application performance.  **Note that for fair evaluation, you should get someone not involved in development of your application to label the test set, so that it is _blind_!**
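
As a rough illustration of how documents might be divided into these sets, here is a minimal pure-Python sketch. It does not use Snorkel: the function `assign_splits`, the 5% fractions, and the document IDs are all hypothetical. Only the split numbering (0 for train, 1 for dev, 2 for test) follows the convention used in this tutorial. Unlike the description above, the sketch keeps the dev set disjoint from the training set for simplicity.

```python
import random

def assign_splits(doc_ids, dev_frac=0.05, test_frac=0.05, seed=0):
    """Map each document ID to a split: 0 = train, 1 = dev, 2 = test.

    The fractions and seed are illustrative assumptions, not Snorkel defaults.
    """
    ids = sorted(doc_ids)               # sort first so the result is reproducible
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_dev = max(1, round(len(ids) * dev_frac))
    n_test = max(1, round(len(ids) * test_frac))
    splits = {}
    for i, doc_id in enumerate(ids):
        if i < n_dev:
            splits[doc_id] = 1          # development set
        elif i < n_dev + n_test:
            splits[doc_id] = 2          # test set
        else:
            splits[doc_id] = 0          # training set
    return splits

# Hypothetical document IDs, for illustration only
splits = assign_splits(["doc%03d" % i for i in range(100)])
```

Splitting at the document level, rather than the candidate level, keeps candidates from the same document out of both the training and evaluation sets.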

In [1]:
%load_ext autoreload
%autoreload 2
import os

# TO USE A DATABASE OTHER THAN SQLITE, USE THIS LINE
# Note that this is necessary for parallel execution amongst other things...
# os.environ['SNORKELDB'] = 'postgres:///snorkel-intro'

from snorkel import SnorkelSession
session = SnorkelSession()

We repeat our definition of the `Spouse` `Candidate` subclass from Part II, and load in the dev and test sets (splits 1 and 2, respectively), which we'll be labeling here:

In [2]:
from snorkel.models import candidate_subclass
Spouse = candidate_subclass('Spouse', ['person1', 'person2'])

In [3]:
dev_cands = session.query(Spouse).filter(Spouse.split == 1).all()
len(dev_cands)

228

In [4]:
test_cands = session.query(Spouse).filter(Spouse.split == 2).all()
len(test_cands)

278

## Labeling Candidates in the `Viewer`

The main way to label examples in Snorkel is through the `Viewer`, which we've already seen and used in the previous notebook.

**Note that we load in pre-annotated labels below, so you don't actually need to do any labeling in the `Viewer` in this tutorial!**

In [5]:
from snorkel.viewer import SentenceNgramViewer

# NOTE: This if/else only prevents the viewer from opening during automated testing of this notebook.
# You can ignore it!
import os
if 'CI' not in os.environ:
    sv = SentenceNgramViewer(dev_cands, session)
else:
    sv = None

We now open the `Viewer`, where you can mark each `Candidate` as true or false. Try it!  These labels are saved automatically in the database backend, and can be accessed using the annotator's name as the `AnnotationKey`.

In [6]:
sv

## Loading External Evaluation Labels

We have already annotated the dev and test sets for this tutorial, and we'll now load these labels using an externally defined helper function.

Loading and saving external "gold" labels can be a bit messy, but is often a critical part of development, especially when gold labels are expensive and/or time-consuming to obtain.  Snorkel stores all manually annotated labels in a **stable** format (called `StableLabel`), which is somewhat independent of the rest of Snorkel's data model. These labels are not deleted when you delete the candidates, corpus, or any other objects, and can be recovered even if the rest of the data changes or is deleted.

Our general procedure with external labels is to load them into the `StableLabel` table, then use Snorkel's helpers to load them into the main data model from there. If you are interested in the implementation details, please see the script we now load:

In [8]:
from load_external_annotations import load_external_labels
load_external_labels(session, Spouse, annotator_name='gold')


_Note that due to some parsing inconsistencies, you should see 220/223 and 273/279 labels loaded._
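
To make the two-step procedure described above more concrete, here is a minimal pure-Python sketch of the idea: labels are first stored under a key built from document IDs and character offsets, and then matched against the candidates currently in the database. This is not the code in `load_external_annotations`. The helpers `stable_id`, `candidate_key`, and `reload_labels`, the example data, and the exact key format are illustrative assumptions modeled on Snorkel's offset-based stable IDs. The sketch also shows how a small parsing difference (here, a span boundary shifted by one character) can leave a label unmatched.

```python
def stable_id(doc_id, char_start, char_end):
    """Build an offset-based ID for a span (assumed format, for illustration)."""
    return "%s::span:%d:%d" % (doc_id, char_start, char_end)

def candidate_key(span1, span2):
    """Join the stable IDs of a candidate's two spans into a single key."""
    return "~~".join([stable_id(*span1), stable_id(*span2)])

# Step 1: external gold labels are stored keyed by stable ID,
# independently of any candidate objects (hypothetical data).
stable_labels = {
    candidate_key(("doc1", 0, 5), ("doc1", 20, 28)): 1,
    candidate_key(("doc1", 40, 47), ("doc1", 60, 66)): -1,
    candidate_key(("doc2", 3, 9), ("doc2", 15, 22)): 1,
}

# Candidates currently extracted by the pipeline (hypothetical data).
candidates = [
    (("doc1", 0, 5), ("doc1", 20, 28)),
    (("doc1", 40, 47), ("doc1", 60, 66)),
    (("doc2", 3, 10), ("doc2", 15, 22)),  # span end differs by one character
]

def reload_labels(candidates, stable_labels):
    """Step 2: attach a stored label to each candidate whose key matches."""
    loaded = {}
    for cand in candidates:
        key = candidate_key(*cand)
        if key in stable_labels:
            loaded[cand] = stable_labels[key]
    return loaded

loaded = reload_labels(candidates, stable_labels)
print("%d of %d stored labels matched a candidate" % (len(loaded), len(stable_labels)))
```

Because the stored labels depend only on document IDs and character offsets, they survive deleting and re-extracting candidates, at the cost of missing any candidate whose offsets change.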

If you want to confirm that these labels are loaded, you can reload the `SentenceNgramViewer` with `annotator_name='gold'` to see them! Next, in Part IV, we will work towards building a model to predict these labels with high accuracy using data programming.