# Homework 3: Relation extraction using distant supervision

In [1]:
__author__ = "Bill MacCartney"
__version__ = "CS224U, Stanford, Spring 2019"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baseline](#Baseline)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 point]](#Different-model-factory-[1-point])
  1. [Directional unigram features [2 points]](#Directional-unigram-features-[2-points])
  1. [The part-of-speech tags of the "middle" words [2 points]](#The-part-of-speech-tags-of-the-"middle"-words-[2-points])
  1. [Your original system [4 points]](#Your-original-system-[4-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to the developing really effective relation extraction systems using distant supervision. 

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system for you to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [1]:
from functools import partial
import nltk
import numpy as np
import os
import random
import rel_ext
from nltk.corpus import wordnet as wn
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
import string
import utils

In [2]:
sbt_obj = "Stockholm"
obt_obj = "Bergen"
#feature_counter["sbt_obj"] = 0
#feature_counter["obt_obj"] = 0
text = nltk.word_tokenize(sbt_obj)
nes = nltk.ne_chunk(nltk.pos_tag(text))
for ne in nes:
    if type(ne) is nltk.tree.Tree and ne.label() == "GPE":
        print(ne.label())
        #feature_counter["sbt_obj"] = 1

text = nltk.word_tokenize(obt_obj)
nes = nltk.ne_chunk(nltk.pos_tag(text))
for ne in nes:
    if type(ne) is nltk.tree.Tree and ne.label() == "GPE":
        print("obt")
        #feature_counter["obt_obj"] = 1

GPE
obt


As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [2]:
DATA_HOME = '/home/kd/data/data'
rel_ext_data_home = os.path.join(DATA_HOME, 'rel_ext_data')
GLOVE_HOME = os.path.join(DATA_HOME, 'glove.6B')

In [3]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

In [4]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [5]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [6]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [7]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [8]:
splits

{'all': Corpus with 331,696 examples; KB with 45,884 triples,
 'dev': Corpus with 64,937 examples; KB with 9,248 triples,
 'tiny': Corpus with 3,474 examples; KB with 445 triples,
 'train': Corpus with 263,285 examples; KB with 36,191 triples}

## Baseline

In [9]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter, 
                                    use_middle_length=False, 
                                    use_entities=False,
                                    context_section='middle', # can be 'left', 'right', or 'middle'
                                    use_synsets=False):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
        
        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word, pos_word)
                    for syn in synsets:
                        feature_counter[syn.lemma()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')

        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word, pos_word)
                    for syn in synsets:
                        feature_counter[syn.lemma()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    return feature_counter

In [10]:
featurizers = [simple_bag_of_words_featurizer]

In [11]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')
model_factory_2k = lambda: LogisticRegression(fit_intercept=True, solver='liblinear', max_iter=2000)
model_factory_4k = lambda: LogisticRegression(fit_intercept=True, solver='liblinear', max_iter=4000)

In [14]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
contains                  0.793      0.599      0.745       3904       9280
film_performance          0.800      0.552      0.734        766       6142
capital                   0.567      0.179      0.395         95       5471
place_of_birth            0.706      0.206      0.475        233       5609
parents                   0.852      0.535      0.762        312       5688
is_a                      0.675      0.213      0.471        497       5873
has_sibling               0.874      0.236      0.568        499       5875
author                    0.776      0.511      0.703        509       5885
has_spouse                0.864      0.322      0.646        594       5970
worked_at                 0.700      0.231      0.498        242       5618
place_of_death            0.607      0.107      0.314        159       5535
nationality 

Studying model weights might yield insights:

In [13]:
rel_ext.examine_model_weights(baseline_results)

NameError: name 'baseline_results' is not defined

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 point]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

__To submit:__ A call to `rel_ext.experiment` training on the 'train' part of `splits` and assessing on its `dev` part, with `featurizers` as defined above in this notebook and the `model_factory` set to one based in an `SVC` with `kernel='linear'` and all other arguments left with default values.

In [12]:
svc_model_factory = lambda: SVC(kernel='linear')

In [25]:
svc_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=svc_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.752      0.627      0.723        766       6142
place_of_birth            0.602      0.215      0.442        233       5609
nationality               0.496      0.189      0.375        301       5677
place_of_death            0.450      0.113      0.282        159       5535
has_spouse                0.842      0.342      0.651        594       5970
genre                     0.516      0.276      0.440        170       5546
founders                  0.725      0.424      0.635        380       5756
has_sibling               0.774      0.255      0.550        499       5875
is_a                      0.608      0.284      0.495        497       5873
profession                0.577      0.259      0.463        247       5623
worked_at                 0.622      0.306      0.515        242       5618
contains    

### Directional unigram features [2 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences. 

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example. The precise nature of the mark you add for the two cases doesn't make a difference to the model.

2. The macro-average F-score on the `dev` set that you obtain from running `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it on Piazza!)

In [13]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter, fwd_prefix='$FWD_DIRECTION: ', 
                                        bwd_prefix='$BWD_DIREECTION: ', use_middle_length=False,
                                        use_entities=False, include_left=False, include_right=False,
                                       use_entities2=False):
    count = 0
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        count += 1
        words = ex.middle.split(' ')
        for word in words:
            word_direction = fwd_prefix + "_middle_" + word
            feature_counter[word_direction] += 1
        if use_middle_length:
            feature_counter[fwd_prefix + 'NUM_WORD_IN_MIDDLE']  += len(words)
        if include_left:
            words = ex.left.split(' ')
            for word in words:
                word_key = fwd_prefix + "_left_" + word
                feature_counter[word_key] += 1
            if use_middle_length:
                feature_counter[fwd_prefix + 'NUM_WORD_IN_LEFT']  += len(words)
        if include_right:
            words = ex.right.split(' ')
            for word in words:
                word_key = fwd_prefix + "_right_" + word
                feature_counter[word_key] += 1
            if use_middle_length:
                feature_counter[fwd_prefix + 'NUM_WORD_IN_RIGHT']  += len(words)                
        if use_entities:
            feature_counter[fwd_prefix + kbt.sbj] += 1
            feature_counter[fwd_prefix + kbt.obj] += 1
        if use_entities2:
            feature_counter["fwd_kbt.sbj"] += 1
            feature_counter["fwd_kbt.obj"] += 1

    count = max(count, 1)
    if use_middle_length:
        feature_counter[fwd_prefix + 'NUM_WORD_IN_MIDDLE']  /= count
    if include_left:
        feature_counter[fwd_prefix + 'NUM_WORD_IN_LEFT'] /= count
    if include_right:
        feature_counter[fwd_prefix + 'NUM_WORD_IN_RIGHT'] /= count
        
    count = 0
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        count += 1
        words = ex.middle.split(' ')
        for word in words:
            word_direction = bwd_prefix +"_middle_" + word
            feature_counter[word_direction] += 1
        if use_middle_length:
            feature_counter['BWD_NUM_WORD_IN_MIDDLE']  += len(words)
            
        if include_left:
            words = ex.left.split(' ')
            for word in words:
                word_key = bwd_prefix + "_left_" + word
                feature_counter[word_key] += 1
            if use_middle_length:
                feature_counter['BWD_NUM_WORD_IN_LEFT']  += len(words)
        if include_right:
            words = ex.right.split(' ')
            for word in words:
                word_key = bwd_prefix + "_right_" + word
                feature_counter[word_key] += 1
            if use_middle_length:
                feature_counter[bwd_prefix +"BWD_NUM_WORD_IN_RIGHT'"]  += len(words)
        if use_entities:
            feature_counter[bwd_prefix + kbt.sbj] += 1
            feature_counter[bwd_prefix + kbt.obj] += 1
        if use_entities2:
            feature_counter["bwd_kbt.sbj"] += 1
            feature_counter["bwd_kbt.obj"] += 1
            
    count = max(count, 1)

    if use_middle_length:
        feature_counter[fwd_prefix + 'NUM_WORD_IN_MIDDLE']  /= count
    if include_left:
        feature_counter[fwd_prefix + 'NUM_WORD_IN_LEFT'] /= count
    if include_right:
        feature_counter[fwd_prefix + 'NUM_WORD_IN_RIGHT'] /= count

    return feature_counter

In [16]:
directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
contains                  0.838      0.650      0.792       3904       9280
film_performance          0.849      0.667      0.805        766       6142
capital                   0.667      0.232      0.485         95       5471
place_of_birth            0.687      0.245      0.504        233       5609
parents                   0.862      0.519      0.761        312       5688
is_a                      0.731      0.245      0.524        497       5873
has_sibling               0.866      0.246      0.576        499       5875
author                    0.826      0.568      0.757        509       5885
has_spouse                0.872      0.343      0.667        594       5970
worked_at                 0.776      0.273      0.567        242       5618
place_of_death            0.511      0.151      0.346        159       5535
nationality 

In [45]:
directional_bag_left_right_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(directional_bag_of_words_featurizer, use_middle_length=True, 
                         use_entities=False, include_left=True, include_right=True)],
    model_factory=model_factory_2k,
    verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
founders                  0.774      0.476      0.688        380       5756
has_spouse                0.825      0.657      0.784        594       5970
place_of_birth            0.800      0.446      0.691        233       5609
adjoins                   0.800      0.529      0.726        340       5716
genre                     0.729      0.253      0.530        170       5546
place_of_death            0.753      0.421      0.650        159       5535
parents                   0.842      0.647      0.794        312       5688
nationality               0.783      0.611      0.741        301       5677
is_a                      0.826      0.479      0.722        497       5873
worked_at                 0.814      0.343      0.638        242       5618
profession                0.824      0.397      0.678        247       5623
capital     

In [46]:
directional_bag_left_right_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(directional_bag_of_words_featurizer, use_middle_length=True, 
                         use_entities=True, include_left=True, include_right=True)],
    model_factory=model_factory_2k,
    verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
founders                  0.784      0.468      0.691        380       5756
has_spouse                0.843      0.653      0.797        594       5970
place_of_birth            0.824      0.442      0.703        233       5609
adjoins                   0.808      0.521      0.728        340       5716
genre                     0.789      0.265      0.565        170       5546
place_of_death            0.786      0.415      0.667        159       5535
parents                   0.854      0.638      0.800        312       5688
nationality               0.835      0.638      0.786        301       5677
is_a                      0.864      0.511      0.759        497       5873
worked_at                 0.824      0.347      0.646        242       5618
profession                0.903      0.453      0.754        247       5623
capital     

In [47]:
directional_bag_left_right_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(directional_bag_of_words_featurizer, use_middle_length=True, 
                         use_entities2=True, include_left=True, include_right=True)],
    model_factory=model_factory_2k,
    verbose=True)



KeyboardInterrupt: 

In [None]:
directional_bag_left_right_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(directional_bag_of_words_featurizer, use_middle_length=True, use_entities=True, 
                         use_entities2=True, include_left=True, include_right=True)],
    model_factory=model_factory_2k,
    verbose=True)

### The part-of-speech tags of the "middle" words [2 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.
   
   Don't forget the start and end tags, to model those environments properly!

2. The macro-average F-score on the `dev` set that you obtain from running `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `experiment` as exemplified above in this notebook.)

Note: To parse `middle_POS`, one splits on whitespace to get the `word/TAG` pairs. Each of these pairs `s` can be parsed with `s.rsplit('/', 1)`.

In [14]:
def pos_featurize(pos_segments, feature_counter, prefix=""):
    word_POSs = pos_segments.split(' ')
    len_POS = len(word_POSs)
    for i in range(-1, len_POS - 1):
        pos = word_POSs[i].rsplit('/', 1)
        bigram = ""
        if len(pos) > 1:
            if i == -1:
                bigram = '<s> ' + pos[1]
            elif i == len_POS - 2:
                bigram = pos[1] + ' </s>'
            else:
                bigram = pos[1] + " " + word_POSs[i+1].rsplit('/', 1)[1]
        feature_counter[prefix + bigram] += 1
    return feature_counter

In [15]:
def bigram_pos_tag_featurizer(kbt, corpus, feature_counter, use_left=False, use_right=False, use_bt_pos=False):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        if use_bt_pos:
            mention_pos_1 = ex.mention_1_POS.rsplit('/',1)
            mention_pos_2 = ex.mention_1_POS.rsplit('/',1)
            feature_counter["sbj_"+mention_pos_1[1]] = 1
            feature_counter["obj_"+mention_pos_1[1]] = 1
        feature_counter = pos_featurize(ex.middle_POS, feature_counter, "middle")
        if use_left:
            feature_counter = pos_featurize(ex.left_POS, feature_counter, "left")
        if use_right:
            feature_counter = pos_featurize(ex.right_POS, feature_counter, "right")
    return feature_counter

In [16]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    feature_counter = bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    return feature_counter

In [43]:
pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[middle_bigram_pos_tag_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.653      0.163      0.408        301       5677
author                    0.845      0.246      0.568        509       5885
worked_at                 0.607      0.140      0.365        242       5618
parents                   0.735      0.240      0.521        312       5688
has_spouse                0.777      0.258      0.554        594       5970
place_of_death            0.700      0.132      0.376        159       5535
profession                0.742      0.186      0.465        247       5623
genre                     0.824      0.082      0.294        170       5546
capital                   0.500      0.095      0.269         95       5471
place_of_birth            0.750      0.206      0.491        233       5609
contains                  0.725      0.297      0.563       3904       9280
film_perform

In [45]:
pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(bigram_pos_tag_featurizer, use_left=True, use_right=True)],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.552      0.246      0.442        301       5677
author                    0.717      0.379      0.609        509       5885
worked_at                 0.481      0.260      0.411        242       5618
parents                   0.667      0.269      0.515        312       5688
has_spouse                0.634      0.311      0.525        594       5970
place_of_death            0.471      0.201      0.371        159       5535
profession                0.632      0.271      0.499        247       5623
genre                     0.588      0.118      0.327        170       5546
capital                   0.478      0.116      0.294         95       5471
place_of_birth            0.612      0.258      0.480        233       5609
contains                  0.743      0.298      0.572       3904       9280
film_perform

In [46]:
pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(bigram_pos_tag_featurizer, use_left=True, use_right=False)],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.594      0.252      0.467        301       5677
author                    0.720      0.354      0.596        509       5885
worked_at                 0.585      0.256      0.465        242       5618
parents                   0.682      0.282      0.531        312       5688
has_spouse                0.688      0.301      0.548        594       5970
place_of_death            0.536      0.233      0.425        159       5535
profession                0.635      0.219      0.460        247       5623
genre                     0.621      0.106      0.315        170       5546
capital                   0.600      0.126      0.343         95       5471
place_of_birth            0.574      0.232      0.443        233       5609
contains                  0.740      0.302      0.574       3904       9280
film_perform

In [48]:
pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(bigram_pos_tag_featurizer, use_left=False, use_right=True)],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.634      0.196      0.438        301       5677
author                    0.781      0.358      0.632        509       5885
worked_at                 0.526      0.211      0.405        242       5618
parents                   0.653      0.260      0.501        312       5688
has_spouse                0.713      0.293      0.554        594       5970
place_of_death            0.525      0.132      0.329        159       5535
profession                0.759      0.243      0.533        247       5623
genre                     0.579      0.129      0.342        170       5546
capital                   0.458      0.116      0.288         95       5471
place_of_birth            0.683      0.240      0.499        233       5609
contains                  0.720      0.285      0.552       3904       9280
film_perform

In [38]:
pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(bigram_pos_tag_featurizer, use_left=True, use_right=False, use_bt_pos=True)],
    model_factory=model_factory,
    verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
founders                  0.586      0.216      0.436        380       5756
has_spouse                0.736      0.320      0.584        594       5970
place_of_birth            0.553      0.245      0.442        233       5609
adjoins                   0.823      0.465      0.713        340       5716
genre                     0.562      0.106      0.302        170       5546
place_of_death            0.547      0.258      0.447        159       5535
parents                   0.688      0.276      0.530        312       5688
nationality               0.576      0.252      0.458        301       5677
is_a                      0.652      0.207      0.456        497       5873
worked_at                 0.655      0.298      0.528        242       5618
profession                0.753      0.235      0.523        247       5623
capital     

### Your original system [4 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?
- Consider adding features based on WordNet synsets. Here's a little code to get you started with that:
  ```
  from nltk.corpus import wordnet as wn
  dog_compatible_synsets = wn.synsets('dog', pos='n')
 ```

In [17]:
simple_bag_of_words_middle_featurizer = partial(simple_bag_of_words_featurizer,use_middle_length=True)

In [18]:
bow_middle_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_middle_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
parents                   0.829      0.545      0.751        312       5688
has_sibling               0.848      0.234      0.557        499       5875
place_of_birth            0.653      0.202      0.451        233       5609
author                    0.842      0.534      0.755        509       5885
founders                  0.776      0.392      0.649        380       5756
genre                     0.596      0.165      0.391        170       5546
film_performance          0.793      0.569      0.735        766       6142
nationality               0.578      0.196      0.416        301       5677
adjoins                   0.821      0.365      0.657        340       5716
place_of_death            0.486      0.107      0.284        159       5535
contains                  0.799      0.602      0.750       3904       9280
worked_at   

In [19]:
def bow_featurize(words, feature_counter, n, prefix="", directional_prefix="", use_middle_length=False):
    for i in range(0, len(words), n):
            end = i + n
            if (len(words) - i) < n:
                end = len(words)
            n_gram = ' '.join(words[i:end])
            n_gram = directional_prefix + n_gram
            feature_counter[prefix + n_gram] += 1
    if use_middle_length:
        feature_counter[directional_prefix+'NUM_WORD_IN_MIDDLE']  += len(words)
    return feature_counter

In [20]:
def ngrams_bag_of_words_featurizer(kbt, corpus, feature_counter, n=2, 
                                   directional=False, use_middle_length=False,
                                   use_left=False, use_right=False):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = ex.middle.split(' ')
        directional_prefix=""
        if directional:
            directional_prefix = "FWD_"
        feature_counter = bow_featurize(words, feature_counter, n, "middle_", directional_prefix, use_middle_length)
        if use_left:
            words = ex.middle.split(' ')
            feature_counter = bow_featurize(words, feature_counter, n, "left_", directional_prefix, use_middle_length)
        if use_right:
            words = ex.middle.split(' ')
            feature_counter = bow_featurize(words, feature_counter, n, "right_", directional_prefix, use_middle_length)
        
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = ex.middle.split(' ')
        directional_prefix=""
        if directional:
            directional_prefix = "BWD_"
        feature_counter = bow_featurize(words, feature_counter, n, "middle_", directional_prefix, use_middle_length)
        if use_left:
            words = ex.middle.split(' ')
            feature_counter = bow_featurize(words, feature_counter, n, "left_", directional_prefix, use_middle_length)
        if use_right:
            words = ex.middle.split(' ')
            feature_counter = bow_featurize(words, feature_counter, n, "right_", directional_prefix, use_middle_length)
    return feature_counter

In [21]:
bigrams_bag_of_words_featurizer = partial(ngrams_bag_of_words_featurizer, n=2)
trigrams_bag_of_words_featurizer = partial(ngrams_bag_of_words_featurizer, n=3)

In [93]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[bigrams_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.677      0.146      0.392        301       5677
author                    0.849      0.409      0.698        509       5885
worked_at                 0.700      0.174      0.436        242       5618
parents                   0.910      0.388      0.717        312       5688
has_spouse                0.915      0.273      0.622        594       5970
place_of_death            0.692      0.057      0.213        159       5535
profession                0.774      0.166      0.447        247       5623
genre                     0.690      0.171      0.429        170       5546
capital                   0.500      0.168      0.359         95       5471
place_of_birth            0.836      0.197      0.508        233       5609
contains                  0.799      0.573      0.740       3904       9280
film_perform

In [94]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, use_left=True, use_right=True, directional=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.692      0.209      0.474        301       5677
author                    0.824      0.580      0.760        509       5885
worked_at                 0.736      0.219      0.500        242       5618
parents                   0.912      0.433      0.747        312       5688
has_spouse                0.910      0.306      0.653        594       5970
place_of_death            0.692      0.113      0.342        159       5535
profession                0.781      0.231      0.529        247       5623
genre                     0.750      0.282      0.563        170       5546
capital                   0.590      0.242      0.458         95       5471
place_of_birth            0.825      0.223      0.536        233       5609
contains                  0.763      0.747      0.759       3904       9280
film_perform

In [97]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, use_left=True, use_right=False, directional=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.698      0.199      0.465        301       5677
author                    0.825      0.574      0.758        509       5885
worked_at                 0.718      0.211      0.485        242       5618
parents                   0.915      0.417      0.739        312       5688
has_spouse                0.914      0.288      0.637        594       5970
place_of_death            0.692      0.113      0.342        159       5535
profession                0.779      0.215      0.511        247       5623
genre                     0.759      0.259      0.547        170       5546
capital                   0.579      0.232      0.445         95       5471
place_of_birth            0.825      0.223      0.536        233       5609
contains                  0.838      0.627      0.785       3904       9280
film_perform

In [98]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, use_left=False, use_right=True, directional=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.698      0.199      0.465        301       5677
author                    0.825      0.574      0.758        509       5885
worked_at                 0.718      0.211      0.485        242       5618
parents                   0.915      0.417      0.739        312       5688
has_spouse                0.914      0.288      0.637        594       5970
place_of_death            0.692      0.113      0.342        159       5535
profession                0.779      0.215      0.511        247       5623
genre                     0.759      0.259      0.547        170       5546
capital                   0.579      0.232      0.445         95       5471
place_of_birth            0.825      0.223      0.536        233       5609
contains                  0.838      0.627      0.785       3904       9280
film_perform

In [95]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, use_left=True, use_right=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.667      0.173      0.424        301       5677
author                    0.750      0.519      0.689        509       5885
worked_at                 0.761      0.211      0.500        242       5618
parents                   0.868      0.420      0.715        312       5688
has_spouse                0.909      0.301      0.648        594       5970
place_of_death            0.412      0.044      0.154        159       5535
profession                0.729      0.206      0.484        247       5623
genre                     0.685      0.218      0.479        170       5546
capital                   0.528      0.200      0.397         95       5471
place_of_birth            0.806      0.215      0.520        233       5609
contains                  0.786      0.587      0.736       3904       9280
film_perform

In [99]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, use_left=True, use_right=False)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.662      0.163      0.410        301       5677
author                    0.759      0.513      0.692        509       5885
worked_at                 0.731      0.202      0.480        242       5618
parents                   0.882      0.407      0.715        312       5688
has_spouse                0.911      0.291      0.639        594       5970
place_of_death            0.500      0.050      0.179        159       5535
profession                0.750      0.194      0.477        247       5623
genre                     0.706      0.212      0.481        170       5546
capital                   0.529      0.189      0.390         95       5471
place_of_birth            0.817      0.210      0.518        233       5609
contains                  0.791      0.582      0.738       3904       9280
film_perform

In [100]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, use_left=False, use_right=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.662      0.163      0.410        301       5677
author                    0.759      0.513      0.692        509       5885
worked_at                 0.731      0.202      0.480        242       5618
parents                   0.882      0.407      0.715        312       5688
has_spouse                0.911      0.291      0.639        594       5970
place_of_death            0.500      0.050      0.179        159       5535
profession                0.750      0.194      0.477        247       5623
genre                     0.706      0.212      0.481        170       5546
capital                   0.529      0.189      0.390         95       5471
place_of_birth            0.817      0.210      0.518        233       5609
contains                  0.791      0.582      0.738       3904       9280
film_perform

In [96]:
trigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[trigrams_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.732      0.136      0.390        301       5677
author                    0.777      0.458      0.682        509       5885
worked_at                 0.811      0.124      0.385        242       5618
parents                   0.925      0.276      0.629        312       5688
has_spouse                0.940      0.239      0.593        594       5970
place_of_death            0.714      0.031      0.134        159       5535
profession                0.742      0.093      0.310        247       5623
genre                     0.857      0.141      0.426        170       5546
capital                   0.423      0.116      0.276         95       5471
place_of_birth            0.800      0.103      0.340        233       5609
contains                  0.808      0.478      0.710       3904       9280
film_perform

In [22]:
bigrams_bag_of_words_featurizer_use_middle = partial(ngrams_bag_of_words_featurizer, n=2, use_middle_length=True)
bigrams_bag_of_words_featurizer_use_mid_direction = partial(ngrams_bag_of_words_featurizer, n=2, directional=True, use_middle_length=True)

In [20]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[bigrams_bag_of_words_featurizer_use_middle],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.818      0.470      0.713        766       6142
place_of_birth            0.692      0.193      0.456        233       5609
nationality               0.592      0.140      0.359        301       5677
place_of_death            0.467      0.044      0.160        159       5535
has_spouse                0.874      0.256      0.589        594       5970
genre                     0.738      0.182      0.459        170       5546
founders                  0.766      0.276      0.566        380       5756
has_sibling               0.890      0.178      0.495        499       5875
is_a                      0.731      0.175      0.447        497       5873
profession                0.684      0.158      0.411        247       5623
worked_at                 0.682      0.186      0.445        242       5618
contains    

In [122]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, directional=True, 
                        use_left=True, use_right=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.692      0.209      0.474        301       5677
author                    0.824      0.580      0.760        509       5885
worked_at                 0.736      0.219      0.500        242       5618
parents                   0.912      0.433      0.747        312       5688
has_spouse                0.910      0.306      0.653        594       5970
place_of_death            0.692      0.113      0.342        159       5535
profession                0.781      0.231      0.529        247       5623
genre                     0.750      0.282      0.563        170       5546
capital                   0.590      0.242      0.458         95       5471
place_of_birth            0.825      0.223      0.536        233       5609
contains                  0.763      0.747      0.759       3904       9280
film_perform

In [117]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, directional=True, 
                         use_middle_length=True, use_left=True, use_right=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.713      0.223      0.495        301       5677
author                    0.853      0.572      0.777        509       5885
worked_at                 0.803      0.202      0.504        242       5618
parents                   0.915      0.413      0.736        312       5688
has_spouse                0.877      0.288      0.622        594       5970
place_of_death            0.609      0.088      0.279        159       5535
profession                0.800      0.227      0.531        247       5623
genre                     0.754      0.288      0.570        170       5546
capital                   0.618      0.221      0.455         95       5471
place_of_birth            0.864      0.219      0.544        233       5609
contains                  0.786      0.688      0.764       3904       9280
film_perform

In [118]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, directional=True, 
                         use_middle_length=True, use_left=False, use_right=True)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.707      0.176      0.441        301       5677
author                    0.858      0.558      0.775        509       5885
worked_at                 0.754      0.190      0.473        242       5618
parents                   0.922      0.417      0.742        312       5688
has_spouse                0.881      0.286      0.622        594       5970
place_of_death            0.619      0.082      0.267        159       5535
profession                0.779      0.215      0.511        247       5623
genre                     0.784      0.235      0.535        170       5546
capital                   0.576      0.200      0.419         95       5471
place_of_birth            0.864      0.219      0.544        233       5609
contains                  0.786      0.688      0.764       3904       9280
film_perform

In [119]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[partial(ngrams_bag_of_words_featurizer, n=2, directional=True, 
                         use_middle_length=True, use_left=True, use_right=False)],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.707      0.176      0.441        301       5677
author                    0.858      0.558      0.775        509       5885
worked_at                 0.754      0.190      0.473        242       5618
parents                   0.922      0.417      0.742        312       5688
has_spouse                0.881      0.286      0.622        594       5970
place_of_death            0.619      0.082      0.267        159       5535
profession                0.779      0.215      0.511        247       5623
genre                     0.784      0.235      0.535        170       5546
capital                   0.576      0.200      0.419         95       5471
place_of_birth            0.864      0.219      0.544        233       5609
contains                  0.786      0.688      0.764       3904       9280
film_perform

In [21]:
simple_bag_of_words_entities_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True)

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_entities_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.827      0.551      0.752        766       6142
place_of_birth            0.724      0.236      0.512        233       5609
nationality               0.678      0.266      0.517        301       5677
place_of_death            0.559      0.119      0.322        159       5535
has_spouse                0.931      0.295      0.650        594       5970
genre                     0.829      0.371      0.665        170       5546
founders                  0.794      0.376      0.650        380       5756
has_sibling               0.930      0.240      0.591        499       5875
is_a                      0.812      0.348      0.641        497       5873
profession                0.800      0.340      0.630        247       5623
worked_at                 0.744      0.277      0.556        242       5618
contains    

In [22]:
left_bag_of_words_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True, context_section='left')

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[left_bag_of_words_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.769      0.453      0.675        766       6142
place_of_birth            0.435      0.086      0.240        233       5609
nationality               0.451      0.136      0.308        301       5677
place_of_death            0.130      0.019      0.060        159       5535
has_spouse                0.653      0.133      0.366        594       5970
genre                     0.706      0.212      0.481        170       5546
founders                  0.522      0.092      0.270        380       5756
has_sibling               0.783      0.253      0.551        499       5875
is_a                      0.642      0.282      0.511        497       5873
profession                0.800      0.211      0.513        247       5623
worked_at                 0.659      0.120      0.347        242       5618
contains    

In [23]:
right_bag_of_words_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True, context_section='right')

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[right_bag_of_words_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.729      0.386      0.619        766       6142
place_of_birth            0.326      0.060      0.173        233       5609
nationality               0.400      0.100      0.250        301       5677
place_of_death            0.421      0.050      0.170        159       5535
has_spouse                0.537      0.121      0.319        594       5970
genre                     0.750      0.212      0.497        170       5546
founders                  0.517      0.082      0.250        380       5756
has_sibling               0.676      0.146      0.392        499       5875
is_a                      0.619      0.219      0.454        497       5873
profession                0.765      0.158      0.432        247       5623
worked_at                 0.526      0.083      0.254        242       5618
contains    

In [26]:
simple_bag_of_words_middle_entities_featurizer = partial(simple_bag_of_words_featurizer, 
                                                         use_entities=True, use_middle_length=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_middle_entities_featurizer],
    model_factory=svc_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.751      0.617      0.720        766       6142
place_of_birth            0.562      0.253      0.452        233       5609
nationality               0.521      0.326      0.465        301       5677
place_of_death            0.383      0.145      0.288        159       5535
has_spouse                0.872      0.332      0.658        594       5970
genre                     0.764      0.553      0.710        170       5546
founders                  0.803      0.418      0.678        380       5756
has_sibling               0.884      0.305      0.640        499       5875
is_a                      0.742      0.493      0.674        497       5873
profession                0.756      0.478      0.677        247       5623
worked_at                 0.692      0.343      0.575        242       5618
contains    

In [23]:
def simple_bag_of_words_featurizer2(kbt, corpus, feature_counter, 
                                    use_middle_length=False, 
                                    use_entities=False,
                                    context_section='middle', # can be 'left', 'right', or 'middle'
                                    use_synsets=False):
    synset_prefix = "synset_:"
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')
        
        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos_pair in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word)
                    for syn in synsets:
                        feature_counter[word] += 1
                        for hyponym in syn.hyponyms():
                            feature_counter[synset_prefix+hyponym.name()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')

        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos_pair in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word)
                    for syn in synsets:
                        feature_counter[word] += 1
                        for hyponym in syn.hyponyms():
                            feature_counter[synset_prefix+hyponym.name()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    return feature_counter

In [28]:
simple_bag_of_words_synsets_featurizer = partial(simple_bag_of_words_featurizer2, 
                                                         use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_synsets_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.698      0.615      0.679        766       6142
place_of_birth            0.460      0.223      0.380        233       5609
nationality               0.374      0.243      0.338        301       5677
place_of_death            0.329      0.145      0.262        159       5535
has_spouse                0.795      0.327      0.618        594       5970
genre                     0.400      0.306      0.377        170       5546
founders                  0.630      0.447      0.582        380       5756
has_sibling               0.730      0.261      0.537        499       5875
is_a                      0.496      0.262      0.421        497       5873
profession                0.458      0.219      0.376        247       5623
worked_at                 0.565      0.306      0.483        242       5618
contains    

In [29]:
simple_bag_of_words_all_featurizer = partial(simple_bag_of_words_featurizer2, 
                                            use_entities=True, use_middle_length=True, use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_all_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.738      0.598      0.705        766       6142
place_of_birth            0.481      0.219      0.388        233       5609
nationality               0.484      0.306      0.434        301       5677
place_of_death            0.403      0.157      0.307        159       5535
has_spouse                0.877      0.300      0.633        594       5970
genre                     0.574      0.412      0.532        170       5546
founders                  0.647      0.400      0.576        380       5756
has_sibling               0.772      0.244      0.539        499       5875
is_a                      0.676      0.356      0.573        497       5873
profession                0.730      0.328      0.586        247       5623
worked_at                 0.615      0.277      0.494        242       5618
contains    

In [37]:
simple_bag_of_words_all_featurizer = partial(simple_bag_of_words_featurizer2, 
                                            use_entities=True, use_middle_length=True, use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_all_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.738      0.598      0.705        766       6142
place_of_birth            0.481      0.219      0.388        233       5609
nationality               0.484      0.306      0.434        301       5677
place_of_death            0.403      0.157      0.307        159       5535
has_spouse                0.877      0.300      0.633        594       5970
genre                     0.574      0.412      0.532        170       5546
founders                  0.647      0.400      0.576        380       5756
has_sibling               0.772      0.244      0.539        499       5875
is_a                      0.676      0.356      0.573        497       5873
profession                0.730      0.328      0.586        247       5623
worked_at                 0.615      0.277      0.494        242       5618
contains    

In [77]:
directional_middle_featurizer = partial(directional_bag_of_words_featurizer, use_middle_length=True)

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_middle_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.847      0.645      0.797        766       6142
place_of_birth            0.692      0.232      0.495        233       5609
nationality               0.623      0.219      0.455        301       5677
place_of_death            0.550      0.138      0.345        159       5535
has_spouse                0.896      0.347      0.680        594       5970
genre                     0.657      0.259      0.502        170       5546
founders                  0.823      0.416      0.688        380       5756
has_sibling               0.897      0.244      0.585        499       5875
is_a                      0.747      0.225      0.510        497       5873
profession                0.740      0.231      0.514        247       5623
worked_at                 0.727      0.264      0.539        242       5618
contains    

In [78]:
directional_middle_entities_featurizer = partial(directional_bag_of_words_featurizer, 
                                                 use_middle_length=True, use_entities=True)

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_middle_entities_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.855      0.633      0.799        766       6142
place_of_birth            0.772      0.262      0.556        233       5609
nationality               0.705      0.326      0.572        301       5677
place_of_death            0.565      0.164      0.379        159       5535
has_spouse                0.944      0.311      0.671        594       5970
genre                     0.873      0.406      0.710        170       5546
founders                  0.828      0.379      0.669        380       5756
has_sibling               0.947      0.251      0.609        499       5875
is_a                      0.811      0.372      0.656        497       5873
profession                0.828      0.389      0.675        247       5623
worked_at                 0.756      0.281      0.565        242       5618
contains    

In [24]:
def ensembled_bow_pos_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True) 
    return bigram_pos_tag_featurizer(kbt, corpus, feature_counter, use_left=True)

In [54]:
ensembled_bow_pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_featurizer],
    model_factory=model_factory_4k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.877      0.664      0.824        301       5677
author                    0.878      0.723      0.842        509       5885
worked_at                 0.777      0.302      0.591        242       5618
parents                   0.881      0.667      0.828        312       5688
has_spouse                0.822      0.646      0.780        594       5970
place_of_death            0.779      0.421      0.666        159       5535
profession                0.917      0.534      0.802        247       5623
genre                     0.949      0.329      0.690        170       5546
capital                   0.462      0.126      0.302         95       5471
place_of_birth            0.841      0.498      0.739        233       5609
contains                  0.865      0.690      0.823       3904       9280
film_perform

In [25]:
def ensembled_bow_ngrams_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True) 
    return bigrams_bag_of_words_featurizer(kbt, corpus, feature_counter)

In [102]:
ensembled_bow_ngrams_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_ngrams_featurizer],
    model_factory=model_factory_4k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
nationality               0.898      0.645      0.833        301       5677
author                    0.872      0.721      0.837        509       5885
worked_at                 0.813      0.252      0.563        242       5618
parents                   0.889      0.667      0.833        312       5688
has_spouse                0.846      0.603      0.783        594       5970
place_of_death            0.847      0.384      0.682        159       5535
profession                0.923      0.486      0.782        247       5623
genre                     0.925      0.288      0.641        170       5546
capital                   0.480      0.126      0.308         95       5471
place_of_birth            0.895      0.438      0.740        233       5609
contains                  0.870      0.674      0.822       3904       9280
film_perform

In [26]:
def ensembled_bow_pos_ngrams_featurizer2(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True)
    feature_counter = bigram_pos_tag_featurizer(kbt, corpus, feature_counter, use_left=True)
    return bigrams_bag_of_words_featurizer(kbt, corpus, feature_counter)

In [89]:
ensembled_bow_pos_ngrams_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_ngrams_featurizer2],
    model_factory=model_factory_4k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.858      0.692      0.818        766       6142
place_of_birth            0.779      0.468      0.687        233       5609
nationality               0.871      0.648      0.815        301       5677
place_of_death            0.739      0.409      0.636        159       5535
has_spouse                0.836      0.616      0.780        594       5970
genre                     0.908      0.347      0.686        170       5546
founders                  0.825      0.458      0.711        380       5756
has_sibling               0.909      0.637      0.837        499       5875
is_a                      0.861      0.547      0.772        497       5873
profession                0.899      0.543      0.795        247       5623
worked_at                 0.758      0.285      0.569        242       5618
contains    

In [27]:
def ensembled_bow_pos_ngrams_direct_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True)
    feature_counter = bigram_pos_tag_featurizer(kbt, corpus, feature_counter, use_left=True)
    return bigrams_bag_of_words_featurizer_use_mid_direction(kbt, corpus, feature_counter)

In [None]:
ensembled_bow_pos_ngrams_direct_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_ngrams_direct_featurizer],
    model_factory=model_factory_4k,
    verbose=True)

In [28]:
def ensembled_bow_pos_ngrams_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True)
    feature_counter = bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    return bigrams_bag_of_words_featurizer(kbt, corpus, feature_counter)

In [28]:
ensembled_bow_pos_ngrams_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_ngrams_featurizer],
    model_factory=model_factory_4k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
founders                  0.794      0.476      0.700        380       5756
has_spouse                0.814      0.618      0.765        594       5970
place_of_birth            0.766      0.464      0.678        233       5609
adjoins                   0.839      0.306      0.622        340       5716
genre                     0.871      0.359      0.678        170       5546
place_of_death            0.723      0.428      0.636        159       5535
parents                   0.886      0.670      0.832        312       5688
nationality               0.848      0.668      0.805        301       5677
is_a                      0.903      0.561      0.805        497       5873
worked_at                 0.870      0.277      0.609        242       5618
profession                0.944      0.547      0.824        247       5623
capital     

In [29]:
def ensembled_bow_pos_ngrams_final(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True)
    feature_counter = bigram_pos_tag_featurizer(kbt, corpus, feature_counter, use_left=True, use_bt_pos=True)
    return ngrams_bag_of_words_featurizer(kbt, corpus, feature_counter, n=2, directional=True, 
                                        use_left=True, use_right=True)

In [30]:
ensembled_bow_pos_ngrams_direct_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_ngrams_final],
    model_factory=model_factory_4k,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
parents                   0.879      0.673      0.828        312       5688
has_sibling               0.870      0.681      0.824        499       5875
place_of_birth            0.764      0.485      0.685        233       5609
author                    0.884      0.762      0.857        509       5885
founders                  0.811      0.553      0.742        380       5756
genre                     0.875      0.329      0.657        170       5546
film_performance          0.891      0.749      0.859        766       6142
nationality               0.817      0.698      0.790        301       5677
adjoins                   0.753      0.512      0.688        340       5716
place_of_death            0.716      0.491      0.655        159       5535
contains                  0.876      0.769      0.852       3904       9280
worked_at   

In [20]:
def glove_bag_of_words_featurizer(kbt, corpus, feature_counter, glove_lookup,
                                context_section='middle',
                                use_middle_length=False,
                                glove_dims=300): # can be 'left', 'right', or 'middle'
    fwd_glove_vector = np.zeros(glove_dims)
    bwd_glove_vector = np.zeros(glove_dims)

    sbj_glove = glove_lookup.get(kbt.sbj, np.array([random.uniform(-0.5, 0.5) for i in range(glove_dims)]))
    obj_glove = glove_lookup.get(kbt.obj, np.array([random.uniform(-0.5, 0.5) for i in range(glove_dims)]))

    feature_prefix = "sbj_glove:"
    for i, feature in enumerate(sbj_glove):
        feature_counter[feature_prefix + str(i)] = feature
        
    feature_prefix = "obj_glove:"
    for i, feature in enumerate(obj_glove):
        feature_counter[feature_prefix + str(i)] = feature
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            #words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
            words = ' '.join((ex.mention_1, ex.middle, ex.mention_2)).split(' ')
        for word in words:
            fwd_glove_vector += glove_lookup.get(word, np.array([random.uniform(-0.5, 0.5) for i in range(glove_dims)]))/len(words)
        if use_middle_length:
            feature_counter['FWD_NUM_WORD_IN_MIDDLE']  += len(words)
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            #words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
            words = ' '.join((ex.mention_1, ex.middle, ex.mention_2)).split(' ')
        for word in words:
            bwd_glove_vector += glove_lookup.get(word, np.array([random.uniform(-0.5, 0.5) for i in range(glove_dims)]))/len(words)
        if use_middle_length:
            feature_counter['BWD_NUM_WORD_IN_MIDDLE']  += len(words)
    
    feature_prefix = "fwd_glove_:"
    for i, feature in enumerate(fwd_glove_vector):
        feature_counter[feature_prefix + str(i)] = feature

    feature_prefix = "bwd_glove_:"
    for i, feature in enumerate(bwd_glove_vector):
        feature_counter[feature_prefix + str(i)] = feature
    
    return feature_counter

In [129]:
'''glove_featurizer = partial(glove_bag_of_words_featurizer, context_section='all', glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_featurizer],
    model_factory=model_factory_2k,
    verbose=True)'''

"glove_featurizer = partial(glove_bag_of_words_featurizer, context_section='all', glove_lookup=glove_lookup)\n\nglove_results = rel_ext.experiment(\n    splits,\n    train_split='train',\n    test_split='dev',\n    featurizers=[glove_featurizer],\n    model_factory=model_factory_2k,\n    verbose=True)"

In [None]:
glove_length_featurizer = partial(glove_bag_of_words_featurizer, 
                                  context_section='all', 
                                  use_middle_length=True, glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_length_featurizer],
    model_factory=model_factory_2k,
    verbose=True)

In [None]:
def ensembled_bow_pos_glove_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True, include_left=True, include_right=True)
    feature_counter = bigram_pos_tag_featurizer(kbt, corpus, feature_counter, use_left=True)
    feature_counter =  ngrams_bag_of_words_featurizer(kbt, corpus, feature_counter, n=2, directional=True, 
                                        use_left=True, use_right=True)
    return glove_bag_of_words_featurizer(kbt, corpus, feature_counter, 
                                         context_section='all', glove_lookup=glove_lookup)

In [None]:
glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_glove_featurizer],
    model_factory=model_factory_4k,
    verbose=True)

In [None]:
import dill
dill.dump_session('notebook_env.db')
#dill.load_session('notebook_env.db')

## Bake-off [1 point]

For the bake-off, we will release a test set right after class on April 29. The announcement will go out on Piazza. You will evaluate your custom model from the previous question on these new datasets using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

To enter the bake-off, upload this notebook on Canvas:

https://canvas.stanford.edu/courses/99711/assignments/187248

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

The bake-off will close at 4:30 pm on May 1. Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

In [35]:
# Enter your bake-off assessment code in this cell. 
# Please do not remove this comment.
rel_ext.bake_off_experiment(ensembled_bow_pos_ngrams_direct_results,
    rel_ext_data_home,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
parents                   0.909      0.705      0.860        427       7111
has_sibling               0.885      0.709      0.843        717       7401
place_of_birth            0.820      0.533      0.740        291       6975
author                    0.891      0.726      0.852        645       7329
founders                  0.758      0.565      0.710        444       7128
genre                     0.871      0.324      0.652        188       6872
film_performance          0.871      0.743      0.842       1011       7695
nationality               0.841      0.593      0.776        383       7067
adjoins                   0.829      0.587      0.766        438       7122
place_of_death            0.775      0.465      0.684        200       6884
contains                  0.862      0.774      0.843       3808      10492
worked_at   

In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported 
# by the code above. Please enter only a number between 
# 0 and 1 inclusive. Please do not remove this comment.
0.755