# Homework and bake-off: Relation extraction using distant supervision

In [1]:
__author__ = "Bill MacCartney and Christopher Potts"
__version__ = "CS224u, Stanford, Fall 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baselines](#Baselines)
  1. [Hand-build feature functions](#Hand-build-feature-functions)
  1. [Distributed representations](#Distributed-representations)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 points]](#Different-model-factory-[1-points])
  1. [Directional unigram features [1.5 points]](#Directional-unigram-features-[1.5-points])
  1. [The part-of-speech tags of the "middle" words [1.5 points]](#The-part-of-speech-tags-of-the-"middle"-words-[1.5-points])
  1. [Bag of Synsets [2 points]](#Bag-of-Synsets-[2-points])
  1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to developing really effective relation extraction systems using distant supervision. 

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system for you to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [2]:
import numpy as np
import os
import rel_ext
from sklearn.linear_model import LogisticRegression
import utils

In [3]:
# Set all the random seeds for reproducibility. Only the
# system seed is relevant for this notebook.

utils.fix_random_seeds()

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [4]:
rel_ext_data_home = os.path.join('data', 'rel_ext_data')

In [5]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [6]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [7]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [8]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [9]:
splits

{'tiny': Corpus with 3,474 examples; KB with 445 triples,
 'train': Corpus with 263,285 examples; KB with 36,191 triples,
 'dev': Corpus with 64,937 examples; KB with 9,248 triples,
 'all': Corpus with 331,696 examples; KB with 45,884 triples}

## Baselines

### Hand-build feature functions

In [10]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    return feature_counter

In [11]:
featurizers = [simple_bag_of_words_featurizer]

In [12]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

In [13]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.868      0.385      0.694        340       5716
author                    0.819      0.552      0.747        509       5885
capital                   0.654      0.179      0.427         95       5471
contains                  0.801      0.589      0.747       3904       9280
film_performance          0.752      0.557      0.703        766       6142
founders                  0.808      0.387      0.663        380       5756
genre                     0.528      0.165      0.366        170       5546
has_sibling               0.816      0.230      0.541        499       5875
has_spouse                0.840      0.318      0.633        594       5970
is_a                      0.766      0.223      0.515        497       5873
nationality               0.598      0.173      0.401        301       5677
parents     
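
The `f-score` column in the table above is lower than the usual F1 would be for several rows, and the printed values are consistent with an F-beta score at beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch for checking this against two rows of the table; the function name `f_beta` is introduced here for illustration and is not part of `rel_ext`.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# (precision, recall) pairs copied from the baseline table above.
rows = {"adjoins": (0.868, 0.385), "author": (0.819, 0.552)}
for rel, (p, r) in rows.items():
    print(rel, round(f_beta(p, r), 3))
```

For comparison, `f_beta(0.868, 0.385, beta=1)` gives roughly 0.533, well below the 0.694 reported for `adjoins`, so low recall is penalized less here than it would be under F1.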

Studying model weights might yield insights:

In [14]:
rel_ext.examine_model_weights(baseline_results)

Highest and lowest feature weights for relation adjoins:

     2.494 Córdoba
     2.382 Taluks
     2.253 Valais
     ..... .....
    -1.592 who
    -1.595 6th
    -1.689 century

Highest and lowest feature weights for relation author:

     2.678 books
     2.642 author
     2.446 musical
     ..... .....
    -2.050 directed
    -2.136 1774
    -2.725 infamous

Highest and lowest feature weights for relation capital:

     3.275 capital
     1.633 city
     1.631 posted
     ..... .....
    -1.336 and
    -2.597 Province
    -2.605 Isfahan

Highest and lowest feature weights for relation contains:

     2.408 bordered
     2.236 affiliated
     2.134 bounded
     ..... .....
    -2.968 Midlands
    -3.047 Lancashire
    -3.440 6th

Highest and lowest feature weights for relation film_performance:

     4.076 starring
     3.809 alongside
     3.103 co-starring
     ..... .....
    -1.993 Zigzag
    -2.015 spy
    -3.983 double

Highest and lowest feature weights for relation founders:

### Distributed representations

This simple baseline sums the GloVe vector representations for all of the words in the "middle" span and feeds those representations into the standard `LogisticRegression`-based `model_factory`. The crucial parameter that enables this is `vectorize=False`, which tells `rel_ext.experiment` that your featurizer or your model will do the work of turning examples into vectors; in that case, `rel_ext.experiment` just organizes these representations by relation type.
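
To make the combination step concrete, here is a small sketch that uses an invented 3-dimensional lookup in place of real GloVe vectors. The names `toy_lookup` and `combine_middle` are hypothetical, and returning `None` when no word is in the vocabulary is a simplification made for this sketch.

```python
import numpy as np

# Toy 3-dimensional "embeddings"; real GloVe vectors would be loaded from disk.
toy_lookup = {
    "was": np.array([1.0, 0.0, 0.0]),
    "born": np.array([0.0, 2.0, 0.0]),
    "in": np.array([0.0, 0.0, 3.0]),
}

def combine_middle(middle, lookup, np_func=np.sum):
    """Combine the vectors of the in-vocabulary words of `middle` into one vector."""
    reps = [lookup[w] for w in middle.split() if w in lookup]
    if not reps:
        return None  # the caller decides how to handle no-overlap cases
    return np_func(reps, axis=0)

# "Hawaii" is out of vocabulary here, so it is simply skipped.
summed = combine_middle("was born in Hawaii", toy_lookup)
averaged = combine_middle("was born in Hawaii", toy_lookup, np_func=np.mean)
print(summed, averaged)
```

Summing makes the vector's magnitude grow with the number of middle words, while averaging with `np.mean` keeps the scale comparable across spans of different lengths. Which works better for this task is an empirical question.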

In [15]:
GLOVE_HOME = os.path.join('data', 'glove.6B')

In [16]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

In [17]:
def glove_middle_featurizer(kbt, corpus, np_func=np.sum):
    reps = []
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split():
            rep = glove_lookup.get(word)
            if rep is not None:
                reps.append(rep)
    # A random representation of the right dimensionality if the
    # example happens not to overlap with GloVe's vocabulary:
    if len(reps) == 0:
        dim = len(next(iter(glove_lookup.values())))
        return utils.randvec(n=dim)
    else:
        return np_func(reps, axis=0)

In [18]:
glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_middle_featurizer],
    vectorize=False, # Crucial for this featurizer!
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.890      0.453      0.746        340       5716
author                    0.873      0.446      0.733        509       5885
capital                   0.607      0.179      0.411         95       5471
contains                  0.662      0.407      0.588       3904       9280
film_performance          0.817      0.333      0.633        766       6142
founders                  0.770      0.247      0.541        380       5756
genre                     0.526      0.059      0.203        170       5546
has_sibling               0.824      0.244      0.559        499       5875
has_spouse                0.835      0.342      0.648        594       5970
is_a                      0.809      0.145      0.422        497       5873
nationality               0.635      0.179      0.421        301       5677
parents     

With the same basic code design, one can also use the PyTorch models included in the course repo, or write new ones that are better aligned with the task. For those models, it's likely that the featurizer will just return a list of tokens (or perhaps a list of lists of tokens), and the model will map those into vectors using an embedding.
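
As a rough illustration of the "list of lists of tokens" idea, the sketch below defines a featurizer that returns one token list per example. `ToyExample`, `ToyTriple`, and `ToyCorpus` are hypothetical stand-ins written for this example, not the `rel_ext` classes, and mapping tokens to vectors is left to whatever model consumes them.

```python
from collections import namedtuple

# Hypothetical stand-ins for the course's example and KB-triple objects.
ToyExample = namedtuple("ToyExample", ["entity_1", "entity_2", "middle"])
ToyTriple = namedtuple("ToyTriple", ["rel", "sbj", "obj"])

class ToyCorpus:
    """Minimal corpus that can look up examples by ordered entity pair."""
    def __init__(self, examples):
        self.examples = examples

    def get_examples_for_entities(self, e1, e2):
        return [ex for ex in self.examples
                if ex.entity_1 == e1 and ex.entity_2 == e2]

def middle_token_lists_featurizer(kbt, corpus):
    """Return one list of tokens per example, covering both entity orders."""
    token_lists = []
    for e1, e2 in [(kbt.sbj, kbt.obj), (kbt.obj, kbt.sbj)]:
        for ex in corpus.get_examples_for_entities(e1, e2):
            token_lists.append(ex.middle.split())
    return token_lists

toy_corpus = ToyCorpus([
    ToyExample("A", "B", "was born in"),
    ToyExample("B", "A", "is the birthplace of"),
    ToyExample("A", "C", "visited"),
])
token_lists = middle_token_lists_featurizer(ToyTriple("rel", "A", "B"), toy_corpus)
print(token_lists)
```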

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 points]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

__To submit:__ A wrapper function `run_svm_model_factory` that does the following: 

1. Uses `rel_ext.experiment` with the model factory set to one based on an `SVC` with `kernel='linear'` and all other arguments left with default values. 
1. Trains on the `train` part of `splits`.
1. Assesses on the `dev` part of `splits`.
1. Uses `featurizers` as defined above. 
1. Returns the return value of `rel_ext.experiment` for this set-up.

The function `test_run_svm_model_factory` will check that your function conforms to these general specifications.
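
The reason `model_factory` is a zero-argument callable rather than a model instance can be sketched as follows. The test for this question inspects `results['models']['adjoins']`, which suggests one model per relation; the sketch assumes the factory is called once per relation so that each gets its own fresh estimator. `ToyClassifier` is a hypothetical stand-in for a scikit-learn estimator.

```python
class ToyClassifier:
    """Hypothetical stand-in for a scikit-learn estimator."""
    def __init__(self, kernel="linear"):
        self.kernel = kernel
        self.fitted_on = None

    def fit(self, relation):
        # A real estimator would take X and y; here we only record the relation.
        self.fitted_on = relation
        return self

# Same pattern as `model_factory` above: calling it builds a new, unfitted model.
toy_model_factory = lambda: ToyClassifier(kernel="linear")

relations = ["adjoins", "author", "capital"]
models = {rel: toy_model_factory().fit(rel) for rel in relations}
print({rel: m.fitted_on for rel, m in models.items()})
```

Passing a single shared instance instead would mean every relation refits the same object, leaving only the last relation's parameters.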

In [19]:
from sklearn.svm import SVC
svm_model = lambda: SVC(kernel='linear')

In [20]:
def run_svm_model_factory():
    results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=featurizers,
        model_factory=svm_model,
        verbose=True)
    return results

In [21]:
def test_run_svm_model_factory(run_svm_model_factory):
    results = run_svm_model_factory()
    assert 'featurizers' in results, \
        "The return value of `run_svm_model_factory` seems not to be correct"
    # Check one of the models to make sure it's an SVC:
    assert 'SVC' in results['models']['adjoins'].__class__.__name__, \
        "It looks like the model factory wasn't set to use an SVC."

In [22]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_svm_model_factory(run_svm_model_factory)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.810      0.365      0.651        340       5716
author                    0.763      0.607      0.726        509       5885
capital                   0.730      0.284      0.556         95       5471
contains                  0.785      0.603      0.741       3904       9280
film_performance          0.732      0.612      0.704        766       6142
founders                  0.748      0.437      0.655        380       5756
genre                     0.558      0.253      0.450        170       5546
has_sibling               0.758      0.251      0.539        499       5875
has_spouse                0.829      0.350      0.651        594       5970
is_a                      0.612      0.270      0.488        497       5873
nationality               0.510      0.176      0.370        301       5677
parents     

### Directional unigram features [1.5 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is a big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences. 

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example.  The included function `test_directional_bag_of_words_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? Include the code needed for getting this value. (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it!)
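
The effect of the forward/reverse distinction can be seen on a single invented sentence, _X and his son Y_. The sketch below is self-contained and does not use `rel_ext`; `ToyExample`, its field names, and `toy_bag_of_words` are assumptions made for illustration.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a corpus example; only the fields used here.
ToyExample = namedtuple("ToyExample", ["entity_1", "entity_2", "middle"])

def toy_bag_of_words(sbj, obj, examples, directional=False):
    """Count middle words for (sbj, obj), optionally marking entity order."""
    counts = Counter()
    for ex in examples:
        if (ex.entity_1, ex.entity_2) == (sbj, obj):
            suffix = "_SO"  # subject mentioned first
        elif (ex.entity_1, ex.entity_2) == (obj, sbj):
            suffix = "_OS"  # object mentioned first
        else:
            continue
        for word in ex.middle.split(" "):
            counts[word + suffix if directional else word] += 1
    return counts

# One invented sentence: "X and his son Y".
examples = [ToyExample("X", "Y", "and his son")]

# Without direction, the triples (X, Y) and (Y, X) get identical features.
plain_xy = toy_bag_of_words("X", "Y", examples)
plain_yx = toy_bag_of_words("Y", "X", examples)

# With direction, the same sentence yields different features for each triple.
dir_xy = toy_bag_of_words("X", "Y", examples, directional=True)
dir_yx = toy_bag_of_words("Y", "X", examples, directional=True)
print(plain_xy == plain_yx, dir_xy == dir_yx)
```

With the undirected features, a classifier for a relation like `parents` cannot tell which entity is the parent; the suffixed features keep that information at the cost of roughly doubling the vocabulary.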

In [23]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter):
    # Append these to the end of the keys you add/access in
    # `feature_counter` to distinguish the two orders. You'll
    # need to use exactly these strings in order to pass
    # `test_directional_bag_of_words_featurizer`.
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"

    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word + object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
directional_bag_of_words_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_bag_of_words_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.884      0.403      0.714        340       5716
author                    0.860      0.601      0.792        509       5885
capital                   0.742      0.242      0.525         95       5471
contains                  0.832      0.648      0.787       3904       9280
film_performance          0.822      0.651      0.781        766       6142
founders                  0.827      0.403      0.683        380       5756
genre                     0.696      0.229      0.495        170       5546
has_sibling               0.828      0.251      0.567        499       5875
has_spouse                0.851      0.347      0.659        594       5970
is_a                      0.781      0.237      0.536        497       5873
nationality               0.618      0.209      0.444        301       5677
parents     

In [24]:
def test_directional_bag_of_words_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['is_OS'] += 5
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'is_OS':6,'a_OS':1,'webcomic_OS':1,'created_OS':1,'by_OS':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [25]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_directional_bag_of_words_featurizer(corpus)

In [26]:
# How many feature names does the vectorizer have for the experiment run in the previous step?
# Note: `get_feature_names` was removed in scikit-learn 1.2; on newer versions,
# use `get_feature_names_out` instead.
directional_bag_of_words_vectorizer = directional_bag_of_words_results['vectorizer']
directional_bag_of_words_features_names = directional_bag_of_words_vectorizer.get_feature_names()
print('number of feature names: {}'.format(len(directional_bag_of_words_features_names)))

number of feature names: 40688


### The part-of-speech tags of the "middle" words [1.5 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.  Don't forget the start and end tags, to model those environments properly! The included function `test_middle_bigram_pos_tag_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)
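
The worked example in item 1 can be reproduced with a few lines of plain Python. This is only a sketch of the string manipulation, with a hypothetical function name, and it assumes every token has the form `word/TAG`.

```python
def pos_bigrams(tagged, start="<s>", end="</s>"):
    """Return POS-tag bigrams for a space-separated word/TAG string."""
    # Split on the rightmost '/' so that tokens containing '/' keep their tag.
    tags = [tok.rsplit("/", 1)[1] for tok in tagged.split()]
    padded = [start] + tags + [end]
    return [a + " " + b for a, b in zip(padded, padded[1:])]

print(pos_bigrams("The/DT dog/N napped/V"))
# ['<s> DT', 'DT N', 'N V', 'V </s>']
```

An empty span still yields the single bigram `'<s> </s>'`, which lets a model distinguish adjacent entity mentions from those separated by other words.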

In [27]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):

    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for bigram in get_tag_bigrams(ex.middle_POS):
            feature_counter[bigram] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for bigram in get_tag_bigrams(ex.middle_POS):
            feature_counter[bigram] += 1

    return feature_counter


def get_tag_bigrams(s):
    """Suggested helper method for `middle_bigram_pos_tag_featurizer`.
    This should be defined so that it returns a list of str, where each
    element is a POS bigram."""
    # The values of `start_symbol` and `end_symbol` are defined
    # here so that you can use `test_middle_bigram_pos_tag_featurizer`.
    start_symbol = "<s>"
    end_symbol = "</s>"
    
    tags = [start_symbol] + get_tags(s) + [end_symbol]
    return get_bigrams(tags)


def get_bigrams(tags):
    """Given a list of POS tags, returns a list of POS bigrams as strings."""
    return [t[0] + ' ' + t[1] for t in zip(tags[:-1], tags[1:])]


def get_tags(s):
    """Given a sequence of word/POS elements (lemmas), this function
    returns a list containing just the POS elements, in order.
    """
    return [parse_lem(lem)[1] for lem in s.strip().split(' ') if lem]


def parse_lem(lem):
    """Helper method for parsing word/POS elements. It just splits
    on the rightmost / and returns (word, POS) as a tuple of str."""
    return lem.strip().rsplit('/', 1)

# Call to `rel_ext.experiment`:
middle_bigram_pos_tag_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[middle_bigram_pos_tag_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.861      0.347      0.664        340       5716
author                    0.747      0.336      0.600        509       5885
capital                   0.438      0.147      0.314         95       5471
contains                  0.752      0.589      0.713       3904       9280
film_performance          0.696      0.443      0.625        766       6142
founders                  0.555      0.174      0.386        380       5756
genre                     0.577      0.176      0.397        170       5546
has_sibling               0.661      0.164      0.412        499       5875
has_spouse                0.749      0.256      0.541        594       5970
is_a                      0.607      0.165      0.395        497       5873
nationality               0.434      0.076      0.224        301       5677
parents     

In [28]:
def test_middle_bigram_pos_tag_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['<s> VBZ'] += 5
    feature_counter = middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'<s> VBZ':6,'VBZ DT':1,'DT JJ':1,'JJ VBN':1,'VBN IN':1,'IN </s>':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [29]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_middle_bigram_pos_tag_featurizer(corpus)

### Bag of Synsets [2 points]

The following uses NLTK's WordNet API to get the synsets for _dog_ used as a noun:

```
from nltk.corpus import wordnet as wn
dog = wn.synsets('dog', pos='n')
dog
[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01')]
```

This question asks you to create synset-based features from the word/tag pairs in `middle_POS`.

__To submit:__

1. A feature function `synset_featurizer` that is just like `simple_bag_of_words_featurizer` except that it returns a list of synsets derived from `middle_POS`. Stringify these objects with `str` so that they can be `dict` keys. Use `convert_tag` (included below) to convert tags to `pos` arguments usable by `wn.synsets`. The included function `test_synset_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `synset_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment`.)

In [30]:
from nltk.corpus import wordnet as wn

def synset_featurizer(kbt, corpus, feature_counter):

    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for synset in get_synsets(ex.middle_POS):
            feature_counter[synset] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for synset in get_synsets(ex.middle_POS):
            feature_counter[synset] += 1

    return feature_counter


def get_synsets(s):
    """Suggested helper method for `synset_featurizer`. This should
    be completed so that it returns a list of stringified Synsets
    associated with elements of `s`.
    """
    # Use `parse_lem` from the previous question to get a list of
    # (word, POS) pairs. Remember to convert the POS strings.
    wt = [parse_lem(lem) for lem in s.strip().split(' ') if lem]

    return [str(synset) for p in wt for synset in wn.synsets(p[0], convert_tag(p[1]))]


def convert_tag(t):
    """Converts tags so that they can be used by WordNet:

    | Tag begins with | WordNet tag |
    |-----------------|-------------|
    | `N`             | `n`         |
    | `V`             | `v`         |
    | `J`             | `a`         |
    | `R`             | `r`         |
    | Otherwise       | `None`      |
    """
    if t[0].lower() in {'n', 'v', 'r'}:
        return t[0].lower()
    elif t[0].lower() == 'j':
        return 'a'
    else:
        return None


# Call to `rel_ext.experiment`:
synset_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[synset_featurizer],
        verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.792      0.359      0.638        340       5716
author                    0.792      0.450      0.688        509       5885
capital                   0.762      0.168      0.447         95       5471
contains                  0.792      0.586      0.740       3904       9280
film_performance          0.763      0.554      0.709        766       6142
founders                  0.739      0.366      0.614        380       5756
genre                     0.500      0.206      0.389        170       5546
has_sibling               0.775      0.220      0.515        499       5875
has_spouse                0.812      0.306      0.611        594       5970
is_a                      0.615      0.231      0.462        497       5873
nationality               0.455      0.153      0.326        301       5677
parents     

In [31]:
def test_synset_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter["Synset('be.v.01')"] += 5
    feature_counter = synset_featurizer(kbt, corpus, feature_counter)
    # The full return values for this tend to be long, so we just
    # test a few examples to avoid cluttering up this notebook.
    test_cases = {
        "Synset('be.v.01')": 6,
        "Synset('embody.v.02')": 1
    }
    for ss, expected in test_cases.items():
        result = feature_counter[ss]
        assert result == expected, \
            "Incorrect count for {}: Expected {}; Got {}".format(ss, expected, result)

In [32]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_synset_featurizer(corpus)

In [62]:
import nltk
from nltk import word_tokenize
from nltk.util import ngrams

In [96]:
def get_ngrams(tokens, n=1):
    if n == 1:
        return [' '.join(ngram) for ngram in ngrams(tokens, n)]
    else:
        return [' '.join(ngram) for ngram 
                in ngrams(tokens, n, pad_left=True, pad_right=True, 
                          left_pad_symbol='<s>', right_pad_symbol='</s>')]
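
To see what the padded n-grams look like without installing NLTK, here is a pure-Python stand-in that is intended to mirror the padding behavior of `get_ngrams` for `n > 1`: `n - 1` pad symbols on each side. The name `padded_ngrams` is introduced for this sketch only.

```python
def padded_ngrams(tokens, n, left="<s>", right="</s>"):
    """Return space-joined n-grams with n - 1 pad symbols on each side."""
    padded = [left] * (n - 1) + list(tokens) + [right] * (n - 1)
    return [" ".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]

tokens = "created by".split(" ")
print(padded_ngrams(tokens, 1))
print(padded_ngrams(tokens, 2))
print(padded_ngrams(tokens, 3))
```

For the two tokens above, this yields two unigrams, three bigrams, and four trigrams; the trigram list begins with `'<s> <s> created'`, so higher `n` adds features that consist mostly of padding.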

In [114]:
def directional_middle_unigram_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"

    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for ngram in get_ngrams(ex.middle.split(' '), 1):
            feature_counter[ngram + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for ngram in get_ngrams(ex.middle.split(' '), 1):
            feature_counter[ngram + object_subject_suffix] += 1

    return feature_counter

# Call to `rel_ext.experiment`:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.884      0.403      0.714        340       5716
author                    0.860      0.601      0.792        509       5885
capital                   0.742      0.242      0.525         95       5471
contains                  0.832      0.648      0.787       3904       9280
film_performance          0.822      0.651      0.781        766       6142
founders                  0.827      0.403      0.683        380       5756
genre                     0.696      0.229      0.495        170       5546
has_sibling               0.828      0.251      0.567        499       5875
has_spouse                0.851      0.347      0.659        594       5970
is_a                      0.781      0.237      0.536        497       5873
nationality               0.618      0.209      0.444        301       5677
parents     

In [115]:
def directional_middle_bigram_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"

    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for ngram in get_ngrams(ex.middle.split(' '), 2):
            feature_counter[ngram + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for ngram in get_ngrams(ex.middle.split(' '), 2):
            feature_counter[ngram + object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_bigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.878      0.403      0.711        340       5716
author                    0.890      0.619      0.818        509       5885
capital                   0.765      0.274      0.563         95       5471
contains                  0.848      0.674      0.806       3904       9280
film_performance          0.895      0.634      0.827        766       6142
founders                  0.877      0.395      0.705        380       5756
genre                     0.824      0.329      0.633        170       5546
has_sibling               0.865      0.244      0.574        499       5875
has_spouse                0.895      0.332      0.668        594       5970
is_a                      0.791      0.260      0.561        497       5873
nationality               0.739      0.216      0.498        301       5677
parents     

In [116]:
def directional_middle_trigram_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"

    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for ngram in get_ngrams(ex.middle.split(' '), 3):
            feature_counter[ngram + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for ngram in get_ngrams(ex.middle.split(' '), 3):
            feature_counter[ngram + object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_trigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.905      0.394      0.719        340       5716
author                    0.911      0.605      0.828        509       5885
capital                   0.778      0.221      0.517         95       5471
contains                  0.781      0.737      0.772       3904       9280
film_performance          0.895      0.624      0.824        766       6142
founders                  0.871      0.355      0.675        380       5756
genre                     0.831      0.288      0.603        170       5546
has_sibling               0.899      0.214      0.549        499       5875
has_spouse                0.910      0.306      0.653        594       5970
is_a                      0.826      0.247      0.563        497       5873
nationality               0.762      0.203      0.491        301       5677
parents     

In [108]:
def directional_middle_pos_unigram_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for ngram in get_ngrams(get_tags(ex.middle_POS), 1):
            feature_counter[ngram + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for ngram in get_ngrams(get_tags(ex.middle_POS), 1):
            feature_counter[ngram + object_subject_suffix] += 1

    return feature_counter

# Call to `rel_ext.experiment`:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_pos_unigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.867      0.365      0.680        340       5716
author                    0.761      0.281      0.567        509       5885
capital                   0.516      0.168      0.365         95       5471
contains                  0.786      0.587      0.736       3904       9280
film_performance          0.667      0.354      0.567        766       6142
founders                  0.609      0.139      0.364        380       5756
genre                     0.579      0.065      0.224        170       5546
has_sibling               0.678      0.122      0.355        499       5875
has_spouse                0.772      0.194      0.483        594       5970
is_a                      0.645      0.080      0.268        497       5873
nationality               0.455      0.050      0.173        301       5677
parents     

In [109]:
def directional_middle_pos_bigram_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for ngram in get_ngrams(get_tags(ex.middle_POS), 2):
            feature_counter[ngram + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for ngram in get_ngrams(get_tags(ex.middle_POS), 2):
            feature_counter[ngram + object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_pos_bigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.859      0.376      0.684        340       5716
author                    0.825      0.462      0.713        509       5885
capital                   0.500      0.179      0.368         95       5471
contains                  0.806      0.640      0.766       3904       9280
film_performance          0.765      0.547      0.708        766       6142
founders                  0.649      0.258      0.498        380       5756
genre                     0.712      0.218      0.489        170       5546
has_sibling               0.669      0.182      0.436        499       5875
has_spouse                0.763      0.288      0.574        594       5970
is_a                      0.667      0.221      0.475        497       5873
nationality               0.516      0.159      0.357        301       5677
parents     

In [105]:
def directional_middle_pos_trigram_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for ngram in get_ngrams(get_tags(ex.middle_POS), 3):
            feature_counter[ngram + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for ngram in get_ngrams(get_tags(ex.middle_POS), 3):
            feature_counter[ngram + object_subject_suffix] += 1

    return feature_counter


# Call to `rel_ext.experiment`:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_pos_trigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.854      0.394      0.692        340       5716
author                    0.805      0.470      0.704        509       5885
capital                   0.552      0.168      0.379         95       5471
contains                  0.793      0.670      0.765       3904       9280
film_performance          0.800      0.590      0.747        766       6142
founders                  0.710      0.316      0.568        380       5756
genre                     0.762      0.282      0.569        170       5546
has_sibling               0.695      0.182      0.445        499       5875
has_spouse                0.762      0.274      0.562        594       5970
is_a                      0.711      0.288      0.550        497       5873
nationality               0.545      0.179      0.387        301       5677
parents     

In [86]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.881      0.391      0.704        340       5716
author                    0.898      0.640      0.831        509       5885
capital                   0.793      0.242      0.545         95       5471
contains                  0.787      0.761      0.781       3904       9280
film_performance          0.886      0.692      0.839        766       6142
founders                  0.864      0.418      0.712        380       5756
genre                     0.805      0.365      0.649        170       5546
has_sibling               0.853      0.244      0.570        499       5875
has_spouse                0.897      0.352      0.685        594       5970
is_a                      0.840      0.296      0.614        497       5873
nationality               0.634      0.236      0.474        301       5677
parents     

In [100]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.879      0.385      0.700        340       5716
author                    0.893      0.639      0.827        509       5885
capital                   0.759      0.232      0.521         95       5471
contains                  0.788      0.754      0.781       3904       9280
film_performance          0.881      0.678      0.831        766       6142
founders                  0.865      0.421      0.714        380       5756
genre                     0.822      0.353      0.649        170       5546
has_sibling               0.831      0.236      0.553        499       5875
has_spouse                0.912      0.347      0.688        594       5970
is_a                      0.824      0.310      0.618        497       5873
nationality               0.657      0.292      0.526        301       5677
parents     

In [113]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.895      0.400      0.717        340       5716
author                    0.891      0.658      0.832        509       5885
capital                   0.742      0.242      0.525         95       5471
contains                  0.783      0.764      0.779       3904       9280
film_performance          0.875      0.693      0.831        766       6142
founders                  0.830      0.424      0.696        380       5756
genre                     0.798      0.418      0.675        170       5546
has_sibling               0.858      0.255      0.582        499       5875
has_spouse                0.884      0.360      0.685        594       5970
is_a                      0.785      0.346      0.626        497       5873
nationality               0.603      0.302      0.503        301       5677
parents     

In [112]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.893      0.391      0.710        340       5716
author                    0.873      0.662      0.821        509       5885
capital                   0.724      0.221      0.498         95       5471
contains                  0.786      0.759      0.781       3904       9280
film_performance          0.873      0.697      0.831        766       6142
founders                  0.850      0.461      0.727        380       5756
genre                     0.830      0.429      0.699        170       5546
has_sibling               0.831      0.246      0.564        499       5875
has_spouse                0.886      0.354      0.681        594       5970
is_a                      0.801      0.364      0.646        497       5873
nationality               0.626      0.306      0.517        301       5677
parents     

In [131]:
def detailed_entity_mention_pos_featurizer(kbt, corpus, feature_counter):
    first_entity_subject_object_suffix = "_1_SO"
    second_entity_subject_object_suffix = "_2_SO"
    first_entity_object_subject_suffix = "_1_OS"
    second_entity_object_subject_suffix = "_2_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        feature_counter[' '.join(get_tags(ex.mention_1_POS)) + first_entity_subject_object_suffix] += 1
        feature_counter[' '.join(get_tags(ex.mention_2_POS)) + second_entity_subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        feature_counter[' '.join(get_tags(ex.mention_1_POS)) + first_entity_object_subject_suffix] += 1
        feature_counter[' '.join(get_tags(ex.mention_2_POS)) + second_entity_object_subject_suffix] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[detailed_entity_mention_pos_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.853      0.291      0.616        340       5716
author                    0.848      0.405      0.695        509       5885
capital                   0.435      0.105      0.267         95       5471
contains                  0.713      0.427      0.628       3904       9280
film_performance          0.784      0.499      0.704        766       6142
founders                  0.767      0.182      0.466        380       5756
genre                     0.894      0.347      0.680        170       5546
has_sibling               0.484      0.030      0.120        499       5875
has_spouse                0.757      0.131      0.388        594       5970
is_a                      0.716      0.213      0.487        497       5873
nationality               0.586      0.056      0.204        301       5677
parents     

In [None]:
def merged_detailed_entity_mention_pos_featurizer(kbt, corpus, feature_counter):
    first_entity_subject_object_suffix = "_1_SO"
    second_entity_subject_object_suffix = "_2_SO"
    first_entity_object_subject_suffix = "_1_OS"
    second_entity_object_subject_suffix = "_2_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        # `sorted` keeps the joined feature name stable; set iteration order is arbitrary.
        feature_counter[' '.join(sorted(set(get_tags(ex.mention_1_POS)))) + first_entity_subject_object_suffix] += 1
        feature_counter[' '.join(sorted(set(get_tags(ex.mention_2_POS)))) + second_entity_subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        # Examples found in the reversed order get the `_OS` suffixes.
        feature_counter[' '.join(sorted(set(get_tags(ex.mention_1_POS)))) + first_entity_object_subject_suffix] += 1
        feature_counter[' '.join(sorted(set(get_tags(ex.mention_2_POS)))) + second_entity_object_subject_suffix] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[merged_detailed_entity_mention_pos_featurizer],
        verbose=True)

In [132]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer],
        verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.872      0.379      0.692        340       5716
author                    0.883      0.680      0.833        509       5885
capital                   0.750      0.284      0.565         95       5471
contains                  0.853      0.819      0.846       3904       9280
film_performance          0.890      0.783      0.867        766       6142
founders                  0.806      0.524      0.727        380       5756
genre                     0.882      0.618      0.813        170       5546
has_sibling               0.815      0.265      0.575        499       5875
has_spouse                0.863      0.370      0.682        594       5970
is_a                      0.728      0.505      0.669        497       5873
nationality               0.674      0.385      0.586        301       5677
parents     

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     merged_detailed_entity_mention_pos_featurizer],
        verbose=True)
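
The unigram, bigram, and trigram POS featurizers above differ only in the value of `n`, so they could be produced by a single factory. The following is a minimal sketch under stated assumptions: `pos_tags` and `ngrams` are hypothetical stand-ins for the notebook's `get_tags` and `get_ngrams` (whose definitions are not shown here), `middle_POS` is assumed to hold space-separated `word/TAG` items, and `ToyCorpus`, `Example`, and `KBTriple` are toy objects that exist only so the sketch runs on its own.

```python
from collections import Counter, namedtuple

# Toy stand-ins (assumptions) for the objects that `rel_ext` provides.
Example = namedtuple('Example', 'middle_POS')
KBTriple = namedtuple('KBTriple', 'rel sbj obj')

class ToyCorpus:
    def __init__(self, examples_by_pair):
        self.examples_by_pair = examples_by_pair

    def get_examples_for_entities(self, e1, e2):
        return self.examples_by_pair.get((e1, e2), [])

def pos_tags(tagged):
    # Assumes space-separated `word/TAG` items; keeps only the tags.
    return [item.rsplit('/', 1)[1] for item in tagged.split() if '/' in item]

def ngrams(items, n):
    # Plain n-grams without padding symbols, joined into single strings.
    return [' '.join(items[i:i + n]) for i in range(len(items) - n + 1)]

def make_directional_middle_pos_featurizer(n):
    """Return a featurizer that counts POS n-grams of the middle, per direction."""
    def featurizer(kbt, corpus, feature_counter):
        directions = ((kbt.sbj, kbt.obj, '_SO'), (kbt.obj, kbt.sbj, '_OS'))
        for e1, e2, suffix in directions:
            for ex in corpus.get_examples_for_entities(e1, e2):
                for ngram in ngrams(pos_tags(ex.middle_POS), n):
                    feature_counter[ngram + suffix] += 1
        return feature_counter
    return featurizer

toy_corpus = ToyCorpus({('A', 'B'): [Example('is/VBZ the/DT founder/NN of/IN')]})
bigram_featurizer = make_directional_middle_pos_featurizer(2)
bigram_features = bigram_featurizer(KBTriple('founders', 'A', 'B'), toy_corpus, Counter())
print(sorted(bigram_features.items()))
```

A list such as `[make_directional_middle_pos_featurizer(n) for n in (1, 2, 3)]` would then play the role of the three hand-written POS featurizers, provided the stand-in helpers match the notebook's own.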

In [136]:
from nltk.tag import StanfordNERTagger

stanford_ner_home = os.path.join('', 'stanford-ner-4.0.0')

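st = StanfordNERTagger(
    os.path.join(stanford_ner_home, 'classifiers', 'english.conll.4class.distsim.crf.ser.gz'),
    os.path.join(stanford_ner_home, 'stanford-ner.jar'))

# Sanity check: tag a short tokenized sentence.
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())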

[('Rami', 'PERSON'),
 ('Eid', 'PERSON'),
 ('is', 'O'),
 ('studying', 'O'),
 ('at', 'O'),
 ('Stony', 'ORGANIZATION'),
 ('Brook', 'ORGANIZATION'),
 ('University', 'ORGANIZATION'),
 ('in', 'O'),
 ('NY', 'O')]

In [None]:
def get_sent(ex):
    return ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right))

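def get_sent_token(ex):
    # `split()` with no argument skips the empty strings that arise when a
    # field such as `ex.left` is empty, which could otherwise misalign
    # tokens and tags.
    return get_sent(ex).split()

def get_mention_ner_tags(mention, sent_tokens, ner_tags):
    # Caveat: `index` finds the first occurrence of each token in the sentence,
    # which may lie outside the mention when a word is repeated.
    mention_tokens = mention.split()
    # Sorted so that callers joining the tags get a stable feature name.
    return sorted(set(ner_tags[sent_tokens.index(t)][1] for t in mention_tokens))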

def entity_mention_ner_featurizer(kbt, corpus, feature_counter):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags))] += 1
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags))] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags))] += 1
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags))] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[entity_mention_ner_featurizer],
        verbose=True)
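
The helper `get_mention_ner_tags` locates each mention token with `list.index`, which returns the first match in the sentence and can pick up the wrong tag when a word occurs twice. A position-based alternative is sketched below. It assumes the tagger returns exactly one `(token, tag)` pair per whitespace-separated token, and the names `mention_spans` and `tags_in_span` are hypothetical. The tag list in the demo is written by hand for illustration and is not real tagger output.

```python
def mention_spans(left, mention_1, middle, mention_2):
    """Token offsets of both mentions in the sentence left+m1+middle+m2+right."""
    start_1 = len(left.split())
    end_1 = start_1 + len(mention_1.split())
    start_2 = end_1 + len(middle.split())
    end_2 = start_2 + len(mention_2.split())
    return (start_1, end_1), (start_2, end_2)

def tags_in_span(ner_tags, span):
    """Sorted distinct tags of the tokens inside a half-open [start, end) span."""
    start, end = span
    return sorted({tag for _, tag in ner_tags[start:end]})

# Hand-written tags for "New York University is in New York" (illustrative only).
demo_tags = [('New', 'ORGANIZATION'), ('York', 'ORGANIZATION'),
             ('University', 'ORGANIZATION'), ('is', 'O'), ('in', 'O'),
             ('New', 'LOCATION'), ('York', 'LOCATION')]

span_1, span_2 = mention_spans('', 'New York University', 'is in', 'New York')
print(tags_in_span(demo_tags, span_1))  # ['ORGANIZATION']
print(tags_in_span(demo_tags, span_2))  # ['LOCATION']
```

With the `index`-based lookup, the second mention in this example would be assigned `ORGANIZATION`, because "New" and "York" first appear inside the first mention.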

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer,
                     entity_mention_ner_featurizer],
        verbose=True)
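
Each NER featurizer calls `st.tag` once per example, and the same sentences are likely to be tagged again every time an experiment is rerun or another NER featurizer is added. One way to avoid the repeated work is to memoize the tagger. The sketch below is an assumption-laden illustration: `CachedTagger` is a hypothetical wrapper, and `fake_tag` stands in for `st.tag` so the example runs without the Stanford NER jar.

```python
class CachedTagger:
    """Memoize a tagging function keyed by the token sequence."""

    def __init__(self, tag_fn):
        self.tag_fn = tag_fn
        self.cache = {}
        self.misses = 0  # Number of calls that reached the underlying tagger.

    def tag(self, tokens):
        key = tuple(tokens)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.tag_fn(list(key))
        return self.cache[key]

def fake_tag(tokens):
    # Stand-in for `st.tag`: labels capitalized tokens as PERSON, others as O.
    return [(t, 'PERSON' if t[:1].isupper() else 'O') for t in tokens]

cached = CachedTagger(fake_tag)
first = cached.tag(['Rami', 'is', 'studying'])
second = cached.tag(['Rami', 'is', 'studying'])
print(cached.misses)  # 1
```

In the notebook, the assumed wiring would be `cached = CachedTagger(st.tag)` followed by `cached.tag(sent_tokens)` wherever `st.tag(sent_tokens)` appears.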

In [None]:
def prime_entity_mention_ner_featurizer(kbt, corpus, feature_counter):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        for ner in get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags):
            feature_counter[ner] += 1
        for ner in get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags):
            feature_counter[ner] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        for ner in get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags):
            feature_counter[ner] += 1
        for ner in get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags):
            feature_counter[ner] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[prime_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer,
                     prime_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
def directional_entity_mention_ner_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags)) + subject_object_suffix] += 1
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags)) + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags)) + object_subject_suffix] += 1
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags)) + object_subject_suffix] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer,
                     directional_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
def prime_directional_entity_mention_ner_featurizer(kbt, corpus, feature_counter):
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        for ner in get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags):
            feature_counter[ner + subject_object_suffix] += 1
        for ner in get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags):
            feature_counter[ner + subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        for ner in get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags):
            feature_counter[ner + object_subject_suffix] += 1
        for ner in get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags):
            feature_counter[ner + object_subject_suffix] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[prime_directional_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer,
                     prime_directional_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
def detailed_entity_mention_ner_featurizer(kbt, corpus, feature_counter):
    first_entity_subject_object_suffix = "_1_SO"
    second_entity_subject_object_suffix = "_2_SO"
    first_entity_object_subject_suffix = "_1_OS"
    second_entity_object_subject_suffix = "_2_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags)) + first_entity_subject_object_suffix] += 1
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags)) + second_entity_subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags)) + first_entity_object_subject_suffix] += 1
        feature_counter[' '.join(get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags)) + second_entity_object_subject_suffix] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[detailed_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer,
                     detailed_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
def prime_detailed_entity_mention_ner_featurizer(kbt, corpus, feature_counter):
    first_entity_subject_object_suffix = "_1_SO"
    second_entity_subject_object_suffix = "_2_SO"
    first_entity_object_subject_suffix = "_1_OS"
    second_entity_object_subject_suffix = "_2_OS"
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        for ner in get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags):
            feature_counter[ner + first_entity_subject_object_suffix] += 1
        for ner in get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags):
            feature_counter[ner + second_entity_subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        sent_tokens = get_sent_token(ex)
        ner_tags = st.tag(sent_tokens) 
        for ner in get_mention_ner_tags(ex.mention_1, sent_tokens, ner_tags):
            feature_counter[ner + first_entity_object_subject_suffix] += 1
        for ner in get_mention_ner_tags(ex.mention_2, sent_tokens, ner_tags):
            feature_counter[ner + second_entity_object_subject_suffix] += 1
            
    return feature_counter


_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[prime_detailed_entity_mention_ner_featurizer],
        verbose=True)

In [None]:
_ = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=[directional_middle_unigram_featurizer, 
                     directional_middle_bigram_featurizer, 
                     directional_middle_trigram_featurizer,
                     directional_middle_pos_unigram_featurizer, 
                     directional_middle_pos_bigram_featurizer,
                     directional_middle_pos_trigram_featurizer,
                     detailed_entity_mention_pos_featurizer,
                     prime_detailed_entity_mention_ner_featurizer],
        verbose=True)

### Your original system [3 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle. (NO)
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams). (Yes)
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?
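
As one illustration of the ideas above, the outside-context suggestion could be sketched as a featurizer over the words just before the first mention and just after the second. This is a rough sketch, not a tuned system: the function name, the `window` parameter, and the `LEFT_`/`RIGHT_` feature prefixes are assumptions, and `ToyCorpus`, `Example`, and `KBTriple` are toy stand-ins for the `rel_ext` objects so that the code runs on its own.

```python
from collections import Counter, namedtuple

# Toy stand-ins (assumptions) for the objects that `rel_ext` provides.
Example = namedtuple('Example', 'left mention_1 middle mention_2 right')
KBTriple = namedtuple('KBTriple', 'rel sbj obj')

class ToyCorpus:
    def __init__(self, examples_by_pair):
        self.examples_by_pair = examples_by_pair

    def get_examples_for_entities(self, e1, e2):
        return self.examples_by_pair.get((e1, e2), [])

def directional_outer_context_featurizer(kbt, corpus, feature_counter, window=3):
    """Count words in a small window outside the two mentions, per direction."""
    directions = ((kbt.sbj, kbt.obj, '_SO'), (kbt.obj, kbt.sbj, '_OS'))
    for e1, e2, suffix in directions:
        for ex in corpus.get_examples_for_entities(e1, e2):
            # Last `window` words before the first mention.
            for word in ex.left.split()[-window:]:
                feature_counter['LEFT_' + word + suffix] += 1
            # First `window` words after the second mention.
            for word in ex.right.split()[:window]:
                feature_counter['RIGHT_' + word + suffix] += 1
    return feature_counter

toy_corpus = ToyCorpus({
    ('A', 'B'): [Example('Last year the founder', 'A', 'started', 'B', 'in a garage .')]
})
outer_features = directional_outer_context_featurizer(
    KBTriple('founders', 'B', 'A'), toy_corpus, Counter())
print(sorted(outer_features))
```

Because the toy triple has `B` as subject and the only example mentions `A` first, every feature in this run carries the `_OS` suffix.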

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies. We also ask that you report the best score your system got during development, just to help us understand how systems performed overall.

In [33]:
# PLEASE MAKE SURE TO INCLUDE THE FOLLOWING BETWEEN THE START AND STOP COMMENTS:
#   1) Textual description of your system.
#   2) The code for your original system.
#   3) The score achieved by your system in place of MY_NUMBER.
#        With no other changes to that line.
#        You should report your score as a decimal value <=1.0
# PLEASE MAKE SURE NOT TO DELETE OR EDIT THE START AND STOP COMMENTS

# IMPORT ANY MODULES BELOW THE 'IS_GRADESCOPE_ENV' CHECK CONDITION. DOING
# SO ABOVE THE CHECK MAY CAUSE THE AUTOGRADER TO FAIL.

# START COMMENT: Enter your system description in this cell.
# My peak score was: MY_NUMBER
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass

# STOP COMMENT: Please do not remove this comment.

## Bake-off [1 point]

For the bake-off, we will release a test set. The announcement will go out on the discussion forum. You will evaluate your custom model from the previous question on this new test set using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

The announcement will include the details on where to submit your entry.

In [39]:
# Enter your bake-off assessment code in this cell.
# Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    # Please enter your code in the scope of the above conditional.
    ##### YOUR CODE HERE


In [40]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported
# by the code above. Please enter only a number between
# 0 and 1 inclusive. Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    # Please enter your score in the scope of the above conditional.
    ##### YOUR CODE HERE
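
For reference, the F_0.5 score mentioned in the cell above weights precision more heavily than recall. The sketch below computes it from a precision/recall pair and checks it against the `adjoins` row of the POS-unigram results table earlier in this notebook (precision 0.867, recall 0.365, f-score 0.680). The function names are hypothetical, and this is not necessarily how `rel_ext` computes the score internally.

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def macro_average(scores):
    """Unweighted mean of per-relation scores."""
    return sum(scores) / len(scores)

adjoins_f = f_beta(0.867, 0.365)
print(round(adjoins_f, 3))  # 0.68
```

The inputs are the rounded table values, so agreement with the reported f-score is only expected up to rounding.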
