# Homework 3: Relation extraction using distant supervision

In [1]:
__author__ = "Bill MacCartney"
__version__ = "CS224U, Stanford, Spring 2019"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baseline](#Baseline)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 point]](#Different-model-factory-[1-point])
  1. [Directional unigram features [2 points]](#Directional-unigram-features-[2-points])
  1. [The part-of-speech tags of the "middle" words [2 points]](#The-part-of-speech-tags-of-the-"middle"-words-[2-points])
  1. [Your original system [4 points]](#Your-original-system-[4-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to the developing really effective relation extraction systems using distant supervision. 

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system for you to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [1]:
from functools import partial
import numpy as np
import os
import rel_ext
from nltk.corpus import wordnet as wn
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
import string
import utils

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [2]:
DATA_HOME = '/home/kd/data/data'
rel_ext_data_home = os.path.join(DATA_HOME, 'rel_ext_data')
GLOVE_HOME = os.path.join(DATA_HOME, 'glove.6B')

In [3]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

In [4]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [5]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [6]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [7]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [8]:
splits

{'all': Corpus with 331,696 examples; KB with 45,884 triples,
 'dev': Corpus with 64,937 examples; KB with 9,248 triples,
 'tiny': Corpus with 3,474 examples; KB with 445 triples,
 'train': Corpus with 263,285 examples; KB with 36,191 triples}

## Baseline

In [36]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter, 
                                    use_middle_length=False, 
                                    use_entities=False,
                                    context_section='middle', # can be 'left', 'right', or 'middle'
                                    use_synsets=False):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')
        
        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word, pos_word)
                    for syn in synsets:
                        feature_counter[syn.lemma()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')

        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word, pos_word)
                    for syn in synsets:
                        feature_counter[syn.lemma()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    return feature_counter

In [10]:
featurizers = [simple_bag_of_words_featurizer]

In [11]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')
model_factory_400 = lambda: LogisticRegression(fit_intercept=True, solver='liblinear', max_iter=2000)

In [13]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.864      0.374      0.684        340       5716
parents                   0.834      0.532      0.749        312       5688
founders                  0.835      0.413      0.693        380       5756
genre                     0.558      0.171      0.384        170       5546
worked_at                 0.739      0.269      0.547        242       5618
contains                  0.797      0.605      0.750       3904       9280
author                    0.830      0.507      0.736        509       5885
nationality               0.675      0.179      0.435        301       5677
film_performance          0.783      0.547      0.721        766       6142
place_of_birth            0.657      0.197      0.448        233       5609
is_a                      0.671      0.225      0.481        497       5873
place_of_dea

Studying model weights might yield insights:

In [14]:
rel_ext.examine_model_weights(baseline_results)

Highest and lowest feature weights for relation adjoins:

     2.542 Taluks
     2.473 Córdoba
     2.453 Valais
     ..... .....
    -1.336 America
    -1.382 Spain
    -2.202 Earth

Highest and lowest feature weights for relation parents:

     5.260 son
     4.418 daughter
     4.296 father
     ..... .....
    -1.355 Tyndareus
    -1.597 Tina
    -1.843 played

Highest and lowest feature weights for relation founders:

     4.087 founder
     3.728 founded
     3.064 co-founder
     ..... .....
    -1.589 MD
    -1.600 novel
    -1.926 band

Highest and lowest feature weights for relation genre:

     2.787 series
     2.647 movie
     2.456 album
     ..... .....
    -1.702 starring
    -1.783 at
    -1.955 original

Highest and lowest feature weights for relation worked_at:

     3.077 CEO
     2.860 professor
     2.672 president
     ..... .....
    -1.374 India
    -1.651 or
    -1.746 state

Highest and lowest feature weights for relation contains:

     2.313 districts
     

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 point]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

__To submit:__ A call to `rel_ext.experiment` training on the 'train' part of `splits` and assessing on its `dev` part, with `featurizers` as defined above in this notebook and the `model_factory` set to one based in an `SVC` with `kernel='linear'` and all other arguments left with default values.

In [25]:
svc_model_factory = lambda: SVC(kernel='linear')

svc_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=svc_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.752      0.627      0.723        766       6142
place_of_birth            0.602      0.215      0.442        233       5609
nationality               0.496      0.189      0.375        301       5677
place_of_death            0.450      0.113      0.282        159       5535
has_spouse                0.842      0.342      0.651        594       5970
genre                     0.516      0.276      0.440        170       5546
founders                  0.725      0.424      0.635        380       5756
has_sibling               0.774      0.255      0.550        499       5875
is_a                      0.608      0.284      0.495        497       5873
profession                0.577      0.259      0.463        247       5623
worked_at                 0.622      0.306      0.515        242       5618
contains    

### Directional unigram features [2 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences. 

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example. The precise nature of the mark you add for the two cases doesn't make a difference to the model.

2. The macro-average F-score on the `dev` set that you obtain from running `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it on Piazza!)

In [58]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter, fwd_prefix='$FWD_DIRECTION: ', 
                                        bwd_prefix='$BWD_DIREECTION: ', use_middle_length=False,
                                        use_entities=False):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = ex.middle.split(' ')
        for word in words:
            word_direction = fwd_prefix + word
            feature_counter[word_direction] += 1
        if use_middle_length:
            feature_counter['FWD_NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = ex.middle.split(' ')
        for word in words:
            word_direction = bwd_prefix + word
            feature_counter[word_direction] += 1
        if use_middle_length:
            feature_counter['FWD_NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
    return feature_counter

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.848      0.646      0.798        766       6142
place_of_birth            0.692      0.232      0.495        233       5609
nationality               0.615      0.213      0.446        301       5677
place_of_death            0.537      0.138      0.341        159       5535
has_spouse                0.880      0.345      0.672        594       5970
genre                     0.652      0.265      0.504        170       5546
founders                  0.822      0.413      0.686        380       5756
has_sibling               0.857      0.253      0.580        499       5875
is_a                      0.735      0.229      0.510        497       5873
profession                0.722      0.231      0.506        247       5623
worked_at                 0.727      0.264      0.539        242       5618
contains    

### The part-of-speech tags of the "middle" words [2 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.
   
   Don't forget the start and end tags, to model those environments properly!

2. The macro-average F-score on the `dev` set that you obtain from running `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `experiment` as exemplified above in this notebook.)

Note: To parse `middle_POS`, one splits on whitespace to get the `word/TAG` pairs. Each of these pairs `s` can be parsed with `s.rsplit('/', 1)`.

In [13]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        word_POSs = ex.middle_POS.split(' ')
        len_POS = len(word_POSs)
        for i in range(-1, len_POS - 1):
            pos = word_POSs[i].rsplit('/', 1)
            bigram = ""
            if len(pos) > 1:
                if i == -1:
                    bigram = '<s> ' + pos[1]
                elif i == len_POS - 2:
                    bigram = pos[1] + ' </s>'
                else:
                    bigram = pos[1] + " " + word_POSs[i+1].rsplit('/', 1)[1]
            feature_counter[bigram] += 1
    return feature_counter

ngram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[middle_bigram_pos_tag_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.696      0.239      0.503        766       6142
place_of_birth            0.667      0.206      0.461        233       5609
nationality               0.564      0.176      0.391        301       5677
place_of_death            0.615      0.151      0.381        159       5535
has_spouse                0.795      0.254      0.558        594       5970
genre                     0.750      0.071      0.256        170       5546
founders                  0.746      0.139      0.399        380       5756
has_sibling               0.745      0.158      0.428        499       5875
is_a                      0.690      0.157      0.411        497       5873
profession                0.758      0.190      0.475        247       5623
worked_at                 0.578      0.153      0.371        242       5618
contains    

### Your original system [4 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?
- Consider adding features based on WordNet synsets. Here's a little code to get you started with that:
  ```
  from nltk.corpus import wordnet as wn
  dog_compatible_synsets = wn.synsets('dog', pos='n')
 ```

In [14]:
simple_bag_of_words_middle_featurizer = partial(simple_bag_of_words_featurizer,use_middle_length=True)

In [15]:
bow_middle_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_middle_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.781      0.551      0.721        766       6142
place_of_birth            0.623      0.206      0.444        233       5609
nationality               0.594      0.199      0.426        301       5677
place_of_death            0.526      0.126      0.322        159       5535
has_spouse                0.880      0.320      0.652        594       5970
genre                     0.604      0.188      0.419        170       5546
founders                  0.788      0.392      0.656        380       5756
has_sibling               0.861      0.236      0.564        499       5875
is_a                      0.707      0.199      0.468        497       5873
profession                0.578      0.194      0.415        247       5623
worked_at                 0.706      0.248      0.515        242       5618
contains    

In [16]:
def ngrams_bag_of_words_featurizer(kbt, corpus, feature_counter, n=2, use_middle_length=False):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = ex.middle.split(' ')
        for i in range(0, len(words), n):
            end = i + n
            if (len(words) - i) < n:
                end = len(words)
            n_gram = ' '.join(words[i:end])
            feature_counter[n_gram] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = ex.middle.split(' ')
        for i in range(0, len(words), n):
            end = i + n
            if (len(words) - i) < n:
                end = len(words)
            n_gram = ' '.join(words[i:end])
            feature_counter[n_gram] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
            
    return feature_counter

In [17]:
bigrams_bag_of_words_featurizer = partial(ngrams_bag_of_words_featurizer, n=2)
trigrams_bag_of_words_featurizer = partial(ngrams_bag_of_words_featurizer, n=3)

In [18]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[bigrams_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.824      0.465      0.714        766       6142
place_of_birth            0.652      0.193      0.442        233       5609
nationality               0.554      0.136      0.343        301       5677
place_of_death            0.471      0.050      0.176        159       5535
has_spouse                0.898      0.268      0.611        594       5970
genre                     0.738      0.182      0.459        170       5546
founders                  0.757      0.287      0.570        380       5756
has_sibling               0.849      0.202      0.518        499       5875
is_a                      0.710      0.177      0.443        497       5873
profession                0.649      0.150      0.389        247       5623
worked_at                 0.683      0.178      0.435        242       5618
contains    

In [19]:
trigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[trigrams_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.795      0.384      0.654        766       6142
place_of_birth            0.719      0.099      0.319        233       5609
nationality               0.677      0.140      0.383        301       5677
place_of_death            0.385      0.031      0.118        159       5535
has_spouse                0.914      0.232      0.576        594       5970
genre                     0.862      0.147      0.437        170       5546
founders                  0.757      0.213      0.501        380       5756
has_sibling               0.852      0.138      0.419        499       5875
is_a                      0.769      0.121      0.371        497       5873
profession                0.742      0.093      0.310        247       5623
worked_at                 0.560      0.116      0.317        242       5618
contains    

In [20]:
bigrams_bag_of_words_featurizer_use_middle = partial(ngrams_bag_of_words_featurizer, n=2, use_middle_length=True)

bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[bigrams_bag_of_words_featurizer_use_middle],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.818      0.470      0.713        766       6142
place_of_birth            0.692      0.193      0.456        233       5609
nationality               0.592      0.140      0.359        301       5677
place_of_death            0.467      0.044      0.160        159       5535
has_spouse                0.874      0.256      0.589        594       5970
genre                     0.738      0.182      0.459        170       5546
founders                  0.766      0.276      0.566        380       5756
has_sibling               0.890      0.178      0.495        499       5875
is_a                      0.731      0.175      0.447        497       5873
profession                0.684      0.158      0.411        247       5623
worked_at                 0.682      0.186      0.445        242       5618
contains    

In [21]:
simple_bag_of_words_entities_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True)

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_entities_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.827      0.551      0.752        766       6142
place_of_birth            0.724      0.236      0.512        233       5609
nationality               0.678      0.266      0.517        301       5677
place_of_death            0.559      0.119      0.322        159       5535
has_spouse                0.931      0.295      0.650        594       5970
genre                     0.829      0.371      0.665        170       5546
founders                  0.794      0.376      0.650        380       5756
has_sibling               0.930      0.240      0.591        499       5875
is_a                      0.812      0.348      0.641        497       5873
profession                0.800      0.340      0.630        247       5623
worked_at                 0.744      0.277      0.556        242       5618
contains    

In [22]:
left_bag_of_words_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True, context_section='left')

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[left_bag_of_words_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.769      0.453      0.675        766       6142
place_of_birth            0.435      0.086      0.240        233       5609
nationality               0.451      0.136      0.308        301       5677
place_of_death            0.130      0.019      0.060        159       5535
has_spouse                0.653      0.133      0.366        594       5970
genre                     0.706      0.212      0.481        170       5546
founders                  0.522      0.092      0.270        380       5756
has_sibling               0.783      0.253      0.551        499       5875
is_a                      0.642      0.282      0.511        497       5873
profession                0.800      0.211      0.513        247       5623
worked_at                 0.659      0.120      0.347        242       5618
contains    

In [23]:
right_bag_of_words_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True, context_section='right')

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[right_bag_of_words_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.729      0.386      0.619        766       6142
place_of_birth            0.326      0.060      0.173        233       5609
nationality               0.400      0.100      0.250        301       5677
place_of_death            0.421      0.050      0.170        159       5535
has_spouse                0.537      0.121      0.319        594       5970
genre                     0.750      0.212      0.497        170       5546
founders                  0.517      0.082      0.250        380       5756
has_sibling               0.676      0.146      0.392        499       5875
is_a                      0.619      0.219      0.454        497       5873
profession                0.765      0.158      0.432        247       5623
worked_at                 0.526      0.083      0.254        242       5618
contains    

In [26]:
simple_bag_of_words_middle_entities_featurizer = partial(simple_bag_of_words_featurizer, 
                                                         use_entities=True, use_middle_length=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_middle_entities_featurizer],
    model_factory=svc_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.751      0.617      0.720        766       6142
place_of_birth            0.562      0.253      0.452        233       5609
nationality               0.521      0.326      0.465        301       5677
place_of_death            0.383      0.145      0.288        159       5535
has_spouse                0.872      0.332      0.658        594       5970
genre                     0.764      0.553      0.710        170       5546
founders                  0.803      0.418      0.678        380       5756
has_sibling               0.884      0.305      0.640        499       5875
is_a                      0.742      0.493      0.674        497       5873
profession                0.756      0.478      0.677        247       5623
worked_at                 0.692      0.343      0.575        242       5618
contains    

In [27]:
def simple_bag_of_words_featurizer2(kbt, corpus, feature_counter, 
                                    use_middle_length=False, 
                                    use_entities=False,
                                    context_section='middle', # can be 'left', 'right', or 'middle'
                                    use_synsets=False):
    synset_prefix = "synset_:"
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')
        
        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos_pair in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word)
                    for syn in synsets:
                        feature_counter[word] += 1
                        for hyponym in syn.hyponyms():
                            feature_counter[synset_prefix+hyponym.name()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')

        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos_pair in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word)
                    for syn in synsets:
                        feature_counter[word] += 1
                        for hyponym in syn.hyponyms():
                            feature_counter[synset_prefix+hyponym.name()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    return feature_counter

In [28]:
simple_bag_of_words_synsets_featurizer = partial(simple_bag_of_words_featurizer2, 
                                                         use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_synsets_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.698      0.615      0.679        766       6142
place_of_birth            0.460      0.223      0.380        233       5609
nationality               0.374      0.243      0.338        301       5677
place_of_death            0.329      0.145      0.262        159       5535
has_spouse                0.795      0.327      0.618        594       5970
genre                     0.400      0.306      0.377        170       5546
founders                  0.630      0.447      0.582        380       5756
has_sibling               0.730      0.261      0.537        499       5875
is_a                      0.496      0.262      0.421        497       5873
profession                0.458      0.219      0.376        247       5623
worked_at                 0.565      0.306      0.483        242       5618
contains    

In [29]:
simple_bag_of_words_all_featurizer = partial(simple_bag_of_words_featurizer2, 
                                            use_entities=True, use_middle_length=True, use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_all_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.738      0.598      0.705        766       6142
place_of_birth            0.481      0.219      0.388        233       5609
nationality               0.484      0.306      0.434        301       5677
place_of_death            0.403      0.157      0.307        159       5535
has_spouse                0.877      0.300      0.633        594       5970
genre                     0.574      0.412      0.532        170       5546
founders                  0.647      0.400      0.576        380       5756
has_sibling               0.772      0.244      0.539        499       5875
is_a                      0.676      0.356      0.573        497       5873
profession                0.730      0.328      0.586        247       5623
worked_at                 0.615      0.277      0.494        242       5618
contains    

In [37]:
simple_bag_of_words_all_featurizer = partial(simple_bag_of_words_featurizer2, 
                                            use_entities=True, use_middle_length=True, use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_all_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.738      0.598      0.705        766       6142
place_of_birth            0.481      0.219      0.388        233       5609
nationality               0.484      0.306      0.434        301       5677
place_of_death            0.403      0.157      0.307        159       5535
has_spouse                0.877      0.300      0.633        594       5970
genre                     0.574      0.412      0.532        170       5546
founders                  0.647      0.400      0.576        380       5756
has_sibling               0.772      0.244      0.539        499       5875
is_a                      0.676      0.356      0.573        497       5873
profession                0.730      0.328      0.586        247       5623
worked_at                 0.615      0.277      0.494        242       5618
contains    

In [39]:
directional_middle_featurizer = partial(directional_bag_of_words_featurizer, use_middle_length=True)

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_middle_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.847      0.645      0.797        766       6142
place_of_birth            0.696      0.236      0.501        233       5609
nationality               0.619      0.216      0.451        301       5677
place_of_death            0.550      0.138      0.345        159       5535
has_spouse                0.896      0.347      0.680        594       5970
genre                     0.687      0.271      0.525        170       5546
founders                  0.824      0.418      0.690        380       5756
has_sibling               0.897      0.244      0.585        499       5875
is_a                      0.752      0.225      0.512        497       5873
profession                0.744      0.235      0.519        247       5623
worked_at                 0.727      0.264      0.539        242       5618
contains    

In [40]:
directional_middle_entities_featurizer = partial(directional_bag_of_words_featurizer, 
                                                 use_middle_length=True, use_entities=True)

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_middle_entities_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.854      0.633      0.798        766       6142
place_of_birth            0.776      0.253      0.549        233       5609
nationality               0.709      0.316      0.568        301       5677
place_of_death            0.578      0.164      0.383        159       5535
has_spouse                0.944      0.311      0.671        594       5970
genre                     0.875      0.412      0.714        170       5546
founders                  0.833      0.382      0.674        380       5756
has_sibling               0.947      0.251      0.609        499       5875
is_a                      0.816      0.374      0.660        497       5873
profession                0.828      0.389      0.675        247       5623
worked_at                 0.758      0.285      0.569        242       5618
contains    

In [61]:
def ensembled_bow_pos_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True) 
    return middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)

In [62]:
ensembled_bow_pos_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.864      0.638      0.807        766       6142
place_of_birth            0.838      0.288      0.606        233       5609
nationality               0.769      0.365      0.630        301       5677
place_of_death            0.689      0.195      0.457        159       5535
has_spouse                0.959      0.316      0.682        594       5970
genre                     0.857      0.424      0.711        170       5546
founders                  0.853      0.382      0.684        380       5756
has_sibling               0.955      0.255      0.616        499       5875
is_a                      0.835      0.416      0.695        497       5873
profession                0.860      0.421      0.711        247       5623
worked_at                 0.762      0.318      0.596        242       5618
contains    

In [63]:
def ensembled_bow_ngrams_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True) 
    return bigrams_bag_of_words_featurizer(kbt, corpus, feature_counter)

In [65]:
ensembled_bow_ngrams_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_ngrams_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.877      0.653      0.821        766       6142
place_of_birth            0.759      0.258      0.546        233       5609
nationality               0.772      0.382      0.641        301       5677
place_of_death            0.657      0.145      0.385        159       5535
has_spouse                0.917      0.296      0.646        594       5970
genre                     0.899      0.418      0.730        170       5546
founders                  0.848      0.366      0.671        380       5756
has_sibling               0.960      0.242      0.603        499       5875
is_a                      0.857      0.386      0.689        497       5873
profession                0.868      0.401      0.704        247       5623
worked_at                 0.784      0.285      0.581        242       5618
contains    

In [66]:
def ensembled_bow_pos_ngrams_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True)
    feature_counter = middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    return bigrams_bag_of_words_featurizer(kbt, corpus, feature_counter)

In [67]:
ensembled_bow_pos_ngrams_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_ngrams_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.872      0.648      0.815        766       6142
place_of_birth            0.797      0.270      0.574        233       5609
nationality               0.803      0.392      0.664        301       5677
place_of_death            0.682      0.189      0.448        159       5535
has_spouse                0.907      0.296      0.642        594       5970
genre                     0.862      0.441      0.724        170       5546
founders                  0.843      0.382      0.679        380       5756
has_sibling               0.960      0.242      0.603        499       5875
is_a                      0.847      0.412      0.700        497       5873
profession                0.863      0.409      0.706        247       5623
worked_at                 0.781      0.310      0.599        242       5618
contains    

In [41]:
def glove_bag_of_words_featurizer(kbt, corpus, feature_counter, glove_lookup,
                                context_section='middle',
                                use_middle_length=False,
                                glove_dims=300): # can be 'left', 'right', or 'middle'
    glove_vector = np.zeros(glove_dims)
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            #words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
            words = ' '.join((ex.mention_1, ex.middle, ex.mention_2)).split(' ')
            for word in words:
                glove_vector += glove_lookup.get(word, np.zeros(glove_dims))
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            #words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
            words = ' '.join((ex.mention_1, ex.middle, ex.mention_2)).split(' ')
            for word in words:
                glove_vector += glove_lookup.get(word, np.zeros(glove_dims))
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
    
    feature_prefix = "glove_:"
    for i, feature in enumerate(glove_vector):
        feature_counter[feature_prefix + str(i)] = feature
    return feature_counter

In [42]:
glove_featurizer = partial(glove_bag_of_words_featurizer, context_section='all', glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.768      0.556      0.713        766       6142
place_of_birth            0.563      0.210      0.422        233       5609
nationality               0.485      0.163      0.348        301       5677
place_of_death            0.389      0.132      0.280        159       5535
has_spouse                0.813      0.359      0.649        594       5970
genre                     0.693      0.465      0.631        170       5546
founders                  0.731      0.421      0.637        380       5756
has_sibling               0.724      0.263      0.536        499       5875
is_a                      0.630      0.322      0.529        497       5873
profession                0.596      0.377      0.534        247       5623
worked_at                 0.559      0.236      0.438        242       5618
contains    

In [43]:
glove_length_featurizer = partial(glove_bag_of_words_featurizer, 
                                  context_section='all', 
                                  use_middle_length=True, glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_length_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.774      0.550      0.715        766       6142
place_of_birth            0.605      0.210      0.440        233       5609
nationality               0.475      0.159      0.340        301       5677
place_of_death            0.389      0.132      0.280        159       5535
has_spouse                0.813      0.374      0.658        594       5970
genre                     0.669      0.476      0.619        170       5546
founders                  0.741      0.429      0.647        380       5756
has_sibling               0.716      0.263      0.532        499       5875
is_a                      0.636      0.334      0.539        497       5873
profession                0.605      0.385      0.543        247       5623
worked_at                 0.569      0.256      0.457        242       5618
contains    

In [69]:
def ensembled_bow_pos_glove_featurizer(kbt, corpus, feature_counter):
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter, use_middle_length=True,
                                        use_entities=True)
    feature_counter = middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    return glove_bag_of_words_featurizer(kbt, corpus, feature_counter, 
                                         context_section='all', glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[ensembled_bow_pos_glove_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
film_performance          0.851      0.674      0.809        766       6142
place_of_birth            0.681      0.339      0.567        233       5609
nationality               0.725      0.402      0.624        301       5677
place_of_death            0.519      0.264      0.435        159       5535
has_spouse                0.850      0.343      0.656        594       5970
genre                     0.735      0.588      0.700        170       5546
founders                  0.775      0.453      0.678        380       5756
has_sibling               0.819      0.281      0.592        499       5875
is_a                      0.744      0.473      0.667        497       5873
profession                0.725      0.490      0.661        247       5623
worked_at                 0.750      0.384      0.630        242       5618
contains    

## Bake-off [1 point]

For the bake-off, we will release a test set right after class on April 29. The announcement will go out on Piazza. You will evaluate your custom model from the previous question on these new datasets using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

To enter the bake-off, upload this notebook on Canvas:

https://canvas.stanford.edu/courses/99711/assignments/187248

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

The bake-off will close at 4:30 pm on May 1. Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

In [None]:
# Enter your bake-off assessment code in this cell. 
# Please do not remove this comment.


In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported 
# by the code above. Please enter only a number between 
# 0 and 1 inclusive. Please do not remove this comment.
