# Homework 3: Relation extraction using distant supervision

In [1]:
__author__ = "Bill MacCartney"
__version__ = "CS224U, Stanford, Spring 2019"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baseline](#Baseline)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 point]](#Different-model-factory-[1-point])
  1. [Directional unigram features [2 points]](#Directional-unigram-features-[2-points])
  1. [The part-of-speech tags of the "middle" words [2 points]](#The-part-of-speech-tags-of-the-"middle"-words-[2-points])
  1. [Your original system [4 points]](#Your-original-system-[4-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to the developing really effective relation extraction systems using distant supervision. 

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system for you to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [1]:
from functools import partial
import numpy as np
import os
import rel_ext
from nltk.corpus import wordnet as wn
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
import string
import utils

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [2]:
DATA_HOME = '/home/kd/data/data'
rel_ext_data_home = os.path.join(DATA_HOME, 'rel_ext_data')
GLOVE_HOME = os.path.join(DATA_HOME, 'glove.6B')

In [3]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

In [4]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [5]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [6]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [7]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [8]:
splits

{'all': Corpus with 331,696 examples; KB with 45,884 triples,
 'dev': Corpus with 64,937 examples; KB with 9,248 triples,
 'tiny': Corpus with 3,474 examples; KB with 445 triples,
 'train': Corpus with 263,285 examples; KB with 36,191 triples}

## Baseline

In [10]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter, 
                                    use_middle_length=False, 
                                    use_entities=False,
                                    context_section='middle', # can be 'left', 'right', or 'middle'
                                    use_synsets=False):
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')
        
        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            print("word[1]="+words[0])
            print("pos_s[1]="+pos_s[0])
            for word, pos in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word, pos_word)
                    for syn in synsets:
                        feature_counter[syn.lemma()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')

        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for pos_pair in pos_s:
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word, pos_word)
                    for syn in synsets:
                        feature_counter[syn.lemma()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    return feature_counter

In [11]:
featurizers = [simple_bag_of_words_featurizer]

In [12]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')
model_factory_400 = lambda: LogisticRegression(fit_intercept=True, solver='liblinear', max_iter=2000)

In [13]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.864      0.374      0.684        340       5716
parents                   0.834      0.532      0.749        312       5688
founders                  0.835      0.413      0.693        380       5756
genre                     0.558      0.171      0.384        170       5546
worked_at                 0.739      0.269      0.547        242       5618
contains                  0.797      0.605      0.750       3904       9280
author                    0.830      0.507      0.736        509       5885
nationality               0.675      0.179      0.435        301       5677
film_performance          0.783      0.547      0.721        766       6142
place_of_birth            0.657      0.197      0.448        233       5609
is_a                      0.671      0.225      0.481        497       5873
place_of_dea

Studying model weights might yield insights:

In [14]:
rel_ext.examine_model_weights(baseline_results)

Highest and lowest feature weights for relation adjoins:

     2.542 Taluks
     2.473 Córdoba
     2.453 Valais
     ..... .....
    -1.336 America
    -1.382 Spain
    -2.202 Earth

Highest and lowest feature weights for relation parents:

     5.260 son
     4.418 daughter
     4.296 father
     ..... .....
    -1.355 Tyndareus
    -1.597 Tina
    -1.843 played

Highest and lowest feature weights for relation founders:

     4.087 founder
     3.728 founded
     3.064 co-founder
     ..... .....
    -1.589 MD
    -1.600 novel
    -1.926 band

Highest and lowest feature weights for relation genre:

     2.787 series
     2.647 movie
     2.456 album
     ..... .....
    -1.702 starring
    -1.783 at
    -1.955 original

Highest and lowest feature weights for relation worked_at:

     3.077 CEO
     2.860 professor
     2.672 president
     ..... .....
    -1.374 India
    -1.651 or
    -1.746 state

Highest and lowest feature weights for relation contains:

     2.313 districts
     

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 point]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

__To submit:__ A call to `rel_ext.experiment` training on the 'train' part of `splits` and assessing on its `dev` part, with `featurizers` as defined above in this notebook and the `model_factory` set to one based in an `SVC` with `kernel='linear'` and all other arguments left with default values.

In [15]:
svc_model_factory = lambda: SVC(kernel='linear')

svc_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=svc_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.788      0.318      0.608        340       5716
parents                   0.780      0.590      0.732        312       5688
founders                  0.790      0.445      0.684        380       5756
genre                     0.558      0.253      0.450        170       5546
worked_at                 0.627      0.306      0.518        242       5618
contains                  0.780      0.608      0.739       3904       9280
author                    0.777      0.601      0.734        509       5885
nationality               0.571      0.199      0.416        301       5677
film_performance          0.767      0.610      0.729        766       6142
place_of_birth            0.578      0.223      0.438        233       5609
is_a                      0.609      0.270      0.487        497       5873
place_of_dea

### Directional unigram features [2 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences. 

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example. The precise nature of the mark you add for the two cases doesn't make a difference to the model.

2. The macro-average F-score on the `dev` set that you obtain from running `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it on Piazza!)

In [67]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter, fwd_prefix='$FWD_DIRECTION: ', 
                                        bwd_prefix='$BWD_DIREECTION: ', use_middle_length=False,
                                        use_entities=False):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = ex.middle.split(' ')
        for word in words:
            word_direction = fwd_prefix + word
            feature_counter[word_direction] += 1
        if use_middle_length:
            feature_counter['FWD_NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = ex.middle.split(' ')
        for word in words:
            word_direction = bwd_prefix + word
            feature_counter[word_direction] += 1
        if use_middle_length:
            feature_counter['FWD_NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
    return feature_counter

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
worked_at                 0.733      0.273      0.548        242       5618
profession                0.716      0.235      0.508        247       5623
capital                   0.571      0.253      0.456         95       5471
genre                     0.784      0.235      0.535        170       5546
adjoins                   0.833      0.412      0.692        340       5716
place_of_death            0.512      0.138      0.332        159       5535
is_a                      0.717      0.249      0.521        497       5873
parents                   0.844      0.519      0.750        312       5688
has_spouse                0.847      0.354      0.662        594       5970
film_performance          0.842      0.653      0.796        766       6142
place_of_birth            0.684      0.232      0.492        233       5609
founders    

### The part-of-speech tags of the "middle" words [2 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.
   
   Don't forget the start and end tags, to model those environments properly!

2. The macro-average F-score on the `dev` set that you obtain from running `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `experiment` as exemplified above in this notebook.)

Note: To parse `middle_POS`, one splits on whitespace to get the `word/TAG` pairs. Each of these pairs `s` can be parsed with `s.rsplit('/', 1)`.

In [17]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        word_POSs = ex.middle_POS.split(' ')
        len_POS = len(word_POSs)
        for i in range(-1, len_POS - 1):
            pos = word_POSs[i].rsplit('/', 1)
            bigram = ""
            if len(pos) > 1:
                if i == -1:
                    bigram = '<s> ' + pos[1]
                elif i == len_POS - 2:
                    bigram = pos[1] + ' </s>'
                else:
                    bigram = pos[1] + " " + word_POSs[i+1].rsplit('/', 1)[1]
            feature_counter[bigram] += 1
    return feature_counter

ngram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[middle_bigram_pos_tag_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.899      0.391      0.714        340       5716
parents                   0.732      0.228      0.507        312       5688
founders                  0.691      0.147      0.398        380       5756
genre                     0.824      0.082      0.294        170       5546
worked_at                 0.614      0.145      0.372        242       5618
contains                  0.707      0.297      0.554       3904       9280
author                    0.817      0.246      0.558        509       5885
nationality               0.662      0.163      0.410        301       5677
film_performance          0.749      0.242      0.527        766       6142
place_of_birth            0.723      0.202      0.477        233       5609
is_a                      0.707      0.165      0.427        497       5873
place_of_dea

### Your original system [4 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?
- Consider adding features based on WordNet synsets. Here's a little code to get you started with that:
  ```
  from nltk.corpus import wordnet as wn
  dog_compatible_synsets = wn.synsets('dog', pos='n')
 ```

In [23]:
dog_compatible_synsets = wn.synsets('dog', pos='n')

In [27]:
dog_compatible_synsets[0]

SyntaxError: invalid syntax (<ipython-input-27-4fb974f47ea6>, line 1)

In [18]:
svc_rbf_model_factory = lambda: SVC(kernel='rbf')

svc_rbf_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=svc_rbf_model_factory,
    verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.924      0.250      0.600        340       5716
parents                   0.000      0.000      0.000        312       5688
founders                  0.000      0.000      0.000        380       5756
genre                     0.000      0.000      0.000        170       5546
worked_at                 0.000      0.000      0.000        242       5618
contains                  0.891      0.140      0.431       3904       9280
author                    0.833      0.020      0.090        509       5885
nationality               0.000      0.000      0.000        301       5677
film_performance          0.000      0.000      0.000        766       6142
place_of_birth            0.000      0.000      0.000        233       5609
is_a                      0.000      0.000      0.000        497       5873
place_of_dea

  'precision', 'predicted', average, warn_for)


In [19]:
svc_poly_model_factory = lambda: SVC(kernel='poly')

svc_poly_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=svc_poly_model_factory,
    verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   1.000      0.091      0.334        340       5716
parents                   0.000      0.000      0.000        312       5688
founders                  0.000      0.000      0.000        380       5756
genre                     0.000      0.000      0.000        170       5546
worked_at                 0.000      0.000      0.000        242       5618
contains                  1.000      0.001      0.003       3904       9280
author                    0.000      0.000      0.000        509       5885
nationality               0.000      0.000      0.000        301       5677
film_performance          0.000      0.000      0.000        766       6142
place_of_birth            0.000      0.000      0.000        233       5609
is_a                      0.000      0.000      0.000        497       5873
place_of_dea

  'precision', 'predicted', average, warn_for)


In [20]:
gaussian_nb_model_factory = lambda: MultinomialNB()

gaussian_nb_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=gaussian_nb_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.462      0.524      0.473        340       5716
parents                   0.910      0.356      0.694        312       5688
founders                  0.866      0.271      0.602        380       5756
genre                     0.800      0.024      0.105        170       5546
worked_at                 0.955      0.087      0.318        242       5618
contains                  0.723      0.710      0.721       3904       9280
author                    0.779      0.485      0.695        509       5885
nationality               0.815      0.073      0.269        301       5677
film_performance          0.817      0.529      0.736        766       6142
place_of_birth            0.500      0.004      0.021        233       5609
is_a                      0.743      0.105      0.335        497       5873
place_of_dea

  'precision', 'predicted', average, warn_for)


In [21]:
simple_bag_of_words_middle_featurizer = partial(simple_bag_of_words_featurizer,use_middle_length=True)

In [22]:
bow_middle_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_middle_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.870      0.353      0.673        340       5716
parents                   0.838      0.532      0.752        312       5688
founders                  0.856      0.408      0.702        380       5756
genre                     0.580      0.171      0.392        170       5546
worked_at                 0.765      0.256      0.548        242       5618
contains                  0.801      0.595      0.749       3904       9280
author                    0.846      0.507      0.746        509       5885
nationality               0.679      0.176      0.432        301       5677
film_performance          0.790      0.546      0.725        766       6142
place_of_birth            0.667      0.197      0.452        233       5609
is_a                      0.681      0.219      0.479        497       5873
place_of_dea

In [23]:
def ngrams_bag_of_words_featurizer(kbt, corpus, feature_counter, n=2, use_middle_length=False):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = ex.middle.split(' ')
        for i in range(0, len(words), n):
            end = i + n
            if (len(words) - i) < n:
                end = len(words)
            n_gram = ' '.join(words[i:end])
            feature_counter[n_gram] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = ex.middle.split(' ')
        for i in range(0, len(words), n):
            end = i + n
            if (len(words) - i) < n:
                end = len(words)
            n_gram = ' '.join(words[i:end])
            feature_counter[n_gram] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
            
    return feature_counter

In [24]:
bigrams_bag_of_words_featurizer = partial(ngrams_bag_of_words_featurizer, n=2)
trigrams_bag_of_words_featurizer = partial(ngrams_bag_of_words_featurizer, n=3)

In [25]:
bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[bigrams_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.906      0.341      0.681        340       5716
parents                   0.871      0.391      0.700        312       5688
founders                  0.809      0.279      0.586        380       5756
genre                     0.725      0.171      0.439        170       5546
worked_at                 0.793      0.190      0.485        242       5618
contains                  0.807      0.567      0.744       3904       9280
author                    0.810      0.470      0.708        509       5885
nationality               0.625      0.133      0.359        301       5677
film_performance          0.853      0.456      0.726        766       6142
place_of_birth            0.683      0.185      0.443        233       5609
is_a                      0.773      0.199      0.491        497       5873
place_of_dea

In [26]:
trigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[trigrams_bag_of_words_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.936      0.303      0.660        340       5716
parents                   0.879      0.279      0.614        312       5688
founders                  0.792      0.200      0.497        380       5756
genre                     0.839      0.153      0.442        170       5546
worked_at                 0.805      0.136      0.406        242       5618
contains                  0.793      0.481      0.702       3904       9280
author                    0.814      0.438      0.695        509       5885
nationality               0.707      0.136      0.385        301       5677
film_performance          0.857      0.376      0.682        766       6142
place_of_birth            0.733      0.094      0.312        233       5609
is_a                      0.674      0.117      0.345        497       5873
place_of_dea

In [27]:
bigrams_bag_of_words_featurizer_use_middle = partial(ngrams_bag_of_words_featurizer, n=2, use_middle_length=True)

bigram_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[bigrams_bag_of_words_featurizer_use_middle],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.916      0.321      0.668        340       5716
parents                   0.865      0.391      0.696        312       5688
founders                  0.835      0.279      0.597        380       5756
genre                     0.794      0.159      0.441        170       5546
worked_at                 0.797      0.194      0.492        242       5618
contains                  0.814      0.549      0.742       3904       9280
author                    0.835      0.458      0.717        509       5885
nationality               0.707      0.136      0.385        301       5677
film_performance          0.838      0.461      0.720        766       6142
place_of_birth            0.731      0.163      0.431        233       5609
is_a                      0.746      0.189      0.470        497       5873
place_of_dea

In [28]:
simple_bag_of_words_entities_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True)

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_entities_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.919      0.432      0.750        340       5716
parents                   0.860      0.532      0.766        312       5688
founders                  0.867      0.379      0.690        380       5756
genre                     0.884      0.359      0.684        170       5546
worked_at                 0.765      0.269      0.558        242       5618
contains                  0.777      0.341      0.619       3904       9280
author                    0.831      0.521      0.742        509       5885
nationality               0.623      0.269      0.493        301       5677
film_performance          0.822      0.531      0.741        766       6142
place_of_birth            0.667      0.215      0.469        233       5609
is_a                      0.816      0.392      0.671        497       5873
place_of_dea

In [29]:
left_bag_of_words_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True, context_section='left')

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[left_bag_of_words_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.890      0.379      0.701        340       5716
parents                   0.533      0.103      0.290        312       5688
founders                  0.604      0.084      0.270        380       5756
genre                     0.795      0.206      0.506        170       5546
worked_at                 0.605      0.107      0.314        242       5618
contains                  0.698      0.413      0.613       3904       9280
author                    0.764      0.318      0.597        509       5885
nationality               0.513      0.130      0.322        301       5677
film_performance          0.741      0.462      0.661        766       6142
place_of_birth            0.365      0.082      0.215        233       5609
is_a                      0.659      0.276      0.515        497       5873
place_of_dea

In [30]:
right_bag_of_words_featurizer = partial(simple_bag_of_words_featurizer, use_entities=True, context_section='right')

use_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[right_bag_of_words_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.886      0.388      0.705        340       5716
parents                   0.478      0.106      0.281        312       5688
founders                  0.473      0.068      0.217        380       5756
genre                     0.860      0.218      0.541        170       5546
worked_at                 0.500      0.079      0.241        242       5618
contains                  0.645      0.385      0.569       3904       9280
author                    0.672      0.242      0.496        509       5885
nationality               0.424      0.120      0.281        301       5677
film_performance          0.689      0.393      0.599        766       6142
place_of_birth            0.280      0.060      0.162        233       5609
is_a                      0.647      0.243      0.486        497       5873
place_of_dea

In [31]:
simple_bag_of_words_middle_entities_featurizer = partial(simple_bag_of_words_featurizer, 
                                                         use_entities=True, use_middle_length=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_middle_entities_featurizer],
    model_factory=svc_model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.838      0.503      0.740        340       5716
parents                   0.800      0.538      0.729        312       5688
founders                  0.818      0.426      0.691        380       5756
genre                     0.781      0.588      0.733        170       5546
worked_at                 0.653      0.335      0.549        242       5618
contains                  0.774      0.403      0.654       3904       9280
author                    0.787      0.574      0.733        509       5885
nationality               0.510      0.326      0.458        301       5677
film_performance          0.776      0.610      0.736        766       6142
place_of_birth            0.496      0.262      0.421        233       5609
is_a                      0.765      0.551      0.710        497       5873
place_of_dea

In [44]:
def simple_bag_of_words_featurizer2(kbt, corpus, feature_counter, 
                                    use_middle_length=False, 
                                    use_entities=False,
                                    context_section='middle', # can be 'left', 'right', or 'middle'
                                    use_synsets=False):
    synset_prefix = "synset_:"
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')
        
        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos_pair in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word)
                    for syn in synsets:
                        feature_counter[word] += 1
                        for hyponym in syn.hyponyms():
                            feature_counter[synset_prefix+hyponym.name()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        else:
            words = ex.middle.split(' ')

        if use_synsets:            
            pos_s = ex.middle_POS.split(' ')
            for word, pos_pair in zip(words,pos_s):
                if word not in string.punctuation:
                    feature_counter[word] += 1
                    pos_split = pos_pair.rsplit('/', 1)
                    word, pos_word = pos_split[0], pos_split[1]
                    synsets = wn.synsets(word)
                    for syn in synsets:
                        feature_counter[word] += 1
                        for hyponym in syn.hyponyms():
                            feature_counter[synset_prefix+hyponym.name()] += 1
        else: 
            for word in words:
                feature_counter[word] += 1
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
        if use_entities:
            feature_counter[kbt.sbj] += 1
            feature_counter[kbt.obj] += 1
            
    return feature_counter

In [45]:
simple_bag_of_words_synsets_featurizer = partial(simple_bag_of_words_featurizer2, 
                                                         use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_synsets_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
worked_at                 0.526      0.293      0.454        242       5618
profession                0.442      0.215      0.365        247       5623
capital                   0.397      0.284      0.368         95       5471
genre                     0.468      0.300      0.421        170       5546
adjoins                   0.684      0.344      0.571        340       5716
place_of_death            0.267      0.126      0.218        159       5535
is_a                      0.494      0.264      0.421        497       5873
parents                   0.752      0.554      0.702        312       5688
has_spouse                0.722      0.328      0.582        594       5970
film_performance          0.722      0.624      0.700        766       6142
place_of_birth            0.441      0.223      0.369        233       5609
founders    

In [39]:
simple_bag_of_words_all_featurizer = partial(simple_bag_of_words_featurizer2, 
                                            use_entities=True, use_middle_length=True, use_synsets=True)

use_middle_entities_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[simple_bag_of_words_all_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
worked_at                 0.742      0.298      0.571        242       5618
profession                0.734      0.368      0.612        247       5623
capital                   0.542      0.274      0.453         95       5471
genre                     0.800      0.400      0.667        170       5546
adjoins                   0.869      0.450      0.733        340       5716
place_of_death            0.375      0.132      0.274        159       5535
is_a                      0.786      0.384      0.650        497       5873
parents                   0.796      0.564      0.736        312       5688
has_spouse                0.864      0.320      0.645        594       5970
film_performance          0.807      0.584      0.749        766       6142
place_of_birth            0.583      0.240      0.454        233       5609
founders    

In [32]:
directional_middle_featurizer = partial(directional_bag_of_words_featurizer, use_middle_length=True)

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_middle_featurizer],
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.873      0.385      0.697        340       5716
parents                   0.846      0.510      0.747        312       5688
founders                  0.856      0.421      0.709        380       5756
genre                     0.710      0.259      0.526        170       5546
worked_at                 0.821      0.264      0.578        242       5618
contains                  0.846      0.665      0.803       3904       9280
author                    0.868      0.580      0.789        509       5885
nationality               0.649      0.203      0.451        301       5677
film_performance          0.863      0.648      0.809        766       6142
place_of_birth            0.736      0.227      0.509        233       5609
is_a                      0.773      0.239      0.535        497       5873
place_of_dea

In [68]:
directional_middle_entities_featurizer = partial(directional_bag_of_words_featurizer, 
                                                 use_middle_length=True, use_entities=True)

directional_bag_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[directional_middle_entities_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
worked_at                 0.762      0.264      0.554        242       5618
profession                0.867      0.421      0.715        247       5623
capital                   0.632      0.253      0.486         95       5471
genre                     0.883      0.400      0.711        170       5546
adjoins                   0.902      0.432      0.741        340       5716
place_of_death            0.583      0.176      0.399        159       5535
is_a                      0.842      0.396      0.687        497       5873
parents                   0.859      0.510      0.756        312       5688
has_spouse                0.894      0.313      0.652        594       5970
film_performance          0.854      0.634      0.799        766       6142
place_of_birth            0.659      0.249      0.496        233       5609
founders    

In [63]:
def glove_bag_of_words_featurizer(kbt, corpus, feature_counter, glove_lookup,
                                context_section='middle',
                                use_middle_length=False,
                                glove_dims=300): # can be 'left', 'right', or 'middle'
    glove_vector = np.zeros(glove_dims)
    
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            #words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
            words = ' '.join((ex.mention_1, ex.middle, ex.mention_2)).split(' ')
            for word in words:
                glove_vector += glove_lookup.get(word, np.zeros(glove_dims))
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        words = None
        if context_section == 'left':
            words = ex.left.split(' ')
        elif context_section == 'right':
            words = ex.right.split(' ')
        elif context_section == 'middle':
            words = ex.middle.split(' ')
        else:
            #words = ' '.join((ex.left, ex.mention_1, ex.middle, ex.mention_2, ex.right)).split(' ')
            words = ' '.join((ex.mention_1, ex.middle, ex.mention_2)).split(' ')
            for word in words:
                glove_vector += glove_lookup.get(word, np.zeros(glove_dims))
        if use_middle_length:
            feature_counter['NUM_WORD_IN_MIDDLE']  += len(words)
    
    feature_prefix = "glove_:"
    for i, feature in enumerate(glove_vector):
        feature_counter[feature_prefix + str(i)] = feature
    return feature_counter

In [60]:
glove_lookup['sweden'].shape

(300,)

In [61]:
glove_featurizer = partial(glove_bag_of_words_featurizer, context_section='all', glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
worked_at                 0.619      0.248      0.476        242       5618
profession                0.624      0.377      0.552        247       5623
capital                   0.481      0.274      0.418         95       5471
genre                     0.647      0.453      0.596        170       5546
adjoins                   0.850      0.450      0.722        340       5716
place_of_death            0.317      0.119      0.238        159       5535
is_a                      0.699      0.346      0.581        497       5873
parents                   0.740      0.519      0.682        312       5688
has_spouse                0.748      0.340      0.603        594       5970
film_performance          0.767      0.554      0.712        766       6142
place_of_birth            0.584      0.223      0.441        233       5609
founders    

In [66]:
glove_length_featurizer = partial(glove_bag_of_words_featurizer, 
                                  context_section='all', 
                                  use_middle_length=True, glove_lookup=glove_lookup)

glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_length_featurizer],
    model_factory=model_factory_400,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
worked_at                 0.604      0.252      0.472        242       5618
profession                0.628      0.397      0.563        247       5623
capital                   0.466      0.284      0.413         95       5471
genre                     0.610      0.441      0.566        170       5546
adjoins                   0.851      0.453      0.724        340       5716
place_of_death            0.317      0.119      0.238        159       5535
is_a                      0.654      0.350      0.557        497       5873
parents                   0.750      0.529      0.692        312       5688
has_spouse                0.750      0.359      0.616        594       5970
film_performance          0.773      0.559      0.718        766       6142
place_of_birth            0.622      0.219      0.455        233       5609
founders    

## Bake-off [1 point]

For the bake-off, we will release a test set right after class on April 29. The announcement will go out on Piazza. You will evaluate your custom model from the previous question on these new datasets using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

To enter the bake-off, upload this notebook on Canvas:

https://canvas.stanford.edu/courses/99711/assignments/187248

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

The bake-off will close at 4:30 pm on May 1. Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

In [None]:
# Enter your bake-off assessment code in this cell. 
# Please do not remove this comment.


In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported 
# by the code above. Please enter only a number between 
# 0 and 1 inclusive. Please do not remove this comment.
