# Homework and bake-off: Relation extraction using distant supervision

In [2]:
__author__ = "Bill MacCartney and Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Baselines](#Baselines)
  1. [Hand-build feature functions](#Hand-build-feature-functions)
  1. [Distributed representations](#Distributed-representations)
1. [Homework questions](#Homework-questions)
  1. [Different model factory [1 points]](#Different-model-factory-[1-points])
  1. [Directional unigram features [1.5 points]](#Directional-unigram-features-[1.5-points])
  1. [The part-of-speech tags of the "middle" words [1.5 points]](#The-part-of-speech-tags-of-the-"middle"-words-[1.5-points])
  1. [Bag of Synsets [2 points]](#Bag-of-Synsets-[2-points])
  1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

This homework and associated bake-off are devoted to the developing really effective relation extraction systems using distant supervision. 

As with the previous assignments, this notebook first establishes a baseline system. The initial homework questions ask you to create additional baselines and suggest areas for innovation, and the final homework question asks you to develop an original system for you to enter into the bake-off.

## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [1]:
import numpy as np
import os
import rel_ext
from sklearn.linear_model import LogisticRegression
import utils

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [3]:
rel_ext_data_home = os.path.join('data', 'rel_ext_data')

In [4]:
corpus = rel_ext.Corpus(os.path.join(rel_ext_data_home, 'corpus.tsv.gz'))

In [5]:
kb = rel_ext.KB(os.path.join(rel_ext_data_home, 'kb.tsv.gz'))

In [6]:
dataset = rel_ext.Dataset(corpus, kb)

You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test-set, so all of the data in `dataset` is fair game:

In [7]:
splits = dataset.build_splits(
    split_names=['tiny', 'train', 'dev'],
    split_fracs=[0.01, 0.79, 0.20],
    seed=1)

In [8]:
splits

{'tiny': Corpus with 3,474 examples; KB with 445 triples,
 'train': Corpus with 263,285 examples; KB with 36,191 triples,
 'dev': Corpus with 64,937 examples; KB with 9,248 triples,
 'all': Corpus with 331,696 examples; KB with 45,884 triples}

## Baselines

### Hand-build feature functions

In [9]:
def simple_bag_of_words_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    return feature_counter

In [17]:
featurizers = [simple_bag_of_words_featurizer]

In [18]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

In [12]:
baseline_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.854      0.379      0.683        340       5716
author                    0.797      0.538      0.727        509       5885
capital                   0.559      0.200      0.411         95       5471
contains                  0.793      0.601      0.745       3904       9280
film_performance          0.779      0.567      0.725        766       6142
founders                  0.782      0.397      0.655        380       5756
genre                     0.540      0.159      0.365        170       5546
has_sibling               0.872      0.246      0.579        499       5875
has_spouse                0.879      0.318      0.650        594       5970
is_a                      0.667      0.225      0.479        497       5873
nationality               0.667      0.186      0.440        301       5677
parents     

Studying model weights might yield insights:

In [None]:
rel_ext.examine_model_weights(baseline_results)

### Distributed representations

This simple baseline sums the GloVe vector representations for all of the words in the "middle" span and feeds those representations into the standard `LogisticRegression`-based `model_factory`. The crucial parameter that enables this is `vectorize=False`. This essentially says to `rel_ext.experiment` that your featurizer or your model will do the work of turning examples into vectors; in that case, `rel_ext.experiment` just organizes these representations by relation type.

In [10]:
GLOVE_HOME = os.path.join('data', 'glove.6B')

In [11]:
glove_lookup = utils.glove2dict(
    os.path.join(GLOVE_HOME, 'glove.6B.300d.txt'))

In [12]:
def glove_middle_featurizer(kbt, corpus, np_func=np.sum):
    reps = []
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split():
            rep = glove_lookup.get(word)
            if rep is not None:
                reps.append(rep)
    # A random representation of the right dimensionality if the
    # example happens not to overlap with GloVe's vocabulary:
    if len(reps) == 0:
        dim = len(next(iter(glove_lookup.values())))                
        return utils.randvec(n=dim)
    else:
        return np_func(reps, axis=0)

In [16]:
glove_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=[glove_middle_featurizer],    
    vectorize=False, # Crucial for this featurizer!
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.883      0.444      0.737        340       5716
author                    0.867      0.424      0.718        509       5885
capital                   0.639      0.242      0.481         95       5471
contains                  0.652      0.412      0.584       3904       9280
film_performance          0.800      0.334      0.626        766       6142
founders                  0.795      0.234      0.537        380       5756
genre                     0.370      0.059      0.180        170       5546
has_sibling               0.822      0.240      0.554        499       5875
has_spouse                0.857      0.354      0.667        594       5970
is_a                      0.748      0.161      0.432        497       5873
nationality               0.646      0.206      0.453        301       5677
parents     

With the same basic code design, one can also use the PyTorch models included in the course repo, or write new ones that are better aligned with the task. For those models, it's likely that the featurizer will just return a list of tokens (or perhaps a list of lists of tokens), and the model will map those into vectors using an embedding.

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Different model factory [1 points]

The code in `rel_ext` makes it very easy to experiment with other classifier models: one need only redefine the `model_factory` argument. This question asks you to assess a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

__To submit:__ A wrapper function `run_svm_model_factory` that does the following: 

1. Uses `rel_ext.experiment` with the model factory set to one based in an `SVC` with `kernel='linear'` and all other arguments left with default values. 
1. Trains on the 'train' part of `splits`.
1. Assesses on the `dev` part of `splits`.
1. Uses `featurizers` as defined above. 
1. Returns the return value of `rel_ext.experiment` for this set-up.

The function `test_run_svm_model_factory` will check that your function conforms to these general specifications.

In [17]:
def run_svm_model_factory():
    from sklearn.svm import SVC
    ##### YOUR CODE HERE
    model_factory = lambda: SVC(kernel='linear')
    
    svc_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=featurizers,
        model_factory=model_factory,
        verbose=True)
    
    return svc_results


In [18]:
def test_run_svm_model_factory(run_svm_model_factory):
    results = run_svm_model_factory()
    assert 'featurizers' in results, \
        "The return value of `run_svm_model_factory` seems not to be correct"
    # Check one of the models to make sure it's an SVC:
    assert 'SVC' in results['models']['adjoins'].__class__.__name__, \
        "It looks like the model factor wasn't set to use an SVC."    

In [19]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_svm_model_factory(run_svm_model_factory)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.757      0.385      0.635        340       5716
author                    0.728      0.595      0.697        509       5885
capital                   0.556      0.263      0.455         95       5471
contains                  0.783      0.603      0.739       3904       9280
film_performance          0.750      0.619      0.719        766       6142
founders                  0.739      0.463      0.661        380       5756
genre                     0.543      0.259      0.445        170       5546
has_sibling               0.778      0.238      0.536        499       5875
has_spouse                0.839      0.360      0.663        594       5970
is_a                      0.597      0.272      0.482        497       5873
nationality               0.539      0.183      0.388        301       5677
parents     

### Directional unigram features [1.5 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences. 

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example.  The included function `test_directional_bag_of_words_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? Include the code needed for getting this value. (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it!)

In [19]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter): 
    # Append these to the end of the keys you add/access in 
    # `feature_counter` to distinguish the two orders. You'll
    # need to use exactly these strings in order to pass 
    # `test_directional_bag_of_words_featurizer`.
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word+subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word+object_subject_suffix] += 1
    return feature_counter


# Call to `rel_ext.experiment`:
##### YOUR CODE HERE    
directional_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers = [directional_bag_of_words_featurizer],
        model_factory=model_factory,
        verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.884      0.426      0.728        340       5716
author                    0.825      0.591      0.764        509       5885
capital                   0.656      0.221      0.471         95       5471
contains                  0.830      0.641      0.784       3904       9280
film_performance          0.829      0.653      0.787        766       6142
founders                  0.797      0.403      0.666        380       5756
genre                     0.677      0.259      0.512        170       5546
has_sibling               0.848      0.246      0.570        499       5875
has_spouse                0.839      0.352      0.657        594       5970
is_a                      0.738      0.243      0.525        497       5873
nationality               0.581      0.203      0.423        301       5677
parents     

In [22]:
def test_directional_bag_of_words_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['is_OS'] += 5
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'is_OS':6,'a_OS':1,'webcomic_OS':1,'created_OS':1,'by_OS':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [23]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_directional_bag_of_words_featurizer(corpus)

### The part-of-speech tags of the "middle" words [1.5 points]

Our corpus distribution contains part-of-speech (POS) tagged versions of the core text spans. Let's begin to explore whether there is information in these sequences, focusing on `middle_POS`.

__To submit:__

1. A feature function `middle_bigram_pos_tag_featurizer` that is just like `simple_bag_of_words_featurizer` except that it creates a feature for bigram POS sequences. For example, given 

  `The/DT dog/N napped/V`
  
   we obtain the list of bigram POS sequences
  
   `b = ['<s> DT', 'DT N', 'N V', 'V </s>']`. 
   
   Of course, `middle_bigram_pos_tag_featurizer` should return count dictionaries defined in terms of such bigram POS lists, on the model of `simple_bag_of_words_featurizer`.  Don't forget the start and end tags, to model those environments properly! The included function `test_middle_bigram_pos_tag_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `middle_bigram_pos_tag_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

In [20]:
def middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        #get a list of bigram str 
        #count appearance of each bigram pair 
        bigram_list = get_tag_bigrams(ex.middle_POS)
        for bigram in bigram_list:
            feature_counter[bigram] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        bigram_list = get_tag_bigrams(ex.middle_POS)   
        for bigram in bigram_list:
            feature_counter[bigram] += 1

    return feature_counter


def get_tag_bigrams(s):
    """Suggested helper method for `middle_bigram_pos_tag_featurizer`.
    This should be defined so that it returns a list of str, where each 
    element is a POS bigram."""
    # The values of `start_symbol` and `end_symbol` are defined
    # here so that you can use `test_middle_bigram_pos_tag_featurizer`.
    start_symbol = "<s>"
    end_symbol = "</s>"
    
    ##### YOUR CODE HERE
    #a list of tags from sentence 
    result =[]
    tag = get_tags(s)    
    for i in zip([start_symbol]+tag, tag+[end_symbol]):
        result.append(i[0] + ' ' +i[1])
        
    return result

    
def get_tags(s): 
    """Given a sequence of word/POS elements (lemmas), this function
    returns a list containing just the POS elements, in order.    
    """
    return [parse_lem(lem)[1] for lem in s.strip().split(' ') if lem]


def parse_lem(lem):
    """Helper method for parsing word/POS elements. It just splits
    on the rightmost / and returns (word, POS) as a tuple of str."""
    return lem.strip().rsplit('/', 1)  

# Call to `rel_ext.experiment`:
##### YOUR CODE HERE
bigram_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers = [middle_bigram_pos_tag_featurizer],
        model_factory=model_factory,
        verbose=True)



relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.884      0.335      0.666        340       5716
author                    0.697      0.334      0.572        509       5885
capital                   0.500      0.158      0.349         95       5471
contains                  0.761      0.600      0.722       3904       9280
film_performance          0.706      0.450      0.634        766       6142
founders                  0.591      0.179      0.405        380       5756
genre                     0.596      0.182      0.410        170       5546
has_sibling               0.623      0.162      0.397        499       5875
has_spouse                0.775      0.266      0.560        594       5970
is_a                      0.622      0.179      0.416        497       5873
nationality               0.382      0.070      0.202        301       5677
parents     

In [25]:
def test_middle_bigram_pos_tag_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['<s> VBZ'] += 5
    feature_counter = middle_bigram_pos_tag_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'<s> VBZ':6,'VBZ DT':1,'DT JJ':1,'JJ VBN':1,'VBN IN':1,'IN </s>':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [26]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_middle_bigram_pos_tag_featurizer(corpus)

### Bag of Synsets [2 points]

The following allows you to use NLTK's WordNet API to get the synsets compatible with _dog_ as used as a noun:

```
from nltk.corpus import wordnet as wn
dog = wn.synsets('dog', pos='n')
dog
[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01')]
```

This question asks you to create synset-based features from the word/tag pairs in `middle_POS`.

__To submit:__

1. A feature function `synset_featurizer` that is just like `simple_bag_of_words_featurizer` except that it returns a list of synsets derived from `middle_POS`. Stringify these objects with `str` so that they can be `dict` keys. Use `convert_tag` (included below) to convert tags to `pos` arguments usable by `wn.synsets`. The included function `test_synset_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `synset_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment`.)

In [21]:
from nltk.corpus import wordnet as wn

def synset_featurizer(kbt, corpus, feature_counter):
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj): 
        #get a list of synsets
        synset_list = get_synsets(ex.middle_POS)
        #flatten the list 
        flatten_synset = [item for items in synset_list for item in items]
        for synet in flatten_synset:
            feature_counter[str(synet)] += 1
            
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        synset_list = get_synsets(ex.middle_POS)   
        flatten_synset = [item for items in synset_list for item in items]
        for synet in flatten_synset:
            feature_counter[str(synet)] += 1

    return feature_counter



def get_synsets(s):
    """Suggested helper method for `synset_featurizer`. This should
    be completed so that it returns a list of stringified Synsets 
    associated with elements of `s`.
    """   
    # Use `parse_lem` from the previous question to get a list of
    # (word, POS) pairs. Remember to convert the POS strings.
    wt = [parse_lem(lem) for lem in s.strip().split(' ') if lem]
    
    ##### YOUR CODE HERE
    converted_wt = []
    for i in wt:
        converted_tag = convert_tag(i[1])
        word =i[0]
        converted_wt.append([word,converted_tag])
        
    return [wn.synsets(i[0], pos=i[1]) for i in converted_wt]

    
    
def convert_tag(t):
    """Converts tags so that they can be used by WordNet:
    
    | Tag begins with | WordNet tag |
    |-----------------|-------------|
    | `N`             | `n`         |
    | `V`             | `v`         |
    | `J`             | `a`         |
    | `R`             | `r`         |
    | Otherwise       | `None`      |
    """        
    if t[0].lower() in {'n', 'v', 'r'}:
        return t[0].lower()
    elif t[0].lower() == 'J':
        return 'a'
    else:
        return None    


# Call to `rel_ext.experiment`:
##### YOUR CODE HERE    
synsets_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers = [synset_featurizer],
        model_factory=model_factory,
        verbose=True)





relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.839      0.353      0.658        340       5716
author                    0.758      0.456      0.669        509       5885
capital                   0.588      0.211      0.433         95       5471
contains                  0.792      0.590      0.742       3904       9280
film_performance          0.781      0.557      0.723        766       6142
founders                  0.718      0.382      0.610        380       5756
genre                     0.552      0.218      0.422        170       5546
has_sibling               0.873      0.220      0.548        499       5875
has_spouse                0.812      0.298      0.604        594       5970
is_a                      0.615      0.225      0.457        497       5873
nationality               0.488      0.140      0.326        301       5677
parents     

In [14]:
def test_synset_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter["Synset('be.v.01')"] += 5
    feature_counter = synset_featurizer(kbt, corpus, feature_counter)
    # The full return values for this tend to be long, so we just
    # test a few examples to avoid cluttering up this notebook.
    test_cases = {
        "Synset('be.v.01')": 6,
        "Synset('embody.v.02')": 1
    }
    for ss, expected in test_cases.items():   
        result = feature_counter[ss]
        assert result == expected, \
            "Incorrect count for {}: Expected {}; Got {}".format(ss, expected, result)

In [29]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_synset_featurizer(corpus)

### Your original system [3 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions — that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

In [None]:
# Enter your system description in this cell.
# Please do not remove this comment.

'''
From my own experience, desigining better feature will give the relation extraction system a boost than 
stacking comlicated Machine Learning Models.
So the focus of my system will be on experiencing different featurizers.

Featurizer ideas: 
1. So far we are just buidling feature based on the [mideel] and [middle_POS], we should be using the 
same techniques to see if we can identify some common pattern on the [left] or [right] words. 
2. From bidirectional give a simple system a noticible boost, so this technique will be applied to most 
of the featurizers. 
3. We have tried bigram, should also try trigram ? 
4. From experience, SCV don't give a good result and took a long time to train, so i will be mainly using 
logistic regression for modeling. If have time will try random forest. 


'''

In [22]:
#1. try on stacking existing featurizer 
featurizers_1 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer]
model_factory_1 = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

model1_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_1,
    model_factory=model_factory_1,
    #vectorize=False,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.857      0.406      0.701        340       5716
author                    0.841      0.642      0.792        509       5885
capital                   0.706      0.253      0.519         95       5471
contains                  0.836      0.669      0.796       3904       9280
film_performance          0.832      0.715      0.805        766       6142
founders                  0.830      0.424      0.696        380       5756
genre                     0.690      0.341      0.573        170       5546
has_sibling               0.798      0.261      0.565        499       5875
has_spouse                0.871      0.374      0.688        594       5970
is_a                      0.731      0.334      0.591        497       5873
nationality               0.589      0.252      0.465        301       5677
parents     

In [None]:
#2. try different model
from sklearn.ensemble import RandomForestClassifier
model_factory_2 = lambda: RandomForestClassifier(max_depth=2, random_state=0) #

model2_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_1,
    model_factory=model_factory_2,
    #vectorize=False,
    verbose=True)

#!!!! :( bad result, need more tuning 

In [26]:
def left_bag_of_words_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.left.split(' '):
            feature_counter[word] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.left.split(' '):
            feature_counter[word] += 1
    return feature_counter

In [27]:
def right_bag_of_words_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.right.split(' '):
            feature_counter[word] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.right.split(' '):
            feature_counter[word] += 1
    return feature_counter

In [28]:
#3. try on stacking other featurizers, synset_featurizer introduce noise rather than information to the model 
featurizers_3 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer,left_bag_of_words_featurizer,right_bag_of_words_featurizer]

model_factory_3 = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

model3_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_3,
    model_factory=model_factory_3,
    #vectorize=False,
    verbose=True)


relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.880      0.474      0.751        340       5716
author                    0.857      0.766      0.837        509       5885
capital                   0.550      0.232      0.431         95       5471
contains                  0.833      0.747      0.814       3904       9280
film_performance          0.848      0.736      0.823        766       6142
founders                  0.670      0.384      0.583        380       5756
genre                     0.722      0.306      0.568        170       5546
has_sibling               0.812      0.675      0.780        499       5875
has_spouse                0.810      0.658      0.774        594       5970
is_a                      0.703      0.565      0.670        497       5873
nationality               0.701      0.615      0.682        301       5677
parents     

In [29]:
def middle_unigram_pos_tag_featurizer(kbt, corpus, feature_counter):
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        #get a list of bigram str 
        #count appearance of each bigram pair 
        unigram_list = get_tags(ex.middle_POS)
        for unigram in unigram_list:
            feature_counter[unigram] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        unigram_list = get_tags(ex.middle_POS)   
        for unigram in unigram_list:
            feature_counter[unigram] += 1

    return feature_counter

In [30]:
#4. try on stacking other featurizers, synset_featurizer introduce noise rather than information to the model 
featurizers_4 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer,left_bag_of_words_featurizer,right_bag_of_words_featurizer,
                middle_unigram_pos_tag_featurizer]

model_factory_4 = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

model3_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_4,
    model_factory=model_factory_4,
    #vectorize=False,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.883      0.488      0.760        340       5716
author                    0.847      0.764      0.829        509       5885
capital                   0.550      0.232      0.431         95       5471
contains                  0.837      0.747      0.817       3904       9280
film_performance          0.847      0.736      0.822        766       6142
founders                  0.681      0.387      0.591        380       5756
genre                     0.714      0.294      0.556        170       5546
has_sibling               0.814      0.683      0.784        499       5875
has_spouse                0.802      0.657      0.768        594       5970
is_a                      0.711      0.563      0.675        497       5873
nationality               0.705      0.611      0.684        301       5677
parents     

In [40]:
def get_bigrams(s):
    result =[] 
    for i in zip(s[:-1], s[1:]):
        result.append(i[0] + ' ' +i[1])        
    return result

def simple_bigram_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        unigram = ex.middle.split(' ')
        bigram = get_bigrams(unigram)
        for i in bigram:
            feature_counter[i] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        unigram = ex.middle.split(' ')
        bigram = get_bigrams(unigram)
        for i in bigram:
            feature_counter[i] += 1
    return feature_counter

In [42]:
#5. adding bigram  
featurizers_5 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 simple_bigram_featurizer,
                 left_bag_of_words_featurizer,right_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer,middle_unigram_pos_tag_featurizer
                ]

model5_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_5,
    model_factory=model_factory_4,
    #vectorize=False,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.873      0.485      0.753        340       5716
author                    0.862      0.774      0.843        509       5885
capital                   0.590      0.242      0.458         95       5471
contains                  0.840      0.751      0.821       3904       9280
film_performance          0.841      0.739      0.818        766       6142
founders                  0.718      0.395      0.617        380       5756
genre                     0.750      0.335      0.601        170       5546
has_sibling               0.816      0.677      0.784        499       5875
has_spouse                0.807      0.662      0.773        594       5970
is_a                      0.727      0.561      0.686        497       5873
nationality               0.732      0.635      0.710        301       5677
parents     

In [44]:
def directional_bigram_featurizer(kbt, corpus, feature_counter): 
    # Append these to the end of the keys you add/access in 
    # `feature_counter` to distinguish the two orders. You'll
    # need to use exactly these strings in order to pass 
    # `test_directional_bag_of_words_featurizer`.
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"
    
    ##### YOUR CODE HERE
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        unigram = ex.middle.split(' ')
        bigram = get_bigrams(unigram)
        for i in bigram:
            feature_counter[i+subject_object_suffix] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        unigram = ex.middle.split(' ')
        bigram = get_bigrams(unigram)
        for i in bigram:
            feature_counter[i+object_subject_suffix] += 1
    return feature_counter

In [47]:
#6. adding bigram  
featurizers_6 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 simple_bigram_featurizer,directional_bigram_featurizer,
                 left_bag_of_words_featurizer,right_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer,middle_unigram_pos_tag_featurizer
                ]

model6_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_6,
    model_factory=model_factory_4,
    #vectorize=False,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.885      0.474      0.754        340       5716
author                    0.866      0.776      0.847        509       5885
capital                   0.585      0.253      0.463         95       5471
contains                  0.842      0.752      0.822       3904       9280
film_performance          0.841      0.736      0.817        766       6142
founders                  0.737      0.405      0.633        380       5756
genre                     0.734      0.341      0.597        170       5546
has_sibling               0.819      0.679      0.787        499       5875
has_spouse                0.746      0.401      0.636        594       5970
is_a                      0.733      0.563      0.691        497       5873
nationality               0.746      0.645      0.723        301       5677
parents     

In [52]:
def middle_length_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        feature_counter[kbt.sbj+kbt.obj]+= len(ex.middle)
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        feature_counter[kbt.sbj+kbt.obj]+= len(ex.middle)
    return feature_counter

In [55]:
#7. adding bigram  
featurizers_7 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 simple_bigram_featurizer,
                 left_bag_of_words_featurizer,right_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer,middle_unigram_pos_tag_featurizer,
                 middle_length_featurizer
                ]

model6_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='dev',
    featurizers=featurizers_7,
    model_factory=model_factory_4,
    #vectorize=False,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.950      0.224      0.576        340       5716
author                    0.856      0.631      0.799        509       5885
capital                   0.500      0.032      0.126         95       5471
contains                  0.835      0.754      0.818       3904       9280
film_performance          0.815      0.530      0.736        766       6142
founders                  0.809      0.300      0.604        380       5756
genre                     0.750      0.159      0.430        170       5546
has_sibling               0.839      0.553      0.760        499       5875
has_spouse                0.837      0.500      0.737        594       5970
is_a                      0.798      0.412      0.672        497       5873
nationality               0.827      0.492      0.728        301       5677
parents     

In [81]:
from nltk import ngrams

def get_trigrams(s):
    result =[] 
    n = 3
    trigrams = ngrams(s.split(), 3)
    for i in trigrams:
        result.append(i[0] + ' ' +i[1] +' '+i[2])        
    return result 

def simple_trigram_featurizer(kbt, corpus, feature_counter):
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        trigram = get_trigrams(ex.middle)
        for i in trigram:
            feature_counter[i] += 1
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        trigram = get_trigrams(ex.middle)
        for i in trigram:
            feature_counter[i] += 1
    return feature_counter

In [82]:
featurizers_8 = [simple_bag_of_words_featurizer,directional_bag_of_words_featurizer,
                 simple_bigram_featurizer, simple_trigram_featurizer,
                 left_bag_of_words_featurizer,right_bag_of_words_featurizer,
                 middle_bigram_pos_tag_featurizer,middle_unigram_pos_tag_featurizer
                ]

## Bake-off [1 point]

For the bake-off, we will release a test set. The announcement will go out on the discussion forum. You will evaluate your custom model from the previous question on these new datasets using the function `rel_ext.bake_off_experiment`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

The announcement will include the details on where to submit your entry.

In [97]:
# Enter your bake-off assessment code in this cell. 
# Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    # Please enter your code in the scope of the above conditional.
    ##### YOUR CODE HERE
    bakeoff_results = rel_ext.experiment(
        splits,
        train_split='train',
        test_split='dev',
        featurizers=featurizers_8,
        model_factory=model_factory_1,
        verbose=False) # We don't care about this eval, so skip its summary.

    rel_ext_data_home_test = os.path.join(rel_ext_data_home, 'bakeoff-rel_ext-test-data')
    rel_ext.bake_off_experiment(bakeoff_results, rel_ext_data_home_test)
    





relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.887      0.539      0.786        438       7122
author                    0.835      0.747      0.816        645       7329
capital                   0.640      0.278      0.508        115       6799
contains                  0.811      0.763      0.801       3808      10492
film_performance          0.857      0.733      0.829       1011       7695
founders                  0.652      0.410      0.583        444       7128
genre                     0.633      0.330      0.534        188       6872
has_sibling               0.879      0.660      0.824        717       7401
has_spouse                0.815      0.687      0.785        780       7464
is_a                      0.735      0.522      0.680        611       7295
nationality               0.703      0.582      0.675        383       7067
parents     

In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-average f-score (an F_0.5 score) as reported 
# by the code above. Please enter only a number between 
# 0 and 1 inclusive. Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    # Please enter your score in the scope of the above conditional.
    ##### YOUR CODE HERE
    0.694

