## Set-up

See [the first notebook in this unit](rel_ext_01_task.ipynb#Set-up) for set-up instructions.

In [4]:
import numpy as np
import os
import rel_ext
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
#import utils

As usual, we unite our corpus and KB into a dataset, and create some splits for experimentation:

In [1]:
#rel_ext_data_home = os.path.join('data', 'rel_ext_data')

In [5]:
corpus = rel_ext.Corpus('washington_post_test.tsv.gz')

In [6]:
kb = rel_ext.KB('Atsuko_filtered_KB.tsv.gz')

In [7]:
kb.all_relations

['adjoins',
 'capital',
 'contains',
 'has_spouse',
 'nationality',
 'place_of_birth',
 'place_of_death',
 'worked_at']

In [8]:
dataset = rel_ext.Dataset(corpus, kb)

In [9]:
dataset.count_examples()

                                             examples
relation               examples    triples    /triple
--------               --------    -------    -------
adjoins                     901       1702       0.53
capital                     121        522       0.23
contains                    363      18681       0.02
has_spouse                    4       2994       0.00
nationality                  28       1598       0.02
place_of_birth                1       1097       0.00
place_of_death                4        831       0.00
worked_at                     1       1150       0.00


You are not wedded to this set-up for splits. The bake-off will be conducted on a previously unseen test set, so all of the data in `dataset` is fair game:

In [10]:
splits = dataset.build_splits(
    split_names=['train', 'test'],
    split_fracs=[0.80, 0.20],
    seed=1)

In [11]:
splits

{'train': Corpus with 54,237 examples; KB with 22,313 triples,
 'test': Corpus with 14,543 examples; KB with 6,262 triples,
 'all': Corpus with 68,780 examples; KB with 28,575 triples}

In [11]:
# train_result = rel_ext.train_models(
#     splits, split_name='train',
#     featurizers=[simple_bag_of_words_featurizer])

In [12]:
# predictions, true_labels = rel_ext.predict(
#     splits, train_result, split_name='test')

In [12]:
splits['train'].count_examples()

                                             examples
relation               examples    triples    /triple
--------               --------    -------    -------
adjoins                     641       1283       0.50
capital                      90        406       0.22
contains                    271      14461       0.02
has_spouse                    4       2419       0.00
nationality                  14       1296       0.01
place_of_birth                0        874       0.00
place_of_death                4        676       0.01
worked_at                     1        898       0.00


In [13]:
splits['test'].count_examples()

                                             examples
relation               examples    triples    /triple
--------               --------    -------    -------
adjoins                     145        419       0.35
capital                      19        116       0.16
contains                     59       4220       0.01
has_spouse                    0        575       0.00
nationality                   1        302       0.00
place_of_birth                1        223       0.00
place_of_death                0        155       0.00
worked_at                     0        252       0.00


## Baselines

### Hand-built feature functions

In [14]:
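def simple_bag_of_words_featurizer(kbt, corpus, feature_counter):
    # Count the words between the two mentions in examples where the
    # subject appears first ...
    for ex in corpus.get_examples_for_entities(kbt.sbj, kbt.obj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    # ... and in examples where the object appears first. Both orders
    # update the same keys, so direction is not recorded.
    for ex in corpus.get_examples_for_entities(kbt.obj, kbt.sbj):
        for word in ex.middle.split(' '):
            feature_counter[word] += 1
    return feature_counter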

In [15]:
featurizers = [simple_bag_of_words_featurizer]

In [16]:
model_factory = lambda: LogisticRegression(fit_intercept=True, solver='liblinear')

In [17]:
lr_BOW_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='test',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.927      0.912      0.924        419       2795
capital                   1.000      0.905      0.979        116       2492
contains                  0.994      0.990      0.993       4220       6596
has_spouse                0.997      1.000      0.997        575       2951
nationality               1.000      0.997      0.999        302       2678
place_of_birth            0.991      0.996      0.992        223       2599
place_of_death            0.987      1.000      0.990        155       2531
worked_at                 0.992      1.000      0.994        252       2628
------------------    ---------  ---------  ---------  ---------  ---------
macro-average             0.986      0.975      0.984       6262      25270
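
The f-score column in the table above looks consistent with an F-measure that weights precision more heavily than recall (F<sub>0.5</sub>), rather than the balanced F<sub>1</sub>. Here is a minimal sketch for checking this against two rows, assuming the standard F-beta formula with `beta=0.5`. The value of `beta` is inferred from the printed numbers, not stated in this notebook, and `f_beta` is a helper written for this illustration, not part of `rel_ext`.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall copied from the logistic regression table above.
adjoins_f = f_beta(0.927, 0.912)
capital_f = f_beta(1.000, 0.905)

# Compare with the f-score column: 0.924 for adjoins, 0.979 for capital.
print(round(adjoins_f, 3), round(capital_f, 3))

# For contrast, the balanced F1 for adjoins is lower than the reported value.
adjoins_f1 = f_beta(0.927, 0.912, beta=1.0)
print(round(adjoins_f1, 3))
```

Because the inputs are themselves rounded to three decimals, agreement to the third decimal is the most this check can show.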


In [19]:
model_factory = lambda: SVC(kernel='linear')

In [20]:
svc_BOW_results = rel_ext.experiment(
    splits,
    train_split='train',
    test_split='test',
    featurizers=featurizers,
    model_factory=model_factory,
    verbose=True)

relation              precision     recall    f-score    support       size
------------------    ---------  ---------  ---------  ---------  ---------
adjoins                   0.946      0.912      0.939        419       2795
capital                   0.955      0.905      0.944        116       2492
contains                  0.997      0.990      0.996       4220       6596
has_spouse                1.000      1.000      1.000        575       2951
nationality               0.990      0.997      0.991        302       2678
place_of_birth            1.000      0.996      0.999        223       2599
place_of_death            0.994      1.000      0.995        155       2531
worked_at                 1.000      1.000      1.000        252       2628
------------------    ---------  ---------  ---------  ---------  ---------
macro-average             0.985      0.975      0.983       6262      25270


Studying the model weights might yield insights.
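
One way to study weights is to pair each learned coefficient with its feature name and sort the pairs. The sketch below does this on made-up toy data, assuming a binary `LogisticRegression` configured like the one used above. The vocabulary, feature matrix and labels are hypothetical, and the matrix is written out by hand rather than taken from the experiment's vectorizer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy vocabulary; column i of X counts occurrences of vocab[i].
vocab = ['capital', 'near', 'of', 'the']
X = np.array([
    [1, 0, 1, 0],   # "capital of"  -> positive
    [1, 0, 0, 1],   # "capital the" -> positive
    [0, 1, 0, 1],   # "near the"    -> negative
    [0, 1, 1, 0],   # "near of"     -> negative
])
y = [1, 1, 0, 0]

model = LogisticRegression(fit_intercept=True, solver='liblinear')
model.fit(X, y)

# For a binary problem, `coef_` has one row; positive weights favor class 1.
ranked = sorted(zip(model.coef_[0], vocab), reverse=True)
for weight, word in ranked:
    print('{:>10} {: .3f}'.format(word, weight))
```

In this toy set-up `capital` should receive the largest weight and `near` the smallest, since each appears only with one label, while `of` and `the` are split evenly between the classes.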

### Directional unigram features [1.5 points]

The current bag-of-words representation makes no distinction between "forward" and "reverse" examples. But, intuitively, there is a big difference between _X and his son Y_ and _Y and his son X_. This question asks you to modify `simple_bag_of_words_featurizer` to capture these differences.

__To submit:__

1. A feature function `directional_bag_of_words_featurizer` that is just like `simple_bag_of_words_featurizer` except that it distinguishes "forward" and "reverse". To do this, you just need to mark each word feature for whether it is derived from a subject–object example or from an object–subject example.  The included function `test_directional_bag_of_words_featurizer` should help verify that you've done this correctly.

2. A call to `rel_ext.experiment` with `directional_bag_of_words_featurizer` as the only featurizer. (Aside from this, use all the default values for `rel_ext.experiment` as exemplified above in this notebook.)

3. `rel_ext.experiment` returns some of the core objects used in the experiment. How many feature names does the `vectorizer` have for the experiment run in the previous step? Include the code needed for getting this value. (Note: we're partly asking you to figure out how to get this value by using the sklearn documentation, so please don't ask how to do it!)

In [None]:
def directional_bag_of_words_featurizer(kbt, corpus, feature_counter):
    # Append these to the end of the keys you add/access in
    # `feature_counter` to distinguish the two orders. You'll
    # need to use exactly these strings in order to pass
    # `test_directional_bag_of_words_featurizer`.
    subject_object_suffix = "_SO"
    object_subject_suffix = "_OS"

    ##### YOUR CODE HERE


    return feature_counter


# Call to `rel_ext.experiment`:
##### YOUR CODE HERE




In [None]:
def test_directional_bag_of_words_featurizer(corpus):
    from collections import defaultdict
    kbt = rel_ext.KBTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
    feature_counter = defaultdict(int)
    # Make sure `feature_counter` is being updated, not reinitialized:
    feature_counter['is_OS'] += 5
    feature_counter = directional_bag_of_words_featurizer(kbt, corpus, feature_counter)
    expected = defaultdict(
        int, {'is_OS':6,'a_OS':1,'webcomic_OS':1,'created_OS':1,'by_OS':1})
    assert feature_counter == expected, \
        "Expected:\n{}\nGot:\n{}".format(expected, feature_counter)

In [None]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_directional_bag_of_words_featurizer(corpus)

### Your original system [3 points]

There are many options, and this could easily grow into a project. Here are a few ideas:

- Try out different classifier models, from `sklearn` and elsewhere.
- Add a feature that indicates the length of the middle.
- Augment the bag-of-words representation to include bigrams or trigrams (not just unigrams).
- Introduce features based on the entity mentions themselves. <!-- \[SPOILER: it helps a lot, maybe 4% in F-score. And combines nicely with the directional features.\] -->
- Experiment with features based on the context outside (rather than between) the two entity mentions: that is, the words before the first mention, or after the second.
- Try adding features which capture syntactic information, such as the dependency-path features used by Mintz et al. 2009. The [NLTK](https://www.nltk.org/) toolkit contains a variety of [parsing algorithms](http://www.nltk.org/api/nltk.parse.html) that may help.
- The bag-of-words representation does not permit generalization across word categories such as names of people, places, or companies. Can we do better using word embeddings such as [GloVe](https://nlp.stanford.edu/projects/glove/)?
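
As one illustration of the bigram idea in the list above, here is a minimal sketch of a featurizer that counts bigrams in the middle text. So that it runs on its own, it uses hypothetical stand-ins (`ToyExample`, `ToyTriple`, `ToyCorpus`) in place of the `rel_ext` classes. Their field names are assumptions chosen to mirror the attributes used by `simple_bag_of_words_featurizer`, and the `<s>`/`</s>` boundary markers are a design choice of this sketch, not something the assignment requires.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for the rel_ext objects (assumed field names).
ToyExample = namedtuple('ToyExample', ['entity_1', 'entity_2', 'middle'])
ToyTriple = namedtuple('ToyTriple', ['rel', 'sbj', 'obj'])

class ToyCorpus:
    def __init__(self, examples):
        self.examples = examples

    def get_examples_for_entities(self, e1, e2):
        # Return examples where e1 is mentioned first and e2 second.
        return [ex for ex in self.examples
                if ex.entity_1 == e1 and ex.entity_2 == e2]

def middle_bigram_featurizer(kbt, corpus, feature_counter):
    # Visit both mention orders, as the unigram baseline does.
    for e1, e2 in [(kbt.sbj, kbt.obj), (kbt.obj, kbt.sbj)]:
        for ex in corpus.get_examples_for_entities(e1, e2):
            # Pad with boundary markers so edge words also form bigrams.
            words = ['<s>'] + ex.middle.split(' ') + ['</s>']
            for w1, w2 in zip(words, words[1:]):
                feature_counter[w1 + ' ' + w2] += 1
    return feature_counter

toy_corpus = ToyCorpus([
    ToyExample('xkcd', 'Randall_Munroe', 'is a webcomic created by')])
toy_kbt = ToyTriple(rel='worked_at', sbj='Randall_Munroe', obj='xkcd')
feats = middle_bigram_featurizer(toy_kbt, toy_corpus, defaultdict(int))
print(sorted(feats))
```

A featurizer with this signature could be passed in the `featurizers` list alongside or instead of the unigram baseline. Whether it helps is something to check empirically with `rel_ext.experiment`.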

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies. We also ask that you report the best score your system got during development, just to help us understand how systems performed overall.