**Table of contents**

* [Task](#task)
* [Naive Bayes](#NB)
    * [Feature function](#ff)
    * [MLE](#MLE)
    * [Evaluation](#eval)

    
**Table of Exercises**


* [Exercise 5-1](#ex5-1) (-/1)
* [Exercise 5-2](#ex5-2) (-/2)
* [Exercise 5-3](#ex5-3) (-/3)
* [Exercise 5-4](#ex5-4) (-/2)
* [Exercise 5-5](#ex5-5) (-/4)



**General notes**

* In this notebook you are expected to use $\LaTeX$. 
* Use python3.
* Use NLTK to read annotated data.
* **Document your code**: TAs are more likely to understand the steps if you document them. If you don't, it's also difficult to give you partial points for exercises that are not completely correct.

After completing this lab you should be able to 

* develop Naive Bayes text classifiers
* estimate parameters via MLE
* predict and evaluate models using precision/recall

# <a name="task"> Task

We will be looking into binary sentiment analysis where we have to decide whether a document $x$ (a list of tokens) is positive (class $y=1$) or negative (class $y=0$) towards a subject.

The dataset we will use comes from NLTK [nltk.corpus.sentence_polarity](http://www.nltk.org/howto/corpus.html):

In [1]:
from nltk.corpus import sentence_polarity

This dataset contains 5331 positive and 5331 negative sentences, which you can obtain as shown below:

In [2]:
pos_sents = sentence_polarity.sents(categories='pos')
neg_sents = sentence_polarity.sents(categories='neg')
print(len(pos_sents), 'positive sentences such as:\n', ' '.join(pos_sents[0]))
print(len(neg_sents), 'negative sentences such as:\n', ' '.join(neg_sents[0]))

5331 positive sentences such as:
 the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
5331 negative sentences such as:
 simplistic , silly and tedious .


We will use the first 4000 sentences from each class for training, the next 331 for development, and the last 1000 for test:

In [3]:
training_pos = pos_sents[:4000]
training_neg = neg_sents[:4000]
dev_pos = pos_sents[4000:4331]
dev_neg = neg_sents[4000:4331]
test_pos = pos_sents[4331:]
test_neg = neg_sents[4331:]
print('Training: %d pos and %d neg' % (len(training_pos), len(training_neg)))
print('Development: %d pos and %d neg' % (len(dev_pos), len(dev_neg)))
print('Test: %d pos and %d neg' % (len(test_pos), len(test_neg)))

Training: 4000 pos and 4000 neg
Development: 331 pos and 331 neg
Test: 1000 pos and 1000 neg


In [4]:
training_docs = training_pos + training_neg
dev_docs = dev_pos + dev_neg
test_docs = test_pos + test_neg

# <a name="NB"> Naive Bayes


Feature-rich models are used to model the distribution $P_{Y|X}(y|x)$ of a target variable $y$ conditioned on some high-dimensional data $x$.

One way of doing it is to summarise aspects of $x$ that are relevant to the problem by means of a feature function which returns a vector in some subset of $\mathbb R^D$. For example, this feature function may retain sentiment words in $x$ or some other important aspects of the input. Then instead of modelling $P_{Y|X}(y|x)$ we can, for example, model $P_{Y|F_1^n}(y|f_1^n)$ where we condition on a collection of $n$ features instead.

Conditioning on features of the input, rather than on the input directly, does not solve the problem on its own: the conditioning context remains high-dimensional. This is where probability calculus and independence assumptions make our task simpler.

We can use Bayes' rule to invert this conditional:

\begin{align}
(1) \quad P_{Y|F_1^n}(y|f_1^n) = \frac{P_Y(y)P_{F_1^n|Y}(f_1^n|y)}{P_{F_1^n}(f_1^n)}
\end{align}

Now note that the numerator contains a conditional in which the high-dimensional feature representation of the input is modelled given the target class. We can address this by making conditional independence assumptions. In particular, assuming that $F_i$ is independent of every other $F_j$ with $i \neq j$ given the target label $y$ simplifies the problem considerably. We denote this by $F_i \perp F_j \mid y$ for $i\neq j$. Equation (2) shows the resulting model:

\begin{align}
(2) \quad P_{Y|F_1^n}(y|f_1^n) \overset{\text{ind}}{=} \frac{P_Y(y)\prod_{i=1}^n P_{F|Y}(f_i|y)}{P_{F_1^n}(f_1^n)}
\end{align}


Note that we now only have to model fairly small distributions:
* a *prior* distribution over classes $P_Y(y)$ 
* a set of conditional probability distributions (cpds) $P_{F|Y}$, one per class, over the possible features (these distributions are also called likelihoods, but do not confuse them with the *likelihood function*, which is a function of the parameters of a statistical model for fixed data)
* the denominator can be inferred by marginalisation, see Equation (3)

\begin{align}
(3) \quad P_{F_1^n}(f_1^n) = \sum_{y \in \mathcal Y} P_{YF_1^n}(y, f_1^n) = \sum_{y \in \mathcal Y} P_{Y}(y)P_{F_1^n|Y}(f_1^n|y) = \sum_{y \in \mathcal Y} P_{Y}(y) \prod_{i=1}^n P_{F|Y}(f_i|y)
\end{align}
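As a small numeric illustration of Equation (2) and of the marginalisation in its denominator, the sketch below computes the posterior over two classes for a toy input. The parameter values and the helper name `posterior` are invented for illustration only; they are not estimated from the corpus.

```python
import math

# Toy parameters, invented purely for illustration (not estimated from data).
prior = {0: 0.5, 1: 0.5}
likelihood = {
    0: {'good': 0.1, 'boring': 0.4},  # P(f | y=0)
    1: {'good': 0.5, 'boring': 0.1},  # P(f | y=1)
}

def posterior(features, prior, likelihood):
    """Return P(y | f_1..f_n) for every class y under the NB assumption."""
    # Numerator: P(y) * prod_i P(f_i | y)
    joint = {y: prior[y] * math.prod(likelihood[y][f] for f in features)
             for y in prior}
    # Denominator: marginalise the joint over all classes
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}

post = posterior(['good', 'good', 'boring'], prior, likelihood)
print(post)
```

Because the denominator is the same for every class, it only rescales the scores; the class with the largest numerator is also the class with the largest posterior.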



## <a name="ff"> Feature function

An important part of a feature-rich model such as Naive Bayes (NB) classifiers is the *feature function*. Here we will develop one. 

In NB classification, features are themselves random variables defined over a certain set $\mathcal F$. We need to first determine this set. In this notebook we will focus on *unigram features*, that is, features defined at the token level.

We will take every token that occurs at least a pre-specified number of times (`min_freq`) as a potential feature.

Here is an example of how you can do this:

In [5]:
from collections import Counter


def make_unigram_feature_set(documents, min_freq=1, mark_negation=False):
    """
    This function goes through a corpus and retains all candidate unigram features
     making a feature set. 
    
    :param documents: all documents, each a list of words
    :param min_freq: minimum frequency of a token for it to be part of the feature set
    :param mark_negation: **IGNORE THIS FOR NOW**
    :returns: unigram feature set
    """
    counter = Counter()
    for doc in documents:
        counter.update(doc)
    features = []
    for f, n in counter.most_common():
        if n >= min_freq:
            features.append(f)
        else:
            break
    return frozenset(features)

<a name="ex5-1" style="color:red">**Exercise 5-1**</a> **[1 point]** Modify `make_unigram_feature_set` to optionally pre-process documents by marking words in the scope of a negation with the suffix `_NEG`. For example,  `I am not sure I like the acting` becomes `I am not sure_NEG I_NEG like_NEG the_NEG acting_NEG`. You can use NLTK support for that, see for example, `nltk.sentiment.util.mark_negation`.
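To make the idea of a negation scope concrete, here is a simplified stand-in written in plain Python. It is not NLTK's implementation: the word list, the punctuation set and the name `mark_negation_sketch` are assumptions chosen for illustration, and NLTK's own function recognises more negation cues.

```python
# Tiny illustrative lists; both are assumptions, not NLTK's actual patterns.
NEGATION_WORDS = {'not', 'no', 'never'}
CLAUSE_PUNCT = {'.', ':', ';', '!', '?'}  # punctuation that closes a negation scope

def mark_negation_sketch(tokens):
    """Append _NEG to tokens that follow a negation word, up to the next clause punctuation."""
    marked, in_scope = [], False
    for tok in tokens:
        if tok in CLAUSE_PUNCT:
            # punctuation ends the scope and is never marked
            in_scope = False
            marked.append(tok)
        elif in_scope:
            marked.append(tok + '_NEG')
        else:
            marked.append(tok)
            # the negation word itself stays unmarked and opens the scope
            if tok.lower() in NEGATION_WORDS:
                in_scope = True
    return marked

marked_example = mark_negation_sketch('I am not sure I like the acting'.split())
print(' '.join(marked_example))
```

On the sentence from the exercise this reproduces the marking described above; for the exercise itself the NLTK helper remains the intended tool.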

In [6]:
# You can check python documentation by using the following syntax
from nltk.sentiment import util
# util.mark_negation?

In [37]:
from collections import Counter

def make_unigram_feature_set(documents, min_freq=1, mark_negation=False):
    """
    This function goes through a corpus and retains all candidate unigram features
     making a feature set. Optionally, it can also preprocess the corpus annotating
     with _NEG words that are in the scope of a negation (using NLTK helper functions).
    
    :param documents: all documents, each a list of words
    :param min_freq: minimum frequency of a token for it to be part of the feature set
    :param mark_negation: whether to preprocess the document using NLTK's nltk.sentiment.util.mark_negation
        see the documentation `nltk.sentiment.util.mark_negation?`
    :returns: unigram feature set
    """
    # create a counter
    counter = Counter()
    
    # loop over the documents
    for doc in documents:
        
        # mark negations
        if mark_negation:
            
            # update the counter with the negations
            counter.update(util.mark_negation(doc))
            
        # do not mark negations
        else:
            
            # update the counter without negations
            counter.update(doc)
            
    # features list
    features = []
    
    # get the feature and occurences from the counter
    # in order from highest to lowest n
    for f, n in counter.most_common():
        
        # only include features that are frequent enough
        if n >= min_freq:
            
            # add the feature to the list 
            features.append(f)
        
        # lower, so everything that follows will be lower
        else:
            
            # break the loop
            break
    
    # freeze the set of features
    return frozenset(features)

In [38]:
# This is just some helper code for better visualization of examples
def inspect_set(input_set, k=5, neg=False):
    """
    Helper function to inspect a few elements in a set of features.

    :param input_set: a set of features
    :param k: how many elements to select
    :param neg: return `*_NEG` features only
    :returns: up to k elements 
    """
    selected = set()
    for w in input_set:
        if len(selected) < k:            
            if not neg:
                selected.add(w)
            elif '_NEG' in w:
                selected.add(w)
        else:
            break
    return selected

Here are some of the features (without marking negation):

```python
>>> unigram_features = make_unigram_feature_set(training_docs+dev_docs, min_freq=2)
>>> print(len(unigram_features), 'features such as:\n', inspect_set(unigram_features))
```
```
9059 features such as:
 {'white', 'fearless', 'ear', 'tempting', 'meat'}
```

In [39]:
unigram_features = make_unigram_feature_set(training_docs+dev_docs, min_freq=2)
print(len(unigram_features), 'features such as:\n', inspect_set(unigram_features))

9059 features such as:
 {'fast-moving', 'dullness', "something's", 'strain', 'versus'}


Here are some of the features with pre-processing negation scope.

```python
>>> unigram_features_with_negation = make_unigram_feature_set(training_docs+dev_docs, min_freq=2, mark_negation=True)
>>> print(len(unigram_features_with_negation), 'features such as:\n', 
      inspect_set(unigram_features_with_negation), 
      '\nand:\n', inspect_set(unigram_features_with_negation, neg=True))
```

```
10143 features such as:
 {'white', 'message_NEG', 'fearless', 'ear', 'tempting'} 
and:
 {'ticket_NEG', 'message_NEG', 'street_NEG', 'determined_NEG', 'stereotype_NEG'}     
```

In [40]:
unigram_features_with_negation = make_unigram_feature_set(training_docs+dev_docs, min_freq=2, mark_negation=True)
print(len(unigram_features_with_negation), 'features such as:\n', 
      inspect_set(unigram_features_with_negation), 
      '\nand:\n', inspect_set(unigram_features_with_negation, neg=True))

10143 features such as:
 {'fast-moving', 'dullness', "something's", 'strain', 'situation_NEG'} 
and:
 {'preachy_NEG', 'situation_NEG', 'may_NEG', 'casting_NEG', 'future_NEG'}


Now that we know which features will form the basis of our classifier, we need to implement a feature function. Here we call it a feature map (as we will be using a python dictionary).

In NB classification only the features that occur in an input matter for classification, thus we use a dictionary that maps each feature that occurs to its value and leaves out features that do not occur.

This function should take a document $x$ and produce a dict where `f` (a feature) is either 1 (for binary features) or a count (for count features). For the purpose of readability we like to represent features with strings, for example:

* `contains(like) = 1` means that the input contains the word `like`
* `count(like) = 3` means that the input contains 3 occurrences of the word `like`
* `EMPTY() = 1` means that the input contains no known feature
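The three kinds of entries listed above can be illustrated on a toy document. This is a minimal sketch with a hypothetical helper `feature_map_sketch` and a made-up two-word feature set; it ignores negation marking and returns a plain dict.

```python
from collections import Counter

def feature_map_sketch(document, feature_set, binary=True):
    """Map a tokenised document to 'contains(w)' or 'count(w)' entries."""
    # count only the tokens that are known features
    counts = Counter(w for w in document if w in feature_set)
    if not counts:
        # no known feature fired
        return {'EMPTY()': 1}
    if binary:
        return {'contains(%s)' % w: 1 for w in counts}
    return {'count(%s)' % w: n for w, n in counts.items()}

toy_features = {'like', 'acting'}  # made-up feature set
toy_doc = 'i like , like , like the acting'.split()

print(feature_map_sketch(toy_doc, toy_features, binary=True))
print(feature_map_sketch(toy_doc, toy_features, binary=False))
print(feature_map_sketch(['unseen'], toy_features))
```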

<a name="ex5-2" style="color:red">**Exercise 5-2**</a> **[2 points]** Read the documentation below and implement the feature function described.

In [41]:
from collections import defaultdict

def make_feature_map(document, feature_set, 
                     binary=True, 
                     mark_negation=False):
    """
    This function takes a document, possibly pre-processes it by marking words in the scope of negation, 
     and constructs a dict indicating which features in `feature_set` fire. Features may be binary, 
     flagging occurrence, or integer, indicating the number of occurrences.
     If no feature can be extracted, a special feature is fired, namely 'EMPTY()'.
     
    :param document: a list of words
    :param feature_set: set of features we are looking for
    :param binary: whether we are indicating presence or counting features in feature_set
    :param mark_negation: whether we should apply NLTK's mark_negation to document before applying the feature function
    :returns: dict with entries 'contains(f)=1/0' for binary features or 'count(f)=n' for count features
    """
    
    # create the map
    feature_map = defaultdict(float)
    
    # should we mark the negations in the document?
    if mark_negation:
        
        # mark the negations in the document
        document = util.mark_negation(document)
    
    # loop over the words in the document
    for word in document:
        
        # check if the word is in the feature set
        if word in feature_set:
            
            # make a binary feature map
            if binary:
                
                # add the word to the map
                feature_map['contains(' + word + ')'] = 1.0
                
            # make a counting feature map
            else:
    
                # add a count of the word to the map
                feature_map['count(' + word + ')'] += 1.0
    
    # check if the map is empty
    if not feature_map:
        
        # mark it as empty
        feature_map['EMPTY()'] = 1.0
       
    # return the map
    return feature_map
        

Here are some outputs for you to check your implementation:

```python
>>> make_feature_map(pos_sents[7], unigram_features_with_negation, 
                 binary=True, mark_negation=True)
```
```
defaultdict(float,
            {'contains(.)': 1.0,
             'contains(ever_NEG)': 1.0,
             'contains(good_NEG)': 1.0,
             'contains(has_NEG)': 1.0,
             'contains(hell_NEG)': 1.0,
             'contains(is_NEG)': 1.0,
             'contains(literally_NEG)': 1.0,
             'contains(made_NEG)': 1.0,
             'contains(more_NEG)': 1.0,
             'contains(no)': 1.0,
             'contains(perhaps)': 1.0,
             'contains(picture_NEG)': 1.0,
             'contains(road_NEG)': 1.0,
             'contains(that_NEG)': 1.0,
             'contains(the_NEG)': 1.0,
             'contains(to_NEG)': 1.0,
             'contains(with_NEG)': 1.0})
```
```python
>>> make_feature_map(['AKSJDHAU'], unigram_features_with_negation, 
                 binary=True, mark_negation=True)
```
```
defaultdict(float, {'EMPTY()': 1.0})
```

In [42]:
make_feature_map(pos_sents[7], unigram_features_with_negation, binary=True, mark_negation=True)

defaultdict(float,
            {'contains(.)': 1.0,
             'contains(ever_NEG)': 1.0,
             'contains(good_NEG)': 1.0,
             'contains(has_NEG)': 1.0,
             'contains(hell_NEG)': 1.0,
             'contains(is_NEG)': 1.0,
             'contains(literally_NEG)': 1.0,
             'contains(made_NEG)': 1.0,
             'contains(more_NEG)': 1.0,
             'contains(no)': 1.0,
             'contains(perhaps)': 1.0,
             'contains(picture_NEG)': 1.0,
             'contains(road_NEG)': 1.0,
             'contains(that_NEG)': 1.0,
             'contains(the_NEG)': 1.0,
             'contains(to_NEG)': 1.0,
             'contains(with_NEG)': 1.0})

In [43]:
make_feature_map(['AKSJDHAU'], unigram_features_with_negation, binary=True, mark_negation=True)

defaultdict(float, {'EMPTY()': 1.0})

In [44]:
make_feature_map([word for sentence in pos_sents[:10] for word in sentence], 
                 unigram_features_with_negation, binary=False, mark_negation=True)

defaultdict(float,
            {'count(")': 4.0,
             'count(,)': 3.0,
             'count(--)': 1.0,
             'count(.)': 13.0,
             'count(21st)': 1.0,
             'count(;)': 1.0,
             'count(a)': 5.0,
             'count(absolute)': 1.0,
             'count(adequately)': 1.0,
             'count(all)': 1.0,
             'count(an)': 1.0,
             'count(and)': 3.0,
             'count(arnold)': 1.0,
             'count(as)': 1.0,
             'count(asian)': 1.0,
             'count(at)': 1.0,
             'count(be)': 1.0,
             'count(biopic)': 1.0,
             'count(but)': 2.0,
             'count(cannot)': 1.0,
             'count(care)': 1.0,
             'count(cat)': 1.0,
             'count(cinema)': 1.0,
             'count(clever)': 1.0,
             'count(co-writer/director)': 1.0,
             'count(column)': 1.0,
             'count(combination)': 1.0,
             'count(comics)': 1.0,
             'count(conan)': 1.0,
     

## <a name="MLE"> MLE

Now you will estimate the cpds involved in NB classification. We use a balanced dataset over two classes (positive and negative), so there is no need to estimate $P_Y(y)$: it is simply $0.5$ per class.

You should simply implement cpds for $P_{F|Y}$, that is, exactly $2$ cpds via MLE and you should use Laplace-$\alpha$ smoothing.
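As a worked example of Laplace-$\alpha$ smoothing, the sketch below applies the estimate $(\text{count} + \alpha) / (\text{total} + \alpha v)$ to made-up counts for a single class. The support, the counts and the helper name `smoothed_probability` are assumptions for illustration.

```python
def smoothed_probability(count, total, alpha, v):
    """Laplace-alpha estimate: (count + alpha) / (total + alpha * v)."""
    return (count + alpha) / (total + alpha * v)

# Made-up support and counts for one class; 'EMPTY()' is part of the support.
support = ['good', 'bad', 'plot', 'EMPTY()']
raw_counts = {'good': 3, 'bad': 1}
alpha = 0.5
total = sum(raw_counts.values())  # observations before smoothing
v = len(support)                  # size of the support

# every feature in the support gets alpha pseudo counts, observed or not
cpd = {f: smoothed_probability(raw_counts.get(f, 0), total, alpha, v)
       for f in support}
print(cpd)
print(sum(cpd.values()))
```

A feature with zero observations receives $\alpha / (\text{total} + \alpha v)$, which is positive whenever $\alpha > 0$ and exactly zero for $\alpha = 0$, in which case its logarithm is undefined.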

<a name="ex5-3" style="color:red">**Exercise 5-3**</a> **[3 points]** Check the documentation below and complete the code for the NB classifier. You will need to implement

* estimation of cpds $P_{F|Y}$ with Laplace smoothing  (1 point) 
* the `classify` method (2 points)
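Since the prior is uniform here, picking the most probable class reduces to comparing $\sum_i \log P_{F|Y}(f_i|y)$ across classes. The sketch below shows this on toy parameters and also why sums of logs are preferred over products of probabilities. The parameter values and the name `classify_sketch` are invented for illustration.

```python
import math

# Toy class-conditional parameters, invented for illustration.
toy_cpds = [
    {'good': 0.1, 'boring': 0.4},  # P(f | y=0)
    {'good': 0.5, 'boring': 0.1},  # P(f | y=1)
]

def classify_sketch(features, cpds):
    """Return the class with the highest sum of log P(f_i | y); ties go to the lower index."""
    scores = [sum(math.log(cpd[f]) for f in features) for cpd in cpds]
    return max(range(len(cpds)), key=lambda y: scores[y])

print(classify_sketch(['good', 'good', 'boring'], toy_cpds))

# A product of many small probabilities underflows to zero in floating point,
# while the equivalent sum of logs stays finite.
print(0.1 ** 400)
print(400 * math.log(0.1))
```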

In [92]:
import numpy as np

def make_cpd(raw_counts, alpha, v):
    """
    This converts a dictionary of raw counts into a cpd.

    :param raw_counts: dict where a key is a feature and a value is its counts (without pseudo counts)
        this should already include the 'EMPTY()' feature
    :param alpha: how many pseudo counts should we add per feature
    :param v: the size of the feature set (already including the 'EMPTY()' feature)
    :returns: a cpd as a dict where a key is a feature and a value is its smoothed probability
    """
    cpd = defaultdict(float)

    # total number of observations for this class (before smoothing)
    total = sum(raw_counts.values())

    # smoothed relative frequency: alpha pseudo counts are added for each of the v features
    for feature, count in raw_counts.items():
        cpd[feature] = (count + alpha) / (total + (v * alpha))

    return cpd

class NaiveBayesClassifier:
    
    def __init__(self, training_pos, training_neg, binary, mark_negation, alpha=0.1, min_freq=2):
        """
        :param training_pos: positive documents
            a document is a list of tokens
        :param training_neg: negative documents
            a document is a list of tokens
        :param binary: whether we are using binary or count features
        :param mark_negation: whether we are pre-processing words in negation scope
        :param alpha: Laplace smooth pseudo count
        :param min_freq: minimum frequency of a token for it to be considered a feature
        """
                
        # Make feature set
        print('Extracting features:')
        feature_set = make_unigram_feature_set(
            training_pos + training_neg,  # we use a concatenation of positive and negative training instances
            min_freq=min_freq, 
            mark_negation=mark_negation)
        
        print(' %d features' % len(feature_set))
                
        # Estimate model: 1/2) count        
        print('MLE: counting')        
        counts = [defaultdict(float), defaultdict(float)]
        for docs, y in [(training_pos, 1), (training_neg, 0)]:
            for doc in docs:  # for each document
                # we extract features
                fmap = make_feature_map(doc, 
                                        feature_set, 
                                        binary=binary, 
                                        mark_negation=mark_negation)
                # and gather counts for the pair (y, f)
                for f, n in fmap.items():
                    counts[y][f] += n  
                                
        # 2/2) MLE with Laplace-alpha smoothing
        #  we make sure EMPTY() is in the support
        print('MLE: smoothing')
        counts[0]['EMPTY()'] += 0
        counts[1]['EMPTY()'] += 0
        # and compute cpds using Laplace smoothing
        self._cpds = [
            make_cpd(counts[0], alpha, len(feature_set) + 1),  # we add 1 because we want EMPTY() to add towards total
            make_cpd(counts[1], alpha, len(feature_set) + 1)]
        print('MLE: done')
            
        # Store data
        self._binary = binary
        self._mark_negation = mark_negation
        self._alpha = alpha
        self._feature_set = feature_set
            
    def get_log_parameter(self, f, y):
        """Returns log P(f|y)"""
        return np.log(self._cpds[y].get(f, self._cpds[y]['EMPTY()']))
        
    def classify(self, doc):
        """
        This function classifies a document by extracting features <f_1...f_n> for it 
         and then computing 
            log P(<f_1...f_n>|Y=0) and log P(<f_1...f_n>|Y=1)
         and finally picking the best (that is, either Y=0 or Y=1).
        
        :param doc: a list of tokens
        :returns: 0 or 1 (the argmax of log P(<f_1...f_n>|y))
        """
        
        # running log-scores per class (index 0: negative, index 1: positive)
        probabilities = [0, 0]
        
        # do we need to mark negations?
        if self._mark_negation:
            
            # mark negations in the document
            doc = util.mark_negation(doc)
            
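        # loop over every token in the document; a key that is missing from a class's cpd
        # falls back to that class's EMPTY() parameter (see get_log_parameter)
        for word in doc: 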
            
            # check if we use binary
            if self._binary:
                
                # use contains for the key
                key = 'contains(' + word + ')'
            
            else:
                
                # use count for the key
                key = 'count(' + word + ')'
            
            # get the log prob of the negative sentiment
            probabilities[0] += self.get_log_parameter(key, 0)

            #  get the log prob of the positive sentiment
            probabilities[1] += self.get_log_parameter(key, 1)

        # return the index of the highest probability (or 0 if they are equal)
        return np.argmax(probabilities)

This is how you should use the classifier:

```python
>>> classifier1 = NaiveBayesClassifier(
    training_pos, training_neg, 
    binary=True, mark_negation=False,
    alpha=1., min_freq=2)
```
```
Extracting features:
 8577 features
MLE: counting
MLE: smoothing
MLE: done
```

In [93]:
classifier1 = NaiveBayesClassifier(
    training_pos, training_neg, 
    binary=True, mark_negation=False,
    alpha=1., min_freq=2)

Extracting features:
 8577 features
MLE: counting
MLE: smoothing
MLE: done


## <a name="eval"> Evaluation

We evaluate binary classifiers on precision, recall, F1 and accuracy. See [Figure 4.4](https://web.stanford.edu/~jurafsky/slp3/4.pdf) and complete the code below:

<a name="ex5-4" style="color:red">**Exercise 5-4**</a> **[2 points]** Classify all documents in a dev set and compute the quantities in [Figure 4.4](https://web.stanford.edu/~jurafsky/slp3/4.pdf).

In [95]:
def evaluate_model(classifier, pos_docs, neg_docs):
    """
    :param classifier: a NaiveBayesClassifier object
    :param pos_docs: positive documents
    :param neg_docs: negative documents
    :returns: a dictionary containing the number of
        * true positives
        * true negatives
        * false positives
        * false negatives
     as well as 
        * accuracy
        * precision
        * recall 
        * and [F1](https://en.wikipedia.org/wiki/F1_score)
    """
    
    true_positives = 0
    true_negatives = 0
    false_positives = 0
    false_negatives = 0
    
    for doc in pos_docs:
        classification = classifier.classify(doc)
        if classification == 1:
            true_positives += 1
        else:
            false_negatives += 1
            
    for doc in neg_docs:
        classification = classifier.classify(doc)
        if classification == 0:
            true_negatives += 1
        else:
            false_positives += 1
            
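    total = true_positives + true_negatives + false_positives + false_negatives
    predicted_pos = true_positives + false_positives
    actual_pos = true_positives + false_negatives

    # guard every ratio against an empty denominator (reported as 0.0)
    accuracy = (true_positives + true_negatives) / total if total else 0.0
    precision = true_positives / predicted_pos if predicted_pos else 0.0
    recall = true_positives / actual_pos if actual_pos else 0.0
    f1_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0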
    
    result = dict()
    
    result['TP'] = true_positives
    result['TN'] = true_negatives
    result['FP'] = false_positives
    result['FN'] = false_negatives
    result['A'] = accuracy
    result['P'] = precision
    result['R'] = recall
    result['F1'] = f1_score
    
    return result

For example, our implementation yields:

```python
>>> classifier1 = NaiveBayesClassifier(
    training_pos, training_neg, 
    binary=True, mark_negation=False,
    alpha=1., min_freq=2)
```
```
Extracting features:
 8577 features
MLE: counting
MLE: smoothing
MLE: done
```
```python
>>> dev_metrics1 = evaluate_model(classifier1, dev_pos, dev_neg)
>>> print('Development')
>>> print('TP %d TN %d FP %d FN %d' % (dev_metrics1['TP'], dev_metrics1['TN'], dev_metrics1['FP'], dev_metrics1['FN']))
>>> print('P %.4f R %.4f A %.4f F1 %.4f' % (dev_metrics1['P'], dev_metrics1['R'], dev_metrics1['A'], dev_metrics1['F1']))
```
```
Development
TP 239 TN 268 FP 63 FN 92
P 0.7914 R 0.7221 A 0.7659 F1 0.7551
```
```python
>>> classifier2 = NaiveBayesClassifier(
    training_pos, training_neg, 
    binary=True, mark_negation=True,
    alpha=1., min_freq=2)
```
```
Extracting features:
 9581 features
MLE: counting
MLE: smoothing
MLE: done  
```
```python
>>> dev_metrics2 = evaluate_model(classifier2, dev_pos, dev_neg)
>>> print('Development')
>>> print('TP %d TN %d FP %d FN %d' % (dev_metrics2['TP'], dev_metrics2['TN'], dev_metrics2['FP'], dev_metrics2['FN']))
>>> print('P %.4f R %.4f A %.4f F1 %.4f' % (dev_metrics2['P'], dev_metrics2['R'], dev_metrics2['A'], dev_metrics2['F1']))
```
```
Development
TP 248 TN 273 FP 58 FN 83
P 0.8105 R 0.7492 A 0.7870 F1 0.7786
```
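The reference numbers above can be checked by hand from the confusion counts alone. The sketch below recomputes the metrics for the first reference output (TP 239, TN 268, FP 63, FN 92); the helper name `metrics_from_counts` is hypothetical, and reporting 0.0 for an empty denominator is a convention assumed here.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0   # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # found among actual positives
    # harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'A': accuracy, 'P': precision, 'R': recall, 'F1': f1}

# Confusion counts copied from the first reference output.
m = metrics_from_counts(tp=239, tn=268, fp=63, fn=92)
summary = 'P %.4f R %.4f A %.4f F1 %.4f' % (m['P'], m['R'], m['A'], m['F1'])
print(summary)  # P 0.7914 R 0.7221 A 0.7659 F1 0.7551
```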

In [96]:
classifier1 = NaiveBayesClassifier(
    training_pos, training_neg, 
    binary=True, mark_negation=False,
    alpha=1., min_freq=2)

Extracting features:
 8577 features
MLE: counting
MLE: smoothing
MLE: done


In [97]:
dev_metrics1 = evaluate_model(classifier1, dev_pos, dev_neg)
print('Development')
print('TP %d TN %d FP %d FN %d' % (dev_metrics1['TP'], dev_metrics1['TN'], dev_metrics1['FP'], dev_metrics1['FN']))
print('P %.4f R %.4f A %.4f F1 %.4f' % (dev_metrics1['P'], dev_metrics1['R'], dev_metrics1['A'], dev_metrics1['F1']))

Development
TP 243 TN 267 FP 64 FN 88
P 0.7915 R 0.7341 A 0.7704 F1 0.7618


In [98]:
classifier2 = NaiveBayesClassifier(
    training_pos, training_neg, 
    binary=True, mark_negation=True,
    alpha=1., min_freq=2)

Extracting features:
 9581 features
MLE: counting
MLE: smoothing
MLE: done


In [99]:
dev_metrics2 = evaluate_model(classifier2, dev_pos, dev_neg)
print('Development')
print('TP %d TN %d FP %d FN %d' % (dev_metrics2['TP'], dev_metrics2['TN'], dev_metrics2['FP'], dev_metrics2['FN']))
print('P %.4f R %.4f A %.4f F1 %.4f' % (dev_metrics2['P'], dev_metrics2['R'], dev_metrics2['A'], dev_metrics2['F1']))

Development
TP 250 TN 273 FP 58 FN 81
P 0.8117 R 0.7553 A 0.7900 F1 0.7825


<a name="ex5-5" style="color:red">**Exercise 5-5**</a> **[4 points]** Use the dev set to choose the best configuration of 

* alpha (try values like 0.1, 0.5, 1.)
* and binary vs count

for a model that marks negation and one that does not. 

Then report performance on test set for your best model in each case. 

Points:
* 3 points for the search on dev set
* 1 point for the table of test results
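One way to organise such a search is to enumerate every combination of settings and keep the one with the best dev score. The sketch below is generic: `grid_search` and the scoring function `toy_dev_f1` are hypothetical stand-ins, and in the notebook the scoring function would train a classifier on the training data and return a dev metric such as F1.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate score_fn on every combination in grid; return (best_config, best_score)."""
    names = list(grid)
    best_config, best_score = None, float('-inf')
    for values in product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        score = score_fn(**config)
        # keep the first configuration that reaches the highest score
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical stand-in for "train on the training set, return F1 on the dev set".
def toy_dev_f1(alpha, binary):
    return 0.7 + 0.05 * alpha + (0.01 if binary else 0.0)

grid = {'alpha': [0.1, 0.5, 1.0], 'binary': [True, False]}
best_config, best_score = grid_search(toy_dev_f1, grid)
print(best_config, best_score)
```

Running the search once with negation marking and once without gives the best configuration for each case, and only those two models then need to be evaluated on the test set.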

In [125]:
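def hyperparameters(alpha, binary, mark_negation, min_freq):
    """
    Train and evaluate one classifier on the dev set for every combination of settings.

    :param alpha: iterable of Laplace pseudo counts to try
    :param binary: iterable of booleans (binary vs count features)
    :param mark_negation: iterable of booleans (mark negation scope or not)
    :param min_freq: iterable of minimum token frequencies
    :returns: numpy array with one metrics dict (plus the settings used) per configuration
    """
    result = []
    for a in alpha:
        for b in binary:
            for mn in mark_negation:
                for mf in min_freq:
                    classifier = NaiveBayesClassifier(
                        training_pos, training_neg, 
                        binary=b, mark_negation=mn,
                        alpha=a, min_freq=mf)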
                    
                    dev_metrics = evaluate_model(classifier, dev_pos, dev_neg)
                    print('Development')
                    print('Parameters: alpha %.1f, binary %r, mark_negation %r, min_freq %d' % (a, b, mn, mf))
                    print('TP %d TN %d FP %d FN %d' % (dev_metrics['TP'], dev_metrics['TN'], dev_metrics['FP'], dev_metrics['FN']))
                    print('P %.4f R %.4f A %.4f F1 %.4f' % (dev_metrics['P'], dev_metrics['R'], dev_metrics['A'], dev_metrics['F1']))
                    print()
                    print()
                    dev_metrics['Alpha'] = a
                    dev_metrics['Binary'] = b
                    dev_metrics['Mark Negation'] = mn
                    dev_metrics['Min Frequency'] = mf
                    result.append(dev_metrics)
    return np.asarray(result)

In [126]:
alpha = np.linspace(0, 1, 11)
binary = [True, False]
mark_negation = [True, False]
min_freq = range(1, 6)

info = hyperparameters(alpha, binary, mark_negation, min_freq)

Extracting features:
 21805 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.0, binary True, mark_negation True, min_freq 1
TP 77 TN 314 FP 17 FN 254
P 0.8191 R 0.2326 A 0.5906 F1 0.3624


Extracting features:
 9581 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.0, binary True, mark_negation True, min_freq 2
TP 56 TN 322 FP 9 FN 275
P 0.8615 R 0.1692 A 0.5710 F1 0.2828


Extracting features:
 6283 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.0, binary True, mark_negation True, min_freq 3
TP 317 TN 27 FP 304 FN 14
P 0.5105 R 0.9577 A 0.5196 F1 0.6660


Extracting features:
 4701 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.0, binary True, mark_negation True, min_freq 4
TP 320 TN 21 FP 310 FN 11
P 0.5079 R 0.9668 A 0.5151 F1 0.6660


Extracting features:
 3739 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.0, binary True, mark_negation True, min_freq 5
TP 323 TN 19 FP 312 FN 8
P 0.5087 R 0.9758 A 0.5166 F1 0.6687


Extracting features:
 18283 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.0, binary True, mark_negation False, mi

MLE: smoothing
MLE: done
Development
Parameters: alpha 0.1, binary False, mark_negation False, min_freq 5
TP 316 TN 55 FP 276 FN 15
P 0.5338 R 0.9547 A 0.5604 F1 0.6847


Extracting features:
 21805 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.2, binary True, mark_negation True, min_freq 1
TP 254 TN 261 FP 70 FN 77
P 0.7840 R 0.7674 A 0.7779 F1 0.7756


Extracting features:
 9581 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.2, binary True, mark_negation True, min_freq 2
TP 249 TN 268 FP 63 FN 82
P 0.7981 R 0.7523 A 0.7810 F1 0.7745


Extracting features:
 6283 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.2, binary True, mark_negation True, min_freq 3
TP 309 TN 72 FP 259 FN 22
P 0.5440 R 0.9335 A 0.5755 F1 0.6874


Extracting features:
 4701 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.2, binary True, mark_negation True, min_freq 4
TP 313 TN 62 FP 2

 5829 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.3, binary False, mark_negation False, min_freq 3
TP 299 TN 118 FP 213 FN 32
P 0.5840 R 0.9033 A 0.6299 F1 0.7094


Extracting features:
 4448 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.3, binary False, mark_negation False, min_freq 4
TP 303 TN 106 FP 225 FN 28
P 0.5739 R 0.9154 A 0.6178 F1 0.7055


Extracting features:
 3525 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.3, binary False, mark_negation False, min_freq 5
TP 307 TN 89 FP 242 FN 24
P 0.5592 R 0.9275 A 0.5982 F1 0.6977


Extracting features:
 21805 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.4, binary True, mark_negation True, min_freq 1
TP 250 TN 268 FP 63 FN 81
P 0.7987 R 0.7553 A 0.7825 F1 0.7764


Extracting features:
 9581 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.4, binary True, mark_negation

 18283 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.5, binary False, mark_negation False, min_freq 1
TP 245 TN 269 FP 62 FN 86
P 0.7980 R 0.7402 A 0.7764 F1 0.7680


Extracting features:
 8577 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.5, binary False, mark_negation False, min_freq 2
TP 241 TN 270 FP 61 FN 90
P 0.7980 R 0.7281 A 0.7719 F1 0.7615


Extracting features:
 5829 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.5, binary False, mark_negation False, min_freq 3
TP 296 TN 147 FP 184 FN 35
P 0.6167 R 0.8943 A 0.6692 F1 0.7300


Extracting features:
 4448 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.5, binary False, mark_negation False, min_freq 4
TP 302 TN 123 FP 208 FN 29
P 0.5922 R 0.9124 A 0.6420 F1 0.7182


Extracting features:
 3525 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.5, binary False, mark_negat

MLE: smoothing
MLE: done
Development
Parameters: alpha 0.7, binary False, mark_negation True, min_freq 3
TP 302 TN 138 FP 193 FN 29
P 0.6101 R 0.9124 A 0.6647 F1 0.7312


Extracting features:
 4701 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.7, binary False, mark_negation True, min_freq 4
TP 304 TN 114 FP 217 FN 27
P 0.5835 R 0.9184 A 0.6314 F1 0.7136


Extracting features:
 3739 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.7, binary False, mark_negation True, min_freq 5
TP 306 TN 100 FP 231 FN 25
P 0.5698 R 0.9245 A 0.6133 F1 0.7051


Extracting features:
 18283 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.7, binary False, mark_negation False, min_freq 1
TP 245 TN 273 FP 58 FN 86
P 0.8086 R 0.7402 A 0.7825 F1 0.7729


Extracting features:
 8577 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.7, binary False, mark_negation False, min_freq 2
TP 241 TN

 21805 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.9, binary False, mark_negation True, min_freq 1
TP 255 TN 270 FP 61 FN 76
P 0.8070 R 0.7704 A 0.7931 F1 0.7883


Extracting features:
 9581 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.9, binary False, mark_negation True, min_freq 2
TP 248 TN 274 FP 57 FN 83
P 0.8131 R 0.7492 A 0.7885 F1 0.7799


Extracting features:
 6283 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.9, binary False, mark_negation True, min_freq 3
TP 298 TN 154 FP 177 FN 33
P 0.6274 R 0.9003 A 0.6828 F1 0.7395


Extracting features:
 4701 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.9, binary False, mark_negation True, min_freq 4
TP 301 TN 134 FP 197 FN 30
P 0.6044 R 0.9094 A 0.6571 F1 0.7262


Extracting features:
 3739 features
MLE: counting
MLE: smoothing
MLE: done
Development
Parameters: alpha 0.9, binary False, mark_negation 

In [128]:
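import numpy as np  # harmless if numpy was already imported earlier in the notebook

def get_best_parameters(data, score):
    """Return the result dict in `data` with the highest value for `score` (first one on ties)."""
    index = np.argmax([x[score] for x in data])
    return data[index]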
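
`np.argmax` returns the index of the first maximum, so `get_best_parameters` reports only one configuration even when several reach the same score. The sketch below shows one possible way to list every tied configuration. The helper name `all_best_parameters` and the `toy_info` list are hypothetical and are not part of the grid-search results above.

```python
# Hypothetical toy results for illustration only (not from the grid search).
toy_info = [
    {'F1': 0.70, 'Alpha': 0.1},
    {'F1': 0.79, 'Alpha': 0.5},
    {'F1': 0.79, 'Alpha': 1.0},  # ties with the previous entry
]

def all_best_parameters(data, score):
    """Return every result dict whose value for `score` equals the maximum."""
    top = max(x[score] for x in data)
    return [x for x in data if x[score] == top]

tied = all_best_parameters(toy_info, 'F1')
print(len(tied), 'configuration(s) share the best F1:')
for result in tied:
    print(result)
```

When more than one configuration is returned, a secondary criterion (for example a smaller feature set) could be used to choose among them.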

In [133]:
best_F1 = get_best_parameters(info, 'F1')
print(best_F1)

{'TP': 256, 'TN': 272, 'FP': 59, 'FN': 75, 'A': 0.797583081570997, 'P': 0.8126984126984127, 'R': 0.7734138972809668, 'F1': 0.7925696594427245, 'Alpha': 1.0, 'Binary': False, 'Mark Negation': True, 'Min Frequency': 1}


In [134]:
best_A = get_best_parameters(info, 'A')
print(best_A)

{'TP': 256, 'TN': 272, 'FP': 59, 'FN': 75, 'A': 0.797583081570997, 'P': 0.8126984126984127, 'R': 0.7734138972809668, 'F1': 0.7925696594427245, 'Alpha': 1.0, 'Binary': False, 'Mark Negation': True, 'Min Frequency': 1}


In [135]:
best_P = get_best_parameters(info, 'P')
print(best_P)

{'TP': 61, 'TN': 322, 'FP': 9, 'FN': 270, 'A': 0.5785498489425982, 'P': 0.8714285714285714, 'R': 0.18429003021148035, 'F1': 0.3042394014962594, 'Alpha': 0.0, 'Binary': False, 'Mark Negation': False, 'Min Frequency': 2}


In [136]:
best_R = get_best_parameters(info, 'R')
print(best_R)

{'TP': 323, 'TN': 19, 'FP': 312, 'FN': 8, 'A': 0.5166163141993958, 'P': 0.5086614173228347, 'R': 0.9758308157099698, 'F1': 0.6687370600414079, 'Alpha': 0.0, 'Binary': True, 'Mark Negation': True, 'Min Frequency': 5}
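
As a sanity check, the scores printed above can be recomputed from the confusion counts alone. This is a minimal sketch under the standard definitions of precision, recall, accuracy and F1. The function name `metrics_from_counts` is an assumption for illustration and is not necessarily how the notebook computes these values.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute precision, recall, accuracy and F1 from confusion counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {'P': precision, 'R': recall, 'A': accuracy, 'F1': f1}

# Counts reported for the best-F1 configuration above.
best = metrics_from_counts(tp=256, tn=272, fp=59, fn=75)
print(' '.join(f'{name} {value:.4f}' for name, value in best.items()))
# prints: P 0.8127 R 0.7734 A 0.7976 F1 0.7926

# Counts reported for the best-precision configuration above.
high_p = metrics_from_counts(tp=61, tn=322, fp=9, fn=270)
print(' '.join(f'{name} {value:.4f}' for name, value in high_p.items()))
```

The second call makes the trade-off visible: the configuration with the highest precision predicts the positive class for only 70 of the 662 development sentences, so its recall and F1 are low.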
