<a href="https://colab.research.google.com/github/lstuurman/SC_comp/blob/master/Laurens_Stuurman_practical1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Practical 1: Sentiment Detection of Movie Reviews
========================================



This practical concerns sentiment detection of movie reviews.
In [this file](https://gist.githubusercontent.com/bastings/d47423301cca214e3930061a5a75e177/raw/5113687382919e22b1f09ce71a8fecd1687a5760/reviews.json) (80MB) you will find 1000 positive and 1000 negative **movie reviews**.
Each review is a **document** and consists of one or more sentences.

To prepare yourself for this practical, you should
have a look at a few of these texts to understand the difficulties of
the task (how might one go about classifying the texts?); you will write
code that decides whether a random unseen movie review is positive or
negative.

Please make sure you have read the following paper:

>   Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
(2002). 
[Thumbs up? Sentiment Classification using Machine Learning
Techniques](https://dl.acm.org/citation.cfm?id=1118704). EMNLP.

Bo Pang et al. were the "inventors" of the movie review sentiment
classification task, and the above paper was one of the first papers on
the topic. The first version of your sentiment classifier will do
something similar to Bo Pang’s system. If you have questions about it,
we should resolve them in our first demonstrated practical.


**Advice**

Please read through the entire practical and familiarise
yourself with all requirements before you start coding or otherwise
solving the tasks. Writing clean and concise code can make the difference
between solving the assignment in a matter of hours, and taking days to
run all experiments.

**Environment**

All code should be written in **Python 3**. 
If you use Colab, check if you have that version with `Runtime -> Change runtime type` in the top menu.

> If you want to work in your own computer, then download this notebook through `File -> Download .ipynb`.
The easiest way to
install Python is through downloading
[Anaconda](https://www.anaconda.com/download). 
After installation, you can start the notebook by typing `jupyter notebook filename.ipynb`.
You can also use an IDE
such as [PyCharm](https://www.jetbrains.com/pycharm/download/) to make
coding and debugging easier. It is good practice to create a [virtual
environment](https://docs.python.org/3/tutorial/venv.html) for this
project, so that any Python packages don’t interfere with other
projects.

#### Learning Python 3

If you are new to Python 3, you may want to check out a few of these resources:
- https://learnxinyminutes.com/docs/python3/
- https://www.learnpython.org/
- https://docs.python.org/3/tutorial/

Loading the Data
-------------------------------------------------------------

In [0]:
# download sentiment lexicon
!wget https://gist.githubusercontent.com/bastings/d6f99dcb6c82231b94b013031356ba05/raw/f80a0281eba8621b122012c89c8b5e2200b39fd6/sent_lexicon
# download review data
!wget https://gist.githubusercontent.com/bastings/d47423301cca214e3930061a5a75e177/raw/5113687382919e22b1f09ce71a8fecd1687a5760/reviews.json

--2019-11-11 11:15:52--  https://gist.githubusercontent.com/bastings/d6f99dcb6c82231b94b013031356ba05/raw/f80a0281eba8621b122012c89c8b5e2200b39fd6/sent_lexicon
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.36.133
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 662577 (647K) [text/plain]
Saving to: ‘sent_lexicon.1’


2019-11-11 11:15:52 (4,38 MB/s) - ‘sent_lexicon.1’ saved [662577/662577]

--2019-11-11 11:15:53--  https://gist.githubusercontent.com/bastings/d47423301cca214e3930061a5a75e177/raw/5113687382919e22b1f09ce71a8fecd1687a5760/reviews.json
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.36.133
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.36.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 83503869 (80M) [text/plain]
Saving to: ‘reviews.json.1’


2019-

In [0]:
import math
import os
import sys
from subprocess import call
from nltk import FreqDist
from nltk.util import ngrams
from nltk.stem.porter import PorterStemmer
import sklearn as sk
import pickle
import json
from collections import Counter
import requests
import matplotlib.pyplot as plt
import numpy as np

In [0]:
# load reviews into memory
# file structure:
# [
#  {"cv": integer, "sentiment": str, "content": list} 
#  {"cv": integer, "sentiment": str, "content": list} 
#   ..
# ]
# where `content` is a list of sentences, 
# with a sentence being a list of (token, pos_tag) pairs.

# For documentation on POS-tags, see 
# https://catalog.ldc.upenn.edu/docs/LDC99T42/tagguid1.pdf

with open("reviews.json", mode="r", encoding="utf-8") as f:
  reviews = json.load(f)
  
print(len(reviews))

def print_sentence_with_pos(s):
  print(" ".join("%s/%s" % (token, pos_tag) for token, pos_tag in s))

for i, r in enumerate(reviews):
  print(r["cv"], r["sentiment"], len(r["content"]))  # cv, sentiment, num sents
  print_sentence_with_pos(r["content"][0])
  if i == 4: 
    break
    
c = Counter()
for review in reviews:
  for sentence in review["content"]:
    for token, pos_tag in sentence:
      c[token.lower()] += 1
      
print("#types", len(c))

print("Most common tokens:")
for token, count in c.most_common(25):
  print("%10s : %8d" % (token, count))
  
print(c)

2000
0 NEG 29
Two/CD teen/JJ couples/NNS go/VBP to/TO a/DT church/NN party/NN ,/, drink/NN and/CC then/RB drive/NN ./.
1 NEG 11
Damn/JJ that/IN Y2K/CD bug/NN ./.
2 NEG 24
It/PRP is/VBZ movies/NNS like/IN these/DT that/WDT make/VBP a/DT jaded/JJ movie/NN viewer/NN thankful/JJ for/IN the/DT invention/NN of/IN the/DT Timex/NNP IndiGlo/NNP watch/NN ./.
3 NEG 19
QUEST/NN FOR/IN CAMELOT/NNP ``/`` Quest/NNP for/IN Camelot/NNP ''/'' is/VBZ Warner/NNP Bros./NNP '/POS first/JJ feature-length/JJ ,/, fully-animated/JJ attempt/NN to/TO steal/VB clout/NN from/IN Disney/NNP 's/POS cartoon/NN empire/NN ,/, but/CC the/DT mouse/NN has/VBZ no/DT reason/NN to/TO be/VB worried/VBN ./.
4 NEG 38
Synopsis/NNPS :/: A/DT mentally/RB unstable/JJ man/NN undergoing/VBG psychotherapy/NN saves/VBZ a/DT boy/NN from/IN a/DT potentially/RB fatal/JJ accident/NN and/CC then/RB falls/VBZ in/IN love/NN with/IN the/DT boy/NN 's/POS mother/NN ,/, a/DT fledgling/NN restauranteur/NN ./.
#types 47743
Most common tokens:
       

Symbolic approach – sentiment lexicon (2pts)
---------------------------------------------------------------------



**How** could one automatically classify movie reviews according to their
sentiment? 

If we had access to a **sentiment lexicon**, then there are ways to solve
the problem without using Machine Learning. One might simply look up
every open-class word in the lexicon, and compute a binary score
$S_{binary}$ by counting how many words match either a positive, or a
negative word entry in the sentiment lexicon $SLex$.

$$S_{binary}(w_1w_2...w_n) = \sum_{i = 1}^{n}\text{sgn}(SLex\big[w_i\big])$$

**Threshold.** In average there are more positive than negative words per review (~7.13 more positive than negative per review) to take this bias into account you should use a threshold of **8** (roughly the bias itself) to make it harder to classify as positive.

$$
\text{classify}(S_{binary}(w_1w_2...w_n)) = \bigg\{\begin{array}{ll}
        \text{positive} & \text{if } S_{binary}(w_1w_2...w_n) > threshold\\
        \text{negative} & \text{else }
        \end{array}
$$

To implement this approach, you should use the sentiment
lexicon in `sent_lexicon`, which was taken from the
following work:

> Theresa Wilson, Janyce Wiebe, and Paul Hoffmann
(2005). [Recognizing Contextual Polarity in Phrase-Level Sentiment
Analysis](http://www.aclweb.org/anthology/H/H05/H05-1044.pdf). HLT-EMNLP.

#### (Q: 1.1) Implement this approach and report its classification accuracy. (1 pt)

In [0]:
# YOUR CODE HERE
from collections import defaultdict

def create_lexicon(filepath = "sent_lexicon", scaling = 1.5):

    with open(filepath) as f:
    #lex = [[line.split(' ')[2].rsplit('=')[1], -1 if line.split(' ')[5].rsplit('=')[1] == "negative\n" else 1 if line.split(' ')[5].rsplit('=')[1] == "positive\n" else 0] for line in f]
        lex_score = defaultdict(int, {line.split(' ')[2].rsplit('=')[1] : (-1 if "negative" in line.split(' ')[5].rsplit('=')[1].lower() else 1 if "positive" in line.split(' ')[5].rsplit('=')[1].lower() else 0) for line in f})
    with open(filepath) as f:  
        lex_scale = defaultdict(float, {line.split(' ')[2].rsplit('=')[1] : (scaling if "strong" in line.split(' ')[0].rsplit('=')[1] else 1) for line in f})
    return lex_score, lex_scale

def SBinaryClassification(reviews, thresh = 8, scl = False, sf = 1.5):
    lexicon, scale_mult = create_lexicon(scaling = sf)
    #print(lexicon)
    SB = []
    for review in reviews:
        c = Counter()
        score = 0
        for sentence in review["content"]:
      #print(sentence)
            for word in sentence:
                if scl:
                  #print(scale_mult[word[0]])
                    score += (lexicon[word[0]] * scale_mult[word[0]])
                else:
                    score += lexicon[word[0]]
        SB.append(score)
    classification = ['POS' if s > thresh else 'NEG' for s in SB]
    return classification

def SBClassifyScoring(SB, true_ID):
  
    Comp = [1 if SB[i] == true_ID[i] else 0 for i in range(0,len(true_ID))]
    score = sum(Comp)/len(Comp)
    return score,Comp

#(Accuracy should be over 60 or 70)

In [0]:
predictions = SBinaryClassification(reviews)
token_accuracy,token_results = SBClassifyScoring(predictions, [review['sentiment'] for review in reviews])
# print(token_results)
# print(sum(token_results)/2000)
# token_results = SBinaryClassification(reviews)
# token_accuracy = SBClassifyScoring(token_results, [review['sentiment'] for review in reviews])
print("Accuracy: %0.2f" % token_accuracy)

Accuracy: 0.68


If the sentiment lexicon also has information about the **magnitude** of
sentiment (e.g., *“excellent"* would have higher magnitude than
*“good"*), we could take a more fine-grained approach by adding up all
sentiment scores, and deciding the polarity of the movie review using
the sign of the weighted score $S_{weighted}$.

$$S_{weighted}(w_1w_2...w_n) = \sum_{i = 1}^{n}SLex\big[w_i\big]$$


Their lexicon also records two possible magnitudes of sentiment (*weak*
and *strong*), so you can implement both the binary and the weighted
solutions (please use a switch in your program). For the weighted
solution, you can choose the weights intuitively *once* before running
the experiment.

#### (Q: 1.2) Now incorporate magnitude information and report the classification accuracy. Don't forget to use the threshold. (1 pt)

In [0]:
# find best value for sf

for i in np.arange(1.2,1.5,0.01):
    print(i)
    predictions = SBinaryClassification(reviews, scl = True, sf = i)
    magnitude_accuracy,magnitude_results = SBClassifyScoring(predictions, [review['sentiment'] for review in reviews])
    # magnitude_results = SBinaryClassification(reviews, scl = True, sf = i)
    # magnitude_accuracy = SBClassifyScoring(magnitude_results, [review['sentiment'] for review in reviews])
    print("Accuracy: %0.2f" % magnitude_accuracy)

1.2
Accuracy: 0.69
1.21
Accuracy: 0.69
1.22
Accuracy: 0.69
1.23
Accuracy: 0.68
1.24
Accuracy: 0.68
1.25
Accuracy: 0.69
1.26
Accuracy: 0.68
1.27
Accuracy: 0.69
1.28
Accuracy: 0.69
1.29
Accuracy: 0.68
1.3
Accuracy: 0.68
1.31
Accuracy: 0.69
1.32
Accuracy: 0.69
1.33
Accuracy: 0.69
1.34
Accuracy: 0.69
1.35
Accuracy: 0.69
1.36
Accuracy: 0.69
1.37
Accuracy: 0.69
1.3800000000000001
Accuracy: 0.69
1.3900000000000001
Accuracy: 0.69
1.4000000000000001
Accuracy: 0.69
1.4100000000000001
Accuracy: 0.69
1.4200000000000002
Accuracy: 0.69
1.4300000000000002
Accuracy: 0.69
1.4400000000000002
Accuracy: 0.69
1.4500000000000002
Accuracy: 0.69
1.4600000000000002
Accuracy: 0.69
1.4700000000000002
Accuracy: 0.69
1.4800000000000002
Accuracy: 0.69
1.4900000000000002
Accuracy: 0.69
1.5000000000000002
Accuracy: 0.69


In [0]:
predictions = SBinaryClassification(reviews, scl = True, sf = 1.37)
magnitude_accuracy, magnitude_results = SBClassifyScoring(predictions, [review['sentiment'] for review in reviews])
print("Accuracy: %0.2f" % magnitude_accuracy)

Accuracy: 0.69


#### Optional: make a barplot of the two results.

In [0]:
# YOUR CODE HERE;


Answering questions in statistically significant ways (1pt)
-------------------------------------------------------------

Does using the magnitude improve the results? Oftentimes, answering questions like this about the performance of
different signals and/or algorithms by simply looking at the output
numbers is not enough. When dealing with natural language or human
ratings, it’s safe to assume that there are infinitely many possible
instances that could be used for training and testing, of which the ones
we actually train and test on are a tiny sample. Thus, it is possible
that observed differences in the reported performance are really just
noise. 

There exist statistical methods which can be used to check for
consistency (*statistical significance*) in the results, and one of the
simplest such tests is the **sign test**. 

The sign test is based on the binomial distribution. Count all cases when System 1 is better than System 2, when System 2 is better than System 1, and when they are the same. Call these numbers $Plus$, $Minus$ and $Null$ respectively. 

The sign test returns the probability that the null hypothesis is true. 

This probability is called the $p$-value and it can be calculated for the two-sided sign test using the following formula (we multiply by two because this is a two-sided sign test and tests for the significance of differences in either direction):

$$2 \, \sum\limits_{i=0}^{k} \binom{N}{i} \, q^i \, (1-q)^{N-i}$$

where $$N = 2 \Big\lceil \frac{Null}{2}\Big\rceil + Plus + Minus$$ is the total
number of cases, and
$$k = \Big\lceil \frac{Null}{2}\Big\rceil + \min\{Plus,Minus\}$$ is the number of
cases with the less common sign. 

In this experiment, $q = 0.5$. Here, we
treat ties by adding half a point to either side, rounding up to the
nearest integer if necessary. 


#### (Q 2.1): Implement the sign test. Is the difference between the two symbolic systems significant? What is the p-value? (1 pt)

You should use the `comb` function from `scipy` and the `decimal` package for the stable adding of numbers in the final summation.

You can quickly verify the correctness of
your sign test code using a [free online
tool](https://www.graphpad.com/quickcalcs/binomial1.cfm).

In [0]:
from decimal import Decimal
from scipy.special import comb


def sign_test(results_1, results_2):
    """test for significance
    results_1 is a list of classification results (+ for correct, - incorrect)
    results_2 is a list of classification results (+ for correct, - incorrect)
    """
    ties, plus, minus = 0, 0, 0

    # "-" carries the errorabs
    
    for i in range(0, len(results_1)):
        if results_1[i] == results_2[i]:
            ties += 1
        elif results_1[i]== 0: # - 
            plus += 1
        elif results_2[i]== 0: # - 
            minus += 1
    #print(ties,minus,plus)
    n = int(2 * ties/2 + minus + plus) # YOUR CODE HERE
    k =  int(ties/2 + min([plus,minus])) # YOUR CODE HERE  int(ties/2) +
    
#     print(n,k)
    summation = Decimal(0.0)
    for i in range(0,int(k)+1):
        summation += Decimal(comb(n,i,exact = True)) #* (.5**i) * (.5**(n-i)))# YOUR CODE HERE

    # use two-tailed version of test
    summation *= 2
    summation *= (Decimal(0.5)**Decimal(n))
    print("the difference is", "not significant" if summation >= 0.05 else "significant")

    return summation

p_value = sign_test(token_results, magnitude_results)
print(p_value)

the difference is not significant
0.8057148676803825624105038557


In [0]:
print(token_results[:100])
print(magnitude_results[:100])

[0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]


## Using the Sign test

**From now on, report all differences between systems using the
sign test.** You can think about a change that you apply to one system, as a
 new system.
    
You should report statistical test
results in an appropriate form – if there are several different methods
(i.e., systems) to compare, tests can only be applied to pairs of them
at a time. This creates a triangular matrix of test results in the
general case. When reporting these pair-wise differences, you should
summarise trends to avoid redundancy.


Naive Bayes (8pt + 1pt bonus)
==========


Your second task is to program a simple Machine Learning approach that operates
on a simple Bag-of-Words (BoW) representation of the text data, as
described in Pang et al. (2002). In this approach, the only features we
will consider are the words in the text themselves, without bringing in
external sources of information. The BoW model is a popular way of
representing text information as vectors (or points in space), making it
easy to apply classical Machine Learning algorithms on NLP tasks.
However, the BoW representation is also very crude, since it discards
all information related to word order and grammatical structure in the
original text.

## Writing your own classifier

Write your own code to implement the Naive Bayes (NB) classifier. As
a reminder, the Naive Bayes classifier works according to the following
equation:
$$\hat{c} = \operatorname*{arg\,max}_{c \in C} P(c|\bar{f}) = \operatorname*{arg\,max}_{c \in C} P(c)\prod^n_{i=1} P(f_i|c)$$
where $C = \{ \text{POS}, \text{NEG} \}$ is the set of possible classes,
$\hat{c} \in C$ is the most probable class, and $\bar{f}$ is the feature
vector. Remember that we use the log of these probabilities when making
a prediction:
$$\hat{c} = \operatorname*{arg\,max}_{c \in C} \Big\{\log P(c) + \sum^n_{i=1} \log P(f_i|c)\Big\}$$

You can find more details about Naive Bayes in [Jurafsky &
Martin](https://web.stanford.edu/~jurafsky/slp3/). You can also look at
this helpful
[pseudo-code](https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html).

*Note: this section and the next aim to put you a position to replicate
    Pang et al., Naive Bayes results. However, the numerical results
    will differ from theirs, as they used different data.*

**You must write the Naive Bayes training and prediction code from
scratch.** You will not be given credit for using off-the-shelf Machine
Learning libraries.

The data contains the text of the reviews, where each document consists
of the sentences in the review, the sentiment of the review and an index
(cv) that you will later use for cross-validation. You will find the
text has already been tokenised and POS-tagged for you. Your algorithm
should read in the text, **lowercase it**, and store the words and their
frequencies in an appropriate data structure that allows for easy
computation of the probabilities used in the Naive Bayes algorithm, and
then make predictions for new instances.

#### (Q3.1) Train your classifier on (positive and negative) reviews with cv-value 000-899, and test it on the remaining reviews cv900–cv999.  Report results using simple classification accuracy as your evaluation metric. Your  features are the word vocabulary. The value of a feature is the count of that feature (word) in the document. (2pts)


In [0]:
train  = reviews[0:900]
sentement = [review['sentiment'] for review in train]

In [0]:
elements = c.elements()
test = []
for value in elements:
  test.append(value)

s_test = set(test)
print(s_test)



In [0]:
# YOUR CODE HERE
import math
def get_unique_from_counter(cnt):
    test = []
    for value in cnt.elements():
        test.append(value)
    return set(test)

def train_test_split(reviews):
    train = reviews[0:900]+reviews[1000:1900]
    # train.append(reviews[1000:1900])
    test = reviews[900:1000] + reviews[1900:1999]
    # test.append(reviews[1900:1999])
    return train, test

def calc_pior(tr_rv):
    sentement = [review['sentiment'] for review in tr_rv]
    pr_neg = math.log(sentement.count('NEG')/len(sentement))
    pr_pos = math.log(sentement.count('POS')/len(sentement))
    return pr_neg, pr_pos

def calc_likelihood(tr_rv, smoothing = False, print_len = False):
    c_total = Counter()
    c_neg = Counter()
    c_pos = Counter()
    
    for rev in tr_rv:
        if rev['sentiment'] == 'NEG':
            for sentence in rev['content']:
                for token,pos in sentence:
                    c_total[token.lower()] += 1
                    c_neg[token.lower()] += 1
        else:
             for sentence in rev['content']:
                for token,pos in sentence:
                    c_total[token.lower()] += 1
                    c_pos[token.lower()] += 1           
                    

    num_words = len(list(c_total.keys()))
    
    p_dict_pos = {}
    p_dict_neg = {}
    
    if smoothing:
        # convert to logprobabilities : 
        for key,value in c_total.items():
            try:
                p_dict_neg[key] = math.log((c_neg[key] + 1)/(c_total[key] + num_words))
#             except:
#                 pass
            except:
                p_dict_neg[key] = math.log((1)/(c_total[key] + num_words))
            try:
                p_dict_pos[key] = math.log((c_pos[key] + 1)/(c_total[key] + num_words))
#             except:
#                 pass
            except:
                p_dict_pos[key] = math.log((1)/(c_total[key] + num_words))
                
    else:
        # convert to logprobabilities : 
        for key,value in c_total.items():
            try:
                p_dict_neg[key] = math.log((c_neg[key])/(c_total[key]))
            except:
                pass
#             except:
#                 p_dict_neg[key] = math.log((1)/(c_total[key]))
            try:
                p_dict_pos[key] = math.log((c_pos[key])/(c_total[key]))
            except:
                pass
#             except:
#                 p_dict_pos[key] = math.log((1)/(c_total[key]))
                                           

    
    if print_len:
        print(len(list(c_total.keys())))
    return p_dict_neg,p_dict_pos

def train_naive(train, test,smoothing,print_len = False):
    prior_neg, prior_pos = calc_pior(train)
    like_neg, like_pos = calc_likelihood(train,smoothing, print_len)
#     for key,value in list(like_neg.items())[:100]:
#         print(key,value)
    return prior_neg, prior_pos, like_neg, like_pos

def naive_classification(review, P_pos, P_neg, P_ln, P_lp):
    prob_pos = P_pos
    prob_neg = P_neg
    random_likelihood = math.log(.5)
    for sentence in review["content"]:
        for word in sentence:
            try:
                prob_pos += P_lp[word[0].lower()] #prob_pos + P_lp[word[0]]
            except: 
                prob_pos += random_likelihood # unknown words have .5 probability of being positive or negative
            try:
                prob_neg += P_ln[word[0].lower()] # prob_neg + P_ln[word[0]]
            except:
                prob_pos += random_likelihood
                
    if prob_pos > prob_neg: #>
        return 'POS'
    else:
        return 'NEG'

def predict_naive_class(t_r, P_pos, P_neg, P_ln, P_lp):  
    predicts = [naive_classification(review, P_pos, P_neg, P_ln, P_lp) for review in t_r]
    return predicts

def Naive_Accuracy(reviews, smoothing = False):
    r_train, r_test = train_test_split(reviews)
    test_sent = [review['sentiment'] for review in r_test]

    P_pos, P_neg, P_ln, P_lp = train_naive(r_train, r_test, smoothing)
    predictions = predict_naive_class(r_test,P_pos, P_neg, P_ln, P_lp)

        
    # score ; 
    accuracy = 0
    hits = []
    for i,pred in enumerate(predictions):
        if pred == test_sent[i]:
            accuracy +=1
            hits.append(1)
        else:
            hits.append(0)
    accuracy = accuracy/len(predictions)
    return accuracy, hits
    

acc, hitsNB = Naive_Accuracy(reviews,smoothing = False)
print('Accuracy of naive bayes classifier without smoothing : ',acc)

Accuracy of naive bayes classifier without smoothing :  0.6834170854271356


#### (Bonus Questions) Would you consider accuracy to also be a good way to evaluate your classifier in a situation where 90% of your data instances are of positive movie reviews? (1pt)

You can simulate this scenario by keeping the positive reviews
data unchanged, but only using negative reviews cv000–cv089 for
training, and cv900–cv909 for testing. Calculate the classification
accuracy, and explain what changed.

In [0]:
# YOUR CODE HERE


## Smoothing

The presence of words in the test dataset that
haven’t been seen during training can cause probabilities in the Naive
Bayes classifier to be $0$, thus making that particular test instance
undecidable. The standard way to mitigate this effect (as well as to
give more clout to rare words) is to use smoothing, in which the
probability fraction
$$\frac{\text{count}(w_i, c)}{\sum\limits_{w\in V} \text{count}(w, c)}$$ for a word
$w_i$ becomes
$$\frac{\text{count}(w_i, c) + \text{smoothing}(w_i)}{\sum\limits_{w\in V} \text{count}(w, c) + \sum\limits_{w \in V} \text{smoothing}(w)}$$





#### (Q3.2) Implement Laplace feature smoothing (1pt)
($smoothing(\cdot) = \kappa$, constant for all words) in your Naive
Bayes classifier’s code, and report the impact on performance. 
Use $\kappa = 1$.

In [0]:
# YOUR CODE HERE
acc, hitsSMOOTH = Naive_Accuracy(reviews, smoothing = True)
print('Accuracy of naive bayes classifier with smoothing : ',acc)

Accuracy of naive bayes classifier with smoothing :  0.6331658291457286


#### (Q3.3) Is the difference between non smoothed (Q3.1) and smoothed (Q3.2) statistically significant? (0.5pt)

In [0]:
# YOUR CODE HERE
sign_test(hitsSMOOTH,hitsNB)

the difference is not significant


Decimal('0.4784912009825472679858051753')

## Cross-validation

A serious danger in using Machine Learning on small datasets, with many
iterations of slightly different versions of the algorithms, is that we
end up with Type III errors, also called the “testing hypotheses
suggested by the data” errors. This type of error occurs when we make
repeated improvements to our classifiers by playing with features and
their processing, but we don’t get a fresh, never-before seen test
dataset every time. Thus, we risk developing a classifier that’s better
and better on our data, but worse and worse at generalizing to new,
never-before seen data.

A simple method to guard against Type III errors is to use
cross-validation. In N-fold cross-validation, we divide the data into N
distinct chunks / folds. Then, we repeat the experiment N times, each
time holding out one of the chunks for testing, training our classifier
on the remaining N - 1 data chunks, and reporting performance on the
held-out chunk. We can use different strategies for dividing the data:

-   Consecutive splitting:
  - cv000–cv099 = Split 1
  - cv100–cv199 = Split 2
  - etc.
  
-   Round-robin splitting (mod 10):
  - cv000, cv010, cv020, … = Split 1
  - cv001, cv011, cv021, … = Split 2
  - etc.

-   Random sampling/splitting
  - Not used here (but you may choose to split this way in a non-educational situation)

#### (Q3.4) Write the code to implement 10-fold cross-validation using round-robin splitting for your Naive Bayes classifier from Q3.2 and compute the 10 accuracies. Report the final performance, which is the average of the performances per fold. If all splits perform equally well, this is a good sign. (1pt)






In [0]:
# YOUR CODE HERE
from functools import reduce
def Round_Robin_split(l):
    splits = []
    for i in range(0,10):
        indeces = [x+i for x in range(len(l) - 1) if x%10 == 0.]
        splits.append([l[j] for j in indeces])
    return splits

split = Round_Robin_split(reviews)

def RR_crosvall(reviews):
    accuracies = []
    blocks = Round_Robin_split(reviews)
    for i in range(10):
        train = reduce(lambda x,y:x+y, [blocks[j] for j in range(10) if j != i])
        test = blocks[i]
        # train model:
        test_sent = [review['sentiment'] for review in test]
        P_pos, P_neg, P_ln, P_lp = train_naive(train, test, smoothing = True)
        
        # predict:
        predictions = predict_naive_class(test,P_pos, P_neg, P_ln, P_lp)
        
        # score ; 
        accuracy = 0
        for i,pred in enumerate(predictions):
            if pred == test_sent[i]:
                accuracy +=1
        accuracy = accuracy/len(predictions)
        accuracies.append(accuracy)
        
    return accuracies
    
accuracies = RR_crosvall(reviews)
print('Average accuracy over 10 fold Round Robing cross-validation : ',np.average(accuracies))

Average accuracy over 10 fold Round Robing cross-validation :  0.622


In [0]:
accuracies

[0.64, 0.62, 0.635, 0.68, 0.59, 0.625, 0.64, 0.605, 0.595, 0.59]

In [0]:
# split = [x + 9 for x in range(2000 - 1) if x%10 == 0.]
# print(len(split))
# split

#### (Q3.5) Write code to calculate and report variance, in addition to the final performance. (1pt)

**Please report all future results using 10-fold cross-validation now
(unless told to use the held-out test set).**

In [0]:
# YOUR CODE HERE

print('Variance over 10 fold Round Robing cross-validation : ',np.var(accuracies))

Variance over 10 fold Round Robing cross-validation :  0.0007260000000000013


## Features, overfitting, and the curse of dimensionality

In the Bag-of-Words model, ideally we would like each distinct word in
the text to be mapped to its own dimension in the output vector
representation. However, real world text is messy, and we need to decide
on what we consider to be a word. For example, is “`word`" different
from “`Word`", from “`word`”, or from “`words`"? Too strict a
definition, and the number of features explodes, while our algorithm
fails to learn anything generalisable. Too lax, and we risk destroying
our learning signal. In the following section, you will learn about
confronting the feature sparsity and the overfitting problems as they
occur in NLP classification tasks.

#### (Q3.6): A touch of linguistics (1pt)

Taking a step further, you can use stemming to
hash different inflections of a word to the same feature in the BoW
vector space. How does the performance of your classifier change when
you use stemming on your training and test datasets? Please use the [Porter stemming
    algorithm](http://www.nltk.org/howto/stem.html) from NLTK.
 Also, you should do cross validation and concatenate the predictions from all folds to compute the significance.

In [0]:
# YOUR CODE HERE

def likelihood_stemmed(tr_rv, smoothing = False, print_len = False):
    stemmer = PorterStemmer()
    c_total = Counter()
    c_neg = Counter()
    c_pos = Counter()
    
    for rev in tr_rv:
        if rev['sentiment'] == 'NEG':
            for sentence in rev['content']:
                for token,pos in sentence:
                    c_total[stemmer.stem(token.lower())] += 1
                    c_neg[stemmer.stem(token.lower())] += 1
        else:
             for sentence in rev['content']:
                for token,pos in sentence:
                    c_total[stemmer.stem(token.lower())] += 1
                    c_pos[stemmer.stem(token.lower())] += 1           
                    

    num_words = len(list(c_total.keys()))
    
    p_dict_pos = {}
    p_dict_neg = {}
    
    if smoothing:
        # convert to logprobabilities : 
        for key,value in c_total.items():
            try:
                p_dict_neg[key] = math.log((c_neg[key] + 1)/(c_total[key] + num_words))
            except:
                p_dict_neg[key] = math.log((1)/(c_total[key] + num_words))
            try:
                p_dict_pos[key] = math.log((c_pos[key] + 1)/(c_total[key] + num_words))
            except:
                p_dict_pos[key] = math.log((1)/(c_total[key] + num_words))
                
    else:
        # convert to logprobabilities : 
        for key,value in c_total.items():
            try:
                p_dict_neg[key] = math.log((c_neg[key])/(c_total[key]))
            except:
                pass
            try:
                p_dict_pos[key] = math.log((c_pos[key])/(c_total[key]))
            except:
                pass
                                           
    
    if print_len:
        print(len(list(c_total.keys())))
    return p_dict_neg,p_dict_pos

def stem_classification(review, P_pos, P_neg, P_ln, P_lp):
    stemmer = PorterStemmer()
    prob_pos = P_pos
    prob_neg = P_neg
    random_likelihood = math.log(.5)
    for sentence in review["content"]:
        for word in sentence:
            try:
                prob_pos += P_lp[stemmer.stem(word[0].lower())] #prob_pos + P_lp[word[0]]
            except: 
                prob_pos += random_likelihood # unknown words have .5 probability of being positive or negative
            try:
                prob_neg += P_ln[stemmer.stem(word[0].lower())] # prob_neg + P_ln[word[0]]
            except:
                prob_pos += random_likelihood
                
    if prob_pos > prob_neg: #>
        return 'POS'
    else:
        return 'NEG'

def predict_stem(t_r, P_pos, P_neg, P_ln, P_lp):  
    predicts = [stem_classification(review, P_pos, P_neg, P_ln, P_lp) for review in t_r]
    return predicts
    
def train_stem(train, test,smoothing, print_len = False):
    prior_neg, prior_pos = calc_pior(train)
    like_neg, like_pos = likelihood_stemmed(train,smoothing, print_len)
    return prior_neg, prior_pos, like_neg, like_pos


def RR_crosvall_stemmed(reviews, smoothing = True, stemming = True):
    accuracies = []
    hits = []
    blocks = Round_Robin_split(reviews)
    for i in range(10):
        train = reduce(lambda x,y:x+y, [blocks[j] for j in range(10) if j != i])
        test = blocks[i]
        # train model:
        test_sent = [review['sentiment'] for review in test]
        if stemming:
            P_pos, P_neg, P_ln, P_lp = train_stem(train, test, smoothing)
            predictions = predict_stem(test,P_pos, P_neg, P_ln, P_lp)
        else:
            P_pos, P_neg, P_ln, P_lp = train_naive(train, test, smoothing)
            predictions = predict_naive_class(test,P_pos, P_neg, P_ln, P_lp)

        # score ; 
        accuracy = 0
        for i,pred in enumerate(predictions):
            if pred == test_sent[i]:
                accuracy +=1
                hits.append(1)
            else:
                hits.append(0)
        accuracy = accuracy/len(predictions)
        accuracies.append(accuracy)
        
    return accuracies,hits

acc_smooth, hits_smooth = RR_crosvall_stemmed(reviews,smoothing = True, stemming = False)
acc_smooth_stem, hits_smooth_stem = RR_crosvall_stemmed(reviews,smoothing = True, stemming = True)

In [0]:
print('NO STEMMING : ','average accuracy : ',np.average(acc_smooth),'variance : ',np.var(acc_smooth))
print('STEMMING : ','average accuracy : ',np.average(acc_smooth_stem),'variance : ',np.var(acc_smooth_stem))


NO STEMMING :  average accuracy :  0.622 variance :  0.0007260000000000013
STEMMING :  average accuracy :  0.5740000000000001 variance :  0.00021899999999999941


#### (Q3.7): Is the difference between NB with smoothing and NB with smoothing+stemming significant? (0.5pt)


In [0]:
# YOUR ANSWER HERE
sign_test(hits_smooth,hits_smooth_stem)

the difference is not significant


Decimal('0.6386701160706534185506881717')

#### Q3.8: What happens to the number of features (i.e., the size of the vocabulary) when using stemming as opposed to (Q3.2)? (0.5pt)
Give actual numbers. You can use the held-out training set to determine these.

In [0]:
# YOUR CODE HERE
print('Size of vocabulary without stemming : ')
calc_likelihood(reviews, smoothing = True, print_len = True)
print('Size of vocabulary with stemming : ')
likelihood_stemmed(reviews,smoothing = True, print_len = True)

Size of vocabulary without stemming : 
using add-1 smoothing
47743
Size of vocabulary with stemming : 
using add-1 smoothing
34200


(defaultdict(float,
             {'two': -3.735949641162258,
              'teen': -5.800490278727164,
              'coupl': -4.952256447956529,
              'go': -3.598627239772872,
              'to': -1.4597630804413808,
              'a': -1.4014398919492985,
              'church': -7.146217928479448,
              'parti': -5.830662256330254,
              ',': -1.154169644586729,
              'drink': -6.530440214290318,
              'and': -1.4953662592577721,
              'then': -3.81685585498595,
              'drive': -5.7457383715206864,
              '.': -1.1880498216755462,
              'they': -2.7535376240690344,
              'get': -3.100531134186956,
              'into': -3.3809457717992486,
              'an': -2.698176532736524,
              'accid': -6.729678359492409,
              'one': -2.696271736559229,
              'of': -1.4859596009501168,
              'the': -1.1522362652895857,
              'guy': -4.221732527109816,
              'die': -

#### Q3.9: Putting some word order back in (0.5+0.5pt=1pt)

A simple way of retaining some of the word
order information when using bag-of-words representations is to add **n-grams** features. 
Retrain your classifier from (Q3.4) using **unigrams+bigrams** and
**unigrams+bigrams+trigrams** as features, and report accuracy and statistical significances (in comparison to the experiment at (Q3.4) for all 10 folds, and between the new systems).





In [0]:
# YOUR CODE HERE


#### Q3.10: How many features does the BoW model have to take into account now? (0.5pt)
How does this number compare (e.g., linear, square, cubed, exponential) to the number of features at (Q3.8)? 

Use the held-out training set once again for this.


In [0]:
# YOUR CODE HERE

# Support Vector Machines (4pts)


Though simple to understand, implement, and debug, one
major problem with the Naive Bayes classifier is that its performance
deteriorates (becomes skewed) when it is being used with features which
are not independent (i.e., are correlated). Another popular classifier
that doesn’t scale as well to big data, and is not as simple to debug as
Naive Bayes, but that doesn’t assume feature independence is the Support
Vector Machine (SVM) classifier.

You can find more details about SVMs in Chapter 7 of Bishop: Pattern Recognition and Machine Learning.
Other sources for learning SVM:
* http://web.mit.edu/zoya/www/SVM.pdf
* http://www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf
* https://pythonprogramming.net/support-vector-machine-intro-machine-learning-tutorial/







Use the scikit-learn implementation of 
[SVM.](http://scikit-learn.org/stable/modules/svm.html) with the default parameters.



#### (Q4.1): Train SVM and compare to Naive Bayes (2pt)

Train an SVM classifier (sklearn.svm.LinearSVC) using your features. Compare the
classification performance of the SVM classifier to that of the Naive
Bayes classifier from (Q3.4) and report the numbers.
Do cross validation and concatenate the predictions from all folds to compute the significance.  Are the results significantly better?



In [0]:
import random
# YOUR CODE HERE

# ## convert reviews to vector representation:


def revs_to_vecs(reviews,lexicon, POS = False):
    # template for vectors:
    words = lexicon.keys()
    vectors = []
    for rev in reviews:
    # count word frequencies in reviews :
        c = Counter()
        for sentence in rev["content"]:
            for token, pos_tag in sentence:
                if POS:
                    c[(token.lower() + pos_tag)] += 1
                else:
                    c[token.lower()] += 1

        vector = np.zeros(len(words))
        for i,word in enumerate(words):
            try:
                vector[i] = c[word]
            except:
                pass
         # convert classes to binary
        if rev['sentiment'] == 'NEG':
            sentiment = 0
        else:
            sentiment = 1
        vectors.append((sentiment,vector))
   
    return vectors
lexicon = create_lexicon()[0]
vecs = revs_to_vecs(reviews,lexicon)
# shuffle to make balanced classes : 
random.shuffle(vecs)
# unzip:
classes, revs = list(zip(*vecs))
# convert to lists :
classes = list(classes)
revs = list(revs)

In [0]:
from functools import reduce
from sklearn.svm import LinearSVC
# test some accuracy in cross-validation
revs_split = Round_Robin_split(revs)
classes_split = Round_Robin_split(classes)


def SVM_crossval(revs_split,classes_split):

    cross_results = []
    for i in range(10):
        # train test split : 
        x_train = reduce(lambda x,y: x+y,[revs_split[x] for x in range(10) if x != i])
        y_train = reduce(lambda x,y: x+y,[classes_split[x] for x in range(10) if x != i])

        x_test = revs_split[i]
        y_test = classes_split[i]

        # train svm 
        clf = LinearSVC(max_iter = 2000)
        clf.fit(x_train,y_train)
        accuracy = clf.score(x_test,y_test)
        cross_results.append(accuracy)
    return cross_results

In [0]:
result = SVM_crossval(revs_split,classes_split)
print("SVM accuracy over 10 fold crossvalidation : ")
print("accuracy : ",np.average(result),'variance : ', np.var(result))

SVM accuracy over 10 fold crossvalidation : 
accuracy :  0.8074999999999999 variance :  0.001091249999999999


### More linguistics

Now add in part-of-speech features. You will find the
movie review dataset has already been POS-tagged for you. Try to
replicate what Pang et al. were doing:



####(Q4.2) Replace your features with word+POS features, and report performance with the SVM. Does this help? Do cross validation and concatenate the predictions from all folds to compute the significance. Are the results significant? Why?  (1pt)


In [0]:
# YOUR CODE HERE
def construct_dictonary(reviews):
    word_dict = {}
    for rev in reviews:
        # extract sentiment
        s = rev['sentiment']
        if s == 'NEG':
            s = 0
        else:
            s = 1
    
    # loop over sentences and words : 
        for sent in rev['content']:
            for wt_pair in sent:
                word = wt_pair[0].lower()
                word = word + wt_pair[1]
                if word in word_dict.keys():
                    word_dict[word] += s
                else:
                    word_dict[word] = s
    return word_dict

lexicon_POS = construct_dictonary(reviews)
vecs = revs_to_vecs(reviews,lexicon_POS, POS = True)
# shuffle to make balanced classes : 
random.shuffle(vecs)
# unzip:
classes, revs = list(zip(*vecs))
# convert to lists :
classes = list(classes)
revs = list(revs)

revs_split = Round_Robin_split(revs)
classes_split = Round_Robin_split(classes)


In [0]:
result = SVM_crossval(revs_split,classes_split)
print("SVM accuracy over 10 fold crossvalidation with POS tag: ")
print("accuracy : ",np.average(result),'variance : ', np.var(result))



SVM accuracy over 10 fold crossvalidation with POS tag: 
accuracy :  0.8355 variance :  0.0005522499999999991




[0.485, 0.495, 0.475, 0.495, 0.49, 0.48, 0.5, 0.46, 0.485, 0.465]


#### (Q4.3) Discard all closed-class words from your data (keep only nouns (N*), verbs (V*), adjectives (J*) and adverbs (RB*)), and report performance. Does this help? Do cross validation and concatenate the predictions from all folds to compute the significance. Are the results significantly better than when we don't discard the closed-class words? Why? (1pt)

In [0]:
# YOUR CODE HERE
def construct_dictonary(reviews):
    word_dict = {}
    for rev in reviews:
        # extract sentiment
        s = rev['sentiment']
        if s == 'NEG':
            s = 0
        else:
            s = 1
    
    # loop over sentences and words : 
        for sent in rev['content']:
            for wt_pair in sent:
                word = wt_pair[0].lower()
                word = word + wt_pair[1]
                if wt_pair[1] in ['N','V','J','RB']:
                    if word in word_dict.keys():
                        word_dict[word] += s
                    else:
                        word_dict[word] = s
    return word_dict

lexicon_POS = construct_dictonary(reviews)
vecs = revs_to_vecs(reviews,lexicon_POS, POS = True)
# shuffle to make balanced classes : 
random.shuffle(vecs)
# unzip:
classes, revs = list(zip(*vecs))
# convert to lists :
classes = list(classes)
revs = list(revs)

revs_split = Round_Robin_split(revs)
classes_split = Round_Robin_split(classes)


result = SVM_crossval(revs_split,classes_split)
print("SVM accuracy over 10 fold crossvalidation with POS tag: ")
print("accuracy : ",np.average(result),'variance : ', np.var(result))



SVM accuracy over 10 fold crossvalidation with POS tag: 
accuracy :  0.716 variance :  0.0006590000000000002




# (Q5) Discussion (max. 500 words). (5pts)

> Based on your experiments, what are the effective features and techniques in sentiment analysis? What information do different features encode?
Why is this important? What are the limitations of these features and techniques?
 


Based on experiments performed in this notebook, using a SVM for predicting sentiments in movie review seems the most direct and successful.
Naive bayes does already perform well above change even without any preprocessing and just using single word frequencies. Surprisingly smoothing did not actually improve accuracy. Probably this is due to the way the model handles words it hasn't seen in training. Here I chose to assume unknown words have an equal probability of being negative or positive. Smoothing actually biases the class of very infrequent words quite strongly based on only one observation.Results in this notebook suggest that assigning very unfrequent words an equal probability for every class actually works better then giving it a very high probability for the class is was actually seen in. 
Stemming decreased performance in naive bayes possibly due to the fact that some forms of words hold information about there context, and stemming actually can discard this information.
Unfortunately experiments using di- or tri-grams were not performed due to unlimited time. However most likely it would have resulted in a small increase in performance. One major drawback of using di- or tri-grams is the drastic increase in size of the lexicon. This increase in lexicon size also leads to a decrease in counts for specific entries in the lexicon. Hence to have the same accuracy on estimates for conditional probability, more training data would be needed. 
Using a SVM to predict review sentiments in a very straightforward manner, no preprocessing or 2/3-grams, does result in much higher accuracy then a simple naive bayes model. Most probably this is because using vector representations of reviews leaves more context in place and there for some effect of combinations of words, for example negation by not, can be learned by the model. 




# Submission 


In [0]:
# Write your names and student numbers here:
Laurens Stuurman 10648119



In [0]:
stemmer = PorterStemmer()
stemmer.stem('grateful')

'grate'

**That's it!**

- Check if you answered all questions fully and correctly. 
- Download your completed notebook using `File -> Download .ipynb` 
- Also save your notebook as a Github Gist. Get it by choosing `File -> Save as Github Gist`.  Make sure that the gist has a secret link (not public).
- Check if your answers are all included in the file you submit (e.g. check the Github Gist URL)
- Submit your .ipynb file and link to the Github Gist via *Canvas*. One submission per group. 

In [0]:
#     sentement = [review['sentiment'] for review in tr_rv]
#     for i in range(0,len(sentement)):
#         if sentement[i] == 'NEG':
#             for sentence in tr_rv[i]["content"]:
#                 for token, pos_tag in sentence:
#                     c_total[token.lower()] += 1
#                     c_neg[token.lower()] += 1
#         else:
#             for sentence in tr_rv[i]["content"]:
#                 for token, pos_tag in sentence:
#                     c_total[token.lower()] += 1
#                     c_pos[token.lower()] += 1
                    
#     print(len(list(get_unique_from_counter(c_total))))
#     print(len(list(c_total.keys())))


        
#     if smoothing:
#         print('using add-1 smoothing')
#         p_list_neg = defaultdict(float, {word : math.log((c_neg[word] + 1)/(c_total[word]+ num_words)) for word in c_total.keys()})
#         p_list_pos = defaultdict(float, {word : math.log((c_pos[word]+ 1)/(c_total[word]+ num_words)) for word in c_total.keys()})
#     else:
#         print('using smallest probility in 0 count')
#         p_list_neg = defaultdict(float, {word : 0.0 if c_neg[word] == 0 else math.log(c_neg[word]/c_total[word]) for word in c_total.keys()})
#         p_list_pos = defaultdict(float, {word : 0.0 if c_pos[word] == 0 else math.log(c_pos[word]/c_total[word]) for word in c_total.keys()})
    
    #math.log(min(list(c_total.values())))    math.log(min(list(c_total.values())))
    # give unknown words smallest likelihood and making shure there are  know 0.0s anymore: 
#     for key,value in p_list_neg.items():
#         if value == 0.0:
#             p_list_neg[key] = min(list(p_list_neg.values()))
#     for key,value in p_list_pos.items():
#         if value == 0.0:
#             p_list_neg[key] = min(list(p_list_pos.values()))