# Context Sensitive Lexicon Analysis

### Motivation

Broadly, we started our exploration in techniques used in Aspect Based Sentimental analysis which is a crucial part of Opinion mining. The Aspect based concerns itself with two major tasks : Identifying features (or feature extraction) and score the sentiment the user has about the feature. Now the latter part involves using pre-defined lexicon libraries to put a “score” which justifies user’s feeling and normally accepted methods are lexicon-based. 

This way we went specific to improving the lexicon libraries and particularly the scores they have because it greatly effects the score later on in  Aspect SA stage. We were particularly interested in the “localness” or the effect of a context on a lexicon score. The Paper “Inducing Domain-Specific Sentiment Lexicons from Unlaabeled Copora” exactly reviews this problem and tackles it variety of method.

Mainly we,
	- Re-obtain the results from the paper and the understand underlying algorithms.
	- Researched on the evaluation methods for improved lexicons because we were not satisfied with it. 
	- Change the parameter to models such as Word-embeddings or seed words to produce new context-Sensitive lexicons.

A very intuitive explanation behind this method is that the model take existing lexicons(binary or continuous scores), associate their word-vectors and use the random-walk label propagation algorithm to score words.

The random-walk propagation essentially leverages the edge-weights as provided by distance between two words in embeddings. These distances are the source of context-sensitive information which the model make uses of. Starting from the Positive seed words, the decay in the polarity of score (from +1/-1) increases with the edges and essentially diminishes after traversing certain number of edges.

## Approach 

The main approach is using word-embeddings to get a semantic space where the weight edges defined by : 

$E_{i,j}\ =\ \arccos\big( -\frac{w_i w_j}{|w_i| |w_j|}\ \big)$

In this semantic space, the label propagation algorithm propagates the sentiment of seedwords to its neighbors where the sentiment of a word is proportional to the number of edges between the seed-word and the word i. Or 

$score_w\ \propto \ distance(w,score_{seed_i}) \forall i \in number\ of\ seeds (positive\ or\ negative) $. 

#### The code for label propagation probabilistic algorithm is as follows (reverse engineered psudocode from python code) : 

    
    words = embeddings.iw  ## unique tokens
    
    ## dense embeddings or M*N matrix (M=tokens, N=embedding vector of tokens)
    M = transition_matrix(embeddings, **kwargs) 
    
    ## subset of words which are positive. subset of words which are negative
    pos, neg = teleport_set(words, positive_seeds), teleport_set(words, negative_seeds) 
    
    ## iterative changes to seed values
    def update_seeds(r):
        r[pos] = 1
        r[neg] = -1
    r = run_iterative(M, np.zeros(M.shape[0]), update_seeds, **kwargs)
    return {w: r[i] for i, w in enumerate(words)}
    
    where 
    
    ## R_new = M*R until |R_new-R|>tolerance
    def run_iterative(M, r, update_seeds, max_iter=50, epsilon=1e-6, **kwargs):
    for i in range(max_iter):
        last_r = np.array(r)
        r = np.dot(M, r)
        update_seeds(r)
        if np.abs(r - last_r).sum() < epsilon:
            break
    return r



Note : I tried writing it into a pseudo-code format, the lines were not stacking nicely as they were in Anoop's hw description, so I wrote python and pretty close to pseudo code format


#### Other approach is the Graph Propagation algorithm. Going through the code, it was evident that graph prop, created a similarity matrix and propagated on that 

## Data

https://github.com/williamleif/socialsent 
The main framework of the project comes from this. We made tonnes of change to adapt to our specific situation. Based on the paper “Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora”. 

https://github.com/zeeeyang/lexicon_rnn
This is the code base for the paper “Context-Sensitive Lexicon Features for Neural Sentiment Analysis”

The right lexicon libraries comes from these two repositories. 

https://www.kaggle.com/yelp-dataset/yelp-dataset
Took the yelp_academic_dataset_review.json, extracted the 100k sentences out of it. 
\ ## head -n 100000 > yelp_100k.json

https://github.com/cmasch/word-embeddings-from-scratch/blob/master/Create_Embeddings.ipynb
We altered this code to create our own embeddings on the yelp_100k data. 

https://github.com/dipanjanS/text-analytics-with-python 
We took the movie_review.csv from this GitHub which is the codebase for the book “Text-analytics with python”. We read the chapter on Sentimental Analysis and borrowed the code for unsupervised lexicon-based evaluation.


Research Papers : 

1. 

## Code

As we explained in broad about the paper’s approach that we considered. We initially thought it would not take much time to implement the paper as all the code was made open-source. However it came to our surprise that the code was very dense and un-documented for whole part. Also being written in python2 adds to the already worsened situation.

To deal, we first went over the package and understood the files and the interaction between them. Extracted those files into the project’s directory, made changes which are python version specific and assembled a version of the paper’s algorithm. Our majority of the time took for this purpose. 


Our side of contribution mainly lined up on gathering the code in different places as we referred four different GitHub works and putting up together to have a consistent and logical project workflow. In addition to it, we altered a lot of hyper-paramter(free-variable) for the models such as word-embeddings, lexicons and approach 

From the scratch our team did the following : 

1. Pre-train the word-embeddings on a yelp-dataset
2. Conduct an evaluation on a Corpus with new and old lexicons for comparisons.
3. Expanded the lexicons libraries as stated in [GOOGLE PAPER]
4. Tuning hyper-parameters to see the effect on the final lexicon scoring. This included : 
    1. Generating our own WE and tagging the sentiment scores on a held-out data.
    2. Experimented with the Seedwords as given originally in the paper


## Experimental Setup

The default evaluations as cited in the paper are : 
		1. AUC score on Binary classification (positive and negative) : 
		2. Ternrary score (positive, negative and neural words)
		3. Kendall rank-correlation

 	

The actual SENTPROP package which have all the methods defined for the new lexical polarity induction were available, but they required specific word-embeddings which were not available in the documentation. 

However, the best algorithm SENTPROP which is random-walk using label-propagation method we were able to use because it relied on popular embeddings such as Google or Glove. 

Other prominent methods included PMI, densify (orthagonal projection of embeddings) 

So to the code we were able to run, we choose to do this : 

1. Load Lexicons (Financial, Standard-English, Sentiwn140, SST and others)
2. Use Glove and custom trained WE on yelp 100k sentences. Although we have options to incorporate other embeddings such as twitter but big embeddings were not accomodate nicely in our VM. So just restricted to the embeddings< 1 GB.
4. See the AUC, ternary evaluation and kendall scores for all lexicons. we have three small vocab lexicons for easy testing(finance,standard-english,twitter) and three large vocab lexicons (sst,senti140 and sentiwordnet)
5. We get the three metrics for five lexicons (three are small vocab, three are big vocab) and change word-embedding. These gives us 3*5*2 = 30 numbers in the end

6. Evaluated the impact of the lexicon change through generating lexicon based sentiments through scoring function : 

$polarity_{sentence}= \frac{\sum_{i=1}^{n} Score_{positive_{i}} - \sum_{i=1}^{n}Score_{negative_{i}}}{T}$
where
$Score_{positive_i} and Score_{negative_i} $ are negative and positive score for token $i$ in a review

Obtain the Confusion Matrix with Precision,Recall,F1-score on correct/incorrect sentiment predictions. For this we uses movie_reviews dataset from Text_Analytics_With_python github. In this study if we vary lexicon or word-embedding or seedwords or multiple, we will get different classification scores. This is sort of grid search with lexicon,word-embedding,seedwords as hyperparameters. But we do not cover all the search-space. 

We use embedding = glove, seedwords = default and just varies the lexicon libraries and see how new calculated compare classification scores compared to old lexicons.

7. If this succeeds then advance it and replace with a context-lexicon based neural network as mentioned in paper "Context-Sensitive Lexicon Features for Neural Sentiment Analysis"

## Results and Analysis

Because the primary purpose of the study is to experiment with the variables and see their effects on the final metrics. So first of all we calculated the arc,ternrary and kendall correlation for three lexicon sets finance, standard-english and twitter.

The results are as below : 

|  Embedding | Yelp       | Glove | Yelp       | Glove    
| :-------   | :-------   | :----------:| :-------   | :----------: 
| Technique  |  Label-propagation | Graph-propagation |  Label-propagation | Graph-propagation
| Finance    |  (0.94, 0.46, 0.33) | (0.76, 0.46, 0.32) |  (0.94, 0.46, 0.33) | (0.76, 0.46, 0.32)
| SE         |  (0.81, 0.24, 0.37) | (0.59, 0.24, 0.21) |  (0.94, 0.46, 0.33) | (0.59, 0.24, 0.21) 
| Twitter    |  (0.86, 0.38, 0.4) | (0.71, 0.38, 0.39) |  (0.74, 0.38, 0.40) | (0.71, 0.38, 0.39)

The research author gave these numbers for these lexicons (Only SENTPROP method) : 

|  Embedding |  auc | ternrary F1 | tao (kendall)      
| :-------   | :-------   | :----------:| :----------:|
| Finance | 0.91 | 0.63 | -
| SE | 0.83 | 0.53 | 0.28
| Twitter   |  0.86 | 0.6 | 0.5

All these lexicons were trained in different word-embeddings which were domain-sensitive with vocab > $2*10^7$ tokens.

As we can see, we calculated the three evaluation metrics by changing the word-embedding settings, lexicons and propagation technique. Some inferrences : 

- Changing the embedding does not influence the metric scores in Graph-Propagation method
- Comparing the results of two techniques, among all the three lexicons, the Graph propagation method gave worsened scores as compared the label propagation.
- Comparing again the three lexicons, the fiance lexicon have the most promising scores with highest three metrics compared two the other lexicons libraries. 
- Again to note that these three are small vocab lexcicons (<10k)

At average for all three lexicons and other params, the three-metric pair is worse in our exerperiment when compared to authors. The reason being the limited word-embedding size and using same embeddings for different domains. Author literally emphasised on the fact to use domain-specific word-embedding to induce lexicons to perform sentiment analysis on that domain.


## Future Work

What could be fixed in your approach. What you did not have time to finish, but you think would be a useful addition to your project. In the future we think the following things can be expanded because either we did not had enough computational resource or time to dig deeper.

1. Implement the Bi-directional LSTM as stated in the paper “Context-Sensitive Lexicon Features for Neural Sentiment Analysis”. This will act as another re-affirmation to the effect of re-scored lexicons by label-propagation method. If conducted. If we had more time, this probably had made study more complete.

2. Write an automated method for the seed-generation given a context. The probable heuristic includes frequency method for adjectives, pairs coming near the known sentiment words. Can use the Google paper for lexicon expansion (with synonyms, antonym, hyponyms,hypernym relations).

3. Use some techniques as given in the google paper (which builds a summarize in the end), for the aspect and dynamic extraction of features. This is more on the side of lexicon creationg rather than scoring.

In [1]:
import pandas as pd
import numpy as np
import nltk
import itertools
from nltk.corpus import sentiwordnet as swn
from normalization import normalize_accented_characters, html_parser, strip_html
from utils import display_evaluation_metrics, display_confusion_matrix, display_classification_report
nltk.download('averaged_perceptron_tagger')
nltk.download('sentiwordnet')
import dynet as dy
from nltk.corpus import wordnet as wn
from collections import defaultdict
import gc
from nltk.collocations import *
import socialsent_util
import polarity_induction_methods
from representations.representation_factory import create_representation
import constants
from evaluate_methods import run_method
import polarity_induction_methods
from evaluate_methods import binary_metrics,ternary_metrics

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/ubuntu/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     /home/ubuntu/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!
data =  /home/ubuntu/workspace/nlpclass-1187-g-Mad_Titans/project/embeddings_socialsent/


Using Theano backend.


### Comparing the lexicons from stanford paper "Incuding Domain-Specific Sentiment Lexicons from Unalabeled Copora"


It uses a simple heuristic to compare lexicons. For two given lexicon libraries, it counts the common words between thme and the scores. The agreement heuristic measures the mutual lexicons in both libs having same scores. 

In [2]:
def load_lexicon(name, remove_neutral=True):
    lexicon = socialsent_util.load_json("./lexicons_socialsent/"+ name + '.json')
    return {w: p for w, p in lexicon.items() if p != 0} if remove_neutral else lexicon

def compare_lexicons_binary(print_disagreements=False):
    lexicons = {
        "inquirer": load_lexicon("inquirer", False),
        "mpqa": load_lexicon("mpqa", False),
        "bingliu": load_lexicon("bingliu", False),
    }

    for l in lexicons:
        print( l, len(lexicons[l]), len([w for w in lexicons[l] if lexicons[l][w] != 0]))

    for l1, l2 in itertools.combinations(lexicons.keys(), 2):
        ps1, ps2 = lexicons[l1], lexicons[l2]
        common_words = set(ps1.keys()) & set(ps2.keys())
        print( l1, l2, "agreement: {:.2f}".format(
            100.0 * sum(1 if ps1[w] == ps2[w] else 0 for w in common_words) / len(common_words)))
        common_words = set([word for word in ps1.keys() if ps1[word] != 0]) & \
                       set([word for word in ps2.keys() if ps2[word] != 0])  
        print (l1, l2, "agreement ignoring neutral: {:.2f}".format(
            100.0 * sum(1 if ps1[w] * ps2[w] == 1 else 0 for w in common_words) / len(common_words)))
        
        if print_disagreements and l1 == 'opinion' and l2 == 'inquirer':
            for w in common_words:
                if lexicons[l1][w] != lexicons[l2][w]:
                    print (w, lexicons[l1][w], lexicons[l2][w])
      
    
## ALL THESE LEXICONS ARE 2-CLASS SENTIMENTS. 1 = POSITIVE; -1 = NEGATIVE
finance_lexicons=load_lexicon('finance')
bingliu_lexicons=load_lexicon('bingliu')
inquirer_lexicons=load_lexicon('inquirer')
mpqa_lexicons=load_lexicon('mpqa')
twitter_lexicons=load_lexicon('twitter')

#### Now we import the lexicons as mentioned in the paper "Context-Sensitive Lexicons Features for Neural Sentiment Analysis".

- senti140 : built from the point-wise mutual information using distant supervision 
- sentiwn : From the SentimentWordNet3.0. Originally was probability between 0 and 1 but scaled to -2,2 but author
- sst : SD-Lex as mentioned in the paper, excluding the all neutral words and adding the aforementioned offset -2 to each entry

In [3]:

ZEEYANG_LEXICONS='lexicons_zeeyang'
def read_zeeyang_lexicons(fname) : 
    
    polarities=defaultdict()
    for line in open(fname,'r') : 
        token=line.split(" ")[0]
        score=line.split(" ")[1]
        polarities[token]=score
        
    return polarities

## THESE LEXICONS HAVE CONTINOUS SCORES (BETWEEN -1 AND 1 )
senti140_lexicons=read_zeeyang_lexicons(ZEEYANG_LEXICONS+"/sentiment140.lex")
sentiwn_lexicons=read_zeeyang_lexicons(ZEEYANG_LEXICONS+"/sentiwordnet.lex")
sst_lexicons=read_zeeyang_lexicons(ZEEYANG_LEXICONS+"/stanford.tree.lexicon")


## THE CODE CORRECTS THE SENTI140 and SENTIWORDNET

def correct_format(lexicons) : 
    new_lexicons=defaultdict()
    for w in lexicons: 
        score=lexicons[w]
        score=score[:-2]
        score=float(score)
        new_lexicons[w]=score
    return new_lexicons

senti140_lexicons=correct_format(senti140_lexicons)
sentiwn_lexicons=correct_format(sentiwn_lexicons)


### Polarity values of imported lexicons

In [4]:
print("POLARTTY VALUES OF IMPORTED LEXICONS")
print("Finance")
print(pd.DataFrame(list(finance_lexicons.values()),columns=['score'])['score'].unique())
print("Bingliu")
print(pd.DataFrame(list(bingliu_lexicons.values()),columns=['score'])['score'].unique())
print("Inquirer")
print(pd.DataFrame(list(inquirer_lexicons.values()),columns=['score'])['score'].unique())
print("Twitter")
print(pd.DataFrame(list(twitter_lexicons.values()),columns=['score'])['score'].unique())

print("Senti140")
print(pd.DataFrame(list(senti140_lexicons.values()),columns=['score'])['score'].unique())
print("SentiWordNet")
print(pd.DataFrame(list(sentiwn_lexicons.values()),columns=['score'])['score'].unique())
print("SST")
print(pd.DataFrame(list(sst_lexicons.values()),columns=['score'])['score'].unique())


LEXICON_LIST=[finance_lexicons,bingliu_lexicons,inquirer_lexicons,twitter_lexicons,senti140_lexicons,sentiwn_lexicons,sst_lexicons]
LEXICON_LABELS=['Finance','Bingliu','Inquirer','Twitter','Senti140','Sentiwordnet','SST']

print(" VOCAB LENGTH OF THE LEXION LIBRARIES ")
for i,l in enumerate(LEXICON_LIST) : 
    
    
    print("========== {} ========== ".format(LEXICON_LABELS[i]))
    print(len(LEXICON_LIST[i]))

POLARTTY VALUES OF IMPORTED LEXICONS
Finance
[-1  1]
Bingliu
[-1  1]
Inquirer
[-1  1]
Twitter
[-1  1]
Senti140
[-1.25  -0.798  0.049 ... -1.033 -1.876 -2.   ]
SentiWordNet
[2.25 0.75 1.75 3.25 1.   0.25 1.25 2.5  0.5  1.5  3.   2.75 3.5  3.75
 4.   0.  ]
SST
['2\n' '3\n' '1\n' '4\n' '0\n']
 VOCAB LENGTH OF THE LEXION LIBRARIES 
2709
6785
3457
1277
62468
32980
19465


## Compare Lexicons with mutual word scores

In [5]:
## COMPARING THE BINARY LEXICONS 
compare_lexicons_binary()

inquirer 8640 3457
mpqa 6886 6462
bingliu 6785 6785
inquirer mpqa agreement: 82.47
inquirer mpqa agreement ignoring neutral: 98.50
inquirer bingliu agreement: 84.39
inquirer bingliu agreement ignoring neutral: 98.74
mpqa bingliu agreement: 99.19
mpqa bingliu agreement ignoring neutral: 99.44


#### Lexicon Induction : the idea is to generate the lexicons provided the corpus. This method makes sure that the lexicon are sensitive to the context they are drawn from. They may prove useful if we would like to assess them in a simiar context. For instance, financial lexicons will reflect better sentiments than using general lexicons such as SentiWordNet. Three ways purposed for induction 


### POLARITY INDUCTION METHOD : This is used for re-scoring of the lexicons(tokens) by taking information from the word-embeddings (domain-specific), positive and the negative seed words.

This just calls the actual method (method=polarity_induction_methods.label_propagation with seeds and embeddings)

In [6]:


### THIS IS THE FUNCTION FOR INDUCING LEXICONS GIVEN THE SEEDS, EMBEDDINGS AND THE METHOD.
def run_method(positive_seeds, negative_seeds, embeddings, transform_embeddings=False, post_densify=False,
        method=polarity_induction_methods.densify, **kwargs):
    
    print("THE INTERNAL RUN_METHOD IS RUNNING...")
    
    if transform_embeddings:
        print ("Transforming embeddings...")
        embeddings = embedding_transformer.apply_embedding_transformation(embeddings, positive_seeds, negative_seeds, n_dim=50)
    
    print("AFTER EMBEDDING TRANSFORM ",embeddings)
    
    ## using densify method
    if post_densify:
        polarities = method(embeddings, positive_seeds, negative_seeds, **kwargs)
        top_pos = [word for word in 
                sorted(polarities, key = lambda w : -polarities[w])[:150]]
        top_neg = [word for word in 
                sorted(polarities, key = lambda w : polarities[w])[:150]]
        top_pos.extend(positive_seeds)
        top_neg.extend(negative_seeds)
        return polarity_induction_methods.densify(embeddings, top_pos, top_neg)
    
    
    positive_seeds = [s for s in positive_seeds if s in embeddings]
    negative_seeds = [s for s in negative_seeds if s in embeddings]
    
    
    return method(embeddings, positive_seeds, negative_seeds, **kwargs)


## LEXICON INDUCTION ON STANDARD ENGLISH 

Paper referred here is "Inducing Domain-Specific Sentiment Lexicons from Unlabaled Corpora"

In down here, The SENTPROP as mentioned in the primary lexicon induction paper uses label propagation with probabilities. the method itself can be found in the polarity_induction_methods.py with approaches too.

We were able to run the label_propagate_prob on our system because it compute the GIGA embeddings (see the representation/embeddings.py ) to prepare lexicon embeddings and put inside the algorithm.

Other wonderful methods such as pmi or graph_propagation used the SVD Embedding which needs more input than a standard word2vec libraries. It expected some numpy files but there was no documentation in the code about what those files denote, So we were not able to run. 

As cited in the paper, the SVD embedding style gave superior results. But we implemented it due to above bottleneck. Even people have strong opinions to deviate away from Neural word2vec models to more models such as SVD.
https://multithreaded.stitchfix.com/blog/2017/10/18/stop-using-word2vec/


Only working model possible is the label_propagate_prob. But if you want to incorporate other methods, possibly look into polarity_induction_methods.py, each methods use embedding styles described in the representation/embeddings.py. The embeddings.py expectes some files whose path is defined in constants.py

In [7]:
def calculate_new_lexicon_polarities(parent_lexicon,positive_seeds,negative_seeds,technique,embedding_type) : 
    
    ## LOAD THE WORD-EMBEDDINGS : 
    EMBEDDEING_TYPE=''
    if embedding_type=='GLOVE' : 
        EMBEDDING_TYPE=constants.GLOVE_EMBEDDINGS
    elif embedding_type=='YELP' : 
        EMBEDDING_TYPE=constants.YELP_EMBEDDINGS
        
        
    eval_words = set(parent_lexicon.keys())

    #EMBEDDING_TYPE = constants.GLOVE_EMBEDDINGS
    EMBEDDING = create_representation("GIGA", constants.GLOVE_EMBEDDINGS,eval_words.union(positive_seeds).union(negative_seeds))

    embed_words = set(EMBEDDING.iw)
    eval_words = eval_words.intersection(EMBEDDING)
    eval_words = [word for word in eval_words  if not word in positive_seeds and not word in negative_seeds]

    ## TRAIN THE BEST ALGORITHM : SENTPROP and get polarities re-scored
    

    
    polarities=defaultdict()
    if technique=='label_propagate_prob' : 
        
        polarities = run_method(positive_seeds, negative_seeds, 
                    EMBEDDING.get_subembed(set(eval_words).union(negative_seeds).union(positive_seeds)),
                    method=polarity_induction_methods.label_propagate_probabilistic,beta=0.99, nn=10)
        
        return polarities,eval_words
    
    
    elif technique=='graph_propagate' : 
        polarities = run_method(positive_seeds, negative_seeds, 
                    EMBEDDING.get_subembed(set(eval_words).union(negative_seeds).union(positive_seeds)),
                    method=polarity_induction_methods.graph_propagate.T=10)
        
        return polarities,eval_words
        
    
    elif technique == 'pmi' : 
        
        #hist_words = set(hist_embed.iw)
        #eval_words = eval_words.intersection(hist_words)

        #eval_words = [word for word in eval_words if not word in positive_seeds and not word in negative_seeds] 
        print( "Evaluating with ", len(eval_words), "out of", len(lexicon))

        print ("PMI")
        polarities = run_method(positive_seeds, 
                                negative_seeds,
                                EMBEDDING.get_subembed(set(eval_words).union(negative_seeds).union(positive_seeds)),
                                method=polarity_induction_methods.bootstrap,
                                score_method=polarity_induction_methods.pmi,
                                )
        return polarities,eval_words


                

### SEEDS WORDS

We took the default seed words for initializing the label propagation. There are some works such as https://pdfs.semanticscholar.org/a30b/57d80e9f5665f2cd5a2e65887c673e32e8b1.pdf?fbclid=IwAR3bBQIRDf_Sy9fh5Aj1xVVvt2Ki_L6YmzsLs2XBaPR3hMXkLq_V3pwnuJI 

to have more rationale thought into choosing the seeds words. However we kept it as it is and suggested as a future work suggestion in the end. These Lexicon are from the https://github.com/williamleif/socialsent


In [8]:
## These Lexicon are from the https://github.com/williamleif/socialsent
## TRAINING THE LABEL-PROPAGATION FOR THE RE-SCORING OF POLARITIES FROM PRE-DETERMINED LEXICONS (MADE FROM WORD EMBEDDINGS)

## INQUIRER is a standard english lexicon
INQUIRER = load_lexicon("inquirer", remove_neutral=False)

FINANCE_LEXICONS=load_lexicon('finance')
TWITTER_LEXICONS=load_lexicon('twitter')

THREE_WAY_LEXICON = kuperman = load_lexicon("kuperman", remove_neutral=False)

POSITIVE_FINANCE = ["successful", "excellent", "profit", "beneficial", "improving", "improved", "success", "gains", "positive"]
NEGATIVE_FINANCE = ["negligent", "loss", "volatile", "wrong", "losses", "damages", "bad", "litigation", "failure", "down", "negative"]

POSITIVE_SE = ["good", "lovely", "excellent", "fortunate", "pleasant", "delightful", "perfect", "loved", "love", "happy"] 
NEGATIVE_SE = ["bad", "horrible", "poor",  "unfortunate", "unpleasant", "disgusting", "evil", "hated", "hate", "unhappy"]


POSITIVE_TWEET = ["love", "loved", "loves", "awesome",  "nice", "amazing", "best", "fantastic", "correct", "happy"]
NEGATIVE_TWEET = ["hate", "hated", "hates", "terrible",  "nasty", "awful", "worst", "horrible", "wrong", "sad"]



## EVALUATING THE EFFECTIVENESS OF THE NEW LEXICON POLARITIES.

- Calculates the ROC auc scores with the new polarities comparing to the earlier lexicon binary classification (1 = positive and 0 = negative).
- Interpretation of the score. Higher the score, it means that new polarities (continous sentiment scores) confirms with the binary sentiment scores
- Metrics are : 

    - AUC score : Treats the original lexicons as true labels and new lexicons continous scores as predicted probabilities.
     
     #### $auc\_score(y_{true}, y_{probability})$
      where $y_{true}$ is old lexicon value and $y_{probability}$ is new lexicon continous values.
     #### $best\ accuracy\ = \max(\{\frac{1+i-\sum{j=1}^{i}+p}{n}\}_i) \forall i= \{1..n\}$
    
    - Ternary score : Also considers the neutral words through incorporating another third-lexicon library denoted by tau_lexicon. We have little understanding of this part though.
    
     ##### $cmn_{f1} = F1_{score}(y_{true},\ \{ mode(y_{true}) \}_{i=1..y_{true}})$ 
     ##### $maj_{f1} = F1_{score}(y_{true},l)\ ,\ l=\ 1\ if y_{prob}>thresh_{+} otherwise 0$
     
     ##### $thresh\ defined\ for\ positives\ and\ negatives$
    
    - *Kendall Correlation* : Ranking confidence scores. As cited in the paper "Kendall t-rank correlation with continuous human-annotated polarity scores". It is used to compare to order sets and higher confidence score mean that order is same of objectis in the two sets. Here original 
    
    ### $\tau=\frac{n_c-n_d}{n(n-1)/2}$
    where $n_c$ is concordant or same pairs and $n_d$ is discordant pairs. 


In [9]:
def evaluate_method_performance(polarities,INITIAL_LEXICON_LIB,domain,eval_words) : 
    
    ## EVALUATING THE EFFECTIVENESS OF THE NEW LEXICON POLARITIES.
    

    acc, auc, avg_prec = binary_metrics(polarities, INITIAL_LEXICON_LIB, eval_words)
    if auc < 0.5:
        polarities = {word:-1*polarities[word] for word in polarities}
        acc, auc, avg_prec = binary_metrics(polarities, INITIAL_LEXICON_LIB, eval_words)

    print("============== DOMAIN : {} ==============".format(domain))
    print ("Binary metrics:")
    print( "==============")
    print ("Accuracy with optimal threshold: {:.4f}".format(acc))
    print ("ROC AUC Score: {:.4f}".format(auc))
    print ("Average Precision Score: {:.4f}".format(avg_prec))


    tau, cmn_f1, maj_f1, conf_mat = ternary_metrics(polarities, INITIAL_LEXICON_LIB, eval_words, tau_lexicon=THREE_WAY_LEXICON)
    print ("Ternary metrics:")
    print( "==============")
    print ("Majority macro F1 baseline {:.4f}".format(maj_f1))
    print ("Macro F1 with cmn threshold: {:.4f}".format(cmn_f1))
    if tau:
        print ("Kendall Tau {:.4f}".format(tau))
    print ("Confusion matrix: ")
    print (conf_mat)
    print( "Neg :", float(conf_mat[0,0]) / np.sum(conf_mat[0,:]))
    print ("Neut :", float(conf_mat[1,1]) / np.sum(conf_mat[1,:]))
    print ("Pos :", float(conf_mat[2,2]) / np.sum(conf_mat[2,:]))

### Technique=LabelPropagation, Embedding=Yelp

In [33]:
finance_polarities,finance_eval=calculate_new_lexicon_polarities(FINANCE_LEXICONS,POSITIVE_FINANCE,NEGATIVE_FINANCE,'label_propagate_prob','YELP')
evaluate_method_performance(finance_polarities,FINANCE_LEXICONS,'FINANCE',finance_eval)
gc.collect()

standard_english_polarities,se_eval=calculate_new_lexicon_polarities(INQUIRER,POSITIVE_SE,NEGATIVE_SE,'label_propagate_prob','YELP')
evaluate_method_performance(standard_english_polarities,INQUIRER,' STANDARD ENGLISH ',se_eval)
gc.collect()

twitter_polarities,se_eval=calculate_new_lexicon_polarities(TWITTER_LEXICONS,POSITIVE_TWEET,NEGATIVE_TWEET,'label_propagate_prob','YELP')
evaluate_method_performance(twitter_polarities,TWITTER_LEXICONS,' TWITTER',se_eval)
gc.collect()

THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f9db0d6a8d0>
Binary metrics:
Accuracy with optimal threshold: 1.7760
ROC AUC Score: 0.9452
Average Precision Score: 0.7901
Ternary metrics:
Majority macro F1 baseline 0.4642
Macro F1 with cmn threshold: 0.1177
Kendall Tau 0.3314
Confusion matrix: 
[[   0    1 2239]
 [   0    0    0]
 [   0    0  345]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f9db0a23c50>
Binary metrics:
Accuracy with optimal threshold: 1.1756
ROC AUC Score: 0.8184
Average Precision Score: 0.7847
Ternary metrics:
Majority macro F1 baseline 0.2497
Macro F1 with cmn threshold: 0.1024
Kendall Tau 0.3770
Confusion matrix: 
[[   0    1 1874]
 [   0    0 5106]
 [   0    0 1547]]
Neg : 0.0
Neut : 0.0
Pos : 1.0


  'precision', 'predicted', average, warn_for)


THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f9db1075b70>
Binary metrics:
Accuracy with optimal threshold: 0.8612
ROC AUC Score: 0.7370
Average Precision Score: 0.8119
Ternary metrics:
Majority macro F1 baseline 0.3844
Macro F1 with cmn threshold: 0.3844
Kendall Tau 0.4029
Confusion matrix: 
[[  0   1 313]
 [  0   0   0]
 [  0   0 522]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


0

### Technique=LabelPropagation, Embedding=Glove


In [20]:
finance_polarities,finance_eval=calculate_new_lexicon_polarities(FINANCE_LEXICONS,POSITIVE_FINANCE,NEGATIVE_FINANCE,'label_propagate_prob','GLOVE')
evaluate_method_performance(finance_polarities,FINANCE_LEXICONS,'FINANCE',finance_eval)
gc.collect()

standard_english_polarities,se_eval=calculate_new_lexicon_polarities(INQUIRER,POSITIVE_SE,NEGATIVE_SE,'label_propagate_prob','GLOVE')
evaluate_method_performance(standard_english_polarities,INQUIRER,' STANDARD ENGLISH ',se_eval)
gc.collect()

twitter_polarities,se_eval=calculate_new_lexicon_polarities(TWITTER_LEXICONS,POSITIVE_TWEET,NEGATIVE_TWEET,'label_propagate_prob','GLOVE')
evaluate_method_performance(twitter_polarities,TWITTER_LEXICONS,' TWITTER',se_eval)
gc.collect()

Binary metrics:
Accuracy with optimal threshold: 1.7760
ROC AUC Score: 0.9479
Average Precision Score: 0.7981
Ternary metrics:
Majority macro F1 baseline 0.4642
Macro F1 with cmn threshold: 0.1177
Kendall Tau 0.3313
Confusion matrix: 
[[   0    1 2239]
 [   0    0    0]
 [   0    0  345]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


Binary metrics:
Accuracy with optimal threshold: 1.1777
ROC AUC Score: 0.8146
Average Precision Score: 0.7821
Ternary metrics:
Majority macro F1 baseline 0.2497
Macro F1 with cmn threshold: 0.1024
Kendall Tau 0.3752
Confusion matrix: 
[[   0    1 1874]
 [   0    0 5106]
 [   0    0 1547]]
Neg : 0.0
Neut : 0.0
Pos : 1.0


  'precision', 'predicted', average, warn_for)


Binary metrics:
Accuracy with optimal threshold: 0.8624
ROC AUC Score: 0.7406
Average Precision Score: 0.8141
Ternary metrics:
Majority macro F1 baseline 0.3844
Macro F1 with cmn threshold: 0.3844
Kendall Tau 0.4053
Confusion matrix: 
[[  0   1 313]
 [  0   0   0]
 [  0   0 522]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


0

### Technique=Graphpropagation, Embedding=Yelp

In [10]:
finance_polarities,finance_eval=calculate_new_lexicon_polarities(FINANCE_LEXICONS,POSITIVE_FINANCE,NEGATIVE_FINANCE,'graph_propagate','YELP')
evaluate_method_performance(finance_polarities,FINANCE_LEXICONS,'FINANCE',finance_eval)
gc.collect()

standard_english_polarities,se_eval=calculate_new_lexicon_polarities(INQUIRER,POSITIVE_SE,NEGATIVE_SE,'graph_propagate','YELP')
evaluate_method_performance(standard_english_polarities,INQUIRER,' STANDARD ENGLISH ',se_eval)
gc.collect()

twitter_polarities,se_eval=calculate_new_lexicon_polarities(TWITTER_LEXICONS,POSITIVE_TWEET,NEGATIVE_TWEET,'graph_propagate','YELP')
evaluate_method_performance(twitter_polarities,TWITTER_LEXICONS,' TWITTER',se_eval)
gc.collect()

THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f94798e8b70>
Getting positive scores..
Getting negative scores..
Computing final scores...
Binary metrics:
Accuracy with optimal threshold: 1.7563
ROC AUC Score: 0.7687
Average Precision Score: 0.5258
Ternary metrics:
Majority macro F1 baseline 0.4642
Macro F1 with cmn threshold: 0.1177
Kendall Tau 0.3284
Confusion matrix: 
[[   0    1 2239]
 [   0    0    0]
 [   0    0  345]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f94798e8b70>
Getting positive scores..
Getting negative scores..
Computing final scores...
Binary metrics:
Accuracy with optimal threshold: 1.1300
ROC AUC Score: 0.5905
Average Precision Score: 0.5290
Ternary metrics:
Majority macro F1 baseline 0.2497
Macro F1 with cmn threshold: 0.1024
Kendall Tau 0.2193
Confusion matrix: 
[[   0    1 1874]
 [   0    0 5106]
 [   0    0 1547]]
Neg : 0.0
Neut : 0.0
Pos : 1.0


  'precision', 'predicted', average, warn_for)


THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f94799261d0>
Getting positive scores..
Getting negative scores..
Computing final scores...
Binary metrics:
Accuracy with optimal threshold: 0.8648
ROC AUC Score: 0.7130
Average Precision Score: 0.7803
Ternary metrics:
Majority macro F1 baseline 0.3844
Macro F1 with cmn threshold: 0.3844
Kendall Tau 0.3993
Confusion matrix: 
[[  0   1 313]
 [  0   0   0]
 [  0   0 522]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


0

### Technique=GraphPropagation, Embedding=Glove

In [11]:
finance_polarities,finance_eval=calculate_new_lexicon_polarities(FINANCE_LEXICONS,POSITIVE_FINANCE,NEGATIVE_FINANCE,'graph_propagate','GLOVE')
evaluate_method_performance(finance_polarities,FINANCE_LEXICONS,'FINANCE',finance_eval)
gc.collect()

standard_english_polarities,se_eval=calculate_new_lexicon_polarities(INQUIRER,POSITIVE_SE,NEGATIVE_SE,'graph_propagate','GLOVE')
evaluate_method_performance(standard_english_polarities,INQUIRER,' STANDARD ENGLISH ',se_eval)
gc.collect()

twitter_polarities,se_eval=calculate_new_lexicon_polarities(TWITTER_LEXICONS,POSITIVE_TWEET,NEGATIVE_TWEET,'graph_propagate','GLOVE')
evaluate_method_performance(twitter_polarities,TWITTER_LEXICONS,' TWITTER',se_eval)
gc.collect()

THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f94799777f0>
Getting positive scores..
Getting negative scores..
Computing final scores...
Binary metrics:
Accuracy with optimal threshold: 1.7563
ROC AUC Score: 0.7687
Average Precision Score: 0.5258
Ternary metrics:
Majority macro F1 baseline 0.4642
Macro F1 with cmn threshold: 0.1177
Kendall Tau 0.3284
Confusion matrix: 
[[   0    1 2239]
 [   0    0    0]
 [   0    0  345]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f9479982e80>
Getting positive scores..
Getting negative scores..
Computing final scores...
Binary metrics:
Accuracy with optimal threshold: 1.1300
ROC AUC Score: 0.5905
Average Precision Score: 0.5290
Ternary metrics:
Majority macro F1 baseline 0.2497
Macro F1 with cmn threshold: 0.1024
Kendall Tau 0.2193
Confusion matrix: 
[[   0    1 1874]
 [   0    0 5106]
 [   0    0 1547]]
Neg : 0.0
Neut : 0.0
Pos : 1.0


  'precision', 'predicted', average, warn_for)


THE INTERNAL RUN_METHOD IS RUNNING...
AFTER EMBEDDING TRANSFORM  <representations.embedding.Embedding object at 0x7f9500328dd8>
Getting positive scores..
Getting negative scores..
Computing final scores...
Binary metrics:
Accuracy with optimal threshold: 0.8648
ROC AUC Score: 0.7130
Average Precision Score: 0.7803
Ternary metrics:
Majority macro F1 baseline 0.3844
Macro F1 with cmn threshold: 0.3844
Kendall Tau 0.3993
Confusion matrix: 
[[  0   1 313]
 [  0   0   0]
 [  0   0 522]]
Neg : 0.0
Neut : nan
Pos : 1.0


  'precision', 'predicted', average, warn_for)


0

### Now senti140, wordnet and sst have huge vocabs.

We have ran the lexicon comparison for senti140_polarities and sentiwn_polarities below

In [None]:
#sent140_polarities,s140eval = calculate_new_lexicon_polarities(senti140_lexicons,POSITIVE_SE,NEGATIVE_SE,'label_propagate_prob')
#gc.collect()

#sentiwn_polarities,swn_eval = calculate_new_lexicon_polarities(sentiwn_lexicons,POSITIVE_SE,NEGATIVE_SE,'label_propagate_prob')
#gc.collect()

#sst_lexicons,sst_eval = calculate_new_lexicon_polarities(sst_lexicons,POSITIVE_SE,NEGATIVE_SE,'label_propagate_prob')
#gc.collect()


### Evaluation for unsupervised Lexicon sentiment tagging

This is to compute the effectiveness of binary sentiment scores provided a lexicon library.
This can be used to see which lexicon libraries help achieving the closest sentiment scores.
Thus a supervised algorithm and evaluation is the 

compare against the sentence tagging (already provided in the dataset )


In [9]:
### BORROWED FROM THE AR_SARKAR METRIC
def analyze_sentiment_sentiwordnet_lexicon(review,verbose=False):
    
    
    #review = normalize_accented_characters(review)
    #review = review.decode('utf-8')
    review = html_parser.unescape(review)
    review = strip_html(review)
    
    text_tokens = nltk.word_tokenize(review)
    tagged_text = nltk.pos_tag(text_tokens)
    pos_score = neg_score = token_count = obj_score = 0

    for word, tag in tagged_text:
        ss_set = None
        if 'NN' in tag and swn.senti_synsets(word, 'n'):
            ss_set = list(swn.senti_synsets(word, 'n'))
            if ss_set : 
                ss_set=ss_set[0]
        elif 'VB' in tag and swn.senti_synsets(word, 'v'):
            ss_set = list(swn.senti_synsets(word, 'v'))
            if ss_set : 
                ss_set=ss_set[0]
        elif 'JJ' in tag and swn.senti_synsets(word, 'a'):
            ss_set = list(swn.senti_synsets(word, 'a'))
            if ss_set : 
                ss_set=ss_set[0]
        elif 'RB' in tag and swn.senti_synsets(word, 'r'):
            ss_set = list(swn.senti_synsets(word, 'r'))
            if ss_set : 
                ss_set=ss_set[0]
        
        if ss_set:
            
            pos_score += ss_set.pos_score()
            neg_score += ss_set.neg_score()
            obj_score += ss_set.obj_score()
            token_count += 1
    
    
    final_score = pos_score - neg_score
    norm_final_score = round(float(final_score) / token_count, 2)
    final_sentiment = 'positive' if norm_final_score >= 0 else 'negative'
    if verbose:
        norm_obj_score = round(float(obj_score) / token_count, 2)
        norm_pos_score = round(float(pos_score) / token_count, 2)
        norm_neg_score = round(float(neg_score) / token_count, 2)
        
        sentiment_frame = pd.DataFrame([[final_sentiment, norm_obj_score,
                                         norm_pos_score, norm_neg_score,
                                         norm_final_score]],
                                         columns=pd.MultiIndex(levels=[['SENTIMENT STATS:'], 
                                                                      ['Predicted Sentiment', 'Objectivity',
                                                                       'Positive', 'Negative', 'Overall']], 
                                                              labels=[[0,0,0,0,0],[0,1,2,3,4]]))
        print (sentiment_frame)   
    return final_sentiment
            
                                                               
def evaluate_lexicons(TRUE_LABELS,PREDICTED_LABELS,POS_CLASS,NEG_CLASS) : 

    print ('Performance metrics:')
    display_evaluation_metrics(true_labels=TRUE_LABELS,
                               predicted_labels=PREDICTED_LABELS,
                               positive_class=str(POS_CLASS))  
    print ('\nConfusion Matrix:'             )              
    display_confusion_matrix(true_labels=TRUE_LABELS,
                             predicted_labels=PREDICTED_LABELS,
                             classes=[str(POS_CLASS),str(NEG_CLASS)])
    print ('\nClassification report:' )                        
    display_classification_report(true_labels=TRUE_LABELS,
                                  predicted_labels=PREDICTED_LABELS,
                                  classes=[str(POS_CLASS),str(NEG_CLASS)])
    return


### BASELINE LEXICON : Unsupervised Classification nn the Movie Reviews using senti-wordnet lexicons

-  Accuracy: 0.6
-  Precision: 0.56
-  Recall: 0.93
-  F1 Score: 0.7

In [6]:
train_x,test_x,test_y=prepare_movie_dataset(0,1000,1000,2000)
sentiwordnet_predictions = [analyze_sentiment_sentiwordnet_lexicon(review) for review in test_x]
evaluate_lexicons(test_y.tolist(),sentiwordnet_predictions,'positive','negative')

dataset size :  50000
Train_X :  1000
Test_X  :  1000
Performance metrics:
Accuracy: 0.6
Precision: 0.56
Recall: 0.93
F1 Score: 0.7

Confusion Matrix:
                 Predicted:         
                   positive negative
Actual: positive        470       34
        negative        365      131

Classification report:
              precision    recall  f1-score   support

    positive       0.56      0.93      0.70       504
    negative       0.79      0.26      0.40       496

   micro avg       0.60      0.60      0.60      1000
   macro avg       0.68      0.60      0.55      1000
weighted avg       0.68      0.60      0.55      1000



### Now we wanted to test the effect of new scores on the supervised lexicon-based sentiment scores of a corpus.  Again because our same lexicons have are very small < 10k, most of the words from corpus are not found in the lexicons and does not returns the proper sentence sentiment. 

 This part can be redone when we have large lexicon library. So, the impact can be seen as in the below code.

In [12]:
## INPUTS : 
## review = single sentence 
## lexicon_dict = dict of the lexicon with key as word and value as the polarity

def analyze_sentiment_domain(review,lexicon_dict,verbose=False):
    
    
    #review = normalize_accented_characters(review)
    #review = review.decode('utf-8')
    review = html_parser.unescape(review)
    review = strip_html(review)
    
    text_tokens = nltk.word_tokenize(review)
    tagged_text = nltk.pos_tag(text_tokens)
    pos_score = neg_score = token_count = obj_score = 0

    ## postitve polarity counts as positive and negative polarities counts as negative
    
    
    for token in text_tokens : 
        
        if token in lexicon_dict : 
            
            if lexicon_dict[token]>0 : 
                pos_score+=1
            elif lexicon_dict[token]<0:
                neg_score+=1

        token_count+=1
            
    final_score = pos_score - neg_score
    norm_final_score = round(float(final_score) / token_count, 2)
    final_sentiment = 'positive' if norm_final_score >= 0 else 'negative'
    if verbose:
        norm_pos_score = round(float(pos_score) / token_count, 2)
        norm_neg_score = round(float(neg_score) / token_count, 2)
        
        
        
        sentiment_frame = pd.DataFrame([[final_sentiment,
                                         norm_pos_score, norm_neg_score,
                                         norm_final_score]],
                                         columns=pd.MultiIndex(levels=[['SENTIMENT STATS:'], 
                                                                      ['Predicted Sentiment',
                                                                       'Positive', 'Negative', 'Overall']], 
                                                              labels=[[0,0,0,0],[0,1,2,3]]))
        print (sentiment_frame)   
    return final_sentiment
            
                               

### Now we wanted to see the supervised classification scores with initial lexicons and after label-propagation scored lexicons.However due to our system limitations, because the sentiwordnet 150 has ~68,000 tokens, the kernal crashes in attempt to run algorith on this lexicon 


In [14]:

## THE PROGRAM CRASHES.: 
## IF PROGRAM CAN RUN, THEN WE SEE changes in supervised sentiment scoring on the movie_reviews dataset


print("-- ORIGINAL SENTI140 LEXICONS----")
train_x,test_x,test_y=prepare_movie_dataset(0,1000,1000,2000)
sentiwordnet_predictions = [analyze_sentiment_domain(review,senti140_lexicons) for review in test_x]
evaluate_lexicons(test_y.tolist(),sentiwordnet_predictions,'positive','negative')

new_senti140_lexicons,se_eval=calculate_new_lexicon_polarities(senti140_lexicons,POSITIVE_SE,NEGATIVE_SE,'label_propagate_prob','YELP')

print("-- RE-CALCULATED SENTI140 LEXICONS----")
train_x,test_x,test_y=prepare_movie_dataset(0,1000,1000,2000)
sentiwordnet_predictions = [analyze_sentiment_domain(review,new_senti140_lexicons) for review in test_x]
evaluate_lexicons(test_y.tolist(),sentiwordnet_predictions,'positive','negative')



-- ORIGINAL SENTI140 LEXICONS----


NameError: name 'prepare_movie_dataset' is not defined

### The classification scored difference using two different lexicon dictionaries may prove the impact.

### THE WHOLE USE OF THE METHODS IS THAT TO GET INSIGHT INTO THE CONTEXT-SENSITIVE INFORMATION.


### HOW? 

### 1. ARRANGE THE PRE-TAGGED LEXICONS (ATLEAST POSITIVE/NEGATIVE)

### 2. WORD EMBEDDINGS TRAINED ON THE CONTEXT-MATERIAL. 

### 3. LABEL-PROPAGATE ALGORITHM TO MAKE LEXICONS SCORE LEXICONS TO THE CONTINOUS SENTIMENT SCORES.

### 4. USE SUM (P+V)/T OR NEURAL NETWORK TO OBTAIN THE SCORE FOR THE WHOLE SENTENCE SENTIMENT 

##### Here, we earlier thought that we would be able to implement this phase if we had more time. The paper "Context-Sensitive Lexicon Features for Neural Sentiment Analysis" we can test baseline with normal lexicons and improvement as label-propagated lexicons with LSTM for scores, and evaluate them back on the binary classification scores 