In [1]:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
import os
import sys

sys.path.append(os.path.abspath(os.pardir))

import pandas as pd
import numpy as np

# Model
from tdparse.models.target import TargetInd
from tdparse.models.target import TargetDepC
from tdparse.models.target import TargetDep
from tdparse.models.target import TargetDepSent
# Word Vector methods
from tdparse.word_vectors import GensimVectors
from tdparse.word_vectors import PreTrained
from tdparse.helper import read_config, full_path
# Sentiment lexicons
from tdparse import lexicons
# Get the data
from tdparse.parsers import dong

# Target dependent models
This notebook shows how to use the target dependent models and compares the results of our implementation with those reported in the original [paper](https://www.ijcai.org/Proceedings/15/Papers/194.pdf).

The paper had four different models:
1. **Target-Ind** -- Uses only the full Tweet as context.
2. **Target-Dep-** -- Uses the left and right context of the target word as well as the target word as context.
3. **Target-Dep** -- Uses all of the above contexts.
4. **Target-Dep+** -- Uses all of the above, plus two additional left and right contexts that ignore every word not found in the given sentiment lexicon (or any other lexicon that is supplied).

The above models correspond to the following classes in our implementation:
1. [TargetInd](../tdparse/models/target.py)
2. [TargetDepC](../tdparse/models/target.py)
3. [TargetDep](../tdparse/models/target.py)
4. [TargetDepSent](../tdparse/models/target.py)
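
As a rough illustration of how the four models differ in the contexts they read, the sketch below splits a tokenised Tweet around its target. It is not the `tdparse` implementation: the function `split_contexts`, the context names, the example Tweet and the toy lexicon are all hypothetical, and the lexicon match is assumed to be a simple lower-cased lookup.

```python
def split_contexts(tokens, target_start, target_end, lexicon=None):
    """Split a tokenised Tweet into the contexts described above.

    `target_start` and `target_end` are token offsets of the target
    (end exclusive). Hypothetical helper, not part of tdparse.
    """
    left = tokens[:target_start]
    target = tokens[target_start:target_end]
    right = tokens[target_end:]
    contexts = {'full': tokens, 'left': left, 'target': target, 'right': right}
    if lexicon is not None:
        # Lexicon-filtered contexts keep only words found in the lexicon
        contexts['left_s'] = [w for w in left if w.lower() in lexicon]
        contexts['right_s'] = [w for w in right if w.lower() in lexicon]
    return contexts

# Which contexts each model reads, following the list above
MODEL_CONTEXTS = {
    'Target-Ind': ['full'],
    'Target-Dep-': ['left', 'right', 'target'],
    'Target-Dep': ['full', 'left', 'right', 'target'],
    'Target-Dep+': ['full', 'left', 'right', 'target', 'left_s', 'right_s'],
}

tokens = 'I love the new iPhone but hate the battery'.split()
toy_lexicon = {'love', 'hate'}  # toy stand-in for a sentiment lexicon
contexts = split_contexts(tokens, 4, 5, lexicon=toy_lexicon)
for model, names in MODEL_CONTEXTS.items():
    print(model, {name: contexts[name] for name in names})
```

Here the target is `iPhone`, so the lexicon-filtered left and right contexts reduce to `['love']` and `['hate']` respectively.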

All of the results shown below are from 5-fold cross-validation over the training data of [Dong et al.](https://aclanthology.coli.uni-saarland.de/papers/P14-2009/p14-2009), the same setup reported in the paper.
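
For readers unfamiliar with the procedure, the following is a minimal sketch of how a 5-fold cross-validation score is averaged. It is an assumption-laden illustration rather than what `grid_search` does internally: the helpers `k_fold_indices` and `mean_cv_accuracy` are hypothetical, the folds are contiguous and unshuffled, and a majority-class predictor stands in for the real classifier.

```python
from collections import Counter

def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs over contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds

def mean_cv_accuracy(labels, k=5):
    """Average accuracy of a majority-class baseline over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        # "Train": pick the most frequent label in the training folds
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        # "Test": score that constant prediction on the held-out fold
        correct = sum(labels[i] == majority for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

toy_labels = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(mean_cv_accuracy(toy_labels, k=5))
```

Each sample is held out exactly once, and the reported score is the mean of the five held-out accuracies.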

In [2]:
# Load the training data
train_data = full_path(read_config('dong_twit_train_data'))
train_data = dong(train_data)
train_y = [target_dict['sentiment'] for target_dict in train_data]

# Get word vectors
w2v_path = full_path(read_config('word2vec_files')['vo_zhang'])
w2v = GensimVectors(w2v_path, None, model='word2vec', name='w2v')
sswe_path = full_path(read_config('sswe_files')['vo_zhang'])
sswe = PreTrained(sswe_path, name='sswe')

# Comparing the three base models

In the paper, the base models (Target-Ind, Target-Dep- and Target-Dep) were compared using the word2vec word vectors after the best C-values had been found. We therefore use the C-values stated in the paper to compare our results with theirs.

In [3]:
# Instances of the models
target_ind = TargetInd()
target_depc = TargetDepC()
target_dep = TargetDep()
# Getting the grid parameters for each model
grid_params_ind = target_ind.get_cv_params(word_vectors=[[w2v]], random_state=42)
grid_params_depc = target_depc.get_cv_params(word_vectors=[[w2v]], random_state=42)
grid_params_dep = target_dep.get_cv_params(word_vectors=[[w2v]], random_state=42)
# Running the grid search over 5 folds.
results_ind = target_ind.grid_search(train_data, train_y, params=grid_params_ind, cv=5, n_jobs=1)
results_depc = target_depc.grid_search(train_data, train_y, params=grid_params_depc, cv=5, n_jobs=1)
results_dep = target_dep.grid_search(train_data, train_y, params=grid_params_dep, cv=5, n_jobs=1)

In [4]:
results = [results_ind['mean_test_score'], results_depc['mean_test_score'], results_dep['mean_test_score']]
all_results = {'Our results' : [result.round(4)[0] * 100 for result in results]}
all_results['Paper results'] = [59.22, 65.38, 65.72]
index = ['Target-Ind', 'Target-Dep-', 'Target-Dep']
base_model_df = pd.DataFrame(all_results, index=index)
base_model_df

| Model | Our results | Paper results |
|---|---|---|
| Target-Ind | 61.01 | 59.22 |
| Target-Dep- | 65.7 | 65.38 |
| Target-Dep | 66.87 | 65.72 |


As the results above show, we get similar results to the paper and the ordering of the models stays the same.
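
The comparison can be checked directly from the numbers in the table. The snippet below is only a small sketch that re-enters those scores by hand (the variable and function names are illustrative) and reports the gap per model and whether the ordering matches.

```python
models = ['Target-Ind', 'Target-Dep-', 'Target-Dep']
ours = [61.01, 65.70, 66.87]   # copied from the "Our results" column
paper = [59.22, 65.38, 65.72]  # copied from the "Paper results" column

# Absolute gap between our score and the paper's, per model
differences = {m: round(o - p, 2) for m, o, p in zip(models, ours, paper)}

def ranking(scores):
    """Model names sorted from best to worst score."""
    return [m for _, m in sorted(zip(scores, models), reverse=True)]

same_order = ranking(ours) == ranking(paper)
print(differences)
print('Same ordering:', same_order)
```

Every gap is below two accuracy points, and both rankings place Target-Dep first and Target-Ind last.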

# Target-Dep+ and sentiment lexicons
The **Target-Dep+** model uses sentiment lexicons to filter out words. In this section we therefore compare:
1. The statistics on the sentiment lexicons
2. The results of the model using different lexicons

All the experiments below again use the Word2Vec word embeddings.

## Sentiment lexicon statistics

Below we present the size of each sentiment lexicon once it has been processed, alongside the size of that lexicon stated in the paper.

In [5]:
# Load the sentiment lexicons and remove all words that are not associated
# to the Positive or Negative class.
subset_cats = {'positive', 'negative'}
mpqa = lexicons.Mpqa(subset_cats=subset_cats)
nrc = lexicons.NRC(subset_cats=subset_cats)
hu_liu = lexicons.HuLiu(subset_cats=subset_cats)
# Combine sentiment lexicons - Removes words that contradict each other.
mpqa_huliu = lexicons.Lexicon.combine_lexicons(mpqa, hu_liu)
all_three = lexicons.Lexicon.combine_lexicons(mpqa_huliu, nrc)

# Load the sentiment lexicons but lower case all the words
mpqa_low = lexicons.Mpqa(subset_cats=subset_cats, lower=True)
nrc_low = lexicons.NRC(subset_cats=subset_cats, lower=True)
hu_liu_low = lexicons.HuLiu(subset_cats=subset_cats, lower=True)
mpqa_huliu_low = lexicons.Lexicon.combine_lexicons(mpqa_low, hu_liu_low)
all_three_low = lexicons.Lexicon.combine_lexicons(mpqa_huliu_low, nrc_low)

In [6]:
def filter_cat(lexicon, filter_cat):
    return [word for word, cat in lexicon.lexicon if cat == filter_cat]

all_lexicons = [mpqa, hu_liu, nrc, mpqa_huliu, all_three]
num_positive = [len(filter_cat(lexicon, 'positive')) for lexicon in all_lexicons]
num_negative = [len(filter_cat(lexicon, 'negative')) for lexicon in all_lexicons]

all_lexicons_low = [mpqa_low, hu_liu_low, nrc_low, mpqa_huliu_low, all_three_low]
num_positive_low = [len(filter_cat(lexicon, 'positive')) for lexicon in all_lexicons_low]
num_negative_low = [len(filter_cat(lexicon, 'negative')) for lexicon in all_lexicons_low]

columns = ['Paper No. Positive', 'Ours No. Positive', 'Ours low No. Positive', 
           'Paper No. Negative', 'Ours No. Negative', 'Ours low No. Negative']
index = ['MPQA', 'Hu Liu', 'NRC', 'MPQA & Hu Liu', 'All Three']
data = [[2289, 2003, 2231, 2706, 3940], num_positive, num_positive_low, 
        [4114, 4780, 3243, 5069, 6490], num_negative, num_negative_low]
senti_info = dict(list(zip(columns, data)))
pd.DataFrame(senti_info, columns=columns, index=index)

| Lexicon | Paper No. Positive | Ours No. Positive | Ours low No. Positive | Paper No. Negative | Ours No. Negative | Ours low No. Negative |
|---|---|---|---|---|---|---|
| MPQA | 2289 | 2304 | 2304 | 4114 | 4154 | 4154 |
| Hu Liu | 2003 | 2006 | 2006 | 4780 | 4783 | 4783 |
| NRC | 2231 | 2312 | 2312 | 3243 | 3324 | 3324 |
| MPQA & Hu Liu | 2706 | 2725 | 2725 | 5069 | 5077 | 5073 |
| All Three | 3940 | 4043 | 4043 | 6490 | 6548 | 6544 |


In [7]:
# Negative entries in the cased MPQA & Hu Liu lexicon that are absent from the
# lower-cased version, i.e. words the two lexicons share apart from their casing
[word for word, cat in list(set(mpqa_huliu.lexicon).difference(set(mpqa_huliu_low.lexicon))) if cat == 'negative']

['anti-US', 'anti-Semites', 'anti-American', 'anti-Israeli']

As you can see, we never agree exactly with the paper on the number of words within the lexicons, even though we obtained the lexicons from the sources described in the paper. Interestingly, lower casing matters when combining lexicons: MPQA and Hu Liu share the words above, but Hu Liu already stores them lower cased whereas MPQA does not, so without lower casing the words they are treated as different entries.
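
The effect of casing on a combined lexicon can be reproduced on toy data. The sketch below is not `Lexicon.combine_lexicons` itself: the helper name, its `lower` argument and the toy entries are assumptions, and the only behaviour it copies from the notebook is that words with contradicting categories are removed.

```python
def combine_toy_lexicons(lexicon_a, lexicon_b, lower=False):
    """Union two lists of (word, category) pairs, dropping any word that
    ends up with more than one category. Hypothetical helper."""
    categories = {}
    for word, cat in list(lexicon_a) + list(lexicon_b):
        if lower:
            word = word.lower()
        categories.setdefault(word, set()).add(cat)
    # Keep only the words whose category is unambiguous
    return {(word, next(iter(cats)))
            for word, cats in categories.items() if len(cats) == 1}

# Toy entries: 'anti-US' differs only by case, 'cheap' is contradictory
toy_mpqa = [('anti-US', 'negative'), ('good', 'positive'), ('cheap', 'negative')]
toy_hu_liu = [('anti-us', 'negative'), ('good', 'positive'), ('cheap', 'positive')]

cased = combine_toy_lexicons(toy_mpqa, toy_hu_liu)
lowered = combine_toy_lexicons(toy_mpqa, toy_hu_liu, lower=True)

print(len(cased), len(lowered))
print(cased - lowered)
```

Without lower casing, `anti-US` and `anti-us` are counted as two negative words. With it, they collapse into one, which mirrors the drop from 5077 to 5073 negative words in the table above.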

## Showing the effect of using different sentiment lexicons in the Target-Dep+ model

In [8]:
# Instances of the model
target_dep_plus = TargetDepSent()
# Getting the grid parameters for each model
grid_params_sent = target_dep_plus.get_cv_params(word_vectors=[[w2v]], senti_lexicons=all_lexicons_low,
                                                 random_state=42)
# Running the grid search over 5 folds.
results_sent = target_dep_plus.grid_search(train_data, train_y, params=grid_params_sent, cv=5, n_jobs=4)

In [9]:
all_sent_results = {'Paper results' : [65.72, 66.05, 67.24, 65.56, 67.40, 67.30],
                    'Our results' : np.zeros(6)}
index = ['Target-Dep', 'Target-Dep+: NRC', 'Target-Dep+: Hu Liu', 'Target-Dep+: MPQA',
         'Target-Dep+: MPQA + Hu Liu', 'Target-Dep+: All Three']
sent_results_df = pd.DataFrame(all_sent_results, index=index)
# Use .loc rather than chained indexing so the assignment reaches the DataFrame
sent_results_df.loc['Target-Dep', 'Our results'] = base_model_df.loc['Target-Dep', 'Our results']

In [15]:
name_map = {'Mpqa' : 'Target-Dep+: MPQA', 'HuLiu' : 'Target-Dep+: Hu Liu', 'NRC' : 'Target-Dep+: NRC',
            'Mpqa HuLiu' : 'Target-Dep+: MPQA + Hu Liu', 'Mpqa HuLiu NRC' : 'Target-Dep+: All Three'}
results_sent['lexicon'] = results_sent['param_union__left_s__filter__lexicon'].apply(lambda lex: lex.name)
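for lex_name, model_name in name_map.items():
    # Assumes each lexicon has a single row in the grid search results
    score = results_sent.loc[results_sent['lexicon'] == lex_name, 'mean_test_score']
    score = score.round(4) * 100
    # Use .loc rather than chained indexing so the assignment reaches the DataFrame
    sent_results_df.loc[model_name, 'Our results'] = score.iloc[0]
sent_results_df.loc['Target-Dep', 'Our results'] = base_model_df.loc['Target-Dep', 'Our results']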
sent_results_df

| Model | Our results | Paper results |
|---|---|---|
| Target-Dep | 66.87 | 65.72 |
| Target-Dep+: NRC | 66.9 | 66.05 |
| Target-Dep+: Hu Liu | 68.31 | 67.24 |
| Target-Dep+: MPQA | 67.05 | 65.56 |
| Target-Dep+: MPQA + Hu Liu | 68.13 | 67.4 |
| Target-Dep+: All Three | 67.93 | 67.3 |


Our results differ from the paper's in absolute terms, and the ranking of the lexicons also differs: our best lexicon was **Hu & Liu**, whereas the paper's original results show the combination of **MPQA and Hu & Liu** to be the best. In general, however, it is better to use a sentiment lexicon than not. Both our implementation and the original paper also show that the best single sentiment lexicon is **Hu & Liu** and that using **all three** sentiment lexicons is worse than using **MPQA and Hu & Liu**.
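
These ranking claims can be verified from the table. The snippet below is a sketch that re-enters the Target-Dep+ scores by hand; the variable names and the `best` helper are illustrative only.

```python
lexicon_names = ['NRC', 'Hu Liu', 'MPQA', 'MPQA + Hu Liu', 'All Three']
# Target-Dep+ scores copied from the table above
ours = dict(zip(lexicon_names, [66.90, 68.31, 67.05, 68.13, 67.93]))
paper = dict(zip(lexicon_names, [66.05, 67.24, 65.56, 67.40, 67.30]))
single_lexicons = ['NRC', 'Hu Liu', 'MPQA']

def best(scores, names):
    """Name of the highest scoring lexicon among `names`."""
    return max(names, key=lambda name: scores[name])

for label, scores in [('ours', ours), ('paper', paper)]:
    print(label)
    print('  best overall:', best(scores, lexicon_names))
    print('  best single lexicon:', best(scores, single_lexicons))
    print('  all three worse than MPQA + Hu Liu:',
          scores['All Three'] < scores['MPQA + Hu Liu'])
```

The two sets of results disagree only on the overall best lexicon; the best single lexicon and the relative position of the three-lexicon combination agree.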

# Showing the effect of the different word vectors
The paper shows the effect of using different word vectors across the four models, using the best sentiment lexicon for the sentiment dependent model. As our sentiment lexicon results differed from the original paper's, we show the results of using the Hu & Liu lexicon and of using the combination of Hu & Liu and MPQA. The word vectors used are the following:
1. Word2Vec - Which has been used throughout the previous experiments (100 dimensions)
2. SSWE - Sentiment Specific Word Embeddings (50 dimensions)
3. Concatenation of Word2vec and SSWE (150 dimensions)
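
To make the dimensionality of the concatenated representation concrete, here is a minimal sketch using plain Python lists. It does not use the `GensimVectors` or `PreTrained` classes: the lookup dictionaries, the `concat_vectors` helper and the choice of a zero vector for unknown words are all assumptions made for illustration.

```python
def concat_vectors(word, vector_lookups, dims):
    """Concatenate one word's vectors from several embedding lookups.

    A word missing from a lookup contributes a zero vector of that
    lookup's dimension (an assumption, not confirmed tdparse behaviour).
    """
    vector = []
    for lookup, dim in zip(vector_lookups, dims):
        vector.extend(lookup.get(word, [0.0] * dim))
    return vector

# Toy stand-ins for the 100-dimension Word2Vec and 50-dimension SSWE vectors
toy_w2v = {'good': [0.1] * 100}
toy_sswe = {'good': [0.2] * 50}

combined = concat_vectors('good', [toy_w2v, toy_sswe], dims=[100, 50])
print(len(combined))  # 150
```

The first 100 values come from the Word2Vec lookup and the last 50 from SSWE, giving the 150 dimensions listed above.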

In [17]:
# Process the results
grid_params_ind = target_ind.get_cv_params(word_vectors=[[w2v], [sswe], [w2v, sswe]], random_state=42)
grid_params_depc = target_depc.get_cv_params(word_vectors=[[w2v], [sswe], [w2v, sswe]], random_state=42)
grid_params_dep = target_dep.get_cv_params(word_vectors=[[w2v], [sswe], [w2v, sswe]], random_state=42)
grid_params_dep_sent = target_dep_plus.get_cv_params(word_vectors=[[w2v], [sswe], [w2v, sswe]], 
                                                     senti_lexicons=[hu_liu_low, mpqa_huliu_low], random_state=42)

results_ind = target_ind.grid_search(train_data, train_y, params=grid_params_ind, cv=5, n_jobs=4)
results_depc = target_depc.grid_search(train_data, train_y, params=grid_params_depc, cv=5, n_jobs=4)
results_dep = target_dep.grid_search(train_data, train_y, params=grid_params_dep, cv=5, n_jobs=4)
results_dep_sent = target_dep_plus.grid_search(train_data, train_y, params=grid_params_dep_sent, cv=5, n_jobs=4)

In [88]:
# Wrangling the results
results_dep_sent['lexicon'] = results_dep_sent['param_union__left_s__filter__lexicon'].apply(lambda lex: lex.name)
results_dep_sent_hu = results_dep_sent[results_dep_sent['lexicon'] == 'HuLiu']
results_dep_sent_hu_mpqa = results_dep_sent[results_dep_sent['lexicon'] == 'Mpqa HuLiu']
grid_results = {'Target-Ind' : results_ind, 'Target-Dep-' : results_depc, 'Target-Dep' : results_dep, 
                'Target-Dep+: Hu Liu' : results_dep_sent_hu, 
                'Target-Dep+: MPQA + Hu Liu' : results_dep_sent_hu_mpqa}
index = ['word2vec', 'sswe', 'word2vec + sswe']
columns = list(grid_results.keys())
name_map = {'w2v' : 'word2vec', 'sswe' : 'sswe', 'w2vsswe' : 'word2vec + sswe'}
vector_results_df = pd.DataFrame(np.zeros((len(index), len(columns))), columns=columns, index=index)
for model_name, result in grid_results.items():
    # Find the grid search column that holds the word vector parameter
    vec_col = [col for col in result.columns if 'vector' in col][0]
    get_vec_name = lambda vec_list: ''.join(vec.name for vec in vec_list)
    result['vector'] = result[vec_col].apply(get_vec_name)
    for vec_name, index_name in name_map.items():
        # Assumes a single row per word vector setting in each result
        score = result.loc[result['vector'] == vec_name, 'mean_test_score']
        score = score.round(4) * 100
        # Use .loc rather than chained indexing so the assignment reaches the DataFrame
        vector_results_df.loc[index_name, model_name] = score.iloc[0]
vector_results_df


| Word vectors | Target-Ind | Target-Dep- | Target-Dep | Target-Dep+: Hu Liu | Target-Dep+: MPQA + Hu Liu |
|---|---|---|---|---|---|
| word2vec | 61.01 | 65.7 | 66.87 | 68.31 | 68.13 |
| sswe | 60.37 | 66.68 | 66.68 | 67.97 | 67.57 |
| word2vec + sswe | 63.59 | 67.01 | 67.73 | 69.22 | 68.34 |


As the results above show, using the combination of the two word vectors is best across all models, which matches the finding in the original paper. We also find that **Target-Dep+** > **Target-Dep** > **Target-Dep-** > **Target-Ind**, as the original paper did (with *SSWE* alone, **Target-Dep** and **Target-Dep-** tie at 66.68). However, unlike the original paper, we found the *SSWE* word vectors to be worse than the *Word2Vec* vectors for every model except **Target-Dep-**, which suggests that using just semantic information is more important than using a vector model that was created by reducing both the semantic and sentiment loss. We also found the **Hu & Liu** lexicon to be better than any other lexicon or combination of lexicons, whereas the original paper found the combination of **MPQA and Hu & Liu** to be the best. Overall, our results are similar to the original.