# Modeling: Aspect-Based Sentiment Analysis [Simplistic]
Conducting aspect-based sentiment analysis with the [ABSA package by Scala Consultants](https://github.com/ScalaConsultants/Aspect-Based-Sentiment-Analysis).

**`Goal:`** 

Conduct ABSA using word similarity and the out-of-the-box ABSA package. This notebook is meant to serve as a starting point for tweet aspect annotation by capturing as many of the referenced aspects as possible, along with their corresponding sentiments.

**Note:** Results will be cross-checked during the annotation phase!

**`Process:`** 
1. List the aspects (e.g. speed, price, reliability) determined during the earlier data annotation phase.
2. Extract nouns, adjectives and adverbs from the tweets, as these are the parts of speech most likely to refer meaningfully to an aspect.
3. Compute a similarity score between each word from step 2 and each aspect to check whether the word is closely related to the aspect (e.g. speed [aspect] and fast [word in tweet]).
4. If the similarity score exceeds a set threshold, assume the aspect was referenced in the tweet. Record that the aspect was referenced in that tweet, along with the word that implied it (herein called the aspect-implying word).
5. Run ABSA with the ABSA package on the tweet and the aspect-implying word, and record the sentiment (positive, negative or neutral) towards the main aspect (price, speed, etc.).
6. If multiple words refer to a single aspect, average their sentiments and use the result to assign a single sentiment.
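The final step (averaging the sentiments of several aspect-implying words into one label) is described but not implemented in this notebook. A minimal sketch, assuming a hypothetical encoding of negative/neutral/positive as -1/0/+1 and a hypothetical `neutral_band` cut-off for mapping the average back to a label:

```python
from statistics import mean

# Hypothetical numeric encoding of the three sentiment labels (an assumption,
# not something provided by the ABSA package).
LABEL_TO_SCORE = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def aggregate_sentiment(labels, neutral_band=0.25):
    """Average the sentiments of several aspect-implying words into one label.

    `neutral_band` is a hypothetical cut-off: averages falling within
    [-neutral_band, neutral_band] are treated as neutral.
    """
    if not labels:
        raise ValueError("need at least one sentiment label")
    avg = mean(LABEL_TO_SCORE[label] for label in labels)
    if avg > neutral_band:
        return "positive"
    if avg < -neutral_band:
        return "negative"
    return "neutral"


# e.g. three words implying "speed": two judged negative, one positive
print(aggregate_sentiment(["negative", "negative", "positive"]))  # negative
```

A simple mean treats every aspect-implying word equally; weighting by similarity score would be an alternative design, but the process above only asks for an average.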

### 1. Library Importation

In [66]:

!pip install aspect_based_sentiment_analysis==2.0.2

Collecting aspect_based_sentiment_analysis==2.0.2
  Using cached aspect_based_sentiment_analysis-2.0.2-py3-none-any.whl (35 kB)
Collecting transformers==2.5
  Downloading transformers-2.5.0-py3-none-any.whl (481 kB)
ERROR: Could not find a version that satisfies the requirement tensorflow==2.2 (from aspect-based-sentiment-analysis) (from versions: 2.5.0rc0, 2.5.0rc1, 2.5.0rc2, 2.5.0rc3, 2.5.0, 2.5.1, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.7.0rc0, 2.7.0rc1)
ERROR: No matching distribution found for tensorflow==2.2


In [67]:
!pip show transformers

Name: transformers
Version: 4.11.3
Summary: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
Home-page: https://github.com/huggingface/transformers
Author: Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Sylvain Gugger, Suraj Patil, Stas Bekman, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors
Author-email: thomas@huggingface.co
License: Apache
Location: /Users/koredeakande/opt/anaconda3/envs/jupyterlab/lib/python3.9/site-packages
Requires: filelock, requests, huggingface-hub, sacremoses, tokenizers, packaging, regex, pyyaml, tqdm, numpy
Required-by: aspect-based-sentiment-analysis


In [46]:
import re
import pandas as pd
import numpy as np
import aspect_based_sentiment_analysis as absa
import nltk
#nltk.download('wordnet')
#nltk.download('averaged_perceptron_tagger')
from nltk.corpus import wordnet as wn
from itertools import product
from nltk import pos_tag, RegexpParser

In [65]:
!pip show aspect_based_sentiment_analysis

Name: aspect-based-sentiment-analysis
Version: 2.0.0
Summary: Aspect Based Sentiment Analysis: Transformer & Interpretability (TensorFlow)
Home-page: https://github.com/ScalaConsultants/Aspect-Based-Sentiment-Analysis
Author: Rafal Rolczynski
Author-email: rafal.rolczynski@gmail.com
License: Apache-2.0
Location: /Users/koredeakande/opt/anaconda3/envs/jupyterlab/lib/python3.9/site-packages
Requires: google-cloud-storage, pytest, ipython, spacy, scikit-learn, testfixtures, optuna, transformers, tensorflow
Required-by: 
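The install log shows `aspect_based_sentiment_analysis==2.0.2` requesting `transformers==2.5` and failing on `tensorflow==2.2`, while `pip show` reports version 2.0.0 of the package running against `transformers` 4.11.3. The helper below is a hypothetical sketch that lists which of those pins the current environment does not meet; whether this mismatch explains the `ValueError` seen when loading the model is an assumption, not something this notebook confirms.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the pip log above: aspect_based_sentiment_analysis==2.0.2
# asked for transformers==2.5 and tensorflow==2.2.
EXPECTED = {"transformers": "2.5", "tensorflow": "2.2"}


def find_version_mismatches(expected):
    """Return {package: (installed_or_None, expected)} for every pin that is not met."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Compare on the pinned prefix only, so "2.5.0" satisfies a "2.5" pin.
        if installed is None or not (installed == wanted or installed.startswith(wanted + ".")):
            mismatches[package] = (installed, wanted)
    return mismatches


print(find_version_mismatches(EXPECTED))
```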


### 2. Loading the data

In [47]:
df = pd.read_csv('../data/interim/sample_with_sentiment_cleaned.csv')


In [62]:
from transformers import BertTokenizer

name = 'absa/classifier-lapt-0.2'
model = absa.BertABSClassifier.from_pretrained(name)
tokenizer = BertTokenizer.from_pretrained(name)
professor = absa.Professor()     # Default professor (see the package README).
text_splitter = absa.sentencizer()  # The English CNN model from SpaCy.
nlp = absa.Pipeline(model, tokenizer, professor, text_splitter)

ValueError: The first argument to `Layer.call` must always be passed.

In [49]:
?absa.load

Signature:
absa.load(
    name: str = 'absa/classifier-rest-0.2',
    text_splitter: Callable[[str], List[str]] = None,
    reference_recognizer: aspect_based_sentiment_analysis.aux_models.ReferenceRecognizer = None,
    pattern_recognizer: aspect_based_sentiment_analysis.aux_models.PatternRecognizer = None,
    **model_kwargs,
) -> aspect_base

In [60]:
#Load the ABSA pipeline with a multilingual BERT checkpoint (absa.load defaults to 'absa/classifier-rest-0.2')
nlp = absa.load('bert-base-multilingual-cased')

ValueError: The first argument to `Layer.call` must always be passed.
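Once a pipeline does load, the ABSA step of the process could look like the sketch below. It assumes the calling convention shown in the package README for v2, where `nlp(text, aspects=[...])` returns one result per aspect, in order, each with a `.sentiment` enum member such as `absa.Sentiment.negative`. `aspect_sentiments` and its `matches` argument (aspect mapped to the aspect-implying words found in the tweet) are hypothetical names, and because the pipeline above fails to load, this is untested against the real package.

```python
from collections import defaultdict


def aspect_sentiments(nlp, tweet, matches):
    """Classify the sentiment towards each aspect-implying word in `tweet`.

    `matches` maps aspect -> aspect-implying words found in the tweet.
    Assumes `nlp(text, aspects=[...])` yields one result per requested aspect,
    in the same order, each exposing a `.sentiment` enum member.
    Returns {aspect: [sentiment names]} for aspects with at least one word.
    """
    sentiments = defaultdict(list)
    for aspect, words in matches.items():
        if not words:
            continue
        # One call per aspect, passing every word that implied it.
        results = nlp(tweet, aspects=list(words))
        for result in results:
            sentiments[aspect].append(result.sentiment.name)
    return dict(sentiments)


# Hypothetical usage once `nlp` loads:
# aspect_sentiments(nlp, "the mifi is fast but the price is crazy", {"speed": ["fast"]})
```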

In [4]:
#List aspects determined during the annotation phase
#Note: This might not be exhaustive! But it should cover most cases. It is also subjective!
#Using synonyms of these words instead will likely yield different results
aspects = ['price','speed','reliability','coverage', 'customer service']

In [5]:
#Set to store all seen words
seen_words = set()

#Set to store all aspect-implying words found, to avoid recomputing similarity scores
aspect_implying_words_glob = set()

#Aspect-implying words found for each aspect
aspects_with_implying_words = {aspect: set() for aspect in aspects}

#Similarity threshold
sim_thresh = 0.4

#Iterate through all the tweets
for tweet in df.Text:

    #Split the tweet into words
    text = tweet.split()

    #Tag words with part of speech
    tokens_tag = pos_tag(text)

    #Iterate through all the tagged words
    for word_in_focus, tag in tokens_tag:

        #Skip words that are not nouns, adjectives or adverbs
        if not re.match('NN.?|JJ.?|RB.?', tag):
            continue

        #Skip words whose similarity to the aspects has already been computed
        if word_in_focus in seen_words:
            continue

        #Iterate through all the aspects and compute similarity/relatedness
        for aspect in aspects:

            #Look up the words on WordNet; this returns multiple senses (synsets) of each word
            sem1, sem2 = wn.synsets(aspect), wn.synsets(word_in_focus)

            #Iterate through every pair of senses and keep the max Wu-Palmer similarity.
            #wup_similarity can return None, so those pairs are skipped.
            maxscore = 0
            for i, j in product(sem1, sem2):
                score = i.wup_similarity(j)
                if score is not None and score > maxscore:
                    maxscore = score

            #If the max similarity score seen is greater than the threshold
            if maxscore > sim_thresh:

                #Add the word to the set of all aspect-implying words seen
                aspect_implying_words_glob.add(word_in_focus)

                #Add the word to the set for the relevant aspect
                aspects_with_implying_words[aspect].add(word_in_focus)

        seen_words.add(word_in_focus)
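The loop above stores aspect-implying words globally, but the process also calls for noting which aspects each individual tweet references. A minimal sketch of that per-tweet lookup, using the same whitespace split so stored words match exactly; `referenced_aspects` is a hypothetical helper name:

```python
def referenced_aspects(tweet, aspects_with_implying_words):
    """Return {aspect: sorted aspect-implying words} for one tweet.

    Uses the same whitespace split as the aspect loop, so a word only counts
    if it appears in the tweet exactly as it was stored.
    """
    words = set(tweet.split())
    found = {}
    for aspect, implying in aspects_with_implying_words.items():
        hits = words & implying
        if hits:
            found[aspect] = sorted(hits)
    return found


# Hypothetical usage once the loop has run:
# df['aspects'] = df.Text.apply(lambda t: referenced_aspects(t, aspects_with_implying_words))
```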
        
In [44]:
'k' in set()

False

In [13]:
import re

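In [None]:
for token in tokens_tag:

    #Keep only nouns, adjectives and adverbs (Penn Treebank tags starting with NN, JJ or RB)
    if token[1][:2] in ('NN', 'JJ', 'RB'):
        print(token[0])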

In [43]:
regex_match = re.match('NN.?|JJ.?|RB.?','LPP')
            
if regex_match:
    
    print(regex_match[0])

In [40]:
k = re.match('NN.?|JJ.?|RB.?','L')

In [41]:
k

In [36]:
?re.match

Signature: re.match(pattern, string, flags=0)
Docstring:
Try to apply the pattern at the start of the string, returning
a Match object, or None if no match was found.
File:      ~/opt/anaconda3/envs/jupyterlab/lib/python3.9/re.py
Type:      function

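The exploratory cells above probe how `re.match` treats POS tags: because it only anchors at the start of the string, `'NNPS'` matches through its `'NNP'` prefix, while `'LPP'` and `'L'` return `None`. A small sketch wrapping the filter used in the aspect loop (`is_candidate_tag` is a hypothetical name); note that wh-adverbs tagged `WRB` are not matched by this pattern:

```python
import re

# Same pattern as the aspect loop: nouns, adjectives and adverbs.
POS_PATTERN = 'NN.?|JJ.?|RB.?'


def is_candidate_tag(tag):
    """True for tags starting with NN, JJ or RB (Penn Treebank nouns, adjectives, adverbs)."""
    # re.match anchors only at the start, so 'NNPS' matches via its 'NNP' prefix;
    # re.fullmatch would reject 'NNPS' because 'NN.?' allows just one extra character.
    return re.match(POS_PATTERN, tag) is not None


for tag in ['NN', 'NNPS', 'JJR', 'RBS', 'WRB', 'LPP', 'L', 'VB']:
    print(tag, is_candidate_tag(tag))
```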

In [12]:
from nltk import pos_tag
from nltk import RegexpParser
text ="learn php from guru99 and make study easy".split()
print("After Split:",text)
tokens_tag = pos_tag(text)
print("After Token:",tokens_tag)
patterns= """mychunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}"""
chunker = RegexpParser(patterns)
print("After Regex:",chunker)
output = chunker.parse(tokens_tag)
print("After Chunking",output)

After Split: ['learn', 'php', 'from', 'guru99', 'and', 'make', 'study', 'easy']
After Token: [('learn', 'JJ'), ('php', 'NN'), ('from', 'IN'), ('guru99', 'NN'), ('and', 'CC'), ('make', 'VB'), ('study', 'NN'), ('easy', 'JJ')]
After Regex: chunk.RegexpParser with 1 stages:
RegexpChunkParser with 1 rules:
       <ChunkRule: '<NN.?>*<VBD.?>*<JJ.?>*<CC>?'>
After Chunking (S
  (mychunk learn/JJ)
  (mychunk php/NN)
  from/IN
  (mychunk guru99/NN and/CC)
  make/VB
  (mychunk study/NN easy/JJ))
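Chunking could help keep multi-word aspects such as "customer service" together instead of scoring single words. A sketch that pulls the `mychunk` phrases out of the parse tree, reusing the tagged tokens printed above so no tagger model is needed (this assumes `nltk` is installed, as it is elsewhere in the notebook):

```python
from nltk import RegexpParser

# Tagged tokens copied from the "After Token" output above.
tagged = [('learn', 'JJ'), ('php', 'NN'), ('from', 'IN'), ('guru99', 'NN'),
          ('and', 'CC'), ('make', 'VB'), ('study', 'NN'), ('easy', 'JJ')]

chunker = RegexpParser("""mychunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}""")
tree = chunker.parse(tagged)

# Collect each 'mychunk' subtree as a plain string of its words
# (the root 'S' tree is excluded by the label filter).
phrases = [' '.join(word for word, tag in subtree.leaves())
           for subtree in tree.subtrees(filter=lambda t: t.label() == 'mychunk')]
print(phrases)  # ['learn', 'php', 'guru99 and', 'study easy']
```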


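In [None]:
#Draft chunk rule: a singular noun (NN), cardinal number (CD) or adverb (RB)
patterns = """mychunk:{<NN|CD.?|RB>}"""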

In [6]:
for tweet in df.Text[:3]:
    print(tweet,'\n')

my family used my spectranet and they don't want to help my ministry now it has finished. spectranet_ng abeg how will i change my password. 

spectranet_ng how can i get the freedom mifi in ajah today 

drolufunmilayo iconic_remi spectranet_ng 
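The first tweet shows that `tweet.split()` leaves punctuation attached (`finished.`, `password.`), so WordNet lookups for those tokens will likely find nothing. A hypothetical tokenizer that strips surrounding punctuation while keeping underscores in handles such as `spectranet_ng`; swapping it in for the plain split is a suggestion, not what the aspect loop does:

```python
import string

# All ASCII punctuation except the underscore used in Twitter handles.
PUNCTUATION = string.punctuation.replace('_', '')


def tokenize_tweet(tweet):
    """Whitespace split, then strip leading/trailing punctuation from each token.

    Inner apostrophes (e.g. "don't") are kept; tokens made only of
    punctuation (e.g. "...") are dropped.
    """
    tokens = (token.strip(PUNCTUATION) for token in tweet.split())
    return [token for token in tokens if token]


print(tokenize_tweet("spectranet_ng abeg how will i change my password."))
```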

