# Rule-based aspect-based sentiment analysis for English 😥😀

This Notebook shows you how to perform ABSA on an English corpus through a rule- and lexicon-based approach.

We do this through the following steps:



1.   Extract nouns as aspects using [spaCy's POS-tagger](https://spacy.io/) after cleaning steps such as lemmatizing and removing stopwords.
2.   Extract adjectives, coordinating conjunctions and subordinating conjunctions as opinion words using spaCy's POS-tagger.
3.   Apply [SenticNet](https://sentic.net/) to the opinion words to obtain a sentiment score, and use [NLTK's WordNet synsets](https://www.nltk.org/howto/wordnet.html) to replace negated opinion words with their antonym, so that the opinion score is calculated on the antonym. The texts we are working with include both Italian and English words, so each opinion word is looked up in two lexicons: the English SenticNet and [BabelSenticNet](https://sentic.net/babelsenticnet.pdf) (the multilingual version, which includes Italian). In the code below, BabelSenticNet is queried first and the English SenticNet serves as the fallback.
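
The two-lexicon lookup in step 3 can be sketched without the real resources. The dictionaries below are toy stand-ins: their names (`primary_lexicon`, `fallback_lexicon`), the helper `lookup_polarity` and all scores are assumptions made for illustration, not SenticNet data. Only the control flow (try one lexicon, fall back to the next, default to neutral) mirrors the notebook.

```python
# Toy stand-ins for two sentiment lexicons; the words and scores are made up.
primary_lexicon = {"bello": 0.8, "brutto": -0.7}
fallback_lexicon = {"nice": 0.6, "dull": -0.5}

def lookup_polarity(word, lexicons):
    """Return the score from the first lexicon that knows the word, else 0.0."""
    for lexicon in lexicons:
        try:
            return float(lexicon[word])
        except KeyError:
            continue  # not in this lexicon, try the next one
    return 0.0  # unknown words are treated as neutral

scores = [lookup_polarity(w, [primary_lexicon, fallback_lexicon])
          for w in ["bello", "dull", "table"]]
print(scores)
```

Treating unknown words as neutral keeps the pipeline running on rare or archaic vocabulary, at the cost of pulling averaged scores towards the middle of the scale.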

The Notebook also shows you how to create IOB-labels for the results, and evaluate the extracted spans quantitatively using [Nervaluate](https://pypi.org/project/nervaluate/).

❗🧠 This notebook does **not** show you how to add categories to the aspects. It simply defines aspects through their syntactical function.



# Install and import packages

In [3]:
!pip install senticnet #for sentiment analysis (takes concept)
!pip install urllib3==1.26.15 requests-toolbelt==0.10.1
!pip install nervaluate
!pip install inceptalytics

Collecting senticnet
  Downloading senticnet-1.6-py3-none-any.whl (51.9 MB)
Installing collected packages: senticnet
Successfully installed senticnet-1.6
Collecting urllib3==1.26.15
  Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting requests-toolbelt==0.10.1
  Downloading requests_toolbelt-0.10.1-py2.py3-none-any.whl (54 kB)
Installing collected packages: urllib3, requests-toolbelt
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.7
    Uninstalling urllib3-2.0.7:
      Successfully uninstalled urllib3-2.0.7
Successfully installed requests-toolbelt-0.10.1 urllib3-1.26.15

In [4]:
import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.Defaults.stop_words |= {"chapter","title", "author", "date"} #add corpus-specific stopwords

from senticnet.senticnet import SenticNet
sn = SenticNet() #English
from senticnet.babelsenticnet import BabelSenticNet #= multilingual
bsn = BabelSenticNet('it') #Italian

import statistics
import numpy as np

from nervaluate import Evaluator
from inceptalytics import Project

from spacy.matcher import Matcher
from spacy.tokenizer import Tokenizer
import nltk
from nltk.corpus import wordnet
nltk.download('wordnet')
import string
from sklearn.metrics import classification_report

import re
import ast
import glob
import os

[nltk_data] Downloading package wordnet to /root/nltk_data...


# LOAD DATA 📚
* Load data



In [40]:
# Load in our example texts
!git clone https://github.com/TessDejaeghere/example_data_CLS.git

Cloning into 'example_data_CLS'...
remote: Enumerating objects: 61, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 61 (delta 5), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (61/61), 7.49 MiB | 5.62 MiB/s, done.
Resolving deltas: 100% (5/5), done.


In [41]:
path = "/content/example_data_CLS/"

In [46]:
all_travelogues = []

for filename in glob.glob(f"{path}*/*.txt"):

  name_file = os.path.basename(filename) #find filename
  folder_name = os.path.dirname(filename).split("/")[-1] #find folder name (in our case: the language)

  with open(filename, "r") as travelogue:

    text = travelogue.read()
    travelogue_data = {"file": name_file, "text": text, "language": folder_name}
    all_travelogues.append(travelogue_data)

travel_df = pd.DataFrame(all_travelogues)

In [49]:
df = travel_df[travel_df["language"] == "English"]

In [50]:
df.head()

Unnamed: 0,file,text,language
20,A_Wanderer_in_Venice.txt,Title: A Wanderer in Venice\nAuthor: E. V. Luc...,English
21,A_Wanderer_in_Florence.txt,Title: A Wanderer in Florence\nAuthor: E. V. L...,English
22,Italian_Highways_and_Byways_from_a_Motor_Car.txt,Title: Italian Highways and Byways from a Moto...,English
23,Florence_and_Northern_Tuscany_with_Genoa.txt,Title: Florence and Northern Tuscany with Geno...,English
24,Cathedral_Cities_of_Italy.txt,Title: Cathedral Cities of Italy\nAuthor: Will...,English


In [51]:
len(df)

10

In [52]:
test_text = df.iloc[0]['text']

# EXTRACT ASPECTS 📜
* Extract Nouns, Proper nouns, compound nouns (aspects)
* Extract adjectives, CCONJ, SCONJ (opinion words)
* Extract polarity label based on avg. of opinion words + aspects (multilingual senticnet)

In [4]:
## INPUT: text
## OUTPUT: dictionary with modifiers, chunks, noun_polarity (Sentic), modifier polarity

def output_dictionary(txt):
  #initialize dictionary
  noun_adj_pairs_en = {}
  #create spacy doc element
  doc = nlp(txt)

  for chunk in doc.noun_chunks:
    adj = []
    mod_polarity = []
    comps = []
    noun = ""

    for tok in chunk:

      # run check to see if word = compound. Otherwise the code will limit the words to just one.
      if tok.pos_ in ["NOUN", "PROPN"] and tok.lemma_ not in nlp.Defaults.stop_words and len(tok.lemma_) > 3:  #nouns and proper nouns

        if tok.dep_ == "compound":

          noun = doc[tok.i: tok.head.i + 1].lemma_ #take the compound noun
          noun = noun.rstrip() #remove newline
          comps.append(tok.head.i) #add piece of the compound noun which you added to the index list
          noun_polarity = return_polarity_scores(noun)

        else:
          if tok.i not in comps: #if the word is not already added to the index list (because it is part of a compound noun)
            noun = tok.lemma_
            noun = noun.rstrip()
            noun_polarity = return_polarity_scores(noun)


      if tok.pos_ == "ADJ" or tok.pos_ == "CCONJ" or tok.pos_ == "SCONJ": #adjectives, coordinating conjunction, subordinating conjunction
        if tok.text != "and": #don't add "and" (coordinating conjunction word)
          adj.append(tok.text)
          modifier_polarity = return_polarity_scores(tok.text)
          mod_polarity.append(modifier_polarity) #add modifier polarity score to list

    if noun:
        noun_adj_pairs_en.update({noun: {"modifiers": adj, "chunk": chunk, "noun_polarity": noun_polarity, "modifier_polarity": mod_polarity}}) #add all modifiers and chunks, noun polarity and modifier polarity


  return noun_adj_pairs_en

In [19]:
def return_polarity_scores(word):
  try:
    polarity_value = bsn.polarity_value(word) #try to find the word in the multilingual senticnet
  except KeyError:
    try:
      polarity_value = sn.polarity_value(word) #try to find the word in the English senticnet
    except KeyError:
      polarity_value = 0 #if not found, return 0 (neutral)
  return float(polarity_value)

In [20]:
#normalize scores with sigmoid in 0-1 range
def sig(x):
 return 1/(1 + np.exp(-x))

In [21]:
def polarity_label(score): #add polarity label based on Senticnet score

  if score <= 0.20:
    return 1
  elif score > 0.20 and score <= 0.40:
    return 2
  elif score > 0.40 and score <= 0.60:
    return 3
  elif score > 0.60 and score <= 0.80:
    return 4
  elif score > 0.80 and score <= 1:
    return 5
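
One consequence of combining `sig` with these thresholds is worth checking by hand. Assuming polarity values stay between -1 and 1 (the range the notebook attributes to SenticNet), the sigmoid maps them into roughly 0.27 to 0.73, so the bins for labels 1 and 5 are never reached. The sketch below uses `math.exp` instead of NumPy and a compact re-implementation of the thresholds (`sigmoid` and `to_label` are illustrative names); it is a check, not a replacement for the cells above.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def to_label(score):
    # Same cut-offs as polarity_label above: 0.2, 0.4, 0.6, 0.8, 1.0
    for label, upper in enumerate([0.2, 0.4, 0.6, 0.8, 1.0], start=1):
        if score <= upper:
            return label

lowest, highest = sigmoid(-1.0), sigmoid(1.0)
print(round(lowest, 3), round(highest, 3))

# Labels reachable for polarity values between -1 and 1 (step 0.01)
reachable = sorted({to_label(sigmoid(i / 100)) for i in range(-100, 101)})
print(reachable)
```

If labels 1 and 5 are needed, one option would be a linear rescaling such as `(x + 1) / 2`, which spreads the -1 to 1 range over the full 0 to 1 interval; this is a suggestion, not something the notebook does.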

In [22]:
def add_mean_polarity_score(noun_adj_pairs):
  for k, v in noun_adj_pairs.items():
    if v['modifier_polarity']:
      #take the mean polarity scores of the modifiers + push in a 0-1 range w/ sigmoid
      #because the range of Sentic is -1 : 1
      mean_pol = sig(statistics.mean([float(x) for x in v["modifier_polarity"]]))
      label = polarity_label(mean_pol)

      noun_adj_pairs[k]["mean_polarity"] = mean_pol
      noun_adj_pairs[k]["polarity_label"] = label #add a polarity label according to the gold standard annotations

  return noun_adj_pairs

In [14]:
noun_adj_pairs_en = output_dictionary(test_text)
result = add_mean_polarity_score(noun_adj_pairs_en)

In [16]:
#Let's check the results for the word "town" in our corpus
result['town']

{'modifiers': ['small', 'provincial'],
 'chunk': a small provincial town,
 'noun_polarity': 0.0,
 'modifier_polarity': [0.0, -0.89],
 'mean_polarity': 0.3905502163716748,
 'polarity_label': 2}
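
The `mean_polarity` value in this output can be reproduced from the two modifier scores, which is a quick way to sanity-check the pipeline. The snippet only uses the standard library and the numbers shown in the output above.

```python
import math
import statistics

modifier_polarity = [0.0, -0.89]  # scores for 'small' and 'provincial' from the output above

mean_raw = statistics.mean(modifier_polarity)   # average in the original -1..1 range
mean_polarity = 1 / (1 + math.exp(-mean_raw))   # sigmoid, as in sig()
print(mean_raw, mean_polarity)
```

The result, about 0.39, falls in the bin above 0.20 and up to 0.40, which is why the chunk receives polarity label 2.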

## ASPECTS: extract aspects & transform to IOB

This section applies the following steps:

*   Load a partition of our gold standard dataset ("gs_aspects.csv").
*   Extract noun chunks that contain a noun or a proper noun.
*   Transform the chunks to IOB-labels (B-aspect, I-aspect, O) and add the results to a new column.
*   Evaluate the results on the gold standard dataset.
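
The IOB step can be illustrated independently of spaCy. In this minimal sketch the token spans are written by hand rather than taken from `doc.noun_chunks`, and `spans_to_iob` is a hypothetical helper name, not a function used elsewhere in this notebook.

```python
def spans_to_iob(n_tokens, spans, tag="aspect"):
    """Turn (start, end) token spans (end exclusive) into a list of IOB labels."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = f"B-{tag}"           # first token of the span
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"           # remaining tokens of the span
    return labels

tokens = ["the", "utterances", "of", "a", "tired", "race", "."]
chunk_spans = [(0, 2), (3, 6)]  # "the utterances", "a tired race"
print(list(zip(tokens, spans_to_iob(len(tokens), chunk_spans))))
```

Because every token receives exactly one label, the resulting list lines up one-to-one with the whitespace tokens of the gold standard, which is what the evaluation step relies on.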



In [56]:
gold = pd.read_csv("example_data_CLS/gs_aspects.csv")

In [57]:
test_sentence = gold.iloc[0]["sentence"]

In [53]:
#add rule to tokenizer to tokenize text based on whitespace to match the gold standard tokens
nlp.tokenizer = Tokenizer(nlp.vocab, token_match=re.compile(r'\S+').match)

In [54]:
def get_aspect_labels(txt):

  doc = nlp(txt)
  chunks = [chunk for chunk in doc.noun_chunks for tok in chunk if tok.pos_ in ["NOUN", "PROPN"] and tok.lemma_ not in nlp.Defaults.stop_words] #return chunk if it contains a noun/proper noun and is not made up of stop words

  tokens = ["O" for tok in doc]

  for chunk in chunks:
    len_chunk = len([tok for tok in chunk])

    if len_chunk > 1: #if the chunk has more than 1 token
      indices = [tok.i for tok in chunk] #indices of the chunk tokens
      tokens[indices[0]] = "B-aspect" #the first element of the chunk = B-aspect

      for ind in indices[1::]:
        tokens[ind] = "I-aspect" #the other elements of the chunk = I-aspect

    else: #if the chunk just has one element, = B-aspect
      indices = [tok.i for tok in chunk]
      tokens[indices[0]] = "B-aspect"

  return tokens


In [22]:
def get_tokens(txt):
  doc = nlp(txt) #create the spaCy doc here instead of relying on a global variable
  return [tok for tok in doc]

In [23]:
def get_chunks(txt):
  doc = nlp(txt)
  #keep chunks that contain a non-stopword noun/proper noun
  #note: a chunk is listed once per matching token, so chunks with several nouns appear more than once
  chunks = [chunk for chunk in doc.noun_chunks for tok in chunk if tok.pos_ in ["NOUN", "PROPN"] and tok.lemma_ not in nlp.Defaults.stop_words]

  return chunks

In [58]:
gold["chunks"] = gold["sentence"].apply(get_chunks)

In [59]:
gold

Unnamed: 0,words,sentence,labels_y,chunks
0,"[""''"", 'CHAPTER', 'III', 'THE', 'CUMÆAN', 'SIB...",'' CHAPTER III THE CUMÆAN SIBYL A part of the ...,"['O', 'O', 'O', 'O', 'B-aspect', 'B-aspect', '...","[(CHAPTER, III, THE, CUMÆAN), (CHAPTER, III, T..."
1,"[""''"", 'This', ',', 'and', '``', 'May', 'God',...","'' This , and `` May God forgive the sins of y...","['O', 'O', 'O', 'O', 'O', 'O', 'B-aspect', 'O'...","[(God), (the, sins), (your, father), (mother)]"
2,"[""''"", 'are', 'surely', 'the', 'utterances', '...",'' are surely the utterances of a tired race .,"['O', 'O', 'O', 'O', 'O', 'O', 'B-aspect', 'I-...","[(the, utterances), (a, tired, race)]"
3,"['A', 'Belgian', ',', 'M.']","A Belgian , M.","['O', 'O', 'O', 'B-aspect']","[(A, Belgian, ,, M.)]"
4,"['A', '``', 'third', 'edition', ',', 'revised'...","A `` third edition , revised and enlarged , ''...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ...","[(A, ``, third, edition), (16mo)]"
...,...,...,...,...
2174,"['my', 'journey', 'from', 'masuah', 'to', 'gon...",my journey from masuah to gondar transactions ...,"['O', 'O', 'O', 'B-aspect', 'O', 'B-aspect', '...","[(my, journey), (masuah), (transactions), (man..."
2175,"['preserved', ']', 'the', 'spirit', 'of', 'the...",preserved ] the spirit of the Indians and kept...,"['O', 'O', 'O', 'O', 'O', 'O', 'B-aspect', 'O'...","[(the, spirit), (the, Indians), (their, minds)..."
2176,"['the', 'Zibib', '!']",the Zibib !,"['O', 'B-aspect', 'O']","[(the, Zibib)]"
2177,"['then', ',', 'in', 'his', 'own', 'language', ...","then , in his own language of Tigre , he afked...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-aspect'...","[(his, own, language), (Tigre), (the, foldiers..."


In [60]:
gold["predicted_label"] = gold["sentence"].apply(get_aspect_labels)

In [68]:
gold["labels_y"] = gold.labels_y.apply(lambda x: ast.literal_eval(x))

In [86]:
gold.head()

Unnamed: 0,words,sentence,labels_y,chunks,predicted_label
0,"[""''"", 'CHAPTER', 'III', 'THE', 'CUMÆAN', 'SIB...",'' CHAPTER III THE CUMÆAN SIBYL A part of the ...,"[O, O, O, O, B-aspect, B-aspect, O, O, O, B-as...","[(CHAPTER, III, THE, CUMÆAN), (CHAPTER, III, T...","[O, B-aspect, I-aspect, I-aspect, I-aspect, O,..."
1,"[""''"", 'This', ',', 'and', '``', 'May', 'God',...","'' This , and `` May God forgive the sins of y...","[O, O, O, O, O, O, B-aspect, O, O, O, O, O, O,...","[(God), (the, sins), (your, father), (mother)]","[O, O, O, O, O, O, B-aspect, O, B-aspect, I-as..."
2,"[""''"", 'are', 'surely', 'the', 'utterances', '...",'' are surely the utterances of a tired race .,"[O, O, O, O, O, O, B-aspect, I-aspect, I-aspec...","[(the, utterances), (a, tired, race)]","[O, O, O, B-aspect, I-aspect, O, B-aspect, I-a..."
3,"['A', 'Belgian', ',', 'M.']","A Belgian , M.","[O, O, O, B-aspect]","[(A, Belgian, ,, M.)]","[B-aspect, I-aspect, I-aspect, I-aspect]"
4,"['A', '``', 'third', 'edition', ',', 'revised'...","A `` third edition , revised and enlarged , ''...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[(A, ``, third, edition), (16mo)]","[B-aspect, I-aspect, I-aspect, I-aspect, O, O,..."


## Evaluate aspect extraction with gold standard

In [87]:
true = gold["labels_y"].to_list()
predicted = gold["predicted_label"].to_list()

In [88]:
evaluator = Evaluator(true, predicted, tags=['aspect'], loader="list")

results, results_by_tag = evaluator.evaluate()

In [89]:
results

{'ent_type': {'correct': 6400,
  'incorrect': 0,
  'partial': 0,
  'missed': 1213,
  'spurious': 8372,
  'possible': 7613,
  'actual': 14772,
  'precision': 0.4332520985648524,
  'recall': 0.8406672796532247,
  'f1': 0.57181148090239},
 'partial': {'correct': 2246,
  'incorrect': 0,
  'partial': 4154,
  'missed': 1213,
  'spurious': 8372,
  'possible': 7613,
  'actual': 14772,
  'precision': 0.29264825345247764,
  'recall': 0.5678444765532642,
  'f1': 0.38624078624078617},
 'strict': {'correct': 2246,
  'incorrect': 4154,
  'partial': 0,
  'missed': 1213,
  'spurious': 8372,
  'possible': 7613,
  'actual': 14772,
  'precision': 0.1520444083401029,
  'recall': 0.29502167345330355,
  'f1': 0.2006700915791825},
 'exact': {'correct': 2246,
  'incorrect': 4154,
  'partial': 0,
  'missed': 1213,
  'spurious': 8372,
  'possible': 7613,
  'actual': 14772,
  'precision': 0.1520444083401029,
  'recall': 0.29502167345330355,
  'f1': 0.2006700915791825}}
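
The precision, recall and F1 values above follow directly from the counts. The formulas in this sketch are inferred from the reported numbers (precision divides by `actual`, recall by `possible`, and a partial match counts for half in the `partial` scheme) rather than taken from the library's source, and `prf` is a hypothetical helper name.

```python
def prf(correct, partial, actual, possible, partial_weight=0.0):
    """Precision, recall and F1 from span counts; partial matches count for partial_weight."""
    hits = correct + partial_weight * partial
    precision = hits / actual      # share of predicted spans that are right
    recall = hits / possible       # share of gold spans that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts copied from the results above
ent_type = prf(correct=6400, partial=0, actual=14772, possible=7613)
partial = prf(correct=2246, partial=4154, actual=14772, possible=7613, partial_weight=0.5)
strict = prf(correct=2246, partial=0, actual=14772, possible=7613)

for name, scores in [("ent_type", ent_type), ("partial", partial), ("strict", strict)]:
    print(name, [round(s, 4) for s in scores])
```

Read this way, the gap between `ent_type` and `strict` shows that most errors are boundary errors: 4154 predicted spans overlap a gold aspect without matching its exact extent, and 8372 predicted spans have no gold counterpart at all.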

# EXTRACT OPINION WORDS 💬



# Extract opinion words & evaluate opinion extraction on gold standard data 🔎

To perform the opinion word extraction, we take the following steps:

1.   Extract opinion words from noun phrases. Opinion words are defined as adjectives, coordinating conjunctions and subordinating conjunctions.
2.   Add patterns for predicative constructions: AUX + (negation) + (ADV) + ADJ, as in "the house is (not) (very) beautiful".
3.   Take negations into account: if the adjective is negated, we retrieve the antonym through [WordNet's synsets](https://www.nltk.org/howto/wordnet.html).
4.   Output a column with IOB-labels (B-opinion, I-opinion, O).
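
The predicative pattern from step 2 can be tried out without a parser. The notebook itself uses spaCy's `Matcher`; the sketch below swaps in a plain regular expression over hand-tagged tokens, so the tags, the single-character encoding and the helper `encode` are all assumptions made for illustration.

```python
import re

# Hand-tagged tokens: (text, POS, dependency label). No parser is run here.
tagged = [("the", "DET", "det"), ("house", "NOUN", "nsubj"), ("is", "AUX", "ROOT"),
          ("not", "PART", "neg"), ("very", "ADV", "advmod"), ("nice", "ADJ", "acomp")]

def encode(pos, dep):
    """Map each token to one character so a regex can scan the tag sequence."""
    if dep == "neg":
        return "N"
    return {"AUX": "X", "ADV": "V", "ADJ": "J"}.get(pos, "o")

codes = "".join(encode(pos, dep) for _, pos, dep in tagged)

labels = ["O"] * len(tagged)
for match in re.finditer(r"XN*V*J", codes):      # AUX (neg)* (ADV)* ADJ
    start, end = match.start() + 1, match.end()  # skip the auxiliary itself
    labels[start] = "B-opinion"
    for i in range(start + 1, end):
        labels[i] = "I-opinion"

print(list(zip([t for t, _, _ in tagged], labels)))
```

As in `match_auxiliary_phrases`, the auxiliary is dropped from the span, so the opinion starts at the negation or adverb when one is present and at the adjective otherwise.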

In [14]:
gold_opinion = pd.read_csv("example_data_CLS/annos_en_sentwords.csv")

In [5]:
matcher = Matcher(nlp.vocab)

In [6]:
pattern = [ [[{"POS": "AUX"}, {"DEP": "neg", "OP": "*"}, {"POS": "ADV", "OP": "*"}, {"POS": "ADJ"}]] ] #is + (not) + (very) + nice
pattern_neg_conj =  [ [[{"POS": "AUX", "OP": "{0}"}, {"POS": "ADV", "OP": "{0}"}, {"POS": "ADJ", "OP": "{0}"}, {"POS": "CCONJ"},  {"DEP": "neg"}, {"POS": "ADJ"}]]  ] # [is + (very) + nice]DONOTMATCH + but + not + warm

matcher.add("aux_adv_adj", pattern[0])
matcher.add("negations_cconj", pattern_neg_conj[0])

In [7]:
def match_auxiliary_phrases(doc):
  spans = []

  matches = matcher(doc)

  for match_id, start, end in matches:
    span = [x for x in range(start,end)]
    spans.append(span[1::]) #auxiliary verb 'is' or "cconj" "but"/... = not important

  return spans

In [8]:
# Fetch adjectives in noun chunks

def opinion_extractor(txt):
  doc = nlp(txt)
  tokens = ["O" for tok in doc] #initialize token list, length of list = all "O"s


  ### AUXILIARY PHRASES ###

  #fetch auxiliary constructions (the house *is very nice*)
  auxiliary_spans = match_auxiliary_phrases(doc) #get indices of spans auxiliary sentences
  for span in auxiliary_spans:
    tokens[span[0]] = "B-opinion"
    for span_ind in span[1::]:
      tokens[span_ind] = "I-opinion"


  ### NOUN CHUNKS ###

  #collect adjectives and conjunctions inside noun chunks, skipping compounds and stop words
  all_modifier_indices = []
  for chunk in doc.noun_chunks:
    #print(chunk)

    for tok in chunk:
      if tok.dep_ == "compound":
        continue
      elif tok.pos_ in ["ADJ", "CCONJ", "SCONJ"] and tok.lemma_ not in nlp.Defaults.stop_words:
        modifier_indices = []
        modifier_index = tok.i
        modifier_indices.append(modifier_index) #save index of adjectives looped over



        #fetch intensifiers by navigating children (adapted from https://towardsdatascience.com/aspect-based-sentiment-analysis-using-spacy-textblob-4c8de3e0d2b9)
        for child in tok.children:
          if child.pos_ == "ADV":
            intensifier_index = child.i
            modifier_indices.append(intensifier_index)
          elif child.dep_ == "neg": #fetch negations to account for negations
            intensifier_index = child.i
            modifier_indices.append(intensifier_index)

        all_modifier_indices.append(sorted(modifier_indices))


  for mod_pair in all_modifier_indices:
    if len(mod_pair) > 1:
      tokens[mod_pair[0]] = "B-opinion"
      for opinion_id in mod_pair[1::]:
        tokens[opinion_id] = "I-opinion"

    else:
      tokens[mod_pair[0]] = "B-opinion"

  return tokens

In [11]:
gold_opinion

Unnamed: 0,source_file,text,_sentence_text,annotation
0,GB-117_sample_English_18.txt,moderate,"In the P.M. had a moderate breeze at East , wh...",3
1,GB-117_sample_English_18.txt,fair,At Midnight the wind came to South-South-West ...,4
2,GB-117_sample_English_18.txt,fresh,Cloudy weather ; Winds at South-West and South...,4
3,GB-117_sample_English_18.txt,with which we made our Course good,Cloudy weather ; Winds at South-West and South...,4
4,GB-117_sample_English_18.txt,steady brisk,Had a steady brisk Gale at South-South-West wi...,4
...,...,...,...,...
1431,GB-78_sample_English_19.txt,with much difficulty prevailed,Soon after with intention to reduce the vast c...,2
1432,GB-78_sample_English_19.txt,perpetually thronged,The house which conducted the Business at Niag...,2
1433,GB-78_sample_English_19.txt,attended with vast trouble,The number of prisoners thrown upon Colonel Jo...,2
1434,GB-78_sample_English_19.txt,accused of fraud,The merchants have since been accused of fraud...,2


In [12]:
gold_opinion["opnion_labels"] = gold_opinion["_sentence_text"].apply(opinion_extractor)

In [13]:
gold_opinion.head()

Unnamed: 0,source_file,text,_sentence_text,annotation,opnion_labels
0,GB-117_sample_English_18.txt,moderate,"In the P.M. had a moderate breeze at East , wh...",3,"[O, O, O, O, O, B-opinion, O, O, O, O, O, O, O..."
1,GB-117_sample_English_18.txt,fair,At Midnight the wind came to South-South-West ...,4,"[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."
2,GB-117_sample_English_18.txt,fresh,Cloudy weather ; Winds at South-West and South...,4,"[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."
3,GB-117_sample_English_18.txt,with which we made our Course good,Cloudy weather ; Winds at South-West and South...,4,"[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ..."
4,GB-117_sample_English_18.txt,steady brisk,Had a steady brisk Gale at South-South-West wi...,4,"[O, O, B-opinion, B-opinion, O, O, O, O, O, O,..."


# EXTRACT SENTIMENT SCORE THROUGH SENTICNET 😍
1. Apply the SenticNet approach to the opinion words of the gold standard data.
2. If the sentiment word is negated, fetch its antonym through WordNet synsets and apply SenticNet to the antonym. If no antonym is found, the polarity label of the word is inverted instead.
3. Evaluate on our gold standard data.
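
The negation handling in step 2 boils down to a small decision rule. The sketch below uses a toy antonym table in place of WordNet; `toy_antonyms`, `invert_label` and `resolve_negation` are hypothetical names, and the entries are illustrative only.

```python
# Toy antonym table standing in for WordNet; entries are illustrative only.
toy_antonyms = {"good": "bad", "warm": "cold"}

def invert_label(label):
    """Mirror a 1-5 polarity label around the neutral label 3."""
    return 6 - label

def resolve_negation(word, label):
    """Return (word_to_score, label) for a negated word.

    If an antonym is known, score the antonym instead; otherwise keep the
    word and flip its label.
    """
    antonym = toy_antonyms.get(word)
    if antonym is not None:
        return antonym, None  # label must be recomputed from the antonym's score
    return word, invert_label(label)

print(resolve_negation("good", 4))
print(resolve_negation("picturesque", 4))
```

Mirroring around 3 gives the same mapping as a lookup table that sends 1 to 5, 2 to 4, 3 to 3, 4 to 2 and 5 to 1.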

In [15]:
#when we find a negation, we want to apply the scoring to the antonym of the word in question
def fetch_antonym(word):
  antonyms = []

  for syn in wordnet.synsets(word):
    for lemma in syn.lemmas():
      if lemma.antonyms():
        antonyms.append(lemma.antonyms()[0].name())

  #return the first antonym found, or False if WordNet lists none
  return antonyms[0] if antonyms else False

In [16]:
inverses = {"1":5, "2":4, "3":3, "4": 2, "5": 1}

In [34]:
# Apply sentic scorer to all of the words in the gold standard data
# if negation: turn word into antonym using wordnet OR swap the scores
# return mean

def sentiment_scorer(text):
  doc = nlp(text)

  text = [tok.lemma_ for tok in doc if not tok.is_punct]


  ## Convert negations to antonyms ###

  negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
  negation_head_tokens = [(token.head, token.head.i) for token in negation_tokens] #get dependency head (= negated word)


  inverse_scores = []
  for w in negation_head_tokens:
    word = str(w[0])
    ind = w[1] #index of token

    antonym = fetch_antonym(word)

    if not antonym: #if an antonym cannot be found, find the opposite label

      word_label = polarity_label(sig(return_polarity_scores(word)))
      print(word_label)
      inverse_score = [*map(inverses.get, str(word_label))][0]
      inverse_scores.append(inverse_score)
      text[ind] = "o"


      continue

    else:
      print(text)
      ind = -1 #note: this overwrites the last token of the snippet with the antonym
      text[ind] = antonym

  if len(negation_tokens) > 0:
    text_negless = [str(tok) for tok in text if tok not in [str(t) for t in negation_tokens]] #remove negation tokens from string if they're present
  else:
    text_negless = [str(tok) for tok in text] #if there are no negations present, just return every token from the text


  ### Collect polarity scores for all words in list ###

  pol_scores = [return_polarity_scores(str(word)) for word in text_negless]


  #append inverse score for when antonyms aren't found
  if len(inverse_scores) > 0:
    for inv_score in inverse_scores:
      pol_scores.append(inv_score)


  # Return the mean score of the collected scores across all the words in the snippet
  score = round(sig(statistics.mean(pol_scores)), 2) #round the score to 2 decimal places
  label = polarity_label(score)

  print(score, label)
  return label

In [97]:
sentiment_scorer("pride")

['pride']
['pride']
0.66 4


4

In [35]:
gold_opinion["sentiment_predictions"] = gold_opinion["text"].apply(sentiment_scorer)

Streaming output truncated to the last 5000 lines.
['fine', 'feathery', 'crystal']
[]
['fine', 'feathery', 'crystal']
0.45 3
['with', 'no', 'surface', 'crust']
[]
['with', 'no', 'surface', 'crust']
0.51 3
['strong', 'enough', 'to', 'carry', 'the', 'bodily', 'weight']
[]
['strong', 'enough', 'to', 'carry', 'the', 'bodily', 'weight']
0.51 3
['in', 'coarse', 'crystal']
[]
['in', 'coarse', 'crystal']
0.5 3
['old']
[]
['old']
0.31 2
['recent']
[]
['recent']
0.71 4
['continuous']
[]
['continuous']
0.44 3
['a', 'paradox', 'and', 'a', 'puzzle']
[]
['a', 'paradox', 'and', 'a', 'puzzle']
0.55 3
['actual']
[]
['actual']
0.55 3
['the', 'naive', 'and', 'sincere', 'interest']
[]
['the', 'naive', 'and', 'sincere', 'interest']
0.56 3
['mysterious']
[]
['mysterious']
0.71 4
['rise']
[]
['rise']
0.59 3
['motionless', 'little']
[]
['motionless', 'little']
0.5 3
['few', 'high']
[]
['few', 'high']
0.51 3
['also', 'animate']
[]
['also', 'animate']
0.51 3
['its', 'softness']
[]
['its', 'soft

In [36]:
gold_opinion.head()

Unnamed: 0,source_file,text,_sentence_text,annotation,sentiment_predictions
0,GB-117_sample_English_18.txt,moderate,"In the P.M. had a moderate breeze at East , wh...",3,3
1,GB-117_sample_English_18.txt,fair,At Midnight the wind came to South-South-West ...,4,3
2,GB-117_sample_English_18.txt,fresh,Cloudy weather ; Winds at South-West and South...,4,4
3,GB-117_sample_English_18.txt,with which we made our Course good,Cloudy weather ; Winds at South-West and South...,4,3
4,GB-117_sample_English_18.txt,steady brisk,Had a steady brisk Gale at South-South-West wi...,4,4


## Evaluation of sentiment scores (gold standard)

In [37]:
gold_opinion["sentiment_predictions"] = pd.to_numeric(gold_opinion["sentiment_predictions"])
gold_opinion["annotation"] = pd.to_numeric(gold_opinion["annotation"])

In [38]:
true = list(gold_opinion["annotation"])
pred = list(gold_opinion["sentiment_predictions"])

In [39]:
print(classification_report(true, pred))
#system doesn't find any 1s and 5s

              precision    recall  f1-score   support

           1       0.00      0.00      0.00        88
           2       0.49      0.19      0.28       366
           3       0.14      0.64      0.22       200
           4       0.69      0.34      0.46       686
           5       0.00      0.00      0.00        96

    accuracy                           0.30      1436
   macro avg       0.26      0.24      0.19      1436
weighted avg       0.47      0.30      0.32      1436
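
The two averaged F1 rows can be recomputed from the per-class rows, which clarifies what each average means. The inputs below are the rounded values printed in the report, so the results only agree with the report to two decimals.

```python
# Per-class F1 and support copied from the report above
f1 = {1: 0.00, 2: 0.28, 3: 0.22, 4: 0.46, 5: 0.00}
support = {1: 88, 2: 366, 3: 200, 4: 686, 5: 96}

macro_f1 = sum(f1.values()) / len(f1)                                # every class counts equally
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())  # weighted by class size
print(round(macro_f1, 2), round(weighted_f1, 2))
```

The macro average is lower than the weighted one because classes 1 and 5, which are never predicted, pull it down as much as the much larger class 4 lifts it.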





# Interesting sources 🎓

**Sources on rule-based aspect extraction**

*   Deon Mai and Wei Emma Zhang. 2020. Aspect Extraction Using Coreference Resolution and Unsupervised Filtering. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 124–129, Suzhou, China. Association for Computational Linguistics.
*   Anwar, Muchamad & Trisanto, Dedy & Juniar, Ahmad & Sase, Fitra. (2023). Aspect-based Sentiment Analysis on Car Reviews Using SpaCy Dependency Parsing and VADER. Advance Sustainable Science Engineering and Technology. 5. 0230109. 10.26877/asset.v5i1.14897.
*   https://www.digitalhumanities.org/dhq/vol/17/2/000691/000691.html#elkins2022
*   https://www.enjoyalgorithms.com/blog/aspect-base-sentiment-analysis-in-python



