## Muthu Palaniappan M 21011101079 - Token Level Sentiment Analysis

### WordNet
- WordNet is a lexical database composing English words, grouped as synonyms into what is known as synsets.
- It is a freely available tool, which can be downloaded from its official website.
- While WordNet can be loosely termed as a Thesaurus, it is said to be more semantically accurate, since it stores synonyms of words put together is specific contexts.
- All the words are linked together by the ISA relationship (more commonly, Generalisation). For example, a car is a type of vehicle, just as a truck.

## SentiWordNet
- SentiWordNet operates on the database provided by WordNet.
- The additional functionality that it provides is the measure of positivity, negativity or neutrality as is required for Sentiment Analysis.
- every synset s is associated with a Pos(s): a positivity score Neg(s): a negativity score Obj(s): an objectivity (neutrality) score
- Pos(s) + Neg(s) + Obj(s) = 1

## Importing Pacakges

In [24]:
import nltk
nltk.download('sentiwordnet')
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.tag import pos_tag
from nltk.stem import WordNetLemmatizer
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package sentiwordnet to /root/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

https://www.nltk.org/_modules/nltk/corpus/reader/sentiwordnet.html

In [16]:
list(swn.senti_synsets('slow'))

[SentiSynset('decelerate.v.01'),
 SentiSynset('slow.v.02'),
 SentiSynset('slow.v.03'),
 SentiSynset('slow.a.01'),
 SentiSynset('slow.a.02'),
 SentiSynset('dense.s.04'),
 SentiSynset('slow.a.04'),
 SentiSynset('boring.s.01'),
 SentiSynset('dull.s.08'),
 SentiSynset('slowly.r.01'),
 SentiSynset('behind.r.03')]

In [45]:
sentence='The weather is beautiful today and I am feeling happy, but the news is depressing.'

In [46]:
token = nltk.word_tokenize(sentence)
after_tagging = nltk.pos_tag(token)

In [47]:
print(f"Tokens from the Text: {token}")
print("===================================")
print(f"Tags from the tokens{after_tagging}")

Tokens from the Text: ['The', 'weather', 'is', 'beautiful', 'today', 'and', 'I', 'am', 'feeling', 'happy', ',', 'but', 'the', 'news', 'is', 'depressing', '.']
Tags from the tokens[('The', 'DT'), ('weather', 'NN'), ('is', 'VBZ'), ('beautiful', 'JJ'), ('today', 'NN'), ('and', 'CC'), ('I', 'PRP'), ('am', 'VBP'), ('feeling', 'VBG'), ('happy', 'JJ'), (',', ','), ('but', 'CC'), ('the', 'DT'), ('news', 'NN'), ('is', 'VBZ'), ('depressing', 'VBG'), ('.', '.')]


In [48]:
def penn_to_wn(tag):
    """
    Convert between the PennTreebank tags to simple Wordnet tags
    """
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

In [49]:
sentiment = 0
tokens_count = 0
lemmatizer = WordNetLemmatizer()

In [55]:
for word, tag in after_tagging:
  wn_tag = penn_to_wn(tag)
  if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
    continue

  lemma = lemmatizer.lemmatize(word, pos=wn_tag)
  if not lemma:
    continue

  synsets = wn.synsets(lemma, pos=wn_tag)
  if not synsets:
    continue

  # Take the first sense, the most common
  synset = synsets[0]
  swn_synset = swn.senti_synset(synset.name())
  print(swn_synset)

  sentiment += swn_synset.pos_score() - swn_synset.neg_score()
  tokens_count += 1

<weather.n.01: PosScore=0.0 NegScore=0.0>
<beautiful.a.01: PosScore=0.75 NegScore=0.0>
<today.n.01: PosScore=0.125 NegScore=0.0>
<happy.a.01: PosScore=0.875 NegScore=0.0>
<news.n.01: PosScore=0.0 NegScore=0.0>


In [51]:
print(sentiment)

1.75


In [60]:
!pip install datasets
from IPython.display import clear_output
clear_output()

In [58]:
from datasets import load_dataset
dataset = load_dataset("sentiment140")

Downloading data:   0%|          | 0.00/124M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/46.1k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1600000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/498 [00:00<?, ? examples/s]

In [124]:
data = dataset['train'].to_pandas()
data.drop(columns = ['date','user','query'],inplace=True)
data.head()

Unnamed: 0,text,sentiment
0,"@switchfoot http://twitpic.com/2y1zl - Awww, t...",0
1,is upset that he can't update his Facebook by ...,0
2,@Kenichan I dived many times for the ball. Man...,0
3,my whole body feels itchy and like its on fire,0
4,"@nationwideclass no, it's not behaving at all....",0


In [125]:
import re

In [126]:
def clean(text):
# Removes all special characters and numericals leaving the alphabets
    text = re.sub('[^A-Za-z]+', ' ', text)
    return text

In [127]:
data['text'] = data['text'].apply(clean)
data.head()

Unnamed: 0,text,sentiment
0,switchfoot http twitpic com y zl Awww that s ...,0
1,is upset that he can t update his Facebook by ...,0
2,Kenichan I dived many times for the ball Mana...,0
3,my whole body feels itchy and like its on fire,0
4,nationwideclass no it s not behaving at all i...,0


In [128]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk import pos_tag
nltk.download('stopwords')
from nltk.corpus import stopwords
nltk.download('wordnet')
from nltk.corpus import wordnet

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [129]:
data = data[:500]

In [130]:
pos_dict = {'J':wordnet.ADJ, 'V':wordnet.VERB, 'N':wordnet.NOUN, 'R':wordnet.ADV}

def token_stop_pos(text):
    tags = pos_tag(word_tokenize(text))
    newlist = []
    for word, tag in tags:
        if word.lower() not in set(stopwords.words('english')):
          newlist.append(tuple([word, pos_dict.get(tag[0])]))
    return newlist

In [131]:
data['POS tagged'] = data['text'].apply(token_stop_pos)
data.head()

Unnamed: 0,text,sentiment,POS tagged
0,switchfoot http twitpic com y zl Awww that s ...,0,"[(switchfoot, n), (http, n), (twitpic, n), (co..."
1,is upset that he can t update his Facebook by ...,0,"[(upset, a), (update, v), (Facebook, n), (text..."
2,Kenichan I dived many times for the ball Mana...,0,"[(Kenichan, n), (dived, v), (many, a), (times,..."
3,my whole body feels itchy and like its on fire,0,"[(whole, a), (body, n), (feels, n), (itchy, v)..."
4,nationwideclass no it s not behaving at all i...,0,"[(nationwideclass, n), (behaving, v), (mad, a)..."


In [132]:
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
def lemmatize(pos_data):
    lemma_rew = " "
    for word, pos in pos_data:
      if not pos:
        lemma = word
        lemma_rew = lemma_rew + " " + lemma
      else:
        lemma = wordnet_lemmatizer.lemmatize(word, pos=pos)
        lemma_rew = lemma_rew + " " + lemma
    return lemma_rew

In [133]:
data['Lemma'] = data['POS tagged'].apply(lemmatize)
data.head()

Unnamed: 0,text,sentiment,POS tagged,Lemma
0,switchfoot http twitpic com y zl Awww that s ...,0,"[(switchfoot, n), (http, n), (twitpic, n), (co...",switchfoot http twitpic com zl Awww bummer s...
1,is upset that he can t update his Facebook by ...,0,"[(upset, a), (update, v), (Facebook, n), (text...",upset update Facebook texting might cry resu...
2,Kenichan I dived many times for the ball Mana...,0,"[(Kenichan, n), (dived, v), (many, a), (times,...",Kenichan dive many time ball Managed save re...
3,my whole body feels itchy and like its on fire,0,"[(whole, a), (body, n), (feels, n), (itchy, v)...",whole body feel itchy like fire
4,nationwideclass no it s not behaving at all i...,0,"[(nationwideclass, n), (behaving, v), (mad, a)...",nationwideclass behave mad see


In [134]:
nltk.download('sentiwordnet')
from nltk.corpus import sentiwordnet as swn
def sentiwordnetanalysis(pos_data):
    sentiment = 0
    tokens_count = 0
    for word, pos in pos_data:
        if not pos:
            continue
        lemma = wordnet_lemmatizer.lemmatize(word, pos=pos)
        if not lemma:
            continue
        synsets = wordnet.synsets(lemma, pos=pos)  # Indent this line
        if not synsets:
            continue
            # Take the first sense, the most common
            synset = synsets[0]
            swn_synset = swn.senti_synset(synset.name())
            sentiment += swn_synset.pos_score() - swn_synset.neg_score()
            tokens_count += 1
            # print(swn_synset.pos_score(),swn_synset.neg_score(),swn_synset.obj_score())
        if not tokens_count:
            return 0
        if sentiment>0:
            return "Positive"
        if sentiment==0:
            return "Neutral"
        else:
            return "Negative"

[nltk_data] Downloading package sentiwordnet to /root/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


In [135]:
data['SWN analysis'] = data['POS tagged'].apply(sentiwordnetanalysis)

In [136]:
data.head()

Unnamed: 0,text,sentiment,POS tagged,Lemma,SWN analysis
0,switchfoot http twitpic com y zl Awww that s ...,0,"[(switchfoot, n), (http, n), (twitpic, n), (co...",switchfoot http twitpic com zl Awww bummer s...,0.0
1,is upset that he can t update his Facebook by ...,0,"[(upset, a), (update, v), (Facebook, n), (text...",upset update Facebook texting might cry resu...,0.0
2,Kenichan I dived many times for the ball Mana...,0,"[(Kenichan, n), (dived, v), (many, a), (times,...",Kenichan dive many time ball Managed save re...,0.0
3,my whole body feels itchy and like its on fire,0,"[(whole, a), (body, n), (feels, n), (itchy, v)...",whole body feel itchy like fire,0.0
4,nationwideclass no it s not behaving at all i...,0,"[(nationwideclass, n), (behaving, v), (mad, a)...",nationwideclass behave mad see,0.0


### Conversion of Tokens

If the previous token is a negation word, attempts to find the antonym of the current token using WordNet. If an antonym is found, adds it to the converted_tokens list; otherwise, adds the lemma of the current token.

In [147]:
import nltk
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [148]:
def convert_to_positive(sentence):
    negation_words = ['not', 'no', 'never', 'none', 'neither', 'nor', 'nobody', 'nowhere', 'nothing', 'noone', 'hardly', 'scarcely', 'barely']

    tokens = word_tokenize(sentence)
    tagged_tokens = nltk.pos_tag(tokens)
    lemmatizer = nltk.WordNetLemmatizer()

    converted_tokens = []
    i = 0
    while i < len(tagged_tokens):
        token, pos = tagged_tokens[i]
        if token.lower() in negation_words:
            i += 1
            continue
        elif pos.startswith('J'):  ##Adjective
            lemma = lemmatizer.lemmatize(token, wordnet.ADJ)
        elif pos.startswith('V'):  ##Verb
            lemma = lemmatizer.lemmatize(token, wordnet.VERB)
        elif pos.startswith('R'):  ##Adverb
            lemma = lemmatizer.lemmatize(token, wordnet.ADV)
        else:  ##Noun
            lemma = lemmatizer.lemmatize(token, wordnet.NOUN)

        ##Check for negation and convert to opposite meaning
        if i > 0 and tokens[i - 1].lower() in negation_words:
            for syn in wordnet.synsets(lemma):
                for lemma_pos in syn.pos():
                    if lemma_pos.startswith('J'):
                        converted_tokens.append(syn.lemmas()[0].antonyms()[0].name())
                        break
                else:
                    continue
                break
            else:
                converted_tokens.append(lemma)
        else:
            converted_tokens.append(lemma)

        i += 1

    return ' '.join(converted_tokens)

In [149]:
sen = "I am not happy with the service, neither are they"
converted_sentence = convert_to_positive(sentence)

In [152]:
print("Original sentence:", sen)
print("Converted sentence:", converted_sentence)

Original sentence: I am not happy with the service, neither are they
Converted sentence: I be happy with the service , be they
