## Sentiment Analysis of Reviews using NLTK in Python
- This project was created as part of the [#CrystalizeMyLearning](https://www.linkedin.com/pulse/introducing-crystalizemylearning-movement-lye-jia-jun/) movement, which I founded to learn through writing

In [1]:
import nltk
# word_tokenize and pos_tag rely on NLTK data packages, fetched once with nltk.download()

### Extracting Source Data
- Source: https://www.amazon.sg/Kindle-Paperwhite-Waterproof-International-generation/dp/B07741S7Y8/ref=sr_1_18?keywords=kindle+paperwhite+2021&qid=1651924528&sprefix=kindle+paperwhite%2Caps%2C323&sr=8-18#customerReviews

In [2]:
with open("kindle_reviews.txt", encoding='utf-8') as f:
    reviews = f.read()
    review_list = reviews.split('\n\n')
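Splitting on `'\n\n'` assumes exactly one blank line between reviews and Unix line endings. Below is a minimal sketch of a more forgiving splitter under the same blank-line convention. `split_reviews` is a hypothetical helper, not part of the original notebook.

```python
def split_reviews(raw_text):
    """Split raw text into reviews separated by one or more blank lines."""
    # Normalise Windows line endings so '\r\n\r\n' also counts as a blank line.
    normalised = raw_text.replace('\r\n', '\n')
    # Strip each chunk and drop empty ones (e.g. from trailing blank lines).
    return [chunk.strip() for chunk in normalised.split('\n\n') if chunk.strip()]


# Made-up sample text, only to illustrate the behaviour.
sample = "First review.\nStill the first.\n\nSecond review.\n\n\n"
print(split_reviews(sample))
```

With this helper, a file that ends in extra newlines would not add an empty "review" to the list.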

In [3]:
# show the last review in the review list
review_list[-1]

"I've been reading with an e-reader for years. Mostly a Kobo. Also have the Kobo Forma, great e-reader. But I was always curious about the Kindle. Now ordered the Kindle with Amazon.nl and I'm very happy with it! Nice small size. Fits almost in my pocket. Fantastic dictionaries and great ease of reading. I can see that there are years of experience with e-readers behind this. Super purchase."

### Executing Sentiment Analysis Algorithm

In [4]:
# Extract positive word list from positive.txt
with open("positive.txt") as f:
    content = f.read()
    positive_word_list = content.splitlines()
    
# Extract negative word list from negative.txt
with open("negative.txt") as f:
    content = f.read()
    negative_word_list = content.splitlines()

positive_word_list[:10]

['abound',
 'abounds',
 'abundance',
 'abundant',
 'accessable',
 'accessible',
 'acclaim',
 'acclaimed',
 'acclamation',
 'accolade']
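The word lists are plain Python lists, so every `in` check scans the list from the start. A possible refinement, sketched here with a miniature stand-in lexicon rather than the real files, is to convert them to sets for faster lookups. `polarity` is a hypothetical helper introduced only for this illustration.

```python
# Miniature stand-in lexicons; the real lists come from positive.txt and negative.txt.
positive_word_list = ['abound', 'abundance', 'acclaim', 'accessible']
negative_word_list = ['bad']

# Sets give constant-time membership checks on average,
# whereas `in` on a list is a linear scan.
positive_words = set(positive_word_list)
negative_words = set(negative_word_list)


def polarity(word):
    """Return +1, -1 or 0 for a single word, ignoring case."""
    word = word.lower()
    # Booleans behave as 0/1, so the difference is the net polarity.
    return (word in positive_words) - (word in negative_words)


print(polarity('Acclaim'), polarity('BAD'), polarity('kindle'))
```

The original lists can be kept alongside the sets when slicing such as `positive_word_list[:10]` is still needed.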

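**The main sentiment analysis algorithm**
- For each review, tokenize the text, tag each token's part of speech, keep the adjectives, then add 1 for every adjective found in the positive list and subtract 1 for every adjective found in the negative list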

In [5]:
review_sentiment_score_dict = {}

for review in review_list: # iterate through each review
    review_sentiment_score_dict[review] =  {
        'sentiment_score': 0,
        'positive_adj':[],
        'negative_adj':[]
    }
    # score to track the sentiment value of this review
    sentiment_score = 0
    # tokenize the review paragraph to list of word tokens
    word_list = nltk.word_tokenize(review)
    # list containing predicted part of speech for each word
    pos_list = nltk.pos_tag(word_list, tagset='universal')
    # all the words that are identified as adjective    
    adj_list = [word[0] for word in pos_list if word[1] == 'ADJ']
    for adjective in adj_list: # iterate through each adjective
        # add 1 to sentiment score for each positive adjective detected
        if adjective.lower() in positive_word_list:
            review_sentiment_score_dict[review]['positive_adj'].append(adjective)
            sentiment_score += 1
        # subtract 1 from sentiment score for each negative adjective detected
        if adjective.lower() in negative_word_list:
            review_sentiment_score_dict[review]['negative_adj'].append(adjective)
            sentiment_score -= 1
            
    # store review and its score into the dictionary
    review_sentiment_score_dict[review]['sentiment_score'] = sentiment_score
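The scoring logic in the loop above could also be packaged as a small function, which makes it easier to try on a single review. The sketch below works on already-tagged `(word, tag)` pairs so it runs without NLTK. The function name `score_tagged_review`, the hand-written tags, and the tiny lexicons are all assumptions made for illustration.

```python
def score_tagged_review(tagged_tokens, positive_words, negative_words):
    """Score one review from (word, tag) pairs, mirroring the notebook loop."""
    result = {'sentiment_score': 0, 'positive_adj': [], 'negative_adj': []}
    for word, tag in tagged_tokens:
        if tag != 'ADJ':
            continue  # only adjectives contribute to the score
        lowered = word.lower()
        if lowered in positive_words:
            result['positive_adj'].append(word)
            result['sentiment_score'] += 1
        if lowered in negative_words:
            result['negative_adj'].append(word)
            result['sentiment_score'] -= 1
    return result


# Hand-written (word, tag) pairs standing in for nltk.pos_tag output.
tagged = [('The', 'DET'), ('support', 'NOUN'), ('is', 'VERB'),
          ('really', 'ADV'), ('bad', 'ADJ'), ('.', '.')]

# Tiny stand-in lexicons; the notebook loads the real ones from text files.
print(score_tagged_review(tagged, {'great', 'happy'}, {'bad'}))
```

In the notebook, the same function could be fed `nltk.pos_tag(nltk.word_tokenize(review), tagset='universal')` for each review.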

In [6]:
for review, info in review_sentiment_score_dict.items():
    print('Sentiment Score:', info['sentiment_score'])
    print('Positive Adjectives:', info['positive_adj'])
    print('Negative Adjectives:', info['negative_adj'])
    print(review + '\n')

Sentiment Score: -1
Positive Adjectives: []
Negative Adjectives: ['bad']
I owned 3 Kindles.
This Kindle started acting wierd 2 months in. (No response to touch, phantom touch, auto page turn)
The warranty and support is really bad. I can't return it. I can't get a replacement. I had to buy a new one, and get it refunded, and then arrange for shipment back. All just because I bought from Amazon.sg

Sentiment Score: 1
Positive Adjectives: ['impress']
Negative Adjectives: []
Hi folks! I just received my Kindle from Amazon within 9 days of ordering it from Singapore online and the package was well packed and secured from tampering.
As I am a first time user I must say that I am truly impress as it is a really light weight device to hold and use. It does exactly what it is suppose to do which is to use it for reading. I use to use the Kindle app on my iPhone but I always got interrupted while reading as messages would pop up.
Honestly, I have been hesitating for a long time in getting a Kin

### Explanation of NLTK Methods

**nltk.word_tokenize()**
- Tokenizes a review paragraph into a list of word and punctuation tokens

In [7]:
# the loop visits every review, so word_list ends up holding the tokens of the last one
for review in review_list:
    word_list = nltk.word_tokenize(review)
word_list

['I',
 "'ve",
 'been',
 'reading',
 'with',
 'an',
 'e-reader',
 'for',
 'years',
 '.',
 'Mostly',
 'a',
 'Kobo',
 '.',
 'Also',
 'have',
 'the',
 'Kobo',
 'Forma',
 ',',
 'great',
 'e-reader',
 '.',
 'But',
 'I',
 'was',
 'always',
 'curious',
 'about',
 'the',
 'Kindle',
 '.',
 'Now',
 'ordered',
 'the',
 'Kindle',
 'with',
 'Amazon.nl',
 'and',
 'I',
 "'m",
 'very',
 'happy',
 'with',
 'it',
 '!',
 'Nice',
 'small',
 'size',
 '.',
 'Fits',
 'almost',
 'in',
 'my',
 'pocket',
 '.',
 'Fantastic',
 'dictionaries',
 'and',
 'great',
 'ease',
 'of',
 'reading',
 '.',
 'I',
 'can',
 'see',
 'that',
 'there',
 'are',
 'years',
 'of',
 'experience',
 'with',
 'e-readers',
 'behind',
 'this',
 '.',
 'Super',
 'purchase',
 '.']
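To get a feel for what the tokenizer does, here is a deliberately simplified regular-expression stand-in. It is not NLTK's algorithm and differs on harder cases (NLTK splits "can't" into "ca" and "n't", for example), but it reproduces the splits visible in the output above: clitics such as `'ve` are separated, hyphenated words and `Amazon.nl` stay whole, and punctuation becomes its own token. `simple_tokenize` is a hypothetical name.

```python
import re

# Simplified pattern, NOT NLTK's algorithm:
#   \w+(?:[-.]\w+)*  words, keeping inner hyphens and dots ("e-reader", "Amazon.nl")
#   '\w+             clitics split off contractions ("'ve", "'m")
#   [^\w\s]          any other single punctuation mark
TOKEN_PATTERN = re.compile(r"\w+(?:[-.]\w+)*|'\w+|[^\w\s]")


def simple_tokenize(text):
    """Return a rough list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("I've been reading with an e-reader for years."))
```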

**nltk.pos_tag()**
- Tags each word with its predicted part of speech

In [8]:
for review in review_list:
    word_list = nltk.word_tokenize(review)
    pos_list = nltk.pos_tag(word_list, tagset='universal')
pos_list

[('I', 'PRON'),
 ("'ve", 'VERB'),
 ('been', 'VERB'),
 ('reading', 'VERB'),
 ('with', 'ADP'),
 ('an', 'DET'),
 ('e-reader', 'NOUN'),
 ('for', 'ADP'),
 ('years', 'NOUN'),
 ('.', '.'),
 ('Mostly', 'ADV'),
 ('a', 'DET'),
 ('Kobo', 'NOUN'),
 ('.', '.'),
 ('Also', 'ADV'),
 ('have', 'VERB'),
 ('the', 'DET'),
 ('Kobo', 'NOUN'),
 ('Forma', 'NOUN'),
 (',', '.'),
 ('great', 'ADJ'),
 ('e-reader', 'NOUN'),
 ('.', '.'),
 ('But', 'CONJ'),
 ('I', 'PRON'),
 ('was', 'VERB'),
 ('always', 'ADV'),
 ('curious', 'ADJ'),
 ('about', 'ADP'),
 ('the', 'DET'),
 ('Kindle', 'NOUN'),
 ('.', '.'),
 ('Now', 'ADV'),
 ('ordered', 'VERB'),
 ('the', 'DET'),
 ('Kindle', 'NOUN'),
 ('with', 'ADP'),
 ('Amazon.nl', 'NOUN'),
 ('and', 'CONJ'),
 ('I', 'PRON'),
 ("'m", 'VERB'),
 ('very', 'ADV'),
 ('happy', 'ADJ'),
 ('with', 'ADP'),
 ('it', 'PRON'),
 ('!', '.'),
 ('Nice', 'NOUN'),
 ('small', 'ADJ'),
 ('size', 'NOUN'),
 ('.', '.'),
 ('Fits', 'VERB'),
 ('almost', 'ADV'),
 ('in', 'ADP'),
 ('my', 'PRON'),
 ('pocket', 'NOUN'),
 ('.', '.
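The `tagset='universal'` option maps the tagger's output onto twelve coarse categories. As a rough aid for reading the tagged output, the sketch below lists those categories and counts how often each occurs in a short excerpt copied from the output above. The dictionary name and the plain-English labels are my own wording.

```python
from collections import Counter

# The twelve tags of the universal tagset, with informal descriptions.
UNIVERSAL_TAGS = {
    'ADJ': 'adjective', 'ADP': 'adposition', 'ADV': 'adverb',
    'CONJ': 'conjunction', 'DET': 'determiner', 'NOUN': 'noun',
    'NUM': 'numeral', 'PRT': 'particle', 'PRON': 'pronoun',
    'VERB': 'verb', '.': 'punctuation', 'X': 'other',
}

# Excerpt of (word, tag) pairs copied from the tagged output above.
tagged = [('I', 'PRON'), ("'m", 'VERB'), ('very', 'ADV'), ('happy', 'ADJ'),
          ('with', 'ADP'), ('it', 'PRON'), ('!', '.')]

# Count how many tokens fall under each tag.
tag_counts = Counter(tag for _, tag in tagged)
for tag, count in tag_counts.most_common():
    print(f"{UNIVERSAL_TAGS[tag]:<12} {count}")
```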

**Extracting adjectives from part of speech word list**
- Extract the words that are tagged as 'ADJ' (adjectives)

In [9]:
for review in review_list:
    word_list = nltk.word_tokenize(review)
    pos_list = nltk.pos_tag(word_list, tagset='universal')
    adj_list = [word[0] for word in pos_list if word[1] == 'ADJ']
adj_list

['great', 'curious', 'happy', 'small', 'Fantastic', 'great']
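The same filter can be written with tuple unpacking, which some readers find easier to follow than indexing with `word[0]` and `word[1]`. The sketch below runs on a handful of pairs copied from the tagged output above. It also shows a limitation of the approach: the tagger labelled 'Nice' as a NOUN, so that word never reaches the adjective list.

```python
# A few (word, tag) pairs copied from the tagged output above.
pos_list = [('Kobo', 'NOUN'), ('Forma', 'NOUN'), (',', '.'), ('great', 'ADJ'),
            ('e-reader', 'NOUN'), ('very', 'ADV'), ('happy', 'ADJ'),
            ('Nice', 'NOUN'), ('small', 'ADJ'), ('size', 'NOUN')]

# Unpack each pair into word and tag, keeping only the adjectives.
adj_list = [word for word, tag in pos_list if tag == 'ADJ']
print(adj_list)  # ['great', 'happy', 'small']
```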

**Calculate sentiment score**

In [10]:
for review in review_list:
    sentiment_score = 0
    word_list = nltk.word_tokenize(review)
    pos_list = nltk.pos_tag(word_list, tagset='universal')
    adj_list = [word[0] for word in pos_list if word[1] == 'ADJ']
    for adjective in adj_list:
        if adjective.lower() in positive_word_list:
            sentiment_score += 1
        if adjective.lower() in negative_word_list:
            sentiment_score -= 1

sentiment_score

4
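One reading that is consistent with this score: of the six adjectives extracted from the last review, 'great' (twice), 'happy' and 'Fantastic' each add 1, while 'curious' and 'small' match neither list. The sketch below reproduces that arithmetic. The lexicon membership is an assumption made for illustration, since the full word lists are not shown here.

```python
# Adjectives extracted from the last review (from the output above).
adj_list = ['great', 'curious', 'happy', 'small', 'Fantastic', 'great']

# ASSUMED lexicon membership, chosen to be consistent with the score of 4.
assumed_positive = {'great', 'happy', 'fantastic'}
assumed_negative = set()

# Each adjective contributes +1, -1 or 0 (booleans behave as 0/1).
contributions = [
    (adj, (adj.lower() in assumed_positive) - (adj.lower() in assumed_negative))
    for adj in adj_list
]
sentiment_score = sum(value for _, value in contributions)

for adj, value in contributions:
    print(f"{adj:<10} {value:+d}")
print('Total:', sentiment_score)
```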