![NVIDIA Logo](images/nvidia.png)

# Sentiment Analysis

In this notebook you will begin work on a sentiment analysis task using a dataset of Amazon reviews, establishing a zero-shot baseline with two GPT models.

---

## Learning Objectives

By the time you complete this notebook you will be able to:
- Describe the structure of the Amazon reviews dataset.
- Measure zero-shot sentiment analysis performance on the reviews with GPT43B and GPT8B.

---

## Imports

In [1]:
import json

from llm_utils.nemo_service_models import NemoServiceBaseModel
from llm_utils.models import Models

---

## List Models

In [2]:
Models.list_models()

gpt8b: gpt-8b-000
gpt20b: gpt20b
gpt43b_2: gpt-43b-002
gpt43b: gpt-43b-001
llama70b_chat: llama-2-70b-chat-hf
llama70b: llama-2-70b-hf


---

## Amazon Review Data

For the sentiment analysis task, we will be working with a public dataset of Amazon customer reviews. The raw reviews file has been provided for you at `data/reviews.txt`. It contains 400,000 reviews.

In [3]:
!wc -l data/reviews.txt

400000 data/reviews.txt


Looking at the first few samples, we can see that each begins with either `__label__2`, which indicates positive sentiment, or `__label__1`, which indicates negative sentiment.

In [4]:
!head -3 data/reviews.txt

__label__2 Great CD: My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I'm in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life's hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. Everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing "Who was that singing ?"
__label__2 One of the best game music soundtracks - for a game I didn't really play: Despite the fact that I have only played a small portion of the game, the music I heard (plus the connection to Chrono Trigger which was great as well) led me to purchase the soundtrack, and it remains one of my favorite albums. There is an incredible mix of fun, epic, and emotional songs. Those sad and beautiful tracks I especially like, as there's not too many of th
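The label prefix can also be mapped to a sentiment string explicitly. The sketch below uses hypothetical names (`LABEL_TO_SENTIMENT` and `parse_review_line` are introduced here and are not part of `llm_utils`). Unlike a simple `if/else`, it raises an error on any unexpected label instead of silently treating that label as negative.

```python
# Hypothetical mapping from the fastText-style label prefix to a sentiment string.
LABEL_TO_SENTIMENT = {
    '__label__1': 'negative',
    '__label__2': 'positive',
}


def parse_review_line(line):
    """Split one line of reviews.txt into (sentiment, review_text).

    Raises ValueError if the line does not start with a known label.
    """
    # partition splits on the first space only, so the review text stays intact.
    label, _, text = line.strip().partition(' ')
    if label not in LABEL_TO_SENTIMENT:
        raise ValueError(f'Unexpected label: {label!r}')
    return LABEL_TO_SENTIMENT[label], text


sentiment, text = parse_review_line('__label__2 Great CD: My lovely Pat has one of the GREAT voices.')
print(sentiment)  # positive
print(text)       # Great CD: My lovely Pat has one of the GREAT voices.
```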

---

## Sentiment Analysis Prompt Template

For our sentiment analysis task, we will be working with the following prompt template.

In [5]:
def sentiment_template(text):
    # Wrap a review in a zero-shot prompt asking for "positive" or "negative".
    return f'Is the overall sentiment of the following review "positive" or "negative"? {text} Sentiment:'

Assuming we have a review to pass into the template:

In [13]:
review = '''\
One of the best game music soundtracks - for a game I didn't really play: Despite the fact that I \
have only played a small portion of the game, the music I heard (plus the connection to Chrono Trigger \
which was great as well) led me to purchase the soundtrack, and it remains one of my favorite albums. \
There is an incredible mix of fun, epic, and emotional songs. Those sad and beautiful tracks I especially \
like, as there's not too many of those kinds of songs in my other video game soundtracks. \
I must admit that one of the songs (Life-A Distant Promise) has brought tears to my eyes on many occasions.\
My one complaint about this soundtrack is that they use guitar fretting effects in many of the songs, \
which I find distracting. But even if those weren't included I would still consider the collection worth it.\
'''

...we can generate a sentiment analysis prompt for the review.

In [14]:
print(sentiment_template(review))

Is the overall sentiment of the following review "positive" or "negative"? One of the best game music soundtracks - for a game I didn't really play: Despite the fact that I have only played a small portion of the game, the music I heard (plus the connection to Chrono Trigger which was great as well) led me to purchase the soundtrack, and it remains one of my favorite albums. There is an incredible mix of fun, epic, and emotional songs. Those sad and beautiful tracks I especially like, as there's not too many of those kinds of songs in my other video game soundtracks. I must admit that one of the songs (Life-A Distant Promise) has brought tears to my eyes on many occasions.My one complaint about this soundtrack is that they use guitar fretting effects in many of the songs, which I find distracting. But even if those weren't included I would still consider the collection worth it. Sentiment:


## Process Prompts and Labels

For our purposes we will create a training dataset of 1500 samples, as well as a small test dataset of 20 samples.

Here we gather the first 1520 samples into a `prompts_with_labels` list of 2-tuples, each pairing a review prompt (created with `sentiment_template`) with its label.

In [15]:
prompts_with_labels = []

with open('data/reviews.txt', 'r', encoding='utf-8') as file:
    for i, line in enumerate(file):
        if i >= 1520:  # Stop after reading 1520 lines
            break

        label, review = line.strip().split(' ', 1)
        sentiment = 'positive' if label == '__label__2' else 'negative'
        prompts_with_labels.append((sentiment_template(review), sentiment))
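The loop above can also be written with `itertools.islice`, which stops after a fixed number of lines without an explicit counter and `break`. This is a sketch only. `load_prompts_with_labels` is a hypothetical helper name, and `sentiment_template` is repeated so that the block runs on its own. The function accepts any iterable of lines, so an open file or an in-memory list works.

```python
from itertools import islice


def sentiment_template(text):
    # Same zero-shot prompt template as defined earlier in this notebook.
    return f'Is the overall sentiment of the following review "positive" or "negative"? {text} Sentiment:'


def load_prompts_with_labels(lines, n):
    """Build (prompt, sentiment) pairs from the first n lines of an iterable."""
    pairs = []
    for line in islice(lines, n):  # islice yields at most n lines, then stops
        label, review = line.strip().split(' ', 1)
        sentiment = 'positive' if label == '__label__2' else 'negative'
        pairs.append((sentiment_template(review), sentiment))
    return pairs


# With the real file, this should produce the same list as the loop above:
# with open('data/reviews.txt', 'r', encoding='utf-8') as file:
#     prompts_with_labels = load_prompts_with_labels(file, 1520)
```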

In [16]:
print(prompts_with_labels[0])

('Is the overall sentiment of the following review "positive" or "negative"? Great CD: My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I\'m in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life\'s hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. Everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing "Who was that singing ?" Sentiment:', 'positive')


Next we split the list into separate train and test lists.

In [17]:
train_prompts_with_labels = prompts_with_labels[:1500]
test_prompts_with_labels = prompts_with_labels[1500:]

In [18]:
len(train_prompts_with_labels)

1500

In [19]:
len(test_prompts_with_labels)

20
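Before scoring the models, it helps to know the label balance of the test split. A classifier that always answers with the most common label sets a floor that any model should beat. Below is a minimal sketch using `collections.Counter`. `majority_baseline` is a hypothetical helper, and the toy data is illustrative rather than the actual test split.

```python
from collections import Counter


def majority_baseline(prompts_with_labels):
    """Return (most_common_label, accuracy) for always predicting that label."""
    counts = Counter(label for _, label in prompts_with_labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(prompts_with_labels)


# Toy example; in the notebook you would pass test_prompts_with_labels.
toy = [('p1', 'positive'), ('p2', 'positive'), ('p3', 'negative'), ('p4', 'positive')]
print(majority_baseline(toy))  # ('positive', 0.75)
```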

## Write Data to File

For use in subsequent notebooks, we now write the train and test prompt/label pairs to JSON files.

In [20]:
with open('data/sentiment_prompts_labels_train_1500.json', 'w') as f:
    json.dump(train_prompts_with_labels, f)

In [21]:
with open('data/sentiment_prompts_labels_test_20.json', 'w') as f:
    json.dump(test_prompts_with_labels, f)
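Keep one detail in mind when reading these files back in later notebooks. JSON has no tuple type, so `json.dump` writes each `(prompt, label)` tuple as a two-element array, and `json.load` returns lists. The minimal sketch below shows how to restore the tuples. An in-memory `io.StringIO` buffer stands in for the real file.

```python
import io
import json

pairs = [('Is the overall sentiment ... Sentiment:', 'positive')]

buffer = io.StringIO()
json.dump(pairs, buffer)  # tuples are serialized as JSON arrays
buffer.seek(0)

loaded = json.load(buffer)
print(loaded)  # [['Is the overall sentiment ... Sentiment:', 'positive']]

# Convert each inner list back to a (prompt, label) tuple.
restored = [tuple(pair) for pair in loaded]
print(restored == pairs)  # True

# With the real file:
# with open('data/sentiment_prompts_labels_test_20.json') as f:
#     test_prompts_with_labels = [tuple(pair) for pair in json.load(f)]
```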

## Test Models on Zero-shot Prompts

Before we begin work on fine-tuning, let's establish a baseline for performance by using our zero-shot prompts with GPT43B and GPT8B.

## GPT43B

First we create an instance of the GPT43B model.

In [22]:
gpt43b = NemoServiceBaseModel(Models.gpt43b.value)

### Sanity Check

Let's try a single sentiment analysis prompt out on GPT43B.

In [23]:
prompt, label = test_prompts_with_labels[0]

In [24]:
label

'negative'

In [25]:
gpt43b.generate(prompt)

' negative'

Apart from a leading space, which we can strip, the response matches the label.

### Try on Test Data

Let's try GPT43B on the full test set.

In [26]:
num_correct = 0
num_samples = len(test_prompts_with_labels)
for prompt, label in test_prompts_with_labels:
    # Strip the surrounding whitespace the model includes in its response.
    response = gpt43b.generate(prompt).strip()
    is_correct = response == label  # exact, case-sensitive match
    if is_correct:
        num_correct += 1
    print(f'Response: {response}')
    print(f'Label: {label}')
    print(f'Is Correct: {is_correct}\n')

print(f'Number Correct: {num_correct}/{num_samples}')
print(f'Percentage Correct: {num_correct / num_samples * 100:.1f}%')

Response: negative
Label: negative
Is Correct: True

Response: Negative
Label: negative
Is Correct: False

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: negative
Label: negative
Is Correct

### Analysis

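GPT43B appears well suited to this sentiment analysis task out of the box. Every response shown above matches its label except one, which was marked incorrect only because it was capitalized (`Negative`).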
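The same scoring loop is about to be repeated for GPT8B, so it can be factored into a function that accepts any callable mapping a prompt to a response. This sketch assumes the model is used only through a `generate(prompt)`-style call, as in the cells above. `evaluate` and `fake_generate` are hypothetical names. The stand-in model exists only so that the example runs without the NeMo service.

```python
def evaluate(generate, prompts_with_labels, normalize=str.strip):
    """Score exact-match accuracy of generate(prompt) against each label.

    generate:  callable taking a prompt string and returning the raw response
    normalize: callable applied to each raw response before comparison
    Returns (num_correct, num_samples, percentage).
    """
    num_correct = 0
    for prompt, label in prompts_with_labels:
        response = normalize(generate(prompt))
        if response == label:
            num_correct += 1
    num_samples = len(prompts_with_labels)
    return num_correct, num_samples, num_correct / num_samples * 100


# Stand-in "model" for illustration only: answers positive if 'great' appears.
def fake_generate(prompt):
    return ' positive' if 'great' in prompt.lower() else ' Negative'


data = [('A great album. Sentiment:', 'positive'), ('Broke fast. Sentiment:', 'negative')]
print(evaluate(fake_generate, data))  # (1, 2, 50.0)
print(evaluate(fake_generate, data, normalize=lambda r: r.strip().lower()))  # (2, 2, 100.0)
```

With the real models, this could be called as `evaluate(gpt43b.generate, test_prompts_with_labels)`. The GPT8B variant below would use `evaluate(lambda p: gpt8b.generate(p, stop=['\n']), test_prompts_with_labels, normalize=lambda r: r.strip().lower())`.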

---

## GPT8B

Next we will try with GPT8B. First we create a model instance.

In [27]:
gpt8b = NemoServiceBaseModel(Models.gpt8b.value)

### Sanity Check

Let's try a single sentiment analysis prompt out on GPT8B.

In [28]:
prompt, label = test_prompts_with_labels[0]

In [29]:
label

'negative'

In [30]:
gpt8b.generate(prompt)

' Negative\n\nReviewer: jennie - favorite favorite favorite favorite favorite - April 27, 2007\n\nSubject: Clay Aiken is a disgrace to the North Carolina name I\'m sorry: I\'m sorry for everyone out there who actually spent their money on Clay\'s CD. I\'m amazed that Clay even made it to the top ten on American Idol. I think that the only people that voted for him were people that thought he was "cute". Personally I think he is a disgrace to the North Carolina name. In conclusion, the only reason I give this CD one star is because you can\'t give a CD zero stars, if I could give it zero stars I would. Sentiment: Negative\n\nReviewer: jennie - favorite favorite favorite favorite favorite - April 27, 2007\n\nSubject: Clay Aiken is a disgrace to the North Carolina name I\'m sorry: I\'m sorry for everyone'

GPT8B gave us the correct sentiment (though capitalized), but then kept generating well past the answer.

### Try on Test Data

Let's try GPT8B on the full test set. This time we pass a newline as a stop sequence so the model stops generating at the first newline. We also strip whitespace from its responses and lowercase them.
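Conceptually, a stop sequence ends generation at the first occurrence of any listed string. The hypothetical helper below only imitates that effect after the fact, on an already-generated string. It illustrates the semantics and does not reflect how the NeMo service implements `stop`. The service stops generating on its side, and this notebook does not show whether the stop string itself is returned; this sketch drops it.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any string in stop (exclusive)."""
    cut = len(text)
    for s in stop:
        index = text.find(s)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Shortened version of the GPT8B response from the sanity check above.
raw = ' Negative\n\nReviewer: jennie - favorite favorite favorite ...'
print(repr(truncate_at_stop(raw, ['\n'])))                  # ' Negative'
print(repr(truncate_at_stop(raw, ['\n']).strip().lower()))  # 'negative'
```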

In [31]:
num_correct = 0
num_samples = len(test_prompts_with_labels)
for prompt, label in test_prompts_with_labels:
    # Stop at the first newline, then normalize whitespace and case.
    response = gpt8b.generate(prompt, stop=['\n']).strip().lower()
    is_correct = response == label  # exact match after normalization
    if is_correct:
        num_correct += 1
    print(f'Response: {response}')
    print(f'Label: {label}')
    print(f'Is Correct: {is_correct}\n')

print(f'Number Correct: {num_correct}/{num_samples}')
print(f'Percentage Correct: {num_correct / num_samples * 100:.1f}%')

Response: negative
Label: negative
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive.
Label: positive
Is Correct: False

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: negative
Is Correct: False

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: negative
Label: negative
Is Correct: True

Response: positive
Label: positive
Is Correct: True

Response: positive.
Label: positive
Is Correct: False

Response: negative
Label: negative
Is Correct: True

Response: negative
Label: negative
Is Cor

### Analysis

GPT8B did fairly well on this task, although it needed more help than GPT43B. It relied on a stop sequence to keep it from generating well past the answer, plus post-processing to strip whitespace and lowercase its responses.

Looking at the outputs above, two responses were marked incorrect only because they ended with a period (`positive.`). At least one (the seventh) predicted the wrong sentiment outright.
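Slightly more forgiving normalization could handle the period-related misses, for example by stripping trailing punctuation before comparing. Below is a minimal sketch. `normalize_response` is a hypothetical helper, and it cannot fix a response with the wrong sentiment, such as the `positive` answer to the seventh (negative) review.

```python
import string


def normalize_response(response):
    """Strip surrounding whitespace, lowercase, and drop trailing punctuation."""
    return response.strip().lower().rstrip(string.punctuation)


for raw in [' positive.', 'Negative', ' negative\n', 'positive!!']:
    print(repr(normalize_response(raw)))
# 'positive'
# 'negative'
# 'negative'
# 'positive'
```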