### Working Environment

### Import Dataset

In [28]:
import pandas as pd

data = pd.read_csv('amazon_alexa.tsv', sep='\t')
data.head(10)

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1
5,5,31-Jul-18,Heather Gray Fabric,I received the echo as a gift. I needed anothe...,1
6,3,31-Jul-18,Sandstone Fabric,"Without having a cellphone, I cannot use many ...",1
7,5,31-Jul-18,Charcoal Fabric,I think this is the 5th one I've purchased. I'...,1
8,5,30-Jul-18,Heather Gray Fabric,looks great,1
9,5,30-Jul-18,Heather Gray Fabric,Love it! I’ve listened to songs I haven’t hear...,1


In [29]:
mydata = data[['verified_reviews','feedback']]
mydata.columns = ['review','label']

mydata.head()

Unnamed: 0,review,label
0,Love my Echo!,1
1,Loved it!,1
2,"Sometimes while playing a game, you can answer...",1
3,I have had a lot of fun with this thing. My 4 ...,1
4,Music,1


In [30]:
mydata.value_counts('label')

label
1    2893
0     257
dtype: int64

In [31]:
# Count the occurrences of each label
label_counts = mydata["label"].value_counts()

# Get the number of rows to drop from the majority class
rows_to_drop = label_counts.max() - label_counts.min()

# Randomly drop rows from the majority class until both classes are the same size
if rows_to_drop > 0:
    data_majority = mydata[mydata["label"] == label_counts.idxmax()]
    data_balanced = mydata.drop(data_majority.sample(rows_to_drop).index)
else:
    data_balanced = mydata.copy()

# Check the new class balance
print(data_balanced["label"].value_counts())

1    257
0    257
Name: label, dtype: int64
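
The undersampling above draws a different random subset on every run. A minimal sketch of a reproducible variant, assuming a toy DataFrame with the same `label` column; the `undersample` helper and the `seed` value are illustrative and not part of the original notebook:

```python
import pandas as pd


def undersample(df, label_col="label", seed=42):
    """Downsample every class to the size of the smallest class."""
    n_min = df[label_col].value_counts().min()
    # GroupBy.sample draws n rows per class; random_state fixes the draw.
    return df.groupby(label_col).sample(n=n_min, random_state=seed)


# Toy data (not the Alexa reviews): six positive rows, two negative rows.
toy = pd.DataFrame({
    "review": [f"review {i}" for i in range(8)],
    "label": [1, 1, 1, 1, 1, 1, 0, 0],
})
balanced = undersample(toy)
print(balanced["label"].value_counts().to_dict())
```

Fixing the seed keeps the balanced subset, and every result derived from it, the same between runs.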


## Data Preprocessing

In [32]:
import re

def clean_text(text):
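  # Remove HTML tags first, while the angle brackets are still present
  text = re.sub(r"<[^>]*>", " ", text)

  # Remove special characters and punctuation
  text = re.sub(r"[^\w\s]", " ", text)

  # Remove single-letter words (also drops "I", "a", and the "t" left from "doesn't")
  text = re.sub(r"\b[a-zA-Z]\b", " ", text)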

  # Lowercase the text
  text = text.lower()

  # Remove extra whitespace
  text = re.sub(r"\s+", " ", text)

  # Trim leading and trailing spaces
  text = text.strip()

  return text
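
The order of the substitutions matters: once punctuation is gone, the angle brackets that the HTML pattern looks for are gone as well. A small sketch of the two orderings on a made-up review string (both helper names are hypothetical):

```python
import re


def strip_tags_then_punct(text):
    """Remove HTML tags first, then punctuation (tags are still intact)."""
    text = re.sub(r"<[^>]*>", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def strip_punct_then_tags(text):
    """Remove punctuation first; the tag pattern then has nothing to match."""
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"<[^>]*>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


sample = "Great sound!<br/>Would buy again."
print(strip_tags_then_punct(sample))  # great sound would buy again
print(strip_punct_then_tags(sample))  # great sound br would buy again
```

In the second ordering the tag name `br` survives as a stray token in the cleaned text.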

In [33]:
import pandas as pd

# Extract the review column as a list
reviews = data_balanced['review'].tolist()

# Clean the text in the list
cleaned_reviews = [clean_text(review) for review in reviews]

# Add the cleaned reviews as a new column to the DataFrame
data_balanced['clean_reviews'] = cleaned_reviews

In [34]:
data_balanced

Unnamed: 0,review,label,clean_reviews
2,"Sometimes while playing a game, you can answer...",1,sometimes while playing game you can answer qu...
16,Really happy with this purchase. Great speake...,1,really happy with this purchase great speaker ...
21,"We love Alexa! We use her to play music, play ...",1,we love alexa we use her to play music play ra...
26,"I love my Echo. It's easy to operate, loads of...",1,love my echo it easy to operate loads of fun i...
33,The speakers sound pretty good for being so sm...,1,the speakers sound pretty good for being so sm...
...,...,...,...
3135,I loved it does exactly what it says,1,loved it does exactly what it says
3137,Very convenient,1,very convenient
3139,Easy to set up Ready to use in minutes.,1,easy to set up ready to use in minutes
3143,Awesome device wish I bought one ages ago.,1,awesome device wish bought one ages ago


## Data Split

In [35]:
import pandas as pd

# Hold out 95% of the balanced data as the test set
total_rows = len(data_balanced)
test_size = int(total_rows * 0.95)

# Randomly sample test_size rows for the test set
test_set = data_balanced.sample(test_size)

# Keep the remaining rows as the training set
train_set = data_balanced.drop(test_set.index)
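
Because rows are sampled without regard to `label`, the small remainder can end up with an uneven class mix. A sketch of a seeded, per-class split on toy data; `stratified_split`, its defaults, and the toy frame are assumptions for illustration only:

```python
import pandas as pd


def stratified_split(df, test_frac=0.95, label_col="label", seed=42):
    """Split df so each class contributes the same fraction to the test set."""
    test = df.groupby(label_col).sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


# Toy balanced data (not the Alexa reviews): 20 rows per class.
toy = pd.DataFrame({
    "clean_reviews": [f"text {i}" for i in range(40)],
    "label": [0] * 20 + [1] * 20,
})
toy_train, toy_test = stratified_split(toy)
print(len(toy_train), len(toy_test))  # 2 38
```

Each class gives 19 of its 20 rows to the test set, so the two remaining rows contain one example per class.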

## Sentiment with an LLM

### Setting up Gemini API

In [36]:
!pip install -q -U google-generativeai

In [37]:
# Necessary packages
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Used to securely store your API key
from google.colab import userdata

In [38]:
# Read the key from Colab's secret store rather than hard-coding it in the notebook.
# Outside Colab, use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
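
An API key is best kept out of the notebook itself. Outside Colab, one option is to read it from an environment variable. A minimal sketch, where `load_api_key` is a hypothetical helper and `DEMO_API_KEY` is a placeholder variable used only for the demonstration:

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key from the environment, failing loudly if absent."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Demo only: a placeholder value stands in for a real key.
os.environ["DEMO_API_KEY"] = "placeholder-key"
print(load_api_key("DEMO_API_KEY") == "placeholder-key")  # True
```

The value returned by `load_api_key()` can then be passed to `genai.configure(api_key=...)`.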

In [39]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-pro
models/gemini-pro-vision


In [40]:
model = genai.GenerativeModel('gemini-pro')

In [41]:
%%time
response = model.generate_content("What is the meaning of life?")

to_markdown(response.text)

CPU times: user 88.5 ms, sys: 11.2 ms, total: 99.6 ms
Wall time: 6.1 s


> The meaning of life is a deep and philosophical question that has been pondered by humans for centuries. There is no one definitive answer, as the meaning of life can vary from person to person.
> 
> Some people believe that the meaning of life is to find happiness and fulfillment. Others believe that it is to make a difference in the world and leave a lasting legacy. Still others believe that the meaning of life is to connect with something greater than oneself, such as God or the universe.
> 
> Ultimately, the meaning of life is a personal decision that each individual must make for themselves. However, there are some common themes that can be found in many people's answers to this question.
> 
> One common theme is the idea of purpose. Many people believe that the meaning of life is to find their purpose and then live it out. This purpose could be anything from raising a family to starting a business to making the world a better place.
> 
> Another common theme is the idea of love. Many people believe that the meaning of life is to love and be loved. This love can come from family, friends, romantic partners, or even strangers.
> 
> Finally, many people believe that the meaning of life is to learn and grow. They believe that we are all here on Earth to learn and evolve, and that the lessons we learn will help us to become better people.
> 
> No matter what your personal beliefs are, there is no right or wrong answer to the question of what is the meaning of life. The most important thing is to live your life in a way that is meaningful to you.

#### Single API Call

In [42]:
test_set_sample = test_set.sample(20)

test_set_sample['pred_label'] = ''

test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
2000,received the wrong product...was so excited to...,0,received the wrong product was so excited to i...,
1976,The Echo does not link to Direct tv even thoug...,0,the echo does not link to direct tv even thoug...,
1979,Doesn't use apple music. It is worthless to m...,0,doesn use apple music it is worthless to me wi...,
2162,Was loving it but starting in June Hulu stoppe...,0,was loving it but starting in june hulu stoppe...,
2734,Really like it,1,really like it,
2398,Easy to install. Up and running in minutes.,1,easy to install up and running in minutes,
2152,I will never buy anything Amazon makes again!T...,0,will never buy anything amazon makes again thi...,
350,Item no longer works after just 5 months of us...,0,item no longer works after just 5 months of us...,
1125,Love it. Small with good sound,1,love it small with good sound,
352,Works great no different than the new ones,1,works great no different than the new ones,


In [43]:
# Convert the DataFrame to JSON using the to_json() method

json_data = test_set_sample[['clean_reviews','pred_label']].to_json(orient='records')

# Print the JSON data
print(json_data)

[{"clean_reviews":"received the wrong product was so excited to install it all excitement gone thank you amazon","pred_label":""},{"clean_reviews":"the echo does not link to direct tv even though they said it would have spent multiple hours on this made phone calls and texts also is missing free bulb","pred_label":""},{"clean_reviews":"doesn use apple music it is worthless to me without it sound quality on it is also poor","pred_label":""},{"clean_reviews":"was loving it but starting in june hulu stopped working and crunchyroll doesn work either probably will switch to something else soon","pred_label":""},{"clean_reviews":"really like it","pred_label":""},{"clean_reviews":"easy to install up and running in minutes","pred_label":""},{"clean_reviews":"will never buy anything amazon makes again this fire stick is not even year old and it does nothing but restart and freeze up constantly no warranty from amazon in this should have been my first clue that it was horrible device will never 

In [44]:
prompt = f"""
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.

```
{json_data}
```
"""

print(prompt)


You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.

```
[{"clean_reviews":"received the wrong product was so excited to install it all excitement gone thank you amazon","pred_label":""},{"clean_reviews":"the echo does not link to direct tv even though they said it would have spent multiple hours on this made phone calls and texts also is missing free bulb","pred_label":""},{"clean_reviews":"doesn use apple music it is worthless to me without it sound quality on it is also poor","pred_label":""},{"clean_reviews":"was loving it but starting in june hulu stopped wo

In [45]:
response = model.generate_content(prompt)

print(response.text)

```
[{"clean_reviews":"received the wrong product was so excited to install it all excitement gone thank you amazon","pred_label":0},{"clean_reviews":"the echo does not link to direct tv even though they said it would have spent multiple hours on this made phone calls and texts also is missing free bulb","pred_label":0},{"clean_reviews":"doesn use apple music it is worthless to me without it sound quality on it is also poor","pred_label":0},{"clean_reviews":"was loving it but starting in june hulu stopped working and crunchyroll doesn work either probably will switch to something else soon","pred_label":0},{"clean_reviews":"really like it","pred_label":1},{"clean_reviews":"easy to install up and running in minutes","pred_label":1},{"clean_reviews":"will never buy anything amazon makes again this fire stick is not even year old and it does nothing but restart and freeze up constantly no warranty from amazon in this should have been my first clue that it was horrible device will never bu

In [46]:
import json

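# Strip the surrounding backticks (assumes a plain triple-backtick fence with no language tag)
json_data = response.text.strip("`")

# Parse the JSON and convert it to a DataFrame
# (named `records` so the original `data` DataFrame is not overwritten)
records = json.loads(json_data)
df_sample = pd.DataFrame(records)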

df_sample

Unnamed: 0,clean_reviews,pred_label
0,received the wrong product was so excited to i...,0
1,the echo does not link to direct tv even thoug...,0
2,doesn use apple music it is worthless to me wi...,0
3,was loving it but starting in june hulu stoppe...,0
4,really like it,1
5,easy to install up and running in minutes,1
6,will never buy anything amazon makes again thi...,0
7,item no longer works after just 5 months of us...,0
8,love it small with good sound,1
9,works great no different than the new ones,1
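
Stripping backticks works for the reply shown here, but a model may also wrap its answer in a fence with a language tag or add text around it. A hedged sketch of a more tolerant parser; `extract_json` is a hypothetical helper, and it assumes the reply contains exactly one JSON array and that the review texts contain no square brackets:

```python
import json
import re


def extract_json(text):
    """Parse the JSON array found in a model reply.

    Handles bare JSON as well as replies wrapped in triple-backtick fences,
    with or without a language tag.
    """
    # Greedy match from the first "[" to the last "]" in the reply.
    match = re.search(r"\[.*\]", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON array found in the reply.")
    return json.loads(match.group(0))


fence = "`" * 3  # built programmatically to keep this example readable
reply = f'{fence}json\n[{{"clean_reviews": "really like it", "pred_label": 1}}]\n{fence}'
parsed = extract_json(reply)
print(parsed[0]["pred_label"])  # 1
```

The result is a list of dictionaries and can be passed to `pd.DataFrame` in the same way as the output of `json.loads`.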


In [47]:
# Copy pred_label from 'df_sample' into 'test_set_sample' (relies on the row order being unchanged)

test_set_sample['pred_label'] = df_sample['pred_label'].values
test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
2000,received the wrong product...was so excited to...,0,received the wrong product was so excited to i...,0
1976,The Echo does not link to Direct tv even thoug...,0,the echo does not link to direct tv even thoug...,0
1979,Doesn't use apple music. It is worthless to m...,0,doesn use apple music it is worthless to me wi...,0
2162,Was loving it but starting in June Hulu stoppe...,0,was loving it but starting in june hulu stoppe...,0
2734,Really like it,1,really like it,1
2398,Easy to install. Up and running in minutes.,1,easy to install up and running in minutes,1
2152,I will never buy anything Amazon makes again!T...,0,will never buy anything amazon makes again thi...,0
350,Item no longer works after just 5 months of us...,0,item no longer works after just 5 months of us...,0
1125,Love it. Small with good sound,1,love it small with good sound,1
352,Works great no different than the new ones,1,works great no different than the new ones,1
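
Copying the labels over by position assumes the model returned every review, unchanged and in the original order. A sketch of a check that makes that assumption explicit before the labels are used; `aligned_labels` and the toy records are hypothetical:

```python
def aligned_labels(sent_reviews, returned_records):
    """Return predicted labels as ints after checking they line up with the input."""
    if len(sent_reviews) != len(returned_records):
        raise ValueError(
            f"Sent {len(sent_reviews)} reviews, got {len(returned_records)} back."
        )
    labels = []
    for position, (sent, record) in enumerate(zip(sent_reviews, returned_records)):
        if record["clean_reviews"] != sent:
            raise ValueError(f"Review text changed at position {position}.")
        label = int(record["pred_label"])  # accepts 1 as well as "1"
        if label not in (0, 1):
            raise ValueError(f"Unexpected label {label!r} at position {position}.")
        labels.append(label)
    return labels


# Toy input and reply, invented for this example.
sent = ["really like it", "stopped working after a week"]
returned = [
    {"clean_reviews": "really like it", "pred_label": 1},
    {"clean_reviews": "stopped working after a week", "pred_label": "0"},
]
print(aligned_labels(sent, returned))  # [1, 0]
```

A dropped, reordered, or rewritten review then raises an error instead of silently shifting labels onto the wrong rows.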


In [48]:
# Compute the confusion matrix for the predictions

from sklearn.metrics import confusion_matrix

y_true = test_set_sample["label"]
y_pred = test_set_sample["pred_label"]

confusion_matrix(y_true, y_pred)

array([[11,  0],
       [ 0,  9]])
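
The matrix follows the scikit-learn layout: rows are true labels, columns are predicted labels, both ordered `[0, 1]`. A short sketch of the summary metrics it implies for the positive class; `binary_metrics` is a hypothetical helper written in plain Python:

```python
def binary_metrics(matrix):
    """Compute accuracy, precision and recall for the positive class (label 1).

    Expects rows as true labels and columns as predicted labels,
    both ordered [0, 1].
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


print(binary_metrics([[11, 0], [0, 9]]))
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0}
```

All 20 sampled reviews are classified correctly, though a sample of this size gives only a rough estimate of accuracy on the full test set.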