### Working Environment

### Import Dataset

In [13]:
import pandas as pd

data = pd.read_csv('amazon_alexa.tsv', sep='\t')
data.head(10)

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1
5,5,31-Jul-18,Heather Gray Fabric,I received the echo as a gift. I needed anothe...,1
6,3,31-Jul-18,Sandstone Fabric,"Without having a cellphone, I cannot use many ...",1
7,5,31-Jul-18,Charcoal Fabric,I think this is the 5th one I've purchased. I'...,1
8,5,30-Jul-18,Heather Gray Fabric,looks great,1
9,5,30-Jul-18,Heather Gray Fabric,Love it! I’ve listened to songs I haven’t hear...,1


In [14]:
# Keep only the review text and the sentiment flag; .copy() avoids a SettingWithCopyWarning
mydata = data[['verified_reviews','feedback']].copy()
mydata.columns = ['review','label']

mydata.head()

Unnamed: 0,review,label
0,Love my Echo!,1
1,Loved it!,1
2,"Sometimes while playing a game, you can answer...",1
3,I have had a lot of fun with this thing. My 4 ...,1
4,Music,1


In [15]:
mydata.value_counts('label')

Unnamed: 0_level_0,count
label,Unnamed: 1_level_1
1,2893
0,257


In [16]:
# Count the occurrences of each label
label_counts = mydata["label"].value_counts()

# Get the number of rows to drop from the majority class
rows_to_drop = label_counts.max() - label_counts.min()

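# Randomly drop rows from the majority class (undersampling).
# Pass random_state to .sample() if the balanced set must be reproducible.
if rows_to_drop > 0:
   data_majority = mydata[mydata["label"] == label_counts.idxmax()]
   data_balanced = mydata.drop(data_majority.sample(rows_to_drop).index)
else:
   data_balanced = mydata.copy()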

# Check the new class balance
print(data_balanced["label"].value_counts())

label
1    257
0    257
Name: count, dtype: int64
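
The undersampling step above can also be written so that it is repeatable across runs. The following is a minimal sketch, assuming pandas 1.1 or later (for `DataFrameGroupBy.sample`); the toy DataFrame, the `balance_by_undersampling` helper, and the seed value are illustrative and not part of the original notebook.

```python
import pandas as pd

def balance_by_undersampling(df, label_col="label", seed=42):
    """Return rows of df such that every class has as many rows as the rarest class."""
    n_min = df[label_col].value_counts().min()
    # Sample n_min rows from each class; a fixed seed makes the result repeatable
    return df.groupby(label_col).sample(n=n_min, random_state=seed)

# Toy data (hypothetical): 6 positive and 2 negative reviews
toy = pd.DataFrame({
    "review": [f"review {i}" for i in range(8)],
    "label": [1, 1, 1, 1, 1, 1, 0, 0],
})
balanced = balance_by_undersampling(toy)
print(balanced["label"].value_counts())
```

Because the original index is preserved, the sampled rows can still be traced back to the source DataFrame.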


## Data Preprocessing

In [17]:
import re

def clean_text(text):

  # Convert to string (handles NaN and numeric values)
  text = str(text)

  # Remove HTML tags first, while the angle brackets are still present
  text = re.sub(r"<[^>]*>", " ", text)

  # Remove special characters and punctuation
  text = re.sub(r"[^\w\s]", " ", text)

  # Remove single characters
  text = re.sub(r"\b[a-zA-Z]\b", " ", text)

  # Lowercase the text
  text = text.lower()

  # Remove extra whitespace
  text = re.sub(r"\s+", " ", text)

  # Trim leading and trailing spaces
  text = text.strip()

  return text
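
Some reviews in this dataset contain HTML entities such as `&#34;`. Stripping punctuation turns these into stray digits (for example `six words 34 alexa ...`). The variant below is a sketch, not the notebook's original function: it decodes entities with the standard-library `html.unescape` before applying the same cleaning steps. The name `clean_text_with_entities` is hypothetical.

```python
import html
import re

def clean_text_with_entities(text):
    """Variant of clean_text that decodes HTML entities before cleaning."""
    text = html.unescape(str(text))              # "&#34;" becomes a double quote
    text = re.sub(r"<[^>]*>", " ", text)         # drop HTML tags
    text = re.sub(r"[^\w\s]", " ", text)         # drop punctuation
    text = re.sub(r"\b[a-zA-Z]\b", " ", text)    # drop single letters
    text = re.sub(r"\s+", " ", text.lower())     # lowercase, collapse whitespace
    return text.strip()

print(clean_text_with_entities('Six words, &#34;Alexa, tell me a poop joke.&#34;'))
# six words alexa tell me poop joke
```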

In [18]:
import pandas as pd

# Extract the review column as a list
reviews = data_balanced['review'].tolist()

# Clean the text in the list
cleaned_reviews = [clean_text(review) for review in reviews]

# Add the cleaned reviews as a new column to the DataFrame
data_balanced['clean_reviews'] = cleaned_reviews

In [19]:
data_balanced

Unnamed: 0,review,label,clean_reviews
8,looks great,1,looks great
13,"Love, Love, Love!!",1,love love love
29,Just like the other one,1,just like the other one
31,I like it,1,like it
36,Love my Echo. Still learning all the things it...,1,love my echo still learning all the things it ...
...,...,...,...
3091,I didn’t order it,0,didn order it
3096,The product sounded the same as the emoji spea...,0,the product sounded the same as the emoji spea...
3110,"Love it! I personally prefer Spotify music, so...",1,love it personally prefer spotify music so it ...
3136,I used it to control my smart home devices. Wo...,1,used it to control my smart home devices works...


## Data Split

In [20]:
import pandas as pd

# Hold out 95% of the balanced rows as the test set
total_rows = len(data_balanced)
test_size = int(total_rows * 0.95)

# Randomly sample test_size rows for the test set
test_set = data_balanced.sample(test_size)

# The remaining rows form the training set
train_set = data_balanced.drop(test_set.index)
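
A plain random sample does not guarantee that both labels stay equally represented in each split. As an alternative sketch, assuming scikit-learn is installed, `train_test_split` with `stratify` keeps the class ratio and accepts a seed. The toy DataFrame and the names `train_part` and `test_part` are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy balanced data (hypothetical): 20 rows per class
toy = pd.DataFrame({
    "clean_reviews": [f"text {i}" for i in range(40)],
    "label": [0, 1] * 20,
})

# Keep 95% for testing, as in the split above, but stratified by label
train_part, test_part = train_test_split(
    toy, test_size=0.95, stratify=toy["label"], random_state=42
)

print(len(train_part), len(test_part))
print(test_part["label"].value_counts())
```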

## Sentiment w/ LLM

### Setting up Gemini API

In [21]:
!pip install -q -U google-generativeai

In [22]:
# Necessary packages
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Used to securely store your API key
from google.colab import userdata

In [23]:
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
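
The `userdata` import only works inside Google Colab. The helper below is a hedged sketch of a loader that falls back to an environment variable elsewhere; `load_api_key` is a hypothetical name, and the key value in the usage lines is a placeholder, not a real credential.

```python
import os

def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        key = os.getenv(name)
    else:
        key = userdata.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key

# Usage outside Colab, with a placeholder value for demonstration
os.environ["GOOGLE_API_KEY"] = "dummy-key-for-demo"
print(load_api_key() == "dummy-key-for-demo")
```

The returned value could then be passed to `genai.configure(api_key=...)` as in the cell above.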

In [24]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash
models/gemini-1.5-flash-001-tuning


In [25]:
model = genai.GenerativeModel('gemini-pro')

In [26]:
%%time
response = model.generate_content("What is the meaning of life?")

to_markdown(response.text)

CPU times: user 142 ms, sys: 14.2 ms, total: 156 ms
Wall time: 7.41 s


> The meaning of life is a philosophical and existential question that has been pondered by humans for centuries. There is no one definitive answer, as the meaning of life is subjective and personal to each individual. Some people may find meaning in pursuing their passions, helping others, or creating a legacy. Others may find meaning in their relationships, their work, or their religious or spiritual beliefs. Ultimately, the meaning of life is something that each person must discover for themselves.
> 
> According to Viktor Frankl, an Austrian neurologist, psychiatrist, philosopher, author, and Holocaust survivor, the meaning of life is to find meaning in one's life. He believed that people are motivated by a desire for meaning, and that finding meaning in life is essential for happiness and well-being. Frankl also believed that people can find meaning in life even in the most difficult circumstances, such as in the face of suffering and death.
> 
> The Dalai Lama, the spiritual leader of Tibet, has said that the meaning of life is to be happy. He believes that happiness is not something that can be achieved through external possessions or circumstances, but rather something that comes from within. The Dalai Lama teaches that we can find happiness by practicing compassion, kindness, and forgiveness, and by living in harmony with nature and with ourselves.
> 
> The meaning of life is a complex and multifaceted question, and there is no one definitive answer. However, by exploring our own values, beliefs, and experiences, we can each come to a deeper understanding of what makes life meaningful for us.

#### Single API Call

In [27]:
test_set_sample = test_set.sample(20)

test_set_sample['pred_label'] = ''

test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
2496,My son loves his Echo Dot.,1,my son loves his echo dot,
621,If you want to listen to music and have it com...,0,if you want to listen to music and have it com...,
2653,I love it.,1,love it,
882,Really disappointed Alexa has to be plug-in to...,0,really disappointed alexa has to be plug in to...,
1602,Returned from repair with No repair done. It h...,0,returned from repair with no repair done it ha...,
2491,"I reached out to Amazon, because the device wa...",0,reached out to amazon because the device wante...,
375,Bought for my bathroom to listen when I'm in t...,1,bought for my bathroom to listen when in the s...,
388,Never could get it to work. A techie friend lo...,0,never could get it to work techie friend looke...,
1537,It does not pick up my voice unless I yell mos...,1,it does not pick up my voice unless yell most ...,
759,I use it primarily to play music. It works wo...,1,use it primarily to play music it works wonder...,


In [28]:
# Convert the DataFrame to JSON using the to_json() method

json_data = test_set_sample[['clean_reviews','pred_label']].to_json(orient='records')

# Print the JSON data
print(json_data)

[{"clean_reviews":"my son loves his echo dot","pred_label":""},{"clean_reviews":"if you want to listen to music and have it come through several of the echo dot units simultaneously you must pay monthly fee thought this was amazon not apple ve paid for many of these so could have one in each room is that not enough of my money","pred_label":""},{"clean_reviews":"love it","pred_label":""},{"clean_reviews":"really disappointed alexa has to be plug in to wall socket all the time my fault for not checking this but made the assumption that company has technologically advanced as amazon would sell this product with rechargeable battery if could return it would as my apple music and boom speaker give me more flexibility the alexa","pred_label":""},{"clean_reviews":"returned from repair with no repair done it has problem which requires it to be on for while for the defective part to show itself sent it for repair and got it back without being fixed the site gives no option to complain from the

In [29]:
prompt = f"""
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.

```
{json_data}
```
"""

print(prompt)


You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.

```
[{"clean_reviews":"my son loves his echo dot","pred_label":""},{"clean_reviews":"if you want to listen to music and have it come through several of the echo dot units simultaneously you must pay monthly fee thought this was amazon not apple ve paid for many of these so could have one in each room is that not enough of my money","pred_label":""},{"clean_reviews":"love it","pred_label":""},{"clean_reviews":"really disappointed alexa has to be plug in to wall socket all the time my fault for not checking this 

In [30]:
response = model.generate_content(prompt)

print(response.text)

```
[{"clean_reviews":"my son loves his echo dot","pred_label":1},{"clean_reviews":"if you want to listen to music and have it come through several of the echo dot units simultaneously you must pay monthly fee thought this was amazon not apple ve paid for many of these so could have one in each room is that not enough of my money","pred_label":0},{"clean_reviews":"love it","pred_label":1},{"clean_reviews":"really disappointed alexa has to be plug in to wall socket all the time my fault for not checking this but made the assumption that company has technologically advanced as amazon would sell this product with rechargeable battery if could return it would as my apple music and boom speaker give me more flexibility the alexa","pred_label":0},{"clean_reviews":"returned from repair with no repair done it has problem which requires it to be on for while for the defective part to show itself sent it for repair and got it back without being fixed the site gives no option to complain from the

In [31]:
import json

# Clean the data by stripping the surrounding backticks
json_data = response.text.strip("`")

# Parse the JSON and convert it to a DataFrame
# (named `records` so the original `data` DataFrame is not overwritten)
records = json.loads(json_data)
df_sample = pd.DataFrame(records)

df_sample

Unnamed: 0,clean_reviews,pred_label
0,my son loves his echo dot,1
1,if you want to listen to music and have it com...,0
2,love it,1
3,really disappointed alexa has to be plug in to...,0
4,returned from repair with no repair done it ha...,0
5,reached out to amazon because the device wante...,0
6,bought for my bathroom to listen when in the s...,1
7,never could get it to work techie friend looke...,0
8,it does not pick up my voice unless yell most ...,0
9,use it primarily to play music it works wonder...,1
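
Stripping backticks works when the reply is exactly a fenced JSON array, but it fails if the model adds a language tag such as `json` after the opening fence or wraps the array in extra text. The following is a minimal sketch of a more tolerant parser; `extract_json_records` and the sample reply are hypothetical and not part of the original notebook.

```python
import json
import re

def extract_json_records(raw_text):
    """Parse the JSON array inside a model reply, with or without a code fence."""
    # Take everything from the first '[' to the last ']' so that fences,
    # language tags such as "json", and stray prose around the array are ignored
    match = re.search(r"\[.*\]", raw_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON array found in the model response")
    return json.loads(match.group(0))

reply = '```json\n[{"clean_reviews": "love it", "pred_label": 1}]\n```'
records = extract_json_records(reply)
print(records)
```

The resulting list can be passed to `pd.DataFrame(...)` in the same way as the output of `json.loads` above.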


In [32]:
# Copy the predicted labels from df_sample into test_set_sample
# (assumes the model returned the rows in the same order they were sent)

test_set_sample['pred_label'] = df_sample['pred_label'].values
test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
2496,My son loves his Echo Dot.,1,my son loves his echo dot,1
621,If you want to listen to music and have it com...,0,if you want to listen to music and have it com...,0
2653,I love it.,1,love it,1
882,Really disappointed Alexa has to be plug-in to...,0,really disappointed alexa has to be plug in to...,0
1602,Returned from repair with No repair done. It h...,0,returned from repair with no repair done it ha...,0
2491,"I reached out to Amazon, because the device wa...",0,reached out to amazon because the device wante...,0
375,Bought for my bathroom to listen when I'm in t...,1,bought for my bathroom to listen when in the s...,1
388,Never could get it to work. A techie friend lo...,0,never could get it to work techie friend looke...,0
1537,It does not pick up my voice unless I yell mos...,1,it does not pick up my voice unless yell most ...,0
759,I use it primarily to play music. It works wo...,1,use it primarily to play music it works wonder...,1


In [33]:
# Compute the confusion matrix for the predictions (rows: true labels, columns: predicted labels)

from sklearn.metrics import confusion_matrix

y_true = test_set_sample["label"]
y_pred = test_set_sample["pred_label"]

confusion_matrix(y_true, y_pred)

array([[11,  0],
       [ 1,  8]])
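
The raw array is easier to read with row and column names attached. The sketch below, assuming pandas and scikit-learn are available, wraps the result of `confusion_matrix` in a labeled DataFrame. The toy labels and the names `actual_0`, `pred_0`, and so on are illustrative.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical), not taken from the dataset
y_true_toy = [0, 0, 1, 1, 1]
y_pred_toy = [0, 0, 1, 0, 1]

# Fix the label order explicitly so rows and columns are unambiguous
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1])
cm_df = pd.DataFrame(
    cm,
    index=["actual_0", "actual_1"],
    columns=["pred_0", "pred_1"],
)
print(cm_df)
```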

#### Batching API Calls (Single Shot)

In [34]:
test_set.shape

(488, 3)

In [35]:
test_set_total = test_set.sample(100)

test_set_total['pred_label'] = ''

test_set_total

Unnamed: 0,review,label,clean_reviews,pred_label
2005,Why do we need to buy a $100 hub to get it to ...,0,why do we need to buy 100 hub to get it to wor...,
2439,"Seems to work ok, but no youtube tv? Really? ...",0,seems to work ok but no youtube tv really can ...,
2628,,0,,
1379,Bought the spot and loved it. Within months it...,0,bought the spot and loved it within months it ...,
1036,Alexa hardly came on..,0,alexa hardly came on,
...,...,...,...,...
2009,"At a volume setting of half or less, the speak...",0,at volume setting of half or less the speaker ...,
2472,Nope. Still a lot to be improved. For most of ...,0,nope still lot to be improved for most of the ...,
2782,I like having more Alexa devices in my house a...,1,like having more alexa devices in my house and...,
2672,"Got this for my 2nd Echo in the house, I alrea...",1,got this for my 2nd echo in the house already ...,


In [36]:
batches = []
batch_size = 50

for i in range(0, len(test_set_total), batch_size):
  batches.append(test_set_total.iloc[i : i + batch_size])  # Slice rows by position, one chunk per batch

### Batching API Calls: Gemini API

In [37]:
test_set.shape

(488, 3)

In [38]:
test_set_total = test_set.sample(100)

test_set_total['pred_label'] = ''

test_set_total

Unnamed: 0,review,label,clean_reviews,pred_label
380,"Six words, &#34;Alexa, tell me a poop joke.&#34;",1,six words 34 alexa tell me poop joke 34,
395,Great Product fast shipping,1,great product fast shipping,
1931,The alexa is awesome but when i rcieved the li...,0,the alexa is awesome but when rcieved the ligh...,
993,It's extremely useful in simple things like sp...,1,it extremely useful in simple things like spot...,
735,My husband likes being able to use it to liste...,1,my husband likes being able to use it to liste...,
...,...,...,...,...
553,Love these guys they work so great,1,love these guys they work so great,
1593,Echo Show is said to work with certain apps bu...,0,echo show is said to work with certain apps bu...,
1493,Easy set up,1,easy set up,
510,Happy with this as I was with the other 2 I or...,1,happy with this as was with the other 2 ordered,


In [39]:
batches = []
batch_size = 25

for i in range(0, len(test_set_total), batch_size):
  batches.append(test_set_total.iloc[i : i + batch_size])  # Slice rows by position, one chunk per batch
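
The batching loop can be packaged as a small reusable helper. This is a sketch under the assumption that batches are consecutive positional slices; `make_batches` and the toy DataFrame are hypothetical names for illustration.

```python
import pandas as pd

def make_batches(df, batch_size):
    """Split df into consecutive positional chunks of at most batch_size rows."""
    return [df.iloc[start:start + batch_size] for start in range(0, len(df), batch_size)]

toy = pd.DataFrame({"clean_reviews": [f"text {i}" for i in range(10)]})
toy_batches = make_batches(toy, batch_size=4)
print([len(b) for b in toy_batches])  # [4, 4, 2]
```

The last batch is simply shorter when the row count is not a multiple of the batch size.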

In [40]:
import time

def gemini_completion_function(batch,current_batch,total_batch):
  """Classify one batch of reviews with Gemini.

  Step 1: Convert the batch DataFrame to JSON with to_json().
  Step 2: Build the Gemini prompt.
  Step 3: Call the Gemini API and return the raw response.
  """

  print(f"Now processing batch#: {current_batch+1} of {total_batch}")

  json_data = batch[['clean_reviews','pred_label']].to_json(orient='records')

  prompt = f"""You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
  Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
  Customer reviews are provided between three backticks below.
  In your output, only return the Json code back as output - which is provided between three backticks.
  Your task is to update predicted labels under 'pred_label' in the Json code.
  Don't make any changes to Json code format, please.
  Error handling instruction: In case a Customer Review violates API policy, please assign it default sentiment as Negative (label=0).

  ```
  {json_data}
  ```
  """

  print(prompt)
  response = model.generate_content(prompt)
  time.sleep(5)  # Pause between calls, e.g. to stay within API rate limits

  return response
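
A single failed request would stop the whole batch loop. One common pattern is to wrap the call in a retry helper with exponential backoff. The sketch below is generic and does not call the Gemini API; `call_with_retries`, its parameters, and the `flaky_call` stand-in are all hypothetical.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)

# Stand-in for an API call that fails twice before succeeding
calls = {"count": 0}

def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = call_with_retries(flaky_call, base_delay=0.01)
print(result, calls["count"])
```

In the loop that processes the batches, the call could then look like `call_with_retries(gemini_completion_function, batches[i], i, batch_count)`.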

In [41]:
batch_count = len(batches)
responses = []

for i in range(0,len(batches)):
  responses.append(gemini_completion_function(batches[i],i,batch_count))

Now processing batch#: 1 of 4
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
  Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
  Customer reviews are provided between three backticks below.
  In your output, only return the Json code back as output - which is provided between three backticks.
  Your task is to update predicted labels under 'pred_label' in the Json code.
  Don't make any changes to Json code format, please.
  Error handling instruction: In case a Customer Review violates API policy, please assign it default sentiment as Negative (label=0).

  ```
  [{"clean_reviews":"six words 34 alexa tell me poop joke 34","pred_label":""},{"clean_reviews":"great product fast shipping","pred_label":""},{"clean_reviews":"the alexa is awesome but when rcieved the light buld did not know that need another adapter to have the light buld to work was not aware of that disappointed","pred_label

In [43]:
import json

frames = []  # One DataFrame per batch response

for response in responses:
    # Clean the data by stripping the surrounding backticks
    json_data = response.text.strip("`")

    # Parse the JSON and convert it to a DataFrame
    # (named `records` so the original `data` DataFrame is not overwritten)
    records = json.loads(json_data)
    frames.append(pd.DataFrame(records))

# Concatenate all batch results into the final DataFrame
df_total = pd.concat(frames, ignore_index=True)

print(df_total)  # Display the final DataFrame

                                        clean_reviews  pred_label
0             six words 34 alexa tell me poop joke 34           0
1                         great product fast shipping           1
2   the alexa is awesome but when rcieved the ligh...           0
3   it extremely useful in simple things like spot...           1
4   my husband likes being able to use it to liste...           1
..                                                ...         ...
95                 love these guys they work so great           1
96  echo show is said to work with certain apps bu...           0
97                                        easy set up           1
98    happy with this as was with the other 2 ordered           1
99  it like having another kid in the house have t...           0

[100 rows x 2 columns]


In [44]:
# Copy the predicted labels from df_total into test_set_total
# (assumes the model returned the rows in the same order they were sent)

test_set_total['pred_label'] = df_total['pred_label'].values
test_set_total

Unnamed: 0,review,label,clean_reviews,pred_label
380,"Six words, &#34;Alexa, tell me a poop joke.&#34;",1,six words 34 alexa tell me poop joke 34,0
395,Great Product fast shipping,1,great product fast shipping,1
1931,The alexa is awesome but when i rcieved the li...,0,the alexa is awesome but when rcieved the ligh...,0
993,It's extremely useful in simple things like sp...,1,it extremely useful in simple things like spot...,1
735,My husband likes being able to use it to liste...,1,my husband likes being able to use it to liste...,1
...,...,...,...,...
553,Love these guys they work so great,1,love these guys they work so great,1
1593,Echo Show is said to work with certain apps bu...,0,echo show is said to work with certain apps bu...,0
1493,Easy set up,1,easy set up,1
510,Happy with this as I was with the other 2 I or...,1,happy with this as was with the other 2 ordered,1
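
Assigning predictions with `.values` relies on the model returning every row, in the original order. The sketch below adds a check before copying the labels across; `attach_predictions` and the toy frames are hypothetical, and the check assumes the model echoes the review text unchanged.

```python
import pandas as pd

def attach_predictions(batch_df, pred_df, text_col="clean_reviews", pred_col="pred_label"):
    """Copy predictions onto a copy of batch_df after checking that rows line up."""
    if len(batch_df) != len(pred_df):
        raise ValueError(f"Expected {len(batch_df)} predictions, got {len(pred_df)}")
    if list(batch_df[text_col]) != list(pred_df[text_col]):
        raise ValueError("Returned reviews do not match the submitted reviews")
    out = batch_df.copy()
    out[pred_col] = pred_df[pred_col].astype(int).values
    return out

# Toy frames (hypothetical): the batch keeps its original index, the reply does not
batch = pd.DataFrame(
    {"clean_reviews": ["alexa hardly came on", "easy set up"], "pred_label": ["", ""]},
    index=[1036, 1493],
)
preds = pd.DataFrame(
    {"clean_reviews": ["alexa hardly came on", "easy set up"], "pred_label": [0, 1]}
)
labeled = attach_predictions(batch, preds)
print(labeled)
```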


In [45]:
# Compute the confusion matrix for the predictions (rows: true labels, columns: predicted labels)

from sklearn.metrics import confusion_matrix

y_true = test_set_total["label"]
y_pred = test_set_total["pred_label"]

confusion_matrix(y_true, y_pred)

array([[51,  1],
       [ 6, 42]])
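
Summary metrics follow directly from the four cells of this matrix. The arithmetic below is a short worked example using the values reported above, with label 1 (positive) treated as the positive class; the variable names are illustrative.

```python
# Confusion matrix reported above: rows are true labels, columns are predicted labels
cm = [[51, 1],
      [6, 42]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all reviews labeled correctly
precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives that were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.930 precision=0.977 recall=0.875
```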