**Sentiment Analysis with a Large Language Model - Gemini API**


Data Preprocessing and Splitting

In [None]:
import pandas as pd

data = pd.read_csv('amazon_alexa.tsv', sep='\t')
data.head(10)

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1
5,5,31-Jul-18,Heather Gray Fabric,I received the echo as a gift. I needed anothe...,1
6,3,31-Jul-18,Sandstone Fabric,"Without having a cellphone, I cannot use many ...",1
7,5,31-Jul-18,Charcoal Fabric,I think this is the 5th one I've purchased. I'...,1
8,5,30-Jul-18,Heather Gray Fabric,looks great,1
9,5,30-Jul-18,Heather Gray Fabric,Love it! I’ve listened to songs I haven’t hear...,1


In [None]:
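# Keep only the review text and its sentiment flag, under shorter column names
mydata = data[['verified_reviews', 'feedback']].rename(
    columns={'verified_reviews': 'review', 'feedback': 'label'}
)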

mydata.head()

Unnamed: 0,review,label
0,Love my Echo!,1
1,Loved it!,1
2,"Sometimes while playing a game, you can answer...",1
3,I have had a lot of fun with this thing. My 4 ...,1
4,Music,1


In [None]:
mydata.value_counts('label')

label,count
1,2893
0,257


In [None]:
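# Balance the data by randomly dropping rows from the majority class
label_counts = mydata["label"].value_counts()
rows_to_drop = label_counts.max() - label_counts.min()
if rows_to_drop > 0:
    # idxmax() returns the majority label (1 here) instead of hard-coding it
    data_majority = mydata[mydata["label"] == label_counts.idxmax()]
    # sample() is random; pass random_state=... to make the selection repeatable
    data_balanced = mydata.drop(data_majority.sample(rows_to_drop).index)
else:
    data_balanced = mydata.copy()
print(data_balanced["label"].value_counts())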

label
1    257
0    257
Name: count, dtype: int64


In [None]:
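# Data preprocessing
import re

def clean_text(text):
    text = re.sub(r"<[^>]*>", " ", text)       # strip HTML tags first, while "<" and ">" still exist
    text = re.sub(r"[^\w\s]", " ", text)       # replace punctuation and symbols (including emoji) with spaces
    text = re.sub(r"\b[a-zA-Z]\b", " ", text)  # drop single-letter words such as "I", "a", or a stray "s"
    text = text.lower()
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    text = text.strip()
    return text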
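
The order of the substitutions in a cleaner like this matters: once punctuation has been replaced, the angle brackets are gone and a tag pattern can no longer match. The sketch below illustrates the difference with two hypothetical helper functions and a made-up review string; neither helper is part of the notebook.

```python
import re

def strip_tags_last(text):
    # Punctuation is removed first, so "<" and ">" are already gone
    # when the tag pattern runs and the tag name survives as a word.
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"<[^>]*>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def strip_tags_first(text):
    # Tags are removed while their angle brackets still exist.
    text = re.sub(r"<[^>]*>", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

sample = "Great sound!<br/>Love it."  # made-up example text
print(strip_tags_last(sample))   # great sound br love it
print(strip_tags_first(sample))  # great sound love it
```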


In [None]:
# Apply the cleaning function to every review and store the result in a new column
data_balanced['clean_reviews'] = data_balanced['review'].apply(clean_text)

In [None]:
data_balanced

Unnamed: 0,review,label,clean_reviews
0,Love my Echo!,1,love my echo
3,I have had a lot of fun with this thing. My 4 ...,1,have had lot of fun with this thing my 4 yr ol...
46,"It's like Siri, in fact, Siri answers more acc...",0,it like siri in fact siri answers more accurat...
52,Works as you’d expect and then some. Also good...,1,works as you expect and then some also good so...
60,😍,1,
...,...,...,...
3091,I didn’t order it,0,didn order it
3096,The product sounded the same as the emoji spea...,0,the product sounded the same as the emoji spea...
3121,I like the hands free operation vs the Tap. We...,1,like the hands free operation vs the tap we us...
3135,I loved it does exactly what it says,1,loved it does exactly what it says
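
Row 60 in the output above shows that an emoji-only review is cleaned down to an empty string, which leaves the model nothing to classify. One possible way to handle this, sketched here on plain dictionaries, is to set such records aside before building a prompt. The function name `split_blank_reviews` and the example records are illustrative assumptions, not part of the notebook.

```python
def split_blank_reviews(records, text_key="clean_reviews"):
    """Separate records whose cleaned text is empty from the rest."""
    kept, blank = [], []
    for record in records:
        text = record.get(text_key) or ""
        (kept if text.strip() else blank).append(record)
    return kept, blank

# Illustrative records in the same shape as the notebook's rows
records = [
    {"clean_reviews": "love my echo", "label": 1},
    {"clean_reviews": "", "label": 1},  # e.g. an emoji-only review
    {"clean_reviews": "didn order it", "label": 0},
]
kept, blank = split_blank_reviews(records)
print(len(kept), len(blank))  # 2 1
```

On the DataFrame itself, a comparable filter would be `data_balanced[data_balanced['clean_reviews'].str.strip() != '']`. Dropping rows this way can shift the class balance slightly, so the label counts are worth rechecking afterwards.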


In [None]:
# Data split: 95% of the balanced rows go to the test set; the small remainder is kept as train_set
total_rows = len(data_balanced)
test_size = int(total_rows * 0.95)
test_set = data_balanced.sample(test_size)
train_set = data_balanced.drop(test_set.index)

Setting up Gemini API

In [None]:
!pip install -q -U google-generativeai


In [None]:
import textwrap

import google.generativeai as genai
from google.colab import userdata
from IPython.display import Markdown


def to_markdown(text):
  # Render model output as a Markdown block quote, turning bullet characters into list items
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [None]:

# Read the API key from Colab's Secrets panel (stored there under the name 'Secret_key')
Secret_key = userdata.get('Secret_key')

genai.configure(api_key=Secret_key)

In [None]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-1.5-flash-002


In [None]:
model = genai.GenerativeModel('gemini-pro')

In [None]:
response = model.generate_content("What is the meaning of life?")

response


response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "There is no one definitive answer to the question \"What is the meaning of life?\" as it is a deeply personal and subjective matter that varies from individual to individual. However, some common themes that emerge in people's reflections on this question include:\n\n* **Purpose:** Many people find meaning in their lives by identifying a purpose or goal that they work towards. This could be anything from raising a family to starting a business to making a difference in the world.\n* **Relationships:** Building strong relationships with family, friends, and loved ones is another important source of meaning for many people. These relationships provide us with support, love, and a sense of belonging.\n* **Growth and learning:** Continuously striving to learn, g

In [None]:
to_markdown(response.candidates[0].content.parts[0].text)


> There is no one definitive answer to the question "What is the meaning of life?" as it is a deeply personal and subjective matter that varies from individual to individual. However, some common themes that emerge in people's reflections on this question include:
> 
> * **Purpose:** Many people find meaning in their lives by identifying a purpose or goal that they work towards. This could be anything from raising a family to starting a business to making a difference in the world.
> * **Relationships:** Building strong relationships with family, friends, and loved ones is another important source of meaning for many people. These relationships provide us with support, love, and a sense of belonging.
> * **Growth and learning:** Continuously striving to learn, grow, and expand our horizons can also give our lives meaning. This can involve pursuing education, developing new skills, or simply seeking out new experiences.
> * **Making a contribution:** Feeling like we are making a positive contribution to society or the world can give our lives meaning. This could involve volunteering our time, donating to charity, or simply being a good friend or neighbor.
> * **Finding inner peace and happiness:** Ultimately, many people find the most lasting meaning in their lives by cultivating inner peace and happiness. This can involve practicing mindfulness, meditation, or simply spending time in nature.
> 
> It is important to note that the meaning of life is not something that is fixed or static. It can change and evolve as we grow and experience life. There is no right or wrong answer, and what gives one person meaning may not give another. The most important thing is to find what gives you a sense of purpose, fulfillment, and joy.

In [None]:
test_set_sample = test_set.sample(20)

test_set_sample['pred_label'] = ''

test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
2823,Nope. Still a lot to be improved. For most of ...,0,nope still lot to be improved for most of the ...,
2337,"The first stick is a solid, entry level, devic...",1,the first stick is solid entry level device an...,
1822,It works perfect!,1,it works perfect,
2812,I am quite disappointed by this product.There ...,0,am quite disappointed by this product there cl...,
3096,The product sounded the same as the emoji spea...,0,the product sounded the same as the emoji spea...,
374,,0,,
1612,"Great device, features are awesome, the intera...",0,great device features are awesome the interact...,
2697,NOT CONNECTED TO MY PHONE PLAYLIST :(,0,not connected to my phone playlist,
398,Dont trust this....,0,dont trust this,
799,"LOVE, LOVE this new little gadget. Has made ...",1,love love this new little gadget has made our ...,


In [None]:
json_data = test_set_sample[['clean_reviews','pred_label']].to_json(orient='records')
print(json_data)

[{"clean_reviews":"nope still lot to be improved for most of the things we ask it says hmmmm dont know that","pred_label":""},{"clean_reviews":"the first stick is solid entry level device and does what it supposed to do","pred_label":""},{"clean_reviews":"it works perfect","pred_label":""},{"clean_reviews":"am quite disappointed by this product there clearly is bug half of the time ask for newsflash two second after it starts the whole thing stops and reboots for couple of minutes same goes when ask to play the radio also many times it get confused when ask to switch of 34 the lights 34 as other items are also named the same which is not true clearly no ai or natural language very beta stage still for me not impressed","pred_label":""},{"clean_reviews":"the product sounded the same as the emoji speaker from five below my sister has and even that one has bluetooth and doesn need to be plugged in the only good thing about this is that you can speak to it","pred_label":""},{"clean_reviews

Using the Gemini API to Classify Sentiments

In [None]:
prompt = f"""
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.

```
{json_data}
```
"""

print(prompt)


You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.

```
[{"clean_reviews":"nope still lot to be improved for most of the things we ask it says hmmmm dont know that","pred_label":""},{"clean_reviews":"the first stick is solid entry level device and does what it supposed to do","pred_label":""},{"clean_reviews":"it works perfect","pred_label":""},{"clean_reviews":"am quite disappointed by this product there clearly is bug half of the time ask for newsflash two second after it starts the whole thing stops and reboots for couple of minutes same goes when ask to play

Single API Call

In [None]:
response = model.generate_content(prompt)

print(response.candidates[0].content.parts[0].text)

```
[{"clean_reviews":"nope still lot to be improved for most of the things we ask it says hmmmm dont know that","pred_label":0},{"clean_reviews":"the first stick is solid entry level device and does what it supposed to do","pred_label":1},{"clean_reviews":"it works perfect","pred_label":1},{"clean_reviews":"am quite disappointed by this product there clearly is bug half of the time ask for newsflash two second after it starts the whole thing stops and reboots for couple of minutes same goes when ask to play the radio also many times it get confused when ask to switch of 34 the lights 34 as other items are also named the same which is not true clearly no ai or natural language very beta stage still for me not impressed","pred_label":0},{"clean_reviews":"the product sounded the same as the emoji speaker from five below my sister has and even that one has bluetooth and doesn need to be plugged in the only good thing about this is that you can speak to it","pred_label":0},{"clean_reviews"

In [None]:
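import json

# Strip the Markdown code fence (and an optional "json" language tag) before parsing
json_text = response.text.strip().strip("`").strip()
if json_text.startswith("json"):
    json_text = json_text[len("json"):]
records = json.loads(json_text)  # named `records` so the `data` DataFrame is not overwritten
df_sample = pd.DataFrame(records)

df_sample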

Unnamed: 0,clean_reviews,pred_label
0,nope still lot to be improved for most of the ...,0
1,the first stick is solid entry level device an...,1
2,it works perfect,1
3,am quite disappointed by this product there cl...,0
4,the product sounded the same as the emoji spea...,0
5,,0
6,great device features are awesome the interact...,0
7,not connected to my phone playlist,0
8,dont trust this,0
9,love love this new little gadget has made our ...,1


In [None]:
test_set_sample['pred_label'] = df_sample['pred_label'].values
test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
2823,Nope. Still a lot to be improved. For most of ...,0,nope still lot to be improved for most of the ...,0
2337,"The first stick is a solid, entry level, devic...",1,the first stick is solid entry level device an...,1
1822,It works perfect!,1,it works perfect,1
2812,I am quite disappointed by this product.There ...,0,am quite disappointed by this product there cl...,0
3096,The product sounded the same as the emoji spea...,0,the product sounded the same as the emoji spea...,0
374,,0,,0
1612,"Great device, features are awesome, the intera...",0,great device features are awesome the interact...,0
2697,NOT CONNECTED TO MY PHONE PLAYLIST :(,0,not connected to my phone playlist,0
398,Dont trust this....,0,dont trust this,0
799,"LOVE, LOVE this new little gadget. Has made ...",1,love love this new little gadget has made our ...,1
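
Writing the predictions back with `.values` assumes that the model returned every record, in the original order, with a label of 0 or 1. The sketch below shows one way to check those assumptions before merging. The function `validate_predictions` and the example strings are hypothetical and only mirror the JSON shape used in this notebook.

```python
import json

def validate_predictions(sent_json, received_json, text_key="clean_reviews",
                         label_key="pred_label", allowed=(0, 1)):
    """Return a list of problems found; an empty list means the reply lines up."""
    sent = json.loads(sent_json)
    received = json.loads(received_json)
    problems = []
    if len(sent) != len(received):
        problems.append(f"expected {len(sent)} records, got {len(received)}")
    # zip() stops at the shorter list, so only the overlapping records are compared
    for i, (s, r) in enumerate(zip(sent, received)):
        if s.get(text_key) != r.get(text_key):
            problems.append(f"record {i}: review text changed or reordered")
        if r.get(label_key) not in allowed:
            problems.append(f"record {i}: unexpected label {r.get(label_key)!r}")
    return problems

# Illustrative request and two possible replies
sent = '[{"clean_reviews":"it works perfect","pred_label":""},{"clean_reviews":"dont trust this","pred_label":""}]'
good = '[{"clean_reviews":"it works perfect","pred_label":1},{"clean_reviews":"dont trust this","pred_label":0}]'
bad = '[{"clean_reviews":"it works perfect","pred_label":"Positive"}]'

print(validate_predictions(sent, good))  # []
print(validate_predictions(sent, bad))
```

If the returned list is not empty, it is probably safer to re-send the batch than to assign the labels positionally.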


In [None]:
# Compute the confusion matrix for the predictions (rows: true labels, columns: predicted labels)

from sklearn.metrics import confusion_matrix

y_true = test_set_sample["label"]
y_pred = test_set_sample["pred_label"]

confusion_matrix(y_true, y_pred)

array([[10,  0],
       [ 1,  9]])

Batching API Calls

In [None]:
test_set.shape

(488, 3)

In [None]:
test_set_total = test_set.sample(100)

test_set_total['pred_label'] = ''

test_set_total

Unnamed: 0,review,label,clean_reviews,pred_label
2472,Nope. Still a lot to be improved. For most of ...,0,nope still lot to be improved for most of the ...,
668,It's ok. The speaker is pretty terrible. Googl...,0,it ok the speaker is pretty terrible google ho...,
2163,Puts the pep back in my old TV. All of the so...,1,puts the pep back in my old tv all of the soft...,
1571,All of my echo devices stopped communicating p...,0,all of my echo devices stopped communicating p...,
341,Alexa hardly came on..,0,alexa hardly came on,
...,...,...,...,...
2696,Echo Dot responds to us when we aren't even ta...,0,echo dot responds to us when we aren even talk...,
368,I returned 2 Echo Dots & am only getting refun...,0,returned 2 echo dots am only getting refund fo...,
1386,Invasive and scared the crap out of me for spe...,0,invasive and scared the crap out of me for spe...,
1814,I decided to buy smart door lock and decided o...,0,decided to buy smart door lock and decided on ...,


In [None]:
batches = []
batch_size = 25

# Slice the sample into consecutive batches of 25 rows (positional slicing with .iloc)
for i in range(0, len(test_set_total), batch_size):
  batches.append(test_set_total.iloc[i : i + batch_size])

In [None]:
import time

def gemini_completion_function(batch, current_batch, total_batch):
  """Classify one batch of reviews in three steps:
  1. Convert the batch DataFrame to JSON with to_json().
  2. Build the Gemini prompt around that JSON.
  3. Call the Gemini API and return the raw response.
  """

  print(f"Now processing batch#: {current_batch+1} of {total_batch}")

  json_data = batch[['clean_reviews','pred_label']].to_json(orient='records')

  prompt = f"""You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
  Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
  Customer reviews are provided between three backticks below.
  In your output, only return the Json code back as output - which is provided between three backticks.
  Your task is to update predicted labels under 'pred_label' in the Json code.
  Don't make any changes to Json code format, please.
  Error handling instruction: In case a Customer Review violates API policy, please assign it default sentiment as Negative (label=0).

  ```
  {json_data}
  ```
  """

  print(prompt)
  response = model.generate_content(prompt)
  time.sleep(5)  # pause between calls to avoid hitting API rate limits

  return response
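
A fixed pause between calls does not help when a single request fails outright. A minimal sketch of a retry wrapper with exponential backoff follows. The name `call_with_retry` and its parameters are assumptions for illustration, and it retries on any exception because this notebook does not show which errors the API raises.

```python
import time

def call_with_retry(fn, *args, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call fn(*args); on an exception wait and try again, doubling the delay each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except Exception:  # assumption: every failure is worth retrying
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))

# Demonstration with a stand-in function that fails twice, then succeeds
attempts = {"count": 0}

def flaky(batch_name):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return f"ok: {batch_name}"

waits = []  # record the requested delays instead of actually sleeping
result = call_with_retry(flaky, "batch-1", sleep=waits.append)
print(result, waits)  # ok: batch-1 [5.0, 10.0]
```

In this notebook the wrapper could be applied as `call_with_retry(gemini_completion_function, batches[i], i, batch_count)`.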

In [None]:
batch_count = len(batches)
responses = []

for i, batch in enumerate(batches):
  responses.append(gemini_completion_function(batch, i, batch_count))

Now processing batch#: 1 of 4
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
  Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
  Customer reviews are provided between three backticks below.
  In your output, only return the Json code back as output - which is provided between three backticks.
  Your task is to update predicted labels under 'pred_label' in the Json code.
  Don't make any changes to Json code format, please.
  Error handling instruction: In case a Customer Review violates API policy, please assign it default sentiment as Negative (label=0).

  ```
  [{"clean_reviews":"nope still lot to be improved for most of the things we ask it says hmmmm dont know that","pred_label":""},{"clean_reviews":"it ok the speaker is pretty terrible google home is better product","pred_label":""},{"clean_reviews":"puts the pep back in my old tv all of the software was expired and enjoy netflix a

In [None]:
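import json

df_list = []

for response in responses:
    # Strip the Markdown code fence (and an optional "json" language tag) before parsing
    json_text = response.text.strip().strip("`").strip()
    if json_text.startswith("json"):
        json_text = json_text[len("json"):]
    records = json.loads(json_text)  # `records`, so the `data` DataFrame is not overwritten
    df_list.append(pd.DataFrame(records))

# Stack the per-batch results into one DataFrame with a fresh 0..n-1 index
df_total = pd.concat(df_list, ignore_index=True)
print(df_total)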


                                        clean_reviews  pred_label
0   nope still lot to be improved for most of the ...           0
1   it ok the speaker is pretty terrible google ho...           0
2   puts the pep back in my old tv all of the soft...           1
3   all of my echo devices stopped communicating p...           0
4                                alexa hardly came on           0
..                                                ...         ...
95  echo dot responds to us when we aren even talk...           0
96  returned 2 echo dots am only getting refund fo...           0
97  invasive and scared the crap out of me for spe...           0
98  decided to buy smart door lock and decided on ...           0
99  was loving it but starting in june hulu stoppe...           0

[100 rows x 2 columns]


In [None]:
test_set_total['pred_label'] = df_total['pred_label'].values
test_set_total

Unnamed: 0,review,label,clean_reviews,pred_label
2472,Nope. Still a lot to be improved. For most of ...,0,nope still lot to be improved for most of the ...,0
668,It's ok. The speaker is pretty terrible. Googl...,0,it ok the speaker is pretty terrible google ho...,0
2163,Puts the pep back in my old TV. All of the so...,1,puts the pep back in my old tv all of the soft...,1
1571,All of my echo devices stopped communicating p...,0,all of my echo devices stopped communicating p...,0
341,Alexa hardly came on..,0,alexa hardly came on,0
...,...,...,...,...
2696,Echo Dot responds to us when we aren't even ta...,0,echo dot responds to us when we aren even talk...,0
368,I returned 2 Echo Dots & am only getting refun...,0,returned 2 echo dots am only getting refund fo...,0
1386,Invasive and scared the crap out of me for spe...,0,invasive and scared the crap out of me for spe...,0
1814,I decided to buy smart door lock and decided o...,0,decided to buy smart door lock and decided on ...,0


In [None]:
# Compute the confusion matrix for the predictions (rows: true labels, columns: predicted labels)

from sklearn.metrics import confusion_matrix

y_true = test_set_total["label"]
y_pred = test_set_total["pred_label"]

confusion_matrix(y_true, y_pred)

array([[58,  0],
       [ 2, 40]])
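
The two confusion matrices reported in this notebook can be summarised as accuracy, precision, and recall. The sketch below assumes scikit-learn's layout for labels 0 and 1, `[[TN, FP], [FN, TP]]`, with label 1 (positive sentiment) as the positive class. The helper `binary_metrics` is illustrative and not part of the notebook.

```python
def binary_metrics(matrix):
    """Compute metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

sample_run = binary_metrics([[10, 0], [1, 9]])    # 20-review single call
batched_run = binary_metrics([[58, 0], [2, 40]])  # 100-review batched run
print(sample_run)
print(batched_run)
```

Under these assumptions the 20-review run has an accuracy of 0.95 and the 100-review run 0.98. Both runs have no false positives, so every error is a positive review that was labelled negative.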