<a href="https://colab.research.google.com/github/HabilMB/sentiment-analysis-w-Gemini/blob/main/sentiment_analysis_w_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%cd /content/drive/MyDrive/datasets/
!ls

/content/drive/MyDrive/datasets
amazon_alexa.tsv  restaurant-menus.csv	restaurants.csv


In [3]:
import pandas as pd

data = pd.read_csv('amazon_alexa.tsv', delimiter = '\t', quoting = 3)
data.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"""Sometimes while playing a game, you can answe...",1
3,5,31-Jul-18,Charcoal Fabric,"""I have had a lot of fun with this thing. My 4...",1
4,5,31-Jul-18,Charcoal Fabric,Music,1


In [4]:
mydata = data[['verified_reviews', 'feedback']]
mydata.columns = ['review', 'label']

mydata.head()

Unnamed: 0,review,label
0,Love my Echo!,1
1,Loved it!,1
2,"""Sometimes while playing a game, you can answe...",1
3,"""I have had a lot of fun with this thing. My 4...",1
4,Music,1


In [5]:
mydata.value_counts('label')

label
1    2893
0     257
dtype: int64

In [6]:
negative = mydata[mydata.label == 0]
positive = mydata[mydata.label == 1]

In [7]:
print(negative.shape)
print(positive.shape)

(257, 2)
(2893, 2)


In [8]:
# under-sample the positive class down to the size of the negative class
positive_sample = positive.sample(n=len(negative))

data_balanced = pd.concat([positive_sample, negative], axis=0)
print(data_balanced["label"].value_counts())

1    257
0    257
Name: label, dtype: int64


## Data Preprocessing

In [9]:
import re

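def clean_text(text):
  # strip HTML tags first, while the angle brackets are still present
  text = re.sub(r"<[^>]*>", " ", text)
  # replace punctuation with spaces
  text = re.sub(r"[^\w\s]", " ", text)
  # drop a single-letter word at the very start of the review
  text = re.sub(r"^\b[a-zA-Z]\b", " ", text)
  text = text.lower()
  # collapse runs of whitespace and trim both ends
  text = re.sub(r"\s+", " ", text)
  text = text.strip()
  return text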
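
The order of the substitutions matters: once punctuation has been replaced with spaces, the angle brackets are gone and a tag pattern has nothing left to match. A minimal sketch of that effect, using two hypothetical helpers (`clean_punct_first`, `clean_tags_first`) and a made-up review string, none of which are part of the notebook:

```python
import re

def clean_punct_first(text):
  # punctuation is replaced before the tag pattern runs
  text = re.sub(r"[^\w\s]", " ", text)
  text = re.sub(r"<[^>]*>", " ", text)  # never matches: "<" and ">" are already gone
  return re.sub(r"\s+", " ", text.lower()).strip()

def clean_tags_first(text):
  # tags are removed while the angle brackets still exist
  text = re.sub(r"<[^>]*>", " ", text)
  text = re.sub(r"[^\w\s]", " ", text)
  return re.sub(r"\s+", " ", text.lower()).strip()

sample = "<br>Great product!"  # made-up example, not from the dataset
print(clean_punct_first(sample))  # br great product
print(clean_tags_first(sample))   # great product
```

With punctuation removed first, the tag name `br` leaks into the cleaned review; removing tags first avoids that.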

In [10]:
#extract the review column as a list
reviews = data_balanced['review'].tolist()

#clean the text in the list
cleaned_review = [clean_text(review) for review in reviews]

data_balanced['clean_reviews'] = cleaned_review

## Data Split

In [11]:
total_rows = len(data_balanced)
test_size = int(total_rows * 0.95)
print(total_rows, test_size)

test_set = data_balanced.sample(test_size)
train_set = data_balanced.drop(test_set.index)

514 488


## Set up Gemini API

In [12]:
!pip install -q -U google-generativeai


In [13]:
#import packages
import pathlib
import textwrap

import google.generativeai as genai
from IPython.display import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _:True))

# Used to securely store your API key
from google.colab import userdata

In [14]:
# Retrieve the Gemini API key from Colab's secrets (it must be stored there first)

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

In [15]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-pro
models/gemini-pro-vision


In [16]:
model = genai.GenerativeModel('gemini-pro')

In [17]:
# test the model

response = model.generate_content('Why does the richest 1% of the population have more income than the rest of the world?')

to_markdown(response.text)

> **Factors Contributing to the Disproportionate Wealth of the 1%:**
> 
> **1. Capital Accumulation:**
> 
> * The richest 1% have accumulated significant wealth through investments, stock ownership, and real estate holdings.
> * They benefit from compounding returns over time, which further increases their wealth.
> 
> **2. Economic Concentration:**
> 
> * Globalization has led to the centralization of wealth in a few large corporations and industries.
> * This allows the owners and executives of these corporations to accumulate vast fortunes.
> 
> **3. Technological Advances:**
> 
> * Automation and the growth of the digital economy have created opportunities for the wealthy to exploit technological advancements and profit from intellectual property.
> 
> **4. Government Policies:**
> 
> * Tax cuts that favor the wealthy and deregulation of financial markets have widened the income gap.
> * Subsidies for corporations and wealthy individuals further contribute to their economic advantage.
> 
> **5. Historical Legacies:**
> 
> * Inherited wealth plays a significant role in perpetuating income inequality.
> * The wealthy are more likely to pass on their wealth to their children, who then have a head start in life.
> 
> **6. Lack of Social Mobility:**
> 
> * Educational and healthcare disparities make it difficult for people from lower socioeconomic backgrounds to improve their financial situation.
> * Discrimination and structural barriers also hinder social mobility.
> 
> **7. Political Influence:**
> 
> * The wealthy have disproportionate influence over political decision-making and can shape policies that favor their interests.
> * They fund political campaigns and lobby for policies that protect their wealth.
> 
> **8. Globalized Labor Market:**
> 
> * Globalization has allowed corporations to outsource labor to low-wage countries, reducing employment opportunities and wages for workers in developed nations.
> 
> **9. Wealth Diversification:**
> 
> * The wealthy invest their wealth in multiple assets, including stocks, bonds, real estate, and precious metals.
> * This diversification reduces risk and further increases their returns.
> 
> **10. Tax Loopholes and Offshore Accounts:**
> 
> * The wealthy often exploit tax loopholes and use offshore accounts to minimize their tax liability.
> * This allows them to accumulate more wealth and avoid contributing their fair share to society.

## Single API Call

In [18]:
test_set_sample = test_set.sample(20)

test_set_sample['pred_label'] = ''

test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
962,Works great!,1,works great,
636,Very fun to use and having morning briefing,1,very fun to use and having morning briefing,
638,"""Just like new, set-up was quick & easy.""",1,just like new set up was quick easy,
1246,"""Great product and I would give 5 stars - but ...",0,great product and i would give 5 stars but you...,
2171,"""The selection is wonderful and the ease of us...",1,the selection is wonderful and the ease of use...,
2104,,1,,
1689,"""Works fine, I just realize I don’t need this ...",0,works fine i just realize i don t need this be...,
58,"""Love Alexa, bought others for friends""",1,love alexa bought others for friends,
2987,I like the compact structure of the Dot. The s...,1,like the compact structure of the dot the soun...,
1691,This is very user friendly,1,this is very user friendly,


In [28]:
json_data = test_set_sample[['clean_reviews', 'pred_label']].to_json(orient='records')

print(json_data)

[{"clean_reviews":"works great","pred_label":""},{"clean_reviews":"very fun to use and having morning briefing","pred_label":""},{"clean_reviews":"just like new set up was quick easy","pred_label":""},{"clean_reviews":"great product and i would give 5 stars but you can t scroll face cards without having the stupid 34 try and ask alexa 34 suggestions pop up yes you can have it scroll once and just stay on the clock but i like having other cards as well god its the worst and so irritating i got it super cheap so i just face the screen toward the wall and treat it like a dot instead of a spot what a dumb move on amazons part","pred_label":""},{"clean_reviews":"the selection is wonderful and the ease of use on the home page is nice i wish however it didn t keep overcoming my settings for my other devices attached to my tv i have to keep switching back after any switches in programming including my stereo","pred_label":""},{"clean_reviews":"","pred_label":""},{"clean_reviews":"works fine i 

In [29]:
prompt = f"""
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the JSON Code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the json code.
Don't make any changes to JSON code format, please.

```
{json_data}
```
"""

print(prompt)


You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three back ticks.
In your output, only return the JSON Code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the json code.
Don't make any changes to JSON code format, please.

```
[{"clean_reviews":"works great","pred_label":""},{"clean_reviews":"very fun to use and having morning briefing","pred_label":""},{"clean_reviews":"just like new set up was quick easy","pred_label":""},{"clean_reviews":"great product and i would give 5 stars but you can t scroll face cards without having the stupid 34 try and ask alexa 34 suggestions pop up yes you can have it scroll once and just stay on the clock but i like having other cards as well god its the worst and so irritating i got it super cheap

In [31]:
response = model.generate_content(prompt)

to_markdown(response.text)

> ```
> [{"clean_reviews":"works great","pred_label":1},{"clean_reviews":"very fun to use and having morning briefing","pred_label":1},{"clean_reviews":"just like new set up was quick easy","pred_label":1},{"clean_reviews":"product and i would give 5 stars but you can t face cards without having the 34 try and ask alexa 34 suggestions pop up yes you can have it once and just stay on the clock but i like having other cards as well god its the worst and so autonom i got it super cheap so i just face the screen toward the wall and treat it like a dot instead of a spot what a move on amazon's part","pred_label":0},{"clean_reviews":"the selection is wonderful and the ease of use on the home page is nice i wish however it didn t keep overcoming my settings for my other devices attached to my tv i have to keep swithing back after any in including my stero","pred_label":0},{"clean_reviews":"","pred_label":0},{"clean_reviews":"works fine i just realize i don t need this because i don t use it","pred_label":0},{"clean_reviews":"love alexa bought others for friends","pred_label":1},{"clean_reviews":"like the structure of the dot the sound is a little too low","pred_label":1},{"clean_reviews":"this is very user friendly","pred_label":1},{"clean_reviews":"i am so in this product i tried to install two different units and neither one of them were successfully completed hours with technicians resetting each of them the app didn t connect the echo plus to the internet connection tried using the application on a phone and on a chrombook no luck there are no clean on the application probably this back to mark this as not ready for prime time","pred_label":0},{"clean_reviews":"not recommend this to anyone it won t load netflix or if it does on rare occasion it won t run the program all the way tru it tells you the we are unable to process this title at this time try again later it is not the netflix site cause i can stream it just fine on my computer its the stick very 
disappointing","pred_label":0},{"clean_reviews":"we have a great time using this as a family even my toddler can talk to her there are a few features i wish it would learn to do but overall we love it and glad we brought it over the google home","pred_label":1},{"clean_reviews":"all is fine with the spot exact for one massive in order to turn off a repeating in the morning you pretty much have to talk which is not a thing you usually want to do when first woken up and your bed partner will definitely not like yes you can swipe up on the face to but i swipping up rather than screen when you are half is very difficult and results in setting snooze more than half the time even if you really try to swipe to and ii if you do manage to swipe up to then you completely delete your recurring this is especially as could so easily fix this by simply making the screen turn off the but not delete it or at least make it a setting to decide whether the screen for a recurring snoozes or turns it off but as the clock has been out for a year now and they still haven t adding this setting clearly not going to happen this may be the biggest product design up has ever made","pred_label":0},{"clean_reviews":"m an echo fan but this one did not work","pred_label":0},{"clean_reviews":"nope still a lot to be improved for most of the things we ask it says hmmmm i dont know that","pred_label":0},{"clean_reviews":"product but overall too many features unless you have a smart home you don t need it","pred_label":0},{"clean_reviews":"just had to have it and now play and learn how to get the most out of it","pred_label":1},{"clean_reviews":"tock a little work to set up but i finally got it sound quality not the best but for what i got it for it works great i check weather and ask a few cooking questions in the kitchen with it i do play the radio on it when i want some music but will probably to the echo plus for better sound","pred_label":1},{"clean_reviews":"h haven t gure out how to make or 
receieve calls device tells me i need to register and i do not know what to do","pred_label":0}]
> ```

In [32]:
import json

#clean the data by stripping the backticks
json_data = response.text.strip("`")
print(json_data)

#load the cleaned data and convert to DataFrame
#(named `records` so the original `data` DataFrame is not overwritten)
records = json.loads(json_data)
df_sample = pd.DataFrame(records)

df_sample


[{"clean_reviews":"works great","pred_label":1},{"clean_reviews":"very fun to use and having morning briefing","pred_label":1},{"clean_reviews":"just like new set up was quick easy","pred_label":1},{"clean_reviews":"product and i would give 5 stars but you can t face cards without having the 34 try and ask alexa 34 suggestions pop up yes you can have it once and just stay on the clock but i like having other cards as well god its the worst and so autonom i got it super cheap so i just face the screen toward the wall and treat it like a dot instead of a spot what a move on amazon's part","pred_label":0},{"clean_reviews":"the selection is wonderful and the ease of use on the home page is nice i wish however it didn t keep overcoming my settings for my other devices attached to my tv i have to keep swithing back after any in including my stero","pred_label":0},{"clean_reviews":"","pred_label":0},{"clean_reviews":"works fine i just realize i don t need this because i don t use it","pred_l

Unnamed: 0,clean_reviews,pred_label
0,works great,1
1,very fun to use and having morning briefing,1
2,just like new set up was quick easy,1
3,product and i would give 5 stars but you can t...,0
4,the selection is wonderful and the ease of use...,0
5,,0
6,works fine i just realize i don t need this be...,0
7,love alexa bought others for friends,1
8,like the structure of the dot the sound is a l...,1
9,this is very user friendly,1
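
Stripping backticks works for the response shown here, but it would fail if the model prefixed the fence with a language tag such as `json` or wrapped the array in a sentence. A more defensive sketch, assuming the reply contains exactly one JSON array; `extract_json_array` is a hypothetical helper, not part of the `google.generativeai` package:

```python
import json

def extract_json_array(reply_text):
  # take everything from the first "[" to the last "]"
  start = reply_text.find("[")
  end = reply_text.rfind("]")
  if start == -1 or end < start:
    raise ValueError("no JSON array found in the model reply")
  return json.loads(reply_text[start:end + 1])

# made-up reply with a language tag after the opening fence
reply = '```json\n[{"clean_reviews": "works great", "pred_label": 1}]\n```'
parsed = extract_json_array(reply)
print(parsed[0]["pred_label"])  # 1
```

This relies on the cleaned reviews containing no square brackets, which holds here because `clean_text` replaces punctuation with spaces.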


In [33]:
#overwrite pred_label in 'test_set_sample' with pred_label from 'df_sample'
test_set_sample['pred_label'] = df_sample['pred_label'].values.astype(int)
test_set_sample

Unnamed: 0,review,label,clean_reviews,pred_label
962,Works great!,1,works great,1
636,Very fun to use and having morning briefing,1,very fun to use and having morning briefing,1
638,"""Just like new, set-up was quick & easy.""",1,just like new set up was quick easy,1
1246,"""Great product and I would give 5 stars - but ...",0,great product and i would give 5 stars but you...,0
2171,"""The selection is wonderful and the ease of us...",1,the selection is wonderful and the ease of use...,0
2104,,1,,0
1689,"""Works fine, I just realize I don’t need this ...",0,works fine i just realize i don t need this be...,0
58,"""Love Alexa, bought others for friends""",1,love alexa bought others for friends,1
2987,I like the compact structure of the Dot. The s...,1,like the compact structure of the dot the soun...,1
1691,This is very user friendly,1,this is very user friendly,1
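
The assignment above is positional, so it relies on the model returning exactly one record per input review, in the same order. The response shown earlier also altered some review texts (words are missing from the fourth review, for example), so matching on the text itself is not reliable. A minimal check to run before assigning, assuming the parsed response is a list of dicts; `check_predictions` is a hypothetical helper:

```python
def check_predictions(records, expected_count):
  # one record per input review
  if len(records) != expected_count:
    raise ValueError(f"expected {expected_count} records, got {len(records)}")
  labels = []
  for i, rec in enumerate(records):
    label = rec.get("pred_label")
    # accept 0/1 given as an int or as a string such as "1"
    if str(label).strip() not in {"0", "1"}:
      raise ValueError(f"record {i} has an unexpected label: {label!r}")
    labels.append(int(label))
  return labels

# made-up records for illustration
sample_records = [{"clean_reviews": "works great", "pred_label": 1},
                  {"clean_reviews": "did not work", "pred_label": "0"}]
print(check_predictions(sample_records, expected_count=2))  # [1, 0]
```

In the notebook this could be called with `len(test_set_sample)` as the expected count, so a dropped or duplicated record raises an error instead of silently shifting labels.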


In [34]:
#evaluate the model
from sklearn.metrics import confusion_matrix

y_true = test_set_sample["label"]
y_pred = test_set_sample["pred_label"]

confusion_matrix(y_true, y_pred)

array([[9, 0],
       [2, 9]])
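
scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, with labels in sorted order, so the matrix above reads as TN=9, FP=0, FN=2, TP=9. The usual summary metrics follow from those four counts; a short sketch in plain Python with the values copied from the output above:

```python
# values copied from the confusion matrix output: [[TN, FP], [FN, TP]]
tn, fp = 9, 0
fn, tp = 2, 9

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the reviews predicted positive, how many were positive
recall = tp / (tp + fn)     # of the truly positive reviews, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.900 precision=1.000 recall=0.818
```

Both errors are positive reviews predicted as negative, and one of them is the empty review, which carries no text to classify. With only 20 reviews in the sample, these figures are a rough indication rather than a stable estimate.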