# SC290: Supplementary Notebook
## LLM Sentiment Scoring Test

Because LLMs are effectively black boxes, it is worth interrogating how much their results vary from run to run. In this notebook we run sentiment analysis on the same dataset of song lyrics 10 times. We then join the scored datasets together and examine how the scores vary per song.

In [4]:

# Everything we need in one cell

from google import genai
import pandas as pd
from pydantic import BaseModel

# from google.colab import userdata
# GEMINI_KEY = userdata.get('GOOGLE_API_KEY')
from cred import GEMINI_KEY


def get_gemini_sentiment(api: genai.Client,
                         instructions: str,
                         texts_column: pd.Series,
                         schema: type[BaseModel],
                         temperature: float = 0.0) -> pd.DataFrame:
    
    # we first convert our column of texts into a json formatted string
    data_json = texts_column.to_json()

    # wrap our instructions and our data in the right objects, and join them together
    prompt = [genai.types.Part(text=instructions)]
    prompt = prompt + [genai.types.Part(text=data_json)]

    # Set our configuration to specify the type of response, the schema and the temperature
    config = {'response_mime_type': 'application/json',
              'response_schema': list[schema],
              'temperature': temperature}
    
    # Send our request
    response = api.models.generate_content(model="gemini-2.0-flash", contents=prompt, config=config)
    
    # Convert the result into a dataframe and return it
    results_table = pd.DataFrame([dict(record) for record in response.parsed])
    return results_table
    

# our test data
sentences = ["That sounds good.",  # Positive
             "I love my new record player",  # More Positive
             "I really hate it when my brother steals my things",  # Negative
             "I am a human"]  # Neutral
texts_column = pd.Series(sentences)

# our instructions
n_records = len(texts_column)
instructions = f"""You are a text sentiment analyser that returns polarity scores,
  where -1 indicates negative sentiment, 0 is neutral and 1 is positive sentiment. 
  You can use decimal values. You have been provided a JSON formatted set of {n_records} records. Each record has an ID number and a sentence. 
  Analyze each provided sentence and assign it a polarity score and a label of positive, neutral or negative. Return a JSON formatted response"""

# our schema
class SentimentRecord(BaseModel):
    index: int
    polarity: float
    label: str

# Connect to the API and send our request
# api = genai.Client(api_key=GEMINI_KEY)
# table = get_gemini_sentiment(api=api,
#                              instructions=instructions,
#                              texts_column=texts_column,
#                              schema=SentimentRecord)
# table
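
To see what the model actually receives, it can help to inspect the JSON payload that `get_gemini_sentiment` builds from the texts. The sketch below reuses the test sentences from the cell above and only illustrates the pandas side of the request. With default settings, `Series.to_json()` keys each text by its index label, which is presumably what the prompt's "ID number" refers to and what the model is expected to echo back in the `index` field.

```python
import json

import pandas as pd

sentences = ["That sounds good.",
             "I love my new record player",
             "I really hate it when my brother steals my things",
             "I am a human"]
texts_column = pd.Series(sentences)

# With default settings, to_json() maps each index label (as a string) to its text
data_json = texts_column.to_json()
payload = json.loads(data_json)

for record_id, text in payload.items():
    print(record_id, text)
```

Because the keys are just the Series index, any reordering or filtering of the texts before the call changes the IDs the model sees.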


In [5]:
lyrics_data = pd.read_parquet('lyrics_data.parquet')
lyrics_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   id                  57 non-null     int64         
 1   title               57 non-null     object        
 2   artist_names        57 non-null     object        
 3   track_release_date  57 non-null     datetime64[ns]
 4   album_release_date  57 non-null     datetime64[ns]
 5   album_name          57 non-null     object        
 6   lyrics              57 non-null     object        
dtypes: datetime64[ns](2), int64(1), object(4)
memory usage: 3.2+ KB


In [55]:
n_samples = len(lyrics_data['lyrics'])

instructions = f"""You are a song lyric sentiment classifier that returns polarity scores,
  where -1 indicates negative sentiment, 0 is neutral and 1 is positive sentiment. 
  You can use decimal values. You have been provided a JSON formatted set of {n_samples} records. Each record has an ID number and a sentence. 
  Analyze each provided sentence and assign it a polarity score and a label of positive, neutral or negative. 
  Provide your reasoning. Return a JSON formatted response""" # Just an additional sentence for reasoning.

class SentimentRecordReasoning(SentimentRecord):
    reasoning: str

from time import sleep

api = genai.Client(api_key=GEMINI_KEY)

# Score the full lyrics column 10 times at temperature 1.0, pausing between requests
runs = []
for i in range(10):
    lyric_sentiment = get_gemini_sentiment(api,
                                           instructions=instructions,
                                           texts_column=lyrics_data['lyrics'],
                                           schema=SentimentRecordReasoning,
                                           temperature=1.0)
    runs.append(lyric_sentiment)
    print(f"Run {i} Finished")
    sleep(10)

Run 0 Finished
Run 1 Finished
Run 2 Finished
Run 3 Finished
Run 4 Finished
Run 5 Finished
Run 6 Finished
Run 7 Finished
Run 8 Finished
Run 9 Finished


In [56]:
# Tag each run's results with its run number, then stack all runs into one table
for i, run_df in enumerate(runs):
    run_df['run'] = i

df = pd.concat(runs)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 572 entries, 0 to 57
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   index      572 non-null    int64  
 1   polarity   572 non-null    float64
 2   label      572 non-null    object 
 3   reasoning  572 non-null    object 
 4   run        572 non-null    int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 26.8+ KB
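
The summary above reports 572 rows with index values running up to 57, whereas 10 runs over 57 songs would give 570 rows indexed 0 to 56. This suggests that at least some runs returned an extra or out-of-range record. A minimal sketch of a per-run sanity check follows; `check_run_indices` is a hypothetical helper (not part of the notebook or of any library), and the data is made up purely for illustration.

```python
import pandas as pd


def check_run_indices(run_df: pd.DataFrame, expected_index: pd.Index) -> dict:
    """Summarise how one run's returned 'index' values differ from the expected ones."""
    returned = run_df['index'].tolist()
    expected = list(expected_index)
    counts = run_df['index'].value_counts()
    return {
        'n_rows': len(run_df),
        'missing': sorted(set(expected) - set(returned)),
        'unexpected': sorted(set(returned) - set(expected)),
        'duplicated': sorted(counts[counts > 1].index.tolist()),
    }


# Illustrative data only: three expected records, one run with an out-of-range index
expected_index = pd.RangeIndex(3)
good_run = pd.DataFrame({'index': [0, 1, 2], 'polarity': [0.1, 0.7, -0.3]})
bad_run = pd.DataFrame({'index': [0, 1, 2, 3], 'polarity': [0.1, 0.6, -0.2, 0.4]})

for name, run_df in [('good_run', good_run), ('bad_run', bad_run)]:
    print(name, check_run_indices(run_df, expected_index))
```

With the notebook's own objects, the same check could be applied to each table in `runs` using `lyrics_data.index` as the expected index, before the runs are concatenated.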


In [57]:
# Attach song metadata by matching the model's returned index to the row label in lyrics_data
df = df.merge(lyrics_data, left_on='index', right_index=True, how='left')
df

Unnamed: 0,index,polarity,label,reasoning,run,id,title,artist_names,track_release_date,album_release_date,album_name,lyrics
0,0,0.1,neutral,The song contains a mix of longing and frustra...,0,10484097.0,Whatcha Want,Lawrence,2024-06-21,2024-06-21,Family Business,Eyes on me (Eyes on me)\nTell me what you want...
1,1,0.7,positive,The song expresses a transformation from a pas...,0,10244330.0,Guy I Used To Be,Lawrence,2024-04-05,2024-06-21,Family Business,Hey\nOoh-hoo ho hoo whoa\nHoo-ooo goodbye (goo...
2,2,0.6,positive,The song encourages listeners to let loose and...,0,10486911.0,Do,Lawrence,2024-06-21,2024-06-21,Family Business,We got nothing to lose\nSo everybody just...\n...
3,3,-0.3,negative,The song expresses discontent with societal no...,0,10486912.0,Something In The Water,Lawrence,2024-06-21,2024-06-21,Family Business,Everybody thinks that changing is cool\nBut I ...
4,4,0.7,positive,The song is about embracing oneself and not ne...,0,10484098.0,Hip Replacement,Lawrence,2024-06-21,2024-06-21,Family Business,"Oh, I don't need a hip replacement!\nLA party,..."
...,...,...,...,...,...,...,...,...,...,...,...,...
53,53,-0.6,negative,The song expresses nostalgia and regret over t...,9,2902846.0,So Damn Fast,Lawrence,2013-01-05,2013-01-05,Homesick,Do you remember when it all began?\nIt seemed ...
54,54,0.5,positive,The song encourages listeners to embrace creat...,9,3627731.0,Oranges,Lawrence,2013-01-05,2013-01-05,Homesick,It's funny how nobody sings about oranges\nThe...
55,55,0.2,neutral,The song expresses a desire for a romantic con...,9,3627732.0,Hall-Crossed Lovers,Lawrence,2013-01-05,2013-01-05,Homesick,"Do-do, do-do\nDo-do, do-do-do-da\nDo-do, do-do..."
56,56,-0.6,negative,The song expresses a longing for another chanc...,9,3627733.0,One More Time,Lawrence,2013-01-05,2013-01-05,Homesick,"One more time\nBaby, let me hit the rewind\nOn..."


In [58]:
df.groupby('title')['polarity'].std()

title
23                                         0.094868
Alibi                                      0.069921
Almost Grown                               0.073786
And Many More                              0.290767
Casualty                                   0.052705
Circle Back                                0.042164
Cold                                       0.067495
Columbus Avenue                            0.066667
Come On, Brother                           0.097183
Conflict Resolution                        0.082327
Death of Me                                0.103280
Do                                         0.091894
Do You Wanna Do Nothing With Me?           0.000000
Don’t Lose Sight                           0.084984
Don’t Move                                 0.091894
False Alarms                               0.066667
Family Business                            0.081650
Figure It Out (A Song Between Siblings)    0.099443
Freckles                                   0.078881
Friend

In [78]:
import plotly.express as px

# Treat the song index as a category so each song gets its own box
df['index'] = df['index'].astype('string')
df = df.sort_values('polarity', ascending=False)
fig = px.box(data_frame=df, y='index', x='polarity',
             hover_data=['title', 'album_name'], points='all',
             color='album_name',
             title='Variation in LLM Polarity Scores (10 Runs)')

fig.write_html('variations.html')
fig.show()

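In general, a song's scores tend to stay on one side of 0 across runs. However, some songs vary notably: among the songs listed in the standard deviation output above, "Do You Wanna Do Nothing With Me?" has a standard deviation of 0.000000 while "And Many More" reaches 0.290767. This should remind us to be cautious when relying on LLM-produced analysis, and of how difficult it is to determine the sentiment of a text in the first place.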
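
One way to make the "one side of 0" observation checkable is to flag songs whose scores change sign between runs. The sketch below uses made-up scores, not the notebook's results. The column names `title` and `polarity` match the combined table above, while `crosses_zero` is a name introduced here for illustration.

```python
import pandas as pd

# Illustrative scores only, not the notebook's results
scores = pd.DataFrame({
    'title': ['Song A'] * 3 + ['Song B'] * 3,
    'polarity': [0.6, 0.7, 0.5, -0.2, 0.1, 0.3],
})

# Per-song minimum, maximum and standard deviation across runs
summary = scores.groupby('title')['polarity'].agg(['min', 'max', 'std'])

# A song "crosses zero" if at least one run scored it negative and another positive
summary['crosses_zero'] = (summary['min'] < 0) & (summary['max'] > 0)
print(summary)
```

Applied to the notebook's `df`, the same aggregation would list the songs whose polarity label is least stable, which complements the standard deviation by separating "varies but stays positive" from "flips between positive and negative".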