# **Introduction: Sentiment analysis & GPT-3 API**

##### As part of the NLP field, sentiment analysis is used to determine whether a piece of text, here a single word, is positive, negative or neutral. Using **VADER** (Valence Aware Dictionary and sEntiment Reasoner), an English-language sentiment analysis tool, we study a dataset of randomly generated adjectives (n=1134) obtained from the website www.randomlists.com. Following the sentiment analysis, we will use the GPT-3 pre-trained model and word embeddings to see whether certain categories of people (for example women, men, disabled people, etc.) are more strongly associated with neutral, negative or positive words. The second notebook, with our use of the GPT-3 API, can be accessed here : 

The dataset of adjectives can be downloaded here: https://github.com/Marine-DUPUIS/Decoding-Biases-in-AI---GPT3, under the name "sentiment analysis.csv".


## *I. Sentiment analysis*

In [2]:
from google.colab import drive #mounting google drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# install the VADER package
pip install vaderSentiment

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [155]:
import pandas as pd  # importing libraries
import csv

df = pd.read_csv("/content/drive/MyDrive/sentiment analysis.csv") #import of the dataset (csv file)

In [156]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer #importing VADER, sentiment analysis tool
sentimentAnalyser = SentimentIntensityAnalyzer() 

In [157]:
with open("/content/drive/MyDrive/sentiment analysis.csv", newline='') as f:    # creating a list with all rows of the dataset
    words = []
    lire = csv.reader(f)          # CSV reader over the file
    for ligne in lire:            # each ligne is one row, returned as a list of fields
        words.append(ligne)
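
If only the adjective column is needed, `csv.DictReader` can select it by header name, which also keeps the header row out of the word list. The following is a minimal sketch, assuming the file has an `Adjectives` header preceded by an unnamed index column; `read_adjectives` and the in-memory sample are hypothetical stand-ins for the real CSV.

```python
import csv
import io

# Hypothetical in-memory stand-in for "sentiment analysis.csv";
# the layout (unnamed index column, then 'Adjectives') is an assumption.
sample_csv = ",Adjectives\n0,dizzy\n1,abusive\n2,somber\n"


def read_adjectives(handle):
    """Return the values of the 'Adjectives' column as plain strings."""
    reader = csv.DictReader(handle)  # the first row is consumed as the header
    return [row["Adjectives"] for row in reader]


adjectives = read_adjectives(io.StringIO(sample_csv))
print(adjectives)
```

With the real file, the same function could be called on the handle returned by `open(..., newline='')`, and each item would then be a plain string rather than a list of fields.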




In [158]:
# Create the VADER analyser once, outside the loop
sid = SentimentIntensityAnalyzer()

for element in words:
    # Run VADER on each row (one adjective per row)
    ss = sid.polarity_scores(element)

    # Print the scores for each adjective
    print(f"""'{element}' \n
🙁 Negative Sentiment: {ss['neg']} \n  
😐 Neutral Sentiment: {ss['neu']} \n
😀 Positive Sentiment: {ss['pos']} \n
✨ Compound Sentiment: {ss['compound']} \n
--- \n""")

The output stream was truncated and contains only the last 5000 lines.
😐 Neutral Sentiment: 0.0 

😀 Positive Sentiment: 0.0 

✨ Compound Sentiment: -0.3612 

--- 

'['rare']' 

🙁 Negative Sentiment: 0.0 
  
😐 Neutral Sentiment: 1.0 

😀 Positive Sentiment: 0.0 

✨ Compound Sentiment: 0.0 

--- 

'['inexpensive']' 

🙁 Negative Sentiment: 0.0 
  
😐 Neutral Sentiment: 1.0 

😀 Positive Sentiment: 0.0 

✨ Compound Sentiment: 0.0 

--- 

'['relieved']' 

🙁 Negative Sentiment: 0.0 
  
😐 Neutral Sentiment: 0.0 

😀 Positive Sentiment: 1.0 

✨ Compound Sentiment: 0.3818 

--- 

'['good']' 

🙁 Negative Sentiment: 0.0 
  
😐 Neutral Sentiment: 0.0 

😀 Positive Sentiment: 1.0 

✨ Compound Sentiment: 0.4404 

--- 

'['panoramic']' 

🙁 Negative Sentiment: 0.0 
  
😐 Neutral Sentiment: 1.0 

😀 Positive Sentiment: 0.0 

✨ Compound Sentiment: 0.0 

--- 

'['earsplitting']' 

🙁 Negative Sentiment: 0.0 
  
😐 Neutral Sentiment: 1.0 

😀 Positive Sentiment: 0.0 

✨ Compound Sentiment: 0.0 


In [159]:
df #overview of the dataset

Unnamed: 0,Adjectives
0,dizzy
1,abusive
2,somber
3,guarded
4,materialistic
...,...
1128,petite
1129,fertile
1130,tiresome
1131,grateful


In [160]:
# For each row of the dataset, we run VADER on the adjective
# and print its scores (compound, negative, neutral, positive)
for element in words:
    ss = sentimentAnalyser.polarity_scores(element)
    positive = ss['pos']
    negative = ss['neg']
    neutral = ss['neu']
    total_weight = ss['compound']
    print("Total weight: {0}, Negative: {1}, Neutral: {2}, Positive: {3}".format(total_weight, negative, neutral, positive))

Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: -0.2263, Negative: 1.0, Neutral: 0.0, Positive: 0.0
Total weight: -0.6369, Negative: 1.0, Neutral: 0.0, Positive: 0.0
Total weight: -0.4215, Negative: 1.0, Neutral: 0.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.2023, Negative: 0.0, Neutral: 0.0, Positive: 1.0
Total weight: -0.5267, Negative: 1.0, Neutral: 0.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.0, Negative: 0.0, Neutral: 1.0, Positive: 0.0
Total weight: 0.4588, Negative: 0.0, Neutral: ..., Positive: 1.0

In [162]:
# creation and filling of new columns with the polarity scores calculated by VADER (compound, positive, negative, neutral)
df['scores']=df['Adjectives'].apply(lambda adjective: sid.polarity_scores(str(adjective)))
df['compound']=df['scores'].apply(lambda score_dict:score_dict['compound'])
df['pos']=df['scores'].apply(lambda score_dict:score_dict['pos'])
df['neg']=df['scores'].apply(lambda score_dict:score_dict['neg'])
df['neu']=df['scores'].apply(lambda score_dict:score_dict['neu'])
# creation and filling of the "type" column (POS if it is a positive word, NEG if it is a negative word, NEUTRAL if it is a neutral word)
df['type']=''
df.loc[df.compound>0,'type']='POS'
df.loc[df.compound==0,'type']='NEUTRAL'
df.loc[df.compound<0,'type']='NEG'
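
The labelling rule above can also be written as a small standalone function, which makes the cut-off explicit. This is a sketch rather than the notebook's own code: `label_from_compound` and its `threshold` parameter are hypothetical names. With `threshold=0.0` it reproduces the rule used here; the VADER README suggests a neutral band around ±0.05, which `threshold=0.05` approximates (values exactly on the boundary are treated as neutral in this sketch).

```python
def label_from_compound(compound, threshold=0.0):
    """Map a VADER compound score to 'POS', 'NEG' or 'NEUTRAL'.

    threshold=0.0 matches the notebook rule: any non-zero score is polar.
    A larger threshold widens the neutral band symmetrically around zero.
    """
    if compound > threshold:
        return "POS"
    if compound < -threshold:
        return "NEG"
    return "NEUTRAL"


# Compound scores copied from rows shown in the dataset overview
examples = [("dizzy", -0.2263), ("guarded", 0.0), ("grateful", 0.4588)]
for word, compound in examples:
    print(word, label_from_compound(compound))
```

Assuming the `compound` column already exists, the function could be applied with `df['compound'].apply(label_from_compound)` to fill the `type` column in one step.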

In [163]:
df #overview of the updated dataset

Unnamed: 0,Adjectives,scores,compound,pos,neg,neu,type
0,dizzy,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",-0.2263,0.0,1.0,0.0,NEG
1,abusive,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",-0.6369,0.0,1.0,0.0,NEG
2,somber,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",-0.4215,0.0,1.0,0.0,NEG
3,guarded,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,0.0,0.0,1.0,NEUTRAL
4,materialistic,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,0.0,0.0,1.0,NEUTRAL
...,...,...,...,...,...,...,...
1128,petite,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,0.0,0.0,1.0,NEUTRAL
1129,fertile,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,0.0,0.0,1.0,NEUTRAL
1130,tiresome,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,0.0,0.0,1.0,NEUTRAL
1131,grateful,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.4588,1.0,0.0,0.0,POS


In [152]:
count = df[(df["type"] == "POS")] #selecting the positive words of the dataset
count

Unnamed: 0,Adjectives,scores,type,compound,pos,neg,neu
8,important,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.2023,1.0,0.0,0.0
14,clever,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.4588,1.0,0.0,0.0
19,solid,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.1531,1.0,0.0,0.0
34,intelligent,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.4588,1.0,0.0,0.0
44,romantic,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.4019,1.0,0.0,0.0
...,...,...,...,...,...,...,...
1103,courageous,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.5267,1.0,0.0,0.0
1110,substantial,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.2023,1.0,0.0,0.0
1112,overjoyed,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.5719,1.0,0.0,0.0
1113,fair,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",POS,0.3182,1.0,0.0,0.0


In [153]:
count1 = df[(df["type"] == "NEUTRAL")]  #selecting the neutral words of the dataset
count1

Unnamed: 0,Adjectives,scores,type,compound,pos,neg,neu
3,guarded,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
4,materialistic,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
5,reminiscent,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
6,craven,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
7,spiffy,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...
1127,unusual,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
1128,petite,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
1129,fertile,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0
1130,tiresome,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",NEUTRAL,0.0,0.0,0.0,1.0


In [154]:
count2 = df[(df["type"] == "NEG")]  #selecting the negative words of the dataset
count2

Unnamed: 0,Adjectives,scores,type,compound,pos,neg,neu
0,dizzy,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.2263,0.0,1.0,0.0
1,abusive,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.6369,0.0,1.0,0.0
2,somber,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.4215,0.0,1.0,0.0
9,hurt,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.5267,0.0,1.0,0.0
16,annoying,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.4019,0.0,1.0,0.0
...,...,...,...,...,...,...,...
1096,useless,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.4215,0.0,1.0,0.0
1097,disgusted,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.5267,0.0,1.0,0.0
1104,panicky,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.3612,0.0,1.0,0.0
1111,aggressive,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",NEG,-0.1531,0.0,1.0,0.0
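
Filtering returns the matching rows rather than a number. To obtain the actual count per category, `value_counts()` on the `type` column is one option. The following is a minimal sketch on a hypothetical six-row sample whose labels are copied from rows displayed in this notebook; on the real data the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical sample; words and labels match rows displayed in the notebook
sample = pd.DataFrame({
    "Adjectives": ["dizzy", "abusive", "guarded", "petite", "fertile", "grateful"],
    "type": ["NEG", "NEG", "NEUTRAL", "NEUTRAL", "NEUTRAL", "POS"],
})

# Absolute number of words per label
counts = sample["type"].value_counts()

# Relative frequency of each label (sums to 1)
shares = sample["type"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

The relative frequencies are useful when comparing this word list with another one of a different size.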
