## Analyzing CSA's Lessons Learned
### Step 2: Sentiment Analysis
This notebook takes the translated lessons learned from Step 1 and applies VADER sentiment analysis to them.
The workflow can be adapted to any other spreadsheet or CSV file.

Author/Auteur: N Fee, Canadian Space Agency/Agence spatiale canadienne, 2021-06-18 


In [1]:
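import nltk  # NLP library
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # NLTK implementation of the VADER model

# One-time setup if the NLTK data is not installed yet:
# nltk.download("vader_lexicon"); nltk.download("stopwords")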

Inputs - files and variables

In [2]:
infile = "2_Output/LessonsLearned_step1.xlsx"  # Excel file produced in step 1 (i.e. containing unilingual text columns)
outfile = "2_Output/LessonsLearned_step2.xlsx"  # Excel file written at the end of this step

stopwords_en = nltk.corpus.stopwords.words("english")  # English stop words; loaded here but not applied in the cells below, which pass the unfiltered text to VADER

lesson_colname = 'Lessons EN'  # Column containing the unilingual English lessons learned text

In [3]:
#Functions

sid = SentimentIntensityAnalyzer()  # Create the model once; building it inside the function would reload the lexicon for every row

def sentiment_intensity(row, colname):
    # Apply the model to the cell of interest (as determined by the row and colname)
    return sid.polarity_scores(row[colname])


In [4]:
#Read the Excel file produced in step 1 into a dataframe 
df = pd.read_excel(infile)

In [5]:
# Apply the sentiment analysis function to each row of the dataframe
scores = df.apply(lambda row: sentiment_intensity(row, lesson_colname), axis=1)
# Each result is a dict of four scores. 'neg', 'neu' and 'pos' are the proportions of the text rated negative, neutral and positive (they sum to about 1).
# 'compound' is a single normalized score where -1 is very negative, 0 is neutral, and +1 is very positive.
scores_df = scores.to_frame(name='scores')  # convert the Series to a one-column dataframe named 'scores'
print(scores_df)

                                               scores
0   {'neg': 0.0, 'neu': 0.932, 'pos': 0.068, 'comp...
1   {'neg': 0.013, 'neu': 0.822, 'pos': 0.164, 'co...
2   {'neg': 0.0, 'neu': 0.95, 'pos': 0.05, 'compou...
3   {'neg': 0.0, 'neu': 0.86, 'pos': 0.14, 'compou...
4   {'neg': 0.0, 'neu': 0.922, 'pos': 0.078, 'comp...
5   {'neg': 0.0, 'neu': 0.858, 'pos': 0.142, 'comp...
6   {'neg': 0.032, 'neu': 0.698, 'pos': 0.27, 'com...
7   {'neg': 0.0, 'neu': 0.82, 'pos': 0.18, 'compou...
8   {'neg': 0.0, 'neu': 0.977, 'pos': 0.023, 'comp...
9   {'neg': 0.04, 'neu': 0.734, 'pos': 0.226, 'com...
10  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
11  {'neg': 0.0, 'neu': 0.875, 'pos': 0.125, 'comp...
12  {'neg': 0.0, 'neu': 0.885, 'pos': 0.115, 'comp...
13  {'neg': 0.0, 'neu': 0.923, 'pos': 0.077, 'comp...
14  {'neg': 0.0, 'neu': 0.914, 'pos': 0.086, 'comp...
15  {'neg': 0.0, 'neu': 0.977, 'pos': 0.023, 'comp...
16  {'neg': 0.0, 'neu': 0.966, 'pos': 0.034, 'comp...
17  {'neg': 0.031, 'neu': 0.
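
The compound score stays between -1 and +1 because VADER normalizes the summed word valences of a text. The sketch below illustrates that step, assuming the `x / sqrt(x^2 + alpha)` formula and default `alpha = 15` of the VADER reference implementation. The function name `normalize_compound` and the input values are illustrative only and are not part of this notebook.

```python
import math

def normalize_compound(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into the range [-1, 1].

    Assumption: mirrors the x / sqrt(x^2 + alpha) normalization of the
    VADER reference implementation, with its default alpha of 15.
    """
    norm = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp to guard against floating-point overshoot
    return max(-1.0, min(1.0, norm))

# Hypothetical valence sums, chosen for illustration only
for total in (0.0, 1.5, 4.0, -4.0, 40.0):
    print(total, round(normalize_compound(total), 4))
```

Larger valence sums move the result closer to the bounds without ever passing them, which is why long, consistently positive texts tend to score near +1.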

In [6]:
# Split the score dicts into their own columns in a new dataframe for ease of analysis
scores_df = pd.DataFrame(scores_df['scores'].tolist(), index=scores_df.index)
print(scores_df)

      neg    neu    pos  compound
0   0.000  0.932  0.068    0.6369
1   0.013  0.822  0.164    0.9184
2   0.000  0.950  0.050    0.5095
3   0.000  0.860  0.140    0.8225
4   0.000  0.922  0.078    0.6705
5   0.000  0.858  0.142    0.7003
6   0.032  0.698  0.270    0.9072
7   0.000  0.820  0.180    0.8934
8   0.000  0.977  0.023    0.3818
9   0.040  0.734  0.226    0.8934
10  0.000  1.000  0.000    0.0000
11  0.000  0.875  0.125    0.8126
12  0.000  0.885  0.115    0.9245
13  0.000  0.923  0.077    0.5994
14  0.000  0.914  0.086    0.7003
15  0.000  0.977  0.023    0.0772
16  0.000  0.966  0.034    0.2263
17  0.031  0.758  0.211    0.7783
18  0.000  0.815  0.185    0.7717
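
To read the compound column as labels, a common convention from the VADER documentation treats scores of 0.05 or above as positive, -0.05 or below as negative, and anything in between as neutral. The sketch below assumes those cut-offs suit this dataset. The function name `label_compound` is illustrative, and the sample values are taken from the table above.

```python
def label_compound(compound, threshold=0.05):
    """Map a compound score to a coarse sentiment label.

    Assumption: uses the conventional +/-0.05 cut-offs; adjust
    `threshold` if a different sensitivity is needed.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Row index -> compound score, copied from the printed table
samples = {0: 0.6369, 10: 0.0000, 15: 0.0772}
labels = {row: label_compound(score) for row, score in samples.items()}
print(labels)
```

With pandas, the same function could be applied to the whole column, for example with `scores_df['compound'].apply(label_compound)`.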


In [7]:
# Add the scores dataframe to the existing lessons learned dataframe
df1 = pd.concat([df, scores_df], axis=1)  # rows are matched on their index labels
print(df1)

    Unnamed: 0                                     Lesson Learned Language  \
0            0  I am honoured to present the State of the Cana...       en   
1            1  Les efforts de soutien à l’innovation et de co...       fr   
2            2  There are several differing programs and servi...       en   
3            3  Encourager le lancement de nouvelles entrepris...       fr   
4            4  Research and development (R&D) expenditures to...       en   
5            5  The plan proposes a commitment to fund the ini...       en   
6            6  Le Réseau d’innovation spatial canadien (RISC)...       fr   
7            7  To better reflect the current best practices a...       en   
8            8  Le plan opérationnel décrit les détails sur le...       fr   
9            9  Le gouvernement du Canada appuie depuis longte...       fr   
10          10  In order to measure the changes taking place i...       en   
11          11  Parallèlement aux efforts concertés avec l’OCD..
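
`pd.concat(..., axis=1)` matches rows by index label, not by position. In this notebook both dataframes come from `df`, so their indexes agree. The sketch below uses made-up data (the values and the `kept` filter are illustrative) to show what happens if rows are filtered out before scoring, and how `reset_index(drop=True)` restores positional alignment.

```python
import pandas as pd

lessons = pd.DataFrame({"Lessons EN": ["first", "second", "third"]})
kept = lessons[lessons["Lessons EN"] != "second"]  # index labels 0 and 2
scores = pd.DataFrame({"compound": [0.5, -0.2]})   # index labels 0 and 1

# Labels differ, so concat pads the non-matching rows with NaN
misaligned = pd.concat([kept, scores], axis=1)

# Resetting the index makes both frames use labels 0 and 1
aligned = pd.concat([kept.reset_index(drop=True), scores], axis=1)

print(misaligned)
print(aligned)
```

If the combined dataframe ever has more rows than the input, or unexpected empty cells, a mismatch like this is a likely cause.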

In [8]:
# Give the score columns more descriptive names
df1 = df1.rename(columns={'neg': 'Negative Sentiment', 'neu': 'Neutral Sentiment', 'pos': 'Positive Sentiment', 'compound': 'Compound Sentiment Score'})
df1.to_excel(outfile, index=False)  # write the results to a new file; the unused encoding argument is no longer accepted by recent pandas versions