# Sentiment Analysis

## Reading Datafile

In [1]:
import pandas as pd  # pandas for tabular data handling
df = pd.read_csv('CourseraDataset-Clean.csv')  # read the CSV file into a DataFrame
df.sample(10)  # display a random sample of 10 rows

Unnamed: 0,Course Title,Rating,Level,Schedule,What you will learn,Skill gain,Modules,Instructor,Offered By,Keyword,Course Url,Duration to complete (Approx.),Number of Review
5291,Differential Equations Part I Basic Theory,4.7,Beginner level,Flexible schedule,Not specified,Not specified,"Introduction, First Order Differential Equatio...","Kwon, Kil Hyun, Kil Hyun Kwon",Korea Advanced Institute of Science and Techno...,Math and Logic,https://www.coursera.org/learn/ordinary-differ...,14.0,1341
1188,Marketing Digital,4.8,Beginner level,Flexible schedule,Not specified,"Google Analytics, Marketing Performance Measur...","O que é marketing digital?, Tipos de canais e ...","Artur Vilas Boas, André Leme Fleury",Universidade de São Paulo,Business,https://www.coursera.org/learn/estrategia-mark...,16.0,2149
1935,Scalable Microservices for Developers Speciali...,0.0,Intermediate level,Flexible schedule,Not specified,"Spring Cloud, Spring Framework, Microservices,...","Building HTTP APIs with Spring, Microservice A...","Dr. Douglas C. Schmidt, Dr. Jules White",Vanderbilt University,Computer Science,https://www.coursera.org/specializations/micro...,120.0,0
5196,Finanzas corporativas Specialization,4.6,Intermediate level,Flexible schedule,Not specified,"Mathematical Finance, Corporate Finance, Micro...",Administración financiera y su función en la e...,"Carlos Martínez de la Fuente, Norman Wolf del ...",Universidad Nacional Autónoma de México,Math and Logic,https://www.coursera.org/specializations/finan...,80.0,1820
3318,Data Analysis with R,4.7,Intermediate level,Flexible schedule,Prepare data for analysis by handling missing ...,"Data Science, Data Analysis, Statistical Analy...","Introduction to Data Analysis with R, Data Wra...","Yiwen Li, Gabriela de Queiroz, Tiffany Zhu",IBM,DataScience,https://www.coursera.org/learn/data-analysis-w...,16.0,254
7857,Powerful Tools for Teaching and Learning: Digi...,4.5,Beginner level,Flexible schedule,Not specified,Not specified,WEEK 1: Choosing a Topic and Defining your Pu...,"Bernard R Robin, Sara G. McNeil",University of Houston,Social Sciences,https://www.coursera.org/learn/digital-storyte...,13.0,235
3199,Classical Sociological Theory,4.9,Not specified,Flexible schedule,Not specified,Not specified,Session 1: Classical Sociological Theory - An ...,"Danny de Vries, Bart van Heerikhuizen",University of Amsterdam,DataScience,https://www.coursera.org/learn/classical-socio...,13.0,2773
6895,"Pressure, Force, Motion, and Humidity Sensors",4.7,Intermediate level,Flexible schedule,"Choose the right pressure, force, strain, posi...","PSoc Programming, Analog hardware, Sensor arch...","Pressure Sensors, Force and Strain Sensors and...","James Zweighaft, Jay Mendelson",University of Colorado Boulder,Physical Science and Engineering,https://www.coursera.org/learn/pressure-force-...,23.0,224
6637,Semiconductor Packaging Specialization,4.6,Beginner level,Flexible schedule,Examine fundamental concepts and terms used in...,"Quality control in semiconductor packaging, Ma...","Introduction to Semiconductor Packaging, Semic...",Terry Alford,Arizona State University,Physical Science and Engineering,https://www.coursera.org/specializations/semic...,40.0,30
6291,Leading Through Effective Communication,0.0,Beginner level,Flexible schedule,Not specified,"Sending a message with relatability, Building ...","Start Here, Sending a message with relatabilit...",CareerCatalyst,Arizona State University,Personal Development,https://www.coursera.org/learn/leadingthroughe...,8.0,0


## Exploring Opinion Lexicon in NLTK Library

In [2]:
from sklearn import preprocessing  # scikit-learn preprocessing module
import nltk  # Natural Language Toolkit
nltk.download('opinion_lexicon')  # download the opinion lexicon corpus
from nltk.corpus import opinion_lexicon  # corpus of positive and negative opinion words
from nltk.tokenize import word_tokenize  # word-level tokenizer

# Print the total number of words in the opinion lexicon
print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
print('Examples of positive words in opinion lexicon',
      opinion_lexicon.positive()[:10])  # first 10 positive words
print('Examples of negative words in opinion lexicon',
      opinion_lexicon.negative()[:10])  # first 10 negative words

Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     C:\Users\Alex\AppData\Roaming\nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


## Creation of Dictionary for Sentiment Analysis

In [4]:
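# Build a dictionary used to score the text in the 'Modules' column

# Download the punkt tokenizer models used by word_tokenize
# (newer NLTK releases may require 'punkt_tab' instead)
nltk.download('punkt')

# The dataset loaded above already has a 'Modules' column and no 'reviewText'
# column, so this rename leaves the DataFrame unchanged
df.rename(columns={"reviewText": "Modules"}, inplace=True)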

# Assigning positive and negative scores
pos_score = 1
neg_score = -1

# Initializing an empty dictionary
word_dict = {}
 
# Adding the positive words to the dictionary
for word in opinion_lexicon.positive():
    word_dict[word] = pos_score

# Adding the negative words to the dictionary
for word in opinion_lexicon.negative():
    word_dict[word] = neg_score

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Alex\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
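
The two loops above can also be written as a dictionary comprehension followed by an update. The sketch below uses a small hypothetical word list in place of `opinion_lexicon.positive()` and `opinion_lexicon.negative()`, so the names `positive_words`, `negative_words`, and `toy_word_dict` are illustrative assumptions and not part of the notebook.

```python
# Hypothetical mini-lexicon standing in for opinion_lexicon.positive() / .negative()
positive_words = ["good", "great", "excellent"]
negative_words = ["bad", "poor", "terrible"]

POS_SCORE = 1
NEG_SCORE = -1

# Build the word -> score mapping: positives first, then negatives
toy_word_dict = {word: POS_SCORE for word in positive_words}
toy_word_dict.update({word: NEG_SCORE for word in negative_words})

print(toy_word_dict["great"])         # 1: score of a positive word
print(toy_word_dict.get("table", 0))  # 0: a word outside the lexicon contributes nothing
```

Because the negative words are written last, any word that appeared in both lists would end up with the negative score, which matches the behaviour of the loops in the cell above.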


## Calculating Sentiment Score with Bing Liu Lexicon

In [5]:
# Define a function that scores a text with the Bing Liu lexicon
def bing_liu_score(Modules):
    # Initialize the sentiment score
    sentiment_score = 0
    # Lowercase the input text and tokenize it into words
    bag_of_words = word_tokenize(Modules.lower())
    # Loop over each word in the bag of words
    for word in bag_of_words:
        # Check whether the word exists in the sentiment dictionary
        if word in word_dict:
            # If the word exists, add its score to the running total
            sentiment_score += word_dict[word]
    return sentiment_score  # Return the sentiment score for the text
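
To see how the score accumulates, here is a minimal, dependency-free sketch of the same idea. It assumes a hypothetical four-word lexicon (`toy_lexicon`) and swaps in a simple regular-expression tokenizer for `word_tokenize`, so it illustrates the scoring logic and does not reproduce the notebook's exact tokenization.

```python
import re

# Hypothetical mini-lexicon; the notebook's word_dict holds the full lexicon
toy_lexicon = {"good": 1, "great": 1, "bad": -1, "boring": -1}

def toy_score(text, lexicon):
    """Sum the lexicon scores of all lowercase word tokens in text."""
    # Assumption: a regex tokenizer is used here in place of word_tokenize
    tokens = re.findall(r"[a-z']+", text.lower())
    # Tokens missing from the lexicon contribute 0
    return sum(lexicon.get(token, 0) for token in tokens)

# One positive word and two negative words: 1 - 1 - 1 = -1
print(toy_score("Great content but boring and bad delivery", toy_lexicon))
# No lexicon words at all: 0
print(toy_score("Week 1: Introduction", toy_lexicon))
```

Each occurrence of a word counts separately, so longer texts with repeated opinion words can reach larger absolute scores than short ones.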

In [6]:
# Fill NaN values in the 'Modules' column with a placeholder string
df['Modules'] = df['Modules'].fillna('no review')
# Apply bing_liu_score to each 'Modules' entry and store the result in a new 'Bing_Liu_Score' column
df['Bing_Liu_Score'] = df['Modules'].apply(bing_liu_score)

In [10]:
# Display the first 5 rows of the DataFrame for a few selected columns
df[['Schedule', 'Rating', 'Modules', 'Bing_Liu_Score']].head(5)

Unnamed: 0,Schedule,Rating,Modules,Bing_Liu_Score
0,Flexible schedule,4.8,"Introduction, Heroes, Silhouettes, Coutures, L...",1
1,Flexible schedule,4.4,"Orientation, Module 1, Module 2, Module 3, Mod...",0
2,Flexible schedule,4.5,"Week 1: Introduction to Pixel Art, Week 2: Pix...",0
3,Flexible schedule,0.0,"Semana 1, Semana 2, Semana 3, Semana 4",0
4,Flexible schedule,4.8,"Blues Progressions – Theory and Practice , Blu...",0


## Calculating Mean Sentiment Score

In [11]:
# Mean Bing Liu score for each schedule type
df.groupby('Schedule').agg({'Bing_Liu_Score':'mean'})

Schedule,Bing_Liu_Score
Flexible schedule,0.233474
Hands-on learning,0.0
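
As a rough illustration of what the grouped mean computes, the sketch below reproduces the aggregation with the standard library only. The `rows` values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (schedule, score) pairs, not taken from the dataset
rows = [
    ("Flexible schedule", 1),
    ("Flexible schedule", 0),
    ("Flexible schedule", 2),
    ("Hands-on learning", 0),
]

# Collect the scores that belong to each schedule type
scores_by_schedule = defaultdict(list)
for schedule, score in rows:
    scores_by_schedule[schedule].append(score)

# Average the scores within each group
mean_scores = {
    schedule: mean(scores) for schedule, scores in scores_by_schedule.items()
}
print(mean_scores)
```

A group mean says nothing about how many rows fed into it. In pandas, passing `['mean', 'count']` to `agg` on the `Bing_Liu_Score` column would report the group size alongside the mean.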
