# Sentiment Analysis

In [1]:
# Read the extracted key lines, join them into one text and split it into sentences
import pandas as pd
import re
tagset = pd.read_csv("Output/df_keyline_prison.csv")
text = ' '.join(str(x) for x in tagset['Line'])
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)
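
The splitting regex treats any `.`, `?` or `!` (plus trailing closing quotes or brackets) as a sentence boundary. A quick check on a made-up string shows two side effects worth knowing: abbreviations such as "Dr." are split, and a trailing full stop leaves an empty string at the end of the list, which the sentiment models will then score as neutral. The `split_sentences` helper is hypothetical and not part of the notebook; it only shows one way to drop the empty pieces.

```python
import re

# Same pattern as the notebook: a '.', '?' or '!' plus any closing quotes or
# brackets, with surrounding spaces, counts as a sentence boundary.
SENTENCE_SPLIT = r' *[\.\?!][\'"\)\]]* *'

def split_sentences(text):
    """Hypothetical helper: split on the pattern and drop empty/blank pieces."""
    return [part for part in re.split(SENTENCE_SPLIT, text) if part.strip()]

sample = 'Hello world. Is it "ok?" Yes! Dr. Smith agreed.'
print(re.split(SENTENCE_SPLIT, sample))
# ['Hello world', 'Is it "ok', 'Yes', 'Dr', 'Smith agreed', '']
print(split_sentences(sample))
# ['Hello world', 'Is it "ok', 'Yes', 'Dr', 'Smith agreed']
```

Note that the opening quote in `"ok` survives, because the pattern only consumes quotes that follow the punctuation mark.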

# VADER

In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sa = SentimentIntensityAnalyzer()
sent_output = []
for doc in sentences:
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
    scores = sa.polarity_scores(doc)
    sent_output.append([scores['compound'], scores['neg'], scores['neu'], scores['pos'], doc])

df = pd.DataFrame(sent_output, columns=['Score Compound', 'Score Negative',
                                        'Score Neutral', 'Score Positive', 'Sentence'])
# Strip punctuation for readability; regex=True is required on pandas >= 2.0,
# where str.replace treats the pattern as a literal string by default
df['Sentence'] = df['Sentence'].str.replace(r'[^\w\s]+', '', regex=True)
df.to_csv("SA Model/df_Vader.csv")
df.head(5)

| | Score Compound | Score Negative | Score Neutral | Score Positive | Sentence |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | jasmine and stars islamic civilization mus... |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | ernst and bruce b |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 | lawrence editors j fat e m e h k es h ava r... |
| 3 | 0.0 | 0.0 | 1.0 | 0.0 | indd 3 chapel hill 83006 110211 am 2007 t... |
| 4 | 0.7184 | 0.0 | 0.75 | 0.25 | my good friends minoo riahysharifan orange cou... |
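
The `Score Compound` column is not an average of the other three. As far as we can tell from the `vaderSentiment` source, the valences of the lexicon words found in a sentence (after VADER's rule-based adjustments for negation, intensifiers, capitalisation and punctuation) are summed, and that sum is squashed into [-1, 1] by a `normalize` function computing x / sqrt(x² + α) with α = 15. The sketch below reimplements only that final step, under that assumption, to show how quickly the compound score saturates; `normalize_compound` is our own name.

```python
import math

def normalize_compound(score_sum, alpha=15.0):
    """Map an unbounded sum of word valences into [-1, 1] via x / sqrt(x^2 + alpha)."""
    norm = score_sum / math.sqrt(score_sum * score_sum + alpha)
    # Clamp defensively; for alpha > 0 the value is already strictly inside (-1, 1)
    return max(-1.0, min(1.0, norm))

for s in [0.0, 1.9, 4.0, -4.0, 10.0, 30.0]:
    print(f"sum={s:>5}: compound={round(normalize_compound(s), 4)}")
```

Because the curve flattens toward ±1, sentences containing many positive (or negative) words all end up close to the extremes, which is worth keeping in mind when reading the top-10 lists.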


In [3]:
# list top 10 positive sentences
top_10_pos = df.nlargest(10, 'Score Compound')
top_10_pos.to_csv('SA Model/Top_Pos_Vader.csv', index=False)
# list top 10 negative sentences
top_10_neg = df.nsmallest(10, 'Score Compound')
top_10_neg.to_csv('SA Model/Top_Neg_Vader.csv', index=False)
print("The top 10 positive sentences are:", top_10_pos)
print("The top 10 negative sentences are:", top_10_neg)

The top 10 positive sentences are:        Score Compound  Score Negative  Score Neutral  Score Positive  \
12417          0.9903           0.000          0.503           0.497   
13393          0.9850           0.000          0.626           0.374   
15834          0.9847           0.000          0.716           0.284   
14400          0.9795           0.037          0.615           0.348   
14431          0.9765           0.000          0.590           0.410   
21225          0.9719           0.000          0.500           0.500   
6959           0.9691           0.008          0.816           0.175   
14141          0.9689           0.000          0.772           0.228   
5574           0.9676           0.113          0.500           0.387   
6319           0.9666           0.049          0.634           0.317   

                                                Sentence  
12417  but i have grown wiser and more appreciative n...  
13393  many years ago my mother asked me if i knew wh.

# TextBlob

[Documentation Source](https://stackabuse.com/sentiment-analysis-in-python-with-textblob/)

TextBlob's polarity score is a float in the range [-1.0, 1.0], where -1.0 is strongly negative and 1.0 is strongly positive. A score of 0 indicates a neutral statement, for example one that contains none of the words in TextBlob's sentiment lexicon.

The subjectivity score is a float in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
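
To make the two ranges concrete, here is a toy lexicon scorer in the spirit of lexicon-based analyzers such as TextBlob's default one, but much simplified (TextBlob also handles things like intensifiers and negation). Each known word carries a polarity in [-1, 1] and a subjectivity in [0, 1], and a sentence's scores are the averages over the words found in the lexicon. The `TOY_LEXICON` entries and the `toy_sentiment` function are invented for illustration.

```python
# Hypothetical lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1])
TOY_LEXICON = {
    "good": (0.7, 0.6),
    "best": (1.0, 0.3),
    "bad": (-0.7, 0.67),
    "terrible": (-1.0, 1.0),
}

def toy_sentiment(sentence):
    """Return (polarity, subjectivity) averaged over the words found in TOY_LEXICON."""
    hits = [TOY_LEXICON[w] for w in sentence.lower().split() if w in TOY_LEXICON]
    if not hits:
        # No lexicon words: neutral polarity and fully objective
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

print(toy_sentiment("the best day"))                # (1.0, 0.3)
print(toy_sentiment("good food but bad service"))
print(toy_sentiment("the meeting starts at noon"))  # (0.0, 0.0)
```

The mixed example shows that a polarity of 0 does not always mean that no sentiment words were found: opposing words can cancel out, while subjectivity stays above 0.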

In [4]:
from textblob import TextBlob

sent_output = []
for doc in sentences:
    # Build the blob once and read both sentiment attributes from it
    blob = TextBlob(doc)
    sent_output.append([blob.polarity, blob.subjectivity, doc])

df = pd.DataFrame(sent_output, columns=['Polarity', 'Subjectivity', 'Sentence'])
# Strip punctuation for readability; regex=True is required on pandas >= 2.0
df['Sentence'] = df['Sentence'].str.replace(r'[^\w\s]+', '', regex=True)
df.to_csv("SA Model/df_TextBlob.csv")
df.head(5)

| | Polarity | Subjectivity | Sentence |
|---|---|---|---|
| 0 | 0.0 | 0.0 | jasmine and stars islamic civilization mus... |
| 1 | 0.0 | 0.0 | ernst and bruce b |
| 2 | 0.5 | 0.5 | lawrence editors j fat e m e h k es h ava r... |
| 3 | 0.0 | 0.0 | indd 3 chapel hill 83006 110211 am 2007 t... |
| 4 | 0.418182 | 0.527273 | my good friends minoo riahysharifan orange cou... |


In [6]:
# list top 10 positive sentences
top_10_pos = df.nlargest(10, 'Polarity')
top_10_pos.to_csv('SA Model/Top_Pos_Textblob.csv', index=False)
# list top 10 negative sentences
top_10_neg = df.nsmallest(10, 'Polarity')
top_10_neg.to_csv('SA Model/Top_Neg_Textblob.csv', index=False)
print("The top 10 positive sentences are:", top_10_pos)
print("The top 10 negative sentences are:", top_10_neg)

The top 10 positive sentences are:       Polarity  Subjectivity  \
279        1.0           1.0   
798        1.0           1.0   
2297       1.0           0.3   
2886       1.0           0.3   
3331       1.0           0.3   
4028       1.0           1.0   
6276       1.0           0.3   
6355       1.0           1.0   
6626       1.0           1.0   
7342       1.0           1.0   

                                               Sentence  
279   she has always seemed perfect serving others h...  
798   he read them with authority as if they were hi...  
2297  i gave the best speech of my life there in tha...  
2886  maryam lived in one of the best an octagonal b...  
3331  60 farzanehs exceedingly unbiased viewpoint ma...  
4028  walid had listened to their arguments as a tee...  
6276  after school we were taken minas house where l...  
6355  being i  ate  to hear her talk it would seem m...  
6626  the prosecu  of my father with his greatest de...  
7342   azarmis voice could be he
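
Every row in the TextBlob top-10 has a polarity of 1.0, so `nlargest` is really choosing among ties. With its default `keep='first'`, tied rows are taken in their original order, which is most likely why the indices above are ascending. `DataFrame.nlargest` also accepts a list of columns, so ties can be broken by a second score. The sketch below does this on a small made-up frame; using subjectivity as the tie-breaker is our assumption, not something the notebook does.

```python
import pandas as pd

# Made-up scores: three sentences tie on polarity 1.0
df_demo = pd.DataFrame({
    "Polarity": [1.0, 0.8, 1.0, 1.0, -0.5],
    "Subjectivity": [0.3, 0.9, 1.0, 0.6, 0.4],
    "Sentence": ["a", "b", "c", "d", "e"],
})

# Default: ties on Polarity are resolved by original row order (keep='first')
print(df_demo.nlargest(3, "Polarity"))

# Break ties with a second column: among equal Polarity, highest Subjectivity first
top = df_demo.nlargest(3, ["Polarity", "Subjectivity"])
print(top)
```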

# Transformer

[Documentation Source](https://huggingface.co/transformers/quicktour.html)

In [8]:
!pip install transformers


In [7]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier('We are very happy to show you the 🤗 Transformers library.')


In [None]:
results = classifier(["We are very happy to show you the 🤗 Transformers library.",
                      "We hope you don't hate it."])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
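
The pipeline returns a list of dictionaries with a `label` and a `score` (the model's confidence in that label), not a signed polarity. To line its results up against the VADER compound score or TextBlob polarity, one option, assumed here rather than taken from the Transformers documentation, is to flip the sign for `NEGATIVE` labels. The hypothetical helper below works on plain dictionaries shaped like the output above, so it runs without the library installed.

```python
def to_signed_score(result):
    """Hypothetical helper: map {'label': ..., 'score': ...} to a value in [-1, 1]."""
    label = result["label"].upper()
    if label == "POSITIVE":
        return result["score"]
    if label == "NEGATIVE":
        return -result["score"]
    raise ValueError(f"unexpected label: {result['label']}")

# Dictionaries shaped like the pipeline output shown above
results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.5309},
]
print([to_signed_score(r) for r in results])  # [0.9998, -0.5309]
```

Keep in mind that the score is a confidence: a NEGATIVE label at 0.53 means the model is close to undecided, which is not the same thing as a moderately negative VADER compound of -0.53.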