https://www.nltk.org/howto/sentiment.html

In [59]:
import pandas as pd
reviews_df = pd.read_csv("data/BA_reviews.csv")

# data and structure overview
# reviews_df.head()
# reviews_df.shape

# drop empty reviews
reviews_df = reviews_df.dropna(subset = ['reviews'])

# reviews_df.iloc[0,1]

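# Let's only include verified reviews; .copy() makes verified_df an independent DataFrame,
# so adding columns to it later doesn't trigger SettingWithCopyWarning
verified_df = reviews_df[reviews_df['reviews'].str.contains("Trip Verified")].copy()
# verified_df.shape

# strip the "Trip Verified" tag from the start of each review (assumes it takes up the first 18 characters)
verified_df['reviews'] = verified_df['reviews'].str[18:]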
# verified_df.head()
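The fixed slice `[18:]` only works if the tag is always exactly 18 characters long. A minimal sketch of a pattern-based alternative, assuming every verified review starts with a tag ending in `Trip Verified |`; the toy reviews below are hypothetical, not rows from `data/BA_reviews.csv`:

```python
import pandas as pd

# Hypothetical reviews in the assumed "<tag> | <text>" shape
reviews = pd.Series([
    "✅ Trip Verified |  Flight left on time.",
    "Not Verified | Seats were cramped.",
    "✅ Trip Verified | Friendly crew.",
])

# keep verified reviews only (plain substring match, no regex)
verified = reviews[reviews.str.contains("Trip Verified", regex=False)]

# remove everything up to and including "Trip Verified |", plus the whitespace after it
cleaned = verified.str.replace(r"^.*?Trip Verified\s*\|\s*", "", regex=True)

print(cleaned.tolist())
```

Unlike the slice, the pattern also drops the whitespace after the tag, so cleaned reviews don't start with a space.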

In [64]:
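# Idea: tokenize the reviews and mark negated words (e.g. "not nice") so that negative sentiment is more obvious to the NLP libraries
import nltk
from nltk.tokenize import word_tokenize  # optional: VADER's polarity_scores() works on the raw review string, so pre-tokenizing isn't required
from nltk.sentiment import SentimentIntensityAnalyzer
from tqdm import tqdm
# nltk.download('punkt')          # tokenizer models used by word_tokenize
# nltk.download('vader_lexicon')  # lexicon that SentimentIntensityAnalyzer needs

sia = SentimentIntensityAnalyzer()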
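The negation idea matches `mark_negation` from `nltk.sentiment.util`, which the NLTK sentiment HOWTO linked at the top also demonstrates: it appends `_NEG` to every token after a negation word until the next clause punctuation. A minimal sketch on a hypothetical review; it splits on whitespace instead of calling `word_tokenize`, so no `punkt` download is needed.

```python
from nltk.sentiment.util import mark_negation

# Hypothetical review, with punctuation already separated by spaces
tokens = "The seat was not comfortable . The crew was friendly .".split()

# mark_negation returns a tagged copy; the input list is left unchanged
tagged = mark_negation(tokens)
print(tagged)
```

One caution: VADER already applies its own negation rules, and a tagged token such as `comfortable_NEG` would most likely not be found in its lexicon, so this kind of tagging is better suited to building features for a trained classifier than to preprocessing text for VADER.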

In [65]:
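# VADER's lexicon seemed to be missing some words, so I tried adding them.
# It didn't help much, likely in part because VADER looks up one lowercased word at a time,
# so multi-word keys ('had no choice', 'poor product') and uppercase keys ('WORST') never match.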
# new_words = {
#     'had no choice' : -1.0,
#     'poor product' : -2.0,
#     'WORST' : -3.5,
#     'pleasant' : 0.75,
#     'pleasantly' : 0.75,
#     'cramped' : -2.0,
#     'nothing' : -0.1,
#     'delicious' : 3.0,
#     'not nice' : -2.0,
#     'very nice' : 2.0,
# }

# sia.lexicon.update(new_words)
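To see why several of the custom entries are dead weight, here is a deliberately simplified word-by-word lookup. It is an illustration, not VADER's algorithm (VADER also applies boosters, negation and capitalization rules), but it shares the relevant trait of NLTK's implementation: each word is lowercased and looked up on its own. The `matched_words` helper and the mini-lexicon are hypothetical.

```python
# Hypothetical mini-lexicon, mirroring a few of the commented-out new_words entries
lexicon = {"cramped": -2.0, "delicious": 3.0, "poor product": -2.0, "WORST": -3.5}

def matched_words(text: str, lexicon: dict) -> list:
    """Return (word, valence) pairs found by a word-by-word, lowercased lookup."""
    hits = []
    for word in text.split():
        key = word.lower().strip(".,!?:")
        if key in lexicon:
            hits.append((key, lexicon[key]))
    return hits

print(matched_words("WORST flight ever: poor product, cramped seats.", lexicon))
```

Only `'cramped'` is found: `'poor product'` spans two words and `'WORST'` is never the lowercased form of anything, so a lookup like this can only ever reach keys written as single lowercase words.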

In [66]:
#### Using NLTK's VADER SIA
# the positive scores have very high magnitudes here (close to +1)
sentiments = []

for review in tqdm(verified_df['reviews'], desc = "looking through reviews"):
    # 'compound' is VADER's overall score for the review, normalized to [-1, 1]
    sentiment_score = sia.polarity_scores(review)['compound']
    sentiments.append(sentiment_score)

verified_df['sentiments'] = sentiments

looking through reviews: 100%|████████████| 1174/1174 [00:00<00:00, 1749.48it/s]
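The very large positive scores are consistent with how VADER builds `compound`: in NLTK's implementation the valences of all matched words are summed, and the total x is squashed with x / sqrt(x*x + alpha), where alpha = 15, then clipped to [-1, 1]. A long review with many positive words therefore ends up close to +1. The sketch below re-implements only that normalization step; the raw sums are illustrative values, not scores from this dataset.

```python
import math

def vader_style_normalize(score: float, alpha: float = 15.0) -> float:
    """Squash a raw summed valence into [-1, 1] like VADER's compound score."""
    norm = score / math.sqrt(score * score + alpha)
    # clip to [-1, 1], as NLTK's normalize() does
    return max(-1.0, min(1.0, norm))

# illustrative raw sums: the more positive words a review has, the closer it gets to +1
for raw in [1.0, 2.0, 5.0, 10.0, 20.0]:
    print(raw, round(vader_style_normalize(raw), 3))
```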


In [67]:
#### Using textblob
# polarity is in [-1, 1]; the scores sit closer to 0 than VADER's compound scores
from textblob import TextBlob

sentiments = []

for review in tqdm(verified_df['reviews'], desc = "looking through reviews"):
    sentiment_score = TextBlob(review).sentiment.polarity
    sentiments.append(sentiment_score)

verified_df['sentiments'] = sentiments

looking through reviews: 100%|████████████| 1174/1174 [00:00<00:00, 3081.85it/s]
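Because VADER's compound scores and TextBlob's polarity spread differently over [-1, 1], comparing raw values across libraries is misleading. One option is to map each score to a coarse label first. The ±0.05 cutoff is the one the VADER authors suggest for `compound`; applying it to TextBlob polarity is an assumption. The sketch also assumes each library's scores are kept in their own list rather than overwriting `verified_df['sentiments']`, and the example scores are hypothetical.

```python
def to_label(score: float, threshold: float = 0.05) -> str:
    """Map a score in [-1, 1] to positive / neutral / negative."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# hypothetical scores for the same three reviews from two libraries
vader_scores = [0.96, -0.42, 0.02]
textblob_scores = [0.21, -0.08, 0.04]

vader_labels = [to_label(s) for s in vader_scores]
textblob_labels = [to_label(s) for s in textblob_scores]

# fraction of reviews on which the two libraries agree
agreement = sum(v == t for v, t in zip(vader_labels, textblob_labels)) / len(vader_labels)
print(vader_labels, textblob_labels, agreement)
```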


In [77]:
#### Using spacy
# significantly slower, and every score comes back as 0: the en_core_web pipelines have no
# sentiment component, so doc.sentiment just keeps its default value of 0.0
import spacy
# !python -m spacy download en_core_web_lg
# !python -m spacy download en_core_web_trf

nlp = spacy.load("en_core_web_lg")

sentiments = []

for review in tqdm(verified_df['reviews'], desc = "looking through reviews"):
    doc = nlp(review)
    sentiments.append(doc.sentiment)

verified_df['sentiments'] = sentiments

looking through reviews: 100%|██████████████| 1174/1174 [00:30<00:00, 38.10it/s]
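Getting a non-zero `doc.sentiment` out of spaCy means adding a pipeline component that sets it. A minimal sketch, assuming spaCy v3's `@Language.component` API and a hypothetical four-word lexicon (`TOY_LEXICON` and `toy_lexicon_sentiment` are illustrative names, not part of spaCy). It runs on `spacy.blank("en")`, so no model download is needed; in practice the scores would come from a real model or lexicon.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical toy lexicon (not part of spaCy)
TOY_LEXICON = {"friendly": 1.0, "good": 1.0, "cramped": -1.0, "rude": -1.0}

@Language.component("toy_lexicon_sentiment")
def toy_lexicon_sentiment(doc: Doc) -> Doc:
    """Set doc.sentiment to the mean valence of the tokens found in TOY_LEXICON."""
    hits = [TOY_LEXICON[t.lower_] for t in doc if t.lower_ in TOY_LEXICON]
    doc.sentiment = sum(hits) / len(hits) if hits else 0.0
    return doc

nlp = spacy.blank("en")  # tokenizer only, no model download required
nlp.add_pipe("toy_lexicon_sentiment")

for text in ["Friendly crew and good food", "The seat was cramped", "The flight left on time"]:
    print(text, nlp(text).sentiment)
```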


In [78]:
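# print the first five reviews next to their scores (the 'sentiments' column now holds the spaCy results)
for review, score in zip(verified_df['reviews'].head(), verified_df['sentiments'].head()):
    print(review, score)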

 For the price paid (bought during a sale) it was a decent experience although the club class (business class) seats offer no more legroom than economy class (using short-haul fleet on a 4 hour flight). Fast track through security was not honoured. The lounge at Istanbul airport was over-crowded as it is also open to the public who can pay for usage, causing a long queue for entry , which was badly organised. Boarding was smooth, cabin crew were friendly but their service was hit-and-miss. Eg. Some people got a “welcome” and some didn’t; Half of the cabin was automatically offered coffee after dinner but not the other half. However, drinks were replenished generously and regularly and the meal was good (with a choice of three mains from the menu). 0.0
 Flight left on time and arrived over half an hour earlier than scheduled (thanks to strong tail winds). The flight was full but no catering had been loaded for economy passenger at Heathrow so for 3 and a half hours we only had a small b