# Sentiment Analysis of User Reviews of the RBC App on the Google Play Store

The code below downloads a lexicon and saves it in a Python dictionary called `lexicon`.

In [None]:
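import urllib.request

# Download the tab-separated lexicon file (token, then score, on each line).
with urllib.request.urlopen("https://storage.googleapis.com/wd13/lexicon.txt") as url:
  lexicon_file = url.read().decode()

lexicon = {}
for line in lexicon_file.splitlines():
  if not line.strip():
    continue  # skip blank lines, such as a trailing newline at the end of the file
  split_line = line.split('\t')
  token = split_line[0]
  score = float(split_line[1])
  lexicon[token] = score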

The `lexicon` dictionary contains entries for approximately 7500 tokens that have either positive or negative sentiment. Each token is a key, and its value is the token's sentiment score: positive scores indicate positive sentiment and negative scores indicate negative sentiment. The further a score is from zero, the more extreme the sentiment.
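To make the file format concrete, here is a minimal sketch that parses a few lines in the same tab-separated layout the loading code assumes (token in the first column, score in the second) from an in-memory string instead of the URL, skips blank lines, and ranks tokens by how far their score is from zero. The sample uses only the three scores quoted below; `parse_lexicon`, `SAMPLE`, and `sample_lexicon` are hypothetical names for illustration.

```python
def parse_lexicon(text):
    """Build a {token: score} dict from tab-separated lines (token, score, ...)."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            # Skip blank lines, e.g. a trailing newline at the end of the file.
            continue
        fields = line.split('\t')
        lexicon[fields[0]] = float(fields[1])
    return lexicon

# Hypothetical three-entry sample using the scores quoted in this notebook.
SAMPLE = "good\t1.9\ngreat\t3.1\nbad\t-2.5\n"
sample_lexicon = parse_lexicon(SAMPLE)

# Rank tokens from most to least extreme sentiment (largest |score| first).
ranked = sorted(sample_lexicon.items(), key=lambda item: abs(item[1]), reverse=True)
print(ranked)  # [('great', 3.1), ('bad', -2.5), ('good', 1.9)]
```

Sorting on `abs(score)` puts strongly negative and strongly positive tokens side by side, which matches the "further from zero, more extreme" reading of the scores.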

"good" has a score of 1.9.

In [None]:
lexicon['good']

1.9

"great" has a score of 3.1.

In [None]:
lexicon['great']

3.1

"bad" has a score of -2.5.

In [None]:
lexicon['bad']

-2.5
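Indexing with square brackets raises a `KeyError` for a token the lexicon does not contain. The sketch below uses a hypothetical stand-in dictionary, `demo_lexicon`, holding the three scores above (named so it does not overwrite `lexicon`), and shows `dict.get` with a default of `0.0`, which treats unknown tokens as neutral.

```python
# Hypothetical stand-in for the downloaded lexicon, using the scores shown above.
demo_lexicon = {'good': 1.9, 'great': 3.1, 'bad': -2.5}

# dict.get returns the default (0.0) instead of raising for missing tokens.
print(demo_lexicon.get('great', 0.0))   # 3.1
print(demo_lexicon.get('banana', 0.0))  # 0.0

# Square-bracket lookup of a missing token raises KeyError.
try:
    demo_lexicon['banana']
except KeyError as err:
    print('missing token:', err)  # missing token: 'banana'
```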

The function below takes a string and returns a sentiment score based on the lexicon downloaded above.

In [None]:
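import re

def sentiment_score(doc):
  """Return the sum of the lexicon scores of all tokens in `doc`."""
  # Split the text into alphanumeric tokens and lowercase them so that,
  # for example, "Good" and "good" match the same lexicon entry.
  tokens = [token.lower() for token in re.findall(r'[A-Za-z0-9]+', doc)]
  score = 0
  for token in tokens:
    # Tokens that are not in the lexicon contribute nothing to the score.
    if token in lexicon:
      score += lexicon[token]
  # Return only after every token has been scored.
  return score

print(sentiment_score('good idea'))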

1.9
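The scoring rule can be checked in isolation. This sketch uses a hypothetical three-entry lexicon (`demo_lexicon`) and a hypothetical function name, `score_text`, so it does not overwrite the notebook's variables. It tokenizes with the same regular expression, lowercases each token, and sums the scores of every token found in the lexicon, so a sentiment word counts wherever it appears in the text.

```python
import math
import re

# Hypothetical stand-in lexicon; the real one has roughly 7500 entries.
demo_lexicon = {'good': 1.9, 'great': 3.1, 'bad': -2.5}

def score_text(doc, lex):
    """Sum the lexicon scores of all alphanumeric tokens in `doc` (case-insensitive)."""
    tokens = (token.lower() for token in re.findall(r'[A-Za-z0-9]+', doc))
    # Unknown tokens fall back to 0.0 and leave the total unchanged.
    return sum(lex.get(token, 0.0) for token in tokens)

print(score_text('good idea', demo_lexicon))       # 1.9
print(score_text('App works GOOD', demo_lexicon))  # 1.9 ('good' is the third token)
# Floating-point sums are compared with a tolerance rather than ==.
print(math.isclose(score_text('good but bad', demo_lexicon), -0.6))  # True
```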


Installing the google-play-scraper library.

In [None]:
!pip install google_play_scraper

Collecting google_play_scraper
  Downloading google_play_scraper-1.2.6-py3-none-any.whl (28 kB)
Installing collected packages: google_play_scraper
Successfully installed google_play_scraper-1.2.6


Importing the google-play-scraper library.

In [None]:
import google_play_scraper

The variable `appid` stores the app ID of the RBC app on the Google Play Store.

In [None]:
appid = 'com.rbc.mobile.android'

Downloading all available English-language reviews from the Canadian store (`lang='en'`, `country='ca'`) and storing them in the variable `rbc_reviews`.

In [None]:
rbc_reviews = google_play_scraper.reviews_all(
    appid,
    lang='en',
    country='ca'
)

Adding a `sentiment_score` to each review using the function defined above.

In [None]:
for review in rbc_reviews:
  review_text = review['content']
  if review_text:
    score = sentiment_score(review_text)
  else:
    score = 0
  review['sentiment_score'] = score

Adding a `sentiment_flag` variable to each review. It should be equal to 'pos' if the `sentiment_score` is greater than 0, 'neg' if the `sentiment_score` is less than 0, and 'neu' if the `sentiment_score` is equal to 0.

In [None]:
for review in rbc_reviews:
  if review['sentiment_score'] > 0:
    review['sentiment_flag'] = 'pos'
  elif review['sentiment_score'] < 0:
    review['sentiment_flag'] = 'neg'
  else:
    review['sentiment_flag'] = 'neu'
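The same three-way rule can be wrapped in a small function, which makes it easy to test on its own. This is a sketch; `flag_for_score` is a hypothetical name, not part of the original notebook.

```python
def flag_for_score(score):
    """Map a sentiment score to 'pos', 'neg', or 'neu' using the rule described above."""
    if score > 0:
        return 'pos'
    if score < 0:
        return 'neg'
    return 'neu'

for example_score in (3.2, -1.4, 0.0):
    print(example_score, flag_for_score(example_score))
# 3.2 pos
# -1.4 neg
# 0.0 neu
```

Inside the loop, the body would then reduce to `review['sentiment_flag'] = flag_for_score(review['sentiment_score'])`.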

Adding a `year` variable that indicates which year the review is from.

In [None]:
for review in rbc_reviews:
  review['year'] = review['at'].year

Converting `rbc_reviews` to a pandas DataFrame.

In [None]:
import pandas as pd
df = pd.DataFrame.from_records(rbc_reviews)
df

| | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt | appVersion | sentiment_score | sentiment_flag | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 82ce7fef-0ddd-444b-8fe1-363a66fe8552 | Ruth Thompson | https://play-lh.googleusercontent.com/a-/ALV-U... | Use it all the time | 5 | 0 | 4.40 | 2024-06-02 21:15:23 | | NaT | 4.40 | 0.0 | neu | 2024 |
| 1 | 1462c7a9-31b9-44f5-a665-75b225539e79 | A Google user | https://play-lh.googleusercontent.com/EGemoI2N... | errors not working as said | 5 | 0 | 4.41 | 2024-06-02 06:04:42 | | NaT | 4.41 | -1.4 | neg | 2024 |
| 2 | 7a0f42e7-beca-4b2b-aaba-8c55f4cf790e | Daniel Boudreau | https://play-lh.googleusercontent.com/a-/ALV-U... | App works good but still have room for improve... | 3 | 0 | 4.41 | 2024-06-01 20:41:57 | | NaT | 4.41 | 0.0 | neu | 2024 |
| 3 | 6b025703-5246-4a96-a785-168d461b52dc | Remi Matlosz | https://play-lh.googleusercontent.com/a/ACg8oc... | They will refuse to release funds on time for ... | 1 | 0 | 4.41 | 2024-06-01 20:14:23 | | NaT | 4.41 | 0.0 | neu | 2024 |
| 4 | 9d031668-3fb9-496f-94a5-689328f5ca01 | Verö Neek | https://play-lh.googleusercontent.com/a-/ALV-U... | Edit June 2024: seems that every time I make a... | 2 | 2 | 4.41 | 2024-06-01 16:54:30 | | NaT | 4.41 | 0.0 | neu | 2024 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 592 | 06f67cef-251b-4909-a612-5ade81f9acc0 | Mike Jackson | https://play-lh.googleusercontent.com/a/ACg8oc... | Your app sux | 1 | 1 | 4.34 | 2023-11-19 18:13:37 | | NaT | 4.34 | 0.0 | neu | 2023 |
| 593 | 54084090-8b74-4808-8950-4772fc0f703d | Maria Fellah | https://play-lh.googleusercontent.com/a/ACg8oc... | Very easy and friendly | 5 | 0 | 4.32 | 2023-11-18 02:20:42 | Thank for for the great review! -- Ray | 2023-11-20 22:10:17 | 4.32 | 0.0 | neu | 2023 |
| 594 | d66e451a-35b2-4513-8c83-89882912584f | Tammy McArthur | https://play-lh.googleusercontent.com/a/ACg8oc... | Love RBC been with them from my first job here... | 5 | 0 | 4.24.1 | 2023-11-17 12:26:19 | Thank you for the awesome review ! -- Ray | 2023-11-20 22:09:24 | 4.24.1 | 3.2 | pos | 2023 |
| 595 | abea9976-612b-4d00-9e83-27909f4f9b37 | A Fish | https://play-lh.googleusercontent.com/a-/ALV-U... | The systems are FAR too slow to update. When y... | 2 | 0 | 4.34 | 2023-11-16 22:23:45 | Thank you for your time and feedback. You will... | 2023-11-20 22:07:37 | 4.34 | 0.0 | neu | 2023 |
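Once the reviews are in a DataFrame, the three derived columns could also be computed column-wise with pandas instead of looping over dicts. The sketch below builds a tiny hypothetical DataFrame with only `content` and `at` columns (the texts and dates are invented for illustration) and a hypothetical three-word lexicon; `score_review` and `demo_reviews` are illustrative names. `Series.fillna`, `Series.map`, `numpy.select`, and the `.dt.year` accessor are standard pandas/NumPy features.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-ins: a three-word lexicon and four invented reviews.
demo_lexicon = {'good': 1.9, 'great': 3.1, 'bad': -2.5}
demo_reviews = pd.DataFrame({
    'content': ['Great app', 'bad update', None, 'Use it daily'],
    'at': pd.to_datetime(['2023-11-17', '2024-06-01', '2024-06-02', '2024-06-02']),
})

def score_review(doc):
    """Sum lexicon scores over the lowercase alphanumeric tokens of `doc`."""
    tokens = re.findall(r'[A-Za-z0-9]+', doc)
    return sum(demo_lexicon.get(token.lower(), 0.0) for token in tokens)

# Missing content is replaced by '' so it scores 0, as in the loop above.
demo_reviews['sentiment_score'] = demo_reviews['content'].fillna('').map(score_review)

# 'pos' if score > 0, 'neg' if score < 0, otherwise 'neu'.
demo_reviews['sentiment_flag'] = np.select(
    [demo_reviews['sentiment_score'] > 0, demo_reviews['sentiment_score'] < 0],
    ['pos', 'neg'],
    default='neu',
)

# The `at` column holds timestamps, so the year comes from the .dt accessor.
demo_reviews['year'] = demo_reviews['at'].dt.year
print(demo_reviews[['sentiment_score', 'sentiment_flag', 'year']])
```

`np.select` checks the conditions in order and falls back to `default`, which mirrors the `if`/`elif`/`else` chain used for `sentiment_flag`.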


Percentage of reviews that are positive, negative, and neutral.

In [None]:
df['sentiment_flag'].value_counts() * 100 / df['sentiment_flag'].value_counts().sum()

sentiment_flag
neu    75.544389
pos    17.420436
neg     7.035176
Name: count, dtype: float64
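pandas can compute these shares directly: `value_counts(normalize=True)` returns proportions that sum to 1, so multiplying by 100 gives percentages. A minimal sketch on a hypothetical series of flags (`flags` stands in for `df['sentiment_flag']`); in pandas 2.x the resulting Series is named `proportion` rather than `count`.

```python
import pandas as pd

# Hypothetical flags standing in for df['sentiment_flag'].
flags = pd.Series(['neu', 'neu', 'pos', 'neg', 'neu', 'pos', 'neu', 'neu'],
                  name='sentiment_flag')

# normalize=True divides each count by the total number of values.
percentages = flags.value_counts(normalize=True) * 100
print(percentages)  # neu 62.5, pos 25.0, neg 12.5
```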

Percentage of reviews that are positive, negative, and neutral, for each year.

In [None]:
yr = df['year'].unique()
yr
# the only years present are 2023 and 2024

array([2024, 2023])

In [None]:
pd.pivot_table(
    df,
    index='sentiment_flag',
    columns='year',
    aggfunc='size',
).transform(lambda col: col * 100 / col.sum())

| sentiment_flag / year | 2023 | 2024 |
|---|---|---|
| neg | 3.225806 | 8.033827 |
| neu | 70.16129 | 76.955603 |
| pos | 26.612903 | 15.010571 |
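An arguably more direct way to get per-year percentages is `pd.crosstab` with `normalize='columns'`, which divides each cell by its column total. The sketch below runs on hypothetical data (`demo` and `by_year` are illustrative names, not from the notebook).

```python
import pandas as pd

# Hypothetical flags and years standing in for the real review DataFrame.
demo = pd.DataFrame({
    'sentiment_flag': ['pos', 'neu', 'neu', 'neg', 'neu', 'pos'],
    'year': [2023, 2023, 2024, 2024, 2024, 2024],
})

# normalize='columns' makes each year's column sum to 1; scale to percentages.
by_year = pd.crosstab(demo['sentiment_flag'], demo['year'], normalize='columns') * 100
print(by_year.round(2))
```

With `crosstab`, a flag that never occurs in a given year shows up as 0 (here, `neg` in 2023) instead of a missing value.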
