# Analysis of Spotify Reviews from Google Playstore Data

Loading the libraries

In [2]:
from google_play_scraper import Sort, reviews_all
import pandas as pd
import numpy as np



## Scraping the data with google_play_scraper

In [138]:
us_reviews = reviews_all(
    'com.spotify.music', # the id is found within the app url at play.google.com/store
    sleep_milliseconds=0, 
    lang='en', 
    country='us', 
    sort=Sort.MOST_RELEVANT, # defaults to Sort.MOST_RELEVANT
)

Loading the data into a dataframe

In [139]:
df_spotify = pd.DataFrame(np.array(us_reviews),columns=['review'])


df_spotify = df_spotify.join(pd.DataFrame(df_spotify.pop('review').tolist()))

df_spotify.head()




Unnamed: 0,reviewId,userName,userImage,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt
0,65f84535-0b00-46a5-8764-f5b7f9a842e9,Maurice Ward,https://play-lh.googleusercontent.com/a-/AFdZucoIDk0EuYmoCxqC-96tQSnAk-BEZuoJtVdws-Eb,"Spotify has been playing very well for me. However, I am dealing with an annoying bug in which the music only plays for close to 10 seconds before quitting. I would clear my cache, re-install, and even made sure to fix my battery optimization. It would play, but then it would cut the song off at odd times. I can't really subscribe to a service that isn't working like it should. I don't know what happened, but this isn't the Spotify I remembered using.",2,36,8.7.48.1062,2022-07-26 13:17:18,,NaT
1,3fb5e8e1-04a6-4d62-be4e-59c2f175878f,Cathy Leavitt,https://play-lh.googleusercontent.com/a/AItbvmmIhobEmnhBvX-lXhTxGWO_N9Z5l3yKTHPm0ys7=mo,"Unusable. I've had this app for years. Always worked no problem. The past few days I cannot get it to do anything. Will not play any music. I've tried logging out and back in, updating the app,, uninstalled and reinstalled, cleared cache, nothing works. Every time I try to play a song it just buffers endlessly. It's a shame because I had a bunch of playlists but if I can never listen to them then what's the point.",1,23,8.7.48.1062,2022-07-26 14:46:05,,NaT
2,f204cddb-f95c-40b3-97b3-034f6674d134,Paul,https://play-lh.googleusercontent.com/a-/AFdZucrWQ7dz05_pqlTSSeL98rJIbdXLk_mAxpY_xWHQqQ,"Used to be really good. I use Spotify solely for podcasts, and lately it has been having lots of problems. Mainly, when I stop using the app for several minutes, it will not allow me to pause/stop the podcast upon playing. And if I close the app entirely, it won't let me open it again for several minutes. I basically have to wait for it to fully shutdown in the background I guess. Also, it will randomly skip podcasts mid stream. Just out of nowhere it will skip to the next podcast on the list.",2,15,8.7.48.1062,2022-07-25 17:31:16,,NaT
3,a37ecc47-47d2-45b5-862f-d751874723a8,Mike Galyardt,https://play-lh.googleusercontent.com/a-/AFdZucrrxZ4O1Vd5wXFtirgoYKbh5HJkDYTW_zUEVym-EA,"I really don't understand how one of the biggest music apps on the planet can be this buggy. Frequently missing the ""now playing"" tab at the bottom of the app, which means all controls (skip, pause, etc) are missing. Randomly says ""no internet connection"" while clearly connected to the internet and streaming. If you search for a song and play it, then let it autoplay similar songs afterwards, you can't go back and listen to the original song without backing out and selecting the song again.",1,361,8.7.48.1062,2022-07-21 21:47:38,,NaT
4,65a68c48-5d7c-4694-af47-741b296ce41d,Barb Zerbe,https://play-lh.googleusercontent.com/a/AItbvml_BqwIan-2Lsqv8sPGV3p7N6TWS30DphO9Pt0Z=mo,"My podcasts are breaking off after playing for about 2 minutes and I keep having to restart the phone. It's a new phone for me, so I'm probably doing something wrong, but it is frustrating. I really like Spotify on my old phone. I use it for my activities job at a nursing home. I ask loved ones of catatonic patients what their favorite music is, and they respond !",3,33,8.7.48.1062,2022-07-24 01:14:14,,NaT


I am dropping some columns that are not usefull for this analaysis and changing column names for a better understanding

In [140]:
df_spotify.drop(columns={'reviewId', 'userImage', 'reviewCreatedVersion'}, inplace=True )
df_spotify = df_spotify.rename(columns={'userName':'user', 'content':'review_description', 'score':'rating', 'at':'review_date', 'replyContent':'developer_response', 'repliedAt':'developer_response_date'})



Inspecting the dataframe

In [141]:
df_spotify.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 126166 entries, 0 to 126165
Data columns (total 7 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   user                     126166 non-null  object        
 1   review_description       126166 non-null  object        
 2   rating                   126166 non-null  int64         
 3   thumbsUpCount            126166 non-null  int64         
 4   review_date              126166 non-null  datetime64[ns]
 5   developer_response       4319 non-null    object        
 6   developer_response_date  4319 non-null    datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(3)
memory usage: 6.7+ MB


Sorting the rows by date (descending) and checking the count of ratings (1, 2, 3, 4, 5)

In [142]:
df_spotify = df_spotify.sort_values(['review_date'], ascending=False, inplace=False)
df_spotify.reset_index(drop=True)
df_spotify['rating'].value_counts()

1    38228
5    36357
4    17761
2    17406
3    16414
Name: rating, dtype: int64

In [144]:
df_spotify.head()

Unnamed: 0,user,review_description,rating,thumbsUpCount,review_date,developer_response,developer_response_date
4387,Bharti Singh,"It was a good app but from the past month it's malfunctioning, tye songs doesn't play in order, it literally skips so many songs and my playlist isn't even on shuffle. And a jpop mix playlist is supposed to only have Japanese songs what are English and kpop songs doing there ???? Edit : More problems each and every song doesn't play after 9 seconds.",1,0,2022-07-29 06:13:51,,NaT
1322,Rach HW Burns,"Usually works well but currently there are so horrible glitches. Doesn't sync, says I'm offline (I'm not) and some lyrics are completely wrong. Occasionally it just cuts out of a song despite my phone being in my bag and not being used. Weird. Never had issues before..",3,0,2022-07-28 23:47:11,,NaT
24,Jordan Barnard,"Pros I can listen to all my music in 1 place. I can import local tracks to hear. Podcasts are awesome to have. The concert information is very cool. Cons Costs $140 dollars a year and premium features rarely work better than free. I download my favorite tracks just to have them not load offline. Half the time my clicks don't register on the app. It can be rendered essential useless at random. Edit 7-28; absolutely terrible. I can't even open the all without the ""no connection"" message. 0/5",1,1,2022-07-28 21:28:38,,NaT
7352,Adam Wolske,"Amazing selection and ability to create your own playlists. Only get limited number of skips per hour with free version. Definitely worth downloading 👌. Gets FRUSTRATING when you want to hear a particular song but you get a ton of ""recommended"" songs. I just want to listen to the song I just searched up 😡🤬",4,0,2022-07-28 21:17:12,,NaT
125312,King Bobles,Love the app. But recently whenever I try to open my liked songs on mobile it just comes up as a black screen,1,0,2022-07-28 17:42:40,,NaT


In [143]:
df_spotify.tail()

Unnamed: 0,user,review_description,rating,thumbsUpCount,review_date,developer_response,developer_response_date
113319,A Google user,"Their newest update has made it so that your music stops after every track, even if you have songs in your queue. It is very frustrating",2,0,2018-09-12 13:05:58,,NaT
22908,A Google user,"Worked ok until i applied the student discount. Started working fine when i started paying full price, my colleagues said that would happen but i just wanted to test their theory. I'm a full time student so would love to take advantage of the student option as thats the only way the price competes with your competitors. The apps simple to use, so remove the trip wires and this would change to a 3/4 star 😎",1,0,2018-09-12 13:03:03,,NaT
91645,A Google user,"Great app, I just missing the thumbs down feature, to songs I never want to hear again! So annoying to have to bump in to songs I don't like on Playlists....",4,0,2018-09-12 11:49:13,,NaT
70942,A Google user,"Spotify on mobile embraces the marketplace's greed fully, most restricted 'free' experience I have ever . They're trying their absolute best to try and get you to pay, even if it means crippling the experience. Even Spotify's own desktop app puts this one to shame.",2,0,2018-09-12 10:55:09,,NaT
67809,A Google user,I've been using Spotify for almost 4 years. If you're looking for what you want when you want this is the app. You can build playlist or search through other playlists 💯 x better than Pandora,5,0,2018-09-12 09:36:43,,NaT


Extract all reviews containing the word design and saving it to a new df

In [160]:
design_word = ["design","Design","button","graphics","ui", "UI", "UX", "ux", "user experience"]
df_spotify_design = df_spotify[df_spotify["review_description"].str.contains('|'.join(design_word))]
df_spotify_design.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11701 entries, 28714 to 67809
Data columns (total 7 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   user                     11701 non-null  object        
 1   review_description       11701 non-null  object        
 2   rating                   11701 non-null  int64         
 3   thumbsUpCount            11701 non-null  int64         
 4   review_date              11701 non-null  datetime64[ns]
 5   developer_response       444 non-null    object        
 6   developer_response_date  444 non-null    datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(3)
memory usage: 731.3+ KB


There are 11701 reviews containing words connected to design (see above)

Creating dataframes for each rating

In [168]:
df_spotify_design.to_csv('spotify_design.csv', index=False)

In [3]:
df = pd.read_csv('spotify_design.csv')

design_1 = df[df["rating"] == 1]
design_2 = df[df["rating"] == 2]
design_3 = df[df["rating"] == 3]
design_4 = df[df["rating"] == 4]
design_5 = df[df["rating"] == 5]

The column review_description is truncated to 50 characters. To inspect some of the reviews fully, I change the display.max_colwidth

In [162]:
pd.set_option('display.max_colwidth', None)

## Create a Wordcloud with reviews containing the word design

In [4]:
from wordcloud import WordCloud, STOPWORDS

Saving the review description for each rating file seperatly 

In [5]:
design_1_review = design_1['review_description']
design_1_review.to_csv('design1_words.csv', index=False)

design_2_review = design_2['review_description']
design_2_review.to_csv('design2_words.csv', index=False)

design_3_review = design_3['review_description']
design_3_review.to_csv('design3_words.csv', index=False)

design_4_review = design_4['review_description']
design_4_review.to_csv('design4_words.csv', index=False)

design_5_review = design_5['review_description']
design_5_review.to_csv('design5_words.csv', index=False)

In [9]:
design1_words = open('design1_words.csv', mode='r', encoding='utf-8').read()
design2_words = open('design2_words.csv', mode='r', encoding='utf-8').read()
design3_words = open('design3_words.csv', mode='r', encoding='utf-8').read()
design4_words = open('design4_words.csv', mode='r', encoding='utf-8').read()
design5_words = open('design5_words.csv', mode='r', encoding='utf-8').read()
design_1_2_words = open('design1_words.csv', 'design2_words.csv', mode='r', encoding='utf-8').read()

TypeError: argument for open() given by name ('mode') and position (2)

In [11]:
design1_2_words = [design1_words, design2_words]
design3_4_words = [design3_words, design4_words]

By looking at the wordlcoud, I identified more words (without information content) for the STOPWORDS list and updated it.

In [7]:
stopword_add_ = "Spotify app music playlist play every now design song songs love best great premium design Design button graphics ui UI UX ux user experience"
list_stopadd = stopword_add_.split()
STOPWORDS.update(list_stopadd)

Creating the wordclouds

In [184]:
x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="white",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design1_words)

wc.to_file('design1_wordcloud.png')

x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="grey",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design2_words)

wc.to_file('design2_wordcloud.png')

x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="green",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design3_words)

wc.to_file('design3_wordcloud.png')

x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="blue",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design4_words)

wc.to_file('design4_wordcloud.png')

x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="red",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design5_words)

wc.to_file('design5_wordcloud.png')

<wordcloud.wordcloud.WordCloud at 0x29d76481600>

Comparing Wordlcoud with rating 1 & 2 to wordcloud with rating 3 & 4

In [12]:
x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="grey",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design1_2_words)

wc.to_file('design_1_2_wordcloud.png')



x, y = np.ogrid[:1000, :1000]

mask = (x - 500) ** 2 + (y - 500) ** 2 > 400 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(
                background_color="grey",
                stopwords=STOPWORDS,
                width=600, 
                height=400, 
                mask=mask).generate(design3_4_words)

wc.to_file('design_3_4_wordcloud.png')

TypeError: expected string or bytes-like object

The wordlcoud dows not appear to give meaningful insights.

Key Findings:
- Frequency rating containing design words- 1: 38228, 5: 36357, 4: 17761, 2: 17406, 3: 16414
- The reviews range from 2022-07-28 to 2018-09-12
- One has to analyze the reviews containing information about design manually to find repetitive feedback on a common aspect
- The information content seems to vary a lot within the reviews
- The wordcloud is not the ideal way to analyze reviews from the PlayStore
- A more specific approach (qualitative interviews) seems to be the better approach
