## What Spotify genres are the most popular?
by: Isabel Hayes

In this data analysis, we will be looking at data directly from Spotify. 

In [70]:
from matplotlib import pyplot as plt
import json
from datetime import datetime as dt
import seaborn
import pandas as pd
from scipy import stats

In [71]:
streamsBm = pd.read_csv(r"C:\Users\ihay0\DataTech\monthly_streams_index.csv")

In [59]:
streamsBm.head()

Unnamed: 0,month,standard_categories_list,index
0,Apr 2019,Arts,0.02287
1,Apr 2019,"Arts, Business, Education",0.00012
2,Apr 2019,"Arts, Business, Health & Fitness",0.00015
3,Apr 2019,"Arts, Business, Society & Culture",0.00214
4,Apr 2019,"Arts, Comedy",0.00673


The data is organized by month and year. Each row carries either a single tag or a combination of tags; for example, a row can have just the Comedy tag, or the Comedy and True Crime tags together. Listens have been scaled to a range from 0 to 1, so a value such as 0.01 represents a greater listen rate than 0.001.
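
Because several tags can share one row, it can help to see what the data looks like with one tag per row. The following is a minimal sketch, not part of the original analysis: `sample`, `tags`, and `per_tag` are hypothetical names, and the three rows are copied from the `head()` output above. Whether summing the index across tag combinations is meaningful depends on how the index was scaled, which the source data does not document.

```python
import pandas as pd

# Hypothetical sample with the same columns as the real file
# (values copied from the head() output above)
sample = pd.DataFrame({
    "month": ["Apr 2019", "Apr 2019", "Apr 2019"],
    "standard_categories_list": ["Arts", "Arts, Business, Education", "Arts, Comedy"],
    "index": [0.02287, 0.00012, 0.00673],
})

# Turn the comma-separated tag string into a list, then give each tag its own row
tags = sample.assign(
    tag=sample["standard_categories_list"].str.split(", ")
).explode("tag")

# Assumption: adding up the index per individual tag is a reasonable summary
per_tag = tags.groupby("tag")["index"].sum()
print(per_tag)
```

Splitting on `", "` works here because tags such as `TV & Film` or `Health & Fitness` contain ampersands but no commas.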

In [60]:
streamsBm['standard_categories_list']

0                                     Arts
1                Arts, Business, Education
2         Arts, Business, Health & Fitness
3        Arts, Business, Society & Culture
4                             Arts, Comedy
                       ...                
10549                    Sports, TV & Film
10550                           Technology
10551                           True Crime
10552                            TV & Film
10553                TV & Film, True Crime
Name: standard_categories_list, Length: 10554, dtype: object

In [61]:
streamsBm['month']

0        Apr 2019
1        Apr 2019
2        Apr 2019
3        Apr 2019
4        Apr 2019
           ...   
10549    Sep 2020
10550    Sep 2020
10551    Sep 2020
10552    Sep 2020
10553    Sep 2020
Name: month, Length: 10554, dtype: object

In [40]:
streams = streamsBm['month'].str.split(expand=True)

In [41]:
streams.head()

Unnamed: 0,0,1
0,Apr,2019
1,Apr,2019
2,Apr,2019
3,Apr,2019
4,Apr,2019


Splitting the month column into month and year, and adding the two new columns to the end of the dataframe.

In [72]:
# Split values such as "Apr 2019" on whitespace into two new string columns
streamsBm[['Mon', 'Year']] = streamsBm['month'].str.split(expand=True)

In [69]:
streamsBm

Unnamed: 0,month,standard_categories_list,index,Mon,Year
0,Apr 2019,Arts,0.02287,Apr,2019
1,Apr 2019,"Arts, Business, Education",0.00012,Apr,2019
2,Apr 2019,"Arts, Business, Health & Fitness",0.00015,Apr,2019
3,Apr 2019,"Arts, Business, Society & Culture",0.00214,Apr,2019
4,Apr 2019,"Arts, Comedy",0.00673,Apr,2019
...,...,...,...,...,...
10549,Sep 2020,"Sports, TV & Film",0.00220,Sep,2020
10550,Sep 2020,Technology,0.01204,Sep,2020
10551,Sep 2020,True Crime,0.27128,Sep,2020
10552,Sep 2020,TV & Film,0.06240,Sep,2020
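
One detail of the split worth seeing on a small example: `str.split` produces strings, so `Year` holds `'2019'` rather than the number `2019`. The sketch below uses a hypothetical three-row `sample` frame and also shows an alternative that was not used in this notebook, parsing the column with `pd.to_datetime`. The `%b` format code assumes English month abbreviations under the active locale.

```python
import pandas as pd

# Hypothetical sample; the month strings follow the "Mon YYYY" pattern shown above
sample = pd.DataFrame({"month": ["Apr 2019", "Sep 2020", "Feb 2021"]})

# Same technique as the notebook: one new column per whitespace-separated piece
sample[["Mon", "Year"]] = sample["month"].str.split(expand=True)
print(sample["Year"].tolist())  # the years are strings, not integers

# Alternative (an assumption, not the notebook's method): parse real datetimes
parsed = pd.to_datetime(sample["month"], format="%b %Y")
sample["year_num"] = parsed.dt.year
sample["month_num"] = parsed.dt.month
print(sample)
```

Keeping the years as strings is fine as long as later comparisons also use strings, such as `'2019'`.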


Removing the original month column, which now duplicates Mon and Year, to keep the dataframe concise.

In [73]:
streamsBm.drop(columns='month', inplace=True)

In [54]:
streamsBm

Unnamed: 0,standard_categories_list,index,Mon,Year
0,Arts,0.02287,Apr,2019
1,"Arts, Business, Education",0.00012,Apr,2019
2,"Arts, Business, Health & Fitness",0.00015,Apr,2019
3,"Arts, Business, Society & Culture",0.00214,Apr,2019
4,"Arts, Comedy",0.00673,Apr,2019
...,...,...,...,...
10549,"Sports, TV & Film",0.00220,Sep,2020
10550,Technology,0.01204,Sep,2020
10551,True Crime,0.27128,Sep,2020
10552,TV & Film,0.06240,Sep,2020


Splitting the main dataframe into separate dataframes, one for each year the data was pulled from.
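
The per-year cells that follow each apply the same boolean filter with a different year string. As a hedged sketch of the same idea in one step, a `groupby` on `Year` can build all the per-year frames at once. The names `sample` and `by_year` are hypothetical, and the four rows are copied from the outputs in this notebook.

```python
import pandas as pd

# Hypothetical sample with the notebook's columns after the month split
sample = pd.DataFrame({
    "standard_categories_list": ["Arts", "Sports", "Arts", "True Crime"],
    "index": [0.01664, 0.13889, 0.02287, 0.27128],
    "Mon": ["Aug", "Sep", "Apr", "Sep"],
    "Year": ["2018", "2018", "2019", "2020"],
})

# Year came from str.split, so the group keys are strings such as "2018"
by_year = {year: frame for year, frame in sample.groupby("Year")}

# Equivalent to the boolean filter sample[sample["Year"] == "2018"]
print(by_year["2018"])
```

A dictionary keyed by year avoids one variable per year, although separate names such as `year18` are easier to read in a short notebook.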

### 2018

In [76]:
year18 = streamsBm[streamsBm['Year'] == '2018']

In [77]:
year18

Unnamed: 0,standard_categories_list,index,Mon,Year
601,Arts,0.01664,Aug,2018
602,"Arts, Business, Society & Culture",0.00188,Aug,2018
603,"Arts, Comedy",0.00426,Aug,2018
604,"Arts, Comedy, Education, Society & Culture",0.00022,Aug,2018
605,"Arts, Comedy, Fiction",0.00020,Aug,2018
...,...,...,...,...
9859,Sports,0.13889,Sep,2018
9860,"Sports, TV & Film",0.00035,Sep,2018
9861,Technology,0.00063,Sep,2018
9862,True Crime,0.02656,Sep,2018


### 2019

In [74]:
year19 = streamsBm[streamsBm['Year'] == '2019']

In [75]:
year19

Unnamed: 0,standard_categories_list,index,Mon,Year
0,Arts,0.02287,Apr,2019
1,"Arts, Business, Education",0.00012,Apr,2019
2,"Arts, Business, Health & Fitness",0.00015,Apr,2019
3,"Arts, Business, Society & Culture",0.00214,Apr,2019
4,"Arts, Comedy",0.00673,Apr,2019
...,...,...,...,...
10178,"Sports, TV & Film",0.00065,Sep,2019
10179,Technology,0.00550,Sep,2019
10180,True Crime,0.19690,Sep,2019
10181,TV & Film,0.03553,Sep,2019


### 2020

In [78]:
year20 = streamsBm[streamsBm['Year'] == '2020']
year20

Unnamed: 0,standard_categories_list,index,Mon,Year
264,Arts,0.07501,Apr,2020
265,"Arts, Business",0.00015,Apr,2020
266,"Arts, Business, Comedy, News, Technology",0.00015,Apr,2020
267,"Arts, Business, Education",0.00024,Apr,2020
268,"Arts, Business, Health & Fitness",0.00019,Apr,2020
...,...,...,...,...
10549,"Sports, TV & Film",0.00220,Sep,2020
10550,Technology,0.01204,Sep,2020
10551,True Crime,0.27128,Sep,2020
10552,TV & Film,0.06240,Sep,2020


### 2021

In [79]:
year21 = streamsBm[streamsBm['Year'] == '2021']
year21

Unnamed: 0,standard_categories_list,index,Mon,Year
3028,Arts,0.06901,Feb,2021
3029,"Arts, Business",0.00030,Feb,2021
3030,"Arts, Business, Education",0.00069,Feb,2021
3031,"Arts, Business, Health & Fitness",0.00020,Feb,2021
3032,"Arts, Business, Music",0.00156,Feb,2021
...,...,...,...,...
7061,"Sports, TV & Film",0.00274,Mar,2021
7062,Technology,0.03189,Mar,2021
7063,True Crime,0.30550,Mar,2021
7064,TV & Film,0.10623,Mar,2021
