In [22]:
top_genres = (
    genres_df.apply(pd.Series.value_counts)
    .apply(np.sum, axis=1)
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={'index': 'genre', 0: 'count'})
)

Then we can plot. Let's start with the total listens per genre. 

In [23]:
#hide_input
alt.Chart(top_genres[:10]).mark_bar().encode(
    x='count', 
    y=alt.Y('genre', sort='-x'),
    tooltip=['count'],
).properties(title='Listens per genre')

No big surprises here. My main music tastes are hip hop and electronic music, with techno and drum and bass as my main electronic genres. However, for the latter two I mainly use YouTube, which hosts sets that Spotify does not have. So my Spotify is mainly dominated by hip hop and its related genres, like _rap_, _hip hop_ and _pop rap_ (whatever that is? Drake maybe?). I expect many hip hop songs are also tagged as _pop_, which would explain the high _pop_ presence, even though I'm normally not much of a pop fan. Let's dive a bit deeper into this!

In [24]:
#hide
top_genres_10 = top_genres[:10].genre.values
top_genres_20 = top_genres[:20].genre.values

Then we loop over the rows and, for each genre that is present and belongs to the top 20 genres, we put a 1 in that row's column for that genre. All other cells are filled with 0.  

In [25]:
rows = []
for i, row in comb.loc[:, [str(x) for x in range(21)]].iterrows():
    new_row = {}
    for value in row.values:
        if value in top_genres_20:
            new_row[value] = 1
    rows.append(new_row)
genre_presence = pd.DataFrame(rows)
genre_presence = genre_presence.fillna(0).astype(int)
genre_presence.head(2)

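| | hip hop | pop | pop rap | rap | edm | electro house | dance pop | tropical house | big room | brostep | bass trap | electronic trap | house | progressive electro house | progressive house | detroit hip hop | g funk | west coast rap | conscious hip hop | tech house |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |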
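The same 0/1 presence matrix can also be built without an explicit loop. Below is a minimal sketch, assuming each play's genres are available as a Python list (the `genres_per_play` Series and the shortened `top_genres_20` list are hypothetical toy data, not the real dataset): pandas' `str.join` turns each list into one `|`-separated string, and `str.get_dummies` splits that back into one 0/1 column per genre.

```python
import pandas as pd

# Hypothetical toy data: the list of genres attached to each play
genres_per_play = pd.Series([
    ['hip hop', 'pop', 'pop rap', 'rap'],
    ['edm', 'electro house', 'big room'],
    ['lounge'],
])
# Hypothetical, shortened stand-in for the real top_genres_20
top_genres_20 = ['hip hop', 'pop', 'pop rap', 'rap', 'edm', 'electro house']

# 'hip hop|pop|...' strings -> one 0/1 column per genre that occurs anywhere
dummies = genres_per_play.str.join('|').str.get_dummies(sep='|')

# Keep only the top genres; reindex adds all-zero columns for top genres that never occur
presence = dummies.reindex(columns=top_genres_20, fill_value=0)
print(presence)
```

Here `reindex` also fixes the column order to the order of the top-genre list, and a play without any top genre (the `lounge` row) becomes a row of zeros, just like in the loop.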


Now that we have this data, we can do a correlation analysis of how often each genre coincides with each of the other genres. Because genre is a nominal data type, the _standard correlation_, the __[Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)__, did not seem like the right fit, as it is designed for continuous data. Instead, I wanted a metric that works with nominal values, and I chose __[Kendall's tau](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient)__ for its simplicity. Normally, Kendall's tau is meant for _ordinal_ values (variables that have an ordering). However, because we are working with a binary situation (a genre is either present or not) represented by 0 and 1, I think this should still work. One other thing to note is that Kendall's tau is _symmetric_, which means `tau(a, b)` is the same as `tau(b, a)`. 

> Note: If you have thoughts on how to do this better, let me know, because I'm definitely open to ideas. 😉 
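For two 0/1 variables there is a useful identity worth knowing here: scipy's `kendalltau` computes the tau-b variant by default, and on binary data tau-b reduces to the phi coefficient of the 2×2 contingency table, which is also exactly what Pearson's r gives on 0/1 data. The sketch below checks this on two made-up presence vectors (the arrays `a` and `b` are hypothetical).

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

# Hypothetical presence vectors of two genres over ten plays (1 = genre present)
a = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])
b = np.array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1])

tau, _ = kendalltau(a, b)  # scipy's default variant is tau-b
r, _ = pearsonr(a, b)

# Phi coefficient from the 2x2 contingency table of the two vectors
n11 = np.sum((a == 1) & (b == 1))
n00 = np.sum((a == 0) & (b == 0))
n10 = np.sum((a == 1) & (b == 0))
n01 = np.sum((a == 0) & (b == 1))
phi = (n11 * n00 - n10 * n01) / np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

print(tau, r, phi)  # all three are 0.6 (up to floating-point rounding) for these vectors
```

So for binary presence data, the choice between Kendall's tau-b and Pearson does not change the numbers; the difference only matters for non-binary data.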

Let's loop over all combinations of genres and compute their tau coefficient.

In [26]:
from scipy.stats import kendalltau
from itertools import product
rows = []
for genre_a, genre_b in product(genre_presence.columns.values, repeat=2):
    tau, p = kendalltau(genre_presence[genre_a].values, genre_presence[genre_b].values)
    rows.append({'genre_a': genre_a, 'genre_b': genre_b, 'tau': tau})
tau_values = pd.DataFrame(rows)
tau_values[:2]

| | genre_a | genre_b | tau |
|---|---|---|---|
| 0 | hip hop | hip hop | 1.0 |
| 1 | hip hop | pop | -0.040954 |


We can turn this into a proper correlation matrix with the code below. However, Altair expects long-format data (one row per genre pair, with the values in columns), so I won't use the matrix for the visualization. 

In [27]:
corr = tau_values.pivot(index='genre_a', columns='genre_b', values='tau').fillna(0)
corr.style.background_gradient(cmap='coolwarm', axis=None)  # Styler for display; `corr` stays a DataFrame
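Since Kendall's tau is symmetric, the loop above computes every pair twice. A shorter alternative, sketched here on a hypothetical three-genre stand-in for `genre_presence`, is pandas' `DataFrame.corr(method='kendall')`, which builds the square matrix directly (it needs scipy installed and, as far as I know, uses `scipy.stats.kendalltau` for each pair). A `melt` then turns the matrix back into the long format Altair expects.

```python
import pandas as pd

# Hypothetical stand-in for genre_presence: 0/1 presence per play
genre_presence = pd.DataFrame({
    'hip hop': [1, 1, 0, 0, 1],
    'rap':     [1, 1, 0, 0, 0],
    'edm':     [0, 0, 1, 1, 0],
})

# Square genre x genre matrix of pairwise Kendall tau values
corr = genre_presence.corr(method='kendall')

# Back to long format: one row per (genre_a, genre_b) pair
tau_long = (
    corr.rename_axis('genre_a')
    .reset_index()
    .melt(id_vars='genre_a', var_name='genre_b', value_name='tau')
)
print(tau_long)
```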

Let's see how these genres correlate then.

In [28]:
#hide_input
alt.Chart(tau_values).mark_rect().encode(
    x=alt.X('genre_a:N', axis=alt.Axis(labelAngle=-45)),
    y='genre_b:N',
    color=alt.Color('tau:Q', scale=alt.Scale(scheme='blueorange', domain=[-1, 1])),
    tooltip=['tau', 'genre_a', 'genre_b'],
)

We can immediately see some interesting clusters. There is a strong tau between most of the electronic music genres, like _edm_, _electro house_, _bass trap_, _big room_, _brostep_ and _electronic trap_. Then, looking at _hip hop_, we see very strong coefficients with _rap_ and _pop rap_, neither of which is a big surprise. My initial hypothesis that _pop_ would be correlated with hip hop has been debunked, though. _Pop_ seems to be more strongly related to _edm_ and some other electronic genres. 

In this overview, I think there are still two interesting insights:
- A strong coefficient between _conscious hip hop_ and _west coast rap_. I did not really expect this, but it can likely be attributed to artists like Kendrick Lamar, who deal with social and political issues in their lyrics. Additionally, cities like Compton played a big role in west coast hip hop, and their music was often strongly tied to their social and economic situation (Kendrick Lamar included).
- A strong coefficient between _G-funk_ and _Detroit hip hop_. G-funk is a subgenre of hip hop that originated on the West Coast, while Detroit hip hop, as the name says, comes from Detroit. A strong coefficient between _G-funk_ and _west coast rap_ would have been more expected. Interesting to see, but I won't dive deeper into these findings for now. 

## Monthly change in genres 📅
This is a very interesting analysis in my opinion, but also one of the more challenging ones. Given the data I had, I approached the problem as follows:

1. Count the frequency of each genre per interval, monthly in this case.
2. Divide these numbers by the total number of plays in that interval, so we get a percentage of that month's total plays. This number represents the fraction of songs (or rather, of the artists of those songs) that had that genre. Because a song usually has several genres, these percentages will not necessarily sum to one.
3. Sort by these percentages and extract the monthly top 5.
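The three steps above can also be expressed without explicit loops. Below is a compact sketch on a hypothetical toy play history (the `plays` frame is made up; the real notebook uses `df` with a `top_genres` list column), using `explode`, `groupby` and a merge instead of row iteration:

```python
import pandas as pd

# Hypothetical toy play history: one row per play, with the genres of its artist
plays = pd.DataFrame({
    'year': [2018] * 6,
    'month': [1, 1, 1, 1, 2, 2],
    'top_genres': [['rap', 'hip hop'], ['rap', 'pop'], ['rap', 'hip hop'], ['edm'],
                   ['edm', 'pop'], ['edm']],
})

# Step 1: one row per (play, genre), then count genres per month
counts = (
    plays.explode('top_genres')
    .rename(columns={'top_genres': 'genre'})
    .groupby(['year', 'month', 'genre'])
    .size()
    .reset_index(name='count')
)

# Step 2: divide by the number of plays in that month (not by the number of genre tags),
# so each value is the fraction of plays tagged with that genre
plays_per_month = plays.groupby(['year', 'month']).size().reset_index(name='plays')
counts = counts.merge(plays_per_month, on=['year', 'month'])
counts['percentage'] = counts['count'] / counts['plays']

# Step 3: sort within each month and keep the top n (here 2) genres
top = (
    counts.sort_values(['year', 'month', 'percentage'], ascending=[True, True, False])
    .groupby(['year', 'month'])
    .head(2)
)
print(top[['year', 'month', 'genre', 'percentage']])
```

In this toy January the fractions add up to 1.75, which illustrates why these percentages need not sum to one.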

**Step 1**: count the frequency per interval

In [30]:
# Step 1. Count all genre occurrences per month.
counters_per_month = []
# groupby iterates over the (year, month) pairs in sorted order and skips months without plays
for (year, month), month_df in df.groupby(['year', 'month']):
    counter = {'year': year, 'month': month}
    for genres in month_df.top_genres:
        for genre in genres:
            counter[genre] = counter.get(genre, 0) + 1
    counters_per_month.append(counter)

In [104]:
counts_per_genre_per_month = pd.DataFrame(counters_per_month)    
monthly_sum = df.groupby(['year', 'month']).size().reset_index().rename(columns={0: 'count'})


**Step 2**: We then normalize all genre counts by the number of songs played in that time period. 

In [106]:
# Step 2. Normalize all genre counts by the number of songs played in that time period. 

# Select all columns except the time columns
columns = counts_per_genre_per_month.columns.tolist()
columns.remove('year')
columns.remove('month')
for i, row in monthly_sum.iterrows():
    # Rows of counts_per_genre_per_month that belong to this (year, month)
    mask = (counts_per_genre_per_month.year == row.year) & (counts_per_genre_per_month.month == row.month)
    counts_per_genre_per_month.loc[mask, columns] = counts_per_genre_per_month.loc[mask, columns] / row['count']
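The row-by-row division above can also be written as a single aligned division. A minimal sketch, assuming both frames have `year` and `month` columns as above (`counts` and `monthly_sum` here are small hypothetical stand-ins): indexing both by `(year, month)` makes `div(..., axis=0)` match each month's genre counts with that month's play count, and genres that were not played stay NaN.

```python
import pandas as pd

# Hypothetical stand-ins for counts_per_genre_per_month and monthly_sum
counts = pd.DataFrame({
    'year': [2018, 2018], 'month': [1, 2],
    'rap': [3, None], 'edm': [1, 2],
})
monthly_sum = pd.DataFrame({'year': [2018, 2018], 'month': [1, 2], 'count': [4, 2]})

# Align on (year, month) instead of on row position, then divide row-wise
normalized = (
    counts.set_index(['year', 'month'])
    .div(monthly_sum.set_index(['year', 'month'])['count'], axis=0)
    .reset_index()
)
print(normalized)
```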

We now have a dataframe with 862 columns: the `year` and `month` columns plus 860 different genres. 

In [110]:
counts_per_genre_per_month.shape

(45, 862)

Which looks like this:

In [109]:
counts_per_genre_per_month.head(2)

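| | year | month | east coast hip hop | hip hop | pop | pop rap | rap | trap music | catstep | complextro | ... | classical soprano | spanish hip hop | trap espanol | pop reggaeton | chinese hip hop | corrido | regional mexican pop | australian indigenous | witch house | ghettotech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | 10 | 0.008696 | 0.017391 | 0.2 | 0.043478 | 0.043478 | 0.034783 | 0.13913 | 0.252174 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2013 | 12 | NaN | NaN | 0.5 | 0.5 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |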


**Step 3**: Sort by these values and extract the top 5. Unfortunately, the data is not yet in a shape that allows this, so we first need to transform it from a wide to a long data format and filter out some values. 

We now have a dataframe with all the genres and the fraction of each month's plays in which they were present. To get a cleaner visual, we remove all data up to and including August 2016, keeping September 2016 onwards. Keep in mind that an artist/song generally has more than one genre, so the sum of their fractions is not 1. 

In [111]:
counts_per_genre_per_month_filtered = counts_per_genre_per_month.loc[(counts_per_genre_per_month.year > 2016) | ((counts_per_genre_per_month.year == 2016) & (counts_per_genre_per_month.month > 8))]

In [113]:
#hide_input
counts_per_genre_per_month_filtered[:2]

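| | year | month | east coast hip hop | hip hop | pop | pop rap | rap | trap music | catstep | complextro | ... | classical soprano | spanish hip hop | trap espanol | pop reggaeton | chinese hip hop | corrido | regional mexican pop | australian indigenous | witch house | ghettotech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 2016 | 9 | 0.03876 | 0.449612 | 0.387597 | 0.488372 | 0.51938 | 0.069767 | 0.007752 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 17 | 2016 | 10 | 0.055409 | 0.313984 | 0.343008 | 0.279683 | 0.337731 | 0.036939 | 0.026385 | 0.034301 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |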


Melting the dataframe (going from wide to long format) results in a single row per genre per month, holding that genre's percentage. This makes it easier to plot with Altair. Furthermore, we create a datetime column from our `year` and `month` columns, which Altair also handles better. 

In [114]:
counts_per_genre_per_month_melted = pd.melt(counts_per_genre_per_month_filtered, id_vars=['year', 'month'], value_vars=columns, var_name='genre', value_name='percentage')
counts_per_genre_per_month_melted['datetime'] = pd.to_datetime(counts_per_genre_per_month_melted.month.astype(str) + '-' + counts_per_genre_per_month_melted.year.astype(str), format='%m-%Y')
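As a side note, the datetime column can also be built without going through strings: `pd.to_datetime` accepts a dataframe with `year`, `month` and `day` columns. A minimal sketch on a hypothetical two-month, two-genre wide frame, which also shows the shape `pd.melt` produces:

```python
import pandas as pd

# Hypothetical wide frame: one row per month, one column per genre
wide = pd.DataFrame({
    'year': [2016, 2016], 'month': [9, 10],
    'rap': [0.52, 0.34], 'edm': [None, 0.37],
})

# Wide -> long: one row per (month, genre) pair; all non-id columns are melted
long_df = pd.melt(wide, id_vars=['year', 'month'], var_name='genre', value_name='percentage')

# Assemble the datetime from numeric year/month columns plus a fixed day of 1
long_df['datetime'] = pd.to_datetime(long_df[['year', 'month']].assign(day=1))
print(long_df)
```

The September `edm` row comes out as NaN because that genre was not played that month; adding `.dropna(subset=['percentage'])` after the melt would drop such rows.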

In [40]:
#hide_input
counts_per_genre_per_month_melted[:2]

| | year | month | genre | percentage | datetime |
|---|---|---|---|---|---|
| 0 | 2016 | 9 | rap | 0.004026 | 2016-09-01 |
| 1 | 2016 | 10 | rap | 0.000891 | 2016-10-01 |


This looks great! But there is one problem: we likely have way too many rows for Altair.

In [116]:
counts_per_genre_per_month_melted.shape

(24940, 5)

Welp, so we have almost 25k rows, while Altair by default refuses datasets with more than 5,000 rows (it raises a `MaxRowsError`). 😅

But since we have these percentages, we can likely filter on them! What if we group by month and then take the n largest values? `nlargest` keeps the original row labels of `counts_per_genre_per_month_melted` (i.e. the row numbers) as an extra index level below `year` and `month`. After `reset_index`, pandas names that unnamed third level `level_2` (level 2, counting from 0), which is the `level_2` you see at the end of the command. Indexing on these labels leaves us with only the melted rows that correspond to the top genres per month.

In [218]:
top_genres_per_month_with_perc = counts_per_genre_per_month_melted.loc[counts_per_genre_per_month_melted.groupby(['year', 'month']).percentage.nlargest(5).reset_index().level_2.values, :]
top_genres_per_month_with_perc.set_index(['year', 'month']).head(10)

| year | month | genre | percentage | datetime |
|---|---|---|---|---|
| 2016 | 9 | rap | 0.51938 | 2016-09-01 |
| 2016 | 9 | pop rap | 0.488372 | 2016-09-01 |
| 2016 | 9 | hip hop | 0.449612 | 2016-09-01 |
| 2016 | 9 | pop | 0.387597 | 2016-09-01 |
| 2016 | 9 | indie pop rap | 0.131783 | 2016-09-01 |
| 2016 | 10 | edm | 0.37467 | 2016-10-01 |
| 2016 | 10 | pop | 0.343008 | 2016-10-01 |
| 2016 | 10 | rap | 0.337731 | 2016-10-01 |
| 2016 | 10 | electro house | 0.324538 | 2016-10-01 |
| 2016 | 10 | hip hop | 0.313984 | 2016-10-01 |
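The `level_2` name only exists because the index level is unnamed when it gets reset. A slightly more robust variant, sketched below on a hypothetical four-row frame with values taken from the table above, reads the original row labels directly from the last index level of the `nlargest` result:

```python
import pandas as pd

# Hypothetical long-format frame shaped like counts_per_genre_per_month_melted
melted = pd.DataFrame({
    'year': [2016, 2016, 2016, 2016],
    'month': [9, 9, 10, 10],
    'genre': ['rap', 'pop', 'rap', 'edm'],
    'percentage': [0.51938, 0.387597, 0.337731, 0.37467],
})

# nlargest per group returns a Series indexed by (year, month, original row label);
# taking the last level avoids depending on the auto-generated 'level_2' name
top_labels = (
    melted.groupby(['year', 'month'])['percentage']
    .nlargest(1)
    .index.get_level_values(-1)
)
top = melted.loc[top_labels]
print(top)
```

With `nlargest(5)` instead of `nlargest(1)`, the same pattern selects the top 5 per month.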


In [136]:
top_genres_per_month_with_perc.shape

(145, 5)

And we only have 145 rows left, so we can use it with Altair 😎.

In [137]:
#hide
top_genres_per_month_with_perc.to_csv('top_genres_per_month_per_year.csv', index=False)

In the chart below, there is a lot going on. The x-axis shows time, while the y-axis shows the normalized percentages of the top 5 genres: for each month, the top 5 genres' percentages are rescaled so that together they sum to 1. This might be hard to grasp, so I've put the non-normalized version above it to make the difference clear. Some colors are used twice, but there is no built-in color scheme in Altair that supports more than 20 colors, so this will have to do for now 😉. You can hover over the bars to get their details and click on legend items to highlight a genre. 
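What `stack='normalize'` does for each bar can be reproduced in pandas: every segment is divided by the total of its month's top 5. A small sketch, with the September and October 2016 values copied by hand from the top-genres output earlier in this section:

```python
import pandas as pd

# Top-5 rows for two months, copied by hand from the output above
top5 = pd.DataFrame({
    'datetime': pd.to_datetime(['2016-09-01'] * 5 + ['2016-10-01'] * 5),
    'genre': ['rap', 'pop rap', 'hip hop', 'pop', 'indie pop rap',
              'edm', 'pop', 'rap', 'electro house', 'hip hop'],
    'percentage': [0.51938, 0.488372, 0.449612, 0.387597, 0.131783,
                   0.37467, 0.343008, 0.337731, 0.324538, 0.313984],
})

# Divide each genre's percentage by its month's top-5 total,
# which is the height each segment gets in the normalized stacked bar
top5['share'] = top5['percentage'] / top5.groupby('datetime')['percentage'].transform('sum')
print(top5.groupby('datetime')['share'].sum())
```

Before normalizing, the September 2016 top 5 adds up to about 1.98, which is why the stacked bars in the non-normalized chart can rise above 1.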

## Top genres with percentages 📊

In [219]:
#collapse-hide
# Altair 4 API; Altair 5 replaces selection_multi/add_selection with selection_point/add_params
selection = alt.selection_multi(fields=['genre'], bind='legend')

normalized = alt.Chart(top_genres_per_month_with_perc).mark_bar().encode(
    x=alt.X('yearmonth(datetime)', title='Month per year'),
    y=alt.Y('percentage', stack='normalize', title='Normalized percentage (%)'),
    color = alt.Color(
        'genre',
        scale=alt.Scale(
           scheme='tableau20',
        )
    ),
    tooltip=['genre', 'percentage', 'yearmonth(datetime)'],
    order=alt.Order(
      'percentage',
      sort='descending'
    ),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).properties(
    title='Normalized percentage occurences of top 5 genres per month',
    width=MAXWIDTH
).add_selection(
    selection
)

non_normalized = alt.Chart(top_genres_per_month_with_perc).mark_bar().encode(
    x=alt.X('yearmonth(datetime)', title='Month per year'),
    y=alt.Y('percentage', title='Percentage (%)', axis=alt.Axis(format='%')),
    color = alt.Color(
        'genre',
        scale=alt.Scale(
           scheme='tableau20',
        )
    ),
    tooltip=['genre', 'percentage', 'yearmonth(datetime)'],
    order=alt.Order(
      'percentage',
      sort='descending'
    ),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).properties(
    title='Percentage occurences of top 5 genres per month', 
    width=MAXWIDTH
).add_selection(
    selection
)

non_normalized & normalized


There are definitely some interesting things in these plots. We see some consistent regulars that we also saw among the most listened genres overall, such as _rap_, _edm_ and _hip hop_, so that's not a big surprise. 

- **Seasonal effects**: What is quite interesting is to see when the very common genres are not dominating the chart, like in December 2016. In both November and December 2016, I was clearly in a very strong Christmas mood. The top genres in December are _adult standards_ (whatever that may be), _easy listening_, _christmas_ and _lounge_, which definitely are in the same league. We do not see this seasonal effect in 2017 and 2018, although 2018 has a peak in _emo rap_ 🤔. That might be interesting to look at in another blogpost. 
- **Electronic periods**: Something else that stands out is that there are _electronic music_ periods, like June, July and August 2017 and January 2018. However, both _edm_ and _electro house_ are high scorers in almost every month, so I'm definitely a fan in general. But these peak months still stand out. 
- **Rise of Rap**: The last interesting thing is probably that _rap_ and _hip hop_ have almost exclusively been the top 2 from February 2018 to January 2019. This indicates a move away from the more electronic genres towards hip hop. I don't have a clear explanation for this; it might simply be a genuine shift in preference. It will be quite interesting to see how this has progressed in 2019 and 2020. 

## Top genres without percentages 🏆
So we've seen how the genres relate to each other in terms of percentages per month. The charts also show the top genres per month, but that view can definitely still be improved. I really just want a list of the top 5 genres per month, ideally easily readable and pretty close to the example we had from Last.fm. 

As a reminder, that looked like this:

![Your top genres, plotted per week.](images/spotify_analysis/genre-timeline-lastfm.png "Your top genres, plotted per week. Source: Last.fm")


We can get a list of the top genres per month by grouping on year and month and applying `list` to the `genre` column. 

In [142]:
top_genres_per_month = top_genres_per_month_with_perc.groupby(['year',  'month']).genre.apply(list).reset_index()
top_genres_per_month[:2]

| | year | month | genre |
|---|---|---|---|
| 0 | 2016 | 9 | [rap, pop rap, hip hop, pop, indie pop rap] |
| 1 | 2016 | 10 | [edm, pop, rap, electro house, hip hop] |


We then stack these lists into a NumPy array and assign its columns, one by one, to the new dataframe columns `genre_1` to `genre_5`.

In [143]:
genre_array = np.stack(top_genres_per_month.genre.values)
for i, new_col in enumerate([f'genre_{x}' for x in range(1, 6)]):
    top_genres_per_month[new_col] = genre_array[:, i]
top_genres_per_month = top_genres_per_month.drop('genre', axis=1)
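Instead of going through `np.stack`, pandas can expand a column of lists into separate columns directly by passing the lists to the `DataFrame` constructor. A minimal sketch on a hypothetical frame with top-3 lists (the real data has top-5 lists):

```python
import pandas as pd

# Hypothetical frame shaped like top_genres_per_month (one list of genres per month)
top_genres_per_month = pd.DataFrame({
    'year': [2016, 2016], 'month': [9, 10],
    'genre': [['rap', 'pop rap', 'hip hop'], ['edm', 'pop', 'rap']],
})

# One column per rank, built from the lists in a single step
ranked = pd.DataFrame(
    top_genres_per_month['genre'].tolist(),
    columns=[f'genre_{i}' for i in range(1, 4)],
    index=top_genres_per_month.index,
)
top_genres_per_month = top_genres_per_month.drop(columns='genre').join(ranked)
print(top_genres_per_month)
```

A side benefit over `np.stack`, which needs all lists to have the same length, is that the constructor pads shorter lists with missing values.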

This finally gives us the following dataframe. It is pretty much what I wanted, so I'm happy with the result. However, the lack of color still makes this table fairly challenging to interpret. Let's see if we can improve that a bit. 

In [144]:
top_genres_per_month = top_genres_per_month.set_index(['year', 'month']).T
top_genres_per_month

| year | 2016 | 2016 | 2016 | 2016 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | ... | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| month | 9 | 10 | 11 | 12 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 1 |
| genre_1 | rap | edm | edm | adult standards | pop | electro house | pop rap | rap | rap | pop | ... | rap | edm | rap | rap | rap | rap | rap | rap | rap | rap |
| genre_2 | pop rap | pop | pop | easy listening | edm | filter house | rap | pop rap | pop rap | edm | ... | hip hop | rap | pop rap | hip hop | edm | hip hop | hip hop | hip hop | hip hop | hip hop |
| genre_3 | hip hop | rap | adult standards | christmas | rock | dance-punk | edm | hip hop | hip hop | electro house | ... | edm | electro house | hip hop | pop rap | hip hop | pop rap | edm | pop rap | pop rap | pop rap |
| genre_4 | pop | electro house | christmas | lounge | dance pop | electronic | hip hop | conscious hip hop | pop | brostep | ... | pop | hip hop | edm | edm | pop rap | edm | pop rap | pop | pop | edm |
| genre_5 | indie pop rap | hip hop | easy listening | dutch hip hop | tropical house | alternative dance | pop | west coast rap | conscious hip hop | electronic trap | ... | pop rap | pop | pop | electro house | electro house | pop | pop | edm | emo rap | electro house |


In [147]:
#hide
def top_n_genres_per_month_as_list(df, top_n=5):
    """
    From a dataframe with a numerical value for each genre, sort the genres on this value and extract the `top_n` genres. 
    Add these values again to a list of dictionaries, and put it in a dataframe. 
    Only the genres in the global `top_genres_10` are considered.
    """
    rows = []
    for _, row in df.iterrows():
        new_row = {'year': row.year, 'month': row.month}
        # Genres that were not played in a month are NaN; treat them as 0 so the sort is well defined
        genre_values = row[top_genres_10].fillna(0).to_dict()
        top_genres_this_month = sorted(genre_values, key=genre_values.get, reverse=True)[:top_n]
        new_row.update({rank: genre for rank, genre in enumerate(top_genres_this_month)})
        rows.append(new_row)
    return pd.DataFrame(rows)

In [148]:
#hide
top_n_genres_per_month_df = top_n_genres_per_month_as_list(counts_per_genre_per_month)
top_n_genres_per_month_df[['year', 'month']] = top_n_genres_per_month_df[['year', 'month']].astype(int)
top_n_genres_per_month_df = top_n_genres_per_month_df.loc[top_n_genres_per_month_df.year >= 2018]
top_n_genres_per_month_df = top_n_genres_per_month_df.set_index(['year', 'month']).T

To style it, we can use the `style` attribute ([docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html)) of `pd.DataFrame`. This is an easy and super handy way of styling dataframes. It has two main methods: `.applymap` and `.apply`. The first one is applied to each cell individually, while the latter is applied to a whole column (`axis=0`, the default), a whole row (`axis=1`) or the whole table (`axis=None`). That makes `.applymap` well suited for layouts that depend only on a cell's own value, like the genre coloring below, while `.apply` works very well for operations that need the other values in the column or row, like min-max gradients or highlighting the max. 

To use them, we need to define a coloring function to apply to the dataframe. As a parameter, we pass all the unique values. This allows us to create a mapping, as well as determine the number of colors required. The colors are not taken from a standard palette, like the seaborn [color palettes](https://seaborn.pydata.org/tutorial/color_palettes.html), because I could not find a palette there with enough clearly distinguishable colors for the number of unique values we have, which is 26. So I used a tool called [i want hue](https://medialab.github.io/iwanthue/), which generates suitable color palettes. Getting 26 distinct colors was still not easy (nor a great success in my opinion), but it works at least semi-well. 

In [178]:
import seaborn as sns

colors_26 = [
    "#85cec7",
    "#f398d9",
    "#afe084",
    "#90a9f4",
    "#c0c15c",
    "#74aff3",
    "#e4e88b",
    "#d8afec",
    "#64ddab",
    "#f3a281",
    "#52ebd9",
    "#ebabbe",
    "#9de5a0",
    "#a2b8f0",
    "#e6bb6d",
    "#77cdef",
    "#b8c270",
    "#b6bee4",
    "#9ac68a",
    "#4cd1da",
    "#dfc299",
    "#a0ebe5",
    "#c0c38e",
    "#8cbca8",
    "#d8ebb4",
    "#a7e1c1"
]

def color_cells(val, unique_values):
    """
    Takes a cell value and applies coloring depending on the value. Should be applied to a cell, not a row, so use `.applymap`. If the value is unknown, defaults to white. 
    """
    # Multiply with 255 to get into the CSS RGB range (0, 255) instead of (0, 1);
    # round instead of truncating so floating-point error cannot lower a channel by one.
    colors_arr = [tuple(round(y * 255) for y in x) for x in sns.color_palette(colors_26)]
    colormap = [f'rgb{x}' for x in colors_arr]
    colors = {k: v for k, v in zip(unique_values, colormap)}
    color = colors.get(val, 'white')
    return f'background-color: {color}'
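`color_cells` rebuilds the full color mapping for every single cell it styles. A possible alternative, sketched below with a hypothetical helper `make_color_cells`, builds the genre-to-color dictionary once and returns a one-argument function. It also passes the hex strings straight to CSS, which accepts them, so the RGB conversion is not needed.

```python
def make_color_cells(unique_values, palette, default='white'):
    """Return a cell-styling function with a genre -> color mapping built once."""
    # zip stops at the shorter input, so values beyond the palette length fall back to `default`
    colors = dict(zip(unique_values, palette))

    def color_cells(val):
        return f'background-color: {colors.get(val, default)}'

    return color_cells


# Hypothetical usage with three genres and the first three colors of the palette above
color_cells = make_color_cells(['edm', 'pop', 'rap'], ['#85cec7', '#f398d9', '#afe084'])
print(color_cells('pop'))
print(color_cells('jazz'))
```

In the notebook this would be passed as `top_genres_per_month.style.applymap(make_color_cells(unique_top_genres, colors_26))`.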

In [171]:
#hide
color_cells('edm', unique_top_genres)

'background-color: rgb(62, 118, 49)'

In [192]:
unique_top_genres = np.unique(top_genres_per_month)
# Styler.applymap was renamed to Styler.map in pandas 2.1
top_genres_per_month.style.applymap(color_cells, unique_values=unique_top_genres)

| year | 2016 | 2016 | 2016 | 2016 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| month | 9 | 10 | 11 | 12 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 1 |
| genre_1 | rap | edm | edm | adult standards | pop | electro house | pop rap | rap | rap | pop | edm | pop | rap | pop | edm | rap | house | rap | rap | rap | edm | rap | rap | rap | rap | rap | rap | rap | rap |
| genre_2 | pop rap | pop | pop | easy listening | edm | filter house | rap | pop rap | pop rap | edm | electro house | edm | pop rap | rap | electro house | pop | edm | hip hop | hip hop | hip hop | rap | pop rap | hip hop | edm | hip hop | hip hop | hip hop | hip hop | hip hop |
| genre_3 | hip hop | rap | adult standards | christmas | rock | dance-punk | edm | hip hop | hip hop | electro house | pop | electro house | hip hop | edm | rap | pop rap | electro house | pop | pop rap | edm | electro house | hip hop | pop rap | hip hop | pop rap | edm | pop rap | pop rap | pop rap |
| genre_4 | pop | electro house | christmas | lounge | dance pop | electronic | hip hop | conscious hip hop | pop | brostep | brostep | brostep | pop | pop rap | pop rap | hip hop | tech house | pop rap | edm | pop | hip hop | edm | edm | pop rap | edm | pop rap | pop | pop | edm |
| genre_5 | indie pop rap | hip hop | easy listening | dutch hip hop | tropical house | alternative dance | pop | west coast rap | conscious hip hop | electronic trap | electronic trap | electronic trap | edm | electro house | pop | edm | pop | edm | pop | pop rap | pop | pop | electro house | electro house | pop | pop | edm | emo rap | electro house |


Better get the 🚒 cause this table is 🔥. 

This is really close to the Last.fm plot, apart from the lines between points that require 10 years of D3.js experience. We see similar patterns to those in the earlier plot, but we can also see some new insights. Here, we can focus more on the anomalies that are present, like _indie pop rap_, _dutch hip hop_, _filter house_ and _conscious hip hop_. These stand out more in this representation than in the previous one, which focused more on trends. 

**Insights**
- **More electronic peaks**: We can see that February 2017 was also a peak in electronic music, but due to similar colors in the previous plot this was a bit hidden. 
- **Pure hip hop periods**: Furthermore, there are some pure hip hop periods, like April and May 2017, where _edm_ and _electro house_ are not present at all and more specific hip hop genres, like _west coast rap_ and _conscious hip hop_, come through. 

# In conclusion
We did a pretty thorough analysis of my listening history on Spotify. We evaluated my high-level listening behaviour on a monthly and yearly basis. We also looked at my daily listening behaviour and how it has changed throughout the years. Then we took a deep dive into the genres I listen to and certain patterns that are apparent in that data.

Topics for part 2:
1. An analysis of musical features, like energy, danceability and acousticness. Those are numeric values and thus allow for some different visualizations than the discrete values of this blogpost. 
2. A look into skipping behaviour: which songs deserve to be skipped? 
3. Which emo rap songs do I listen to? This is my personal favorite item on the agenda. 

This blogpost has been a huge learning experience for me. It was my first time using Fastpages, my first time writing a blogpost in a Jupyter notebook, and my first time using Altair! All of those experiences were quite positive, and I especially liked getting more familiar with Altair. Having a Grammar of Graphics tool in your toolbelt is extremely valuable in data science, even if you might not use it on a daily basis. 

If you liked this blogpost, don't hesitate to reach out to me on [LinkedIn](https://www.linkedin.com/in/bauke-brenninkmeijer-40143310b) or [Twitter](https://twitter.com/Bauke_B).