# The Radiohead Project - What can we learn about their discography?

I recently discovered this interesting data set (https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download), which contains every studio album song by one of my favorite bands of all time - Radiohead.


If you are interested in how this data was gathered, check this blog:
https://www.thompsonanalytics.com/blog/fitter-happier/


And now, let's have some fun with it and learn something new about this band, shall we?!

# Data Preparation

In [353]:
#Loading some libraries first

# Data

import numpy as np
import pandas as pd 

#Plots

import plotly.express as px 
import matplotlib.pyplot as plt

#Embedding plots
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

In [354]:
#Loading the data

df_radiohead = pd.read_csv(r'C:\Users\tobia\OneDrive\Desktop\radiohead.csv',
                        sep=',',
                        index_col=False,
                        skipinitialspace=True,
                        encoding='cp1252')

df_radiohead['amount_sad_words'] = round(df_radiohead['word_count'] * df_radiohead['pct_sad'])
df_radiohead

Unnamed: 0,track_name,valence,duration_ms,lyrics,album_name,album_release_year,album_img,pct_sad,word_count,lyrical_density,gloom_index,amount_sad_words
0,You,0.3050,208667,you are the sun and moon and stars are you and...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0000,19,0.091054,50.39,0.0
1,Creep,0.0960,238640,when you were here before couldn't look you in...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0784,51,0.213711,22.60,4.0
2,How Do You?,0.2640,132173,he's bitter and twisted he knows what he wants...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0952,21,0.158883,36.56,2.0
3,Stop Whispering,0.2790,325627,and the wise man said i don't want to hear you...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0435,46,0.141266,43.48,2.0
4,Thinking About You,0.4190,161533,been thinking about you your records are here ...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0000,39,0.241437,60.80,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Identikit,0.3540,266644,repeated background hook a moon shaped pool da...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.1800,100,0.375032,32.25,18.0
97,The Numbers,0.0545,345887,it holds us like a phantom it touches like a b...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.0455,44,0.127209,22.82,2.0
98,Present Tense,0.3450,306581,this dance this dance is like a weapon is like...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.1795,39,0.127209,35.56,7.0
99,Tinker Tailor Soldier Sailor Rich Man Poor Man...,0.0517,303689,all the holes at once are coming alive set fre...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.1154,26,0.085614,15.80,3.0


In [355]:
#Checking the database

df_radiohead.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   track_name          101 non-null    object 
 1   valence             101 non-null    float64
 2   duration_ms         101 non-null    int64  
 3   lyrics              98 non-null     object 
 4   album_name          101 non-null    object 
 5   album_release_year  101 non-null    int64  
 6   album_img           101 non-null    object 
 7   pct_sad             101 non-null    float64
 8   word_count          101 non-null    int64  
 9   lyrical_density     101 non-null    float64
 10  gloom_index         101 non-null    float64
 11  amount_sad_words    101 non-null    float64
dtypes: float64(5), int64(3), object(4)
memory usage: 9.6+ KB


We see that there are 3 NaNs in the lyrics column. To check the validity of this data set, we should examine which tracks these are:

In [356]:
#Checking for NaNs
df_radiohead.loc[df_radiohead.loc[:,'lyrics'].isna()]

Unnamed: 0,track_name,valence,duration_ms,lyrics,album_name,album_release_year,album_img,pct_sad,word_count,lyrical_density,gloom_index,amount_sad_words
40,Treefingers,0.0585,222600,,Kid A,2000,https://i.scdn.co/image/0a6b1c237ab9f7d7da0a04...,0.0,0,0.0,27.87,0.0
46,Untitled,0.0782,52695,,Kid A,2000,https://i.scdn.co/image/0a6b1c237ab9f7d7da0a04...,0.0,0,0.0,29.67,0.0
55,Hunting Bears,0.0779,121200,,Amnesiac,2001,https://i.scdn.co/image/7d2a9481f3136f8f9dda19...,0.0,0,0.0,29.64,0.0


We see that only true instrumentals come up, which is fine. They do indeed have no words, but they must stay in as part of the recorded albums. The data set is correct. It will be interesting to see how these interlude-like songs influence the scores and stats.

Nerdy side note: 

*Technically, "Untitled" wasn't even its own song on the initial release of Kid A, but a hidden track that is part of Motion Picture Soundtrack. Spotify split it off when putting the album on the platform. However, we will treat it as its own song in this analysis.*
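
The NaN check above can be taken one small step further: tracks without lyrics should also report a word count of zero. The following is a minimal sketch on a hypothetical mini-sample (`sample` stands in for `df_radiohead`; the values are copied from the rows shown above, and the lyric text for 'Creep' is just the truncated preview):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample, values copied from the outputs shown above
sample = pd.DataFrame({
    'track_name': ['Creep', 'Treefingers', 'Untitled', 'Hunting Bears'],
    'lyrics': ['when you were here before ...', np.nan, np.nan, np.nan],
    'word_count': [51, 0, 0, 0],
})

# Rows with missing lyrics
instrumentals = sample[sample['lyrics'].isna()]

# Every track without lyrics should have zero words counted
all_zero = bool((instrumentals['word_count'] == 0).all())
print(instrumentals['track_name'].tolist(), all_zero)
```

On the full data frame the same two lines would work with `df_radiohead` in place of `sample`.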

In [357]:
#Get an overview over the general variables and their distributions

df_radiohead.describe()

Unnamed: 0,valence,duration_ms,album_release_year,pct_sad,word_count,lyrical_density,gloom_index,amount_sad_words
count,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0
mean,0.314836,252550.950495,2002.049505,0.065445,44.693069,0.185117,44.171782,3.178218
std,0.218394,60278.914774,7.058861,0.075814,24.945437,0.131948,21.518357,4.104622
min,0.0378,52695.0,1993.0,0.0,0.0,0.0,1.0,0.0
25%,0.131,217800.0,1997.0,0.0,28.0,0.115741,27.87,0.0
50%,0.272,257480.0,2001.0,0.0417,42.0,0.158883,40.71,2.0
75%,0.473,290213.0,2007.0,0.0968,57.0,0.234926,59.71,4.0
max,0.848,387213.0,2016.0,0.3571,126.0,1.031253,100.0,20.0


In [358]:
#Converting the duration of the songs to minutes

df_radiohead['duration_mins'] = df_radiohead['duration_ms'] / 60000

# EDA

## Albums overview

We can start with the basics. Up to this point Radiohead has released 9 studio albums. Let's check them out and see how the band has progressed over time.

### Hard facts #1 - Number of Tracks



In [359]:
#Plot Number of Tracks
fig = px.bar(df_radiohead, x = 'album_name', color = 'album_name', hover_name= 'track_name',
            labels={'album_name':'Album Name', 'duration_mins': 'Duration in Minutes', 'track_name':'Track Name', 'count':'Number of Tracks'},
            title= '<b>Radiohead Albums by Number of Tracks</b> <br>It seems like albums have been getting shorter, simply going by number of songs.',
            template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.07,
                    y= -0.7)

fig.show()

While they started out with a fairly consistent output of 12 tracks per album, their later work tended to feature fewer tracks, with the exception of 'Hail To the Thief'.

### Hard facts #2 - Duration of the Albums

But did their albums really contain less music?

In [360]:
#Plot Duration of Albums
fig = px.bar(df_radiohead, x = 'album_name',y= 'duration_mins', color = 'album_name', hover_name= 'track_name',
            labels={'album_name':'Album Name', 'duration_mins': 'Duration in Minutes', 'track_name':'Track Name', 'count':'Number of Tracks'},
            title= '<b>Radiohead Albums by Duration of Tracks</b> <br>Going by actual track duration the picture looks a little different.',
            template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.07,
                    y= -0.7)

fig.show()

Bringing track duration into the mix, 'Hail To the Thief' remains the longest album of them all, simply due to the sheer number of tracks. But 'OK Computer' and 'A Moon Shaped Pool' come close.
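
The stacked bars above add up one segment per track. If you prefer the totals as numbers, a groupby gives the same information. This is only a sketch on a hypothetical mini-sample (a handful of rows copied from the table at the top, so the totals here are not the real album totals):

```python
import pandas as pd

# Hypothetical mini-sample: durations (ms) copied from rows shown earlier
sample = pd.DataFrame({
    'album_name': ['Pablo Honey', 'Pablo Honey', 'Pablo Honey',
                   'A Moon Shaped Pool', 'A Moon Shaped Pool'],
    'track_name': ['You', 'Creep', 'How Do You?', 'Identikit', 'The Numbers'],
    'duration_ms': [208667, 238640, 132173, 266644, 345887],
})
sample['duration_mins'] = sample['duration_ms'] / 60000

# One row per album: number of tracks and summed running time in minutes
album_totals = (
    sample.groupby('album_name', sort=False)
          .agg(n_tracks=('track_name', 'count'),
               total_mins=('duration_mins', 'sum'))
)
print(album_totals.round(2))
```

Run against `df_radiohead`, this would return the values behind both bar charts in one table.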

### Hard facts #3 - Temporal Profiles of the Albums

Radiohead always treated their album sequencing with great care to create a holistic experience for the listener. Can we find some patterns in the data to learn a little more about their preferences?



In [361]:
#Plot Temporal Profiles
fig = px.bar(df_radiohead, y= 'duration_mins', color = 'album_name', hover_name= 'track_name',
            labels={'album_name':'Album Name', 'duration_mins': 'Duration in Minutes', 'track_name':'Track Name', 'count':'Number of Tracks'},
            title= '<b>The Temporal Profiles of All Radiohead Albums </b> <br>How about those valleys in the profiles?',
            template='plotly_dark', facet_col= 'album_name', facet_col_wrap=3, height=600, width=1500,)

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.07,
                    y= -0.7)

fig.show()

Looking at these profiles, we can see that the band loves to put in a tempo-change song more often than not. On 6 out of 9 records, Radiohead inserted a sub-three-minute track somewhere between the halfway and the two-thirds mark to alter the mood of the musical flow. It is an interesting stylistic choice.
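
To find these valleys without eyeballing the plot, one could filter for short tracks and note where they sit in the running order. The helper below is a hypothetical sketch (`short_tracks` is not part of the original notebook). It assumes the rows of each album are stored in track order, which seems to be the case in this data set but is not guaranteed:

```python
import pandas as pd

def short_tracks(df, max_mins=3.0):
    """Return tracks shorter than max_mins with their position on the album.

    Assumption: rows appear in album running order.
    """
    out = df.copy()
    out['duration_mins'] = out['duration_ms'] / 60000
    # 1-based track number within each album
    out['track_no'] = out.groupby('album_name', sort=False).cumcount() + 1
    # Position relative to the album length (1.0 = last track)
    album_size = out.groupby('album_name', sort=False)['track_name'].transform('count')
    out['rel_position'] = out['track_no'] / album_size
    cols = ['album_name', 'track_name', 'duration_mins', 'track_no', 'rel_position']
    return out.loc[out['duration_mins'] < max_mins, cols]

# Mini-sample: the first five 'Pablo Honey' rows shown at the top
sample = pd.DataFrame({
    'album_name': ['Pablo Honey'] * 5,
    'track_name': ['You', 'Creep', 'How Do You?', 'Stop Whispering', 'Thinking About You'],
    'duration_ms': [208667, 238640, 132173, 325627, 161533],
})
result = short_tracks(sample)
print(result)
```

Keep in mind that `rel_position` is only meaningful when the whole album is passed in. On this five-row sample it is relative to five tracks, not to the full record.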

### Hard facts #4 - Number of words

Lead songwriter Thom Yorke writes lyrics that are often haunting but beautiful. By looking at the number of words in the individual tracks, we can see whether the amount the band actually had to say or sing has changed at all:

In [362]:
#Plot Number of Words
fig = px.bar(df_radiohead, x = 'album_name',y= 'word_count', color = 'album_name', hover_name= 'track_name',
            labels={'album_name':'Album Name', 'duration_mins': 'Duration in Minutes', 'track_name':'Track Name', 'count':'Number of Tracks', 'word_count':'Number of Words'},
            title= '<b>Radiohead Albums by Number of Words</b> <br>The amount of lyrics over time has been fairly inconsistent.',
            template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.07,
                    y= -0.7)

fig.show()

It's interesting to see that their more electronic and less rock-heavy albums 'Kid A' and 'Amnesiac' offer the fewest lyrics. Next to these, 'The King of Limbs', their shortest work in terms of both track count and duration, is also among the albums with the fewest lyrics. 'Hail To the Thief', regarded as their most political work, seems to have the most to say.

Nerdy side note:

*Spotting the iconic 'OK Computer' track 'Fitter Happier' (featuring a robot voice as a narrator) is rather easy looking at this plot.*

### Soft facts #1 - Album Valence

Valence is a metric Spotify has come up with. In their documentation they define this variable in the following way:

*A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).*

Hence, the metric scores the music itself in terms of sentiment. By determining the combined mean valence for the albums, we can compare them in musical positiveness (or rather negativeness in the case of Radiohead?).

In [363]:
#Data Prep
df_radiohead_soft_facts = df_radiohead.groupby(['album_name', 'album_release_year'], observed= True, as_index= False).agg(mean_valence = ('valence', 'mean'), mean_pct_sad = ('pct_sad', 'mean'), total_sad_words = ('amount_sad_words', 'sum'))
df_radiohead_soft_facts = df_radiohead_soft_facts.sort_values('album_release_year')
df_radiohead_soft_facts


Unnamed: 0,album_name,album_release_year,mean_valence,mean_pct_sad,total_sad_words
6,Pablo Honey,1993,0.315583,0.042858,16.0
7,The Bends,1995,0.334258,0.096892,56.0
5,OK Computer,1997,0.286167,0.08395,52.0
4,Kid A,2000,0.270427,0.0264,10.0
1,Amnesiac,2001,0.209755,0.033564,21.0
2,Hail To the Thief,2003,0.395971,0.071986,50.0
3,In Rainbows,2007,0.39309,0.05814,24.0
8,The King Of Limbs,2011,0.446738,0.093875,42.0
0,A Moon Shaped Pool,2016,0.203264,0.084155,50.0


In [364]:
#Plot Valence
fig = px.bar(df_radiohead_soft_facts, x = 'album_name',y= 'mean_valence', color = 'album_name',
            labels={'album_name':'Album Name', 'mean_valence':'Mean Valence'},
            title= '<b>Radiohead Albums by Musical Positiveness (Valence)</b> <br>In their album run from 2003 to 2011 Radiohead put forward greater musical positiveness than usual.',
            template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.07,
                    y= -0.7)

#Add little annotation
fig.add_annotation(x=6, y=0.5,
            text="Was this their happy period?",
            showarrow=True,
            arrowhead=1,
            bordercolor='White',
            font_color = 'Black',
            bgcolor='white')

fig.show()

We can see why Radiohead is not known for particularly cheerful music. None of their albums reaches a mean valence of 0.5, meaning that all of them must be categorized as rather negative in their musical sentiment. However, the album run from 'Hail To the Thief' via 'In Rainbows' to 'The King Of Limbs' sticks out. In terms of musical positiveness, this could be considered their 'cheerful period'. 'A Moon Shaped Pool' scoring the lowest here also makes sense: the production of the record was heavily colored by Thom Yorke's separation from Rachel Owen, his partner of almost 25 years.

### Soft facts #2 - Album Sad Lyrics - Relative

The data set also includes the variable 'pct_sad', which measures what percentage of the lyrics in a particular song can be considered sad. So after the tonal analysis of the songs' notes and arrangements, we can also check whether the words that go with them mirror the sentiment of the tracks.

In [365]:
#Plot Relative Sadness
fig = px.bar(df_radiohead_soft_facts, x = 'album_name',y= 'mean_pct_sad', color = 'album_name',
            labels={'album_name':'Album Name', 'mean_pct_sad':'Mean Percentage of Sad Lyrics'},
            title= '<b>Radiohead Albums by Mean Percentage of Sad Lyrics</b> <br> "The Bends" and "The King Of Limbs" have the highest share of sad lyrics.',
            template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.07,
                    y= -0.7)

fig.show()

This is an interesting result. 'The King of Limbs' takes the most cheerful musical approach (highest mean valence), but at the same time has the second-highest share of sad words in its lyrics, just behind 'The Bends'. Music and lyrics kind of balance each other out here. The same is true for 'The Bends' to a lesser extent. 'A Moon Shaped Pool', however, seems to stay true to itself: it combines a rather negative musical tone with sad lyrics.

### Soft facts #3 - Album Sad Lyrics - Absolute


And yet, we need to keep in mind that this is only a relative approach. We can also look at this from an absolute angle and ask which album contains the most sad words in total (also taking lyrical density into account).

In [366]:
#Plot Absolute Sadness
fig = px.bar(df_radiohead_soft_facts, x = 'album_name',y= 'total_sad_words', color = 'album_name',
            labels={'album_name':'Album Name', 'total_sad_words':'Total Amount of Sad Words'},
            title= '<b>Radiohead Albums by Total Amount of Sad Lyrics</b> <br> "The Bends" is still on top, but "Hail To the Thief" makes a huge jump due to its lyrical density.',
            template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.055,
                    y= -0.7)

fig.show()

'The Bends' and 'OK Computer' both stay at the top of the list here. 'A Moon Shaped Pool' also holds its own. However, 'The King Of Limbs' slides a bit, since it is Radiohead's shortest album with comparatively few words sung. Looking at the albums from an absolute perspective, the 14-track 'monster release' 'Hail To the Thief' takes a huge leap in the other direction for the same reason: a larger number of songs and lyrics simply offers more opportunities to use sad words.
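
The link between the relative and the absolute view is plain arithmetic: the number of sad words is the word count times the share of sad words, rounded to whole words. Here is a small worked example on a mini-sample (the `sample` frame is illustrative, with values copied from rows shown in this notebook):

```python
import pandas as pd

# Mini-sample: word counts and sad-word shares copied from the outputs above
sample = pd.DataFrame({
    'track_name': ['Creep', 'Give Up The Ghost', 'Identikit', 'True Love Waits'],
    'word_count': [51, 62, 100, 42],
    'pct_sad': [0.0784, 0.2742, 0.1800, 0.2381],
})

# Absolute sad words = words sung * share of sad words, rounded to whole words
sample['amount_sad_words'] = (sample['word_count'] * sample['pct_sad']).round()
print(sample)
```

'Identikit' has a lower share of sad words than 'True Love Waits' (18% vs. roughly 24%) but ends up with more sad words in total, simply because it has more than twice as many words.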

## Can we possibly combine these soft approaches to quantify the most melancholy record?

### The Triangulation of Sadness

As we have seen, to determine the sadness or negativeness of a track we can look at it from a musical and a lyrical perspective. And there it makes sense to differentiate between relative and absolute approaches. But what if we want to combine the three concepts of valence, percentage of sad words and lyrical density?

We can try to do this graphically and put all songs in a ternary plot:


In [367]:
#Plot Ternary Plot
fig = px.scatter_ternary(df_radiohead, a = 'pct_sad', b = 'lyrical_density', c = 'valence', color = 'album_name', 
            labels={'album_name':'Album Name', 'pct_sad':'Percentage of Sad Words', 'valence':'Valence', 'lyrical_density':'Lyrical Density'},
            title= '<b>Radiohead Songs - A Triangulation of Sadness</b>',
            template='plotly_dark', hover_name= 'track_name')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.025,
                    y= -0.5)

#Add Little Annotation
fig.add_annotation(x=0.425, y=0.6,
            text="'True Love Waits' - their saddest song?",
            showarrow=True,
            arrowhead=1,
            bordercolor='White',
            font_color = 'Black',
            bgcolor='white')


fig.show()

Looking at this plot we can see:

* There are quite a few happy songs in the lower right corner. They have high musical positivity and contain little to no sad words.

* The bulk of Radiohead's songs can be found in a zone where less than 20% of the lyrics are sad, with varying degrees of general lyrical density and musical valence. How sad they feel can depend on the eye of the beholder.

* (Remember: *Taking this 20% threshold here is kind of special. We are basically saying that a song is not sad if fewer than one in five words sung has a negative sentiment. For more upbeat artists this threshold might be much lower, triggering the sad label much earlier.*)

* All songs floating towards the upper left can be considered Radiohead's sad tracks. Their musical tone is rather depressing or angry. Same goes for the lyrics that contain more and more bitter, melancholy or sorrowful bits. 

* Checking this graph we have clear contenders for happiest ('Hunting Bears', 'Feral') and saddest songs ('True Love Waits', 'Motion Picture Soundtrack').
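
When reading the ternary plot, it helps to remember that a point's position depends only on the relative shares of the three values, not on their absolute size. The sketch below reproduces that normalization by hand for a few tracks (values copied from the tables in this notebook; `sample` and `shares` are illustrative names):

```python
import pandas as pd

# Mini-sample: the three ternary inputs for a few tracks
sample = pd.DataFrame({
    'track_name': ['True Love Waits', 'Creep', 'Feral', 'Hunting Bears'],
    'pct_sad': [0.2381, 0.0784, 0.0, 0.0],
    'lyrical_density': [0.148167, 0.213711, 0.025941, 0.0],
    'valence': [0.0378, 0.0960, 0.7510, 0.0779],
}).set_index('track_name')

cols = ['pct_sad', 'lyrical_density', 'valence']

# Each row is divided by its own sum, so the three shares add up to 1
shares = sample[cols].div(sample[cols].sum(axis=1), axis=0)
print(shares.round(3))
```

This also explains a quirk of the plot: the instrumental 'Hunting Bears' has a very low valence (0.0779), but since its other two values are zero, valence makes up 100% of its total and the track lands right in the 'happy' corner. Instrumentals are therefore better judged by their valence alone.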


### The Gloom Index

Data scientist Charlie Thompson came up with an interesting way to translate this graphical approach into a number: the gloom index, which incorporates musical and lyrical tone while also taking lyrical density into account.

As a formula he used this:

Gloom Index = (1 - ((1 - Valence) + (Percentage of sad words * (1 + Lyrical Density))))/2

He then rescaled the metric to range from 1 to 100. This spin makes comparing the entire Radiohead catalogue even easier: the saddest song has a score of 1 and the least sad track a rating of 100. Every other song falls somewhere between these two extremes.
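
To make the formula more tangible, here is a minimal sketch of how such a score could be computed with pandas. The function name `gloom_index` and the min-max rescaling are my own assumptions about the rescaling step. In this data set, `lyrical_density` appears to be words per second (for 'You': 19 words / 208.667 s ≈ 0.091).

```python
import pandas as pd

def gloom_index(df, lo=1.0, hi=100.0):
    """Raw gloom score as in the formula above, min-max rescaled to [lo, hi].

    Assumption: the rescaling is a plain linear min-max transformation.
    """
    raw = (1 - ((1 - df['valence'])
                + df['pct_sad'] * (1 + df['lyrical_density']))) / 2
    scaled = (raw - raw.min()) / (raw.max() - raw.min())
    return (lo + scaled * (hi - lo)).round(2)

# Mini-sample: values copied from the tables in this notebook
sample = pd.DataFrame({
    'track_name': ['True Love Waits', 'Creep', 'Feral'],
    'valence': [0.0378, 0.0960, 0.7510],
    'pct_sad': [0.2381, 0.0784, 0.0],
    'lyrical_density': [0.148167, 0.213711, 0.025941],
}).set_index('track_name')

gloom = gloom_index(sample)
print(gloom)
```

Because the rescaling depends on the minimum and maximum of whatever tracks are passed in, the scores for this three-song sample differ from the `gloom_index` column in the data set, which was rescaled over the whole catalogue. The ordering stays the same, though: lower means sadder.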

A quick look at the distribution across all tracks helps us see whether this metric does a good job of evaluating our sample:

In [368]:
#Data Prep
df_radiohead_gloom_index = df_radiohead.sort_values('gloom_index')
df_radiohead_gloom_index

Unnamed: 0,track_name,valence,duration_ms,lyrics,album_name,album_release_year,album_img,pct_sad,word_count,lyrical_density,gloom_index,amount_sad_words,duration_mins
100,True Love Waits,0.0378,283464,i’ll drown my beliefs to have your babies i’ll...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.2381,42,0.148167,1.00,10.0,4.724400
88,Give Up The Ghost,0.1570,290067,don't haunt me don't hurt me don't haunt me ga...,The King Of Limbs,2011,https://i.scdn.co/image/4a93b23fa39b39a4050a95...,0.2742,62,0.213744,6.46,17.0,4.834450
45,Motion Picture Soundtrack,0.0425,200483,red wine and sleeping pills help me get back t...,Kid A,2000,https://i.scdn.co/image/0a6b1c237ab9f7d7da0a04...,0.1667,24,0.119711,9.35,4.0,3.341383
28,Let Down,0.1310,299560,transport motorways and tramlines starting and...,OK Computer,1997,https://i.scdn.co/image/f89c1ecdd0cc5a23d5ad73...,0.1875,64,0.213647,13.70,12.0,4.992667
48,Pyramid Song,0.0655,288733,i jumped in the river and what did i see black...,Amnesiac,2001,https://i.scdn.co/image/7d2a9481f3136f8f9dda19...,0.1364,44,0.152390,14.15,6.0,4.812217
...,...,...,...,...,...,...,...,...,...,...,...,...,...
62,Go To Sleep,0.6540,201507,something for the rag and bone man over my dea...,Hail To the Thief,2003,https://i.scdn.co/image/5ded47fd3d05325dd0faaf...,0.0000,37,0.183616,82.28,0.0,3.358450
80,Jigsaw Falling Into Place,0.8180,248893,just as you take my hand just as you write my ...,In Rainbows,2007,https://i.scdn.co/image/00d97c99f9fb5872e9a44f...,0.1077,65,0.261156,84.85,7.0,4.148217
73,Bodysnatchers,0.7210,242293,i do not understand what it is i've done wrong...,In Rainbows,2007,https://i.scdn.co/image/00d97c99f9fb5872e9a44f...,0.0000,52,0.214616,88.40,0.0,4.038217
85,Feral,0.7510,192743,you are not mine and i am not yours and that's...,The King Of Limbs,2011,https://i.scdn.co/image/4a93b23fa39b39a4050a95...,0.0000,5,0.025941,91.14,0.0,3.212383


In [369]:
#Plot Gloom Index
fig = px.violin(df_radiohead_gloom_index, y = 'gloom_index', hover_name= 'track_name',
            labels={'gloom_index': 'Gloom Index'}, box = True, points= 'all', title = 'Distribution of the Gloom Index', template='plotly_dark')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.05,
                    y= -0.5)


fig.show()

We can see that, even when the band's songs are only compared with each other, the distribution is not even. The majority of their repertoire sits on the less cheerful side, which checks out with the general feel of their music. For further reference, it is important to remember that their median song scores 40.7.

## So, what album is the gloomiest?

By averaging out the gloom scores by LP we can find the saddest album of the band:

In [370]:
#Data Prep
df_radiohead_gloom_index = df_radiohead.groupby(['album_name']).agg(gloom_mean = ('gloom_index', 'mean'))
df_radiohead_combined =  pd.merge(df_radiohead, df_radiohead_gloom_index, on='album_name')
df_radiohead_combined

Unnamed: 0,track_name,valence,duration_ms,lyrics,album_name,album_release_year,album_img,pct_sad,word_count,lyrical_density,gloom_index,amount_sad_words,duration_mins,gloom_mean
0,You,0.3050,208667,you are the sun and moon and stars are you and...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0000,19,0.091054,50.39,0.0,3.477783,46.851667
1,Creep,0.0960,238640,when you were here before couldn't look you in...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0784,51,0.213711,22.60,4.0,3.977333,46.851667
2,How Do You?,0.2640,132173,he's bitter and twisted he knows what he wants...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0952,21,0.158883,36.56,2.0,2.202883,46.851667
3,Stop Whispering,0.2790,325627,and the wise man said i don't want to hear you...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0435,46,0.141266,43.48,2.0,5.427117,46.851667
4,Thinking About You,0.4190,161533,been thinking about you your records are here ...,Pablo Honey,1993,https://i.scdn.co/image/e17011b2aa33289dfa6c08...,0.0000,39,0.241437,60.80,0.0,2.692217,46.851667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Identikit,0.3540,266644,repeated background hook a moon shaped pool da...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.1800,100,0.375032,32.25,18.0,4.444067,31.927273
97,The Numbers,0.0545,345887,it holds us like a phantom it touches like a b...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.0455,44,0.127209,22.82,2.0,5.764783,31.927273
98,Present Tense,0.3450,306581,this dance this dance is like a weapon is like...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.1795,39,0.127209,35.56,7.0,5.109683,31.927273
99,Tinker Tailor Soldier Sailor Rich Man Poor Man...,0.0517,303689,all the holes at once are coming alive set fre...,A Moon Shaped Pool,2016,https://i.scdn.co/image/0d1460c036897175f4631e...,0.1154,26,0.085614,15.80,3.0,5.061483,31.927273


In [371]:
#Plot Saddest Album
fig = px.scatter(df_radiohead_combined, x = 'album_name',y= 'gloom_index', color = 'album_name',
            labels={'album_name':'Album Name', 'gloom_index':'Gloom Index'},
            title= '<b>Radiohead Albums by Gloom Index</b> <br>And we have a winner: "A Moon Shaped Pool" is the saddest album.',
            template='plotly_dark', hover_name= 'track_name')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -0.025,
                    y= -0.7)

#Add Line For Means
fig.add_scatter(x= df_radiohead_combined['album_name'], y= df_radiohead_combined['gloom_mean'], mode='markers, lines', showlegend= False, text='gloom_mean')
fig.update_traces(textposition="bottom right")

fig.show()

'A Moon Shaped Pool' (31.9) has by far the lowest mean gloom value, meaning it comes in as the saddest Radiohead album. It is a close call for rank #2, but 'Amnesiac' (38.0) edges out 'OK Computer' (39.2) by a hair. Interestingly, the run of 'Hail To the Thief', 'In Rainbows' and 'The King Of Limbs' again comes up as possibly the band's happy period: these albums post the highest mean gloom values, which makes them the happiest works of the band to date. The five most positive songs of Radiohead can be found on these three records.
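
As a side note on the data prep: the per-album mean can also be attached to every track without a separate merge, and the ranking itself is a one-liner. This is just an alternative sketch on a hypothetical mini-sample (three tracks each from two albums, so the means differ from the full-album values quoted above):

```python
import pandas as pd

# Mini-sample: gloom scores copied from rows shown in this notebook
sample = pd.DataFrame({
    'album_name': ['Pablo Honey'] * 3 + ['A Moon Shaped Pool'] * 3,
    'track_name': ['You', 'Creep', 'How Do You?',
                   'Identikit', 'The Numbers', 'Present Tense'],
    'gloom_index': [50.39, 22.60, 36.56, 32.25, 22.82, 35.56],
})

# Broadcast each album's mean back onto its tracks (no merge needed)
sample['gloom_mean'] = sample.groupby('album_name')['gloom_index'].transform('mean')

# Albums ranked from gloomiest (lowest mean) to least gloomy
ranking = sample.groupby('album_name')['gloom_index'].mean().sort_values()
print(ranking.round(2))
```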

## Let's translate this work into playlists

With this lengthy look at the band's discography, we can try to translate our findings into playlists that could offer you numbers-based song suggestions depending on your mood.

If you are looking for a gut-wrenching time full of heartache, sorrow and despair, the data would suggest these ten songs:




In [372]:
#Data Prep
df_radiohead_sad_playlist = df_radiohead.sort_values('gloom_index').head(10)
df_radiohead_sad_playlist = df_radiohead_sad_playlist.sort_values('gloom_index', ascending=False)

#Plot Sad Playlist
fig = px.scatter(df_radiohead_sad_playlist, x= 'gloom_index', y = 'track_name',
            labels={'track_name':'Track Name', 'gloom_index':'Gloom Index'},
            title= '<b>The Ultimate Radiohead Playlist</b><br>The Numbers-Based Sadness Edition',
            template='plotly_dark', hover_name= 'track_name')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -1.03,
                    y= -0.5)
fig.show()

If you are looking for more positivity and also danceability, then the data points you this way. According to the numbers, these are the least sad songs of the band:



In [373]:
#Data Prep
df_radiohead_sad_playlist = df_radiohead.sort_values('gloom_index', ascending= False).head(10)
df_radiohead_sad_playlist = df_radiohead_sad_playlist.sort_values('gloom_index')

#Plot Least Sad Playlist
fig = px.scatter(df_radiohead_sad_playlist, x= 'gloom_index', y = 'track_name', color_discrete_sequence= ['yellow'],
            labels={'track_name':'Track Name', 'gloom_index':'Gloom Index'},
            title= '<b>The Ultimate Radiohead Playlist</b><br>The Numbers-Based Least Sad Edition',
            template='plotly_dark', hover_name= 'track_name')

#Add Signature
fig.add_annotation(text='Data Source: <a href="https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download">Radiohead Data</a> <br>Viz: <a href="https://twitter.com/_prospecttheory">@_prospecttheory</a> <br> <br> <br> <br> <br>',
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x= -1.03,
                    y= -0.5)
fig.show()

However, be careful. Let's just say the computer made a few interesting choices here. These tracks won't be a joyous love fest free of any challenging thoughts - it is Radiohead after all. Enjoy them anyway!
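
If you just want the track lists without the plots, pandas offers a shortcut for this kind of top-n selection. Here is a small sketch on a mini-sample (the nine tracks and scores are copied from the sorted gloom table further up; `n=3` is only for illustration, the playlists above use ten):

```python
import pandas as pd

# Mini-sample: tracks and gloom scores from the sorted table shown earlier
sample = pd.DataFrame({
    'track_name': ['True Love Waits', 'Give Up The Ghost', 'Motion Picture Soundtrack',
                   'Let Down', 'Pyramid Song', 'Go To Sleep',
                   'Jigsaw Falling Into Place', 'Bodysnatchers', 'Feral'],
    'gloom_index': [1.00, 6.46, 9.35, 13.70, 14.15, 82.28, 84.85, 88.40, 91.14],
})

# Lowest scores = saddest tracks, highest scores = least sad tracks
saddest = sample.nsmallest(3, 'gloom_index')['track_name'].tolist()
least_sad = sample.nlargest(3, 'gloom_index')['track_name'].tolist()
print(saddest)
print(least_sad)
```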