# The Radiohead Project - What can we learn about their discography?

I recently discovered this interesting data set (https://www.kaggle.com/datasets/lavagod/radiohead/versions/1?resource=download), containing all studio album songs of one of my favorite bands of all time - Radiohead.


If you are interested in how this data was gathered, check this blog article: https://www.thompsonanalytics.com/blog/fitter-happier/


In this notebook I want to see whether I can reach similar results using Python.


And now, let's have some fun with it and learn something new about this band, shall we?!

# Data Preparation

In [129]:
#Loading some libraries first

# Data

import numpy as np
import pandas as pd 

#Plots

import plotly.express as px 
import matplotlib.pyplot as plt


In [130]:
#Loading the data

url = 'https://raw.githubusercontent.com/TobiasBergerData/The-Radiohead-Project/main/radiohead_with_sentiment.csv'
#url = 'https://raw.githubusercontent.com/TobiasBergerData/The-Radiohead-Project/main/radiohead_with_sentiment_spacy.csv'
#url = 'https://raw.githubusercontent.com/TobiasBergerData/The-Radiohead-Project/main/radiohead_with_sentiment_gensim.csv'
#url = 'https://raw.githubusercontent.com/TobiasBergerData/The-Radiohead-Project/main/radiohead_with_sentiment_scikitlearn.csv'

df_radiohead = pd.read_csv( url,
                        sep=',',
                        index_col=False,
                        skipinitialspace=True,
                        encoding='cp1252',
                        )

In [131]:
#Checking the data

df_radiohead.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Artist                101 non-null    object 
 1   Album                 101 non-null    object 
 2   Track                 101 non-null    object 
 3   TrackNumber_On_Album  101 non-null    int64  
 4   Duration_ms           101 non-null    int64  
 5   ReleaseYear           101 non-null    int64  
 6   Valence               101 non-null    float64
 7   Energy                101 non-null    float64
 8   Danceability          101 non-null    float64
 9   Acousticness          101 non-null    float64
 10  Instrumentalness      101 non-null    float64
 11  Speechiness           101 non-null    float64
 12  Liveness              101 non-null    float64
 13  Tempo                 101 non-null    float64
 14  Key                   101 non-null    int64  
 15  Mode                  1

We see that there are 3 NaNs in the Lyrics column. To check the validity of this data set, we should examine which tracks these are:

In [132]:
#Checking for NaNs
df_radiohead.loc[df_radiohead.loc[:,'Lyrics'].isna()]

Unnamed: 0,Artist,Album,Track,TrackNumber_On_Album,Duration_ms,ReleaseYear,Valence,Energy,Danceability,Acousticness,...,Liveness,Tempo,Key,Mode,TimeSignature,Lyrics,Total_Words,Sad_Words,Pct_Sad_Words,Lyrical_Density
40,Radiohead,Kid A,Treefingers,5,222600,2000,0.0577,0.146,0.165,0.827,...,0.109,134.508,6,1,3,,0.0,0.0,0.0,0.0
46,Radiohead,Kid A,Untitled,11,52694,2000,0.0769,0.225,0.369,0.992,...,0.106,64.655,7,1,3,,0.0,0.0,0.0,0.0
55,Radiohead,Amnesiac,Hunting Bears,9,121200,2001,0.0736,0.264,0.295,0.853,...,0.0962,143.191,7,1,3,,0.0,0.0,0.0,0.0


We see that only true instrumental songs come up, which is fine. They indeed have no words, but they must stay in as part of the albums as recorded. The data set is correct. It will be interesting to see how these interlude-like songs influence scores and stats.

Nerdy side note: 

*Technically "Untitled" wasn't even its own song in the initial release of Kid A, but a hidden track that is part of Motion Picture Soundtrack. Spotify did split it up, when putting it up on the platform. However, we will treat it as its own song in this analysis.*

In [133]:
#Get an overview over the general variables and their distributions

df_radiohead.describe()

Unnamed: 0,TrackNumber_On_Album,Duration_ms,ReleaseYear,Valence,Energy,Danceability,Acousticness,Instrumentalness,Speechiness,Liveness,Tempo,Key,Mode,TimeSignature,Total_Words,Sad_Words,Pct_Sad_Words,Lyrical_Density
count,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0
mean,6.217822,252550.584158,2002.049505,0.317191,0.558554,0.397,0.346389,0.392118,0.050522,0.157369,118.171634,5.336634,0.584158,3.851485,69.455446,4.019802,0.051459,0.285989
std,3.416442,60278.963655,7.058861,0.219619,0.231679,0.15709,0.353678,0.330421,0.046542,0.102232,31.409519,3.441736,0.495325,0.497718,34.500007,5.718357,0.061087,0.174365
min,1.0,52694.0,1993.0,0.0378,0.11,0.105,2e-05,4e-06,0.0254,0.0545,58.996,0.0,0.0,1.0,0.0,0.0,0.0,0.0
25%,3.0,217800.0,1997.0,0.135,0.383,0.271,0.0193,0.054,0.0318,0.0974,91.915,2.0,0.0,4.0,47.0,0.0,0.0,0.178805
50%,6.0,257480.0,2001.0,0.273,0.587,0.366,0.178,0.338,0.0358,0.111,114.436,5.0,1.0,4.0,64.0,2.0,0.033333,0.261872
75%,9.0,290213.0,2007.0,0.48,0.735,0.515,0.712,0.725,0.0519,0.18,139.149,7.0,1.0,4.0,89.0,6.0,0.069767,0.354884
max,14.0,387213.0,2016.0,0.844,0.976,0.721,0.992,0.941,0.343,0.619,200.127,11.0,1.0,5.0,181.0,40.0,0.350877,1.261367


In [134]:
#Converting the duration of the songs to minutes

df_radiohead['Duration_mins'] = df_radiohead['Duration_ms'] / 60000

# EDA

## Albums overview

We can start with the basics. Up to this point Radiohead has released 9 studio albums. Let's check them out and see how the band has progressed over time.

### Hard fact #1 - Number of Tracks



In [135]:
#Plot Number of Tracks
fig = px.bar(df_radiohead, x = 'Album', color = 'Album', hover_name= 'Track',
            labels={'Album':'Album Name', 'Duration_mins': 'Duration in Minutes', 'Track':'Track Name', 'count':'Number of Tracks'},
            title= '<b>Radiohead Albums by Number of Tracks</b> <br>It seems like albums have been getting shorter, simply going by number of songs.',
            template='plotly_dark')

fig.show()

While they started out with a fairly consistent output of 12 tracks per album, their work tended to feature fewer tracks in the later years, with the exception of 'Hail To the Thief'.

### Hard fact #2 - Duration of the Albums

But do their later albums really offer less music overall?

In [136]:
#Plot Duration of Albums
fig = px.bar(df_radiohead, x = 'Album',y= 'Duration_mins', color = 'Album', hover_name= 'Track',
            labels={'Album':'Album Name', 'Duration_mins': 'Duration in Minutes', 'Track':'Track Name', 'count':'Number of Tracks'},
            title= '<b>Radiohead Albums by Duration of Tracks</b> <br>Going by actual track duration the picture looks a little different.',
            template='plotly_dark')

fig.show()

Once track duration is factored in, 'Hail To the Thief' stays #1, simply due to its sheer number of tracks. But 'OK Computer' and 'A Moon Shaped Pool' now stand out as LPs with a lot of music to offer.
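The two bar charts stack one segment per track, so the album totals are only implicit. As a rough cross-check, the same numbers can be tabulated with a groupby. The sketch below uses a handful of rows copied from this notebook's output rather than the full data set, and the helper name `album_summary` is my own invention; in the notebook you would pass `df_radiohead` instead.

```python
import pandas as pd

# A few rows copied from the notebook output (not the full data set).
sample = pd.DataFrame({
    "Album": ["Pablo Honey", "Pablo Honey", "Pablo Honey",
              "A Moon Shaped Pool", "A Moon Shaped Pool"],
    "Track": ["You", "Creep", "How Do You?", "Identikit", "The Numbers"],
    "ReleaseYear": [1993, 1993, 1993, 2016, 2016],
    "Duration_ms": [208666, 238640, 132173, 266643, 345886],
})

def album_summary(df):
    """Return track count and total running time (minutes) per album."""
    return (
        df.assign(Duration_mins=df["Duration_ms"] / 60000)
          .groupby(["Album", "ReleaseYear"], as_index=False)
          .agg(n_tracks=("Track", "count"),
               total_mins=("Duration_mins", "sum"))
          .sort_values("ReleaseYear")      # chronological order
          .reset_index(drop=True)
    )

print(album_summary(sample))
```

Sorting by `ReleaseYear` keeps the table in the same chronological order as the charts.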

### Hard fact #3 - Temporal Profiles of the Albums

Radiohead has always treated album sequencing with great care to create a holistic experience for the listener. Can we find some patterns in the data to learn a little more about their preferences?



In [137]:
#Plot Temporal Profiles
fig = px.bar(df_radiohead, y= 'Duration_mins', color = 'Album', hover_name= 'Track',
            labels={'Album':'Album Name', 'Duration_mins': 'Duration in Minutes', 'Track':'Track Name'},
            title= '<b>The Temporal Profile of Every Radiohead Album</b> <br>How about those valleys in the profiles?',
            template='plotly_dark', facet_col= 'Album', facet_col_wrap=3, height=600, width=2000,)


fig.show()

Looking at these profiles, we can see that the band loves to put in a change-of-pace song more often than not. On 6 out of 9 records Radiohead inserted a sub-three-minute track at the halfway to two-thirds mark to alter the mood of the musical flow. It is an interesting stylistic choice.
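The "valleys" can also be listed instead of eyeballed. The following is a minimal sketch, assuming a track's position is its track number divided by the highest track number on the album; the function name `short_track_positions` and the toy album are hypothetical and only illustrate the mechanics. In the notebook it could be called on `df_radiohead`.

```python
import pandas as pd

def short_track_positions(df, max_minutes=3.0):
    """List tracks shorter than max_minutes with their relative album position.

    Position is TrackNumber_On_Album divided by the highest track number
    seen for that album, so 0.5 is the midpoint and 1.0 the closing track.
    """
    out = df.copy()
    out["Duration_mins"] = out["Duration_ms"] / 60000
    last_track = out.groupby("Album")["TrackNumber_On_Album"].transform("max")
    out["Relative_Position"] = out["TrackNumber_On_Album"] / last_track
    short = out[out["Duration_mins"] < max_minutes]
    return short[["Album", "Track", "Duration_mins", "Relative_Position"]]

# Hypothetical toy album, only to show the mechanics.
toy = pd.DataFrame({
    "Album": ["Toy Album"] * 4,
    "Track": ["Opener", "Single", "Interlude", "Closer"],
    "TrackNumber_On_Album": [1, 2, 3, 4],
    "Duration_ms": [240000, 200000, 120000, 300000],
})

print(short_track_positions(toy))
```

Short tracks with a relative position between roughly 0.5 and 0.67 would be the ones matching the pattern described above.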

### Hard fact #4 - Number of Words

Lead songwriter Thom Yorke often produces haunting but beautiful lyrics. By looking at the number of words in the individual tracks, we can see whether the amount the band actually had to say changed over time:

In [138]:
#Plot Number of Words
fig = px.bar(df_radiohead, x = 'Album',y= 'Total_Words', color = 'Album', hover_name= 'Track',
            labels={'Album':'Album Name', 'Duration_mins': 'Duration in Minutes', 'Track':'Track Name', 'count':'Number of Tracks', 'Total_Words':'Number of Words'},
            title= '<b>Radiohead Albums by Number of Words</b> <br>The amount of lyrics over time has been fairly inconsistent.',
            template='plotly_dark')


fig.show()

It is interesting to see that their more electronic and less rock-heavy albums 'Kid A' (667 words) and 'Amnesiac' (626 words) are among those with the fewest lyrics. Their shortest work in terms of track count and duration, 'The King Of Limbs', has the fewest of all (543 words). 'Hail To the Thief', regarded as their most political work, seems to have the most to say (1,046 words).

Nerdy side note:

*Spotting the iconic 'OK Computer' track 'Fitter Happier' (featuring a robot voice as a narrator) is rather easy looking at this plot.*

### Soft facts #1 - Album Valence

Valence is a metric Spotify has come up with. In their documentation they define this variable in the following way:

*A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).*

Hence, the metric scores the music itself in terms of sentiment. By determining the combined mean valence for the albums, we can compare them in musical positiveness (or rather negativeness in the case of Radiohead?).

In [139]:
#Data Aggregation
df_radiohead_soft_facts = df_radiohead.groupby(['Album', 'ReleaseYear'], observed= True, as_index= False).agg(mean_valence = ('Valence', 'mean'), total_sad_words = ('Sad_Words', 'sum'), total_words = ('Total_Words', 'sum'))
df_radiohead_soft_facts['pct_sad_album'] = df_radiohead_soft_facts['total_sad_words'] / df_radiohead_soft_facts['total_words'] * 100
df_radiohead_soft_facts = df_radiohead_soft_facts.sort_values('ReleaseYear')
df_radiohead_soft_facts


Unnamed: 0,Album,ReleaseYear,mean_valence,total_sad_words,total_words,pct_sad_album
6,Pablo Honey,1993,0.322,30.0,772.0,3.88601
7,The Bends,1995,0.3388,62.0,946.0,6.553911
5,OK Computer,1997,0.291733,56.0,882.0,6.349206
4,Kid A,2000,0.270564,15.0,667.0,2.248876
1,Amnesiac,2001,0.207945,43.0,626.0,6.86901
2,Hail To the Thief,2003,0.399529,52.0,1046.0,4.971319
3,In Rainbows,2007,0.39673,27.0,642.0,4.205607
8,The King Of Limbs,2011,0.44355,68.0,543.0,12.52302
0,A Moon Shaped Pool,2016,0.203018,53.0,891.0,5.948373


In [140]:
#Plot Valence
fig = px.bar(df_radiohead_soft_facts, x = 'Album',y= 'mean_valence', color = 'Album',
            labels={'Album':'Album Name', 'mean_valence':'Mean Valence'},
            title= '<b>Radiohead Albums by Musical Positiveness (Valence)</b> <br>In their album run from 2003 to 2011 Radiohead put forward greater musical positiveness than usual.',
            template='plotly_dark')

#Add little annotation
fig.add_annotation(x=6, y=0.5,
            text="Looks like this was their happy period!",
            showarrow=False,
            bordercolor='White',
            font_color = 'Black',
            bgcolor='white')

fig.show()

We can see why Radiohead is not known for particularly cheerful music. None of their albums reaches a mean valence of 0.5, meaning that all of them lean negative in their musical sentiment. However, the album run from 'Hail To the Thief' via 'In Rainbows' to 'The King Of Limbs' sticks out. In terms of musical positiveness this could be considered their 'cheerful period'. 'A Moon Shaped Pool' scoring the lowest here also makes sense: the production of the record was heavily colored by Thom Yorke's separation from Rachel Owen, his partner of almost 25 years.

### Soft fact #2 - Album Sad Lyrics in Relative Terms

The data set also includes a dimension showing what percentage of lyrics in a particular song can be considered as sad. So after the tonal analysis of the albums, we can also check if the words that go with it mirror the musical arrangement of the LPs.

In [141]:
#Plot Relative Sadness
fig = px.bar(df_radiohead_soft_facts, x = 'Album',y= 'pct_sad_album', color = 'Album',
            labels={'Album':'Album Name', 'pct_sad_album':'Percentage of Sad Lyrics'},
            title= '<b>Radiohead Albums by Percentage of Sad Lyrics</b> <br> "The King Of Limbs" contains by far the highest percentage of sad words.',
            template='plotly_dark')


fig.show()

This is an interesting result. 'The King Of Limbs' takes the most cheerful musical approach but, at the same time, has the highest share of sad words in its lyrics, so the two kind of balance out in tone. 'A Moon Shaped Pool' and 'Amnesiac', however, seem to stay true to themselves: both combine a rather negative musical tone with sad lyrics. The opposite seems to be happening on 'In Rainbows', where a more cheerful musical tone is matched with only a small percentage of depressing lyrics. That might be the reason why many fans find this album the most approachable.

### Soft fact #3 - Album Sad Lyrics in Absolute Terms


And yet, we need to keep in mind that this is only a relative approach. We can also look at this from an absolute angle and ask which album contains the most sad words in total.

In [142]:
df_radiohead

Unnamed: 0,Artist,Album,Track,TrackNumber_On_Album,Duration_ms,ReleaseYear,Valence,Energy,Danceability,Acousticness,...,Tempo,Key,Mode,TimeSignature,Lyrics,Total_Words,Sad_Words,Pct_Sad_Words,Lyrical_Density,Duration_mins
0,Radiohead,Pablo Honey,You,1,208666,1993,0.2980,0.707,0.222,0.000945,...,112.663,9,1,3,you are the sun and moon and stars are you and...,33.0,0.0,0.000000,0.158147,3.477767
1,Radiohead,Pablo Honey,Creep,2,238640,1993,0.1040,0.430,0.515,0.009700,...,91.844,7,1,4,when you were here before couldnt look you in ...,85.0,5.0,0.058824,0.356185,3.977333
2,Radiohead,Pablo Honey,How Do You?,3,132173,1993,0.2380,0.964,0.185,0.000659,...,147.351,9,1,4,hes bitter and twisted he knows what he wants ...,38.0,2.0,0.052632,0.287502,2.202883
3,Radiohead,Pablo Honey,Stop Whispering,4,325626,1993,0.2980,0.696,0.212,0.000849,...,122.350,2,1,4,and the wise man said i dont want to hear your...,82.0,2.0,0.024390,0.251823,5.427100
4,Radiohead,Pablo Honey,Thinking About You,5,161533,1993,0.4200,0.370,0.365,0.705000,...,103.442,7,1,4,been thinking about you your records are here ...,81.0,0.0,0.000000,0.501446,2.692217
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Radiohead,A Moon Shaped Pool,Identikit,7,266643,2016,0.3500,0.459,0.692,0.063200,...,150.695,11,0,4,a moon shaped pool dancing clothes wont let me...,176.0,18.0,0.102273,0.660059,4.444050
97,Radiohead,A Moon Shaped Pool,The Numbers,8,345886,2016,0.0551,0.372,0.282,0.651000,...,105.271,5,1,3,it holds us like a phantom it touches like a ...,73.0,2.0,0.027397,0.211052,5.764767
98,Radiohead,A Moon Shaped Pool,Present Tense,9,306581,2016,0.3360,0.407,0.462,0.912000,...,91.915,1,0,4,this dance this dance is like a weapon is like...,63.0,8.0,0.126984,0.205492,5.109683
99,Radiohead,A Moon Shaped Pool,Tinker Tailor Soldier Sailor Rich Man Poor Man...,10,303688,2016,0.0508,0.436,0.355,0.730000,...,90.388,11,1,4,all the holes at once are comin alive set free...,39.0,3.0,0.076923,0.128421,5.061467


In [143]:
#Plot Absolute Sadness
fig = px.bar(df_radiohead_soft_facts, x = 'Album',y= 'total_sad_words', color = 'Album',
            labels={'Album':'Album Name', 'total_sad_words':'Total Amount of Sad Words'},
            title= '<b>Radiohead Albums by Total Amount of Sad Lyrics</b> <br> "The King Of Limbs" is still on top, but "The Bends" jumps to #2.',
            template='plotly_dark')

fig.show()

'The King Of Limbs' keeps the top spot, but its lead shrinks, since it is Radiohead's shortest album with the fewest words sung. 'The Bends' and 'OK Computer' both stay near the top of the list. 'A Moon Shaped Pool' also holds its own. The 14-track 'monster release' 'Hail To the Thief' jumps past 'Amnesiac' and enters the top 5, which makes a lot of sense: more songs and lyrics simply offer more opportunities to use sad words.

## Analysis: Can we possibly combine these soft approaches to quantify the most melancholy record?

### The Triangulation of Sadness

As we have seen, we can look at the sadness or negativeness of a track from a musical and a lyrical perspective, and it makes sense to differentiate between relative and absolute approaches. But what if we want to combine the three concepts of valence, percentage of sad words and lyrical density?

We can try to do this graphically and put all songs in a ternary plot:


In [144]:
#Plot Ternary Plot
fig = px.scatter_ternary(df_radiohead, a = 'Pct_Sad_Words', b = 'Lyrical_Density', c = 'Valence', color = 'Album', 
            labels={'Album':'Album Name', 'Pct_Sad_Words':'Percentage of Sad Words', 'Valence':'Valence', 'Lyrical_Density':'Lyrical Density'},
            title= '<b>Radiohead Songs - A Triangulation of Sadness</b>',
            template='plotly_dark', hover_name= 'Track', height=600, width=1000)



#Add Little Annotation
fig.add_annotation(x=0.415, y=0.61,
            text="'True Love Waits' - their saddest song?",
            showarrow=True,
            arrowhead=1,
            bordercolor='White',
            font_color = 'Black',
            bgcolor='white')


fig.show()

Looking at this plot we can see:

* There are quite a few happy songs in the lower right corner. They have high musical positivity and contain little to no sad words.

* The bulk of Radiohead's songs can be found in a zone where less than 20% of the lyrics are sad, with varying degrees of general lyrical density and musical valence. How sad they feel is in the eye of the beholder.

* (Remember: *Taking this 20% threshold is kind of special. We are basically saying that a song is not sad if fewer than one in five words sung has a negative sentiment. For more upbeat artists this inflection point might be much lower, triggering the sad label much earlier.*)

* All songs floating towards the upper left can be considered Radiohead's sad tracks. Their musical tone is rather depressing or angry. Same goes for the lyrics that contain more and more bitter, melancholy or sorrowful bits. 

* Checking this graph we have clear contenders for happiest ('Hunting Bears', 'Feral') and saddest songs ('True Love Waits', 'Give Up The Ghost', 'Motion Picture Soundtrack').
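One caveat when reading the corners: as far as I understand ternary charts, each point is placed by the share every variable contributes to that point's own total, not by the raw values. The sketch below recomputes those shares for three tracks using numbers from the tables in this notebook; `ternary_shares` is a helper name introduced here purely for illustration.

```python
import pandas as pd

# Values copied from the tables in this notebook.
tracks = pd.DataFrame({
    "Track": ["Hunting Bears", "Feral", "Give Up The Ghost"],
    "Pct_Sad_Words": [0.0, 0.0, 0.350877],
    "Lyrical_Density": [0.0, 0.062259, 0.393013],
    "Valence": [0.0736, 0.729, 0.157],
})

def ternary_shares(df, cols):
    """Divide each row by its own total so the three columns sum to 1."""
    totals = df[cols].sum(axis=1)
    return df[cols].div(totals, axis=0)

cols = ["Pct_Sad_Words", "Lyrical_Density", "Valence"]
shares = ternary_shares(tracks, cols)
shares.insert(0, "Track", tracks["Track"])
print(shares.round(3))
```

Read this way, an instrumental such as 'Hunting Bears' lands in the 'happy' corner only because it has no lyrics at all: all of its weight falls on valence, even though its raw valence of 0.0736 is well below the first quartile of the catalogue (0.135).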


### The Gloom Index

Data scientist Charlie Thompson came up with an interesting way to translate this graphical approach into a number: the gloom index, which incorporates musical and lyrical tone while also taking lyrical density into account.

As a formula he used this:

Gloom Index = (1 - ((1 - Valence) + (Percentage of sad words * (1 + Lyrical Density))))/2

He then rescaled the metric to fit between 1 and 100. This makes comparing the entire Radiohead catalogue even easier. The saddest song has a score of 1 and the least sad track a rating of 100. Every other song falls somewhere between these two extremes.
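To make the formula concrete, here is a small worked example under a few assumptions: the input values are copied from the tables in this notebook, `raw_gloom` and `rescale` are helper names of my own, and the rescaling is done within this three-track sample only, so the resulting scores will not match the catalogue-wide ones.

```python
import pandas as pd

# Three tracks with values copied from the tables in this notebook.
sample = pd.DataFrame({
    "Track": ["Give Up The Ghost", "True Love Waits", "Feral"],
    "Valence": [0.157, 0.0381, 0.729],
    "Pct_Sad_Words": [0.350877, 0.279070, 0.0],
    "Lyrical_Density": [0.393013, 0.151695, 0.062259],
})

def raw_gloom(valence, pct_sad, density):
    """Unscaled score as written in the formula; lower means gloomier."""
    return (1 - ((1 - valence) + pct_sad * (1 + density))) / 2

def rescale(values, new_min=1, new_max=100):
    """Linearly map a Series onto the range [new_min, new_max]."""
    span = values.max() - values.min()
    return (values - values.min()) / span * (new_max - new_min) + new_min

sample["Raw"] = raw_gloom(sample["Valence"], sample["Pct_Sad_Words"],
                          sample["Lyrical_Density"])
# Rescaled within this sample only (assumption for illustration).
sample["Gloom_Index"] = rescale(sample["Raw"]).round(2)
print(sample[["Track", "Raw", "Gloom_Index"]])
```

Note that the formula simplifies to (Valence - Percentage of sad words * (1 + Lyrical Density)) / 2, so a song without any sad words scores exactly half its valence before rescaling.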

A quick look at the distribution across all tracks helps us see whether this metric does a good job of evaluating our sample:

In [145]:
# Rescaling Function
def rescale_metric(original_value, min_value, max_value, new_min=1, new_max=100):
    scaled_value = ((original_value - min_value) / (max_value - min_value)) * (new_max - new_min) + new_min
    return scaled_value

# Creating Gloom Index With Rescaling
# Raw score first; min and max are computed once instead of once per row
gloom_raw = (1 - ((1 - df_radiohead['Valence']) + (df_radiohead['Pct_Sad_Words'] * (1 + df_radiohead['Lyrical_Density'])))) / 2
df_radiohead['Gloom_Index'] = rescale_metric(gloom_raw, gloom_raw.min(), gloom_raw.max()).round(2)
df_radiohead_gloom_index = df_radiohead.sort_values('Gloom_Index')
df_radiohead_gloom_index


Unnamed: 0,Artist,Album,Track,TrackNumber_On_Album,Duration_ms,ReleaseYear,Valence,Energy,Danceability,Acousticness,...,Key,Mode,TimeSignature,Lyrics,Total_Words,Sad_Words,Pct_Sad_Words,Lyrical_Density,Duration_mins,Gloom_Index
88,Radiohead,The King Of Limbs,Give Up The Ghost,7,290067,2011,0.1570,0.262,0.305,0.88600,...,7,1,4,dont hurt me dont hurt me dont hurt me dont hu...,114.0,40.0,0.350877,0.393013,4.834450,1.00
100,Radiohead,A Moon Shaped Pool,True Love Waits,11,283463,2016,0.0381,0.132,0.401,0.93800,...,0,1,4,ill drown my beliefs to have your babies ill d...,43.0,12.0,0.279070,0.151695,4.724383,5.08
45,Radiohead,Kid A,Motion Picture Soundtrack,10,200482,2000,0.0427,0.130,0.112,0.92100,...,7,1,4,red wine and sleeping pills help me get back t...,48.0,4.0,0.083333,0.239423,3.341367,23.83
54,Radiohead,Amnesiac,Dollars and Cents,8,291733,2001,0.0883,0.565,0.327,0.39500,...,7,1,4,there are better things to talk about be const...,120.0,12.0,0.100000,0.411335,4.862217,24.49
64,Radiohead,Hail To the Thief,We Suck Young Blood,7,296706,2003,0.0378,0.239,0.164,0.87800,...,3,0,4,are you hungry are you sick are you begging fo...,43.0,3.0,0.069767,0.144925,4.945100,25.39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62,Radiohead,Hail To the Thief,Go To Sleep,5,201506,2003,0.6600,0.860,0.288,0.10300,...,2,0,4,something for the rag and bone man over my dea...,57.0,0.0,0.000000,0.282870,3.358433,84.51
80,Radiohead,In Rainbows,Jigsaw Falling Into Place,9,248893,2007,0.8070,0.832,0.462,0.10800,...,11,0,4,just as you take my hand just as you write my ...,111.0,7.0,0.063063,0.445975,4.148217,89.21
73,Radiohead,In Rainbows,Bodysnatchers,2,242293,2007,0.7330,0.976,0.342,0.00443,...,2,1,4,i do not understand what it is ive done wrong ...,81.0,1.0,0.012346,0.334306,4.038217,89.27
85,Radiohead,The King Of Limbs,Feral,4,192742,2011,0.7290,0.777,0.490,0.00101,...,7,1,4,youre not youre not mine im not yours its all ...,12.0,0.0,0.000000,0.062259,3.212367,90.32


In [146]:
#Plot Gloom Index
fig = px.violin(df_radiohead_gloom_index, y = 'Gloom_Index', hover_name= 'Track',
            labels={'Gloom_Index': 'Gloom Index'}, box = True, points= 'all', title = 'Distribution of the Gloom Index', template='plotly_dark', height = 500, width = 1000)

fig.show()

We can see that, even when the band's songs are compared only among each other, the distribution is not even. The majority of their repertoire sits on the less cheerful side, which checks out with the general feel of their music. For further reference, it is important to remember that their median song scores 46.35.

## So, what album is the gloomiest?

By averaging out the gloom scores by LP we can find the saddest album of the band:

In [147]:
#Data Grouping
df_radiohead_gloom_index = df_radiohead.groupby(['Album']).agg(Gloom_Mean = ('Gloom_Index', 'mean'))
df_radiohead_combined =  pd.merge(df_radiohead, df_radiohead_gloom_index, on='Album')
df_radiohead_combined

Unnamed: 0,Artist,Album,Track,TrackNumber_On_Album,Duration_ms,ReleaseYear,Valence,Energy,Danceability,Acousticness,...,Mode,TimeSignature,Lyrics,Total_Words,Sad_Words,Pct_Sad_Words,Lyrical_Density,Duration_mins,Gloom_Index,Gloom_Mean
0,Radiohead,Pablo Honey,You,1,208666,1993,0.2980,0.707,0.222,0.000945,...,1,3,you are the sun and moon and stars are you and...,33.0,0.0,0.000000,0.158147,3.477767,54.03,51.881667
1,Radiohead,Pablo Honey,Creep,2,238640,1993,0.1040,0.430,0.515,0.009700,...,1,4,when you were here before couldnt look you in ...,85.0,5.0,0.058824,0.356185,3.977333,30.98,51.881667
2,Radiohead,Pablo Honey,How Do You?,3,132173,1993,0.2380,0.964,0.185,0.000659,...,1,4,hes bitter and twisted he knows what he wants ...,38.0,2.0,0.052632,0.287502,2.202883,43.27,51.881667
3,Radiohead,Pablo Honey,Stop Whispering,4,325626,1993,0.2980,0.696,0.212,0.000849,...,1,4,and the wise man said i dont want to hear your...,82.0,2.0,0.024390,0.251823,5.427100,51.46,51.881667
4,Radiohead,Pablo Honey,Thinking About You,5,161533,1993,0.4200,0.370,0.365,0.705000,...,1,4,been thinking about you your records are here ...,81.0,0.0,0.000000,0.501446,2.692217,64.30,51.881667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Radiohead,A Moon Shaped Pool,Identikit,7,266643,2016,0.3500,0.459,0.692,0.063200,...,0,4,a moon shaped pool dancing clothes wont let me...,176.0,18.0,0.102273,0.660059,4.444050,44.11,38.807273
97,Radiohead,A Moon Shaped Pool,The Numbers,8,345886,2016,0.0551,0.372,0.282,0.651000,...,1,3,it holds us like a phantom it touches like a ...,73.0,2.0,0.027397,0.211052,5.764767,30.78,38.807273
98,Radiohead,A Moon Shaped Pool,Present Tense,9,306581,2016,0.3360,0.407,0.462,0.912000,...,0,4,this dance this dance is like a weapon is like...,63.0,8.0,0.126984,0.205492,5.109683,44.34,38.807273
99,Radiohead,A Moon Shaped Pool,Tinker Tailor Soldier Sailor Rich Man Poor Man...,10,303688,2016,0.0508,0.436,0.355,0.730000,...,1,4,all the holes at once are comin alive set free...,39.0,3.0,0.076923,0.128421,5.061467,25.90,38.807273


In [153]:
#Plot Saddest Album
fig = px.scatter(df_radiohead_combined, x = 'Album',y= 'Gloom_Index', color = 'Album',
            labels={'Album':'Album Name', 'Gloom_Index':'Gloom Index'},
            title= '<b>Radiohead Albums by Gloom Index</b> <br>And we have a winner: "A Moon Shaped Pool" is their saddest album.',
            template='plotly_dark', hover_name= 'Track', width= 1500)

#Add Line For Means
fig.add_scatter(x= df_radiohead_combined['Album'], y= df_radiohead_combined['Gloom_Mean'], mode='lines+markers', showlegend= False, text='Gloom_Mean')
fig.update_traces(textposition="bottom right")

fig.show()

'A Moon Shaped Pool' (38.8) has the lowest mean gloom value, meaning it comes in as the saddest Radiohead album. 'Amnesiac' (41.6) ranks second, edging out 'OK Computer' (46.3). Interestingly, the run of 'Hail To the Thief', 'In Rainbows' and 'The King Of Limbs' again comes up as the band's happy period, posting the highest mean gloom values. The five most positive Radiohead songs can be found on these three records. Once more we find evidence for why 'In Rainbows' might be their most approachable album: the numbers mark it as their most positive LP to date.

## Checking for some correlations

We are now officially in the random statistical experimentation part, which you should not take too seriously. Nonetheless, let's check for some correlations and plot some stuff for fun:

### Does a longer duration make it more likely for a song to be gloomy?

One could assume that more extensive tracks might be sadder, so why not check:

In [149]:
#Plot Duration vs Gloom
fig = px.scatter(df_radiohead_combined, x = 'Duration_mins',y= 'Gloom_Index',
            labels={'Duration_mins':'Duration of the Tracks', 'Gloom_Index':'Gloom Index'},
            title= '<b>Radiohead Tracks - Duration vs. Gloom</b><br>Longer songs might be a little sadder than shorter ones. Yet, the effect is marginal.',
            template='plotly_dark', hover_name= 'Track', trendline= 'ols')

fig.show()

Maybe there is a slight effect, but it is too marginal to call it anything. Which makes a lot of sense, if you think about it. Even a few chords or notes can build a sad song. The saddest story could consist of one sentence, if only it leaves enough room for projection. So, as expected, there seems to be no correlation between duration and the gloom index ranking, especially keeping in mind that their entire catalogue is tilted towards the gloomy side in general.
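A trendline is easier to judge with a number next to it. One hedged way to quantify "marginal" is the Pearson correlation coefficient, which pandas exposes as `Series.corr`. The sketch below runs on nine rows copied from this notebook's output, so its result says nothing about the full catalogue; in the notebook you would call it on `df_radiohead_combined`. The helper name `pearson_r` is an assumption of mine.

```python
import pandas as pd

# Nine (duration, gloom) pairs copied from this notebook's output; not the full data.
sample = pd.DataFrame({
    "Duration_mins": [3.477767, 3.977333, 2.202883, 5.427100, 2.692217,
                      4.444050, 5.764767, 5.109683, 5.061467],
    "Gloom_Index": [54.03, 30.98, 43.27, 51.46, 64.30,
                    44.11, 30.78, 44.34, 25.90],
})

def pearson_r(df, x, y):
    """Pearson correlation between two columns, ranging from -1 to 1."""
    return df[x].corr(df[y], method="pearson")

r = pearson_r(sample, "Duration_mins", "Gloom_Index")
print(f"r = {r:.2f}, r squared = {r**2:.2f}")
```

An r close to 0 on the full data would support the "no real effect" reading, while r squared gives the share of variance in the gloom index that a straight line through duration accounts for. The same helper works for `Total_Words`.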

### Do more lyrics make it more likely for a song to be gloomy?

One could assume that more words sung might make a song sadder. Let's have a quick look:

In [150]:
#Plot Number of Words vs Gloom
fig = px.scatter(df_radiohead_combined, x = 'Total_Words',y= 'Gloom_Index',
            labels={'Total_Words':'Number of Words in Track', 'Gloom_Index':'Gloom Index'},
            title= '<b>Radiohead Tracks - Number of Words vs. Gloom</b><br>The band does not need many words to create a sad song experience.',
            template='plotly_dark', hover_name= 'Track', trendline= 'ols')

fig.show()

Interesting tendency here: the more lyrics we find in a Radiohead song, the more likely it is to be one of the happier ones. Again, the effect does not look like something I would put much stock in. However, it shows that Thom Yorke is an expert in painting sad images with even the smallest number of words.

## Let's translate this work into playlists

With this lengthy look at the band's discography, we can try to translate our findings into playlists that could offer you numbers-based song suggestions depending on your mood.

If you are looking for a gut-wrenching time full of heartache, sorrow and despair, the data would suggest these ten songs:




In [156]:
df_radiohead.sort_values('Gloom_Index').head(10)


Unnamed: 0,Artist,Album,Track,TrackNumber_On_Album,Duration_ms,ReleaseYear,Valence,Energy,Danceability,Acousticness,...,Key,Mode,TimeSignature,Lyrics,Total_Words,Sad_Words,Pct_Sad_Words,Lyrical_Density,Duration_mins,Gloom_Index
88,Radiohead,The King Of Limbs,Give Up The Ghost,7,290067,2011,0.157,0.262,0.305,0.886,...,7,1,4,dont hurt me dont hurt me dont hurt me dont hu...,114.0,40.0,0.350877,0.393013,4.83445,1.0
100,Radiohead,A Moon Shaped Pool,True Love Waits,11,283463,2016,0.0381,0.132,0.401,0.938,...,0,1,4,ill drown my beliefs to have your babies ill d...,43.0,12.0,0.27907,0.151695,4.724383,5.08
45,Radiohead,Kid A,Motion Picture Soundtrack,10,200482,2000,0.0427,0.13,0.112,0.921,...,7,1,4,red wine and sleeping pills help me get back t...,48.0,4.0,0.083333,0.239423,3.341367,23.83
54,Radiohead,Amnesiac,Dollars and Cents,8,291733,2001,0.0883,0.565,0.327,0.395,...,7,1,4,there are better things to talk about be const...,120.0,12.0,0.1,0.411335,4.862217,24.49
64,Radiohead,Hail To the Thief,We Suck Young Blood,7,296706,2003,0.0378,0.239,0.164,0.878,...,3,0,4,are you hungry are you sick are you begging fo...,43.0,3.0,0.069767,0.144925,4.9451,25.39
99,Radiohead,A Moon Shaped Pool,Tinker Tailor Soldier Sailor Rich Man Poor Man...,10,303688,2016,0.0508,0.436,0.355,0.73,...,11,1,4,all the holes at once are comin alive set free...,39.0,3.0,0.076923,0.128421,5.061467,25.9
28,Radiohead,OK Computer,Let Down,5,299560,1997,0.143,0.676,0.351,0.000121,...,9,1,4,transport motorways and tramlines starting and...,95.0,12.0,0.126316,0.317132,4.992667,26.97
27,Radiohead,OK Computer,Exit Music (For A Film),4,267186,1997,0.195,0.276,0.293,0.224,...,4,0,4,wake from your sleep the drying of your tears ...,56.0,10.0,0.178571,0.209592,4.4531,27.17
48,Radiohead,Amnesiac,Pyramid Song,2,288733,2001,0.0679,0.336,0.12,0.786,...,11,0,4,i jumped in the river and what did i see black...,59.0,4.0,0.067797,0.204341,4.812217,27.78
57,Radiohead,Amnesiac,Life In a Glasshouse,11,276693,2001,0.0466,0.397,0.252,0.728,...,7,0,4,once again im in trouble with my only friend s...,83.0,3.0,0.036145,0.299971,4.61155,28.9


In [151]:
#Data Prep
df_radiohead_sad_playlist = df_radiohead.sort_values('Gloom_Index').head(10)
df_radiohead_sad_playlist = df_radiohead_sad_playlist.sort_values('Gloom_Index', ascending=False)

#Plot Sad Playlist
fig = px.scatter(df_radiohead_sad_playlist, x= 'Gloom_Index', y = 'Track',
            labels={'Track':'Track Name', 'Gloom_Index':'Gloom Index'},
            title= '<b>The Ultimate Radiohead Playlist</b><br>The Numbers-Based Sadness Edition',
            template='plotly_dark', hover_name= 'Track', width= 1000)

fig.show()

If you are looking for more positivity and also danceability, then the data points you this way. According to the numbers, these are the least sad songs of the band:



In [152]:
#Data Prep
df_radiohead_least_sad_playlist = df_radiohead.sort_values('Gloom_Index', ascending= False).head(10)
df_radiohead_least_sad_playlist = df_radiohead_least_sad_playlist.sort_values('Gloom_Index')

#Plot Least Sad Playlist
fig = px.scatter(df_radiohead_least_sad_playlist, x= 'Gloom_Index', y = 'Track', color_discrete_sequence= ['yellow'],
            labels={'Track':'Track Name', 'Gloom_Index':'Gloom Index'},
            title= '<b>The Ultimate Radiohead Playlist</b><br>The Numbers-Based Least Sad Edition',
            template='plotly_dark', hover_name= 'Track', width= 1000)

fig.show()

You can find this playlist on Spotify: https://open.spotify.com/playlist/0pEGqpboPL46oAD85JwuVS?si=ff5c4aacb41146b5

However, be careful. Let's just say the computer made a few interesting choices here. These tracks won't be a joyous love fest free of any challenging thoughts - it is Radiohead after all. Enjoy them anyway!