Music Python Notebook

Crystal Huynh

Dataset: https://corgis-edu.github.io/corgis/python/music/

"This library comes from the Million Song Dataset, which used a company called the Echo Nest to derive data points about one million popular contemporary songs. The Million Song Dataset is a collaboration between the Echo Nest and LabROSA, a laboratory working towards intelligent machine listening. The project was also funded in part by the National Science Foundation of America (NSF) to provide a large data set to evaluate research related to algorithms on a commercial size while promoting further research into the Music Information Retrieval field. The data contains standard information about the songs such as artist name, title, and year released. Additionally, the data contains more advanced information; for example, the length of the song, how many musical bars long the song is, and how long the fade in to the song was."

To open the file and set it up:

In [1]:
# read the whole CSV file into a list of rows (each row is a list of strings)
import csv

music_file = "music.csv"
with open(music_file, newline='') as csvfile:  # the file is closed automatically
    csv_rd_obj = csv.reader(csvfile)           # create the csv reader object
    music = list(csv_rd_obj)                   # cast the reader to a list of rows

To start off easy, let's work with the first row of data. However, we want the first row with actual values, not the column headers.

In [2]:
with open('music.csv', newline='') as f:
  reader = csv.reader(f)
  row1 = next(reader)  # gets the first line which is the header
  row2 = next(f) #gets the second line that actually has data
print(row2)

"0.581793766","0.401997543","ARD7TVE1187B99BFB1","0.0","0","0.0","Casual","0.0","hip hop","1.0","300848","0","0.0","0.0","0.643","0.58521","0.834","0.58521","218.93179","0.247","0.60211999","SOMZWCG12A8C13C480","1.0","0.736","-11.197","0","0.636","218.932","0.779","0.28519","92.198","4.0","0.778","0","0"



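If we try to use a value from this row of data, we'll notice that row2[0] prints out '"'. That happens because next(f) reads the second line of the file as plain text instead of through the csv reader, so row2 is one long string and row2[0] is just its first character. We could slice out each value by character position, but that would take a lot of work, so we need a way for our code to recognize 0.581793766 as row2[0], 0.401997543 as row2[1], and so on.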

In [3]:
# split the line on commas, then strip the trailing newline and the surrounding "" from each value
row2 = [x.strip().strip('"') for x in row2.split(',')]
print(row2)

['0.581793766', '0.401997543', 'ARD7TVE1187B99BFB1', '0.0', '0', '0.0', 'Casual', '0.0', 'hip hop', '1.0', '300848', '0', '0.0', '0.0', '0.643', '0.58521', '0.834', '0.58521', '218.93179', '0.247', '0.60211999', 'SOMZWCG12A8C13C480', '1.0', '0.736', '-11.197', '0', '0.636', '218.932', '0.779', '0.28519', '92.198', '4.0', '0.778', '0', '0']
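Splitting on commas works for this row, but it would break on any value that itself contains a comma inside quotes. Python's built-in csv module already handles the quoting, so reading the data row through the same reader (next(reader) rather than next(f)) skips the cleanup step entirely. A minimal sketch, using a small in-memory stand-in for music.csv; the three header names are hypothetical and only for illustration:

```python
import csv
import io

# Hypothetical stand-in for music.csv: a header row plus one data row,
# trimmed to three columns. The header names are illustrative only.
sample = io.StringIO(
    '"familiarity","hotttnesss","name"\n'
    '"0.581793766","0.401997543","Casual"\n'
)

reader = csv.reader(sample)
header = next(reader)     # first row: the column headers
first_row = next(reader)  # second row: parsed by csv, quotes already removed

# Reading the raw text line instead (as next(f) does) gives one string,
# so indexing it returns single characters.
sample.seek(0)
sample.readline()         # skip the header line
raw_line = sample.readline()

print(first_row)    # ['0.581793766', '0.401997543', 'Casual']
print(raw_line[0])  # prints: "
```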


Now that we have values we can work with, we can try to compute the total popularity for Casual as an artist by adding up the values for artist familiarity and artist hottness. 

Artist familiarity = "A measure of 0 to 1 for how familiar the artist is to listeners."

Artist hottness = "A measure of the artists's popularity, when downloaded (in December 2010). Measured on a scale of 0 to 1."

In [4]:
# Casual's artist familiarity
Casual_artist_fam = float(row2[0])
Casual_artist_fam

0.581793766

In [5]:
# Casual artist hottness
Casual_artist_hot = float(row2[1])
Casual_artist_hot

0.401997543

In [6]:
# mathematical operations
# artist familiarity + artist hottness for the artist Casual
Casual_artist_fam + Casual_artist_hot

0.983791309

The total popularity for Casual is 0.983791309 out of a maximum of 2, just under the midpoint of 1, which suggests Casual is about average in terms of people knowing who the artist is and liking the artist.
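Dividing the sum by its maximum of 2 puts the score back on the same 0-to-1 scale as the two inputs, which makes "about average" easier to read. A small sketch of that arithmetic, using Casual's two values from above:

```python
# Casual's values, copied from the first data row above
familiarity = 0.581793766
hotttnesss = 0.401997543

total = familiarity + hotttnesss  # out of a maximum of 2.0
share = total / 2                 # rescaled to the 0-to-1 range

print(round(total, 9))  # 0.983791309
print(round(share, 4))  # 0.4919
```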

We can also compare values to distinguish certain traits. To see whether Casual is an artist that is well known to listeners, is currently popular, or both, we can compare the values for artist familiarity and artist hottness.

In [7]:
# conditional expression
if Casual_artist_fam == Casual_artist_hot:
    print('Casual is familiar as an artist to the public and is a trending artist.')
elif Casual_artist_fam > Casual_artist_hot:
    print('Casual is familiar as an artist but is not a trending artist.')
else:
    print('Casual is a trending artist but is not yet familiar to the public.')

Casual is familiar as an artist but is not a trending artist.


This suggests that more listeners know of Casual than are currently listening to the artist's music.
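Testing two floats with == almost never succeeds, so the first branch above will rarely run even when the two values are nearly the same. A sketch of the same comparison with a tolerance, using math.isclose; the helper name compare_fam_hot and the tolerance of 0.05 are arbitrary choices for illustration, not part of the dataset:

```python
import math


def compare_fam_hot(familiarity: float, hotttnesss: float, tol: float = 0.05) -> str:
    """Classify an artist by comparing familiarity with hotttnesss.

    Hypothetical helper: values within `tol` of each other count as equal.
    """
    if math.isclose(familiarity, hotttnesss, abs_tol=tol):
        return "familiar and trending"
    if familiarity > hotttnesss:
        return "familiar but not trending"
    return "trending but not yet familiar"


print(compare_fam_hot(0.581793766, 0.401997543))  # familiar but not trending
```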

We can go through the dataset and make similar comparisons for every artist. To check whether each listed artist is popular, makes hit songs, or both, we can compare the artist's hottness with the hottness of their song, using a for loop rather than checking each row manually.

In [8]:
import pandas as pd

In [9]:
# for loop over every data row; music[0] is the header row, so start at music[1]
for row in music[1:]:
    artist_hot = float(row[1])  # the artist's hotness, or popularity
    song_hot = float(row[20])   # the hotness, or popularity, of the artist's song
    artist_name = row[6]
    if artist_hot == song_hot:
        print(artist_name + " is a popular artist that makes hit songs")
    elif artist_hot > song_hot:
        print(artist_name + " is a popular artist")
    else:
        print(artist_name + " makes hit songs")

Casual makes hit songs
The Box Tops is a popular artist
Sonora Santanera is a popular artist
Adam Ant is a popular artist
Gob makes hit songs
Jeff And Sheri Easter is a popular artist
Rated R is a popular artist
Tweeterfriendly Music is a popular artist
Planet P Project is a popular artist
Clp is a popular artist
JennyAnyKind is a popular artist
Wayne Watson is a popular artist
Andy Andy is a popular artist
Bob Azzam is a popular artist
Lionel Richie is a popular artist
Blue Rodeo is a popular artist
Richard Souther makes hit songs
Faiz Ali Faiz is a popular artist
Tesla makes hit songs
lextrical is a popular artist that makes hit songs
Jimmy Wakely is a popular artist
Alice Stuart is a popular artist
Elena is a popular artist
The Dillinger Escape Plan makes hit songs
SUE THOMPSON makes hit songs
Five Bolt Main makes hit songs
Clp is a popular artist
Tim Wilson is a popular artist
Willie Bobo is a popular artist
Faye Adams is a popular artist
Terry Callier is a popular artist
John Wesl

We can see that some artists make hit songs but aren't necessarily popular and vice versa. 
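To see how often each case occurs instead of reading the printed list, the labels can be tallied with collections.Counter. A sketch, assuming each row is a list of strings laid out like music.csv (artist hotttnesss at index 1, artist name at index 6, song hotttnesss at index 20); the helper make_row and the three sample rows are hypothetical, padded with placeholder values:

```python
from collections import Counter

ARTIST_HOT, ARTIST_NAME, SONG_HOT = 1, 6, 20  # column positions in music.csv


def label(row):
    """Return the same three labels used in the loop above."""
    artist_hot = float(row[ARTIST_HOT])
    song_hot = float(row[SONG_HOT])
    if artist_hot == song_hot:
        return "popular artist that makes hit songs"
    if artist_hot > song_hot:
        return "popular artist"
    return "makes hit songs"


def make_row(name, artist_hot, song_hot):
    """Build a hypothetical 35-column row for illustration."""
    row = ["0"] * 35
    row[ARTIST_HOT], row[ARTIST_NAME], row[SONG_HOT] = artist_hot, name, song_hot
    return row


rows = [
    make_row("Artist A", "0.40", "0.60"),
    make_row("Artist B", "0.55", "0.30"),
    make_row("Artist C", "0.45", "0.20"),
]

counts = Counter(label(row) for row in rows)
print(counts.most_common())
# [('popular artist', 2), ('makes hit songs', 1)]
```

With the real data, rows would be music[1:], the CSV list without its header row.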

The dataset also has confidence values for several elements of each song, such as bars, beats, key, mode, tatums, and time signature. If we want a single overall confidence for a song, we can add the six individual confidence values (each between 0 and 1) using a function and compare the total with the maximum possible value of 6.

In [10]:
# function (both a new function definition and an execution of that function)
# function to calculate the total confidence of a song's six features
def f(bars, beats, key, mode, tatums, timesig):
    """Return the sum of the six confidence values passed in."""
    return bars + beats + key + mode + tatums + timesig

In [11]:
song_bars_conf = float((music[75])[14])
song_beats_conf = float((music[75])[16])
song_key_conf = float((music[75])[23])
song_mode_confi = float((music[75])[26])
song_tatums_conf = float((music[75])[28])
song_timesig_conf = float((music[75])[32])
f(song_bars_conf, song_beats_conf, song_key_conf, song_mode_confi, song_tatums_conf, song_timesig_conf)

3.572

For The Black Eyed Peas, the total confidence is 3.572 out of 6, a little above the midpoint of 3. With more statistical tests, we could use values like this to build a confidence interval and later test the null hypothesis that we set out to analyze.

In [12]:
# math check for the function
float((music[75])[14]) + float((music[75])[16]) + float((music[75])[23]) + float((music[75])[26]) + float((music[75])[28]) + float((music[75])[32])

3.572
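The column positions used above can be given names so the sum reads more clearly, and dividing by 6 gives an average confidence per feature. A sketch, assuming the music.csv layout seen in the first data row (bars at index 14, beats 16, key 23, mode 26, tatums 28, time signature 32); the name total_confidence is hypothetical, and the function is checked here against Casual's row from above:

```python
# Column positions of the six confidence values in music.csv
CONFIDENCE_COLUMNS = {
    "bars": 14,
    "beats": 16,
    "key": 23,
    "mode": 26,
    "tatums": 28,
    "time_signature": 32,
}


def total_confidence(row):
    """Sum the six confidence values (each 0 to 1) in one CSV row."""
    return sum(float(row[i]) for i in CONFIDENCE_COLUMNS.values())


# Casual's data row, copied from the first data row above
casual = ['0.581793766', '0.401997543', 'ARD7TVE1187B99BFB1', '0.0', '0', '0.0',
          'Casual', '0.0', 'hip hop', '1.0', '300848', '0', '0.0', '0.0', '0.643',
          '0.58521', '0.834', '0.58521', '218.93179', '0.247', '0.60211999',
          'SOMZWCG12A8C13C480', '1.0', '0.736', '-11.197', '0', '0.636', '218.932',
          '0.779', '0.28519', '92.198', '4.0', '0.778', '0', '0']

total = total_confidence(casual)
print(round(total, 3))      # 4.406
print(round(total / 6, 3))  # 0.734
```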

To use the CORGIS music module instead of the CSV file, we need to import it, call its get_music function, and look at the keys to see what we have to work with.

In [13]:
# import the CORGIS music module and use it to load the dataset
import music as music_module
music = music_module.get_music()  # a list of nested dictionaries, one per song

In [13]:
music[0]

{'artist': {'familiarity': 0.581793766,
  'hotttnesss': 0.401997543,
  'id': 'ARD7TVE1187B99BFB1',
  'latitude': 0.0,
  'location': 0,
  'longitude': 0.0,
  'name': 'Casual',
  'similar': 0.0,
  'terms': 'hip hop',
  'terms_freq': 1.0},
 'release': {'id': 300848, 'name': 0},
 'song': {'artist_mbtags': 0.0,
  'artist_mbtags_count': 0.0,
  'bars_confidence': 0.643,
  'bars_start': 0.58521,
  'beats_confidence': 0.834,
  'beats_start': 0.58521,
  'duration': 218.93179,
  'end_of_fade_in': 0.247,
  'hotttnesss': 0.60211999,
  'id': 'SOMZWCG12A8C13C480',
  'key': 1.0,
  'key_confidence': 0.736,
  'loudness': -11.197,
  'mode': 0,
  'mode_confidence': 0.636,
  'start_of_fade_out': 218.932,
  'tatums_confidence': 0.779,
  'tatums_start': 0.28519,
  'tempo': 92.198,
  'time_signature': 4.0,
  'time_signature_confidence': 0.778,
  'title': 0,
  'year': 0}}

In [14]:
music[0].keys()

dict_keys(['artist', 'release', 'song'])
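Each record is a nested dictionary: the three top-level keys group the fields, and the inner keys hold the values that appeared as separate columns in the CSV file. A small sketch that flattens one record into dotted names such as artist.name; the helper flatten is hypothetical, and the sample record is trimmed from music[0] above:

```python
def flatten(record, prefix=""):
    """Flatten a nested dict into {'outer.inner': value} pairs."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Trimmed copy of music[0]
record = {
    "artist": {"familiarity": 0.581793766, "name": "Casual", "terms": "hip hop"},
    "release": {"id": 300848, "name": 0},
    "song": {"hotttnesss": 0.60211999, "tempo": 92.198, "title": 0},
}

flat = flatten(record)
print(flat["artist.name"], flat["song.tempo"])  # Casual 92.198
print(len(flat))                                # 8
```

Fields such as release name and song title hold 0 in this record, which appears to stand in for a missing value, so it is worth checking for that before treating them as text.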

With the module, we can look up fields by name instead of by column position. For example, a for loop can show the genre (the artist's terms value) of each artist listed in the dataset without the distraction of the other variables.

In [15]:
# a for loop that prints a sentence describing the kind of music each artist in the dataset makes
for song in music:
    print(f"{song['artist']['name']} makes {song['artist']['terms']} music.")

Casual makes hip hop music.
The Box Tops makes blue-eyed soul music.
Sonora Santanera makes salsa music.
Adam Ant makes pop rock music.
Gob makes pop punk music.
Jeff And Sheri Easter makes southern gospel music.
Rated R makes breakbeat music.
Tweeterfriendly Music makes post-hardcore music.
Planet P Project makes new wave music.
Clp makes breakcore music.
JennyAnyKind makes alternative rock music.
Wayne Watson makes ccm music.
Andy Andy makes bachata music.
Bob Azzam makes chanson music.
Lionel Richie makes quiet storm music.
Blue Rodeo makes country rock music.
Richard Souther makes chill-out music.
Faiz Ali Faiz makes qawwali music.
Tesla makes hard rock music.
lextrical makes indietronica music.
Jimmy Wakely makes classic country music.
Alice Stuart makes electric blues music.
Elena makes uk garage music.
The Dillinger Escape Plan makes math-core music.
SUE THOMPSON makes pop rock music.
Five Bolt Main makes post-grunge music.
Clp makes breakcore music.
Tim Wilson makes filk music.

We can see that the dataset includes a diverse group of artists whose music spans many genres, from hip hop and salsa to qawwali and filk.
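To put a number on that diversity, the terms values can be counted with collections.Counter. A sketch using a few records in the same nested shape as the get_music() output, with names and genres taken from the printed list above; with the real data, songs would simply be music:

```python
from collections import Counter

# A few records in the same nested shape as music.get_music() returns
songs = [
    {"artist": {"name": "Casual", "terms": "hip hop"}},
    {"artist": {"name": "Adam Ant", "terms": "pop rock"}},
    {"artist": {"name": "Clp", "terms": "breakcore"}},
    {"artist": {"name": "SUE THOMPSON", "terms": "pop rock"}},
]

genre_counts = Counter(song["artist"]["terms"] for song in songs)
print(len(genre_counts))            # 3 distinct genres
print(genre_counts.most_common(1))  # [('pop rock', 2)]
```

Because the dataset has one record per song, an artist who appears more than once (such as Clp in the list above) is counted once per song; counting distinct artist names first would give a per-artist tally instead.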