# Billboard Top 100

This dataset, available via Data.world, contains every weekly Hot 100 singles chart between 8/2/1958 and 12/28/2019 from Billboard.com. Each row of data represents a song and the corresponding position on that week's chart. Included in each row are the following elements:
1. Billboard Chart URL
2. WeekID
3. Song name
4. Performer name
5. SongID - Concatenation of song & performer
6. Week Position (the song's position on the current week's chart)
7. Instance (used to separate breaks in a song's chart run; for example, an instance of 6 means this is the sixth time the song has appeared on the chart)
8. Previous week position
9. Peak Position (as of the corresponding week)
10. Weeks on Chart (as of the corresponding week)
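WeekID is stored as a month/day/year string (8/2/1958 in the first row), and Instance counts separate chart runs. The following is a minimal sketch of parsing WeekID into a real date and reading the number of runs per song. It assumes the column names shown in the `df1.head()` output. The helper names `add_chart_dates` and `runs_per_song` are hypothetical, and "highest Instance = number of runs" assumes instances are numbered 1, 2, 3, ... as the description above suggests.

```python
import pandas as pd


def add_chart_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with WeekID (e.g. "8/2/1958") parsed into a datetime column."""
    out = df.copy()
    # WeekID is month/day/year without zero padding; %m and %d accept single digits
    out["week_date"] = pd.to_datetime(out["WeekID"], format="%m/%d/%Y")
    return out


def runs_per_song(df: pd.DataFrame) -> pd.Series:
    """Number of separate chart runs per SongID, taken as the highest Instance seen."""
    return df.groupby("SongID")["Instance"].max()


# A few rows copied from the outputs shown in this notebook
toy = pd.DataFrame({
    "WeekID": ["8/2/1958", "1/5/2019", "12/28/2019"],
    "SongID": ["Poor Little FoolRicky Nelson",
               "Jingle Bell RockBobby Helms",
               "Jingle Bell RockBobby Helms"],
    "Instance": [1, 7, 8],
})
dated = add_chart_dates(toy)
runs = runs_per_song(dated)
```

Parsing once up front means later questions about months, years, or seasons can use the `.dt` accessor instead of string slicing.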

Also available is a dataset containing each song's "audio features."

These include common attributes like genre as well as the following:
1. Danceability
2. Energy
3. Key
4. Loudness
5. Mode
6. Speechiness
7. Acousticness
8. Instrumentalness
9. Liveness
10. Valence
11. Tempo
12. Time Signature

Definitions of those terms are available at https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/

Here's an example:

Liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
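The 0.8 cutoff quoted above translates directly into a filter. The following is a minimal sketch, assuming the `liveness` column name seen in the `df2.head()` header. Treating "above 0.8" as a strict `>` is one reading of Spotify's wording. The function name is hypothetical, and the last toy row is a made-up live track added only so that something gets flagged.

```python
import pandas as pd

# Spotify's guidance quoted above: liveness above 0.8 strongly suggests a live recording
LIVE_THRESHOLD = 0.8


def flag_likely_live(features: pd.DataFrame, threshold: float = LIVE_THRESHOLD) -> pd.Series:
    """True where liveness exceeds the threshold; rows with missing liveness come out False."""
    return features["liveness"].gt(threshold)


# liveness values from the df2.head() output, plus one hypothetical live track
toy = pd.DataFrame({
    "Song": ["Adicto", "The Ones That Didn't Make It Back Home", "Shallow",
             "Enemies", "Hypothetical Live Track"],
    "liveness": [0.179, float("nan"), 0.231, 0.0955, 0.92],
})
live = flag_likely_live(toy)
```

Using `.gt` rather than `>` makes no difference here. Both return False for missing values, so songs without Spotify features are never flagged as live.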

I'm excited about this dataset. In addition to already-numeric attributes like tempo, measures like instrumentalness and speechiness give us some fun quantitative variables to play around with. I'm interested in looking for trends over time, as well as in what clusters of music might exist. It might also be interesting to look for seasonality in which songs spend the most time on the chart: is a song more likely to stay on the chart if it's released in the winter? There's a lot to explore here. Admittedly, with so much to explore, I'll have to be careful about how I tidy the data, since some of the choices I make in handling it will make some questions easier to answer and others harder.
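Here is one rough way to frame the winter-release question. Take each song's first chart week as its debut and its largest Weeks on Chart value as its total, then average the totals by debut month. This is a sketch under two stated assumptions. First, chart debut is only a proxy for release date, which this dataset does not contain. Second, Weeks on Chart keeps counting across re-entries, as the Jingle Bell Rock rows below suggest (25 at the end of one run, 26 at the start of the next). The toy rows and the function name are hypothetical.

```python
import pandas as pd


def weeks_by_debut_month(df: pd.DataFrame) -> pd.Series:
    """Mean total weeks on the chart, grouped by the calendar month of each song's first chart week."""
    dated = df.assign(week_date=pd.to_datetime(df["WeekID"], format="%m/%d/%Y"))
    per_song = dated.groupby("SongID").agg(
        debut=("week_date", "min"),             # first chart week, a proxy for release date
        total_weeks=("Weeks on Chart", "max"),  # counter keeps running across runs, so max = total
    )
    return per_song.groupby(per_song["debut"].dt.month)["total_weeks"].mean()


# Hypothetical rows, only to show the shape of the result
toy = pd.DataFrame({
    "SongID": ["Song A"] * 3 + ["Song B"] * 2 + ["Song C"] * 2,
    "WeekID": ["1/7/2017", "1/14/2017", "1/21/2017",
               "1/28/2017", "2/4/2017",
               "12/2/2017", "12/9/2017"],
    "Weeks on Chart": [1, 2, 3, 1, 2, 1, 2],
})
by_month = weeks_by_debut_month(toy)
```

On the full data, holiday songs would inflate the December figure, which is one more reason to decide how to handle them before drawing conclusions.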


```python
import numpy as np
import pandas as pd
```

```python
df1 = pd.read_csv("Hot Stuff.csv")
print(len(df1))
df1.head()
```

```
320495
```

| | url | WeekID | Week Position | Song | Performer | SongID | Instance | Previous Week Position | Peak Position | Weeks on Chart |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://www.billboard.com/charts/hot-100/1958-0... | 8/2/1958 | 1 | Poor Little Fool | Ricky Nelson | Poor Little FoolRicky Nelson | 1 | | 1 | 1 |
| 1 | http://www.billboard.com/charts/hot-100/1995-1... | 12/2/1995 | 1 | One Sweet Day | Mariah Carey & Boyz II Men | One Sweet DayMariah Carey & Boyz II Men | 1 | | 1 | 1 |
| 2 | http://www.billboard.com/charts/hot-100/1997-1... | 10/11/1997 | 1 | Candle In The Wind 1997/Something About The Wa... | Elton John | Candle In The Wind 1997/Something About The Wa... | 1 | | 1 | 1 |
| 3 | http://www.billboard.com/charts/hot-100/2006-0... | 7/1/2006 | 1 | Do I Make You Proud | Taylor Hicks | Do I Make You ProudTaylor Hicks | 1 | | 1 | 1 |
| 4 | http://www.billboard.com/charts/hot-100/2009-1... | 10/24/2009 | 1 | 3 | Britney Spears | 3Britney Spears | 1 | | 1 | 1 |


```python
df1.isnull().sum()
```

```
url                           0
WeekID                        0
Week Position                 0
Song                          0
Performer                     0
SongID                        0
Instance                      0
Previous Week Position    30784
Peak Position                 0
Weeks on Chart                0
dtype: int64
```

This is actually a really clean dataset. The 30,784 missing Previous Week Position values all appear to come from weeks when a song enters the chart, either for the first time or as a re-entry, so there is no previous position to record. I think I'll end up collapsing each song down to a single observation of how many weeks it was on the chart and what its peak position was, which will remove the missing data along the way.
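Here is a minimal sketch of that collapse. It groups on SongID, keeps the largest Weeks on Chart, and takes the best (lowest) Peak Position over every row. It does not use the Peak Position on the last row. In the Jingle Bell Rock rows below, Peak Position does not carry across instances: the final 12/28/2019 row shows 9 even though the song reached 8 on 1/5/2019. The function and output column names are hypothetical. The toy rows are copied from this notebook's outputs, so their totals reflect only those few rows.

```python
import pandas as pd


def collapse_songs(df: pd.DataFrame) -> pd.DataFrame:
    """One row per SongID: total weeks charted, best (lowest) peak, and number of runs."""
    return df.groupby("SongID", as_index=False).agg(
        Song=("Song", "first"),
        Performer=("Performer", "first"),
        weeks_on_chart=("Weeks on Chart", "max"),
        peak_position=("Peak Position", "min"),  # min over every row, not the last row's value
        runs=("Instance", "max"),
    )


# Rows copied from the Jingle Bell Rock and df1.head() outputs in this notebook
toy = pd.DataFrame({
    "SongID": ["Jingle Bell RockBobby Helms"] * 3 + ["Poor Little FoolRicky Nelson"],
    "Song": ["Jingle Bell Rock"] * 3 + ["Poor Little Fool"],
    "Performer": ["Bobby Helms"] * 3 + ["Ricky Nelson"],
    "Weeks on Chart": [29, 25, 20, 1],
    "Peak Position": [9, 8, 29, 1],
    "Instance": [8, 7, 6, 1],
})
songs = collapse_songs(toy).set_index("SongID")
```

Collapsing this way throws away when each run happened. The seasonal questions above therefore need the uncollapsed rows.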

```python
df1['Song']
```

```
0                                          Poor Little Fool
1                                             One Sweet Day
2         Candle In The Wind 1997/Something About The Wa...
3                                       Do I Make You Proud
4                                                         3
                                ...                        
320490                                     Jingle Bell Rock
320491                                     Jingle Bell Rock
320492                                     Jingle Bell Rock
320493                                     Jingle Bell Rock
320494                                     Jingle Bell Rock
Name: Song, Length: 320495, dtype: object
```

```python
df1[(df1['Song'] == 'Jingle Bell Rock') & (df1['Performer'] == 'Bobby Helms')].sort_values(
    by='Weeks on Chart', ascending=False)
```

| | url | WeekID | Week Position | Song | Performer | SongID | Instance | Previous Week Position | Peak Position | Weeks on Chart |
|---|---|---|---|---|---|---|---|---|---|---|
| 320494 | https://www.billboard.com/charts/hot-100/2019-... | 12/28/2019 | 9 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 8 | 15.0 | 9 | 29 |
| 320493 | https://www.billboard.com/charts/hot-100/2019-... | 12/21/2019 | 15 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 8 | 23.0 | 15 | 28 |
| 320492 | https://www.billboard.com/charts/hot-100/2019-... | 12/14/2019 | 23 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 8 | 47.0 | 23 | 27 |
| 320491 | https://www.billboard.com/charts/hot-100/2019-... | 12/7/2019 | 47 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 8 | | 47 | 26 |
| 320490 | https://www.billboard.com/charts/hot-100/2019-... | 1/5/2019 | 8 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 7 | 13.0 | 8 | 25 |
| 320489 | https://www.billboard.com/charts/hot-100/2018-... | 12/29/2018 | 13 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 7 | 15.0 | 13 | 24 |
| 320488 | https://www.billboard.com/charts/hot-100/2018-... | 12/22/2018 | 15 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 7 | 26.0 | 15 | 23 |
| 320487 | https://www.billboard.com/charts/hot-100/2018-... | 12/15/2018 | 26 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 7 | 33.0 | 26 | 22 |
| 320486 | https://www.billboard.com/charts/hot-100/2018-... | 12/8/2018 | 33 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 7 | | 33 | 21 |
| 320470 | http://www.billboard.com/charts/hot-100/2017-0... | 1/7/2017 | 29 | Jingle Bell Rock | Bobby Helms | Jingle Bell RockBobby Helms | 6 | 46.0 | 29 | 20 |
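The 12/8/2018 and 12/7/2019 rows above have no Previous Week Position even though the song had charted before. In other words, re-entries leave a gap too, not just debuts. Here is a sketch of how to check that every gap falls on the first week of a (SongID, Instance) run. It returns the rows that break the pattern, so an empty result supports the idea that the missing values are structural. The function name is hypothetical. Run start is defined as the earliest WeekID within each run, which is an assumption about how Instance is assigned.

```python
import pandas as pd


def unexplained_missing_prev(df: pd.DataFrame) -> pd.DataFrame:
    """Rows where 'Previous Week Position' is missing but the week is NOT the first of its
    (SongID, Instance) run. An empty result means every gap is a debut or re-entry."""
    dated = df.assign(week_date=pd.to_datetime(df["WeekID"], format="%m/%d/%Y"))
    run_start = dated.groupby(["SongID", "Instance"])["week_date"].transform("min")
    missing = dated["Previous Week Position"].isna()
    return dated.loc[missing & dated["week_date"].ne(run_start)]


# Jingle Bell Rock rows copied from the table above (NaN where the table is blank)
toy = pd.DataFrame({
    "SongID": ["Jingle Bell RockBobby Helms"] * 4,
    "WeekID": ["12/8/2018", "12/15/2018", "12/7/2019", "12/14/2019"],
    "Instance": [7, 7, 8, 8],
    "Previous Week Position": [None, 33.0, None, 47.0],
})
leftover = unexplained_missing_prev(toy)
```

On the full `df1`, `len(unexplained_missing_prev(df1))` should come out to 0 if the reading above is right. Any rows it returns are worth inspecting by hand.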


This is peculiar for a few reasons. Bobby Helms' Jingle Bell Rock has come back to the chart in eight separate runs, most recently in December 2019. It makes sense that seasonal music returns every season; Christmas music is the obvious case, though I'm not sure any other music is really this seasonal. Outside the season this music probably wouldn't be played, and it likely doesn't reflect any bigger trends. In a separate analysis, I might look at how Christmas music has changed over time and at how long it takes pop music trends to work their way into the most-played Christmas songs. Christmas music is so specific, though, that I'm going to remove it from the dataframe. The second file has genre information, so I'll see whether I can use that. Failing that, I could drop December entirely, or drop songs that first chart in late November or early December. Along the same lines, I could use the point where Christmas music starts charting each year to measure "Christmas creep," the feeling that the Christmas season begins earlier every year.
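Before genre data is joined in, one heuristic is to flag songs that keep returning and that chart almost exclusively in holiday months. For the flagged songs, the earliest November or December chart week in each year then gives a rough Christmas-creep series. This is a sketch under clearly hypothetical choices. The month set `HOLIDAY_MONTHS`, the cutoffs `min_share=0.9` and `min_runs=2`, and both function names are assumptions to tune, not anything defined by the dataset.

```python
import pandas as pd

# Hypothetical choices: which months count as the holiday season
HOLIDAY_MONTHS = {11, 12, 1}


def flag_seasonal(df: pd.DataFrame, min_share: float = 0.9, min_runs: int = 2) -> pd.Series:
    """True for SongIDs that keep returning and chart almost only in holiday months."""
    month = pd.to_datetime(df["WeekID"], format="%m/%d/%Y").dt.month
    share = month.isin(HOLIDAY_MONTHS).groupby(df["SongID"]).mean()  # fraction of chart weeks in season
    runs = df.groupby("SongID")["Instance"].max()
    return (share >= min_share) & (runs >= min_runs)


def earliest_fall_entry(df: pd.DataFrame, song_ids) -> pd.Series:
    """Earliest November/December chart week per calendar year for the given songs."""
    subset = df[df["SongID"].isin(song_ids)]
    dates = pd.to_datetime(subset["WeekID"], format="%m/%d/%Y")
    fall = dates[dates.dt.month >= 11]  # ignore January weeks, which belong to the previous season
    return fall.groupby(fall.dt.year).min()


# Rows copied from the outputs in this notebook
toy = pd.DataFrame({
    "SongID": ["Jingle Bell RockBobby Helms"] * 4 + ["Poor Little FoolRicky Nelson"],
    "WeekID": ["12/8/2018", "1/5/2019", "12/7/2019", "12/28/2019", "8/2/1958"],
    "Instance": [7, 7, 8, 8, 1],
})
seasonal = flag_seasonal(toy)
creep = earliest_fall_entry(toy, seasonal[seasonal].index)
```

The earliest re-entry date depends on when a song climbs back into the Top 100, not just on when stations start playing it. It is therefore a noisy stand-in for creep rather than a direct measurement.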

```python
df2 = pd.read_excel("Song_data.xlsx")
df2.head()
```

| | SongID | Performer | Song | spotify_genre | spotify_track_id | spotify_track_preview_url | spotify_track_album | spotify_track_explicit | spotify_track_duration_ms | spotify_track_popularity | ... | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AdictoTainy, Anuel AA & Ozuna | Tainy, Anuel AA & Ozuna | Adicto | ['pop reggaeton'] | 3jbT1Y5MoPwEIpZndDDwVq | | Adicto (with Anuel AA & Ozuna) | 0.0 | 270740.0 | 91.0 | ... | 10.0 | -4.803 | 0.0 | 0.0735 | 0.017 | 1.6e-05 | 0.179 | 0.623 | 80.002 | 4.0 |
| 1 | The Ones That Didn't Make It Back HomeJustin M... | Justin Moore | The Ones That Didn't Make It Back Home | ['arkansas country', 'contemporary country', '... | | | | | | | ... | | | | | | | | | | |
| 2 | ShallowLady Gaga & Bradley Cooper | Lady Gaga & Bradley Cooper | Shallow | ['dance pop', 'pop'] | 2VxeLyX666F8uXCJ0dZF8B | | A Star Is Born Soundtrack | 0.0 | 215733.0 | 88.0 | ... | 7.0 | -6.362 | 1.0 | 0.0308 | 0.371 | 0.0 | 0.231 | 0.323 | 95.799 | 4.0 |
| 3 | EnemiesPost Malone Featuring DaBaby | Post Malone Featuring DaBaby | Enemies | ['dfw rap', 'melodic rap', 'rap'] | 0Xek5rqai2jcOWCYWJfVCF | | Hollywood's Bleeding | 1.0 | 196760.0 | 86.0 | ... | 6.0 | -4.169 | 1.0 | 0.21 | 0.0588 | 0.0 | 0.0955 | 0.667 | 76.388 | 4.0 |
| 4 | Bacc At It AgainYella Beezy, Gucci Mane & Quavo | Yella Beezy, Gucci Mane & Quavo | Bacc At It Again | ['dfw rap', 'rap', 'southern hip hop', 'trap'] | 2biNa12dMbHJrHVFRt8JyO | https://p.scdn.co/mp3-preview/fa6fa6f6f363be29... | Bacc At It Again | 1.0 | 228185.0 | 61.0 | ... | 8.0 | -5.725 | 0.0 | 0.168 | 0.00124 | 1e-06 | 0.0716 | 0.856 | 135.979 | 4.0 |


```python
df2['spotify_genre'].value_counts()
```

```
[]                                                                                                                              2541
['contemporary country', 'country', 'country road']                                                                              315
['contemporary country', 'country', 'country road', 'modern country rock']                                                       279
['dance pop', 'pop', 'post-teen pop']                                                                                            252
['glee club', 'hollywood', 'post-teen pop']                                                                                      205
                                                                                                                                ... 
['deep house', 'disco house', 'diva house', 'hip house', 'tribal house', 'vocal house']                                            1
['adult standards', 'big band', 'cool jazz', 'easy listening', 'loung
```
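The genre values above are strings that look like Python lists, including 2541 empty `[]` entries. Counting them directly treats every genre combination as its own category. Here is a sketch that parses each string with `ast.literal_eval` and explodes to one row per genre. It assumes every non-missing value is a valid Python list literal; `literal_eval` raises an error otherwise. The function names are hypothetical, and the third toy row is a made-up song with an empty genre list.

```python
import ast

import pandas as pd


def parse_genres(value) -> list:
    """Turn a string such as "['dance pop', 'pop']" into a list; missing values become []."""
    if isinstance(value, str):
        # assumes every non-missing value is a valid Python list literal
        return ast.literal_eval(value)
    return []


def explode_genres(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (song, genre); songs with no genre keep a single row with NaN."""
    return df.assign(genre=df["spotify_genre"].map(parse_genres)).explode("genre")


# Genre strings from the df2.head() and value_counts() outputs above
toy = pd.DataFrame({
    "SongID": ["AdictoTainy, Anuel AA & Ozuna",
               "ShallowLady Gaga & Bradley Cooper",
               "Hypothetical song with no genre"],
    "spotify_genre": ["['pop reggaeton']", "['dance pop', 'pop']", "[]"],
})
long_genres = explode_genres(toy)
genre_counts = long_genres["genre"].value_counts()
```

The exploded frame repeats each song once per genre. Any per-song statistic should therefore be computed before exploding, or deduplicated on SongID afterward.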

# Next time:
1. df1:
    - clean up multiple occurrences
    - handle seasonal music
2. df2:
    - explore df2 further
    - expand the genre column
    - clean up the performer/song IDs to join the datasets
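Both files carry a SongID built by concatenating song and performer, so a join on that column is a natural first attempt. Here is a minimal sketch. The `indicator=True` flag marks chart rows with no feature match. The `validate="many_to_one"` argument makes pandas raise a `MergeError` if a SongID repeats in the features file, which would be the first sign that the IDs need cleaning. The function name and toy frames are hypothetical. Poor Little Fool comes out unmatched here only because the toy features frame omits it, which says nothing about the real file.

```python
import pandas as pd


def join_features(chart: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Left-join chart rows to Spotify features on SongID and record which rows matched."""
    # Song and Performer already exist in the chart data, so drop the copies from features
    feats = features.drop(columns=["Song", "Performer"], errors="ignore")
    # validate= raises pandas.errors.MergeError if a SongID repeats in features
    return chart.merge(feats, on="SongID", how="left", indicator=True,
                       validate="many_to_one")


chart = pd.DataFrame({"SongID": ["ShallowLady Gaga & Bradley Cooper",
                                 "Poor Little FoolRicky Nelson"]})
features = pd.DataFrame({"SongID": ["ShallowLady Gaga & Bradley Cooper"],
                         "Song": ["Shallow"],
                         "liveness": [0.231]})
joined = join_features(chart, features)
unmatched = joined.loc[joined["_merge"] == "left_only", "SongID"]
```

Listing the `left_only` SongIDs on the real data shows which songs are missing from the features file and which just have IDs that are spelled differently.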

    