# Data Exploration
## Background

From the _List of Billboard number-one singles_ Wikipedia [page](https://en.wikipedia.org/wiki/List_of_Billboard_number-one_singles): 
>The following year-by-year, week-by-week listings are based on statistics accrued by Billboard magazine before and after the inception of its Hot 100 popularity chart in August 1958.
All data is pooled from record purchases and radio/jukebox play within the United States. Later charts also include digital single sales, online streaming, and YouTube hits.


## Goals
Explore the data and answer the preliminary questions below.

### Preliminary Questions

1. Of all the songs that have gone to number one since 1950, do they share any commonalities, such as tempo or valence?
2. Are there any cyclical patterns? For example, many popular songs today sound like 70s disco or 80s synth music.

3. Are genres consistent among number ones? Do all songs have similar genres?
4. Are number-one songs sung more often by women or by men? By solo acts or by ensembles/groups?

These questions are just a starting point to guide this analysis. I'll describe further questions as I go.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')
%matplotlib inline

songs = pd.read_csv("../data/final_dataset.csv", index_col=0)
songs.head()

Unnamed: 0,track_id,decade,track_name,artist_name,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,zero_crossing_rate,duration_ms,time_signature,genres
0,6ymkab3FTjiFzSJwhal59m,1950,Rudolph The Red-Nosed Reindeer,Gene Autry,0.596,0.315,8,-9.175,1,0.0428,0.961,0.0,0.258,0.64,119.935,0.080147,171773,4,holiday
1,4oP8eYnsSKJPC4VNfPB7dZ,1950,"I Can Dream, Can't I? - Single Version",The Andrews Sisters,0.27,0.177,0,-9.791,1,0.0298,0.922,0.0,0.104,0.237,87.373,0.090326,160000,4,pop
2,0fVtEGoXeRhllDU9ChQAZl,1950,Rag Mop,The Ames Brothers,0.589,0.396,7,-13.58,1,0.237,0.705,0.0,0.108,0.979,200.533,0.073879,159948,4,"['adult standards', 'deep adult standards', 'e..."
3,7Jf323ttHKUnPylFWiaGl3,1950,Chattanoogie Shoe Shine Boy - 1949 Single Version,Red Foley,0.725,0.373,10,-15.925,1,0.0494,0.613,0.0202,0.118,0.846,148.367,0.066668,169000,4,country
4,0lO5EKoz1Rb1pJoPoldE4D,1950,(Put Another Nickel In) Music! Music! Music!,Teresa Brewer,0.752,0.443,2,-14.392,1,0.0398,0.667,2.1e-05,0.154,0.919,99.136,0.110896,160667,4,vocal
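
The `genres` column in the preview above appears to mix two formats: plain labels such as `holiday` and stringified Python lists, as in the "Rag Mop" row. Below is a minimal sketch of normalizing both to lists, which should make the genre question easier to answer later. `parse_genres` is a hypothetical helper, not part of the original notebook, and the sample values are shortened illustrations rather than rows from the dataset.

```python
import ast
import math


def parse_genres(value):
    """Return the genres in `value` as a list of strings.

    Assumption: a cell is either a plain label ("holiday"), a stringified
    Python list ("['a', 'b']"), or missing (None / NaN).
    """
    if isinstance(value, list):
        return value
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    text = str(value).strip()
    if text.startswith("["):
        try:
            # literal_eval only parses Python literals; it does not run code.
            return list(ast.literal_eval(text))
        except (ValueError, SyntaxError):
            # Malformed list text: keep it as a single label.
            return [text]
    return [text]


# Illustrative values only, not taken verbatim from the dataset.
sample = ["holiday", "['adult standards', 'deep adult standards']", None]
genre_lists = [parse_genres(v) for v in sample]
print(genre_lists)
```

With the real data this could be applied as `songs['genres'].apply(parse_genres)`, assuming the column really does follow the two formats seen in the preview.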


One question that came to mind: which decade had the most number-one hits?

In [4]:
songs['decade'].value_counts()

1970    253
1980    234
1960    216
1990    143
2000    129
1950    122
2010    116
Name: decade, dtype: int64

It looks like the 1970s had the most number-one hits, with the 80s and the 60s not far behind. Interestingly, the most recent decade, the 2010s, has the fewest.
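
To put these counts on a per-year scale, here is a rough sketch that lists the decades chronologically and divides each count by ten. The counts are copied from the `value_counts()` output above. The ten-years-per-decade divisor is an assumption: the dataset may not cover every year of the 1950s or 2010s, so those averages are only approximate.

```python
# Counts copied from the value_counts() output above.
hits_per_decade = {
    1970: 253,
    1980: 234,
    1960: 216,
    1990: 143,
    2000: 129,
    1950: 122,
    2010: 116,
}

# Assumption: every decade in the dataset covers ten full chart years.
YEARS_PER_DECADE = 10

avg_per_year = {}
for decade in sorted(hits_per_decade):
    avg_per_year[decade] = hits_per_decade[decade] / YEARS_PER_DECADE
    print(
        f"{decade}s: {hits_per_decade[decade]} hits, "
        f"about {avg_per_year[decade]:.1f} per year"
    )
```

In the notebook itself, `songs['decade'].value_counts().sort_index()` would give the same chronological ordering directly from the DataFrame.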

From this NME [article](https://www.nme.com/blogs/nme-blogs/billboard-100-shorter-explicit-2054269):
### The number of unique Number One singles is decreasing.
>In the ‘70s, the average number of Number Ones could be as high as 30 per year. In the past four years we haven’t had more than 13 unique Number Ones per year. As BBC Radio 1 boss Chris Price explained to NME last year, the stagnancy of the chart is probably down to streaming being counted in the charts:
“We’ve moved away from somebody walking into HMV and buying a physical single, or downloading a 99p download from iTunes, and we’re moving much more towards measuring engagement over time. It’s less like somebody walking into a shop and making a purchase, and more like somebody sitting at home in their bedroom and listening to something several times over.”