# XPN 2020

After taking a year off from playlists, they are back.
This time, instead of an A-Z walk through their catalog,
[WXPN](https://www.xpn.org) is back to listener-curated lists
with this year's [2020 Countdown](https://xpn.org/music-artist/885-countdown/2020/).
This time the station is doing their own stats page,
[2020 Countdown, by the Numbers](https://thekey.xpn.org/wxpn-2020-numbers/).
But for fun, I'll still do something.

In [1]:
%matplotlib inline
from IPython.display import display, HTML

## Status

It's Saturday morning.
It's day 3 of the countdown, which will probably only run a week.
[WXPN](https://xpn.org) has done a great job of doing their own stats.
Check out their stuff at [XPN 2020, by the numbers](https://thekey.xpn.org/wxpn-2020-numbers/).
I'll keep re-running this to keep the data up to date.
But I'm going to try to find some interesting insights to drill down on to add value,
since the station has the basics like "which artists are playing the most" covered.

* Add comparison with 2014's 885 best and 88 worst
* make joins to best and worst case insensitive, so ABBA matches Abba.  Thanks to [Joe Lynch](https://twitter.com/jwlynchjr) for catching that.
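The case-insensitive matching in the last item can be sketched in plain Python. This is only an illustration, not the notebook's own code: `match_key` is a hypothetical helper name, and it uses `str.casefold()` where the notebook's merge uses lower-cased columns.

```python
def match_key(text):
    """Return a case-insensitive key for comparing names across lists.

    Hypothetical helper, for illustration only.
    """
    return text.casefold()


# "ABBA" and "Abba" produce the same key, so the two spellings match.
same_artist = match_key("ABBA") == match_key("Abba")
```

Joining on a derived key like this leaves the original spelling untouched for display.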



## Loading the Data

Most people are probably not too interested in how I pull the data, clean it, and augment it.
If you are, the details are in my [Data Loading notebook](DataLoading.ipynb).
If you want a copy of the raw data to play with yourself, 
feel free to look at [xpn2020.csv](./data/xpn2020.csv).
If you do something interesting with the data,
please let me know.
Tell the station too, and post to [Twitter](https://twitter.com) with the hashtag `#XPN2020`.
I'm sure other listeners will be glad to see what you have done.
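As a starting point, here is a minimal sketch of poking at the data with pandas. To stay self-contained it reads a few rows copied from the playlist output later in this notebook rather than the real file, and the column names are taken from that output. With the real data you would pass `./data/xpn2020.csv` to `pd.read_csv` instead. Treating a `Year` of 0 as "unknown" is my assumption.

```python
import io

import pandas as pd

# A few rows copied from the playlist output in this notebook (not the full file).
sample_csv = io.StringIO("""Artist,Title,Air Time,Duration,Year
Booker T. & The MG's,Time Is Tight,2020-12-10 08:02:00,3,1980
AC/DC,T.N.T.,2020-12-10 08:05:00,6,1975
Peter Frampton,Show Me the Way,2020-12-10 08:11:00,5,1975
Adele,Rumor Has It,2020-12-10 08:19:00,5,0
""")
sample = pd.read_csv(sample_csv, parse_dates=['Air Time'])

# Assumption: a Year of 0 means the release year is unknown, so skip those rows.
known_years = sample[sample['Year'] > 0]

# Count how many tracks fall in each release year.
tracks_per_year = known_years.groupby('Year').size()
```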

In [2]:
import pandas as pd
from datetime import date, datetime, time, timedelta
from os import path
data_dir = './data'
playlist_file = path.join(data_dir, 'xpn2020.csv')
playlist = pd.read_csv(playlist_file)

# Parse air times; unparseable values become NaT rather than raising
playlist['Air Time'] = pd.to_datetime(playlist['Air Time'], errors='coerce')
# The most recent track plus its duration (in minutes) gives the "as of" time
last_play = playlist.loc[playlist['Air Time'].idxmax()]
end_time = last_play['Air Time'] + timedelta(minutes=int(last_play['Duration']))
HTML('<p>So far, as of %s, we have seen %d tracks with %d unique titles, from %d artists.</p>' %
    (end_time.strftime('%b %d %I:%M%p'),
     len(playlist),
     playlist['Title'].nunique(),
     playlist['Artist'].nunique()))
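The end-time arithmetic in the cell above treats `Duration` as whole minutes. Here is a small standalone sketch of the same calculation, using one row from the playlist output shown later in this notebook. That row is just a convenient example, not necessarily the latest track.

```python
from datetime import datetime, timedelta

# Row taken from the playlist output: "I Don't Know Why" aired at 08:41 with Duration 4.
air_time = datetime(2020, 12, 10, 8, 41)
duration_minutes = 4  # assumed to be minutes, as the 60-second multiplier implies

# The track ends one duration after it started.
end_time = air_time + timedelta(minutes=duration_minutes)
label = end_time.strftime('%b %d %I:%M%p')
```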

## Data Analysis

### Most Popular Artists

It's always interesting to see which artists show up the most.
There tends to be a bias toward nostalgia.
But there's usually some new stuff mixed in.
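A rough sketch of how such a tally could be computed with pandas follows. `top_artists` is a hypothetical helper name, and the stand-in data is the artist column from the top of the 2014 list shown later in this notebook, not the 2020 playlist itself.

```python
import pandas as pd


def top_artists(playlist, n=10):
    """Return the n artists with the most tracks, most frequent first."""
    return playlist['Artist'].value_counts().head(n)


# Stand-in data: artists from the top ten of the 2014 list in this notebook.
sample = pd.DataFrame({'Artist': [
    'Bruce Springsteen', 'Bob Dylan', 'John Lennon', 'The Beatles',
    'Bruce Springsteen', 'Led Zeppelin', 'The Rolling Stones',
    'Derek And The Dominoes', 'The Rolling Stones', 'The Beatles',
]})
counts = top_artists(sample, n=3)
```

The resulting series could then be handed to a bar chart, for example `sns.barplot(x=counts.values, y=counts.index)`.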

In [3]:
import seaborn as sns
import matplotlib.pyplot as plt


### Comparison with other XPN Playlists

#### 885 Best and 88 Worst

Back in 2014 there was a very similar [885 All Time Greatest Song](https://xpn.org/music-artist/885-countdown/2014/885-countdown-2014) countdown.
While this countdown has more than twice as many songs (2020 versus 885),
it might be interesting to see how they compare.

In [4]:

def overlapping_songs(list1, list2):
    """Return the Artist/Title pairs from list1 that also appear in list2, ignoring case."""
    list1a = list1.copy()
    list2a = list2.copy()
    list1a['Title_lc'] = list1a['Title'].str.lower()
    list1a['Artist_lc'] = list1a['Artist'].str.lower()
    list2a['Title_lc'] = list2a['Title'].str.lower()
    list2a['Artist_lc'] = list2a['Artist'].str.lower()
    intersect = pd.merge(list1a, list2a, how='inner', on=['Title_lc', 'Artist_lc'])
    # Select and rename by name so the result does not depend on column order
    return intersect[['Artist_x', 'Title_x']].rename(
        columns={'Artist_x': 'Artist', 'Title_x': 'Title'})

    
best885_file = path.join(data_dir, '885best.csv')
best885 = pd.read_csv(best885_file)
besties = overlapping_songs(playlist, best885)
besties.to_csv(path.join(data_dir, 'XPN2020_and_885Best.csv'), index=False)


worst88_file = path.join(data_dir, '88worst.csv')
worst88 = pd.read_csv(worst88_file)
horrors = overlapping_songs(playlist, worst88)
horrors.to_csv(path.join(data_dir, 'XPN2020_and_88Worst.csv'), index=False)

s= "<p>Of the %d tracks in the 2020 Countdown so far, " + \
    "%d or %0.2f%% were in 2014's 885 best playlist. " + \
    "Those are available as <a href='data/XPN2020_and_885Best.csv'>XPN2020_and_885Best.csv</a>. " + \
    "Sadly %d were in 2014's 88 worst playlist. " + \
    "Those are available as <a href='data/XPN2020_and_88Worst.csv'>XPN2020_and_88Worst.csv</a>.</p>"
HTML(s %(len(playlist), len(besties), float(len(besties) * 100) / float(len(playlist)),
         len(horrors)))

It's odd to be seeing so many of the 885 best already.
We're not even into the top 1000 yet, so those songs have dropped quite a bit since 2014.

And what were those worst songs that were apparently someone's favorites?

#### How the Tracks Changed Place

In [10]:
best885.insert(0, 'Rank', range(1, 1 + len(best885)))
HTML(best885.to_html())

|   | Rank | Title | Artist |
|---|---|---|---|
| 0 | 1 | Thunder Road | Bruce Springsteen |
| 1 | 2 | Like A Rolling Stone | Bob Dylan |
| 2 | 3 | Imagine | John Lennon |
| 3 | 4 | A Day In The Life | The Beatles |
| 4 | 5 | Born To Run | Bruce Springsteen |
| 5 | 6 | Stairway To Heaven | Led Zeppelin |
| 6 | 7 | Gimme Shelter | The Rolling Stones |
| 7 | 8 | Layla | Derek And The Dominoes |
| 8 | 9 | Sympathy For The Devil | The Rolling Stones |
| 9 | 10 | Hey Jude | The Beatles |


In [21]:
#playlist.insert(0, 'Rank', range(2020, 2020 - len(playlist), -1))
#HTML(playlist.to_html(columns=['Rank', 'Title', 'Artist']))
HTML(playlist.to_html())

|   | Rank | Artist | Title | Air Time | Letter | First Word | Duration | Year |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Booker T. & The MG's | Time Is Tight | 2020-12-10 08:02:00 | T | Time | 3 | 1980 |
| 1 | 2019 | AC/DC | T.N.T. | 2020-12-10 08:05:00 | T | T | 6 | 1975 |
| 2 | 2018 | Peter Frampton | Show Me the Way | 2020-12-10 08:11:00 | S | Show | 5 | 1975 |
| 3 | 2017 | The Drifters | Under The Boardwalk | 2020-12-10 08:16:00 | U | Under | 3 | 1989 |
| 4 | 2016 | Adele | Rumor Has It | 2020-12-10 08:19:00 | R | Rumor | 5 | 0 |
| 5 | 2015 | Smith | Baby It's You | 2020-12-10 08:24:00 | B | Baby | 4 | 1969 |
| 6 | 2014 | Aretha Franklin | Call Me | 2020-12-10 08:28:00 | C | Call | 3 | 1970 |
| 7 | 2013 | Marvin Gaye & Kim Weston | It Takes Two | 2020-12-10 08:31:00 | I | It | 3 | 1966 |
| 8 | 2012 | Curtis Mayfield | Superfly | 2020-12-10 08:34:00 | S | Superfly | 7 | 1973 |
| 9 | 2011 | Shawn Colvin | I Don't Know Why | 2020-12-10 08:41:00 | I | I | 4 | 1992 |


In [32]:
import string

# Translation table that deletes punctuation and whitespace (Python 3 form of str.translate)
strip_chars = str.maketrans('', '', string.punctuation + string.whitespace)

def normalize(series):
    """Lower-case and strip punctuation and whitespace so near-identical names match."""
    return series.str.lower().str.translate(strip_chars)

best885['Title_lc'] = normalize(best885['Title'])
best885['Artist_lc'] = normalize(best885['Artist'])

# Keep only the columns needed for the join, normalized the same way
xpn2020 = playlist[['Rank', 'Artist', 'Title']].copy()
xpn2020['Artist'] = normalize(xpn2020['Artist'])
xpn2020['Title'] = normalize(xpn2020['Title'])
xpn2020.columns = ['Rank', 'Artist_lc', 'Title_lc']
composite = pd.merge(best885, xpn2020, how='left', on=['Title_lc', 'Artist_lc'], suffixes=('_885', '_2020'))
composite = composite.drop(columns=['Title_lc', 'Artist_lc'])
# Positive Change means the song ranks better in 2020 than it did in 2014
composite['Change'] = composite['Rank_885'] - composite['Rank_2020']
HTML(composite.to_html())


|   | Rank_885 | Title | Artist | Rank_2020 | Change |
|---|---|---|---|---|---|
| 0 | 1 | Thunder Road | Bruce Springsteen | | |
| 1 | 2 | Like A Rolling Stone | Bob Dylan | | |
| 2 | 3 | Imagine | John Lennon | | |
| 3 | 4 | A Day In The Life | The Beatles | | |
| 4 | 5 | Born To Run | Bruce Springsteen | | |
| 5 | 6 | Stairway To Heaven | Led Zeppelin | | |
| 6 | 7 | Gimme Shelter | The Rolling Stones | | |
| 7 | 8 | Layla | Derek And The Dominoes | | |
| 8 | 9 | Sympathy For The Devil | The Rolling Stones | | |
| 9 | 10 | Hey Jude | The Beatles | | |
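
The empty `Rank_2020` and `Change` cells above are what a left merge produces for songs from the 2014 list that have no match in the 2020 data so far. Here is a minimal sketch of that behavior and one way to keep only the matched songs. The song names and ranks are placeholders invented purely for illustration, not real countdown data.

```python
import pandas as pd

# Placeholder songs and ranks, invented purely for illustration.
old = pd.DataFrame({'Title': ['Song A', 'Song B', 'Song C'], 'Rank': [1, 2, 3]})
new = pd.DataFrame({'Title': ['Song B', 'Song C'], 'Rank': [40, 2]})

# A left merge keeps every row of `old`; songs missing from `new` get NaN.
merged = pd.merge(old, new, how='left', on='Title', suffixes=('_885', '_2020'))
merged['Change'] = merged['Rank_885'] - merged['Rank_2020']

# Keep only songs that have appeared in both lists.
matched = merged.dropna(subset=['Rank_2020'])
```

With this sign convention a negative `Change` means the song fell and a positive one means it climbed.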
