# Analyzing the ASOT Top 1000
> Celebrating 1,000 episodes of A State of Trance.

- toc: true 
- badges: true
- comments: false
- categories: [asot, bpm, artist, year]
- image: images/annual-avg-bpm.png

In [2]:
#hide
%pip install spotipy pyyaml altair

In [3]:
#hide
import os
import yaml
import spotipy
import json
import altair as alt
import numpy as np
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

## Introduction

To celebrate the 1,000th episode of A State of Trance, the radio show invited listeners to vote for their all-time favorite trance tracks, and the resulting list was broadcast as [ASOT 1000](https://www.astateoftrance.com/episodes/asot1000/).

In this post we'll analyze the top 1,000: which artists, BPMs, and years are most represented? And more!

## Some Housekeeping

As with previous posts here, we'll be pulling data from Spotify and graphing the results. While there is [an official "ASOT Top 1000"](https://open.spotify.com/playlist/5QafFMGgQKGwqgV7k3qHy6) playlist on Spotify, I'm opting to instead use the "[ASOT TOP 1000 Countdown Extended](https://open.spotify.com/playlist/5DCcjCLMlPjTwKLCcYyzIj)" playlist [compiled by reddit user turbodevin](https://www.reddit.com/r/trance/comments/l2ae9y/relive_the_asot_top_1000_countdown_in_your_own/). As Devin writes,
> MISSING
>
>    531 || Sean Callery - The Longest Day (Armin van Buuren Remix)
>
> REMIX NOT AVAILABLE
>
>    414 || Faithless - Insomnia (Andrew Rayel Remix)
>
>    520 || Safri Duo - Played A Live (The Bongo Song) [NWYR & Willem de Roo Remix]
>
>    530 || Kensington - Sorry (Armin van Buuren Remix)
>
>    635 || Ilse de Lange - The Great Escape (Armin van Buuren Remix)
>
>    661 || Zedd feat. Foxes - Clarity (Andrew Rayel Remix)

While the playlist may not be complete, I'd still consider it to be the most complete playlist available on Spotify. Using extended mixes instead of the official playlist's radio mixes is certainly preferable, at least.

Remember, all data here is pulled directly from Spotify's API without any modification from my end*. See the post on [Methodology](https://scottbrenner.github.io/asot-jupyter/asot/bpm/2020/04/27/methodology.html) for details on what data we can pull from Spotify, and how.

\*[Spotify's API for "Get a Playlist's Items" limits us to getting 100 tracks at a time](https://developer.spotify.com/documentation/web-api/reference/#category-playlists). Let's make 10 API calls for 100 tracks each, incrementing `offset` each time, and save the results.

In [4]:
"""
User: https://open.spotify.com/user/113444659
Playlist: ASOT TOP 1000 Countdown Extended
Playlist link: https://open.spotify.com/playlist/5DCcjCLMlPjTwKLCcYyzIj
Playlist ID: 5DCcjCLMlPjTwKLCcYyzIj
"""
top_1000_playlist = '5DCcjCLMlPjTwKLCcYyzIj'

top_1000_tracks = []

# Get full details of the tracks and episodes of a playlist
# https://spotipy.readthedocs.io/en/2.16.1/#spotipy.client.Spotify.playlist_items
# Each call returns at most 100 items, so page through with `offset`
for offset in range(0, 1000, 100):
    page = sp.playlist_tracks(top_1000_playlist, offset=offset)
    top_1000_tracks.extend(page['items'])
print(len(top_1000_tracks))

1000
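
If the playlist length isn't known ahead of time, the same paging idea can be written as a loop that stops at the first empty page. This is a minimal sketch, assuming a hypothetical `fetch_page(offset, limit)` callable that would wrap something like `sp.playlist_tracks(top_1000_playlist, offset=offset, limit=limit)['items']`. The helper name `fetch_all_items` is mine and not part of spotipy, and the fake playlist is there only so the example runs offline.

```python
def fetch_all_items(fetch_page, limit=100):
    """Collect items page by page until a page comes back empty.

    `fetch_page` is a hypothetical callable taking (offset, limit) and
    returning a list of items, e.g. a thin wrapper around a playlist call.
    """
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        items.extend(page)
        offset += len(page)
    return items


# Stand-in for the API so the sketch runs offline: 250 fake items.
fake_playlist = list(range(250))


def fake_fetch(offset, limit):
    return fake_playlist[offset:offset + limit]


all_items = fetch_all_items(fake_fetch)
print(len(all_items))
```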


In [5]:
# What's #1?
print(top_1000_tracks[999]['track']['artists'][0]['name'], '-', top_1000_tracks[999]['track']['name'])

Armin van Buuren - Shivers


## Artists

Let's begin by looking at the artists who made the top 1000 - how many unique artists were featured?

In [6]:
unique_artists = set()

for track in top_1000_tracks:
    for artist in track['track']['artists']:
        unique_artists.add(artist['name'])

print(len(unique_artists))

639


Which artists were featured the most?

In [7]:
from collections import defaultdict

artist_counter = defaultdict(int)

for track in top_1000_tracks:
    for artist in track['track']['artists']:
        artist_counter[artist['name']] += 1


top_artists = sorted(artist_counter.items(), key=lambda k_v: k_v[1], reverse=True)
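
The same tally can be sketched with `collections.Counter`, whose `most_common()` returns the pairs already sorted by count. The `sample_tracks` list below is made-up data in the same shape as Spotify's playlist items, used only so the example runs on its own.

```python
from collections import Counter

# Made-up sample in the shape of Spotify playlist items (not real chart data).
sample_tracks = [
    {'track': {'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'artists': [{'name': 'Artist A'}]}},
    {'track': {'artists': [{'name': 'Artist C'}]}},
]

# Count every credited artist on every track.
artist_counts = Counter(
    artist['name']
    for item in sample_tracks
    for artist in item['track']['artists']
)

# most_common(n) returns (name, count) pairs, highest count first.
top_sample_artists = artist_counts.most_common(2)
print(top_sample_artists)
```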

Alright, let's see the top 25 in a graph.

In [8]:
source = pd.DataFrame.from_dict(top_artists[:25])

bars = alt.Chart(source).mark_bar().encode(
    x=alt.X('1:Q', title='Plays'),
    y=alt.Y('0:N', sort='-x', title='Artist')
).properties(
    title="ASOT Top 1000 - Most-played artists",
    width=600
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='1:Q'
)

bars + text

No surprise at _who_ the #1 is, but the sheer number of their tracks featured is pretty impressive - over 10% of the ASOT Top 1000 was produced by Armin van Buuren, more than twice the number of the second-most featured artist!

Which artists were featured exactly once, with what track, at what position?

In [9]:
# Find all artists with one play, then find that track in the top 1000
for artist in top_artists:
    if artist[1] == 1:
        for position, track in enumerate(top_1000_tracks):
            if track['track']['artists'][0]['name'] == artist[0]:
                print(1000 - position, '.', track['track']['artists'][0]['name'], '-', track['track']['name'])

997 . ATN - Miss A Day - Original Mix
996 . Late Night Alumni - Empty Streets - Lumïsade Balearic Mix
993 . Ron van den Beuken - Timeless - Ron van den Beuken Remix
992 . Greg Downey - These Hands I Hold - Sean Tyas Remix
991 . M.I.K.E. - Chocolate Infusion - Original Mix
989 . Adam Nickey - Never Gone - Original Mix [Above & Beyond Respray]
987 . Salt Tank - Eugina - Michael Woods Remix
981 . A Force - Crystal Dawn [ASOT 254] - A Tribute To '99 Remix
969 . Myon & Shane 54 - Ibiza Sunrise - Classic Dub
967 . Neptune Project - Aztec - Original Mix
964 . Ava Mea - In The End - Original Mix
963 . Rodg - High On Life - Extended Mix
962 . Midway - Monkey Forest - Original Mix Edit
961 . Ramin Djawadi - Game Of Thrones Theme - Armin van Buuren Extended Remix
960 . Filterheadz - Yimanya - Original Mix
950 . Jody Wisternoff - The Bridge - Chicane Rework
943 . Probspot - Foreplay - Original Mix
940 . Selu Vibra - Stargazing [ASOT 224] - Original Mix
928 . Ernesto vs. Bastian - Dark Side Of The 

Note that each line credits only the artist who appears exactly once in the top 1,000, not everyone on the track. For example, "120. Darren Tate & Jono Grant – Shine (Let The Light Shine In)" is listed here, but with _only_ Darren Tate as the artist, because Jono Grant also appears in "562. Jono Grant vs Mike Koglin – Circuits". The loop also matches only the first-listed artist on each track, so an artist whose single credit is in a secondary position is not printed.
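
The loop above only matches an artist when they are the first name on a track. To also catch artists credited once in a secondary position, one option is to compare every artist on a track against the set of single-appearance names. This is a sketch on made-up data shaped like Spotify's playlist items; `sample_tracks`, `single_names`, and `once_credited` are illustrative names only.

```python
from collections import Counter

# Made-up sample, ordered from #3 down to #1 like the countdown playlist.
sample_tracks = [
    {'track': {'name': 'Track X',
               'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'name': 'Track Y', 'artists': [{'name': 'Artist A'}]}},
    {'track': {'name': 'Track Z', 'artists': [{'name': 'Artist C'}]}},
]

counts = Counter(
    artist['name']
    for item in sample_tracks
    for artist in item['track']['artists']
)
single_names = {name for name, plays in counts.items() if plays == 1}

once_credited = []
total = len(sample_tracks)
for index, item in enumerate(sample_tracks):
    for artist in item['track']['artists']:
        if artist['name'] in single_names:
            # The playlist runs from last place to first, so invert the index.
            once_credited.append(
                (total - index, artist['name'], item['track']['name'])
            )

for position, name, title in once_credited:
    print(position, '.', name, '-', title)
```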

## Tracks

Let's look at some track-specific numbers now.

In which years were the tracks produced?

In [10]:
annual_total = defaultdict(int)

for track in top_1000_tracks:
    annual_total[track['track']['album']['release_date'][:4]] += 1

top_years = sorted(annual_total.items(), key=lambda k_v: k_v[1])
print(top_years)

[('1992', 1), ('1995', 1), ('1997', 2), ('1996', 5), ('1998', 5), ('1999', 7), ('2002', 12), ('2001', 14), ('2000', 18), ('2003', 23), ('2004', 25), ('2005', 32), ('2006', 37), ('2007', 38), ('2008', 45), ('2017', 49), ('2015', 50), ('2014', 54), ('2020', 55), ('2016', 55), ('2010', 56), ('2009', 62), ('2013', 64), ('2018', 64), ('2011', 66), ('2012', 68), ('2019', 92)]


In a graph:

In [11]:
source = pd.DataFrame.from_dict(top_years)

bars = alt.Chart(source).mark_bar().encode(
    x=alt.X('1:Q', title='Plays'),
    y=alt.Y('0:N', sort='-x', title='Year')
).properties(
    title="ASOT Top 1000 - Most-represented years",
    width=600
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='1:Q'
)

bars + text

Might be better to see it sorted by year:

In [12]:
source = pd.DataFrame.from_dict(top_years)

bars = alt.Chart(source).mark_bar().encode(
    x=alt.X('1:Q', title='Plays'),
    y=alt.Y('0:N', title='Year')
).properties(
    title="ASOT Top 1000 - Yearly representation",
    width=600
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='1:Q'
)

bars + text
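
For a quick text view in chronological order, sorting the `(year, count)` pairs by their first element is enough, since four-digit year strings sort the same way as the numbers they represent. A small sketch using a handful of the counts printed earlier:

```python
# A few (year, count) pairs copied from the printed output.
sample_years = [('1992', 1), ('2019', 92), ('2000', 18), ('2012', 68)]

# Tuples sort by their first element, and 'YYYY' strings sort chronologically.
by_year = sorted(sample_years)

for year, count in by_year:
    print(year, count)
```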

What's the average BPM of tracks in the top 1,000?

In [13]:
total_bpm = 0

for track in top_1000_tracks:
    total_bpm += sp.audio_features(track['track']['uri'])[0]['tempo']

print(total_bpm/1000)

134.03799499999994
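
Calling `audio_features` once per track means 1,000 requests. The method also accepts a list of tracks, and Spotify's endpoint takes up to 100 IDs per request, so a similar average could be computed in about ten calls. A hedged sketch: `chunked` and `average_tempo` are helper names I made up, `client` is assumed to behave like the `sp` object above, and entries that come back as `None` (tracks with no audio analysis) are skipped rather than counted.

```python
def chunked(items, size=100):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def average_tempo(client, uris):
    """Average the 'tempo' of the given track URIs, 100 per request.

    `client` is assumed to expose audio_features(list_of_uris) the way
    spotipy.Spotify does.
    """
    tempos = []
    for batch in chunked(uris):
        for features in client.audio_features(batch):
            if features is not None:  # no audio analysis for this track
                tempos.append(features['tempo'])
    return sum(tempos) / len(tempos) if tempos else None


class FakeClient:
    """Offline stand-in for spotipy.Spotify, for demonstration only."""

    def audio_features(self, tracks):
        # Pretend every track is 138 BPM except one with no features.
        return [None if uri == 'missing' else {'tempo': 138.0} for uri in tracks]


uris = ['track-%d' % i for i in range(250)] + ['missing']
print(average_tempo(FakeClient(), uris))
```

With the real client this would look like `average_tempo(sp, [t['track']['uri'] for t in top_1000_tracks])`. Note that it averages over the tracks that returned features, not over a fixed 1,000.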


Maybe that's not so useful. How does the track BPM vary throughout the top 1,000? With #1,000 on the left, down to #1 on the right.

In [14]:
bpm = []
for track in top_1000_tracks:
    tempo = sp.audio_features(track['track']['uri'])[0]['tempo']
    if tempo < 100 or tempo > 150: # outliers, details below
        bpm.append(138)  # stand-in value so outliers stay inside the 100-150 axis
    else:
        bpm.append(tempo)  # reuse the value fetched above instead of a second API call

x = np.arange(len(top_1000_tracks))   

source = pd.DataFrame({
  'track': x,
  'bpm': np.array(bpm)
})

source['138'] = 138

base = alt.Chart(source).mark_line().encode(
    alt.X('track'),
    alt.Y('bpm', scale=alt.Scale(domain=(100, 150))),
).properties(
    title="ASOT Top 1000 - BPM of track"
)

rule = alt.Chart(source).mark_rule(color='red').encode(
    y='138'
)

base + rule

That's not the best way to visualize it. How about a semi-interactive scatter plot?

In [75]:
# .interactive() already provides pan and zoom, so no separate selection is needed
detail = (
    alt.Chart(source)
    .mark_point()
    .encode(
        # `track` is a numeric position index, so encode it as quantitative (Q)
        x=alt.X("track:Q"),
        y=alt.Y("bpm:Q", scale=alt.Scale(zero=False)),
        color="bpm",
        tooltip=['bpm', 'track']
    )
    .properties(width=600, height=400, title="BPM of ASOT Top 1000 -- detail view")
).interactive()

detail

(Couldn't figure out how to get track titles and artists in the tooltips)
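
One approach that may work is to put the artist and title into the DataFrame as extra columns, since Altair's `tooltip` encoding takes a list of column names. A sketch under that assumption: `tooltip_rows` and `tooltip_chart` are hypothetical helpers, and the sample items are made up.

```python
def tooltip_rows(playlist_items, tempos):
    """Build one row per track with position, artist, title and BPM."""
    total = len(playlist_items)
    rows = []
    for index, (item, tempo) in enumerate(zip(playlist_items, tempos)):
        track = item['track']
        rows.append({
            'position': total - index,  # playlist runs from last place to #1
            'artist': ' & '.join(a['name'] for a in track['artists']),
            'title': track['name'],
            'bpm': tempo,
        })
    return rows


def tooltip_chart(rows):
    """Scatter plot whose tooltip shows artist and title for each point."""
    # Imported here so tooltip_rows stays usable without the plotting stack.
    import altair as alt
    import pandas as pd

    return (
        alt.Chart(pd.DataFrame(rows))
        .mark_point()
        .encode(
            x=alt.X('position:Q'),
            y=alt.Y('bpm:Q', scale=alt.Scale(zero=False)),
            tooltip=['position', 'artist', 'title', 'bpm'],
        )
        .interactive()
    )


# Made-up sample in the shape of Spotify playlist items.
sample_items = [
    {'track': {'name': 'Track X',
               'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'name': 'Track Y', 'artists': [{'name': 'Artist C'}]}},
]
rows = tooltip_rows(sample_items, [138.0, 132.0])
print(rows[0])
```

In the notebook, `tooltip_chart(tooltip_rows(top_1000_tracks, bpm))` would render the chart, assuming `bpm` holds one tempo per track in playlist order.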

There are a few "outliers" that kind of throw off the graph, so let's look at the tracks in the top 1,000 with the lowest and highest BPMs.

In [152]:
for position, track in enumerate(top_1000_tracks):
    tempo = sp.audio_features(track['track']['uri'])[0]['tempo']
    if tempo <= 100 or tempo >= 142: # "outliers"
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '-', tempo, 'BPM')

716 . Armin van Buuren & Alexander Popov - Popcorn - Extended Mix - 94.915 BPM
654 . Cygnus X - The Orange Theme - Ferry Corsten's Moonman Orange Juice Remix - 144.988 BPM
635 . Ilse DeLange - The Great Escape - 98.69 BPM
531 . Kyuss - Yeah - 0 BPM
432 . Will Atkinson - Telescope - Extended Mix - 142.002 BPM
176 . Above & Beyond - Sun In Your Eyes - Original Mix - 183.967 BPM
97 . John O'Callaghan & Bryan Kearney - Exactly - Original Mix - 142.016 BPM
88 . Giuseppe Ottaviani - Linking People - Original Mix - 143.012 BPM


## Results

Let's see what we've got!

In [15]:
source = pd.DataFrame([(k, v) for k, v in annual_avg_bpm.items()], 
                   columns=['Year', 'Average Episode BPM'])
source['138'] = 138

base = alt.Chart(source).mark_line().encode(
    x=alt.X('Year'),
    y=alt.Y('Average Episode BPM', scale=alt.Scale(domain=(130, 140))),
).properties(
    title="A State of Trance - Annual Average BPM of Episode",
    width=600
)

rule = alt.Chart(source).mark_rule(color='red').encode(
    y='138'
)

base + rule

NameError: name 'annual_avg_bpm' is not defined

Straightforward enough. In the coming posts we'll do something similar, looking at the most-played artists and tracks each year.