# Analyzing the ASOT Top 1000
> Celebrating 1,000 episodes of A State of Trance.

- toc: true 
- badges: true
- comments: false
- categories: [asot, bpm, artist, year]
- image: images/most-played-artists.png

In [1]:
#hide
%pip install spotipy pyyaml altair

Collecting altair
  Downloading altair-4.1.0-py3-none-any.whl (727 kB)
Collecting pandas>=0.18
  Downloading pandas-1.2.1-cp38-cp38-manylinux1_x86_64.whl (9.7 MB)
Collecting numpy
  Downloading numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)
Collecting pytz>=2017.3
  Downloading pytz-2020.5-py2.py3-none-any.whl (510 kB)
Collecting spotipy
  Downloading spotipy-2.16.1-py3-none-any.whl (24 kB)
Collecting toolz
  Downloading toolz-0.11.1-py3-none-any.whl (55 kB)
Installing collected packages: pytz, numpy, toolz, pandas, spotipy, altair
Successfully installed altair-4.1.0 numpy-1.19.5 pandas-1.2.1 pytz-2020.5 spot

In [2]:
#hide
import os
import yaml
import spotipy
import json
import altair as alt
import numpy as np
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

# SpotifyClientCredentials reads SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET from the environment
sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

## Introduction

To celebrate the 1,000th episode of A State of Trance, the radio show invited fans to vote for their all-time favorite trance tracks, and the resulting list was broadcast as [ASOT 1000](https://www.astateoftrance.com/episodes/asot1000/).

In this post we'll analyze the Top 1000: which artists, BPMs, and years are most represented? And more!

## Some Housekeeping

As with previous posts here, we'll be pulling data from Spotify and graphing the results. While there is [an official "ASOT Top 1000"](https://open.spotify.com/playlist/5QafFMGgQKGwqgV7k3qHy6) playlist on Spotify, I'm instead using the "[ASOT TOP 1000 Countdown Extended](https://open.spotify.com/playlist/5DCcjCLMlPjTwKLCcYyzIj)" playlist [compiled by Reddit user turbodevin](https://www.reddit.com/r/trance/comments/l2ae9y/relive_the_asot_top_1000_countdown_in_your_own/). As Devin writes,

> I used a filler track (4 seconds) for the missing song, to keep the song numbers corresponding to the ranking. When an extended version was not available, a shorter version is used. When a remix is not available, the regular version is used when available.
> MISSING
>
>    531 || Sean Callery - The Longest Day (Armin van Buuren Remix)
>
> REMIX NOT AVAILABLE
>
>    414 || Faithless - Insomnia (Andrew Rayel Remix)
>
>    520 || Safri Duo - Played A Live (The Bongo Song) [NWYR & Willem de Roo Remix]
>
>    530 || Kensington - Sorry (Armin van Buuren Remix)
>
>    635 || Ilse de Lange - The Great Escape (Armin van Buuren Remix)
>
>    661 || Zedd feat. Foxes - Clarity (Andrew Rayel Remix)

While the playlist may not be complete, I'd still consider it the most complete playlist available on Spotify. Using extended mixes over the official playlist's radio mixes is certainly preferable, at least.

Remember, all data here is pulled directly from Spotify's API without any modification from my end. See the post on [Methodology](https://scottbrenner.github.io/asot-jupyter/asot/bpm/2020/04/27/methodology.html) for details on what data we can pull from Spotify, and how. Notably, Spotify's [`AudioFeaturesObject`](https://developer.spotify.com/documentation/web-api/reference/#objects-index) lists `tempo` as "overall estimated tempo of a track in beats per minute (BPM)" - keyword being _estimate_. I've done little to account for any inconsistencies and nothing to address them!

[Spotify's API for "Get a Playlist's Items" limits us to getting 100 tracks at a time](https://developer.spotify.com/documentation/web-api/reference/#category-playlists). Let's make 10 API calls for 100 tracks each, incrementing `offset` each time, and save the results.

In [3]:
"""
User: https://open.spotify.com/user/113444659
Playlist: ASOT TOP 1000 Countdown Extended
Playlist link: https://open.spotify.com/playlist/5DCcjCLMlPjTwKLCcYyzIj
Playlist ID: 5DCcjCLMlPjTwKLCcYyzIj
"""
top_1000_playlist = '5DCcjCLMlPjTwKLCcYyzIj'

top_1000_tracks = []

# Get full details of the tracks and episodes of a playlist, 100 items per call
# https://spotipy.readthedocs.io/en/2.16.1/#spotipy.client.Spotify.playlist_items
for offset in range(0, 1000, 100):
    top_1000_tracks.extend(sp.playlist_tracks(top_1000_playlist, offset=offset)['items'])
print(len(top_1000_tracks))

1000
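Making ten calls with fixed offsets works here because we know the playlist holds exactly 1,000 items. A sketch that doesn't rely on the count follows Spotify's paging object instead: each page carries a `next` field that is `None` on the last page. `collect_all_items` and the `fetch_page` callable are hypothetical names; against the real API, `fetch_page` could be something like `lambda offset: sp.playlist_tracks(top_1000_playlist, offset=offset)` (not tested here). spotipy's own `sp.next(page)` is another route.

```python
from typing import Any, Callable, Dict, List


def collect_all_items(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect every item from a paged endpoint, one page at a time.

    `fetch_page(offset)` must return a Spotify-style paging object:
    a dict with an 'items' list and a 'next' key that is None on the last page.
    """
    items: List[Dict[str, Any]] = []
    offset = 0
    while True:
        page = fetch_page(offset)
        items.extend(page['items'])
        # Stop on the last page (or on an empty page, to avoid looping forever)
        if page['next'] is None or not page['items']:
            return items
        offset += len(page['items'])


# Usage with a fake 250-item playlist standing in for the real API
def fake_fetch(offset: int) -> Dict[str, Any]:
    total = 250
    end = min(offset + 100, total)
    return {
        'items': [{'index': i} for i in range(offset, end)],
        'next': 'more' if end < total else None,
    }


tracks = collect_all_items(fake_fetch)
print(len(tracks))  # 250
```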


What's number 1?

In [4]:
print(top_1000_tracks[999]['track']['artists'][0]['name'], '-', top_1000_tracks[999]['track']['name'])

Armin van Buuren - Shivers


## Artists

Let's begin by looking at the artists who made the top 1000 - how many unique artists were featured?

In [5]:
unique_artists = set()

for track in top_1000_tracks:
    for artist in track['track']['artists']:
        unique_artists.add(artist['name'])

print(len(unique_artists))

639


Which artists were featured the most?

In [6]:
from collections import defaultdict

artist_counter = defaultdict(int)

for track in top_1000_tracks:
    for artist in track['track']['artists']:
        artist_counter[artist['name']] += 1


top_artists = sorted(artist_counter.items(), key=lambda k_v: k_v[1], reverse=True)
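The `defaultdict` plus `sorted` combination above is what `collections.Counter` packages up: `most_common(n)` returns the `n` highest counts in descending order. A minimal sketch on two hypothetical items shaped like Spotify's playlist entries, trimmed to the keys used in this post:

```python
from collections import Counter

# Two hypothetical playlist items, trimmed to the fields used in this post
sample_tracks = [
    {'track': {'name': 'Track A', 'artists': [{'name': 'Artist 1'}, {'name': 'Artist 2'}]}},
    {'track': {'name': 'Track B', 'artists': [{'name': 'Artist 1'}]}},
]

# Count every artist credit on every track
artist_counts = Counter(
    artist['name']
    for item in sample_tracks
    for artist in item['track']['artists']
)

print(artist_counts.most_common(2))  # [('Artist 1', 2), ('Artist 2', 1)]
```

On the real data, `most_common(25)` should give the same counts as `top_artists[:25]`, though artists with tied counts might come out in a different order.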

Alright, let's see the top 25 in a graph.

In [7]:
source = pd.DataFrame.from_dict(top_artists[:25])

bars = alt.Chart(source).mark_bar().encode(
    x=alt.X('1:Q', title='Plays'),
    y=alt.Y('0:N', sort='-x', title='Artist')
).properties(
    title="ASOT Top 1000 - Most-played artists",
    width=600
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='1:Q'
)

bars + text

No surprise at _who_ the #1 is, but the sheer number of their tracks featured is pretty impressive - over 10% of the ASOT Top 1000 was produced by Armin van Buuren, more than twice the number of the second-most featured artist!

Which artists were featured exactly once, with what track, at what position?

In [21]:
#collapse_output
# Find all artists with one play, then find that track in the top 1000.
# Only the first-credited artist on each track is checked, and only that artist is printed.
for artist in top_artists:
    if artist[1] == 1:
        for position, track in enumerate(top_1000_tracks):
            if track['track']['artists'][0]['name'] == artist[0]:
                print(1000 - position, '.', track['track']['artists'][0]['name'], '-', track['track']['name'])

997 . ATN - Miss A Day - Original Mix
996 . Late Night Alumni - Empty Streets - Lumïsade Balearic Mix
993 . Ron van den Beuken - Timeless - Ron van den Beuken Remix
992 . Greg Downey - These Hands I Hold - Sean Tyas Remix
991 . M.I.K.E. - Chocolate Infusion - Original Mix
989 . Adam Nickey - Never Gone - Original Mix [Above & Beyond Respray]
987 . Salt Tank - Eugina - Michael Woods Remix
981 . A Force - Crystal Dawn [ASOT 254] - A Tribute To '99 Remix
969 . Myon & Shane 54 - Ibiza Sunrise - Classic Dub
967 . Neptune Project - Aztec - Original Mix
964 . Ava Mea - In The End - Original Mix
963 . Rodg - High On Life - Extended Mix
962 . Midway - Monkey Forest - Original Mix Edit
961 . Ramin Djawadi - Game Of Thrones Theme - Armin van Buuren Extended Remix
960 . Filterheadz - Yimanya - Original Mix
950 . Jody Wisternoff - The Bridge - Chicane Rework
943 . Probspot - Foreplay - Original Mix
940 . Selu Vibra - Stargazing [ASOT 224] - Original Mix
928 . Ernesto vs. Bastian - Dark Side Of The 

Note that each line shows only the track's first-credited artist, and it's that artist who is featured exactly once. For example, "120. Darren Tate & Jono Grant – Shine (Let The Light Shine In)" is listed here as just Darren Tate; Jono Grant wouldn't qualify anyway, since he also appears in "562. Jono Grant vs Mike Koglin – Circuits".
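Because the cell above only compares against the *first* name in each track's credits, an artist who appears once but is credited second or later (a featured vocalist, say) is skipped. A sketch that instead records the chart position of every credit, run on hypothetical sample items shaped like the playlist entries:

```python
from collections import defaultdict

# Hypothetical playlist items in playlist order (index 0 would be #1,000 in the real list)
sample_tracks = [
    {'track': {'name': 'Song X', 'artists': [{'name': 'DJ One'}, {'name': 'Vocalist'}]}},
    {'track': {'name': 'Song Y', 'artists': [{'name': 'DJ One'}]}},
]
total = len(sample_tracks)

# Map every credited artist to the chart positions they appear at
positions_by_artist = defaultdict(list)
for index, item in enumerate(sample_tracks):
    for artist in item['track']['artists']:
        positions_by_artist[artist['name']].append(total - index)

# Artists credited exactly once, wherever they sit in the credits
single_appearances = {
    name: positions[0]
    for name, positions in positions_by_artist.items()
    if len(positions) == 1
}
print(single_appearances)  # {'Vocalist': 2}
```

This also avoids rescanning all 1,000 tracks for every single-appearance artist.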

## Tracks

Let's look at some track-specific numbers now.

In which years were the tracks released? (Going by the release date of each track's album on Spotify.)

In [9]:
annual_total = defaultdict(int)

for track in top_1000_tracks:
    annual_total[track['track']['album']['release_date'][:4]] += 1

top_years = sorted(annual_total.items(), key=lambda k_v: k_v[1])
print(top_years)

[('1992', 1), ('1995', 1), ('1997', 2), ('1996', 5), ('1998', 5), ('1999', 7), ('2002', 12), ('2001', 14), ('2000', 18), ('2003', 23), ('2004', 25), ('2005', 32), ('2006', 37), ('2007', 38), ('2008', 45), ('2017', 49), ('2015', 50), ('2014', 54), ('2020', 55), ('2016', 55), ('2010', 56), ('2009', 62), ('2013', 64), ('2018', 64), ('2011', 66), ('2012', 68), ('2019', 92)]


In a graph:

In [10]:
source = pd.DataFrame.from_dict(top_years)

bars = alt.Chart(source).mark_bar().encode(
    x=alt.X('1:Q', title='Plays'),
    y=alt.Y('0:N', sort='-x', title='Year')
).properties(
    title="ASOT Top 1000 - Most-represented years",
    width=600
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='1:Q'
)

bars + text

Might be better to see it sorted by year:

In [11]:
source = pd.DataFrame.from_dict(top_years)

bars = alt.Chart(source).mark_bar().encode(
    x=alt.X('1:Q', title='Plays'),
    y=alt.Y('0:N', title='Year')
).properties(
    title="ASOT Top 1000 - Yearly representation",
    width=600
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='1:Q'
)

bars + text
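Slicing `release_date[:4]` for these charts works because Spotify album objects report `release_date` at year, month, or day precision (see their `release_date_precision` field), and the year always comes first. It is the *album's* date, so a track that later landed on a compilation counts under the compilation's year. A sketch of the same tally with `Counter`, printed in chronological order, on hypothetical dates:

```python
from collections import Counter

# Hypothetical album release dates at Spotify's three precisions
release_dates = ['1999-07-05', '2019', '2019-11', '2020-09-25', '2019-03-01']

# The first four characters are the year at every precision
year_counts = Counter(date[:4] for date in release_dates)

# Chronological order rather than most-common order
for year, count in sorted(year_counts.items()):
    print(year, count)
# 1999 1
# 2019 3
# 2020 1
```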

What are the oldest tracks in the list, i.e. those released before 2000? Listed in chart order, from #1,000 toward #1.

In [12]:
for position, track in enumerate(top_1000_tracks):
    if int(track['track']['album']['release_date'][:4]) < 2000:
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '- released', track['track']['album']['release_date'])

983 . Vincent de Moor - Flowtation - released 1996-01-01
887 . Niels Van Gogh - Pulverturm (Original) - released 1998-09-21
691 . Lost Witness & Lange - Happiness Happening - Lange Remix - released 1999-03-22
647 . Bedrock - Heaven Scent - Original Mix - released 1999-12-01
638 . Agnelli & Nelson - El Niño - released 1998-08-01
566 . Freefall - Skydive - released 1998-01-01
531 . Kyuss - Yeah - released 1992-01-01
499 . York - The Reachers of Civilisation - released 1999-10-20
414 . Faithless & Rollo Armstrong & Sister Bliss - Insomnia - released 1996
355 . Nalin & Kane - Beachball - Original Club Mix - released 1997-01-01
233 . Sasha - Xpander - released 1999-07-05
167 . Darude - Sandstorm - released 1999-01-01
127 . Three Drives On A Vinyl - Greece 2000 - Original Mix - released 1997-06-01
125 . ATB - 9 Pm - Till I Come - released 1998-10-26
113 . William Orbit & Ferry Corsten - Barber's Adagio for Strings - Ferry Corsten Remix - released 1995-01-17
99 . Armin van Buuren - Blue Fear 
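The artist-joining loop shows up in several cells of this post. A small helper could build the same `position . artists - title` line from a playlist item and its index; `describe_track` is a hypothetical name, not part of spotipy:

```python
from typing import Any, Dict


def describe_track(index: int, item: Dict[str, Any], total: int = 1000) -> str:
    """Format a playlist item like the cells in this post: '983 . Artist & Artist - Title'.

    `index` is the 0-based playlist index; the playlist counts down,
    so chart position = total - index.
    """
    track = item['track']
    artists = " & ".join(artist['name'] for artist in track['artists'])
    return f"{total - index} . {artists} - {track['name']}"


# Hypothetical item trimmed to the fields the helper reads
sample_item = {'track': {'name': 'Track Title', 'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}}
print(describe_track(17, sample_item))  # 983 . Artist A & Artist B - Track Title
```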

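Two of these entries are artifacts of the playlist rather than the actual ranked tracks: #531 is the 4-second filler used for the missing Sean Callery remix (it's also the lone 1992 entry in the yearly counts), and #414 is the regular version of Insomnia, used because the Andrew Rayel remix isn't available.

A lot of tracks released in 2020 made the list, so what are the most recent? Here are the tracks released from September 2020 onward.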

In [13]:
for position, track in enumerate(top_1000_tracks):
    if track['track']['album']['release_date'][:7] in ('2020-09', '2020-10', '2020-11', '2020-12'):
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '- released', track['track']['album']['release_date'])

915 . Assaf & Cassandra Grey - Lost At Sea - released 2020-12-11
893 . Cosmic Gate & Andrew Bayer - The Launch - Extended Mix - released 2020-09-04
863 . ARTY & NK - Who Am I - released 2020-11-27
836 . Allen Watts & Gid Sedgwick - Another You - Extended Mix - released 2020-09-11
824 . Giuseppe Ottaviani & Sue McLaren - Not One Goodbye - Extended Mix - released 2020-09-25
446 . Above & Beyond & Zoë Johnston & Kyau & Albert - You Got To Go - Kyau & Albert Extended Mix - released 2020-10-01
205 . Aly & Fila & Plumb - Somebody Loves You - Extended Mix - released 2020-09-25


In which years were the tracks by the five most-played artists released?

In [14]:
# The five most-played artists
top_five_artists = ["Armin van Buuren", "Above & Beyond", "Aly & Fila", "Ferry Corsten", "Andrew Rayel"]

# artist name -> {release year: number of tracks crediting that artist}
artist_year_counter = {name: defaultdict(int) for name in top_five_artists}

for track in top_1000_tracks:
    year = track['track']['album']['release_date'][:4]
    for artist in track['track']['artists']:
        if artist['name'] in artist_year_counter:
            artist_year_counter[artist['name']][year] += 1

# Sort by year and print the results
for name in top_five_artists:
    print(name + ":")
    print(sorted(artist_year_counter[name].items()))

Armin van Buuren:
[('1996', 1), ('1999', 1), ('2001', 1), ('2002', 3), ('2003', 4), ('2005', 3), ('2006', 5), ('2007', 1), ('2008', 8), ('2009', 6), ('2010', 13), ('2011', 5), ('2012', 6), ('2013', 17), ('2014', 3), ('2015', 10), ('2016', 6), ('2017', 7), ('2018', 7), ('2019', 16), ('2020', 3)]
Above & Beyond:
[('2003', 2), ('2004', 1), ('2005', 1), ('2006', 2), ('2007', 4), ('2008', 3), ('2009', 2), ('2010', 1), ('2011', 2), ('2012', 2), ('2013', 2), ('2014', 4), ('2015', 2), ('2016', 1), ('2017', 3), ('2018', 6), ('2019', 7), ('2020', 6)]
Aly & Fila:
[('2003', 1), ('2007', 1), ('2008', 2), ('2010', 1), ('2011', 1), ('2012', 3), ('2013', 1), ('2014', 6), ('2015', 2), ('2016', 3), ('2017', 3), ('2018', 3), ('2019', 5), ('2020', 2)]
Ferry Corsten:
[('1995', 1), ('2002', 1), ('2007', 1), ('2008', 1), ('2009', 4), ('2011', 5), ('2012', 1), ('2013', 1), ('2015', 1), ('2016', 1), ('2018', 6), ('2019', 1), ('2020', 1)]
Andrew Rayel:
[('2011', 1), ('2012', 4), ('2013', 4), ('2014', 4), ('2016

This would look nice in a [stacked bar chart](https://altair-viz.github.io/gallery/stacked_bar_chart.html#stacked-bar-chart), but I couldn't get the data arranged properly to create the chart.
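Altair's stacked bar examples expect long-form data: one row per (artist, year) pair with a count, rather than one dictionary per artist. A sketch of that reshaping, assuming per-artist counts in the same `{year: count}` shape as the counters above (the numbers here are made up):

```python
from typing import Dict, List


def to_long_form(counts_by_artist: Dict[str, Dict[str, int]]) -> List[dict]:
    """Flatten {artist: {year: count}} into rows of {'artist', 'year', 'tracks'}."""
    rows = []
    for artist, year_counts in counts_by_artist.items():
        for year, count in sorted(year_counts.items()):
            rows.append({'artist': artist, 'year': year, 'tracks': count})
    return rows


# Made-up counts in the {artist: {year: count}} shape
example = {
    'Artist A': {'2019': 2, '2018': 1},
    'Artist B': {'2019': 3},
}
rows = to_long_form(example)
print(rows[0])  # {'artist': 'Artist A', 'year': '2018', 'tracks': 1}
```

The rows could then go into something like `alt.Chart(pd.DataFrame(rows)).mark_bar().encode(x='year:O', y='tracks:Q', color='artist:N')`; with a `color` channel, Altair should stack the bars by artist by default.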

What's the average BPM of tracks in the top 1,000?

In [15]:
total_bpm = 0

for track in top_1000_tracks:
    total_bpm += sp.audio_features(track['track']['uri'])[0]['tempo']

print(total_bpm/1000)

134.03799499999994
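That number took one `audio_features` request per track, 1,000 in all, which is the kind of volume that can trip Spotify's rate limiting (like the `429` error further down). spotipy's `audio_features` accepts a list of up to 100 track IDs or URIs, so the same tempos could come from 10 requests. A sketch, where `fetch_tempos` is a hypothetical helper and `fetch_features` stands in for `sp.audio_features`. Tracks without audio features (returned as `None`) are skipped, so this suits averages but not a per-position chart; it also reports the median, which is less sensitive to the outliers discussed below:

```python
import statistics
from typing import Callable, List, Optional, Sequence


def fetch_tempos(fetch_features: Callable[[List[str]], List[Optional[dict]]],
                 track_uris: Sequence[str],
                 batch_size: int = 100) -> List[float]:
    """Fetch tempos in batches; entries with no audio features (None) are skipped."""
    tempos: List[float] = []
    for start in range(0, len(track_uris), batch_size):
        batch = list(track_uris[start:start + batch_size])
        for features in fetch_features(batch):
            if features is not None:
                tempos.append(features['tempo'])
    return tempos


# Fake fetcher standing in for sp.audio_features; records each request's size
calls = []
def fake_features(uris: List[str]) -> List[Optional[dict]]:
    calls.append(len(uris))
    return [None if uri.endswith('x') else {'tempo': 138.0} for uri in uris]


uris = [f'spotify:track:{i}' for i in range(249)] + ['spotify:track:x']
tempos = fetch_tempos(fake_features, uris)
print(calls)       # [100, 100, 50]
print(len(tempos))  # 249
print(statistics.mean(tempos), statistics.median(tempos))  # 138.0 138.0
```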


Maybe that's not so useful. How does the track BPM vary throughout the top 1,000? With #1,000 on the left, down to #1 on the right.

In [20]:
bpm = []
for track in top_1000_tracks:
    tempo = sp.audio_features(track['track']['uri'])[0]['tempo']
    if tempo < 100 or tempo > 150: # "outliers", details below
        bpm.append(138)  # plot outliers at 138 BPM so they don't stretch the axis
    else:
        bpm.append(tempo)  # reuse the tempo fetched above rather than requesting it again

x = np.arange(len(top_1000_tracks))

source = pd.DataFrame({
  'track': x,
  'bpm': np.array(bpm)
})

source['138'] = 138

base = alt.Chart(source).mark_line().encode(
    alt.X('track'),
    alt.Y('bpm', scale=alt.Scale(domain=(100, 150))),
).properties(
    title="ASOT Top 1000 - BPM of track"
)

rule = alt.Chart(source).mark_rule(color='red').encode(
    y='138'
)

base + rule

Max Retries reached


SpotifyException: http status: 429, code:-1 - /v1/audio-features/?ids=0R2BQmqtNBaajKeO9qU6sB:
 Max Retries, reason: too many 503 error responses

That's not the best way to visualize it. How about a semi-interactive scatter plot? Mouse over a point for its track position and BPM, and zoom with the mouse wheel. I couldn't figure out how to get track titles and artists into the tooltips.

In [None]:
detail = (
    alt.Chart(source)
    .mark_point()
    .encode(
        x=alt.X(
            "track:Q",  # the track index is a number, not a date
        ),
        y=alt.Y(
            "bpm:Q",
            scale=alt.Scale(domain=(100, 150)),
        ),
        color="bpm",
        tooltip=['bpm', 'track']
    )
    .properties(width=600, height=400, title="BPM of ASOT Top 1000 -- detail view")
).interactive()

detail
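One way to get titles and artists into the tooltips is to put them in the chart's data, since Altair tooltips display fields from the chart's data. A sketch that builds one record per track alongside its tempo, using hypothetical sample items; `tooltip_records` and the `position`, `artist`, and `title` column names are my own choices:

```python
from typing import Any, Dict, List


def tooltip_records(items: List[Dict[str, Any]], tempos: List[float]) -> List[Dict[str, Any]]:
    """One row per track: chart position, joined artist names, title, and BPM."""
    total = len(items)
    return [
        {
            'position': total - index,
            'artist': " & ".join(artist['name'] for artist in item['track']['artists']),
            'title': item['track']['name'],
            'bpm': tempo,
        }
        for index, (item, tempo) in enumerate(zip(items, tempos))
    ]


sample_items = [
    {'track': {'name': 'Title One', 'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
    {'track': {'name': 'Title Two', 'artists': [{'name': 'Artist C'}]}},
]
records = tooltip_records(sample_items, [138.0, 132.5])
print(records[0])
# {'position': 2, 'artist': 'Artist A & Artist B', 'title': 'Title One', 'bpm': 138.0}
```

With `source = pd.DataFrame(records)`, the scatter plot's encoding could then use `tooltip=['position:Q', 'artist:N', 'title:N', 'bpm:Q']`.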

There are a few "outliers" that throw off the graph, so let's look at the tracks in the top 1,000 with the lowest and highest BPMs.

In [None]:
for position, track in enumerate(top_1000_tracks):
    tempo = sp.audio_features(track['track']['uri'])[0]['tempo']
    if tempo < 125 or tempo > 141: # "outliers"
        track_artist = track['track']['artists'][0]['name']
        for artist in track['track']['artists'][1:]:
            track_artist += " & " + artist['name']
        print(1000 - position, '.', track_artist, '-', track['track']['name'], '-', tempo, 'BPM')

But this is not entirely right, right? [Beatport lists Popcorn as 138 BPM](https://www.beatport.com/track/popcorn-original-mix/10531322). Again, I've done nothing to address any inconsistencies.
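Tempo estimators are known to make octave errors, reporting half or double the perceived tempo. A common heuristic, not something Spotify documents or applies, is to fold each estimate into a one-octave window by repeatedly doubling or halving it. A sketch, where `fold_tempo` is a hypothetical helper and the 100 BPM lower bound is an assumption chosen to match the chart's axis:

```python
def fold_tempo(tempo: float, low: float = 100.0) -> float:
    """Double or halve `tempo` until it lies in [low, 2 * low).

    Non-positive values (e.g. a track with no detectable beat) are returned unchanged.
    """
    if tempo <= 0:
        return tempo
    while tempo < low:
        tempo *= 2
    while tempo >= 2 * low:
        tempo /= 2
    return tempo


print(fold_tempo(69.0))   # 138.0
print(fold_tempo(276.0))  # 138.0
print(fold_tempo(140.0))  # 140.0
```

This only corrects factor-of-two errors: an estimate that's off by any other ratio stays wrong, and a genuinely slow or fast track gets forced into the window too.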

## Now you!

I've only covered the most basic analyses here, so I'll leave you with a CSV file of tracks and their ["audio features" from Spotify](https://developer.spotify.com/documentation/web-api/reference/#object-audiofeaturesobject) so you can run the numbers yourself.

In [None]:
import csv

# Spotify audio-feature fields to export, in column order
feature_columns = ['danceability', 'energy', 'key', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'id', 'uri', 'duration_ms', 'time_signature']

with open('../data/top-1000.csv', 'w', newline='') as csvfile:
    # Fields that need quoting (e.g. ones containing a comma) are wrapped in '|' rather than '"'
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['position', 'artist', 'track', 'year'] + feature_columns)
    for position, track in enumerate(top_1000_tracks):
        # Join all credited artists, e.g. "Artist A & Artist B"
        track_artist = " & ".join(artist['name'] for artist in track['track']['artists'])
        audio_features = sp.audio_features(track['track']['uri'])[0]
        writer.writerow([1000 - position, track_artist, track['track']['name'], track['track']['album']['release_date'][:4]]
                        + [audio_features[column] for column in feature_columns])

The resulting file can be found in https://github.com/ScottBrenner/asot-jupyter/blob/1000/data/top-1000.csv - let me know what you make with it!
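Because the file is written with `quotechar='|'`, any field containing a comma (an artist list or a track title, for instance) is wrapped in pipes rather than double quotes, so readers need the same setting. A sketch with Python's `csv` module, using an in-memory sample row rather than the real file; `pandas.read_csv(..., quotechar='|')` should work the same way:

```python
import csv
import io

# A sample in the same format, with a comma inside the track field
sample = io.StringIO()
writer = csv.writer(sample, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['position', 'artist', 'track', 'tempo'])
writer.writerow([42, 'Artist A & Artist B', 'Title, Part 1', 138.0])
sample.seek(0)

# Reading it back requires the same quotechar, or the comma splits the title
rows = list(csv.DictReader(sample, quotechar='|'))
print(rows[0]['track'])         # Title, Part 1
print(float(rows[0]['tempo']))  # 138.0
```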