# Download last.fm listening history

  - To see this analysis live, check out my article ["Analyzing Last.fm Listening History"](http://geoffboeing.com/2016/05/analyzing-lastfm-history/)
  - My last.fm page: http://www.last.fm/user/gboeing
  - API documentation: http://www.last.fm/api
  - For anything more complicated, you might use this Python wrapper for the API: https://github.com/pylast/pylast
  
This tool downloads your all-time most played tracks, artists, and albums, then downloads all of your scrobbles in order of recency. Each of these 4 data sets is saved to a separate CSV file. The "all-time most" data is downloaded separately because (at least for my data) my massive iTunes history scrobble-upload in 2007 is included in the all-time most played tracks/artists/albums but excluded from the recent tracks API endpoint. For accurate analysis of my all-time scrobbles, I need those separate all-time lists, or else 4 years of listening history (from iTunes) are ignored in the calculations.

In the first cell, replace the "from keys import..." line of code with two new lines of code (replace placeholder values with your actual values):

```python
key = 'YOUR-LASTFM-API-KEY'
username = 'YOUR-LASTFM-USERNAME'
```
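
If you would rather not paste credentials into the notebook at all, one option is to read them from environment variables. This is only a sketch and not part of the original notebook: the helper `load_credentials` and the variable names `LASTFM_API_KEY` and `LASTFM_USERNAME` are hypothetical.

```python
import os

def load_credentials(env=None):
    """Return (key, username) read from environment variables.

    The variable names are hypothetical; use whatever suits your setup.
    """
    env = os.environ if env is None else env
    key = env.get('LASTFM_API_KEY')
    username = env.get('LASTFM_USERNAME')
    if not key or not username:
        raise ValueError('set LASTFM_API_KEY and LASTFM_USERNAME first')
    return key, username
```

With those variables set, `key, username = load_credentials()` would stand in for the two assignment lines above.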

In [1]:
import requests, json, time, pandas as pd

key='edfd8819fb5684d05911d64873c9f3d0'
username='shemer77'

In [2]:
# how long to pause between consecutive API requests
pause_duration = 0.2

## First get your all-time most played tracks, artists, and albums

In [3]:
url = 'https://ws.audioscrobbler.com/2.0/?method=user.get{}&user={}&api_key={}&limit={}&extended={}&page={}&format=json'
limit = 200 #api lets you retrieve up to 200 records per call
extended = 1 #api lets you retrieve extended data for each track, 0=no, 1=yes
page = 1 #page of results to start retrieving at
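
The format-string template works as long as the username and key contain no characters that need URL escaping. As a hedged alternative, the same URL can be assembled with the standard library's `urlencode`, which escapes values for you. The helper name `build_url` is an assumption made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://ws.audioscrobbler.com/2.0/'

def build_url(method, username, key, limit=200, extended=1, page=1):
    """Build the same request URL as the template above, with URL escaping."""
    params = {
        'method': 'user.get{}'.format(method),  # e.g. user.gettoptracks
        'user': username,
        'api_key': key,
        'limit': limit,
        'extended': extended,
        'page': page,
        'format': 'json',
    }
    return '{}?{}'.format(BASE_URL, urlencode(params))
```

For example, `build_url('toptracks', username, key)` would produce the same request URL that the next cell builds with `url.format(...)`.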

In [4]:
method = 'toptracks'
request_url = url.format(method, username, key, limit, extended, page)
artist_names = []
track_names = []
play_counts = []
response = requests.get(request_url).json()
for item in response[method]['track']:
    artist_names.append(item['artist']['name'])
    track_names.append(item['name'])
    play_counts.append(item['playcount'])

top_tracks = pd.DataFrame()
top_tracks['artist'] = artist_names
top_tracks['track'] = track_names
top_tracks['play_count'] = play_counts
top_tracks.to_csv('C:\\Users\\agc\\Documents\\GitHub\\data-visualization\\lastfm-listening-history\\data\\lastfm_top_tracks.csv', index=None, encoding='utf-8')
top_tracks.head()

Unnamed: 0,artist,track,play_count
0,Wolf Parade,Modern World,133
1,The Kinks,Waterloo Sunset,126
2,Badly Drawn Boy,Once Around the Block,117
3,The Zombies,This Will Be Our Year,113
4,The Kinks,Days,104


In [5]:
method = 'topartists'
request_url = url.format(method, username, key, limit, extended, page)
artist_names = []
play_counts = []
response = requests.get(request_url).json()
for item in response[method]['artist']:
    artist_names.append(item['name'])
    play_counts.append(item['playcount'])

top_artists = pd.DataFrame()
top_artists['artist'] = artist_names
top_artists['play_count'] = play_counts
top_artists.to_csv('C:\\Users\\agc\\Documents\\GitHub\\data-visualization\\lastfm-listening-history\\data\\lastfm_top_artists.csv', index=None, encoding='utf-8')
top_artists.head()

Unnamed: 0,artist,play_count
0,The Kinks,2718
1,David Bowie,2588
2,The Beatles,2431
3,Belle and Sebastian,2292
4,Radiohead,2152


In [6]:
method = 'topalbums'
request_url = url.format(method, username, key, limit, extended, page)
artist_names = []
album_names = []
play_counts = []
response = requests.get(request_url).json()
for item in response[method]['album']:
    artist_names.append(item['artist']['name'])
    album_names.append(item['name'])
    play_counts.append(item['playcount'])

top_albums = pd.DataFrame()
top_albums['artist'] = artist_names
top_albums['album'] = album_names
top_albums['play_count'] = play_counts
top_albums.to_csv('C:\\Users\\agc\\Documents\\GitHub\\data-visualization\\lastfm-listening-history\\data\\lastfm_top_albums.csv', index=None, encoding='utf-8')
top_albums.head()

Unnamed: 0,artist,album,play_count
0,Silverstein,A Shipwreck in the Sand (Deluxe Edition),129
1,Frank Black,Teenager of the Year,703
2,Devo,Pioneers Who Got Scalped,690
3,The Zombies,Odessey and Oracle,675
4,Badly Drawn Boy,The Hour of Bewilderbeast,611
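
The three cells above differ only in the method name and in which fields they read, so the parsing could be folded into one helper. The sketch below assumes the response shapes used in those cells; `ITEM_KEYS` and `parse_top_items` are hypothetical names. Unlike the cells above, it casts `playcount` to `int`.

```python
# key under which each method's response stores its list of items
ITEM_KEYS = {'toptracks': 'track', 'topartists': 'artist', 'topalbums': 'album'}

def parse_top_items(response, method):
    """Flatten a parsed JSON response for one of the 'top' methods into a list of row dicts."""
    item_key = ITEM_KEYS[method]
    rows = []
    for item in response[method][item_key]:
        row = {}
        if method != 'topartists':
            # tracks and albums nest the artist name one level down
            row['artist'] = item['artist']['name']
        row[item_key] = item['name']
        row['play_count'] = int(item['playcount'])  # assumption: cast to int here
        rows.append(row)
    return rows
```

The returned rows can be passed straight to `pd.DataFrame(...)` and saved with `to_csv` as in the cells above.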


## Now get all your scrobbles

Last.fm provides the 'recenttracks' API method to get 'all' scrobbles, but for my data it is spotty for scrobbles from circa 2007 and earlier. The cells above remain the best way to determine top tracks, artists, and albums. The code below retrieves time series data of all scrobbles, with that caveat.

Sample URL: https://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&user=gboeing&api_key={}&limit=1&extended=0&page=1&format=json
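
A single request like the sample URL is enough to learn how many pages there are, because the response carries a `totalPages` attribute. The sketch below isolates that step from the function that follows. The helper name `pages_to_fetch` is hypothetical, and the sample dictionary in the usage note is a hand-written stand-in for a parsed response.

```python
def pages_to_fetch(response, method='recenttracks', pages=0):
    """Return how many pages to request, capped at `pages` when pages > 0."""
    total_pages = int(response[method]['@attr']['totalPages'])
    if pages > 0:
        total_pages = min(total_pages, pages)
    return total_pages
```

Given `{'recenttracks': {'@attr': {'totalPages': '1142'}, 'track': []}}`, calling `pages_to_fetch(sample)` returns 1142 and `pages_to_fetch(sample, pages=5)` returns 5.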

In [7]:
def get_scrobbles(method='recenttracks', username=username, key=key, limit=200, extended=0, page=1, pages=0):
    '''
    method: api method
    username/key: api credentials
    limit: api lets you retrieve up to 200 records per call
    extended: api lets you retrieve extended data for each track, 0=no, 1=yes
    page: page of results to start retrieving at
    pages: how many pages of results to retrieve. if 0, get as many as api can return.
    '''
    # initialize url and lists to contain response fields
    url = 'https://ws.audioscrobbler.com/2.0/?method=user.get{}&user={}&api_key={}&limit={}&extended={}&page={}&format=json'
    responses = []
    artist_names = []
    artist_mbids = []
    album_names = []
    album_mbids = []
    track_names = []
    track_mbids = []
    timestamps = []
    
    # make first request, just to get the total number of pages
    request_url = url.format(method, username, key, limit, extended, page)
    response = requests.get(request_url).json()
    total_pages = int(response[method]['@attr']['totalPages'])
    if pages > 0:
        total_pages = min([total_pages, pages])
        
    print('{} total pages to retrieve'.format(total_pages))
    
    # request each page of data one at a time
    for page in range(1, int(total_pages) + 1, 1):
        if page % 10 == 0: print(page, end=' ')
        time.sleep(pause_duration)
        request_url = url.format(method, username, key, limit, extended, page)
        responses.append(requests.get(request_url))
    
    # parse the fields out of each scrobble in each page (aka response) of scrobbles
    for response in responses:
        scrobbles = response.json()
        for scrobble in scrobbles[method]['track']:
            # only retain completed scrobbles (aka, with timestamp and not 'now playing')
            if 'date' in scrobble.keys():
                artist_names.append(scrobble['artist']['#text'])
                artist_mbids.append(scrobble['artist']['mbid'])
                album_names.append(scrobble['album']['#text'])
                album_mbids.append(scrobble['album']['mbid'])
                track_names.append(scrobble['name'])
                track_mbids.append(scrobble['mbid'])
                timestamps.append(scrobble['date']['uts'])
                
    # create and populate a dataframe to contain the data
    df = pd.DataFrame()
    df['artist'] = artist_names
    df['artist_mbid'] = artist_mbids
    df['album'] = album_names
    df['album_mbid'] = album_mbids
    df['track'] = track_names
    df['track_mbid'] = track_mbids
    df['timestamp'] = timestamps
    df['datetime'] = pd.to_datetime(df['timestamp'].astype(int), unit='s')
    
    return df
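
To make the parsing step easier to test without calling the API, the per-scrobble logic can be pulled out into a small function. This is a sketch under the same response shape that `get_scrobbles` assumes; `parse_scrobble` is a hypothetical name, and it uses the standard library rather than pandas for the timestamp conversion.

```python
from datetime import datetime, timezone

def parse_scrobble(scrobble):
    """Return a flat dict for a completed scrobble, or None for a 'now playing' entry."""
    if 'date' not in scrobble:
        return None  # 'now playing' tracks carry no timestamp
    uts = int(scrobble['date']['uts'])
    return {
        'artist': scrobble['artist']['#text'],
        'album': scrobble['album']['#text'],
        'track': scrobble['name'],
        'timestamp': uts,
        # unix seconds rendered as UTC, like pd.to_datetime(..., unit='s')
        'datetime': datetime.fromtimestamp(uts, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S'),
    }
```

A scrobble with `uts` of `1462580202` parses to a `datetime` of `2016-05-07 00:16:42`.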

In [8]:
# get all scrobbled tracks ever, in order of recency (pages=0 to get all)
scrobbles = get_scrobbles(pages=0)

1142 total pages to retrieve
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450 460 470 480 490 500 510 520 530 540 550 560 570 580 590 600 610 620 630 640 650 660 670 680 690 700 710 720 730 740 750 760 770 780 790 800 810 820 830 840 850 860 870 880 890 900 910 920 930 940 950 960 970 980 990 1000 1010 1020 1030 1040 1050 1060 1070 1080 1090 1100 1110 1120 1130 1140


In [9]:
# save the dataset
scrobbles.to_csv('data/lastfm_scrobbles.csv', index=None, encoding='utf-8')
print('{:,} total rows'.format(len(scrobbles)))
scrobbles.head()

228,210 total rows


Unnamed: 0,artist,artist_mbid,album,album_mbid,track,track_mbid,timestamp,datetime
0,Prince,cdc0fff7-54cf-4052-a283-319b648670fd,Purple Rain,,When Doves Cry,6335f70f-d1e2-4f26-93ee-134041adb37d,1462580202,2016-05-07 00:16:42
1,Prince,cdc0fff7-54cf-4052-a283-319b648670fd,Purple Rain,,Darling Nikki,2a7ee1cf-a14c-4ca2-9fde-2f018fcf871c,1462579947,2016-05-07 00:12:27
2,Prince,cdc0fff7-54cf-4052-a283-319b648670fd,Purple Rain,,Computer Blue,834e3b58-9951-4573-8430-3aa5cbc47eb9,1462579707,2016-05-07 00:08:27
3,Prince,cdc0fff7-54cf-4052-a283-319b648670fd,Purple Rain,,The Beautiful Ones,bb4bf2bb-ff65-48f4-a052-0904e840816a,1462579392,2016-05-07 00:03:12
4,Prince,cdc0fff7-54cf-4052-a283-319b648670fd,Purple Rain,,Take Me With U,104f4d27-eaac-48e3-9272-2fa81ade68a1,1462579158,2016-05-06 23:59:18
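
Once the scrobbles are saved, the `timestamp` column supports simple time series summaries. As a minimal illustration (not part of the original notebook), the sketch below counts scrobbles per UTC day using only the standard library. `scrobbles_per_day` is a hypothetical helper, and the sample values are the five timestamps shown in the output above.

```python
from collections import Counter
from datetime import datetime, timezone

def scrobbles_per_day(timestamps):
    """Count scrobbles per UTC calendar day from unix timestamps in seconds."""
    days = (
        datetime.fromtimestamp(int(ts), tz=timezone.utc).strftime('%Y-%m-%d')
        for ts in timestamps
    )
    return Counter(days)

# the five timestamps from the head() output above
sample = [1462580202, 1462579947, 1462579707, 1462579392, 1462579158]
counts = scrobbles_per_day(sample)
```

For these five rows, four fall on 2016-05-07 and one on 2016-05-06. On the full data set, `scrobbles_per_day(scrobbles['timestamp'])` would give the same kind of tally.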
