# KEXP Playlist Scraping
**Michael Santoro - michael.santoro@du.edu**
## Introduction
As a big fan of KEXP, and especially The Morning Show, I am going to use the station's API (https://api.kexp.org/v2/) to scrape song information from the show.

In [1]:
import requests
import pandas as pd

In [2]:
r = requests.get('https://api.kexp.org/v2/')
r.status_code

200

In [3]:
type(r.json())

dict

In [5]:
r.json()

{'hosts': 'https://api.kexp.org/v2/hosts/',
 'programs': 'https://api.kexp.org/v2/programs/',
 'shows': 'https://api.kexp.org/v2/shows/',
 'plays': 'https://api.kexp.org/v2/plays/',
 'timeslots': 'https://api.kexp.org/v2/timeslots/'}

## DJ Research

In [6]:
hosts = requests.get('https://api.kexp.org/v2/hosts')

In [11]:
count = hosts.json()['count']

In [12]:
djs = {'name':[],
      'id':[],
      'is_active':[]}
for i in range(0, count, 20):  # 20 results per page; stop once `count` hosts are covered
    hosts = requests.get('https://api.kexp.org/v2/hosts/?offset='+f'{i}')
    for entry in hosts.json()['results']:
        djs['name'].append(entry['name'])
        djs['id'].append(entry['id'])
        djs['is_active'].append(entry['is_active'])
dj_df = pd.DataFrame(djs)

In [13]:
dj_df.shape

(91, 3)

In [14]:
dj_df.to_csv('djData.csv')

## Programs Research

In [15]:
programs = requests.get('https://api.kexp.org/v2/programs')

In [16]:
programs.json()

{'count': 36,
 'next': 'https://api.kexp.org/v2/programs/?limit=20&offset=20',
 'previous': None,
 'results': [{'id': 26,
   'uri': 'https://api.kexp.org/v2/programs/26/',
   'name': '90.TEEN',
   'description': 'Rock,Eclectic,Variety Mix',
   'tags': 'Rock,Eclectic,Variety Mix',
   'is_active': True},
  {'id': 1,
   'uri': 'https://api.kexp.org/v2/programs/1/',
   'name': 'Audioasis',
   'description': "Audioasis is KEXP's long-running Northwest music show hosted by Sharlese Metcalf. Past hosts have included Jonathen Poneman, Jason Hughes, Scott Vanderpool, Stevie Zoom, Sean Nelson and Hannah Levin. John Richards helps produce the show. Audioasis delivers three hours of local and live bands from all areas and genres of the Northwest. On Audioasis, you'll hear the new, the old, the demos, the vinyl, and the future of music. Every first Saturday of the month you can catch Audioasis out in the city LIVE! Tune in Saturday from 6-9 PM.",
   'tags': 'Rock,Eclectic,Pacific Northwest',
   'is

In [17]:
programs.json()['count']

36

In [18]:
programs = {'name':[],
      'id':[],
      'description':[],
      'tags':[],
      'is_active':[]}
for i in range(0,60,20):
    prog = requests.get(f'https://api.kexp.org/v2/programs/?offset={i}')
    for entry in prog.json()['results']:
        programs['name'].append(entry['name'])
        programs['id'].append(entry['id'])
        programs['description'].append(entry['description'])
        programs['tags'].append(entry['tags'])
        programs['is_active'].append(entry['is_active'])
programs_df = pd.DataFrame(programs)
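
The programs response shown earlier includes `next` and `previous` links, so an alternative to hard-coding offsets is to follow `next` until it is `None`. The following is a minimal sketch under that assumption; `fetch_json` and `iter_results` are hypothetical helper names, not part of the KEXP API or of `requests`.

```python
def fetch_json(url):
    """Fetch one page and decode it as JSON (needs network access)."""
    import requests  # imported lazily so the paging logic can be tried offline
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()


def iter_results(first_url, get_page=fetch_json):
    """Yield every entry from a paginated endpoint by following 'next' links."""
    url = first_url
    while url:  # the last page is expected to report 'next': None
        page = get_page(url)
        yield from page['results']
        url = page.get('next')
```

Something like `pd.DataFrame(list(iter_results('https://api.kexp.org/v2/programs/')))` would then build a frame with every returned field, without needing to know the total count in advance.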

In [19]:
programs_df

| | name | id | description | tags | is_active |
|---|---|---|---|---|---|
| 0 | 90.TEEN | 26 | Rock,Eclectic,Variety Mix | Rock,Eclectic,Variety Mix | True |
| 1 | Audioasis | 1 | Audioasis is KEXP's long-running Northwest mus... | Rock,Eclectic,Pacific Northwest | True |
| 2 | Best Ambiance | 27 | Went off the air in 2011. | African,World | False |
| 3 | Drive Time | 33 | | Rock,Eclectic,Variety Mix | True |
| 4 | Early | 32 | | Rock,Eclectic,Variety Mix | True |
| 5 | El Sonido | 2 | El Sonido is a three hour trip around the dive... | Latin,World | True |
| 6 | Expansions | 3 | Expansions began in late 1995 as the brainchil... | Electronic | True |
| 7 | Friday Night | 22 | Electronic, Soul, R&B, Hip-Hop, Rock | Rock,Eclectic,Variety Mix | True |
| 8 | Jazz Theatre | 4 | I have a personal mission statement in mind fo... | Jazz | True |
| 9 | Live on KEXP | 36 | | Live,Eclectic,Variety Mix | True |


In [20]:
programs_df.to_csv('programData.csv')

## Show Research

In [2]:
shows = requests.get('https://api.kexp.org/v2/shows/')

In [3]:
shows.json()['results'][0]

{'id': 54494,
 'uri': 'https://api.kexp.org/v2/shows/54494/',
 'program': 14,
 'program_uri': 'https://api.kexp.org/v2/programs/14/',
 'hosts': [55],
 'host_uris': ['https://api.kexp.org/v2/hosts/55/'],
 'program_name': 'The Afternoon Show',
 'program_tags': 'Rock,Eclectic,Variety Mix',
 'host_names': ['Larry Mizell, Jr.'],
 'tagline': 'Happy Friday!',
 'image_uri': 'https://www.kexp.org/media/filer_public/5e/ed/5eed57ed-2169-45b8-8605-266712b6eee3/larry_mizell_jr_thumbnail.png',
 'start_time': '2022-08-12T12:59:26-07:00'}

***Shows will need to be re-queried, as the data is continually updated.***<br>
My plan is to pull data one year at a time; for testing I will start with last year (2021).

In [4]:
shows = requests.get('https://api.kexp.org/v2/shows/?start_time_after=2021-01-01T00:00:00&start_time_before=2022-01-01T00:00:00')

In [5]:
shows.json()['count']

3288
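
Since the plan is to pull one year at a time, the two time filters used in the query above can be generated from the year instead of being typed by hand. This is only a sketch: `year_window` is a hypothetical helper, and it simply reproduces the same midnight-to-midnight bounds as the hand-written URL.

```python
def year_window(year):
    """Return the shows filters spanning Jan 1 of `year` to Jan 1 of the next."""
    return {
        'start_time_after': f'{year}-01-01T00:00:00',
        'start_time_before': f'{year + 1}-01-01T00:00:00',
    }


params = year_window(2021)
print(params)
```

The resulting dictionary can be passed as `params=` to `requests.get('https://api.kexp.org/v2/shows/', ...)`, which takes care of building the query string.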

In [6]:
showsDict = {'id':[],
             'program_id':[],
             'hosts_ids':[],
             'program_name':[],
             'start_time':[]}
for i in range(0,3300,20):
    print(f'fetching: {i}')
    shows = requests.get(f'https://api.kexp.org/v2/shows/?start_time_after=2021-01-01T00:00:00&start_time_before=2022-01-01T00:00:00&offset={i}')
    for entry in shows.json()['results']:
        showsDict['id'].append(entry['id'])
        showsDict['program_id'].append(entry['program'])
        showsDict['hosts_ids'].append(entry['hosts'])
        showsDict['program_name'].append(entry['program_name'])
        showsDict['start_time'].append(entry['start_time'])
shows_df = pd.DataFrame(showsDict)

fetching: 0
fetching: 20
fetching: 40
fetching: 60
fetching: 80
fetching: 100
fetching: 120
fetching: 140
fetching: 160
fetching: 180
fetching: 200
fetching: 220
fetching: 240
fetching: 260
fetching: 280
fetching: 300
fetching: 320
fetching: 340
fetching: 360
fetching: 380
fetching: 400
fetching: 420
fetching: 440
fetching: 460
fetching: 480
fetching: 500
fetching: 520
fetching: 540
fetching: 560
fetching: 580
fetching: 600
fetching: 620
fetching: 640
fetching: 660
fetching: 680
fetching: 700
fetching: 720
fetching: 740
fetching: 760
fetching: 780
fetching: 800
fetching: 820
fetching: 840
fetching: 860
fetching: 880
fetching: 900
fetching: 920
fetching: 940
fetching: 960
fetching: 980
fetching: 1000
fetching: 1020
fetching: 1040
fetching: 1060
fetching: 1080
fetching: 1100
fetching: 1120
fetching: 1140
fetching: 1160
fetching: 1180
fetching: 1200
fetching: 1220
fetching: 1240
fetching: 1260
fetching: 1280
fetching: 1300
fetching: 1320
fetching: 1340
fetching: 1360
fetching: 1380
fetchi

In [7]:
shows_df

| | id | program_id | hosts_ids | program_name | start_time |
|---|---|---|---|---|---|
| 0 | 52491 | 12 | [87] | Street Sounds | 2021-12-31T22:00:41-08:00 |
| 1 | 52490 | 12 | [87] | Street Sounds | 2021-12-31T22:00:23-08:00 |
| 2 | 52489 | 12 | [87] | Street Sounds | 2021-12-31T21:59:26-08:00 |
| 3 | 52488 | 22 | [53] | Friday Night | 2021-12-31T19:01:00-08:00 |
| 4 | 52486 | 33 | [50] | Drive Time | 2021-12-31T16:05:00-08:00 |
| ... | ... | ... | ... | ... | ... |
| 3283 | 49207 | 14 | [50] | The Afternoon Show | 2021-01-01T13:01:29-08:00 |
| 3284 | 49206 | 15 | [4] | The Midday Show | 2021-01-01T10:00:08-08:00 |
| 3285 | 49205 | 16 | [19] | The Morning Show | 2021-01-01T07:01:28-08:00 |
| 3286 | 49204 | 32 | [11] | Early | 2021-01-01T05:00:57-08:00 |


In [8]:
shows_df.to_csv('showsData_2021.csv')
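
The `start_time` values are ISO 8601 strings with a UTC offset, so they can be parsed into timezone-aware datetimes before any analysis by hour or weekday. Below is a small sketch using only the standard library (`datetime.fromisoformat`, Python 3.7+); `describe_start` is a hypothetical helper name.

```python
from datetime import datetime


def describe_start(start_time):
    """Parse an ISO 8601 timestamp; return (weekday, hour), where Monday is 0."""
    dt = datetime.fromisoformat(start_time)
    return dt.weekday(), dt.hour


# A Morning Show start time taken from the 2021 data.
print(describe_start('2021-12-31T07:02:44-08:00'))  # (4, 7): a Friday, 7 AM local
```

For a whole column, `pd.to_datetime(shows_df['start_time'])` is the likely pandas counterpart, though mixed `-08:00`/`-07:00` offsets across the year may need extra handling there.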

In [9]:
shows_df[shows_df.program_name=='The Morning Show']

| | id | program_id | hosts_ids | program_name | start_time |
|---|---|---|---|---|---|
| 8 | 52483 | 16 | [1] | The Morning Show | 2021-12-31T07:02:44-08:00 |
| 16 | 52475 | 16 | [1] | The Morning Show | 2021-12-30T07:02:37-08:00 |
| 24 | 52467 | 16 | [43] | The Morning Show | 2021-12-29T07:03:28-08:00 |
| 32 | 52459 | 16 | [90] | The Morning Show | 2021-12-28T07:02:31-08:00 |
| 40 | 52451 | 16 | [44] | The Morning Show | 2021-12-27T07:00:08-08:00 |
| ... | ... | ... | ... | ... | ... |
| 3231 | 49259 | 16 | [43] | The Morning Show | 2021-01-07T07:01:14-08:00 |
| 3239 | 49251 | 16 | [43] | The Morning Show | 2021-01-06T07:00:51-08:00 |
| 3247 | 49243 | 16 | [43] | The Morning Show | 2021-01-05T07:00:44-08:00 |
| 3255 | 49235 | 16 | [43] | The Morning Show | 2021-01-04T07:00:02-08:00 |


## Plays Research

In [1]:
shows_df

NameError: name 'shows_df' is not defined
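
The `NameError` above presumably comes from a restarted kernel (the cell counter is back at 1), so `shows_df` has to be rebuilt or reloaded from `showsData_2021.csv`. One caveat when reloading: list-valued columns such as `hosts_ids` are written to CSV as text like `[87]` and come back as strings. The sketch below restores them with the standard library; the inline sample mirrors two rows of the 2021 shows data, and `parse_hosts` is a hypothetical helper name.

```python
import ast
import csv
import io

# Two rows in the layout that shows_df.to_csv() produces (index column first).
sample = io.StringIO(
    ',id,program_id,hosts_ids,program_name,start_time\n'
    '0,52491,12,[87],Street Sounds,2021-12-31T22:00:41-08:00\n'
    '8,52483,16,[1],The Morning Show,2021-12-31T07:02:44-08:00\n'
)


def parse_hosts(text):
    """Turn the string '[87]' back into the list [87]."""
    return ast.literal_eval(text)


rows = list(csv.DictReader(sample))
for row in rows:
    row['hosts_ids'] = parse_hosts(row['hosts_ids'])
```

With pandas, the same conversion should be possible in one step via `pd.read_csv('showsData_2021.csv', index_col=0, converters={'hosts_ids': ast.literal_eval})`.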

Plays for a single show can be requested by show ID, for example: https://api.kexp.org/v2/plays/?show_ids=52475

Available filters, for reference: https://api.kexp.org/v2/plays/?airdate_after=&airdate_before=&has_comment=&exclude_airbreaks=&show_ids=&host_ids=&song=&song_exact=&artist=&artist_exact=&album=&album_exact=&label=&label_exact=&recording_id=&ordering=

In [38]:
plays = requests.get(r'https://api.kexp.org/v2/plays/',params={'airdate_after':r'2022-08-10T07:00:00-07:00', 'airdate_before':r'2022-08-10T08:00:00-07:00'})

In [40]:
plays.json()

{'next': 'https://api.kexp.org/v2/plays/?airdate_after=2022-08-10T07%3A00%3A00-07%3A00&airdate_before=2022-08-10T08%3A00%3A00-07%3A00&limit=20&offset=20',
 'previous': None,
 'results': [{'id': 3076290,
   'uri': 'https://api.kexp.org/v2/plays/3076290/',
   'airdate': '2022-08-10T07:58:07-07:00',
   'show': 54475,
   'show_uri': 'https://api.kexp.org/v2/shows/54475/',
   'image_uri': '',
   'thumbnail_uri': '',
   'comment': '',
   'play_type': 'airbreak'},
  {'id': 3076289,
   'uri': 'https://api.kexp.org/v2/plays/3076289/',
   'airdate': '2022-08-10T07:53:41-07:00',
   'show': 54475,
   'show_uri': 'https://api.kexp.org/v2/shows/54475/',
   'image_uri': 'https://ia600904.us.archive.org/23/items/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8-22584905643_thumb500.jpg',
   'thumbnail_uri': 'https://ia800904.us.archive.org/23/items/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8-22584905643_thumb250.jpg',
   'song':
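
The `next` link in this response shows the colons of the timestamps as `%3A`. That is ordinary percent-encoding and decodes back to the original values, so it should not change what the filter means. A short standard-library check, using the same two parameters as the request above:

```python
from urllib.parse import parse_qs, urlencode

params = {'airdate_after': '2022-08-10T07:00:00-07:00',
          'airdate_before': '2022-08-10T08:00:00-07:00'}

# Encoding turns every ':' into '%3A', matching the 'next' link.
encoded = urlencode(params)
print(encoded)

# Decoding the query string recovers the original timestamps unchanged.
decoded = {key: values[0] for key, values in parse_qs(encoded).items()}
print(decoded == params)
```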

In [25]:
plays.json()['results'][0]

{'id': 3076290,
 'uri': 'https://api.kexp.org/v2/plays/3076290/',
 'airdate': '2022-08-10T07:58:07-07:00',
 'show': 54475,
 'show_uri': 'https://api.kexp.org/v2/shows/54475/',
 'image_uri': '',
 'thumbnail_uri': '',
 'comment': '',
 'play_type': 'airbreak'}

In [27]:
plays.json()['results'][1]

{'id': 3076289,
 'uri': 'https://api.kexp.org/v2/plays/3076289/',
 'airdate': '2022-08-10T07:53:41-07:00',
 'show': 54475,
 'show_uri': 'https://api.kexp.org/v2/shows/54475/',
 'image_uri': 'https://ia600904.us.archive.org/23/items/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8-22584905643_thumb500.jpg',
 'thumbnail_uri': 'https://ia800904.us.archive.org/23/items/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8/mbid-8838d557-42df-4ebd-88a4-f74cbc33e8f8-22584905643_thumb250.jpg',
 'song': 'In Circles',
 'track_id': '6821cd55-295f-3ad0-b507-21784acbb711',
 'recording_id': '91872a9f-6640-401c-8b9c-4cb8241f3e1e',
 'artist': 'Sunny Day Real Estate',
 'artist_ids': ['86b24e8f-a4d9-4c84-83ee-fde0d14ad9fa'],
 'album': 'Diary',
 'release_id': '8838d557-42df-4ebd-88a4-f74cbc33e8f8',
 'release_group_id': '12e88251-2cde-3bea-9cee-2d8c7fe65039',
 'labels': ['Sub Pop Records'],
 'label_ids': ['38dc88de-7720-4100-9d5b-3cdc41b0c474'],
 'release_date': '1994-05-10',
 'rota
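
The two entries above show that airbreaks carry no `song`, `artist`, or `album` fields, so a flattening step needs to tolerate missing keys. Here is a minimal sketch that drops airbreaks and keeps a handful of fields; `play_rows` and the choice of columns are assumptions for illustration, and the sample data is abbreviated from the output above.

```python
def play_rows(results):
    """Keep song plays only and pick a few fields; missing keys become None."""
    rows = []
    for entry in results:
        if entry.get('play_type') == 'airbreak':
            continue  # airbreaks have no song/artist/album to record
        rows.append({
            'id': entry['id'],
            'airdate': entry['airdate'],
            'show': entry['show'],
            'song': entry.get('song'),
            'artist': entry.get('artist'),
            'album': entry.get('album'),
        })
    return rows


sample = [
    {'id': 3076290, 'airdate': '2022-08-10T07:58:07-07:00', 'show': 54475,
     'play_type': 'airbreak'},
    {'id': 3076289, 'airdate': '2022-08-10T07:53:41-07:00', 'show': 54475,
     'song': 'In Circles', 'artist': 'Sunny Day Real Estate', 'album': 'Diary'},
]
rows = play_rows(sample)
```

The list returned here can be handed to `pd.DataFrame(rows)`. The filter list also includes `exclude_airbreaks`, which may make the server do this filtering instead, though its accepted values are not shown here.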

In [33]:
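from urllib.parse import urlencode

In [34]:
url = 'https://api.kexp.org/v2/plays/'
# In Python 3, urlencode lives in urllib.parse (not urllib) and expects a
# mapping or a sequence of pairs rather than a pre-built string.
params = {'airdate_after': '2022-08-10T07:00:00',
          'airdate_before': '2022-08-10T08:00:00'}
# safe=':' leaves the colons in the timestamps unescaped.
qry = urlencode(params, safe=':')
s = requests.Session()
req = requests.Request(method='GET', url=url)
prep = req.prepare()
# Attach the hand-built query string to the prepared request.
prep.url = url + '?' + qry
r = s.send(prep)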