# Get Data Using an API (No Key Required)

API stands for Application Programming Interface. At a basic level, it lets a program "talk" to a server and request information: the API receives the request and responds with the requested data.

In [1]:
import requests  #similar to urllib, this library sends HTTP requests to a web server
import json      #library to handle JSON-formatted data

### The Walking Dead Episode Data via TVMaze API

In this example, we do not need an API key (a method of authentication) to request data. Think of this method as similar to web scraping, except that we request structured data directly from the server instead of parsing a rendered web page.

In [2]:
#URL to TVMaze API
url = r"http://api.tvmaze.com/singlesearch/shows?q=the-walking-dead&embed=episodes"
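The query string in the URL above (`q=the-walking-dead&embed=episodes`) can also be assembled from a dictionary, which makes it easier to swap in a different show. This is a minimal sketch using only the standard library; the helper name `build_search_url` is hypothetical, and the endpoint is simply the one used in the cell above.

```python
from urllib.parse import urlencode

# endpoint copied from the cell above
BASE_URL = "http://api.tvmaze.com/singlesearch/shows"

def build_search_url(show, embed="episodes"):
    """Return the single-search URL for a show (hypothetical helper)."""
    params = {"q": show, "embed": embed}
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("the-walking-dead")
print(url)
```

`requests.get` also accepts a `params` dictionary and builds the query string itself, so either approach should give the same request.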

In [3]:
#requests.get sends an HTTP GET request to the server and returns a Response object
#a status code of 200 means the request succeeded
#https://www.restapitutorial.com/httpstatuscodes.html
resp = requests.get(url)
resp

<Response [200]>
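Status codes other than 200 are worth handling explicitly. As a small illustration, the standard library's `http.HTTPStatus` enum maps a code to its meaning. The helper `describe_status` is a hypothetical name and is not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    status = HTTPStatus(code)      # raises ValueError for codes it does not know
    ok = 200 <= code < 300         # 2xx codes signal success
    return f"{code} {status.phrase} ({'success' if ok else 'not successful'})"

print(describe_status(200))
print(describe_status(404))
```

With `requests`, `resp.status_code` holds the integer code, and `resp.raise_for_status()` raises an exception for 4xx and 5xx responses.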

In [4]:
#resp.text holds the body of the response we already received (no new request is sent)
#the body is returned as a JSON-formatted string
str_data = resp.text
str_data



In [5]:
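#json.loads parses the JSON string into Python objects (here, one dictionary)
#the display below lists keys alphabetically, but the dictionary itself keeps the order of the JSON text
#(dictionaries preserve insertion order in Python 3.7+)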
WDdata = json.loads(str_data)
WDdata

{'_embedded': {'episodes': [{'_links': {'self': {'href': 'http://api.tvmaze.com/episodes/4095'}},
    'airdate': '2010-10-31',
    'airstamp': '2010-11-01T02:00:00+00:00',
    'airtime': '22:00',
    'id': 4095,
    'image': {'medium': 'http://static.tvmaze.com/uploads/images/medium_landscape/0/2104.jpg',
     'original': 'http://static.tvmaze.com/uploads/images/original_untouched/0/2104.jpg'},
    'name': 'Days Gone Bye',
    'number': 1,
    'runtime': 60,
    'season': 1,
    'summary': '<p>Rick searches for his family after emerging from a coma into a world terrorized by the walking dead. Morgan and Duane, whom he meets along the way, help teach Rick the new rules for survival.</p>',
    'url': 'http://www.tvmaze.com/episodes/4095/the-walking-dead-1x01-days-gone-bye'},
   {'_links': {'self': {'href': 'http://api.tvmaze.com/episodes/4096'}},
    'airdate': '2010-11-07',
    'airstamp': '2010-11-08T03:00:00+00:00',
    'airtime': '22:00',
    'id': 4096,
    'image': {'medium': 'http

In [6]:
#verify that JSON object is one big dictionary
type(WDdata)

dict

In [7]:
#first level keys in JSON object
WDdata.keys()

dict_keys(['id', 'url', 'name', 'type', 'language', 'genres', 'status', 'runtime', 'premiered', 'officialSite', 'schedule', 'rating', 'weight', 'network', 'webChannel', 'externals', 'image', 'summary', 'updated', '_links', '_embedded'])
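To see what `json.loads` and `json.dumps` do to key order, here is a minimal sketch on a shortened sample string (a few fields copied from the show data, not the full API response). It assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
import json

# shortened sample standing in for the API response
sample = '{"id": 73, "name": "The Walking Dead", "genres": ["Drama", "Action", "Horror"]}'

parsed = json.loads(sample)        # JSON text -> Python dict
print(list(parsed.keys()))         # keys come back in the same order as the text

round_trip = json.dumps(parsed)    # Python dict -> JSON text
print(round_trip == sample)        # the order survives the round trip
```

Neither function reorders keys unless asked to, for example with `json.dumps(..., sort_keys=True)`.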

In [8]:
#json.dumps serializes the dictionary back into a JSON string; indent=4 pretty-prints it
#the keys appear in the dictionary's actual order (the order of the original JSON)
print(json.dumps(WDdata,indent=4))

{
    "id": 73,
    "url": "http://www.tvmaze.com/shows/73/the-walking-dead",
    "name": "The Walking Dead",
    "type": "Scripted",
    "language": "English",
    "genres": [
        "Drama",
        "Action",
        "Horror"
    ],
    "status": "Running",
    "runtime": 60,
    "premiered": "2010-10-31",
    "officialSite": "http://www.amc.com/shows/the-walking-dead",
    "schedule": {
        "time": "21:00",
        "days": [
            "Sunday"
        ]
    },
    "rating": {
        "average": 8.2
    },
    "weight": 100,
    "network": {
        "id": 20,
        "name": "AMC",
        "country": {
            "name": "United States",
            "code": "US",
            "timezone": "America/New_York"
        }
    },
    "webChannel": null,
    "externals": {
        "tvrage": 25056,
        "thetvdb": 153021,
        "imdb": "tt1520211"
    },
    "image": {
        "medium": "http://static.tvmaze.com/uploads/images/medium_portrait/177/444593.jpg",
        "original": "

In [11]:
#set list of episodes to variable
#will cycle (iterate) through the list to get value of keys in episodes
episodes = WDdata['_embedded']['episodes']

In [12]:
#a single episode
#dict['key']['key'][index]
#dictionary name, dictionary key, dictionary key, then list index
WDdata['_embedded']['episodes'][0]

{'_links': {'self': {'href': 'http://api.tvmaze.com/episodes/4095'}},
 'airdate': '2010-10-31',
 'airstamp': '2010-11-01T02:00:00+00:00',
 'airtime': '22:00',
 'id': 4095,
 'image': {'medium': 'http://static.tvmaze.com/uploads/images/medium_landscape/0/2104.jpg',
  'original': 'http://static.tvmaze.com/uploads/images/original_untouched/0/2104.jpg'},
 'name': 'Days Gone Bye',
 'number': 1,
 'runtime': 60,
 'season': 1,
 'summary': '<p>Rick searches for his family after emerging from a coma into a world terrorized by the walking dead. Morgan and Duane, whom he meets along the way, help teach Rick the new rules for survival.</p>',
 'url': 'http://www.tvmaze.com/episodes/4095/the-walking-dead-1x01-days-gone-bye'}

In [13]:
#verify which keys we can get information from per episode
WDdata['_embedded']['episodes'][0].keys()

dict_keys(['id', 'url', 'name', 'season', 'number', 'airdate', 'airtime', 'airstamp', 'runtime', 'image', 'summary', '_links'])
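The access pattern used here (dictionary key, dictionary key, then list index) can be tried on a much smaller structure. The sketch below uses a shortened stand-in for `WDdata` with the same nesting; only a few fields are included.

```python
# shortened stand-in for WDdata with the same nesting
sample = {
    "name": "The Walking Dead",
    "_embedded": {
        "episodes": [
            {"name": "Days Gone Bye", "season": 1, "number": 1},
            {"name": "Guts", "season": 1, "number": 2},
        ]
    },
}

episodes = sample["_embedded"]["episodes"]   # dict -> dict -> list
first = episodes[0]                          # list index -> dict for one episode
print(first["name"])
print([ep["name"] for ep in episodes])       # same key read from every episode
```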

In [14]:
#set empty lists to hold each feature's information

epnamels = [] #episode name
seasonls = [] #season number
epnumls = []  #episode number
datels = []   #airdate
timels = []   #airtime
runls = []    #runtime
epsumls = []  #summary

In [15]:
#make a function to remove HTML tags such as <p> and </p> from summary text

import re

def cleanText(text):
    #non-greedy pattern: matches "<", then as few characters as possible, then ">"
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)

In [16]:
#test cleanText function
teststr = WDdata['_embedded']['episodes'][0]['summary']

cleanText(teststr)

'Rick searches for his family after emerging from a coma into a world terrorized by the walking dead. Morgan and Duane, whom he meets along the way, help teach Rick the new rules for survival.'
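The `?` in the pattern `<.*?>` matters: it makes the match non-greedy, so each tag is removed on its own. A short sketch comparing it with the greedy form, using a made-up test string:

```python
import re

TAG = re.compile(r"<.*?>")     # non-greedy: stops at the first ">"
GREEDY = re.compile(r"<.*>")   # greedy: runs through to the last ">"

text = "<p>Rick <b>searches</b> for his family.</p>"   # made-up test string

print(TAG.sub("", text))            # only the tags are removed
print(repr(GREEDY.sub("", text)))   # everything between the first "<" and last ">" is removed
```

This regex approach is fine for simple summaries like these; heavily nested or malformed HTML is usually better handled by an HTML parser.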

In [17]:
#fill lists with data - ERROR HANDLING FOR EXCEPTIONS...

for episode in episodes:
    
    epnamels.append(episode['name'])
    seasonls.append(episode['season'])
    epnumls.append(episode['number'])
    datels.append(episode['airdate'])
    timels.append(episode['airtime'])
    runls.append(episode['runtime'])
    
    #some episodes do not have a summary; cleaning them raises an error
    #(KeyError if the key is missing, TypeError if the value is None)
    #in that case, append None instead of the cleaned text
    try:
        text = cleanText(episode['summary'])
    except (KeyError, TypeError):
        text = None
        
    epsumls.append(text)
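An alternative to `try`/`except` is to check for a missing summary before cleaning it. The sketch below is one possible variant, not the notebook's method: `clean_or_none` is a hypothetical helper, and the sample episodes are made up to cover both a `None` summary and a missing key.

```python
import re

def clean_or_none(summary):
    """Strip tags from a summary, or return None when there is no summary (hypothetical helper)."""
    if not summary:                       # covers None and the empty string
        return None
    return re.sub(r"<.*?>", "", summary)

# made-up sample: one normal episode, one with summary None, one without the key
sample_episodes = [
    {"name": "Days Gone Bye", "summary": "<p>Rick searches for his family.</p>"},
    {"name": "No Summary Yet", "summary": None},
    {"name": "Missing Key"},
]

# dict.get returns None when the key is absent, so no exception is raised
summaries = [clean_or_none(ep.get("summary")) for ep in sample_episodes]
print(summaries)
```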

In [18]:
#cleanText function worked
epsumls[:5]

['Rick searches for his family after emerging from a coma into a world terrorized by the walking dead. Morgan and Duane, whom he meets along the way, help teach Rick the new rules for survival.',
 'Rick unknowingly causes a group of survivors to be trapped by walkers. The group dynamic devolves from accusations to violence, as Rick must confront an enemy far more dangerous than the undead.',
 "Rick makes a decision to go back to Atlanta to retrieve the bag of guns and save a man's life. Lori and Shane must deal with the surprising return of someone they thought was dead.",
 "Rick's mission to Atlanta is jeopardized when things go awry. Jim becomes unhinged in camp.",
 'Rick leads the group to the CDC after the attack. Jim must make a terrible life and death decision']

In [19]:
#verify that each list has same number of items
print(len(epnamels))
print(len(seasonls))
print(len(epnumls))
print(len(datels))
print(len(timels))
print(len(runls))
print(len(epsumls))

131
131
131
131
131
131
131
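The same length check can be written so that it compares the lists automatically instead of relying on reading the printed numbers. A minimal sketch with short stand-in lists; in the notebook, the dictionary values would be `epnamels`, `seasonls`, and so on.

```python
# stand-in lists; the names used as keys are only labels for the printout
lists = {
    "title": ["Days Gone Bye", "Guts"],
    "season": [1, 1],
    "number": [1, 2],
}

lengths = {name: len(values) for name, values in lists.items()}
all_equal = len(set(lengths.values())) == 1   # one distinct length means all lists match
print(lengths, all_equal)
```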


In [20]:
#zip all lists together into one list of tuples (one tuple per episode)
TWDlist = list(zip(epnamels, seasonls, epnumls, datels, timels, runls, epsumls))

colnames = ['title', 'season', 'number', 'airdate', 'airtime', 'runtime', 'summary']
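`zip` pairs items by position, so each resulting tuple holds the values for one episode. A small sketch with stand-in lists shows the shape of the result and how the column names line up with each tuple:

```python
# stand-in lists with two episodes
titles = ["Days Gone Bye", "Guts"]
seasons = [1, 1]
numbers = [1, 2]
colnames = ["title", "season", "number"]

rows = list(zip(titles, seasons, numbers))             # one tuple per episode
records = [dict(zip(colnames, row)) for row in rows]   # same data, labeled by column

print(rows[0])
print(records[1])
```

Note that `zip` stops at the shortest input, which is why checking that all lists have the same length beforehand is useful.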

In [21]:
#make list into a dataframe

import pandas as pd

df = pd.DataFrame(TWDlist, columns=colnames)

df.head()

                  title  season  number     airdate  airtime  runtime  summary
0         Days Gone Bye       1       1  2010-10-31    22:00       60  Rick searches for his family after emerging fr...
1                  Guts       1       2  2010-11-07    22:00       60  Rick unknowingly causes a group of survivors t...
2  Tell It to the Frogs       1       3  2010-11-14    22:00       60  Rick makes a decision to go back to Atlanta to...
3                 Vatos       1       4  2010-11-21    22:00       60  Rick's mission to Atlanta is jeopardized when ...
4              Wildfire       1       5  2010-11-28    22:00       60  Rick leads the group to the CDC after the atta...
