# Interacting With the Internet

Let's explore the Internet using Python.

We'll start with an incredibly useful, easy-to-use library called [requests](http://docs.python-requests.org/en/master/), which abstracts away the complexities of HTTP (the protocol your browser uses when you visit a web page).

## HTTP and HTML

In [1]:
import requests

Let's ask for the Wikipedia page on the Star Wars franchise:

In [2]:
response = requests.get('https://en.wikipedia.org/wiki/Star_Wars')
response

<Response [200]>

The 200 in the response is an [HTTP status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) that means the request succeeded.
Status codes fall into five classes, identified by their first digit:

* 1xx (Informational): The request was received, continuing process
* 2xx (Successful): The request was successfully received, understood, and accepted
* 3xx (Redirection): Further action needs to be taken in order to complete the request
* 4xx (Client Error): The request contains bad syntax or cannot be fulfilled
* 5xx (Server Error): The server failed to fulfill an apparently valid request
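Since the class depends only on the first digit, a small helper can classify any code. This is a sketch for illustration: `status_class` is a hypothetical name, not part of `requests`. For real responses, `requests` already offers `response.ok` (true for codes below 400) and `response.raise_for_status()` (raises an exception for 4xx and 5xx codes).

```python
def status_class(code):
    """Return the class name for an HTTP status code (hypothetical helper)."""
    classes = {
        1: 'Informational',
        2: 'Successful',
        3: 'Redirection',
        4: 'Client Error',
        5: 'Server Error',
    }
    if not 100 <= code <= 599:
        raise ValueError(f'{code} is not a valid HTTP status code')
    # integer division by 100 keeps only the leading digit, e.g. 404 // 100 == 4
    return classes[code // 100]

print(status_class(200))  # Successful
print(status_class(404))  # Client Error
```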

In [3]:
response.status_code

200

Let's take a look at what the server actually returned:

In [4]:
# just take a peek at the first 1000 characters because it's long
response.text[:1000]

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Star Wars - Wikipedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Star_Wars","wgTitle":"Star Wars","wgCurRevisionId":864265876,"wgRevisionId":864265876,"wgArticleId":26678,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Wikipedia indefinitely semi-protected pages","Wikipedia indefinitely move-protected pages","Use mdy dates from March 2017","Pages using multiple image with auto scaled images","Official website different in Wikidata and Wikipedia","Wikipedia articles with BNF identifiers","Wikipedia articles with GND identifiers","Wikipedia articl

This is the raw [HTML](https://en.wikipedia.org/wiki/HTML) that your browser would receive from Wikipedia's server and then render into the web page you're used to seeing.
It's a lot of data and a bit overwhelming, but luckily there's a really handy library called [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for searching through HTML and finding the information we need.

In [5]:
from bs4 import BeautifulSoup
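# name the parser explicitly; otherwise Beautiful Soup guesses one and issues a warning
soup = BeautifulSoup(response.text, 'html.parser')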
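Wikipedia's markup changes over time, so the exact ids and tables used in this walkthrough may no longer match the live page. One way to try the same Beautiful Soup calls offline is on a small hand-written HTML string. The markup below is invented for illustration, and the `sample_` variable names are chosen so they don't overwrite `soup`.

```python
from bs4 import BeautifulSoup

# A tiny, made-up page with the same rough shape: a heading with an id,
# followed by a table whose rows start with <th> cells.
sample_html = """
<h2><span class="mw-headline" id="Skywalker_saga">Skywalker saga</span></h2>
<table>
  <tr><th>Film</th><th>Release year</th></tr>
  <tr><th><a href="#">Episode IV</a></th><td>1977</td></tr>
  <tr><th><a href="#">Episode V</a></th><td>1980</td></tr>
</table>
"""

# Python's built-in parser; no extra install needed
sample_soup = BeautifulSoup(sample_html, 'html.parser')

sample_header = sample_soup.find(id='Skywalker_saga')  # first element with this id
sample_table = sample_header.find_next('table')        # next <table> in document order
sample_rows = sample_table.find_all('tr')[1:]          # every row except the title row
sample_titles = [row.find('a').get_text() for row in sample_rows]
print(sample_titles)  # ['Episode IV', 'Episode V']
```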

Using my browser to inspect the source of the web page, I can see that there's a table with all of the Star Wars movies right after a heading with the id `Skywalker_saga`:

In [6]:
header = soup.find(id='Skywalker_saga')
header

<span class="mw-headline" id="Skywalker_saga">Skywalker saga</span>

Now I can skip to the next table element and grab everything from its first column.

In [7]:
# get next <table> element
table = header.find_next('table')

# get all rows (<tr> elements) from the table
rows = table.find_all('tr')

# skip the title row
rows = rows[1:]

# get the headers (<th> elements) from each row
headers = [row.find_next('th') for row in rows]

A quick note on the **list comprehension** syntax used above:

```python
x = [a*2 for a in b]
```

is a shorthand way of writing:

```python
x = []
for a in b:
    x.append(a*2)
```
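To convince yourself the two forms are equivalent, run both on a concrete list (the values in `b` are arbitrary):

```python
b = [1, 2, 3]

# list comprehension
x = [a * 2 for a in b]

# the equivalent explicit loop
y = []
for a in b:
    y.append(a * 2)

print(x)       # [2, 4, 6]
print(x == y)  # True
```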

In [8]:
# get the links (<a> elements) from each header
links = [h.find_next('a') for h in headers]

# get the text from inside the links e.g. ['Episode IV', 'A New Hope']
# (string=True is the current name for the older text=True argument)
movie_titles = [link.find_all(string=True) for link in links]

# join titles together with a colon
movie_titles = [': '.join(title_parts) for title_parts in movie_titles]

movie_titles

['Episode IV: A New Hope',
 'Episode V: The Empire Strikes Back',
 'Episode VI: Return of the Jedi',
 'Episode I: The Phantom Menace',
 'Episode II: Attack of the Clones',
 'Episode III: Revenge of the Sith',
 'Episode VII: The Force Awakens',
 'Episode VIII: The Last Jedi',
 'Episode IX']

That was a lot of work just to get the information we wanted, though not nearly as much as if we had to write our own code to handle the HTTP request and to parse what we wanted from the HTML response.
The problem is that HTML is typically intended for your browser to render, not necessarily for you to dig through and extract bits of information.
When data is meant to be consumed by other programs, it's often exposed through an API (application programming interface), which makes it much simpler to write code that interacts with it.

## APIs

Luckily for us, there exists a [Star Wars API](https://swapi.co/documentation)!
APIs typically organize the information they provide under different URLs, often called endpoints.
Looking through the documentation, there's an endpoint for the films:

In [9]:
response = requests.get('https://swapi.co/api/films')
response.status_code

200

Success! Let's take a peek at the data returned:

In [10]:
response.text[:1000]

'{"count":7,"next":null,"previous":null,"results":[{"title":"A New Hope","episode_id":4,"opening_crawl":"It is a period of civil war.\\r\\nRebel spaceships, striking\\r\\nfrom a hidden base, have won\\r\\ntheir first victory against\\r\\nthe evil Galactic Empire.\\r\\n\\r\\nDuring the battle, Rebel\\r\\nspies managed to steal secret\\r\\nplans to the Empire\'s\\r\\nultimate weapon, the DEATH\\r\\nSTAR, an armored space\\r\\nstation with enough power\\r\\nto destroy an entire planet.\\r\\n\\r\\nPursued by the Empire\'s\\r\\nsinister agents, Princess\\r\\nLeia races home aboard her\\r\\nstarship, custodian of the\\r\\nstolen plans that can save her\\r\\npeople and restore\\r\\nfreedom to the galaxy....","director":"George Lucas","producer":"Gary Kurtz, Rick McCallum","release_date":"1977-05-25","characters":["https://swapi.co/api/people/1/","https://swapi.co/api/people/2/","https://swapi.co/api/people/3/","https://swapi.co/api/people/4/","https://swapi.co/api/people/5/","https://swapi.co

This response is not HTML but [JSON](https://www.json.org/), a popular format for web APIs as it's easy for both humans and code to read.
It's so common that `requests` has an easy way to convert it into a Python dictionary:
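Under the hood this is ordinary JSON parsing: `response.json()` gives roughly the same result as the standard library's `json.loads` applied to `response.text`. JSON objects become dicts, arrays become lists, and `null` becomes `None`. A sketch on a shortened, hand-edited copy of the response above:

```python
import json

# shortened version of the JSON text shown above
raw = '{"count": 7, "next": null, "previous": null, "results": [{"title": "A New Hope", "episode_id": 4}]}'

data = json.loads(raw)              # parse JSON text into Python objects
print(type(data))                   # <class 'dict'>
print(data['next'])                 # None (JSON null becomes Python None)
print(data['results'][0]['title'])  # A New Hope
```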

In [11]:
results = response.json()
results.keys()

dict_keys(['count', 'next', 'previous', 'results'])

It looks like what we need is under the 'results' key.
Let's take a peek at the data returned for one of the films:

In [12]:
films = results['results']
films[0]

{'title': 'A New Hope',
 'episode_id': 4,
 'opening_crawl': "It is a period of civil war.\r\nRebel spaceships, striking\r\nfrom a hidden base, have won\r\ntheir first victory against\r\nthe evil Galactic Empire.\r\n\r\nDuring the battle, Rebel\r\nspies managed to steal secret\r\nplans to the Empire's\r\nultimate weapon, the DEATH\r\nSTAR, an armored space\r\nstation with enough power\r\nto destroy an entire planet.\r\n\r\nPursued by the Empire's\r\nsinister agents, Princess\r\nLeia races home aboard her\r\nstarship, custodian of the\r\nstolen plans that can save her\r\npeople and restore\r\nfreedom to the galaxy....",
 'director': 'George Lucas',
 'producer': 'Gary Kurtz, Rick McCallum',
 'release_date': '1977-05-25',
 'characters': ['https://swapi.co/api/people/1/',
  'https://swapi.co/api/people/2/',
  'https://swapi.co/api/people/3/',
  'https://swapi.co/api/people/4/',
  'https://swapi.co/api/people/5/',
  'https://swapi.co/api/people/6/',
  'https://swapi.co/api/people/7/',
  'htt

We could grab all the titles:

In [13]:
[film['title'] for film in films]

['A New Hope',
 'Attack of the Clones',
 'The Phantom Menace',
 'Revenge of the Sith',
 'Return of the Jedi',
 'The Empire Strikes Back',
 'The Force Awakens']

It looks like this API doesn't have data for "The Last Jedi" yet, but that's OK.
It's a lot easier to work with than parsing HTML from the Wikipedia page!
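The titles come back in the API's order, not episode order. Because each film dict also carries an `episode_id` (4 for A New Hope, as shown earlier), the list can be sorted by that key. The sketch below uses a hand-written sample with the films' well-known episode numbers rather than a live response; with the real data you would pass `films` instead of `sample_films`.

```python
# hand-written sample in the same shape as the film dicts above,
# keeping only the two keys we need
sample_films = [
    {'title': 'A New Hope', 'episode_id': 4},
    {'title': 'Attack of the Clones', 'episode_id': 2},
    {'title': 'The Phantom Menace', 'episode_id': 1},
    {'title': 'Revenge of the Sith', 'episode_id': 3},
    {'title': 'Return of the Jedi', 'episode_id': 6},
    {'title': 'The Empire Strikes Back', 'episode_id': 5},
    {'title': 'The Force Awakens', 'episode_id': 7},
]

# sorted() returns a new list; key= tells it which value to compare
in_order = sorted(sample_films, key=lambda film: film['episode_id'])
print([film['title'] for film in in_order])
```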

Let's follow one of those character links to see where it leads:

In [14]:
person = requests.get('https://swapi.co/api/people/1/').json()
person

{'name': 'Luke Skywalker',
 'height': '172',
 'mass': '77',
 'hair_color': 'blond',
 'skin_color': 'fair',
 'eye_color': 'blue',
 'birth_year': '19BBY',
 'gender': 'male',
 'homeworld': 'https://swapi.co/api/planets/1/',
 'films': ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/',
  'https://swapi.co/api/films/7/'],
 'species': ['https://swapi.co/api/species/1/'],
 'vehicles': ['https://swapi.co/api/vehicles/14/',
  'https://swapi.co/api/vehicles/30/'],
 'starships': ['https://swapi.co/api/starships/12/',
  'https://swapi.co/api/starships/22/'],
 'created': '2014-12-09T13:50:51.644000Z',
 'edited': '2014-12-20T21:17:56.891000Z',
 'url': 'https://swapi.co/api/people/1/'}

It would be nice if we could abstract all of this code away.
Before we do, we have to learn a little bit about **classes**.

## OOP

Python enables developers to write code that adheres to the paradigm of [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming).
Simply put, it's a way of organizing data and the code that operates on that data together into an **object**.

A **class** is the definition of a type, and an **object** is an instance of that class.
Functions defined on a class are called **methods**, and the data stored on an object is referred to as its **attributes** (sometimes loosely called **properties**).

In [15]:
class Dog:
    # init is a special "dunder" (double underscore) method that initializes an object
    def __init__(self, name):
        # self is a reference to the instance of the object itself
        # name is a parameter we expect to receive
        
        # save the parameter as a property on this object
        self.name = name
        
    # let's define another method to do something with the data
    def whosagooddog(self):
        print(f'{self.name} is a good dog!')

In [16]:
d = Dog('Rover')

In [17]:
d.name

'Rover'

In [18]:
d.whosagooddog()

Rover is a good dog!
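The class is the blueprint; each call to `Dog(...)` builds a separate object with its own data. A short sketch (the class is restated so the block runs on its own, and the second dog's name is arbitrary) makes the distinction between class and instance concrete:

```python
class Dog:
    def __init__(self, name):
        self.name = name

    def whosagooddog(self):
        print(f'{self.name} is a good dog!')

# one class, two independent instances
rover = Dog('Rover')
fido = Dog('Fido')

rover.whosagooddog()          # Rover is a good dog!
fido.whosagooddog()           # Fido is a good dog!
print(isinstance(fido, Dog))  # True
print(rover is fido)          # False: two separate objects
```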


Classes can also **inherit** from one another so that we can write code once in more generalized classes and take advantage of it in more specialized classes:

In [19]:
class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
        
    def honk(self):
        print('beep')
    
# Minivan is a subclass of Car
class Minivan(Car):
    def __init__(self, make, model, has_tv):
        # super() returns a proxy that delegates to the parent class (here, Car),
        # so we can call Car's __init__ instead of repeating that code
        super().__init__(make, model)
        self.has_tv = has_tv
        
    def honk(self):
        # here we are overriding the Car implementation of honk
        print('boop')
        
    def watch_dvd(self, movie):
        # here we are adding a method that is specific to Minivans
        if not self.has_tv:
            print("There's no TV!")
        else:
            print(f'Watching {movie}')

In [20]:
c = Car('Mitsubishi', 'Galant')
c.make

'Mitsubishi'

In [21]:
c.honk()

beep


In [22]:
v = Minivan('Honda', 'Odyssey', True)
v.model

'Odyssey'

In [23]:
v.honk()

boop


In [24]:
v.watch_dvd('Finding Nemo')

Watching Finding Nemo
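A subclass that overrides nothing still gets every method of its parent, and Python can report the relationship with `isinstance`, `issubclass`, and the class's method resolution order (`__mro__`), the sequence of classes searched when a method is looked up. `Vehicle` and `Scooter` are hypothetical classes, named so they don't replace `Car` and `Minivan` above.

```python
class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels

    def honk(self):
        print('beep')

# Scooter adds and overrides nothing, so it inherits everything from Vehicle
class Scooter(Vehicle):
    pass

s = Scooter(2)
s.honk()                                          # beep (found on Vehicle)
print(s.wheels)                                   # 2 (set by Vehicle.__init__)
print(isinstance(s, Vehicle))                     # True: a Scooter is a Vehicle
print(issubclass(Scooter, Vehicle))               # True
print([cls.__name__ for cls in Scooter.__mro__])  # ['Scooter', 'Vehicle', 'object']
```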


## API Wrapper

Let's use what we've learned and make some classes to interact with the API.

In [25]:
class StarWarsAPI:
    def get_movies(self):
        return requests.get('https://swapi.co/api/films').json()['results']
    
    def get_people(self):
        return requests.get('https://swapi.co/api/people').json()['results']

In [26]:
api = StarWarsAPI()

In [27]:
api.get_movies()[0]['title']

'A New Hope'

In [28]:
api.get_people()[0]['name']

'Luke Skywalker'
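SWAPI splits long lists such as people across several pages, so `get_people` as written likely sees only the first page. The films response above included `next` and `previous` keys; the sketch below assumes `next` holds the URL of the following page, or `null` on the last page. `get_all_results` is a hypothetical helper that takes the fetching function as a parameter, so it can be tried offline with fake pages.

```python
def get_all_results(url, fetch_json):
    """Collect 'results' from every page by following each page's 'next' link.

    fetch_json is any callable that takes a URL and returns the decoded JSON
    dict, e.g. lambda u: requests.get(u).json() for the live API.
    """
    results = []
    while url is not None:  # the last page has "next": null, i.e. None
        page = fetch_json(url)
        results.extend(page['results'])
        url = page['next']
    return results

# two fake pages standing in for real API responses
fake_pages = {
    'page1': {'next': 'page2', 'results': ['Luke Skywalker', 'C-3PO']},
    'page2': {'next': None, 'results': ['R2-D2']},
}
print(get_all_results('page1', fake_pages.get))  # ['Luke Skywalker', 'C-3PO', 'R2-D2']
```

Passing the fetch function in, rather than calling `requests` directly, keeps the pagination logic separate from the network and easy to test.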

Instead of working with dicts, let's create some container classes:

In [29]:
class Movie:
    def __init__(self, title, characters):
        self.title = title
        self.characters = characters

class Character:
    def __init__(self, name, movies):
        self.name = name
        self.movies = movies
        
class StarWarsAPI:
    def get_movies(self):
        movies = requests.get('https://swapi.co/api/films').json()['results']
        return [Movie(m['title'], m['characters']) for m in movies]
    
    def get_people(self):
        people = requests.get('https://swapi.co/api/people').json()['results']
        return [Character(p['name'], p['films']) for p in people]
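Container classes like these are common enough that the standard library's `dataclasses` module can generate the boilerplate. The sketch below swaps in `@dataclass` as an alternative to the hand-written `__init__`; it uses the hypothetical name `MovieRecord` so it doesn't replace `Movie`.

```python
from dataclasses import dataclass

# @dataclass writes __init__, __repr__ and __eq__ from the annotated fields
@dataclass
class MovieRecord:
    title: str
    characters: list

m = MovieRecord('A New Hope', ['https://swapi.co/api/people/1/'])
print(m.title)  # A New Hope
print(m)        # MovieRecord(title='A New Hope', characters=['https://swapi.co/api/people/1/'])
```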

In [30]:
api = StarWarsAPI()

In [31]:
movies = api.get_movies()

In [32]:
movies[0].title

'A New Hope'

In [33]:
movies[0].characters

['https://swapi.co/api/people/1/',
 'https://swapi.co/api/people/2/',
 'https://swapi.co/api/people/3/',
 'https://swapi.co/api/people/4/',
 'https://swapi.co/api/people/5/',
 'https://swapi.co/api/people/6/',
 'https://swapi.co/api/people/7/',
 'https://swapi.co/api/people/8/',
 'https://swapi.co/api/people/9/',
 'https://swapi.co/api/people/10/',
 'https://swapi.co/api/people/12/',
 'https://swapi.co/api/people/13/',
 'https://swapi.co/api/people/14/',
 'https://swapi.co/api/people/15/',
 'https://swapi.co/api/people/16/',
 'https://swapi.co/api/people/18/',
 'https://swapi.co/api/people/19/',
 'https://swapi.co/api/people/81/']

It would be nicer if asking for the characters fetched their details from the API, so we didn't have to look at raw links.

In [34]:
class Movie:
    def __init__(self, title, characters):
        self.title = title
        self.character_links = characters
        
    def characters(self):
        characters = [requests.get(link).json() for link in self.character_links]
        return [
            Character(c['name'], c['films'])
            for c in characters
        ]

class Character:
    def __init__(self, name, movies):
        self.name = name
        self.movie_links = movies
        
    def movies(self):
        movies = [requests.get(link).json() for link in self.movie_links]
        return [
            Movie(m['title'], m['characters'])
            for m in movies
        ]
        
class StarWarsAPI:
    def get_movies(self):
        movies = requests.get('https://swapi.co/api/films').json()['results']
        return [Movie(m['title'], m['characters']) for m in movies]
    
    def get_people(self):
        people = requests.get('https://swapi.co/api/people').json()['results']
        return [Character(p['name'], p['films']) for p in people]

In [35]:
api = StarWarsAPI()

In [36]:
movies = api.get_movies()

In [37]:
movies[0].title

'A New Hope'

In [38]:
characters = movies[0].characters()

In [39]:
[c.name for c in characters]

['Luke Skywalker',
 'C-3PO',
 'R2-D2',
 'Darth Vader',
 'Leia Organa',
 'Owen Lars',
 'Beru Whitesun lars',
 'R5-D4',
 'Biggs Darklighter',
 'Obi-Wan Kenobi',
 'Wilhuff Tarkin',
 'Chewbacca',
 'Han Solo',
 'Greedo',
 'Jabba Desilijic Tiure',
 'Wedge Antilles',
 'Jek Tono Porkins',
 'Raymus Antilles']

In [40]:
characters[0].name

'Luke Skywalker'

In [41]:
[m.title for m in characters[0].movies()]

['The Empire Strikes Back',
 'Revenge of the Sith',
 'Return of the Jedi',
 'A New Hope',
 'The Force Awakens']
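Each call to `characters()` or `movies()` issues one request per link, and asking again repeats every request. One common remedy, sketched here rather than taken from the notebook, is to memoize the fetch with the standard library's `functools.lru_cache`. `fake_get_json` is a hypothetical stand-in for `requests.get(url).json()` so the example runs offline.

```python
from functools import lru_cache

calls = []  # every URL the stand-in "network" function is asked for

def fake_get_json(url):
    """Hypothetical stand-in for requests.get(url).json(), so this runs offline."""
    calls.append(url)
    return {'url': url}

@lru_cache(maxsize=None)
def cached_get_json(url):
    # the body only runs the first time a given url is seen;
    # later calls with the same url return the stored result
    return fake_get_json(url)

first = cached_get_json('https://swapi.co/api/people/1/')
second = cached_get_json('https://swapi.co/api/people/1/')
print(len(calls))       # 1: the second call never reached fake_get_json
print(first is second)  # True: the very same dict is returned
```

With the real request inside the cached function, the `Movie` and `Character` classes could call it in place of `requests.get(link).json()`. Because the cache hands back the same object every time, code that mutates the returned dict would also change the cached copy.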