# Working With Data From the Internet

<style>
section.present > section.present { 
    max-height: 90%; 
    overflow-y: scroll;
}
</style>

<small><a href="https://colab.research.google.com/github/brandeis-jdelfino/cosi-10a/blob/main/lectures/notebooks/15_reques.ipynb">Link to interactive slides on Google Colab</a></small>

# Where is the International Space Station?

# [Right here](http://api.open-notify.org/iss-now.json)

Does that data format look familiar?

It's JSON!

## Requests library

We can fetch and work with this data from Python.

```python
import requests

data = requests.get('http://api.open-notify.org/iss-now.json').json()
print(data)
print(f"The ISS is at lat: {data['iss_position']['latitude']}, long: {data['iss_position']['longitude']}")
```

## Networking on the Internet

Let's dig into what it means to "send a request" over the internet.

This is a 30,000-foot overview; many details are glossed over.

## Anatomy of a URL

`http://video.google.com/videoplay?docid=1234&hl=en`

* `http://`: The "protocol", or "scheme"
   * This specifies the format of the request
* `video.google.com`: The "domain name", or a unique name for a location on the internet
   * Starts after the protocol, and after `://`
* `videoplay`: The "path", or the name of the resource being accessed
   * Starts after the end of the domain, and after a `/`
* `docid=1234&hl=en`: The "query", or parameters related to a request
   * Starts at the end of the path, and after a `?`
   * `docid=1234` and `hl=en` are each individual "parameters", and parameters are separated by `&`
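
To make the breakdown above concrete, here is a minimal sketch that splits the same example URL into its parts using Python's standard `urllib.parse` module. The variable names are chosen for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://video.google.com/videoplay?docid=1234&hl=en"
parts = urlsplit(url)

print(parts.scheme)            # the protocol / scheme
print(parts.netloc)            # the domain name
print(parts.path)              # the path (includes the leading "/")
print(parts.query)             # the raw query string
print(parse_qs(parts.query))   # the query parameters, as a dict of lists
```

Note that `parse_qs` returns a list for each parameter, because the same parameter name may appear more than once in a query.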

## Anatomy of our URL

`http://api.open-notify.org/iss-now.json`

* `http://`: protocol
* `api.open-notify.org`: domain
* `iss-now.json`: path

## IP Addresses and Domain Names

Every computer connected to the internet has an IP address. 

Every domain name has one or more IP addresses associated with it.

The first step of sending a request is to translate the domain to an IP address.


## Mailing letters

Let's use an analogy: sending a request over the internet is like mailing someone a letter.

A domain is an address that you understand, like "Joe's house".

An IP address is like the actual address of my house, including all the necessary details, such as postal code.

## DNS

**DNS** (Domain Name System) is like a big dictionary that maps from domain name to IP address.

The owner of a domain has the ability to specify which IP address(es) a domain maps to.

DNS records are easily accessible: https://traceroute-online.com/check-dns-records/

All routing of traffic on the internet is done using IP addresses.
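
Python can perform a DNS lookup for us. The sketch below uses the standard library's `socket.getaddrinfo`; the helper name `resolve_domain` is made up for this example, and the real lookup is left commented out because it needs network access and its result changes over time.

```python
import socket

def resolve_domain(domain):
    """Return the sorted, de-duplicated IP addresses the system resolver reports."""
    results = socket.getaddrinfo(domain, None)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({sockaddr[0] for *_, sockaddr in results})

# Requires network access; the addresses printed will vary.
# print(resolve_domain("api.open-notify.org"))
```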

## Address books

DNS is like your address book - it translates from a "place name" to a physical address that the postal service recognizes.

You can't mail a letter to "Joe's house", but you can mail a letter to "123 Python Lane, Boston MA, 02118" (not my real address).

## IP Routing

Routing traffic by IP address is beyond the scope of this lecture, but requests will typically be passed between many servers on the way to arriving at the specified IP address.

Many tools allow you to trace the path of a request: https://traceroute-online.com/

## Postal network 

IP routing is like the network of trucks, planes, trains, etc. that are used to get your letter to its final destination.

Your letter might stop at multiple other post offices along the way.

No one along the way knows or cares what is **in** your letter. They're just delivering an envelope.

## Load balancing

Eventually, your request ends up at a server with the IP address associated with the domain you requested.

The first server to receive your request is often a "load balancer", which routes requests to other servers that can actually respond to them.

This routing is different from IP routing, but we don't need to worry about the difference.

Eventually, the request is routed to a server that can generate a response.
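
As a toy illustration of the idea, here is a sketch of one simple strategy, "round robin", where requests are handed to each backend server in turn. This is an assumption chosen for illustration: real load balancers use a variety of strategies, and the server names below are hypothetical.

```python
from itertools import cycle

def make_router(backends):
    """Return a function that assigns each request to the next backend in rotation."""
    rotation = cycle(backends)   # endlessly repeats the list of backends

    def route(request_path):
        return f"{request_path} -> {next(rotation)}"

    return route

# Hypothetical backend servers sitting behind one public IP address.
route = make_router(["server-a", "server-b", "server-c"])
for _ in range(4):
    print(route("/iss-now.json"))
```

The fourth request wraps around to the first server again.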

## Campus mail room

If you mail a letter to me at my campus address, the postal service delivers it to the mail room.

Load balancing, and other routing that happens after delivery to a DNS-registered IP, is like the mail room staff on campus getting the letter to my personal mailbox.

## HTTP Requests

Our protocol is "http", which means we are making an "HTTP request".

**HTTP** stands for "HyperText Transfer Protocol".

It is the protocol (agreed upon format) for the majority of requests on the internet.

Other common protocols include: 
* `ftp` (file transfer)
* `smtp` and `imap` (email)
* `udp` (a lower-level protocol often used for real-time communication)

## The Language of Your Letter

HTTP is like the language the letter inside the envelope is written in.

It doesn't have anything to do with how the letter is routed or handled by the postal service, but it is important that your recipient knows how to interpret the letter you wrote.

## Anatomy of an HTTP Request

An HTTP request is just a blob of text describing our request.

It consists of 3 parts:
1. Request line
   * Specifies the HTTP method, and the resource being requested
2. Headers (optional)
   * Directives on how to handle the request
3. Body (optional)
   * The payload of the message, if applicable

Here's what our HTTP request looks like when sent from a browser. It has 8 headers and no body:

```
GET /iss-now.json HTTP/1.1
Accept: text/html,application/xhtml+xml,[...]
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Host: api.open-notify.org
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) [...]
```
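
Since a request is just text, we can assemble one ourselves. The sketch below builds a stripped-down version of the request above, with only the request line and the `Host` header. The function name `build_get_request` is made up for this example, and nothing is actually sent over the network.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Build the text of a minimal HTTP/1.1 GET request for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",    # request line: method, resource, protocol version
        f"Host: {parts.netloc}",   # header: which domain the request is for
        "",                        # a blank line marks the end of the headers
        "",                        # no body
    ]
    return "\r\n".join(lines)      # HTTP separates lines with \r\n

print(build_get_request("http://api.open-notify.org/iss-now.json"))
```

A browser adds many more headers, but this minimal form has the same three-part structure.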

## HTTP Methods

HTTP requests always have a "method".

There are several methods, but the 2 most common are "GET" and "POST".

**GET** requests are typically used to retrieve data.

**POST** requests are typically used to send data.
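
With the `requests` library, these two methods correspond to `requests.get()` and `requests.post()`. The sketch below uses the standard library's `urllib.request.Request` instead, so that it can build both kinds of request without sending anything. The URL and the payload are placeholders chosen for illustration.

```python
import json
from urllib.request import Request

url = "http://example.com/api"   # placeholder URL; nothing is sent to it here

# No data attached, so the method defaults to GET.
get_request = Request(url)

# Data attached (as bytes), so the method defaults to POST.
post_request = Request(
    url,
    data=json.dumps({"name": "ISS"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

print(get_request.get_method())
print(post_request.get_method())
```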

## Reading and Responding to the Letter

The letter has been delivered to a person who should be able to understand it.

That person will know how to interpret the contents, and how to respond.

That person might still call on other people (servers) to help create a response.

The response will be returned in the same language and format (protocol) as the original letter.

## HTTP Responses

HTTP responses have status codes, headers, and bodies.

Wikipedia has a [full list](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) of status codes. Common codes are:
* `200` - success
* `301` / `302` - permanent / temporary redirect
* `404` - not found
* `500` - server error
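
Python's standard library knows the official name of each status code. Here is a small sketch that looks up the codes listed above using `http.HTTPStatus`; the helper `describe_status` and its category labels are made up for this example.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code."""
    status = HTTPStatus(code)   # raises ValueError for codes it doesn't know
    if code < 200:
        category = "informational"
    elif code < 300:
        category = "success"
    elif code < 400:
        category = "redirect"
    elif code < 500:
        category = "client error"
    else:
        category = "server error"
    return f"{code} {status.phrase} ({category})"

for code in (200, 301, 302, 404, 500):
    print(describe_status(code))
```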

Here's what our HTTP response looks like when received in the browser. It has 6 headers and a body containing a JSON string:

```
HTTP/1.1 200 OK
Server: nginx/1.10.3
Date: Wed, 25 Oct 2023 03:10:09 GMT
Content-Type: application/json
Content-Length: 113
Connection: keep-alive
access-control-allow-origin: *

{"timestamp": 1698203409, "iss_position": {"latitude": "35.2661", "longitude": "110.0297"}, "message": "success"}
```
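
To see how the pieces of a response fit together, here is a rough sketch that pulls apart a shortened copy of the response above (only two of the headers are kept). The `parse_response` helper is made up for this example and skips many details that a real HTTP client handles.

```python
import json

raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: 113\r\n"
    "\r\n"
    '{"timestamp": 1698203409, "iss_position": {"latitude": "35.2661", '
    '"longitude": "110.0297"}, "message": "success"}'
)

def parse_response(raw):
    """Split raw response text into (status code, headers dict, body text)."""
    head, _, body = raw.partition("\r\n\r\n")     # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body

status_code, headers, body = parse_response(raw_response)
print(status_code)
print(headers["Content-Type"])
print(len(body))                          # matches the Content-Length header
print(json.loads(body)["iss_position"])
```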

## Letter Responses

* `200` is like receiving a letter back.
* `301` / `302` are a bit like forwarding mail to another address - the postal service handles this for you, and most HTTP clients will also handle it.
* `404` is like trying to send a letter to an unknown address, and having it returned.
* `500` is like the box of mail containing your letter fell off the truck and got lost.

The body of the response is like the letter you receive in response to your inquiry.

## All the terminology together

`requests.get('http://api.open-notify.org/iss-now.json')`

We made an HTTP request to the domain `api.open-notify.org`, asking it to `GET` the data for the resource `iss-now.json`.

It returned a response with a `200` (success) status code, and a JSON object in the body.

## Back to the `requests` library...

`requests.get()` returns a `Response` object, which has many fields: [Response](https://requests.readthedocs.io/en/latest/api/#requests.Response)

```python
response = requests.get('http://api.open-notify.org/iss-now.json')
print(response.status_code)
print()
print(response.headers)
print()
print(response.text)
```

We can use the handy `json()` method, which attempts to parse the body of the response as JSON:

```python
print(type(response.json()))
print(response.json())
```
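
Conceptually, this is roughly what you would get by passing the body text to `json.loads` yourself. The sketch below does that with the body copied from the example response shown earlier, so it runs without a network connection. It also shows that the latitude and longitude arrive as strings, so they need converting before you can do math with them.

```python
import json
from datetime import datetime, timezone

# Body text copied from the example response shown earlier.
body_text = (
    '{"timestamp": 1698203409, "iss_position": {"latitude": "35.2661", '
    '"longitude": "110.0297"}, "message": "success"}'
)

data = json.loads(body_text)
print(type(data))

# The coordinates are strings in the JSON, so convert them to numbers.
latitude = float(data["iss_position"]["latitude"])
longitude = float(data["iss_position"]["longitude"])
print(latitude, longitude)

# The timestamp is the number of seconds since January 1, 1970 (UTC).
when = datetime.fromtimestamp(data["timestamp"], tz=timezone.utc)
print(when.isoformat())
```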

## All this networking stuff is great...

But I really just want to know what part of the world the ISS is over, and I can't translate latitudes and longitudes in my head.

Let's use another free service, [geocode.xyz/api](http://geocode.xyz/api)

It has several options, but we want the simple one: turn a lat/long into an address.

```python
import json

response = requests.get('http://api.open-notify.org/iss-now.json')
response_json = response.json()

pos = response_json['iss_position']
geocode_url = f"https://geocode.xyz/{pos['latitude']},{pos['longitude']}?json=1"
print(f"Reverse geocoding: {pos=} with {geocode_url=}")

geolocate_response = requests.get(geocode_url)
print(geolocate_response.status_code)
print(geolocate_response.json())

# pretty print:
#print(json.dumps(geolocate_response.json(), indent=2, ensure_ascii=False))
```
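
If you find yourself building this URL in more than one place, it can help to wrap the step in a function. The sketch below is one possible version: the name `build_geocode_url` and the range check are additions for illustration, and the URL format is simply the one used in the cell above.

```python
from urllib.parse import urlencode

def build_geocode_url(position):
    """Build a reverse-geocoding URL from an 'iss_position' dictionary."""
    lat = float(position["latitude"])     # the ISS API returns these as strings
    lon = float(position["longitude"])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    query = urlencode({"json": 1})        # builds the "json=1" query string
    return f"https://geocode.xyz/{lat},{lon}?{query}"

print(build_geocode_url({"latitude": "35.2661", "longitude": "110.0297"}))
```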