
# Fetching Data

### Reading

- https://requests.readthedocs.io/en/latest/user/quickstart/
- https://www.tutorialspoint.com/http/index.htm
- https://www.w3schools.com/python/python_json.asp

### Learning Outcomes
- What is data retrieval?
- Basics of HTTP requests
- Making HTTP requests with Python
- Querying databases with Python

### What is Data Retrieval?

Programming often involves retrieving data from a source outside of your program. Commonly, sources are a file, a database, or an `API` (internet service).

We previously covered loading data from a file. This doc covers how to fetch data from an internet service or a database with Python.

### HTTP Basics

HTTP defines how a client can send a request to a server and what the response should look like.

HTTP methods define specific kinds of requests. The most common are:

- GET - request data from a server
- POST - send data to a server

#### HTTP GET

A GET request consists primarily of a `URL` (web address).

The URL may contain `query parameters`: name/value pairs that follow a `?`, with each name joined to its value by `=` and pairs separated by `&`, as in this weather forecast example:

```
https://api.openweathermap.org/data/2.5/forecast/daily?lat=47.6062&lon=122.3321&cnt=5&appid=12345
```
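
As a rough sketch of how such a URL is put together, the standard-library `urllib.parse` module can build a query string and read it back. The values are the placeholders from the example URL (the `appid` is not a working key), and the variable names are chosen only for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint from the example above, without the query string
base_url = "https://api.openweathermap.org/data/2.5/forecast/daily"

# Placeholder values copied from the example URL; "12345" is not a real API key
params = {"lat": 47.6062, "lon": 122.3321, "cnt": 5, "appid": "12345"}

# urlencode joins each name and value with "=" and separates pairs with "&"
url = f"{base_url}?{urlencode(params)}"
print(url)

# Going the other way: split the URL and read the query parameters back
query = parse_qs(urlsplit(url).query)
print(query["lat"])  # values come back as lists of strings
```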

#### HTTP POST

An HTTP POST request carries data in the request `body`.

Because servers and browsers typically limit the length of a URL, POST is more often used to send large amounts of data to a server, e.g. form submissions and file uploads.
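
To make the difference from GET concrete, the sketch below builds (but does not send) a POST request with the standard library. The form fields are hypothetical and the URL is only a placeholder test endpoint; the point is that the data travels in the body rather than in the URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, used only for illustration
form = {"name": "jim", "major": "art"}

# The body is the encoded form data, sent as bytes
body = urlencode(form).encode("utf-8")

# Build the request object; nothing is sent over the network here
request = Request(
    "https://httpbin.org/post",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(request.get_method())
print(request.full_url)  # no query string: the data is not part of the URL
print(request.data)
```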

#### HTTP Headers

Requests & responses include `headers` that inform the receiver about the request or response.

An HTTP header consists of its case-insensitive name followed by a colon (:), then by its value.

```
content-type: application/json; charset=utf-8
```
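
One way to see the case-insensitive lookup in action is to parse the header line with the standard-library `email` parser, which handles the same `name: value` layout. This is a convenience for illustration, not how a client is required to process headers.

```python
from email import message_from_string

# A header block: one "name: value" pair per line, ended by a blank line
raw_headers = "content-type: application/json; charset=utf-8\n\n"
headers = message_from_string(raw_headers)

# Lookup ignores the case of the header name
print(headers["Content-Type"])
print(headers["CONTENT-TYPE"])

# The value can carry extra parameters after a semicolon
print(headers.get_content_type())
print(headers.get_content_charset())
```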

#### HTTP Response

After receiving an HTTP request, a server should return a well-defined response.

The response typically includes:
- **status code** - a standard 3-digit integer that tells the receiver whether the request succeeded or failed (e.g. `200` for success, `404` for not found)
- **headers** - additional information about the response (e.g. content size, type, and last modified date)
- **body** - any data returned from the server
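
Status codes are grouped by their first digit. The sketch below uses the standard-library `http.HTTPStatus` enum to look up the standard phrase for a code; `describe_status` is a hypothetical helper written for this example.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a short label such as '404 Not Found (client error)'."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{code} {status.phrase} ({categories[code // 100]})"

for code in (200, 301, 404, 500):
    print(describe_status(code))
```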

#### Content Types

HTTP servers return data in a defined format that clients should be able to understand.

Some common formats for sharing data between applications are below.

JSON has become a de facto standard because clients can easily convert JSON data to `objects` that programs can operate on.

- CSV - Comma-separated values

```
name,major,gpa
jim,art,3.8
sue,science,3.75

```

- JSON - JavaScript Object Notation

```
[
  {"name":"jim", "major":"art", "gpa": 3.8},
  {"name":"sue", "major":"science", "gpa": 3.75}
]

```

- XML - Extensible Markup Language

```
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <row>
    <name>jim</name>
    <major>art</major>
    <gpa>3.8</gpa>
  </row>
  <row>
    <name>sue</name>
    <major>science</major>
    <gpa>3.75</gpa>
  </row>
</root>
```
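
As an illustration of why JSON is convenient, the sketch below parses the same two student records from all three formats using only the standard library. The XML declaration is omitted for brevity, and the variable names are chosen for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

csv_text = "name,major,gpa\njim,art,3.8\nsue,science,3.75\n"
json_text = (
    '[{"name": "jim", "major": "art", "gpa": 3.8},'
    ' {"name": "sue", "major": "science", "gpa": 3.75}]'
)
xml_text = """<root>
  <row><name>jim</name><major>art</major><gpa>3.8</gpa></row>
  <row><name>sue</name><major>science</major><gpa>3.75</gpa></row>
</root>"""

# CSV: every value is read as a string, so gpa must be converted explicitly
csv_rows = [
    {**row, "gpa": float(row["gpa"])}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# JSON: numbers, strings, arrays and objects map directly to Python types
json_rows = json.loads(json_text)

# XML: walk the tree and pull the text out of each child element
xml_rows = [
    {
        "name": row.findtext("name"),
        "major": row.findtext("major"),
        "gpa": float(row.findtext("gpa")),
    }
    for row in ET.fromstring(xml_text).iter("row")
]

# All three formats describe the same list of dictionaries
print(csv_rows == json_rows == xml_rows)
```

Only the JSON version needs a single call and no manual type conversion.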

### HTTP Requests in Python

The Python ecosystem has a number of libraries that simplify making an HTTP request. `requests` is one of the most mature and widely used.

Basic usage involves calling a function named for the HTTP method (e.g. `requests.get` or `requests.post`) with a URL. The `requests` library performs the HTTP request and returns a `response` object for subsequent commands.

The response includes `content` as well as supporting information (e.g. `status_code`, `headers`, etc.)

#### GET Requests

A GET request retrieves data from a server based on the `contract` for what URLs and query parameters the server supports.

These requests and the necessary commands are relatively simple.

Response content can be accessed in several formats:

- `text` - the response body as a string
- `json()` - the response body parsed from JSON into Python objects (raises an error if the body is not valid JSON)

```python
import requests
result = requests.get('https://data.seattle.gov/resource/2khk-5ukd.json')
print("status", result.status_code)
print("headers", result.headers)
# convert response body to a Python list
data = result.json()
print('records', len(data))
print(data[0])
```

```
status 200
headers {'Server': 'nginx', 'Date': 'Wed, 20 Nov 2024 02:31:31 GMT', 'Content-Type': 'application/json;charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'ETag': '"Zm94dHJvdC41OTU5N18yXzgyUEFyTEFFVjVpOHJZSFVURHJJZkswVkVJRHRB---gziphBVGxhtvC4TXKQPJpRv8jxRiGP8--gzip--gzip"', 'X-SODA2-Fields': '["department","last_name","first_name","job_title","hourly_rate"]', 'X-SODA2-Types': '["text","text","text","text","text"]', 'X-SODA2-Data-Out-Of-Date': 'false', 'X-SODA2-Truth-Last-Modified': 'Wed, 31 Jul 2024 16:25:48 GMT', 'X-SODA2-Secondary-Last-Modified': 'Wed, 31 Jul 2024 16:25:48 GMT', 'Last-Modified': 'Wed, 31 Jul 2024 16:25:48 GMT', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Age': '2', 'X-Socrata-Region': 'aws-us-east-1-fedramp-prod', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'X-Socrata-RequestId': 'dfeaa60361b3c0d704918ff0505cc4c4'}
records 1000
{'department': 'Police Department'
```
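
Network requests can fail or hang, so real code usually adds a timeout and checks the status before using the body. The following is a minimal sketch of that pattern, not a definitive implementation: `fetch_json` and its optional `session` parameter are hypothetical names introduced for this example, and the 10-second timeout is an arbitrary choice.

```python
import requests

def fetch_json(url, params=None, timeout=10, session=None):
    """Return the decoded JSON body, or None if the request fails."""
    # A requests.Session and the requests module both expose .get()
    http = session or requests
    try:
        # params is encoded into the URL's query string by the library
        response = http.get(url, params=params, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return response.json()
    except (requests.RequestException, ValueError) as error:
        # Covers connection errors, timeouts, bad statuses and invalid JSON
        print("request failed:", error)
        return None
```

A call such as `fetch_json('https://data.seattle.gov/resource/2khk-5ukd.json')` would return the same list as the example above, or `None` if the server could not be reached.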

#### POST Requests

As noted above, POST requests are commonly used to send information to a server (e.g. form submissions, file uploads, etc.)

The `Content-Type` of a POST request needs to match what the server expects. Common content types are form-encoded (`application/x-www-form-urlencoded`) and JSON (`application/json`).

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
# automatically encode Python dictionary as form data
result = requests.post('https://httpbin.org/post', data=payload)
print(result.content)
```

```
b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "key1": "value1", \n    "key2": "value2"\n  }, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "23", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.32.3", \n    "X-Amzn-Trace-Id": "Root=1-673d4b6a-4fcf9558576d275b61aca4c6"\n  }, \n  "json": null, \n  "origin": "35.245.247.204", \n  "url": "https://httpbin.org/post"\n}\n'
```


```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
# automatically convert the Python dictionary to a JSON string
# and send the 'Content-Type: application/json' header
result = requests.post('https://httpbin.org/post', json=payload)
print(result.content)
```

```
b'{\n  "args": {}, \n  "data": "{\\"key1\\": \\"value1\\", \\"key2\\": \\"value2\\"}", \n  "files": {}, \n  "form": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "36", \n    "Content-Type": "application/json", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.32.3", \n    "X-Amzn-Trace-Id": "Root=1-673d4b66-561af7945c5ad5fc4797298b"\n  }, \n  "json": {\n    "key1": "value1", \n    "key2": "value2"\n  }, \n  "origin": "35.245.247.204", \n  "url": "https://httpbin.org/post"\n}\n'
```
