# API and Open Data
---
DAT 512 Canisius College <br>
Professor Paul Lambson<br>
<br>
### Learning Objectives
- Understand general API concepts
- Become aware of RESTful methods
- Gain confidence in Requests library
- Practice acquiring real-world data
<br>


### Sections
- [Python Wrapper or RESTful](#python_wrapper_or_restful)
- [Methods](#methods)
- [Requests Library](#requests_library)
- [Buffalo Open Data](#buffalo_open_data)

<a id='python_wrapper_or_restful'></a>
# Python Wrapper or RESTful
Many data providers offer a library of methods or functions that wraps the API in Python (or other languages); this is often called a "client" or "wrapper". These wrappers handle many repetitive tasks, such as authentication or pagination, that can be complicated to handle directly with the API. The public cloud providers (GCP, AWS, Azure, etc.) offer very well documented and optimized libraries, for example the [Google Cloud Python Client](https://github.com/googleapis/google-api-python-client).

Maintaining a multi-language library that acts as middleware to an API is a labor-intensive undertaking. The big tech companies have the resources to keep their libraries up to date, but many other libraries fall behind or may not perform well. In many other cases, the API that needs to be accessed does not have a Python client and is only available through HTTP methods.

The most common implementation of HTTP methods is the RESTful API. RESTful APIs build on HTTP methods to provide a baseline of usability that is consistent across methods and implementations. Caching, throttling and security are built into this approach; depending on the design of an API, those features will have differing importance.

The obvious question: which way should be used to access API data? The answer is that it depends. What methods are available? How well does the client perform? How stable are the APIs or clients?

<a id='methods'></a>
# Methods
When accessing API endpoints directly, there are four main methods. These methods are generally applicable to software development and map closely onto the operations of a CRUD (create, read, update, delete) application.
<br>

|Method|Description|
|:---|:---|
|GET|Provides read-only access to a resource.|
|POST|Creates a new resource.|
|DELETE|Removes a resource.|
|PUT|Updates an existing resource or creates a new one.|

In the workflow of sourcing data on the web, **GET** and **POST** are the methods most likely to be used, and in that context both are typically used to *read* data.

### Requests and Parameters
The power of an API comes from passing information that gives context to a request. That information can be used for authorization, but in the data engineering workflow it is mostly used for querying data.

**Endpoint** : The URI where a request is sent<br>
**Parameters**: Required or optional context passed with the request<br>
**Headers**: A dictionary-like structure passed in requests and responses that holds metadata about the exchange<br>
**Data**: Also called "message" or "body", holds the information of the exchange


The two main methods, **GET** and **POST**, place the parameters in different parts of the request.

A **GET** request places the parameters at the end of the URI <br>
`https://httpbin.org/get?key1=value1&key2=value2`<br>
Note that all the necessary information is in the URI.
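
As a minimal sketch of how the parameters end up in the URI, the standard library can build the same address by hand. The variable names below are illustrative, and the Requests library performs a similar encoding step internally when `params` is supplied.

```python
from urllib.parse import urlencode

endpoint = 'https://httpbin.org/get'
payload = {'key1': 'value1', 'key2': 'value2'}

# urlencode turns the dictionary into a percent-encoded query string
query_string = urlencode(payload)

# A GET request carries the parameters after a '?' at the end of the URI
full_uri = f'{endpoint}?{query_string}'
print(full_uri)
```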

A **POST** request places the parameters into the header or data (message, body) of the request<br>
`curl -d "key1=value1&key2=value2" -X POST https://httpbin.org/post`
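
The `curl` command above can be mirrored in Python without sending anything over the network. This is a sketch using the standard library's `urllib.request.Request` only to inspect where the parameters live; it is not how the Requests library is used later in these notes.

```python
from urllib.parse import urlencode
from urllib.request import Request

payload = {'key1': 'value1', 'key2': 'value2'}

# The body must be bytes; this is the same form encoding that `curl -d` sends
body = urlencode(payload).encode('utf-8')

# Build the request object without sending it
req = Request('https://httpbin.org/post', data=body, method='POST')

print(req.get_method())  # the HTTP method
print(req.full_url)      # the URI carries no parameters
print(req.data)          # the parameters travel in the body
```

Compared with the **GET** example, the URI stays clean and the key-value pairs move into the body of the request.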


### Response Objects
The response object contains similar information to the request, but it is created by the server as an answer to the request that was made.

**Status Code**: A code that indicates the resolution of the request. `200` indicates success; other numeric codes represent other resolutions, such as `404` when the resource is not found.

Once the response is successfully received, the data can be accessed and processed.
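
To make the numeric codes less opaque, the standard library's `http.HTTPStatus` enum can translate a code into its standard phrase. The helper `describe_status` below is a hypothetical name introduced for illustration, and treating the 2xx range as success is a common convention rather than a rule every API follows.

```python
from http import HTTPStatus


def describe_status(code: int) -> tuple:
    """Return the standard phrase for a status code and whether it is a 2xx success."""
    # HTTPStatus raises ValueError for codes it does not know
    phrase = HTTPStatus(code).phrase
    is_success = 200 <= code < 300
    return phrase, is_success


for code in (200, 404, 500):
    print(code, describe_status(code))
```

A check like this is usually the first step before touching the data in a response.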

<a id='requests_library'></a>
# Requests Library
---
**Requests** is a simple, yet elegant, HTTP library [docs](https://pypi.org/project/requests/).

Using Requests simplifies working with request and response objects, and moving from **GET** to **POST** becomes seamless.
#### GET Request
```
import requests
uri = 'https://httpbin.org/get'
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get(uri, params=payload)
```
#### POST Request
```
import requests
uri = 'https://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post(uri, data=payload)
```

Once a request is made, parts of the response can be evaluated. A few examples:
```
r.url
r.status_code
r.text
r.json()
```
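
The difference between `r.text` and `r.json()` is easiest to see with a small offline sketch. The string below is a hand-written, abbreviated stand-in for a response body, not captured output, and `json.loads` is used here as a rough equivalent of what `r.json()` does with the text.

```python
import json

# Hand-written, abbreviated stand-in for the text body of a response
sample_text = (
    '{"args": {"key1": "value1", "key2": "value2"}, '
    '"url": "https://httpbin.org/get?key1=value1&key2=value2"}'
)

# The text form is a single string
print(type(sample_text))

# Parsing the text yields ordinary Python dictionaries and lists
parsed = json.loads(sample_text)
print(type(parsed))
print(parsed['args']['key1'])
```

Working with the parsed form is what makes it possible to hand the data directly to `pd.DataFrame`.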

In [None]:
import requests
import pandas as pd

In [None]:
# Make a simple get request
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)

In [None]:
r.url

In [None]:
r.status_code

In [None]:
r.text

In [None]:
r.json()

In [None]:
# Make a simple POST request
r = requests.post('https://httpbin.org/post', data=payload)

In [None]:
r.url

In [None]:
r.status_code

In [None]:
r.text

In [None]:
r.json()

<a id='buffalo_open_data'></a>
# Buffalo Open Data
Now let's take a look at some open data resources. [Buffalo Open Data](https://data.buffalony.gov/) is a wildly valuable resource for collecting data from the city of Buffalo.

Diving into the [Crime Dataset](https://data.buffalony.gov/Public-Safety/Crime-Incidents/d6g9-xbgu)

Documentation on the API exists and is fairly straightforward: [SoQL Queries](https://dev.socrata.com/docs/queries/)

In [None]:
uri = 'https://data.buffalony.gov/resource/d6g9-xbgu.json'
r = requests.get(uri)
print('Status code:', r.status_code)
print('Number of rows returned:', len(r.json()))
print('Encoded URI with params:', r.url)

In [None]:
df=pd.DataFrame(r.json())
print(df.shape)
df.head()

In [None]:
# layer in parameters that work as a query
# this layout is specific to the SODA APIs

params_dict = {
    '$where':'date_extract_y(incident_datetime)>2022',
    '$limit':50000
}

uri = 'https://data.buffalony.gov/resource/d6g9-xbgu.json'

r = requests.get(uri, params=params_dict)
print('Status code:', r.status_code)
print('Number of rows returned:', len(r.json()))
print('Encoded URI with params:', r.url)

In [None]:
df=pd.DataFrame(r.json())
print(df.shape)
df.head()
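
A large `$limit` works for a single pull, but the same query can also be split into pages, which is the kind of pagination a wrapper would normally handle. The sketch below only builds the parameter dictionaries and sends nothing; `paged_params` is a hypothetical helper, and ordering by the `:id` system field is an assumption about how to keep pages from overlapping that should be checked against the SODA documentation.

```python
from typing import Dict, Iterator


def paged_params(where: str, page_size: int, max_pages: int) -> Iterator[Dict[str, object]]:
    """Yield one SoQL parameter dict per page, advancing $offset each time."""
    for page in range(max_pages):
        yield {
            '$where': where,
            '$order': ':id',              # assumed stable ordering across pages
            '$limit': page_size,          # rows per page
            '$offset': page * page_size,  # rows to skip before this page
        }


pages = list(paged_params('date_extract_y(incident_datetime)>2022', 1000, 3))
for p in pages:
    print(p['$limit'], p['$offset'])
```

Each dictionary can be passed as `params` to `requests.get`, and the loop can stop early once a page returns fewer rows than `page_size`.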

# In Class Problems

#### 0- Using 2022 Crime Data, Review Data Import
- What are the rows and columns?
- Did all the data get imported?
- Do data types make sense? (Change types that need to be changed.)
- Can any new columns be made from the existing columns (date parts)?

#### 1- Perform Exploratory Data Analysis
- Look at distribution of categorical variables
- Evaluate numeric variables by summarizing them
- Look for covariance or correlation

#### 2- Shut it down
- Save the data frame to a file in a directory called 'data'
- Save the JSON data from the original API request in the same directory