<div class="alert alert-success">
    <h1 align="center">Week #8:</h1> 
        
   <h2 align="center">Introduction to APIs</h2>
</div>

## Introduction to APIs

<img src='images/17.jpg' width='40%'/>

### Aims

This exercise aims to introduce you to web APIs and get you familiar with accessing them via Python. The objectives are:

- Send a simple API request
- Understand the status codes
- Send an API request with authentication
- Parse the returned data
- Plot the data to create a map!
 
At the end of this tutorial you should be fairly confident exploring other APIs and ready to move on to more complex methods of authentication such as OAuth.


## Installing the basics

Before we connect to any APIs, we need a library that lets Python connect to them and send/receive data. The library we will use is called `requests`. Install it, along with the other packages used in this tutorial, and load them by running the commands below:


In [1]:
# Install the packages required for the current tutorial
import sys
!{sys.executable} -m pip install -q requests
!{sys.executable} -m pip install -q google
!{sys.executable} -m pip install -q google-api-python-client
!{sys.executable} -m pip install -q gtfs-realtime-bindings
!{sys.executable} -m pip install -q pandas
!{sys.executable} -m pip install -q geopandas
!{sys.executable} -m pip install -q matplotlib
!{sys.executable} -m pip install -q folium

In [4]:
# Import the installed packages into current environment
import requests
import folium
import time
import pandas
import geopandas
from google.transit import gtfs_realtime_pb2
from shapely.geometry import Point
from IPython import display

Now we are all set.

Let's explore some web APIs!


## Connecting to your First API

Let's jump-start things by connecting to an API and downloading some data. We will use TransportNSW's live vehicle position API, available here: https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-vehicle-positions.

Okay, let's send our first GET request!

In [5]:
# Create a result object with the Transport API

result = requests.get('https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/ferries/sydneyferries')

Wait! You might be wondering why nothing was printed. The request was sent and the response was stored in the `result` object. No error message is good news here.

Now let's examine the resulting object closely. The first thing we need to check is the `Status Code` of the result. This code tells us how successful the API request has been.


In [6]:
result.status_code

401

Hmm, that's cryptic... Every code corresponds to a particular status. The common ones are given below:

Code | Status | Description
---|---|---
200 | OK | The request was successfully completed.
201 | Created | A new resource was successfully created.
400 | Bad Request | The request was invalid.
401 | Unauthorized | The request did not include an authentication token or the authentication token was expired.
403 | Forbidden | The client did not have permission to access the requested resource.
404 | Not Found | The requested resource was not found.
405 | Method Not Allowed | The HTTP method in the request was not supported by the resource. For example, the DELETE method cannot be used with the Agent API.
409 | Conflict | The request could not be completed due to a conflict. For example,  POST ContentStore Folder API cannot complete if the given file or folder name already exists in the parent location.
500 | Internal Server Error | The request was not completed due to an internal error on the server side.
503 | Service Unavailable | The server was unavailable.
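
The table above can also be looked up from code. The sketch below is a minimal illustration using Python's standard `http.HTTPStatus` enum; the helper names `describe_status` and `is_success` are made up for this example and are not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '401 Unauthorized' for a status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes outside the standard registry have no official phrase
        return f"{code} (non-standard status code)"
    return f"{code} {status.phrase}"


def is_success(code):
    """Treat every 2xx code as a successful request."""
    return 200 <= code < 300


for code in (200, 401, 404, 503):
    outcome = "success" if is_success(code) else "needs attention"
    print(describe_status(code), "->", outcome)
```

With a real response you could call `describe_status(result.status_code)` to get the label for whatever code came back.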
 
Our status code is 401, which shows that we have a problem related to authentication. Let's print the text sent along with the response to see what went wrong.

In [7]:
result.text

'{ "ErrorDetails":{ "TransactionId":"000001780dc81340-8f55b48", "ErrorDateTime":"2021-04-07T07:44:57.906+10:00", "Message":"The calling application is unauthenticated.", "RequestedUrl":"/v1/gtfs/vehiclepos/ferries/sydneyferries", "RequestMethod":"GET" } }'

That's a lot of information! You can see that the text in the result object is in `JSON` (JavaScript Object Notation) format. If we parse it properly, we can query components of this object without printing everything.

In [8]:
# Parse the result
result_json = result.json()

# Print just the error message
result_json['ErrorDetails']['Message']

'The calling application is unauthenticated.'
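
Not every response is guaranteed to be valid JSON or to contain the keys we expect, so it can help to read them defensively. The sketch below is one possible approach using the standard `json` module on the error text shown earlier; the helper name `extract_error_message` is an assumption made for this example.

```python
import json

# The error text returned by the unauthenticated request above
raw_text = (
    '{ "ErrorDetails":{ "TransactionId":"000001780dc81340-8f55b48", '
    '"ErrorDateTime":"2021-04-07T07:44:57.906+10:00", '
    '"Message":"The calling application is unauthenticated.", '
    '"RequestedUrl":"/v1/gtfs/vehiclepos/ferries/sydneyferries", '
    '"RequestMethod":"GET" } }'
)


def extract_error_message(text):
    """Return the error message if present, otherwise None."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # The body was not JSON at all (for example, binary data)
        return None
    if not isinstance(payload, dict):
        return None
    details = payload.get("ErrorDetails")
    if not isinstance(details, dict):
        return None
    return details.get("Message")


print(extract_error_message(raw_text))
```

Returning `None` instead of raising keeps the calling code simple when the body turns out to be something other than the expected error structure.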

So our API request has been rejected because it had no authentication details.
Most API providers restrict the use of their APIs (even open ones) to avoid abuse.
We might have to create an account and get authentication details to use this API.

## Authenticating with API key

For this class I have already signed up with the TransportNSW developer website, created an application and generated an API key.
The key is `CGrnUTmzoaCL57n9TzoseFqUb22Pqz32m1eB`. 

Now we have to send this key in the header of the request.


In [9]:
# Create a headers object
headers = {'Authorization' : 'apikey CGrnUTmzoaCL57n9TzoseFqUb22Pqz32m1eB'}

# Create a request with the headers
result = requests.get(url='https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/ferries/sydneyferries', headers=headers)

# Check the status
result.status_code

200

Yay! That has worked... 
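
In your own projects you may prefer not to paste the key directly into the notebook. The sketch below shows one common pattern, reading the key from an environment variable and building the same `Authorization` header from it. The variable name `TFNSW_API_KEY`, the helper `build_auth_headers` and the demo key are all assumptions made for this example.

```python
import os


def build_auth_headers(env_var="TFNSW_API_KEY"):
    """Build the Authorization header from a key stored in the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Same 'apikey <key>' header format as used in the cell above
    return {"Authorization": f"apikey {api_key}"}


# For demonstration only: fall back to a placeholder when no key is set
os.environ.setdefault("TFNSW_API_KEY", "demo-key-not-real")

headers_from_env = build_auth_headers()
print(sorted(headers_from_env))
```

The resulting dictionary can be passed to `requests.get(..., headers=headers_from_env)` in the same way as the `headers` object above.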

Let's start to explore the data that has been sent back.


In [10]:
# Print the first 20 whitespace-separated chunks of the raw text
# The command looks complicated but you can also do 
# print(result.text) to get all the results

''.join(result.text.split()[:20])


'\x031.0\x10\x00\x18Ϭ��\x06\x12�\x01\x1120210407_074447_1"�\x01;\x17CI0645-1.060421.31.0719\x12\x0807:19:00\x1a\x0820210407\x00*9-F8-sj2-1\x12\x14�i\x07�\x15g5\x17C\x00\x00�B-)\\�@\x18\x06\x02(ʬ��\x060\x00:\x0520413B5\x08Fishburn\x12\'07:19amCockatooIsland-CircularQuay\x1a\x00\x12�\x01\x1120210407_074447_2"�\x01;\x17CI0745-2.060421.31.0745\x12\x0807:45:00\x1a\x0820210407\x00*9-F8-sj2-1\x12\x14'

That is not helpful at all... The returned data is in `protobuf` (Protocol Buffers) format, a compact binary encoding that keeps the messages sent back and forth by APIs small, which matters especially for realtime feeds like this one.
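
To get a feel for why binary formats are popular for realtime feeds, the sketch below compares the size of one vehicle position written as JSON text with the same four numbers packed as raw 32-bit floats. This is only an illustration with the standard `struct` module, not the actual protobuf encoding, and the sample values are taken from the ferry feed used in this tutorial.

```python
import json
import struct

# One vehicle position with four numeric fields
position = {
    "latitude": -33.853115,
    "longitude": 151.208603,
    "bearing": 105.0,
    "speed": 6.48,
}

# Text encoding: field names and digits are all spelled out as characters
as_json = json.dumps(position).encode("utf-8")

# Binary encoding: four little-endian 32-bit floats, 4 bytes each
as_binary = struct.pack("<4f", *position.values())

print("JSON bytes:  ", len(as_json))
print("Binary bytes:", len(as_binary))
```

The binary version is much smaller but unreadable without knowing the layout, which is the same trade-off we just saw in the raw response.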

Now we need to parse and understand the result that has been sent back.

## Parsing GTFS-realtime data

As we saw before, GTFS-realtime data is in protobuf format, which needs to be parsed into a Python object so that we can plot it on a map.

We need to create a feed object from the `gtfs_realtime_pb2` package, which can parse the result.

In [11]:
# creating a feed object
feed = gtfs_realtime_pb2.FeedMessage()

# Use the feed object to parse the result of the API
feed.ParseFromString(result.content)

# Print the results
feed.entity[0]

id: "20210407_074447_1"
vehicle {
  trip {
    trip_id: "CI0645-1.060421.31.0719"
    start_time: "07:19:00"
    start_date: "20210407"
    schedule_relationship: SCHEDULED
    route_id: "9-F8-sj2-1"
  }
  position {
    latitude: -33.85311508178711
    longitude: 151.20860290527344
    bearing: 105.0
    speed: 6.480000019073486
  }
  current_stop_sequence: 6
  current_status: IN_TRANSIT_TO
  timestamp: 1617745482
  congestion_level: UNKNOWN_CONGESTION_LEVEL
  stop_id: "20413"
  vehicle {
    id: "Fishburn"
    label: "07:19am Cockatoo Island - Circular Quay"
    license_plate: ""
  }
}

If all of these feed objects and parsing steps sound complicated, that's alright. All you need to know is that different APIs return data in different formats, and you need to convert them in Python to be able to make sense of them.

Now we have successfully translated the data into a format we can read! Let's convert it into a tabular format.

The code below creates a table from the API data with the ID, latitude and longitude of each vehicle:
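
The printed entity above is a nested structure, and its fields can be reached with dot notation. The sketch below shows one way to flatten an entity into a plain dictionary. Since it is meant to run without the API, it uses a `SimpleNamespace` stand-in that mimics the shape of the printed entity; the stand-in and the helper name `entity_to_record` are assumptions made for this example.

```python
from types import SimpleNamespace


def entity_to_record(entity):
    """Flatten one feed entity into a dictionary of plain Python values."""
    position = entity.vehicle.position
    return {
        "id": entity.id,
        "lat": float(position.latitude),
        "lng": float(position.longitude),
        "bearing": float(position.bearing),
        "speed": float(position.speed),
    }


# Stand-in with the same nested fields as the entity printed above
sample_entity = SimpleNamespace(
    id="20210407_074447_1",
    vehicle=SimpleNamespace(
        position=SimpleNamespace(
            latitude=-33.85311508178711,
            longitude=151.20860290527344,
            bearing=105.0,
            speed=6.480000019073486,
        )
    ),
)

record = entity_to_record(sample_entity)
print(record)
```

The same function should work on the real `feed.entity` items, because it only relies on the attribute names visible in the printed output.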

In [12]:
# Collect one row per vehicle: its ID, latitude and longitude
rows = [
    [entity.id, float(entity.vehicle.position.latitude), float(entity.vehicle.position.longitude)]
    for entity in feed.entity
]

# Build the table in one step instead of concatenating row by row
data = pandas.DataFrame(rows, columns=['id', 'lat', 'lng'])

data

index | id | lat | lng
---|---|---|---
0 | 20210407_074447_1 | -33.853115 | 151.208603
1 | 20210407_074447_2 | -33.8605 | 151.210327
2 | 20210407_074447_3 | -33.849354 | 151.206741
3 | 20210407_074447_4 | -33.868164 | 151.198944
4 | 20210407_074447_5 | -33.848518 | 151.208862
5 | 20210407_074447_6 | -33.844635 | 151.257339
6 | 20210407_074447_7 | -33.855301 | 151.215118
7 | 20210407_074447_8 | -33.848068 | 151.230927
8 | 20210407_074447_9 | -33.843384 | 151.221664
9 | 20210407_074447_10 | -33.855049 | 151.195511


## Plotting the API Data on a Map

We have the data from TransportNSW in a tabular format. The next step is to convert it into geographic data format and visualise it in realtime.

Geographic data has three components - geometry, attributes and a Coordinate Reference System (CRS). We already have the attributes in the form of the table. Now we need to create the CRS and the geometry.

The code below does both:

In [13]:
# Create a simple CRS string - EPSG:4326 is WGS84, a.k.a. latitude and longitude numbers.
crs = 'epsg:4326'

# We create the geometry from the lat, lng columns of the table
geometry = [Point(xy) for xy in zip(data['lng'],data['lat'])]
geo_data = geopandas.GeoDataFrame(data,crs=crs,geometry=geometry)

# We convert the geodata into JSON format so that it can be mapped easily
geo_json = geo_data.to_json()
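
To see what the GeoJSON text roughly looks like, the sketch below builds a small `FeatureCollection` by hand using only the standard `json` module and two rows from the table above. The helper name `rows_to_geojson` is an assumption for this example, and the output from geopandas may contain extra fields. Note that GeoJSON stores coordinates as longitude first, then latitude.

```python
import json

# (id, lat, lng) rows copied from the table above
rows = [
    ("20210407_074447_1", -33.853115, 151.208603),
    ("20210407_074447_2", -33.8605, 151.210327),
]


def rows_to_geojson(rows):
    """Turn (id, lat, lng) rows into a GeoJSON FeatureCollection dictionary."""
    features = [
        {
            "type": "Feature",
            # GeoJSON order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": {"id": vehicle_id},
        }
        for vehicle_id, lat, lng in rows
    ]
    return {"type": "FeatureCollection", "features": features}


collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2))
```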



Now we have all that is needed to make an interactive map! The `geo_json` object can now be added on top of a base map using the `folium` library:



In [14]:
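# Create a base map with the specified style, center point and zoom level
# Note: recent folium releases no longer bundle the 'Stamen Toner' tiles,
# so a built-in CartoDB basemap is used here instead
m = folium.Map(location=[-33.854504, 151.218034],
               tiles='CartoDB positron',
               zoom_start=11)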

# Add the geo_json to the map as points
m.add_child(folium.features.GeoJson(geo_json))
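
The map centre above is hard-coded. As a small sketch, the centre could instead be derived from the vehicle positions themselves by averaging the coordinates. The example below uses three positions from the table above and the standard `statistics` module; the helper name `map_centre` is an assumption made for this example.

```python
from statistics import fmean

# (lat, lng) pairs copied from the first three rows of the table above
positions = [
    (-33.853115, 151.208603),
    (-33.8605, 151.210327),
    (-33.849354, 151.206741),
]


def map_centre(positions):
    """Return [lat, lng] at the average of all positions."""
    if not positions:
        raise ValueError("At least one position is needed to centre the map")
    lats = [lat for lat, _ in positions]
    lngs = [lng for _, lng in positions]
    return [fmean(lats), fmean(lngs)]


centre = map_centre(positions)
print(centre)
```

The resulting list has the same `[lat, lng]` order as the `location` argument of `folium.Map`, so it could be passed there directly.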

That's it!

We downloaded data from TransportNSW on the real-time location of ferries and made a map out of it, all within 20 minutes! This is how powerful and simple APIs are. They simplify and standardise most of the data dissemination and secondary data collection so that we can focus on our research and analysis.

## Simple Application using the API

Below is an example application using the TransportNSW API. When we combine all the steps we did before into a sequence and repeat them every 5 seconds, we can build a simple monitoring station which shows the real-time location of all the ferries.

In [17]:
# Number of refreshes; increase this to keep monitoring for longer
refreshes = 1

for _ in range(refreshes):
    # Download and parse the latest vehicle positions
    result = requests.get(url='https://api.transport.nsw.gov.au/v1/gtfs/vehiclepos/ferries/sydneyferries', headers=headers)
    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(result.content)
    # Build the table of IDs and coordinates
    rows = [[entity.id, float(entity.vehicle.position.latitude), float(entity.vehicle.position.longitude)]
            for entity in feed.entity]
    data = pandas.DataFrame(rows, columns=['id', 'lat', 'lng'])
    # Convert the table to geographic data and then to GeoJSON
    geometry = [Point(xy) for xy in zip(data['lng'], data['lat'])]
    geo_data = geopandas.GeoDataFrame(data, crs=crs, geometry=geometry)
    geo_json = geo_data.to_json()
    # Draw the map, replacing the previous output
    # (recent folium releases no longer bundle 'Stamen Toner', so CartoDB is used)
    m = folium.Map(location=[-33.854504, 151.218034], tiles='CartoDB positron', zoom_start=14)
    folium_plot = m.add_child(folium.features.GeoJson(geo_json))
    display.clear_output(wait=True)
    display.display(folium_plot)
    # Wait 5 seconds before the next refresh
    time.sleep(5)
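
The repeat-and-wait pattern in the loop above can be separated from the API details. The sketch below is one possible way to do that: a small `poll` helper that calls a fetch function, hands the result to a handler, and keeps going if a single refresh fails. The names `poll`, `fetch` and `handle` are assumptions made for this example, and a fake data source is used so that it runs without network access.

```python
import time


def poll(fetch, handle, interval_seconds=5, max_iterations=3, sleep=time.sleep):
    """Call fetch() repeatedly, pass each result to handle(), and count successes."""
    completed = 0
    for iteration in range(max_iterations):
        try:
            handle(fetch())
            completed += 1
        except Exception as error:
            # One failed refresh should not stop the whole monitor
            print(f"Refresh {iteration + 1} failed: {error}")
        # No need to wait after the final refresh
        if iteration < max_iterations - 1:
            sleep(interval_seconds)
    return completed


# Fake data source standing in for the real API call
fake_snapshots = iter([["ferry A"], ["ferry A", "ferry B"]])
seen = []

completed = poll(
    fetch=lambda: next(fake_snapshots),
    handle=seen.append,
    interval_seconds=0,
    max_iterations=2,
)
print(completed, seen)
```

In the monitoring loop, `fetch` would wrap the request and parsing steps and `handle` would wrap the map drawing, which keeps each piece easy to test on its own.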

## Extras

If you are feeling comfortable with Python and using APIs, then as an extra task try exploring the other endpoints available in the same API, which give information on the locations of buses, trains etc.: https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-vehicle-positions.

Notice that these sources also have bearing (direction) and speed information for each vehicle, which may be used to visualise their movement across the map.
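
As a starting point for using bearing and speed, the sketch below estimates where a vehicle would be a few seconds later if it kept the same heading. It assumes the GTFS-realtime conventions of bearing in degrees clockwise from north and speed in metres per second, and uses a simple flat-earth approximation that is only reasonable over short distances. The helper name `project_position` is an assumption made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def project_position(lat, lng, bearing_deg, speed_mps, seconds):
    """Estimate the position after `seconds` of travel at constant bearing and speed."""
    distance = speed_mps * seconds
    bearing = math.radians(bearing_deg)
    # Split the travelled distance into northward and eastward parts
    d_north = distance * math.cos(bearing)
    d_east = distance * math.sin(bearing)
    # Convert metres to degrees; longitude degrees shrink with latitude
    new_lat = lat + math.degrees(d_north / EARTH_RADIUS_M)
    new_lng = lng + math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return new_lat, new_lng


# Values taken from the first ferry in the feed shown earlier
start_lat, start_lng = -33.853115, 151.208603
predicted = project_position(start_lat, start_lng, bearing_deg=105.0, speed_mps=6.48, seconds=5)
print(predicted)
```

Plotting both the reported and the projected point for each vehicle is one simple way to show its direction of travel on the map.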

All the best!