# API connection

This example borrows code from the following article:
https://medium.com/@bhaveshpatelaus/gtfs-realtime-vehicle-positions-using-python-and-databricks-tfnsw-a33b98f22e97

TfNSW also includes helpful content and an API explorer at the following links:

* https://opendata.transport.nsw.gov.au/developer-information
* https://opendata.transport.nsw.gov.au/how-use-open-data-develop-application
* https://opendata.transport.nsw.gov.au/developers/api-explorer

## Connecting to an API via KPMG's self-signed certificate

The Python `requests` module strictly validates HTTPS certificates. Because traffic on the KPMG network is served with a self-signed certificate (Netskope), you need to add that certificate to your trust chain. To connect using `requests`, follow the steps below:

* Open Chrome
* Go to https://opendata.transport.nsw.gov.au/node/9582/exploreapi (any other HTTPS website will also do)
* Click on the padlock icon
* Select the top certificate in the list (netskope)
* Click "Export"
* Save the file to a known location
* When you build your request, include the following parameter: `verify='C:\\path\\to\\my\\certificate\\caadmin.netskope.com.pem'`
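
As a minimal sketch of the last step, the helper below assembles the keyword arguments for a `requests` call. The function name `build_request_kwargs` and the placeholder API key are illustrative assumptions, not part of this project; the `Authorization` header format is the one used later in this notebook.

```python
from typing import Optional


def build_request_kwargs(api_key: str, cert_path: Optional[str] = None) -> dict:
    """Assemble keyword arguments for requests.get (hypothetical helper)."""
    return {
        "headers": {"Authorization": f"apikey {api_key}"},
        # A path tells requests which CA bundle to trust.
        # True means "verify against the default CA bundle".
        "verify": cert_path if cert_path else True,
    }


# Placeholder values for illustration only
kwargs = build_request_kwargs(
    "my-api-key",
    r"C:\path\to\my\certificate\caadmin.netskope.com.pem",
)
print(kwargs["verify"])
```

The resulting dictionary would then be unpacked into the call, for example `requests.get(url, **kwargs)`.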

If you are collaborating with other developers, each person may have a different folder structure. To avoid everyone manually changing this path, the best practice is to follow these steps:

* Create a file `.env` in the root of your repo
* Include a variable in there such as `CERT_PATH=C:\\path\\to\\my\\certificate\\caadmin.netskope.com.pem`
* Load these environment variables in the module where you need them with the code below:

```python
import os
from dotenv import load_dotenv
load_dotenv()

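# Falls back to True (verify against the default CA bundle) if CERT_PATH is unset
cert_path = os.getenv("CERT_PATH", True)
my_request = {}  # keyword arguments later passed to requests.get(url, **my_request)
my_request['verify'] = cert_path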
```
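
A mistyped `CERT_PATH` only surfaces as an error once a request is made. As a hedged sketch, the hypothetical helper `resolve_verify` below checks that the file exists before the value is handed to `requests`, and falls back to `True` when the variable is unset. The helper name and the demo file are assumptions for illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_verify(environ=os.environ):
    """Return a value suitable for the `verify` argument of requests."""
    cert_path = environ.get("CERT_PATH")
    if not cert_path:
        # No custom certificate configured: use the default CA bundle
        return True
    if not Path(cert_path).is_file():
        raise FileNotFoundError(f"CERT_PATH points to a missing file: {cert_path}")
    return cert_path


# Demo with a throwaway file standing in for the exported certificate
with tempfile.TemporaryDirectory() as tmp:
    demo_cert = Path(tmp) / "caadmin.netskope.com.pem"
    demo_cert.write_text("placeholder, not a real certificate")
    verify_with_cert = resolve_verify({"CERT_PATH": str(demo_cert)})

verify_default = resolve_verify({})
print(verify_with_cert, verify_default)
```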

## Interesting APIs

The following TfNSW pages provide detailed information on the available APIs:

* https://opendata.transport.nsw.gov.au/user-guide
* https://opendata.transport.nsw.gov.au/get-started
* https://opendata.transport.nsw.gov.au/documentation
* https://opendata.transport.nsw.gov.au/api-basics

The following datasets and APIs are most likely to be useful for this project:

Timetables and routes:
https://opendata.transport.nsw.gov.au/dataset/public-transport-timetables-realtime/resource/9b3bfa13-0053-4008-8575-e30151f05d54

Changed schedules:
https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-trip-update

Realtime vehicle positions:
https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-vehicle-positions

Realtime vehicle positions for trains (v2):
https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-vehicle-positions-v2

Historical positions for ferries and metro:
https://opendata.transport.nsw.gov.au/dataset/historical-gtfs-and-gtfs-realtime

* Public Transport - Realtime - Alerts - v2
* Public Transport - Realtime Vehicle Positions API v2
* Public Transport - Realtime Trip Update API v2
* Transport Routes
* Public Transport - Realtime Trip Updates API
* Trip Planner APIs
* Public Transport - Timetables - Complete - GTFS
* Public Transport - Timetables - For Realtime
* Public Transport - Realtime Vehicle Positions API

In [1]:
import os
from dotenv import load_dotenv
import requests
from pathlib import Path
import zipfile
import pandas as pd

from google.transit import gtfs_realtime_pb2
from google.protobuf.json_format import MessageToDict
from google.protobuf.json_format import MessageToJson

from protobuf_to_dict import protobuf_to_dict

from data import data

load_dotenv()

True

In [2]:
FILENAME_SCHEDULE = 'gtfs.zip'

In [3]:
app_name = os.getenv("APP_NAME")
api_key = os.getenv("API_KEY")

In [4]:
BASE_URL = "https://api.transport.nsw.gov.au"
BUS_POSITION_URI = f"{BASE_URL}/v1/gtfs/vehiclepos/buses"
BUS_SCHEDULE_URI = f"{BASE_URL}/v1/gtfs/schedule/buses"
FERRY_POSITION = f"{BASE_URL}/v1/gtfs/historical"

In [6]:
headers = {
    "Authorization": f"apikey {api_key}"
}
request_details = dict(
    headers=headers,
    stream=True
)

cert = os.getenv("CERT_PATH", True)  # same variable name as in the .env instructions above
request_details['verify'] = cert

## Realtime locations

In [7]:
from data import realtime

In [8]:
positions = realtime.get_latest_positions()

In [9]:
df = realtime.get_positions_dataframe(positions)
df.head()

| | id | trip_id | route_id | schedule_relationship | lat | lon | bearing | speed | timestamp | congestion_level | stop_id | vehicle_id | label | request_timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8896_136642070083_2433_747_1 | 1746573 | 2433_747 | 0 | -33.693474 | 150.822464 | 238.0 | 5.1 | 1695875712 | 1 | | 8896_136642070083_2433_747_1 | | 1695875716 |
| 1 | 43054_75145969_2459_487_1 | 1961071 | 2459_487 | 0 | -33.918179 | 151.034927 | 83.0 | 0.0 | 1695875712 | 1 | | 43054_75145969_2459_487_1 | | 1695875716 |
| 2 | 33553_26627279_2436_601_1 | 1975629 | 2436_601 | 0 | -33.804516 | 151.002884 | 194.0 | 12.9 | 1695875709 | 0 | | 33553_26627279_2436_601_1 | | 1695875716 |
| 3 | 43054_180751324_2459_526_1 | 1963468 | 2459_526 | 0 | -33.828636 | 151.087265 | 340.0 | 0.0 | 1695875705 | 1 | | 43054_180751324_2459_526_1 | | 1695875716 |
| 4 | 43333_229354203797_2510_940_1 | 1972021 | 2510_940 | 0 | -33.936832 | 151.043243 | 342.0 | 0.0 | 1695875706 | 1 | | 43333_229354203797_2510_940_1 | | 1695875716 |
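
The `timestamp` and `request_timestamp` columns can be used to judge how stale each position is. The sketch below assumes both are POSIX timestamps in seconds (GTFS Realtime defines vehicle timestamps this way; that `request_timestamp` records when the feed was requested is an assumption about this project's `realtime` module). The two rows are copied from the output above, and the function names are illustrative.

```python
from datetime import datetime, timezone

# (vehicle id, timestamp, request_timestamp) copied from the output above
rows = [
    ("8896_136642070083_2433_747_1", 1695875712, 1695875716),
    ("33553_26627279_2436_601_1", 1695875709, 1695875716),
]


def to_utc(epoch_seconds: int) -> datetime:
    """Convert POSIX seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def position_age_seconds(timestamp: int, request_timestamp: int) -> int:
    """Seconds between the position measurement and the feed request."""
    return request_timestamp - timestamp


for vehicle_id, measured, requested in rows:
    print(vehicle_id, to_utc(measured).isoformat(), position_age_seconds(measured, requested))
```

For these two rows the positions were 4 and 7 seconds old when the feed was requested.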


In [10]:
print(len(df))

1865


In [11]:
from visualisations import maps
maps.position_map(df.rename(columns={'latitude': 'lat', 'longitude': 'lon'}))

In [20]:
# realtime.upload_realtime(df)

## Extract schedules

This section downloads a large file, so avoid running it more often than necessary.

In [11]:
response = requests.get(BUS_SCHEDULE_URI, **request_details)
response

<Response [200]>

In [12]:
zip_path = Path(data.path / FILENAME_SCHEDULE)

In [13]:
with open(zip_path, "wb") as f:
    f.write(response.content)
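
One way to avoid repeated downloads is to reuse the saved file while it is still recent. The sketch below is an assumption about how that could look: `needs_refresh` is a hypothetical helper, and the 24-hour threshold is an arbitrary choice rather than a TfNSW recommendation.

```python
import tempfile
import time
from pathlib import Path


def needs_refresh(path: Path, max_age_seconds: float = 24 * 60 * 60) -> bool:
    """Return True when the file is missing or older than max_age_seconds."""
    if not path.exists():
        return True
    age = time.time() - path.stat().st_mtime
    return age > max_age_seconds


# Demo with a throwaway file standing in for the schedule archive
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "gtfs.zip"
    missing = needs_refresh(demo_path)   # file does not exist yet
    demo_path.write_bytes(b"placeholder")
    fresh = needs_refresh(demo_path)     # file was just written

print(missing, fresh)
```

In this notebook the download and write cells could then be wrapped in `if needs_refresh(zip_path):`.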

In [14]:
with zipfile.ZipFile(zip_path) as z:  # `z` avoids shadowing the built-in zip()
    print(z.namelist())

['agency.txt', 'calendar.txt', 'calendar_dates.txt', 'notes.txt', 'routes.txt', 'shapes.txt', 'stops.txt', 'stop_times.txt', 'trips.txt']


In [None]:
with zipfile.ZipFile(zip_path) as z:
    for name in z.namelist():
        with z.open(name) as f:
            df = pd.read_csv(f)
            print(name)
            print(df.columns)

agency.txt
Index(['agency_id', 'agency_name', 'agency_url', 'agency_timezone',
       'agency_lang', 'agency_phone'],
      dtype='object')
calendar.txt
Index(['service_id', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday',
       'saturday', 'sunday', 'start_date', 'end_date'],
      dtype='object')
calendar_dates.txt
Index(['service_id', 'date', 'exception_type'], dtype='object')
notes.txt
Index(['note_id', 'note_text'], dtype='object')
routes.txt
Index(['route_id', 'agency_id', 'route_short_name', 'route_long_name',
       'route_desc', 'route_type', 'route_color', 'route_text_color'],
      dtype='object')
shapes.txt
Index(['shape_id', 'shape_pt_lat', 'shape_pt_lon', 'shape_pt_sequence',
       'shape_dist_traveled'],
      dtype='object')
stops.txt
Index(['stop_id', 'stop_name', 'stop_lat', 'stop_lon', 'location_type',
       'parent_station', 'wheelchair_boarding', 'platform_code'],
      dtype='object')
stop_times.txt
Index(['trip_id', 'arrival_time', 'departure_time', 'st

In [None]:
with zipfile.ZipFile(zip_path) as z:
    with z.open("calendar.txt") as f:
        calendar = pd.read_csv(f)
        print(calendar.head())    # print the first 5 rows

   service_id  monday  tuesday  wednesday  thursday  friday  saturday  sunday  \
0           1       0        0          0         0       0         1       0   
1           2       1        1          1         1       1         0       1   
2           3       0        1          0         0       0         0       0   
3           4       0        0          0         0       1         0       0   
4           5       1        1          1         1       1         0       0   

   start_date  end_date  
0    20230909  20240727  
1    20230910  20231231  
2    20230912  20230919  
3    20230908  20231229  
4    20230911  20231229  
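
To illustrate how these columns are read, the sketch below finds which `service_id` values run on a given date, using the five rows printed above as plain dictionaries. The function names are illustrative, and the logic deliberately ignores the exceptions in `calendar_dates.txt`, so it is a simplification rather than a complete service calendar.

```python
from datetime import date, datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
COLUMNS = ["service_id", *WEEKDAYS, "start_date", "end_date"]

# The five rows printed above
raw_rows = [
    (1, 0, 0, 0, 0, 0, 1, 0, 20230909, 20240727),
    (2, 1, 1, 1, 1, 1, 0, 1, 20230910, 20231231),
    (3, 0, 1, 0, 0, 0, 0, 0, 20230912, 20230919),
    (4, 0, 0, 0, 0, 1, 0, 0, 20230908, 20231229),
    (5, 1, 1, 1, 1, 1, 0, 0, 20230911, 20231229),
]
calendar_rows = [dict(zip(COLUMNS, values)) for values in raw_rows]


def parse_gtfs_date(value: int) -> date:
    """Parse a GTFS YYYYMMDD date stored as an integer."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def active_services(rows, day: date) -> list:
    """Service ids whose weekday flag is set and whose date range covers `day`."""
    weekday_column = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0
    return [
        row["service_id"]
        for row in rows
        if row[weekday_column] == 1
        and parse_gtfs_date(row["start_date"]) <= day <= parse_gtfs_date(row["end_date"])
    ]


print(active_services(calendar_rows, date(2023, 9, 12)))  # [2, 3, 5]
```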


In [None]:
with zipfile.ZipFile(zip_path) as z:
    with z.open("trips.txt") as f:
        trips = pd.read_csv(f)
        print(trips.head())    # print the first 5 rows

    route_id  service_id  trip_id  shape_id     trip_headsign  direction_id  \
0  2447_S886         294  1000095     79361     Mount View HS             1   
1  2447_S871         294  1000097     79367     Mount View HS             1   
2  2447_S879         294  1000103     79353       Cessnock PS             1   
3  2454_8615         294  1000377     44723  Blaxland Station             0   
4  2454_8311         294  1000381     44720  Warrimoo Primary             1   

   block_id  wheelchair_accessible  trip_note  \
0       NaN                      2        NaN   
1       NaN                      2        NaN   
2       NaN                      2        NaN   
3       NaN                      2        NaN   
4       NaN                      2        NaN   

                                     route_direction  
0               Pelton to Mount View HS via Ellalong  
1  Middle Rd after Dunlop Dr to Mount View HS via...  
2  Millfield Rd opp Irwin Cr to Cessnock PS via A...  
3         

In [None]:
# Check that (route_id, trip_id) pairs are unique; an empty result means no duplicates
trips[trips[['route_id', 'trip_id']].duplicated(keep=False)]

Unnamed: 0,route_id,service_id,trip_id,shape_id,trip_headsign,direction_id,block_id,wheelchair_accessible,trip_note,route_direction


In [None]:
# This trip visits the same stop (2165140) twice within the same minute
# Would be an interesting case to map using the model
# Note: `df` must hold the contents of stop_times.txt at this point
df[df.trip_id == 1948209].head(20)

| | trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign | pickup_type | drop_off_type | shape_dist_traveled | timepoint | stop_note |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2261020 | 1948209 | 15:10:00 | 15:10:00 | 2165100 | 1 | | 0 | 0 | 0 | 1 | |
| 2261021 | 1948209 | 15:11:00 | 15:11:00 | 2165101 | 2 | | 0 | 0 | 219 | 0 | |
| 2261022 | 1948209 | 15:15:00 | 15:15:00 | 2165138 | 3 | | 0 | 0 | 854 | 1 | |
| 2261023 | 1948209 | 15:17:00 | 15:17:00 | 2165139 | 4 | | 0 | 0 | 1168 | 0 | |
| 2261024 | 1948209 | 15:18:00 | 15:18:00 | 2165140 | 5 | | 0 | 0 | 1345 | 0 | |
| 2261025 | 1948209 | 15:18:00 | 15:18:00 | 2165207 | 6 | | 0 | 0 | 1448 | 1 | |
| 2261026 | 1948209 | 15:18:00 | 15:18:00 | 2165140 | 7 | | 0 | 0 | 1567 | 0 | |
| 2261027 | 1948209 | 15:19:00 | 15:19:00 | 2165141 | 8 | | 0 | 0 | 1704 | 0 | |
| 2261028 | 1948209 | 15:19:00 | 15:19:00 | 2165142 | 9 | | 0 | 0 | 1961 | 0 | |
| 2261029 | 1948209 | 15:20:00 | 15:20:00 | 216563 | 10 | | 0 | 0 | 2108 | 0 | |
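
Repeat visits like this can also be found programmatically. The sketch below groups the rows shown above by `stop_id` and reports any stop that appears more than once, along with its `stop_sequence` values. The helper name `repeated_stops` is illustrative, and the data is limited to the ten rows in the output.

```python
from collections import defaultdict

# (stop_sequence, stop_id, arrival_time) copied from the rows above
stop_times = [
    (1, 2165100, "15:10:00"),
    (2, 2165101, "15:11:00"),
    (3, 2165138, "15:15:00"),
    (4, 2165139, "15:17:00"),
    (5, 2165140, "15:18:00"),
    (6, 2165207, "15:18:00"),
    (7, 2165140, "15:18:00"),
    (8, 2165141, "15:19:00"),
    (9, 2165142, "15:19:00"),
    (10, 216563, "15:20:00"),
]


def repeated_stops(rows) -> dict:
    """Map each stop_id visited more than once to its stop_sequence values."""
    visits = defaultdict(list)
    for stop_sequence, stop_id, _arrival_time in rows:
        visits[stop_id].append(stop_sequence)
    return {stop_id: seqs for stop_id, seqs in visits.items() if len(seqs) > 1}


print(repeated_stops(stop_times))  # {2165140: [5, 7]}
```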
