<div class="usecase-title">API v2.1 Tutorial: The City of Melbourne (CoM) API is organised around REST using Opendatasoft Explore API v2.1. It provides access to all the data available through the platform in a hierarchical structure.</div>

<div class="usecase-authors"><b>Authored by: </b>Te' Claire</div>

<div class="usecase-date"><b>Date: </b> March-July 2024</div>

<div class="usecase-duration"><b>Duration:</b> 40 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b> Level: </b>Beginner</div>
    <div class="usecase-skill"><b> Pre-requisite Skills: </b>Python; <i>Optional:</i> Google Colab access</div>
</div>

<div class="usecase-subsection-blurb">
  <i>Link 1:</i> API Explore v2.1 Console
  <br>
  <a href="https://data.melbourne.vic.gov.au/api/explore/v2.1/console" target="_blank">Link</a>
  <br>
</div>
<br>

##### Context: Guidance on using the City of Melbourne (CoM) API.
1. API and GitHub (Cloud) IDE
2. Exports endpoint (no limit on the number of returned records)
3. Records endpoint (limited number of returned records)


---

###### CoM API endpoints allow you to:
- Enumerate datasets
- List export formats
- Export data
- List facet values
- Read individual dataset records
<br>


###### Catalog API
* `GET /catalog/datasets` <br>
`GET https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets`
- **Purpose** To list all the datasets available in the catalog
  - Used to get an overview of the datasets available in the system

* `GET /catalog/exports`
- **Purpose** To list all export formats that the catalog supports
  - Useful for understanding in which formats the data can be exported (CSV, JSON)

  
* `GET /catalog/exports/{format}`
- **Purpose** To export the entire catalog in a specific format
  - Used when you want to download the entire catalog in one of the supported formats

* `GET /catalog/exports/csv`
- **Purpose**  Specifically for exporting the catalog in CSV format.
  - A direct endpoint for exporting data in a common, easily usable format.

* `GET /catalog/exports/dcat{dcat_ap_format}`
- **Purpose** To export the catalog in RDF/XML format using DCAT
  - Exporting data in a format that's suitable for integrating with other data catalogs or systems following the DCAT standard

* `GET /catalog/facets`
- **Purpose** To list all the facet values available in the catalog
  - Facets are used to filter or categorise datasets; listing them shows how the catalog is categorised

* `GET /catalog/datasets/{dataset_id}`
- **Purpose** To show detailed information about a specific dataset
  - When you need metadata or details about a particular dataset


###### Dataset API

* `GET /catalog/datasets/{dataset_id}/records`
- **Purpose** To query records within a specific dataset
  - To retrieve the data entries or records from a specific dataset

* `GET /catalog/datasets/{dataset_id}/exports`
- **Purpose** To list the export formats available for a specific dataset
  - To find out in which formats the data from this dataset can be exported
  
* `GET /catalog/datasets/{dataset_id}/exports/{format}`
- **Purpose** To export a specific dataset in a specified format
  - To download data from a specific dataset in a particular format

* `GET /catalog/datasets/{dataset_id}/exports/csv`
- **Purpose** To export a specific dataset in CSV format
  - Direct endpoint for exporting dataset data in CSV, a commonly used data format

* `GET /catalog/datasets/{dataset_id}/exports/gpx`
- **Purpose** To export a specific dataset in GPX format
  - Useful for datasets related to geographical data, which GPX format is well-suited for

* `GET /catalog/datasets/{dataset_id}/facets`
- **Purpose** To list the facets for a specific dataset
  - To get an understanding of the different dimensions or categories within a dataset

* `GET /catalog/datasets/{dataset_id}/attachments`
- **Purpose** To list attachments for a specific dataset
  - When datasets have additional files or documents attached, this endpoint lets you enumerate them

* `GET /catalog/datasets/{dataset_id}/records/{record_id}`
- **Purpose** To read a specific record within a dataset
  - To get detailed information about a particular entry or record in a dataset
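
The paths listed above are relative to the Explore API v2.1 base URL used in the code cells of this tutorial. As a minimal sketch, the hypothetical helper `endpoint_url` below (not part of the API or of any library) fills in a path template and appends optional query parameters, so the resulting string can be passed to `requests.get`.

```python
from urllib.parse import urlencode

# Base of the Explore API v2.1 on the City of Melbourne portal
API_BASE = "https://data.melbourne.vic.gov.au/api/explore/v2.1"


def endpoint_url(template, params=None, **path_values):
    """Fill a path template such as '/catalog/datasets/{dataset_id}/records'
    and append optional query parameters."""
    url = API_BASE + template.format(**path_values)
    if params:
        url += "?" + urlencode(params)
    return url


# Example: the records endpoint for one dataset, limited to 5 records
records_url = endpoint_url(
    "/catalog/datasets/{dataset_id}/records",
    params={"limit": 5},
    dataset_id="on-street-parking-bay-sensors",
)
print(records_url)
```

The same helper covers the catalog-level paths, for example `endpoint_url("/catalog/facets")`.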


###### Load Dependencies

In [3]:
# Dependencies
import warnings
warnings.filterwarnings("ignore")

import requests
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)

###### Cloud or Local IDE (Run notebook/ script)
- To read the API key from a file stored in Google Drive (mounted in Google Colab)

In [4]:
from google.colab import drive
drive.mount('/content/drive')
with open('/content/drive/My Drive/SIT378/h.txt', 'r') as file:
    api_key = file.read().strip()

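# Local IDE alternative: read the key from an environment variable instead.
# 'COM_API_KEY' is a placeholder name; os.getenv expects the variable's name, not the key itself.
# import os
# api_key = os.getenv('COM_API_KEY')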

Mounted at /content/drive
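
Outside Colab there is no Drive to mount, so the key has to come from somewhere else. The sketch below is one possible arrangement, not the tutorial's required setup: it looks for an environment variable first and falls back to a text file. The function name `load_api_key` and the variable name `COM_API_KEY` are assumptions made for this example.

```python
import os
from pathlib import Path


def load_api_key(key_file=None, env_var="COM_API_KEY"):
    """Return the API key from an environment variable, else from a text file.

    `COM_API_KEY` is a placeholder name chosen for this sketch.
    """
    key = os.getenv(env_var)
    if key:
        return key.strip()
    if key_file and Path(key_file).is_file():
        return Path(key_file).read_text().strip()
    return None  # no key found in either place
```

In Colab the call could look like `api_key = load_api_key('/content/drive/My Drive/SIT378/h.txt')` once the drive is mounted.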


##### **Preferred Method**: Export Endpoint
##### Single Request for CSV File Download
`GET /catalog/datasets/{dataset_id}/exports/{format}`
- Export the dataset as CSV or JSON; ODSQL clauses such as `select` are passed as query parameters
- Read the response directly into a dataframe
- `response.content.decode('utf-8')` decodes the binary response body into a UTF-8 string
- The CSV export uses a semicolon (`;`) as its delimiter

In [None]:
# **Preferred Method**: Export Endpoint
import requests
import pandas as pd
from io import StringIO

# https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/
dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'

base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
export_format = 'csv'  # named export_format to avoid shadowing the built-in format()

url = f'{base_url}{dataset_id}/exports/{export_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'apikey': api_key  # Opendatasoft reads the key from the `apikey` query parameter
}

# GET request
response = requests.get(url, params=params)

if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    pedestrian_hour = pd.read_csv(StringIO(url_content), delimiter=';')
    print(pedestrian_hour.sample(10, random_state=999)) # Test
else:
    print(f'Request failed with status code {response.status_code}')
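
To look at the decoding and delimiter steps in isolation, without calling the API, the sketch below parses a small hand-written payload using only the standard library. The column names and values are invented for illustration and are not taken from the pedestrian dataset; `csv.DictReader` stands in for `pd.read_csv` here.

```python
import csv
from io import StringIO

# Invented sample standing in for response.content (bytes, semicolon-delimited)
sample_content = "sensor;hour;count\nA;9;120\nB;10;85\n".encode("utf-8")

# Step 1: bytes -> str
text = sample_content.decode("utf-8")

# Step 2: parse the text as CSV, telling the reader that ';' separates fields
rows = list(csv.DictReader(StringIO(text), delimiter=";"))
print(rows[0])
```

Without `delimiter=";"` each line would be read as a single column, which is the usual symptom of forgetting the delimiter when loading these exports.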

###### Check number of records in dataset (dataset_id)

In [None]:
# Check number of records in dataset (dataset_id)
num_records = len(pedestrian_hour)
print(f'The dataset contains {num_records} records.')

In [None]:
# View dataset
pedestrian_hour.head()



---



##### **Example: Catalog API to enumerate datasets** <br>
`GET /catalog/datasets`  <br>
`GET https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets`
- List all datasets available in the Melbourne data catalog

###### The `limit` parameter controls the number of datasets returned in the response.

In [None]:
import requests
url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets'
params = {
    'select': '*',
    # 'limit': 10,
    'offset': 0,
    'timezone': 'UTC',
    'include_links': 'false',
    'include_app_metas': 'false'
}
headers = {
    'accept': 'application/json; charset=utf-8'
}

# GET request
response = requests.get(url, headers=headers, params=params)

if response.status_code == 200: # Status code Check
    # Successful
    print(response.json())
else:
    # Error
    print(f'Request failed with status code {response.status_code}')
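
Printing the whole JSON body is hard to read. As a rough sketch, and assuming the response has the Explore v2.1 shape (a `results` list whose items carry a `dataset_id` and, under `metas` then `default`, a `title`), the hypothetical helper below reduces it to identifier and title pairs. The payload shown is a hand-written stand-in with an illustrative title, not real API output.

```python
def summarise_catalog(payload):
    """Return (dataset_id, title) pairs from a catalog/datasets response body."""
    summary = []
    for item in payload.get("results", []):
        # Missing keys fall back to None rather than raising KeyError
        title = item.get("metas", {}).get("default", {}).get("title")
        summary.append((item.get("dataset_id"), title))
    return summary


# Hand-written stand-in for response.json(); real responses carry many more keys
example_payload = {
    "total_count": 2,
    "results": [
        {
            "dataset_id": "on-street-parking-bay-sensors",
            "metas": {"default": {"title": "On-street Parking Bay Sensors"}},
        },
        {"dataset_id": "pedestrian-counting-system-monthly-counts-per-hour"},
    ],
}
print(summarise_catalog(example_payload))
```

With a live response, the call would be `summarise_catalog(response.json())`.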


##### **Example: Show dataset Information** <br>
`GET /catalog/datasets/{dataset_id}`  <br>
`GET https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/pedestrian-counting-system-monthly-counts-per-hour`
- Show the metadata for a single dataset in the Melbourne data catalog

In [None]:
# dataset_id
# https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/
dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'

In [None]:
import requests
url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/' + dataset_id
# or use the full URL:
# https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/pedestrian-counting-system-monthly-counts-per-hour

params = {
    'select': '*',
    'lang': 'en',
    'timezone': 'UTC',
    'include_links': 'false',
    'include_app_metas': 'false'
}
headers = {
    'accept': 'application/json; charset=utf-8'
}

# Make the GET request
response = requests.get(url, headers=headers, params=params)
if response.status_code == 200:
    # Successful
    print(response.json())
else:
    # Error
    print(f'Request failed with status code {response.status_code}')


###### Check available export formats for dataset

In [None]:
import requests
# Dataset-level export formats (dataset_id is defined in the cell above)
url = f'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/{dataset_id}/exports'
params = {
    'select': '*',
    'lang': 'en',
    'timezone': 'UTC',
    'include_links': 'false',
    'include_app_metas': 'false'
}
headers = {
    'accept': 'application/json; charset=utf-8'
}

# Make the GET request
response = requests.get(url, headers=headers, params=params)
if response.status_code == 200:
    # Successful
    print(response.json())
else:
    # Error
    print(f'Request failed with status code {response.status_code}')




---



###### Records Endpoint
###### Function `fetch_data` iterates over the data in pages (`num_records` and `offset`) until all records are retrieved or the maximum offset is reached.
- This endpoint is limited in the number of records it can return: fewer than 10,000

In [None]:
import requests
import pandas as pd

def fetch_data(base_url, dataset, api_key, num_records=99, offset=0):
    """Page through the records endpoint and return all rows as a DataFrame."""
    all_records = []
    max_offset = 9900  # keeps offset + limit below the 10,000-record cap

    while offset <= max_offset:
        url = f'{base_url}{dataset}/records'
        # Opendatasoft reads the key from the `apikey` query parameter
        params = {'limit': num_records, 'offset': offset, 'apikey': api_key}

        try:
            result = requests.get(url, params=params, timeout=10)
            result.raise_for_status()
            records = result.json().get('results')
        except requests.exceptions.RequestException as e:
            raise RuntimeError(f'API request failed: {e}') from e
        if not records:
            break
        all_records.extend(records)
        if len(records) < num_records:
            # A short page means the last record has been reached
            break

        offset += num_records

    return pd.DataFrame(all_records)

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
API_KEY = api_key

In [None]:
# data set name
SENSOR_DATASET = 'on-street-parking-bay-sensors'
df = fetch_data(BASE_URL, SENSOR_DATASET, API_KEY)
df
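
To see how the defaults of `fetch_data` relate to the record cap, the sketch below reproduces only the offset arithmetic of its loop, with no network calls. `page_offsets` is a hypothetical helper written for this illustration.

```python
def page_offsets(num_records=99, max_offset=9900):
    """Yield the offsets requested by a limit/offset loop like fetch_data."""
    offset = 0
    while offset <= max_offset:
        yield offset
        offset += num_records


offsets = list(page_offsets())
# number of pages, last offset requested, last offset + page size
print(len(offsets), offsets[-1], offsets[-1] + 99)
```

With these defaults the loop issues at most 101 requests of 99 records each, so it can return up to 9,999 rows. For datasets larger than that, the export endpoint shown earlier is the better fit.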