<div class="usecase-title">Optimising Tourist Mobility Using City Circle Tram Stop</div>

<div class="usecase-authors"><b>Authored by: </b> Vinoj Prasath Navarajah</div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<div class="usecase-section-header"><b>Scenario</b></div>

As a tourist visiting Melbourne, I want to easily plan my travel using the free City Circle tram service, so that I can efficiently explore the key attractions, navigate between stops, and make the most of my time without getting lost or overwhelmed.

Due to a lack of interactive, user-friendly, and customised navigation options, visitors to Melbourne frequently find it difficult to understand how to use the City Circle tram route. Despite being a free and useful service that links many of Melbourne's major landmarks and cultural attractions, the City Circle tram is frequently overlooked. Most sources, such as static maps and printed guides, offer little context and are not customised to meet the requirements of specific travellers.

Important factors, including a tourist's particular interests, local attractions, real-time tram movement, walks to and from stops, and accessibility requirements, are not taken into consideration by traditional materials. As a result, travellers risk missing crucial locations, selecting inefficient or confusing routes, or being overwhelmed by the absence of structured guidance.
For short-term tourists who want to see the city as quickly as possible, this can result in annoyance, lost time and a worse overall experience. Furthermore, without digital assistance, visitors who don't understand English or who are not familiar with Melbourne's public transport system may find the trip much more challenging.

To overcome these challenges, a user-friendly, interactive mobility solution is needed that:

- Provides a clear, engaging visual representation of the City Circle tram route.
- Suggests the best places to stop depending on local attractions and traveller interests.
- Highlights the walking paths between the stops and the landmarks and estimates the trip times.
- Includes mobility and accessibility features for inclusive planning.

In addition to improving individual travellers' experience, such a system would help the City of Melbourne achieve its objectives of supporting sustainable mobility, raising visitor satisfaction, and promoting exploration of lesser-known historical and cultural sites inside the central business district.
  

<div class="usecase-section-header">What this use case will teach you</div>

At the end of this use case you will have practised:

- Cleaning and preprocessing raw datasets.
- Data visualisation using Folium to display geolocation markers.
- Geospatial analysis.
- Documentation and communication.
- Problem solving and use case design.


<div class="usecase-section-header"><h2><b>Introduction</b></h2></div>

This use case aims to solve the problem by creating an interactive, user-friendly solution that visualises City Circle tram stops, suggests nearby attractions and helps tourists plan efficient routes around the city. By combining geospatial data and visual mapping tools, the solution enhances the tourist experience and supports smarter urban mobility.
The primary dataset used in the use case is the City Circle Tram Route dataset, sourced from the Melbourne Open Data Platform. It includes tram stop names, route information, and geographic coordinates. Additional data on tourist attractions may be integrated from third-party APIs or publicly available datasets, depending on future development stages.

Datasets:

- City Circle Tram Stops: https://data.melbourne.vic.gov.au/explore/dataset/city-circle-tram-stops/api/
- City Circle Tram Routes: https://data.melbourne.vic.gov.au/explore/dataset/city-circle-tram-route/api/
- Footpaths : https://data.melbourne.vic.gov.au/explore/dataset/footpaths/api/
- Landmarks and Places of Interest : https://data.melbourne.vic.gov.au/explore/dataset/landmarks-and-places-of-interest-including-schools-theatres-health-services-spor/api/
  


### Importing Libraries

In [137]:
import pandas as pd
import requests
import folium
from folium.plugins import MarkerCluster


#### Importing the City Circle Tram Stops

Several key tasks are completed to prepare the City Circle Tram Stops dataset for future analysis. Firstly, the dataset is retrieved and imported using the Melbourne Open Data v2.1 API, ensuring the most recent and accurate data is used. The initial processing involves coordinate extraction, as the geo_point_2d field stores the latitude and longitude values in a nested dictionary format. New columns for latitude and longitude are created by extracting these values directly from geo_point_2d. The dataset is then validated to ensure it is clean, free from missing coordinate values, and ready for mapping. Each row is checked to confirm that geo_point_2d contains valid geographic data. Finally, a preview of the cleaned dataset is generated using the .head() function to verify the structure and contents before further analysis or visualisation.

In [139]:
# API endpoint
URL = "https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/city-circle-tram-stops/records?limit=20"

# Fetch the JSON payload
resp = requests.get(URL, timeout=20)
resp.raise_for_status()
payload = resp.json()

# Flatten the JSON records into a DataFrame
raw = pd.json_normalize(payload["results"])

# Print the column names
print(raw.columns.tolist())
raw.head()

['name', 'xorg', 'stop_no', 'mccid_str', 'xsource', 'xdate', 'mccid_int', 'geo_point_2d.lon', 'geo_point_2d.lat', 'geo_shape.type', 'geo_shape.geometry.coordinates', 'geo_shape.geometry.type']


Unnamed: 0,name,xorg,stop_no,mccid_str,xsource,xdate,mccid_int,geo_point_2d.lon,geo_point_2d.lat,geo_shape.type,geo_shape.geometry.coordinates,geo_shape.geometry.type
0,Russell Street / Flinders Street,GIS Team,6,,Mapbase,2011-10-18,28,144.970156,-37.816673,Feature,"[144.97015587085124, -37.81667338583987]",Point
1,New Quay Promenade / Docklands Drive,GIS Team,D10,,Mapbase,2011-10-18,11,144.941378,-37.813415,Feature,"[144.94137823870162, -37.813414856197724]",Point
2,Etihad Statium / La Trobe Street,GIS Team,D1,,Mapbase,2011-10-18,13,144.946551,-37.814592,Feature,"[144.94655055842398, -37.814591782869805]",Point
3,Spring Street / Flinders Street,GIS Team,8,,Mapbase,2011-10-18,26,144.974534,-37.815389,Feature,"[144.97453393804187, -37.81538859129167]",Point
4,Melbourne Aquarium / Flinders Street,GIS Team,2,,Mapbase,2011-10-18,4,144.957863,-37.820238,Feature,"[144.95786314283018, -37.82023778673241]",Point


### Finding the missing values and Invalid Locations

Once the City Circle Tram Stops dataset was imported, a validation procedure was carried out to make sure the geolocation data was accurate and suitable for mapping. pd.to_numeric() was used to convert the latitude and longitude values from the flattened geo_point_2d columns to numeric format, with errors="coerce" turning any non-numeric values into NaN.

The validation checks included:

- Missing Values – Detecting any rows where latitude or longitude was not provided.

- Range Validation – Ensuring latitude values fall within the range -90 to 90 and longitude values fall within -180 to 180.

- Combined Geolocation Issues – Identifying rows with any missing or invalid coordinates.

In [141]:

# Extract latitude and longitude from the flattened JSON columns
lat_raw = pd.to_numeric(raw.get("geo_point_2d.lat"), errors="coerce")
lon_raw = pd.to_numeric(raw.get("geo_point_2d.lon"), errors="coerce")

# Flag missing values
missing_lat = lat_raw.isna()
missing_lon = lon_raw.isna()

# Flag values outside the valid geographic ranges
invalid_lat_range = ~missing_lat & ~lat_raw.between(-90, 90)
invalid_lon_range = ~missing_lon & ~lon_raw.between(-180, 180)

# Combine all geolocation checks
has_geo_issue = missing_lat | missing_lon | invalid_lat_range | invalid_lon_range

# Print a summary of the checks
print("Total rows:", len(raw))
print("Missing lat:", missing_lat.sum())
print("Missing lon:", missing_lon.sum())
print("Invalid lat range:", invalid_lat_range.sum())
print("Invalid lon range:", invalid_lon_range.sum())
print("Rows with any geolocation issue:", has_geo_issue.sum())

issues_df = raw.loc[has_geo_issue, ["name", "stop_no", "geo_point_2d.lat", "geo_point_2d.lon"]].copy()
issues_df.head(10)


Total rows: 20
Missing lat: 0
Missing lon: 0
Invalid lat range: 0
Invalid lon range: 0
Rows with any geolocation issue: 0


Unnamed: 0,name,stop_no,geo_point_2d.lat,geo_point_2d.lon
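
The same three checks are repeated for each dataset in this use case, so they could be wrapped in a small helper. The sketch below is one possible way to do that; the function name `summarise_geo_issues` and the sample rows are illustrative assumptions, not part of the original notebook or the real tram stop data.

```python
import pandas as pd


def summarise_geo_issues(frame, lat_col, lon_col):
    """Return (summary dict, boolean mask) for missing or out-of-range coordinates."""
    # Coerce to numeric so non-numeric entries become NaN instead of raising.
    lat = pd.to_numeric(frame[lat_col], errors="coerce")
    lon = pd.to_numeric(frame[lon_col], errors="coerce")

    missing_lat = lat.isna()
    missing_lon = lon.isna()
    # Range checks only apply to values that are actually present.
    invalid_lat = ~missing_lat & ~lat.between(-90, 90)
    invalid_lon = ~missing_lon & ~lon.between(-180, 180)
    has_issue = missing_lat | missing_lon | invalid_lat | invalid_lon

    summary = {
        "total_rows": len(frame),
        "missing_lat": int(missing_lat.sum()),
        "missing_lon": int(missing_lon.sum()),
        "invalid_lat": int(invalid_lat.sum()),
        "invalid_lon": int(invalid_lon.sum()),
        "rows_with_issue": int(has_issue.sum()),
    }
    return summary, has_issue


# Made-up sample rows (not real tram stop data) that trigger each check once.
sample = pd.DataFrame({
    "geo_point_2d.lat": [-37.8167, None, 95.0, "abc"],
    "geo_point_2d.lon": [144.9702, 144.9414, 144.9466, 200.0],
})
summary, mask = summarise_geo_issues(sample, "geo_point_2d.lat", "geo_point_2d.lon")
print(summary)
```

With a helper like this, the landmarks dataset could be checked by passing "co_ordinates.lat" and "co_ordinates.lon" as the column names instead.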


### Cleaning and Initialising Data

The dataset was cleaned after geolocation validation to make sure it was prepared for mapping and analysis. The procedure for cleaning entailed:

- Latitude and Longitude Extraction: pd.to_numeric() was used to convert the geo_point_2d.lat and geo_point_2d.lon fields to numeric values, with errors="coerce" turning any non-numeric data into NaN.

- Eliminating Missing Data: To guarantee that every record had complete geolocation information, rows with missing latitude or longitude data were removed.

- Range Filtering: Only coordinates within the valid geographic ranges (latitude between -90 and 90, longitude between -180 and 180) were kept.

- Column Selection: The dataset was trimmed down to the following important fields: lat (latitude), lon (longitude), stop_no (stop number), and name (tram stop name).


In [143]:
df = raw.copy()

# Convert lat and lon to numeric values
df["lat"] = pd.to_numeric(df.get("geo_point_2d.lat"), errors="coerce")
df["lon"] = pd.to_numeric(df.get("geo_point_2d.lon"), errors="coerce")

# Remove rows with missing or out-of-range lat/lon
df = df.dropna(subset=["lat", "lon"])
df = df[df["lat"].between(-90, 90) & df["lon"].between(-180, 180)]

keep = [c for c in ["name", "stop_no", "lat", "lon"] if c in df.columns]
df = df[keep]

print("Cleaned rows:", len(df))
df.head()


Cleaned rows: 20


Unnamed: 0,name,stop_no,lat,lon
0,Russell Street / Flinders Street,6,-37.816673,144.970156
1,New Quay Promenade / Docklands Drive,D10,-37.813415,144.941378
2,Etihad Statium / La Trobe Street,D1,-37.814592,144.946551
3,Spring Street / Flinders Street,8,-37.815389,144.974534
4,Melbourne Aquarium / Flinders Street,2,-37.820238,144.957863


### Map Visualisation

Using the Folium package, an interactive map was created to display the cleaned dataset. To give a clear view of every City Circle tram stop, the map was centred on the Melbourne CBD with an initial zoom level of 13.

To improve readability and user engagement, neighbouring markers were grouped using the MarkerCluster plugin. Using its latitude and longitude coordinates, each tram stop in the dataset was plotted as a marker. Users can swiftly identify stops thanks to the marker's popup label, which shows both the stop number and the stop name.

The finished interactive map may be seen in any web browser without the need for Python because it was saved as an HTML file (city_circle_tram_stops_map.html). The spatial distribution of tram stops may be better understood with the help of this visualisation, which can be expanded to incorporate other details like neighbouring landmarks or travel routes.

In [145]:
# Create a base map centred on the Melbourne CBD
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)
cluster = MarkerCluster().add_to(m)

for _, row in df.iterrows():
    label = f"{row.get('stop_no','')} – {row.get('name','Tram Stop')}".strip(" –")
    folium.Marker([row["lat"], row["lon"]], popup=label).add_to(cluster)

m.save("city_circle_tram_stops_map.html")
print("Map saved: city_circle_tram_stops_map.html")


Map saved: city_circle_tram_stops_map.html
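
As far as Folium's default behaviour goes, a plain popup string is inserted into the page as HTML, so a stop name containing characters such as `&` or `<` may not display as intended. The sketch below shows one way the label could be built defensively; `build_stop_label` is a hypothetical helper introduced here for illustration, not part of the original notebook.

```python
import html

SEPARATOR = " – "  # same en dash separator used in the map cell above


def build_stop_label(stop_no, name, default_name="Tram Stop"):
    """Build an HTML-escaped popup label such as '6 – Russell Street / Flinders Street'."""
    parts = []
    for value in (stop_no, name):
        # Skip None, float NaN (NaN is never equal to itself) and blank strings.
        if value is None or value != value or str(value).strip() == "":
            continue
        parts.append(html.escape(str(value).strip()))
    # Fall back to a generic label when neither field is usable.
    return SEPARATOR.join(parts) if parts else default_name


print(build_stop_label("6", "Russell Street / Flinders Street"))
print(build_stop_label(None, "Smith & Co <Stop>"))  # made-up name to show escaping
print(build_stop_label(float("nan"), ""))
```

In the marker loop, the label could then be produced with `build_stop_label(row.get("stop_no"), row.get("name"))` before being passed to `folium.Marker`.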


### Importing City Circle Tram Route Dataset

This block fetches the City Circle Route dataset via the v2.1 API, then flattens the JSON into a DataFrame with pd.json_normalize(). The route is stored as a MultiLineString inside geo_shape.geometry.coordinates.

In [147]:
# API endpoint 
URL = "https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/city-circle-tram-route/records?limit=5"

# Fetching JSON and normalise
resp = requests.get(URL, timeout=20)
resp.raise_for_status()
payload = resp.json()


raw = pd.json_normalize(payload["results"])

print(raw.columns.tolist())
raw.head(1)


['name', 'xorg', 'mccid_str', 'xsource', 'xdate', 'mccid_int', 'route_no', 'geo_point_2d.lon', 'geo_point_2d.lat', 'geo_shape.type', 'geo_shape.geometry.coordinates', 'geo_shape.geometry.type']


Unnamed: 0,name,xorg,mccid_str,xsource,xdate,mccid_int,route_no,geo_point_2d.lon,geo_point_2d.lat,geo_shape.type,geo_shape.geometry.coordinates,geo_shape.geometry.type
0,City Circle Route,GIS Team,,Mapbase,2011-10-18,0,35,144.956592,-37.814832,Feature,"[[[144.96690455927876, -37.8176316450406], [14...",MultiLineString


### Finding invalid geolocation & missing values

The route geometry is a MultiLineString (multiple line segments). We flatten every coordinate pair into a DataFrame and check for missing or out‑of‑range values. This ensures the geometry is valid before mapping.

In [149]:
# Pull the nested coordinates 
coords_nested = raw.loc[0, "geo_shape.geometry.coordinates"]  # list of line segments

# Flatten lon, lat points for validation
lon_list, lat_list = [], []

for segment in coords_nested:        
    for pt in segment:               
        lon, lat = pt[0], pt[1]
        lon_list.append(lon)
        lat_list.append(lat)

coords_df = pd.DataFrame({"lon": lon_list, "lat": lat_list})

# Convert to numeric and validate
coords_df["lon"] = pd.to_numeric(coords_df["lon"], errors="coerce")
coords_df["lat"] = pd.to_numeric(coords_df["lat"], errors="coerce")

missing_lon = coords_df["lon"].isna()
missing_lat = coords_df["lat"].isna()
invalid_lon_range = ~missing_lon & ~coords_df["lon"].between(-180, 180)
invalid_lat_range = ~missing_lat & ~coords_df["lat"].between(-90, 90)

has_issue = missing_lon | missing_lat | invalid_lon_range | invalid_lat_range

print("Total points:", len(coords_df))
print("Missing lon:", int(missing_lon.sum()))
print("Missing lat:", int(missing_lat.sum()))
print("Invalid lon range:", int(invalid_lon_range.sum()))
print("Invalid lat range:", int(invalid_lat_range.sum()))
print("Points with any issue:", int(has_issue.sum()))

coords_df.loc[has_issue].head(10)


Total points: 197
Missing lon: 0
Missing lat: 0
Invalid lon range: 0
Invalid lat range: 0
Points with any issue: 0


Unnamed: 0,lon,lat


### Cleaning the data 

We filter out any invalid coordinates and reconstruct the route as a list of segments, where each segment is a list of [lat, lon] points, the format Folium’s PolyLine expects. This keeps the route topology intact.

In [151]:
clean_coords = coords_df[
    coords_df["lon"].between(-180, 180) & coords_df["lat"].between(-90, 90)
].copy()

print("Clean points:", len(clean_coords))

route_segments = []
for segment in raw.loc[0, "geo_shape.geometry.coordinates"]:
    seg_latlon = []
    for lon, lat in segment:
        if (
            pd.notna(lon) and pd.notna(lat)
            and -180 <= float(lon) <= 180
            and -90 <= float(lat) <= 90
        ):
            seg_latlon.append([float(lat), float(lon)])
    if len(seg_latlon) > 1: 
        route_segments.append(seg_latlon)

print("Segments ready:", len(route_segments))

print("Clean points:", len(clean_coords))
print(clean_coords.head())

Clean points: 197
Segments ready: 1
Clean points: 197
          lon        lat
0  144.966905 -37.817632
1  144.966582 -37.817721
2  144.964799 -37.818231
3  144.962082 -37.819021
4  144.961068 -37.819322


### Visualise the route (Folium PolyLine)

This block uses Folium to render the City Circle route polyline on an interactive map. Each segment is drawn with PolyLine, and optional markers indicate the first segment’s start/end. The output is saved as an HTML file you can open in any browser.

In [153]:
# Center the map on Melbourne CBD
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)

# Draw each segment as a PolyLine
for seg in route_segments:
    folium.PolyLine(
        locations=seg,
        weight=4,
        opacity=0.9
    ).add_to(m)

if route_segments:
    start_latlon = route_segments[0][0]
    end_latlon = route_segments[0][-1]
    folium.Marker(start_latlon, popup="Route Start").add_to(m)
    folium.Marker(end_latlon, popup="Route End").add_to(m)

# Save to HTML
m.save("city_circle_tram_route_map.html")
print("Map saved: city_circle_tram_route_map.html")


Map saved: city_circle_tram_route_map.html
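
The map above uses a hard-coded centre and zoom level. An alternative is to derive the view from the route itself by computing its bounding box, which Folium's `fit_bounds` method accepts as `[[south, west], [north, east]]`. The sketch below assumes the same `[lat, lon]` segment structure as `route_segments`; the helper name `route_bounds` is hypothetical, and the example points are the first three route coordinates printed earlier.

```python
def route_bounds(segments):
    """Return [[min_lat, min_lon], [max_lat, max_lon]] for a list of [lat, lon] segments."""
    # Flatten all segments into a single list of points.
    points = [pt for seg in segments for pt in seg]
    if not points:
        raise ValueError("route has no points")
    lats = [pt[0] for pt in points]
    lons = [pt[1] for pt in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# Illustrative segment built from the first few route points shown earlier.
example_segments = [[
    [-37.817632, 144.966905],
    [-37.817721, 144.966582],
    [-37.818231, 144.964799],
]]
bounds = route_bounds(example_segments)
print(bounds)
# With the notebook's map object this could be applied as: m.fit_bounds(bounds)
```

Fitting to the bounds keeps the whole loop in view even if the route geometry changes in a later release of the dataset.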


### Importing Footpath Dataset

This code uses the Melbourne Open Data platform's v2.1 API to directly obtain the Footpaths dataset.
First, the dataset's API endpoint (URL) is defined, and the first 20 records are requested.
To prevent connections from stalling, a GET request with a 20-second timeout restriction is made using Python's requests package.
To make sure the request was successful, the response is verified using raise_for_status(); if the server provides an error status code (such as 400 or 500), the code will raise an exception.



In [155]:
# API endpoint 
URL = "https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/footpaths/records?limit=20"

# Fetch JSON and normalise
resp = requests.get(URL, timeout=20)
resp.raise_for_status()
payload = resp.json()

# Flatten the records into a DataFrame
raw = pd.json_normalize(payload["results"])

print(raw.columns.tolist())
raw.head()


['prop_id', 'name', 'shape_stle', 'addresspt1', 'xorg', 'ext_id', 'asset_clas', 'label', 'asset_type', 'easting', 'last_edite', 'created_us', 'northing', 'created_da', 'str_id', 'addresspt', 'asset_subt', 'xsource', 'profile', 'last_edi_1', 'xdate', 'xdrawing', 'mcc_id', 'shape_star', 'roadseg_id', 'geo_point_2d.lon', 'geo_point_2d.lat', 'geo_shape.type', 'geo_shape.geometry.coordinates', 'geo_shape.geometry.type']


Unnamed: 0,prop_id,name,shape_stle,addresspt1,xorg,ext_id,asset_clas,label,asset_type,easting,...,xdate,xdrawing,mcc_id,shape_star,roadseg_id,geo_point_2d.lon,geo_point_2d.lat,geo_shape.type,geo_shape.geometry.coordinates,geo_shape.geometry.type
0,0,,21.5910484766,0.0,RapidMap,RPSP1130065L1,Road,,Road Footway,0.0,...,0,,1514053,26.5821659543,0,144.945766,-37.822492,Feature,"[[[[144.94573449445645, -37.82251778152554], [...",MultiPolygon
1,0,,144.223224596,0.0,RapidMap,RPSP1130703L1,Road,,Road Footway,0.0,...,0,,1513715,120.219373314,0,144.954843,-37.821922,Feature,"[[[[144.9551034787354, -37.82185980397694], [1...",MultiPolygon
2,0,,818.220545572,0.0,RapidMap,RPSP1112001,Road,,Road Footway,0.0,...,20130507,7918206.0,1477297,2456.78039176,23304,144.938555,-37.822202,Feature,"[[[[144.93680394113545, -37.821767153918195], ...",MultiPolygon
3,0,,207.936639844,0.0,,RPSP0814985L1,Road,,Road Footway,0.0,...,0,8235172.0,1389514,495.23854599,20168,144.961885,-37.811803,Feature,"[[[[144.96203704641766, -37.81220170024069], [...",MultiPolygon
4,0,,72.7951236713,0.0,RapidMap,RPSP07A11441L1,Road,,Road Footway,0.0,...,0,,1385672,26.5095536491,0,144.924516,-37.795244,Feature,"[[[[144.9243964840183, -37.795113723237044], [...",MultiPolygon
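
The request above returns only the first 20 footpath records. To work with more of the dataset, the records could be collected page by page. The sketch below assumes the v2.1 records endpoint accepts `limit` and `offset` query parameters and returns a payload containing `total_count` and `results`; the helper names `fetch_all_records` and `make_http_page_getter` are hypothetical, and the demonstration uses a fake in-memory source so that no network call is made.

```python
def fetch_all_records(get_page, page_size=100, max_records=1000):
    """Collect records page by page until the source is exhausted or max_records is reached.

    get_page(limit, offset) must return a dict shaped like the API payload,
    i.e. {"total_count": int, "results": [...]}.
    """
    records = []
    offset = 0
    while len(records) < max_records:
        limit = min(page_size, max_records - len(records))
        payload = get_page(limit, offset)
        batch = payload.get("results", [])
        records.extend(batch)
        offset += len(batch)
        total = payload.get("total_count")
        # Stop on a short page, or once the reported total has been reached.
        if len(batch) < limit or (total is not None and offset >= total):
            break
    return records


def make_http_page_getter(dataset_id, timeout=20):
    """Hypothetical adapter that binds the pager to the v2.1 records endpoint."""
    import requests  # imported lazily so the pager itself can be tested offline

    url = (
        "https://data.melbourne.vic.gov.au/api/explore/v2.1/"
        f"catalog/datasets/{dataset_id}/records"
    )

    def get_page(limit, offset):
        resp = requests.get(url, params={"limit": limit, "offset": offset}, timeout=timeout)
        resp.raise_for_status()
        return resp.json()

    return get_page


# Offline demonstration with a fake 250-record source.
fake_rows = [{"id": i} for i in range(250)]


def fake_page(limit, offset):
    return {"total_count": len(fake_rows), "results": fake_rows[offset:offset + limit]}


rows = fetch_all_records(fake_page, page_size=100, max_records=1000)
print(len(rows))
```

Against the live API, the result of `fetch_all_records(make_http_page_getter("footpaths"))` could be passed to `pd.json_normalize()` in place of `payload["results"]`. The `max_records` cap is a deliberate safeguard, since the footpaths dataset is much larger than the tram datasets.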


###  Find missing/invalid geolocation

Description:
This validates the representative point (geo_point_2d.lat/lon) for missing and out‑of‑range values and checks that each record actually has polygon geometry (footpaths are stored as MultiPolygon/Polygon). Any problematic rows are listed in issues_df.

In [157]:
# Geolocation 
lat_raw = pd.to_numeric(raw.get("geo_point_2d.lat"), errors="coerce")
lon_raw = pd.to_numeric(raw.get("geo_point_2d.lon"), errors="coerce")

missing_lat = lat_raw.isna()
missing_lon = lon_raw.isna()
invalid_lat_range = ~missing_lat & ~lat_raw.between(-90, 90)
invalid_lon_range = ~missing_lon & ~lon_raw.between(-180, 180)

has_point_issue = missing_lat | missing_lon | invalid_lat_range | invalid_lon_range

# Geometry 
geom_type = raw.get("geo_shape.geometry.type", pd.Series([None]*len(raw)))
has_polygon = geom_type.fillna("").str.contains("Polygon", case=False)
missing_polygon_geo = ~has_polygon

print("Total rows:", len(raw))
print("Missing lat:", int(missing_lat.sum()))
print("Missing lon:", int(missing_lon.sum()))
print("Invalid lat range:", int(invalid_lat_range.sum()))
print("Invalid lon range:", int(invalid_lon_range.sum()))
print("Rows with any point issue:", int(has_point_issue.sum()))
print("Rows missing polygon geometry:", int(missing_polygon_geo.sum()))

cols = [c for c in ["prop_id","name","asset_type","geo_point_2d.lat","geo_point_2d.lon","geo_shape.geometry.type"] if c in raw.columns]
issues_df = raw.loc[has_point_issue | missing_polygon_geo, cols].head(10)
issues_df


Total rows: 20
Missing lat: 0
Missing lon: 0
Invalid lat range: 0
Invalid lon range: 0
Rows with any point issue: 0
Rows missing polygon geometry: 0


Unnamed: 0,prop_id,name,asset_type,geo_point_2d.lat,geo_point_2d.lon,geo_shape.geometry.type


### Cleaning the Data

Creates numeric lat/lon, removes rows with bad/missing coordinates, and filters to polygon features only. Then trims to a tidy set of columns useful for mapping/analysis.



In [159]:
df = raw.copy()

# Create numeric lat/lon columns 
df["lat"] = pd.to_numeric(df.get("geo_point_2d.lat"), errors="coerce")
df["lon"] = pd.to_numeric(df.get("geo_point_2d.lon"), errors="coerce")

# Drop rows with missing or out-of-range lat/lon
df = df.dropna(subset=["lat", "lon"])
df = df[df["lat"].between(-90, 90) & df["lon"].between(-180, 180)]

# Keep only rows that have polygon geometry
if "geo_shape.geometry.type" in df.columns:
    df = df[df["geo_shape.geometry.type"].str.contains("Polygon", na=False, case=False)]

keep = [c for c in [
    "prop_id","name","asset_type","asset_clas","lat","lon",
    "geo_shape.geometry.type","geo_shape.geometry.coordinates"
] if c in df.columns]
df = df[keep]

print("Cleaned rows:", len(df))
df.head(3)


Cleaned rows: 20


Unnamed: 0,prop_id,name,asset_type,asset_clas,lat,lon,geo_shape.geometry.type,geo_shape.geometry.coordinates
0,0,,Road Footway,Road,-37.822492,144.945766,MultiPolygon,"[[[[144.94573449445645, -37.82251778152554], [..."
1,0,,Road Footway,Road,-37.821922,144.954843,MultiPolygon,"[[[[144.9551034787354, -37.82185980397694], [1..."
2,0,,Road Footway,Road,-37.822202,144.938555,MultiPolygon,"[[[[144.93680394113545, -37.821767153918195], ..."


### Visualise (Folium): polygons + centroid markers

Description:

Creates an interactive Folium map with two layers: a GeoJSON polygon layer (showing the footpath shapes) and a clustered point layer using each record’s lat/lon for quick inspection. Adds a layer toggle and saves as footpaths_map.html.

In [161]:
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)

N = 200
features = []

for _, row in df.head(N).iterrows():
    geom_type = row.get("geo_shape.geometry.type")
    coords = row.get("geo_shape.geometry.coordinates")

    
    if not isinstance(coords, (list, tuple)) or not isinstance(geom_type, str):
        continue

    features.append({
        "type": "Feature",
        "geometry": {"type": geom_type, "coordinates": coords},
        "properties": {
            "prop_id": row.get("prop_id"),
            "asset_type": row.get("asset_type"),
            "asset_clas": row.get("asset_clas")
        }
    })

# Add polygons
geojson = {"type": "FeatureCollection", "features": features}
folium.GeoJson(geojson, name="Footpaths (polygons)").add_to(m)

# Add point markers
cluster = MarkerCluster(name="Footpath points").add_to(m)
for _, row in df.head(N).iterrows():
    if pd.isna(row["lat"]) or pd.isna(row["lon"]):
        continue
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=2,
        popup=f"{row.get('asset_type','Footway')} | id: {row.get('prop_id')}",
        fill=True
    ).add_to(cluster)

folium.LayerControl(collapsed=False).add_to(m)
m.save("footpaths_map.html")
print("Map saved: footpaths_map.html")


Map saved: footpaths_map.html


### Importing the Landmark Datasets

The script begins by retrieving the dataset from the Melbourne Open Data API using the requests library. The API returns the data in JSON format, which is flattened into a Pandas DataFrame with pd.json_normalize() for easy handling.
At this stage, we have all available locations, including schools, health facilities, parks, museums, and offices.

In [163]:
# API endpoint
URL = "https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/landmarks-and-places-of-interest-including-schools-theatres-health-services-spor/records?limit=100"

# Fetching data
resp = requests.get(URL, timeout=20)
resp.raise_for_status()
payload = resp.json()

# Flatten JSON into DataFrame
raw = pd.json_normalize(payload["results"])

print("Total rows fetched:", len(raw))
print("Columns:", raw.columns.tolist())
raw.head()

Total rows fetched: 100
Columns: ['theme', 'sub_theme', 'feature_name', 'co_ordinates.lon', 'co_ordinates.lat']


Unnamed: 0,theme,sub_theme,feature_name,co_ordinates.lon,co_ordinates.lat
0,Leisure/Recreation,Major Sports & Recreation Facility,Carlton Football Club,144.961968,-37.784086
1,Education Centre,Primary Schools,Carlton Gardens Primary School,144.969406,-37.802095
2,Leisure/Recreation,Informal Outdoor Facility (Park/Garden/Reserve),Kings Domain,144.974108,-37.825524
3,Education Centre,Secondary Schools,Melbourne Grammar School,144.976285,-37.834256
4,Mixed Use,Retail/Office/Carpark,Southgate Arts and Leisure Precinct,144.965989,-37.820225


### Validating the Geolocation Data
We check the co_ordinates.lat and co_ordinates.lon columns to ensure all values are:

- Numeric (non-numeric entries are coerced to NaN)
- Present (no missing values)
- Valid ranges (-90 ≤ lat ≤ 90 and -180 ≤ lon ≤ 180)

This step ensures that only locations with usable coordinates remain for mapping.


In [165]:
# Extract lat, lon
lat_raw = pd.to_numeric(raw.get("co_ordinates.lat"), errors="coerce")
lon_raw = pd.to_numeric(raw.get("co_ordinates.lon"), errors="coerce")

# Identify issues
missing_lat = lat_raw.isna()
missing_lon = lon_raw.isna()
invalid_lat_range = ~missing_lat & ~lat_raw.between(-90, 90)
invalid_lon_range = ~missing_lon & ~lon_raw.between(-180, 180)

has_geo_issue = missing_lat | missing_lon | invalid_lat_range | invalid_lon_range

print("Total rows:", len(raw))
print("Missing lat:", missing_lat.sum())
print("Missing lon:", missing_lon.sum())
print("Invalid lat range:", invalid_lat_range.sum())
print("Invalid lon range:", invalid_lon_range.sum())
print("Rows with any geolocation issue:", has_geo_issue.sum())


issues_df = raw.loc[has_geo_issue, ["feature_name", "co_ordinates.lat", "co_ordinates.lon"]]
issues_df.head()

Total rows: 100
Missing lat: 0
Missing lon: 0
Invalid lat range: 0
Invalid lon range: 0
Rows with any geolocation issue: 0


Unnamed: 0,feature_name,co_ordinates.lat,co_ordinates.lon


### Cleaning and Filtering Landmarks For Tourists

After validation, the DataFrame is cleaned:

- Rows with missing or invalid coordinates are removed.
- Only specific themes are kept:
    - Leisure/Recreation – parks, gardens, reserves, sports & recreation facilities, squares.
    - Place Of Assembly – museums, galleries, theatres.
    - Mixed Use – areas with retail or leisure.

- All other themes (e.g., vacant land, schools, health services, offices) are excluded to focus on tourist mobility use cases.

- Columns are trimmed to only those relevant for display: theme, sub_theme, feature_name, lat, and lon.

In [167]:
# cleaning
df = raw.copy()
df["lat"] = pd.to_numeric(df.get("co_ordinates.lat"), errors="coerce")
df["lon"] = pd.to_numeric(df.get("co_ordinates.lon"), errors="coerce")

# Remove missing or out-of-range coordinates
df = df.dropna(subset=["lat", "lon"])
df = df[df["lat"].between(-90, 90) & df["lon"].between(-180, 180)]

# Keep only tourist-related themes
tourist_themes = [
    "Leisure/Recreation",
    "Place Of Assembly",
    "Mixed Use"
]
df = df[df["theme"].isin(tourist_themes)]

# Keep only relevant columns
df = df[["theme", "sub_theme", "feature_name", "lat", "lon"]]

print("Cleaned tourist-relevant rows:", len(df))
df.head()


Cleaned tourist-relevant rows: 46


Unnamed: 0,theme,sub_theme,feature_name,lat,lon
0,Leisure/Recreation,Major Sports & Recreation Facility,Carlton Football Club,-37.784086,144.961968
2,Leisure/Recreation,Informal Outdoor Facility (Park/Garden/Reserve),Kings Domain,-37.825524,144.974108
4,Mixed Use,Retail/Office/Carpark,Southgate Arts and Leisure Precinct,-37.820225,144.965989
5,Place Of Assembly,Art Gallery/Museum,Australian Centre for Contemporary Art,-37.826605,144.967253
7,Leisure/Recreation,Informal Outdoor Facility (Park/Garden/Reserve),Federation Square,-37.817852,144.968964


### Visualising the Landmarks Using Folium

The filtered dataset is plotted on an interactive map using Folium:

- The map is centred on Melbourne CBD.
- A MarkerCluster groups nearby points to keep the map clean.
- Each marker’s popup shows the landmark’s name and sub-theme.
- Markers are styled in green to represent tourist attractions.

The resulting HTML file (tourist_landmarks_map.html) can be opened in a web browser to explore Melbourne’s tourist-relevant locations interactively.


In [169]:

# Base map
m = folium.Map(location=[-37.8136, 144.9631], zoom_start=12)
cluster = MarkerCluster().add_to(m)

# Add tourist markers
for _, row in df.iterrows():
    label = f"{row['feature_name']} ({row['sub_theme']})"
    folium.Marker(
        location=[row["lat"], row["lon"]],
        popup=label,
        icon=folium.Icon(color="green", icon="info-sign")
    ).add_to(cluster)

m.save("tourist_landmarks_map.html")
print("Map saved: tourist_landmarks_map.html")


Map saved: tourist_landmarks_map.html
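
With cleaned stops and landmarks available, a natural next step towards suggesting stops for each attraction is to find the closest tram stop to a landmark. The sketch below uses the haversine formula for straight-line distance, which will underestimate the real walking distance along footpaths. The walking pace of 80 metres per minute is an assumption, not a value from the datasets, and the helper names are hypothetical. The coordinates are copied from the stop and landmark previews shown earlier.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
WALK_SPEED_M_PER_MIN = 80   # assumed walking pace (about 4.8 km/h)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop(landmark, stops):
    """Return (stop, distance_m) for the stop closest to the landmark."""
    def distance_to(stop):
        return haversine_m(landmark["lat"], landmark["lon"], stop["lat"], stop["lon"])

    best = min(stops, key=distance_to)
    return best, distance_to(best)


stops = [
    {"name": "Russell Street / Flinders Street", "lat": -37.816673, "lon": 144.970156},
    {"name": "Spring Street / Flinders Street", "lat": -37.815389, "lon": 144.974534},
    {"name": "Melbourne Aquarium / Flinders Street", "lat": -37.820238, "lon": 144.957863},
]
landmark = {"feature_name": "Federation Square", "lat": -37.817852, "lon": 144.968964}

stop, dist_m = nearest_stop(landmark, stops)
walk_min = dist_m / WALK_SPEED_M_PER_MIN
print(f"{landmark['feature_name']}: nearest stop is {stop['name']} "
      f"({dist_m:.0f} m, about {walk_min:.1f} min on foot)")
```

The same function could be applied row by row to the landmarks DataFrame, using the cleaned stops DataFrame converted with `to_dict("records")` as the list of stops.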
