# Housing Recommendation based on distance to nearby utilities

**Authored by**:  Linh Huong Nguyen

**Duration**: 90 mins

**Level**: Intermediate

**Pre-requisite Skills**: Python, Pandas, Matplotlib, NumPy, Seaborn, Scikit-learn


### Scenario

As a tenant looking for a rental home, and as a parent, I would like to calculate the total travel distance for dropping my child off at school and then travelling to work, so that I can apply for the rental property that saves me the most travel time.

### What this use case will teach you

At the end of this use case, you will have demonstrated the following skills:

* Accessed and imported geospatial and rental listing datasets from open data portals and APIs.

* Performed data cleaning, preprocessing, and geocoding of addresses to ensure spatial accuracy.

* Used geospatial libraries to calculate distances between points of interest (POIs) and rental properties.

* Conducted exploratory data analysis (EDA) to assess accessibility patterns and disparities.

* Visualized geospatial data on interactive maps to highlight proximity patterns and coverage gaps.

* Derived actionable insights to inform housing accessibility and urban development policies.

### Background and Introduction

The accessibility of essential public amenities such as train stations and schools, together with distance to the Melbourne CBD, plays a significant role in shaping rental market dynamics, urban livability, and resident satisfaction. For renters, proximity to these amenities can influence housing decisions, commute times, and quality of life. For policymakers and urban planners, understanding how well rental properties are served by these facilities is crucial for identifying underserved areas, prioritizing infrastructure investments, and ensuring equitable access across the community.

This use case addresses the need for a data-driven approach to evaluating the spatial relationship between rental housing and local amenities such as stations and schools. By combining rental property data with geospatial datasets of public transport stations and schools, we can calculate precise distances and analyze accessibility patterns across different rental listings. These insights can help tenants find the rental location that best suits their needs.

### Datasets used


* Open Route Service API
This service provides directions and travel time by car between locations on an open-source map. It is accessed with an API key.

* PTV API
This dataset provides the locations of all train stations in Victoria, along with their routes and stop times. It is accessed with a user ID and API key.

* Victoria School Locations 2024
This dataset contains the list of all school locations in Victoria, including primary and secondary schools, government and non-government. Attributes include school name, sector, type, address, phone, and geographic coordinates. Data is sourced from the Victorian Department of Education and accessible via API V2.1.

* Rental Listings Dataset
This dataset contains current rental property listings in the City of Melbourne. Attributes may include address, rental price, property type, number of bedrooms, and listing date. Addresses will be geocoded for spatial analysis.

### Importing Datasets

This section imports essential libraries for data manipulation, visualization, geospatial analysis, interactive mapping, and fetching data from APIs. These libraries provide the necessary functionality for processing, analyzing, and visualizing the project data effectively.

In [191]:
from geopy.geocoders import Nominatim
from scipy.spatial import KDTree
from geopy.distance import geodesic
import openrouteservice
import requests
import pandas as pd
import os
from io import StringIO
import seaborn as sns
import folium
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import shape, Point
import json
import zipfile
from io import BytesIO
import re
import warnings
warnings.filterwarnings("ignore")

### Loading the datasets using API v2.1

This section defines a helper function for fetching data over HTTP. The API_Unlimited_external function builds a download URL from a dataset name, requests the CSV file, loads it into a DataFrame, and prints a sample for verification. It gives the rest of the notebook a single entry point for loading the external school dataset.

In [192]:
def API_Unlimited_external(datasetname):  # pass in the dataset name (no API key is needed)
    base_url = 'https://www.education.vic.gov.au/Documents/about/research/datavic/'
    file_format = 'csv'

    url = f'{base_url}{datasetname}.{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC'
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Wrap the response text in StringIO so pandas can read it as a CSV file
        dataset = pd.read_csv(StringIO(response.text), delimiter=',')
        print(dataset.sample(10, random_state=999))  # preview for verification
        return dataset

    print(f'Request failed with status code {response.status_code}')
    return None



### Fetching and Previewing Datasets

This section defines the dataset download links required for the use case and fetches the corresponding data using the API_Unlimited_external function. The datasets include school locations which are essential for calculating distance from rental listings to schools.

In [193]:
download_link = 'dv378_DataVic-SchoolLocations-2024'

# Use functions to download and load data
school_locations = API_Unlimited_external(download_link)

     Education_Sector  Entity_Type  School_No  \
1482       Government            1       5244   
1673       Government            1       5553   
976        Government            1       3077   
1926       Government            1       8466   
1298       Government            1       4923   
2134      Independent            2       1729   
1308       Government            1       4943   
1413       Government            1       5136   
49           Catholic            2        324   
352          Catholic            2       1715   

                                School_Name School_Type  \
1482                  Findon Primary School     Primary   
1673              Tulliallan Primary School     Primary   
976               Korumburra Primary School     Primary   
1926           Bairnsdale Secondary College   Secondary   
1298              Mount View Primary School     Primary   
2134                Sholem Aleichem College     Primary   
1308  Wilmot Road Primary School Shepparton    

In [194]:
school_locations= school_locations.dropna(subset=['Y', 'X'])

### GTFS Schedule Dataset

The GTFS Schedule dataset contains static timetable information for public transport services in Victoria. In the code cells below, a function is created to download and extract data from the open-source GTFS data portal.

In [195]:

current_directory = os.getcwd()
dataset_folder = 'mpt_data'
dataset_path = os.path.join(current_directory, dataset_folder)
inner_zip_paths = ['2/google_transit.zip', '3/google_transit.zip', '4/google_transit.zip']

In [196]:
def API_GTFS(url: str, inner_zip_paths: list) -> dict:

    required_files = ['stops.txt', 'stop_times.txt', 'routes.txt', 'trips.txt', 'calendar.txt']
    datasets = {}
    # Download main zip
    response = requests.get(url)
    response.raise_for_status()

    # Open main zip in memory
    with zipfile.ZipFile(BytesIO(response.content)) as main_zip:
        for inner_zip_path in inner_zip_paths:
            if inner_zip_path not in main_zip.namelist():
                continue

            subfolder_name = os.path.basename(os.path.dirname(inner_zip_path))
            datasets[subfolder_name] = {}

            with main_zip.open(inner_zip_path) as inner_zip_file:
                with zipfile.ZipFile(BytesIO(inner_zip_file.read())) as inner_zip:
                    for file_name in required_files:
                        if file_name in inner_zip.namelist():
                            with inner_zip.open(file_name) as f:
                                datasets[subfolder_name][file_name] = pd.read_csv(f)

    return datasets
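
The nested-archive handling in API_GTFS can be tried without downloading anything. The following is a minimal sketch, assuming a tiny in-memory archive with made-up contents; the helper names `build_nested_zip` and `read_inner_csv` are hypothetical and are not part of the notebook. It uses the standard `csv` module in place of pandas so that it runs on its own.

```python
import csv
import io
import zipfile


def build_nested_zip() -> bytes:
    """Build a tiny outer zip that holds '2/google_transit.zip' (toy data only)."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner:
        inner.writestr("stops.txt", "stop_id,stop_name\n1,Example Station\n")
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as outer:
        outer.writestr("2/google_transit.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


def read_inner_csv(outer_bytes: bytes, inner_path: str, file_name: str) -> list:
    """Open the outer zip, then the inner zip, and parse one CSV file into dict rows."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        with outer.open(inner_path) as inner_file:
            # The inner archive is read fully into memory, as in API_GTFS.
            with zipfile.ZipFile(io.BytesIO(inner_file.read())) as inner:
                with inner.open(file_name) as f:
                    text = io.TextIOWrapper(f, encoding="utf-8", newline="")
                    return list(csv.DictReader(text))


rows = read_inner_csv(build_nested_zip(), "2/google_transit.zip", "stops.txt")
print(rows)
```

Reading the inner archive into a `BytesIO` buffer keeps everything in memory, which is convenient for small feeds; for very large archives, extracting to disk may be the safer choice.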


In [197]:
inner_zip_paths = ['2/google_transit.zip']
url = 'https://data.ptv.vic.gov.au/downloads/gtfs.zip'

datasets=API_GTFS(url, inner_zip_paths)

In the code below, train station locations are extracted from their folder.
In the dataset we can see each station's ID, name, location (latitude and longitude), and other information such as wheelchair accessibility and platform code.

In [198]:
train_stops=datasets["2"]["stops.txt"]
train_stops['stop_name'] = train_stops['stop_name'].astype(str).str.strip()
train_stops = train_stops[
    train_stops['stop_name'].str.contains('Station', case=False, na=False)
].copy()
print(train_stops.head())


  stop_id                  stop_name   stop_lat    stop_lon  location_type  \
0   10117        Jordanville Station -37.873763  145.112473            NaN   
1   10920          Flagstaff Station -37.811880  144.956043            NaN   
2   10921          Flagstaff Station -37.811725  144.955968            NaN   
3   10922  Melbourne Central Station -37.809973  144.962513            NaN   
4   10923  Melbourne Central Station -37.809865  144.962505            NaN   

  parent_station  wheelchair_boarding  level_id platform_code  
0   vic:rail:JOR                  1.0   Level 0             1  
1   vic:rail:FGS                  1.0  Level -2             1  
2   vic:rail:FGS                  1.0  Level -2             2  
3   vic:rail:MCE                  1.0  Level -2             1  
4   vic:rail:MCE                  1.0  Level -2             2  


The below code shows the stop times dataset, where we can see a list of all trips at each station, their arrival time, and the sequence of each trip.

In [199]:
stop_times=datasets["2"]["stop_times.txt"]
for df in (stop_times,):
    df["trip_id"]  = df["trip_id"].astype(str)
    df["stop_id"]  = df["stop_id"].astype(str)
stop_times.head()


Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
0,02-ALM--15-T2-2302,04:57:00,04:57:00,11197,1,,0,0,0.0
1,02-ALM--15-T2-2302,04:58:00,04:58:00,11198,2,,0,0,713.74
2,02-ALM--15-T2-2302,05:00:00,05:00:00,11200,3,,0,0,1918.59
3,02-ALM--15-T2-2302,05:02:00,05:02:00,11202,4,,0,0,2901.39
4,02-ALM--15-T2-2302,05:04:00,05:04:00,11203,5,,0,0,3921.94


The routes dataset contains a list of all routes with their names, types, and how they are displayed on the map.

In [200]:
routes = datasets["2"]["routes.txt"]
routes.head()


Unnamed: 0,route_id,agency_id,route_short_name,route_long_name,route_type,route_color,route_text_color
0,aus:vic:vic-02-ALM:,,Alamein,Alamein - City,2,152C6B,FFFFFF
1,aus:vic:vic-02-ALM-R:,,Replacement Bus,Alamein - City,2,FE5000,FFFFFF
2,aus:vic:vic-02-BEG:,,Belgrave,Belgrave - City,2,152C6B,FFFFFF
3,aus:vic:vic-02-BEG-R:,,Replacement Bus,Belgrave - City,2,FE5000,FFFFFF
4,aus:vic:vic-02-CBE:,,Cranbourne,Cranbourne - City,2,279FD5,000000


This code snippet processes the trips dataset, converting the "trip_id" and "route_id" columns to the string data type.

In [201]:
trips=datasets["2"]["trips.txt"]
for df in (trips,):
    df["trip_id"]  = df["trip_id"].astype(str)
    df["route_id"] = df["route_id"].astype(str)
trips.head()

Unnamed: 0,route_id,service_id,trip_id,shape_id,trip_headsign,direction_id,block_id,wheelchair_accessible
0,aus:vic:vic-02-ALM:,T2,02-ALM--15-T2-2302,2-ALM-vpt-15.1.R,Camberwell,1,,1
1,aus:vic:vic-02-ALM:,T2,02-ALM--15-T2-2304,2-ALM-vpt-15.1.R,Camberwell,1,,1
2,aus:vic:vic-02-ALM:,T2,02-ALM--15-T2-2305,2-ALM-vpt-15.2.H,Alamein,0,,1
3,aus:vic:vic-02-ALM:,T2,02-ALM--15-T2-2306,2-ALM-vpt-15.1.R,Camberwell,1,,1
4,aus:vic:vic-02-ALM:,T2,02-ALM--15-T2-2307,2-ALM-vpt-15.2.H,Alamein,0,,1


The code below merges all the PTV train datasets for a more holistic view.

In [202]:
train_data = stop_times.merge(train_stops, on="stop_id", how="left").merge(trips, on="trip_id", how="left").merge(routes, on="route_id", how="left")


In the code below, data wrangling is performed to drop all unnecessary columns and handle missing values.

In [203]:
# Columns that are not needed for the distance and travel-time analysis
columns_to_drop = [
    'location_type', 'parent_station', 'wheelchair_boarding', 'level_id', 'platform_code',
    'stop_headsign', 'pickup_type', 'drop_off_type', 'agency_id', 'service_id',
    'route_type', 'route_color', 'route_text_color', 'shape_id', 'trip_headsign',
    'direction_id', 'block_id', 'wheelchair_accessible',
]
train_data_clean = train_data.drop(columns=columns_to_drop)
train_data_clean= train_data_clean.dropna(subset=['stop_lat', 'stop_lon'])
train_data_clean.head()

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,shape_dist_traveled,stop_name,stop_lat,stop_lon,route_id,route_short_name,route_long_name
0,02-ALM--15-T2-2302,04:57:00,04:57:00,11197,1,0.0,Alamein Station,-37.868186,145.079705,aus:vic:vic-02-ALM:,Alamein,Alamein - City
1,02-ALM--15-T2-2302,04:58:00,04:58:00,11198,2,713.74,Ashburton Station,-37.861932,145.08139,aus:vic:vic-02-ALM:,Alamein,Alamein - City
2,02-ALM--15-T2-2302,05:00:00,05:00:00,11200,3,1918.59,Burwood Station,-37.851744,145.08054,aus:vic:vic-02-ALM:,Alamein,Alamein - City
3,02-ALM--15-T2-2302,05:02:00,05:02:00,11202,4,2901.39,Hartwell Station,-37.843883,145.075426,aus:vic:vic-02-ALM:,Alamein,Alamein - City
4,02-ALM--15-T2-2302,05:04:00,05:04:00,11203,5,3921.94,Willison Station,-37.83567,145.070242,aus:vic:vic-02-ALM:,Alamein,Alamein - City


### Rental Listings Spreadsheet

A spreadsheet of rental listings is collected from realestate.com.au to be ranked based on their distance to utilities. This list is gathered from multiple suburbs to provide a variety of locations and compare the accessibility of current rental listings from different areas.

In [204]:
rental_listings = pd.read_excel("Rental Listings.xlsx")
rental_listings.head()

Unnamed: 0,Suburb,Postcode,Address
0,Box Hill,3128,"1/8 Ashted Road, Box Hill, Vic"
1,Box Hill,3128,"1112/850 Whitehorse Road, Box Hill, Vic"
2,Hawthorn,3122,"3/494 Glenferrie Road, Hawthorn, Vic"
3,Hawthorn,3122,"517B/200 Burwood Road, Hawthorn, Vic"
4,Doncaster,3108,"6/77-79 Wetherby Road, Doncaster, Vic"


The code below processes the addresses to remove the unit/apartment numbers.

In [205]:
rental_listings['Address'] = rental_listings['Address'].str.split("/").str[-1]
rental_listings.head()

Unnamed: 0,Suburb,Postcode,Address
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic"
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic"
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic"
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic"
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic"
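
Splitting on "/" and keeping the last piece works for the addresses shown above, but it would also cut an address that contains a "/" later in the string. As a hedged alternative, the sketch below removes only a leading unit prefix with a regular expression. The pattern `UNIT_PREFIX`, the helper `strip_unit`, and the last example address are hypothetical and are not part of the notebook.

```python
import re

# Matches a leading unit/apartment token followed by "/", e.g. "1/", "517B/", "1112/".
UNIT_PREFIX = re.compile(r"^\s*[\w-]+\s*/\s*")


def strip_unit(address: str) -> str:
    """Remove one leading 'unit/' prefix and leave the rest of the address intact."""
    return UNIT_PREFIX.sub("", address, count=1)


examples = [
    "1/8 Ashted Road, Box Hill, Vic",
    "517B/200 Burwood Road, Hawthorn, Vic",
    "6/77-79 Wetherby Road, Doncaster, Vic",
    "12 Example Street, Box Hill, Vic",  # hypothetical address without a unit number
]
cleaned = [strip_unit(a) for a in examples]
for original, result in zip(examples, cleaned):
    print(f"{original!r} -> {result!r}")
```

In the notebook this could be applied with `rental_listings['Address'].apply(strip_unit)`, assuming the column holds plain strings.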


### Generate latitude and longitude from address

The below code is a function to return the latitude and longitude from a passed in address.

In [206]:
#Function to geocode an address using Nominatim
def geocode_address(address):
    geolocator = Nominatim(user_agent="mapping_app1.0",timeout=15)
    
    #Define the bounding box for Melbourne
    melbourne_bbox = [(-38.5267, 144.5937), (-37.5113, 145.5125)] 
    
    #Geocode the address within the Melbourne bounding box
    location = geolocator.geocode(address, viewbox=melbourne_bbox, bounded=True)
    
    if location:
        return location.latitude, location.longitude
    else:
        return None
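
Because every call to geocode_address goes out to a remote service, repeated addresses cost repeated requests. The sketch below shows one possible way to cache results so that each distinct address is looked up once. It is an illustration under stated assumptions: `make_cached_geocoder` and `fake_lookup` are hypothetical names, and the stand-in lookup returns fixed, made-up coordinates instead of calling Nominatim.

```python
from functools import lru_cache
from typing import Callable, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def make_cached_geocoder(lookup: Callable[[str], Coords]) -> Callable[[str], Coords]:
    """Wrap a geocoding callable so each normalised address is looked up once."""

    @lru_cache(maxsize=None)
    def _cached(normalised: str) -> Coords:
        # The normalised (lower-cased, whitespace-collapsed) string is what gets looked up.
        return lookup(normalised)

    def geocode(address: str) -> Coords:
        # Collapse whitespace and case so trivially different spellings share one entry.
        return _cached(" ".join(address.split()).lower())

    return geocode


# Stand-in for the real geocoder: records calls and returns made-up coordinates.
calls = []


def fake_lookup(address: str) -> Coords:
    calls.append(address)
    return (-37.0, 145.0)


cached = make_cached_geocoder(fake_lookup)
first = cached("8 Ashted Road, Box Hill, Vic")
second = cached("8  ashted road, Box Hill, VIC")
print(first, second, len(calls))
```

In the notebook, the wrapper could be built with `make_cached_geocoder(geocode_address)`. The public Nominatim service also asks clients to limit their request rate, and geopy provides `geopy.extra.rate_limiter.RateLimiter` for spacing out calls; whether that is needed depends on how many addresses are geocoded.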
    


After applying the function to the rental listings DataFrame, the coordinates of each rental property are gathered.

In [207]:
rental_listings['coords'] = rental_listings['Address'].apply(geocode_address)
rental_listings.head()

Unnamed: 0,Suburb,Postcode,Address,coords
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic","(-37.8231233, 145.1236049)"
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic","(-37.8176227, 145.1184549)"
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic","(-37.8318201, 145.0338629)"
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic","(-37.8221063, 145.0299116)"
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic","(-37.79392, 145.1419781)"


In the below code, the "coords" column is divided into latitude and longitude columns for each rental listing.

In [208]:
# Copy after dropping rows so the new columns are added to an independent DataFrame
rental_listings_clean = rental_listings.dropna(subset=['coords']).copy()
rental_listings_clean[['latitude', 'longitude']] = pd.DataFrame(
    rental_listings_clean['coords'].tolist(), index=rental_listings_clean.index)
rental_listings_clean = rental_listings_clean.drop(columns=['coords'])
rental_listings_clean.head()

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978


After the steps above, the rental listings table has latitude and longitude columns that can be used to place each listing on a map.

In [209]:
# Average lat/lon for center
melbourne_center = [rental_listings_clean['latitude'].mean(), rental_listings_clean['longitude'].mean()]
listing_map = folium.Map(location=melbourne_center, zoom_start=13)

# Add rental listing markers
for _, row in rental_listings_clean.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['Address'],
    ).add_to(listing_map)


# Display the map
listing_map


All rental listings are now shown on a map. Clicking a marker shows the listing's address.
The listings are spread across different suburbs. Most of the collected data points are in the south-eastern area of Melbourne, where rental demand is high.

### Find nearest station

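The code below defines a function that returns the nearest station to each rental listing using a KD-tree, a space-partitioning data structure for fast nearest-neighbour search. The KD-tree here is built on raw latitude and longitude, so it ranks neighbours by Euclidean distance in degrees. That is an approximation, but it is adequate for ranking nearby stations locally; the actual distance in meters is then computed with a geodesic calculation.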

In [210]:
def find_nearest_station(lat, lon, kdtree, df):
    # Query the KDTree for the nearest station. The distance it returns is in degrees,
    # so the distance in meters is calculated separately below.
    distances, indices = kdtree.query([[lat, lon]], k=1)

    # Position of the nearest station in the DataFrame
    idx = int(indices[0])

    # Nearest station row
    nearest = df.iloc[idx]

    # Geodesic distance in meters between the listing and the station
    distance_meters = geodesic(
        (lat, lon), (nearest["stop_lat"], nearest["stop_lon"])
    ).meters

    # Return the station name, distance in meters, and station coordinates
    return nearest["stop_name"], distance_meters, nearest["stop_lat"], nearest["stop_lon"]

Then the function is applied to the dataset which contains all coordinates of train stations.

In [211]:
train_stations_coords = train_data_clean[["stop_lat", "stop_lon"]].values
kdtree_train = KDTree(train_stations_coords)
rental_listings_clean[["nearest_station", "distance_meters_station", "nearest_station_lat", "nearest_station_lon"]] = (
    rental_listings_clean.apply(
        lambda r: pd.Series(
            find_nearest_station(r["latitude"], r["longitude"], kdtree_train, train_data_clean)
        ),
        axis=1
    )
)
rental_listings_clean.head()

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605,Box Hill Station,447.359525,-37.819345,145.121835
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455,Box Hill Station,306.547766,-37.819113,145.121386
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978,Laburnum Station,2966.247633,-37.820629,145.14082


In the table above, the nearest station and its coordinates have been added to the listings table, along with the distance from each rental listing to that station.
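
The difference between ranking in degrees and measuring in meters can be illustrated without SciPy or geopy. The sketch below swaps in two simpler techniques: a brute-force search in place of the KD-tree and the haversine formula (a spherical approximation) in place of the geodesic calculation. The function names are hypothetical, and the coordinates are taken from the table above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; treats the Earth as a sphere


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_by_haversine(lat, lon, stations):
    """stations: iterable of (name, lat, lon). Returns (name, distance in meters)."""
    candidates = ((name, haversine_m(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in stations)
    return min(candidates, key=lambda item: item[1])


stations = [
    ("Box Hill Station", -37.819345, 145.121835),
    ("Laburnum Station", -37.820629, 145.14082),
]
name, metres = nearest_by_haversine(-37.823123, 145.123605, stations)
print(name, round(metres))
```

At Melbourne's latitude one degree of longitude spans roughly 88 km while one degree of latitude spans roughly 111 km, so a Euclidean distance in degrees weights east-west offsets more heavily than their true length. A brute-force check like this one can be used to spot cases where that changes which station is ranked nearest.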

In [212]:
# Average lat/lon for center
melbourne_center = [rental_listings_clean['latitude'].mean(), rental_listings_clean['longitude'].mean()]
listing_map = folium.Map(location=melbourne_center, zoom_start=13)

# Add rental listing markers
for _, row in rental_listings_clean.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['Address'],
    ).add_to(listing_map)

    folium.Marker(
        location=[row['nearest_station_lat'], row['nearest_station_lon']],
        popup=row['nearest_station'],
        icon=folium.Icon(color='red', icon='train-subway', prefix='fa')
    ).add_to(listing_map)
# Display the map with listing and nearest station markers
listing_map


### Find nearest school

Similarly, a function is created to find the nearest school, also using KDTree.

In [213]:
def find_nearest_school(lat, lon, kdtree, df):
    # Query the KDTree for the nearest school. The distance it returns is in degrees.
    distance, index = kdtree.query([lat, lon])

    # Get the nearest school's details from the DataFrame
    nearest_school = df.iloc[index]

    # Extract the school's coordinates (Y = latitude, X = longitude)
    nearest_school_coords = (nearest_school["Y"], nearest_school["X"])
    point_coords = (lat, lon)

    # Geodesic distance (in meters) between the listing and the nearest school;
    # computed here but not included in the returned values
    distance_meters = geodesic(point_coords, nearest_school_coords).meters
    school_name = nearest_school.get("School_Name", None)

    return school_name, nearest_school_coords[0], nearest_school_coords[1]

In the code below, the schools' latitude and longitude are extracted to build the KD-tree, and the nearest school is found for each listing.

In [214]:
nearest_school_coords = school_locations[["Y", "X"]].values
kdtree_school = KDTree(nearest_school_coords)
rental_listings_clean[["nearest_school", "school_latitude", "school_longitude"]] = (
    rental_listings_clean.apply(
        lambda r: pd.Series(
            find_nearest_school(r["latitude"], r["longitude"],kdtree_school, school_locations)
        ),
        axis=1
    )
)
rental_listings_clean.head()

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon,nearest_school,school_latitude,school_longitude
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605,Box Hill Station,447.359525,-37.819345,145.121835,Our Lady of Sion College,-37.81835,145.12992
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455,Box Hill Station,306.547766,-37.819113,145.121386,Box Hill Senior Secondary College,-37.80924,145.11191
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395,Scotch College,-37.83392,145.03229
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402,Rossbourne School,-37.82304,145.02655
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978,Laburnum Station,2966.247633,-37.820629,145.14082,Doncaster Secondary College,-37.78458,145.13805


In [215]:
# Average lat/lon for center
melbourne_center = [rental_listings_clean['latitude'].mean(), rental_listings_clean['longitude'].mean()]
listing_map = folium.Map(location=melbourne_center, zoom_start=13)

# Add rental listing markers
for _, row in rental_listings_clean.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['Address'],
    ).add_to(listing_map)
    folium.Marker(
        location=[row['school_latitude'], row['school_longitude']],
        popup=row['nearest_school'],
        icon=folium.Icon(color='green', icon='school', prefix='fa')
    ).add_to(listing_map)

# Display the map
listing_map


The code above adds each listing's nearest school and that school's coordinates to the table. The distance to the school is calculated inside the function but is not returned as a column.

### Workplace Location

In the code below, tenants enter their workplace address, which is geocoded so that their train travel to the city can be calculated.

In [216]:
work_place = input("Enter your workplace address: ")
print(f"You entered: {work_place}")

work_place_coords = geocode_address(work_place)
work_place_coords

You entered: 480 Collins St Melbourne


(-37.817696, 144.9580457)

Then the nearest station to their workplace is found with the find_nearest_station function.

In [217]:
work_place_station = find_nearest_station(work_place_coords[0], work_place_coords[1], kdtree_train, train_data_clean)[0]
work_place_station


'Southern Cross Station'

### Train Travelling Time Calculation

The below function converts a GTFS-style time string like "HH:MM:SS" into a datetime.timedelta. In GTFS, times can go past 24 hours (e.g., "25:10:00") to represent trips after midnight. The code parses the hours, minutes, and seconds; if the hour is 24 or more, it subtracts 24 and returns a timedelta with one extra day plus the remaining hours/minutes/seconds. Otherwise, it returns a normal timedelta for that time since midnight.

In [218]:
from datetime import timedelta
def parse_time(gtfs_time: str) -> timedelta:
    hours, minutes, seconds = map(int, gtfs_time.split(':'))
    if hours >= 24:
        hours = hours - 24
        return timedelta(days=1, hours=hours, minutes=minutes, seconds=seconds)
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
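
To see how times past midnight behave, the short sketch below parses two GTFS-style strings and subtracts them. It uses a hypothetical variant named `parse_gtfs_time` that relies on `timedelta` normalising hours of 24 or more into days by itself, which gives the same result as the explicit branch above.

```python
from datetime import timedelta


def parse_gtfs_time(gtfs_time: str) -> timedelta:
    """Convert 'HH:MM:SS' (hours may exceed 23) into a timedelta since the service day began."""
    hours, minutes, seconds = map(int, gtfs_time.split(":"))
    # timedelta converts hours >= 24 into whole days automatically.
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


depart = parse_gtfs_time("23:50:00")
arrive = parse_gtfs_time("25:10:00")  # 01:10 on the following calendar day
leg_minutes = (arrive - depart).total_seconds() / 60
print(arrive, leg_minutes)  # leg_minutes is 80.0
```

Keeping times as durations rather than clock times means a trip that crosses midnight still produces a positive leg duration.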



The code snippet below re-indexes the merged GTFS table by ("stop_name", "trip_id"), which makes looking up all rows for a given station fast. The check_direct_route function then pulls the stop-time rows for the two stations, merges them on trip_id to find trips that include both stations, and keeps only those where A’s stop_sequence comes before B’s, that is, a direct train that reaches B after A on the same trip. It returns a boolean indicating whether such trips exist, plus the matching trip_ids. Finally, the function is applied row by row to the spreadsheet, comparing each listing’s nearest station with the workplace’s nearest station and producing two new columns: has_direct_route (True/False) and direct_trip_ids (the list of matching trips).

In [219]:
stop_times_clean = train_data_clean.set_index(["stop_name", "trip_id"]).sort_index()

def check_direct_route(stop_a, stop_b, stop_times_df):

        try:
            stop_a_times = stop_times_df.xs(stop_a, level='stop_name')
            stop_b_times = stop_times_df.xs(stop_b, level='stop_name')
        except KeyError:
            return False, []

        merged = pd.merge(stop_a_times.reset_index(), stop_b_times.reset_index(), on='trip_id', suffixes=('_a', '_b'))
        valid_trips = merged[merged['stop_sequence_a'] < merged['stop_sequence_b']]

        if not valid_trips.empty:
            return True, valid_trips['trip_id'].unique()
        return False, []

rental_listings_clean[["has_direct_route", "direct_trip_ids"]] = (
    rental_listings_clean.apply(
        lambda r: pd.Series(
            check_direct_route(str(r["nearest_station"]), work_place_station, stop_times_clean)
        ),
        axis=1
    )
)
rental_listings_clean.head()

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon,nearest_school,school_latitude,school_longitude,has_direct_route,direct_trip_ids
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605,Box Hill Station,447.359525,-37.819345,145.121835,Our Lady of Sion College,-37.81835,145.12992,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-..."
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455,Box Hill Station,306.547766,-37.819113,145.121386,Box Hill Senior Secondary College,-37.80924,145.11191,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-..."
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395,Scotch College,-37.83392,145.03229,False,[]
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402,Rossbourne School,-37.82304,145.02655,True,"[02-ALM--1-T5-2800, 02-ALM--1-T5-2802, 02-ALM-..."
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978,Laburnum Station,2966.247633,-37.820629,145.14082,Doncaster Secondary College,-37.78458,145.13805,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-..."
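
The direct-route test can be sketched on a few toy rows using plain dictionaries in place of the pandas merge. Everything in this example is made up for illustration: the stop names "A" to "D", the trip IDs, and the helper names `build_index` and `direct_trips`.

```python
from collections import defaultdict

# Toy stop_times rows: (trip_id, stop_name, stop_sequence). Made-up values.
rows = [
    ("T1", "A", 1), ("T1", "B", 2), ("T1", "C", 3),
    ("T2", "C", 1), ("T2", "B", 2), ("T2", "A", 3),
    ("T3", "A", 1), ("T3", "D", 2),
]


def build_index(stop_rows):
    """Map stop_name -> {trip_id: stop_sequence} for quick per-station lookups."""
    index = defaultdict(dict)
    for trip_id, stop_name, seq in stop_rows:
        # If a stop repeats within a trip, keep its earliest sequence number.
        previous = index[stop_name].get(trip_id)
        index[stop_name][trip_id] = seq if previous is None else min(previous, seq)
    return index


def direct_trips(stop_a, stop_b, index):
    """Return trip IDs that visit stop_a before stop_b on the same trip."""
    seq_a = index.get(stop_a, {})
    seq_b = index.get(stop_b, {})
    shared = seq_a.keys() & seq_b.keys()
    return sorted(t for t in shared if seq_a[t] < seq_b[t])


index = build_index(rows)
print(direct_trips("A", "C", index))
print(direct_trips("C", "A", index))
print(direct_trips("B", "D", index))
```

The sequence comparison is what makes the check directional: trip T2 visits both A and C, but only counts as a direct route from C to A.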


The function below estimates how long a multi-stop public transport route takes. It sorts the stop-times index for fast lookups, then walks the route pair by pair (from station A to station B). For each leg it first checks that there is a direct trip from A to B (using check_direct_route); if not, it returns None. It picks the first matching trip_id, pulls A’s departure and B’s arrival times for that trip, converts them to timedeltas with parse_time, computes the leg duration, and accumulates the total in seconds. At the end it returns the total travel time across all legs in minutes.

In [220]:
def calculate_route_travel_time(route, stops_df, stop_times_df):

        total_travel_time = 0.0

        stop_times_df.sort_index(inplace=True)

        for i in range(len(route) - 1):
            station_a = route[i]
            station_b = route[i + 1]

            direct_route_exists, trip_ids = check_direct_route(station_a, station_b, stop_times_df)
            if not direct_route_exists:
                return None

            best_trip_id = trip_ids[0]

            try:
                stop_a_time = stop_times_df.loc[(station_a, best_trip_id),'departure_time']
                stop_b_time = stop_times_df.loc[(station_b, best_trip_id),'arrival_time']

                if isinstance(stop_a_time, pd.Series):
                    stop_a_time = stop_a_time.iloc[0]
                if isinstance(stop_b_time, pd.Series):
                    stop_b_time = stop_b_time.iloc[0]

            except KeyError:
                return None

            travel_time = parse_time(stop_b_time) - parse_time(stop_a_time)
            total_travel_time += travel_time.total_seconds()

        return total_travel_time / 60  # Return time in minutes




In the function below, if there is a direct route between two stops, the travel time is calculated for that route. Otherwise, it forces a transfer via Flinders Street Station, a common transfer hub for people working in the CBD. In both cases it returns the total travel time (in minutes) computed by calculate_route_travel_time (or None if any leg can’t be found).

In [221]:
def add_travel_time(has_direct_route,station_a,station_b, stops_df,stop_times_df):
    if has_direct_route:
        return calculate_route_travel_time([station_a]+[station_b], stops_df, stop_times_df)
    return calculate_route_travel_time([station_a] + ['Flinders Street Station'] + [station_b], stops_df, stop_times_df)
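
The direct-or-via-hub logic can be shown in isolation with a small lookup table of leg durations. This is a minimal sketch, assuming made-up station names and minutes; `LEG_MINUTES`, `route_minutes`, and `travel_minutes` are hypothetical names and do not query the GTFS data.

```python
from typing import Optional

# Hypothetical leg durations in minutes (made-up values for illustration only).
LEG_MINUTES = {
    ("Station A", "Station B"): 30.0,
    ("Station C", "Hub"): 12.0,
    ("Hub", "Station B"): 4.0,
}


def route_minutes(route, legs=LEG_MINUTES) -> Optional[float]:
    """Sum the leg durations along a route; return None if any leg is unknown."""
    total = 0.0
    for leg in zip(route, route[1:]):
        if leg not in legs:
            return None
        total += legs[leg]
    return total


def travel_minutes(origin, destination, hub="Hub", legs=LEG_MINUTES) -> Optional[float]:
    """Use the direct leg when it exists, otherwise fall back to a transfer at the hub."""
    direct = route_minutes([origin, destination], legs)
    if direct is not None:
        return direct
    return route_minutes([origin, hub, destination], legs)


print(travel_minutes("Station A", "Station B"))  # direct leg
print(travel_minutes("Station C", "Station B"))  # via the hub
print(travel_minutes("Station D", "Station B"))  # no known legs
```

Like the notebook's function, this sketch adds up in-vehicle time only. Waiting time at the transfer station is not modelled, so transfer routes may look faster than they are in practice.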

Finally, the train travel time is calculated for each rental listing.

In [222]:
rental_listings_clean[['train_travel_time']] = (
    rental_listings_clean.apply(
        lambda r: pd.Series(
            add_travel_time(r["has_direct_route"],r['nearest_station'], work_place_station,train_stops, stop_times_clean)
        ),
        axis=1
    )
)
rental_listings_clean.head()

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon,nearest_school,school_latitude,school_longitude,has_direct_route,direct_trip_ids,train_travel_time
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605,Box Hill Station,447.359525,-37.819345,145.121835,Our Lady of Sion College,-37.81835,145.12992,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455,Box Hill Station,306.547766,-37.819113,145.121386,Box Hill Senior Secondary College,-37.80924,145.11191,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395,Scotch College,-37.83392,145.03229,False,[],16.0
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402,Rossbourne School,-37.82304,145.02655,True,"[02-ALM--1-T5-2800, 02-ALM--1-T5-2802, 02-ALM-...",15.0
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978,Laburnum Station,2966.247633,-37.820629,145.14082,Doncaster Secondary College,-37.78458,145.13805,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",33.0


### Car Travelling Time Calculation

This function uses the OpenRouteService Directions API to estimate the travel time between two points. It takes the start and end latitude/longitude plus an api_key (the default profile, i.e. the means of transportation, is "driving-car"), builds a client and requests a route in GeoJSON format. It then reads the first feature’s first segment and iterates over its turn-by-turn steps, summing each step’s duration and distance. Finally, it converts the summed duration from seconds to minutes and returns that value.

In [223]:
def calculate_driving_time(start_lat, start_lng, end_lat, end_lng, api_key, profile='driving-car'):
    # Create an OpenRouteService client instance
    client = openrouteservice.Client(key=api_key)

    # Set up coordinates for the route (OpenRouteService expects [longitude, latitude])
    coordinates = [[start_lng, start_lat], [end_lng, end_lat]]
    # Get the route between the coordinates with the specified profile (e.g. walking or driving)
    route = client.directions(coordinates=coordinates, profile=profile, format='geojson')
    # With two coordinates the route has a single segment; read its turn-by-turn steps
    steps = route['features'][0]['properties']['segments'][0]['steps']
    total_distance = 0  # metres (accumulated for reference, not returned)
    total_duration = 0  # seconds

    # Accumulate total distance and duration over all steps
    for step in steps:
        total_distance += step['distance']
        total_duration += step['duration']

    # Convert seconds to minutes
    return total_duration / 60
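
The function above reads only the first segment, which is sufficient for a request with two coordinates. The sketch below applies the same summation to a hand-written dictionary shaped like the part of the response the function reads, and loops over every segment in case a request with intermediate points returns more than one. The mock_route values and the route_duration_minutes name are illustrative assumptions, not real API output.

```python
def route_duration_minutes(route):
    """Sum step durations (seconds) over all segments and convert to minutes."""
    segments = route["features"][0]["properties"]["segments"]
    total_seconds = sum(
        step["duration"] for segment in segments for step in segment["steps"]
    )
    return total_seconds / 60

# Hand-written stand-in for a response; the numbers are made up for illustration
mock_route = {
    "features": [{
        "properties": {
            "segments": [
                {"steps": [{"distance": 400.0, "duration": 60.0},
                           {"distance": 900.0, "duration": 90.0}]},
                {"steps": [{"distance": 250.0, "duration": 30.0}]},
            ]
        }
    }]
}

print(route_duration_minutes(mock_route))
```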



Lastly, the function is applied to all rental listings to find the driving time from each listing to its nearest school.

In [224]:
import os

# Read the OpenRouteService API key from an environment variable rather than hardcoding it
# (ORS_API_KEY is an assumed variable name)
api_key = os.environ['ORS_API_KEY']

rental_listings_clean['house_school_travel_time'] = rental_listings_clean.apply(
    lambda r: calculate_driving_time(
        r['latitude'], r['longitude'], r['school_latitude'], r['school_longitude'],
        api_key=api_key, profile='driving-car'
    ),
    axis=1
)
rental_listings_clean.head()



Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon,nearest_school,school_latitude,school_longitude,has_direct_route,direct_trip_ids,train_travel_time,house_school_travel_time
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605,Box Hill Station,447.359525,-37.819345,145.121835,Our Lady of Sion College,-37.81835,145.12992,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0,2.675
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455,Box Hill Station,306.547766,-37.819113,145.121386,Box Hill Senior Secondary College,-37.80924,145.11191,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0,3.571667
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395,Scotch College,-37.83392,145.03229,False,[],16.0,0.643333
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402,Rossbourne School,-37.82304,145.02655,True,"[02-ALM--1-T5-2800, 02-ALM--1-T5-2802, 02-ALM-...",15.0,0.921667
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978,Laburnum Station,2966.247633,-37.820629,145.14082,Doncaster Secondary College,-37.78458,145.13805,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",33.0,3.001667
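
Because each row triggers a network request, a single failed call would stop the whole apply. One possible safeguard, sketched below under the assumption that a missing value is acceptable for a failed row, is to wrap the call so that errors become NaN. The safe_minutes helper and the fake_driving_time stand-in are hypothetical; in the notebook the wrapped function would be calculate_driving_time.

```python
import math

def safe_minutes(func, *args, **kwargs):
    """Call func and return its result, or NaN if the call raises an exception."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # Broad catch for illustration; narrower exception types may be preferable
        return math.nan

def fake_driving_time(start_lat, start_lng, end_lat, end_lng):
    """Stand-in for a routing call: fails when start and end are identical."""
    if (start_lat, start_lng) == (end_lat, end_lng):
        raise ValueError("no route between identical points")
    return 2.5

ok = safe_minutes(fake_driving_time, -37.82, 145.12, -37.81, 145.13)
failed = safe_minutes(fake_driving_time, -37.82, 145.12, -37.82, 145.12)
print(ok, failed)
```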


Lastly, the train travel time to the workplace and the driving time to school are summed and stored in a new column called total_travel_time.

In [225]:
rental_listings_clean = rental_listings_clean.assign(
    total_travel_time = rental_listings_clean[['train_travel_time','house_school_travel_time']].sum(axis=1)
)
rental_listings_clean.head(10)

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon,nearest_school,school_latitude,school_longitude,has_direct_route,direct_trip_ids,train_travel_time,house_school_travel_time,total_travel_time
0,Box Hill,3128,"8 Ashted Road, Box Hill, Vic",-37.823123,145.123605,Box Hill Station,447.359525,-37.819345,145.121835,Our Lady of Sion College,-37.81835,145.12992,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0,2.675,32.675
1,Box Hill,3128,"850 Whitehorse Road, Box Hill, Vic",-37.817623,145.118455,Box Hill Station,306.547766,-37.819113,145.121386,Box Hill Senior Secondary College,-37.80924,145.11191,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0,3.571667,33.571667
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395,Scotch College,-37.83392,145.03229,False,[],16.0,0.643333,16.643333
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402,Rossbourne School,-37.82304,145.02655,True,"[02-ALM--1-T5-2800, 02-ALM--1-T5-2802, 02-ALM-...",15.0,0.921667,15.921667
4,Doncaster,3108,"77-79 Wetherby Road, Doncaster, Vic",-37.79392,145.141978,Laburnum Station,2966.247633,-37.820629,145.14082,Doncaster Secondary College,-37.78458,145.13805,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",33.0,3.001667,36.001667
5,Doncaster,3108,"21 Sargent Street, Doncaster, Vic",-37.796604,145.12161,Box Hill Station,2498.408059,-37.819113,145.121386,Box Hill North Primary School,-37.80079,145.12331,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",30.0,4.288333,34.288333
6,Mitcham,3132,"1A Bruce Street, Mitcham, Vic",-37.812899,145.185698,Mitcham Station,869.709394,-37.817984,145.193212,St John's School,-37.81744,145.1904,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",41.0,2.39,43.39
7,Mitcham,3132,"12 Norman Street, Mitcham, Vic",-37.825342,145.19773,Mitcham Station,901.263971,-37.818047,145.193234,Rangeview Primary School,-37.82958,145.2063,True,"[02-BEG--1-T5-3600, 02-BEG--1-T5-3602, 02-BEG-...",41.0,5.045,46.045
8,Burwood,3125,"15 Cumming Street, Burwood, Vic",-37.851436,145.113543,Jordanville Station,2473.906468,-37.873708,145.112463,Mount Scopus Memorial College,-37.84722,145.11777,False,[],31.0,2.216667,33.216667
9,Burwood,3125,"373 Burwood Highway, Burwood, Vic",-37.85137,145.12973,Mount Waverley Station,2651.11569,-37.87522,145.128102,St Scholastica's School,-37.85165,145.12591,False,[],33.0,0.591667,33.591667
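
Note that DataFrame.sum skips missing values by default, so a listing whose train time could not be computed would receive a total equal to its driving time alone and could rank misleadingly high. A minimal sketch of one way to guard against this, using made-up values, is to pass min_count so that the total stays missing unless both legs are present.

```python
import numpy as np
import pandas as pd

# Made-up travel times; the second row has no train time
times = pd.DataFrame({
    "train_travel_time": [30.0, np.nan],
    "house_school_travel_time": [2.5, 3.0],
})

legs = ["train_travel_time", "house_school_travel_time"]
default_total = times[legs].sum(axis=1)              # missing leg is skipped
strict_total = times[legs].sum(axis=1, min_count=2)  # NaN unless both legs exist

print(default_total.tolist())
print(strict_total.tolist())
```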


### Ranking travelling time among rental listings

Based on the total travel time, the 10 most convenient rental listings (those with the shortest combined travel time) are listed below, obtained by sorting total_travel_time in ascending order.

In [226]:
top10_listing = (rental_listings_clean
         .sort_values("total_travel_time", ascending=True)
         .head(10)
         .copy())
top10_listing

Unnamed: 0,Suburb,Postcode,Address,latitude,longitude,nearest_station,distance_meters_station,nearest_station_lat,nearest_station_lon,nearest_school,school_latitude,school_longitude,has_direct_route,direct_trip_ids,train_travel_time,house_school_travel_time,total_travel_time
3,Hawthorn,3122,"200 Burwood Road, Hawthorn, Vic",-37.822106,145.029912,Glenferrie Station,574.354559,-37.821583,145.036402,Rossbourne School,-37.82304,145.02655,True,"[02-ALM--1-T5-2800, 02-ALM--1-T5-2802, 02-ALM-...",15.0,0.921667,15.921667
2,Hawthorn,3122,"494 Glenferrie Road, Hawthorn, Vic",-37.83182,145.033863,Kooyong Station,890.688011,-37.839836,145.033395,Scotch College,-37.83392,145.03229,False,[],16.0,0.643333,16.643333
27,Pascoe vale,3044,"14 BOLINGBROKE STREET, PASCOE",-37.732244,144.936236,Pascoe Vale Station,706.717089,-37.731245,144.928318,Pascoe Vale Primary School,-37.73122,144.93726,True,"[02-CGB--1-T2-5200, 02-CGB--1-T3-5460, 02-CGB-...",20.0,1.186667,21.186667
23,Caulfield,3162,"7-13 Dudley Street, Caulfield, Vic",-37.880196,145.047438,Caulfield Station,549.733363,-37.877286,145.042382,St Anthony's School,-37.88689,145.0471,True,"[02-CBE--1-T2-C400, 02-CBE--1-T2-C402, 02-CBE-...",20.0,3.648333,23.648333
17,Brighton,3186,"79-81 Asling Street, Brighton, Vic",-37.90241,145.002165,North Brighton Station,256.66757,-37.904698,145.002587,St James' School,-37.89794,144.99766,False,[],24.0,2.161667,26.161667
26,Carnegie,3163,"1256 Glen Huntly Road, Carnegie, Vic",-37.890143,145.047012,Glen Huntly Station,417.646726,-37.889406,145.042356,St Anthony's School,-37.88689,145.0471,False,[],25.0,1.25,26.25
15,Camberwell,3124,"1101 Toorak Road, Camberwell, Vic",-37.847936,145.075152,Hartwell Station,450.452291,-37.843883,145.075426,Hartwell Primary School,-37.84805,145.08447,True,"[02-ALM--1-T5-2800, 02-ALM--1-T5-2802, 02-ALM-...",26.0,2.311667,28.311667
34,Deer Park,3023,"15 LOXWOOD COURT, DEER",-37.780379,144.755361,St Albans Station,5556.139202,-37.744856,144.79979,St Lawrence Catholic Primary School,-37.79068,144.76052,True,"[02-SUY--1-T2-6000, 02-SUY--1-T3-6088, 02-SUY-...",25.0,3.385,28.385
24,Hughesdale,3166,"89 Kangaroo Road, Hughesdale, Vic",-37.898793,145.078095,Hughesdale Station,567.731399,-37.894022,145.07577,Oakleigh Grammar,-37.89591,145.08282,True,"[02-CBE--1-T2-C400, 02-CBE--1-T2-C402, 02-CBE-...",27.0,3.218333,30.218333
25,Chadstone,3148,"2 Kelly Street, Chadstone, Vic",-37.885375,145.093011,Holmesglen Station,1223.90697,-37.874616,145.089963,Salesian College Chadstone,-37.88273,145.10026,False,[],28.0,3.53,31.53


The above table shows the top 10 rental listings ranked by total travel time, shortest first.
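
As a side note, pandas also offers nsmallest, which selects the rows with the lowest values of a column without an explicit sort-then-head step. The sketch below compares the two approaches on a small made-up table; the addresses and times are placeholders, not rows from the dataset.

```python
import pandas as pd

# Placeholder listings for illustration only
listings = pd.DataFrame({
    "Address": ["A", "B", "C", "D"],
    "total_travel_time": [33.6, 15.9, 46.0, 21.2],
})

# Approach used in the notebook: sort ascending, then take the first rows
top2_sorted = listings.sort_values("total_travel_time", ascending=True).head(2)

# Alternative: pick the rows with the smallest values directly
top2_nsmallest = listings.nsmallest(2, "total_travel_time")

print(top2_sorted["Address"].tolist())
print(top2_nsmallest["Address"].tolist())
```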

### Visualise the top 10 rental listings

The function below uses the OpenRouteService API to return a route in GeoJSON format, which can be used to draw the route between two points on a map.

In [229]:
def get_route(start_lat, start_lng, end_lat, end_lng, api_key, profile='driving-car'):
    #Create an OpenRouteService client instance
    client = openrouteservice.Client(key=api_key)

    #Set up coordinates for the route
    coordinates = [[start_lng, start_lat], [end_lng, end_lat]]

    #Get the route between the coordinates with the specified profile (e.g. can have walking/driving)
    route = client.directions(coordinates=coordinates, profile=profile, format='geojson')

    #Return the route in GeoJSON format
    return route
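
One detail worth keeping in mind when working with the returned route: GeoJSON stores positions as [longitude, latitude], whereas folium markers and polylines expect [latitude, longitude]. folium.GeoJson handles this itself, but if the route geometry is drawn manually the pairs need to be swapped. The sketch below uses a hand-written LineString with illustrative coordinates; the route_to_latlon name is hypothetical.

```python
def route_to_latlon(route):
    """Return the first feature's LineString positions as (lat, lon) pairs."""
    coords = route["features"][0]["geometry"]["coordinates"]
    # Index explicitly so an optional third (elevation) value is ignored
    return [(position[1], position[0]) for position in coords]

# Hand-written stand-in for a GeoJSON route; coordinates are illustrative
mock_route = {
    "features": [{
        "geometry": {
            "type": "LineString",
            "coordinates": [[145.0364, -37.8216], [144.9525, -37.8184]],
        }
    }]
}

print(route_to_latlon(mock_route))
```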

The code below visualises the rental listings. For each listing it adds a marker for the listing itself (with the address in the popup), a red train icon at the listing’s nearest station and a green school icon at its nearest school. It also adds a briefcase marker at work_place_coords and draws a route from each nearest station to the workplace, with the train travel time in its popup. Because get_route defaults to the "driving-car" profile, the drawn line is a visual approximation of the journey rather than the actual rail line.

In [244]:
# Centre the map on the average latitude/longitude of the top 10 listings
melbourne_center = [top10_listing['latitude'].mean(), top10_listing['longitude'].mean()]
top_listing_map = folium.Map(location=melbourne_center, zoom_start=11)

# Add the workplace marker once (it is the same for every listing)
folium.Marker(
    location=[work_place_coords[0], work_place_coords[1]],
    popup=work_place,
    icon=folium.Icon(color='orange', icon='briefcase', prefix='fa')
).add_to(top_listing_map)

for _, row in top10_listing.iterrows():
    # Rental listing marker
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['Address'],
    ).add_to(top_listing_map)

    # Nearest train station marker
    folium.Marker(
        location=[row['nearest_station_lat'], row['nearest_station_lon']],
        popup=row['nearest_station'],
        icon=folium.Icon(color='red', icon='train-subway', prefix='fa')
    ).add_to(top_listing_map)

    # Nearest school marker
    folium.Marker(
        location=[row['school_latitude'], row['school_longitude']],
        popup=row['nearest_school'],
        icon=folium.Icon(color='green', icon='school', prefix='fa')
    ).add_to(top_listing_map)

    # Get a route between the nearest station and the workplace
    # (get_route defaults to the driving-car profile, so the line only approximates the train journey)
    transport = get_route(row['nearest_station_lat'], row['nearest_station_lon'], work_place_coords[0], work_place_coords[1], api_key)

    # Add the route to the map with the train travel time in its popup
    gj = folium.GeoJson(transport, name="public transport route", style_function=lambda x: {'color': 'blue'})
    folium.Popup(f"Train travel time: {row['train_travel_time']} minutes").add_to(gj)
    gj.add_to(top_listing_map)

# Display the map
top_listing_map


The top 10 rental listings are visualised on a map. When tenants click on a route, a popup shows the train travel time for that listing.

**From the map, some insights can be drawn:**

* **Where the top 10 cluster:** Most listings sit in the inner-east / south-east corridor (Hawthorn, Camberwell, Caulfield, Carnegie, Hughesdale, Chadstone, Brighton). Only Pascoe Vale (north) and Deer Park (outer west) are outliers. Routes visibly converge on the CBD hub, which matches Melbourne’s rail topology.

* **Accessibility to the station:** Nine of the ten listings are less than 1.3 km from their nearest station. Deer Park stands out at 5.6 km from its nearest metro station (St Albans).

* **Direct vs transfer:** Six of the top listings have a direct route to Southern Cross Station: Hawthorn (via Glenferrie), Pascoe Vale, Caulfield, Camberwell, Deer Park and Hughesdale. Tenants whose nearest station is Kooyong, North Brighton, Glen Huntly or Holmesglen will have to transfer at Flinders Street Station.

* **Commute time ranking:** The two fastest listings are both in Hawthorn, each with a total travel time under 17 minutes. Pascoe Vale is the runner-up at about 21.2 minutes, followed by Caulfield at about 23.6 minutes.

* **School proximity:** Most listings are a very short drive from a school. The two Hawthorn listings (nearest stations Glenferrie and Kooyong) are the quickest at under 1 minute; Hughesdale and Caulfield take a little longer, at roughly 3 to 4 minutes, but are still short.
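
The station-access and direct-route figures above can be reproduced from the top 10 table. The sketch below copies the relevant three columns from that table into a small DataFrame (the top10 name is introduced here for illustration) and recomputes the counts.

```python
import pandas as pd

# Values copied from the top 10 table above, in ranked order
top10 = pd.DataFrame({
    "nearest_station": [
        "Glenferrie Station", "Kooyong Station", "Pascoe Vale Station",
        "Caulfield Station", "North Brighton Station", "Glen Huntly Station",
        "Hartwell Station", "St Albans Station", "Hughesdale Station",
        "Holmesglen Station",
    ],
    "distance_meters_station": [
        574.354559, 890.688011, 706.717089, 549.733363, 256.66757,
        417.646726, 450.452291, 5556.139202, 567.731399, 1223.90697,
    ],
    "has_direct_route": [
        True, False, True, True, False, False, True, True, True, False,
    ],
})

# Listings within 1.3 km of their nearest station
within_1300m = int((top10["distance_meters_station"] < 1300).sum())

# Listings with a direct route, and stations that require a transfer
direct_count = int(top10["has_direct_route"].sum())
transfer_stations = top10.loc[~top10["has_direct_route"], "nearest_station"].tolist()

print(within_1300m, direct_count, transfer_stations)
```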

This use case provides a flexible solution for people looking for a place to rent: multiple rental listings can be visualised on a map together with their closest train stations, schools and proximity to the workplace. Tenants can also add other places of interest, such as a friend's or relative's house, and the code can then provide a holistic view of the listings that best suit their preferences. This could save considerable time compared with using multiple platforms, such as Google Maps and PTV, to calculate travelling time.

### Conclusion

Melbourne’s rental hunt is really a transport problem in disguise. For families, the daily reality isn’t a single commute; it’s a chain from home to school and on to work, using different means of transport such as driving, walking or public transport.

That’s why a tool that calculates combined travel time matters. Those times should include driving to the school or station, transfers, and travelling to the CBD by train. Measured this way, two homes with the same rent can differ by hours per week in “time cost”, which quickly outweighs small rent differences. By using this tool, tenants can input their workplace address or any other place of interest (such as a shopping mall, a friend's house or a doctor's address) to get a holistic view of how well each rental listing meets their needs, and choose the most suitable one.
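
To make the "hours per week" point concrete, the sketch below compares two one-way totals from the total_travel_time column computed earlier. It assumes, purely for illustration, five commuting days per week and a return trip that takes as long as the outbound one; rents for the two listings are not part of the dataset shown here.

```python
# One-way totals in minutes, taken from the total_travel_time column above
fast_listing = 15.921667   # 200 Burwood Road, Hawthorn
slow_listing = 46.045      # 12 Norman Street, Mitcham

# Assumptions for illustration: 5 commuting days per week, 2 trips per day
trips_per_week = 5 * 2

extra_minutes = (slow_listing - fast_listing) * trips_per_week
extra_hours = extra_minutes / 60

print(round(extra_hours, 1))
```

Under these assumptions the slower listing costs roughly five additional hours of travel per week.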