<div class="usecase-title">Public Transport Demand </div>

<div class="usecase-authors"><b>Authored by: </b> Taehwan Jung</div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>
<div class="usecase-level"><b>Level:</b> Intermediate</div>
<div class="usecase-skill"><b>Pre-requisite skills:</b> Python, Haversine distance, Folium</div>
    

**Scenario** 

Efficient public transport is essential for reducing congestion and improving urban mobility in Melbourne. This project focuses on analyzing pedestrian data in conjunction with train station and bus stop locations to estimate public transport demand in specific areas and evaluate service efficiency. By identifying patterns in pedestrian activity and correlating these with high-demand areas and times, the city aims to better understand how current services align with actual demand.

**User story**

As a public transport planner, I want to understand which areas have the highest pedestrian demand for public transport so that I can allocate resources effectively and plan additional services where needed.


At the end of this use case you will: (to be completed)

**Datasets**

**1. Pedestrian Counting per Hour:**
Contains hourly data on the number of pedestrians counted at various sensor locations throughout Melbourne, including sensor IDs, timestamps, and location details.

**2. Bus Stops:** 
Shows the locations of bus stops within the City of Melbourne. As the City of Melbourne does not run the bus services, this dataset only shows where the stops are; it does not include the services that run from each stop.

**3. Metro train stations with accessibility information:** 
Contains the locations of train stations and their accessibility information, such as whether a hearing loop or a lift is available.

## Required modules

In [9]:
import requests
import pandas as pd
import numpy as np
from io import StringIO
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import geopandas as gpd
from folium.plugins import MarkerCluster
import warnings
warnings.filterwarnings('ignore')

## Data loading

In [11]:
# Download a complete dataset from the City of Melbourne Open Data API as a DataFrame
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',      # all columns
        'limit': -1,        # no row limit: export every record
        'lang': 'en',
        'timezone': 'UTC',
    }

    response = requests.get(url, params=params)

    if response.status_code == 200:
        # the export is a semicolon-delimited CSV
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None
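To check exactly which request `collect_data` sends (for example, to open the export in a browser before downloading almost two million pedestrian rows), the URL can be assembled without any network call. This is a sketch using the standard library's `urllib.parse.urlencode`; `build_export_url` and `BASE_URL` are hypothetical names introduced here, and the parameters simply mirror the ones in `collect_data`.

```python
from urllib.parse import urlencode

# same base URL as in collect_data
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

def build_export_url(dataset_id, export_format='csv'):
    """Return the export URL, with query string, for a dataset (hypothetical helper)."""
    params = {
        'select': '*',
        'limit': -1,
        'lang': 'en',
        'timezone': 'UTC',
    }
    # urlencode percent-encodes the values, e.g. '*' becomes '%2A'
    return f'{BASE_URL}{dataset_id}/exports/{export_format}?{urlencode(params)}'

print(build_export_url('bus-stops'))
```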

Load Data 1 (pedestrian counting)

In [13]:
# Pedestrian counting data
dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'
data_1 = collect_data(dataset_id)
data_1.shape

(1991544, 9)

Load Data 2 (bus stops)

In [15]:
# Bus stops data
dataset_id_2 = 'bus-stops'
data_2 = collect_data(dataset_id_2)
data_2.shape

(309, 16)

Load Data 3 (train stations)

In [17]:
# Train stations with accessibility data
dataset_id_3 = 'metro-train-stations-with-accessibility-information'
data_3 = collect_data(dataset_id_3)
data_3.shape

(219, 6)

## Data preprocessing

In [19]:
data_1.isna().sum()

id                 0
location_id        0
sensing_date       0
hourday            0
direction_1        0
direction_2        0
pedestriancount    0
sensor_name        0
location           0
dtype: int64

In [20]:
data_2.isna().sum()

geo_point_2d      0
geo_shape         0
prop_id           0
addresspt1        0
addressp_1        0
asset_clas        0
asset_type        0
objectid          0
str_id            0
addresspt         0
asset_subt      309
model_desc        0
mcc_id            0
roadseg_id        0
descriptio        0
model_no          0
dtype: int64

In [21]:
data_3.isna().sum()

geo_point_2d    0
geo_shape       0
he_loop         0
lift            0
pids            0
station         0
dtype: int64

Check for missing values in the three datasets.

In [23]:
data_2.drop(columns=['asset_subt'], inplace=True)

Remove the 'asset_subt' column from the bus stops data, since it is missing in all 309 rows.
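Dropping 'asset_subt' by name works here because the missing-value check shows it is the only incomplete column. A more general sketch is to drop any column whose share of missing values exceeds a threshold. The helper `drop_sparse_columns`, the 0.5 threshold and the toy DataFrame below are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_missing=0.5):
    """Drop columns whose fraction of missing values is greater than max_missing."""
    missing_share = df.isna().mean()  # fraction of NaN/None per column
    to_drop = missing_share[missing_share > max_missing].index
    return df.drop(columns=to_drop)

# toy data: 'asset_subt' is entirely empty, 'model_desc' has one gap
toy = pd.DataFrame({
    'objectid': [1, 2, 3],
    'asset_subt': [np.nan, np.nan, np.nan],
    'model_desc': ['Bus Stop', None, 'Bus Stop'],
})
cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```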

In [25]:
# convert 'sensing_date' from object (string) type to pandas datetime type
data_1['sensing_date'] = pd.to_datetime(data_1['sensing_date'])

# check the data type
print(data_1['sensing_date'].dtypes)

datetime64[ns]


Convert the 'sensing_date' column in the pedestrian dataset to pandas datetime format.
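A minimal sketch of what this conversion enables, using a made-up Series instead of the real 'sensing_date' column: `errors='coerce'` turns unparseable strings into `NaT` instead of raising, and the `.dt` accessor then exposes parts of each date such as the weekday, which could help separate weekday from weekend demand.

```python
import pandas as pd

# made-up date strings standing in for 'sensing_date' values
raw = pd.Series(['2022-11-17', '2022-11-18', 'not a date'])

# unparseable values become NaT (missing) instead of raising an error
parsed = pd.to_datetime(raw, errors='coerce')

print(parsed.dtype)
print(parsed.isna().sum(), 'value(s) could not be parsed')
print(parsed.dt.day_name().tolist())
```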

In [27]:
# Create numeric 'latitude' and 'longitude' columns from each "lat, lon" text column
# data_1: pedestrian sensors ('location')
data_1[['latitude', 'longitude']] = data_1['location'].str.split(',', expand=True)
data_1['latitude'] = pd.to_numeric(data_1['latitude'], errors='coerce')
data_1['longitude'] = pd.to_numeric(data_1['longitude'], errors='coerce')

# data_2: bus stops ('geo_point_2d')
data_2[['latitude', 'longitude']] = data_2['geo_point_2d'].str.split(',', expand=True)
data_2['latitude'] = pd.to_numeric(data_2['latitude'], errors='coerce')
data_2['longitude'] = pd.to_numeric(data_2['longitude'], errors='coerce')

# data_3: train stations ('geo_point_2d')
data_3[['latitude', 'longitude']] = data_3['geo_point_2d'].str.split(',', expand=True)
data_3['latitude'] = pd.to_numeric(data_3['latitude'], errors='coerce')
data_3['longitude'] = pd.to_numeric(data_3['longitude'], errors='coerce')

Split the 'location' column (pedestrian data) and the 'geo_point_2d' columns (bus stop and train station data) into numeric 'latitude' and 'longitude' columns.
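The three split-and-convert steps repeat the same pattern, so they could be wrapped in one helper. This is a sketch: `split_lat_lon` is a hypothetical name, and unlike the cell above it returns a copy, strips whitespace around each part and splits only at the first comma.

```python
import pandas as pd

def split_lat_lon(df, col):
    """Return a copy of df with numeric 'latitude'/'longitude' parsed from a 'lat, lon' text column."""
    out = df.copy()
    parts = out[col].str.split(',', n=1, expand=True)
    # values that cannot be parsed (or are missing) become NaN
    out['latitude'] = pd.to_numeric(parts[0].str.strip(), errors='coerce')
    out['longitude'] = pd.to_numeric(parts[1].str.strip(), errors='coerce')
    return out

# one real-looking value taken from the output below, plus one malformed value
sample = pd.DataFrame({'location': ['-37.81295822, 144.95678789', 'bad value']})
result = split_lat_lon(sample, 'location')
print(result[['latitude', 'longitude']])
```

Applied to the notebook, this would be, for example, `data_2 = split_lat_lon(data_2, 'geo_point_2d')`.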

In [29]:
# add a calendar-date column to the pedestrian data (data_1)
# note: data_1_1 refers to the same DataFrame as data_1 (not a copy), so data_1 gets the column too
data_1_1 = data_1
data_1_1['year_month_day'] = data_1['sensing_date'].dt.date

# sum the hourly counts into a daily total per sensor
data_1_daily = data_1_1.groupby(['sensor_name','location_id', 'year_month_day', 'location','latitude','longitude']).agg({
    'pedestriancount': 'sum'
}).reset_index()
data_1_daily.head()

|   | sensor_name | location_id | year_month_day | location | latitude | longitude | pedestriancount |
|---|---|---|---|---|---|---|---|
| 0 | 261Will_T | 108 | 2022-11-17 | -37.81295822, 144.95678789 | -37.812958 | 144.956788 | 3993 |
| 1 | 261Will_T | 108 | 2022-11-18 | -37.81295822, 144.95678789 | -37.812958 | 144.956788 | 11594 |
| 2 | 261Will_T | 108 | 2022-11-19 | -37.81295822, 144.95678789 | -37.812958 | 144.956788 | 3200 |
| 3 | 261Will_T | 108 | 2022-11-20 | -37.81295822, 144.95678789 | -37.812958 | 144.956788 | 2855 |
| 4 | 261Will_T | 108 | 2022-11-21 | -37.81295822, 144.95678789 | -37.812958 | 144.956788 | 11506 |


Aggregate the pedestrian data daily by summing the 'pedestriancount' column
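To see what the groupby does, here is a toy version with four hourly rows; the counts for '261Will_T' are taken from the hourly output in this notebook, while 'Sensor_B' and its count are made up. It uses `.dt.date`, which yields the same calendar date as `.dt.to_period('D').dt.end_time.dt.date`.

```python
import pandas as pd

# toy hourly records for two sensors
hourly = pd.DataFrame({
    'sensor_name': ['261Will_T', '261Will_T', '261Will_T', 'Sensor_B'],
    'sensing_date': pd.to_datetime(['2022-11-17', '2022-11-17', '2022-11-18', '2022-11-17']),
    'hourday': [16, 17, 0, 16],
    'pedestriancount': [312, 1692, 55, 100],
})

# calendar date of each record, then one total per sensor and day
hourly['year_month_day'] = hourly['sensing_date'].dt.date
daily = (hourly.groupby(['sensor_name', 'year_month_day'], as_index=False)['pedestriancount']
               .sum())
print(daily)
```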

In [31]:
# select the columns needed for hourly analysis
data_1_hourly = data_1_1[['sensor_name','year_month_day', 'location_id', 'hourday', 'location', 'pedestriancount','latitude','longitude']]

# sort by sensor, date and hour of day (latitude/longitude only break ties)
data_1_hourly = data_1_hourly.sort_values(by=['sensor_name','year_month_day','hourday','latitude','longitude']).reset_index(drop=True)

# check result
data_1_hourly.head(10)

|   | sensor_name | year_month_day | location_id | hourday | location | pedestriancount | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 261Will_T | 2022-11-17 | 108 | 16 | -37.81295822, 144.95678789 | 312 | -37.812958 | 144.956788 |
| 1 | 261Will_T | 2022-11-17 | 108 | 17 | -37.81295822, 144.95678789 | 1692 | -37.812958 | 144.956788 |
| 2 | 261Will_T | 2022-11-17 | 108 | 18 | -37.81295822, 144.95678789 | 758 | -37.812958 | 144.956788 |
| 3 | 261Will_T | 2022-11-17 | 108 | 19 | -37.81295822, 144.95678789 | 436 | -37.812958 | 144.956788 |
| 4 | 261Will_T | 2022-11-17 | 108 | 20 | -37.81295822, 144.95678789 | 264 | -37.812958 | 144.956788 |
| 5 | 261Will_T | 2022-11-17 | 108 | 21 | -37.81295822, 144.95678789 | 228 | -37.812958 | 144.956788 |
| 6 | 261Will_T | 2022-11-17 | 108 | 22 | -37.81295822, 144.95678789 | 218 | -37.812958 | 144.956788 |
| 7 | 261Will_T | 2022-11-17 | 108 | 23 | -37.81295822, 144.95678789 | 85 | -37.812958 | 144.956788 |
| 8 | 261Will_T | 2022-11-18 | 108 | 0 | -37.81295822, 144.95678789 | 55 | -37.812958 | 144.956788 |
| 9 | 261Will_T | 2022-11-18 | 108 | 1 | -37.81295822, 144.95678789 | 36 | -37.812958 | 144.956788 |


Select the columns needed for hourly analysis and sort the pedestrian data by sensor, date and hour.
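The hourly table supports the scenario's goal of finding high-demand times. As a sketch on made-up data for two hypothetical sensors 'A' and 'B', each sensor's average count per hour of day and its peak hour can be computed with `groupby`, `unstack` and `idxmax`; on the real data the same steps would run on 'data_1_hourly'.

```python
import pandas as pd

# made-up hourly counts for two hypothetical sensors over two days
hourly = pd.DataFrame({
    'sensor_name':     ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'hourday':         [8, 17, 8, 17, 8, 17, 8, 17],
    'pedestriancount': [900, 1500, 1100, 1700, 400, 300, 600, 500],
})

# average count per sensor and hour of day, one column per hour
profile = (hourly.groupby(['sensor_name', 'hourday'])['pedestriancount']
                 .mean()
                 .unstack('hourday'))

# hour with the highest average count for each sensor
peak_hour = profile.idxmax(axis=1)
print(profile)
print(peak_hour)
```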

In [33]:
# keep one row (the first record) per sensor, used only to plot sensor locations
data_1_map = data_1.drop_duplicates(subset='sensor_name')

Create a dataframe with one row per sensor for mapping the sensor locations.
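Because `drop_duplicates` keeps the first row for each sensor, the 'pedestriancount' attached to each map marker is a single hourly reading. If a more representative figure is wanted, one option (an assumption, not the original approach) is each sensor's average daily count from 'data_1_daily'. A sketch on toy data, where sensor 'B' and all coordinates are made up:

```python
import pandas as pd

# toy daily totals; sensor names and coordinates are illustrative only
daily = pd.DataFrame({
    'sensor_name': ['A', 'A', 'B'],
    'latitude': [-37.8130, -37.8130, -37.8100],
    'longitude': [144.9568, 144.9568, 144.9600],
    'pedestriancount': [3993, 11594, 2000],
})

# one row per sensor with its average daily count (named aggregation)
sensor_summary = (daily.groupby(['sensor_name', 'latitude', 'longitude'], as_index=False)
                       .agg(avg_daily_count=('pedestriancount', 'mean')))
print(sensor_summary)
```

On the real data this would be the same call on 'data_1_daily', and the popup could then show 'avg_daily_count' instead of a single hourly value.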

## Data visualization

In [38]:
# Creating a map providing the location information of the three datasets

map_melbourne = folium.Map(location=[-37.80841814,144.95906317], zoom_start=14, width=1000, height=600, control_scale=True)
# Pedestrian data
for index, row in data_1_map.iterrows():
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=5,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
        popup=f"Sensor: {row['sensor_name']}<br>Hourly count (first record): {row['pedestriancount']}",
    ).add_to(map_melbourne)

# Bus stop data (data_2), shown in red
for index, row in data_2.iterrows():
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=5,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.6,
        popup=f"Bus stop object ID: {row['objectid']}",
    ).add_to(map_melbourne)

# Train station data (data_3), shown in green
for index, row in data_3.iterrows():
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=5,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.6,
        popup=f"Location: {row['station']}",
    ).add_to(map_melbourne)

# Adding a legend
legend_html = '''
<div style="position: fixed;
     top: 10px; left: 10px; width: 150px; height: 90px;
     border:2px solid grey; z-index:9999; font-size:14px;
     background-color: white; padding: 10px;">
     <b>Legend</b><br>
     <i style="color:blue;" class="fa fa-circle"></i> Ped sensors<br>
     <i style="color:red;" class="fa fa-circle"></i> Bus stops<br>
     <i style="color:green;" class="fa fa-circle"></i> Train stations</div>
'''
map_melbourne.get_root().html.add_child(folium.Element(legend_html))

# save the map as an HTML file and display it in the notebook
from IPython.display import display
map_melbourne.save('melbourne_map.html')
display(map_melbourne)
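The prerequisites list Haversine distance, which is a natural way to relate each pedestrian sensor to its nearest bus stop or train station. Below is a minimal NumPy sketch, assuming a spherical Earth with a mean radius of 6,371 km and the numeric 'latitude'/'longitude' columns created earlier; `haversine_m` and `nearest_distance_m` are hypothetical helper names.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees (NumPy-broadcastable)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def nearest_distance_m(src_lat, src_lon, dst_lat, dst_lon):
    """For each source point, the distance in metres to the closest destination point."""
    # broadcast sources as a column (n, 1) against destinations as a row (1, m)
    d = haversine_m(np.asarray(src_lat, dtype=float)[:, None],
                    np.asarray(src_lon, dtype=float)[:, None],
                    np.asarray(dst_lat, dtype=float)[None, :],
                    np.asarray(dst_lon, dtype=float)[None, :])
    return d.min(axis=1)
```

For example, `nearest_distance_m(data_1_map['latitude'], data_1_map['longitude'], data_2['latitude'], data_2['longitude'])` would give each sensor's distance to the closest bus stop. Rows with missing coordinates should be dropped from both inputs first, because a single NaN makes the corresponding minimum NaN.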