<div class="usecase-title">New Drinking Fountains in Melbourne with Pedestrian Traffic Data</div>

<div class="usecase-authors"><b>Authored by: </b>Katrine Chan</div>

<div class="usecase-duration"><b>Duration:</b> 60 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python </div>
</div>

<div class="usecase-section-header"><i><b>User Story</b></i></div>

The City of Melbourne wants to make sure everyone has easy access to drinking water, especially in busy areas. To do this, we’re using spatial analysis to figure out where new drinking fountains would be most helpful based on where people are walking. By analyzing pedestrian traffic and calculating distances between current fountain locations and high-traffic areas, we can identify spots to install new fountains. This will help keep everyone hydrated and make it more convenient for people out and about. Our goal is to make sure no one has to go too far for a drink, improving overall comfort and well-being in the city.

<div class="usecase-section-header"><i><b>Scenario</b></i></div>

As a Community Planner at the City of Melbourne, I’m focused on improving access to drinking water across the city. By analyzing pedestrian traffic data and current drinking fountain locations, I’m using the Haversine formula to measure how far each busy area is from its nearest fountain and to identify the best spots for new fountains. This approach helps ensure that busy areas have enough access to drinking water, making it easier for everyone to stay hydrated while out and about. My goal is to make informed decisions about where to place these new fountains to enhance public convenience and overall community well-being.

By the end of this use case you will have:
* Gained an understanding of how to use APIs
* Learned to fetch datasets from the City of Melbourne Open Data platform via its API
* Become familiar with data pre-processing techniques
* Learned to visualize real-world data using appropriate visualization tools
* Gained experience in working with multiple datasets
* Acquired skills in using Folium for mapping
* Learned to perform spatial analysis using the Haversine formula


<div class="usecase-section-header"><i><b>Introduction</b></i></div>

This analysis uses two datasets:

* drinking-fountains - This dataset holds the description, type and geographical location of all drinking fountains in the City of Melbourne.

* Pedestrian Counting System (counts per hour) - This dataset contains hourly pedestrian counts since 2009 from pedestrian sensor devices located across the city. The data is updated monthly and can be used to determine variations in pedestrian activity throughout the day.

# Table of Contents

* [Part 1 - Importing Required Modules](#part1)
* [Part 2 - Retrieving Pedestrian Counting System Data Set](#part2)
* [Part 3 - Understanding and Pre-Processing Pedestrian Counting System Data Set pedestrian_df](#part3)
* [Part 4 - Exploratory Data Analysis of pedestrian_df](#part4)
* [Part 5 - Retrieving Drinking Fountain Data Set](#part5)
* [Part 6 - Understanding and Pre-Processing Drinking Fountain Data Set fountain_df](#part6)
* [Part 7 - Exploratory Data Analysis of fountain_df](#part7)
* [Part 8 - Combined Map with Heatmap of Pedestrian Traffic and Drinking Fountain Locations](#part8)
* [Part 9 - Proposed New Water Fountain Locations](#part9)
* [Part 10 - Conclusion](#part10)
* [Part 11 - Reference](#part11)

<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part1">Part 1 - Importing Required Modules</a></p>

In [1]:
# importing required modules to complete this analysis

import requests 
import pandas as pd 
pd.set_option('display.precision', 15)
import numpy as np 
from io import StringIO 

import matplotlib.pyplot as plt 
import folium
from folium.plugins import HeatMap
from folium.plugins import MousePosition



<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part2">Part 2 - Retrieving Pedestrian Counting System Data Set</a></p>

In [2]:
# Dataset page: https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/
dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'

base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = ''
file_format = 'csv'  # named file_format to avoid shadowing Python's built-in format()

url = f'{base_url}{dataset_id}/exports/{file_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}

# GET request
response = requests.get(url, params=params)

if response.status_code == 200:
    # Decode the response and read the ';'-delimited CSV text through StringIO
    url_content = response.content.decode('utf-8')
    df = pd.read_csv(StringIO(url_content), delimiter=';')
else:
    print(f'Request failed with status code {response.status_code}')
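The same download code is repeated for the drinking fountain data in Part 5. As a minimal sketch, assuming the export endpoint and parameters used in the cell above, it could be wrapped in small reusable helpers. The names `build_export_url`, `parse_export_csv`, `fetch_dataset` and the 60-second `timeout` are hypothetical choices for illustration, not part of any library.

```python
from io import StringIO

import pandas as pd

# Explore API v2.1 catalogue URL used by the download cells in this notebook
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'


def build_export_url(dataset_id, file_format='csv'):
    """Return the bulk-export URL for a dataset: <BASE_URL><dataset_id>/exports/<file_format>."""
    return f'{BASE_URL}{dataset_id}/exports/{file_format}'


def parse_export_csv(csv_text):
    """Parse the semicolon-delimited CSV text returned by the export endpoint."""
    return pd.read_csv(StringIO(csv_text), delimiter=';')


def fetch_dataset(dataset_id, timeout=60):
    """Download every record of a dataset into a DataFrame (hypothetical helper).

    Unlike the cell above, a failed request raises requests.HTTPError instead of
    printing a message, so later cells do not run on a missing or stale DataFrame.
    """
    import requests  # imported here so the two helpers above work without requests installed

    params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
    response = requests.get(build_export_url(dataset_id), params=params, timeout=timeout)
    response.raise_for_status()
    return parse_export_csv(response.content.decode('utf-8'))
```

For example, `fountain_df = fetch_dataset('drinking-fountains')` would do the same job as the Part 5 download cell. Splitting URL building and CSV parsing out of the network call also makes those two steps easy to check offline.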

In [3]:
pedestrian_df = df.copy()

# printing out the first 5 lines of pedestrian_df
pedestrian_df.head(5)

|   | id | location_id | sensing_date | hourday | direction_1 | direction_2 | pedestriancount | sensor_name | location |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 421820211114 | 42 | 2021-11-14 | 18 | 25 | 24 | 49 | UM1_T | -37.80008566, 144.96386412 |
| 1 | 461320231124 | 46 | 2023-11-24 | 13 | 88 | 101 | 189 | Pel147_T | -37.80240719, 144.9615673 |
| 2 | 25420220305 | 25 | 2022-03-05 | 4 | 28 | 14 | 42 | MCEC_T | -37.82401776, 144.95604426 |
| 3 | 30320240804 | 30 | 2024-08-04 | 3 | 141 | 203 | 344 | Lon189_T | -37.8112185, 144.96656806 |
| 4 | 751120240108 | 75 | 2024-01-08 | 11 | 26 | 17 | 43 | SprFli_T | -37.81515276, 144.97467661 |


<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part3">Part 3 - Understanding and Pre-Processing Pedestrian Counting System Data Set pedestrian_df</a></p>

In [4]:
# Understanding the shape of pedestrian_df
num_rows, num_columns = pedestrian_df.shape

print(f'The Pedestrian Counting System Data Frame has {num_rows} rows and {num_columns} columns.')

The Pedestrian Counting System Data Frame has 1850393 rows and 9 columns.


In [5]:
# Displaying Summary information of pedestrian_df dataframe
pedestrian_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1850393 entries, 0 to 1850392
Data columns (total 9 columns):
 #   Column           Dtype 
---  ------           ----- 
 0   id               int64 
 1   location_id      int64 
 2   sensing_date     object
 3   hourday          int64 
 4   direction_1      int64 
 5   direction_2      int64 
 6   pedestriancount  int64 
 7   sensor_name      object
 8   location         object
dtypes: int64(6), object(3)
memory usage: 127.1+ MB


In [6]:
# Count the NA values in each feature
pedestrian_df.isnull().sum()

id                 0
location_id        0
sensing_date       0
hourday            0
direction_1        0
direction_2        0
pedestriancount    0
sensor_name        0
location           0
dtype: int64

There are no null values in the pedestrian_df. 

<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part4">Part 4 - Exploratory Data Analysis of pedestrian_df</a></p>


In [7]:
# pedestriancount is the sum of direction_1 and direction_2 and therefore direction_1 and direction_2 will be removed

# Remove the 'direction_1' and 'direction_2' columns from the DataFrame
pedestrian_df = pedestrian_df.drop(['direction_1', 'direction_2'], axis=1)

# Display pedestrian_df to confirm the changes
pedestrian_df.head()

|   | id | location_id | sensing_date | hourday | pedestriancount | sensor_name | location |
|---|---|---|---|---|---|---|---|
| 0 | 421820211114 | 42 | 2021-11-14 | 18 | 49 | UM1_T | -37.80008566, 144.96386412 |
| 1 | 461320231124 | 46 | 2023-11-24 | 13 | 189 | Pel147_T | -37.80240719, 144.9615673 |
| 2 | 25420220305 | 25 | 2022-03-05 | 4 | 42 | MCEC_T | -37.82401776, 144.95604426 |
| 3 | 30320240804 | 30 | 2024-08-04 | 3 | 344 | Lon189_T | -37.8112185, 144.96656806 |
| 4 | 751120240108 | 75 | 2024-01-08 | 11 | 43 | SprFli_T | -37.81515276, 144.97467661 |
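The cell above drops `direction_1` and `direction_2` on the assumption that `pedestriancount` is their sum. The five rows shown agree, but a one-line check over every row would confirm it before the columns are discarded. This is a sketch; `directions_sum_to_total` is a hypothetical helper name, and it would need to be run on the raw `df` from Part 2, because `pedestrian_df` no longer has the direction columns.

```python
import pandas as pd


def directions_sum_to_total(df):
    """Return True if pedestriancount == direction_1 + direction_2 on every row."""
    return bool((df['direction_1'] + df['direction_2']).eq(df['pedestriancount']).all())


# Rows copied from the first output of pedestrian_df above
sample = pd.DataFrame({
    'direction_1': [25, 88, 28, 141, 26],
    'direction_2': [24, 101, 14, 203, 17],
    'pedestriancount': [49, 189, 42, 344, 43],
})
print(directions_sum_to_total(sample))  # True
```

On the full dataset the call would be `directions_sum_to_total(df)`; a `False` result would mean the direction columns carry information that `pedestriancount` does not.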


In [8]:
# Split the 'location' column into 'lat' and 'lon' columns
pedestrian_df[['lat', 'lon']] = pedestrian_df['location'].str.split(',', expand=True).astype(float)

# Remove the original 'location' column
pedestrian_df = pedestrian_df.drop('location', axis=1)

# Display to confirm the changes
pedestrian_df.head()


|   | id | location_id | sensing_date | hourday | pedestriancount | sensor_name | lat | lon |
|---|---|---|---|---|---|---|---|---|
| 0 | 421820211114 | 42 | 2021-11-14 | 18 | 49 | UM1_T | -37.80008566 | 144.96386412 |
| 1 | 461320231124 | 46 | 2023-11-24 | 13 | 189 | Pel147_T | -37.80240719 | 144.9615673 |
| 2 | 25420220305 | 25 | 2022-03-05 | 4 | 42 | MCEC_T | -37.82401776 | 144.95604426 |
| 3 | 30320240804 | 30 | 2024-08-04 | 3 | 344 | Lon189_T | -37.8112185 | 144.96656806 |
| 4 | 751120240108 | 75 | 2024-01-08 | 11 | 43 | SprFli_T | -37.81515276 | 144.97467661 |


In [9]:
# Calculate and print the number of unique pedestrian sensors in the dataset

unique_sensors = pedestrian_df['sensor_name'].nunique()
print(f'There are {unique_sensors} unique sensors in the data set.')


There are 88 unique sensors in the data set.


In [10]:
# As the dataset is quite large, pedestriancount is summed per sensor (sensor_name, lat, lon) across all records to reduce computational load

aggregated_pedestrian_df = pedestrian_df.groupby(['sensor_name', 'lat', 'lon'], as_index=False).agg({'pedestriancount': 'sum'})

# View the resulting DataFrame
aggregated_pedestrian_df.head()


|   | sensor_name | lat | lon | pedestriancount |
|---|---|---|---|---|
| 0 | 261Will_T | -37.81295822 | 144.95678789 | 4654198 |
| 1 | 280Will_T | -37.81246271 | 144.95690188 | 2118083 |
| 2 | 474Fl_T | -37.81997273 | 144.95834911 | 1255299 |
| 3 | 488Mac_T | -37.79432415 | 144.92973378 | 1973997 |
| 4 | 574Qub_T | -37.80309992 | 144.94908064 | 1302639 |
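The aggregate above sums every hourly count a sensor has recorded, so if sensors have been recording for different lengths of time, the totals partly reflect how long a sensor has existed rather than how busy its location is. One possible alternative, sketched below and not the method used in this analysis, is to sum the hourly counts per sensor per day and then average over the days each sensor actually recorded. The helper name `average_daily_count` and the toy data are hypothetical.

```python
import pandas as pd


def average_daily_count(df):
    """Mean daily pedestrian count per sensor (hypothetical helper).

    Hourly counts are first summed per sensor per sensing_date, then averaged over
    the days each sensor recorded, so sensors with shorter histories are not
    penalised relative to long-running ones.
    """
    daily = (
        df.groupby(['sensor_name', 'lat', 'lon', 'sensing_date'], as_index=False)['pedestriancount']
          .sum()
    )
    return (
        daily.groupby(['sensor_name', 'lat', 'lon'], as_index=False)['pedestriancount']
             .mean()
             .rename(columns={'pedestriancount': 'avg_daily_count'})
    )


# Toy data: sensor A recorded two days, sensor B one day (illustrative values only)
toy = pd.DataFrame({
    'sensor_name': ['A', 'A', 'A', 'B'],
    'lat': [-37.81, -37.81, -37.81, -37.82],
    'lon': [144.96, 144.96, 144.96, 144.95],
    'sensing_date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-01'],
    'pedestriancount': [100, 50, 300, 200],
})
print(average_daily_count(toy))
```

Here sensor A averages (150 + 300) / 2 = 225 pedestrians a day and sensor B 200. Applied as `average_daily_count(pedestrian_df)`, the result could be compared directly with the roughly 340 pedestrians a day that the Part 9 threshold is meant to represent. Days on which a sensor was only partly operating would still pull its average down.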


In [11]:
# Checking if there are any NA values in each of the features 
aggregated_pedestrian_df.isnull().sum()

sensor_name        0
lat                0
lon                0
pedestriancount    0
dtype: int64

In [12]:
# Creating a heat map to visualize the weight of pedestrian counts

# Prepare [lat, lon, weight] rows for the heat map directly from aggregated_pedestrian_df
heat_data = aggregated_pedestrian_df[['lat', 'lon', 'pedestriancount']].values.tolist()

# Create a Folium map centered at an approximate central location
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=14)

# Add the HeatMap layer to the map
HeatMap(heat_data, radius=30).add_to(melbourne_map)

# Display the map
melbourne_map


<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part5">Part 5 - Retrieving Drinking Fountain Data Set</a></p>



In [13]:
# Retrieving the Drinking Fountain Data Set from Melbourne Open Data

# Dataset page: https://data.melbourne.vic.gov.au/explore/dataset/drinking-fountains/information/
dataset_id = 'drinking-fountains'

base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = ''
file_format = 'csv'  # named file_format to avoid shadowing Python's built-in format()

url = f'{base_url}{dataset_id}/exports/{file_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}

# GET request
response = requests.get(url, params=params)

if response.status_code == 200:
    # Decode the response and read the ';'-delimited CSV text through StringIO
    url_content = response.content.decode('utf-8')
    fountain_df = pd.read_csv(StringIO(url_content), delimiter=';')
else:
    print(f'Request failed with status code {response.status_code}')

In [14]:
# Printing out fountain_df
fountain_df.head()

|   | description | co_ordinates | lat | lon |
|---|---|---|---|---|
| 0 | Drinking Fountain - Stainless Steel Drinking F... | -37.82210994675337, 144.93666205920204 | -37.822109946753365 | 144.93666205920204 |
| 1 | Drinking Fountain - Leaf Type - With Bottle Re... | -37.81043106640399, 144.95558395492208 | -37.81043106640399 | 144.95558395492208 |
| 2 | Drinking Fountain - Leaf Type - Dog Bowl - Un... | -37.80089398503696, 144.96074870882546 | -37.80089398503696 | 144.96074870882546 |
| 3 | Drinking Fountain - Leaf Type - Dog Bowl - JJ... | -37.79841970759794, 144.92421993826414 | -37.79841970759794 | 144.92421993826414 |
| 4 | Drinking Fountain - Leaf Type - Dog Bowl - Pr... | -37.7914165845557, 144.96125460876374 | -37.7914165845557 | 144.9612546087637 |


<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part6">Part 6 - Understanding and Pre-Processing Drinking Fountain Data Set fountain_df</a></p>

In [15]:
# Understanding the shape of fountain_df
num_rows, num_columns = fountain_df.shape

print(f'The Drinking Fountain fountain_df has {num_rows} rows and {num_columns} columns.')

The Drinking Fountain fountain_df has 302 rows and 4 columns.


In [16]:
# Displaying Summary information of fountain_df dataframe
fountain_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 302 entries, 0 to 301
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   description   302 non-null    object 
 1   co_ordinates  302 non-null    object 
 2   lat           302 non-null    float64
 3   lon           302 non-null    float64
dtypes: float64(2), object(2)
memory usage: 9.6+ KB


In [17]:
# Check again for NA values in each feature
fountain_df.isnull().sum()

description     0
co_ordinates    0
lat             0
lon             0
dtype: int64

There are 302 registered fountains in the City of Melbourne, and there is no missing data in fountain_df.

<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part7">Part 7 - Exploratory Data Analysis of fountain_df</a></p>

In [18]:
# Print some descriptions to better understand their format

# Set display options to show more content
pd.set_option('display.max_colwidth', None)  # No truncation of column contents

# Print the first few descriptions with full content
print(fountain_df['description'].head(50))


0                    Drinking Fountain - Stainless Steel Drinking Fountain - Leaf Type - Wharfs Landing Park
1                                 Drinking Fountain - Leaf Type - With Bottle Refill Tap - Flagstaff Gardens
2                                              Drinking Fountain - Leaf Type - Dog Bowl  - University Square
3                                                Drinking Fountain - Leaf Type - Dog Bowl  - JJ Holland Park
4                                                   Drinking Fountain - Leaf Type - Dog Bowl  - Princes Park
5                                                     Drinking Fountain - Leaf Type - Dog Bowl  - Royal Park
6                                              Drinking Fountain - Leaf Type - Dog Bowl  - Alexandra Gardens
7                      Drinking Fountain - Stainless Steel Drinking Fountain - Leaf Type - Bottle Refill Tap
8                           Drinking Fountain - Stainless Steel Drinking Fountain - Leaf Type - Princes Park
9                  

In [19]:
# Check whether every description starts with "Drinking Fountain -", ignoring case

prefix = 'drinking fountain -'
matching_entries = fountain_df['description'].str.lower().str.startswith(prefix)

# Count the number of matching entries
count_matching = matching_entries.sum()

# Print the count
print(f"Number of entries starting with '{prefix}': {count_matching}")

Number of entries starting with 'drinking fountain -': 302


As all 302 entries start with "Drinking Fountain -", this prefix carries no information and will now be removed.

In [20]:
# Remove the first 19 characters ("Drinking Fountain -") from each entry, then strip the leading space left behind.
# Slicing, rather than replacing the text, ensures that a "Drinking Fountain" repeated later in an entry is kept.
fountain_df['description'] = fountain_df['description'].str[19:].str.strip()

# Print the updated DataFrame to verify the changes
print(fountain_df['description'].head(50))

0                     Stainless Steel Drinking Fountain - Leaf Type - Wharfs Landing Park
1                                  Leaf Type - With Bottle Refill Tap - Flagstaff Gardens
2                                               Leaf Type - Dog Bowl  - University Square
3                                                 Leaf Type - Dog Bowl  - JJ Holland Park
4                                                    Leaf Type - Dog Bowl  - Princes Park
5                                                      Leaf Type - Dog Bowl  - Royal Park
6                                               Leaf Type - Dog Bowl  - Alexandra Gardens
7                       Stainless Steel Drinking Fountain - Leaf Type - Bottle Refill Tap
8                            Stainless Steel Drinking Fountain - Leaf Type - Princes Park
9                                 Leaf Type - Dog Bowl  - Canning & Palmerston St Reserve
10                                 Leaf Type - With Bottle Refill Tap - Alexandra Gardens
11        
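Slicing off 19 characters relies on every entry starting with exactly "Drinking Fountain -". A sketch of a more defensive alternative, assuming pandas' regex-based `Series.str.replace`, removes the prefix only when it is actually present at the start, in any letter case and with any spacing around the hyphen. The names `PREFIX_PATTERN` and `strip_prefix` are hypothetical, and the third sample string is an invented variant used only to show the case and spacing handling.

```python
import pandas as pd

# Leading "Drinking Fountain -" in any letter case, with optional surrounding whitespace.
# The ^ anchor means a "Drinking Fountain" later in the text is never touched.
PREFIX_PATTERN = r'^\s*drinking fountain\s*-\s*'


def strip_prefix(descriptions):
    """Remove only the leading 'Drinking Fountain -' label from each description."""
    return descriptions.str.replace(PREFIX_PATTERN, '', case=False, regex=True)


sample = pd.Series([
    'Drinking Fountain - Stainless Steel Drinking Fountain - Leaf Type - Wharfs Landing Park',
    'Drinking Fountain - Leaf Type - Dog Bowl  - Royal Park',
    'DRINKING FOUNTAIN -Leaf Type - Bottle Refill Tap',  # hypothetical variant
])
print(strip_prefix(sample).tolist())
```

Unlike a fixed-width slice, an entry that lacks the prefix would pass through unchanged instead of losing its first 19 characters.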

In [21]:
# The lat and lon columns are dropped and rebuilt from the co_ordinates text. The co_ordinates string shows more digits,
# but once parsed to float the values match the original lat and lon columns to the 15 decimal places displayed.

# Remove the 'lat' and 'lon' columns from the DataFrame
fountain_df = fountain_df.drop(columns=['lat', 'lon'])

# Print the updated DataFrame to verify the changes
fountain_df.head()



|   | description | co_ordinates |
|---|---|---|
| 0 | Stainless Steel Drinking Fountain - Leaf Type - Wharfs Landing Park | -37.82210994675337, 144.93666205920204 |
| 1 | Leaf Type - With Bottle Refill Tap - Flagstaff Gardens | -37.81043106640399, 144.95558395492208 |
| 2 | Leaf Type - Dog Bowl - University Square | -37.80089398503696, 144.96074870882546 |
| 3 | Leaf Type - Dog Bowl - JJ Holland Park | -37.79841970759794, 144.92421993826414 |
| 4 | Leaf Type - Dog Bowl - Princes Park | -37.7914165845557, 144.96125460876374 |


In [22]:
# Split the 'co_ordinates' column into 'lat' and 'lon' columns
fountain_df[['lat', 'lon']] = fountain_df['co_ordinates'].str.split(',', expand=True)

# Convert the 'lat' and 'lon' columns to numeric types with full precision
fountain_df['lat'] = pd.to_numeric(fountain_df['lat'].str.strip(), errors='coerce')
fountain_df['lon'] = pd.to_numeric(fountain_df['lon'].str.strip(), errors='coerce')

fountain_df = fountain_df.drop(columns=['co_ordinates'])

# Print the updated DataFrame to verify the changes
fountain_df.head()


|   | description | lat | lon |
|---|---|---|---|
| 0 | Stainless Steel Drinking Fountain - Leaf Type - Wharfs Landing Park | -37.822109946753365 | 144.93666205920204 |
| 1 | Leaf Type - With Bottle Refill Tap - Flagstaff Gardens | -37.81043106640399 | 144.95558395492208 |
| 2 | Leaf Type - Dog Bowl - University Square | -37.80089398503696 | 144.96074870882546 |
| 3 | Leaf Type - Dog Bowl - JJ Holland Park | -37.79841970759794 | 144.92421993826414 |
| 4 | Leaf Type - Dog Bowl - Princes Park | -37.7914165845557 | 144.9612546087637 |
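Before dropping the original `lat` and `lon` columns, it is worth checking that the `co_ordinates` text and the numeric columns describe the same points. A minimal sketch, assuming the `"lat, lon"` string format shown in the outputs above: `coordinates_match` is a hypothetical helper, it would need to be run on `fountain_df` straight after the Part 5 download (before the drop), and the tolerance of 1e-9 degrees (well under a millimetre) is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def coordinates_match(df, tol=1e-9):
    """Parse the 'co_ordinates' text ("lat, lon") and compare it with the lat/lon columns.

    Uses an absolute tolerance in degrees (rtol=0), since a relative tolerance
    would scale with the size of the coordinate rather than the distance.
    """
    parts = df['co_ordinates'].str.split(',', expand=True)
    parsed_lat = pd.to_numeric(parts[0].str.strip())
    parsed_lon = pd.to_numeric(parts[1].str.strip())
    return bool(
        np.allclose(parsed_lat, df['lat'], rtol=0, atol=tol)
        and np.allclose(parsed_lon, df['lon'], rtol=0, atol=tol)
    )


# Two rows copied from the first output of fountain_df
sample = pd.DataFrame({
    'co_ordinates': ['-37.82210994675337, 144.93666205920204', '-37.81043106640399, 144.95558395492208'],
    'lat': [-37.822109946753365, -37.81043106640399],
    'lon': [144.93666205920204, 144.95558395492208],
})
print(coordinates_match(sample))  # True
```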


In [23]:
# Creating Folium Map to show Fountain locations on a map

# Function to create a map centered at a specific location (Melbourne)
def create_map():
    return folium.Map(location=[-37.81534, 144.97215], zoom_start=13)

# Function to add markers to the map with a water droplet icon
def add_fountain_markers_with_emoji_icon(data, map_obj):
    for index, row in data.iterrows():
        latitude = row['lat']
        longitude = row['lon']
        description = row['description']
        
        # Tooltip text shows the description of the fountain
        tooltip_text = f'<b>Drinking Fountain:</b> {description}'

        # Folium's Icon with a water droplet as marker icon
        droplet_icon = folium.Icon(
            icon='tint',  
            prefix='fa',  
            color='lightblue'  
        )
        
        # Adding marker for each fountain with the droplet icon
        folium.Marker(
            location=[latitude, longitude],  
            tooltip=tooltip_text, 
            icon=droplet_icon  
        ).add_to(map_obj)

# Function to create the map and add the custom markers
def generate_fountain_map_with_emoji_icon(data):
    fountain_map = create_map()
    
    # Add markers for each drinking fountain using the water droplet icon
    add_fountain_markers_with_emoji_icon(data, fountain_map)
    
    # Return the map so the notebook can display it
    return fountain_map

# Generate the map with the drinking fountain data
fountain_map = generate_fountain_map_with_emoji_icon(fountain_df)

fountain_map


The map above displays the locations of all registered drinking fountains. Although some descriptions mention a dog bowl, it is assumed that these drinking fountains include a dog bowl as part of their design, rather than being standalone units specifically for dogs.


<img src="https://draffin.com.au/wp-content/uploads/2021/01/Perth-Drink-Fountain-600x600-1.jpg" alt="Leaf Type Drinking Fountain" width="300"/>


<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part8">Part 8 - Combined Map with Heatmap of Pedestrian Traffic and Drinking Fountain Locations</a></p>

In [24]:
# Function to create a map centered at a specific location (Melbourne)
def create_map():
    return folium.Map(location=[-37.81534, 144.97215], zoom_start=14)

# Function to add the HeatMap layer, showing pedestriancount
def add_heatmap(data, map_obj):
    heat_data = [[row['lat'], row['lon'], row['pedestriancount']] for index, row in data.iterrows()]
    HeatMap(heat_data, radius=30).add_to(map_obj)

# Function to add dot markers to the map representing fountains
def add_fountain_markers_as_dots(data, map_obj):
    for index, row in data.iterrows():
        latitude = row['lat']
        longitude = row['lon']
        description = row['description']
        
        # Tooltip text shows the description of the fountain
        tooltip_text = f'<b>Drinking Fountain:</b> {description}'
        
        # Adding a circle marker for each fountain with a dot to avoid obscuring the heatmap
        folium.CircleMarker(
            location=[latitude, longitude],
            radius=2,  # Adjust the radius size as needed
            color='black',  # Color of the dot
            fill=True,
            fill_color='blue',  # Fill color of the dot
            fill_opacity=0.7,  # Opacity of the fill color
            tooltip=tooltip_text
        ).add_to(map_obj)

# Function to generate the combined map
def generate_combined_map(pedestrian_data, fountain_data):
    melbourne_map = create_map()
    
    # Add the HeatMap layer built from the aggregated pedestrian counts
    add_heatmap(pedestrian_data, melbourne_map)
    
    # Add dot markers for each drinking fountain
    add_fountain_markers_as_dots(fountain_data, melbourne_map)
    
    # Return the map so the notebook can display it
    return melbourne_map

# Generate the combined map with both the heat map and dot markers
# (add_heatmap builds the [lat, lon, weight] rows itself, so no separate heat_data list is needed)
melbourne_map = generate_combined_map(aggregated_pedestrian_df, fountain_df)

# Display the map
melbourne_map

<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part9">Part 9 - Proposed New Water Fountain Locations</a></p>
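The `haversine()` function in the next cell implements the haversine formula for the great-circle distance $d$ between two points with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ (converted to radians), on a sphere of radius $R = 6371$ km:

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right), \qquad
c = 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right), \qquad
d = R\,c
$$

As a quick check, along a meridian ($\lambda_1 = \lambda_2$) the second term vanishes and the formula reduces to $d = R(\varphi_2 - \varphi_1)$, so one degree of latitude is $6371 \times \pi/180 \approx 111.19$ km, and the 150 m threshold used below corresponds to roughly $0.00135^\circ$ of latitude. Treating the Earth as a sphere is an approximation, but its error is negligible when comparing distances of a few hundred meters.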

In [25]:
# Haversine function to calculate the distance between two points
def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Radius of the Earth in kilometers
    phi1, phi2 = np.radians(lat1), np.radians(lat2) # Convert latitudes to radians
    delta_phi = np.radians(lat2 - lat1) # Difference in latitudes
    delta_lambda = np.radians(lon2 - lon1) # Difference in longitudes
    
    # Haversine formula to calculate the distance
    a = np.sin(delta_phi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    
    return R * c  # Output distance in kilometers

# Function to calculate the nearest fountain for a given pedestrian sensor
def find_nearest_fountain(sensor_lat, sensor_lon, fountain_df):
    distances = fountain_df.apply(
        lambda row: haversine(sensor_lat, sensor_lon, row['lat'], row['lon']), axis=1)
    return distances.min()

# Calculate the distance to the nearest fountain for all sensors in aggregated_pedestrian_df
aggregated_pedestrian_df['distance_to_nearest_fountain'] = aggregated_pedestrian_df.apply(
    lambda row: find_nearest_fountain(row['lat'], row['lon'], fountain_df), axis=1)

# Show aggregated_pedestrian_df with new distance_to_nearest_fountain column
aggregated_pedestrian_df.head()



|   | sensor_name | lat | lon | pedestriancount | distance_to_nearest_fountain |
|---|---|---|---|---|---|
| 0 | 261Will_T | -37.81295822 | 144.95678789 | 4654198 | 0.040205106360729 |
| 1 | 280Will_T | -37.81246271 | 144.95690188 | 2118083 | 0.0184025763483 |
| 2 | 474Fl_T | -37.81997273 | 144.95834911 | 1255299 | 0.116260471244756 |
| 3 | 488Mac_T | -37.79432415 | 144.92973378 | 1973997 | 0.049405869297088 |
| 4 | 574Qub_T | -37.80309992 | 144.94908064 | 1302639 | 0.062858468684246 |
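The nested `apply` calls above run `haversine()` once per sensor and fountain pair from Python. A sketch of a vectorised alternative, using NumPy broadcasting to build the whole sensors by fountains distance matrix in one pass, also returns which fountain is closest. The function name `nearest_fountain` is hypothetical; it uses the same mean Earth radius and the same formula as `haversine()`.

```python
import numpy as np

EARTH_RADIUS_KM = 6371  # same mean Earth radius as haversine() above


def nearest_fountain(sensor_lat, sensor_lon, fountain_lat, fountain_lon):
    """Return (distance_km, index) of the closest fountain for every sensor.

    All inputs are in degrees. Sensors become rows and fountains become columns,
    so broadcasting yields an (n_sensors, n_fountains) matrix of haversine distances.
    """
    phi1 = np.radians(np.asarray(sensor_lat, dtype=float))[:, None]    # (n_sensors, 1)
    lam1 = np.radians(np.asarray(sensor_lon, dtype=float))[:, None]
    phi2 = np.radians(np.asarray(fountain_lat, dtype=float))[None, :]  # (1, n_fountains)
    lam2 = np.radians(np.asarray(fountain_lon, dtype=float))[None, :]

    a = np.sin((phi2 - phi1) / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin((lam2 - lam1) / 2) ** 2
    a = np.clip(a, 0.0, 1.0)  # guard against rounding pushing a just outside [0, 1]
    dist = 2 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

    idx = dist.argmin(axis=1)  # column of the closest fountain in each row
    return dist[np.arange(dist.shape[0]), idx], idx


# Toy check on the equator: fountains at lon 0 and 1, sensors at lon 0.1 and 0.9
dist_km, idx = nearest_fountain([0.0, 0.0], [0.1, 0.9], [0.0, 0.0], [0.0, 1.0])
print(idx, dist_km)
```

Applied to this notebook, `nearest_fountain(aggregated_pedestrian_df['lat'], aggregated_pedestrian_df['lon'], fountain_df['lat'], fountain_df['lon'])` should reproduce the `distance_to_nearest_fountain` column, and `fountain_df['description'].to_numpy()[idx]` would name the nearest fountain for each sensor. With 88 sensors and 302 fountains the matrix has only about 26,600 entries, so memory is not a concern.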


In [26]:
# Define the conditions for needing a fountain
def label_fountain_needed(row):
    if row['distance_to_nearest_fountain'] > 0.15 and row['pedestriancount'] > 500000:  # 0.15 km = 150 meters; a total of 500,000 is roughly 340 pedestrians a day if a sensor has about four years (~1,470 days) of records
        return 1  # Fountain needed
    else:
        return 0  # Fountain not needed

# Apply the function to create a new label column
aggregated_pedestrian_df['fountain_needed'] = aggregated_pedestrian_df.apply(label_fountain_needed, axis=1)

# Filter the DataFrame to get rows where 'fountain_needed' is 1
locations_needing_fountains = aggregated_pedestrian_df[aggregated_pedestrian_df['fountain_needed'] == 1]

# Print the filtered DataFrame
locations_needing_fountains


|   | sensor_name | lat | lon | pedestriancount | distance_to_nearest_fountain | fountain_needed |
|---|---|---|---|---|---|---|
| 6 | AG_T | -37.8199817 | 144.96872865 | 8229594 | 0.15137235079835 | 1 |
| 7 | AlfPl_T | -37.81379749 | 144.96995745 | 2870401 | 0.164306131995831 | 1 |
| 15 | Bou688_T | -37.81686075 | 144.95358075 | 14169431 | 0.168983051446001 | 1 |
| 25 | Col700_T | -37.81982992 | 144.95102555 | 9811800 | 0.257058648902335 | 1 |
| 41 | Fra118_T | -37.80841815 | 144.95906316 | 2981928 | 0.165401383818882 | 1 |
| 42 | Hammer1584_T | -37.81970749 | 144.96795734 | 3316151 | 0.213579180373168 | 1 |
| 46 | King2_T | -37.82009057 | 144.95758725 | 1293224 | 0.180033516951586 | 1 |
| 47 | King335_T | -37.81267639 | 144.95386444 | 1457233 | 0.175940696302251 | 1 |
| 57 | MCEC_T | -37.82401776 | 144.95604426 | 15000209 | 0.207749520682833 | 1 |
| 69 | Spen161_T | -37.8172861 | 144.95319102 | 3925633 | 0.219366216082724 | 1 |
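The two cut-offs in `label_fountain_needed()` (150 m and a total of 500,000 pedestrians) are judgement calls, so it can help to make them parameters and see how the flagged list changes. A sketch of a vectorised equivalent follows; `flag_fountain_needed` is a hypothetical name, the first two sample rows are copied from the outputs above, and the third row (`Quiet_T`) is invented for illustration.

```python
import pandas as pd


def flag_fountain_needed(df, max_distance_km=0.15, min_count=500_000):
    """Vectorised version of label_fountain_needed(): 1 where the nearest fountain is
    farther than max_distance_km AND the total pedestrian count exceeds min_count."""
    far = df['distance_to_nearest_fountain'] > max_distance_km
    busy = df['pedestriancount'] > min_count
    return (far & busy).astype(int)


sample = pd.DataFrame({
    'sensor_name': ['261Will_T', 'AG_T', 'Quiet_T'],  # Quiet_T is hypothetical
    'pedestriancount': [4654198, 8229594, 100000],
    'distance_to_nearest_fountain': [0.040205106360729, 0.15137235079835, 0.30],
})
print(flag_fountain_needed(sample).tolist())  # [0, 1, 0]

# Sensitivity check: AG_T (about 151 m from a fountain) drops out at a 200 m threshold
print(flag_fountain_needed(sample, max_distance_km=0.2).tolist())  # [0, 0, 0]
```

For example, `aggregated_pedestrian_df['fountain_needed'] = flag_fountain_needed(aggregated_pedestrian_df)` gives the same labels as the cell above, and re-running it at, say, 100 m or 200 m shows how sensitive the list of proposed locations is to the threshold. AG_T, at about 151 m, only just clears the current 150 m cut-off.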


In [27]:
# Function to create a map centered at a specific location (Melbourne)
def create_map():
    return folium.Map(location=[-37.81534, 144.97215], zoom_start=14)

# Function to add markers to the map for locations needing fountains
def add_fountain_markers(data, map_obj):
    for index, row in data.iterrows():
        latitude = row['lat']
        longitude = row['lon']
        pedestrian_count = row['pedestriancount']  # Show pedestrian count as tooltip
        distance_to_fountain_km = row['distance_to_nearest_fountain']  # Distance in kilometers
        
        # Convert distance to meters
        distance_to_fountain_m = distance_to_fountain_km * 1000
        
        # Tooltip text shows the pedestrian count and distance to the nearest fountain in meters
        tooltip_text = f'<b>Pedestrian Count:</b> {pedestrian_count}<br><b>Distance to Nearest Fountain:</b> {distance_to_fountain_m:.0f} meters'
        
        # Add a marker 
        folium.Marker(
            location=[latitude, longitude],
            tooltip=tooltip_text,
            icon=folium.Icon(icon='circle', color='blue', icon_color='white') 
        ).add_to(map_obj)

# Function to add a heat map layer to the map
def add_heat_map(data, map_obj):
    # Prepare heat map data: [[lat, lon, weight], ...]
    heat_data = [[row['lat'], row['lon'], row['pedestriancount']] for index, row in data.iterrows()]
    
    # Add the heatmap layer
    HeatMap(heat_data, radius=15).add_to(map_obj)

# Create the map
fountain_map = create_map()

# Add markers and heat map
add_fountain_markers(locations_needing_fountains, fountain_map)
add_heat_map(locations_needing_fountains, fountain_map)

# Display the map
fountain_map


<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part10">Part 10 - Conclusion</a></p>


In this analysis, we found that most high pedestrian traffic areas in Melbourne are well served, with a drinking fountain within 150 meters of the pedestrian sensor. However, 10 high-traffic sensor locations (a total count above 500,000) have no fountain within 150 meters; the three busiest of these are MCEC_T, Bou688_T and Col700_T. To enhance public convenience and align with the City of Melbourne’s wellbeing strategy, which aims to promote healthy and sustainable lifestyles, it is recommended to install water fountains in these identified gaps. This will help ensure that all busy areas have adequate access to drinking water, supporting the city's commitment to improving overall community wellbeing.


<p style="font-weight: bold; font-size: 1.2em;"><a class="anchor" id="part11">Part 11 - Reference</a></p>

* https://data.melbourne.vic.gov.au/pages/home/
* https://wellsr.com/python/plotting-geographical-heatmaps-with-python-folium-module/
* https://www.kaggle.com/code/daveianhickey/how-to-folium-for-maps-heatmaps-time-data
* https://medium.com/@vinodvidhole/interesting-heatmaps-using-python-folium-ee41b118a996
* https://www.kaggle.com/code/muhammadtalharamzan/notebook7874e4651c
* https://medium.com/@vageesh/interactive-map-of-dams-in-tamil-nadu-using-folium-2feb19873740
* https://www.kaggle.com/code/bhanvimenghani/folium-chai-eda?scriptVersionId=41717168
* https://stackoverflow.com/questions/62617348/how-to-insert-image-from-url-in-jupyter-notebook-markdown
* https://stackoverflow.com/questions/77668692/distance-of-haversine
* https://stackoverflow.com/questions/67146477/haversine-function-using-pandas-data-frame
* https://stackoverflow.com/questions/29545704/fast-haversine-approximation-python-pandas
* https://www.geeksforgeeks.org/haversine-formula-to-find-distance-between-two-points-on-a-sphere/
