<div class="usecase-title"><h3>Route Planner</h3></div>

<div class="usecase-authors"><b>Authored by: </b> Chathumini Satharasinghe</div>

<div class="usecase-authors"><b>Date: </b> T2 2024 (July - October)</div>



<div class="usecase-duration"><b>Duration:</b> 270 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Beginner/ Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python, Power BI, Machine Learning</div>
</div>

<div class="usecase-section-header"><h3>Scenario<h3></div>

**As a** member of the local city council's urban planning department focused on Transport & Safety,

**I want to** determine the variables that cause an increase or decrease in the amount of waste collected at Degraves, analyzing factors such as time of year, events, and the number of people in the city (based on parking occupancy data from 2016),

**So that I can** identify patterns and trends in waste generation and collection, specifically understanding how seasonal variations, local events, and population density impact waste collection requirements.

**This will help** in making informed decisions about optimizing waste collection routes, schedules, and resource allocation to improve operational efficiency, reduce environmental impact, and maintain cleanliness in urban areas.

**By** utilizing datasets on footpath steepness and tree canopies along with parking occupancy data, I hope to present this analysis through interactive mapping tools and data visualization techniques, ensuring that stakeholders and the public are well-informed about the factors affecting waste management. This will support the planning and execution of more effective waste collection strategies based on historical and current data

<div class="usecase-section-header"><h3>Skills Demonstrated in the Waste Collection Analysis Project</h3></div>


<ul>
  <li>
    <h5>1. Data Collection and Preparation</h5>
    <ul>
      <li>
        Data Acquisition: Successfully collected datasets on footpath steepness and tree canopies, utilizing APIs and web scraping techniques to gather relevant information for analysis.
      </li>
      <li>
        Data Cleaning: Employed Pandas for data cleaning and preparation, ensuring data integrity and consistency for accurate analysis.
      </li>
    </ul>
  </li>
  <li>
    <h5>2. Data Analysis</h5>
    <ul>
      <li>
        Statistical Analysis: Conducted correlation analysis and hypothesis testing to identify relationships between waste collection and various factors such as time of year, events, and parking occupancy.
      </li>
      <li>
        Predictive Modeling: Developed predictive models using ARIMA and regression analysis to forecast waste collection trends and quantify the impact of identified variables.
      </li>
    </ul>
  </li>
  <li>
    <h5>3. Data Visualization</h5>
    <ul>
      <li>
        Interactive Visualization: Created interactive maps using Folium to visualize the spatial distribution of footpath steepness and tree canopies, enhancing the understanding of their impact on waste collection.
      </li>
      <li>
        Temporal Analysis: Incorporated time series analysis to visualize temporal changes in waste collection patterns, providing insights into seasonal and event-based variations.
      </li>
    </ul>
  </li>
  <li>
    <h5>4. Programming and Technology</h5>
    <ul>
      <li>
        Python Programming: Applied advanced Python programming skills for data manipulation, statistical analysis, and visualization, demonstrating strong technical proficiency.
      </li>
      <li>
        Use of Libraries: Leveraged various Python libraries such as Pandas, GeoPandas, Folium, and Statsmodels to enhance data analysis and visualization capabilities.
      </li>
    </ul>
  </li>
  <li>
    <h5>5. Critical Thinking and Problem Solving</h5>
    <ul>
      <li>
        Problem Identification: Identified key factors influencing waste collection and designed analytical approaches to address these factors effectively.
      </li>
      <li>
        Solution Implementation: Implemented optimized waste collection routes and schedules based on data-driven insights, improving operational efficiency and reducing environmental impact.
      </li>
    </ul>
  </li>
  <li>
    <h5>6. Communication and Presentation</h5>
    <ul>
      <li>
        Clear Data Presentation: Designed user-friendly visualizations and interactive maps that make complex data accessible and understandable to a broad audience.
      </li>
      <li>
        Documentation and Reporting: Prepared comprehensive reports and well-documented code, explaining the methodology, tools used, and insights derived from the data analysis.
      </li>
    </ul>
  </li>
</ul>

<p>These skills demonstrate a well-rounded ability to handle, analyze, and present data related to urban waste collection, supporting informed decision-making in urban planning and environmental management, showcasing both technical proficiency and analytical acumen.</p>

<div class="usecase-section-header"><h3>Project Goal: Waste Collection Optimization in Degraves, Melbourne</h3></div>

<p>Determine the variables that influence the amount of waste collected at Degraves in Melbourne by analyzing factors such as the time of year, events, and the number of people in the city (based on parking occupancy data from 2016). This study aims to identify patterns and trends in waste generation to improve waste collection routes, schedules, and resource allocation.</p>
<h4>Key Activities</h4>
<ul>
  <li>Data Collection: Gather data on footpath steepness, tree canopies, and parking occupancy to understand the environmental and population factors affecting waste collection.</li>
  <li>Data Analysis: Conduct statistical and predictive analyses to identify correlations and causations between waste collection and the collected variables.</li>
  <li>Model Development: Develop predictive models to forecast waste collection needs based on identified variables, optimizing waste management practices.</li>
  <li>Recommendation Development: Based on the analysis, recommend optimized waste collection routes and schedules to improve efficiency and reduce environmental impact.</li>
</ul>

<h4>Expected Outcome</h4>
<p>A comprehensive report detailing the factors affecting waste collection in Degraves, with actionable recommendations for optimizing waste management practices. This will help improve operational efficiency, reduce environmental impact, and maintain urban cleanliness.</p>

<h4>Impact</h4>
<p>This project contributes to the sustainable management of urban waste, supporting cleaner and more efficient cities. By optimizing waste collection practices based on data-driven insights, we can reduce resource usage and enhance the overall quality of urban environments.</p>

<h4>Datasets</h4>
<div class="usecase-subsection-blurb">
  <i>1. Footpath Steepness </i> 
  <br>
  <a href="https://data.melbourne.vic.gov.au/explore/dataset/footpath-steepness/information/?location=18,-37.81515,144.95599&basemap=mbs-7a7333" target="_blank">Link to Dataset 1</a>
  <br>
</div>
<br>
<div class="usecase-subsection-blurb">
  <i>2. Tree Canopies 2021 Urban Forest </i> 
  <br>
  <a href="https://data.melbourne.vic.gov.au/explore/dataset/tree-canopies-2021-urban-forest/map/?location=17,-37.81489,144.95893&basemap=mbs-7a7333" target="_blank">Link to Dataset 2</a>
  <br>
</div>

## 1.Installing all necessasory packages (optional)

In [133]:

# !pip install requests pandas matplotlib-venn folium geopandas seaborn statsmodels

!pip install Folium




## 2.Importing all the necessory libraries

In [126]:
import pandas as pd
import numpy as np
import requests
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
import folium
from folium.plugins import MarkerCluster
from IPython.display import display
warnings.filterwarnings('ignore')


## 3. Importing datasets

In [127]:
import requests
import pandas as pd
import os

def fetch_data(base_url, dataset, api_key, num_records=99, offset=0):
    all_records = []
    max_offset = 9900  # Maximum number of requests

    while True:
        # Maximum limit check
        if offset > max_offset:
            break

        # Create API request URL
        filters = f'{dataset}/records?limit={num_records}&offset={offset}'
        url = f'{base_url}{filters}&api_key={api_key}'

        # Start request
        try:
            result = requests.get(url, timeout=10)
            result.raise_for_status()
            records = result.json().get('results')
        except requests.exceptions.RequestException as e:
            raise Exception(f"API request failed: {e}")
        if records is None:
            break
        all_records.extend(records)
        if len(records) < num_records:
            break

        # Next cycle offset
        offset += num_records

    # DataFrame all data
    df = pd.DataFrame(all_records)
    return df

# Retrieve API key from environment variable
API_KEY = os.environ.get("API_KEY")
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'




In [128]:
# Data set name
FOOTPATH_STEEPNESS = 'footpath-steepness'

footpath_steepness = fetch_data(BASE_URL, FOOTPATH_STEEPNESS, API_KEY)

footpath_steepness

Unnamed: 0,geo_point_2d,geo_shape,grade1in,gradepc,segside,statusid,asset_type,deltaz,streetid,mccid_int,mcc_id,address,rlmax,rlmin,distance
0,"{'lon': 144.94866061456034, 'lat': -37.8230361...","{'type': 'Feature', 'geometry': {'coordinates'...",4.2,23.81,,8,Road Footway,6.77,3094.0,30821.0,1388075,Yarra River,6.86,0.09,28.43
1,"{'lon': 144.91714933764632, 'lat': -37.7954295...","{'type': 'Feature', 'geometry': {'coordinates'...",,,,,Road Footway,,,,1534622,,,,
2,"{'lon': 144.9172426574227, 'lat': -37.79544286...","{'type': 'Feature', 'geometry': {'coordinates'...",,,,,Road Footway,,,,1534622,,,,
3,"{'lon': 144.92075182140118, 'lat': -37.7958016...","{'type': 'Feature', 'geometry': {'coordinates'...",35.1,2.85,,,Road Footway,0.23,,,1387592,,2.78,2.55,8.07
4,"{'lon': 144.92328274904054, 'lat': -37.7965483...","{'type': 'Feature', 'geometry': {'coordinates'...",109.6,0.91,,,Road Footway,0.01,,,1387085,,3.39,3.38,1.11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9994,"{'lon': 144.9447738565619, 'lat': -37.81498889...","{'type': 'Feature', 'geometry': {'coordinates'...",,,,2,Road Footway,,117766.0,23298.0,1514085,Harbour Esplanade between La Trobe Street and ...,,,
9995,"{'lon': 144.9447519625819, 'lat': -37.81494022...","{'type': 'Feature', 'geometry': {'coordinates'...",,0.00,,2,Road Footway,0.00,117766.0,23298.0,1513827,Harbour Esplanade between La Trobe Street and ...,1.78,1.78,0.00
9996,"{'lon': 144.94227756521875, 'lat': -37.8131886...","{'type': 'Feature', 'geometry': {'coordinates'...",9.4,10.68,South,2,Road Footway,0.11,117768.0,22439.0,1467461,Docklands Drive between Harbour Esplanade and ...,2.17,2.06,1.03
9997,"{'lon': 144.94325855273087, 'lat': -37.8141532...","{'type': 'Feature', 'geometry': {'coordinates'...",147.0,0.68,,,Road Footway,0.21,,,1450264,,3.07,2.86,30.89


In [129]:
# data set name
TREE_CANOPIES = 'tree-canopies-2021-urban-forest'

tree_canopies = fetch_data(BASE_URL, TREE_CANOPIES, API_KEY)

tree_canopies

Unnamed: 0,geo_point_2d,geo_shape
0,"{'lon': 144.96886261392126, 'lat': -37.8167499...","{'type': 'Feature', 'geometry': {'coordinates'..."
1,"{'lon': 144.9669639677292, 'lat': -37.81662693...","{'type': 'Feature', 'geometry': {'coordinates'..."
2,"{'lon': 144.9382236726225, 'lat': -37.81614809...","{'type': 'Feature', 'geometry': {'coordinates'..."
3,"{'lon': 144.94576682162327, 'lat': -37.8163160...","{'type': 'Feature', 'geometry': {'coordinates'..."
4,"{'lon': 144.97644113437724, 'lat': -37.8168912...","{'type': 'Feature', 'geometry': {'coordinates'..."
...,...,...
9994,"{'lon': 144.90466663229049, 'lat': -37.8255233...","{'type': 'Feature', 'geometry': {'coordinates'..."
9995,"{'lon': 144.9635453363733, 'lat': -37.82664482...","{'type': 'Feature', 'geometry': {'coordinates'..."
9996,"{'lon': 144.97959424170895, 'lat': -37.8269849...","{'type': 'Feature', 'geometry': {'coordinates'..."
9997,"{'lon': 144.95797389862233, 'lat': -37.8264462...","{'type': 'Feature', 'geometry': {'coordinates'..."


## 4. Preprocess the Data

In [130]:
# Assuming datasets are already loaded as footpath_steepness and tree_canopies

# 1. Check for Missing Values
print("Missing values in Footpath Steepness dataset:")
print(footpath_steepness.isna().sum())
print(f"Footpath Steepness dataset has {footpath_steepness.isna().sum().sum()} missing values\n")

print("Missing values in Tree Canopies dataset:")
print(tree_canopies.isna().sum())
print(f"Tree Canopies dataset has {tree_canopies.isna().sum().sum()} missing values\n")

# 2. Handle Missing Values
# We can use forward fill followed by backward fill as a basic method to handle missing values.
footpath_steepness.fillna(method='ffill', inplace=True)
footpath_steepness.fillna(method='bfill', inplace=True)

tree_canopies.fillna(method='ffill', inplace=True)
tree_canopies.fillna(method='bfill', inplace=True)

# Verify if missing values are handled
print("Missing values in Footpath Steepness dataset after filling:")
print(footpath_steepness.isna().sum())
print(f"Footpath Steepness dataset has {footpath_steepness.isna().sum().sum()} missing values after filling\n")

print("Missing values in Tree Canopies dataset after filling:")
print(tree_canopies.isna().sum())
print(f"Tree Canopies dataset has {tree_canopies.isna().sum().sum()} missing values after filling\n")

# 3. Remove Duplicates
footpath_steepness.drop_duplicates(inplace=True)
tree_canopies.drop_duplicates(inplace=True)

# 4. Convert Data Types (if necessary)
# Convert any potential date columns to datetime (adjust 'date_column' to the actual column name if exists)
# For example, if there was a 'date' column:
# footpath_steepness['date_column'] = pd.to_datetime(footpath_steepness['date_column'])
# tree_canopies['date_column'] = pd.to_datetime(tree_canopies['date_column'])

# Display cleaned data to confirm
print("Cleaned Footpath Steepness Data:")
print(footpath_steepness.head())

print("Cleaned Tree Canopies Data:")
print(tree_canopies.head())



Missing values in Footpath Steepness dataset:
geo_point_2d       0
geo_shape        307
grade1in        2125
gradepc         1296
segside         6872
statusid        3929
asset_type         0
deltaz          1296
streetid        3929
mccid_int       3929
mcc_id             0
address         3931
rlmax           1296
rlmin           1296
distance        1296
dtype: int64
Footpath Steepness dataset has 31502 missing values

Missing values in Tree Canopies dataset:
geo_point_2d    0
geo_shape       0
dtype: int64
Tree Canopies dataset has 0 missing values

Missing values in Footpath Steepness dataset after filling:
geo_point_2d    0
geo_shape       0
grade1in        0
gradepc         0
segside         0
statusid        0
asset_type      0
deltaz          0
streetid        0
mccid_int       0
mcc_id          0
address         0
rlmax           0
rlmin           0
distance        0
dtype: int64
Footpath Steepness dataset has 0 missing values after filling

Missing values in Tree Canopies d

TypeError: unhashable type: 'dict'

# 5.Data Analysis

In [None]:
# Descriptive statistics for numeric columns
print("Footpath Steepness Descriptive Statistics:")
print(footpath_steepness.describe())

print("Tree Canopies Descriptive Statistics:")
print(tree_canopies.describe())


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assuming footpath_steepness and tree_canopies are cleaned datasets

# Filter numeric columns only
numeric_fp = footpath_steepness.select_dtypes(include=[float, int])
numeric_tc = tree_canopies.select_dtypes(include=[float, int])

# Check if there are any numeric columns left
if not numeric_fp.empty:
    # Correlation Analysis for Footpath Steepness
    sns.heatmap(numeric_fp.corr(), annot=True, cmap='coolwarm')
    plt.title('Footpath Steepness Correlation Matrix')
    plt.show()
else:
    print("No numeric columns in Footpath Steepness data for correlation analysis.")

if not numeric_tc.empty:
    # Correlation Analysis for Tree Canopies
    sns.heatmap(numeric_tc.corr(), annot=True, cmap='coolwarm')
    plt.title('Tree Canopies Correlation Matrix')
    plt.show()
else:
    print("No numeric columns in Tree Canopies data for correlation analysis.")


# 6. Data Visualization

In [134]:
# Initialize map centered around Melbourne
m = folium.Map(location=[-37.81515, 144.95599], zoom_start=12)

# Add footpath steepness data to the map
for idx, row in footpath_steepness.iterrows():
    if 'latitude' in row and 'longitude' in row:
        folium.CircleMarker(
            location=[row['latitude'], row['longitude']],
            radius=5,
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6,
        ).add_to(m)

# Display map within Jupyter Notebook
display(m)



In [136]:
# Initialize map with additional features
m = folium.Map(location=[-37.81515, 144.95599], zoom_start=12)
folium.TileLayer('Stamen Terrain').add_to(m)
folium.TileLayer('Stamen Toner').add_to(m)
folium.LayerControl().add_to(m)

# Add marker cluster
marker_cluster = MarkerCluster().add_to(m)

# Sample data (replace with your actual data)
data = [
    {'lat': -37.815, 'lon': 144.955, 'info': 'Sample A'},
    {'lat': -37.820, 'lon': 144.960, 'info': 'Sample B'}
]

for item in data:
    folium.CircleMarker(
        location=[item['lat'], item['lon']],
        radius=10,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
        popup=item['info']
    ).add_to(marker_cluster)

# Display enhanced map within Jupyter Notebook
display(m)
