<div class="usecase-title">Waste Management Efficiency Investigation - Argyle Square case study</div>

<div class="usecase-authors"><b>Authored by: Peregrin J Ryan</b></div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<div class="usecase-section-header">Scenario</div>

Melbourne hosts many events, both national (Melbourne Cup) and international (Australian Open, Formula One). These events put far more stress than usual on the city's infrastructure. One of the most important and most visible services for tourists and citizens is waste management: overflowing bins and rubbish in the street make the city look worse, may give it a poor reputation, and can pollute the Yarra River or harm local flora and fauna. This document therefore investigates how waste is managed in Melbourne's inner city. It looks at when pedestrians are active, when bins are most likely to be used and emptied, and when they are full. The goal is to measure, through a case study of waste management near the main Melbourne CBD during less active times, whether Melbourne's waste management is capable of handling busier times and events.

<div class="usecase-section-header">What this use case will teach you</div>

At the end of this use case you will know:
- How to collect data from the Melbourne open data (MOP) using an API v2.1 GET request.
- The condition of bins in Argyle Square park.
- The waste levels throughout the day.
- The times when pedestrians are most likely to see and use the public bins.
- The times when a bin is most likely to be full.

<div class="usecase-section-header">Introduction and background relating to problem:</div>

Large events, and the general strain of hosting so many people, put weight on a city's waste infrastructure. When it fails it causes public problems such as rubbish piling up or litter entering waterways and natural areas. It is therefore paramount for cities like Melbourne to put effort into managing waste.


By evaluating publicly available data, we can use bin data collected from Argyle Square as a case study to see how these bins operate and whether they are effective at mitigating pollution. We will also use surveyed data to see how the bins are maintained, and pedestrian data to see when pedestrians are likely to use the bins. Together these show how Melbourne's waste infrastructure is currently working on average, and support an assessment of whether Melbourne's waste management can handle more use due to public events.


This report will look at:
- Waste infrastructure (condition of bins).
- Average fill level.
- When bins are 'full'.
- When bins are emptied.
- Which days and times bins fill more.
- Which times have more pedestrians.
- When pedestrians see or interact with full bins.


This will then give insight into:
- Whether bins are available to pedestrians at busy times.
- Whether bins are emptied often enough to meet needs.
   - What level the bins are at on average, and whether they are already at their limit without large public events.
- Whether the condition of the bins is good enough to support current needs.


## Datasets used:
Dataset 1: https://data.melbourne.vic.gov.au/explore/dataset/netvox-r718x-bin-sensor/table/?disjunctive.dev_id&sort=-sensor_name


Dataset 2: https://data.melbourne.vic.gov.au/explore/dataset/street-furniture-including-bollards-bicycle-rails-bins-drinking-fountains-horse-/table/?sort=location_desc


Dataset 3: https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/table/?sort=-hourday
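
Each dataset page URL above contains the dataset ID used when requesting data through the API: it is the path segment after `/explore/dataset/`. A small sketch of pulling it out with the standard library follows; the helper name `dataset_id_from_url` is an assumption made for illustration.

```python
from urllib.parse import urlparse

def dataset_id_from_url(page_url):
    """Hypothetical helper: return the path segment that follows 'dataset'."""
    parts = urlparse(page_url).path.strip('/').split('/')
    return parts[parts.index('dataset') + 1]

page_url = ('https://data.melbourne.vic.gov.au/explore/dataset/'
            'netvox-r718x-bin-sensor/table/?disjunctive.dev_id&sort=-sensor_name')
print(dataset_id_from_url(page_url))
```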

## Stages:
### Stage 1:
- Import needed Python packages
- Connect to API
- Load the data 
- Analyse the data

## Import packages
First we need to import packages used for this project:

In [1]:
# Imports needed to request and collect data from API
import requests
import API_store
import pandas as pd
from io import StringIO



## Collect Data
Here we define a function that connects to the API and downloads the dataset matching the provided `dataset_id`.

In [2]:
# This is the function to collect the data from the API
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    apikey = API_store.API
    export_format = 'csv'  # renamed to avoid shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        'api_key': apikey
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # StringIO to read the CSV data (the export is semicolon-delimited)
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        # Report the failure; the caller receives None
        print(f'Request failed with status code {response.status_code}')
        return None
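
To see what request `collect_data` sends without calling the API, the sketch below rebuilds the export URL with the standard library and parses a tiny semicolon-delimited sample with the same delimiter. The helper name `build_export_url` and the two-column sample are assumptions for illustration only, and the API key is deliberately left out.

```python
import csv
from io import StringIO
from urllib.parse import urlencode

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

def build_export_url(dataset_id, export_format='csv'):
    """Hypothetical helper: assemble the export URL that collect_data requests."""
    params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
    return f'{BASE_URL}{dataset_id}/exports/{export_format}?{urlencode(params)}'

url = build_export_url('netvox-r718x-bin-sensor')
print(url)

# The export separates fields with ';', so the parser has to be told
sample = 'dev_id;filllevel\nr718x-676c;90.0\n'
rows = list(csv.DictReader(StringIO(sample), delimiter=';'))
print(rows)
```

Printing the URL this way is a quick check that the dataset ID and parameters are what you expect before downloading a large export.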

Now we collect the data and print some information about each dataset to confirm it loaded correctly.

In [3]:
# dataset_ids = ['netvox-r718x-bin-sensor','pedestrian-counting-system-monthly-counts-per-hour', 'waste-collected-per-month']
# bin_sensor_df = collect_data(dataset_ids[0])
# pedestrian_counting_df = collect_data(dataset_ids[1])
# waste_collected_df = collect_data(dataset_ids[2])

In [4]:
# *** REMOVE ON RELEASE *** - Use to save time from pulling data from API
bin_sensor_df = pd.read_csv('bin_sensor_df.csv')
pedestrian_counting_df = pd.read_csv('pedestrian_counting_df.csv')
waste_collected_df = pd.read_csv('waste_collected_df.csv')

In [5]:
print(f'The datasets collected are: \n-bin_sensor_df: length {len(bin_sensor_df)}\n-pedestrian_counting_df: length {len(pedestrian_counting_df)}\n-waste_collected_df: length {len(waste_collected_df)}')

The datasets collected are: 
-bin_sensor_df: length 969611
-pedestrian_counting_df: length 2257076
-waste_collected_df: length 132


Now we look at a more detailed preview of the data:

In [6]:
bin_sensor_df.head(10)

Unnamed: 0,dev_id,time,temperature,distance,filllevel,battery,lat_long,sensor_name,fill_level
0,r718x-6f0b,2024-10-27T04:02:18+00:00,22.4,65535.0,255.0,3.6,"-37.8020809, 144.9654563",r718x-bin sensor 14,-8877.0
1,r718x-676c,2024-10-27T04:03:05+00:00,21.8,200.0,90.0,3.6,"-37.8031969, 144.9652732",r718x-bin sensor 2,72.0
2,r718x-6f0b,2024-10-13T04:02:04+00:00,22.7,468.0,40.0,3.6,"-37.8020809, 144.9654563",r718x-bin sensor 14,35.0
3,r718x-6f10,2024-10-13T03:03:40+00:00,22.7,202.0,89.0,3.6,,,72.0
4,r718x-6f31,2024-10-13T04:02:36+00:00,23.0,200.0,74.0,3.6,"-37.8022594, 144.9659489",r718x-bin sensor 19,72.0
5,r718x-6f10,2024-10-13T04:03:40+00:00,22.5,202.0,89.0,3.6,,,72.0
6,r718x-6f0b,2024-10-06T04:01:58+00:00,22.5,65535.0,255.0,3.6,"-37.8020809, 144.9654563",r718x-bin sensor 14,-8877.0
7,r718x-6f34,2024-10-06T03:59:48+00:00,23.5,201.0,74.0,3.6,"-37.802165, 144.9661423",r718x-bin sensor 20,72.0
8,r718x-6f10,2024-10-06T04:03:29+00:00,22.1,203.0,89.0,3.6,,,72.0
9,r718x-6f0b,2024-09-21T17:16:45+00:00,19.8,461.0,41.0,3.6,"-37.8020809, 144.9654563",r718x-bin sensor 14,36.0
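
The preview includes readings such as `distance` 65535.0 with `filllevel` 255.0 and `fill_level` -8877.0. These look like sensor error codes and not real measurements, although that is an assumption and not something confirmed here. Below is a minimal sketch of filtering them out and converting the UTC timestamps to Melbourne local time. It uses three rows copied from the preview and assumes a valid `fill_level` lies between 0 and 100.

```python
import pandas as pd

# Three rows copied from the bin sensor preview
preview = pd.DataFrame({
    'dev_id': ['r718x-6f0b', 'r718x-676c', 'r718x-6f0b'],
    'time': ['2024-10-27T04:02:18+00:00',
             '2024-10-27T04:03:05+00:00',
             '2024-10-13T04:02:04+00:00'],
    'distance': [65535.0, 200.0, 468.0],
    'filllevel': [255.0, 90.0, 40.0],
    'fill_level': [-8877.0, 72.0, 35.0],
})

# Assumption: a fill percentage outside 0-100 is an invalid reading
valid = preview['fill_level'].between(0, 100)
clean = preview[valid].copy()

# Timestamps are in UTC; convert to local time before any time-of-day analysis
clean['local_time'] = (
    pd.to_datetime(clean['time'], utc=True).dt.tz_convert('Australia/Melbourne')
)
print(clean[['dev_id', 'fill_level', 'local_time']])
```

Working in local time matters for the later questions about when bins are full, since the API was asked for timestamps in UTC.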


In [7]:
pedestrian_counting_df.head(10)

Unnamed: 0,id,location_id,sensing_date,hourday,direction_1,direction_2,pedestriancount,sensor_name,location
0,432020210929,43,2021-09-29,20,2,6,8,UM2_T,"-37.79844526, 144.96411782"
1,92020211116,9,2021-11-16,20,30,22,52,Col700_T,"-37.81982992, 144.95102555"
2,58220220303,58,2022-03-03,2,26,27,53,Bou688_T,"-37.81686075, 144.95358075"
3,241220220728,24,2022-07-28,12,571,542,1113,Col620_T,"-37.81887963, 144.95449198"
4,35020230331,35,2023-03-31,0,131,253,384,SouthB_T,"-37.82018685, 144.96508508"
5,46420220617,46,2022-06-17,4,2,2,4,Pel147_T,"-37.80240719, 144.9615673"
6,1431720241106,143,2024-11-06,17,185,318,503,Spencer_T,"-37.821728, 144.95557015"
7,77420230127,77,2023-01-27,4,2,3,5,HarEsP_T,"-37.81441438, 144.94433026"
8,131920231129,131,2023-11-29,9,34,17,51,King2_T,"-37.82009057, 144.95758725"
9,18320250202,18,2025-02-02,3,1,6,7,Col12_T,"-37.81344862, 144.97305353"
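
One way the hourly counts could later be summarised is by averaging `pedestriancount` for each `hourday`. The sketch below uses six rows copied from the preview above. A handful of rows is far too few to draw conclusions from, and the same grouping would be run on the full `pedestrian_counting_df`.

```python
import pandas as pd

# Six rows copied from the pedestrian preview
preview = pd.DataFrame({
    'sensor_name': ['UM2_T', 'Col700_T', 'Bou688_T', 'Col620_T', 'Pel147_T', 'HarEsP_T'],
    'hourday': [20, 20, 2, 12, 4, 4],
    'pedestriancount': [8, 52, 53, 1113, 4, 5],
})

# Mean pedestrians per hour of day, busiest hour first
hourly_mean = (
    preview.groupby('hourday')['pedestriancount']
    .mean()
    .sort_values(ascending=False)
)
print(hourly_mean)
```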


In [8]:
waste_collected_df.head(10)

Unnamed: 0,date,residential,public_litter_bins,dumped_rubbish,street_sweepings,mattresses,commingled_recycling,cardboard,hardwaste_to_landfill,hardwaste_recovered,hardwaste_total,green_waste,month
0,2015-01,1958.38,328.79,147.72,345.68,501,642.18,132.66,85.52,0.0,85.52,3.52,January
1,2016-02,2202.28,281.46,151.18,293.0,173,708.31,196.96,41.72,19.12,60.84,5.56,February
2,2009-05,1249.74,330.36,169.98,691.28,0,376.02,168.52,14.82,0.0,14.82,2.79,May
3,2012-10,1732.58,377.41,143.08,402.98,216,497.62,168.29,37.64,0.0,37.64,8.62,October
4,2014-06,2093.58,371.56,137.66,469.77,204,526.92,148.23,74.64,0.0,74.64,4.0,June
5,2009-12,1637.58,465.0,181.24,542.3,21,609.61,212.64,12.48,0.0,12.48,2.75,December
6,2019-01,2549.5,225.68,138.75,296.02,625,969.07,64.44,50.5,17.45,67.95,9.34,January
7,2010-04,1488.46,396.6,127.84,509.96,170,529.0,164.88,15.88,0.0,15.88,3.85,April
8,2012-06,1523.5,315.99,156.88,505.38,224,468.34,142.82,27.54,0.0,27.54,2.78,June
9,2011-06,1541.58,306.72,175.36,584.16,217,531.58,184.74,26.68,0.0,26.68,3.48,June


In [9]:
print("Bin data")
print("are any values null-->",bin_sensor_df.isnull().values.any())
print("-------------------------------------------------------------------------------------")
print("Pedestrian data")
print("are any values null-->",pedestrian_counting_df.isnull().values.any())
print("-------------------------------------------------------------------------------------")
print("Waste data")
print("are any values null-->",waste_collected_df.isnull().values.any())
print("-------------------------------------------------------------------------------------")

Bin data
are any values null--> True
-------------------------------------------------------------------------------------
Pedestrian data
are any values null--> False
-------------------------------------------------------------------------------------
Waste data
are any values null--> False
-------------------------------------------------------------------------------------


We can see that the pedestrian data and the waste data don't have any missing values, which is great. But we need to know how much of the collected bin data is missing, since it might cause issues down the line.

In [18]:
# Count rows that have at least one missing value
rows_with_missing = len(bin_sensor_df[bin_sensor_df.isnull().any(axis=1)])
total_rows = len(bin_sensor_df)
print(f"Missing data:\n{bin_sensor_df.isnull().sum()}\n------\n"
      f"Rows with missing values: {rows_with_missing}\n"
      f"Total rows: {total_rows}\n"
      f"Percent of rows with missing values: {(rows_with_missing / total_rows) * 100}")

Missing data:
dev_id              0
time                0
temperature        78
distance           78
filllevel          78
battery            78
lat_long       125591
sensor_name    125591
fill_level         78
dtype: int64
------
Rows with missing values: 125660
Total rows: 969611
Percent of rows with missing values: 12.959836470502086
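
Most of the gaps are in `lat_long` and `sensor_name`, and in the earlier preview every row from device `r718x-6f10` lacks both. A sketch of checking which devices account for the missing locations follows, using five rows copied from that preview. On the real data the same grouping would run on `bin_sensor_df`.

```python
import pandas as pd

# Five rows copied from the bin sensor preview (None marks a missing location)
preview = pd.DataFrame({
    'dev_id': ['r718x-6f0b', 'r718x-676c', 'r718x-6f10', 'r718x-6f31', 'r718x-6f10'],
    'lat_long': ['-37.8020809, 144.9654563', '-37.8031969, 144.9652732', None,
                 '-37.8022594, 144.9659489', None],
})

# Share of rows per device that have no recorded location
missing_share = preview['lat_long'].isnull().groupby(preview['dev_id']).mean()
print(missing_share)

# Devices that never report a location in this sample
no_location = missing_share[missing_share == 1.0].index.tolist()
print(no_location)
```

If the missing locations turn out to belong to a small number of devices, those devices can be handled as a group instead of dropping rows one by one.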


In [10]:
# *** REMOVE FOR RELEASE*** -use to get csv of the API calls
# bin_sensor_df.to_csv('bin_sensor_df.csv', index=False)
# pedestrian_counting_df.to_csv('pedestrian_counting_df.csv', index=False)
# waste_collected_df.to_csv('waste_collected_df.csv',index=False)

### TO DO
Next steps:
- Bin data (what's missing)
- Bin usage
    - Average bin use
    - Most used bins
    - When are they full
    - When are they emptied
- Pedestrian data
    - Most frequent times
    - Least frequent times
    - Most and least busy times
- Compare times
    - Pedestrian times vs bin full times
    - Are bins down when pedestrian numbers are up
- Month to month comparison
    - What is the effect on the waste collected when large events happen
    - How full are the bins in the busiest and least busy months
- Conclusion
