<div class="usecase-title"><h3>Predicting Future Patron Capacity of Bars and Pubs in Melbourne</h3></div>

<div class="usecase-authors"><b>Authored by: </b> Venuka Hirushan Wijenayake</div>

<div class="usecase-authors"><b>Date: </b> T2 2024 (July - September)</div>

<div class="usecase-duration"><b>Duration:</b> 270 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python, Machine Learning</div>
</div>

### Dataset Import Through API

In [1]:
import requests
import pandas as pd
from io import StringIO

# Function to download a dataset from the City of Melbourne Open Data API as a DataFrame
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    # apikey = api_key  # uncomment if the dataset requires API key permissions
    export_format = 'csv'

    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # -1 returns all records
        'lang': 'en',
        'timezone': 'UTC',
        # 'api_key': apikey  # uncomment if the dataset requires API key permissions
    }

    # Send the GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Decode the response and parse the semicolon-delimited CSV via StringIO
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None
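To see exactly which endpoint `collect_data` calls, the request URL can be assembled offline with the standard library. This is a sketch: `build_export_url` is a hypothetical helper, and `requests` may escape the query string slightly differently from `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

def build_export_url(dataset_id, export_format='csv'):
    """Hypothetical helper: rebuild the export URL that collect_data requests."""
    # Same query parameters as collect_data (without the optional API key)
    params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
    return f'{BASE_URL}{dataset_id}/exports/{export_format}?{urlencode(params)}'

print(build_export_url('bars-and-pubs-with-patron-capacity'))
```

Printing the URL is a quick way to check the dataset ID and parameters before making a network call.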

In [2]:
# Dataset ID (the dataset name used in the API call)
dataset_id1 = 'bars-and-pubs-with-patron-capacity'
# Download the dataset into a DataFrame
bars_pubs = collect_data(dataset_id1)
# Check the number of records
print(f'The dataset contains {len(bars_pubs)} records.')
# Preview the first three rows
bars_pubs.head(3)

The dataset contains 4696 records.


| | census_year | block_id | property_id | base_property_id | building_address | clue_small_area | trading_name | business_address | number_of_patrons | longitude | latitude | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 | 11 | 108972 | 108972 | 10-22 Spencer Street MELBOURNE 3000 | Melbourne (CBD) | Explorers Inn | 10-22 Spencer Street MELBOURNE 3000 | 50 | 144.955254 | -37.820511 | -37.82051068881513, 144.95525416628004 |
| 1 | 2002 | 14 | 103172 | 103172 | 31-39 Elizabeth Street MELBOURNE 3000 | Melbourne (CBD) | Connells Tavern | 35 Elizabeth Street MELBOURNE 3000 | 350 | 144.964322 | -37.817426 | -37.81742610667125, 144.964321660097 |
| 2 | 2002 | 15 | 103944 | 103944 | 277-279 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | De Biers | Unit 1, Basement , 277 Flinders Lane MELBOURNE... | 400 | 144.965307 | -37.817242 | -37.81724194023457, 144.96530699086 |
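The export is parsed with `delimiter=';'` because the `location` field itself contains a comma (`"lat, lon"`). A minimal sketch using values from the first record above (only a subset of columns, chosen for brevity) shows that the semicolon delimiter keeps `location` intact as a single field.

```python
from io import StringIO

import pandas as pd

# One-record, semicolon-delimited sample built from the first row shown above
# (subset of columns, for illustration only)
sample_csv = (
    "census_year;trading_name;number_of_patrons;location\n"
    "2002;Explorers Inn;50;-37.82051068881513, 144.95525416628004\n"
)

df = pd.read_csv(StringIO(sample_csv), delimiter=';')
print(df.shape)               # one row, four columns
print(df.loc[0, 'location'])  # the comma inside location is preserved
```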


### Preprocessing Data

#### Displaying Available Columns

In [3]:
# Print the available columns in the Bars and Pubs dataset
print("Available columns in Bars and Pubs dataset:")
print(bars_pubs.columns.tolist())

Available columns in Bars and Pubs dataset:
['census_year', 'block_id', 'property_id', 'base_property_id', 'building_address', 'clue_small_area', 'trading_name', 'business_address', 'number_of_patrons', 'longitude', 'latitude', 'location']


#### Removing Unwanted Columns

In [4]:
# Remove irrelevant columns
columns_to_remove = ['block_id', 'property_id', 'base_property_id', 'clue_small_area', 'business_address']
bars_pubs_updated = bars_pubs.drop(columns=columns_to_remove)

# Verify the columns have been removed
print("Remaining columns in Bars and Pubs dataset:")
print(bars_pubs_updated.columns.tolist())

Remaining columns in Bars and Pubs dataset:
['census_year', 'building_address', 'trading_name', 'number_of_patrons', 'longitude', 'latitude', 'location']


#### Identifying Missing Values

In [5]:
# Check for missing values in the dataset
missing_values = bars_pubs_updated.isnull().sum()

# Print the number of missing values in each column
print("Missing values in each column:")
print(missing_values)

Missing values in each column:
census_year           0
building_address      0
trading_name          0
number_of_patrons     0
longitude            20
latitude             20
location             20
dtype: int64


#### Displaying Important Details of Missing Rows

In [12]:
# Filter rows with missing values in 'longitude', 'latitude', or 'location'
missing_values_df = bars_pubs_updated[bars_pubs_updated['longitude'].isnull() | 
                                      bars_pubs_updated['latitude'].isnull() | 
                                      bars_pubs_updated['location'].isnull()]

# Select distinct 'building_address' and 'trading_name'
distinct_missing_values = missing_values_df[['building_address', 'trading_name']].drop_duplicates()

# Display the result
print("'Building Address' and 'Trading Name' where 'Longitude', 'Latitude', or 'Location' are missing:")
print(distinct_missing_values)

'Building Address' and 'Trading Name' where 'Longitude', 'Latitude', or 'Location' are missing:
                              building_address  \
41    353 Little Collins Street MELBOURNE 3000   
123            4-6 Goldie Place MELBOURNE 3000   
229            4-6 Goldie Place MELBOURNE 3000   
953           13 Heffernan Lane MELBOURNE 3000   
990          Evan Walker Bridge SOUTHBANK 3006   
991     816 Lorimer Street PORT MELBOURNE 3207   
1007        25-27 Rankins Road KENSINGTON 3031   
2415      27-45 Whiteman Street SOUTHBANK 3006   
2606     717-731 Collins Street DOCKLANDS 3008   
2607         524 Macaulay Road KENSINGTON 3031   
3177      42-44 Lonsdale Street MELBOURNE 3000   
3420  143-175 Harbour Esplanade DOCKLANDS 3008   
4104     607-623 Collins Street MELBOURNE 3000   
4186  143-175 Harbour Esplanade DOCKLANDS 3008   
4188   265-271 Racecourse Road KENSINGTON 3031   

                          trading_name  
41                    Grosvenor Tavern  
123                Pa
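The three-way `|` filter above can also be written with `DataFrame.isnull().any(axis=1)` over the coordinate columns, which stays compact if more columns are added later. A sketch on a small hypothetical frame (`toy` and its values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one complete row and two rows with missing coordinates
toy = pd.DataFrame({
    'building_address': ['A St', 'B St', 'B St'],
    'trading_name': ['Bar 1', 'Bar 2', 'Bar 2'],
    'longitude': [144.96, np.nan, np.nan],
    'latitude': [-37.81, np.nan, np.nan],
    'location': ['-37.81, 144.96', np.nan, np.nan],
})

coord_cols = ['longitude', 'latitude', 'location']
# any(axis=1) flags a row if at least one coordinate column is missing,
# equivalent to OR-ing the three isnull() masks
missing_mask = toy[coord_cols].isnull().any(axis=1)

# Distinct address / trading-name pairs among the flagged rows
distinct_missing = toy.loc[missing_mask, ['building_address', 'trading_name']].drop_duplicates()
print(distinct_missing)
```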

In [13]:
!pip install geopy

Defaulting to user installation because normal site-packages is not writeable


#### Filling Missing Values Manually

In [16]:
# Manually supplied latitude and longitude for each address with missing coordinates
address_coordinates = {
    "13 Heffernan Lane MELBOURNE 3000": (-37.811798, 144.966599),
    "143-175 Harbour Esplanade DOCKLANDS 3008": (-37.81741, 144.94591),
    "25-27 Rankins Road KENSINGTON 3031": (-37.78929, 144.93211),
    "265-271 Racecourse Road KENSINGTON 3031": (-37.78858, 144.93225),
    "27-45 Whiteman Street SOUTHBANK 3006": (-37.8251819, 144.9582948),
    "353 Little Collins Street MELBOURNE 3000": (-37.8155378, 144.9629855),
    "42-44 Lonsdale Street MELBOURNE 3000": (-37.8097472, 144.9710871),
    "4-6 Goldie Place MELBOURNE 3000": (-37.8132486, 144.9605705),
    "524 Macaulay Road KENSINGTON 3031": (-37.794177, 144.9286062),
    "607-623 Collins Street MELBOURNE 3000": (-37.8190018, 144.9541476),
    "717-731 Collins Street DOCKLANDS 3008": (-37.8149, 144.9505),
    "816 Lorimer Street PORT MELBOURNE 3207": (-37.8221281, 144.9309547),
    "Evan Walker Bridge SOUTHBANK 3006": (-37.8196329, 144.9651296)
}

# Create a copy of the updated DataFrame and name it bars_pubs_cleaned
bars_pubs_cleaned = bars_pubs_updated.copy()

# Update the new DataFrame with the provided latitude and longitude values
for address, (lat, lon) in address_coordinates.items():
    bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'] == address, 'latitude'] = lat
    bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'] == address, 'longitude'] = lon

# Verify the updated values
print("Updated dataset with filled latitude and longitude:")
print(bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'].isin(address_coordinates.keys()), ['building_address', 'latitude', 'longitude']])

Updated dataset with filled latitude and longitude:
                              building_address   latitude   longitude
19        42-44 Lonsdale Street MELBOURNE 3000 -37.809747  144.971087
41    353 Little Collins Street MELBOURNE 3000 -37.815538  144.962986
123            4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570
168            4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570
229            4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570
...                                        ...        ...         ...
4297           4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570
4302          13 Heffernan Lane MELBOURNE 3000 -37.811798  144.966599
4341  143-175 Harbour Esplanade DOCKLANDS 3008 -37.817410  144.945910
4342  143-175 Harbour Esplanade DOCKLANDS 3008 -37.817410  144.945910
4385           4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570

[99 rows x 3 columns]
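The loop above writes the manual coordinates to every row whose address matches (99 rows, not just the 20 that were missing), so rows that may already have had coordinates are overwritten as well. If the intent is to fill only the gaps, a sketch using `Series.map` and `fillna` is shown below; the toy data and its existing coordinate values are hypothetical, and only one entry of `address_coordinates` is reused.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 0 already has coordinates, row 1 does not
df = pd.DataFrame({
    'building_address': ['4-6 Goldie Place MELBOURNE 3000'] * 2,
    'latitude': [-37.81320, np.nan],
    'longitude': [144.96050, np.nan],
})

# One entry from the manual lookup table defined above
address_coordinates = {
    '4-6 Goldie Place MELBOURNE 3000': (-37.8132486, 144.9605705),
}

# Split the lookup into address -> latitude and address -> longitude dicts
lat_lookup = {addr: lat for addr, (lat, lon) in address_coordinates.items()}
lon_lookup = {addr: lon for addr, (lat, lon) in address_coordinates.items()}

# map() gives NaN for unknown addresses; fillna() only replaces missing values,
# so coordinates that already exist are left untouched
df['latitude'] = df['latitude'].fillna(df['building_address'].map(lat_lookup))
df['longitude'] = df['longitude'].fillna(df['building_address'].map(lon_lookup))
print(df)
```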


In [17]:
# Recreate bars_pubs_cleaned from the updated DataFrame so all three coordinate columns are filled in one pass
bars_pubs_cleaned = bars_pubs_updated.copy()

# Update the new DataFrame with the provided latitude and longitude values and fill location column
for address, (lat, lon) in address_coordinates.items():
    bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'] == address, 'latitude'] = lat
    bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'] == address, 'longitude'] = lon
    bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'] == address, 'location'] = f"{lat}, {lon}"

# Verify the updated values
print("Updated dataset with filled latitude, longitude, and location:")
print(bars_pubs_cleaned.loc[bars_pubs_cleaned['building_address'].isin(address_coordinates.keys()), ['building_address', 'latitude', 'longitude', 'location']])

Updated dataset with filled latitude, longitude, and location:
                              building_address   latitude   longitude  \
19        42-44 Lonsdale Street MELBOURNE 3000 -37.809747  144.971087   
41    353 Little Collins Street MELBOURNE 3000 -37.815538  144.962986   
123            4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570   
168            4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570   
229            4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570   
...                                        ...        ...         ...   
4297           4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570   
4302          13 Heffernan Lane MELBOURNE 3000 -37.811798  144.966599   
4341  143-175 Harbour Esplanade DOCKLANDS 3008 -37.817410  144.945910   
4342  143-175 Harbour Esplanade DOCKLANDS 3008 -37.817410  144.945910   
4385           4-6 Goldie Place MELBOURNE 3000 -37.813249  144.960570   

                      location  
19    -37.8097472, 144.9710
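The `location` column can also be rebuilt in a vectorised way from the filled `latitude` and `longitude` columns, producing the same `"lat, lon"` format as the loop's f-string. A sketch on hypothetical toy data, filling only rows where `location` is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: coordinates present, location missing on row 1
df = pd.DataFrame({
    'latitude': [-37.82051068881513, -37.8132486],
    'longitude': [144.95525416628004, 144.9605705],
    'location': ['-37.82051068881513, 144.95525416628004', np.nan],
})

# Build "lat, lon" strings for every row
rebuilt = df['latitude'].astype(str) + ', ' + df['longitude'].astype(str)

# Fill only the rows where location is missing; existing strings are kept
df['location'] = df['location'].fillna(rebuilt)
print(df['location'].tolist())
```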

#### Rechecking Missing Values

In [18]:
# Check for missing values in the dataset
missing_values_new = bars_pubs_cleaned.isnull().sum()

# Print the number of missing values in each column
print("Missing values in each column:")
print(missing_values_new)

Missing values in each column:
census_year          0
building_address     0
trading_name         0
number_of_patrons    0
longitude            0
latitude             0
location             0
dtype: int64
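Beyond checking for nulls, it can help to confirm that the manually filled coordinates are plausible. The sketch below is hypothetical: `check_coordinates` is not part of the original notebook, and its default latitude/longitude ranges are an assumed loose box around the City of Melbourne, not official boundaries.

```python
import pandas as pd

def check_coordinates(df, lat_range=(-38.0, -37.6), lon_range=(144.8, 145.1)):
    """Hypothetical check: fail on missing coordinates, return rows outside an assumed bounding box."""
    cols = ['latitude', 'longitude', 'location']
    # Raise if any coordinate column still contains missing values
    assert df[cols].notna().all().all(), 'coordinate columns still contain missing values'
    in_lat = df['latitude'].between(*lat_range)
    in_lon = df['longitude'].between(*lon_range)
    # Rows whose coordinates fall outside the box are returned for inspection
    return df.loc[~(in_lat & in_lon)]

# Hypothetical toy data: one Melbourne point and one deliberate outlier (Sydney)
toy = pd.DataFrame({
    'latitude': [-37.81, -33.87],
    'longitude': [144.96, 151.21],
    'location': ['-37.81, 144.96', '-33.87, 151.21'],
})
outside = check_coordinates(toy)
print(outside)
```

On the real data, `check_coordinates(bars_pubs_cleaned)` would return an empty DataFrame if every filled point lies inside the assumed box.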
