# Tourist Hotspots vs Hospitality Capacity: Are There Enough Seats?

**Authored by:** Manya  
**Duration:** 90 mins  
**Level:** Intermediate  
**Pre-requisite Skills:** Python, Pandas, Data Visualisation, Geospatial Mapping

## Background

Melbourne welcomes millions of tourists each year. This analysis investigates whether hospitality infrastructure is aligned with tourist demand by measuring foot traffic at major landmarks and comparing it with the seating capacity of nearby cafés and restaurants.

## Section 1: Imports and Setup

Import all required libraries and configure our environment:

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import MarkerCluster, HeatMap
import requests
from io import StringIO
import warnings
from datetime import datetime, timedelta
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('default')
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

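# API Configuration
# Read the key from an environment variable rather than hardcoding it.
# MELBOURNE_API_KEY is an assumed variable name; set it before running.
import os
API_KEY = os.environ.get("MELBOURNE_API_KEY", "")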

print("TOURIST HOTSPOTS vs HOSPITALITY CAPACITY ANALYSIS")
print("=" * 70)
print("Investigating whether Melbourne's hospitality infrastructure")
print("meets tourist demand at popular landmarks")
print("=" * 70)

TOURIST HOTSPOTS vs HOSPITALITY CAPACITY ANALYSIS
Investigating whether Melbourne's hospitality infrastructure
meets tourist demand at popular landmarks


## Section 2: Data Collection Functions

Create a function to collect data from the Melbourne Open Data Portal using their API v2.1:

In [5]:
def collect_data(dataset_id, limit=50000):
    """
    Collect data from Melbourne Open Data Portal using API
    """
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    url = f'{base_url}{dataset_id}/exports/csv'
    params = {
        'select': '*',
        'limit': limit,
        'lang': 'en',
        'timezone': 'UTC',
        'api_key': API_KEY
    }
    
    try:
        response = requests.get(url, params=params)
        if response.status_code == 200:
            content = response.content.decode('utf-8')
            df = pd.read_csv(StringIO(content), delimiter=';')
            print(f" Loaded '{dataset_id}' successfully: {df.shape[0]} rows, {df.shape[1]} columns")
            return df
        else:
            print(f" Failed to load '{dataset_id}'. Status Code: {response.status_code}")
            return None
    except Exception as e:
        print(f" Error loading '{dataset_id}': {str(e)}")
        return None
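
As an optional variation on `collect_data`, the sketch below separates building the request from parsing the response, so the parsing step can be checked without a network call. The helper names `build_export_request` and `parse_export_csv` and the two-row sample string are illustrative assumptions; the URL pattern, query parameters, and semicolon delimiter are taken from the function above.

```python
from io import StringIO

import pandas as pd

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'


def build_export_request(dataset_id, limit=50000, api_key=None):
    """Return the (url, params) pair that collect_data sends (hypothetical helper)."""
    url = f'{BASE_URL}{dataset_id}/exports/csv'
    params = {'select': '*', 'limit': limit, 'lang': 'en', 'timezone': 'UTC'}
    if api_key:  # only attach the key when one is supplied
        params['api_key'] = api_key
    return url, params


def parse_export_csv(text):
    """Parse a semicolon-delimited CSV export into a DataFrame (hypothetical helper)."""
    return pd.read_csv(StringIO(text), delimiter=';')


# Made-up two-row sample in the same layout as the export
sample = "location_id;sensor_name;pedestriancount\n84;ABC_T;708\n70;DEF_T;145\n"
url, params = build_export_request('pedestrian-counting-system-sensor-locations', limit=10)
sample_df = parse_export_csv(sample)
print(url)
print(sample_df.shape)
```

Keeping the parsing step separate makes it easy to test the delimiter and column handling on a small string before downloading a full dataset.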

## Section 3: Data Loading

Load our three main datasets:
1. **Pedestrian Count Data** - Hourly foot traffic from sensors
2. **Sensor Location Data** - Geographic metadata for sensors  
3. **Hospitality Venue Data** - Café and restaurant seating capacity

In [7]:
print("\n LOADING DATASETS")
print("-" * 50)

# Load the three main datasets
pedestrian_df = collect_data('pedestrian-counting-system-monthly-counts-per-hour', limit=150000)
sensor_df = collect_data('pedestrian-counting-system-sensor-locations')
hospitality_df = collect_data('cafes-and-restaurants-with-seating-capacity', limit=50000)

# Verify all datasets loaded successfully
datasets = {
    'Pedestrian Data': pedestrian_df,
    'Sensor Locations': sensor_df,
    'Hospitality Data': hospitality_df
}

missing_datasets = [name for name, df in datasets.items() if df is None]
if missing_datasets:
    print(f"\n WARNING: Failed to load: {', '.join(missing_datasets)}")
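    # exit() can shut down a notebook kernel; raising SystemExit stops the cell instead
    raise SystemExit("Stopping: one or more required datasets failed to load.")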
else:
    print("\n All datasets loaded successfully!")


 LOADING DATASETS
--------------------------------------------------
 Loaded 'pedestrian-counting-system-monthly-counts-per-hour' successfully: 150000 rows, 9 columns
 Loaded 'pedestrian-counting-system-sensor-locations' successfully: 143 rows, 12 columns
 Loaded 'cafes-and-restaurants-with-seating-capacity' successfully: 50000 rows, 15 columns

 All datasets loaded successfully!
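
Two of the three row counts above (150,000 and 50,000) equal the `limit` passed to `collect_data`, which suggests those exports may have been cut off at the limit rather than downloaded in full. A minimal sketch of a check, where `hit_row_limit` is a hypothetical helper and the toy frames are made up:

```python
import pandas as pd


def hit_row_limit(df, limit):
    """True when a frame has at least as many rows as the requested limit,
    a hint that the export may have been truncated (hypothetical helper)."""
    return df is not None and len(df) >= limit


# Toy frames standing in for the real downloads
full_df = pd.DataFrame({'x': range(143)})
capped_df = pd.DataFrame({'x': range(500)})

print(hit_row_limit(full_df, limit=50000))
print(hit_row_limit(capped_df, limit=500))
```

If the check fires on the real data, the downstream results describe a sample of each dataset rather than all of it.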


## Section 4: Data Exploration
In this section we explore each dataset to understand its structure and data types, and to identify data quality issues before cleaning.

### 4.1 Explore Pedestrian Data

In [9]:
print(" EXPLORING PEDESTRIAN DATA")
print("=" * 50)

print(f"Shape: {pedestrian_df.shape}")
print(f"\nColumns: {list(pedestrian_df.columns)}")
print(f"\nData types:\n{pedestrian_df.dtypes}")

print(f"\nFirst 5 rows:")
print(pedestrian_df.head())

print(f"\nBasic statistics:")
print(pedestrian_df.describe())

print(f"\nMissing values:")
print(pedestrian_df.isnull().sum())

 EXPLORING PEDESTRIAN DATA
Shape: (150000, 9)

Columns: ['id', 'location_id', 'sensing_date', 'hourday', 'direction_1', 'direction_2', 'pedestriancount', 'sensor_name', 'location']

Data types:
id                  int64
location_id         int64
sensing_date       object
hourday             int64
direction_1         int64
direction_2         int64
pedestriancount     int64
sensor_name        object
location           object
dtype: object

First 5 rows:
             id  location_id sensing_date  hourday  direction_1  direction_2  \
0  841020250213           84   2025-02-13       10          708          540   
1  701320250212           70   2025-02-12       13          145          152   
2   75020240421           75   2024-04-21        0           10           20   
3  751020240810           75   2024-08-10       10           47           44   
4    8220240916            8   2024-09-16        2            2            3   

   pedestriancount sensor_name                    location  
0

### 4.2 Explore Sensor Data

In [11]:
print("\n EXPLORING SENSOR DATA")
print("=" * 50)

print(f"Shape: {sensor_df.shape}")
print(f"\nColumns: {list(sensor_df.columns)}")
print(f"\nData types:\n{sensor_df.dtypes}")

print(f"\nFirst 5 rows:")
print(sensor_df.head())

print(f"\nSensor status distribution:")
print(sensor_df['status'].value_counts())

print(f"\nLocation types:")
print(sensor_df['location_type'].value_counts())

print(f"\nMissing values:")
print(sensor_df.isnull().sum())


 EXPLORING SENSOR DATA
Shape: (143, 12)

Columns: ['location_id', 'sensor_description', 'sensor_name', 'installation_date', 'note', 'location_type', 'status', 'direction_1', 'direction_2', 'latitude', 'longitude', 'location']

Data types:
location_id             int64
sensor_description     object
sensor_name            object
installation_date      object
note                   object
location_type          object
status                 object
direction_1            object
direction_2            object
latitude              float64
longitude             float64
location               object
dtype: object

First 5 rows:
   location_id                 sensor_description  sensor_name  \
0            2         Bourke Street Mall (South)     Bou283_T   
1            6  Flinders Street Station Underpass       FliS_T   
2            8                        Webb Bridge      WebBN_T   
3           17              Collins Place (South)      Col15_T   
4           21             155-161 Russel

### 4.3 Explore Hospitality Data

In [13]:
print("\n EXPLORING HOSPITALITY DATA")
print("=" * 50)

print(f"Shape: {hospitality_df.shape}")
print(f"\nColumns: {list(hospitality_df.columns)}")
print(f"\nData types:\n{hospitality_df.dtypes}")

print(f"\nFirst 5 rows:")
print(hospitality_df.head())

print(f"\nIndustry types:")
print(hospitality_df['industry_anzsic4_description'].value_counts())

print(f"\nSeating types:")
print(hospitality_df['seating_type'].value_counts())

print(f"\nSeating capacity statistics:")
print(hospitality_df['number_of_seats'].describe())

print(f"\nMissing values:")
print(hospitality_df.isnull().sum())

print(f"\nMissing coordinates:")
print(f"Rows with missing lat/long: {hospitality_df[['latitude', 'longitude']].isnull().any(axis=1).sum()}")


 EXPLORING HOSPITALITY DATA
Shape: (50000, 15)

Columns: ['census_year', 'block_id', 'property_id', 'base_property_id', 'building_address', 'clue_small_area', 'trading_name', 'business_address', 'industry_anzsic4_code', 'industry_anzsic4_description', 'seating_type', 'number_of_seats', 'longitude', 'latitude', 'location']

Data types:
census_year                       int64
block_id                          int64
property_id                       int64
base_property_id                  int64
building_address                 object
clue_small_area                  object
trading_name                     object
business_address                 object
industry_anzsic4_code             int64
industry_anzsic4_description     object
seating_type                     object
number_of_seats                   int64
longitude                       float64
latitude                        float64
location                         object
dtype: object

First 5 rows:
   census_year  block_id  propert

## Section 5: Data Cleaning and Preprocessing

Based on the exploration, several important findings are visible:

**Pedestrian Data:**
- All 150k records are complete (no missing values)
- Records span August 2023 to August 2025, so 2024 is the only complete calendar year
- Some zero pedestrian counts (normal for overnight hours)

**Sensor Data:**
- All 143 sensors are active (status = 'A')
- 32 sensors missing direction info (probably indoor sensors, although 34 sensors are indoor, so the match is not exact)
- Mix of outdoor (109) and indoor (34) sensors

**Hospitality Data:**
- 483 venues missing coordinates (1% of data)
- Dominated by cafes/restaurants (75%) vs pubs/bars
- Some venues with 0 seats (takeaway only)
- Census year 2017 data
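
The missing-value figures quoted above can be reproduced with a short summary table. The sketch below runs on a made-up frame; `missing_summary` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Count and percentage of missing values per column (hypothetical helper)."""
    counts = df.isnull().sum()
    return pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(1),
    })


# Toy frame: one of four venues has no coordinates
toy = pd.DataFrame({
    'trading_name': ['A', 'B', 'C', 'D'],
    'latitude': [-37.81, np.nan, -37.82, -37.80],
    'longitude': [144.96, np.nan, 144.97, 144.95],
})
summary = missing_summary(toy)
print(summary)
```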

### 5.1 Clean Pedestrian Data

In [15]:
print("\n DATA CLEANING AND PREPROCESSING")
print("-" * 50)

print("\n5.1 Cleaning Pedestrian Data...")
pedestrian_df['sensing_date'] = pd.to_datetime(pedestrian_df['sensing_date'])
pedestrian_df['year'] = pedestrian_df['sensing_date'].dt.year
pedestrian_df['month'] = pedestrian_df['sensing_date'].dt.month
pedestrian_df['day_of_week'] = pedestrian_df['sensing_date'].dt.dayofweek
pedestrian_df['is_weekend'] = pedestrian_df['day_of_week'].isin([5, 6])

print(f"Date range: {pedestrian_df['sensing_date'].min()} to {pedestrian_df['sensing_date'].max()}")
print(f"Years available: {sorted(pedestrian_df['year'].unique())}")

# Use 2024 data for most recent analysis (2025 data might be incomplete)
recent_data = pedestrian_df[pedestrian_df['year'] == 2024].copy()
print(f" Using 2024 data: {len(recent_data):,} records")

# Check data quality
zero_counts = (pedestrian_df['pedestriancount'] == 0).sum()
negative_counts = (pedestrian_df['pedestriancount'] < 0).sum()
print(f"Zero pedestrian counts: {zero_counts:,} ({zero_counts/len(pedestrian_df)*100:.1f}% - normal for overnight)")
print(f"Negative counts: {negative_counts} (good - no issues)")

# Check for outliers
high_counts = (pedestrian_df['pedestriancount'] > 5000).sum()
print(f"Very high counts (>5000): {high_counts} (potential major events)")


 DATA CLEANING AND PREPROCESSING
--------------------------------------------------

5.1 Cleaning Pedestrian Data...
Date range: 2023-08-08 00:00:00 to 2025-08-07 00:00:00
Years available: [2023, 2024, 2025]
 Using 2024 data: 75,414 records
Zero pedestrian counts: 33 (0.0% - normal for overnight)
Negative counts: 0 (good - no issues)
Very high counts (>5000): 21 (potential major events)


### 5.2 Clean Sensor Data

In [17]:
print("\n5.2 Cleaning Sensor Data...")
sensor_df['installation_date'] = pd.to_datetime(sensor_df['installation_date'])

# All sensors are active, so use all of them
active_sensors = sensor_df.copy()  # All 143 sensors are status 'A'
print(f" All sensors are active: {len(active_sensors)} sensors")

print(f"Installation range: {sensor_df['installation_date'].min().year} to {sensor_df['installation_date'].max().year}")

# Direction info is mostly missing for indoor sensors; print both counts so they can be compared
missing_directions = sensor_df['direction_1'].isnull().sum()
indoor_sensors = (sensor_df['location_type'] == 'Indoor').sum()
print(f"Sensors missing direction info: {missing_directions}")
print(f"Indoor sensors: {indoor_sensors} (direction not applicable)")

# Check coordinate validity for Melbourne
lat_range = f"{sensor_df['latitude'].min():.3f} to {sensor_df['latitude'].max():.3f}"
lon_range = f"{sensor_df['longitude'].min():.3f} to {sensor_df['longitude'].max():.3f}"
print(f" Coordinate ranges look good for Melbourne - Lat: {lat_range}, Lon: {lon_range}")


5.2 Cleaning Sensor Data...
 All sensors are active: 143 sensors
Installation range: 2009 to 2025
Sensors missing direction info: 32
Indoor sensors: 34 (direction not applicable)
 Coordinate ranges look good for Melbourne - Lat: -37.826 to -37.789, Lon: 144.929 to 144.986


### 5.3 Clean Hospitality Data

In [19]:
print("\n5.3 Cleaning Hospitality Data...")

# Remove venues without coordinates (483 out of 50,000)
hospitality_clean = hospitality_df.dropna(subset=['latitude', 'longitude']).copy()
print(f" Venues with coordinates: {len(hospitality_clean):,} out of {len(hospitality_df):,}")
print(f" Removed {len(hospitality_df) - len(hospitality_clean)} venues without location data")

# Clean text fields
hospitality_clean['trading_name'] = hospitality_clean['trading_name'].str.strip()
hospitality_clean['clue_small_area'] = hospitality_clean['clue_small_area'].str.strip()

# Filter to focus on main hospitality venues (cafes, restaurants, pubs)
main_hospitality = hospitality_clean[
    hospitality_clean['industry_anzsic4_description'].isin([
        'Cafes and Restaurants',
        'Pubs, Taverns and Bars'
    ])
].copy()

print(f" Main hospitality venues: {len(main_hospitality):,}")
print(f"  - Cafes & Restaurants: {(main_hospitality['industry_anzsic4_description'] == 'Cafes and Restaurants').sum():,}")
print(f"  - Pubs & Bars: {(main_hospitality['industry_anzsic4_description'] == 'Pubs, Taverns and Bars').sum():,}")

# Check seating data quality
zero_seats = (main_hospitality['number_of_seats'] == 0).sum()
high_seats = (main_hospitality['number_of_seats'] > 500).sum()
print(f"Venues with 0 seats: {zero_seats:,} (likely takeaway only)")
print(f"Venues with >500 seats: {high_seats} (large venues/clubs)")

# Coordinate validation for Melbourne
lat_valid = main_hospitality['latitude'].between(-38.0, -37.5).all()
lon_valid = main_hospitality['longitude'].between(144.5, 145.5).all()
print(f" Coordinates valid for Melbourne: Lat {lat_valid}, Lon {lon_valid}")

print(f" Final clean hospitality dataset: {len(main_hospitality):,} venues")


5.3 Cleaning Hospitality Data...
 Venues with coordinates: 49,517 out of 50,000
 Removed 483 venues without location data
 Main hospitality venues: 40,186
  - Cafes & Restaurants: 37,134
  - Pubs & Bars: 3,052
Venues with 0 seats: 5 (likely takeaway only)
Venues with >500 seats: 71 (large venues/clubs)
 Coordinates valid for Melbourne: Lat True, Lon True
 Final clean hospitality dataset: 40,186 venues


## Section 6: Identify Tourist Hotspots

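Now that the data is clean, we identify high foot traffic areas and treat them as tourist hotspots. Pedestrian counts do not distinguish tourists from residents or commuters, so foot traffic is only a proxy for tourist demand. The steps are: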
1. Calculate average daily pedestrian counts per sensor
2. Define hotspots as the top 25% busiest locations  
3. Merge with sensor location data for geographic analysis

### 6.1 Calculate Pedestrian Traffic Metrics

In [21]:
print("\n IDENTIFYING TOURIST HOTSPOTS")
print("-" * 50)

print("\n6.1 Calculating pedestrian traffic metrics...")

# Aggregate pedestrian counts by sensor using 2024 data
daily_traffic = recent_data.groupby(['location_id', 'sensor_name']).agg({
    'pedestriancount': ['sum', 'mean', 'count'],
    'sensing_date': ['min', 'max']
}).round(2)

daily_traffic.columns = ['total_count', 'avg_hourly', 'data_points', 'first_date', 'last_date']
daily_traffic = daily_traffic.reset_index()

# Estimate average daily traffic as mean hourly count x 24 (assumes records cover all hours of the day evenly)
daily_traffic['avg_daily_traffic'] = daily_traffic['total_count'] / (daily_traffic['data_points'] / 24)

print(f" Traffic data calculated for {len(daily_traffic)} sensors")
print(f" Data points per sensor range: {daily_traffic['data_points'].min()} to {daily_traffic['data_points'].max()}")


 IDENTIFYING TOURIST HOTSPOTS
--------------------------------------------------

6.1 Calculating pedestrian traffic metrics...
 Traffic data calculated for 95 sensors
 Data points per sensor range: 24 to 974
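
The `avg_daily_traffic` formula above is equivalent to the mean hourly count multiplied by 24, so it is representative only if each sensor's records cover the hours of the day evenly. With as few as 24 records for some sensors, that may not hold. One possible alternative, sketched here on made-up data, averages within each hour of the day first and then scales the mean of those hourly means to 24 hours. `hour_balanced_daily` is a hypothetical helper name.

```python
import pandas as pd


def hour_balanced_daily(df):
    """Estimate daily traffic per sensor from per-hour-of-day means.

    Each observed hour of the day gets equal weight; hours with no records
    are implicitly filled with the average of the observed hours
    (hypothetical helper).
    """
    hourly_mean = df.groupby(['location_id', 'hourday'])['pedestriancount'].mean()
    return hourly_mean.groupby(level='location_id').mean() * 24


# Toy data: sensor 1 is sampled three times at noon and once at 3am
toy = pd.DataFrame({
    'location_id': [1, 1, 1, 1],
    'hourday': [12, 12, 12, 3],
    'pedestriancount': [900, 1100, 1000, 10],
})

naive = toy['pedestriancount'].mean() * 24   # dominated by the noon samples
balanced = hour_balanced_daily(toy).loc[1]   # noon and 3am weighted equally
print(naive, balanced)
```

In the toy data the naive estimate is pulled up by the over-sampled busy hour, while the balanced estimate weights the quiet hour equally. Neither is exact when most hours are unobserved.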


In [22]:
# Merge with sensor location data
traffic_with_location = daily_traffic.merge(
    active_sensors[['location_id', 'sensor_description', 'latitude', 'longitude', 'location_type']], 
    on='location_id', 
    how='inner'
)

print(f" Traffic data merged with location data: {len(traffic_with_location)} sensors")

# Show traffic distribution
print(f"\nTraffic distribution:")
print(f"Min daily traffic: {traffic_with_location['avg_daily_traffic'].min():.0f}")
print(f"Max daily traffic: {traffic_with_location['avg_daily_traffic'].max():.0f}")
print(f"Median daily traffic: {traffic_with_location['avg_daily_traffic'].median():.0f}")
print(f"Mean daily traffic: {traffic_with_location['avg_daily_traffic'].mean():.0f}")

 Traffic data merged with location data: 97 sensors

Traffic distribution:
Min daily traffic: 146
Max daily traffic: 35474
Median daily traffic: 6027
Mean daily traffic: 8858
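
The merge above returns 97 rows from 95 aggregated sensors, which hints that some `location_id` values may appear more than once in the sensor table. A hedged sketch of how to surface this on toy frames: pandas' `merge` accepts a `validate` argument that raises `MergeError` when the declared key relationship does not hold. Keeping the first row per `location_id` is one possible fix and is an assumption here, not the notebook's method.

```python
import pandas as pd

traffic = pd.DataFrame({'location_id': [1, 2], 'avg_daily_traffic': [5000.0, 8000.0]})
# Toy sensor table in which location_id 2 is listed twice
sensors = pd.DataFrame({
    'location_id': [1, 2, 2],
    'sensor_description': ['A St', 'B St', 'B St (new)'],
})

# List keys that appear more than once on the lookup side
dup_ids = sensors.loc[sensors['location_id'].duplicated(keep=False), 'location_id'].unique()
print(f"Duplicated location_id values: {dup_ids.tolist()}")

try:
    traffic.merge(sensors, on='location_id', how='inner', validate='many_to_one')
    merge_ok = True
except pd.errors.MergeError:
    merge_ok = False  # the right-hand keys are not unique

# One possible fix (an assumption): keep the first row per location_id
sensors_unique = sensors.drop_duplicates(subset='location_id', keep='first')
merged = traffic.merge(sensors_unique, on='location_id', how='inner', validate='many_to_one')
print(len(merged))
```

Without a check like this, a duplicated sensor would be counted as two hotspots and would shift the quantile threshold computed later.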


### 6.2 Identify Top Tourist Hotspots

In [24]:
print("\n6.2 Identifying top tourist hotspots...")

# Define hotspots as top 25% by average daily traffic
hotspot_threshold = traffic_with_location['avg_daily_traffic'].quantile(0.75)
hotspots = traffic_with_location[
    traffic_with_location['avg_daily_traffic'] >= hotspot_threshold
].copy()

hotspots = hotspots.sort_values('avg_daily_traffic', ascending=False)
print(f" Identified {len(hotspots)} tourist hotspots (top 25% by foot traffic)")
print(f" Hotspot threshold: {hotspot_threshold:.0f} average daily pedestrians")

# Show the range of hotspot traffic
print(f"\nHotspot traffic range:")
print(f"Busiest hotspot: {hotspots['avg_daily_traffic'].max():.0f} daily pedestrians")
print(f"Least busy hotspot: {hotspots['avg_daily_traffic'].min():.0f} daily pedestrians")


6.2 Identifying top tourist hotspots...
 Identified 25 tourist hotspots (top 25% by foot traffic)
 Hotspot threshold: 12170 average daily pedestrians

Hotspot traffic range:
Busiest hotspot: 35474 daily pedestrians
Least busy hotspot: 12170 daily pedestrians
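
To make the threshold rule concrete, here is a toy illustration of selecting the top quartile with `quantile(0.75)`. The traffic values are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'sensor': list('ABCDEFGH'),
    'avg_daily_traffic': [100, 200, 300, 400, 500, 600, 700, 800],
})

# quantile() interpolates linearly between neighbouring values by default
threshold = toy['avg_daily_traffic'].quantile(0.75)
top = toy[toy['avg_daily_traffic'] >= threshold]
print(threshold, list(top['sensor']))
```

Because the threshold is interpolated, the number of rows at or above it can differ slightly from exactly 25% of the sensors.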


In [25]:
# Display top 15 hotspots for better overview
print("\nTop 15 Tourist Hotspots:")
print("Rank  Location                                    Daily Traffic  Type")
print("-" * 70)

top_15_display = hotspots[['sensor_description', 'avg_daily_traffic', 'location_type']].head(15)
for rank, (idx, row) in enumerate(top_15_display.iterrows(), 1):
    location = row['sensor_description'][:35] + "..." if len(row['sensor_description']) > 35 else row['sensor_description']
    print(f"{rank:2d}.   {location:<40} {row['avg_daily_traffic']:>8.0f}    {row['location_type']}")


Top 15 Tourist Hotspots:
Rank  Location                                    Daily Traffic  Type
----------------------------------------------------------------------
 1.   Southbank                                   35474    Outdoor
 2.   Flinders La-Swanston St (West)              34862    Outdoor
 3.   Elizabeth St - Flinders St (East) -...      30911    Outdoor
 4.   State Library - New                         28191    Outdoor
 5.   Town Hall (West)                            25819    Outdoor
 6.   Melbourne Central                           22817    Outdoor
 7.   Flinders Street Station Underpass           22565    Outdoor
 8.   Princes Bridge                              22398    Outdoor
 9.   Melbourne Central-Elizabeth St (Eas...      22109    Outdoor
10.   Spencer St-Collins St (North)               21668    Outdoor
11.   Bourke Street Mall (North)                  21527    Outdoor
12.   Little Collins St-Swanston St (East...      18003    Outdoor
13.   The Arts Centre        

## Section 7: Analyze Hospitality Capacity Near Hotspots

We've identified 25 major tourist hotspots, each with at least 12,170 daily pedestrians. The top locations include:
- **Southbank** (35,474 daily) - Major tourist waterfront area
- **Flinders La-Swanston St (West)** (34,862 daily) - Busy CBD intersection
- **State Library** (28,191 daily) - Cultural landmark
- **Town Hall & Melbourne Central** (25,819 & 22,817 daily) - Shopping/civic areas

Next, we analyse hospitality capacity within a 500m radius of these hotspots. The radius is a straight-line (Haversine) distance, so it approximates walking distance rather than measuring it.

### 7.1 Define Distance Calculation Function

In [27]:
def calculate_distance(lat1, lon1, lat2, lon2):
    """
    Calculate distance between two points using Haversine formula
    Returns distance in meters
    """
    from math import radians, cos, sin, asin, sqrt
    
    # Convert to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    
    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    return 2 * asin(sqrt(a)) * 6371000  # Earth radius in meters

def find_nearby_venues(hotspot_lat, hotspot_lon, venues_df, radius_meters=500):
    """
    Find hospitality venues within specified radius of a hotspot
    """
    distances = []
    for _, venue in venues_df.iterrows():
        dist = calculate_distance(hotspot_lat, hotspot_lon, venue['latitude'], venue['longitude'])
        if dist <= radius_meters:
            distances.append({
                'venue_index': venue.name,
                'distance': dist,
                'trading_name': venue['trading_name'],
                'seating_type': venue['seating_type'],
                'number_of_seats': venue['number_of_seats'],
                'industry_type': venue['industry_anzsic4_description']
            })
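    # Name the columns explicitly so an empty result still has them
    # (otherwise later lookups such as nearby_venues['seating_type'] raise KeyError)
    columns = ['venue_index', 'distance', 'trading_name', 'seating_type', 'number_of_seats', 'industry_type']
    return pd.DataFrame(distances, columns=columns)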

print(" Distance calculation functions defined")
print(" Using 500m radius for 'walking distance' to venues")

 Distance calculation functions defined
 Using 500m radius for 'walking distance' to venues
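
`find_nearby_venues` loops over every venue row with `iterrows`, which can be slow for tens of thousands of rows and 25 hotspots. As an optional alternative, the sketch below applies the same Haversine formula to NumPy arrays. `haversine_m` and `nearby_mask` are hypothetical names, and the coordinates are illustrative.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000  # same Earth radius as calculate_distance


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; lat2/lon2 may be arrays (hypothetical helper)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def nearby_mask(hotspot_lat, hotspot_lon, venues_df, radius_meters=500):
    """Boolean mask of venues within radius_meters of the hotspot (hypothetical helper)."""
    dist = haversine_m(hotspot_lat, hotspot_lon,
                       venues_df['latitude'].to_numpy(),
                       venues_df['longitude'].to_numpy())
    return dist <= radius_meters


# Toy venues: one at the hotspot, one about 111 m north, one about 1.1 km north
venues = pd.DataFrame({
    'trading_name': ['At hotspot', 'Close', 'Far'],
    'latitude': [-37.8200, -37.8190, -37.8100],
    'longitude': [144.9650, 144.9650, 144.9650],
    'number_of_seats': [20, 40, 60],
})
mask = nearby_mask(-37.8200, 144.9650, venues, radius_meters=500)
print(venues.loc[mask, 'trading_name'].tolist())
```

The mask can be used to filter the venue table directly, for example `venues.loc[mask, 'number_of_seats'].sum()`, which avoids building a list of dictionaries for each hotspot.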


### 7.2 Calculate Hospitality Capacity for Each Hotspot

This analysis will show which of our major tourist hotspots (12,170+ daily visitors) have sufficient nearby seating capacity.

In [29]:
print("\n ANALYZING HOSPITALITY CAPACITY NEAR HOTSPOTS")
print("-" * 50)

radius_meters = 500  # 500m radius for nearby venues
hotspot_analysis = []

print(f"Analyzing capacity within {radius_meters}m of {len(hotspots)} major hotspots...")

# Process each hotspot (use len(hotspots) rather than a hardcoded 25)
for idx, (_, hotspot) in enumerate(hotspots.iterrows(), 1):
    print(f"Processing {idx}/{len(hotspots)}: {hotspot['sensor_description'][:40]}...")
    
    nearby_venues = find_nearby_venues(
        hotspot['latitude'], 
        hotspot['longitude'], 
        main_hospitality, 
        radius_meters
    )
    
    # Calculate capacity metrics
    total_venues = len(nearby_venues)
    total_seats = nearby_venues['number_of_seats'].sum() if total_venues > 0 else 0
    indoor_seats = nearby_venues[nearby_venues['seating_type'] == 'Seats - Indoor']['number_of_seats'].sum()
    outdoor_seats = nearby_venues[nearby_venues['seating_type'] == 'Seats - Outdoor']['number_of_seats'].sum()
    cafes_restaurants = len(nearby_venues[nearby_venues['industry_type'] == 'Cafes and Restaurants'])
    pubs_bars = len(nearby_venues[nearby_venues['industry_type'] == 'Pubs, Taverns and Bars'])
    
    # Key metric: seats per 1000 daily visitors
    seats_per_1000 = (total_seats / hotspot['avg_daily_traffic'] * 1000) if hotspot['avg_daily_traffic'] > 0 else 0
    
    hotspot_analysis.append({
        'rank': idx,
        'location_id': hotspot['location_id'],
        'sensor_name': hotspot['sensor_name'],
        'sensor_description': hotspot['sensor_description'],
        'avg_daily_traffic': hotspot['avg_daily_traffic'],
        'latitude': hotspot['latitude'],
        'longitude': hotspot['longitude'],
        'location_type': hotspot['location_type'],
        'nearby_venues': total_venues,
        'total_seats': total_seats,
        'indoor_seats': indoor_seats,
        'outdoor_seats': outdoor_seats,
        'cafes_restaurants': cafes_restaurants,
        'pubs_bars': pubs_bars,
        'seats_per_1000_visitors': seats_per_1000
    })

hotspot_capacity_df = pd.DataFrame(hotspot_analysis)
print(f" Analyzed hospitality capacity for all {len(hotspot_capacity_df)} major tourist hotspots")


 ANALYZING HOSPITALITY CAPACITY NEAR HOTSPOTS
--------------------------------------------------
Analyzing capacity within 500m of 25 major hotspots...
Processing 1/25: Southbank...
Processing 2/25: Flinders La-Swanston St (West)...
Processing 3/25: Elizabeth St - Flinders St (East) - New ...
Processing 4/25: State Library - New...
Processing 5/25: Town Hall (West)...
Processing 6/25: Melbourne Central...
Processing 7/25: Flinders Street Station Underpass...
Processing 8/25: Princes Bridge...
Processing 9/25: Melbourne Central-Elizabeth St (East)...
Processing 10/25: Spencer St-Collins St (North)...
Processing 11/25: Bourke Street Mall (North)...
Processing 12/25: Little Collins St-Swanston St (East)...
Processing 13/25: The Arts Centre...
Processing 14/25: Building 80 RMIT...
Processing 15/25: Swanston St - City Square...
Processing 16/25: Melbourne Convention Exhibition Centre...
Processing 17/25: Bourke St - Spencer St (North)...
Processing 18/25: I-Hub Southern Cross Station - Lon

In [30]:
# Show summary statistics
print("\n CAPACITY ANALYSIS SUMMARY")
print("-" * 50)

print(f"Average venues per hotspot: {hotspot_capacity_df['nearby_venues'].mean():.1f}")
print(f"Average seats per hotspot: {hotspot_capacity_df['total_seats'].mean():.0f}")
print(f"Average seats per 1000 visitors: {hotspot_capacity_df['seats_per_1000_visitors'].mean():.1f}")

print(f"\nCapacity range:")
print(f"Most venues nearby: {hotspot_capacity_df['nearby_venues'].max()} venues")
print(f"Fewest venues nearby: {hotspot_capacity_df['nearby_venues'].min()} venues")
print(f"Most seats available: {hotspot_capacity_df['total_seats'].max():,} seats")
print(f"Fewest seats available: {hotspot_capacity_df['total_seats'].min()} seats")

print(f"\nSeats per 1000 visitors range:")
print(f"Best capacity ratio: {hotspot_capacity_df['seats_per_1000_visitors'].max():.1f}")
print(f"Worst capacity ratio: {hotspot_capacity_df['seats_per_1000_visitors'].min():.1f}")


 CAPACITY ANALYSIS SUMMARY
--------------------------------------------------
Average venues per hotspot: 6472.3
Average seats per hotspot: 398178
Average seats per 1000 visitors: 21727.5

Capacity range:
Most venues nearby: 11252 venues
Fewest venues nearby: 1408 venues
Most seats available: 676,173 seats
Fewest seats available: 135382 seats

Seats per 1000 visitors range:
Best capacity ratio: 54033.8
Worst capacity ratio: 6625.7
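
The key metric, seats per 1,000 daily pedestrians, is `total_seats / avg_daily_traffic * 1000`. Read literally, the average of 21,727.5 above implies about 21.7 seats for every pedestrian counted per day, which is better treated as a prompt to re-check the inputs than as evidence of surplus capacity. A small worked example follows; the `seats_per_1000` function mirrors the notebook's formula and the toy inputs are made up.

```python
def seats_per_1000(total_seats, avg_daily_traffic):
    """Seats per 1,000 daily pedestrians; 0 when there is no traffic."""
    if avg_daily_traffic <= 0:
        return 0
    return total_seats / avg_daily_traffic * 1000


# Toy hotspot: 500 seats nearby, 10,000 pedestrians per day
toy_ratio = seats_per_1000(500, 10000)

# Convert the reported average ratio into seats per single pedestrian
seats_per_pedestrian = 21727.5 / 1000

print(round(toy_ratio, 1), round(seats_per_pedestrian, 1))
```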


In [31]:
print(" DEBUGGING CAPACITY ANALYSIS")
print("-" * 50)

# Check a few specific examples
print("Examining top 3 hotspots in detail:\n")

for i in range(3):
    hotspot = hotspots.iloc[i]
    print(f"{i+1}. {hotspot['sensor_description']}")
    print(f"   Location: ({hotspot['latitude']:.6f}, {hotspot['longitude']:.6f})")
    print(f"   Daily traffic: {hotspot['avg_daily_traffic']:.0f}")
    
    # Test with a much smaller radius first
    nearby_100m = find_nearby_venues(hotspot['latitude'], hotspot['longitude'], main_hospitality, 100)
    nearby_200m = find_nearby_venues(hotspot['latitude'], hotspot['longitude'], main_hospitality, 200)
    nearby_500m = find_nearby_venues(hotspot['latitude'], hotspot['longitude'], main_hospitality, 500)
    
    print(f"   Venues within 100m: {len(nearby_100m)}")
    print(f"   Venues within 200m: {len(nearby_200m)}")
    print(f"   Venues within 500m: {len(nearby_500m)}")
    print()

# Check the coordinate ranges of our datasets
print("Coordinate ranges comparison:")
print(f"Hotspots lat range: {hotspots['latitude'].min():.6f} to {hotspots['latitude'].max():.6f}")
print(f"Hotspots lon range: {hotspots['longitude'].min():.6f} to {hotspots['longitude'].max():.6f}")
print(f"Venues lat range: {main_hospitality['latitude'].min():.6f} to {main_hospitality['latitude'].max():.6f}")
print(f"Venues lon range: {main_hospitality['longitude'].min():.6f} to {main_hospitality['longitude'].max():.6f}")

 DEBUGGING CAPACITY ANALYSIS
--------------------------------------------------
Examining top 3 hotspots in detail:

1. Southbank
   Location: (-37.820187, 144.965085)
   Daily traffic: 35474
   Venues within 100m: 8
   Venues within 200m: 827
   Venues within 500m: 3636

2. Flinders La-Swanston St (West)
   Location: (-37.816686, 144.966897)
   Daily traffic: 34862
   Venues within 100m: 324
   Venues within 200m: 1208
   Venues within 500m: 7189

3. Elizabeth St - Flinders St (East) - New footpath
   Location: (-37.817980, 144.965034)
   Daily traffic: 30911
   Venues within 100m: 311
   Venues within 200m: 1325
   Venues within 500m: 6626

Coordinate ranges comparison:
Hotspots lat range: -37.824018 to -37.807404
Hotspots lon range: 144.951026 to 144.968793
Venues lat range: -37.849719 to -37.776195
Venues lon range: 144.904169 to 144.988262
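
Before attributing the counts above entirely to venue density, it may be worth checking whether the hospitality table lists the same venue more than once. The column list in Section 4.3 includes `census_year` and `seating_type`, so one business could plausibly appear once per census year and once per seating type. That is an assumption to verify against the data, not a confirmed finding. A minimal sketch on made-up rows, where `latest_venue_seats` is a hypothetical helper:

```python
import pandas as pd


def latest_venue_seats(df):
    """Keep only the most recent census year, then total seats per venue.

    Assumes (property_id, trading_name) identifies a venue (hypothetical helper).
    """
    latest = df[df['census_year'] == df['census_year'].max()]
    return (latest
            .groupby(['property_id', 'trading_name'], as_index=False)
            .agg(number_of_seats=('number_of_seats', 'sum'),
                 latitude=('latitude', 'first'),
                 longitude=('longitude', 'first')))


# Toy rows: one cafe recorded in two years, with indoor and outdoor seating
toy = pd.DataFrame({
    'census_year': [2016, 2016, 2017, 2017],
    'property_id': [101, 101, 101, 101],
    'trading_name': ['Toy Cafe'] * 4,
    'seating_type': ['Seats - Indoor', 'Seats - Outdoor'] * 2,
    'number_of_seats': [30, 10, 32, 12],
    'latitude': [-37.82] * 4,
    'longitude': [144.965] * 4,
})
venues = latest_venue_seats(toy)
print(len(toy), len(venues), venues['number_of_seats'].iloc[0])
```

If the real table behaves like the toy one, counting rows would overstate both the number of venues and the seat totals near each hotspot.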


## Section 8: Refined Capacity Analysis (200m Radius)

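The debugging output shows that a 500m radius matches thousands of hospitality records per hotspot (3,636 to 7,189 for the top three), and the resulting ratios of more than 6,000 seats per 1,000 pedestrians are too large to be informative. Melbourne's CBD is densely packed with venues, so a radius this wide makes the analysis less meaningful.

We therefore use a **200m radius** (roughly a 2-3 minute walk) for a more realistic "immediate vicinity" analysis.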

### 8.1 Recalculate with Realistic Walking Distance

In [33]:
print("\n REFINED HOSPITALITY CAPACITY ANALYSIS")
print("-" * 50)
print("Using 200m radius for 'immediate walking distance'")

radius_meters = 200  # More realistic for immediate vicinity
refined_hotspot_analysis = []

print(f"\nAnalyzing capacity within {radius_meters}m of {len(hotspots)} major hotspots...")

# Process each hotspot with refined radius (use len(hotspots) rather than a hardcoded 25)
for idx, (_, hotspot) in enumerate(hotspots.iterrows(), 1):
    print(f"Processing {idx}/{len(hotspots)}: {hotspot['sensor_description'][:40]}...")
    
    nearby_venues = find_nearby_venues(
        hotspot['latitude'], 
        hotspot['longitude'], 
        main_hospitality, 
        radius_meters
    )
    
    # Calculate capacity metrics
    total_venues = len(nearby_venues)
    total_seats = nearby_venues['number_of_seats'].sum() if total_venues > 0 else 0
    indoor_seats = nearby_venues[nearby_venues['seating_type'] == 'Seats - Indoor']['number_of_seats'].sum()
    outdoor_seats = nearby_venues[nearby_venues['seating_type'] == 'Seats - Outdoor']['number_of_seats'].sum()
    cafes_restaurants = len(nearby_venues[nearby_venues['industry_type'] == 'Cafes and Restaurants'])
    pubs_bars = len(nearby_venues[nearby_venues['industry_type'] == 'Pubs, Taverns and Bars'])
    
    # Key metrics
    seats_per_1000 = (total_seats / hotspot['avg_daily_traffic'] * 1000) if hotspot['avg_daily_traffic'] > 0 else 0
    
    refined_hotspot_analysis.append({
        'rank': idx,
        'location_id': hotspot['location_id'],
        'sensor_description': hotspot['sensor_description'],
        'avg_daily_traffic': hotspot['avg_daily_traffic'],
        'latitude': hotspot['latitude'],
        'longitude': hotspot['longitude'],
        'nearby_venues': total_venues,
        'total_seats': total_seats,
        'indoor_seats': indoor_seats,
        'outdoor_seats': outdoor_seats,
        'cafes_restaurants': cafes_restaurants,
        'pubs_bars': pubs_bars,
        'seats_per_1000_visitors': seats_per_1000
    })

refined_capacity_df = pd.DataFrame(refined_hotspot_analysis)
print(f"\n REFINED ANALYSIS COMPLETE!")
print(f" Analyzed immediate vicinity capacity for all {len(refined_capacity_df)} hotspots")


 REFINED HOSPITALITY CAPACITY ANALYSIS
--------------------------------------------------
Using 200m radius for 'immediate walking distance'

Analyzing capacity within 200m of 25 major hotspots...
Processing 1/25: Southbank...
Processing 2/25: Flinders La-Swanston St (West)...
Processing 3/25: Elizabeth St - Flinders St (East) - New ...
Processing 4/25: State Library - New...
Processing 5/25: Town Hall (West)...
Processing 6/25: Melbourne Central...
Processing 7/25: Flinders Street Station Underpass...
Processing 8/25: Princes Bridge...
Processing 9/25: Melbourne Central-Elizabeth St (East)...
Processing 10/25: Spencer St-Collins St (North)...
Processing 11/25: Bourke Street Mall (North)...
Processing 12/25: Little Collins St-Swanston St (East)...
Processing 13/25: The Arts Centre...
Processing 14/25: Building 80 RMIT...
Processing 15/25: Swanston St - City Square...
Processing 16/25: Melbourne Convention Exhibition Centre...
Processing 17/25: Bourke St - Spencer St (North)...
Process

In [34]:
# Show refined summary statistics
print("\n REFINED CAPACITY ANALYSIS (200m radius)")
print("-" * 50)

print(f"Average venues per hotspot: {refined_capacity_df['nearby_venues'].mean():.1f}")
print(f"Average seats per hotspot: {refined_capacity_df['total_seats'].mean():.0f}")
print(f"Average seats per 1000 visitors: {refined_capacity_df['seats_per_1000_visitors'].mean():.1f}")

print(f"\nCapacity range:")
print(f"Most venues nearby: {refined_capacity_df['nearby_venues'].max()} venues")
print(f"Fewest venues nearby: {refined_capacity_df['nearby_venues'].min()} venues")
print(f"Most seats available: {refined_capacity_df['total_seats'].max():,} seats")
print(f"Fewest seats available: {refined_capacity_df['total_seats'].min()} seats")

print(f"\nSeats per 1000 visitors:")
print(f"Best capacity ratio: {refined_capacity_df['seats_per_1000_visitors'].max():.1f}")
print(f"Worst capacity ratio: {refined_capacity_df['seats_per_1000_visitors'].min():.1f}")
print(f"Median capacity ratio: {refined_capacity_df['seats_per_1000_visitors'].median():.1f}")


 REFINED CAPACITY ANALYSIS (200m radius)
--------------------------------------------------
Average venues per hotspot: 1147.0
Average seats per hotspot: 69994
Average seats per 1000 visitors: 3865.9

Capacity range:
Most venues nearby: 2676 venues
Fewest venues nearby: 9 venues
Most seats available: 190,593 seats
Fewest seats available: 2512 seats

Seats per 1000 visitors:
Best capacity ratio: 15452.4
Worst capacity ratio: 157.5
Median capacity ratio: 2694.5
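
The venue figures above come from `len(nearby_venues)`, which counts rows rather than distinct venues. If the seating dataset stores one row per venue, seating type and census year (an assumption about its layout, not something confirmed here), the same venue is counted several times and seat totals are summed across years. The sketch below uses a small made-up sample with hypothetical column names `census_year` and `trading_name` to show how restricting to the latest year and counting distinct names changes the numbers:

```python
import pandas as pd

# Hypothetical sample: one row per venue, seating type and census year.
# Column names 'census_year' and 'trading_name' are assumptions.
nearby_venues = pd.DataFrame({
    "census_year": [2021, 2021, 2022, 2022, 2022],
    "trading_name": ["Cafe A", "Cafe A", "Cafe A", "Cafe A", "Bar B"],
    "seating_type": ["Seats - Indoor", "Seats - Outdoor",
                     "Seats - Indoor", "Seats - Outdoor", "Seats - Indoor"],
    "number_of_seats": [40, 10, 45, 12, 80],
})

# Keep only the most recent census year so each venue is measured once.
latest_year = nearby_venues["census_year"].max()
latest = nearby_venues[nearby_venues["census_year"] == latest_year]

row_count = len(nearby_venues)                    # what len() reports
unique_venues = latest["trading_name"].nunique()  # distinct venues, latest year
latest_seats = int(latest["number_of_seats"].sum())

print(f"Rows: {row_count}, distinct venues: {unique_venues}, seats: {latest_seats}")
```

In this sample, five rows reduce to two venues. Whether the same applies to `main_hospitality` depends on the columns it actually contains.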


### 8.2 Identify Capacity Gaps

Next, rank the hotspots by seats per 1,000 daily visitors to identify those with the least seating within 200m relative to their foot traffic.

In [36]:
print("\n IDENTIFYING CAPACITY GAPS")
print("-" * 50)

# Sort by seats per 1000 visitors to identify gaps
capacity_sorted = refined_capacity_df.sort_values('seats_per_1000_visitors').reset_index(drop=True)

# Define capacity categories based on seats per 1000 visitors
def categorize_capacity(seats_per_1000):
    if seats_per_1000 < 10:
        return "Critical Gap"
    elif seats_per_1000 < 25:
        return "Low Capacity"  
    elif seats_per_1000 < 50:
        return "Moderate Capacity"
    else:
        return "High Capacity"

refined_capacity_df['capacity_category'] = refined_capacity_df['seats_per_1000_visitors'].apply(categorize_capacity)

# Show capacity distribution
print("Capacity Categories:")
category_counts = refined_capacity_df['capacity_category'].value_counts()
for category, count in category_counts.items():
    print(f"  {category}: {count} hotspots")

print(f"\n TOP 10 CAPACITY GAPS (Lowest Seats per 1000 visitors):")
print("Rank  Location                                Daily Traffic  Seats  Ratio")
print("-" * 75)

worst_capacity = capacity_sorted.head(10)
for i, row in worst_capacity.iterrows():
    location = row['sensor_description'][:35] + "..." if len(row['sensor_description']) > 35 else row['sensor_description']
    print(f"{i+1:2d}.   {location:<38} {row['avg_daily_traffic']:>6.0f}   {row['total_seats']:>5.0f}   {row['seats_per_1000_visitors']:>5.1f}")


 IDENTIFYING CAPACITY GAPS
--------------------------------------------------
Capacity Categories:
  High Capacity: 25 hotspots

 TOP 10 CAPACITY GAPS (Lowest Seats per 1000 visitors):
Rank  Location                                Daily Traffic  Seats  Ratio
---------------------------------------------------------------------------
 1.   Melbourne Convention Exhibition Cen...  15948    2512   157.5
 2.   Princes Bridge                          22398   15333   684.6
 3.   I-Hub Southern Cross Station - Lons...  15663   12462   795.6
 4.   Spencer St-Collins St (North)           21668   19939   920.2
 5.   The Arts Centre                         17140   22343   1303.5
 6.   Bourke St - Spencer St (North)          15910   24587   1545.3
 7.   Southern Cross Station                  13071   21942   1678.7
 8.   Flinders La-Swanston St (West)          34862   61136   1753.7
 9.   Building 80 RMIT                        17030   30711   1803.3
10.   Southbank                               3
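
Every hotspot lands in "High Capacity" because the fixed thresholds (10, 25, 50) sit far below the observed ratios. One possible alternative, shown here only as a sketch, is to rank hotspots relative to each other with quartiles via `pd.qcut`. The example uses the nine complete ratios from the table above, so the resulting groups are illustrative. In the notebook the same call would take `refined_capacity_df['seats_per_1000_visitors']`.

```python
import pandas as pd

# Ratios copied from the nine complete rows of the table above.
ratios = pd.Series(
    [157.5, 684.6, 795.6, 920.2, 1303.5, 1545.3, 1678.7, 1753.7, 1803.3],
    name="seats_per_1000_visitors",
)

# Quartile-based labels: the lowest quarter of ratios is flagged as the gap group.
labels = ["Critical Gap", "Low Capacity", "Moderate Capacity", "High Capacity"]
relative_category = pd.qcut(ratios, q=4, labels=labels)

print(pd.DataFrame({"ratio": ratios, "category": relative_category}))
```

Quartiles always yield a bottom group, so they indicate relative priority rather than an absolute shortage.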

In [37]:
print("\n DETAILED ANALYSIS OF CRITICAL GAPS")
print("-" * 50)

# Focus on critical gaps (lowest capacity ratios)
critical_gaps = refined_capacity_df[refined_capacity_df['capacity_category'] == 'Critical Gap'].copy()

if len(critical_gaps) > 0:
    print(f"Found {len(critical_gaps)} hotspots with critical capacity gaps:")
    print()
    
    for i, (_, gap) in enumerate(critical_gaps.iterrows(), 1):
        print(f"{i}. {gap['sensor_description']}")
        print(f"    Daily foot traffic: {gap['avg_daily_traffic']:,.0f} people")
        print(f"    Nearby venues (200m): {gap['nearby_venues']}")
        print(f"    Total seats available: {gap['total_seats']:,}")
        print(f"    Capacity ratio: {gap['seats_per_1000_visitors']:.0f} seats per 1000 visitors")
        print(f"    Venue breakdown: {gap['cafes_restaurants']} cafes/restaurants, {gap['pubs_bars']} pubs/bars")
        print(f"    Indoor/Outdoor: {gap['indoor_seats']:,} indoor, {gap['outdoor_seats']:,} outdoor")
        print()
else:
    print("No hotspots with critical capacity gaps found.")

# Also analyze the worst 3 regardless of category
print(f"\n  WORST 3 CAPACITY RATIOS:")
print("-" * 40)
worst_3 = refined_capacity_df.nsmallest(3, 'seats_per_1000_visitors')
for i, (_, row) in enumerate(worst_3.iterrows(), 1):
    print(f"{i}. {row['sensor_description']}")
    print(f"   • {row['avg_daily_traffic']:,.0f} daily visitors")
    print(f"   • {row['total_seats']:,} seats within 200m")
    print(f"   • {row['seats_per_1000_visitors']:.0f} seats per 1000 visitors")
    print(f"   • {row['nearby_venues']} nearby venues")
    print()


 DETAILED ANALYSIS OF CRITICAL GAPS
--------------------------------------------------
No hotspots with critical capacity gaps found.

  WORST 3 CAPACITY RATIOS:
----------------------------------------
1. Melbourne Convention Exhibition Centre
   • 15,948 daily visitors
   • 2,512 seats within 200m
   • 158 seats per 1000 visitors
   • 9 nearby venues

2. Princes Bridge
   • 22,398 daily visitors
   • 15,333 seats within 200m
   • 685 seats per 1000 visitors
   • 155 nearby venues

3. I-Hub Southern Cross Station - Lonsdale Street Entrance - South
   • 15,663 daily visitors
   • 12,462 seats within 200m
   • 796 seats per 1000 visitors
   • 266 nearby venues
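
As a quick check on how the ratio is derived, the snippet below recomputes seats per 1,000 visitors for the three hotspots above from their reported seat totals and average daily traffic. The helper name `seats_per_1000` is introduced here for illustration. It mirrors the expression used in the analysis loop.

```python
def seats_per_1000(total_seats, avg_daily_traffic):
    """Seats within the radius per 1,000 average daily pedestrians."""
    if avg_daily_traffic <= 0:
        return 0
    return total_seats / avg_daily_traffic * 1000

# Seat totals and unrounded traffic figures reported for the three lowest ratios.
worst_three = {
    "Melbourne Convention Exhibition Centre": (2512, 15947.721577726219),
    "Princes Bridge": (15333, 22397.614395886892),
    "I-Hub Southern Cross Station": (12462, 15662.757446808511),
}

for name, (seats, traffic) in worst_three.items():
    print(f"{name}: {seats_per_1000(seats, traffic):.1f}")
```

The results agree with the 157.5, 684.6 and 795.6 shown in the capacity gap table.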

