<div style="background-color: #b8deff; padding: 10px; border-radius: 5px; font-family: Aptos; ">
    <h1><b> Cultural Tourism Route Optimization </b></h1>


<ul>
    <li><b>Authored by:</b> Uvini Wijesinghe</li>
    <li><b>Duration:</b> 10 Weeks</li>
    <li><b>Level:</b> Intermediate</li>
    <li><b>Pre-requisite Skills:</b> Python</li>
</ul>

</div>

<div style="font-family: Aptos; font-size: 16px;">
Creating optimized cultural tourism routes in Melbourne involves integrating data from multiple sources: public memorials, sculptures, artworks, fountains, monuments, and landmarks, along with key transport infrastructure such as City Circle tram stops and Melbourne Visitor Shuttle bus stops. By analyzing pedestrian movement patterns, we aim to design routes that maximize visitor engagement, guiding visitors through high-interest cultural sites while keeping the routes accessible and efficient. <br>

<br><b>User Story</b>

<ul>
    <li><b>Title:</b> Optimized Cultural Tourism Routes in Melbourne</li>
    <li><b>As a:</b> Tourism Planner/City Developer </li>
    <li><b>I want to:</b> Integrate data from cultural landmarks, transport infrastructure (City Circle tram stops and Melbourne Visitor Shuttle bus stops), and pedestrian movement patterns to create optimized cultural tourism routes.</li>
    <li><b>So that:</b> Visitors can experience a diverse range of cultural sites efficiently, while being guided through high-traffic pedestrian zones and accessible transport hubs to maximize engagement with public artworks, fountains, and monuments.</li>
</ul>

<br><b>Acceptance Criteria:</b>
1. All relevant public memorials, sculptures, artworks, fountains, monuments, and landmarks in Melbourne must be identified, mapped, and included in the dataset.
2. Data for City Circle tram stops, Melbourne Visitor Shuttle bus stops, and pedestrian pathways must be included to ensure routes are accessible via public transport.
3. High-footfall areas must be identified from pedestrian counting data, so that the most popular areas can be determined and routes adjusted accordingly to optimize visitor engagement.
4. Optimized routes should guide visitors through high-interest cultural sites while ensuring accessibility to transport hubs and high pedestrian traffic zones.
5. Routes should cover the highest number of cultural landmarks while maintaining a smooth, logical flow for visitors.
6. The system should provide suggestions for areas where new cultural landmarks, public artworks, or monuments could be developed to encourage visitor traffic in underutilized spaces.
7. The final solution should have a user-friendly interface for tourists, displaying routes, landmarks, and transport stops in a clear and interactive map format.

</div>
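
Acceptance criteria 4 and 5 ask for routes that cover many landmarks in a smooth, logical order. As a minimal baseline (an assumption for illustration, not necessarily the optimization method this project adopts), sites can be ordered greedily: start at one site and repeatedly walk to the nearest unvisited one, measuring great-circle (haversine) distance. The helper names `haversine_km` and `nearest_neighbour_route` are hypothetical, and the four coordinates are station locations from the GTFS stops output in this notebook, used here only as stand-ins for landmarks.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_neighbour_route(points, start=0):
    """Greedy ordering: from the current site, always walk to the closest unvisited site.

    points: list of (name, lat, lon) tuples. Returns the names in visiting order.
    """
    unvisited = list(range(len(points)))
    order = [unvisited.pop(start)]
    while unvisited:
        _, lat, lon = points[order[-1]]
        nxt = min(unvisited, key=lambda i: haversine_km(lat, lon, points[i][1], points[i][2]))
        unvisited.remove(nxt)
        order.append(nxt)
    return [points[i][0] for i in order]


# Stand-in coordinates: station locations from the GTFS stops output in this notebook
sites = [
    ("Flinders Street Station", -37.818307, 144.96601),
    ("Southern Cross Station", -37.818535, 144.952144),
    ("Flagstaff Station", -37.81188, 144.956043),
    ("Melbourne Central Station", -37.809974, 144.962547),
]
route = nearest_neighbour_route(sites, start=0)
print(route)
```

Nearest-neighbour ordering is fast but heuristic: it does not guarantee the shortest tour, and straight-line distance ignores the street network, so a routing engine or a TSP solver would be needed for walking-accurate routes.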

In [14]:
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

import folium
from folium.plugins import MarkerCluster

<div style="background-color: #def0ff; padding: 10px; border-radius: 5px; font-family: Aptos; ">
    <h3><b> 🚂 Train Routes </b></h3>
</div>

#### Train Routes

In [18]:
metro_train_routes = pd.read_csv("Datasets/gtfs/Metro Train/routes.txt", delimiter=",") 

# Extract the line code that follows 'aus:vic:vic-', dropping any trailing colon
metro_train_routes['train_id'] = metro_train_routes['route_id'].str.extract(r'aus:vic:vic-(.*?):?$', expand=False)

metro_train_routes = metro_train_routes[['train_id', 'route_short_name', 'route_long_name']]

metro_train_routes = metro_train_routes.drop_duplicates()

metro_train_routes.head()

|   | train_id | route_short_name | route_long_name |
|---|---|---|---|
| 0 | 02-ALM | Alamein | Alamein - City |
| 1 | 02-BEG | Belgrave | Belgrave - City |
| 2 | 02-CBE | Cranbourne | Cranbourne - City |
| 3 | 02-CCL | City Circle |  |
| 4 | 02-CGB | Craigieburn | Craigieburn - City |
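
The `str.extract` pattern above keeps whatever follows the `aus:vic:vic-` prefix and drops an optional trailing colon. A quick check of that behaviour on a few route IDs; these strings are hypothetical, with their shape inferred from the regex rather than copied from `routes.txt`:

```python
import pandas as pd

# Hypothetical route_id values; the format is assumed from the regex used in this notebook
route_ids = pd.Series(["aus:vic:vic-02-ALM:", "aus:vic:vic-02-CCL:", "aus:vic:vic-02-BEG"])

# Lazy (.*?) capture stops just before an optional trailing colon at the end of the string
train_ids = route_ids.str.extract(r"aus:vic:vic-(.*?):?$", expand=False)
print(train_ids.tolist())
```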


#### Train Stops

In [21]:
metro_train_stops = pd.read_csv("Datasets/gtfs/Metro Train/stops.txt", delimiter=",")

metro_train_stops = metro_train_stops[['stop_id', 'stop_name', 'stop_lat','stop_lon']]

metro_train_stops = metro_train_stops.drop_duplicates()

metro_train_stops['stop_id'] = metro_train_stops['stop_id'].astype(str).str.strip()

metro_train_stops.head()

|   | stop_id | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|
| 0 | 10117 | Jordanville Station | -37.873763 | 145.112473 |
| 1 | 10920 | Flagstaff Station | -37.81188 | 144.956043 |
| 2 | 10921 | Flagstaff Station | -37.811725 | 144.955968 |
| 3 | 10922 | Melbourne Central Station | -37.809974 | 144.962547 |
| 4 | 10923 | Melbourne Central Station | -37.809865 | 144.962516 |
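
In the stops output, Flagstaff Station and Melbourne Central Station each appear under two `stop_id`s (presumably separate platforms or entrances). For a tourist-facing map, one marker per station may read more clearly. A minimal sketch, assuming that averaging the coordinates of a station's stop IDs gives an acceptable single point, using the five rows shown in the output:

```python
import pandas as pd

# Rows copied from the stops output: several stop_ids can share one stop_name
stops = pd.DataFrame({
    "stop_id": ["10117", "10920", "10921", "10922", "10923"],
    "stop_name": ["Jordanville Station", "Flagstaff Station", "Flagstaff Station",
                  "Melbourne Central Station", "Melbourne Central Station"],
    "stop_lat": [-37.873763, -37.81188, -37.811725, -37.809974, -37.809865],
    "stop_lon": [145.112473, 144.956043, 144.955968, 144.962547, 144.962516],
})

# One point per station: average the coordinates of all its stop_ids
stations = stops.groupby("stop_name", as_index=False)[["stop_lat", "stop_lon"]].mean()
print(stations)
```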


#### Train Times

metro_train_times = pd.read_csv("Datasets/gtfs/Metro Train/stop_times.txt", delimiter=",")

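# train_id = first two hyphen-separated tokens of trip_id (e.g. '02-ALM'); expand=False returns a Series
metro_train_times['train_id'] = metro_train_times['trip_id'].str.extract(r'(^[^-]+-[^-]+)', expand=False)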

metro_train_times = metro_train_times[['trip_id', 'train_id', 'stop_id', 'stop_sequence']]

metro_train_times = metro_train_times.drop_duplicates()

metro_train_times['stop_id'] = metro_train_times['stop_id'].astype(str).str.strip()

metro_train_times.head()
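
Here `train_id` is taken from the first two hyphen-separated tokens of `trip_id`. `Series.str.extract` returns a DataFrame by default and a Series only when `expand=False` is passed with a single capture group. The sketch uses a `trip_id` that appears in the filtered stop-times output of this notebook:

```python
import pandas as pd

# A trip_id taken from the filtered stop-times output in this notebook
trip_ids = pd.Series(["02-ALM--16-T5-2801"])

pattern = r"(^[^-]+-[^-]+)"  # first two hyphen-separated tokens, e.g. "02-ALM"
as_frame = trip_ids.str.extract(pattern)                 # default expand=True -> DataFrame
as_series = trip_ids.str.extract(pattern, expand=False)  # one group + expand=False -> Series

print(type(as_frame).__name__, type(as_series).__name__, as_series.iloc[0])
```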

#### Trip IDs with the Highest Stop Count

In [26]:
# For each train_id, find the trip whose stop_sequence reaches the highest value (the trip with the most stops)
highest_seq_per_train = metro_train_times.loc[
    metro_train_times.groupby('train_id')['stop_sequence'].idxmax(),
    ['train_id', 'trip_id', 'stop_sequence']
].rename(columns={'stop_sequence': 'max_sequence'})

# Get unique trip_ids
unique_trip_ids = highest_seq_per_train['trip_id'].unique()

# Filter metro_train_times for those trip_ids (.copy() avoids SettingWithCopyWarning on later edits)
filtered_metro_train_times = metro_train_times[metro_train_times['trip_id'].isin(unique_trip_ids)].copy()
filtered_metro_train_times.head(5)

|   | trip_id | train_id | stop_id | stop_sequence |
|---|---|---|---|---|
| 2539 | 02-ALM--16-T5-2801 | 02-ALM | 11213 | 1 |
| 2540 | 02-ALM--16-T5-2801 | 02-ALM | 22189 | 2 |
| 2541 | 02-ALM--16-T5-2801 | 02-ALM | 12196 | 3 |
| 2542 | 02-ALM--16-T5-2801 | 02-ALM | 12198 | 4 |
| 2543 | 02-ALM--16-T5-2801 | 02-ALM | 12200 | 5 |
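
The cell above keeps, for every line (`train_id`), the single trip with the largest `stop_sequence`, treating it as the most complete stopping pattern for that line. The toy data below (hypothetical trips `A-1`, `A-2` and `B-1`) shows what `groupby(...).idxmax()` selects and how `isin` then recovers every stop of those trips:

```python
import pandas as pd

# Toy stop times: line "A" has two trips, and trip "A-2" serves more stops than "A-1"
times = pd.DataFrame({
    "trip_id": ["A-1", "A-1", "A-2", "A-2", "A-2", "B-1", "B-1"],
    "train_id": ["A", "A", "A", "A", "A", "B", "B"],
    "stop_sequence": [1, 2, 1, 2, 3, 1, 2],
})

# idxmax gives the row label of the largest stop_sequence within each train_id
longest = times.loc[
    times.groupby("train_id")["stop_sequence"].idxmax(),
    ["train_id", "trip_id", "stop_sequence"],
]

# Keep every stop of the selected trips, as the notebook cell does with isin
longest_trip_rows = times[times["trip_id"].isin(longest["trip_id"])]
print(longest)
print(len(longest_trip_rows))
```

If two trips tie on the maximum, `idxmax` returns the first one encountered, so the chosen trip depends on the row order of `stop_times.txt`.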


#### Final Train Stops and Routes Dataset

In [30]:
# Trim spaces and convert stop_id to string for consistency
filtered_metro_train_times['stop_id'] = filtered_metro_train_times['stop_id'].astype(str).str.strip()
metro_train_stops['stop_id'] = metro_train_stops['stop_id'].astype(str).str.strip()

result = filtered_metro_train_times.merge(metro_train_stops, on='stop_id', how='left')
result.head(2)



|   | trip_id | train_id | stop_id | stop_sequence | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|
| 0 | 02-ALM--16-T5-2801 | 02-ALM | 11213 | 1 | Flinders Street Station | -37.818307 | 144.96601 |
| 1 | 02-ALM--16-T5-2801 | 02-ALM | 22189 | 2 | Southern Cross Station | -37.818535 | 144.952144 |
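
A left merge keeps every stop time even when its `stop_id` has no match in `stops.txt`, leaving `stop_name` and the coordinates as NaN; this is also why both `stop_id` columns are normalised to stripped strings before merging. A sketch of how the `indicator` and `validate` options of `DataFrame.merge` can surface such gaps, on toy data where `99999` is a hypothetical unmatched ID:

```python
import pandas as pd

# Toy stop times: "99999" is a hypothetical stop_id missing from the stops table
times = pd.DataFrame({"stop_id": ["11213", "22189", "99999"], "stop_sequence": [1, 2, 3]})
stops = pd.DataFrame({
    "stop_id": ["11213", "22189"],
    "stop_name": ["Flinders Street Station", "Southern Cross Station"],
})

# validate="m:1" raises MergeError if stop_id is duplicated in stops;
# indicator=True adds a '_merge' column marking rows found only on the left
merged = times.merge(stops, on="stop_id", how="left", validate="m:1", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "stop_id"].tolist()
print(unmatched)
```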


<div style="background-color: #def0ff; padding: 10px; border-radius: 5px; font-family: Aptos; ">
    <h3><b> 🚶🏻‍♂️Pedestrians </b></h3>
</div>

#### Import and Clean the Pedestrian Dataset

In [None]:
ped_counts = pd.read_csv("Datasets/pedestrian-counting-system-monthly-counts-per-hour.csv")
ped_counts.head(2)

In [None]:
# Split 'Location' column into separate latitude and longitude
ped_counts[['Latitude', 'Longitude']] = ped_counts['Location'].str.split(',', expand=True)

# Select required columns for the new DataFrame (.copy() so the float conversion below edits an independent frame)
date_counts = ped_counts[['Location_ID', 'Sensing_Date', 'Total_of_Directions', 'Sensor_Name', 'Latitude', 'Longitude']].copy()

# Convert 'Latitude' and 'Longitude' to float
date_counts['Latitude'] = date_counts['Latitude'].astype(float)
date_counts['Longitude'] = date_counts['Longitude'].astype(float)

# Display the new DataFrame
date_counts.head(5)

In [None]:
# Sum 'Total_of_Directions' per sensor (grouped by location ID, sensor name and coordinates)
sensor_count = date_counts.groupby(['Location_ID', 'Sensor_Name', 'Latitude', 'Longitude'], as_index=False)['Total_of_Directions'].sum()

# Display the result
sensor_count.head()
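
Acceptance criterion 3 relies on finding high-footfall areas. The aggregation above can be checked, and the busiest sensors ranked, on a few hypothetical rows shaped like the pedestrian dataset (sensor names, counts and coordinates are invented for illustration; `Location` is assumed to hold `"lat, lon"` strings, as the split above implies):

```python
import pandas as pd

# Hypothetical hourly rows; 'Location' is assumed to be a "lat, lon" string
ped = pd.DataFrame({
    "Location_ID": [1, 1, 2],
    "Sensor_Name": ["Sensor_A", "Sensor_A", "Sensor_B"],
    "Total_of_Directions": [120, 80, 50],
    "Location": ["-37.8136, 144.9631", "-37.8136, 144.9631", "-37.8200, 144.9700"],
})

# Split into two float columns; float conversion tolerates the space after the comma
ped[["Latitude", "Longitude"]] = ped["Location"].str.split(",", expand=True).astype(float)

# Total footfall per sensor over all hourly records, busiest sensor first
totals = (
    ped.groupby(["Location_ID", "Sensor_Name", "Latitude", "Longitude"], as_index=False)
    ["Total_of_Directions"].sum()
    .sort_values("Total_of_Directions", ascending=False)
)
print(totals)
```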

#### Exploratory Data Analysis for Pedestrian Data

In [None]:

# Create a base map centered on the average coordinates (Melbourne)
m = folium.Map(location=[sensor_count['Latitude'].mean(), sensor_count['Longitude'].mean()], zoom_start=15)

# Initialize MarkerCluster for better performance when there are many markers
marker_cluster = MarkerCluster().add_to(m)

# Add one bubble per sensor, with the radius scaled from its total count
for _, row in sensor_count.iterrows():
    # Scale the total count down (divide by 500,000) to get a radius in pixels
    bubble_size = row['Total_of_Directions'] / 500000
    
    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        radius=bubble_size,  # Radius in pixels, proportional to the total count
        color="blue",  # Fixed outline colour for all sensors
        fill=True,
        fill_opacity=0.6,
        fill_color="blue",  # Fixed fill colour; a gradient could encode intensity instead
        popup=f"<b>Sensor Name:</b> {row['Sensor_Name']}<br><b>Total of Directions:</b> {row['Total_of_Directions']}"
    ).add_to(marker_cluster)

# Display the map
m
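
In the map above, the bubble radius is proportional to the count, so a sensor with twice the footfall gets a circle with four times the area. One common alternative (a swapped-in technique, not what the cell above does) is to scale the radius by the square root of the count, so that circle area tracks footfall. A minimal sketch; the function name `bubble_radius` and the 2 to 30 pixel range are hypothetical choices:

```python
import math


def bubble_radius(count, max_count, max_radius_px=30.0, min_radius_px=2.0):
    """Pixel radius whose circle area grows linearly with count.

    A radius proportional to count would make the area grow with count squared.
    """
    if max_count <= 0:
        return min_radius_px
    # sqrt keeps area (pi * r^2) proportional to count; clamp tiny values to a visible minimum
    return max(min_radius_px, max_radius_px * math.sqrt(count / max_count))


print(bubble_radius(25, 100))
```

Inside the loop it could replace the fixed divisor, e.g. `radius=bubble_radius(row['Total_of_Directions'], sensor_count['Total_of_Directions'].max())`.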

<div style="background-color: #def0ff; padding: 10px; border-radius: 5px; font-family: Aptos; ">
    <h3><b> 🧑🏻‍🎨 Public Artworks, Fountains and Monuments </b></h3>
</div>

#### Import Artworks, Fountains and Monuments dataset

In [None]:
places = pd.read_csv("Datasets/public-artworks-fountains-and-monuments.csv")
places.head(2)

#### Exploratory Data Analysis for Artworks, Fountains and Monuments

##### Pie Chart: Different Types of Assets

In [None]:
# Count frequency of each Asset Type
asset_counts = places['Asset Type'].value_counts()

# Plot pie chart
plt.figure(figsize=(8, 8))
plt.pie(asset_counts, labels=asset_counts.index, autopct='%1.1f%%', startangle=140, colors=plt.cm.Paired.colors)
plt.title('Distribution of Asset Types')
plt.axis('equal')  # Equal aspect ratio ensures the pie chart is circular.
plt.show()

##### Bar Chart: Number of Artworks Managed by Each Organization

In [None]:
# Count frequency of each Xorg
xorg_counts = places['Xorg'].value_counts()

# Plot bar chart
plt.figure(figsize=(10, 6))
xorg_counts.plot(kind='bar', color='skyblue', edgecolor='black')

plt.title('Number of Artworks by Managing Organization (Xorg)')
plt.xlabel('Organization (Xorg)')
plt.ylabel('Number of Artworks')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

##### Word Cloud: Most Common Artists

In [None]:

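# Count how often each artist appears (drop missing values and stray whitespace)
artist_counts = places['Artist'].dropna().astype(str).str.strip().value_counts()
artist_counts = artist_counts[artist_counts.index != ""]

# Size each full artist name by its frequency; joining the names into one string
# would split them into separate words (first names and surnames counted apart)
wordcloud = WordCloud(width=800, height=400, background_color='white', colormap='viridis').generate_from_frequencies(artist_counts.to_dict())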

# Plot the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Artists')
plt.tight_layout()
plt.show()

##### Map: Locations of Artworks, Fountains and Monuments

In [None]:

# Convert 'Co-ordinates' column to separate latitude and longitude
places[['Latitude', 'Longitude']] = places['Co-ordinates'].str.split(',', expand=True)
places['Latitude'] = places['Latitude'].astype(float)
places['Longitude'] = places['Longitude'].astype(float)

# Define color mapping for different Asset Types
asset_colors = {
    "Art": "blue",
    "Monument": "green",
    "Sculpture": "purple",
    "Panel": "orange"
}

# Create a base map centered on Melbourne
m = folium.Map(location=[-37.81, 144.96], zoom_start=13)

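# Add one marker per asset with a detailed tooltip, skipping rows without coordinates
for _, row in places.dropna(subset=['Latitude', 'Longitude']).iterrows():
    asset_type = row['Asset Type']
    color = asset_colors.get(asset_type, "gray")  # Fallback colour for types not in asset_colors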

    # Construct the tooltip with bold labels
    tooltip = f"""
    <b>Asset Type:</b> {row['Asset Type']}<br>
    <b>Name:</b> {row['Name']}<br>
    <b>Organization:</b> {row['Xorg']}<br>
    <b>Artist:</b> {row['Artist']}<br>
    <b>Year:</b> {row['Art Date']}
    """

    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=row['Name'],
        tooltip=tooltip,
        icon=folium.Icon(color=color)
    ).add_to(m)

# Display the map
m
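
To connect the cultural sites to pedestrian activity (acceptance criteria 3 and 4), each landmark can be tagged with the footfall of its nearest sensor. A minimal sketch, assuming `Latitude`/`Longitude` columns on both frames as built in this notebook and treating the nearest sensor's total as a proxy for footfall at the landmark; the function name `nearest_sensor_footfall` and the toy rows are hypothetical:

```python
import numpy as np
import pandas as pd


def nearest_sensor_footfall(landmarks, sensors):
    """For each landmark, find the closest sensor (haversine) and attach its footfall total."""
    lat1 = np.radians(landmarks["Latitude"].to_numpy())[:, None]
    lon1 = np.radians(landmarks["Longitude"].to_numpy())[:, None]
    lat2 = np.radians(sensors["Latitude"].to_numpy())[None, :]
    lon2 = np.radians(sensors["Longitude"].to_numpy())[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * 6371.0 * np.arcsin(np.sqrt(a))  # shape: (n_landmarks, n_sensors)
    nearest = dist_km.argmin(axis=1)
    out = landmarks.copy()
    out["nearest_sensor"] = sensors["Sensor_Name"].to_numpy()[nearest]
    out["sensor_dist_km"] = dist_km[np.arange(len(landmarks)), nearest]
    out["footfall"] = sensors["Total_of_Directions"].to_numpy()[nearest]
    # Landmarks near the busiest sensors first
    return out.sort_values("footfall", ascending=False)


# Hypothetical toy inputs shaped like `places` and `sensor_count`
sensors = pd.DataFrame({
    "Sensor_Name": ["S_A", "S_B"],
    "Latitude": [-37.8136, -37.8200],
    "Longitude": [144.9631, 144.9700],
    "Total_of_Directions": [200, 50],
})
landmarks = pd.DataFrame({
    "Name": ["L1", "L2"],
    "Latitude": [-37.8199, -37.8137],
    "Longitude": [144.9699, 144.9630],
})
result = nearest_sensor_footfall(landmarks, sensors)
print(result)
```

Applied to this notebook's data, it might be called as `nearest_sensor_footfall(places.dropna(subset=['Latitude', 'Longitude']), sensor_count)`. The full distance matrix is fine at city-centre scale; much larger inputs would call for a spatial index such as a k-d tree.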
