# Task
Perform geo-spatial analysis to determine optimal routing paths and identify suitable locations for micro-hubs. This will involve preparing or simulating necessary GIS data (e.g., weather patterns, road networks, farm locations), conducting spatial analysis (e.g., proximity, network, overlay analysis), visualizing the results on a map with appropriate legends, and providing recommendations for optimal routing strategies and micro-hub site selections based on the analysis.

## Prepare GIS Data

### Subtask:
Identify the GIS data the analysis needs (e.g., weather patterns, road network data, farm locations) and, where it is not provided, simulate or describe it.


**Reasoning**:
First, I will import pandas and numpy for data manipulation and numerical operations, then create the `df_farms` DataFrame by simulating farm locations and their produce volumes.



```python
import pandas as pd
import numpy as np

# Set a seed for reproducibility
np.random.seed(42)

# 1. Simulate farm locations (df_farms)
num_farms = 50
df_farms = pd.DataFrame({
    'farm_id': [f'farm_{i:03d}' for i in range(num_farms)],
    'latitude': np.random.uniform(34.0, 34.5, num_farms),  # Example latitude range
    'longitude': np.random.uniform(-119.0, -118.5, num_farms),  # Example longitude range
    'produce_volume': np.random.randint(100, 1000, num_farms)  # Random produce volume
})

print("df_farms created:")
print(df_farms.head())
```

```
df_farms created:
    farm_id   latitude   longitude  produce_volume
0  farm_000  34.187270 -118.515208             674
1  farm_001  34.475357 -118.612434             963
2  farm_002  34.365997 -118.530251             842
3  farm_003  34.299329 -118.552586             340
4  farm_004  34.078009 -118.701050             663
```


**Reasoning**:
Next, I will simulate potential micro-hub locations by creating a DataFrame named `df_hubs` as per the instructions.



```python
num_hubs = 5
df_hubs = pd.DataFrame({
    'hub_id': [f'hub_{i:02d}' for i in range(num_hubs)],
    'latitude': np.random.uniform(34.1, 34.4, num_hubs),  # Example latitude range for hubs
    'longitude': np.random.uniform(-118.9, -118.6, num_hubs)  # Example longitude range for hubs
})

print("df_hubs created:")
print(df_hubs.head())
```

```
df_hubs created:
   hub_id   latitude   longitude
0  hub_00  34.119468 -118.855574
1  hub_01  34.176175 -118.600678
2  hub_02  34.174063 -118.819966
3  hub_03  34.308891 -118.607016
4  hub_04  34.313681 -118.776689
```


**Reasoning**:
As per the instructions, I will simulate basic weather pattern data by creating a DataFrame named `df_weather` with the columns `location_id`, `date`, and `weather_condition`. Each of its 100 rows pairs a location drawn at random (with replacement) from the farm and hub IDs with a random date and a random weather condition.



```python
num_weather_entries = 100

# Sample location IDs (with replacement) from the combined farm and hub IDs
locations = pd.concat([df_farms['farm_id'], df_hubs['hub_id']]).sample(num_weather_entries, replace=True).reset_index(drop=True)
weather_conditions = ['clear', 'rainy', 'snowy', 'cloudy', 'foggy']
dates = pd.date_range('2023-01-01', periods=num_weather_entries, freq='D')

df_weather = pd.DataFrame({
    'location_id': locations,
    'date': np.random.choice(dates, num_weather_entries, replace=True),
    'weather_condition': np.random.choice(weather_conditions, num_weather_entries)
})

print("df_weather created:")
print(df_weather.head())
```

```
df_weather created:
  location_id       date weather_condition
0    farm_031 2023-03-10             rainy
1    farm_026 2023-04-09             clear
2    farm_019 2023-01-12             clear
3    farm_023 2023-01-25             clear
4    farm_011 2023-02-21             foggy
```


A detailed road network dataset would typically be needed for precise routing, including information like road segments, speed limits, traffic conditions, and turn restrictions. For this exercise, the 'road network' is implicitly represented by the geographical coordinates of farms and hubs, which allows for basic distance calculations between them. More advanced network analysis, such as finding optimal routes considering actual travel time and road conditions, would require an actual road network dataset (e.g., from OpenStreetMap or other GIS sources).
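
As a minimal sketch of what network analysis adds over straight-line distances, the block below runs Dijkstra's shortest-path algorithm on a tiny road graph whose edge weights are travel times scaled by weather. Everything in it is hypothetical: the node names, the base minutes, and the `WEATHER_FACTOR` slowdown multipliers are assumed values, not derived from data. In practice, edges would come from a real road dataset such as OpenStreetMap, and each segment's condition could be joined from a table like `df_weather` (an overlay step).

```python
import heapq

# Hypothetical slowdown multipliers per weather condition (assumed values, not measured)
WEATHER_FACTOR = {'clear': 1.0, 'cloudy': 1.0, 'foggy': 1.3, 'rainy': 1.2, 'snowy': 1.8}

# Hypothetical undirected road segments: (node_a, node_b, base_minutes, weather_condition)
ROAD_SEGMENTS = [
    ('farm_A', 'junction_1', 10, 'clear'),
    ('junction_1', 'hub_X', 12, 'snowy'),
    ('farm_A', 'junction_2', 15, 'clear'),
    ('junction_2', 'hub_X', 14, 'clear'),
]

def build_graph(segments):
    """Build an adjacency list whose edge weights are weather-adjusted travel times."""
    graph = {}
    for a, b, minutes, condition in segments:
        cost = minutes * WEATHER_FACTOR[condition]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def dijkstra(graph, source, target):
    """Return (total_cost, path) of the cheapest path from source to target."""
    queue = [(0.0, source, [source])]  # min-heap ordered by accumulated cost
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, edge_cost in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + edge_cost, neighbour, path + [neighbour]))
    return float('inf'), []  # target unreachable

road_graph = build_graph(ROAD_SEGMENTS)
route_cost, route_path = dijkstra(road_graph, 'farm_A', 'hub_X')
print(f"{' -> '.join(route_path)}: {route_cost:.1f} min")
```

With every segment clear, the route through `junction_1` (22 min) is shortest; the assumed snow penalty on its second segment (12 × 1.8 = 21.6 min) makes the route through `junction_2` (29 min) cheaper. A graph library such as NetworkX offers the same computation through `networkx.shortest_path(G, source, target, weight=...)`.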

## Perform Spatial Analysis

### Subtask:
Conduct geo-spatial analysis to identify optimal routing paths and suitable locations for micro-hubs through proximity analysis.


**Reasoning**:
The first step is to install the `geopy` library, which is necessary for calculating geodesic distances, and then import the `geodesic` function as per the instructions. I will use `!pip install` for installation and `from geopy.distance import geodesic` for import.



```python
!pip install geopy
from geopy.distance import geodesic
print("geopy installed and geodesic imported.")
```

```
geopy installed and geodesic imported.
```


**Reasoning**:
As per the instructions, I will define a function `calculate_distance` to compute the geodesic distance between two points, then initialize an empty list to store farm-to-hub assignments. Subsequently, I will iterate through each farm and each hub to find the closest hub for each farm, storing the farm ID, assigned hub ID, and the minimum distance in the list.



```python
def calculate_distance(point1, point2):
    """Calculates geodesic distance between two (lat, lon) points in kilometers."""
    return geodesic(point1, point2).km

# List to store farm-to-hub assignments and distances
farm_hub_assignments = []

# Iterate through each farm to find the closest hub
for _, farm in df_farms.iterrows():
    farm_location = (farm['latitude'], farm['longitude'])
    min_distance = float('inf')
    assigned_hub_id = None

    # Compare against every hub and keep the nearest one
    for _, hub in df_hubs.iterrows():
        hub_location = (hub['latitude'], hub['longitude'])
        distance = calculate_distance(farm_location, hub_location)

        if distance < min_distance:
            min_distance = distance
            assigned_hub_id = hub['hub_id']

    farm_hub_assignments.append({
        'farm_id': farm['farm_id'],
        'assigned_hub_id': assigned_hub_id,
        'distance_to_hub_km': min_distance
    })

print("Farm-to-hub assignments calculated.")
```

```
Farm-to-hub assignments calculated.
```

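The nested `iterrows()` loop above calls `geodesic` once per farm-hub pair (50 × 5 = 250 calls), which is fine at this scale. As an alternative that avoids the geopy dependency, the sketch below uses the haversine formula; note that it assumes a spherical Earth, so its results differ slightly (up to roughly 0.5%) from geopy's ellipsoidal `geodesic`. The helpers `haversine_km` and `nearest_hub` are hypothetical names, not part of geopy.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_hub(farm, hubs):
    """Return (hub_id, distance_km) of the hub closest to farm = (lat, lon)."""
    return min(
        ((hub_id, haversine_km(farm[0], farm[1], lat, lon)) for hub_id, (lat, lon) in hubs.items()),
        key=lambda pair: pair[1],
    )

# Example: farm_000 and two of the simulated hubs, coordinates copied from the outputs above
example_hubs = {'hub_00': (34.119468, -118.855574), 'hub_01': (34.176175, -118.600678)}
example_hub, example_dist = nearest_hub((34.187270, -118.515208), example_hubs)
print(example_hub, round(example_dist, 2))
```

For `farm_000` this picks `hub_01`, matching the geodesic assignment in `df_farm_assignments`, at a distance within about 0.1 km of the geodesic 7.97 km.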

**Reasoning**:
As per the instructions, I will convert the `farm_hub_assignments` list into a new DataFrame named `df_farm_assignments`.



```python
df_farm_assignments = pd.DataFrame(farm_hub_assignments)

print("df_farm_assignments created:")
print(df_farm_assignments.head())
```

```
df_farm_assignments created:
    farm_id assigned_hub_id  distance_to_hub_km
0  farm_000          hub_01            7.974834
1  farm_001          hub_03           18.472694
2  farm_002          hub_03            9.488115
3  farm_003          hub_03            5.121482
4  farm_004          hub_01           14.293316
```


**Reasoning**:
As per the instructions, I will merge `df_farm_assignments` with `df_farms` on `farm_id` to add the `produce_volume` to the assignment DataFrame.



```python
df_merged_assignments = pd.merge(df_farm_assignments, df_farms[['farm_id', 'produce_volume']], on='farm_id', how='left')

print("df_merged_assignments created:")
print(df_merged_assignments.head())
```

```
df_merged_assignments created:
    farm_id assigned_hub_id  distance_to_hub_km  produce_volume
0  farm_000          hub_01            7.974834             674
1  farm_001          hub_03           18.472694             963
2  farm_002          hub_03            9.488115             842
3  farm_003          hub_03            5.121482             340
4  farm_004          hub_01           14.293316             663
```


**Reasoning**:
As per the instructions, I will group the `df_merged_assignments` DataFrame by `assigned_hub_id` and sum the `produce_volume` to calculate the total produce volume for each hub. This will be stored in a new DataFrame named `df_hub_volumes`.



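```python
# Sum produce volume per hub; reindex over all hubs so a hub with no assigned farms still appears (with 0)
df_hub_volumes = (
    df_merged_assignments.groupby('assigned_hub_id')['produce_volume'].sum()
    .reindex(df_hubs['hub_id'], fill_value=0)
    .rename_axis('assigned_hub_id')
    .reset_index()
)

print("df_hub_volumes created:")
print(df_hub_volumes.head())
```

```
df_hub_volumes created:
  assigned_hub_id  produce_volume
0          hub_00            5391
1          hub_01            7034
2          hub_02            3098
3          hub_03            5771
4          hub_04            7784
```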


## Visualize Geo-Spatial Results

### Subtask:
Generate a visualization, such as a map, showing potential micro-hub locations, key geographical features, and optimized routes based on the spatial analysis. Ensure the visualization includes appropriate legends to explain different elements.


**Reasoning**:
As per the instructions, I will install the `folium` library, which is required for generating interactive maps, and then import it to make its functionalities available for use.



```python
!pip install folium
import folium
print("folium installed and imported.")
```

```
folium installed and imported.
```


**Reasoning**:
As per the instructions, I will create a base `folium` map, centered on the approximate average latitude and longitude of the farms and hubs, to serve as the foundation for the visualization.



```python
# Center the map between the mean farm position and the mean hub position
map_center_lat = (df_farms['latitude'].mean() + df_hubs['latitude'].mean()) / 2
map_center_lon = (df_farms['longitude'].mean() + df_hubs['longitude'].mean()) / 2

m = folium.Map(location=[map_center_lat, map_center_lon], zoom_start=10)

print("Base Folium map created.")
```

```
Base Folium map created.
```


**Reasoning**:
As per the instructions, I will iterate through `df_farms` and add markers for each farm to the map. Each marker will display the `farm_id` and `produce_volume` when clicked, using a `folium.CircleMarker` with a distinct color to represent farms.



```python
for _, farm in df_farms.iterrows():
    folium.CircleMarker(
        location=[farm['latitude'], farm['longitude']],
        radius=5,  # Adjust radius as needed
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        popup=f"Farm ID: {farm['farm_id']}<br>Produce Volume: {farm['produce_volume']}"
    ).add_to(m)

print("Farm markers added to the map.")
```

```
Farm markers added to the map.
```


**Reasoning**:
As per the instructions, I will iterate through `df_hubs` and add distinct markers for each hub to the map. Each marker will display the `hub_id` and the total `produce_volume` assigned to it (from `df_hub_volumes`) when clicked, using a different color or icon to distinguish them from farm markers.



```python
# Lookup of total volume per hub; a hub with no assigned farms falls back to 0
# (the original `.iloc[0]` lookup raised IndexError for such a hub)
hub_volume_lookup = df_hub_volumes.set_index('assigned_hub_id')['produce_volume']

for _, hub in df_hubs.iterrows():
    # Get total produce volume for the current hub
    hub_volume = hub_volume_lookup.get(hub['hub_id'], 0)
    folium.Marker(
        location=[hub['latitude'], hub['longitude']],
        icon=folium.Icon(color='red', icon='warehouse', prefix='fa'),  # Distinct icon and color for hubs
        popup=f"Hub ID: {hub['hub_id']}<br>Total Produce Volume: {hub_volume}"
    ).add_to(m)

print("Hub markers added to the map.")
```

```
Hub markers added to the map.
```


**Reasoning**:
As per the instructions, I will iterate through `df_merged_assignments` to draw lines connecting each farm to its assigned hub on the map. This requires retrieving the coordinates of both the farm and the hub for each assignment and then using `folium.PolyLine` to draw the link. These are straight lines between coordinates, not road routes.



```python
for _, assignment in df_merged_assignments.iterrows():
    farm_id = assignment['farm_id']
    hub_id = assignment['assigned_hub_id']

    # Get farm coordinates
    farm_coords = df_farms[df_farms['farm_id'] == farm_id][['latitude', 'longitude']].values[0]

    # Get hub coordinates
    hub_coords = df_hubs[df_hubs['hub_id'] == hub_id][['latitude', 'longitude']].values[0]

    # Draw a straight line connecting the farm to its assigned hub
    folium.PolyLine(
        locations=[farm_coords, hub_coords],
        color='green',
        weight=1.5,
        opacity=0.7,
        popup=f"Farm: {farm_id} to Hub: {hub_id}<br>Distance: {assignment['distance_to_hub_km']:.2f} km"
    ).add_to(m)

print("Lines connecting farms to assigned hubs added to the map.")
```

```
Lines connecting farms to assigned hubs added to the map.
```


**Reasoning**:
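As per the instructions, I will add a `folium.LayerControl()` to the map. Note that the control only lists named layers, such as tile layers and `folium.FeatureGroup` objects. Because the farm markers, hub markers, and routing lines were added directly to `m`, this control will show only the base map and cannot toggle those elements individually.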



```python
folium.LayerControl().add_to(m)

print("Layer control added to the map.")
```

```
Layer control added to the map.
```


**Reasoning**:
As per the instructions, I will display the generated `folium` map, which includes the farm and hub markers, the straight-line farm-to-hub assignment links, and the layer control.



```python
m
```

## Interpret and Recommend

### Subtask:
Interpret the results of the geo-spatial analysis to provide recommendations for optimal routing strategies and ideal site selections for micro-hubs, considering factors like accessibility, cost, and efficiency.


**Reasoning**:
As per the instructions, I will analyze `df_hub_volumes` to identify hubs with the highest and lowest produce volumes and calculate the mean and standard deviation to understand the distribution and potential disparities.



```python
print("Hub Produce Volumes:")
print(df_hub_volumes)

print("\nDescriptive Statistics for Hub Produce Volumes:")
print(df_hub_volumes['produce_volume'].describe())
```

```
Hub Produce Volumes:
  assigned_hub_id  produce_volume
0          hub_00            5391
1          hub_01            7034
2          hub_02            3098
3          hub_03            5771
4          hub_04            7784

Descriptive Statistics for Hub Produce Volumes:
count       5.000000
mean     5815.600000
std      1797.687487
min      3098.000000
25%      5391.000000
50%      5771.000000
75%      7034.000000
max      7784.000000
Name: produce_volume, dtype: float64
```

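The spread in hub volumes can be condensed into two load-balance indicators. The sketch below applies Python's `statistics` module to the five totals printed above; the coefficient of variation (standard deviation divided by mean) and the max/min ratio are common choices but are additions here, not part of the original analysis. `statistics.stdev` computes the sample standard deviation (n - 1 denominator), matching the `std` reported by pandas `describe()`.

```python
import statistics

# Total produce volume per hub, copied from the df_hub_volumes output above
hub_volume_totals = {'hub_00': 5391, 'hub_01': 7034, 'hub_02': 3098, 'hub_03': 5771, 'hub_04': 7784}

volume_values = list(hub_volume_totals.values())
mean_volume = statistics.mean(volume_values)
std_volume = statistics.stdev(volume_values)   # sample std (n - 1), as in pandas describe()
cv = std_volume / mean_volume                   # coefficient of variation: 0 means perfectly balanced
max_min_ratio = max(volume_values) / min(volume_values)

print(f"mean={mean_volume:.1f}, std={std_volume:.1f}, CV={cv:.2f}, max/min={max_min_ratio:.2f}")
```

With these totals the CV is about 0.31 and the busiest hub carries roughly 2.5 times the volume of the quietest one; recomputing both after any relocation gives a simple before/after comparison.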

### Visual Assessment of the Map and Initial Interpretations

The Folium map (`m`) shows the farm locations (blue circles), the micro-hub locations (red warehouse icons), and green lines connecting each farm to its closest hub.

**Observations from the map:**

*   **Hub Distribution:** The hubs are spread across the study area, although their positions were drawn at random (within a narrower latitude/longitude box than the farms) rather than chosen to cover the farms.
*   **Farm Clusters:** Some areas show denser clusters of farms, while others are sparser.
*   **Assignment Lines:** The green lines show which hub each farm is assigned to. They are straight segments between coordinates, not road routes, and their popups report the geodesic distance used for the assignment rather than a driving distance.
*   **Potential Gaps/Coverage Issues:** By observing the map, one can identify if certain farm clusters are far from any existing hub, potentially leading to longer routes than ideal. Conversely, some hubs might have very few farms assigned, suggesting underutilization or an opportunity to consolidate services.

**Initial Interpretation:**

Hubs 'hub_04' and 'hub_01' appear to be handling the largest volumes, suggesting they are centrally located to dense farm clusters or cover a larger geographical area with many farms. 'hub_02' has the lowest volume, which might indicate it's in a less dense farm area or its service area is smaller, potentially making it a candidate for re-evaluation or relocation if efficiency is a primary concern. The current routing strategy minimizes individual farm-to-hub distances, which is a good starting point for efficiency. However, aggregate route efficiency and traffic patterns are not yet considered.
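
One simple way to put a number on aggregate efficiency is the volume-weighted mean distance, sum(volume × km) / sum(volume), a proxy for transport effort per unit of produce that treats straight-line distance as cost (an assumption). The sketch below applies it only to the five rows shown in the `df_merged_assignments` preview, for illustration; computed over all 50 farms, it would give a single figure to compare across candidate hub layouts. The helper name `volume_weighted_distance` is hypothetical.

```python
# (distance_to_hub_km, produce_volume) for the five rows printed by df_merged_assignments.head()
sample_rows = [
    (7.974834, 674),
    (18.472694, 963),
    (9.488115, 842),
    (5.121482, 340),
    (14.293316, 663),
]

def volume_weighted_distance(rows):
    """Mean distance per unit of produce: sum(volume * km) / sum(volume)."""
    total_volume = sum(volume for _, volume in rows)
    volume_km = sum(km * volume for km, volume in rows)
    return volume_km / total_volume

# Full-table pandas equivalent (for the notebook):
# (df_merged_assignments['distance_to_hub_km'] * df_merged_assignments['produce_volume']).sum()
#     / df_merged_assignments['produce_volume'].sum()
print(f"{volume_weighted_distance(sample_rows):.2f} km per unit of produce")
```

Unlike a plain mean of distances, this weighting makes a long route from a high-volume farm count for more than the same route from a small farm.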

### Opportunities for Optimization and New Micro-Hub Locations

Based on the analysis of `df_hub_volumes` and the visual assessment of the Folium map, several opportunities for optimizing the current micro-hub network and routing strategies can be identified:

1.  **Re-evaluating 'hub_02'**: With a total produce volume of 3098, 'hub_02' handles less than half the volume of 'hub_04' (7784) and about 44% of that of 'hub_01' (7034). This disparity suggests that 'hub_02' might be underutilized or located in an area with sparse farm density. Recommendations could include:
    *   **Relocation**: If there are nearby denser farm clusters currently served by more distant hubs, relocating 'hub_02' to such a cluster could balance the load and reduce overall transportation distances.
    *   **Consolidation**: If 'hub_02' is truly in a low-density area with no prospects for increased farm activity, its operations could potentially be consolidated with a neighboring hub, saving operational costs.

2.  **Addressing High-Volume Hubs ('hub_04' and 'hub_01')**: These hubs are handling the largest volumes, indicating they serve significant farm clusters. While this is efficient for aggregation, it might lead to:
    *   **Capacity Strain**: If actual hub capacity is a concern, these hubs might become bottlenecks during peak seasons. Future planning should consider expanding their capacity or introducing new, smaller 'satellite' hubs in their periphery to offload some volume.
    *   **Longer Routes for Peripheral Farms**: Farms at the edges of the service areas for these high-volume hubs might still incur relatively long travel distances. Investigating the routes for these farms could reveal opportunities for new hub placements.

3.  **Identifying Underserved Farm Clusters**: A detailed visual inspection of the map might reveal clusters of blue farm markers that are notably far from any red hub marker. These represent potential 'cold spots' where a new micro-hub could drastically reduce transportation distances and improve accessibility for those farms. This would require a more granular spatial analysis focusing on farm density and current assignment distances.

4.  **Optimizing Routing for Long-Distance Assignments**: While the current proximity-based assignment minimizes individual farm-to-hub distances, it doesn't account for multi-stop routes or actual road network efficiency. Farms with unusually long green lines on the map warrant closer inspection. For these, alternative routing strategies or the introduction of intermediate collection points could be considered.

5.  **Strategic Placement of New Hubs**: To select ideal sites for new micro-hubs, the analysis should focus on:
    *   **Centroid of High-Density Farm Clusters**: Calculate the centroid of identified underserved farm clusters to propose a geographically central new hub location.
    *   **Accessibility**: New sites should have good road access to minimize last-mile delivery challenges.
    *   **Future Growth**: Consider areas with potential for agricultural expansion.
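
The centroid idea in point 5 can be sketched as the classic center-of-gravity heuristic: place a hub at the produce-volume-weighted mean latitude and longitude of the farms it would serve. Averaging degrees treats the area as flat, which is a reasonable assumption for a region roughly 50 km across like this one but not for large areas. The helper `weighted_centroid` and the three-farm cluster are hypothetical.

```python
def weighted_centroid(farms):
    """Volume-weighted mean (lat, lon) of farms given as (lat, lon, volume) tuples."""
    total_volume = sum(volume for _, _, volume in farms)
    if total_volume <= 0:
        raise ValueError("total produce volume must be positive")
    mean_lat = sum(f_lat * volume for f_lat, _, volume in farms) / total_volume
    mean_lon = sum(f_lon * volume for _, f_lon, volume in farms) / total_volume
    return mean_lat, mean_lon

# Hypothetical underserved cluster: (latitude, longitude, produce_volume)
toy_cluster = [
    (34.40, -118.95, 300),
    (34.42, -118.90, 600),
    (34.38, -118.92, 100),
]
candidate_site = weighted_centroid(toy_cluster)
print(f"Candidate hub site: ({candidate_site[0]:.4f}, {candidate_site[1]:.4f})")
```

Alternating "assign each farm to its nearest hub" with "move each hub to its farms' weighted centroid" until assignments stop changing is essentially weighted k-means (Lloyd's algorithm). It lowers volume-weighted squared straight-line distance, not road travel time, and knows nothing about land availability or access, so candidate sites still need the accessibility checks listed above.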

By leveraging the insights from `df_hub_volumes` and the visual map, the goal is to achieve a more balanced distribution of produce volume among hubs, reduce overall transportation costs, and improve the efficiency and responsiveness of the supply chain.
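
For the multi-stop routing raised in point 4, a common starting heuristic is the nearest-neighbour tour: leave the hub, repeatedly drive to the closest unvisited farm, then return. The sketch below uses planar Euclidean distance on hypothetical kilometre coordinates to stay short; it carries no optimality guarantee, and real use would substitute road travel times and vehicle capacity limits (for example via a vehicle-routing solver). The function name `nearest_neighbour_tour` is hypothetical.

```python
import math

def nearest_neighbour_tour(depot, stops):
    """Greedy tour: from depot, always visit the closest unvisited stop, then return to depot."""
    unvisited = dict(stops)
    tour, current, length = [], depot, 0.0
    while unvisited:
        name, point = min(unvisited.items(), key=lambda item: math.dist(current, item[1]))
        length += math.dist(current, point)
        tour.append(name)
        current = point
        del unvisited[name]
    length += math.dist(current, depot)  # return leg to the hub
    return tour, length

# Hypothetical hub at the origin and farm positions in km (x = east, y = north)
hub_xy = (0.0, 0.0)
farm_xy = {'farm_a': (2.0, 0.0), 'farm_b': (5.0, 0.0), 'farm_c': (5.0, 3.0)}
collection_tour, tour_km = nearest_neighbour_tour(hub_xy, farm_xy)
print(collection_tour, round(tour_km, 2))
```

For these three farms, separate out-and-back trips would cover 2 × (2 + 5 + √34) ≈ 25.7 km, while the single loop covers 8 + √34 ≈ 13.8 km; whether one loop is feasible depends on vehicle capacity.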

## Final Task

### Subtask:
Summarize the geo-spatial analysis process and its implications for informing routing and micro-hub site selection.


## Summary:

### Data Analysis Key Findings

*   **Simulated Data Foundation**: The analysis commenced by simulating essential GIS data including 50 farm locations with varying produce volumes, 5 potential micro-hub locations, and basic weather patterns. It was acknowledged that a detailed road network, though implicitly represented by coordinates for this exercise, would be crucial for real-world advanced routing.
*   **Proximity-Based Farm-to-Hub Assignment**: A spatial analysis assigned each of the 50 farms to its closest micro-hub based on geodesic distance. This resulted in a distribution of produce volumes across the hubs.
*   **Uneven Hub Load Distribution**: The produce volume assigned to each micro-hub varied significantly, highlighting potential inefficiencies:
    *   'hub_04' processed the highest volume at 7,784 units.
    *   'hub_01' processed the second highest at 7,034 units.
    *   'hub_02' processed the lowest volume at 3,098 units, about 47% below the mean of 5,815.6 units, which indicates potential underutilization or suboptimal placement.
*   **Interactive Spatial Visualization**: An interactive map was successfully generated, visually representing farm locations (blue circles), micro-hub locations (red warehouse icons), and the proximity-based routes (green lines) connecting farms to their assigned hubs. Pop-ups on the map provided details on farm produce volume and total hub produce volume, aiding visual assessment.

### Insights or Next Steps

*   **Optimize Hub Network for Balanced Load**: The significant disparity in produce volume handled by hubs (e.g., 'hub_02' processing less than half of 'hub_04') suggests an opportunity to re-evaluate or relocate underutilized hubs to achieve a more balanced distribution of load, reduce overall transportation costs, and improve system efficiency.
*   **Refine Routing and Site Selection with Real-World Constraints**: While current routing minimizes individual farm-to-hub distances, future steps should incorporate actual road network data, traffic conditions, and multi-stop route optimization to identify truly optimal routing paths and more strategically select new micro-hub locations based on accessibility and capacity needs in underserved or high-density farm clusters.
