# SafeRoute AI - Data Analysis & Street Network Setup

## 📋 Notebook Overview

This notebook analyzes Toronto crime data and prepares the street network for risk-based routing.

### **What's in this notebook:**
1. **Data Loading** - Load GeoJSON with neighborhood crime data
2. **Data Analysis** - Understand available features and crime statistics
3. **Risk Scoring** - Calculate weighted risk scores by neighborhood
4. **CRS/Projection** - Check and match coordinate systems
5. **OSM Network Download** - Get Toronto street network from OpenStreetMap
6. **Spatial Join** - Match streets to neighborhoods for risk assignment

### **Key Outputs:**
- Risk scores for all 158 Toronto neighborhoods
- Toronto street network (nodes + edges)
- CRS compatibility verification
- Ready for routing implementation

---


In [1]:
import pandas as pd

df = pd.read_csv("Neighbourhood_Crime_Rates_Open_Data_6759951416839911996.csv")


In [3]:
df.value_counts()

OBJECTID_1  NEIGHBOURHOOD_NAME         HOOD_158  ASSAULT_2014  ASSAULT_2015  ASSAULT_2016  ASSAULT_2017  ASSAULT_2018  ASSAULT_2019  ASSAULT_2020  ASSAULT_2021  ASSAULT_2022  ASSAULT_2023  ASSAULT_2024  ASSAULT_RATE_2014  ASSAULT_RATE_2015  ASSAULT_RATE_2016  ASSAULT_RATE_2017  ASSAULT_RATE_2018  ASSAULT_RATE_2019  ASSAULT_RATE_2020  ASSAULT_RATE_2021  ASSAULT_RATE_2022  ASSAULT_RATE_2023  ASSAULT_RATE_2024  AUTOTHEFT_2014  AUTOTHEFT_2015  AUTOTHEFT_2016  AUTOTHEFT_2017  AUTOTHEFT_2018  AUTOTHEFT_2019  AUTOTHEFT_2020  AUTOTHEFT_2021  AUTOTHEFT_2022  AUTOTHEFT_2023  AUTOTHEFT_2024  AUTOTHEFT_RATE_2014  AUTOTHEFT_RATE_2015  AUTOTHEFT_RATE_2016  AUTOTHEFT_RATE_2017  AUTOTHEFT_RATE_2018  AUTOTHEFT_RATE_2019  AUTOTHEFT_RATE_2020  AUTOTHEFT_RATE_2021  AUTOTHEFT_RATE_2022  AUTOTHEFT_RATE_2023  AUTOTHEFT_RATE_2024  BIKETHEFT_2014  BIKETHEFT_2015  BIKETHEFT_2016  BIKETHEFT_2017  BIKETHEFT_2018  BIKETHEFT_2019  BIKETHEFT_2020  BIKETHEFT_2021  BIKETHEFT_2022  BIKETHEFT_2023  BIKETHEFT_2024  BIKET

In [4]:
df.columns

Index(['OBJECTID_1', 'NEIGHBOURHOOD_NAME', 'HOOD_158', 'ASSAULT_2014',
       'ASSAULT_2015', 'ASSAULT_2016', 'ASSAULT_2017', 'ASSAULT_2018',
       'ASSAULT_2019', 'ASSAULT_2020',
       ...
       'THEFTOVER_RATE_2018', 'THEFTOVER_RATE_2019', 'THEFTOVER_RATE_2020',
       'THEFTOVER_RATE_2021', 'THEFTOVER_RATE_2022', 'THEFTOVER_RATE_2023',
       'THEFTOVER_RATE_2024', 'POPULATION_2024', 'Shape__Area',
       'Shape__Length'],
      dtype='object', length=204)

In [None]:
import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("Neighbourhood_Crime_Rates_Open_Data_-5291801778870948764.geojson")
print("Available columns:")
print(gdf.columns.tolist())
print("\nFirst rows:")
gdf.head()


Available columns:
['OBJECTID_1', 'AREA_NAME', 'HOOD_ID', 'ASSAULT_2014', 'ASSAULT_2015', 'ASSAULT_2016', 'ASSAULT_2017', 'ASSAULT_2018', 'ASSAULT_2019', 'ASSAULT_2020', 'ASSAULT_2021', 'ASSAULT_2022', 'ASSAULT_2023', 'ASSAULT_2024', 'ASSAULT_RATE_2014', 'ASSAULT_RATE_2015', 'ASSAULT_RATE_2016', 'ASSAULT_RATE_2017', 'ASSAULT_RATE_2018', 'ASSAULT_RATE_2019', 'ASSAULT_RATE_2020', 'ASSAULT_RATE_2021', 'ASSAULT_RATE_2022', 'ASSAULT_RATE_2023', 'ASSAULT_RATE_2024', 'AUTOTHEFT_2014', 'AUTOTHEFT_2015', 'AUTOTHEFT_2016', 'AUTOTHEFT_2017', 'AUTOTHEFT_2018', 'AUTOTHEFT_2019', 'AUTOTHEFT_2020', 'AUTOTHEFT_2021', 'AUTOTHEFT_2022', 'AUTOTHEFT_2023', 'AUTOTHEFT_2024', 'AUTOTHEFT_RATE_2014', 'AUTOTHEFT_RATE_2015', 'AUTOTHEFT_RATE_2016', 'AUTOTHEFT_RATE_2017', 'AUTOTHEFT_RATE_2018', 'AUTOTHEFT_RATE_2019', 'AUTOTHEFT_RATE_2020', 'AUTOTHEFT_RATE_2021', 'AUTOTHEFT_RATE_2022', 'AUTOTHEFT_RATE_2023', 'AUTOTHEFT_RATE_2024', 'BIKETHEFT_2014', 'BIKETHEFT_2015', 'BIKETHEFT_2016', 'BIKETHEFT_2017', 'BIKETHEF

Unnamed: 0,OBJECTID_1,AREA_NAME,HOOD_ID,ASSAULT_2014,ASSAULT_2015,ASSAULT_2016,ASSAULT_2017,ASSAULT_2018,ASSAULT_2019,ASSAULT_2020,...,THEFTOVER_RATE_2017,THEFTOVER_RATE_2018,THEFTOVER_RATE_2019,THEFTOVER_RATE_2020,THEFTOVER_RATE_2021,THEFTOVER_RATE_2022,THEFTOVER_RATE_2023,THEFTOVER_RATE_2024,POPULATION_2024,geometry
0,1,South Eglinton-Davisville,174,55,56,66,73,74,62,74,...,4.915454,14.018037,13.369579,17.041582,24.314138,11.784578,29.877502,21.895412,27403,"POLYGON ((-79.38635 43.69784, -79.38623 43.697..."
1,2,North Toronto,173,53,57,47,61,66,84,80,...,15.913431,36.76741,27.32427,44.651402,11.916821,22.527596,36.672256,30.109901,19927,"POLYGON ((-79.39744 43.70694, -79.39837 43.706..."
2,3,Dovercourt Village,172,62,65,92,105,106,113,91,...,22.38973,30.136368,30.436768,23.027327,15.363343,30.355923,22.052338,51.139683,13688,"POLYGON ((-79.43412 43.66015, -79.43537 43.659..."
3,4,Junction-Wallace Emerson,171,164,159,171,161,163,186,171,...,24.498795,36.736195,16.320536,36.677807,32.307568,31.40457,33.652409,47.570259,27328,"POLYGON ((-79.4387 43.66767, -79.43841 43.6669..."
4,5,Yonge-Bay Corridor,170,387,521,481,602,576,660,377,...,290.095306,353.045013,489.814972,263.812469,188.747726,348.980438,329.405792,289.715118,16568,"POLYGON ((-79.38404 43.64497, -79.38502 43.644..."


In [None]:
print("=" * 80)
print("GEOJSON COLUMN ANALYSIS")
print("=" * 80)

print("\nüìä Available columns:")
for i, col in enumerate(gdf.columns, 1):
    print(f"{i}. {col}")

print("\nüîç Column information:")
print(gdf.info())

print("\nüìà Numeric column statistics:")
print(gdf.describe())

print("\nüèòÔ∏è Geometry types:")
print(gdf.geometry.type.value_counts())

print("\nüó∫Ô∏è Coordinate system:")
print(f"CRS: {gdf.crs}")


GEOJSON COLUMN ANALYSIS

📊 Available columns:
1. OBJECTID_1
2. AREA_NAME
3. HOOD_ID
4. ASSAULT_2014
5. ASSAULT_2015
6. ASSAULT_2016
7. ASSAULT_2017
8. ASSAULT_2018
9. ASSAULT_2019
10. ASSAULT_2020
11. ASSAULT_2021
12. ASSAULT_2022
13. ASSAULT_2023
14. ASSAULT_2024
15. ASSAULT_RATE_2014
16. ASSAULT_RATE_2015
17. ASSAULT_RATE_2016
18. ASSAULT_RATE_2017
19. ASSAULT_RATE_2018
20. ASSAULT_RATE_2019
21. ASSAULT_RATE_2020
22. ASSAULT_RATE_2021
23. ASSAULT_RATE_2022
24. ASSAULT_RATE_2023
25. ASSAULT_RATE_2024
26. AUTOTHEFT_2014
27. AUTOTHEFT_2015
28. AUTOTHEFT_2016
29. AUTOTHEFT_2017
30. AUTOTHEFT_2018
31. AUTOTHEFT_2019
32. AUTOTHEFT_2020
33. AUTOTHEFT_2021
34. AUTOTHEFT_2022
35. AUTOTHEFT_2023
36. AUTOTHEFT_2024
37. AUTOTHEFT_RATE_2014
38. AUTOTHEFT_RATE_2015
39. AUTOTHEFT_RATE_2016
40. AUTOTHEFT_RATE_2017
41. AUTOTHEFT_RATE_2018
42. AUTOTHEFT_RATE_2019
43. AUTOTHEFT_RATE_2020
44. AUTOTHEFT_RATE_2021
45. AUTOTHEFT_RATE_2022
46. AUTOTHEFT_RATE_2023
47. AUTOTHEFT_RATE_2024
48. 

In [None]:
print("=" * 80)
print("DATA SAMPLE")
print("=" * 80)

# Show first rows without geometry for better visibility
cols_to_show = [col for col in gdf.columns if col != 'geometry']
print(gdf[cols_to_show].head(3))

print("\nüîç HOOD_ID unique values:")
print(f"Total unique neighborhoods: {gdf['HOOD_ID'].nunique()}")
print(f"Range: {gdf['HOOD_ID'].min()} - {gdf['HOOD_ID'].max()}")

print("\nüîç AREA_NAME examples:")
print(gdf['AREA_NAME'].head(10).tolist())

print("\nüìä Available crime types:")
crime_types = set()
for col in gdf.columns:
    if any(crime in col.upper() for crime in ['ASSAULT', 'AUTOTHEFT', 'BIKETHEFT', 'BREAKENTER', 'HOMICIDE', 'ROBBERY', 'SHOOTING', 'THEFTFROMMV', 'THEFTOVER']):
        crime_type = col.split('_')[0]
        crime_types.add(crime_type)
        
for crime in sorted(crime_types):
    print(f"  - {crime}")


DATA SAMPLE
   OBJECTID_1                  AREA_NAME  HOOD_ID  ASSAULT_2014  ASSAULT_2015  \
0           1  South Eglinton-Davisville      174            55            56   
1           2              North Toronto      173            53            57   
2           3         Dovercourt Village      172            62            65   

   ASSAULT_2016  ASSAULT_2017  ASSAULT_2018  ASSAULT_2019  ASSAULT_2020  ...  \
0            66            73            74            62            74  ...   
1            47            61            66            84            80  ...   
2            92           105           106           113            91  ...   

   THEFTOVER_RATE_2016  THEFTOVER_RATE_2017  THEFTOVER_RATE_2018  \
0            20.788940             4.915454            14.018037   
1            17.290567            15.913431            36.767410   
2            36.979515            22.389730            30.136368   

   THEFTOVER_RATE_2019  THEFTOVER_RATE_2020  THEFTOVER_RATE_2021

## 🎯 Data Analysis and Strategy

### 📊 **What we HAVE:**

**1. Neighborhood Data (Polygons):**
- `AREA_NAME`: Neighborhood name
- `HOOD_ID`: Unique neighborhood ID (158 neighborhoods; IDs are not consecutive, e.g. 174 for South Eglinton-Davisville)
- `geometry`: Polygon for each neighborhood
- `POPULATION_2024`: Population in 2024

**2. Crime Data by Year (2014-2024):**
- ASSAULT
- AUTOTHEFT (Auto theft)
- BIKETHEFT (Bike theft)
- BREAKENTER (Break and enter)
- HOMICIDE
- ROBBERY
- SHOOTING
- THEFTFROMMV (Theft from motor vehicle)
- THEFTOVER (Theft over $5,000)

**3. Crime Rates (RATE) by year** - Normalized by population (incidents per 100,000 residents)
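The sample rows are consistent with rates expressed per 100,000 residents: South Eglinton-Davisville's `THEFTOVER_RATE_2024` (21.895412) times its `POPULATION_2024` (27,403), divided by 100,000, gives almost exactly 6 incidents. A quick check of that conversion (the helper names are illustrative, not part of the dataset):

```python
def rate_per_100k(count: float, population: int) -> float:
    """Incidents per 100,000 residents."""
    return count / population * 100_000


def count_from_rate(rate: float, population: int) -> float:
    """Invert a per-100,000 rate back to an (approximate) incident count."""
    return rate * population / 100_000


# South Eglinton-Davisville, values taken from the GeoJSON sample above
population_2024 = 27403
theftover_rate_2024 = 21.895412

print(count_from_rate(theftover_rate_2024, population_2024))  # close to 6
print(rate_per_100k(6, population_2024))                      # close to 21.895412
```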

---

### ❌ **What we DON'T HAVE:**
- **NO street data** (edges/streets)
- **NO intersections** (nodes)
- **NO street network geometry**

---

### 🚨 **THE PROBLEM:**
Your GeoJSON contains only **NEIGHBORHOOD POLYGONS**, not the street network. Street-level routing needs:
1. **Nodes**: Street intersections (lat/lon)
2. **Edges**: Street segments connecting nodes
3. **Graph**: Structure that connects everything
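OSMnx returns this structure as a NetworkX `MultiDiGraph`: nodes carry their longitude and latitude as `x` and `y`, and edges carry attributes such as `length` (meters). The sketch below builds a tiny graph of that shape by hand (node IDs, coordinates, street names and lengths are all made up) to show how the three pieces fit together:

```python
import networkx as nx

# A tiny hand-made street graph shaped like an OSMnx graph (all values are made up)
G_toy = nx.MultiDiGraph(crs="EPSG:4326")

# Nodes = intersections, with longitude (x) and latitude (y)
G_toy.add_node(1, x=-79.3871, y=43.6426)
G_toy.add_node(2, x=-79.3850, y=43.6450)
G_toy.add_node(3, x=-79.3800, y=43.6440)

# Edges = directed street segments; 'key' distinguishes parallel edges
G_toy.add_edge(1, 2, key=0, name="Street A", length=320.0)
G_toy.add_edge(2, 3, key=0, name="Street B", length=410.0)
G_toy.add_edge(1, 3, key=0, name="Street C", length=900.0)

# Graph = what routing runs on: shortest path by total length
path = nx.shortest_path(G_toy, 1, 3, weight="length")
print(path)  # [1, 2, 3], since 320 + 410 < 900
```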

---

### 💡 **POSSIBLE SOLUTIONS:**


In [None]:
### 🗺️ OPTION 1: Download Street Network from OpenStreetMap with OSMnx

This is the recommended option for this project, because OSMnx returns the streets directly as a routable NetworkX graph.

```python
import osmnx as ox

# Download Toronto street network
place_name = "Toronto, Ontario, Canada"
G = ox.graph_from_place(place_name, network_type='drive')

# This will give you:
# - Nodes: Intersections with coordinates
# - Edges: Streets with attributes (name, length, type, etc.)
# - Graph: NetworkX graph for routing

# Then you can assign risk scores to each edge based on:
# 1. Which neighborhood it's in (spatial join)
# 2. The crime rate of that neighborhood
# 3. Street type (highway, residential, etc.)
# 4. Time of day (day vs night)
```

📦 **Required installation:** `pip install osmnx`




In [None]:
print("üéØ CALCULATE RISK SCORE BY NEIGHBORHOOD")
print("=" * 80)

# Identify crime columns for 2024 (most recent year)
crime_cols_2024 = [col for col in gdf.columns if '2024' in col and 'RATE' in col]

print(f"\nüìä Using {len(crime_cols_2024)} crime types:")
for col in crime_cols_2024:
    print(f"  - {col}")

# Calculate composite risk score
# Different crimes have different severity weights
weights = {
    'HOMICIDE': 10.0,      # Most severe
    'SHOOTING': 8.0,
    'ASSAULT': 5.0,
    'ROBBERY': 4.0,
    'AUTOTHEFT': 2.0,
    'BREAKENTER': 3.0,
    'THEFTFROMMV': 1.5,
    'THEFTOVER': 1.5,
    'BIKETHEFT': 1.0       # Least severe
}

# Create weighted score
gdf['RISK_SCORE'] = 0
for col in crime_cols_2024:
    crime_type = col.split('_')[0]
    if crime_type in weights:
        gdf['RISK_SCORE'] += gdf[col] * weights[crime_type]

# Normalize to 0-100 scale
gdf['RISK_SCORE_NORMALIZED'] = (gdf['RISK_SCORE'] / gdf['RISK_SCORE'].max()) * 100

print("\nüìà Risk Score Statistics:")
print(gdf['RISK_SCORE_NORMALIZED'].describe())

print("\nüèÜ Top 10 MOST DANGEROUS neighborhoods:")
top_dangerous = gdf.nlargest(10, 'RISK_SCORE_NORMALIZED')[['AREA_NAME', 'RISK_SCORE_NORMALIZED', 'POPULATION_2024']]
print(top_dangerous.to_string(index=False))

print("\nüü¢ Top 10 SAFEST neighborhoods:")
top_safe = gdf.nsmallest(10, 'RISK_SCORE_NORMALIZED')[['AREA_NAME', 'RISK_SCORE_NORMALIZED', 'POPULATION_2024']]
print(top_safe.to_string(index=False))


🎯 CALCULATE RISK SCORE BY NEIGHBORHOOD

📊 Using 9 crime types:
  - ASSAULT_RATE_2024
  - AUTOTHEFT_RATE_2024
  - BIKETHEFT_RATE_2024
  - BREAKENTER_RATE_2024
  - HOMICIDE_RATE_2024
  - ROBBERY_RATE_2024
  - SHOOTING_RATE_2024
  - THEFTFROMMV_RATE_2024
  - THEFTOVER_RATE_2024

📈 Risk Score Statistics:
count    158.000000
mean      27.083814
std       14.103426
min        8.230158
25%       18.027657
50%       22.755169
75%       32.874901
max      100.000000
Name: RISK_SCORE_NORMALIZED, dtype: float64

🏆 Top 10 MOST DANGEROUS neighborhoods:
               AREA_NAME  RISK_SCORE_NORMALIZED  POPULATION_2024
      Yonge-Bay Corridor             100.000000            16568
        Mimico-Queensway              96.839992            21790
     Downtown Yonge East              80.512104            23692
    Kensington-Chinatown              72.528427            23271
               Moss Park              60.273511            28930
              University              57.99

## 🛣️ STRATEGY TO IMPLEMENT ROUTING WITH WEIGHTS

### **STEP 1: Get Street Network (OSMnx)**
```python
import osmnx as ox
import networkx as nx

# Download Toronto streets
G = ox.graph_from_place("Toronto, Ontario, Canada", network_type='drive')
```

### **STEP 2: Assign Risk Scores to Streets**
For each street (edge), determine which neighborhood it's in and assign the risk score:

```python
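import geopandas as gpd
import osmnx as ox

# Spatial join: which neighborhood(s) does each street touch?
# 'intersects' (not 'within') keeps streets that cross a boundary;
# those match several polygons, so keep the highest risk per edge
edges_gdf = ox.graph_to_gdfs(G, nodes=False, edges=True)
hoods = gdf[['geometry', 'RISK_SCORE_NORMALIZED']].to_crs(edges_gdf.crs)
joined = gpd.sjoin(edges_gdf, hoods, how='left', predicate='intersects')
edge_risk = joined.groupby(level=[0, 1, 2])['RISK_SCORE_NORMALIZED'].max()

# Edges that match no polygon get the median neighborhood risk (an assumption)
edge_risk = edge_risk.fillna(gdf['RISK_SCORE_NORMALIZED'].median())

# Assign risk as weight to each edge
for u, v, key, data in G.edges(keys=True, data=True):
    risk = edge_risk.loc[(u, v, key)]  # 0-100 scale
    distance = data['length']          # meters
    risk_factor = risk / 10            # 0-10

    # Final weight = distance inflated by the risk penalty
    data['risk_weight'] = distance * (1 + risk_factor)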
```
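The choice of `predicate` matters in this join: with `'within'`, a street segment that crosses a neighborhood boundary lies inside neither polygon and gets no risk score, while `'intersects'` matches it to every polygon it touches. A toy demonstration with made-up squares, segments and scores:

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Two adjacent toy "neighborhoods" sharing the boundary x = 1 (made-up values)
hoods = gpd.GeoDataFrame(
    {"RISK_SCORE_NORMALIZED": [20.0, 80.0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Two toy street segments: one fully inside the first polygon, one crossing x = 1
streets = gpd.GeoDataFrame(
    {"name": ["inside", "crossing"]},
    geometry=[LineString([(0.2, 0.5), (0.8, 0.5)]),
              LineString([(0.5, 0.5), (1.5, 0.5)])],
    crs="EPSG:4326",
)

within = gpd.sjoin(streets, hoods, how="left", predicate="within")
intersects = gpd.sjoin(streets, hoods, how="left", predicate="intersects")

# 'within' leaves the crossing street without a risk score (NaN)
print(within[["name", "RISK_SCORE_NORMALIZED"]])

# 'intersects' matches it to both polygons; collapse to the highest risk per street
print(intersects.groupby("name")["RISK_SCORE_NORMALIZED"].max())
```

Taking the maximum is a conservative choice; a length-weighted average over the polygons a street crosses would be a reasonable alternative.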

### **STEP 3: Calculate Optimal Route**
```python
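import networkx as nx
import osmnx as ox

# Find safest route (lowest total risk_weight); points are (lat, lon)
origin = (43.6426, -79.3871)         # CN Tower
destination = (43.77152, -79.51080)  # example point in North York

# nearest_nodes takes X (longitude) before Y (latitude)
orig_node = ox.distance.nearest_nodes(G, origin[1], origin[0])
dest_node = ox.distance.nearest_nodes(G, destination[1], destination[0])

route = nx.shortest_path(G, orig_node, dest_node, weight='risk_weight')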
```

### **STEP 4: Visualize on Web Map**
```python
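import osmnx as ox

# Convert the route (list of node IDs) to a GeoDataFrame of its edges;
# weight picks between parallel edges, so use the same weight as the routing
route_gdf = ox.routing.route_to_gdf(G, route, weight='risk_weight')
route_gdf.to_file("safe_route.geojson", driver="GeoJSON")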
```

---

### 📋 **FEATURES TO COMBINE/USE:**

**✅ USE:**
- All `*_RATE_2024` (rates normalized by population)
- `POPULATION_2024` (for context)
- `AREA_NAME` and `HOOD_ID` (identification)
- `geometry` (for spatial joins)

**❌ DISCARD:**
- Absolute counts (e.g., `ASSAULT_2024`) - use RATES instead, since counts are not adjusted for population
- Old years (2014-2023) - use only 2024 or a recent multi-year average
- `OBJECTID_1` - row identifier with no analytical value
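If a recent multi-year average is preferred over a single year, one way is to average each crime type's rate columns over a window of years. A sketch, assuming the `{CRIME}_RATE_{YEAR}` column naming shown above; the helper name `recent_average_rates` and the three-year window are illustrative choices:

```python
import pandas as pd


def recent_average_rates(df: pd.DataFrame, crime_types, years) -> pd.DataFrame:
    """Return one '<CRIME>_RATE_AVG' column per crime type (mean over `years`)."""
    out = pd.DataFrame(index=df.index)
    for crime in crime_types:
        cols = [f"{crime}_RATE_{y}" for y in years if f"{crime}_RATE_{y}" in df.columns]
        # mean(axis=1) skips NaN, so one missing year does not blank out the average
        out[f"{crime}_RATE_AVG"] = df[cols].mean(axis=1)
    return out


# Toy data using the same column naming as the crime GeoJSON (values are made up)
toy = pd.DataFrame({
    "ASSAULT_RATE_2022": [100.0, 300.0],
    "ASSAULT_RATE_2023": [110.0, 330.0],
    "ASSAULT_RATE_2024": [120.0, None],
})
print(recent_average_rates(toy, ["ASSAULT"], [2022, 2023, 2024]))
```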

**🎯 KEY FEATURES FOR RISK SCORE:**
1. `HOMICIDE_RATE_2024` - Weight 10
2. `SHOOTING_RATE_2024` - Weight 8
3. `ASSAULT_RATE_2024` - Weight 5
4. `ROBBERY_RATE_2024` - Weight 4
5. Remaining crime types with lower weights (`BREAKENTER` 3, `AUTOTHEFT` 2, `THEFTFROMMV` and `THEFTOVER` 1.5, `BIKETHEFT` 1)
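The risk score computed earlier is a weighted sum of the 2024 rates, divided by its maximum so the highest-risk neighborhood scores 100 (the lowest is not forced to 0, which is why the observed minimum is about 8.2). A toy illustration with the same weights and made-up rates for three hypothetical neighborhoods:

```python
import pandas as pd

# Severity weights, as in the risk score cell above
weights = {'HOMICIDE': 10.0, 'SHOOTING': 8.0, 'ASSAULT': 5.0, 'ROBBERY': 4.0,
           'BREAKENTER': 3.0, 'AUTOTHEFT': 2.0, 'THEFTFROMMV': 1.5,
           'THEFTOVER': 1.5, 'BIKETHEFT': 1.0}

# Made-up 2024 rates for three hypothetical neighborhoods
toy = pd.DataFrame({
    'AREA_NAME': ['A', 'B', 'C'],
    'ASSAULT_RATE_2024': [200.0, 400.0, 100.0],
    'HOMICIDE_RATE_2024': [0.0, 10.0, 0.0],
})

# Weighted sum over every *_RATE_2024 column, keyed by the crime-type prefix
rate_cols = [c for c in toy.columns if c.endswith('_RATE_2024')]
toy['RISK_SCORE'] = sum(toy[c] * weights[c.split('_')[0]] for c in rate_cols)

# Max scaling: the riskiest neighborhood becomes exactly 100
toy['RISK_SCORE_NORMALIZED'] = toy['RISK_SCORE'] / toy['RISK_SCORE'].max() * 100
print(toy[['AREA_NAME', 'RISK_SCORE', 'RISK_SCORE_NORMALIZED']])
```

If a true 0 to 100 range is preferred, min-max scaling, `(score - min) / (max - min) * 100`, is the usual alternative.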


## 🗺️ COORDINATE REFERENCE SYSTEMS (CRS) - MATCHING PROJECTIONS

### **Your Current GeoJSON CRS:**


In [None]:
print("üîç CHECKING COORDINATE SYSTEM")
print("=" * 80)

print(f"\nüìç Current CRS: {gdf.crs}")
print(f"üìç CRS Name: {gdf.crs.name if gdf.crs else 'None'}")
print(f"üìç EPSG Code: {gdf.crs.to_epsg() if gdf.crs else 'None'}")

# Check bounds in current CRS
bounds = gdf.total_bounds
print(f"\nüìê Bounding Box:")
print(f"  Min X (West): {bounds[0]:.6f}")
print(f"  Min Y (South): {bounds[1]:.6f}")
print(f"  Max X (East): {bounds[2]:.6f}")
print(f"  Max Y (North): {bounds[3]:.6f}")

# Check if it's in lat/lon (WGS84)
if gdf.crs and gdf.crs.to_epsg() == 4326:
    print("\n‚úÖ Already in WGS84 (EPSG:4326) - Standard lat/lon format")
    print("   This matches OSMnx default projection!")
else:
    print(f"\n‚ö†Ô∏è Currently in {gdf.crs.name}")
    print("   OSMnx uses WGS84 (EPSG:4326) by default")
    print("   You'll need to reproject!")


### 📖 **CRS Explanation:**

**EPSG:4326 (WGS84)** - Geographic Coordinate System
- Uses **latitude/longitude** in degrees
- Global standard for GPS and web maps
- What OSMnx downloads by default
- The lat/lon format Leaflet.js expects for markers and GeoJSON layers (it draws them on a Web Mercator basemap)
- Range: Lat (-90 to 90), Lon (-180 to 180)

**EPSG:3857 (Web Mercator)** - Projected Coordinate System
- Used by Google Maps, OpenStreetMap tiles
- Uses meters (not degrees)
- Good for visualization, NOT for accurate distance calculations (at Toronto's latitude, distances are inflated by roughly 38%)

**EPSG:26917 (NAD83 / UTM Zone 17N)** - Projected Coordinate System
- Common for Toronto area measurements
- Uses meters
- Better for distance/area calculations
- Need to convert to EPSG:4326 for web maps

---

### 🎯 **What You Need:**

1. **For OSMnx download:** Use EPSG:4326 (default)
2. **For spatial joins:** Both datasets must be in same CRS
3. **For web map (Leaflet):** Convert everything to EPSG:4326
4. **For accurate distance calculations:** Use projected CRS like UTM
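To see why item 4 matters: in EPSG:4326, GeoPandas measures lengths in degrees. A minimal sketch that reprojects a short segment (coordinates chosen only for illustration) to EPSG:26917 to get meters:

```python
import warnings

import geopandas as gpd
from shapely.geometry import LineString

# A north-south segment in downtown Toronto, 0.01 degrees of latitude long (lon, lat order)
seg = gpd.GeoSeries([LineString([(-79.3871, 43.6426), (-79.3871, 43.6526)])], crs="EPSG:4326")

# In EPSG:4326, .length is measured in degrees (GeoPandas warns about this)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    length_deg = seg.length.iloc[0]

# Reproject to NAD83 / UTM zone 17N (EPSG:26917), whose units are meters
length_m = seg.to_crs(epsg=26917).length.iloc[0]

print(f"{length_deg:.4f} degrees vs {length_m:.0f} m")
```

OSMnx already stores each edge's `length` in meters, so this mainly matters for lengths or areas you compute yourself from geometries; `ox.project_graph(G)` can project a whole graph (to a UTM zone by default).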


## 🚀 STEP-BY-STEP: Download Toronto Street Network

### Install OSMnx first:
```bash
pip install osmnx
```

Or in notebook:
```python
%pip install osmnx
```


In [None]:
# Download Toronto street network
import osmnx as ox
import networkx as nx

print("üåê DOWNLOADING TORONTO STREET NETWORK FROM OPENSTREETMAP")
print("=" * 80)
print("‚è≥ This may take a few minutes...")

# Option 1: Download by place name (recommended)
place_name = "Toronto, Ontario, Canada"

# network_type options:
# - 'drive': all drivable streets (cars)
# - 'walk': all walkable paths
# - 'bike': all bikeable paths
# - 'all': everything

G = ox.graph_from_place(place_name, network_type='drive')

print(f"\n✅ Downloaded successfully!")
print(f"📊 Network Statistics:")
print(f"   Nodes (intersections): {G.number_of_nodes():,}")
print(f"   Edges (street segments): {G.number_of_edges():,}")

# Check CRS (OSMnx stores it in the graph attributes instead of hardcoding it)
print(f"\n📍 Network CRS: {G.graph['crs']}")
print("   ✅ Matches standard lat/lon format")

# Save for later use
print("\n💾 Saving graph to disk...")
ox.save_graphml(G, "toronto_street_network.graphml")
print("   Saved as: toronto_street_network.graphml")


In [None]:
# Convert graph to GeoDataFrames for analysis
print("üîÑ CONVERTING GRAPH TO GEODATAFRAMES")
print("=" * 80)

# Convert to GeoDataFrames (for spatial operations)
nodes_gdf, edges_gdf = ox.graph_to_gdfs(G, nodes=True, edges=True)

print(f"\nüìç NODES (Intersections):")
print(f"   Total: {len(nodes_gdf):,}")
print(f"   Columns: {list(nodes_gdf.columns)}")
print(f"   CRS: {nodes_gdf.crs}")

print(f"\nüõ£Ô∏è EDGES (Streets):")
print(f"   Total: {len(edges_gdf):,}")
print(f"   Key columns: {[col for col in edges_gdf.columns if col in ['name', 'length', 'highway', 'maxspeed', 'oneway']]}")
print(f"   CRS: {edges_gdf.crs}")

# Show sample edge data
print(f"\nüìã Sample street data:")
sample_cols = ['name', 'length', 'highway', 'maxspeed'] 
available_cols = [col for col in sample_cols if col in edges_gdf.columns]
print(edges_gdf[available_cols].head(5))

# Check CRS match
print(f"\nüîç CRS COMPATIBILITY CHECK:")
print(f"   Neighborhoods CRS: {gdf.crs}")
print(f"   Streets CRS: {edges_gdf.crs}")

if gdf.crs == edges_gdf.crs:
    print("   ‚úÖ CRS MATCH! Ready for spatial join")
else:
    print("   ‚ö†Ô∏è CRS MISMATCH! Need to reproject")
    print(f"   Reprojecting neighborhoods to {edges_gdf.crs}...")
    gdf_reprojected = gdf.to_crs(edges_gdf.crs)
    print("   ‚úÖ Reprojection complete!")


## 🎯 NEXT STEPS:

### **1. Spatial Join** - Match streets to neighborhoods
```python
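import geopandas as gpd

# Join edges with neighborhoods to get risk scores.
# 'intersects' (not 'within') also matches streets that cross a boundary;
# those get one row per neighborhood, so keep only the highest-risk row
edges_with_neighborhood = gpd.sjoin(
    edges_gdf,
    gdf[['geometry', 'AREA_NAME', 'HOOD_ID', 'RISK_SCORE_NORMALIZED']],
    how='left',
    predicate='intersects'
).sort_values('RISK_SCORE_NORMALIZED', ascending=False)
edges_with_neighborhood = edges_with_neighborhood[
    ~edges_with_neighborhood.index.duplicated(keep='first')
]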
```

### **2. Assign Risk Weights** - Add risk to each street
```python
import pandas as pd

# Calculate risk-weighted distance for routing
for u, v, key, data in G.edges(keys=True, data=True):
    # Neighborhood risk for this edge; None/NaN if the edge matched no polygon
    risk = edges_with_neighborhood['RISK_SCORE_NORMALIZED'].get((u, v, key))
    distance = data['length']

    if pd.notna(risk):
        # Weight = distance * (1 + risk_factor)
        # Higher risk = higher weight = avoid this street
        data['risk_weight'] = distance * (1 + risk / 50)
    else:
        data['risk_weight'] = distance  # Default to distance only
```
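The divisor in `risk / 50` sets how strongly risk is traded against distance; the strategy section earlier uses `risk / 10` instead. A small worked comparison for a hypothetical 200 m street in a neighborhood scoring 80 (the helper `risk_weight` is illustrative, not part of OSMnx or NetworkX):

```python
def risk_weight(length_m: float, risk: float, divisor: float) -> float:
    """Distance inflated by neighborhood risk: length * (1 + risk / divisor)."""
    return length_m * (1 + risk / divisor)


# Hypothetical 200 m street in a neighborhood with risk score 80 (0-100 scale)
for divisor in (10, 50):
    print(f"divisor={divisor}: {risk_weight(200.0, 80.0, divisor):.1f}")
```

With `/ 10`, a street in the riskiest neighborhood (score 100) costs 11 times its length; with `/ 50` it costs 3 times, so the router accepts much longer detours under the first setting.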

### **3. Export for Web Map**
```python
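import osmnx as ox

# Rebuild the edges GeoDataFrame from G so the export includes 'risk_weight'
edges_export = ox.graph_to_gdfs(G, nodes=False, edges=True)
edges_export.to_file("toronto_streets_with_risk.geojson", driver="GeoJSON")

# Export neighborhoods with risk scores (RISK_SCORE_NORMALIZED was computed earlier)
gdf.to_file("toronto_neighborhoods_with_risk.geojson", driver="GeoJSON")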
```
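OSMnx can store some edge attributes as Python lists (for example several `osmid` or `name` values on a simplified edge), and depending on the I/O engine, writing such columns to GeoJSON may fail. A minimal workaround, assuming storing those values as text is acceptable (the helper name `stringify_list_columns` is hypothetical):

```python
import pandas as pd


def stringify_list_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which list values are converted to their string form."""
    out = df.copy()  # copy() keeps the GeoDataFrame type when given one
    for col in out.columns:
        if col == "geometry":
            continue
        if out[col].map(lambda x: isinstance(x, list)).any():
            # Only lists are converted; scalars and None are left unchanged
            out[col] = out[col].map(lambda x: str(x) if isinstance(x, list) else x)
    return out


# Toy edge attributes with OSMnx-style list values (made up)
toy = pd.DataFrame({
    "osmid": [[1, 2], 3],
    "name": ["King St", ["A", "B"]],
    "length": [10.0, 20.0],
})
print(stringify_list_columns(toy))
```

Apply it to the edges GeoDataFrame right before calling `to_file`.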

### **4. Implement Routing**
- Use NetworkX shortest_path with `weight='risk_weight'`
- Compare safe route vs fastest route
- Visualize both on Leaflet map
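Comparing the two routes needs only a second `shortest_path` call with a different `weight`. The sketch below uses a toy graph (made-up lengths and risk scores) with a short route through a high-risk area and a longer detour through a low-risk one, and the `risk / 50` formula from step 2. Note that `weight='length'` gives the *shortest* route; a true fastest route would need travel times (OSMnx's `add_edge_speeds` and `add_edge_travel_times` functions can add them).

```python
import networkx as nx

# Toy graph: 1 -> 2 -> 4 is short but high-risk, 1 -> 3 -> 4 is longer but low-risk
G_toy = nx.MultiDiGraph()
for u, v, length, risk in [(1, 2, 300, 90), (2, 4, 300, 90),
                           (1, 3, 400, 10), (3, 4, 400, 10)]:
    G_toy.add_edge(u, v, length=length, risk_weight=length * (1 + risk / 50))

shortest = nx.shortest_path(G_toy, 1, 4, weight="length")
safest = nx.shortest_path(G_toy, 1, 4, weight="risk_weight")

# path_weight sums an attribute along a path (minimum over parallel edges)
for label, route in [("shortest", shortest), ("safest", safest)]:
    meters = nx.path_weight(G_toy, route, weight="length")
    print(label, route, f"{meters} m")
```

Reporting the extra distance of the safe route (here 800 m versus 600 m) alongside the map helps users judge whether the detour is worth it.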
