BAD

# Highway Exits in California - Overpass API Query

This notebook demonstrates how to use the Overpass API to query highway exits in California. The Overpass API allows us to extract specific data from OpenStreetMap (OSM) using a powerful query language.

## What we'll cover:
1. Install required packages
2. Set up Overpass API connection
3. Query highway exits in California
4. Process and display the results
5. Visualize the exits on a map

In [13]:
import overpy
import requests
import pandas as pd
import folium
from folium import plugins
import json
import time
from datetime import datetime

In [14]:
# Set up Overpass API
api = overpy.Overpass()

# Alternative direct API endpoint for custom queries
OVERPASS_URL = "http://overpass-api.de/api/interpreter"

def query_overpass_direct(query):
    """
    Direct query to Overpass API with error handling
    """
    try:
        response = requests.post(OVERPASS_URL, data={'data': query}, timeout=300)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error querying Overpass API: {e}")
        return None
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON response: {e}")
        return None

In [15]:
# Define the Overpass query for highway exits in California
# This query looks for nodes with highway=motorway_junction (highway exits) in California

overpass_query = """
[out:json][timeout:300];
(
  // Get all highway exits (motorway junctions) in California
  node["highway"="motorway_junction"]["addr:state"="CA"];
  node["highway"="motorway_junction"](32.5,-124.5,42.0,-114.1);
);
out geom;
"""

print("Overpass Query:")
print(overpass_query)

Overpass Query:

[out:json][timeout:300];
(
  // Get all highway exits (motorway junctions) in California
  node["highway"="motorway_junction"]["addr:state"="CA"];
  node["highway"="motorway_junction"](32.5,-124.5,42.0,-114.1);
);
out geom;



In [16]:
# Execute the query
print("Querying Overpass API for highway exits in California...")
print(f"Query started at: {datetime.now()}")

# Using the direct API approach for better control
result = query_overpass_direct(overpass_query)

if result:
    print(f"Query completed at: {datetime.now()}")
    print(f"Found {len(result.get('elements', []))} highway exits")
else:
    print("Query failed or returned no results")

Querying Overpass API for highway exits in California...
Query started at: 2025-06-14 13:09:42.248683
Query completed at: 2025-06-14 13:09:55.383387
Found 8415 highway exits
Query completed at: 2025-06-14 13:09:55.383387
Found 8415 highway exits


In [17]:
# Process the results into a DataFrame
if result and 'elements' in result:
    exits_data = []
    
    # Define exit-related tags we want to extract
    exit_related_tags = {
        'name', 'exit_number', 'exit_to', 'destination', 'ref', 'highway',
        'exit_to:lanes', 'exit_to:left', 'exit_to:right', 'exit_to:forward',
        'destination:lanes', 'destination:left', 'destination:right', 'destination:forward',
        'destination:ref', 'destination:ref:to', 'destination:symbol',
        'name:en', 'name:es', 'alt_name', 'official_name', 'short_name',
        'junction', 'junction:ref', 'operator', 'road', 'route_ref',
        'turn:lanes', 'lanes', 'exit', 'exit:to', 'ramp', 'ramp:name'
    }
    
    for element in result['elements']:
        if element['type'] == 'node':
            # Start with basic node information
            exit_info = {
                'id': element['id'],
                'lat': element['lat'],
                'lon': element['lon'],
                'type': element['type']
            }
            
            # Extract only exit-related tags from this element
            tags = element.get('tags', {})
            for tag_key, tag_value in tags.items():
                # Include tag if it's in our exit-related list or contains exit/destination keywords
                if (tag_key in exit_related_tags or 
                    'exit' in tag_key.lower() or 
                    'destination' in tag_key.lower() or
                    'name' in tag_key.lower()):
                    exit_info[tag_key] = tag_value
            
            # Ensure we have a name field (use 'Unnamed Exit' if no name tag exists)
            if 'name' not in exit_info:
                exit_info['name'] = 'Unnamed Exit'
                
            exits_data.append(exit_info)
    
    # Create DataFrame
    df_exits = pd.DataFrame(exits_data)
    
    print(f"Created DataFrame with {len(df_exits)} highway exits")
    print(f"Total exit-related columns extracted: {len(df_exits.columns)}")
    print(f"\nExit-related columns: {sorted([col for col in df_exits.columns if col not in ['id', 'lat', 'lon', 'type']])}")
    
    print("\nFirst few entries:")
    print(df_exits.head())
    
    # Show which exit-related tags are most commonly available
    print(f"\nExit-related tag availability (non-null counts):")
    tag_counts = df_exits.count().sort_values(ascending=False)
    for col, count in tag_counts.items():
        if col not in ['id', 'lat', 'lon', 'type']:  # Skip basic node info
            percentage = (count / len(df_exits)) * 100
            print(f"  {col}: {count}/{len(df_exits)} ({percentage:.1f}%)")
else:
    print("No data to process")
    df_exits = pd.DataFrame()

Created DataFrame with 8415 highway exits
Total exit-related columns extracted: 16

Exit-related columns: ['destination', 'destination:ref', 'destination:street', 'exit_to', 'highway', 'junction:ref', 'name', 'noname', 'official_name', 'old_exit_to', 'ref', 'source:exit_to']

First few entries:
       id        lat         lon  type            highway   ref          name  \
0  281266  37.558997 -122.301571  node  motorway_junction  414B  Unnamed Exit   
1  302889  37.325649 -122.050793  node  motorway_junction    18  Unnamed Exit   
2  653731  34.036801 -118.286850  node  motorway_junction   13A  Unnamed Exit   
3  653744  34.030077 -118.257228  node  motorway_junction   14B  Unnamed Exit   
4  653772  34.051103 -118.213905  node  motorway_junction  135C  Unnamed Exit   

  official_name noname old_exit_to destination destination:ref source:exit_to  \
0           NaN    NaN         NaN         NaN             NaN            NaN   
1           NaN    NaN         NaN         NaN         

In [18]:
# Summary of extracted exit-related tags and sample data
if not df_exits.empty:
    print("=== EXIT-RELATED TAG EXTRACTION SUMMARY ===")
    print(f"Total exits: {len(df_exits)}")
    print(f"Total exit-related columns (tags): {len(df_exits.columns)}")
    
    # Show all available exit-related columns/tags
    print(f"\nExit-related tags/columns:")
    exit_cols = sorted([col for col in df_exits.columns if col not in ['id', 'lat', 'lon', 'type']])
    for i, col in enumerate(exit_cols):
        if i % 3 == 0:
            print()
        print(f"{col:25}", end="")
    print("\n")
    
    # Show tag usage statistics for exit-related tags
    print("Exit-related tag usage statistics:")
    tag_counts = df_exits.count().sort_values(ascending=False)
    basic_fields = ['id', 'lat', 'lon', 'type']
    tag_stats = [(col, count, (count/len(df_exits))*100) 
                 for col, count in tag_counts.items() 
                 if col not in basic_fields]
    
    for col, count, percentage in tag_stats:
        print(f"  {col:25}: {count:4d}/{len(df_exits)} ({percentage:5.1f}%)")
    
    # Show a sample exit with all its exit-related tags
    print(f"\n=== SAMPLE EXIT WITH EXIT-RELATED TAGS ===")
    sample_exit = df_exits.iloc[0]
    for col, value in sample_exit.items():
        if pd.notna(value) and value != '' and col not in ['id', 'lat', 'lon', 'type']:
            print(f"  {col}: {value}")
    
    # Save to CSV with exit-related tags only
    filename = f"california_highway_exits_exit_tags_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
    df_exits.to_csv(filename, index=False)
    print(f"\nExit-related data saved to: {filename}")
    
else:
    print("No data available")

=== EXIT-RELATED TAG EXTRACTION SUMMARY ===
Total exits: 8415
Total exit-related columns (tags): 16

Exit-related tags/columns:

destination              destination:ref          destination:street       
exit_to                  highway                  junction:ref             
name                     noname                   official_name            
old_exit_to              ref                      source:exit_to           

Exit-related tag usage statistics:
  highway                  : 8415/8415 (100.0%)
  name                     : 8415/8415 (100.0%)
  ref                      : 6478/8415 ( 77.0%)
  destination:street       :   15/8415 (  0.2%)
  destination              :   14/8415 (  0.2%)
  noname                   :    3/8415 (  0.0%)
  destination:ref          :    2/8415 (  0.0%)
  old_exit_to              :    2/8415 (  0.0%)
  official_name            :    1/8415 (  0.0%)
  source:exit_to           :    1/8415 (  0.0%)
  exit_to                  :    1/8415 (  0.0%)
  j

In [19]:
# Quick overview of extracted data
print(f"DataFrame shape: {df_exits.shape}")
print(f"Columns: {df_exits.columns.tolist()}")
print(f"\nSample of available tags for first exit:")
first_exit = df_exits.iloc[0]
non_empty_tags = {k: v for k, v in first_exit.items() if pd.notna(v) and v != ''}
for key, value in list(non_empty_tags.items())[:10]:  # Show first 10 non-empty tags
    print(f"  {key}: {value}")
    
# Show which columns have the most data
print(f"\nColumns with data (top 10):")
for col in df_exits.count().sort_values(ascending=False).head(10).index:
    count = df_exits[col].count()
    print(f"  {col}: {count} entries")

DataFrame shape: (8415, 16)
Columns: ['id', 'lat', 'lon', 'type', 'highway', 'ref', 'name', 'official_name', 'noname', 'old_exit_to', 'destination', 'destination:ref', 'source:exit_to', 'destination:street', 'exit_to', 'junction:ref']

Sample of available tags for first exit:
  id: 281266
  lat: 37.558997
  lon: -122.3015709
  type: node
  highway: motorway_junction
  ref: 414B
  name: Unnamed Exit

Columns with data (top 10):
  id: 8415 entries
  lat: 8415 entries
  lon: 8415 entries
  type: 8415 entries
  highway: 8415 entries
  name: 8415 entries
  ref: 6478 entries
  destination:street: 15 entries
  destination: 14 entries
  noname: 3 entries


In [20]:
# Data analysis and summary
if not df_exits.empty:
    print("=== HIGHWAY EXITS ANALYSIS ===")
    print(f"Total highway exits found: {len(df_exits)}")
    
    # Check for exits with names
    named_exits = df_exits[df_exits['name'] != 'Unnamed Exit']
    print(f"Named exits: {len(named_exits)}")
    print(f"Unnamed exits: {len(df_exits) - len(named_exits)}")
    
    # Check for exits with numbers
    numbered_exits = df_exits[df_exits['exit_number'].notna()]
    print(f"Exits with numbers: {len(numbered_exits)}")
    
    # Check for exits with destinations
    exits_with_dest = df_exits[df_exits['destination'].notna()]
    print(f"Exits with destinations: {len(exits_with_dest)}")
    
    # Geographic bounds
    print(f"\nGeographic bounds:")
    print(f"Latitude: {df_exits['lat'].min():.4f} to {df_exits['lat'].max():.4f}")
    print(f"Longitude: {df_exits['lon'].min():.4f} to {df_exits['lon'].max():.4f}")
    
    # Show some example exit information
    print("\n=== SAMPLE EXIT INFORMATION ===")
    sample_exits = df_exits[df_exits['name'] != 'Unnamed Exit'].head(10)
    for idx, exit in sample_exits.iterrows():
        print(f"Exit: {exit['name']}")
        if exit['exit_number']:
            print(f"  Number: {exit['exit_number']}")
        if exit['destination']:
            print(f"  Destination: {exit['destination']}")
        print(f"  Location: {exit['lat']:.4f}, {exit['lon']:.4f}")
        print("---")
else:
    print("No exits data available for analysis")

=== HIGHWAY EXITS ANALYSIS ===
Total highway exits found: 8415
Named exits: 9
Unnamed exits: 8406


KeyError: 'exit_number'

In [21]:
# Isolate only the core exit-related tags as specified
if not df_exits.empty:
    print("=== ISOLATING CORE EXIT-RELATED TAGS ===")
    
    # Define the core exit-related tags we want to isolate
    core_exit_tags = {
        'ref',                 # Exit number
        'name',                # Exit name (usually destination street/place)
        'exit_to',             # Where the exit leads (older US convention)
        'destination',         # Signed destination (modern standard)
        'destination:ref',     # Signed route number
        'destination:street',  # Signed street name
        'destination:city',    # Signed city name
        'destination:symbol',  # Signed symbol (e.g. airport)
        'junction:ref',        # Exit/junction number (used in some countries)
        'is_in'                # Legacy place info (if present)
    }
    
    # Keep basic fields plus only the core exit tags
    basic_fields = ['id', 'lat', 'lon', 'type']
    
    # Find which of our core tags are actually present in the data
    available_core_tags = [tag for tag in core_exit_tags if tag in df_exits.columns]
    
    # Create new DataFrame with only basic fields + core exit tags
    core_columns = basic_fields + available_core_tags
    df_core_exits = df_exits[core_columns].copy()
    
    print(f"Original DataFrame had {len(df_exits.columns)} columns")
    print(f"Core exit tags DataFrame has {len(df_core_exits.columns)} columns")
    print(f"\nCore exit tags found in data:")
    for tag in available_core_tags:
        count = df_core_exits[tag].count()
        percentage = (count / len(df_core_exits)) * 100
        print(f"  {tag:20}: {count:4d}/{len(df_core_exits)} ({percentage:5.1f}%)")
    
    print(f"\nCore exit tags NOT found in data:")
    missing_tags = [tag for tag in core_exit_tags if tag not in df_exits.columns]
    for tag in missing_tags:
        print(f"  {tag}")
    
    # Show sample of core data
    print(f"\n=== SAMPLE CORE EXIT DATA ===")
    sample_exits = df_core_exits.head(3)
    for idx, exit in sample_exits.iterrows():
        print(f"\nExit {idx + 1}:")
        for col in available_core_tags:
            if pd.notna(exit[col]) and exit[col] != '':
                print(f"  {col}: {exit[col]}")
    
    # Save core exit data
    core_filename = f"california_highway_exits_core_tags_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
    df_core_exits.to_csv(core_filename, index=False)
    print(f"\nCore exit data saved to: {core_filename}")
    
    # Quick stats on exits with different types of identifiers
    print(f"\n=== EXIT IDENTIFICATION STATS ===")
    exits_with_ref = df_core_exits['ref'].notna().sum()
    exits_with_name = df_core_exits['name'].notna().sum()
    exits_with_destination = df_core_exits['destination'].notna().sum()
    exits_with_exit_to = df_core_exits['exit_to'].notna().sum()
    
    print(f"Exits with ref (exit number): {exits_with_ref}")
    print(f"Exits with name: {exits_with_name}")
    print(f"Exits with destination: {exits_with_destination}")
    print(f"Exits with exit_to: {exits_with_exit_to}")
    
else:
    print("No data available to analyze")

=== ISOLATING CORE EXIT-RELATED TAGS ===
Original DataFrame had 16 columns
Core exit tags DataFrame has 11 columns

Core exit tags found in data:
  destination:ref     :    2/8415 (  0.0%)
  exit_to             :    1/8415 (  0.0%)
  name                : 8415/8415 (100.0%)
  destination         :   14/8415 (  0.2%)
  destination:street  :   15/8415 (  0.2%)
  ref                 : 6478/8415 ( 77.0%)
  junction:ref        :    1/8415 (  0.0%)

Core exit tags NOT found in data:
  destination:symbol
  destination:city
  is_in

=== SAMPLE CORE EXIT DATA ===

Exit 1:
  name: Unnamed Exit
  ref: 414B

Exit 2:
  name: Unnamed Exit
  ref: 18

Exit 3:
  name: Unnamed Exit
  ref: 13A

Core exit data saved to: california_highway_exits_core_tags_20250614_131214.csv

=== EXIT IDENTIFICATION STATS ===
Exits with ref (exit number): 6478
Exits with name: 8415
Exits with destination: 14
Exits with exit_to: 1
