## Adding Road Distance

This code defines a function, `add_road_dis`, that calculates the distance from each PM2.5 measurement location to the nearest major road (OpenStreetMap `motorway`, `trunk`, `primary`, or `secondary`) within a specified geographic bounding box. The distances are returned in kilometers as a new column, `road_dis_km`. This is useful for analyzing how air quality relates to proximity to major roads.
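The core geometric step, finding the distance from a point to the nearest road line, can be illustrated without any GIS dependency. The following is a minimal pure-Python sketch, not the method geopandas uses internally. It assumes coordinates are already in a projected CRS measured in meters and that each road is a list of `(x, y)` vertices. The names `point_segment_distance`, `point_polyline_distance`, and `nearest_road_km`, and the two example roads, are hypothetical and for illustration only.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b (all (x, y) tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide, so measure to that point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_polyline_distance(p, line):
    """Distance from p to a polyline given as a list of at least two (x, y) vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def nearest_road_km(p, roads):
    """Return (index of nearest road, distance in km), assuming coordinates in meters."""
    dists = [point_polyline_distance(p, road) for road in roads]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best] / 1000.0


# Two hypothetical roads in a meter-based CRS: one along y = 0, one along y = 2000.
roads = [[(0, 0), (1000, 0)], [(0, 2000), (2000, 2000)]]
print(nearest_road_km((500, 300), roads))  # (0, 0.3)
```

`geopandas.sjoin_nearest` answers the same question but queries a spatial index instead of looping over every road segment, which matters once the road network has many thousands of edges.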

### Code Breakdown

- **Inputs:**
  - `data_dir`: Path to a CSV file of PM2.5 measurements (`./ground_pm25.csv` in the example below) with latitude (`lat`) and longitude (`lon`) columns in decimal degrees.
  - `lon_min`, `lon_max`, `lat_min`, `lat_max`: Bounding box, in decimal degrees, for which road data is downloaded.

- **Function: `add_road_dis`**
  - Reads the input CSV file into a DataFrame (`df`) and builds a `GeoDataFrame` of point geometries from the `lon` and `lat` columns (CRS `EPSG:4326`).
  - Uses `osmnx` to download the drivable road network for the bounding box and keeps only edges whose OSM `highway` tag is `motorway`, `trunk`, `primary`, or `secondary`.
  - Reprojects both the points and the roads to NAD83 / UTM zone 13N (`EPSG:26913`), a projected CRS measured in meters, so that distances come out in meters.
  - Uses a nearest-neighbor spatial join (`geopandas.sjoin_nearest`) to find the closest major road to each PM2.5 measurement location, then converts the distance from meters to kilometers (`road_dis_km`).

- **Output:**
  - Returns a DataFrame with an additional `road_dis_km` column, indicating each point’s distance to the nearest major road.
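`EPSG:26913` is NAD83 / UTM zone 13N. Because UTM zones are 6° of longitude wide, numbered eastward from 180°W, the zone can be derived from longitude. The sketch below assumes the NAD83 convention that zone *n*N is `EPSG:26900 + n`, defined for zones 1N through 23N; `nad83_utm_epsg` is a hypothetical helper, not part of geopandas or osmnx. The bounding box used here (-108° to -105°) lies entirely within zone 13 (-108° to -102°), so the hard-coded code is appropriate; a box spanning several zones would see growing scale distortion away from the zone's central meridian.

```python
import math


def nad83_utm_epsg(lon):
    """EPSG code string for the NAD83 / UTM north zone containing longitude `lon`."""
    # UTM zones are 6 degrees wide, numbered 1..60 eastward starting at 180°W.
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # NAD83 / UTM zone nN is EPSG:26900 + n, defined only for zones 1N through 23N.
    if not 1 <= zone <= 23:
        raise ValueError(f"UTM zone {zone} has no NAD83 EPSG code in 26901-26923")
    return f"EPSG:{26900 + zone}"


# Use the centre of the bounding box from this section.
lon_min, lon_max = -108, -105
print(nad83_utm_epsg((lon_min + lon_max) / 2))  # EPSG:26913
```

As an alternative, geopandas provides `GeoDataFrame.estimate_utm_crs()`, which picks a UTM CRS from the data's bounds (WGS 84 datum by default).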


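```python
import pandas as pd
import geopandas as gpd
import osmnx as ox

df_dir = './ground_pm25.csv'
lon_min = -108
lon_max = -105
lat_min = 30
lat_max = 33

# OSM `highway` classes treated as major roads.
MAJOR_ROAD_TYPES = {'motorway', 'trunk', 'primary', 'secondary'}


def _is_major(highway):
    # In a simplified osmnx graph, `highway` can be a string or a list of strings.
    tags = highway if isinstance(highway, list) else [highway]
    return any(tag in MAJOR_ROAD_TYPES for tag in tags)


def add_road_dis(data_dir, lon_min, lon_max, lat_min, lat_max):
    """Return the CSV data with a `road_dis_km` column: distance to the nearest major road."""
    df = pd.read_csv(data_dir)
    gdf_points = gpd.GeoDataFrame(
        df, geometry=gpd.points_from_xy(df['lon'], df['lat']), crs='EPSG:4326'
    )

    try:
        # Assumes osmnx >= 2.0, which takes bbox=(west, south, east, north);
        # older 1.x releases used positional (north, south, east, west) arguments.
        G = ox.graph_from_bbox(
            bbox=(lon_min, lat_min, lon_max, lat_max), network_type='drive'
        )
        edges = ox.graph_to_gdfs(G, nodes=False, edges=True)
        major_roads = edges[edges['highway'].apply(_is_major)]
        print("Successfully fetched roads within bounding box.")
    except Exception as e:
        print(f"Failed to fetch roads within bounding box, Error: {e}")
        df['road_dis_km'] = float('nan')  # keep the column so downstream code still works
        return df

    # NAD83 / UTM zone 13N, in meters; the bounding box lies inside zone 13.
    utm_crs = 'EPSG:26913'
    gdf_points_utm = gdf_points.to_crs(utm_crs)
    major_roads_utm = major_roads.to_crs(utm_crs)

    # Nearest major road per point; the distance is in CRS units (meters).
    nearest = gpd.sjoin_nearest(
        gdf_points_utm, major_roads_utm[['geometry']], how='left', distance_col='road_dis'
    )
    # Equidistant roads (e.g. edges meeting at one node) yield duplicate rows; keep one.
    nearest = nearest[~nearest.index.duplicated(keep='first')]

    # Assign by the shared row index (not by position), converting meters to kilometers.
    df['road_dis_km'] = nearest['road_dis'] / 1000
    return df


df_with_distance = add_road_dis(df_dir, lon_min, lon_max, lat_min, lat_max)
print(df_with_distance[['lon', 'lat', 'road_dis_km']].head())
print(df_with_distance['road_dis_km'].min(), df_with_distance['road_dis_km'].max())
```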



```
Successfully fetched roads within bounding box.
          lon        lat  road_dis_km
0 -106.287740  31.667520     0.603648
1 -106.402802  31.746700     0.116397
2 -106.455000  31.765600     0.022483
3 -106.751211  32.310332     0.289801
4 -106.287740  31.667520     0.603648
0.02248281408562361 0.6036479220713542
```
