<a href="https://colab.research.google.com/github/MODA-NYC/nyc-geography-crosswalks/blob/main/NYC_Geographies_Generate_All_Wide_Crosswalks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NYC Geographies: Generate All Wide Crosswalks

This notebook generates a complete set of wide crosswalk tables for various geographic boundaries in New York City using the BetaNYC `all_bounds.geojson` dataset.

### What this notebook does:
- **Spatial intersections:** Computes overlaps among NYC geographic boundaries using GeoPandas.
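- **Negative Buffering:** Shrinks each primary feature by a negative buffer (`BUFFER_FEET = -200`, in the feet-based EPSG:2263 projection) and keeps only overlaps larger than `MIN_INTERSECTION_AREA = 400` square feet, so features that merely touch or overlap trivially are excluded.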
- **Wide Crosswalk Tables:** Produces one CSV file per geographic boundary (e.g., Community Districts, ZIP codes, NTAs, BIDs), each structured as a wide table where:
  - Each **row** represents a specific geographic feature.
  - Each **column** shows overlapping features from other geography types (semicolon-separated).
- **Automated Outputs:** All generated CSV files are zipped into a single downloadable archive.
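
The effect of the negative buffer and area threshold described above can be illustrated with a minimal sketch. The three squares below are hypothetical toy geometries, not features from the BetaNYC dataset, and `meaningful_overlap` is an illustrative helper rather than a function from the notebook; only the two configuration values are taken from the code cell.

```python
from shapely.geometry import box

BUFFER_FEET = -200           # same value as the notebook's configuration
MIN_INTERSECTION_AREA = 400  # square feet, same value as the notebook

# Hypothetical toy geometries in a feet-based coordinate system.
district = box(0, 0, 1000, 1000)        # primary feature
neighbor = box(1000, 0, 2000, 1000)     # shares only an edge with the district
overlapper = box(500, 500, 1500, 1500)  # genuinely overlaps the district

# Shrinking the primary feature pulls its boundary 200 ft inward.
shrunk = district.buffer(BUFFER_FEET)

def meaningful_overlap(primary_buffered, other, min_area=MIN_INTERSECTION_AREA):
    """Illustrative helper: True if the overlap area exceeds the threshold."""
    return primary_buffered.intersection(other).area > min_area

touching_unbuffered = district.intersects(neighbor)       # edge contact counts as intersecting
neighbor_kept = meaningful_overlap(shrunk, neighbor)      # dropped after shrinking
overlapper_kept = meaningful_overlap(shrunk, overlapper)  # still a real overlap
```

Without the buffer, the edge-sharing neighbor would count as intersecting; after shrinking, only the feature with a substantial shared area remains.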

### Data Source:
- [BetaNYC nyc-boundaries GeoJSON](https://github.com/BetaNYC/nyc-boundaries)

### Requirements:
- Python libraries: `geopandas`, `requests`, `pandas`
- Environment: Google Colab recommended. The final download step uses `google.colab.files`, which is only available in Colab.

### Output:
- **ZIP file**: `all_geographies_wide_crosswalks.zip` containing one CSV per geography type, named `<geography_id>_wide_crosswalk.csv`.
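
Because each cell in a wide crosswalk holds a semicolon-separated list, downstream analysis often needs one row per pair of features. The following is a minimal sketch of that conversion with pandas. The sample rows are invented for illustration and `to_long` is a hypothetical helper; only the column naming by geography ID and the `;` separator come from the notebook.

```python
import io
import pandas as pd

# Hypothetical sample shaped like cd_wide_crosswalk.csv; the values are invented.
sample_csv = io.StringIO(
    "cd,nta,zipcode\n"
    "101,NTA_A;NTA_B,10004;10005\n"
    "102,NTA_C,\n"
)

# Read everything as text so IDs keep leading zeros and empty cells stay "".
wide = pd.read_csv(sample_csv, dtype=str, keep_default_na=False)

def to_long(wide_df, primary_col, secondary_col):
    """Return one row per (primary, secondary) pair from a wide crosswalk."""
    pairs = wide_df[[primary_col, secondary_col]].copy()
    pairs[secondary_col] = pairs[secondary_col].str.split(";")
    pairs = pairs.explode(secondary_col)
    # Drop rows where the cell was empty (no overlapping feature).
    pairs = pairs[pairs[secondary_col] != ""]
    return pairs.reset_index(drop=True)

cd_to_nta = to_long(wide, "cd", "nta")
cd_to_zip = to_long(wide, "cd", "zipcode")
```

To use this on real output, replace `sample_csv` with the path to one of the extracted CSV files.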

---

In [None]:
# Install required libraries
!pip install geopandas requests --quiet

import geopandas as gpd
import pandas as pd
import requests
from io import BytesIO
from google.colab import files
import zipfile
import os

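# Configuration (units follow EPSG:2263, which is measured in US survey feet)
BUFFER_FEET = -200            # negative value shrinks each primary feature inward
MIN_INTERSECTION_AREA = 400   # minimum overlap, in square feet, to count as a match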

# Geography IDs list
geography_ids = ['pp', 'fb', 'sd', 'bid', 'ibz', 'cd', 'dsny', 'hc',
                 'cc_upcoming', 'cc', 'nycongress', 'sa', 'ss', 'nta', 'zipcode', 'hd']

# Download GeoJSON and reproject to EPSG:2263 so distances and areas are in feet
geojson_url = "https://raw.githubusercontent.com/BetaNYC/nyc-boundaries/main/script/all_bounds.geojson"
response = requests.get(geojson_url, timeout=60)  # fail instead of hanging indefinitely
response.raise_for_status()
gdf = gpd.read_file(BytesIO(response.content)).to_crs(epsg=2263)

# Spatial index for efficiency
spatial_index = gdf.sindex

# Temporary folder to store CSVs
output_folder = 'geography_crosswalks'
os.makedirs(output_folder, exist_ok=True)

csv_files = []

for primary_geo in geography_ids:
    primary_gdf = gdf[gdf['id'] == primary_geo].copy()
    if primary_gdf.empty:
        continue

    records = []

    for _, primary_row in primary_gdf.iterrows():
        primary_name = primary_row['nameCol']
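        primary_geom_buffered = primary_row.geometry.buffer(BUFFER_FEET)

        # Features narrower than twice the buffer distance shrink to an empty
        # geometry; treat them as having no overlaps rather than querying the index.
        if primary_geom_buffered.is_empty:
            candidate_idx = []
        else:
            candidate_idx = list(spatial_index.intersection(primary_geom_buffered.bounds))
        candidate_features = gdf.iloc[candidate_idx]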

        # Initial intersection filter
        candidates = candidate_features[candidate_features.intersects(primary_geom_buffered)].copy()
        if candidates.empty:
            final_candidates = candidates
        else:
            candidates['intersection_area'] = candidates.geometry.intersection(primary_geom_buffered).area
            final_candidates = candidates[candidates['intersection_area'] > MIN_INTERSECTION_AREA]

        record = {primary_geo: primary_name}

        for secondary_geo in geography_ids:
            if secondary_geo == primary_geo:
                continue  # skip self-intersection
            subset = final_candidates[final_candidates['id'] == secondary_geo]
            # Drop missing names and cast to str so join() cannot fail on non-string values
            names = subset['nameCol'].dropna().astype(str).unique()
            record[secondary_geo] = ";".join(names)

        records.append(record)

    df = pd.DataFrame(records)
    csv_filename = f"{output_folder}/{primary_geo}_wide_crosswalk.csv"
    df.to_csv(csv_filename, index=False)
    csv_files.append(csv_filename)

# Zip and download the files
zip_filename = "all_geographies_wide_crosswalks.zip"
with zipfile.ZipFile(zip_filename, 'w') as zipf:
    for file in csv_files:
        zipf.write(file, arcname=os.path.basename(file))

files.download(zip_filename)

