# Crashspot — Week 1 Starter Notebook

Welcome! This notebook walks you **step-by-step** through:
1. Verifying your Python environment (inside a virtualenv or conda env).
2. Installing/confirming the required libraries.
3. Creating a clean project folder structure.
4. Loading your accident and road datasets.
5. Checking and aligning CRS (coordinate reference systems).
6. Making your **first quick plots** with GeoPandas/Matplotlib.
7. Building a **simple interactive web map** with Folium.
8. Exporting GeoJSON for the web, and saving outputs.

> If a step fails, don't worry: errors are normal. Read the message and follow the hints printed by the cell.


## 1) Environment Check

- Make sure you're inside your project environment:
  - **venv**: `source venv/bin/activate` (mac/linux) or `venv\Scripts\activate` (windows)
  - **conda**: `conda activate crashspot`
- The cell below tries to import each required library, prints the Python version, and lists any packages that are missing.


In [None]:
import sys

print("Python version:", sys.version)
print("Environment OK — now checking imports...")

missing = []
def try_import(name, import_as=None):
    """Try an import; `name` is the pip package name, `import_as` the module name if it differs."""
    try:
        __import__(import_as or name)
        print(f"✔ {name} imported")
    except Exception as e:
        print(f"✖ Could not import {name}: {e}")
        missing.append(name)

try_import("pandas")
try_import("numpy")
try_import("matplotlib")
try:
    import matplotlib.pyplot as plt
    print("✔ matplotlib.pyplot imported")
except Exception as e:
    print("✖ Could not import matplotlib.pyplot:", e)
    if "matplotlib" not in missing:  # avoid listing matplotlib twice
        missing.append("matplotlib")

try_import("geopandas")
try_import("shapely")
try_import("rasterio")
try_import("folium")
try_import("scikit-learn", import_as="sklearn")  # pip name differs from the module name

if missing:
    print("\nSome packages are missing. Inside your activated environment, run:")
    print("  pip install " + " ".join(missing))
else:
    print("\nAll required packages imported successfully!")

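To double-check that the notebook kernel itself is running inside your project environment (and not the system Python), a minimal sketch along these lines may help. The helper name `environment_kind` is made up for illustration, and the conda check assumes the `CONDA_PREFIX` variable is visible to the kernel, which may not hold if Jupyter was launched from elsewhere.

```python
import os
import sys


def environment_kind():
    """Return 'conda', 'venv', or 'system' for the running interpreter."""
    # conda sets CONDA_PREFIX on activation; if the kernel runs from that
    # environment, sys.prefix points at the same folder.
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix):
        return "conda"
    # Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix.
    if sys.prefix != sys.base_prefix:
        return "venv"
    return "system"


print("Interpreter:", sys.executable)
print("Environment kind:", environment_kind())
```

If this prints `system`, the kernel is probably not using the environment you activated, and packages installed with `pip` may end up somewhere else.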

## 2) Create Project Folders

This creates a `Crashspot/` folder with the recommended structure inside the **current working directory**.
Change `PROJECT_ROOT` if you want it elsewhere.


In [None]:
from pathlib import Path

PROJECT_ROOT = Path.cwd() / "Crashspot"
for p in [
    PROJECT_ROOT,
    PROJECT_ROOT / "data_raw",
    PROJECT_ROOT / "data_clean",
    PROJECT_ROOT / "outputs" / "maps",
    PROJECT_ROOT / "outputs" / "figures",
    PROJECT_ROOT / "scripts",
    PROJECT_ROOT / "docs",
]:
    p.mkdir(parents=True, exist_ok=True)
    print("Created/exists:", p)

print("\nProject root is:", PROJECT_ROOT.resolve())


## 3) Put Your Data in `data_raw/`

Place files like:
- `data_raw/accidents.csv` (or `.shp`, `.gpkg`)
- `data_raw/roads_osm.gpkg` (or `.shp`)

> If your accident data has no latitude/longitude columns, check whether the coordinates are stored under other field names. Geocoding street addresses comes later (not in Week 1).

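Before loading, it can help to check which of your CSV's columns would be treated as latitude and longitude. The sketch below mirrors the name matching used by the loading cell in Section 4; the helper name `find_coord_columns` and the example column names are made up, and the extra `strip()` is an addition that tolerates stray spaces in headers.

```python
LAT_NAMES = ("lat", "latitude", "y")
LON_NAMES = ("lon", "longitude", "x", "lng")


def find_coord_columns(columns):
    """Return (lat_column, lon_column); either is None when no name matches."""
    lat = next((c for c in columns if str(c).strip().lower() in LAT_NAMES), None)
    lon = next((c for c in columns if str(c).strip().lower() in LON_NAMES), None)
    return lat, lon


# Example with made-up column names
print(find_coord_columns(["crash_id", "Latitude", "Longitude", "severity"]))
# ('Latitude', 'Longitude')
```

With a real file you could pass `pd.read_csv(path, nrows=0).columns`, which reads only the header row.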

## 4) Load Data (Accidents & Roads)

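This cell tries to load common formats and uses the first file it finds: for accidents it checks CSV, then shapefile, then GeoPackage; for roads it checks GeoPackage, then shapefile. Update the filenames if yours differ.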


In [None]:
import geopandas as gpd
import pandas as pd

accidents_path_csv = PROJECT_ROOT / "data_raw" / "accidents.csv"
accidents_path_shp = PROJECT_ROOT / "data_raw" / "accidents.shp"
accidents_path_gpkg = PROJECT_ROOT / "data_raw" / "accidents.gpkg"  # layer name may be needed

roads_path_gpkg = PROJECT_ROOT / "data_raw" / "roads_osm.gpkg"
roads_path_shp = PROJECT_ROOT / "data_raw" / "roads_osm.shp"

accidents_gdf = None
roads_gdf = None

# Try accidents
if accidents_path_csv.exists():
    df = pd.read_csv(accidents_path_csv)
    # Convert to a GeoDataFrame if lat/lon columns exist.
    # Assumes the values are WGS84 degrees; columns named x/y may hold projected units.
    lat_cols = [c for c in df.columns if c.lower() in ("lat","latitude","y")]
    lon_cols = [c for c in df.columns if c.lower() in ("lon","longitude","x","lng")]
    if lat_cols and lon_cols:
        accidents_gdf = gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(df[lon_cols[0]], df[lat_cols[0]]),
            crs="EPSG:4326"
        )
        print("Loaded accidents from CSV with lat/lon columns.")
    else:
        print("Found accidents.csv but couldn't detect lat/lon columns. Add them or use a spatial file (.shp/.gpkg).")
elif accidents_path_shp.exists():
    accidents_gdf = gpd.read_file(accidents_path_shp)
    print("Loaded accidents shapefile.")
elif accidents_path_gpkg.exists():
    # If multiple layers exist, specify layer=
    accidents_gdf = gpd.read_file(accidents_path_gpkg)
    print("Loaded accidents GeoPackage.")
else:
    print("No accidents dataset found yet. Put a file into data_raw/.")

# Try roads
if roads_path_gpkg.exists():
    roads_gdf = gpd.read_file(roads_path_gpkg)
    print("Loaded roads from GeoPackage.")
elif roads_path_shp.exists():
    roads_gdf = gpd.read_file(roads_path_shp)
    print("Loaded roads shapefile.")
else:
    print("No roads dataset found yet. Put OSM roads into data_raw/.")

accidents_gdf, roads_gdf

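The loading cell notes that a GeoPackage may contain several layers. A GeoPackage is a SQLite database, and its `gpkg_contents` table lists the layers it holds, so one way to see the layer names without extra dependencies is the sketch below. The helper name `list_gpkg_layers` is made up for illustration, and the path assumes the folder layout from Section 2.

```python
import sqlite3
from pathlib import Path


def list_gpkg_layers(gpkg_path):
    """Return [(table_name, data_type), ...] read from gpkg_contents."""
    con = sqlite3.connect(str(gpkg_path))
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows


gpkg = Path.cwd() / "Crashspot" / "data_raw" / "roads_osm.gpkg"
if gpkg.exists():  # sqlite3.connect would create an empty file otherwise
    for table_name, data_type in list_gpkg_layers(gpkg):
        print(table_name, data_type)
```

Vector layers show up with the data type `features`. Once you know the name, pass it along, for example `gpd.read_file(gpkg, layer="roads")`, where `"roads"` stands in for whatever name your file actually uses.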

## 5) Check & Align CRS

We align both layers to **WGS84 (EPSG:4326)** for now, since Folium expects latitude/longitude in degrees.
Later, switch to a local projected CRS when you need distances or areas in meters.


In [None]:
def ensure_epsg4326(gdf):
    if gdf is None:
        return None
    if gdf.crs is None:
        print("Warning: CRS missing; assuming EPSG:4326. Adjust if incorrect.")
        gdf = gdf.set_crs("EPSG:4326")
    elif gdf.crs.to_string() != "EPSG:4326":
        gdf = gdf.to_crs("EPSG:4326")
    return gdf

accidents_gdf = ensure_epsg4326(accidents_gdf)
roads_gdf = ensure_epsg4326(roads_gdf)

if accidents_gdf is not None:
    print("Accidents CRS:", accidents_gdf.crs)
if roads_gdf is not None:
    print("Roads CRS:", roads_gdf.crs)

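For the later switch to a local projection, a common choice is the UTM zone that contains your study area. As a rough sketch, the EPSG code for a WGS84 / UTM zone can be worked out from a longitude/latitude pair. The helper name `utm_epsg_for` is made up, and the formula ignores the special zone boundaries around Norway and Svalbard.

```python
import math


def utm_epsg_for(lon, lat):
    """Return the EPSG code of the WGS84 / UTM zone containing (lon, lat)."""
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("UTM covers lon in [-180, 180] and lat in [-80, 84]")
    # Zones are 6 degrees wide, numbered 1..60 starting at 180 W.
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # 326xx codes are northern hemisphere, 327xx southern.
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg_for(-92.02, 30.22))  # the notebook's default map center
# 32615
```

You could then reproject with `accidents_gdf.to_crs(epsg=32615)`. GeoPandas also offers `GeoDataFrame.estimate_utm_crs()`, which picks a zone from the data's extent.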

## 6) Quick Static Plots (Matplotlib)

This gives you a fast visual check that your data lines up.


In [None]:
import matplotlib.pyplot as plt

if roads_gdf is not None or accidents_gdf is not None:
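    # One shared axes so figsize applies even when only one layer is loaded
    fig, ax = plt.subplots(figsize=(8, 8))
    if roads_gdf is not None:
        roads_gdf.plot(ax=ax, color="gray", linewidth=0.5)
    if accidents_gdf is not None:
        # A contrasting color keeps points visible on top of the roads
        accidents_gdf.plot(ax=ax, color="red", markersize=3)
    ax.set_title("Roads + Accidents (quick look)")
    plt.show()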
else:
    print("Load data first (Section 4).")


## 7) Simple Interactive Map (Folium)

- Centers on the average accident location (if available), else a default location.
- Adds roads (as GeoJSON) and accident points.
- You can pan and zoom. No popups are configured yet, so clicking a feature does not show its attributes.


In [None]:
import folium
from pathlib import Path

m_center = [30.22, -92.02]  # Lafayette-ish default
if accidents_gdf is not None and not accidents_gdf.empty:
    m_center = [accidents_gdf.geometry.y.mean(), accidents_gdf.geometry.x.mean()]

m = folium.Map(location=m_center, zoom_start=11)

# Add roads if present
if roads_gdf is not None and not roads_gdf.empty:
    tmp_roads = PROJECT_ROOT / "data_clean" / "roads_tmp.geojson"
    roads_gdf.to_file(tmp_roads, driver="GeoJSON")
    folium.GeoJson(str(tmp_roads)).add_to(m)  # pass the path as str, not a Path object
    print("Added roads layer to map.")

# Add points if present
if accidents_gdf is not None and not accidents_gdf.empty:
    tmp_points = PROJECT_ROOT / "data_clean" / "accidents_tmp.geojson"
    accidents_gdf.to_file(tmp_points, driver="GeoJSON")
    folium.GeoJson(str(tmp_points), name="accidents").add_to(m)  # pass the path as str
    print("Added accidents layer to map.")

# Save map
out_html = PROJECT_ROOT / "outputs" / "maps" / "quick_map.html"
m.save(str(out_html))
out_html

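Centering on the mean of `geometry.x` and `geometry.y` only works when every accident geometry is a point. If your accident layer holds other geometry types, one alternative is to center on the middle of the layer's bounding box. This is a minimal sketch with a made-up helper name, `center_from_bounds`; it assumes the bounds are in longitude/latitude and do not cross the antimeridian.

```python
def center_from_bounds(minx, miny, maxx, maxy):
    """Return [lat, lon] at the middle of a lon/lat bounding box (Folium order)."""
    return [(miny + maxy) / 2.0, (minx + maxx) / 2.0]


# Made-up bounding box for illustration
print(center_from_bounds(-2.0, 10.0, 2.0, 20.0))
# [15.0, 0.0]
```

With a loaded layer this could be used as `m_center = center_from_bounds(*accidents_gdf.total_bounds)`, since `total_bounds` returns `minx, miny, maxx, maxy`.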

## 8) Save Cleaned Copies

The cell below saves **GeoJSON** (great for web maps). To write a **GeoPackage** instead, use a `.gpkg` filename with `driver="GPKG"`.


In [None]:
if accidents_gdf is not None and not accidents_gdf.empty:
    out_acc = PROJECT_ROOT / "data_clean" / "accidents_clean.geojson"
    accidents_gdf.to_file(out_acc, driver="GeoJSON")
    print("Saved:", out_acc)

if roads_gdf is not None and not roads_gdf.empty:
    out_roads = PROJECT_ROOT / "data_clean" / "roads_clean.geojson"
    roads_gdf.to_file(out_roads, driver="GeoJSON")
    print("Saved:", out_roads)

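After saving, a quick sanity check is to reopen the GeoJSON and count what was written. The sketch below uses only the standard library; the helper name `summarize_geojson` is made up, and the path assumes the folder layout and filename used in the cell above.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_geojson(path):
    """Return (feature_count, Counter of geometry types) for a FeatureCollection."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    features = data.get("features", [])
    # Features with a missing geometry are counted under "null"
    types = Counter(
        (feat.get("geometry") or {}).get("type", "null") for feat in features
    )
    return len(features), types


out_acc = Path.cwd() / "Crashspot" / "data_clean" / "accidents_clean.geojson"
if out_acc.exists():
    print(summarize_geojson(out_acc))
```

The feature count should match `len(accidents_gdf)`. A large `"null"` count suggests rows without usable coordinates.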

## 9) What’s Next (Week 2 Preview)

- Handle missing or invalid coordinates.
- Standardize important fields (date/time, severity, road type).
- Deduplicate records if needed.
- Begin exploratory heatmaps in QGIS (Kernel Density Estimation).
- Fill out `data_sources.md` completely.

> When you’re ready, we’ll add DBSCAN clustering and start preparing features for the predictive model.

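As a small preview of the first Week 2 item, a coordinate check could look roughly like the sketch below. The helper name `is_valid_lonlat` and the sample rows are made up, and the rule shown (finite numbers within WGS84 degree ranges) is only one possible definition of "valid".

```python
import math


def is_valid_lonlat(lon, lat):
    """True if lon/lat are finite numbers within WGS84 degree ranges."""
    try:
        lon, lat = float(lon), float(lat)
    except (TypeError, ValueError):
        return False  # missing or non-numeric values
    if not (math.isfinite(lon) and math.isfinite(lat)):
        return False  # NaN or infinity
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


# Made-up rows: a good point, a missing value, (0, 0), and swapped lon/lat
rows = [(-92.02, 30.22), (None, 30.22), (0.0, 0.0), (30.22, -92.02)]
print([is_valid_lonlat(lon, lat) for lon, lat in rows])
# [True, False, True, False]
```

Note that `(0, 0)` passes the range check even though it is often a placeholder for missing data, so you may want to flag it separately. A swapped pair is caught here only because the latitude falls outside the valid range.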