## Exercise 2.5 — Advanced Geospatial Plotting

### What this code does (end-to-end)
- **Data source:** I use the committed sample file `Output/citibike_weather_2022_sample_100k.csv` created in Exercise 2.2 (NYC Citi Bike 2022 trips).
- **Preprocessing:**
  - Drop trips with missing coordinates.
  - Add a helper column `one = 1`.
  - **Aggregate trips by origin→destination (OD)** with `groupby([...])["one"].sum()` → this yields a tidy **flows** table with:
    - `start_station_name, start_lat, start_lng, end_station_name, end_lat, end_lng, trip_count`
  - Build a **stations** table by separately aggregating starts and ends, then merging:
    - `traffic = starts + ends` per station, used to size/color the point markers.
- **Thresholds:** I compute several **quantile-based cutoffs** over `trip_count` (e.g., 90%, 95%, 98%, plus the max). Each cutoff becomes a **separate toggleable layer** so the map never turns into an unreadable hairball.
- **Map:**
  - Use **Folium + OpenStreetMap** tiles (no API token required) and inject a tiny CSS snippet so the map **fills the whole browser window** (true full screen).
  - **Stations layer:** `CircleMarker` points are **sized by sqrt(traffic)** (stable visibility), colored red, with sticky tooltips (`station`, `traffic`).
  - **Flows layers:** For each threshold, I add a `PolyLine` layer (capped at `TOP_PER_LAYER` lines) with **log-scaled line weight** and sticky tooltips (`start → end`, `trip_count`). All flow layers are **hidden by default**; switch on the threshold you want in the layer control.
  - Add **LayerControl** (top-right) to toggle any threshold layer on/off, plus a **Fullscreen** button.
  - **Export:** Save as `Output/citibike_folium_fullscreen.html`. The file opens in any browser, fills the screen, requires **no tokens**, and includes all interactivity.
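
The aggregation and threshold steps above can be sketched on a toy table. This is a minimal sketch rather than the notebook's own cell: the `trips` frame and its station names are made up, coordinates are left out for brevity, and `cutoffs` and `pairs_per_layer` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy trips; the real notebook reads the 100k-row sample CSV.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", "B", "C"],
    "end_station_name":   ["B", "B", "B", "C", "C", "A"],
})
trips["one"] = 1  # helper column so that a sum counts trips

# One row per origin→destination pair, with the number of trips on that pair.
flows = (
    trips.groupby(["start_station_name", "end_station_name"], as_index=False)["one"]
    .sum()
    .rename(columns={"one": "trip_count"})
    .sort_values("trip_count", ascending=False)
    .reset_index(drop=True)
)

# Quantile cutoffs: deduplicated, floored at 1, with the maximum appended.
cutoffs = sorted({int(np.quantile(flows["trip_count"], q)) for q in (0.90, 0.95, 0.98)})
cutoffs = [max(1, t) for t in cutoffs]
if cutoffs[-1] < int(flows["trip_count"].max()):
    cutoffs.append(int(flows["trip_count"].max()))

# Each cutoff keeps only the pairs with at least that many trips.
pairs_per_layer = {t: int((flows["trip_count"] >= t).sum()) for t in cutoffs}
print(cutoffs, pairs_per_layer)
```

On data this small the three quantiles collapse into one cutoff, which is why the set comprehension deduplicates them before any layers are built.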

### Why I changed approach from kepler.gl here
I first implemented kepler.gl to match the exercise narrative (start/end points + arcs + lines; filter slider). In practice, two blockers made it unusable in this environment:
1. **Basemap access:** kepler.gl defaults to Mapbox. Without a **Mapbox token** (and Mapbox now prompting for a credit card), we must use a `"blank"` style. That renders **only dots/lines on a blank background**, which is disorienting in NYC.
2. **HTML rendering quirks:** Saved kepler.gl HTML often showed **partial sizing** (map confined to top-left) and the **enlarged filter UI** overlaying the data unless carefully reconfigured. With heavy OD arcs, performance and readability suffered.
   
I did keep a kepler.gl variant (with a blank basemap and UI fixes) for completeness, but for a clean deliverable without tokens, **Folium + OpenStreetMap** is more robust:
- **No keys, no paywalls**, tiles load from OSM.
- **True full-window layout** via simple CSS (no overlay panels).
- Layer toggles act like a “filter” but remain intuitive and non-obstructive.
- Lightweight export that mentors can open immediately.

### How this meets the task requirements
- **(3) New column + aggregated OD dataframe:** `df["one"]=1` then `groupby` by start/end → `trip_count`.
- **(4) Initialize map:** Folium map with OSM tiles and NYC center.
- **(5) Customize appearance:** Stations drawn in red and sized by traffic; flow lines drawn in blue with thickness scaled by `trip_count`; readable tooltips.
- **(6) Filtering to “most common trips”:** Implemented as **thresholded layers** (90th, 95th, and 98th percentiles of `trip_count`, plus the max). Toggle a higher-threshold layer to see only the **most popular OD pairs**. You’ll notice clusters around Midtown/central corridors and waterfront routes, consistent with commute and leisure patterns.
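- **(7) Save with config:** Exported a single HTML file that preserves all layers and needs no external token or config files. Viewing it still requires an internet connection, since the OSM tiles and the Leaflet assets load from the web.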
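
The size and thickness scaling mentioned in (5) can be written as two small helpers. This is a sketch under assumptions: `marker_radius` and `line_weight` are hypothetical names, and the output ranges (radius 3 to 12, weight 1 to 4) are the ones used in the Folium cell.

```python
import math

def marker_radius(traffic, tmin, tmax, lo=3.0, hi=12.0):
    """Map station traffic to a radius in [lo, hi] on a square-root scale."""
    span = (tmax - tmin) if tmax > tmin else 1.0
    return lo + (hi - lo) * math.sqrt((traffic - tmin) / span)

def line_weight(trip_count, vmin, vmax, lo=1.0, hi=4.0):
    """Map a trip count to a line weight in [lo, hi] on a log10 scale."""
    vmin = max(vmin, 1)  # log10 is undefined at 0
    span = math.log10(vmax) - math.log10(vmin)
    if span == 0:
        return lo  # every line in the layer has the same count
    return lo + (hi - lo) * (math.log10(trip_count) - math.log10(vmin)) / span

radii = [marker_radius(t, 0, 400) for t in (0, 100, 400)]
weights = [line_weight(c, 1, 100) for c in (1, 10, 100)]
print(radii, weights)
```

The square root keeps low-traffic stations visible next to the busiest ones, and the log scale stops a handful of very popular pairs from making every other line look equally thin.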

### Notes & next steps
- You can increase `TOP_PER_LAYER` if you want more lines per threshold (browser performance dependent).
- It’s easy to add **Member/Casual** layers or **Weekday/Weekend** layers by filtering before the groupby and creating additional flow layers.
- If kepler.gl must be shown, I also prepared a token-free variant (blank basemap) with **start points, end points, arcs, lines, and a sidebar filter**, but Folium gives the clearest full-screen result for review without extra setup.
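
Filtering before the groupby, as suggested in the notes above, might look like the sketch below. It runs on made-up trips, `flows_for` is a hypothetical helper, and it counts rows with `size()` instead of summing the `one` column, which gives the same counts.

```python
import pandas as pd

# Hypothetical toy trips with the two columns needed for segmentation.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "B", "B", "A"],
    "end_station_name":   ["B", "B", "C", "C", "B"],
    "member_casual":      ["member", "casual", "member", "member", "member"],
    "started_at": pd.to_datetime([
        "2022-06-03 08:00",  # Friday
        "2022-06-04 11:00",  # Saturday
        "2022-06-05 12:00",  # Sunday
        "2022-06-06 09:00",  # Monday
        "2022-06-07 09:00",  # Tuesday
    ]),
})
trips["is_weekend"] = trips["started_at"].dt.weekday >= 5  # Saturday=5, Sunday=6

def flows_for(mask):
    """Aggregate OD pairs for the rows selected by a boolean mask."""
    sub = trips.loc[mask]
    return (
        sub.groupby(["start_station_name", "end_station_name"])
        .size()
        .reset_index(name="trip_count")
        .sort_values("trip_count", ascending=False)
        .reset_index(drop=True)
    )

# One flows table per segment; each could feed its own set of map layers.
segments = {
    "Member": flows_for(trips["member_casual"] == "member"),
    "Casual": flows_for(trips["member_casual"] == "casual"),
    "Weekday": flows_for(~trips["is_weekend"]),
    "Weekend": flows_for(trips["is_weekend"]),
}
print({name: int(f["trip_count"].sum()) for name, f in segments.items()})
```

Each table in `segments` has the same shape as the main flows table, so the existing threshold and layer loop could be reused per segment.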


In [8]:
# Enhanced Citi Bike interactivity — Plotly + OSM 
# 1) Flows map with a dropdown combining Rider Type × Threshold
# 2) Station density heatmap with All / Weekday / Weekend toggle

from pathlib import Path
import pandas as pd, numpy as np

# Install plotly if missing
try:
    import plotly.graph_objects as go
    import plotly.express as px
except ModuleNotFoundError:
    %pip install plotly
    import plotly.graph_objects as go
    import plotly.express as px

# Paths & data
PROJECT = Path.cwd()
if not (PROJECT / "Output").exists():
    PROJECT = PROJECT.parent if (PROJECT.parent / "Output").exists() else PROJECT
OUT = PROJECT / "Output"
SAMPLE = OUT / "citibike_weather_2022_sample_100k.csv"
assert SAMPLE.exists(), f"Missing {SAMPLE}. Re-run Exercise 2.2 to create it."

df = pd.read_csv(SAMPLE)
need = ["start_station_name","end_station_name","start_lat","start_lng","end_lat","end_lng","member_casual","started_at"]
missing = [c for c in need if c not in df.columns]
assert not missing, f"Missing columns: {missing}"

# Clean coords
df = df.dropna(subset=["start_lat","start_lng","end_lat","end_lng"]).copy()
for c in ["start_lat","start_lng","end_lat","end_lng"]:
    df[c] = pd.to_numeric(df[c], errors="coerce")
df = df.dropna(subset=["start_lat","start_lng","end_lat","end_lng"])
df["one"] = 1

# Helper: aggregate for a rider subset
def aggregate_for(rider=None):
    d = df
    if rider in ("member","casual"):
        d = d[d["member_casual"] == rider]
    # Flows (OD pairs)
    flows = (
        d.groupby(
            ["start_station_name","start_lat","start_lng",
             "end_station_name","end_lat","end_lng"], as_index=False
        )["one"].sum().rename(columns={"one":"trip_count"})
    ).sort_values("trip_count", ascending=False).reset_index(drop=True)
    # Stations (traffic = starts + ends)
    starts = (d.groupby(["start_station_name","start_lat","start_lng"], as_index=False)["one"]
                .sum().rename(columns={"start_station_name":"station","start_lat":"lat","start_lng":"lon","one":"starts"}))
    ends   = (d.groupby(["end_station_name","end_lat","end_lng"], as_index=False)["one"]
                .sum().rename(columns={"end_station_name":"station","end_lat":"lat","end_lng":"lon","one":"ends"}))
    stations = starts.merge(ends, on=["station","lat","lon"], how="outer").fillna(0)
    stations["traffic"] = stations["starts"] + stations["ends"]
    return flows, stations

flows_all, stations_all     = aggregate_for(None)
flows_member, stations_mem  = aggregate_for("member")
flows_casual, stations_cas  = aggregate_for("casual")

# Thresholds per rider group: use quantiles tailored to each distribution (+ max)
def thresholds_for(flows_df):
    if len(flows_df)==0:
        return [1]
    qs = [0.80, 0.90, 0.95, 0.98]
    thr = sorted({int(np.quantile(flows_df["trip_count"].values, q)) for q in qs})
    thr = [max(1,t) for t in thr]
    if thr[-1] < int(flows_df["trip_count"].max()):
        thr.append(int(flows_df["trip_count"].max()))
    return thr

thr_map = {
    "All": thresholds_for(flows_all),
    "Member": thresholds_for(flows_member),
    "Casual": thresholds_for(flows_casual)
}
flows_map = {"All": flows_all, "Member": flows_member, "Casual": flows_casual}
stations_map = {"All": stations_all, "Member": stations_mem, "Casual": stations_cas}

# ---------- Build traces ----------
fig = go.Figure()
trace_indices = {}            # (group) -> station_trace_index
flow_indices = {}             # (group, thr_idx) -> flow_trace_index
buttons = []                  # dropdown buttons
groups = ["All","Member","Casual"]

# Station traces (one per group)
for g in groups:
    st = stations_map[g].copy()
    if len(st)==0:
        # empty dummy trace
        idx = len(fig.data)
        fig.add_trace(go.Scattermapbox(lat=[], lon=[], mode="markers", name=f"{g} stations"))
        trace_indices[g] = idx
        continue
    # size by sqrt(traffic)
    smin, smax = st["traffic"].min(), st["traffic"].max()
    sz = 6 + 10 * np.sqrt((st["traffic"] - smin) / (smax - smin + 1e-9))
    idx = len(fig.data)
    fig.add_trace(go.Scattermapbox(
        lat=st["lat"], lon=st["lon"],
        mode="markers",
        marker=dict(
            size=sz, color=st["traffic"], colorscale="Reds",
            cmin=smin, cmax=smax, showscale=(g=="All"),
            colorbar=dict(title="Station traffic") if g=="All" else None,
            opacity=0.9
        ),
        text=[f"{row.station}<br>traffic: {int(row.traffic)}" for row in st.itertuples(index=False)],
        hoverinfo="text",
        name=f"{g} stations",
        visible=(g=="All")   # show All by default
    ))
    trace_indices[g] = idx

# Flow traces (one per group × threshold)
for g in groups:
    flows_g = flows_map[g]
    thrs = thr_map[g]
    for j, tmin in enumerate(thrs):
        if len(flows_g)==0:
            idx = len(fig.data)
            fig.add_trace(go.Scattermapbox(lat=[], lon=[], mode="lines", name=f"{g} ≥ {tmin}", visible=False))
            flow_indices[(g, j)] = idx
            continue
        sub = flows_g.loc[flows_g["trip_count"] >= tmin]
        # build line segments with None separators
        lats, lons = [], []
        for r in sub.itertuples(index=False):
            lats.extend([r.start_lat, r.end_lat, None])
            lons.extend([r.start_lng, r.end_lng, None])
        width = 1.5 + 3.0 * (np.log10(max(tmin,1)) - np.log10(max(1, thrs[0]))) / (np.log10(max(1, thrs[-1])) - np.log10(max(1, thrs[0])) + 1e-9)
        idx = len(fig.data)
        fig.add_trace(go.Scattermapbox(
            lat=lats, lon=lons, mode="lines",
            line=dict(width=max(1.0,float(width)), color="#2171b5"),
            opacity=0.6,
            hoverinfo="skip",
            name=f"{g} flows ≥ {tmin}",
            visible=(g=="All" and j==0)  # default: All at lowest threshold
        ))
        flow_indices[(g, j)] = idx

# Build dropdown buttons for every (group, threshold) combo
for g in groups:
    thrs = thr_map[g]
    for j, tmin in enumerate(thrs):
        visible = [False] * len(fig.data)
        # turn on station trace for this group
        visible[trace_indices[g]] = True
        # turn on the flow trace for this group+threshold (if exists)
        idx = flow_indices.get((g, j))
        if idx is not None:
            visible[idx] = True
        # label includes pair count at that cutoff
        flows_g = flows_map[g]
        pairs = int((flows_g["trip_count"] >= tmin).sum())
        buttons.append(dict(
            method="update",
            label=f"{g} ≥ {tmin} trips ({pairs} pairs)",
            args=[{"visible": visible},
                  {"title": f"NYC Citi Bike — {g} stations & flows (≥ {tmin} trips)"}]
        ))

# Layout (full-window, OSM base)
fig.update_layout(
    title="NYC Citi Bike — All stations & flows (use menu to switch Rider × Threshold)",
    mapbox=dict(style="open-street-map", center=dict(lat=40.73, lon=-73.98), zoom=12),
    margin=dict(l=0, r=0, t=60, b=0),
    height=900,
    showlegend=False,
    updatemenus=[dict(
        type="dropdown",
        x=0.01, xanchor="left",
        y=0.99, yanchor="top",
        buttons=buttons,
        bgcolor="white",
        bordercolor="#ccc",
        pad={"r":8,"t":8,"b":8,"l":8}
    )]
)

flows_html = OUT / "citibike_flows_plotly_dropdowns.html"
fig.write_html(str(flows_html), include_plotlyjs="cdn", full_html=True)
print("Saved flows map:", flows_html)

# Station DENSITY heatmap (All / Weekday / Weekend)
df["started_at"] = pd.to_datetime(df["started_at"], errors="coerce")
valid_ts = df["started_at"].notna()   # unparseable timestamps become NaT
df["is_weekend"] = df["started_at"].dt.weekday >= 5

def density_trace(mask, name):
    d = df.loc[mask, ["start_lat","start_lng"]].dropna()
    return go.Densitymapbox(
        lat=d["start_lat"], lon=d["start_lng"],
        radius=25, z=None, colorscale="YlOrRd", name=name, visible=False, opacity=0.85
    )

tr_all     = density_trace(df["start_lat"].notna(), "All days")
# NaT rows compare False in is_weekend, so exclude them explicitly from both subsets
tr_weekday = density_trace(valid_ts & ~df["is_weekend"], "Weekdays")
tr_weekend = density_trace(valid_ts & df["is_weekend"], "Weekends")
tr_all.visible = True

fig2 = go.Figure([tr_all, tr_weekday, tr_weekend])
fig2.update_layout(
    title="Station usage density — All / Weekday / Weekend",
    mapbox=dict(style="open-street-map", center=dict(lat=40.73, lon=-73.98), zoom=12),
    margin=dict(l=0, r=0, t=60, b=0),
    height=900,
    updatemenus=[dict(
        type="dropdown",
        x=0.01, xanchor="left", y=0.99, yanchor="top",
        buttons=[
            dict(method="update", label="All",      args=[{"visible":[True, False, False]}]),
            dict(method="update", label="Weekday",  args=[{"visible":[False, True, False]}]),
            dict(method="update", label="Weekend",  args=[{"visible":[False, False, True]}]),
        ],
        bgcolor="white", bordercolor="#ccc",
        pad={"r":8,"t":8,"b":8,"l":8}
    )]
)
density_html = OUT / "citibike_station_density_plotly.html"
fig2.write_html(str(density_html), include_plotlyjs="cdn", full_html=True)
print("Saved density map:", density_html)



*scattermapbox* is deprecated! Use *scattermap* instead. Learn more at: https://plotly.com/python/mapbox-to-maplibre/



Saved flows map: C:\Users\arpit\Documents\CareerFoundry\Python_dashboard\citibike-dashboard\Output\citibike_flows_plotly_dropdowns.html
Saved density map: C:\Users\arpit\Documents\CareerFoundry\Python_dashboard\citibike-dashboard\Output\citibike_station_density_plotly.html



*densitymapbox* is deprecated! Use *densitymap* instead. Learn more at: https://plotly.com/python/mapbox-to-maplibre/



In [1]:
# Full-screen interactive Citi Bike map using Folium + OpenStreetMap
# Layers: Stations (sized by traffic), and multiple OD flow thresholds you can toggle

from pathlib import Path
import pandas as pd
import numpy as np

# Auto-install folium in Jupyter if missing
try:
    import folium
    from folium import FeatureGroup
    from folium.plugins import Fullscreen
except ModuleNotFoundError:
    %pip install folium
    import folium
    from folium import FeatureGroup
    from folium.plugins import Fullscreen

# Paths & data
PROJECT = Path.cwd()
if not (PROJECT / "Output").exists():
    PROJECT = PROJECT.parent if (PROJECT.parent / "Output").exists() else PROJECT
OUT = PROJECT / "Output"
SAMPLE = OUT / "citibike_weather_2022_sample_100k.csv"
assert SAMPLE.exists(), f"Missing {SAMPLE}. Re-run Exercise 2.2 to create it."

df = pd.read_csv(SAMPLE)

need = ["start_station_name","end_station_name","start_lat","start_lng","end_lat","end_lng"]
missing = [c for c in need if c not in df.columns]
assert not missing, f"Missing columns: {missing}"

# Clean coords
df = df.dropna(subset=["start_lat","start_lng","end_lat","end_lng"]).copy()
for c in ["start_lat","start_lng","end_lat","end_lng"]:
    df[c] = pd.to_numeric(df[c], errors="coerce")
df = df.dropna(subset=["start_lat","start_lng","end_lat","end_lng"])

# Aggregate
df["one"] = 1
# Stations (size/color by traffic = starts + ends)
starts = (df.groupby(["start_station_name","start_lat","start_lng"], as_index=False)["one"]
            .sum().rename(columns={"start_station_name":"station","start_lat":"lat","start_lng":"lon","one":"starts"}))
ends   = (df.groupby(["end_station_name","end_lat","end_lng"], as_index=False)["one"]
            .sum().rename(columns={"end_station_name":"station","end_lat":"lat","end_lng":"lon","one":"ends"}))
stations = starts.merge(ends, on=["station","lat","lon"], how="outer").fillna(0)
stations["traffic"] = stations["starts"] + stations["ends"]

# Flows (OD pairs)
flows = (
    df.groupby(
        ["start_station_name","start_lat","start_lng",
         "end_station_name","end_lat","end_lng"], as_index=False
    )["one"].sum().rename(columns={"one":"trip_count"})
).sort_values("trip_count", ascending=False).reset_index(drop=True)

# Thresholds for layers (quantiles + max)
qs = [0.90, 0.95, 0.98]
thr = sorted({int(np.quantile(flows["trip_count"].values, q)) for q in qs})
thr = [max(1, t) for t in thr]
if thr and thr[-1] < int(flows["trip_count"].max()):
    thr.append(int(flows["trip_count"].max()))

# Limit per layer to keep it fast (tune if you want)
TOP_PER_LAYER = 2000

# Map
CENTER = (40.73, -73.98)
m = folium.Map(location=CENTER, zoom_start=12, tiles="OpenStreetMap", control_scale=True)

# Make it truly full-screen (CSS)
m.get_root().header.add_child(folium.Element("""
<style>
  html, body {height: 100%; width: 100%; margin: 0; padding: 0;}
  .folium-map {position: fixed; top:0; bottom:0; right:0; left:0;}
</style>"""))

# Fullscreen button
Fullscreen(position="topleft").add_to(m)

# Stations layer
fg_stations = FeatureGroup(name="Stations (traffic size/color)", show=True)
# Size by sqrt scale
smin, smax = stations["traffic"].min(), stations["traffic"].max()
rng = (smax - smin) if smax > smin else 1.0
for r in stations.itertuples(index=False):
    # radius 3..12 scaled by sqrt
    sz = 3 + 9 * np.sqrt((r.traffic - smin) / (rng + 1e-9))
    folium.CircleMarker(
        location=(r.lat, r.lon),
        radius=float(sz),
        stroke=False,  # no outline; only the filled circle is drawn
        fill=True,
        fill_color="#de2d26",  # red
        fill_opacity=0.85,
        tooltip=folium.Tooltip(f"{r.station}<br>traffic: {int(r.traffic)}", sticky=True)
    ).add_to(fg_stations)
fg_stations.add_to(m)

# Flow layers per threshold
for t in thr:
    sub = flows.loc[flows["trip_count"] >= t].head(TOP_PER_LAYER)
    fg = FeatureGroup(name=f"Flows ≥ {t} trips (top {len(sub)} shown)", show=False)
    if len(sub):
        # Line weight 1..4 on a log10 scale of trip_count within this layer
        vmin = max(sub["trip_count"].min(), 1)
        vmax = sub["trip_count"].max()
        denom = np.log10(vmax) - np.log10(vmin) + 1e-9
    for r in sub.itertuples(index=False):  # runs zero times when sub is empty
        w = 1.0 + 3.0 * (np.log10(r.trip_count) - np.log10(vmin)) / denom
        folium.PolyLine(
            locations=[(r.start_lat, r.start_lng), (r.end_lat, r.end_lng)],
            color="#2171b5", weight=float(w), opacity=0.6,
            tooltip=folium.Tooltip(f"{r.start_station_name} → {r.end_station_name}<br>trips: {int(r.trip_count)}", sticky=True)
        ).add_to(fg)
    fg.add_to(m)

# Layer control
folium.LayerControl(collapsed=False).add_to(m)

# Save
html_path = OUT / "citibike_folium_fullscreen.html"
m.save(str(html_path))
print("Saved:", html_path)
print("Open the HTML: it will fill the window. Toggle flow layers on/off in the top-right control.")


Saved: C:\Users\arpit\Documents\CareerFoundry\Python_dashboard\citibike-dashboard\Output\citibike_folium_fullscreen.html
Open the HTML: it will fill the window. Toggle flow layers on/off in the top-right control.
