# Bay Area Boba EDA

**Question:** What patterns in location and pricing relate to higher ratings?

In [None]:
# --- Notebook bootstrap: set project root + imports ---
import sys
from pathlib import Path

# when running notebooks/eda.ipynb, project root is one level up
ROOT = Path.cwd().parent  # -> .../boba-eda-starter
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

# optional: nicer dev loop (reloads src/* on change)
try:
    %load_ext autoreload
    %autoreload 2
except Exception:
    pass

from src.clean import clean_pipeline
from src.viz import savefig

# --- Data load ---
DATA_PATH = ROOT / "data" / "bayarea_boba_spots.csv"  # absolute path built from ROOT
df = clean_pipeline(DATA_PATH)
df.head()
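`ROOT = Path.cwd().parent` only works when the kernel starts inside `notebooks/`. A slightly more robust sketch walks upward from the working directory until it finds a folder that marks the project root. It assumes `src/` exists only at the root. `find_project_root` and its `marker` parameter are hypothetical helpers, not part of `src/`.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    `marker` defaults to "src", assuming src/ exists only at the project root.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found in {start} or its parents")


# Hypothetical drop-in for the bootstrap line:
# ROOT = find_project_root(Path.cwd())
```

With this helper, the notebook finds the same root whether the kernel starts in `notebooks/` or in the project root.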


## Data Dictionary & Assumptions
- `rating`: Yelp star rating (0–5)
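- `lat`/`lon`: WGS84 coordinates in decimal degrees
- `city`: parsed from the address, with leading/trailing whitespace trimmed
- Assumptions: ratings are subjective and noisy. The base CSV has no price column, so the pricing part of the question requires adding one if it is available.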
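The ranges in this dictionary can be checked directly after cleaning. The sketch below counts rows that break each stated range. `check_assumptions` is a hypothetical helper, and treating a missing rating or coordinate as a violation is an assumption.

```python
import pandas as pd


def check_assumptions(df: pd.DataFrame) -> dict:
    """Count rows that violate the data-dictionary ranges (hypothetical helper)."""
    city = df["city"].dropna()  # missing cities are not counted as untrimmed
    return {
        # rating documented as 0 to 5 stars; between() is False for NaN, so NaN counts
        "rating_out_of_range": int((~df["rating"].between(0, 5)).sum()),
        # WGS84: latitude in [-90, 90], longitude in [-180, 180]
        "lat_out_of_range": int((~df["lat"].between(-90, 90)).sum()),
        "lon_out_of_range": int((~df["lon"].between(-180, 180)).sum()),
        # city should already be stripped by the cleaning step
        "city_untrimmed": int((city != city.str.strip()).sum()),
    }
```

Running `check_assumptions(df)` right after `clean_pipeline` should return all zeros if the cleaning step behaves as documented.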

## Top Cities by Number of Shops

In [None]:
# Count shops per city and keep the 10 largest
city_counts = (df.groupby("city", as_index=False)
                 .size()
                 .sort_values("size", ascending=False)
                 .head(10))

# Horizontal bar chart with the largest city at the top
ax = city_counts.plot(kind="barh", x="city", y="size", figsize=(8, 5), legend=False)
ax.invert_yaxis()
ax.set_title("Top 10 Bay Area Cities by Number of Boba Shops")
ax.set_xlabel("Shop count")
ax.set_ylabel("")
for container in ax.containers:
    ax.bar_label(container, padding=2)  # print the count at the end of each bar
savefig("../assets/top_cities.png")

**Takeaway:** Fill in after you view the chart.
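Separating the counting from the plotting makes the numbers easy to inspect before writing the takeaway. This sketch uses `value_counts`, which should give the same counts as the `groupby(...).size()` call above because both drop missing cities by default. `top_cities` is a hypothetical helper that returns the same `city`/`size` columns the plot expects.

```python
import pandas as pd


def top_cities(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n cities with the most shops as columns `city` and `size`."""
    return (
        df["city"]
        .value_counts()           # shops per city, sorted descending, NaN dropped
        .head(n)
        .rename_axis("city")      # name the index so it becomes the `city` column
        .reset_index(name="size")
    )
```

`top_cities(df)` can be printed alongside the chart so the takeaway can quote exact counts.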

## Geospatial Density (Hexbin preview)

In [None]:
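import matplotlib.pyplot as plt

# Hexagonal bins: each cell's color is the number of shops inside it
plt.figure(figsize=(6, 6))
plt.hexbin(df["lon"], df["lat"], gridsize=60, mincnt=1)  # mincnt=1 leaves empty cells blank
plt.colorbar(label="Shops per cell")
plt.title("Boba Shop Density (Hexbin)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")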
savefig("../assets/heatmap_hexbin.png")

**Takeaway:** Fill in a one-liner about density clusters (SF/Peninsula vs East Bay).
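To support the density one-liner, read the counts back out of the hexbin. Matplotlib's `hexbin` returns a `PolyCollection`. Its `get_array()` holds the count for each hexagon, and `get_offsets()` holds the hexagon centers. This is a sketch that assumes a reasonably recent matplotlib. `densest_cells` is a hypothetical helper, and its `gridsize=60` default only mirrors the chart. `mincnt=1` drops empty hexagons, and a standalone `Figure` keeps the helper from changing the notebook's current pyplot figure.

```python
import numpy as np
from matplotlib.figure import Figure


def densest_cells(lon, lat, gridsize=60, k=5):
    """Return the k most crowded hexbin cells as (lon, lat, count) tuples."""
    ax = Figure().subplots()  # off-screen axes, independent of pyplot state
    hb = ax.hexbin(lon, lat, gridsize=gridsize, mincnt=1)
    counts = np.asarray(hb.get_array())     # shops per non-empty hexagon
    centers = np.asarray(hb.get_offsets())  # matching hexagon centers (lon, lat)
    order = np.argsort(counts)[::-1][:k]    # indices of the largest counts first
    return [(float(centers[i, 0]), float(centers[i, 1]), int(counts[i])) for i in order]
```

Calling `densest_cells(df["lon"], df["lat"])` lists the peak cells' coordinates, which can then be matched to SF/Peninsula or East Bay.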