# Notebook 04 — Seasonal Patterns

**Purpose:** Identify which products are seasonal and tag them by peak month.

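**Approach:** The Instacart dataset has no calendar dates (only the relative
`days_since_prior_order`), so seasonality cannot be measured from the orders
themselves. Instead, we use a curated keyword-to-months mapping and match it
against product names. The mapping covers agricultural seasons plus seasonal
consumption and holidays (for example, `charcoal` in May–Aug and `eggnog` in
Nov–Dec).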

In an interview: *"The dataset lacks calendar dates, so I curated seasonal rules
from agricultural data. In production I would detect seasonality by comparing
each item's monthly order frequency against its annual average and flagging the
months that spike."*
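
The production approach described in the quote could be sketched roughly as follows. This is only an illustration under the assumption that dated orders are available: `monthly_seasonality`, `spike_ratio`, and the 1.5 threshold are hypothetical names and values that do not appear in this notebook, and the order months are invented sample data.

```python
from collections import Counter


def monthly_seasonality(order_months, spike_ratio=1.5):
    """Return {month: ratio} for months whose order count is at least
    spike_ratio times the item's average monthly count."""
    counts = Counter(order_months)
    monthly_avg = sum(counts.values()) / 12  # annual total spread over 12 months
    if monthly_avg == 0:
        return {}
    return {
        month: counts[month] / monthly_avg
        for month in range(1, 13)
        if counts[month] / monthly_avg >= spike_ratio
    }


# Invented order months for a single item (not Instacart data):
# one order in every month, plus extra orders in Sep, Oct and Nov.
pumpkin_orders = list(range(1, 13)) + [9] * 10 + [10] * 30 + [11] * 20

print(sorted(monthly_seasonality(pumpkin_orders)))
# [9, 10, 11]
```

A ratio against the annual average keeps the rule independent of how popular an item is overall. The threshold would need tuning on real data.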

**Input:** `item_catalog.json`

**Output:** `seasonal_items.json`, updated `item_catalog.json`

**Runtime:** < 1 min


In [1]:
import json
import os
from collections import defaultdict

IS_KAGGLE  = os.path.exists("/kaggle/input")
OUTPUT_DIR = "/kaggle/working" if IS_KAGGLE else "../data/output"

with open(f"{OUTPUT_DIR}/item_catalog.json", "r") as f:
    item_catalog = json.load(f)

print(f"Catalog loaded: {len(item_catalog)} items")


Catalog loaded: 3000 items


In [2]:
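# Month numbers: 1=Jan … 12=Dec
# A keyword matches if it appears anywhere in the lowercased product name.
# This is a plain substring test, so "pea" also matches "pear" and "peach",
# and "bun" also matches "bunch". Keywords are tried in the order defined here.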

SEASONAL_KEYWORDS = {
    # ── WINTER (Dec–Feb) ────────────────────────────────────────────────────
    "hot chocolate": [12, 1, 2], "cocoa":      [12, 1, 2],
    "soup":          [11, 12, 1, 2], "broth":  [11, 12, 1, 2],
    "stew":          [11, 12, 1, 2], "chili":  [11, 12, 1, 2],
    "sweet potato":  [10, 11, 12],  "squash":  [9, 10, 11],
    "butternut":     [9, 10, 11],   "pumpkin": [9, 10, 11],
    "cranberry":     [11, 12],      "eggnog":  [11, 12],
    "cider":         [9, 10, 11],   "oatmeal": [10, 11, 12, 1, 2],
    # ── SPRING (Mar–May) ────────────────────────────────────────────────────
    "asparagus": [3, 4, 5], "artichoke": [3, 4, 5],
    "pea":       [3, 4, 5], "radish":    [3, 4, 5],
    "rhubarb":   [4, 5],    "lamb":      [3, 4],
    "spring mix":[3, 4, 5], "sprout":    [3, 4, 5],
    # ── SUMMER (Jun–Aug) ────────────────────────────────────────────────────
    "watermelon": [6, 7, 8], "cantaloupe": [6, 7, 8],
    "honeydew":   [6, 7, 8], "peach":      [6, 7, 8],
    "nectarine":  [6, 7, 8], "plum":       [6, 7, 8],
    "cherry":     [5, 6, 7], "blueberry":  [6, 7, 8],
    "raspberry":  [6, 7, 8], "corn":       [6, 7, 8, 9],
    "tomato":     [6, 7, 8, 9], "zucchini": [6, 7, 8],
    "cucumber":   [6, 7, 8], "bell pepper":[6, 7, 8],
    "ice cream":  [5, 6, 7, 8], "popsicle": [5, 6, 7, 8],
    "lemonade":   [5, 6, 7, 8], "iced tea": [5, 6, 7, 8],
    "seltzer":    [5, 6, 7, 8], "sparkling water": [5, 6, 7, 8],
    "hot dog":    [5, 6, 7, 8], "bun":      [5, 6, 7, 8],
    "bbq":        [5, 6, 7, 8], "barbecue": [5, 6, 7, 8],
    "charcoal":   [5, 6, 7, 8], "coleslaw": [5, 6, 7, 8],
    # ── FALL (Sep–Nov) ──────────────────────────────────────────────────────
    "apple":      [9, 10, 11], "pear":       [9, 10, 11],
    "fig":        [8, 9, 10],  "grape":      [8, 9, 10],
    "pomegranate":[10, 11, 12],"persimmon":  [10, 11],
    "pumpkin pie":[10, 11],    "baking":     [10, 11, 12],
    "flour":      [10, 11, 12],"pie crust":  [10, 11],
    "turkey":     [11],        "stuffing":   [11],
    "gravy":      [11],        "cranberry sauce": [11],
    "yam":        [11, 12],
    # ── HOLIDAYS ────────────────────────────────────────────────────────────
    "champagne":  [12, 1], "wine":      [11, 12],
    "chocolate":  [2, 12], "candy":     [10, 12],
}

print(f"Seasonal keywords defined: {len(SEASONAL_KEYWORDS)}")


Seasonal keywords defined: 67
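
Because the matching loop stops at the first keyword that matches, the order of the dictionary matters: `"pumpkin pie"` and `"cranberry sauce"` are defined after `"pumpkin"` and `"cranberry"`, so the shorter keyword always wins. The sketch below shows one way to detect such shadowed keywords. The helper `find_shadowed_keywords` is a hypothetical name, not part of the notebook, and `sample` is a four-entry excerpt of the mapping.

```python
def find_shadowed_keywords(keywords):
    """Return (later_keyword, earlier_keyword) pairs where an earlier keyword
    is a substring of a later one, so the later one can never match first."""
    ordered = list(keywords)
    shadowed = []
    for i, later in enumerate(ordered):
        for earlier in ordered[:i]:
            if earlier in later:
                shadowed.append((later, earlier))
                break
    return shadowed


# Excerpt of SEASONAL_KEYWORDS, in the same relative order as the notebook
sample = {
    "pumpkin": [9, 10, 11],
    "cranberry": [11, 12],
    "pumpkin pie": [10, 11],
    "cranberry sauce": [11],
}
print(find_shadowed_keywords(sample))
# [('pumpkin pie', 'pumpkin'), ('cranberry sauce', 'cranberry')]

# Ordering keywords longest-first lets the more specific keyword win.
longest_first = dict(sorted(sample.items(), key=lambda kv: -len(kv[0])))
print(find_shadowed_keywords(longest_first))
# []
```

Reordering the mapping this way would change which months some items receive, so the counts reported later in the notebook would no longer apply.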


In [3]:
print("Matching keywords to catalog products...")

monthly_seasonal = defaultdict(list)
seasonal_items_set = set()

for item in item_catalog:
    item_lower = item["name"].lower()
    for keyword, months in SEASONAL_KEYWORDS.items():
        if keyword in item_lower:
            seasonal_items_set.add(item["name"])
            for month in months:
                monthly_seasonal[month].append({
                    "name":          item["name"],
                    "category":      item["category"],
                    "keyword_match": keyword,
                })
            break  # first matching keyword (in dict order) wins; one match per item

# Deduplicate within each month (only has an effect if the catalog repeats a product name)
month_names = ["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
for month in monthly_seasonal:
    seen, deduped = set(), []
    for item in monthly_seasonal[month]:
        if item["name"] not in seen:
            seen.add(item["name"])
            deduped.append(item)
    monthly_seasonal[month] = deduped

print(f"Seasonal items matched: {len(seasonal_items_set)}")
print("\nItems per month:")
for m in range(1, 13):
    items = monthly_seasonal.get(m, [])
    preview = ", ".join(i["name"] for i in items[:3])
    print(f"  {month_names[m]:>3}: {len(items):>3} items  ({preview})")


Matching keywords to catalog products...
Seasonal items matched: 867

Items per month:
  Jan:  61 items  (Organic Low Sodium Chicken Broth, Organic Free Range Chicken Broth, Organic Vegetable Broth)
  Feb: 140 items  (Organic Low Sodium Chicken Broth, Chocolate Chip Cookies, Almonds & Sea Salt in Dark Chocolate)
  Mar: 155 items  (Asparagus, Organic D'Anjou Pears, Organic Bartlett Pear)
  Apr: 157 items  (Asparagus, Organic D'Anjou Pears, Organic Bartlett Pear)
  May: 284 items  (Sparkling Water Grapefruit, Organic Small Bunch Celery, Asparagus)
  Jun: 346 items  (Organic Zucchini, Cucumber Kirby, Organic Grape Tomatoes)
  Jul: 346 items  (Organic Zucchini, Cucumber Kirby, Organic Grape Tomatoes)
  Aug: 354 items  (Organic Zucchini, Cucumber Kirby, Organic Grape Tomatoes)
  Sep: 244 items  (Organic Fuji Apple, Apple Honeycrisp Organic, Organic Grape Tomatoes)
  Oct: 206 items  (Organic Fuji Apple, Apple Honeycrisp Organic, Seedless Red Grapes)
  Nov: 254 items  (Organic Fuji Apple, App
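
The output above shows a side effect of substring matching: "Organic Small Bunch Celery" lands in May because `"bun"` occurs inside "Bunch", and pears appear in March because `"pea"` occurs inside "Pear". One possible alternative is a whole-word match, sketched below. The helpers `keyword_pattern`, `matches_substring`, and `matches_word` are hypothetical and not part of the notebook. The simple plural rule (an optional trailing "s" or "es") is an assumption that would miss forms such as "cherries".

```python
import re


def keyword_pattern(keyword):
    # Whole-word match that also accepts a trailing "s" or "es" plural.
    return re.compile(rf"\b{re.escape(keyword)}(?:e?s)?\b", re.IGNORECASE)


def matches_substring(keyword, name):
    # Same test as the notebook: keyword anywhere in the lowercased name.
    return keyword in name.lower()


def matches_word(keyword, name):
    return keyword_pattern(keyword).search(name) is not None


checks = [
    ("bun", "Organic Small Bunch Celery"),  # name taken from the output above
    ("pea", "Organic Bartlett Pear"),       # name taken from the output above
    ("tomato", "Organic Grape Tomatoes"),   # name taken from the output above
    ("bun", "Hot Dog Buns"),                # hypothetical product name
]
for keyword, name in checks:
    print(
        f"{keyword!r} vs {name!r}: "
        f"substring={matches_substring(keyword, name)}, "
        f"word={matches_word(keyword, name)}"
    )
```

Switching to this test would change the matched set, so the item counts recorded in this notebook reflect the substring version only.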

In [4]:
print("\nSaving seasonal_items.json...")
seasonal_output = {str(m): items for m, items in monthly_seasonal.items()}

with open(f"{OUTPUT_DIR}/seasonal_items.json", "w") as f:
    json.dump(seasonal_output, f, indent=2)
print(f"Saved: {OUTPUT_DIR}/seasonal_items.json")

# Update item_catalog.json with is_seasonal + peak_months
seasonal_name_to_months = defaultdict(set)
for month, items in monthly_seasonal.items():
    for item in items:
        seasonal_name_to_months[item["name"]].add(month)

updated = 0
for item in item_catalog:
    if item["name"] in seasonal_name_to_months:
        item["is_seasonal"] = True
        item["peak_months"] = sorted(seasonal_name_to_months[item["name"]])
        updated += 1
    else:
        item["is_seasonal"] = False
        item["peak_months"] = []

with open(f"{OUTPUT_DIR}/item_catalog.json", "w") as f:
    json.dump(item_catalog, f, indent=2)

print(f"Updated item_catalog.json: {updated} items flagged as seasonal")
print(f"Seasonal rate: {updated/len(item_catalog)*100:.1f}%")



Saving seasonal_items.json...
Saved: ../data/output/seasonal_items.json
Updated item_catalog.json: 867 items flagged as seasonal
Seasonal rate: 28.9%
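
A consumer of `seasonal_items.json` has to look months up by string key, since the file is written with `str(m)` keys. The sketch below shows that lookup on an in-memory sample with the same shape as the file. `items_for_month` is a hypothetical helper, and the `"Produce"` category value is an assumption about the catalog's category names.

```python
import json


def items_for_month(seasonal, month):
    """Return the seasonal items for a month number (1-12).
    JSON object keys are strings, so the month is converted first."""
    return seasonal.get(str(month), [])


# One-entry sample in the same shape the notebook writes to seasonal_items.json
sample = {
    11: [
        {
            "name": "Organic Fuji Apple",
            "category": "Produce",  # assumed category value
            "keyword_match": "apple",
        }
    ],
}

# Round-trip through JSON the same way the notebook serializes the mapping
serialized = json.dumps({str(m): items for m, items in sample.items()}, indent=2)
seasonal = json.loads(serialized)

print([item["name"] for item in items_for_month(seasonal, 11)])
# ['Organic Fuji Apple']
print(items_for_month(seasonal, 7))
# []
```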


In [5]:
print("\n" + "=" * 60)
print("CHECKPOINT: Notebook 04 Complete")
print("=" * 60)
print(f"""
Files in {OUTPUT_DIR} ready for deployment:
  ✓ item_catalog.json       ~3000 items, categories, seasonal flags
  ✓ category_mapping.json   aisle → 15-category mapping
  ✓ co_purchase_rules.json  Apriori: item → top-10 co-purchases
  ✓ item_similarities.json  Item2Vec: item → top-10 similar
  ✓ substitutes.json        Item2Vec: high-similarity alternatives
  ✓ seasonal_items.json     month → seasonal products

Copy all 6 JSON files to backend/data/ in your repo.
Total size: ~9 MB (committed to git, loaded at startup).

Do NOT commit:
  ✗ order_baskets.pkl
  ✗ item2vec.model
  ✗ apriori_rules.pkl
  ✗ product_frequency.csv
""")



CHECKPOINT: Notebook 04 Complete

Files in ../data/output ready for deployment:
  ✓ item_catalog.json       ~3000 items, categories, seasonal flags
  ✓ category_mapping.json   aisle → 15-category mapping
  ✓ co_purchase_rules.json  Apriori: item → top-10 co-purchases
  ✓ item_similarities.json  Item2Vec: item → top-10 similar
  ✓ substitutes.json        Item2Vec: high-similarity alternatives
  ✓ seasonal_items.json     month → seasonal products

Copy all 6 JSON files to backend/data/ in your repo.
Total size: ~9 MB (committed to git, loaded at startup).

Do NOT commit:
  ✗ order_baskets.pkl
  ✗ item2vec.model
  ✗ apriori_rules.pkl
  ✗ product_frequency.csv
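
Before copying, it may help to confirm that all six files are present. The check below is a minimal sketch. `missing_deploy_files` is a hypothetical helper, and the file names are the ones listed in the checkpoint above.

```python
from pathlib import Path

# The six deployable files named in the checkpoint above
DEPLOY_FILES = [
    "item_catalog.json",
    "category_mapping.json",
    "co_purchase_rules.json",
    "item_similarities.json",
    "substitutes.json",
    "seasonal_items.json",
]


def missing_deploy_files(output_dir):
    """Return the deploy file names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in DEPLOY_FILES if not (root / name).is_file()]


missing = missing_deploy_files("../data/output")
print("All files present" if not missing else f"Missing: {missing}")
```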

