# PriceLance Category Reinference Plan

**Goal**: Re-infer the category of all ~32,000 products in the database so that every product lands in the closest existing canonical category.

**Current Status**:
- 20,617 products with NULL category
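- Only "Laptops" (3,314) and "Phone Cases & Protection" (1,680) currently contain products; "Phones" is empty even though it should have data
- Many category pills appear empty in the UI because the inference logic is incomplete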

**Approach**: Design and implement a comprehensive provider-category-to-canonical mapping, then re-run inference for all products.

## 1. Current Canonical Categories (Fixed - Do NOT Change)

Our canonical category system has **17 categories** across 6 families:

### Tech Family (9 categories)
- **Laptops** (slug: `laptops`)
- **Phones** (slug: `phones`)
- **Phone Cases & Protection** (slug: `phone-cases-protection`)
- **Monitors** (slug: `monitors`)
- **TV & Display** (slug: `tv-display`)
- **Headphones & Audio** (slug: `headphones-audio`)
- **Keyboards & Mice** (slug: `keyboards-mice`)
- **Tablets** (slug: `tablets`)
- **Smartwatches** (slug: `smartwatches`)

### Gifts & Lifestyle Family (3 categories)
- **Personal Care** (slug: `personal-care`)
- **Wellness & Supplements** (slug: `wellness-supplements`)
- **Gifts & Lifestyle** (slug: `gifts-lifestyle`)

### Books & Media, Toys & Games, Kitchen, Home & Garden (5 categories)
- **Books & Media** (slug: `books-media`)
- **Toys & Games** (slug: `toys-games`)
- **Kitchen** (slug: `kitchen`)
- **Small Appliances** (slug: `small-appliances`)
- **Home & Garden** (slug: `home-garden`)

**Key Rules**: No new categories, no renaming. Just route existing products into the best matching category.
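
Because the list is fixed, it can be encoded once and used to validate anything the inference code returns. The sketch below copies the 17 slugs from the lists above; the constant, type, and function names are illustrative and are not taken from the codebase.

```typescript
// The 17 canonical slugs, copied from the lists above.
// CANONICAL_SLUGS, CanonicalSlug and isCanonicalSlug are hypothetical names.
const CANONICAL_SLUGS = [
  "laptops",
  "phones",
  "phone-cases-protection",
  "monitors",
  "tv-display",
  "headphones-audio",
  "keyboards-mice",
  "tablets",
  "smartwatches",
  "personal-care",
  "wellness-supplements",
  "gifts-lifestyle",
  "books-media",
  "toys-games",
  "kitchen",
  "small-appliances",
  "home-garden",
] as const;

type CanonicalSlug = (typeof CANONICAL_SLUGS)[number];

const CANONICAL_SLUG_SET = new Set<string>(CANONICAL_SLUGS);

// Type guard: rejects any slug that is not in the fixed list.
function isCanonicalSlug(value: string): value is CanonicalSlug {
  return CANONICAL_SLUG_SET.has(value);
}
```

A guard like this makes the "no new categories" rule enforceable: an inference result that fails `isCanonicalSlug` can be logged and rejected instead of being written to the database.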

## 2. Current Database Schema & Inference

### Product Table Fields
- `Product.category`: The final canonical category (currently NULL for 20,617 products)
- `Product.name`: Product name (used for name-based heuristics)
- `Product.description`: Product description
- `Product.source`: Source/provider name

### Current Inference Logic (`src/lib/categoryInference.ts`)
- Existing `FEED_CATEGORY_TO_CANONICAL` mapping (incomplete)
- Hard rules for phone cases detection
- Name-based fallback heuristics
- Gap: 20,617 products still have a NULL category and need inference

### Re-inference Script
- `scripts/runReinfer.ts`: Batches products and updates category field
- Uses `inferCategorySlugFromIngestion()` function
- Needs to be enhanced with better provider-category mapping
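
The batching step can be sketched as a pure function that decides which rows need a new category and groups the changes into batches, leaving the actual database write to the caller. This is a minimal sketch, not the contents of `scripts/runReinfer.ts`: the `ProductRow` shape, the `InferFn` signature (a stand-in for `inferCategorySlugFromIngestion()`), and the batch size are all assumptions.

```typescript
// Hypothetical row shape; only the fields needed for this sketch.
interface ProductRow {
  id: number;
  name: string;
  category: string | null;
}

interface CategoryUpdate {
  id: number;
  category: string;
}

// Stand-in for inferCategorySlugFromIngestion(); the real signature
// is not shown in this plan, so this type is an assumption.
type InferFn = (product: ProductRow) => string | null;

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Collect only the rows whose category would change, then batch them.
function planBatchedUpdates(
  products: readonly ProductRow[],
  infer: InferFn,
  batchSize = 500,
): CategoryUpdate[][] {
  const updates: CategoryUpdate[] = [];
  for (const product of products) {
    const category = infer(product);
    if (category !== null && category !== product.category) {
      updates.push({ id: product.id, category });
    }
  }
  return chunk(updates, batchSize);
}
```

Keeping the planning step free of database calls means it can be dry-run against the full dataset before anything is written.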

## 3. Product Distribution (from `inspectProductTypes.ts`)

### Current State by Category
| Category | Count | Status |
|----------|-------|--------|
| NULL | 20,617 | **NEEDS INFERENCE** |
| Laptops | 3,314 | Partial |
| Phone Cases & Protection | 1,680 | Some misclassified (lanterns, DVD cases, camera bags) |
| Monitors | 0 | Empty |
| TV & Display | 0 | Empty |
| Headphones & Audio | 0 | Empty |
| Keyboards & Mice | 0 | Empty |
| Tablets | 0 | Empty |
| Smartwatches | 0 | Empty |
| Phones | 0 | Empty (but should have data!) |
| All Others | 0 | Empty |

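**Total Products**: The rows above sum to 25,611 (4,994 categorized + 20,617 NULL). This does not match the ~32,000 figure in the goal, so the totals should be re-checked against the `inspectProductTypes.ts` output before re-inference.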

### Sample Products with NULL Category
- "Spacer SPDS-TypeC-HUP-3in1" (Acer) - likely a USB hub → should be **Laptops**
- "Chain Nose Pliers" - not tech → **Home & Garden** or should be skipped
- "MIXER 2 CANALE" - audio equipment → **Headphones & Audio**

## 4. Provider-to-Canonical Mapping Strategy

### Mapping Rules (by category family)

#### **TECH - Laptops** (Broadest tech accessory category)
- **Raw categories**: "notebook", "laptop", "genti notebook", "mini sisteme pc", "sisteme pc", "baterii externe", "power banks", "hub-uri usb", "incarcatoare"
- **Name keywords**: "laptop", "notebook", "ultrabook", "netbook", "hub", "docking", "charger", "power bank", "baterie externa"
- **Logic**: General PC/laptop accessories, power solutions, USB hubs → **Laptops**

#### **TECH - Phones**
- **Raw categories**: "telefoane mobile", "smartphone", "phone flagship"
- **Name keywords**: "telefon", "smartphone", "iphone", "samsung galaxy", "google pixel"
- **Logic**: Mobile phones → **Phones**

#### **TECH - Phone Cases & Protection**
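- **Hard Rule**: If the product name starts with "Husa" or contains "huse", "case", "cover", or "folie" → **Phone Cases & Protection**. Matched on its own, this rule also catches non-phone items (the lanterns, DVD cases, and camera bags noted in Section 3), so it likely needs an additional phone-context check.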
- **Raw categories**: "huse gsm", "huse telefoane", "folii protectie"
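
A minimal sketch of the hard rule follows, together with one possible guard against the misclassified items listed in Section 3. The function names and the exclusion list are assumptions for illustration; the exclusion terms are a crude guess based on those examples, not a rule taken from the codebase.

```typescript
// Keywords from the hard rule above.
const CASE_KEYWORDS = ["huse", "case", "cover", "folie"];

// Assumption: exclusion hints derived from the misclassified examples
// in Section 3 (lanterns, DVD cases, camera bags).
const NON_PHONE_HINTS = ["dvd", "camera", "lantern"];

// The hard rule exactly as stated: prefix "Husa" or any keyword as a substring.
function matchesPhoneCaseRule(productName: string): boolean {
  const normalized = productName.trim().toLowerCase();
  return (
    normalized.startsWith("husa") ||
    CASE_KEYWORDS.some((keyword) => normalized.includes(keyword))
  );
}

// Hypothetical stricter variant: apply the rule, then drop obvious non-phone items.
function isLikelyPhoneCase(productName: string): boolean {
  const normalized = productName.toLowerCase();
  return (
    matchesPhoneCaseRule(productName) &&
    !NON_PHONE_HINTS.some((hint) => normalized.includes(hint))
  );
}
```

With these definitions, `matchesPhoneCaseRule("DVD case 10 pack")` is `true` while `isLikelyPhoneCase("DVD case 10 pack")` is `false`. A substring exclusion list is blunt (it would also reject a phone case advertised with "camera protection"), so requiring a positive phone signal such as a phone brand or model in the name may work better.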

#### **TECH - Keyboards & Mice** (Input devices)
- **Raw categories**: "mouse", "tastaturi", "keyboard", "gamepad", "accesorii gaming", "kit tastatura + mouse"
- **Name keywords**: "mouse", "tastatura", "keyboard", "gamepad", "controller"
- **Logic**: Keyboards, mice, gamepads, input devices → **Keyboards & Mice**

#### **TECH - Headphones & Audio**
- **Raw categories**: "casti", "headphones", "boxe", "sistem audio", "boxe portabile", "home cinema"
- **Name keywords**: "casti", "headphones", "earbuds", "speaker", "boxa", "soundbar"
- **Logic**: Audio equipment → **Headphones & Audio**

#### **TECH - Monitors**
- **Raw categories**: "monitoare", "monitor led", "monitor gaming"
- **Name keywords**: "monitor", "display", "lcd", "led" (only when the name contains no TV keyword)
- **Logic**: PC monitors → **Monitors**

#### **TECH - TV & Display**
- **Raw categories**: "televizoare", "tv", "videoproiectoare", "ecrane de proiectie"
- **Name keywords**: "tv ", "televizor", "television", "proiector"
- **Logic**: TVs, projectors → **TV & Display**
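
Monitors and TVs share keywords such as "led", so the order of the checks matters: TV keywords are tested first, and monitor keywords apply only when no TV keyword is present. The sketch below illustrates that ordering. The function name is hypothetical, and the Romanian spelling "televizor" is assumed for the TV keyword.

```typescript
// TV keywords are checked first so that shared terms like "led"
// do not pull televisions into Monitors.
const TV_KEYWORDS = ["tv ", "televizor", "television", "proiector"];
const MONITOR_KEYWORDS = ["monitor", "display", "lcd", "led"];

function classifyScreen(productName: string): "tv-display" | "monitors" | null {
  // A trailing space is appended so the "tv " keyword also matches
  // a name that ends in "tv".
  const normalized = productName.toLowerCase() + " ";
  if (TV_KEYWORDS.some((keyword) => normalized.includes(keyword))) {
    return "tv-display";
  }
  if (MONITOR_KEYWORDS.some((keyword) => normalized.includes(keyword))) {
    return "monitors";
  }
  return null;
}
```

Because these are substring matches, "led" would also match an LED-lit keyboard, so a check like this probably belongs after more specific rules (raw category mapping, input devices) have had a chance to match.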

#### **TECH - Tablets**
- **Raw categories**: "tablete", "tablete grafice", "tablet", "accesorii tablete"
- **Name keywords**: "tablet", "ipad"
- **Logic**: Tablets → **Tablets**

#### **TECH - Smartwatches**
- **Raw categories**: "smartwatch", "bratari fitness", "fitness & wearables"
- **Name keywords**: "smartwatch", "smart watch", "fitness band"
- **Logic**: Smartwatches, fitness bands → **Smartwatches**

#### **LIFESTYLE - Personal Care**
- **Raw categories**: "diverse cosmetice", "periute electrice", "aparate masaj", "epilatoare", "uscatoare par", "masini de ras"
- **Name keywords**: "toothbrush", "epilator", "hair dryer", "shaver", "grooming"
- **Logic**: Personal grooming/care devices → **Personal Care**

#### **LIFESTYLE - Wellness & Supplements**
- **Raw categories**: "supplements", "blood pressure", "cantare corporale"
- **Name keywords**: "supplement", "vitamin", "scale", "blood pressure"
- **Logic**: Health tracking, supplements → **Wellness & Supplements**

#### **APPLIANCES - Kitchen**
- **Raw categories**: "fierbatoare", "electric kettle", "microwave", "blender"
- **Name keywords**: "kettle", "microwave", "blender"
- **Logic**: Kitchen appliances → **Kitchen**

#### **APPLIANCES - Small Appliances**
- **Raw categories**: "aspiratoare", "vacuum", "robot vacuum", "masini spalat"
- **Name keywords**: "vacuum", "washing machine", "dryer"
- **Logic**: Cleaning, laundry appliances → **Small Appliances**

#### **LIFESTYLE - Home & Garden**
- **Raw categories**: "casa si bricolaj", "gradina", "scaune gaming", "decoratiuni"
- **Name keywords**: "chair", "lamp", "tool", "furniture", "garden"
- **Logic**: Furniture, tools, home goods → **Home & Garden**

#### **FALLBACK & NON-TECH**
- Products with ambiguous or non-tech categories → **Gifts & Lifestyle**
- Tools, art supplies, decorations → **Home & Garden**
- Books, media → **Books & Media**
- Games, toys → **Toys & Games**
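
The raw-category half of the mapping could be represented as a lookup table keyed by a normalized provider category. The sketch below uses one sample entry per rule group above; it is not the real `FEED_CATEGORY_TO_CANONICAL` (whose shape is not shown in this plan), and the names `RAW_TO_CANONICAL`, `normalizeRawCategory`, and `mapRawCategory` are hypothetical.

```typescript
// Partial, illustrative map: a few raw categories copied from the rules above.
const RAW_TO_CANONICAL = new Map<string, string>([
  ["hub-uri usb", "laptops"],
  ["telefoane mobile", "phones"],
  ["huse gsm", "phone-cases-protection"],
  ["tastaturi", "keyboards-mice"],
  ["casti", "headphones-audio"],
  ["monitoare", "monitors"],
  ["televizoare", "tv-display"],
  ["tablete", "tablets"],
  ["bratari fitness", "smartwatches"],
  ["epilatoare", "personal-care"],
  ["cantare corporale", "wellness-supplements"],
  ["fierbatoare", "kitchen"],
  ["aspiratoare", "small-appliances"],
  ["gradina", "home-garden"],
]);

// Lowercase, strip diacritics, and collapse whitespace so that
// feed variants of the same category share one key.
function normalizeRawCategory(raw: string): string {
  return raw
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Returns undefined when the raw category is missing or unmapped,
// so the caller can fall through to name keywords and the fallback rules.
function mapRawCategory(raw: string | null | undefined): string | undefined {
  if (!raw) return undefined;
  return RAW_TO_CANONICAL.get(normalizeRawCategory(raw));
}
```

Returning `undefined` rather than a default keeps the fallback decision (for example **Gifts & Lifestyle**) in one place, after the name-based heuristics have run.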

## 5. Implementation Plan

### Step 1: Enhance `src/lib/categoryInference.ts`
- [x] Review existing `FEED_CATEGORY_TO_CANONICAL` mapping
- [ ] Add missing provider categories based on inspection
- [ ] Implement more robust name-based fallback heuristics
- [ ] Ensure every product gets assigned to one of the 17 canonical categories
- [ ] Add logging for debugging inference decisions
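
One way to get both a guaranteed assignment and debuggable logs is to run an ordered list of rules and record which rule fired. The following is a sketch under assumptions: the `IngestedProduct` shape, the rule labels, and the three sample rules are illustrative only, and the fallback slug follows the **Gifts & Lifestyle** fallback described in Section 4.

```typescript
// Hypothetical input shape for the inference step.
interface IngestedProduct {
  name: string;
  rawCategory: string | null;
}

interface InferenceDecision {
  slug: string;
  reason: string; // label of the rule that fired, for logging
}

interface InferenceRule {
  label: string;
  apply: (product: IngestedProduct) => string | null;
}

// Sample rules in priority order: feed category, hard rule, name keyword.
const EXAMPLE_RULES: InferenceRule[] = [
  {
    label: "feed-category",
    apply: (p) => (p.rawCategory?.toLowerCase() === "monitoare" ? "monitors" : null),
  },
  {
    label: "hard-rule:phone-case",
    apply: (p) => (p.name.toLowerCase().startsWith("husa") ? "phone-cases-protection" : null),
  },
  {
    label: "name-keyword",
    apply: (p) => (/smartwatch|smart watch|fitness band/i.test(p.name) ? "smartwatches" : null),
  },
];

// First matching rule wins; the fallback guarantees a non-null result.
function decideCategory(
  product: IngestedProduct,
  rules: readonly InferenceRule[],
  fallbackSlug = "gifts-lifestyle",
): InferenceDecision {
  for (const rule of rules) {
    const slug = rule.apply(product);
    if (slug !== null) return { slug, reason: rule.label };
  }
  return { slug: fallbackSlug, reason: "fallback" };
}

// One log line per decision, suitable for grepping by rule label.
function formatDecision(product: IngestedProduct, decision: InferenceDecision): string {
  return `[infer] "${product.name}" -> ${decision.slug} (${decision.reason})`;
}
```

Counting decisions by `reason` after a run shows how many products were resolved by the feed mapping versus heuristics versus the fallback, which helps prioritize which provider categories to add next.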

### Step 2: Update re-inference script
- [ ] Verify `scripts/runReinfer.ts` uses new inference logic
- [ ] Ensure it handles NULL categories correctly
- [ ] Add before/after statistics
- [ ] Run with full dataset (no limit)
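
The before/after statistics can be computed by counting products per category twice and diffing the two tallies. The helper names below are hypothetical, and the sketch works on in-memory rows rather than on the real database.

```typescript
interface CategoryDelta {
  category: string;
  before: number;
  after: number;
  delta: number;
}

// Tally rows per category; NULL categories are counted under the key "NULL".
function countByCategory(rows: readonly { category: string | null }[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const key = row.category ?? "NULL";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Compare two tallies over the union of their categories, sorted by name.
function diffCounts(
  before: ReadonlyMap<string, number>,
  after: ReadonlyMap<string, number>,
): CategoryDelta[] {
  const keys = new Set<string>();
  before.forEach((_count, key) => keys.add(key));
  after.forEach((_count, key) => keys.add(key));
  return Array.from(keys)
    .sort()
    .map((category) => {
      const b = before.get(category) ?? 0;
      const a = after.get(category) ?? 0;
      return { category, before: b, after: a, delta: a - b };
    });
}
```

A successful run should show a large negative delta for `NULL` and matching positive deltas spread across the canonical categories.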

### Step 3: Verify coverage
- [ ] Run `npx tsx scripts/testTopCategories.ts` after re-inference
- [ ] Check API responses for each category slug
- [ ] Identify and fix any categories still returning 0 products
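
Finding the categories that still return 0 products reduces to checking every canonical slug against the post-run counts. This is a small sketch with a hypothetical function name; it assumes the counts are keyed by slug, which may not match how `Product.category` is actually stored.

```typescript
// Return every slug whose count is missing or zero.
function findEmptyCategories(
  allSlugs: readonly string[],
  counts: ReadonlyMap<string, number>,
): string[] {
  return allSlugs.filter((slug) => (counts.get(slug) ?? 0) === 0);
}

// Usage with the Laptops count from Section 3 and an explicit zero for Phones.
const sampleCounts = new Map<string, number>([
  ["laptops", 3314],
  ["phones", 0],
]);
const stillEmpty = findEmptyCategories(["laptops", "phones", "monitors"], sampleCounts);
// stillEmpty is ["phones", "monitors"]
```

Slugs missing from the counts are treated the same as explicit zeros, so a category that never appears in the query result is still reported.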

### Step 4: Manual UI verification
- [ ] Start dev server
- [ ] Click each category pill
- [ ] Verify products are displayed and make sense
- [ ] Spot-check sample products by category

### Step 5: Final commit & summary
- [ ] Document mapping strategy
- [ ] Report before/after counts
- [ ] List any categories that still have 0 products and confirm each one is intentionally empty