# Query -> Category

<small>
(from <a href="http://maven.com/softwaredoug/cheat-at-search">Cheat at Search with LLMs</a> training course by Doug Turnbull.)
</small>

**Refinement** -- Give some hints we see from the furniture data

We learned that we want to model aspects of the user's _information need_, not just make queries better.

One common dimension of that need is query -> category classification: predicting which product category a query is asking for.

## Boilerplate

Install dependencies, mount Google Drive, and prompt for your OpenAI key (stored in your Google Drive).

In [None]:
!pip install git+https://github.com/softwaredoug/cheat-at-search.git

Collecting git+https://github.com/softwaredoug/cheat-at-search.git
  Cloning https://github.com/softwaredoug/cheat-at-search.git to /tmp/pip-req-build-kajfqg67
  Running command git clone --filter=blob:none --quiet https://github.com/softwaredoug/cheat-at-search.git /tmp/pip-req-build-kajfqg67
  Resolved https://github.com/softwaredoug/cheat-at-search.git to commit 7fbf2bf2845343912918f337a29127f9edc50bd2
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pyarrow<22.0.0,>=21.0.0 (from cheat_at_search==0.1.0)
  Downloading pyarrow-21.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting pystemmer<4.0.0,>=3.0.0 (from cheat_at_search==0.1.0)
  Downloading PyStemmer-3.0.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting searcharray<0.0.73,>=0.0.72 (from cheat_at_searc

In [None]:
from google.colab import drive
import os

# To persist a cache of data between different versions of the training
drive.mount('/content/drive/')
!mkdir -p /content/drive/MyDrive/cheat-at-search-data/
os.environ['CHEAT_AT_SEARCH_DATA_PATH'] = '/content/drive/MyDrive/cheat-at-search-data/'

Mounted at /content/drive/


In [None]:
from cheat_at_search.data_dir import ensure_data_subdir, DATA_PATH
print(f"Using {DATA_PATH} for data")
if not os.path.exists(os.path.join(DATA_PATH, 'openai_key.txt')):
    # Write key
    key = input("Enter your openai key: ")
    with open(os.path.join(DATA_PATH, 'openai_key.txt'), 'w') as f:
        f.write(key)
else:
    print("Found openai key on filesystem")

2025-08-31 21:29:07,491 - data_dir - INFO - Using WANDS data path from environment variable: /content/drive/MyDrive/cheat-at-search-data/


Using /content/drive/MyDrive/cheat-at-search-data/ for data
Found openai key on filesystem


## Import helpers

Helpers to compute NDCG for each query, plus other comparison functions.
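For intuition about what an NDCG helper measures, here is a minimal, self-contained sketch. It is not the `cheat_at_search` implementation: the function names `dcg` and `ndcg`, the linear gain (grade divided by a log2 rank discount), and the default cutoff of 10 are all assumptions for illustration. It also computes the ideal ordering from the returned grades only, whereas a full evaluation would normally use every judged product for the query.

```python
import math
from typing import Sequence


def dcg(grades: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k grades, in ranked order."""
    # rank is 0-based, so the first result is discounted by log2(2) == 1
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))


def ndcg(grades: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the best ordering of the same grades."""
    ideal = dcg(sorted(grades, reverse=True), k)
    # With no relevant results at all, define NDCG as 0 rather than divide by 0
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for one query's ranked results (2 = best)
perfect = ndcg([2, 1, 0])
reversed_order = ndcg([0, 1, 2])
print(perfect, reversed_order)
```

A ranking that puts the most relevant products first scores 1.0. Pushing them down the list lowers the score, which is why per-query NDCG deltas are a convenient way to compare two search strategies.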

In [None]:
from cheat_at_search.search import run_strategy, graded_bm25, ndcgs, ndcg_delta, vs_ideal

2025-08-31 21:29:08,151 - data_dir - INFO - Directory /content/drive/MyDrive/cheat-at-search-data/wands_enriched already exists. Checking for updates...


2025-08-31 21:29:29,773 - data_dir - INFO - Updated https://github.com/softwaredoug/WANDS.git dataset at /content/drive/MyDrive/cheat-at-search-data/wands_enriched



## Import WANDS data

Import the [Wayfair ANnotation Dataset (WANDS)](https://github.com/wayfair/WANDS), a labeled furniture e-commerce dataset.

In [None]:
from cheat_at_search.wands_data import products

products

Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,category,sub_category,cat_subcat
0,0,solid wood platform bed,Beds,Furniture / Bedroom Furniture / Beds & Headboa...,"good , deep sleep can be quite difficult to ha...",overallwidth-sidetoside:64.7|dsprimaryproducts...,15.0,4.5,15.0,"[overallwidth-sidetoside:64.7, dsprimaryproduc...",Furniture,Bedroom Furniture,Furniture / Bedroom Furniture
1,1,all-clad 7 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,"create delicious slow-cooked meals , from tend...",capacityquarts:7|producttype : slow cooker|pro...,100.0,2.0,98.0,"[capacityquarts:7, producttype : slow cooker, ...",Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances
2,2,all-clad electrics 6.5 qt . slow cooker,Slow Cookers,Kitchen & Tabletop / Small Kitchen Appliances ...,prepare home-cooked meals on any schedule with...,features : keep warm setting|capacityquarts:6....,208.0,3.0,181.0,"[features : keep warm setting, capacityquarts:...",Kitchen & Tabletop,Small Kitchen Appliances,Kitchen & Tabletop / Small Kitchen Appliances
3,3,all-clad all professional tools pizza cutter,"Slicers, Peelers And Graters",Browse By Brand / All-Clad,this original stainless tool was designed to c...,overallwidth-sidetoside:3.5|warrantylength : l...,69.0,4.5,42.0,"[overallwidth-sidetoside:3.5, warrantylength :...",Browse By Brand,All-Clad,Browse By Brand / All-Clad
4,4,baldwin prestige alcott passage knob with roun...,Door Knobs,Home Improvement / Doors & Door Hardware / Doo...,the hardware has a rich heritage of delivering...,compatibledoorthickness:1.375 '' |countryofori...,70.0,5.0,42.0,"[compatibledoorthickness:1.375 '' , countryofo...",Home Improvement,Doors & Door Hardware,Home Improvement / Doors & Door Hardware
...,...,...,...,...,...,...,...,...,...,...,...,...,...
42989,42989,malibu pressure balanced diverter fixed shower...,Shower Panels,Home Improvement / Bathroom Remodel & Bathroom...,the malibu pressure balanced diverter fixed sh...,producttype : shower panel|spraypattern : rain...,3.0,4.5,2.0,"[producttype : shower panel, spraypattern : ra...",Home Improvement,Bathroom Remodel & Bathroom Fixtures,Home Improvement / Bathroom Remodel & Bathro...
42990,42990,emmeline 5 piece breakfast dining set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,,basematerialdetails : steel| : gray wood|ofhar...,1314.0,4.5,864.0,"[basematerialdetails : steel, : gray wood, of...",Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture
42991,42991,maloney 3 piece pub table set,Dining Table Sets,Furniture / Kitchen & Dining Furniture / Dinin...,this pub table set includes 1 counter height t...,additionaltoolsrequirednotincluded : power dri...,49.0,4.0,41.0,[additionaltoolsrequirednotincluded : power dr...,Furniture,Kitchen & Dining Furniture,Furniture / Kitchen & Dining Furniture
42992,42992,fletcher 27.5 '' wide polyester armchair,Teen Lounge Furniture|Accent Chairs,Furniture / Living Room Furniture / Chairs & S...,"bring iconic , modern style to your space in a...",legmaterialdetails : rubberwood|backheight-sea...,1746.0,4.5,1226.0,"[legmaterialdetails : rubberwood, backheight-s...",Furniture,Living Room Furniture,Furniture / Living Room Furniture
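In the rows shown above, `category`, `sub_category`, and `cat_subcat` look like the first two levels of the `category hierarchy` string. The sketch below shows that relationship in plain Python. The helper names `top_levels` and `cat_subcat` are hypothetical, and the assumption that levels are separated by `" / "` comes from the visible rows, not from the library's code.

```python
from typing import Optional, Tuple


def top_levels(hierarchy: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (category, sub_category) from a ' / '-separated hierarchy."""
    parts = [p.strip() for p in hierarchy.split(" / ") if p.strip()]
    category = parts[0] if parts else None
    sub_category = parts[1] if len(parts) > 1 else None
    return category, sub_category


def cat_subcat(hierarchy: str) -> str:
    """Join the first two levels back together, skipping missing ones."""
    return " / ".join(p for p in top_levels(hierarchy) if p)


# Hierarchy strings taken from the data shown in this notebook
beds = "Furniture / Bedroom Furniture / Beds & Headboards / Beds"
brand = "Browse By Brand / All-Clad"
print(top_levels(beds), cat_subcat(beds))
print(top_levels(brand), cat_subcat(brand))
```

Splitting on `" / "` with the surrounding spaces, rather than on a bare `/`, keeps labels such as `Clips/Clamps` in one piece.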


## Query -> Full classification

We'll first set up the model of query -> category and subcategory. To help, we've pre-curated the categories / subcategories for you.
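Before the full list, here is a minimal sketch of why a `Literal` is a convenient container for a closed label set. `typing.get_args` recovers the allowed strings at runtime, so a predicted label can be checked against the vocabulary. The three-label `ExampleClassifications` subset and the helper names `is_allowed` and `keep_allowed` are hypothetical, and how the course library consumes the real `Literal` is not shown here.

```python
from typing import Iterable, List, Literal, get_args

# Hypothetical three-label subset of the full list, for illustration only
ExampleClassifications = Literal[
    "Furniture / Bedroom Furniture / Beds & Headboards / Beds",
    "Rugs / Area Rugs",
    "Lighting / Ceiling Lights / Chandeliers",
]

# get_args returns the literal values as a tuple of strings
ALLOWED_LABELS = frozenset(get_args(ExampleClassifications))


def is_allowed(label: str) -> bool:
    """True if the label is one of the pre-curated classifications."""
    return label in ALLOWED_LABELS


def keep_allowed(labels: Iterable[str]) -> List[str]:
    """Drop labels outside the vocabulary, keeping order and removing repeats."""
    seen = set()
    kept = []
    for label in labels:
        if label in ALLOWED_LABELS and label not in seen:
            seen.add(label)
            kept.append(label)
    return kept


# A made-up prediction list containing one label that is not in the vocabulary
predicted = ["Rugs / Area Rugs", "Rugs / Shag Rugs", "Rugs / Area Rugs"]
print(keep_allowed(predicted))
```

Constraining predictions to a fixed vocabulary means an invented category can be detected and dropped instead of silently matching nothing in the product data.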

In [None]:
from pydantic import BaseModel, Field
from typing import List, Literal, get_args
from cheat_at_search.enrich import AutoEnricher

# Pre-curated, fully qualified category paths (the allowed classification labels)
FullyQualifiedClassifications = Literal[
 'Furniture / Bedroom Furniture / Beds & Headboards / Beds',
 'Furniture / Living Room Furniture / Chairs & Seating / Accent Chairs',
 'Rugs / Area Rugs',
 'Furniture / Office Furniture / Desks',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Tables',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / End & Side Tables',
 'Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows',
 'Furniture / Bedroom Furniture / Dressers & Chests',
 'Outdoor / Outdoor & Patio Furniture / Patio Furniture Sets / Patio Conversation Sets',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Vanities / All Bathroom Vanities',
 'Furniture / Living Room Furniture / Console Tables',
 'Décor & Pillows / Art / All Wall Art',
 'Furniture / Kitchen & Dining Furniture / Bar Furniture / Bar Stools & Counter Stools / All Bar Stools & Counter Stools',
 'Furniture / Kitchen & Dining Furniture / Dining Tables & Seating / Kitchen & Dining Chairs',
 'Furniture / Office Furniture / Office Chairs',
 'Décor & Pillows / Mirrors / All Mirrors',
 'Bed & Bath / Bedding / All Bedding',
 'Décor & Pillows / Wall Décor / Wall Accents',
 'Furniture / Living Room Furniture / Chairs & Seating / Recliners',
 'Furniture / Kitchen & Dining Furniture / Dining Tables & Seating / Kitchen and Dining Sets',
 'Décor & Pillows / Window Treatments / Curtains & Drapes',
 'Furniture / Living Room Furniture / Sectionals',
 'Baby & Kids / Toddler & Kids Bedroom Furniture / Kids Beds',
 'Furniture / Living Room Furniture / TV Stands & Media Storage Furniture / TV Stands & Entertainment Centers',
 'Lighting / Ceiling Lights / Chandeliers',
 'Furniture / Bedroom Furniture / Nightstands',
 'Baby & Kids / Toddler & Kids Bedroom Furniture / Kids Desks',
 'Décor & Pillows / Home Accessories / Decorative Objects',
 'Furniture / Bedroom Furniture / Beds & Headboards / Headboards',
 'Furniture / Living Room Furniture / Sofas',
 'Furniture / Living Room Furniture / Cabinets & Chests',
 'Décor & Pillows / Clocks / Wall Clocks',
 'Storage & Organization / Bathroom Storage & Organization / Bathroom Cabinets & Shelving',
 'Lighting / Table & Floor Lamps / Table Lamps',
 'Furniture / Living Room Furniture / Ottomans & Poufs',
 'Furniture / Kitchen & Dining Furniture / Kitchen Islands & Carts',
 'Furniture / Living Room Furniture / Bookcases',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Seating & Patio Chairs / Patio Seating / Patio Sofas & Sectionals',
 'Furniture / Office Furniture / Office Storage Cabinets',
 'Furniture / Kitchen & Dining Furniture / Dining Tables & Seating / Kitchen & Dining Tables',
 'Contractor / Entry & Hallway / Coat Racks & Umbrella Stands',
 'Bed & Bath / Bedding Essentials / Mattress Pads & Toppers',
 'Home Improvement / Hardware / Home Hardware / Switch Plates',
 'Baby & Kids / Toddler & Kids Playroom / Playroom Furniture / Toddler & Kids Chairs & Seating',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Outdoor Covers / Patio Furniture Covers',
 'Rugs / Doormats',
 'Rugs / Kitchen Mats',
 'Furniture / Bedroom Furniture / Beds & Headboards / Beds / Queen Size Beds',
 'Furniture / Bedroom Furniture / Daybeds',
 'Furniture / Living Room Furniture / Living Room Sets',
 'Outdoor / Outdoor & Patio Furniture / Patio Furniture Sets / Patio Dining Sets',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Sinks & Faucet Components / Bathroom Sink Faucets / Single Hole Bathroom Sink Faucets',
 'Outdoor / Outdoor Décor / Statues & Sculptures',
 'Décor & Pillows / Art / All Wall Art / Green Wall Art',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Table Sets',
 'Furniture / Living Room Furniture / Chairs & Seating / Chaise Lounge Chairs',
 'Storage & Organization / Wall Shelving & Organization / Wall and Display Shelves',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Tables / Rectangle Coffee Tables',
 'Décor & Pillows / Art / All Wall Art / Brown Wall Art',
 'Furniture / Kitchen & Dining Furniture / Bar Furniture / Bar Stools & Counter Stools / All Bar Stools & Counter Stools / Counter (24-27) Bar Stools & Counter Stools',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Plant Stands & Tables',
 'Décor & Pillows / Window Treatments / Curtain Hardware & Accessories',
 'Furniture / Kitchen & Dining Furniture / Dining Tables & Seating / Kitchen & Dining Chairs / Side Kitchen & Dining Chairs',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Seating & Patio Chairs / Patio Seating / Outdoor Club Chairs',
 'Furniture / Living Room Furniture / Chairs & Seating / Benches',
 'Home Improvement / Kitchen Remodel & Kitchen Fixtures / Kitchen Sinks & Faucet Components / Kitchen Sinks / Farmhouse & Apron Kitchen Sinks',
 'Kitchen & Tabletop / Kitchen Organization / Food Pantries',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Towel Storage / Towel & Robe Hooks / Black Towel & Robe Hooks',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Deck Boxes & Patio Storage',
 'Outdoor / Garden / Planters',
 'Lighting / Wall Lights / Bathroom Vanity Lighting',
 'Furniture / Kitchen & Dining Furniture / Sideboards & Buffets',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Storage Racks & Shelving Units',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Pulls / Bronze Cabinet & Drawer Pulls',
 'Storage & Organization / Storage Containers & Drawers / All Storage Containers',
 'Bed & Bath / Shower Curtains & Accessories / Shower Curtains & Shower Liners',
 'Storage & Organization / Bathroom Storage & Organization / Hampers & Laundry Baskets',
 'Lighting / Light Bulbs & Hardware / Light Bulbs / All Light Bulbs / LED Light Bulbs',
 'Décor & Pillows / Art / All Wall Art / Blue Wall Art',
 'Bed & Bath / Mattresses & Foundations / Innerspring Mattresses',
 'Lighting / Outdoor Lighting / Outdoor Wall Lighting',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Natural Material Storage / Log Storage',
 'Bed & Bath / Bathroom Accessories & Organization / Countertop Bath Accessories',
 'Storage & Organization / Shoe Storage / All Shoe Storage',
 'Home Improvement / Flooring, Walls & Ceiling / Floor Tiles & Wall Tiles / Ceramic Floor Tiles & Wall Tiles',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Pulls / Black Cabinet & Drawer Pulls',
 'Bed & Bath / Mattresses & Foundations / Adjustable Beds',
 "Rugs / Area Rugs / 2' x 3' Area Rugs",
 'Commercial Business Furniture / Commercial Office Furniture / Office Storage & Filing / Office Carts & Stands / All Carts & Stands',
 'Furniture / Bedroom Furniture / Beds & Headboards / Beds / Twin Beds',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Sinks & Faucet Components / Bathroom Sink Faucets / Widespread Bathroom Sink Faucets',
 "Rugs / Area Rugs / 4' x 6' Area Rugs",
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Sinks & Faucet Components / Bathroom Sink Faucets',
 'Kitchen & Tabletop / Tableware & Drinkware / Table & Kitchen Linens / All Table Linens',
 'Kitchen & Tabletop / Kitchen Organization / Food Storage & Canisters / Food Storage Containers',
 'Décor & Pillows / Flowers & Plants / Faux Flowers',
 'Bed & Bath / Bedding / All Bedding / Twin Bedding',
 'Furniture / Bedroom Furniture / Dressers & Chests / White Dressers & Chests',
 'Home Improvement / Flooring, Walls & Ceiling / Floor Tiles & Wall Tiles / Porcelain Floor Tiles & Wall Tiles',
 'Home Improvement / Flooring, Walls & Ceiling / Flooring Installation & Accessories / Molding & Millwork / Wall Molding & Millwork',
 'Home Improvement / Doors & Door Hardware / Door Hardware & Accessories / Barn Door Hardware',
 'Bed & Bath / Bedding / Sheets & Pillowcases',
 'Furniture / Office Furniture / Chair Mats / Hard Floor Chair Mats',
 'Outdoor / Outdoor Fencing & Flooring / All Fencing',
 'Storage & Organization / Closet Storage & Organization / Clothes Racks & Garment Racks',
 'Kitchen & Tabletop / Kitchen Utensils & Tools / Colanders, Strainers, & Salad Spinners',
 'Outdoor / Hot Tubs & Saunas / Saunas',
 'Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows / Blue Throw Pillows',
 'Bed & Bath / Bedding Essentials / Bed Pillows',
 'Lighting / Wall Lights / Wall Sconces',
 'Outdoor / Front Door Décor & Curb Appeal / Mailboxes',
 'Outdoor / Garden / Greenhouses',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Showers & Bathtubs Plumbing / Shower Faucets & Systems',
 'Bed & Bath / Mattresses & Foundations / Queen Mattresses',
 'Furniture / Bedroom Furniture / Jewelry Armoires',
 'Outdoor / Outdoor Shades / Awnings',
 'Baby & Kids / Nursery Bedding / Crib Bedding Sets',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Knobs / Brass Cabinet & Drawer Knobs',
 'Décor & Pillows / Art / All Wall Art / Red Wall Art',
 'Lighting / Ceiling Lights / All Ceiling Lights',
 'Lighting / Light Bulbs & Hardware / Lighting Components',
 'Furniture / Game Tables & Game Room Furniture / Poker & Card Tables',
 'Appliances / Kitchen Appliances / Range Hoods / All Range Hoods',
 'Home Improvement / Flooring, Walls & Ceiling / Floor Tiles & Wall Tiles / Natural Stone Floor Tiles & Wall Tiles',
 'Furniture / Kitchen & Dining Furniture / Bar Furniture / Bar Stools & Counter Stools / All Bar Stools & Counter Stools / Bar (28-33) Bar Stools & Counter Stools',
 'Outdoor / Outdoor Cooking & Tableware / Outdoor Serving & Tableware / Coolers, Baskets & Tubs / Picnic Baskets & Backpacks',
 'Décor & Pillows / Picture Frames & Albums / All Picture Frames',
 'Bed & Bath / Shower Curtains & Accessories / Shower Curtain Hooks',
 'Outdoor / Outdoor Shades / Outdoor Umbrellas / Patio Umbrella Stands & Bases',
 'Outdoor / Outdoor & Patio Furniture / Patio Bar Furniture / Patio Bar Stools',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Toilets & Bidets / Toilet Paper Holders / Free Standing Toilet Paper Holders',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Bike & Sport Racks',
 'Appliances / Kitchen Appliances / Refrigerators & Freezers / All Refrigerators / French Door Refrigerators',
 'Décor & Pillows / Home Accessories / Decorative Trays',
 'School Furniture and Supplies / School Spaces / Computer Lab Furniture / Podiums & Lecterns',
 'Lighting / Light Bulbs & Hardware / Lighting Shades',
 'Furniture / Kitchen & Dining Furniture / Bar Furniture / Home Bars & Bar Sets',
 'Lighting / Table & Floor Lamps / Floor Lamps',
 'Décor & Pillows / Wall Décor / Wall Accents / Brown Wall Accents',
 'Kitchen & Tabletop / Small Kitchen Appliances / Pressure & Slow Cookers / Slow Cookers / Slow Slow Cookers',
 'Décor & Pillows / Window Treatments / Curtains & Drapes / 90 Inch Curtains & Drapes',
 'Furniture / Bedroom Furniture / Armoires & Wardrobes',
 'Kitchen & Tabletop / Tableware & Drinkware / Flatware & Cutlery / Serving Utensils',
 'Baby & Kids / Baby & Kids Décor & Lighting / All Baby & Kids Wall Art',
 'Furniture / Office Furniture / Desks / Writing Desks',
 'Furniture / Office Furniture / Office Chairs / Task Office Chairs',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Shower & Bathtub Doors',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Seating & Patio Chairs / Patio Seating / Patio Rocking Chairs & Gliders',
 'Home Improvement / Flooring, Walls & Ceiling / Walls & Ceilings / Wall Paneling',
 'Outdoor / Garden / Plant Stands & Accessories',
 'Furniture / Kitchen & Dining Furniture / Dining Tables & Seating / Kitchen & Dining Tables / 4 Seat Kitchen & Dining Tables',
 'Décor & Pillows / Home Accessories / Vases, Urns, Jars & Bottles',
 'Lighting / Wall Lights / Under Cabinet Lighting / Strip Under Cabinet Lighting',
 'Furniture / Bedroom Furniture / Bedroom and Makeup Vanities',
 'Pet / Dog / Dog Bowls & Feeding Supplies / Pet Bowls & Feeders',
 'Décor & Pillows / Candles & Holders / Candle Holders',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Shower & Bathtub Accessories',
 'Furniture / Office Furniture / Office Chair Accessories / Seat Cushion Office Chair Accessories',
 'Furniture / Office Furniture / Chair Mats',
 'Furniture / Living Room Furniture / Chairs & Seating / Massage Chairs',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Vanities / All Bathroom Vanities / Modern & Contemporary Bathroom Vanities',
 'Lighting / Ceiling Fans / All Ceiling Fans',
 'Home Improvement / Kitchen Remodel & Kitchen Fixtures / Kitchen Sinks & Faucet Components / Kitchen Faucets / Black Kitchen Faucets',
 'Lighting / Light Bulbs & Hardware / Light Bulbs / All Light Bulbs / Incandescent Light Bulbs',
 'Home Improvement / Flooring, Walls & Ceiling / Flooring Installation & Accessories / Molding & Millwork',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Bathtubs',
 'Décor & Pillows / Art / All Wall Art / Yellow Wall Art',
 'Pet / Dog / Pet Gates, Fences & Doors / Pet Gates',
 'Furniture / Bedroom Furniture / Beds & Headboards / Bed Frames / Twin Bed Frames',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Towel Storage / Towel Bars, Racks, and Stands / Metal Towel Bars, Racks, and Stands',
 'Décor & Pillows / Art / All Wall Art / Pink Wall Art',
 'Home Improvement / Kitchen Remodel & Kitchen Fixtures / Smoke Detectors / Wall & Ceiling Mounted Smoke Detectors',
 'Outdoor / Garden / Planters / Plastic Planters',
 'Décor & Pillows / Mirrors / All Mirrors / Accent Mirrors',
 'Appliances / Kitchen Appliances / Range Hoods / All Range Hoods / Wall Mount Range Hoods',
 'Outdoor / Garden / Garden Décor / Lawn & Garden Accents',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Tables / Round Coffee Tables',
 'Kitchen & Tabletop / Tableware & Drinkware / Dinnerware / Dining Bowls',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Showers & Bathtubs Plumbing / Shower Heads / Dual Shower Heads',
 'Home Improvement / Flooring, Walls & Ceiling / Floor Tiles & Wall Tiles / Glass Floor Tiles & Wall Tiles',
 'School Furniture and Supplies / Facilities & Maintenance / Trash & Recycling',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Pulls / Nickel Cabinet & Drawer Pulls',
 'Storage & Organization / Closet Storage & Organization / Closet Systems',
 'Furniture / Bedroom Furniture / Beds & Headboards / Beds / Full & Double Beds',
 'Commercial Business Furniture / Commercial Office Furniture / Office Storage & Filing / Office Carts & Stands / All Carts & Stands / Printer Carts & Stands',
 'Storage & Organization / Closet Storage & Organization / Closet Accessories',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Vanities / All Bathroom Vanities / Traditional Bathroom Vanities',
 'Home Improvement / Plumbing / Core Plumbing / Parts & Components',
 'Holiday Décor / Christmas / Christmas Trees / All Christmas Trees',
 'Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows / Black Throw Pillows',
 'Furniture / Game Tables & Game Room Furniture / Sports Team Fan Shop & Memorabillia / Life Size Cutouts',
 'Lighting / Ceiling Lights / Pendant Lighting',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Towel Storage / Towel & Robe Hooks',
 'Appliances / Washers & Dryers / Dryers / All Dryers / Gas Dryers',
 'Outdoor / Outdoor Recreation / Backyard Play / Kids Cars & Ride-On Toys',
 'Kitchen & Tabletop / Small Kitchen Appliances / Coffee, Espresso, & Tea / Coffee Makers',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Showers & Bathtubs Plumbing / Shower Heads',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Seating & Patio Chairs / Patio Seating / Patio Sofas & Sectionals / Sectional Patio Sofas & Sectionals',
 'Lighting / Wall Lights / Under Cabinet Lighting',
 'Foodservice / Foodservice Tables / Table Parts',
 'Lighting / Outdoor Lighting / Landscape Lighting / All Landscape Lighting / Fence Post Cap Landscape Lighting',
 'Lighting / Outdoor Lighting / Landscape Lighting / All Landscape Lighting',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Tables / All Patio Tables',
 'Commercial Business Furniture / Commercial Office Furniture / Office Storage & Filing / Office Carts & Stands / All Carts & Stands / Utility Carts & Stands',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Seating & Patio Chairs / Outdoor Chaise & Lounge Chairs',
 'Furniture / Living Room Furniture / Chairs & Seating / Recliners / Brown Recliners',
 'Pet / Bird / Bird Perches & Play Gyms',
 'Décor & Pillows / Picture Frames & Albums / All Picture Frames / Single Picture Picture Frames',
 'Lighting / Outdoor Lighting / Outdoor Lanterns & Lamps',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Pulls',
 'Bed Accessories',
 'Clips/Clamps',
 'Décor & Pillows / Wall Décor / Wall Decals',
 'Home Improvement / Flooring, Walls & Ceiling / Floor Tiles & Wall Tiles',
 'Bed & Bath / Bedding / Sheets & Pillowcases / Twin XL Sheets & Pillowcases',
 'Kitchen & Tabletop / Tableware & Drinkware / Serveware / Serving Trays & Boards / Serving Trays & Platters / Serving Serving Trays & Platters',
 'Holiday Décor / Holiday Lighting',
 'Décor & Pillows / Wall Décor / Memo Boards',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Toilets & Bidets / Toilet Paper Holders / Wall Mounted Toilet Paper Holders',
 'Décor & Pillows / Window Treatments / Curtains & Drapes / 63 Inch and Less Curtains & Drapes',
 'Home Improvement / Doors & Door Hardware / Door Hardware & Accessories / Door Knobs / Egg Door Knobs',
 'Décor & Pillows / Clocks / Wall Clocks / Analog Wall Clocks',
 'Home Improvement / Doors & Door Hardware / Interior Doors / Sliding Interior Doors',
 'Outdoor / Outdoor Recreation / Outdoor Games / All Outdoor Games',
 'Home Improvement / Doors & Door Hardware / Door Hardware & Accessories / Door Levers / Round Door Levers',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Sheds / Storage Sheds',
 'Home Improvement / Doors & Door Hardware / Door Hardware & Accessories / Door Levers',
 'School Furniture and Supplies / School Furniture / School Tables / Folding Tables / Wood Folding Tables',
 'Décor & Pillows / Wall Décor / Wall Accents / Green Wall Accents',
 'School Furniture and Supplies / Facilities & Maintenance / Commercial Signage',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Garage Storage Cabinets',
 'Furniture / Bedroom Furniture / Dressers & Chests / Beige Dressers & Chests',
 'Storage & Organization / Wall Shelving & Organization / Wall & Display Shelves',
 'Furniture / Game Tables & Game Room Furniture / Dartboards & Cabinets',
 'Outdoor / Outdoor Décor / Outdoor Pillows & Cushions / Patio Furniture Cushions / Lounge Chair Patio Furniture Cushions',
 'Outdoor / Outdoor & Patio Furniture / Patio Furniture Sets / Patio Dining Sets / Two Person Patio Dining Sets',
 'Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows / Ivory & Cream Throw Pillows',
 'Appliances / Washers & Dryers / Washer & Dryer Sets / Black Washer & Dryer Sets',
 'School Furniture and Supplies / School Furniture / School Chairs & Seating / Stackable Chairs',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Pulls / Brass Cabinet & Drawer Pulls',
 'School Furniture and Supplies / School Boards & Technology / AV, Mounts & Tech Accessories / Electronic Mounts & Stands / Computer Mounts',
 'Furniture / Living Room Furniture / Chairs & Seating / Accent Chairs / Papasan Accent Chairs',
 'Storage & Organization / Shoe Storage / All Shoe Storage / Rack Shoe Storage',
 'Storage & Organization / Shoe Storage / All Shoe Storage / Cabinet Shoe Storage',
 'Storage & Organization / Storage Containers & Drawers / Storage Drawers',
 'Appliances / Kitchen Appliances / Wine & Beverage Coolers / Water Coolers',
 'Furniture / Living Room Furniture / Chairs & Seating / Rocking Chairs',
 'Kitchen & Tabletop / Tableware & Drinkware / Serveware / Serving Bowls & Baskets / Serving Bowls / NA Serving Bowls',
 'Furniture / Living Room Furniture / TV Stands & Media Storage Furniture / Projection Screens / Inflatable Projection Screens',
 'Appliances / Kitchen Appliances / Large Appliance Parts & Accessories',
 'Storage & Organization / Bathroom Storage & Organization / Hampers & Laundry Baskets / Laundry Hampers & Laundry Baskets',
 'Furniture / Office Furniture / Office Stools',
 'Outdoor / Outdoor & Patio Furniture / Outdoor Seating & Patio Chairs / Patio Seating / Outdoor Club Chairs / Metal Outdoor Club Chairs',
 'School Furniture and Supplies / School Furniture / School Tables / Folding Tables',
 'Lighting / Wall Lights / Bathroom Vanity Lighting / Traditional Bathroom Vanity Lighting',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Bathroom Sinks & Faucet Components / Bathroom Sink Faucets / Centerset Bathroom Sink Faucets',
 'Décor & Pillows / Flowers & Plants / Faux Flowers / Orchid Faux Flowers',
 'Home Improvement / Flooring, Walls & Ceiling / Floor Tiles & Wall Tiles / Metal Floor Tiles & Wall Tiles',
 'Home Improvement / Kitchen Remodel & Kitchen Fixtures / Kitchen Sinks & Faucet Components / Kitchen Sinks',
 'Storage & Organization / Garage & Outdoor Storage & Organization / Outdoor Covers / Grill Covers / Charcoal Grill Grill Covers',
 'Outdoor / Outdoor Décor / Outdoor Wall Décor',
 'Storage & Organization / Cleaning & Laundry Organization / Laundry Room Organizers',
 'Reception Area / Reception Seating / Reception Sofas & Loveseats',
 'Kitchen & Tabletop / Cookware & Bakeware / Baking Sheets & Pans / Bread & Loaf Pans / Steel Bread & Loaf Pans',
 'Furniture / Living Room Furniture / Chairs & Seating / Accent Chairs / Wingback Accent Chairs',
 'Home Improvement / Bathroom Remodel & Bathroom Fixtures / Showers & Bathtubs / Showers & Bathtubs Plumbing / Shower Heads / Fixed Shower Heads',
 'Kitchen & Tabletop / Kitchen Utensils & Tools / Kitchen Gadgets / Pasta Makers & Accessories',
 'School Furniture and Supplies / School Furniture / School Chairs & Seating / Classroom Chairs / High School & College Classroom Chairs',
 'Furniture / Living Room Furniture / Sectionals / Stationary Sectionals',
 'Furniture / Kitchen & Dining Furniture / Sideboards & Buffets / Drawer Equipped Sideboards & Buffets',
 'Kitchen & Tabletop / Cookware & Bakeware / Baking Sheets & Pans / Bread & Loaf Pans',
 'Kitchen & Tabletop / Kitchen Utensils & Tools / Cooking Utensils / All Cooking Utensils / Kitchen Cooking Utensils',
 'Décor & Pillows / Flowers & Plants / Live Plants',
 'Furniture / Living Room Furniture / TV Stands & Media Storage Furniture / Projection Screens / Folding Frame Projection Screens',
 'Kitchen & Tabletop / Kitchen Organization / Food Storage & Canisters / Kitchen Canisters & Jars / Metal Kitchen Canisters & Jars',
 'Outdoor / Outdoor Décor / Outdoor Fountains',
 'Outdoor / Outdoor Shades / Pergolas / Wood Pergolas',
 'Décor & Pillows / Candles & Holders / Candle Holders / Sconce Candle Holders',
 'Kitchen & Tabletop / Tableware & Drinkware / Serveware / Cake & Tiered Stands',
 'Home Improvement / Kitchen Remodel & Kitchen Fixtures / Kitchen Sinks & Faucet Components / Kitchen Faucets / Chrome Kitchen Faucets',
 'Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows / White Throw Pillows',
 'Outdoor / Outdoor Fencing & Flooring / Turf',
 'Décor & Pillows / Window Treatments / Valances & Kitchen Curtains',
 'Home Improvement / Hardware / Cabinet Hardware / Cabinet & Drawer Knobs / Black Cabinet & Drawer Knobs',
 'Home Improvement / Kitchen Remodel & Kitchen Fixtures / Kitchen Sinks & Faucet Components / Kitchen Faucets / Bronze Kitchen Faucets',
 'Appliances / Washers & Dryers / Washer & Dryer Sets',
 'Décor & Pillows / Clocks / Mantel & Tabletop Clocks',
 'Home Improvement / Doors & Door Hardware / Interior Doors',
 'Storage & Organization / Wall Shelving & Organization / Wall & Display Shelves / Floating Wall & Display Shelves',
 'Outdoor / Outdoor Recreation / Backyard Play / Climbing Toys & Slides',
 'Home Improvement / Building Equipment / Dollies / Hand Truck Dollies',
 'Baby & Kids / Toddler & Kids Bedroom Furniture / Baby & Kids Dressers',
 'Décor & Pillows / Mirrors / All Mirrors / Leaning & Floor Mirrors',
 'Kitchen & Tabletop / Tableware & Drinkware / Drinkware / Mugs & Teacups',
 'Décor & Pillows / Flowers & Plants / Wreaths',
 'Outdoor / Outdoor Shades / Pergolas / Metal Pergolas',
 'Bed & Bath / Bedding / Sheets & Pillowcases / Twin Sheets & Pillowcases',
 'Outdoor / Outdoor Shades / Pergolas',
 'Reception Area / Reception Seating / Office Sofas & Loveseats',
 'Décor & Pillows / Home Accessories / Indoor Fountains',
 'Kitchen & Tabletop / Kitchen Organization / Food Storage & Canisters / Kitchen Canisters & Jars / Ceramic Kitchen Canisters & Jars',
 'Décor & Pillows / Window Treatments / Curtain Hardware & Accessories / Bracket Curtain Hardware & Accessories',
 'Home Improvement / Flooring, Walls & Ceiling / Walls & Ceilings / Accent Tiles / Ceramic Accent Tiles',
 'Home Improvement / Flooring, Walls & Ceiling / Walls & Ceilings / Accent Tiles',
 'Furniture / Living Room Furniture / Chairs & Seating / Accent Chairs / Arm Accent Chairs',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Tables / Free Form Coffee Tables',
 'Décor & Pillows / Flowers & Plants / Faux Flowers / Rose Faux Flowers',
 'Bed & Bath / Mattresses & Foundations / Innerspring Mattresses / Twin Innerspring Mattresses',
 'Outdoor / Outdoor Décor / Outdoor Pillows & Cushions / Patio Furniture Cushions / Dining Chair Patio Furniture Cushions',
 'Furniture / Living Room Furniture / TV Stands & Media Storage Furniture / TV Stands & Entertainment Centers / Traditional TV Stands & Entertainment Centers',
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Plant Stands & Tables / Square Plant Stands & Tables',
 'Storage & Organization / Wall Shelving & Organization / Wall & Display Shelves / Corner Wall & Display Shelves',
 "Rugs / Area Rugs / 3' x 5' Area Rugs",
 'Kitchen & Tabletop / Tableware & Drinkware / Drinkware / Mugs & Teacups / Coffee Mugs & Teacups',
 'Contractor / Entry & Hallway / Coat Racks & Umbrella Stands / Wall Mounted Coat Racks & Umbrella Stands',
 "Baby & Kids / Toddler & Kids Playroom / Indoor Play / Kids' Playhouses",
 'Furniture / Living Room Furniture / Coffee Tables & End Tables / Coffee Tables / Square Coffee Tables',
 'Baby & Kids / Toddler & Kids Playroom / Indoor Play / Dollhouses & Accessories',
 'Bed & Bath / Bedding / All Bedding / Queen Bedding',
 'No Classification Fits'
]

classifications_list = sorted(get_args(FullyQualifiedClassifications))

known_categories = set([c.split(" / ")[0].strip() for c in classifications_list])
known_sub_categories = set([c.split(" / ")[1].strip() for c in classifications_list if len(c.split(" / ")) > 1])

known_sub_categories


class Query(BaseModel):
    """
    Base model for search queries, containing common query attributes.
    """
    keywords: str = Field(
        ...,
        description="The original search query keywords sent in as input"
    )


class QueryClassification(Query):
    """
    Structured representation of a search query for furniture e-commerce.
    Inherits keywords from the base Query model and adds category and sub-category.
    """
    classifications: list[FullyQualifiedClassifications] = Field(
        description="A possible classification for the product."
    )

    @property
    def categories(self):
        return set([c.split(" / ")[0] for c in self.classifications])

    @property
    def sub_categories(self):
        return set([c.split(" / ")[1] for c in self.classifications if len(c.split(" / ")) > 1])
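
The two properties above only split each fully qualified path on `" / "`. A minimal standalone sketch of that logic, using plain functions instead of the Pydantic model (the function names and the `example` list here are illustrative, not part of the notebook):

```python
def top_level_categories(classifications):
    """The top-level category is the first ' / '-separated segment of each path."""
    return {c.split(" / ")[0].strip() for c in classifications}


def sub_categories(classifications):
    """The sub-category is the second segment; single-segment paths have none."""
    return {
        parts[1].strip()
        for parts in (c.split(" / ") for c in classifications)
        if len(parts) > 1
    }


# Illustrative paths taken from the classification list above
example = [
    "Furniture / Living Room Furniture / Sectionals / Stationary Sectionals",
    "Outdoor / Outdoor Shades / Pergolas",
    "No Classification Fits",
]

print(top_level_categories(example))
print(sub_categories(example))
```

Note that the sentinel `"No Classification Fits"` has no separator, so it surfaces as a top-level category but never as a sub-category. That is one reason to clear the list when the sentinel appears before using the result.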



### Query classification code

In [None]:
enricher = AutoEnricher(
     model="openai/gpt-4.1-nano",
     system_prompt="You are a helpful furniture shopping agent that helps users construct search queries.",
     response_model=QueryClassification
)

def get_prompt_fully_qualified(query):
    prompt = f"""
        As a helpful agent, you'll receive requests from users looking for furniture products.

        Your task is to search with a structured query against a furniture product catalog.

        Here is the user's request:

        {query}

        Return the best classifications for this user's query.

        Try to pick as diverse a set as possible to ensure the customer finds what they need.
        (IE different top level categories are better than very similar classifications that share most of their tree / subtree)

        Keep in mind some notes about the furniture domain
        * bistro tables are for outdoors
        *

        Return an empty list if no classification fits, or it's too ambiguous.

        """

    return prompt

def fully_classified(query):
    prompt = get_prompt_fully_qualified(query)
    classification = enricher.enrich(prompt).model_copy()
    if "No Classification Fits" in classification.classifications:
        classification.classifications = []
    return classification

fully_classified("dinosaur"), fully_classified("sofa loveseat")

2025-08-31 21:29:43,727 - data_dir - INFO - Looking for openai in environment variables or globals...
2025-08-31 21:29:43,729 - data_dir - INFO - Reading openai_api_key API key from /content/drive/MyDrive/cheat-at-search-data//keys.json
2025-08-31 21:29:43,938 - data_dir - INFO - openai_api_key key loaded successfully.
2025-08-31 21:29:44,416 - query_parser - INFO - Loading enrich cache from /content/drive/MyDrive/cheat-at-search-data/enrich_cache/openaienricher_3e979c5575201e6082871b9e34bd5889_cache.pkl
2025-08-31 21:29:45,495 - query_parser - ERROR - Error loading cache file /content/drive/MyDrive/cheat-at-search-data/enrich_cache/openaienricher_3e979c5575201e6082871b9e34bd5889_cache.pkl: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
2025-08-31 21:29:45,499 - query_parser - ERROR - Starting with empty cache due to error.

(QueryClassification(keywords='dinosaur', classifications=['Décor & Pillows / Art / All Wall Art', 'Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows', 'Outdoor / Garden / Garden Décor / Lawn & Garden Accents']),
 QueryClassification(keywords='sofa loveseat', classifications=['Furniture / Living Room Furniture / Sofas', 'Furniture / Living Room Furniture / Chairs & Seating / Accent Chairs']))

### Redefine ground truth

In [None]:
CUTOFF = 0.8

from cheat_at_search.wands_data import labeled_query_products, queries

# Get relevant products per query
top_products = labeled_query_products[labeled_query_products['grade'] == 2]

# Aggregate top categories
categories_per_query_ideal = top_products.groupby('query')['category'].value_counts().reset_index()

# Get as percentage of all categories for this query
top_cat_proportion = categories_per_query_ideal.groupby(['query', 'category']).sum() / categories_per_query_ideal.groupby('query').sum()
top_cat_proportion = top_cat_proportion.drop(columns='category').reset_index()

# Only keep queries where one category accounts for more than CUTOFF of the relevant products
top_cat_proportion = top_cat_proportion[top_cat_proportion['count'] > CUTOFF].copy()
# Assign the result back instead of using inplace=True on a column selection
top_cat_proportion['category'] = top_cat_proportion['category'].fillna('No Category Fits')
ground_truth_cat = top_cat_proportion
# Give No Category Fits to all other queries without a dominant category
ground_truth_cat = ground_truth_cat.merge(queries, how='right', on='query')[['query', 'category', 'count']]
ground_truth_cat['category'] = ground_truth_cat['category'].fillna('No Category Fits')
ground_truth_cat


Unnamed: 0,query,category,count
0,salon chair,No Category Fits,
1,smart coffee table,Furniture,1.000000
2,dinosaur,No Category Fits,
3,turquoise pillows,Décor & Pillows,0.963636
4,chair and a half recliner,Furniture,0.956522
...,...,...,...
475,rustic twig,Décor & Pillows,1.000000
476,nespresso vertuo next premium by breville with...,No Category Fits,
477,pedistole sink,No Category Fits,
478,54 in bench cushion,No Category Fits,


In [None]:
def prec_cat(categorized):
    hits = []
    misses = []
    idx = 0
    for _, row in ground_truth_cat.sample(frac=1).iterrows():
        query = row['query']
        expected_category = row['category']

        cat = categorized(query)
        if len(cat.classifications) == 0:
            print(f"{idx} Skipping {query}")
            continue
        if expected_category != "No Category Fits":
            if expected_category.strip() in cat.categories:
                print(f"{idx} q:{query} -- pred:{cat.categories} == expected:{expected_category.strip()}")
                hits.append((expected_category, cat))
            else:
                print("***")
                print(f"{query} -- pred:{cat.categories} != expected:{expected_category.strip()}")
                print(cat.classifications)
                misses.append((expected_category, cat))
                num_so_far = len(hits) + len(misses)
                print(f"{idx} recall (N={num_so_far}) -- {len(hits) / (len(hits) + len(misses))}")
        idx += 1

    return len(hits) / (len(hits) + len(misses)), hits, misses

prec, hits, misses = prec_cat(fully_classified)
prec

0 q:rose gold lounge -- pred:{'Furniture'} == expected:Furniture
***
foutains with brick look -- pred:{'Décor & Pillows', 'Outdoor'} != expected:
['Décor & Pillows / Home Accessories / Decorative Objects', 'Outdoor / Outdoor Décor / Outdoor Fountains']
1 recall (N=2) -- 0.5
2 q:glow in the dark silent wall clock -- pred:{'Lighting', 'Décor & Pillows', 'Home Improvement'} == expected:Décor & Pillows
3 q:rolande heavy duty power lift assist recliner -- pred:{'Furniture'} == expected:Furniture
4 q:small curtain rods -- pred:{'Décor & Pillows'} == expected:Décor & Pillows
5 q:wine bar -- pred:{'Furniture', 'Décor & Pillows'} == expected:Furniture
6 q:wood rack wide -- pred:{'Furniture', 'Storage & Organization'} == expected:Storage & Organization
9 q:cloud modular sectional -- pred:{'Furniture'} == expected:Furniture
10 q:tye dye duvet cover -- pred:{'Bed & Bath'} == expected:Bed & Bath
11 q:bistro sets patio -- pred:{'Furniture', 'Outdoor'} == expected:Outdoor
12 q:itchington butterfly --

0.8599348534201955

In [None]:
def recall_all(categorized):
    """When we retrieve a category, is it correct?"""
    hits = []
    misses = []
    idx = 0
    for _, row in ground_truth_cat.sample(frac=1).iterrows():
        query = row['query']
        expected_category = row['category']

        cat = categorized(query)
        if len(cat.classifications) == 0:
            print(f"{idx} Skipping {query}")
            continue
        # ***
        # Now also consider this a miss, we should not have predicted any categories
        if expected_category == "No Category Fits" and len(cat.categories) > 0:
            print("!**")
            print(f"{query} -- pred:{cat.categories} != expected:{expected_category.strip()}")
            print(cat.classifications)
            misses.append((expected_category, cat))
            num_so_far = len(hits) + len(misses)
            print(f"{idx} recall (N={num_so_far}) -- {len(hits) / (len(hits) + len(misses))}")
        elif expected_category.strip() in cat.categories:
            # print(f"{idx} q:{query} -- pred:{cat.categories} == expected:{expected_category.strip()}")
            hits.append((expected_category, cat))
        else:
            print("***")
            print(f"{query} -- pred:{cat.categories} != expected:{expected_category.strip()}")
            print(cat.classifications)
            misses.append((expected_category, cat))
            num_so_far = len(hits) + len(misses)
            print(f"{idx} recall (N={num_so_far}) -- {len(hits) / (len(hits) + len(misses))}")
        idx += 1

    return len(hits) / (len(hits) + len(misses)), hits, misses

recall, hits, misses = recall_all(fully_classified)
recall

!**
hitchcock mid-century wall shelf -- pred:{'Furniture', 'Décor & Pillows'} != expected:No Category Fits
['Décor & Pillows / Wall Décor / Wall Accents', 'Furniture / Living Room Furniture / Bookcases', 'Furniture / Bedroom Furniture / Nightstands']
1 recall (N=2) -- 0.5
!**
surge protector -- pred:{'Home Improvement'} != expected:No Category Fits
['Home Improvement / Hardware / Home Hardware / Switch Plates', 'Home Improvement / Building Equipment / Dollies / Hand Truck Dollies']
3 recall (N=4) -- 0.5
!**
auburn throw pillows -- pred:{'Décor & Pillows'} != expected:No Category Fits
['Décor & Pillows / Decorative Pillows & Blankets / Throw Pillows', 'Décor & Pillows / Art / All Wall Art']
7 recall (N=8) -- 0.625
8 Skipping kisner
8 Skipping rattan truck
!**
alter furniture -- pred:{'Furniture', 'Outdoor'} != expected:No Category Fits
['Furniture / Bedroom Furniture / Beds & Headboards / Beds', 'Furniture / Living Room Furniture / Chairs & Seating / Accent Chairs', 'Furniture / Office 

0.5581395348837209
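
The two scores above differ only in how they treat queries whose ground truth is `No Category Fits`. The sketch below restates both in one function on toy data so the difference is easy to see. The function name, the `penalize_no_category` flag, and the predictions are illustrative assumptions, not the notebook's API.

```python
def category_accuracy(ground_truth, predictions, penalize_no_category=False):
    """Share of scored queries whose expected category is among the predictions.

    Queries where the classifier predicted nothing are skipped in both modes.
    With penalize_no_category=True, any prediction for a query labeled
    'No Category Fits' counts as a miss; otherwise such queries are ignored.
    """
    hits = misses = 0
    for query, expected in ground_truth.items():
        predicted = predictions.get(query, set())
        if not predicted:
            continue  # classifier abstained
        if expected == "No Category Fits":
            if penalize_no_category:
                misses += 1
            continue
        if expected in predicted:
            hits += 1
        else:
            misses += 1
    return hits / (hits + misses) if hits + misses else 0.0


# Toy ground truth and hypothetical predictions
ground_truth = {
    "wine bar": "Furniture",
    "bistro sets patio": "Outdoor",
    "surge protector": "No Category Fits",
    "kisner": "No Category Fits",
}
predictions = {
    "wine bar": {"Furniture", "Décor & Pillows"},
    "bistro sets patio": {"Furniture"},
    "surge protector": {"Home Improvement"},
    "kisner": set(),
}

lenient = category_accuracy(ground_truth, predictions)
strict = category_accuracy(ground_truth, predictions, penalize_no_category=True)
print(lenient, strict)
```

On this toy data the lenient score is 1 hit out of 2 scored queries, while the strict score adds the `surge protector` prediction as a third, missed query. The same effect explains why the second number in the notebook is lower than the first.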

## Run Category search strategy with classifier

In [None]:
from searcharray import SearchArray
from cheat_at_search.tokenizers import snowball_tokenizer
from cheat_at_search.strategy.strategy import SearchStrategy
import numpy as np


class CategorySearch(SearchStrategy):
    def __init__(self, products, query_to_cat,
                 name_boost=9.3,
                 description_boost=4.1,
                 category_boost=10,
                 sub_category_boost=5):
        super().__init__(products)
        self.index = products
        self.index['product_name_snowball'] = SearchArray.index(
            products['product_name'], snowball_tokenizer)
        self.index['product_description_snowball'] = SearchArray.index(
            products['product_description'], snowball_tokenizer)

        cat_split = products['category hierarchy'].fillna('').str.split("/")

        products['category'] = cat_split.apply(
            lambda x: x[0].strip() if len(x) > 0 else ""
        )
        products['subcategory'] = cat_split.apply(
            lambda x: x[1].strip() if len(x) > 1 else ""
        )
        self.index['category_snowball'] = SearchArray.index(
            products['category'], snowball_tokenizer
        )
        self.index['subcategory_snowball'] = SearchArray.index(
            products['subcategory'], snowball_tokenizer
        )

        self.query_to_cat = query_to_cat
        self.name_boost = name_boost
        self.description_boost = description_boost
        self.category_boost = category_boost
        self.sub_category_boost = sub_category_boost

    def search(self, query, k=10):
        """Dumb baseline lexical search, but add a constant boost when
           a product matches the desired category or subcategory"""
        bm25_scores = np.zeros(len(self.index))
        structured = self.query_to_cat(query)
        tokenized = snowball_tokenizer(query)

        # ****
        # Baseline BM25 search from before
        for token in tokenized:
            bm25_scores += self.index['product_name_snowball'].array.score(token) * self.name_boost
            bm25_scores += self.index['product_description_snowball'].array.score(
                token) * self.description_boost

        # ****
        # If there's a subcategory, boost that by a constant amount
        for sub_category in structured.sub_categories:
            tokenized_subcategory = snowball_tokenizer(sub_category)
            # Boolean mask (a float array cannot be used as an index)
            subcategory_match = np.ones(len(self.index), dtype=bool)
            if tokenized_subcategory:
                subcategory_match = self.index['subcategory_snowball'].array.score(tokenized_subcategory) > 0
            bm25_scores[subcategory_match] += self.sub_category_boost

        # ****
        # If there's a category, boost that by a constant amount
        for category in structured.categories:
            tokenized_category = snowball_tokenizer(category)
            # Boolean mask (a float array cannot be used as an index)
            category_match = np.ones(len(self.index), dtype=bool)
            if tokenized_category:
                category_match = self.index['category_snowball'].array.score(tokenized_category) > 0
            bm25_scores[category_match] += self.category_boost

        top_k = np.argsort(-bm25_scores)[:k]
        scores = bm25_scores[top_k]

        return top_k, scores

In [None]:
categorized_search = CategorySearch(products, fully_classified)
graded_categorized = run_strategy(categorized_search)
graded_categorized
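
The core of `CategorySearch` is an additive, constant boost on top of the lexical score for every product in a predicted category. A toy sketch of that idea in plain Python follows. The scores and categories are made up, and exact string matching stands in for the tokenized phrase match the class uses.

```python
def boost_by_category(base_scores, doc_categories, predicted_categories, boost=10.0):
    """Add a constant boost to every document whose category was predicted."""
    return [
        score + boost if category in predicted_categories else score
        for score, category in zip(base_scores, doc_categories)
    ]


# Hypothetical lexical scores and top-level categories for three products
base = [4.0, 6.5, 1.0]
cats = ["Outdoor", "Furniture", "Outdoor"]

boosted = boost_by_category(base, cats, {"Outdoor"})
ranking = sorted(range(len(boosted)), key=lambda i: -boosted[i])
print(boosted, ranking)
```

Because the boost is a constant rather than a multiplier, its effect depends on the scale of the lexical scores: when it is large relative to them, the predicted category effectively becomes a first-order sort key, and a wrong prediction can push relevant products out of the top results.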

In [None]:
ndcgs(graded_bm25).mean(), ndcgs(graded_categorized).mean()

In [None]:
deltas = ndcg_delta(graded_categorized, graded_bm25)

In [None]:
sig_improved = len(deltas[deltas > 0.1])
print(f"Num Significantly Improved: {sig_improved}")
deltas[deltas > 0.1]

In [None]:
sig_harmed = len(deltas[deltas < -0.1])
print(f"Num Significantly Harmed: {sig_harmed}")
print(f"Prop improved/harmed: {sig_improved / (sig_harmed + sig_improved)} | {sig_harmed / (sig_harmed + sig_improved)}")
deltas[deltas < -0.1]
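
The improved/harmed bookkeeping above can be sketched without pandas. This assumes per-query NDCG deltas are available as a plain dict; the `summarize_deltas` helper and the numbers are hypothetical, chosen only to show the thresholding.

```python
def summarize_deltas(deltas, threshold=0.1):
    """Split per-query NDCG deltas into clearly improved and clearly harmed."""
    improved = {q: d for q, d in deltas.items() if d > threshold}
    harmed = {q: d for q, d in deltas.items() if d < -threshold}
    changed = len(improved) + len(harmed)
    # Guard against the case where no query moved by more than the threshold
    prop_improved = len(improved) / changed if changed else 0.0
    return improved, harmed, prop_improved


# Made-up deltas (strategy NDCG minus baseline NDCG) for four queries
toy_deltas = {"query a": 0.25, "query b": -0.3, "query c": 0.05, "query d": 0.4}

improved, harmed, prop_improved = summarize_deltas(toy_deltas)
print(len(improved), len(harmed), prop_improved)
```

Queries whose delta falls inside the threshold band (such as `query c` here) are counted as neither improved nor harmed.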

### Look at a query

In [None]:
QUERY = "sugar canister"
fully_classified(QUERY)

In [None]:
ground_truth_cat[ground_truth_cat['query'] == QUERY]

In [None]:
graded_categorized[graded_categorized['query'] == QUERY][['product_name', 'category hierarchy', 'grade']]

In [None]:
graded_bm25[graded_bm25['query'] == QUERY][['product_name', 'category hierarchy', 'grade']]

In [None]:
from searcharray import SearchArray
from cheat_at_search.tokenizers import snowball_tokenizer
from cheat_at_search.strategy.strategy import SearchStrategy
import numpy as np



class CategorySearch(SearchStrategy):
    def __init__(self, products, query_to_cat,
                 name_boost=9.3,
                 description_boost=4.1,
                 category_boost=10,
                 sub_category_boost=5):
        super().__init__(products)
        self.index = products
        self.index['product_name_snowball'] = SearchArray.index(
            products['product_name'], snowball_tokenizer)
        self.index['product_description_snowball'] = SearchArray.index(
            products['product_description'], snowball_tokenizer)

        cat_split = products['category hierarchy'].fillna('').str.split("/")

        products['category'] = cat_split.apply(
            lambda x: x[0].strip() if len(x) > 0 else ""
        )
        products['subcategory'] = cat_split.apply(
            lambda x: x[1].strip() if len(x) > 1 else ""
        )
        self.index['category_snowball'] = SearchArray.index(
            products['category'], snowball_tokenizer
        )
        self.index['subcategory_snowball'] = SearchArray.index(
            products['subcategory'], snowball_tokenizer
        )

        self.query_to_cat = query_to_cat
        self.name_boost = name_boost
        self.description_boost = description_boost
        self.category_boost = category_boost
        self.sub_category_boost = sub_category_boost

    def search(self, query, k=10):
        """Dumb baseline lexical search, but add a constant boost when
           a product matches the desired category or subcategory"""
        bm25_scores = np.zeros(len(self.index))
        structured = self.query_to_cat(query)
        tokenized = snowball_tokenizer(query)

        # ****
        # Baseline BM25 search from before
        num_tokens_matched = np.zeros(len(self.index))
        for token in tokenized:

            name_score = self.index['product_name_snowball'].array.score(token) * self.name_boost
            desc_score = self.index['product_description_snowball'].array.score(
                token) * self.description_boost
            bm25_scores += name_score + desc_score
            num_tokens_matched[(name_score + desc_score) > 0] += 1

        # ****
        # If there's a subcategory, boost that by a constant amount
        for sub_category in structured.sub_categories:
            tokenized_subcategory = snowball_tokenizer(sub_category)
            subcategory_match = np.ones(len(self.index), dtype=bool)
            if tokenized_subcategory:
                subcategory_match = self.index['subcategory_snowball'].array.score(tokenized_subcategory) > 0
            bm25_scores[subcategory_match] += self.sub_category_boost

        # ****
        # If there's a category, boost that by a constant amount
        for category in structured.categories:
            tokenized_category = snowball_tokenizer(category)
            category_match = np.ones(len(self.index), dtype=bool)
            if tokenized_category:
                category_match = self.index['category_snowball'].array.score(tokenized_category) > 0
            bm25_scores[category_match] += self.category_boost

        # ***
        # Down-weight products that match fewer than half of the query tokens
        min_token_match = max(len(tokenized) / 2, 1)
        bm25_scores[num_tokens_matched < min_token_match] *= 0.8

        top_k = np.argsort(-bm25_scores)[:k]
        scores = bm25_scores[top_k]

        return top_k, scores
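
The change in this second version is the partial-match penalty: products that match fewer than half of the query tokens (and at least one token is always required) have their total score multiplied by 0.8. A small plain-Python sketch of that rule, with made-up scores and a hypothetical helper name:

```python
def penalize_partial_matches(scores, tokens_matched, num_query_tokens, penalty=0.8):
    """Scale down documents that match too few of the query tokens."""
    min_token_match = max(num_query_tokens / 2, 1)
    return [
        score * penalty if matched < min_token_match else score
        for score, matched in zip(scores, tokens_matched)
    ]


# Hypothetical scores (after boosts) and per-document matched-token counts
scores = [20.0, 12.0, 15.0]
tokens_matched = [1, 2, 0]

two_token_query = penalize_partial_matches(scores, tokens_matched, num_query_tokens=2)
four_token_query = penalize_partial_matches(scores, tokens_matched, num_query_tokens=4)
print(two_token_query, four_token_query)
```

For a two-token query only the product with no lexical match at all (present only because of a category boost) is penalized. For a four-token query the product matching a single token is penalized as well. Since the penalty is applied after the boosts, it scales the category boost too.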


In [None]:
categorized_search = CategorySearch(products, fully_classified)
graded_categorized2 = run_strategy(categorized_search)
graded_categorized2

In [None]:
ndcgs(graded_bm25).mean(), ndcgs(graded_categorized2).mean()