# Spelling Correction with LLMs

<small>
(from <a href="http://maven.com/softwaredoug/cheat-at-search">Cheat at Search with LLMs</a> training course by Doug Turnbull.)
</small>

We'll use an LLM to correct spelling in search queries. We'll start with a naive first pass, hit some walls, and see where we went wrong.

**Note** If you haven't already, you may want to review [the first notebook](https://colab.research.google.com/drive/1aUCvcBa1YdmsbIgYc74jlknl9_iRotp1?authuser=2#scrollTo=ccUNd_mLZWdA), which goes through the helpers and other tools used here in more detail.

## Boilerplate

Install dependencies, mount Google Drive, and prompt for your OpenAI key (stored in your Google Drive).

In [None]:
!pip install git+https://github.com/softwaredoug/cheat-at-search.git
from cheat_at_search.data_dir import mount
mount(use_gdrive=True)    # colab, share data across notebook runs on gdrive
# mount(use_gdrive=False) # <- colab without gdrive
# mount(use_gdrive=False, manual_path="/path/to/directory")  # <- force data path to specific directory, ie you're running locally.

from cheat_at_search.search import run_strategy, graded_bm25, ndcgs, ndcg_delta, vs_ideal
from cheat_at_search.wands_data import products

products

Collecting git+https://github.com/softwaredoug/cheat-at-search.git
  Cloning https://github.com/softwaredoug/cheat-at-search.git to /tmp/pip-req-build-koze7w5a
  Running command git clone --filter=blob:none --quiet https://github.com/softwaredoug/cheat-at-search.git /tmp/pip-req-build-koze7w5a
  Resolved https://github.com/softwaredoug/cheat-at-search.git to commit 38a087b480422fb5f29fea8b25fbfb25f3492da3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
ERROR:data_dir:Failed to update https://github.com/softwaredoug/WANDS.git dataset: fatal: Unable to create '/content/drive/MyDrive/cheat-at-search-data/wands_enriched/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.



CalledProcessError: Command '['git', '-C', '/content/drive/MyDrive/cheat-at-search-data/wands_enriched', 'reset', '--hard', 'origin/main']' returned non-zero exit status 128.

## Spelling correction types

We set up a Pydantic type that maps the full query to a spelling-corrected version.

In [None]:
from pydantic import BaseModel, Field
from typing import List
from cheat_at_search.enrich import AutoEnricher


class SpellingCorrectedQuery(BaseModel):
    """
    Search query with spelling corrections applied
    """
    corrected_keywords: str = Field(
        ...,
        description="Spell-corrected search query per instructions"
    )

### Spellchecking generation code

We ask OpenAI to generate spelling corrections for a given query.

In [None]:
spell_correct_enricher = AutoEnricher(
    model="openai/gpt-4.1-nano",
    # model="google/gemini-2.5-flash-lite",
    system_prompt="You are a helpful AI assistant that very lightly spell-checks furniture e-commerce queries.",
    response_model=SpellingCorrectedQuery
)

def get_prompt(query: str) -> str:
    prompt = f"""
        Take this furniture e-commerce query and correct any obvious spelling mistakes

        {query}
    """
    return prompt


def corrector(query: str) -> SpellingCorrectedQuery:
    return spell_correct_enricher.enrich(get_prompt(query))

corrector("raaack glass")

### Spelling corrected search strategy

We **replace** the original query with the spell-checked query.

Ideas to try
* What if we **boost** and keep the original?
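
One way to picture the "boost and keep the original" idea is to score documents for both the original and the corrected query, then add the corrected scores with a weight. The sketch below is a minimal illustration on plain Python lists. The names `blended_scores`, `top_k`, and `corrected_boost` are hypothetical and are not part of `cheat_at_search`, and the score values are made up for illustration.

```python
def blended_scores(original_scores, corrected_scores, corrected_boost=0.5):
    """Add a weighted corrected-query score to each document's original score."""
    if len(original_scores) != len(corrected_scores):
        raise ValueError("score lists must cover the same documents")
    return [orig + corrected_boost * corr
            for orig, corr in zip(original_scores, corrected_scores)]


def top_k(scores, k=10):
    """Return document positions ordered by descending score."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:k]


# Hypothetical per-document BM25 scores for three documents.
original = [0.0, 2.0, 0.5]   # scores for the query as typed
corrected = [3.0, 0.0, 1.0]  # scores for the LLM-corrected query

combined = blended_scores(original, corrected, corrected_boost=0.5)
print(combined, top_k(combined, k=2))
```

With a boost below 1.0, a bad correction can still surface new documents but is less likely to push out strong matches for the query as typed.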

In [None]:
from searcharray import SearchArray
from cheat_at_search.tokenizers import snowball_tokenizer
from cheat_at_search.strategy.strategy import SearchStrategy
import numpy as np

class SpellingCorrectedSearch(SearchStrategy):
    def __init__(self, products, corrector,
                 name_boost=9.3,
                 description_boost=4.1):
        super().__init__(products)
        self.index = products
        self.name_boost = name_boost
        self.description_boost = description_boost
        self.index['product_name_snowball'] = SearchArray.index(
            products['product_name'], snowball_tokenizer)
        self.index['product_description_snowball'] = SearchArray.index(
            products['product_description'], snowball_tokenizer)
        self.corrector = corrector


    def search(self, query, k=10):
        """Spellcheck lexical search"""
        bm25_scores = np.zeros(len(self.index))
        corrected = self.corrector(query)
        different = corrected.corrected_keywords.lower().split() != query.lower().split()
        asterisk = "*" if different else ""
        if different:
            print(f"Query: {query} -> Corrected: {corrected.corrected_keywords}{asterisk}")
            query = corrected.corrected_keywords
        tokenized = snowball_tokenizer(query)
        for token in tokenized:
            bm25_scores += self.index['product_name_snowball'].array.score(token) * self.name_boost
            bm25_scores += self.index['product_description_snowball'].array.score(token) * self.description_boost
        top_k = np.argsort(-bm25_scores)[:k]
        scores = bm25_scores[top_k]

        return top_k, scores


In [None]:
spell_check_search = SpellingCorrectedSearch(products, corrector)
corrected_results1 = run_strategy(spell_check_search)
corrected_results1

### Compare NDCGs of our first attempt

In [None]:
ndcgs(graded_bm25).mean(), ndcgs(corrected_results1).mean()

In [None]:
ndcg_delta(corrected_results1, graded_bm25)

In [None]:
corrector("tye dye duvet cover")

## Develop ground truth for spelling correction

We have a manually curated list of spelling corrections to evaluate our corrector. We also add queries we don't want changed.

**Ideas to try** -- how good is this ground truth, can it be improved?
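
As one possible starting point for auditing the ground truth, the sketch below flags a few mechanical problems: duplicated entries, irregular whitespace, and queries labeled both "leave alone" and "needs correction". The helper `lint_ground_truth` is hypothetical. The sample phrases are taken from the list in the next cell, which repeats "mahone porch rocking chair" and contains a double space in "gurney  slade 56".

```python
from collections import Counter


def lint_ground_truth(leave_alone, corrections):
    """Report simple quality problems in spelling-correction ground truth."""
    problems = []
    # The same query listed twice adds no information.
    for phrase, count in Counter(leave_alone).items():
        if count > 1:
            problems.append(f"duplicate leave-alone query: {phrase!r}")
    # Doubled or stray spaces make exact-match scoring fail for trivial reasons.
    all_phrases = list(dict.fromkeys(leave_alone)) + list(corrections) + list(corrections.values())
    for phrase in all_phrases:
        if phrase != " ".join(phrase.split()):
            problems.append(f"irregular whitespace: {phrase!r}")
    # A query cannot be both "leave alone" and "needs correction".
    for phrase in dict.fromkeys(leave_alone):
        if phrase in corrections:
            problems.append(f"conflicting labels: {phrase!r}")
    return problems


sample_leave_alone = ["mahone porch rocking chair", "gurney  slade 56",
                      "mahone porch rocking chair", "kisner"]
sample_corrections = {"pedistole sink": "pedestal sink"}

problems = lint_ground_truth(sample_leave_alone, sample_corrections)
for problem in problems:
    print(problem)
```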

In [None]:
def leave_alones(phrases):
    """Queries we don't want to change."""
    return {phrase: phrase for phrase in phrases}

# These should be left alone
spellcheck_ground_truth = leave_alones(["kohen 5 drawer dresser",
                                        "grantola wall mirror",
                                        "kisner",
                                        "malachi sled",
                                        "tressler rug",
                                        "bed side table",
                                        "pennfield playhouse",
                                        "platform bed side table",
                                        "liberty hardware francisco",
                                        "wood coffee table set by storage",
                                        "mahone porch rocking chair",
                                        "odum velvet",
                                        "mobley zero gravity adjustable bed with wireless remote",
                                        "fortunat coffee table",
                                        "alter furniture",
                                        "love seat wide faux leather tuxedo arm sofa",
                                        "regner power loom red",
                                        "gurney  slade 56",
                                        "mahone porch rocking chair",
                                        "golub dining table",
                                        "itchington butterfly"])

actual_correction = {
    "outdoor sectional doning": "outdoor sectional dining",
    "pedistole sink": "pedestal sink",
    "biycicle plant stands": "bicycle plant stands",
    "7 draw white dresser": "7 drawer white dresser",
    "glass lsmp shades": "glass lamp shades",
    "desk for kids tjat ate 10 year old": "desk for kids that are 10 year old",
    "twin over full bunk beds cool desins": "twin over full bunk beds cool designs",
    "sheets for twinxl": "sheets for twin xl",
    "tye dye duvet cover": "tie dye duvet cover",
    "foutains with brick look": "fountains with brick look",
    "westling coffee table": "wesling coffee table",
    "trinaic towel rod": "trinsic towel rod",
    "blk 18x18 seat cushions": "black 18x18 seat cushions",
    "big basket for dirty cloths": "big basket for dirty clothes"
}

ground_truth = {**spellcheck_ground_truth, **actual_correction}
ground_truth

{'kohen 5 drawer dresser': 'kohen 5 drawer dresser',
 'grantola wall mirror': 'grantola wall mirror',
 'kisner': 'kisner',
 'malachi sled': 'malachi sled',
 'tressler rug': 'tressler rug',
 'bed side table': 'bed side table',
 'pennfield playhouse': 'pennfield playhouse',
 'platform bed side table': 'platform bed side table',
 'liberty hardware francisco': 'liberty hardware francisco',
 'wood coffee table set by storage': 'wood coffee table set by storage',
 'mahone porch rocking chair': 'mahone porch rocking chair',
 'odum velvet': 'odum velvet',
 'mobley zero gravity adjustable bed with wireless remote': 'mobley zero gravity adjustable bed with wireless remote',
 'fortunat coffee table': 'fortunat coffee table',
 'alter furniture': 'alter furniture',
 'love seat wide faux leather tuxedo arm sofa': 'love seat wide faux leather tuxedo arm sofa',
 'regner power loom red': 'regner power loom red',
 'gurney  slade 56': 'gurney  slade 56',
 'golub dining table': 'golub dining table',
 'itc

### Function to compute accuracy

In [None]:
from tqdm import tqdm

def acc(corrector):
    hits = []
    misses = []
    for query, correction in tqdm(ground_truth.items()):
        corrected = corrector(query)
        corrected_keywords = corrected.corrected_keywords.strip().lower()
        expected_correction = correction.strip().lower()
        if corrected_keywords == expected_correction:
            hits.append(corrected)
        else:
            print(f"Bad correction: Query: {query} -> Corrected: {corrected.corrected_keywords}")
            misses.append(corrected)

    return len(hits) / (len(hits) + len(misses)), hits, misses

accuracy, hits, misses = acc(corrector)
accuracy, hits, misses

100%|██████████| 34/34 [00:00<00:00, 21017.88it/s]

Bad correction: Query: kohen 5 drawer dresser -> Corrected: Kohen 5-drawer dresser
Bad correction: Query: grantola wall mirror -> Corrected: grande wall mirror
Bad correction: Query: tressler rug -> Corrected: Tassler rug
Bad correction: Query: bed side table -> Corrected: bedside table
Bad correction: Query: wood coffee table set by storage -> Corrected: wood coffee table set with storage
Bad correction: Query: mahone porch rocking chair -> Corrected: mahogany porch rocking chair
Bad correction: Query: mobley zero gravity adjustable bed with wireless remote -> Corrected: Mobily Zero Gravity Adjustable Bed with Wireless Remote
Bad correction: Query: fortunat coffee table -> Corrected: fortunate coffee table
Bad correction: Query: gurney  slade 56 -> Corrected: gurney slade 56
Bad correction: Query: itchington butterfly -> Corrected: Hitchington butterfly
Bad correction: Query: desk for kids tjat ate 10 year old -> Corrected: desk for kids that are 10 years old
Bad correction: Query: we




(0.6176470588235294,
 [SpellingCorrectedQuery(corrected_keywords='Kisner'),
  SpellingCorrectedQuery(corrected_keywords='Malachi sled'),
  SpellingCorrectedQuery(corrected_keywords='Pennfield Playhouse'),
  SpellingCorrectedQuery(corrected_keywords='platform bed side table'),
  SpellingCorrectedQuery(corrected_keywords='Liberty Hardware Francisco'),
  SpellingCorrectedQuery(corrected_keywords='odum velvet'),
  SpellingCorrectedQuery(corrected_keywords='alter furniture'),
  SpellingCorrectedQuery(corrected_keywords='love seat wide faux leather tuxedo arm sofa'),
  SpellingCorrectedQuery(corrected_keywords='regner power loom red'),
  SpellingCorrectedQuery(corrected_keywords='Golub dining table'),
  SpellingCorrectedQuery(corrected_keywords='outdoor sectional dining'),
  SpellingCorrectedQuery(corrected_keywords='pedestal sink'),
  SpellingCorrectedQuery(corrected_keywords='bicycle plant stands'),
  SpellingCorrectedQuery(corrected_keywords='7 drawer white dresser'),
  SpellingCorrectedQ
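
Some of the misses above differ from the expected answer only in whitespace, for example "gurney  slade 56" versus "gurney slade 56". The sketch below shows one lenient comparison that lowercases and collapses whitespace before matching, similar in spirit to the `.lower().split()` comparison already used in `SpellingCorrectedSearch.search`. The helpers `normalize` and `is_hit` are hypothetical and are not part of the notebook's `acc` function.

```python
def normalize(text):
    """Lowercase and collapse runs of whitespace so formatting differences don't count."""
    return " ".join(text.lower().split())


def is_hit(predicted, expected):
    """True when prediction and expectation match after normalization."""
    return normalize(predicted) == normalize(expected)


# Whitespace-only difference: treated as a hit.
print(is_hit("gurney slade 56", "gurney  slade 56"))
# Added hyphen: still a miss, because the tokens really differ.
print(is_hit("Kohen 5-drawer dresser", "kohen 5 drawer dresser"))
```

Whether hyphenation or pluralization ("year old" versus "years old") should count as a miss depends on how the downstream tokenizer treats those forms, so this sketch deliberately leaves them as misses.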

## Build a better corrector

We bake some guidance into the prompt.

**Thoughts** - is this overfit to our data? Is this appropriate guidance?
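
One rough way to probe the overfitting question is to tune the prompt against only part of the ground truth and report accuracy on the rest. The sketch below splits a query-to-correction dictionary into tuning and held-out sets. The function `split_ground_truth` and its parameters are hypothetical. Note that the notebook's `acc` function reads the global `ground_truth`, so it would need to accept a dictionary argument to use a split like this.

```python
import random


def split_ground_truth(ground_truth, holdout_fraction=0.3, seed=0):
    """Split query -> correction pairs into a tuning set and a held-out set."""
    queries = sorted(ground_truth)          # sort first so the split is reproducible
    random.Random(seed).shuffle(queries)
    n_holdout = max(1, int(len(queries) * holdout_fraction))
    holdout = {q: ground_truth[q] for q in queries[:n_holdout]}
    tuning = {q: ground_truth[q] for q in queries[n_holdout:]}
    return tuning, holdout


# A few pairs taken from the ground truth defined earlier in this notebook.
sample = {
    "kisner": "kisner",
    "tressler rug": "tressler rug",
    "pedistole sink": "pedestal sink",
    "glass lsmp shades": "glass lamp shades",
    "tye dye duvet cover": "tie dye duvet cover",
}

tuning, holdout = split_ground_truth(sample, holdout_fraction=0.4, seed=0)
print(len(tuning), len(holdout))
```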

In [None]:
def better_corrector_prompt(query):
    prompt = f"""
        You're a furniture expert, you know all about what Wayfair sells.

        Take the user's query and correct any spelling mistakes.

        * Dont compound words. Just leave the original form alone: IE don't turn anti scratch into antiscratch
        * Dont decompound words Just leave the original form alone: IE dont turn antiscratch into anti scratch
        * Dont add hyphens (ie "anti scratch" not "anti-scratch")
        * *Remember your Wayfair expertise* -- DO NOT correct stylized product names known from the wayfair product line or other furniture / home improvement brands

        Here's the users query:
        {query}
    """
    return prompt



def better_corrector(query):
    return spell_correct_enricher.enrich(better_corrector_prompt(query))

acc(better_corrector)

 53%|█████▎    | 18/34 [00:18<00:16,  1.01s/it]

Bad correction: Query: gurney  slade 56 -> Corrected: gurney slade 56


 62%|██████▏   | 21/34 [00:21<00:15,  1.21s/it]

Bad correction: Query: outdoor sectional doning -> Corrected: outdoor sectional doning


 91%|█████████ | 31/34 [00:30<00:02,  1.22it/s]

Bad correction: Query: westling coffee table -> Corrected: westling coffee table


 94%|█████████▍| 32/34 [00:31<00:02,  1.09s/it]

Bad correction: Query: trinaic towel rod -> Corrected: tranic towel rod


 97%|█████████▋| 33/34 [00:32<00:01,  1.03s/it]

Bad correction: Query: blk 18x18 seat cushions -> Corrected: blk 18x18 seat cushions


100%|██████████| 34/34 [00:33<00:00,  1.01it/s]

Bad correction: Query: big basket for dirty cloths -> Corrected: big basket for dirty cloths





(0.8235294117647058,
 [SpellingCorrectedQuery(corrected_keywords='kohen 5 drawer dresser'),
  SpellingCorrectedQuery(corrected_keywords='grantola wall mirror'),
  SpellingCorrectedQuery(corrected_keywords='Kisner'),
  SpellingCorrectedQuery(corrected_keywords='malachi sled'),
  SpellingCorrectedQuery(corrected_keywords='tressler rug'),
  SpellingCorrectedQuery(corrected_keywords='bed side table'),
  SpellingCorrectedQuery(corrected_keywords='pennfield playhouse'),
  SpellingCorrectedQuery(corrected_keywords='platform bed side table'),
  SpellingCorrectedQuery(corrected_keywords='liberty hardware francisco'),
  SpellingCorrectedQuery(corrected_keywords='wood coffee table set by storage'),
  SpellingCorrectedQuery(corrected_keywords='mahone porch rocking chair'),
  SpellingCorrectedQuery(corrected_keywords='odum velvet'),
  SpellingCorrectedQuery(corrected_keywords='mobley zero gravity adjustable bed with wireless remote'),
  SpellingCorrectedQuery(corrected_keywords='fortunat coffee tab

### Rerun to see if NDCG improved

In [None]:
spell_check_search = SpellingCorrectedSearch(products, better_corrector)
corrected_results2 = run_strategy(spell_check_search)
corrected_results2

2025-10-08 12:02:37,522 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2025-10-08 12:02:37,531 - searcharray.indexing - INFO - 0 Batch Start tokenization
2025-10-08 12:02:37,535 - searcharray.indexing - INFO - Tokenizing 42994 documents
2025-10-08 12:02:37,831 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2025-10-08 12:02:38,123 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2025-10-08 12:02:38,437 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2025-10-08 12:02:38,724 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2025-10-08 12:02:38,947 - searcharray.indexing - INFO - Tokenization -- vstacking
2025-10-08 12:02:38,953 - searcharray.indexing - INFO - Tokenization -- DONE
2025-10-08 12:02:38,960 - searcharray.indexing - INFO - Inverting docs->terms
2025-10-08 12:02:39,000 - searcharray.indexing - INFO - Encoding positions to bit array
2025-10-08 12:02:39,051 - searcharray.indexing - INFO - Batch tokenization complete
2025-10-08 12:02:39,053 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2025-10-08 12:02:39,094 - searcharray.indexing - INFO - Indexing from tokenization complete
2025-10-08 12:02:39,133 - searcharray.indexing - INFO - Indexing begins w/ 4 workers
2025-10-08 12:02:39,145 - searcharray.indexing - INFO - 0 Batch Start tokenization
2025-10-08 12:02:39,148 - searcharray.indexing - INFO - Tokenizing 42994 documents
2025-10-08 12:02:40,313 - searcharray.indexing - INFO - Tokenized 10000 (23.259059403637718%)
2025-10-08 12:02:41,446 - searcharray.indexing - INFO - Tokenized 20000 (46.518118807275435%)
2025-10-08 12:02:42,604 - searcharray.indexing - INFO - Tokenized 30000 (69.77717821091315%)
2025-10-08 12:02:43,753 - searcharray.indexing - INFO - Tokenized 40000 (93.03623761455087%)
2025-10-08 12:02:44,385 - searcharray.indexing - INFO - Tokenization -- vstacking
2025-10-08 12:02:44,440 - searcharray.indexing - INFO - Tokenization -- DONE
2025-10-08 12:02:44,466 - searcharray.indexing - INFO - Inverting docs->terms
2025-10-08 12:02:45,323 - searcharray.indexing - INFO - Encoding positions to bit array
2025-10-08 12:02:45,665 - searcharray.indexing - INFO - Batch tokenization complete
2025-10-08 12:02:45,669 - searcharray.indexing - INFO - (main thread) Processing 1 batch results
2025-10-08 12:02:45,994 - searcharray.indexing - INFO - Indexing from tokenization complete
Searching:   4%|▍         | 19/480 [00:17<05:47,  1.33it/s]

Query: foutains with brick look -> Corrected: fountains with brick look*


Searching:  12%|█▏        | 56/480 [00:47<05:21,  1.32it/s]

Query: tollette teal outdoor rug -> Corrected: toilet teal outdoor rug*


Searching:  17%|█▋        | 80/480 [01:09<05:58,  1.12it/s]

Query: 7 draw white dresser -> Corrected: 7 drawer white dresser*


Searching:  39%|███▉      | 189/480 [02:51<03:49,  1.27it/s]

Query: biycicle plant stands -> Corrected: bicycle plant stands*


Searching:  40%|████      | 194/480 [02:54<03:49,  1.25it/s]

Query: chabely 5 draw chest -> Corrected: chably 5 draw chest*
Query: desk for kids tjat ate 10 year old -> Corrected: desk for kids that are 10 year old*


Searching:  43%|████▎     | 205/480 [03:04<04:04,  1.13it/s]

Query: dull bed with shirt head board -> Corrected: dull bed with short head board*


Searching:  43%|████▎     | 206/480 [03:05<03:55,  1.16it/s]

Query: fawkes 36" blue vanity -> Corrected: fawkes 36 inch blue vanity*


Searching:  45%|████▍     | 215/480 [03:14<03:52,  1.14it/s]

Query: trinaic towel rod -> Corrected: tranic towel rod*
Query: sheets for twinxl -> Corrected: sheets for twin xl*


Searching:  46%|████▌     | 219/480 [03:16<02:52,  1.51it/s]

Query: small loving roomtables -> Corrected: small loving room tables*


Searching:  46%|████▌     | 220/480 [03:16<02:55,  1.48it/s]

Query: twin over full bunk beds cool desins -> Corrected: twin over full bunk beds cool designs*


Searching:  59%|█████▉    | 282/480 [04:13<02:36,  1.27it/s]

Query: 48" sliding single track , barn door for laundry -> Corrected: 48" sliding single track, barn door for laundry*


Searching:  63%|██████▎   | 301/480 [04:30<02:42,  1.10it/s]

Query: ligth bulb -> Corrected: light bulb*


Searching:  82%|████████▏ | 395/480 [05:50<01:09,  1.23it/s]

Query: glass lsmp shades -> Corrected: glass lamp shades*


Searching:  84%|████████▎ | 401/480 [05:55<01:11,  1.10it/s]

Query: refrigerator with ice an water in door -> Corrected: refrigerator with ice and water in door*


Searching:  94%|█████████▍| 452/480 [06:41<00:23,  1.17it/s]

Query: tye dye duvet cover -> Corrected: tie dye duvet cover*


Searching:  99%|█████████▉| 477/480 [07:03<00:02,  1.09it/s]

Query: pedistole sink -> Corrected: pedestal sink*


Searching: 100%|██████████| 480/480 [07:04<00:00,  1.13it/s]


Unnamed: 0,product_id,product_name,product_class,category hierarchy,product_description,product_features,rating_count,average_rating,review_count,features,...,query_id,rank,query_class,id,label,grade,discounted_gain,idcg,dcg,ndcg
0,7465,hair salon chair,Massage Chairs|Recliners,Furniture / Living Room Furniture / Chairs & S...,offers a wide selection of professional salon ...,fauxleathertype : pu|legheight-toptobottom:18|...,69.0,4.5,53.0,"[fauxleathertype : pu, legheight-toptobottom:1...",...,0,1,Massage Chairs,80.0,Exact,2.0,3.00,8.786905,8.10119,0.921962
1,25431,barberpub salon massage chair,Massage Chairs,Furniture / Living Room Furniture / Chairs & S...,salon chairs are a wonderful avenue for hairst...,supplierintendedandapproveduse : non residenti...,4.0,5.0,4.0,[supplierintendedandapproveduse : non resident...,...,0,2,Massage Chairs,29.0,Exact,2.0,1.50,8.786905,8.10119,0.921962
2,7468,mercer41 hair salon chair hydraulic styling ch...,Massage Chairs,Furniture / Living Room Furniture / Chairs & S...,mercer41 beauty offers a wide selection profes...,seatfillmaterial : foam|waterrepellant : no re...,1.0,5.0,1.0,"[seatfillmaterial : foam, waterrepellant : no ...",...,0,3,Massage Chairs,104.0,Exact,2.0,1.00,8.786905,8.10119,0.921962
3,39461,professional salon reclining massage chair,Massage Chairs,Furniture / Living Room Furniture / Chairs & S...,new and in a good condition . first-rate metal...,overalldepth-fronttoback:39.4|warrantylength:1...,,,,"[overalldepth-fronttoback:39.4, warrantylength...",...,0,4,Massage Chairs,114.0,Exact,2.0,0.75,8.786905,8.10119,0.921962
4,9234,beauty salon task chair,,Furniture / Office Furniture / Office Chairs,"applicable scene : office , home life , beauty...",overallheight-toptobottom:37|backcolor : brown...,,,,"[overallheight-toptobottom:37, backcolor : bro...",...,0,5,Massage Chairs,32.0,Partial,1.0,0.20,8.786905,8.10119,0.921962
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4795,22194,wine glass rack,Kitchen Sink Storage,Kitchen & Tabletop / Kitchen Organization / Co...,drip-dry up to eight wineglasses with this cle...,glasscapacity:8|countryoforigin : united state...,5.0,4.5,3.0,"[glasscapacity:8, countryoforigin : united sta...",...,487,6,,,,0.0,0.00,8.786905,0.00000,0.000000
4796,40243,madisen hanging wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,complement your farmhouse kitchen decor with t...,producttype : wine glass rack|overallwidth-sid...,29.0,5.0,20.0,"[producttype : wine glass rack, overallwidth-s...",...,487,7,,,,0.0,0.00,8.786905,0.00000,0.000000
4797,40244,kena hanging wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,spruce up your farmhouse kitchen decor with th...,warrantylength:1 year|producttype : wine glass...,23.0,5.0,18.0,"[warrantylength:1 year, producttype : wine gla...",...,487,8,,,,0.0,0.00,8.786905,0.00000,0.000000
4798,39976,wall mounted wine glass rack,Wine Racks,Kitchen & Tabletop / Tableware & Drinkware / B...,"the latest addition to this collection , this ...",overallheight-toptobottom:4|design : wall moun...,34.0,4.5,18.0,"[overallheight-toptobottom:4, design : wall mo...",...,487,9,,,,0.0,0.00,8.786905,0.00000,0.000000
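
The `discounted_gain`, `idcg`, and `ndcg` columns in the table above are consistent with a gain of `2**grade - 1` discounted by `1 / rank`, with the ideal DCG computed as ten grade-2 results. This is inferred from the printed values, not from the `cheat_at_search` source, so treat the sketch below as an assumption about how the numbers arise. The function names are hypothetical.

```python
def discounted_gains(grades):
    """Gain (2**grade - 1) divided by the 1-based rank of each result."""
    return [(2 ** grade - 1) / rank for rank, grade in enumerate(grades, start=1)]


def ndcg(grades, k=10, max_grade=2):
    """DCG of the top-k grades divided by the DCG of k results at max_grade."""
    dcg = sum(discounted_gains(grades[:k]))
    idcg = sum(discounted_gains([max_grade] * k))
    return dcg / idcg


# Grades of the first five rows shown for query_id 0 in the table above.
first_five = [2, 2, 2, 2, 1]
print(discounted_gains(first_five))

# Ideal DCG for ten grade-2 results, to compare with the idcg column.
print(round(sum(discounted_gains([2] * 10)), 6))
```

The first print reproduces the `discounted_gain` values 3.00, 1.50, 1.00, 0.75, and 0.20 shown for ranks 1 through 5, and the second reproduces the `idcg` value 8.786905.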


In [None]:
ndcgs(corrected_results2).mean(), ndcgs(graded_bm25).mean()

(np.float64(0.542740651673215), np.float64(0.5411098691836396))

In [None]:
ndcg_delta(corrected_results2, graded_bm25)

query,ndcg
tye dye duvet cover,0.349546
foutains with brick look,0.33523
glass lsmp shades,0.279637
sheets for twinxl,0.104322
7 draw white dresser,0.005419
refrigerator with ice an water in door,0.002529
desk for kids tjat ate 10 year old,-0.002439
pedistole sink,-0.063858
tollette teal outdoor rug,-0.227611
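
As a sanity check, the per-query deltas above should account for the small gap between the two mean NDCGs. The sketch below assumes 480 queries (taken from the `480/480` progress bar) and assumes every query not listed in the table has a delta of zero.

```python
# Per-query NDCG deltas copied from the table above.
deltas = {
    "tye dye duvet cover": 0.349546,
    "foutains with brick look": 0.33523,
    "glass lsmp shades": 0.279637,
    "sheets for twinxl": 0.104322,
    "7 draw white dresser": 0.005419,
    "refrigerator with ice an water in door": 0.002529,
    "desk for kids tjat ate 10 year old": -0.002439,
    "pedistole sink": -0.063858,
    "tollette teal outdoor rug": -0.227611,
}

n_queries = 480  # assumption: taken from the "480/480" progress bar

# Mean change implied by the table, treating unlisted queries as unchanged.
mean_from_deltas = sum(deltas.values()) / n_queries

# Difference between the two means reported by ndcgs(...).mean().
reported_gap = 0.542740651673215 - 0.5411098691836396

print(mean_from_deltas, reported_gap)
```

Both values come out near 0.0016, so a few large wins and one large loss ("tollette teal outdoor rug" corrected to "toilet") explain almost all of the change in the mean.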


### Examine some of the differences

**Ideas to try** - Can you improve this corrector further?

* Produce a list of candidate spelling corrections
* Try a new / different model
* Few-shot the model by passing examples in the prompt
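
For the few-shot idea, a minimal sketch of a prompt builder might look like the following. The function `few_shot_prompt` and its exact wording are hypothetical. The examples are taken from the ground truth earlier in this notebook, so any query used as a few-shot example should be excluded from the accuracy evaluation.

```python
def few_shot_prompt(query, examples):
    """Build a spelling-correction prompt with worked examples before the query."""
    lines = [
        "Take the user's query and correct any spelling mistakes.",
        "",
        "Examples:",
    ]
    for original, corrected in examples.items():
        lines.append(f"Query: {original}")
        lines.append(f"Corrected: {corrected}")
    lines += ["", "Here's the user's query:", query]
    return "\n".join(lines)


# One example that needs a fix and one that must be left alone.
examples = {
    "pedistole sink": "pedestal sink",
    "kisner": "kisner",
}

prompt = few_shot_prompt("glass lsmp shades", examples)
print(prompt)
```

The resulting string could be passed to `spell_correct_enricher.enrich(...)` in the same way as `better_corrector_prompt(query)`.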

In [None]:
better_corrector('kisner')

SpellingCorrectedQuery(corrected_keywords='Kisner')

In [None]:
better_corrector("sheets for twinxl"), corrector("sheets for twinxl")

(SpellingCorrectedQuery(corrected_keywords='sheets for twin xl'),
 SpellingCorrectedQuery(corrected_keywords='sheets for twin XL'))

In [None]:
better_corrector("desk for kids tjat ate 10 year old"), corrector("desk for kids tjat ate 10 year old")

(SpellingCorrectedQuery(corrected_keywords='desk for kids that are 10 year old'),
 SpellingCorrectedQuery(corrected_keywords='desk for kids that are 10 years old'))