# Building a Recommender 

*In this project, I build a spelling recommender that uses NLTK to suggest correctly spelled words similar to a misspelled input.*
Have you ever searched for a book on Amazon but mistyped the title? Instead of a frustrating dead end, this pops up below the search bar: **'Did you mean...?'** This is the power of spelling recommender systems!

### Business Value
* More advanced versions of the recommender below correct spelling mistakes so that users find what they are looking for more efficiently.
* From a user experience perspective, users feel better supported and find the platform more user-friendly when their typos are corrected accurately, which can increase customer satisfaction.
* *Sales!* Correcting search terms matches users more directly with the products they actually want to buy, which can increase *conversion rates* and sales.
* Reduced Search Abandonment: By providing correct alternatives to misspelled queries, these systems decrease the likelihood of users abandoning the search out of frustration or inability to find what they are looking for.

## Text Filtering

```python
import nltk
from nltk.corpus import cmudict, words
from nltk.metrics.distance import edit_distance, jaccard_distance
from nltk.util import ngrams

# Download the word list and the CMU Pronouncing Dictionary (skipped if already present)
nltk.download('words', quiet=True)
nltk.download('cmudict', quiet=True)

correct_spellings = words.words()
# Phonetic dictionary: word -> list of pronunciations, each a list of ARPAbet phonemes
prondict = cmudict.dict()

def spelling_recommender(misspelled_words):
    recommendations = []

    for misspelled_word in misspelled_words:
        # Collect the pronunciations of every dictionary word containing the misspelling,
        # flattened into one list of phoneme sequences so they compare one-to-one
        phonetic_misspelled = [pron for word, prons in prondict.items()
                               if misspelled_word in word for pron in prons]

        # Initial filtering: keep words that share the first letter
        first_letter_matches = [word for word in correct_spellings
                                if word[0].lower() == misspelled_word[0].lower()]
        candidates = first_letter_matches

        # Phonetic filtering: keep candidates that share a pronunciation with those words
        if phonetic_misspelled:
            candidates = [word for word in candidates if word in prondict and any(
                ph in phonetic_misspelled for ph in prondict[word])]

        # Fall back to the first-letter filter if the phonetic filter is too strict
        if len(candidates) < 10:
            candidates = first_letter_matches

        # Combined metric for each candidate: edit distance + trigram Jaccard distance
        distances = ((edit_distance(misspelled_word, word) +
                      jaccard_distance(set(ngrams(misspelled_word, n=3)), set(ngrams(word, n=3))), word)
                     for word in candidates)

        # Pick the candidate with the minimum combined distance
        closest_word = min(distances, key=lambda x: x[0])[1]
        recommendations.append(closest_word)

    return recommendations

# Testing the recommender with a list of common data science words
test_words = ['algoritm', 'neaural', 'netwrok', 'dat', 'scienc', 'mchine', 'learnin', 'statistcs', 'pythn']
recommendations = spelling_recommender(test_words)

print(recommendations)
```

Output:

```
['algorithm', 'neural', 'network', 'data', 'science', 'machine', 'learning', 'statistics', 'python']
```
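To make the combined metric concrete, here is a dependency-free sketch that re-implements Levenshtein edit distance and character-trigram Jaccard distance in plain Python and scores a few candidates for `dat`. As far as I know, these match what NLTK's `edit_distance` (with default arguments) and `jaccard_distance` compute, but the helper names (`levenshtein`, `trigrams`, `jaccard`, `combined_score`) are my own and this is not the code used in the recommender above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the letters match)
            ))
        prev = curr
    return prev[-1]


def trigrams(word: str) -> set:
    """Set of character trigrams as tuples, the same shape nltk.util.ngrams yields."""
    return {tuple(word[i:i + 3]) for i in range(len(word) - 2)}


def jaccard(a: set, b: set) -> float:
    """Jaccard distance: the share of the union that the two sets do not have in common."""
    union = a | b
    return (len(union) - len(a & b)) / len(union)


def combined_score(misspelled: str, candidate: str) -> float:
    """Edit distance plus trigram Jaccard distance; lower means more similar."""
    return levenshtein(misspelled, candidate) + jaccard(trigrams(misspelled), trigrams(candidate))


for candidate in ["data", "date", "dot", "dart"]:
    print(candidate, combined_score("dat", candidate))
# data 1.5
# date 1.5
# dot 2.0
# dart 2.0
```

Note that `data` and `date` tie at 1.5. Because `min` returns the first minimum it encounters, the recommender's answer for such ties depends on the order of the candidates in the word list.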


#### Challenges & Learnings
This spelling recommender went through many iterations to improve its robustness and general applicability. It started out with edit distance alone; I then refined it with Jaccard distance and dynamic n-gram sizing to better handle diverse spelling errors and short words. Later, after learning new text mining techniques, I added phonetic similarity instead of filtering only by the initial letter and word length. These changes help the recommender handle a wider range of inputs.
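The phonetic step depends on the shape of `cmudict.dict()`: each word maps to a *list* of pronunciations, and each pronunciation is itself a list of ARPAbet phonemes, so pronunciations have to be compared one at a time. Below is a minimal, self-contained sketch of that filter. It uses a small hand-written toy dictionary (`TOY_PRONDICT`, illustrative entries that may not match cmudict exactly) instead of the real one, and `phonetic_candidates` is a hypothetical helper name.

```python
# Hypothetical toy stand-in for nltk's cmudict.dict(): word -> list of pronunciations,
# each pronunciation a list of ARPAbet phonemes. Entries are illustrative only.
TOY_PRONDICT = {
    "data": [["D", "EY1", "T", "AH0"], ["D", "AE1", "T", "AH0"]],
    "date": [["D", "EY1", "T"]],
    "update": [["AH0", "P", "D", "EY1", "T"]],
    "dot": [["D", "AA1", "T"]],
}


def phonetic_candidates(misspelled, candidates, prondict):
    """Keep candidates that share a pronunciation with a dictionary word containing `misspelled`."""
    # Flatten the pronunciations of every matching word into one set;
    # lists are unhashable, so each phoneme list becomes a tuple.
    target_prons = {
        tuple(pron)
        for word, prons in prondict.items()
        if misspelled in word
        for pron in prons
    }
    # A candidate survives if any one of its pronunciations is in the target set
    return [
        word
        for word in candidates
        if any(tuple(pron) in target_prons for pron in prondict.get(word, []))
    ]


print(phonetic_candidates("dat", ["data", "date", "dot", "dart"], TOY_PRONDICT))
# ['data', 'date']
```

Here `dot` is dropped because none of its pronunciations match, and `dart` is dropped because the toy dictionary has no entry for it. In the full recommender, a phonetic candidate list with fewer than 10 words falls back to the first-letter filter.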

#### Acknowledged Limitations
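* The model struggles with very short words: a word shorter than three letters produces no character trigrams, so the Jaccard term carries no information for it, and when both the input and a candidate are that short, NLTK's `jaccard_distance` divides by zero.
* The recommender does not consider the context in which a word is used (my test words all come from data science), so it can recommend words that are correctly spelled but semantically inappropriate for the surrounding text.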
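One way to soften the short-word limitation, in the spirit of the dynamic n-gram sizing mentioned above, is to shrink *n* whenever a word is too short to produce a trigram. The sketch below assumes a simple rule, n = min(3, length of the shorter word) and never below 1; earlier versions of this project may have used a different rule, and the helper names `char_ngrams` and `adaptive_jaccard` are hypothetical.

```python
def char_ngrams(word: str, n: int) -> set:
    """All contiguous character n-grams of `word` (empty if the word is shorter than n)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def adaptive_jaccard(a: str, b: str, max_n: int = 3) -> float:
    """Jaccard distance on character n-grams, shrinking n so short words still yield n-grams.

    Assumed rule: n = min(max_n, length of the shorter word), but never below 1.
    """
    n = max(1, min(max_n, len(a), len(b)))
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:  # only possible when both strings are empty
        return 0.0
    return 1 - len(grams_a & grams_b) / len(union)


print(adaptive_jaccard("an", "and"))  # n = 2: {'an'} vs {'an', 'nd'}
print(adaptive_jaccard("an", "at"))   # n = 2: {'an'} vs {'at'}
print(adaptive_jaccard("an", "an"))   # identical words
# 0.5
# 1.0
# 0.0
```

With fixed trigrams, `an` and `at` would both give empty sets and no usable score. The trade-off of this rule is that scores computed with different *n* are not strictly comparable across candidates of different lengths, so it is better suited as a tie-breaking signal than as a drop-in replacement.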

##### *Thank You for Reviewing This Project*

I appreciate you taking the time to go through my work. Please feel free to reach out if you have any questions, suggestions, or would like to discuss any aspects of this project further.

Best Regards,

Chaymae
##### *Chaymaejawhar@gmail.com*