# Search

Before Presto! can recommend products, it must first solve search: matching a user's search terms to a book, album, artist, or author. This notebook showcases how Presto! solves product search as part of its recommendation system.

Presto!'s search algorithm is an exercise in test-driven development and in efficient use of development resources. A production-grade general-purpose search algorithm could easily become a project of its own. Instead, Presto! takes an incremental approach: continually identify real issues with real-world searches and solve each one as efficiently as possible. As a side effect of this test-driven approach, Presto!'s search handles issues specific to music and book searches that even much more sophisticated general-purpose search algorithms likely do not address.

In [41]:
import pandas as pd

from shared.query import search_text, find_products, connect

conn = connect()

## Search Features
This section shows the various ways that Presto!'s search is robust against common real-world problems:

In [30]:
def show_matches(term: str, matches: list) -> pd.DataFrame:
    """Return a one-row table marking which candidates normalize to the same search text as `term`."""
    search_term = search_text(term)
    # Normalize each candidate the same way as the search term, then compare
    match_results = ['✓' if search_text(match) == search_term else '' for match in matches]
    return pd.DataFrame([match_results], index=[term], columns=matches)

### Case Insensitivity

In [32]:
print('Case-insensitive matches:')
show_matches('george orwell', ['George Orwell', 'GEORGE ORWELL'])

Case-insensitive matches:


Unnamed: 0,George Orwell,GEORGE ORWELL
george orwell,✓,✓


### Diacritic Insensitivity

Diacritics are accents and other marks added to letters, as in 'Piña', 'Céline', or 'Blümchen'. They are common in many languages other than English, and the same name can be written or encoded in more than one way. For search to work for non-English artists and authors, matches must ignore diacritic marks.

In [58]:
show_matches('blumchen', ['Blümchen', 'Bľùmçħëň', 'Bluemchen'])

Unnamed: 0,Blümchen,Bľùmçħëň,Bluemchen
blumchen,✓,✓,
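
As an illustration of the idea (not the actual `search_text` implementation, which is not shown in this notebook), one common way to ignore diacritics is to decompose each character with Unicode normalization and drop the combining marks. The helper name `strip_diacritics` is hypothetical:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize('NFKD', text)
    # Combining marks have a non-zero canonical combining class
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics('Blümchen'))  # Blumchen
print(strip_diacritics('Céline'))    # Celine
```

Letters such as 'ħ' have no Unicode decomposition, so this approach alone would not reduce 'Bľùmçħëň' to 'blumchen'; `search_text` presumably handles such letters in some other way, for example with an explicit mapping.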


### Punctuation Insensitivity

In [56]:
print('Punctuation-insensitive matches:')
show_matches('smile.dk', ['SmileDK', 'Smile DK', 'smile_dk', 'smile.dk!'])

Punctuation-insensitive matches:


Unnamed: 0,SmileDK,Smile DK,smile_dk,smile.dk!
smile.dk,✓,✓,,✓


### Article Insensitivity
In music, it is common to prefix a band name with 'The' and to ignore the leading 'The' when listing or searching for the band:

In [61]:
print('Article-insensitive matches:')
show_matches('offspring', ['The Offspring', 'Offspring, The', 'Off The Spring'])

Article-insensitive matches:


Unnamed: 0,The Offspring,"Offspring, The",Off The Spring
offspring,✓,✓,✓
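
One normalization that is consistent with the results above is to lowercase the text, split it into alphanumeric words, drop the article, and join what remains. This is only a sketch under that assumption; `drop_articles` and the `ARTICLES` set are hypothetical and may differ from what `search_text` actually does:

```python
import re

ARTICLES = {'the'}  # assumption: only 'the' is treated as an article

def drop_articles(text: str) -> str:
    """Hypothetical helper: lowercase, drop articles, and join the remaining words."""
    words = re.findall(r'[a-z0-9]+', text.lower())
    return ''.join(word for word in words if word not in ARTICLES)

for name in ['The Offspring', 'Offspring, The', 'Off The Spring']:
    print(name, '->', drop_articles(name))
```

Because the article is dropped wherever it occurs and the words are joined, 'Off The Spring' also reduces to 'offspring', which matches the third column of the table above.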


### Name Order Insensitivity
For authors and artists, some data sources list the name in 'Lastname, Firstname' format. Presto! handles this format when users search for a name:

In [112]:
show_matches('antoine de st exuperie', ['De St. Éxupérie, Antoine', 'St-Éxupérie, Antoine de', 'de st exuperie, antoine', 'st exuperie, antoine de', 'Antoine De Saint Exuperie'])

Unnamed: 0,"De St. Éxupérie, Antoine","St-Éxupérie, Antoine de","de st exuperie, antoine","st exuperie, antoine de",Antoine De Saint Exuperie
antoine de st exuperie,✓,✓,✓,✓,
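
A minimal sketch of how a 'Lastname, Firstname' entry could be put back into natural order before comparison, assuming the first comma separates the two parts. The helper `reorder_name` is hypothetical and is not taken from the Presto! code base:

```python
def reorder_name(name: str) -> str:
    """Hypothetical helper: turn 'Last, First' into 'First Last'."""
    last, comma, first = name.partition(',')
    if not comma:
        # No comma: assume the name is already in natural order
        return name.strip()
    return f'{first.strip()} {last.strip()}'.strip()

print(reorder_name('de st exuperie, antoine'))  # antoine de st exuperie
print(reorder_name('st exuperie, antoine de'))  # antoine de st exuperie
```

Both catalog spellings end up as the same string, which is why a search for 'antoine de st exuperie' can match either one once case, diacritics, and punctuation are also normalized.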


## Real-World Examples
This section showcases search robustness with examples from the Presto! product catalog:

In [94]:
def search(term: str, category: str, field: str, conn = conn) -> pd.DataFrame:
    results = find_products(category, term, conn, field, exact_match = True)
    return results

In [110]:
search('antoine de saint exupery', 'Books', 'creator')

find_products: 9 results in 0.005 seconds


Unnamed: 0,id,reviews,title,creator,publisher,description,release_date,category,subcategory
0,B0006QSHNA,639,The little prince,Antoine de Saint-Exupery,HMH Books For Young Readers,"Hello, I am the Little Prince. Bonjour, je sui...",2015-10-13,Books,Juvenile Fiction
2,B0006AONZC,73,"Wind, Sand and Stars",Antoine de Saint-Exupéry,HarperCollins,From the author of the beloved classic The Lit...,2010-01-01,Books,Biography & Autobiography
3,B0007FEXDS,72,"Wind, sand and stars (Harbrace paperbound libr...",Antoine de Saint-Exupéry,Houghton Mifflin Harcourt,A group of pilots must conquer the savage Ande...,1974,Books,Fiction
4,B0006APKFY,13,Flight to Arras,Antoine de Saint-Exupéry,Penguin Classics,This work stresses French writer and aviator A...,2000,Books,"Air pilots, Military"
5,B0006AQ7CO,4,"Airman's Odyssey (A Trilogy comprising Wind, S...",Antoine de Saint-Exupéry,,,1965,Books,"World War, 1939-1945"
6,B000I9XO74,2,Pilote De Guerre,Antoine de Saint-Exupéry,,,1994,Books,"World War, 1939-1945"
7,B0007IYD7G,2,Pilote de guerre (Collection Folio),Antoine de Saint-Exupéry,,,1972,Books,"Authors, French"
8,2070537625,1,Le Petit Prince (Livre + CD) (French Edition),Antoine de Saint-Exupéry,,'The Little Prince' is the most translated boo...,2011-07-07,Books,"Children's stories, French"


In [124]:
search('jrr tolkien', 'Books', 'creator')

find_products: 23 results in 0.049 seconds


Unnamed: 0,id,reviews,title,creator,publisher,description,release_date,category,subcategory
0,B000NWU3I4,4316,"The Hobbitt, or there and back again; illustra...",J. R. R. Tolkien,Houghton Mifflin Harcourt,"Bilbo Baggins, a respectable, well-to-do hobbi...",2013,Books,Fiction
1,B000Q032UY,4266,The Hobbit or There and Back Again,J. R. R. Tolkien,Mariner Books,Celebrating 75 years of one of the world's mos...,2012,Books,Juvenile Fiction
2,B000NDSX6C,4118,The Hobbit,J. R. R. Tolkien,Mariner Books,Celebrating 75 years of one of the world's mos...,2012,Books,Juvenile Fiction
3,B000J1OR0Y,2397,The Lord of the Rings (3 Volume Set),J. R. R. Tolkien,,"A saga of dwarfs and elves, goblins and trolls...",1996,Books,"Baggins, Bilbo (Fictitious character)"
4,B000PIIMPW,2389,The Lord of the Rings Trilogy (The Fellowship ...,J. R. R. Tolkien,Mariner Books,"Presents a box set including the complete ""Lor...",2012-09-18,Books,Fiction
5,B000GQK706,2388,The Lord of the Rings - Boxed Set,J.R.R. Tolkien,HarperCollins,"This beautiful gift edition of The Hobbit, J.R...",2012-11-08,Books,Young Adult Fiction
6,B000L4056E,987,The Fellowship of the Ring,J.R.R. Tolkien,HarperCollins,Begin your journey into Middle-earth... The in...,2012-02-15,Books,Fiction
7,B000ND63P0,823,The Silmarillion,J. R. R. Tolkien,,Tales and legends chronicling the world's begi...,2014-10-07,Books,Fiction
8,0807209074,674,The Two Towers: Part II of The Lord of the Rin...,J.R.R. Tolkien,HarperCollins,Begin your journey into Middle-earth... The in...,2012-02-15,Books,Young Adult Fiction
9,B000Q08CDQ,443,Return of the King Being the Third Part of The...,J. R. R. Tolkien,HarperCollins,"Concluding the story begun in The Hobbit, this...",2012,Books,Fiction


In [122]:
search('lord of rings', 'Books', 'title')

find_products: 1 results in 0.003 seconds


Unnamed: 0,id,reviews,title,creator,publisher,description,release_date,category,subcategory
0,613999592,2,Lord Of The Rings,John Ronald Reuel Tolkien,,"Frodo Baggins, bearer of the Ring of Power tha...",1973,Books,"Baggins, Frodo (Fictitious character)"


In [126]:
search('celine dion', 'Music', 'creator')

find_products: 243 results in 0.145 seconds


Unnamed: 0,id,reviews,title,creator,publisher,description,release_date,category,subcategory
0,B009CVRQLY,384,Loved Me Back to Life,Celine Dion,,"LOVED ME BACK TO LIFE, the highly anticipated ...","September 3, 2013",Music,
1,B00005YXZI,301,A New Day Has Come,Celine Dion,,"Product description, This has the best song wh...","July 27, 2006",Music,
2,B000031XCR,292,All The Way...A Decade of Song,Celine Dion,,"Product Description, 16 of Celine's latest and...","April 30, 2006",Music,
3,B00000DHR0,229,These Are Special Times,Celine Dion,,"Product description, No Description AvailableN...","April 30, 2006",Music,
4,B001D0EI3Q,182,My Love Essential Collection,Céline Dion,,,,Music,
...,...,...,...,...,...,...,...,...,...
232,B00000DDY8,1,Love Can Move Mountains,Celine Dion,,,"January 26, 2007",Music,
233,B0000088GO,1,My Heart Will Go on,Celine Dion,,I will ship by EMS or SAL items in stock in Ja...,"December 13, 2006",Music,
235,B000007U0S,1,Dion Chante Plamondon,Celine Dion,,Dion sings the songs of Luc Plamondon with 2 e...,"January 26, 2007",Music,
238,B000002CPL,1,Misled / Real Emotion,Celine Dion,,,"February 10, 2007",Music,


Notice that result #4 uses 'Céline'. The next example makes diacritic insensitivity clearer:

In [129]:
search('blumchen', 'Music', 'creator')

find_products: 11 results in 0.010 seconds


Unnamed: 0,id,reviews,title,creator,publisher,description,release_date,category,subcategory
0,B00000B4W5,8,Verliebt,Blümchen,,,"December 6, 2006",Music,
1,B00004TSZN,4,Die Welt Gehoert Dir,Blümchen,,Die Welt Gehoert Dir,"November 19, 2006",Music,
2,B0000245M8,3,Jasmin,Blümchen,,,"January 26, 2007",Music,
3,B0000561ZQ,1,Boomerang,Blümchen,,,"December 8, 2002",Music,
4,B0000561ZP,1,Bicycle Race,Blümchen,,,"December 8, 2002",Music,
5,B000053ZOI,1,Fuer Immer & Ewig,Blümchen,,,"October 29, 2006",Music,
6,B000050W5B,1,Blaue Augen [Single-CD],Blümchen,,,"March 16, 2010",Music,
7,B00003JA1I,1,Unter'm Weihnachtsbaum,Blümchen,,,"February 22, 2007",Music,
8,B000028DZO,1,Jasmin-Fan-Edition,Blümchen,,,"August 12, 2012",Music,
9,B00000JOZ8,1,Heut Ist Mein Tag,Blümchen,,,"August 13, 2012",Music,


In [133]:
search('offspring', 'Music', 'creator')

find_products: 80 results in 0.039 seconds


Unnamed: 0,id,reviews,title,creator,publisher,description,release_date,category,subcategory
0,B00000DHRZ,134,Americana,The Offspring,,"Product description, THE OFFSPRING, Amazon.com...","April 30, 2006",Music,
1,B000001IPL,120,Smash,The Offspring,,"Product description, Early CD issue of the 199...","July 26, 2006",Music,
2,B0000DIC87,118,Splinter Explicit Lyrics,The Offspring,,Drill hole on case,"September 30, 2006",Music,
3,B0018OAPAW,108,"Rise And Fall, Rage And Grace Explicit L...",The Offspring,,"Rise and Fall, Rage and Grace [Explicit Conten...","May 1, 2008",Music,
4,B000051XVK,107,Conspiracy Of One,The Offspring,,,,Music,
5,B01N63W54I,79,Greatest Hits explicit_lyrics,The Offspring,,Greatest Hits compilation from The Offspring i...,"November 19, 2016",Music,
6,B007Y6OZE0,75,Days Go By Explicit Lyrics,The Offspring,,"The Offspring return with DAYS GO BY, their ni...","April 27, 2012",Music,
7,B00097A5IQ,65,The Offspring - Greatest Hits,The Offspring,,"Long before there was, The OC, , there was The...","July 27, 2006",Music,
8,B08TZ9QY8F,45,Let The Bad Times Roll Explicit Lyrics,The Offspring,,The legendary So-Cal punk group The Offspring ...,"February 24, 2021",Music,
9,B000001IP0,43,Ignition,The Offspring,,"Amazon.com, It seems like the terms, catchy, a...","November 1, 2006",Music,


## Unhandled Cases

Though Presto!'s search is robust against a wide variety of use cases, handling every possible case is very difficult. Known limitations of Presto!'s search capabilities include:

### Substring Search
Presto! relies on database indices for performance throughout its architecture. Though indices can handle a wide variety of cases, including exact and prefix matches, substring search is notably not supported by indices. Because of this, most substring searches are prohibitively slow.

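One possible solution would be to store a reversed copy of every text field used in search (title, artist/author). A prefix search on the normal field finds values that start with the search term, and a prefix search on the reversed field finds values that end with it, so both can use an index. Matches in the middle of a value would still not be covered by this approach.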
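
The reversed-field idea can be sketched with an in-memory SQLite database. The table and column names (`product`, `title`, `title_rev`) are hypothetical, and whether a `LIKE 'abc%'` query actually uses an index depends on the database and its collation settings, so this only illustrates the shape of the queries:

```python
import sqlite3

# Hypothetical schema: `product`, `title`, and `title_rev` are illustrative names
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE product (title TEXT, title_rev TEXT)')
db.execute('CREATE INDEX idx_title ON product (title)')
db.execute('CREATE INDEX idx_title_rev ON product (title_rev)')

titles = ['the hobbit', 'lord of the rings', 'the two towers']
db.executemany(
    'INSERT INTO product (title, title_rev) VALUES (?, ?)',
    [(title, title[::-1]) for title in titles],
)

def starts_with(prefix: str) -> list:
    """Prefix search on the normal column."""
    rows = db.execute(
        'SELECT title FROM product WHERE title LIKE ? ORDER BY title',
        (prefix + '%',),
    )
    return [row[0] for row in rows]

def ends_with(suffix: str) -> list:
    """Suffix search, expressed as a prefix search on the reversed column."""
    rows = db.execute(
        'SELECT title FROM product WHERE title_rev LIKE ? ORDER BY title',
        (suffix[::-1] + '%',),
    )
    return [row[0] for row in rows]

print(starts_with('the'))   # ['the hobbit', 'the two towers']
print(ends_with('rings'))   # ['lord of the rings']
```

A term that appears only in the middle of a title, such as 'two', is found by neither query, and the sketch does not escape the `%` and `_` wildcards in user input.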

Another possible solution would be to use the FTS (full-text search) capabilities built into many SQL databases. However, this approach would not be compatible with the custom search capabilities that Presto! already offers. FTS capabilities are also vendor-specific, which leads to vendor lock-in.

### Alternative Terms
Presto! does not treat different variations of the same term as equivalent. Examples include 'Jackson 5' vs 'Jackson Five', and 'St' vs 'Saint' or 'Street'.
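
One way such variations could be unified is to map each known variant to a canonical token before comparison. The sketch below is purely illustrative: the `CANONICAL` table and the `canonicalize` helper are hypothetical, and a real table would have to be curated from actual failed searches:

```python
# Hypothetical variant table: each known variant maps to one canonical token
CANONICAL = {'five': '5', 'saint': 'st', 'street': 'st'}

def canonicalize(text: str) -> str:
    """Hypothetical helper: replace known variants with their canonical token."""
    return ' '.join(CANONICAL.get(word, word) for word in text.lower().split())

print(canonicalize('Jackson Five'))  # jackson 5
print(canonicalize('Jackson 5'))     # jackson 5
```

Mapping both 'saint' and 'street' to 'st' makes them match each other as well, which shows why this kind of table trades precision for recall.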

### German Diacritic Variations
In German, it is valid to replace an umlaut with its base vowel followed by an 'e', so 'Fahrvergnügen' can also be written 'Fahrvergnuegen'. Presto! does not handle this case, as the 'Bluemchen' example above shows.
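
A sketch of how the 'ue' spelling could be derived from the umlaut spelling, so that both forms could be stored or compared. The `german_variant` helper and the `UMLAUT_EXPANSIONS` table are hypothetical, and applying the expansion to non-German names would likely produce wrong matches, so it would need to be limited to an additional variant rather than replacing the existing normalization:

```python
import unicodedata

# Hypothetical expansion table for German umlauts and sharp s
UMLAUT_EXPANSIONS = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})

def german_variant(text: str) -> str:
    """Hypothetical helper: spell out umlauts as base vowel + 'e'."""
    # NFC ensures 'u' + combining diaeresis becomes the single character 'ü'
    composed = unicodedata.normalize('NFC', text).lower()
    return composed.translate(UMLAUT_EXPANSIONS)

print(german_variant('Fahrvergnügen'))  # fahrvergnuegen
print(german_variant('Blümchen'))       # bluemchen
```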