# Tabular semantic search on top of Amazon products using Superlinked

In this notebook, we explore how to use Superlinked to build a tabular semantic search solution that answers natural language queries.

## Imports

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from superlinked import framework as sl

from superlinked_app import index, query
from superlinked_app.config import settings

settings.validate_processed_dataset_exists()

2024-12-14 12:34:45.355 | INFO     | superlinked_app.config:<module>:9 - Loading '.env' file from: /Users/pauliusztin/Documents/01_projects/hands-on-retrieval/.env


## Define the Superlinked app

To explore how Superlinked multi-attribute indexes and queries work, we use Superlinked's `InMemory` components: an in-memory vector database and executor.

MongoDB is used when shipping the Superlinked app as a RESTful API.
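Conceptually, a multi-attribute index keeps one vector per item that combines several per-attribute embeddings (description, price, rating, and so on). The snippet below is a minimal NumPy sketch of that idea, not Superlinked's internal implementation. It assumes each attribute ("space") produces an L2-normalized vector, that the stored vector is the plain concatenation of those parts, and that query-time weights (like the `description_weight` or `price_minimizer_weights` values reported later in `knn_params`) scale only the query side. Under those assumptions, the dot product of the two combined vectors equals the weighted sum of the per-attribute cosine similarities.

```python
from __future__ import annotations

import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def combine(parts: list[np.ndarray], weights: list[float] | None = None) -> np.ndarray:
    """Concatenate normalized per-attribute vectors, optionally scaling each part by a weight."""
    if weights is None:
        weights = [1.0] * len(parts)
    return np.concatenate([w * normalize(p) for p, w in zip(parts, weights)])


# Hypothetical per-attribute embeddings for one product and one query.
item_parts = [np.array([0.2, 0.9, 0.1]), np.array([1.0, 0.0])]  # e.g. description, price
query_parts = [np.array([0.3, 0.8, 0.0]), np.array([0.6, 0.8])]
weights = [1.0, 0.5]  # hypothetical query-time weights: description vs. price

item_vec = combine(item_parts)             # stored once, at indexing time
query_vec = combine(query_parts, weights)  # weights applied only at query time

score = float(item_vec @ query_vec)
per_space = [float(normalize(i) @ normalize(q)) for i, q in zip(item_parts, query_parts)]
weighted_sum = sum(w * s for w, s in zip(weights, per_space))
print(round(score, 6), round(weighted_sum, 6))  # the two numbers match
```

A practical consequence of this layout is that changing weights does not require re-indexing: only the query vector changes.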

In [2]:
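# Ingest pandas DataFrames into the product schema; the schema's `id` field is read from the `asin` column.
source: sl.InMemorySource = sl.InMemorySource(
    index.product,
    parser=sl.DataFrameParser(schema=index.product, mapping={index.product.id: "asin"}),
)
# Wire the source and the multi-attribute index into an in-memory executor and start the app.
executor = sl.InMemoryExecutor(sources=[source], indices=[index.product_index])
app = executor.run()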
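The `mapping={index.product.id: "asin"}` argument tells the parser to fill the schema's `id` field from the dataset's `asin` column. Because that field identifies each item, it can be worth checking the column before ingestion. The sketch below does this in plain pandas; `check_id_column` is a hypothetical helper, not part of Superlinked, and the sample rows simply reuse two ASINs from the dataset preview.

```python
import pandas as pd


def check_id_column(df: pd.DataFrame, id_column: str = "asin") -> None:
    """Hypothetical pre-ingestion check: the id column must exist, be non-null, and be unique."""
    if id_column not in df.columns:
        raise KeyError(f"missing id column: {id_column!r}")
    ids = df[id_column]
    if ids.isna().any():
        raise ValueError(f"{id_column!r} contains null values")
    duplicated = ids[ids.duplicated()].unique().tolist()
    if duplicated:
        raise ValueError(f"duplicate ids in {id_column!r}: {duplicated}")


# Toy rows shaped like the processed dataset.
sample = pd.DataFrame({"asin": ["B07WP4RXHY", "B07VRZTK2N"], "price": [9.99, 11.99]})
check_id_column(sample)  # passes silently
```

In the notebook itself, the same call would be `check_id_column(df)` after loading the dataset.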

## Load the processed dataset

In [3]:
df = pd.read_json(settings.PROCESSED_DATASET_PATH, lines=True)
df.head()

|   | asin | type | category | title | description | price | review_rating | review_count |
|---|---|---|---|---|---|---|---|---|
| 0 | B07WP4RXHY | product | [Tools & Home Improvement] | YUEPIN U-Tube Clamp 304 Stainless Steel Hose P... | Product Description Specification: Material: 3... | 9.99 | 4.7 | 54 |
| 1 | B07VRZTK2N | product | [] | Apron for Women, Waterproof Adjustable Bib Coo... |  | 11.99 | 4.0 | 152 |
| 2 | B07V2F5SN1 | product | [Arts, Crafts & Sewing] | DIY 5D Diamond Painting by Number Kit for Adul... | Product Description 5D DIY Diamond Painting is... | 9.99 | 4.6 | 378 |
| 3 | B00MNLQQ7K | product | [Patio, Lawn & Garden] | Design Toscano QM2787100 Darby, the Forest Faw... |  | 40.72 | 4.7 | 274 |
| 4 | B089YD2KK5 | product | [Clothing, Shoes & Jewelry] | Crocs Jibbitz 5-Pack Alien Shoe Charms \| Jibbi... | From the brand Previous page Shop Crocs Collec... | 9.99 | 4.7 | 0 |


In [4]:
len(df)

300

In [5]:
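# Ingest all products from the DataFrame into the in-memory index.
source.put([df])

# Raise pandas' column display limit to 500 characters so long titles and descriptions are less truncated.
pd.set_option("display.max_colwidth", 500)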

## Query books using filters and natural language queries

In [6]:
results = app.query(
    query.filter_query,
    natural_query="books with a price lower than 100",
    limit=3,
)
results.knn_params

{'description_weight': 1.0,
 'review_rating_maximizer_weight': 0.0,
 'price_minimizer_weights': 0.0,
 'limit': 3,
 'natural_query': 'books with a price lower than 100',
 'filter_by_type': 'book',
 'query_description': 'books',
 'filter_by_cateogry': None,
 'review_rating_bigger_than': None,
 'price_smaller_than': 100.0,
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 0.0,
 'description_similar_clause_weight': 1.0}

In [7]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 0.532175 | 0 |
| 1 | book | [Books] | Stables: Beautiful Paddocks, Horse Barns, and Tack Rooms |  | 4.7 | 100 | 53.1 | 0847833143 | 0.532175 | 1 |
| 2 | book | [Books] | Spectrum Algebra 1 Workbook, Grades 6-8 Math Covering Algebra Equations, Fractions, Inequalities, Graphing, Rational Numbers, Classroom or Homeschool Curriculum |  | 4.6 | 0 | 7.86 | 1483816648 | 0.532175 | 2 |


In [8]:
results = app.query(
    query.filter_query,
    natural_query="books with a price lower than 100 and a rating bigger than 4",
    limit=3,
)
results.knn_params

{'description_weight': 1.0,
 'review_rating_maximizer_weight': 0.0,
 'price_minimizer_weights': 0.0,
 'limit': 3,
 'natural_query': 'books with a price lower than 100 and a rating bigger than 4',
 'filter_by_type': 'book',
 'query_description': 'books',
 'filter_by_cateogry': None,
 'review_rating_bigger_than': 4.0,
 'price_smaller_than': 100.0,
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 0.0,
 'description_similar_clause_weight': 1.0}

In [9]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 0.532175 | 0 |
| 1 | book | [Books] | Stables: Beautiful Paddocks, Horse Barns, and Tack Rooms |  | 4.7 | 100 | 53.1 | 0847833143 | 0.532175 | 1 |
| 2 | book | [Books] | Spectrum Algebra 1 Workbook, Grades 6-8 Math Covering Algebra Equations, Fractions, Inequalities, Graphing, Rational Numbers, Classroom or Homeschool Curriculum |  | 4.6 | 0 | 7.86 | 1483816648 | 0.532175 | 2 |
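In `filter_query`, the values extracted from the natural language query show up in `knn_params` as filter parameters (`filter_by_type`, `price_smaller_than`, `review_rating_bigger_than`). A quick way to sanity-check the returned rows is to apply the same constraints in pandas. The sketch below assumes, from the parameter names, that the type filter is an equality check and that the price and rating filters are strict bounds, with `None` meaning "not set". `apply_hard_filters` is a hypothetical helper and does not call Superlinked; the first two toy rows mirror books shown above and the last two are made up.

```python
import pandas as pd


def apply_hard_filters(df: pd.DataFrame, params: dict) -> pd.DataFrame:
    """Hypothetical pandas equivalent of the filters in a `knn_params` dict.

    A missing key or a value of None means the filter was not set by the query.
    """
    mask = pd.Series(True, index=df.index)
    if params.get("filter_by_type") is not None:
        mask &= df["type"] == params["filter_by_type"]
    if params.get("price_smaller_than") is not None:
        mask &= df["price"] < params["price_smaller_than"]
    if params.get("review_rating_bigger_than") is not None:
        mask &= df["review_rating"] > params["review_rating_bigger_than"]
    return df[mask]


toy = pd.DataFrame(
    {
        "type": ["book", "book", "book", "product"],
        "price": [9.01, 53.1, 120.0, 9.99],
        "review_rating": [4.7, 4.7, 4.9, 4.7],
    }
)
params = {"filter_by_type": "book", "price_smaller_than": 100.0, "review_rating_bigger_than": 4.0}
print(apply_hard_filters(toy, params))  # keeps only the first two rows
```

In the notebook, `apply_hard_filters(df, results.knn_params)` would list every product that satisfies the extracted constraints, which can then be compared with `results.to_pandas()`.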


📚 Learn more about how [Superlinked natural language queries (NLQ) work](https://rebrand.ly/superlinked-nlq-notebook).

## Query books using tabular semantic search and natural language queries

In [10]:
results = app.query(
    query.semantic_query,
    natural_query="books with a price lower than 100",
    limit=3,
)
results.knn_params

{'description_weight': 0.0,
 'review_rating_maximizer_weight': 0.0,
 'price_minimizer_weights': 1.0,
 'limit': 3,
 'natural_query': 'books with a price lower than 100',
 'filter_by_type': 'book',
 'query_description': 'books',
 'query_price': 100.0,
 'query_review_rating': 0.0,
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 0.0,
 'description_similar_clause_weight': 1.0,
 'price_similar_clause_weight': 1.0,
 'review_rating_similar_clause_weight': 1.0}

In [11]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | Spectrum Algebra 1 Workbook, Grades 6-8 Math Covering Algebra Equations, Fractions, Inequalities, Graphing, Rational Numbers, Classroom or Homeschool Curriculum |  | 4.6 | 0 | 7.86 | 1483816648 | 0.999924 | 0 |
| 1 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 0.9999 | 1 |
| 2 | book | [Books] | All Aboard! New York: A City Primer |  | 4.6 | 74 | 9.99 | 1423640748 | 0.999877 | 2 |
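Here `knn_params` contains no `price_smaller_than` filter; instead `price_minimizer_weights` is 1.0, so price appears to act as a weighted preference for cheaper items. That is consistent with the three books above being returned in ascending price order (7.86, 9.01, 9.99). The sketch below illustrates a "minimizer" preference with simple min-max scaling, where the cheapest item scores 1 and the most expensive scores 0. This is an assumption-laden stand-in, not Superlinked's number encoding; the prices are taken from the outputs in this notebook.

```python
import pandas as pd


def minimizer_score(values: pd.Series) -> pd.Series:
    """Map values to [0, 1] so that the smallest value scores 1 and the largest scores 0."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return pd.Series(1.0, index=values.index)
    return (hi - values) / (hi - lo)


books = pd.DataFrame(
    {
        "title": ["Stables", "All Aboard! New York", "100 Days to Brave", "Spectrum Algebra 1 Workbook"],
        "price": [53.1, 9.99, 9.01, 7.86],
    }
)
books["price_score"] = minimizer_score(books["price"])
print(books.sort_values("price_score", ascending=False))
```

A "maximizer" preference, such as the one for `review_rating` in the next query, would be the mirror image: `(values - lo) / (hi - lo)`.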


In [12]:
results = app.query(
    query.semantic_query,
    natural_query="books with a price lower than 100 and a rating bigger than 4",
    limit=3,
)
results.knn_params

{'description_weight': 0.0,
 'review_rating_maximizer_weight': 1.0,
 'price_minimizer_weights': 1.0,
 'limit': 3,
 'natural_query': 'books with a price lower than 100 and a rating bigger than 4',
 'filter_by_type': 'book',
 'query_description': 'books',
 'query_price': 100.0,
 'query_review_rating': 4.0,
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 0.0,
 'description_similar_clause_weight': 1.0,
 'price_similar_clause_weight': 1.0,
 'review_rating_similar_clause_weight': 1.0}

In [13]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 0.998409 | 0 |
| 1 | book | [Books] | The Mindful Dragon: A Dragon Book about Mindfulness. Teach Your Dragon To Be Mindful. A Cute Children Story to Teach Kids about Mindfulness, Focus and Peace. (My Dragon Books) |  | 4.7 | 623 | 11.69 | 1948040107 | 0.998374 | 1 |
| 2 | book | [Books] | Build Your Running Body (A Total-Body Fitness Plan for All Distance Runners, from Milers to Ultramarathoners—Run Farther, Faster, and Injury-Free) |  | 4.7 | 573 | 13.49 | 161519102X | 0.998346 | 2 |


In [14]:
results = app.query(
    query.semantic_query,
    natural_query="Return the top 5 books (along with their review count and price) with the highest reviews rating.",
    limit=3,
)
results.knn_params

{'description_weight': 1.0,
 'review_rating_maximizer_weight': 1.0,
 'price_minimizer_weights': 0.5,
 'limit': 3,
 'natural_query': 'Return the top 5 books (along with their review count and price) with the highest reviews rating.',
 'filter_by_type': 'book',
 'query_description': 'books',
 'query_price': 0.0,
 'query_review_rating': 5.0,
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 0.0,
 'description_similar_clause_weight': 1.0,
 'price_similar_clause_weight': 1.0,
 'review_rating_similar_clause_weight': 1.0}

In [15]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 0.780979 | 0 |
| 1 | book | [Books] | The Mindful Dragon: A Dragon Book about Mindfulness. Teach Your Dragon To Be Mindful. A Cute Children Story to Teach Kids about Mindfulness, Focus and Peace. (My Dragon Books) |  | 4.7 | 623 | 11.69 | 1948040107 | 0.780966 | 1 |
| 2 | book | [Books] | Build Your Running Body (A Total-Body Fitness Plan for All Distance Runners, from Milers to Ultramarathoners—Run Farther, Faster, and Injury-Free) |  | 4.7 | 573 | 13.49 | 161519102X | 0.780955 | 2 |


In [16]:
results = app.query(
    query.semantic_query,
    natural_query="books on psychology with a price lower than 100 and a rating bigger than 4",
    limit=3,
)
results.knn_params

{'description_weight': 1.0,
 'review_rating_maximizer_weight': 1.0,
 'price_minimizer_weights': 1.0,
 'limit': 3,
 'natural_query': 'books on psychology with a price lower than 100 and a rating bigger than 4',
 'filter_by_type': 'book',
 'query_description': 'psychology',
 'query_price': 100.0,
 'query_review_rating': 4.0,
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 0.0,
 'description_similar_clause_weight': 1.0,
 'price_similar_clause_weight': 1.0,
 'review_rating_similar_clause_weight': 1.0}

In [17]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 0.841239 | 0 |
| 1 | book | [Books] | The Mindful Dragon: A Dragon Book about Mindfulness. Teach Your Dragon To Be Mindful. A Cute Children Story to Teach Kids about Mindfulness, Focus and Peace. (My Dragon Books) |  | 4.7 | 623 | 11.69 | 1948040107 | 0.841216 | 1 |
| 2 | book | [Books] | Build Your Running Body (A Total-Body Fitness Plan for All Distance Runners, from Milers to Ultramarathoners—Run Farther, Faster, and Injury-Free) |  | 4.7 | 573 | 13.49 | 161519102X | 0.841198 | 2 |


📚 Learn more about how [Superlinked natural language queries (NLQ) work](https://rebrand.ly/superlinked-nlq-notebook).

## Find similar books based on a given product

In [18]:
df[df["asin"] == "B07WP4RXHY"]

|   | asin | type | category | title | description | price | review_rating | review_count |
|---|---|---|---|---|---|---|---|---|
| 0 | B07WP4RXHY | product | [Tools & Home Improvement] | YUEPIN U-Tube Clamp 304 Stainless Steel Hose Pipe Cable Strap Clips With Rubber Cushioned (1-21/32"(42mm)-10pcs) | Product Description Specification: Material: 304 Stainless Steel,100% New Rubber Color: Silver Shape: U Shape Quantity: 10 Pieces Note: Note: Since the size above is measured by hand, the size of the actual item you received could be slightly different from the size above. Product Description Specification: Material: 304 Stainless Steel,100% New Rubber Color: Silver Shape: U Shape Quantity: 10 Pieces Note: Note: Since the size above is measured by hand, the size of the actual item you receiv... | 9.99 | 4.7 | 54 |


In [19]:
results = app.query(
    query.similar_items_query,
    natural_query="similar books to B07WP4RXHY with a price lower than 100 and a rating bigger than 4",
    limit=3,
)
results.knn_params

{'description_weight': 1.0,
 'review_rating_maximizer_weight': 1.0,
 'price_minimizer_weights': 1.0,
 'limit': 3,
 'natural_query': 'similar books to B07WP4RXHY with a price lower than 100 and a rating bigger than 4',
 'filter_by_type': 'book',
 'query_description': 'similar to B07WP4RXHY',
 'query_price': 100.0,
 'query_review_rating': 4.0,
 'product_id': 'B07WP4RXHY',
 'radius_param': None,
 'space_weight_CategoricalSimilaritySpace_7262928026088864445_param': 1.0,
 'description_similar_clause_weight': 1.0,
 'price_similar_clause_weight': 1.0,
 'review_rating_similar_clause_weight': 1.0,
 'with_vector_id_weight_param': 1.0}

In [20]:
results.to_pandas()

|   | type | category | title | description | review_rating | review_count | price | id | similarity_score | rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | book | [Books] | 100 Days to Brave: Devotions for Unlocking Your Most Courageous Self |  | 4.7 | 0 | 9.01 | 031008962X | 65.636003 | 0 |
| 1 | book | [Books] | The Mindful Dragon: A Dragon Book about Mindfulness. Teach Your Dragon To Be Mindful. A Cute Children Story to Teach Kids about Mindfulness, Focus and Peace. (My Dragon Books) |  | 4.7 | 623 | 11.69 | 1948040107 | 65.635986 | 1 |
| 2 | book | [Books] | Build Your Running Body (A Total-Body Fitness Plan for All Distance Runners, from Milers to Ultramarathoners—Run Farther, Faster, and Injury-Free) |  | 4.7 | 573 | 13.49 | 161519102X | 65.635972 | 2 |
