# What's the impact of weights in the ranking process?

In [2]:
from productsearchengine import ProductSearchEngine

[nltk_data] Downloading package stopwords to /home/ensai/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


The main goal of this notebook is to find out how much our search engine's results change when the ranking weights change.

First, let's try a simple query adapted to our dataset:

In [6]:
# Initialize search engine
search_engine = ProductSearchEngine()

# Example queries for testing
test_queries = [
    "Cat-Ear Beanie america"
]

# Test each query with all search types
search_types = ['all', 'exact', 'any']

# Define custom weights for testing
custom_weights = {
    "bm25_weight": 0.5,
    "exact_match_weight": 1.5,
    "review_weight": 0.2,
    "title_match_weight": 0.3,
    "origin_match_weight": 0.1
}

# Execute searches and display results
for query in test_queries:
    print(f"\nTesting query: {query}")
    for search_type in search_types:
        print(f"\nSearch type: {search_type}")
        results = search_engine.execute_search(query, search_mode=search_type, save_results=False, **custom_weights)
        print(f"Found {results['metadata']['document_count']} documents")
        
        # Display the top 3 results
        for i, (doc_url, doc_data) in enumerate(results['ranked_documents'][:3], 1):
            print(f"\n{i}. {search_engine.products[doc_url]['title']} (Score: {doc_data['final_score']:.3f})")
            print(f"URL: {doc_url}")
            for component, score in doc_data.items():
                if component != 'final_score':
                    print(f"  - {component}: {score:.3f}")
            if 'description' in search_engine.products[doc_url] and search_engine.products[doc_url]['description']:
                print(f"Description: {search_engine.products[doc_url]['description'][:200]}...")



Testing query: Cat-Ear Beanie america

Search type: all
Found 0 documents

Search type: exact
Found 0 documents

Search type: any
Found 31 documents

1. Cat-Ear Beanie - Pink small (Score: 3.976)
URL: https://web-scraping.dev/product/12?variant=pink-small
  - bm25_score: 2.924
  - exact_match_score: 0.000
  - review_score: 0.352
  - title_match_score: 0.600
  - origin_match_score: 0.100
Description: Available in a variety of colors like black, grey, white, pink, and blue, this beanie not only keeps you warm but also adds a playful element to your outfit. Add a touch of whimsy to your winter wardr...

2. Cat-Ear Beanie (Score: 3.972)
URL: https://web-scraping.dev/product/12
  - bm25_score: 2.920
  - exact_match_score: 0.000
  - review_score: 0.352
  - title_match_score: 0.600
  - origin_match_score: 0.100
Description: Add a touch of whimsy to your winter wardrobe with our cat ear beanie. Stay warm, look cute, and let your playful side shine with our cat ear beanie. Available in a varie

A first observation is that the origin match score contributes relatively little to the final score (0.100 out of 3.976 for the top result). It seems reasonable to keep its weight low, since users may not often specify the location where they want to find the product.

Moreover, if we decide to increase this weight, we must be careful: irrelevant results could rise to the top simply because they share the location mentioned in the query. We can assume that a user searching for a specific product in a particular location would rather see the same product in other countries than a different product in the same location.

(Illustrating these cases with our dataset is difficult, as we have little data and few pages that closely resemble one another.)
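
Because the dataset cannot show this effect, the trade-off can be sketched with made-up numbers. The snippet below assumes that the final score is a plain weighted sum of raw component scores, which is consistent with the outputs above, where the printed components add up to the final score. The two candidates, their raw scores, and the helpers `final_score` and `top_result` are hypothetical and are not part of `ProductSearchEngine`.

```python
# Hypothetical raw (unweighted) component scores for two candidates.
# These values are invented for illustration, not taken from the dataset.
candidates = {
    "same product, other country": {"bm25": 6.0, "origin_match": 0.0},
    "other product, same country": {"bm25": 4.0, "origin_match": 1.0},
}


def final_score(raw, bm25_weight, origin_match_weight):
    """Weighted sum of the raw components (assumed scoring rule)."""
    return bm25_weight * raw["bm25"] + origin_match_weight * raw["origin_match"]


def top_result(origin_match_weight, bm25_weight=0.5):
    """Return the name of the best-scoring candidate for the given weights."""
    return max(
        candidates,
        key=lambda name: final_score(
            candidates[name], bm25_weight, origin_match_weight
        ),
    )


# Sweep the origin weight to see when the ranking flips.
for weight in (0.1, 0.5, 2.0):
    print(f"origin_match_weight={weight}: top result is '{top_result(weight)}'")
```

With these invented scores, the same product stays on top as long as the origin weight is below 1.0; above that threshold the different product in the matching location overtakes it, which is the behaviour we want to avoid.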

In [10]:
# Initialize search engine
search_engine = ProductSearchEngine()

# Example queries for testing
test_queries = [
    "Box of Chocolate Candy"
]

# Test each query with all search types
search_types = ['all', 'exact', 'any']

# Define custom weights for testing
custom_weights = {
    "bm25_weight": 0.5,
    "exact_match_weight": 1.5,
    "review_weight": 0.2,
    "title_match_weight": 0.3,
    "origin_match_weight": 0.1
}

# Execute searches and display results
for query in test_queries:
    print(f"\nTesting query: {query}")
    for search_type in search_types:
        print(f"\nSearch type: {search_type}")
        results = search_engine.execute_search(query, search_mode=search_type, save_results=False, **custom_weights)
        print(f"Found {results['metadata']['document_count']} documents")
        
        # Display the top 3 results
        for i, (doc_url, doc_data) in enumerate(results['ranked_documents'][:3], 1):
            print(f"\n{i}. {search_engine.products[doc_url]['title']} (Score: {doc_data['final_score']:.3f})")
            print(f"URL: {doc_url}")
            for component, score in doc_data.items():
                if component != 'final_score':
                    print(f"  - {component}: {score:.3f}")
            if 'description' in search_engine.products[doc_url] and search_engine.products[doc_url]['description']:
                print(f"Description: {search_engine.products[doc_url]['description'][:200]}...")



Testing query: Box of Chocolate Candy

Search type: all
Found 21 documents

1. Box of Chocolate Candy (Score: 6.012)
URL: https://web-scraping.dev/product/1
  - bm25_score: 3.236
  - exact_match_score: 1.500
  - review_score: 0.376
  - title_match_score: 0.900
  - origin_match_score: 0.000
Description: Whether you're looking for the perfect gift or just want to treat yourself, our box of chocolate candy is sure to satisfy. Each box contains an assortment of rich, flavorful chocolates with a smooth, ...

2. Box of Chocolate Candy (Score: 5.976)
URL: https://web-scraping.dev/product/25
  - bm25_score: 3.200
  - exact_match_score: 1.500
  - review_score: 0.376
  - title_match_score: 0.900
  - origin_match_score: 0.000
Description: Whether you're looking for the perfect gift or just want to treat yourself, our box of chocolate candy is sure to satisfy. Indulge your sweet tooth with our box of chocolate candy. Choose from a varie...

3. Box of Chocolate Candy (Score: 5.850)
URL: https://we

The bm25_weight and exact_match_weight can be seen as opposing forces. A higher bm25_weight favors term frequency and document length normalization, which suits relevance based on how terms are used across a document. However, it can downplay criteria such as exact matches, which are critical in certain contexts.

For example, if a user is trying to find a page they previously visited by searching with its exact title, an overly high bm25_weight could push that page down the ranking. In such cases, the top results might favor pages where the query terms appear frequently throughout the content, even if the title doesn't precisely match the query.
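
A minimal sketch of this tension, assuming again that the score is a weighted sum of raw components. The two pages and their raw scores are invented for illustration, and `rank` is a hypothetical helper, not a method of `ProductSearchEngine`.

```python
# Hypothetical raw component scores: one page whose title matches the
# query exactly, and one page that merely repeats the query terms often.
pages = {
    "exact-title page": {"bm25": 4.0, "exact_match": 1.0},
    "keyword-heavy page": {"bm25": 6.5, "exact_match": 0.0},
}


def rank(pages, bm25_weight, exact_match_weight):
    """Sort page names by an assumed weighted-sum score, best first."""
    scores = {
        name: bm25_weight * raw["bm25"] + exact_match_weight * raw["exact_match"]
        for name, raw in pages.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Same exact_match_weight in both runs; only bm25_weight changes.
balanced = rank(pages, bm25_weight=0.5, exact_match_weight=1.5)
bm25_heavy = rank(pages, bm25_weight=2.0, exact_match_weight=1.5)

print("balanced:  ", balanced)
print("bm25-heavy:", bm25_heavy)
```

With the balanced weights the exact-title page comes first; once bm25_weight is raised to 2.0, the keyword-heavy page overtakes it even though its title does not match the query.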


In [13]:
# Initialize search engine
search_engine = ProductSearchEngine()

# Example queries for testing
test_queries = [
    "Box of Chocolate Candy"
]

# Test each query with all search types
search_types = ['all', 'exact', 'any']

# Define custom weights for testing
custom_weights = {
    "bm25_weight": 0.5,
    "exact_match_weight": 5,
    "review_weight": 0.2,
    "title_match_weight": 0.3,
    "origin_match_weight": 0.1
}

# Execute searches and display results
for query in test_queries:
    print(f"\nTesting query: {query}")
    for search_type in search_types:
        print(f"\nSearch type: {search_type}")
        results = search_engine.execute_search(query, search_mode=search_type, save_results=False, **custom_weights)
        print(f"Found {results['metadata']['document_count']} documents")
        
        # Display the top 3 results
        for i, (doc_url, doc_data) in enumerate(results['ranked_documents'][:3], 1):
            print(f"\n{i}. {search_engine.products[doc_url]['title']} (Score: {doc_data['final_score']:.3f})")
            print(f"URL: {doc_url}")
            for component, score in doc_data.items():
                if component != 'final_score':
                    print(f"  - {component}: {score:.3f}")
            if 'description' in search_engine.products[doc_url] and search_engine.products[doc_url]['description']:
                print(f"Description: {search_engine.products[doc_url]['description'][:200]}...")



Testing query: Box of Chocolate Candy

Search type: all
Found 21 documents

1. Box of Chocolate Candy (Score: 9.512)
URL: https://web-scraping.dev/product/1
  - bm25_score: 3.236
  - exact_match_score: 5.000
  - review_score: 0.376
  - title_match_score: 0.900
  - origin_match_score: 0.000
Description: Whether you're looking for the perfect gift or just want to treat yourself, our box of chocolate candy is sure to satisfy. Each box contains an assortment of rich, flavorful chocolates with a smooth, ...

2. Box of Chocolate Candy (Score: 9.476)
URL: https://web-scraping.dev/product/25
  - bm25_score: 3.200
  - exact_match_score: 5.000
  - review_score: 0.376
  - title_match_score: 0.900
  - origin_match_score: 0.000
Description: Whether you're looking for the perfect gift or just want to treat yourself, our box of chocolate candy is sure to satisfy. Indulge your sweet tooth with our box of chocolate candy. Choose from a varie...

3. Box of Chocolate Candy (Score: 9.350)
URL: https://we

Finally, putting a heavy weight on the review score could be really interesting in some cases, for example when a user wants to buy a product and is trying to choose the best one. In this situation, our search engine would promote the links that other users, with the same query, have found to be the best answer in the database.
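
Before running the search again, the expected effect can be estimated from the components printed for the top result of the first run. This is only a sketch: it assumes that each printed component is the weight multiplied by a raw score, and that the final score is the sum of the components.

```python
# Weighted components printed for the top result of the first run
# (query "Cat-Ear Beanie america", review_weight = 0.2).
components = {
    "bm25_score": 2.924,
    "exact_match_score": 0.000,
    "review_score": 0.352,
    "title_match_score": 0.600,
    "origin_match_score": 0.100,
}

old_review_weight = 0.2
new_review_weight = 5

# Assumption: printed component = weight * raw score, so dividing by
# the old weight recovers the raw review score.
raw_review = components["review_score"] / old_review_weight

# Copy the components and replace only the review contribution.
rescaled = dict(components, review_score=raw_review * new_review_weight)

print(f"final score before: {sum(components.values()):.3f}")
print(f"review component after: {rescaled['review_score']:.3f}")
print(f"final score after: {sum(rescaled.values()):.3f}")
```

Under this assumption the review component grows from 0.352 to 8.800 and the final score from 3.976 to 12.424, which matches the run below with review_weight set to 5. Note that the two top results have the same review score, so their relative order does not change for this query.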

In [14]:
# Initialize search engine
search_engine = ProductSearchEngine()

# Example queries for testing
test_queries = [
    "Cat-Ear Beanie america"
]

# Test each query with all search types
search_types = ['all', 'exact', 'any']

# Define custom weights for testing
custom_weights = {
    "bm25_weight": 0.5,
    "exact_match_weight": 1.5,
    "review_weight": 5,
    "title_match_weight": 0.3,
    "origin_match_weight": 0.1
}

# Execute searches and display results
for query in test_queries:
    print(f"\nTesting query: {query}")
    for search_type in search_types:
        print(f"\nSearch type: {search_type}")
        results = search_engine.execute_search(query, search_mode=search_type, save_results=False, **custom_weights)
        print(f"Found {results['metadata']['document_count']} documents")
        
        # Display the top 3 results
        for i, (doc_url, doc_data) in enumerate(results['ranked_documents'][:3], 1):
            print(f"\n{i}. {search_engine.products[doc_url]['title']} (Score: {doc_data['final_score']:.3f})")
            print(f"URL: {doc_url}")
            for component, score in doc_data.items():
                if component != 'final_score':
                    print(f"  - {component}: {score:.3f}")
            if 'description' in search_engine.products[doc_url] and search_engine.products[doc_url]['description']:
                print(f"Description: {search_engine.products[doc_url]['description'][:200]}...")



Testing query: Cat-Ear Beanie america

Search type: all
Found 0 documents

Search type: exact
Found 0 documents

Search type: any
Found 31 documents

1. Cat-Ear Beanie - Pink small (Score: 12.424)
URL: https://web-scraping.dev/product/12?variant=pink-small
  - bm25_score: 2.924
  - exact_match_score: 0.000
  - review_score: 8.800
  - title_match_score: 0.600
  - origin_match_score: 0.100
Description: Available in a variety of colors like black, grey, white, pink, and blue, this beanie not only keeps you warm but also adds a playful element to your outfit. Add a touch of whimsy to your winter wardr...

2. Cat-Ear Beanie (Score: 12.420)
URL: https://web-scraping.dev/product/12
  - bm25_score: 2.920
  - exact_match_score: 0.000
  - review_score: 8.800
  - title_match_score: 0.600
  - origin_match_score: 0.100
Description: Add a touch of whimsy to your winter wardrobe with our cat ear beanie. Stay warm, look cute, and let your playful side shine with our cat ear beanie. Available in a var