# FinlyWealth Search Engine API Usage Demo

This notebook demonstrates how to use the FinlyWealth search engine backend API with different retrieval method combinations.


In [43]:
import requests
import json
import pandas as pd
from typing import Dict, List, Optional, Tuple
import os
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("Dependencies loaded successfully!")

Dependencies loaded successfully!


# API Setup and Initialization

To prepare the API for use, you need to start the Docker container, load the necessary data into the database, and then launch the backend service. These steps follow the instructions in `README.md`.

## Prerequisites

- Docker and Docker Compose installed
- Python environment set up
- `make` utility available
- `.env` file configured (see step 2)

---

## Getting Started

Follow these steps to get the API up and running:

### 1. Start the Database Container

Open your terminal and execute the following command to build and launch the database container in detached mode:

```bash
docker compose -f docker-compose.db.yml up -d
```

### 2. Configure Database Credentials

Create a `.env` file in the root directory and populate it with the required database credentials:

```env
# .env file - Database configuration
PGUSER=postgres
PGPASSWORD=postgres
PGHOST=localhost
PGPORT=5432
PGDATABASE=postgres
PGTABLE=products
```
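
Before launching the backend, it can help to confirm that every expected variable is set. The sketch below is a minimal check, assuming the six variable names in the example above are the full set the backend needs; `missing_pg_settings` is a hypothetical helper and not part of the project.

```python
import os
from typing import List, Mapping, Optional

# Variable names copied from the .env example; assumed to be the complete set.
REQUIRED_PG_VARS = ["PGUSER", "PGPASSWORD", "PGHOST", "PGPORT", "PGDATABASE", "PGTABLE"]


def missing_pg_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_PG_VARS if not source.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {
    "PGUSER": "postgres",
    "PGPASSWORD": "postgres",
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGDATABASE": "postgres",
}
print(missing_pg_settings(example_env))  # → ['PGTABLE']
```

Calling `missing_pg_settings()` with no argument checks the current process environment, which is useful after `load_dotenv()` has run.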

### 3. Load Example Data

Use `make` to load the example product data and associated images into your database:

```bash
make train
```

### 4. Launch the Backend API

Once the database is populated, start the backend API service by running the following Python script:

```bash
python src/backend/api.py
```

## API Client for Search Operations

This class provides a convenient interface for interacting with the search API.

In [44]:
class SearchAPIClient:
    """Client for interacting with the FinlyWealth search API."""
    
    def __init__(self, base_url: str = "http://localhost:5001"):
        """Initialize API client.

        Args:
            base_url: Base URL for the API
        """
        self.base_url = base_url
        self.session = requests.Session()
    
    def search_text(self, query: str, search_type: str = "text") -> Dict:
        """Perform text search.

        Args:
            query: Search query text
            search_type: Type of search ("text", "image", "multimodal")

        Returns:
            dict: Search results
        """
        try:
            data = {
                "query": query,
                "search_type": search_type
            }

            response = self.session.post(
                f"{self.base_url}/api/search",
                data=data,
                timeout=60
            )
            response.raise_for_status()
            return response.json()
        
        except requests.exceptions.RequestException as e:
            return {"error": f"Request failed: {str(e)}"}

    def search_image(self, image_path: str, search_type: str = "image") -> Dict:
        """Perform image search.

        Args:
            image_path: Path to image file or URL
            search_type: Type of search ("text", "image", "multimodal")

        Returns:
            dict: Search results
        """
        try:
            data = {
                "image_path": image_path,
                "search_type": search_type
            }

            response = self.session.post(
                f"{self.base_url}/api/search",
                data=data,
                timeout=60
            )
            response.raise_for_status()
            return response.json()
        
        except requests.exceptions.RequestException as e:
            return {"error": f"Request failed: {str(e)}"}

    def search_multimodal(self, query: str, image_path: str) -> Dict:
        """Perform multimodal search with both text and image.

        Args:
            query: Search query text
            image_path: Path to image file or URL

        Returns:
            dict: Search results
        """
        try:
            data = {
                "query": query,
                "image_path": image_path,
                "search_type": "multimodal"
            }

            response = self.session.post(
                f"{self.base_url}/api/search",
                data=data,
                timeout=60
            )
            response.raise_for_status()
            return response.json()
        
        except requests.exceptions.RequestException as e:
            return {"error": f"Request failed: {str(e)}"}

    def check_api_status(self) -> Dict:
        """Check API status and readiness.

        Returns:
            dict: API status information
        """
        try:
            response = self.session.get(f"{self.base_url}/api/ready", timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            return {"error": f"Request failed: {str(e)}"}

# Initialize API client
api_client = SearchAPIClient()
print("API client initialized!")

API client initialized!
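
The backend may take a moment to become ready after launch. The following is a minimal polling sketch, assuming that a successful `/api/ready` response does not itself contain an `error` key (the client above only adds that key when the request fails). `wait_until_ready` is a hypothetical helper, not part of the project.

```python
import time
from typing import Callable, Dict


def wait_until_ready(check: Callable[[], Dict], retries: int = 10, delay_sec: float = 2.0) -> bool:
    """Call `check` until it returns a dict without an "error" key.

    Returns True as soon as one check succeeds, False if every attempt fails.
    """
    for attempt in range(retries):
        status = check()
        if "error" not in status:
            return True
        # Sleep between attempts, but not after the last one
        if attempt < retries - 1:
            time.sleep(delay_sec)
    return False


# Usage with the client defined above (commented out so the sketch runs on its own):
# wait_until_ready(api_client.check_api_status)
```

Passing the status check as a callable keeps the helper independent of the client class, so it can be exercised without a running server.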


## Vanilla Requests

This section shows how normal requests can be carried out via the API. These include:
- Text only
- Image only
- Image and text

## Text Only Search

In [45]:
text_query = "leather shoes"

results = api_client.search_text(text_query, search_type="text")


In [46]:
results.keys()


dict_keys(['average_price', 'brand_distribution', 'category_distribution', 'elapsed_time_sec', 'price_range', 'reasoning', 'results', 'session_id'])

The request returns a dictionary with the following structure:

```python
{
    "average_price": 23.45,  # Average price of selected items
    "brand_distribution": {"Nike": 5, "Adidas": 3},  # Frequency per brand
    "category_distribution": {"Shoes": 4, "Hats": 2},  # Frequency per category
    "elapsed_time_sec": 1.23,  # Processing time
    "price_range": [10.0, 50.0],  # Min and max prices
    "reasoning": "Filtered by top-rated shoes",  # Explanation of logic
    "results": [...],  # List of results (products, recommendations, etc.)
    "session_id": "a1b2c3d4"  # Unique identifier for the session
}
```

### Output Dictionary Structure

| Key                  | Description                                                      | Example Value                      |
|----------------------|------------------------------------------------------------------|------------------------------------|
| `average_price`      | Average price of the selected products                           | `23.45`                            |
| `brand_distribution` | Count or percentage breakdown of brands in the selection         | `{'Nike': 5, 'Adidas': 3}`         |
| `category_distribution` | Count or percentage breakdown by product category             | `{'Shoes': 4, 'Hats': 2}`          |
| `elapsed_time_sec`   | Time taken to process the request in seconds                     | `1.23`                             |
| `price_range`        | Min and max prices observed in the selection                     | `[10.0, 50.0]`                     |
| `reasoning`          | Text explanation of how results were chosen                      | `"Filtered by top-rated shoes"`    |
| `results`            | List of resulting product records or summaries                   | `[...]`                            |
| `session_id`         | Unique session identifier                                        | `"a1b2c3d4"`                        |
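
As an illustration of working with these keys, the sketch below builds a one-line summary from a response. It assumes the keys documented in the table are present on success and that failures carry an `error` key, as returned by the client class above. `summarize_response` is a hypothetical helper, and the sample values are the illustrative ones from the table.

```python
from typing import Dict


def summarize_response(response: Dict) -> str:
    """Build a one-line summary from a search response dictionary."""
    if "error" in response:
        return f"Search failed: {response['error']}"
    low, high = response["price_range"]
    return (
        f"{len(response['results'])} results in {response['elapsed_time_sec']} s, "
        f"average price {response['average_price']}, range {low} to {high}"
    )


sample = {
    "average_price": 23.45,
    "elapsed_time_sec": 1.23,
    "price_range": [10.0, 50.0],
    "results": [{}, {}, {}],  # placeholder records
}
print(summarize_response(sample))
# → 3 results in 1.23 s, average price 23.45, range 10.0 to 50.0
```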


## Image Only Search

In [47]:
# Image search: we pass the path to the image file
image_query = "data/images/127.2.DFF8DD86A0648144.935C81483CAB9DD1.810122210719.jpeg"

results_image = api_client.search_image(image_query)

In [48]:
results_image


{'average_price': 94.5,
 'brand_distribution': {'Dockers': 50, 'Pazstor': 50},
 'category_distribution': {'Apparel & Accessories >Shoes ': 100},
 'elapsed_time_sec': 0.192,
 'price_range': [80.0, 109.0],
 'reasoning': 'Image search only, no LLM reordering performed',
 'results': [{'Brand': 'Pazstor',
   'Category': 'Apparel & Accessories >Shoes ',
   'Color': 'Barista brown',
   'Description': 'These oxfords are designed to fit ergonomically offering premium Comfort, Made of Premium soft lambskin Leather. Whole size only, please choose one number above if you usually wear half number (e.g. if your size is 7.5 then go up to 8) Made in Mexico',
   'Gender': 'male',
   'Name': "Pazstor Men's Premium Comfort Lambskin Leather Oxfords Classic - Barista brown",
   'Pid': '127.2.DFF8DD86A0648144.935C81483CAB9DD1.810122210719',
   'Price': '109.0',
   'Size': '12',
   'similarity': 1.0},
  {'Brand': 'Dockers',
   'Category': 'Apparel & Accessories >Shoes ',
   'Color': 'Dark Tan',
   'Descripti

The result has the same structure as the output of the text search.
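
Since the full response is verbose, a compact view of the top matches can be easier to read. This is a minimal sketch that relies only on the field names visible in the sample output (`Name`, `Brand`, `Price`, `similarity`). It assumes other search types return the same fields, which the notebook does not confirm, so missing fields fall back to `None`. `top_matches` is a hypothetical helper.

```python
from typing import Dict, List


def top_matches(response: Dict, limit: int = 5) -> List[Dict]:
    """Return name, brand, price, and similarity for the first `limit` results."""
    rows = []
    for item in response.get("results", [])[:limit]:
        rows.append({
            "name": item.get("Name"),
            "brand": item.get("Brand"),
            # Price arrives as a string such as '109.0' in the sample output
            "price": float(item["Price"]) if item.get("Price") else None,
            "similarity": item.get("similarity"),
        })
    return rows


# Record trimmed from the image search output above
sample = {
    "results": [
        {"Name": "Pazstor Men's Premium Comfort Lambskin Leather Oxfords Classic - Barista brown",
         "Brand": "Pazstor", "Price": "109.0", "similarity": 1.0},
    ]
}
for row in top_matches(sample):
    print(row["brand"], row["price"], row["similarity"])
```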

## Hybrid Search without LLM Reranking

To search without LLM reranking, comment out your API keys in the `.env` file and restart the backend. In this case, I commented out the API keys in `.env` and reran `python src/backend/api.py`.

> If `python src/backend/api.py` fails with an error, run `make clean` to free the port.

In [55]:

# Hybrid search, we pass both the image and text
from dotenv import load_dotenv

load_dotenv()

image_query = "data/images/127.2.DFF8DD86A0648144.935C81483CAB9DD1.810122210719.jpeg"
search_query = "black shoes"
results_image = api_client.search_multimodal(query=search_query, image_path=image_query)

results_image

{'average_price': 143.96,
 'brand_distribution': {'8 by YOOX': 6,
  'Bernardo Footwear': 6,
  'Clarks': 6,
  'Converse': 6,
  'Deer Stags': 6,
  'Dv Dolce Vita': 6,
  'Easy Spirit': 6,
  'I.n.c. International Concepts': 6,
  'Kenneth Cole Reaction': 6,
  'Kingsize': 6,
  'Naturalizer': 12,
  'Stacy Adams': 6,
  'Teva': 6,
  'Trotters': 6,
  'Vince Camuto': 6,
  'adidas': 6},
 'category_distribution': {'Apparel & Accessories >Shoes ': 100},
 'elapsed_time_sec': 1.417,
 'price_range': [49.99, 400.0],
 'reasoning': 'No API key available, no LLM reordering performed',
 'results': [{'Brand': 'Stacy Adams',
   'Category': 'Apparel & Accessories >Shoes ',
   'Color': 'Black',
   'Description': 'The rich hand burnishing on the Stacy Adams Bryant cap-toe Oxford highlights the intricate lines of broguing and quality of its leather. The sophisticated style and all-day comfort of the Bryant make it the perfect choice for any special occasion.',
   'Gender': 'male',
   'Name': "Stacy Adams Men's Br

In [56]:
results_image["reasoning"]

'No API key available, no LLM reordering performed'

In this case, since the API keys were removed from the `.env` file, no LLM reranking was applied.
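
The `reasoning` field can also be checked programmatically. The sketch below is a heuristic only: it looks for the phrase shared by the two "skipped" messages shown in this notebook and assumes the backend does not use other wording. `llm_reranking_applied` is a hypothetical helper.

```python
from typing import Dict

# Phrase shared by both "reasoning" strings shown in this notebook when reranking is skipped
NO_RERANK_MARKER = "no llm reordering performed"


def llm_reranking_applied(response: Dict) -> bool:
    """Heuristic: True when the reasoning text does not report a skipped reranking."""
    reasoning = response.get("reasoning", "")
    return bool(reasoning) and NO_RERANK_MARKER not in reasoning.lower()


print(llm_reranking_applied({"reasoning": "No API key available, no LLM reordering performed"}))  # → False
```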

## Changing the Weights of the Hybrid Retrieval

Lines `308`, `311`, and `314` of `src/backend/api.py` show how a weight is assigned to each type of retrieval. You can modify these weights to fine-tune the hybrid search performance.

To adjust the retrieval weights:

> Remember that the weights must sum to one (1).

### 1. Text-Only Search
The default for the text-only search is `[0.5, 0, 0.5]`. This distribution allocates 0.5 to the fusion embeddings (representing a combined score), 0 to CLIP (as it's not a CLIP-only search), and 0.5 to TF-IDF.
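
The backend's actual scoring code is not shown in this notebook. As a rough mental model only, weights like these are often applied as a weighted sum of per-method scores. The sketch below assumes that form and the `[fusion, CLIP, TF-IDF]` ordering described above; `combine_scores` and the example scores are hypothetical.

```python
from typing import Sequence


def combine_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-method scores, ordered as [fusion, CLIP, TF-IDF]."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical scores for one product: fusion=0.8, CLIP=0.6, TF-IDF=0.4
scores = [0.8, 0.6, 0.4]
print(combine_scores(scores, [0.5, 0, 0.5]))  # default text-only weights
print(combine_scores(scores, [0.1, 0, 0.9]))  # TF-IDF-heavy weights
```

Under this assumption, shifting weight toward TF-IDF makes exact keyword overlap count for more than embedding similarity in the final ranking.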

To increase the emphasis on TF-IDF in a text-only search (e.g., to 0.9) while reducing the weight of the fusion embeddings (e.g., to 0.1), open `config/db.py` and modify the `SEARCH_WEIGHTS` dictionary as follows:

```python
SEARCH_WEIGHTS = {
    "text_only": [0.1, 0, 0.9],
    "image_only": [0, 1, 0],
    "hybrid": [0.5, 0, 0.5],

}
```

After making these adjustments in `config/db.py`, you must restart the backend API for the new weights to take effect. Stop the currently running backend process (e.g., by pressing Ctrl+C in your terminal) and then run the launch command again: `python src/backend/api.py`.

### 2. Hybrid Search
This follows the same logic, but in this case you modify the `hybrid` key in the dictionary, for example:

```python
SEARCH_WEIGHTS = {
    "text_only": [0.5, 0, 0.5],
    "image_only": [0, 1, 0],
    "hybrid": [0.7, 0, 0.3],

}
```
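
Because each weight list has to sum to 1, a quick check before restarting the backend can catch typos. This is a minimal sketch that works on a copy of the dictionary; `invalid_weight_keys` is a hypothetical helper, and the non-negativity check is an added assumption that the text above does not state.

```python
import math
from typing import Dict, List


def invalid_weight_keys(search_weights: Dict[str, List[float]]) -> List[str]:
    """Return the keys whose weights are negative or do not sum to 1."""
    bad = []
    for key, weights in search_weights.items():
        # Non-negativity is an assumption; the sum-to-1 rule comes from the text
        if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
            bad.append(key)
    return bad


SEARCH_WEIGHTS = {
    "text_only": [0.5, 0, 0.5],
    "image_only": [0, 1, 0],
    "hybrid": [0.7, 0, 0.3],
}
print(invalid_weight_keys(SEARCH_WEIGHTS))  # → []
```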

## Text Search after a Different Weight Distribution `[0.1, 0, 0.9]`

In [51]:
text_query = "leather shoes"

results = api_client.search_text(text_query, search_type="text")

In [52]:
results

{'average_price': 348.87,
 'brand_distribution': {'8 by YOOX': 6,
  'BARRACUDA': 6,
  'Bella Vita': 6,
  'Clarks': 6,
  "DOUCAL'S": 6,
  'Dockers': 6,
  'HOGAN': 6,
  'Naturalizer': 18,
  'Pazstor': 6,
  'SAINT LAURENT': 6,
  'SANTONI': 6,
  'STELE': 6,
  'Soul Naturalizer': 6,
  "TOD'S": 6,
  'Vince Camuto': 6},
 'category_distribution': {'Apparel & Accessories >Shoes ': 100},
 'elapsed_time_sec': 2.524,
 'price_range': [80.0, 825.0],
 'reasoning': "The reordering is based on the following factors: \n  1. Direct keyword match: Results with 'leather' in the name are ranked higher.\n  2. Brand Name mentions: Results from well-known brands like Clarks, Vince Camuto, and Naturalizer are given priority.\n  3. Semantic similarity to the query intent: Results that closely match the search query 'leather shoes' are ranked higher.\n  4. Price comparison: Lower-priced items are given some weight in the ranking.",
 'results': [{'Brand': 'Clarks',
   'Category': 'Apparel & Accessories >Shoes ',
 

The results here differ from the first search even though the query is the same.
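
To quantify how much two runs differ, the product IDs of both responses can be compared. The sketch below assumes every result record carries a `Pid` field, as in the sample outputs. `ranked_pids` and `compare_rankings` are hypothetical helpers, and the IDs in the example are placeholders.

```python
from typing import Dict, List


def ranked_pids(response: Dict) -> List[str]:
    """Product IDs in the order the API returned them."""
    return [item["Pid"] for item in response.get("results", [])]


def compare_rankings(before: Dict, after: Dict) -> Dict[str, List[str]]:
    """Split product IDs into those shared by both responses and those unique to each."""
    old, new = ranked_pids(before), ranked_pids(after)
    return {
        "shared": [pid for pid in new if pid in old],
        "only_before": [pid for pid in old if pid not in new],
        "only_after": [pid for pid in new if pid not in old],
    }


# Placeholder responses standing in for the two "leather shoes" searches
first_run = {"results": [{"Pid": "A"}, {"Pid": "B"}, {"Pid": "C"}]}
second_run = {"results": [{"Pid": "C"}, {"Pid": "D"}, {"Pid": "A"}]}
print(compare_rankings(first_run, second_run))
# → {'shared': ['C', 'A'], 'only_before': ['B'], 'only_after': ['D']}
```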
