The purpose of this phase is to deploy Dynamic Alpha Tuning, which is a Novel framework that adaptively adjusts the retrieval weighting coefficient based on 
query-specific characteristics.

It contains a hybrid weighting parameter alpha, for each query that should reflect the relative effectiveness of sparse and dense methods.

We have done the LLM-Based retrieval effectiveness scoring for all 3 languages, but reference to be taken from the Italian Language, which has been integrated in the main solution

## Start of Phase 3 (Chinese)

In [9]:
import jieba
import numpy as np

In [64]:
from dotenv import load_dotenv
import os
load_dotenv(dotenv_path='../.env')
CHARRAN_API = os.getenv('CHARRAN_API')
CHERYL_API = os.getenv('CHERYL_API')

In [3]:
pip install dotenv

Collecting dotenv
  Downloading dotenv-0.9.9-py2.py3-none-any.whl.metadata (279 bytes)
Collecting python-dotenv (from dotenv)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading dotenv-0.9.9-py2.py3-none-any.whl (1.9 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, dotenv
Successfully installed dotenv-0.9.9 python-dotenv-1.1.0
Note: you may need to restart the kernel to use updated packages.


In [5]:
#Read the Chinese Embeddings
import pandas as pd
chinese_embeddings = pd.read_pickle("en_to_cn_embeddings.pkl")
chinese_embeddings

Unnamed: 0,title,chinese translation,english_embedding,chinese_embedding
0,Oppo A75 A75S A73 Phone Case Soft Rabbit Silic...,OPPO A75 A75s A73 手机壳 软壳 挂绳壳 大眼兔硅胶壳,"[-0.030606616, 0.010501585, -0.04400219, -0.00...","[-0.020788355, 0.032136466, -0.03952156, -0.04..."
1,SOFT 99 Coating Car Wax Strong Water Watt,SOFT 99 鍍膜車蠟(強力撥水型),"[-0.02521394, -0.0062141055, -0.02523462, -0.0...","[-0.013580757, -0.013445883, 0.013568486, -0.0..."
2,Low Sugar Mango Dry 250g Be The Royal,低糖芒果乾 250g 臻御行,"[-0.06998538, 0.025515176, -0.006934945, -0.02...","[-0.056555215, 0.015317621, 0.0015813652, -0.0..."
3,* the culture Japan Imported Round Top Space C...,＊小徑文化＊日本進口ROUND TOP space craft - diamond (SC-...,"[-0.003708915, 0.024768945, -0.062792934, 0.02...","[-0.018781146, 0.033165023, -0.05913993, 0.019..."
4,Hello Kitty Sandals Shoes White/Red Children n...,Hello Kitty 凱蒂貓 KITTY 涼鞋 童鞋 白/紅色 小童 no739,"[-0.019042147, 0.031313036, -0.06666778, 0.049...","[-0.043943617, 0.021419879, -0.059569906, 0.03..."
...,...,...,...,...
995,Hippored Torn Fun Unique Style Straight Jeans ...,【HippoRed】撕破乐趣★独特风格★中直筒牛仔裤 O591_445,"[-0.015312562, 0.002696402, -0.046150953, 0.00...","[-0.04397009, -0.013235806, -0.034632586, 0.03..."
996,Kids Set Table Bay - Thin Long Sleeve Home Sui...,兒童套裝 台灣製薄長袖居家套裝 魔法Baby~k60092,"[-0.00460147, 0.029976973, -0.080628425, 0.003...","[0.00086109334, 0.012746421, -0.04744607, 0.00..."
997,LONGCHAMP Le Pliage Neo High Density Nylon Bac...,LONGCHAMP Le Pliage Neo高密尼龍後背包(中型),"[-0.025269749, -0.050276544, -0.059641942, -0....","[-0.0399163, -0.031578567, -0.04178574, 0.0254..."
998,IFairies Opening Adjustable Ring ifairies [564...,iFairies 開口可調節戒指★ifairies【56472】【56472】,"[0.018464142, 0.016518341, -0.034174442, 0.007...","[0.029656759, 0.03718795, -0.042785533, -0.027..."


In [6]:
#Building BM25

english_titles = chinese_embeddings['title']
chinese_titles = chinese_embeddings['chinese translation']

tokenized_en = [title.split() for title in english_titles]
tokenized_cn = [title.split() for title in chinese_titles]

from rank_bm25 import BM25Okapi

bm25_en = BM25Okapi(tokenized_en)
bm25_cn = BM25Okapi(tokenized_cn)


In [12]:
def BM25(query, top_k=5):

    #To detect the language from the query
    from langdetect import detect

    lang = detect(query)
    #tokenising the query to be suitable for BM25
    tokens = list(jieba.cut(query))

    # --- BM25 Retrieval ---(Routes the tokenized query to the appropiate BM25 engine)
    if lang == 'cn':
        bm25_scores = bm25_cn.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens) #returns a list of scores, one for each document in the dataset

    bm25_ranks = {i: rank for rank, i in enumerate(np.argsort(bm25_scores)[::-1], start=1)} #Converts the float BM25 scores to rankings for Reciprocal rank fusion(rrf)

    # Sort by RRF score
    ranked = sorted(bm25_ranks.items(), key=lambda x: x[1], reverse=True)  #Sorting the docs by their RRF score, with the highest first.

    return ranked[:top_k]  # list of (doc_id, final_score)


In [17]:
def BM25(query, top_k=5):
    # To detect the language from the query
    from langdetect import detect

    lang = detect(query)
    # Tokenizing the query to be suitable for BM25
    tokens = list(jieba.cut(query))

    # --- BM25 Retrieval ---
    if lang == 'cn':
        bm25_scores = bm25_cn.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # Get top-k document IDs based on BM25 scores
    top_k_ids = sorted(range(len(bm25_scores)), key=lambda i: bm25_scores[i], reverse=True)[:top_k]

    # Return the document IDs and their corresponding BM25 scores
    return [(doc_id, bm25_scores[doc_id]) for doc_id in top_k_ids]

In [21]:
results = BM25("men running shoe")

for doc_id, score in results:
    print(f"{score:.4f} | {chinese_embeddings['chinese translation'][doc_id]}")

5.8797 | ADIDAS 愛迪達 D ROSE ENGLEWOOD IV TD 籃球鞋 男 S85564
5.3842 | New Balance 247 運動鞋 跑鞋 黑色 中童 童鞋 KA247T2P no338
5.3842 | NIKE 耐吉 WMNS NIKE AIR ZOOM PEGASUS 34 高階慢跑鞋 女 880560001
5.1665 | native JEFFERSON BLOCK 洞洞鞋 白色 男女鞋 11100102-8559 no410
0.0000 | OPPO A75 A75s A73 手机壳 软壳 挂绳壳 大眼兔硅胶壳


In [13]:
results = BM25("女式皮夹克")

for doc_id, score in results:
    print(f"{score:.4f} | {chinese_embeddings['title'][doc_id]}")


1000.0000 | Oppo A75 A75S A73 Phone Case Soft Rabbit Silicone Case
999.0000 | Tree De Sc Multifunctional Locker - Scm3 - 3M6S
998.0000 | COGHLANS Canada 0044 Finger Compass Thermometer Whistle
997.0000 | [With Incense] Jujube Pack 5 Pc / Pack (2 Pack) 【
996.0000 | Japan gex schisandra Kittens Water Dispenser 900Ml GE2316


In [None]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=CHERYL_API)


In [74]:
#Initialising Pinecone index
index = pc.Index('entocn')


In [76]:
# Generate IDs
en_ids = [f"en-{i}" for i in range(len(chinese_embeddings['english_embedding']))]
cn_ids = [f"cn-{i}" for i in range(len(chinese_embeddings['chinese_embedding']))]

In [77]:
# Helper function to batch upsert
def batch_upsert(index, vectors, batch_size=50):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i+batch_size]
        index.upsert(vectors=batch)


In [79]:
# Combine English + Chinese as before
to_upsert = list(zip(en_ids, chinese_embeddings['english_embedding'])) + \
            list(zip(cn_ids, chinese_embeddings['chinese_embedding']))

# Run batch upload
batch_upsert(index, to_upsert, batch_size=50)


#With this code, the vectorDB has been established for en to cn.   

Start of New Parts of the Code

In [15]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

  return self.fget.__get__(instance, owner)()


In [None]:
from openai import OpenAI

# Set your OpenAI API key
client = OpenAI(api_key="deepseek_API_KEY", base_url="https://openrouter.ai/api/v1")

def get_dynamic_alpha(question, dense_result, bm25_result):
    prompt = f"""You are an multilingual evaluator assessing the retrieval effectiveness of dense
retrieval (Cosine Distance) and BM25 retrieval for finding the correct product title of the corresponding target language, which is Chinese.

## Task:
Given a question and two top1 search results (one from dense retrieval,
one from BM25 retrieval), score each retrieval method from **0 to 5** based on whether the correct answer is likely to appear in top2, top3, etc.

### **Scoring Criteria:**
1. **Direct hit --> 5 points**
- If the retrieved document directly answers the question, assign **5 points**.
2. **Good wrong result (High likelihood correct answer is nearby) --> 3-4 points**
3. **Bad wrong result (Low likelihood correct answer is nearby) --> 1-2 points**
4. **Completely off-track --> 0 points**

### **Given Data:**
- **Question:** "{question}"

- **dense retrieval Top1 Result:** "{dense_result}"
- **BM25 retrieval Top1 Result:** "{bm25_result}"

### **Output Format:**
Return two integers separated by a space:
- **First number:** dense retrieval score.
- **Second number:** BM25 retrieval score.
"""

    response = client.chat.completions.create(
        model="deepseek/deepseek-chat-v3-0324:free",  
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    output = response.choices[0].message.content.strip()

    try:
        dense_score, bm25_score = map(int, output.split())
    except:
        dense_score = bm25_score = 3  # default fallback

    if dense_score == 5 and bm25_score != 5:
        return 1.0
    elif bm25_score == 5 and dense_score != 5:
        return 0.0
    elif dense_score == 0 and bm25_score == 0:
        return 0.5
    else:
        return dense_score / (dense_score + bm25_score)


In [None]:
# Step 2: Main hybrid retrieval with dynamic alpha (OG code)
def hybrid_search_dat(query, top_k=5):
    lang = detect(query)
    tokens = query.lower().split()

    # --- BM25 Search ---
    if lang == 'cn':
        bm25_scores = bm25_cn.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # --- Semantic Search (Pinecone) ---
    query_vec = model.encode(query).tolist()
    pinecone_results = index.query(vector=query_vec, top_k=top_k, include_metadata=False)

    # Parse Pinecone results
    pinecone_ids = [int(match['id'].split('-')[1]) for match in pinecone_results['matches']]
    pinecone_scores = [match['score'] for match in pinecone_results['matches']]

    # Get top-1 text from both for alpha calculation
    bm25_top_idx = int(np.argmax(bm25_scores))
    dense_top_idx = pinecone_ids[0]
    bm25_text = chinese_embeddings['title'][bm25_top_idx]
    dense_text = chinese_embeddings['title'][dense_top_idx]

    # --- Get dynamic alpha from GPT ---
    start = time.time()
    alpha = get_dynamic_alpha(query, dense_text, bm25_text)
    print(f"Alpha fetched: {alpha} in {time.time() - start:.2f}s")

    # --- Normalize Scores ---
    scaler = MinMaxScaler()
    bm25_norm = scaler.fit_transform(np.array(bm25_scores).reshape(-1, 1)).flatten()
    pinecone_norm = scaler.fit_transform(np.array(pinecone_scores).reshape(-1, 1)).flatten()

    # --- Combine scores using dynamic alpha ---
    hybrid_results = []
    for idx, semantic_score in zip(pinecone_ids, pinecone_norm):
        final_score = alpha * semantic_score + (1 - alpha) * bm25_norm[idx]
        hybrid_results.append((idx, final_score))

    # Sort by hybrid score
    hybrid_results.sort(key=lambda x: x[1], reverse=True)

    # --- Prepare detailed results ---
    detailed_results = []
    for idx, hybrid_score in hybrid_results[:top_k]:
        bm25_score = round(bm25_norm[idx], 4)
        semantic_score = round(pinecone_norm[pinecone_ids.index(idx)], 4)
        detailed_results.append((idx, round(hybrid_score, 4), bm25_score, semantic_score))

    return detailed_results


In [None]:
results = hybrid_search_dat("I want a blue shirt")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {chinese_embeddings['title'][idx]}")


Alpha fetched: 0.5 in 11.90s
0.5000 | 0.0000 | 1.0000 | Taste Sweet Soft Satin Sleep Shirt Blue - First Love Vibrato 【 Ab02355 - 1 】
0.4695 | 0.0000 | 0.9391 | Polarstar Windproof Warm Jacket Blue Green
0.1271 | 0.0000 | 0.9391 | Polarstar Windproof Warm Jacket Blue Green
0.1233 | 0.0000 | 0.2466 | KIKIKO Action Professional Sport Swimming Goggles (Blue)
0.0000 | 0.0000 | 0.0000 | Panties Cotton Cartoon Triangle Panties 5 Pcs Set - Blue Car 【 A257 】


In [46]:
results = hybrid_search_dat("Woman leather jacket")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {df_embeddings['chinese translation'][idx]}")

Alpha fetched: 0.5 in 6.99s
0.5000 | 0.0000 | 1.0000 | 🎀真皮女用小手提/斜背二用包🎀
0.0966 | 0.0000 | 0.1932 | 正韓 素面質感羽絨棉外套 艾爾莎【TA570100】
0.0720 | 0.0000 | 0.1441 | Montane 女 輕量防潑水羽絨 連帽外套 孔雀藍 FFEDJ
0.0364 | 0.0000 | 1.0000 | 🎀真皮女用小手提/斜背二用包🎀
0.0000 | 0.0000 | 0.0000 | 乐福鞋女鞋春季单鞋女平底英伦风百搭小皮鞋女


## Phase 3: Using LLMs to improve search query (Spanish)

In [11]:
import numpy as np
from langdetect import detect
from sklearn.preprocessing import MinMaxScaler
from rank_bm25 import BM25Okapi
import pandas as pd
import time

In [12]:
df_embeddings = pd.read_pickle("en_to_sp_embeddings.pkl")

In [13]:
df_embeddings

Unnamed: 0,title,title_spanish,english_embedding,spanish_embedding
0,brother genuine high yield toner cartridge tn4...,hermano genuino cartucho tóner de alto rendimi...,"[-0.03431117, 0.025899883, -0.00967014, -0.019...","[0.012239528, 0.02652684, 0.002397126, -0.0288..."
1,fitbit inspire 3 health and fitness tracker wi...,fitbit inspirar 3 seguimiento de salud y fitne...,"[-0.0016011602, -0.002595037, -0.07348455, 0.0...","[-0.011861571, -0.009732766, -0.06545575, -0.0..."
2,mikes hot honey americas 1 brand of hot honey ...,mikes miel caliente américas 1 marca de miel c...,"[-0.0004525112, -0.009976895, -0.015700651, 0....","[-0.031901788, 0.017521167, -0.04371976, 0.039..."
3,krema kréma red fruits 100 recyclable 240g,krema kréma frutos rojos 100 reciclables 240g,"[-0.011189645, 0.033041686, -0.005376764, -0.0...","[-0.013215443, 0.0015486346, -0.020853952, -0...."
4,drsalts calming therapy epsom salts soothing ...,drsalts calmante terapia epsom sales calmantes...,"[0.018024862, -0.015684763, -0.062142983, -0.0...","[0.008137982, 0.009916707, -0.07349886, -0.013..."
...,...,...,...,...
991,ruimen smart watches for men women answermake ...,ruimen relojes inteligentes para hombres mujer...,"[-0.022698322, 0.004262252, -0.06492456, -0.01...","[-0.015060791, 0.010321501, -0.057668064, -0.0..."
992,musicozy sleep headphones bluetooth 54 headban...,auriculares musicozy sueño bluetooth 54 diadem...,"[-0.0110038, 0.028441783, -0.065515295, 0.0328...","[0.006726083, 0.042338137, -0.0548927, 0.00642..."
993,sun ninja pop up beach tent sun shelter upf50 ...,sun ninja pop up playa refugio de sol upf50 co...,"[-0.018024122, -0.008911157, -0.09137453, 0.00...","[-0.0045234896, 0.003032705, -0.079418756, 0.0..."
994,rhino usa trailer hitch pin 2 inch patented 58...,enganche de remolque de rinoceronte usa pin de...,"[-0.011390688, -0.004701349, -0.009233302, 0.0...","[0.023057196, 0.013233271, 0.0004464224, 0.017..."


In [14]:
#Building BM25

english_titles = df_embeddings['title']
spanish_titles = df_embeddings['title_spanish']

tokenized_en = [title.split() for title in english_titles]
tokenized_es = [title.split() for title in spanish_titles]



bm25_en = BM25Okapi(tokenized_en)
bm25_es = BM25Okapi(tokenized_es)

In [None]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=CHERYL_API)

In [16]:
#Initialising Pinecone index
index = pc.Index('entoes')

In [17]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

In [None]:
from openai import OpenAI

# Set your OpenAI API key
client = OpenAI(api_key="deepseek_API_KEY", base_url="https://openrouter.ai/api/v1")

def get_dynamic_alpha(question, dense_result, bm25_result):
    prompt = f"""You are an multilingual evaluator assessing the retrieval effectiveness of dense
retrieval (Cosine Distance) and BM25 retrieval for finding the correct product title of the corresponding target language, which is italian.

## Task:
Given a question and two top1 search results (one from dense retrieval,
one from BM25 retrieval), score each retrieval method from **0 to 5** based on whether the correct answer is likely to appear in top2, top3, etc.

### **Scoring Criteria:**
1. **Direct hit --> 5 points**
- If the retrieved document directly answers the question, assign **5 points**.
2. **Good wrong result (High likelihood correct answer is nearby) --> 3-4 points**
3. **Bad wrong result (Low likelihood correct answer is nearby) --> 1-2 points**
4. **Completely off-track --> 0 points**

### **Given Data:**
- **Question:** "{question}"

- **dense retrieval Top1 Result:** "{dense_result}"
- **BM25 retrieval Top1 Result:** "{bm25_result}"

### **Output Format:**
Return two integers separated by a space:
- **First number:** dense retrieval score.
- **Second number:** BM25 retrieval score.
"""

    response = client.chat.completions.create(
        model="deepseek/deepseek-chat-v3-0324:free",  
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    output = response.choices[0].message.content.strip()

    try:
        dense_score, bm25_score = map(int, output.split())
    except:
        dense_score = bm25_score = 3  # default fallback

    if dense_score == 5 and bm25_score != 5:
        return 1.0
    elif bm25_score == 5 and dense_score != 5:
        return 0.0
    elif dense_score == 0 and bm25_score == 0:
        return 0.5
    else:
        return dense_score / (dense_score + bm25_score)


In [20]:
# Step 2: Main hybrid retrieval with dynamic alpha
def hybrid_search_dat(query, top_k=5):
    lang = detect(query)
    tokens = query.lower().split()

    # --- BM25 Search ---
    if lang == 'es':
        bm25_scores = bm25_es.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # --- Semantic Search (Pinecone) ---
    query_vec = model.encode(query).tolist()
    pinecone_results = index.query(vector=query_vec, top_k=top_k, include_metadata=False)

    # Parse Pinecone results
    pinecone_ids = [int(match['id'].split('-')[1]) for match in pinecone_results['matches']]
    pinecone_scores = [match['score'] for match in pinecone_results['matches']]

    # Get top-1 text from both for alpha calculation
    bm25_top_idx = int(np.argmax(bm25_scores))
    dense_top_idx = pinecone_ids[0]
    bm25_text = df_embeddings['title_spanish'][bm25_top_idx]
    dense_text = df_embeddings['title_spanish'][dense_top_idx]

    # --- Get dynamic alpha from GPT ---
    start = time.time()
    alpha = get_dynamic_alpha(query, dense_text, bm25_text)
    print(f"Alpha fetched: {alpha} in {time.time() - start:.2f}s")

    # --- Normalize Scores ---
    scaler = MinMaxScaler()
    bm25_norm = scaler.fit_transform(np.array(bm25_scores).reshape(-1, 1)).flatten()
    pinecone_norm = scaler.fit_transform(np.array(pinecone_scores).reshape(-1, 1)).flatten()

    # --- Combine scores using dynamic alpha ---
    hybrid_results = []
    for idx, semantic_score in zip(pinecone_ids, pinecone_norm):
        final_score = alpha * semantic_score + (1 - alpha) * bm25_norm[idx]
        hybrid_results.append((idx, final_score))

    # Sort by hybrid score
    hybrid_results.sort(key=lambda x: x[1], reverse=True)

    # --- Prepare detailed results ---
    detailed_results = []
    for idx, hybrid_score in hybrid_results[:top_k]:
        bm25_score = round(bm25_norm[idx], 4)
        semantic_score = round(pinecone_norm[pinecone_ids.index(idx)], 4)
        detailed_results.append((idx, round(hybrid_score, 4), bm25_score, semantic_score))

    return detailed_results

In [None]:
# Step 2: Main hybrid retrieval with dynamic alpha (OG code)
def hybrid_search_dat(query, top_k=5):
    lang = detect(query)
    tokens = query.lower().split()

    # --- BM25 Search ---
    if lang == 'es':
        bm25_scores = bm25_es.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # --- Semantic Search (Pinecone) ---
    query_vec = model.encode(query).tolist()
    pinecone_results = index.query(vector=query_vec, top_k=top_k, include_metadata=False)

    # Parse Pinecone results
    pinecone_ids = [int(match['id'].split('-')[1]) for match in pinecone_results['matches']]
    pinecone_scores = [match['score'] for match in pinecone_results['matches']]

    # Get top-1 text from both for alpha calculation
    bm25_top_idx = int(np.argmax(bm25_scores))
    dense_top_idx = pinecone_ids[0]
    bm25_text = df_embeddings['title'][bm25_top_idx]
    dense_text = df_embeddings['title'][dense_top_idx]

    # --- Get dynamic alpha from GPT ---
    start = time.time()
    alpha = get_dynamic_alpha(query, dense_text, bm25_text)
    print(f"Alpha fetched: {alpha} in {time.time() - start:.2f}s")

    # --- Normalize Scores ---
    scaler = MinMaxScaler()
    bm25_norm = scaler.fit_transform(np.array(bm25_scores).reshape(-1, 1)).flatten()
    pinecone_norm = scaler.fit_transform(np.array(pinecone_scores).reshape(-1, 1)).flatten()

    # --- Combine scores using dynamic alpha ---
    hybrid_results = []
    for idx, semantic_score in zip(pinecone_ids, pinecone_norm):
        final_score = alpha * semantic_score + (1 - alpha) * bm25_norm[idx]
        hybrid_results.append((idx, final_score))

    # Sort by hybrid score
    hybrid_results.sort(key=lambda x: x[1], reverse=True)

    # --- Prepare detailed results ---
    detailed_results = []
    for idx, hybrid_score in hybrid_results[:top_k]:
        bm25_score = round(bm25_norm[idx], 4)
        semantic_score = round(pinecone_norm[pinecone_ids.index(idx)], 4)
        detailed_results.append((idx, round(hybrid_score, 4), bm25_score, semantic_score))

    return detailed_results


In [21]:
results = hybrid_search_dat("I want a blue shirt")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {df_embeddings['title'][idx]}")


Alpha fetched: 0.5 in 6.93s
0.6194 | 0.8298 | 0.4089 | rugshop dublin contemporary abstract stain resistant soft runner rug 2 x 7 blue
0.5000 | 0.0000 | 1.0000 | lindor shaped ball bag dark
0.4883 | 0.9456 | 0.0309 | simplicity creative patterns sleeves for tops vest jackets coats a 10121416182022
0.1717 | 0.0000 | 1.0000 | lindor shaped ball bag dark
0.0000 | 0.0000 | 0.0000 | tiblue insulated lunch bag  leakproof freezable cooler for office school picnics  adjustable strap for adults  kids


In [36]:
results = hybrid_search_dat("Woman leather jacket")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {df_embeddings['title_spanish'][idx]}")

Alpha fetched: 0.5 in 5.53s
1.0000 | 1.0000 | 1.0000 | megagear slr dslr sierra serie auténtica cámara de cuero hombro o correa de cuello negro
0.1814 | 0.0000 | 0.3628 | mochila de almuerzo de artelaris para las mujeres mochila aislante elegante para las mujeres mochila de viaje impermeable mochila de almuerzo de las mujeres mochila más fría mochila de la lonchera mochila para profesor enfermera trabajo picnic book bag
0.1665 | 0.0000 | 0.3330 | znsayotx 1 pieza sillón jacquard slipcovers para salón sillón alto cubre con brazos anti slip mascotas agradable sofá sofá sillón cubierta muebles protector silla gris claro
0.1396 | 0.0000 | 0.2793 | ajuste bolsa de almuerzo fresco para mujeres aisladas bolsa de almuerzo para el trabajo resistente a las manchas caja de almuerzo grande para las mujeres con contenedores vaso de hielo cierre cremallera bolsa de wichita hojas de palma
0.0000 | 0.0000 | 0.0000 | simpleza patrones creativos mangas para tops chaleco chaquetas abrigos un 101214161820

In [34]:
results = hybrid_search_dat("Woman leather jacket")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {df_embeddings['title_spanish'][idx]}")

Alpha fetched: 0.5 in 15.83s
1.0000 | 1.0000 | 1.0000 | megagear slr dslr sierra serie auténtica cámara de cuero hombro o correa de cuello negro
0.1814 | 0.0000 | 0.3628 | mochila de almuerzo de artelaris para las mujeres mochila aislante elegante para las mujeres mochila de viaje impermeable mochila de almuerzo de las mujeres mochila más fría mochila de la lonchera mochila para profesor enfermera trabajo picnic book bag
0.1665 | 0.0000 | 0.3330 | znsayotx 1 pieza sillón jacquard slipcovers para salón sillón alto cubre con brazos anti slip mascotas agradable sofá sofá sillón cubierta muebles protector silla gris claro
0.1396 | 0.0000 | 0.2793 | ajuste bolsa de almuerzo fresco para mujeres aisladas bolsa de almuerzo para el trabajo resistente a las manchas caja de almuerzo grande para las mujeres con contenedores vaso de hielo cierre cremallera bolsa de wichita hojas de palma
0.0000 | 0.0000 | 0.0000 | simpleza patrones creativos mangas para tops chaleco chaquetas abrigos un 10121416182

In [33]:
print(results)

[(180, 0.9242, 0.8485, 1.0), (985, 0.9242, 0.8485, 1.0), (174, 0.5, 1.0, 0.0), (407, 0.2279, 0.0, 0.4559), (407, 0.2279, 0.0, 0.4559)]


Next step is to evaluate the top most result with the input query uusing evaluation metrics

In [30]:
pip install comet-ml sacrebleu


Collecting comet-ml
  Downloading comet_ml-3.49.7-py3-none-any.whl.metadata (4.1 kB)
Collecting dulwich!=0.20.33,>=0.20.6 (from comet-ml)
  Downloading dulwich-0.22.8-cp311-cp311-win_amd64.whl.metadata (5.1 kB)
Collecting everett<3.2.0,>=1.0.1 (from everett[ini]<3.2.0,>=1.0.1->comet-ml)
  Downloading everett-3.1.0-py2.py3-none-any.whl.metadata (17 kB)
Collecting python-box<7.0.0 (from comet-ml)
  Downloading python_box-6.1.0-cp311-cp311-win_amd64.whl.metadata (7.8 kB)
Collecting requests-toolbelt>=0.8.0 (from comet-ml)
  Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl.metadata (14 kB)
Collecting rich>=13.3.2 (from comet-ml)
  Downloading rich-14.0.0-py3-none-any.whl.metadata (18 kB)
Collecting semantic-version>=2.8.0 (from comet-ml)
  Downloading semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting simplejson (from comet-ml)
  Downloading simplejson-3.20.1-cp311-cp311-win_amd64.whl.metadata (3.4 kB)
Collecting wurlitzer>=1.0.2 (from comet-ml)
  Downloading

In [38]:
pip install unbabel-comet

Collecting unbabel-comet
  Downloading unbabel_comet-2.2.5-py3-none-any.whl.metadata (19 kB)
Collecting entmax<2.0,>=1.1 (from unbabel-comet)
  Downloading entmax-1.3-py3-none-any.whl.metadata (348 bytes)
Collecting jsonargparse==3.13.1 (from unbabel-comet)
  Downloading jsonargparse-3.13.1-py3-none-any.whl.metadata (55 kB)
Collecting protobuf<5.0.0,>=4.24.4 (from unbabel-comet)
  Downloading protobuf-4.25.6-cp310-abi3-win_amd64.whl.metadata (541 bytes)
Collecting pytorch-lightning<3.0.0,>=2.0.0 (from unbabel-comet)
  Downloading pytorch_lightning-2.5.1-py3-none-any.whl.metadata (20 kB)
Collecting torchmetrics<0.11.0,>=0.10.2 (from unbabel-comet)
  Downloading torchmetrics-0.10.3-py3-none-any.whl.metadata (15 kB)
Collecting torch>=1.6.0 (from unbabel-comet)
  Downloading torch-2.6.0-cp311-cp311-win_amd64.whl.metadata (28 kB)
Collecting lightning-utilities>=0.10.0 (from pytorch-lightning<3.0.0,>=2.0.0->unbabel-comet)
  Downloading lightning_utilities-0.14.3-py3-none-any.whl.metadata (5.

  You can safely remove it manually.
  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.6.0 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.6.0 which is incompatible.
trankit 1.1.2 requires torch<=2.0.1,>=1.6.0, but you have torch 2.6.0 which is incompatible.


In [37]:
from nltk.translate.meteor_score import meteor_score
from nltk.tokenize import word_tokenize

top_idx = results[0][0]  # index of top-1 result

reference_query = "Woman leather jacket"
predicted_title = df_embeddings['title_spanish'][top_idx]

# Tokenize both properly
reference_tokens = word_tokenize(reference_query)
predicted_tokens = word_tokenize(predicted_title)

# Pass the reference as a *list of references* (each is token list)
meteor = meteor_score([reference_tokens], predicted_tokens)

print(f"METEOR Score: {meteor:.4f}")


METEOR Score: 0.0000


In [38]:
from bert_score import score

# Inputs
reference_query = "Woman leather jacket"
predicted_title = df_embeddings['title_spanish'][top_idx]

# BERTScore evaluation
P, R, F1 = score([predicted_title], [reference_query], lang="multilingual", verbose=False)

print(f"BERTScore-F1: {F1[0]:.4f}")


BERTScore-F1: 0.6420


## Phase 3: Using LLMs to improve search query (Italian)

In [36]:
#Basic Imports
import numpy as np
import pandas as pd
import time
from sklearn.preprocessing import MinMaxScaler
#TFIDF and BM25
from langdetect import detect
from rank_bm25 import BM25Okapi
#env
from dotenv import load_dotenv
import os
#pinecone
from pinecone import Pinecone
#sentence transformer model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-m3")

In [None]:
#Calling the APIs
load_dotenv(dotenv_path='../.env')
CHARRAN_API = os.getenv('CHARRAN_API')
CHERYL_API = os.getenv('CHERYL_API')
deepseek_API_KEY = os.getenv('deepseek_API_KEY')

In [24]:
#Reading the Italian Embeddings
italian_embeddings = pd.read_pickle("en_to_it_embeddings.pkl")
italian_embeddings

Unnamed: 0,title,title_italian,english_embedding,italian_embedding
0,zwilling pro 2pc prep knife set,Zwilling pro 2pc set coltello prep,"[-0.07585622, -0.006632321, -0.039237764, 0.04...","[-0.058768444, 0.012960452, -0.029929288, 0.05..."
1,womens slim fit drape wrap tshirt a new day,donne slim fit drappeggio avvolgere tshirt un ...,"[-0.023722176, -0.02756558, -0.07540757, 0.011...","[-0.056372743, -0.038858823, -0.07786548, 0.00..."
2,mens teenage mutant ninja turtles group shot l...,mens adolescente mutante ninja tartarughe grup...,"[-0.02781372, 0.004972987, -0.055929173, 0.013...","[-0.004044311, 0.008419336, -0.05591273, 0.015..."
3,mens wwe triple h the game logo tshirt,mens wwe triplo h il gioco logo tshirt,"[-0.037347108, -0.009183998, -0.082188, 0.0122...","[-0.03912319, -0.015832098, -0.07382396, 0.008..."
4,purina fancy feast grilled gravy delights feas...,purina fantasia festa grigliato sugo delizie f...,"[-0.0551254, 0.024768988, -0.02036258, -0.0108...","[-0.0431343, 0.017949222, -0.023515861, -0.024..."
...,...,...,...,...
868,multi collagen protein powder types i ii ii b...,proteine multi collageno in polvere ii ii ii o...,"[-0.007071648, 0.013024846, -0.026673753, -0.0...","[0.036605842, 0.03628495, -0.027457794, -0.008..."
869,hope henry mens waffle knit pullover sweater,speranza henry uomo waffle maglia pullover mag...,"[-0.026264952, -0.008562797, -0.05641582, -0.0...","[-0.0020557789, -0.0039152885, -0.04709097, -0..."
870,noritake colortrio 16piece coupe dinnerware set,noritake colortio 16 pezzi coupé set per la cena,"[-0.0055267178, -0.03823672, -0.024558328, 0.0...","[0.0011833841, -0.014286156, -0.01681679, 0.03..."
871,hope henry mens fine gauge vneck pullover swe...,speranza henry mens maglione pullover fine gau...,"[-0.03244666, -0.026627203, -0.07740165, 0.004...","[-0.017570777, -0.00990813, -0.071718924, 0.01..."


Step 1: Obtaining the respective BM25 (Sparse) and Dense (Pinecone) Scores

In [43]:
#Building BM25
english_titles = italian_embeddings['title']
italian_titles = italian_embeddings['title_italian']

tokenized_en = [title.split() for title in english_titles]
tokenized_it = [title.split() for title in italian_titles]

bm25_en = BM25Okapi(tokenized_en)
bm25_it = BM25Okapi(tokenized_it)

In [44]:
#BM25 Function for Italian
def BM25(query, top_k=5):
    lang = detect(query)
    tokens = query.lower().split()

    # Score retrieval
    if lang == 'it':
        bm25_scores = bm25_it.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # Get top-k doc IDs based on raw BM25 scores
    top_k_ids = sorted(range(len(bm25_scores)), key=lambda i: bm25_scores[i], reverse=True)[:top_k]
    
    return [(i, bm25_scores[i]) for i in top_k_ids] #Returns a list of (document ID, score) pairs.

In [105]:
#Testing phase for BM25 alone for Italian. This will not be included in the .py model
results = BM25("giacca da donna")

print("BM25 Score for Italian to English\n")
for idx, score in results:
    print(f"{score:.4f} | {italian_embeddings['title'][idx]}")
print("\n")


#Testing phase for BM25 alone for English. This 
results = BM25("women jacket")

print("BM25 Score for English to Italian\n")
for idx, score in results:
    print(f"{score:.4f} | {italian_embeddings['title_italian'][idx]}")

BM25 Score for Italian to English

7.1452 | womens highrise straight jeans  universal thread
6.4611 | womens linen short sleeve buttondown camp shirt  a new day
6.4048 | hanes comfort fit scrubs womens scrub pants
6.4048 | timberland womens dunstan short sleeve tshirt
6.0893 | womens fitted short sleeve tshirt  universal thread


BM25 Score for English to Italian

5.4298 | ragazze puffer giacca tutto in movimento
5.4298 | ragazze solido puffer giacca classe d'arte
5.1272 | Bambini39 giacca gonfiabile solida tutto in movimento8482
5.1272 | ragazze39 solido giacca trapuntato gatto 38 jack8482
5.1272 | wink pro donne snap giacca di riscaldamento anteriore


From the above the code, we can see that the BM25 is working well for both English and Italian with the scores being relatively high for the top 5 results.

The below 4 cells display the upsertting of Italian embeddings generated using the BAAI BGE-M3 into the pinecone vector database

In [None]:
#Initialising Pinecone index
pc = Pinecone(api_key=CHARRAN_API)
index = pc.Index('italian-db')

In [None]:
# Enhanced batch upsert function with alignment assertion
from tqdm import tqdm
from langdetect import detect

def batch_upsert(index, vectors, expected_total, batch_size=50):
    """
    Upsert vectors to a Pinecone index in batches and assert alignment after completion.

    Parameters:
        index: Pinecone Index object
        vectors (list): List of dicts in format {"id": ..., "values": [...]}
        expected_total (int): Expected total number of vectors in the index after upload
        batch_size (int): Number of vectors per batch (default 50)

    Raises:
        AssertionError if the final count in Pinecone does not match expected_total
    """
    print(f"Uploading {len(vectors)} vectors in batches of {batch_size}...")
    for i in tqdm(range(0, len(vectors), batch_size)):
        # Upsert a batch of vectors to the Pinecone index
        batch = vectors[i:i+batch_size]
        index.upsert(vectors=batch)
    #verify the alignment of the index with the expected total
    # This is done by checking the total number of vectors in the index after upload.
    print("Verifying index alignment...")
    stats = index.describe_index_stats()
    total_vectors = stats.get('total_vector_count', -1)

    # Assert that the total number of vectors in Pinecone matches the expected total, which is the sum of English and Italian embeddings obtained from the dataset.
    assert total_vectors == expected_total, (
        f"Vector count mismatch: expected {expected_total}, found {total_vectors}")
    print("✅ Pinecone index is aligned with BM25 corpus.")


# Generate aligned IDs for English and Italian embeddings
en_ids = [f"en-{i}" for i in range(len(italian_embeddings['english_embedding']))]
it_ids = [f"it-{i}" for i in range(len(italian_embeddings['italian_embedding']))]

# Combine vectors for English and Italian embeddings where embeddings are aligned. ZIP function
combined_vectors = (
    list(zip(en_ids, italian_embeddings['english_embedding'])) +
    list(zip(it_ids, italian_embeddings['italian_embedding']))
)

# Convert to Pinecone format
to_upsert = [{"id": id, "values": vector} for id, vector in combined_vectors]

# Total expected = English + Italian
expected_total = len(en_ids) + len(it_ids)

# Upload and validate
batch_upsert(index, to_upsert, expected_total, batch_size=50)





Uploading 1746 vectors in batches of 50...


100%|██████████| 35/35 [01:39<00:00,  2.85s/it]


Verifying index alignment...
✅ Pinecone index is aligned with BM25 corpus.


In [None]:
# ---- Alignment Verification Utility ----
def test_alignment(index, bm25_en, query, top_k=5):
    lang = detect(query)
    bm25 = bm25_en if lang == "en" else bm25_it

    # Run Pinecone semantic search
    query_vec = model.encode(query).tolist()
    results = index.query(vector=query_vec, top_k=top_k, include_metadata=False)

    for match in results['matches']:
        id_str = match['id']
        idx = int(id_str.split('-')[1])
        prefix = id_str.split('-')[0]

        # Check if BM25 score is accessible
        if prefix == "en":
            try:
                score = bm25.get_scores(query.lower().split())[idx]
                print(f"{id_str} → BM25 score: {score:.4f}")
                print("BM25 title:", italian_embeddings['title'][idx])
            except IndexError:
                print(f"{id_str} is out of range for BM25")
        else:
            print(f"Skipping {id_str} (non-English match)")

In [None]:
# Test the alignment function
test_alignment(index, bm25_en, query="women jacket", top_k=5)


en-360 → BM25 score: 0.0000
BM25 title: girls puffer jacket  all in motion
en-698 → BM25 score: 0.0000
BM25 title: wink pro womens snap front warmup jacket
en-383 → BM25 score: 0.0000
BM25 title: girls39 solid quilted jacket  cat 38 jack8482
en-690 → BM25 score: 0.0000
BM25 title: girls solid puffer jacket  art class
en-59 → BM25 score: 0.0000
BM25 title: jockey generation womens organic cotton stretch cropped tshirt


Step 2: Define the prompt function for dynamic alpha calculation

In [None]:
# Step 2 Improved prompt for dynamic alpha calculation
from openai import OpenAI

# Set your OpenAI API key
client = OpenAI(api_key=deepseek_API_KEY, base_url="https://openrouter.ai/api/v1")

def get_dynamic_alpha(question, dense_result, bm25_result):
    system_prompt = """You are a multilingual evaluator in an Italian e-commerce site assessing the retrieval effectiveness of dense
retrieval (Cosine Distance) and BM25 retrieval for finding the correct Italian product title given an English-language query.

## Task:
Given a query and two top-1 search results (one from dense retrieval, one from BM25 retrieval), score each method from **0 to 5** based on how likely the correct result is retrieved or nearby.

### Scoring Criteria:
1. **Direct hit → 5 points**
   - If the retrieved result directly answers the question.
2. **Good wrong result → 3-4 points**
   - Answer is not exact, but closely related; likely the correct one is nearby.
3. **Bad wrong result → 1-2 points**
   - Loosely related or general, unlikely correct answer is nearby.
4. **Completely off-track → 0 points**
   - Retrieval is unrelated.

### Output Format:
Return two integers separated by a space:
- First number: dense retrieval score.
- Second number: BM25 retrieval score.
"""

    user_prompt = f"""### Given Data:
- Question: "{question}"
- dense retrieval Top1 Result: "{dense_result}"
- BM25 retrieval Top1 Result: "{bm25_result}"
"""

    response = client.chat.completions.create(
        model="deepseek/deepseek-chat-v3-0324:free",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )
    # response.choices[0] means the first choice of the model's response .message.content is the actual content of the response.
    # .strip() removes any leading or trailing whitespace from the response.
    output = response.choices[0].message.content.strip()

    try:
        #The mapping function below does is that it splits the output string into two parts based on whitespace and converts them to integers and assigns them to dense_score and bm25_score respectively.
        dense_score, bm25_score = map(int, output.split())
    except:
        dense_score = bm25_score = 3  # fallback if parsing fails

    # The following conditions are used to determine the final score based on the dense and BM25 scores.
    if dense_score == 5 and bm25_score != 5:
        return 1.0
    elif bm25_score == 5 and dense_score != 5:
        return 0.0
    elif dense_score == 0 and bm25_score == 0:
        return 0.5
    else:
        return dense_score / (dense_score + bm25_score) # This is returned to the hybrid_search function to be used as the alpha value for the hybrid search.


In [None]:
#Old Step 1
from openai import OpenAI

# Set your OpenAI API key
client = OpenAI(api_key="deepseek_API_KEY", base_url="https://openrouter.ai/api/v1")

def get_dynamic_alpha(question, dense_result, bm25_result):
    prompt = f"""You are a multilingual evaluator in a italian ecommerce site assessing the retrieval effectiveness of dense
retrieval (Cosine Distance) and BM25 retrieval for finding the correct italian product title with respect to the english query. 

## Task:
Given a question and two top1 search results (one from dense retrieval,
one from BM25 retrieval), score each retrieval method from **0 to 5** based on whether the correct answer is likely to appear in top2, top3, etc.

### **Scoring Criteria:**
1. **Direct hit --> 5 points**
- If the retrieved document directly answers the question, assign **5 points**.
2. **Good wrong result (High likelihood correct answer is nearby) --> 3-4 points**
3. **Bad wrong result (Low likelihood correct answer is nearby) --> 1-2 points**
4. **Completely off-track --> 0 points**

### **Given Data:**
- **Question:** "{question}"

- **dense retrieval Top1 Result:** "{dense_result}"
- **BM25 retrieval Top1 Result:** "{bm25_result}"

### **Output Format:**
Return two integers separated by a space:
- **First number:** dense retrieval score.
- **Second number:** BM25 retrieval score.
"""

    response = client.chat.completions.create(
        model="deepseek/deepseek-chat-v3-0324:free",  
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )

    output = response.choices[0].message.content.strip()

    try:
        dense_score, bm25_score = map(int, output.split())
    except:
        dense_score = bm25_score = 3  # default fallback

    if dense_score == 5 and bm25_score != 5:
        return 1.0
    elif bm25_score == 5 and dense_score != 5:
        return 0.0
    elif dense_score == 0 and bm25_score == 0:
        return 0.5
    else:
        return dense_score / (dense_score + bm25_score) 


Step 3: Define the new hybrid search function by taking only the top 1 or 2 result from each type of search retrieval and pass it to the prompt function in step 2 to return the score.

The hybrid search function will hen use the scores from the prompting function to compute a dynamic alpha parameter which is then passed into the normalised scores of dense and spare retrievals obtained earlier and eventually sorting the scores and outputting the top 5 results.

In [None]:
# Step 2: Main hybrid retrieval with dynamic alpha by getting top 2 from both BM25 and Pinecone
def hybrid_search_dat(query, top_k=5):
    lang = detect(query)
    tokens = query.lower().split()

    # --- BM25 Search ---
    if lang == 'it':
        bm25_scores = bm25_it.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # --- Semantic Search (Pinecone) ---
    query_vec = model.encode(query).tolist() #Encoding the query using the BGE-M3 sentence transformer model to obtain a vector representation of the query.
    pinecone_results = index.query(vector=query_vec, top_k=top_k, include_metadata=False) #Querying the Pinecone index for the top-k results based on the query vector.

    # Parse Pinecone results
    pinecone_ids = [int(match['id'].split('-')[1]) for match in pinecone_results['matches']]    #Extracting the IDs from the Pinecone results.
    pinecone_scores = [match['score'] for match in pinecone_results['matches']]     #Extracting the IDs and scores from the Pinecone results.

    # Get top-2 text from both for alpha calculation
    bm25_top_2_idx = np.argsort(bm25_scores)[-2:][::-1]  #  Obtaining the top 2 indices of the BM25 scores using numpy argsort function.
    dense_top_2_idx = pinecone_ids[:2] #obtaining the top 2 indices of the dense scores from the Pinecone results.
    bm25_text = " || ".join(df_embeddings['title'][i] for i in bm25_top_2_idx) #Joining the titles of the top 2 BM25 scores with " || " as a separator.
    dense_text = " || ".join(df_embeddings['title'][i] for i in dense_top_2_idx) #Joining the titles of the top 2 dense scores with " || " as a separator.

    # --- Get dynamic alpha from GPT ---
    start = time.time() #this was added for debugging process to see how long the alpha fetching takes.
    alpha = get_dynamic_alpha(query, dense_text, bm25_text)
    print(f"Alpha fetched: {alpha} in {time.time() - start:.2f}s")

    # --- Normalize Scores ---
    scaler = MinMaxScaler()
    bm25_norm = scaler.fit_transform(np.array(bm25_scores).reshape(-1, 1)).flatten() # Normalizing the BM25 scores using MinMaxScaler to scale the scores between 0 and 1 and reshape the array to be 2D for the scaler and flatten it back to 1D.
    pinecone_norm = scaler.fit_transform(np.array(pinecone_scores).reshape(-1, 1)).flatten()  # Normalize Pinecone semantic scores similarly as above

    # --- Combine scores using dynamic alpha ---
    hybrid_results = []
    for idx, semantic_score in zip(pinecone_ids, pinecone_norm):
        final_score = alpha * semantic_score + (1 - alpha) * bm25_norm[idx] ## Compute weighted score using alpha (higher alpha → more reliance on semantic)
        hybrid_results.append((idx, final_score)) # Append the index and final score to the hybrid_results list.

    # Sort by hybrid score
    hybrid_results.sort(key=lambda x: x[1], reverse=True) ## Sort the hybrid results based on the final score in descending order.

    # --- Prepare detailed results ---
    detailed_results = []
    for idx, hybrid_score in hybrid_results[:top_k]:
        bm25_score = round(bm25_norm[idx], 4)
        semantic_score = round(pinecone_norm[pinecone_ids.index(idx)], 4)
        detailed_results.append((idx, round(hybrid_score, 4), bm25_score, semantic_score))

    return detailed_results

In [None]:
# Step 2: Main hybrid retrieval with dynamic alpha by getting top 1 from both BM25 and Pinecone (From research paper)
def hybrid_search_dat(query, top_k=5):
    lang = detect(query)
    tokens = query.lower().split()

    # --- BM25 Search ---
    if lang == 'it':
        bm25_scores = bm25_it.get_scores(tokens)
    else:
        bm25_scores = bm25_en.get_scores(tokens)

    # --- Semantic Search (Pinecone) ---
    query_vec = model.encode(query).tolist()
    pinecone_results = index.query(vector=query_vec, top_k=top_k, include_metadata=False) ## Submit the dense vector to Pinecone and retrieve the top-k most similar indexed vectors

    # Parse Pinecone results
    pinecone_ids = [int(match['id'].split('-')[1]) for match in pinecone_results['matches']] #Extracting the IDs from the Pinecone results using match['id'] and splitting it to get the index of the document.
    pinecone_scores = [match['score'] for match in pinecone_results['matches']] #Extracting the scores from the Pinecone results using match['score'] and storing them in a list.

    # Get top-1 text from both for alpha calculation
    bm25_top_idx = int(np.argmax(bm25_scores)) #obtaining the index of the top 1 BM25 score using numpy argmax function.
    dense_top_idx = pinecone_ids[0] #obtaining the index of the top 1 dense score from the Pinecone results using the first element of the pinecone_ids list.
    bm25_text = italian_embeddings['title'][bm25_top_idx]
    dense_text = italian_embeddings['title'][dense_top_idx]

    # --- Get dynamic alpha from GPT ---
    start = time.time()
    alpha = get_dynamic_alpha(query, dense_text, bm25_text)
    print(f"Alpha fetched: {alpha} in {time.time() - start:.2f}s")

    # --- Normalize Scores ---
    scaler = MinMaxScaler()
    bm25_norm = scaler.fit_transform(np.array(bm25_scores).reshape(-1, 1)).flatten() 
    pinecone_norm = scaler.fit_transform(np.array(pinecone_scores).reshape(-1, 1)).flatten()

    # --- Combine scores using dynamic alpha ---
    hybrid_results = []
    for idx, semantic_score in zip(pinecone_ids, pinecone_norm):
        final_score = alpha * semantic_score + (1 - alpha) * bm25_norm[idx]
        hybrid_results.append((idx, final_score))

    # Sort by hybrid score
    hybrid_results.sort(key=lambda x: x[1], reverse=True)

    # --- Prepare detailed results ---
    detailed_results = []
    for idx, hybrid_score in hybrid_results[:top_k]:
        bm25_score = round(bm25_norm[idx], 4)
        semantic_score = round(pinecone_norm[pinecone_ids.index(idx)], 4)
        detailed_results.append((idx, round(hybrid_score, 4), bm25_score, semantic_score))

    return detailed_results

Step 4: Results & conclusion section

In [74]:
results = hybrid_search_dat("I want a blue shirt")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {italian_embeddings['title'][idx]}")

Alpha fetched: 0.5 in 6.92s
0.5000 | 0.0000 | 1.0000 | bluey bingo graphic tshirt toddler 
0.4447 | 0.5894 | 0.3000 | boys power rangers blue ranger costume tee tshirt
0.2433 | 0.4867 | 0.0000 | toddler boys toy story friend in me short sleeve graphic tshirt  blue
0.2014 | 0.0000 | 1.0000 | bluey bingo graphic tshirt toddler 
0.1844 | 0.0000 | 0.3687 | boys jaws shark blueprint tshirt


In [73]:
results = hybrid_search_dat("Woman leather jacket")

for idx, hybrid, bm25, semantic in results:
    print(f"{hybrid:.4f} | {bm25:.4f} | {semantic:.4f} | {italian_embeddings['title_italian'][idx]}")

Alpha fetched: 0.5 in 17.54s
0.9248 | 0.8496 | 1.0000 | Donne donne 6 in pelle pizzo stivale ponte caviglia
0.9248 | 0.8496 | 1.0000 | Donne donne 6 in pelle pizzo stivale ponte caviglia
0.5000 | 1.0000 | 0.0000 | lo zaino in pelle sak womens loyola
0.2279 | 0.0000 | 0.4559 | womens leopard print highrise leggings joylab
0.2279 | 0.0000 | 0.4559 | womens leopard print highrise leggings joylab


In [21]:
results = hybrid_search_dat("Men white shirt")

output = []

for idx, hybrid, bm25, semantic in results:
    result_entry = {
        "title_italian": df_embeddings['title_italian'][idx],
        "hybrid_score": round(hybrid, 4),
        "bm25_score": round(bm25, 4),
        "semantic_score": round(semantic, 4)
    }
    output.append(result_entry)

# Print the list nicely
for item in output:
    print(item)


Alpha fetched: 0.5 in 8.08s
{'title_italian': 'samsung qn90d 50 4k neo qled smart tv 2024', 'hybrid_score': 0.5, 'bm25_score': 0.0, 'semantic_score': 1.0}
{'title_italian': 'uomo neve bianco e i sette nani xmas heigh ho tshirt', 'hybrid_score': 0.4605, 'bm25_score': 0.5608, 'semantic_score': 0.3603}
{'title_italian': 'mens rocky sport utility 600g isolato impermeabile stivale', 'hybrid_score': 0.2546, 'bm25_score': 0.0, 'semantic_score': 0.5092}
{'title_italian': 'mens rocky sport utility 600g isolato impermeabile stivale', 'hybrid_score': 0.1801, 'bm25_score': 0.0, 'semantic_score': 0.5092}
{'title_italian': 'ragazzi lilo punto rosso e blu gamer tshirt', 'hybrid_score': 0.0, 'bm25_score': 0.0, 'semantic_score': 0.0}


In [76]:
results = hybrid_search_dat("Men white shirt")

output = []

for idx, hybrid, bm25, semantic in results:
    result_entry = {
        "title_italian": italian_embeddings['title_italian'][idx],
        "hybrid_score": round(hybrid, 4),
        "bm25_score": round(bm25, 4),
        "semantic_score": round(semantic, 4)
    }
    output.append(result_entry)

# Print the list nicely
for item in output:
    print(item)


Alpha fetched: 0.5 in 8.80s
{'title_italian': 'batman ninja nero ninja camminare uomo bianco manica lunga camicia', 'hybrid_score': 0.9761, 'bm25_score': 0.9523, 'semantic_score': 1.0}
{'title_italian': 'uomo neve bianco e i sette nani xmas heigh ho tshirt', 'hybrid_score': 0.5372, 'bm25_score': 0.5498, 'semantic_score': 0.5246}
{'title_italian': 'uomo neve bianco e i sette nani xmas heigh ho tshirt', 'hybrid_score': 0.4651, 'bm25_score': 0.5498, 'semantic_score': 0.5246}
{'title_italian': 'mens slim fit girocollo tshirt 3pk dealworthy bianco', 'hybrid_score': 0.3341, 'bm25_score': 0.6368, 'semantic_score': 0.0314}
{'title_italian': 'mens marvel spiderman maniche corte tshirt grafica bianco', 'hybrid_score': 0.3184, 'bm25_score': 0.6368, 'semantic_score': 0.0}


In [77]:
from nltk.translate.meteor_score import meteor_score
from nltk.tokenize import word_tokenize

top_idx = results[0][0]  # index of top-1 result

reference_query = "Woman leather jacket"
predicted_title = italian_embeddings['title_italian'][top_idx]

# Tokenize both properly
reference_tokens = word_tokenize(reference_query)
predicted_tokens = word_tokenize(predicted_title)

# Pass the reference as a *list of references* (each is token list)
meteor = meteor_score([reference_tokens], predicted_tokens)

print(f"METEOR Score: {meteor:.4f}")

METEOR Score: 0.0000


In [79]:
from bert_score import score

# Inputs
reference_query = "Woman leather jacket"
predicted_title = italian_embeddings['title_italian'][top_idx]

# BERTScore evaluation
P, R, F1 = score([predicted_title], [reference_query], lang="multilingual", verbose=False)

print(f"BERTScore-F1: {F1[0]:.4f}")


BERTScore-F1: 0.6655


In [80]:
results = hybrid_search_dat("Women denim jacket")

output = []

for idx, hybrid, bm25, semantic in results:
    result_entry = {
        "title_italian": italian_embeddings['title_italian'][idx],
        "hybrid_score": round(hybrid, 4),
        "bm25_score": round(bm25, 4),
        "semantic_score": round(semantic, 4)
    }
    output.append(result_entry)

# Print the list nicely
for item in output:
    print(item)

Alpha fetched: 0.5 in 7.81s
{'title_italian': 'ragazze puffer giacca tutto in movimento', 'hybrid_score': 1.0, 'bm25_score': 1.0, 'semantic_score': 1.0}
{'title_italian': 'wink pro donne snap giacca di riscaldamento anteriore', 'hybrid_score': 0.9293, 'bm25_score': 0.9443, 'semantic_score': 0.9144}
{'title_italian': 'donne slim fit drappeggio avvolgere tshirt un nuovo giorno', 'hybrid_score': 0.4204, 'bm25_score': 0.0, 'semantic_score': 0.8408}
{'title_italian': 'Donne donne 6 in pelle pizzo stivale ponte caviglia', 'hybrid_score': 0.0, 'bm25_score': 0.0, 'semantic_score': 0.0}
{'title_italian': 'Donne donne 6 in pelle pizzo stivale ponte caviglia', 'hybrid_score': 0.0, 'bm25_score': 0.0, 'semantic_score': 0.0}


In [None]:
results = hybrid_search_dat("Women denim jacket")
#with improved prompting & Top 2 search
output = []

for idx, hybrid, bm25, semantic in results:
    result_entry = {
        "title_italian": italian_embeddings['title_italian'][idx],
        "hybrid_score": round(hybrid, 4),
        "bm25_score": round(bm25, 4),
        "semantic_score": round(semantic, 4)
    }
    output.append(result_entry)

# Print the list nicely
for item in output:
    print(item)

Alpha fetched: 0.5 in 5.72s
{'title_italian': 'ragazze puffer giacca tutto in movimento', 'hybrid_score': 1.0, 'bm25_score': 1.0, 'semantic_score': 1.0}
{'title_italian': 'wink pro donne snap giacca di riscaldamento anteriore', 'hybrid_score': 0.9293, 'bm25_score': 0.9443, 'semantic_score': 0.9144}
{'title_italian': 'donne slim fit drappeggio avvolgere tshirt un nuovo giorno', 'hybrid_score': 0.4204, 'bm25_score': 0.0, 'semantic_score': 0.8408}
{'title_italian': 'Donne donne 6 in pelle pizzo stivale ponte caviglia', 'hybrid_score': 0.0, 'bm25_score': 0.0, 'semantic_score': 0.0}
{'title_italian': 'Donne donne 6 in pelle pizzo stivale ponte caviglia', 'hybrid_score': 0.0, 'bm25_score': 0.0, 'semantic_score': 0.0}


In summary, the impact of hybrid scores on the datasets and the varied weightage of sparse and dense retrieval for each query supports our exploration of the need to have dynamic weightage on the BM25 and the semantic scores obtained from pinecone, as certain queries might lead to the more direct keyword based search whereas certain vague queries might need semantics to help achieve the result.