# SHL Assessment Recommender System

The following notebook covers the design, implementation, and evaluation of a recommendation system that suggests relevant SHL assessments based on a natural language job description or query. The goal is to retrieve and rank the most suitable assessments from SHL’s product catalog by leveraging modern information retrieval techniques.

The system supports both **lexical search (BM25) and semantic search (SBERT embeddings indexed with FAISS)**, along with **metadata-based filtering** (e.g., test type and duration) and **Gemini API integration** for query processing. It returns a ranked list of recommended assessments, each with essential attributes such as remote testing support, duration, adaptiveness, and test type.
Semantic search, filtering, and hybrid ranking were added iteratively, and performance was measured after each step with Recall@10 and MAP@10, so each design decision was backed by a measured change in retrieval quality.

This notebook includes:

- Data preprocessing and enrichment
- Query understanding and parsing
- BM25 and semantic search pipelines
- Filtering logic based on duration and test types
- Ranking fusion using Reciprocal Rank Fusion (RRF)
- Hybrid approach by reranking BM25 outputs using SBERT
- Evaluation using Recall@10 and MAP@10 on the official test set
- Discussion of results and potential areas for improvement
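
As a rough illustration of the Reciprocal Rank Fusion step listed above, the sketch below merges several ranked lists by summing `1 / (k + rank)` for every item. The function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the toy IDs are assumptions made for illustration; the fusion cell of this notebook may differ in its details.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse ranked lists of item IDs (hypothetical helper, not the notebook's own code).

    Each item receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the best item in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable)
    fused = sorted(scores, key=lambda item: -scores[item])
    return fused[:top_n]


# Toy example: item 2 is ranked highly by both lists, so it comes first
bm25_ranking = [1, 2, 3]
sbert_ranking = [2, 3, 4]
print(reciprocal_rank_fusion([bm25_ranking, sbert_ranking]))  # [2, 3, 1, 4]
```

Because only ranks are used, the raw BM25 and cosine scores never need to be put on a common scale.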

The system aims to provide relevant and context-aware assessment recommendations with minimal latency and no supervised training, making it a scalable baseline for further enhancement.

## Metrics
The system achieves **30% Recall@10 and 21% MAP@10** on the test set. The test set is small, so these figures are noisy estimates, but the queries are realistic job descriptions with hand-mapped ground truths. The approach is zero-shot: nothing is trained or fine-tuned on the catalog, so these numbers serve as a baseline for further work.
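
For reference, here is a minimal sketch of how the two metrics can be computed for a single query. The helper names are hypothetical, and the normalization of average precision by `min(k, number of relevant items)` is one common convention that is assumed here; the evaluation cells of this notebook may use a different variant. The ranked list is assumed to contain no duplicates.

```python
def recall_at_k(relevant, ranked, k=10):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked[:k])) / len(relevant)


def average_precision_at_k(relevant, ranked, k=10):
    """Average of precision@i over the ranks i (<= k) that hold a relevant item."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumed normalization: the best achievable number of hits within k
    return precision_sum / min(k, len(relevant))


# Toy example: 3 relevant items, 2 of them retrieved at ranks 1 and 3
relevant_urls = ["a", "b", "c"]
ranked_urls = ["a", "x", "b", "y"]
print(round(recall_at_k(relevant_urls, ranked_urls), 3))             # 0.667
print(round(average_precision_at_k(relevant_urls, ranked_urls), 3))  # 0.556
```

Recall@10 and MAP@10 for the whole test set are then the means of these per-query values.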

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/shl-product-catalog/individual_assessments.csv
/kaggle/input/shl-product-catalog/prepackaged_assessments.csv


# Data Cleaning

In [2]:
df1 = pd.read_csv("/kaggle/input/shl-product-catalog/individual_assessments.csv")
df2 = pd.read_csv("/kaggle/input/shl-product-catalog/prepackaged_assessments.csv")

df = pd.concat([df1, df2], ignore_index=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 515 entries, 0 to 514
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   url               515 non-null    object
 1   title             515 non-null    object
 2   description       515 non-null    object
 3   remote_support    515 non-null    object
 4   duration          451 non-null    object
 5   test_types        515 non-null    object
 6   adaptive_support  515 non-null    object
dtypes: object(7)
memory usage: 28.3+ KB


In [3]:
df.head(5)

Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support
0,https://www.shl.com/products/product-catalog/v...,Global Skills Development Report,This report is designed to be given to individ...,no,,"['Ability & Aptitude', 'Assessment Exercises',...",no
1,https://www.shl.com/products/product-catalog/v...,.NET MVC (New),Multi-choice test that measures the knowledge ...,yes,Approximate Completion Time in minutes = 17,['Knowledge & Skills'],no
2,https://www.shl.com/products/product-catalog/v...,.NET MVVM (New),Multi-choice test that measures the knowledge ...,yes,Approximate Completion Time in minutes = 5,['Knowledge & Skills'],no
3,https://www.shl.com/products/product-catalog/v...,.NET Framework 4.5,The.NET Framework 4.5 test measures knowledge ...,yes,Approximate Completion Time in minutes = 30,['Knowledge & Skills'],yes
4,https://www.shl.com/products/product-catalog/v...,.NET WPF (New),Multi-choice test that measures the knowledge ...,yes,Approximate Completion Time in minutes = 9,['Knowledge & Skills'],no


In [4]:
df.isna().any()

url                 False
title               False
description         False
remote_support      False
duration             True
test_types          False
adaptive_support    False
dtype: bool

In [5]:
df_clean = df.dropna().reset_index(drop=True)

In [6]:
df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 451 entries, 0 to 450
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   url               451 non-null    object
 1   title             451 non-null    object
 2   description       451 non-null    object
 3   remote_support    451 non-null    object
 4   duration          451 non-null    object
 5   test_types        451 non-null    object
 6   adaptive_support  451 non-null    object
dtypes: object(7)
memory usage: 24.8+ KB


In [7]:
import re

# Average all integers found in the duration text, so a range of durations maps to its mean
def parse_duration(x):
    nums = [int(i) for i in re.findall(r'\d+', str(x))]
    return np.mean(nums) if nums else np.nan

df_clean['duration_minutes'] = df_clean['duration'].apply(parse_duration)

In [8]:
df_clean['duration'] = df_clean['duration_minutes']
df_clean = df_clean.drop(columns=['duration_minutes'])

In [9]:
df_clean

Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support
0,https://www.shl.com/products/product-catalog/v...,.NET MVC (New),Multi-choice test that measures the knowledge ...,yes,17.0,['Knowledge & Skills'],no
1,https://www.shl.com/products/product-catalog/v...,.NET MVVM (New),Multi-choice test that measures the knowledge ...,yes,5.0,['Knowledge & Skills'],no
2,https://www.shl.com/products/product-catalog/v...,.NET Framework 4.5,The.NET Framework 4.5 test measures knowledge ...,yes,30.0,['Knowledge & Skills'],yes
3,https://www.shl.com/products/product-catalog/v...,.NET WPF (New),Multi-choice test that measures the knowledge ...,yes,9.0,['Knowledge & Skills'],no
4,https://www.shl.com/products/product-catalog/v...,Accounts Payable (New),Multiple-choice test that measures the knowled...,yes,9.0,['Knowledge & Skills'],no
...,...,...,...,...,...,...,...
446,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Individual 7.0 Solution,Our Workplace Safety - Individual 7.0 solution...,yes,16.0,['Biodata & Situational Judgement'],no
447,https://www.shl.com/products/product-catalog/v...,Teller 7.0,Teller or cashier positions are integral to fi...,yes,35.0,"['Biodata & Situational Judgement', 'Knowledge...",yes
448,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.1 (International),The Workplace Safety – Team 7.1 solution is de...,yes,20.0,"['Biodata & Situational Judgement', 'Competenc...",no
449,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.0 Solution,The Workplace Safety – Team 7.0 solution is de...,yes,20.0,['Biodata & Situational Judgement'],no


In [10]:
# Note: this overwrites remote_support with 'yes' for every row, regardless of the scraped value
df_clean['remote_support'] = 'yes'

In [11]:
df_clean

Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support
0,https://www.shl.com/products/product-catalog/v...,.NET MVC (New),Multi-choice test that measures the knowledge ...,yes,17.0,['Knowledge & Skills'],no
1,https://www.shl.com/products/product-catalog/v...,.NET MVVM (New),Multi-choice test that measures the knowledge ...,yes,5.0,['Knowledge & Skills'],no
2,https://www.shl.com/products/product-catalog/v...,.NET Framework 4.5,The.NET Framework 4.5 test measures knowledge ...,yes,30.0,['Knowledge & Skills'],yes
3,https://www.shl.com/products/product-catalog/v...,.NET WPF (New),Multi-choice test that measures the knowledge ...,yes,9.0,['Knowledge & Skills'],no
4,https://www.shl.com/products/product-catalog/v...,Accounts Payable (New),Multiple-choice test that measures the knowled...,yes,9.0,['Knowledge & Skills'],no
...,...,...,...,...,...,...,...
446,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Individual 7.0 Solution,Our Workplace Safety - Individual 7.0 solution...,yes,16.0,['Biodata & Situational Judgement'],no
447,https://www.shl.com/products/product-catalog/v...,Teller 7.0,Teller or cashier positions are integral to fi...,yes,35.0,"['Biodata & Situational Judgement', 'Knowledge...",yes
448,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.1 (International),The Workplace Safety – Team 7.1 solution is de...,yes,20.0,"['Biodata & Situational Judgement', 'Competenc...",no
449,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.0 Solution,The Workplace Safety – Team 7.0 solution is de...,yes,20.0,['Biodata & Situational Judgement'],no


In [12]:
import ast
df_clean['test_types'] = df_clean['test_types'].apply(ast.literal_eval)

print(df_clean['test_types'].iloc[0])


['Knowledge & Skills']


In [13]:
# Convert the 'yes'/'no' strings to booleans
df_clean['remote_support'] = df_clean['remote_support'].map({'yes': True, 'no': False})
df_clean['adaptive_support'] = df_clean['adaptive_support'].map({'yes': True, 'no': False})

df_clean.tail(10)


Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support
441,https://www.shl.com/products/product-catalog/v...,Telenurse Solution,The Telenurse solution is for positions in a h...,True,68.0,"[Ability & Aptitude, Biodata & Situational Jud...",True
442,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.1 (Americas),The Workplace Safety – Team 7.1 solution is de...,True,20.0,"[Biodata & Situational Judgement, Competencies...",False
443,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Individual 7.1 (Americas),Our Workplace Safety - Individual 7.1 solution...,True,16.0,[Biodata & Situational Judgement],False
444,https://www.shl.com/products/product-catalog/v...,Teller with Sales - Short Form,The Teller solution with Sales is for entry-le...,True,35.0,"[Ability & Aptitude, Biodata & Situational Jud...",False
445,https://www.shl.com/products/product-catalog/v...,Transcriptionist Solution,The Transcriptionist solution is for entry-lev...,True,33.0,"[Ability & Aptitude, Biodata & Situational Jud...",True
446,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Individual 7.0 Solution,Our Workplace Safety - Individual 7.0 solution...,True,16.0,[Biodata & Situational Judgement],False
447,https://www.shl.com/products/product-catalog/v...,Teller 7.0,Teller or cashier positions are integral to fi...,True,35.0,"[Biodata & Situational Judgement, Knowledge & ...",True
448,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.1 (International),The Workplace Safety – Team 7.1 solution is de...,True,20.0,"[Biodata & Situational Judgement, Competencies...",False
449,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.0 Solution,The Workplace Safety – Team 7.0 solution is de...,True,20.0,[Biodata & Situational Judgement],False
450,https://www.shl.com/products/product-catalog/v...,Workplace Safety Solution,The Workplace Safety Solution is designed for ...,True,21.0,"[Biodata & Situational Judgement, Personality ...",True


In [14]:
df_clean

Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support
0,https://www.shl.com/products/product-catalog/v...,.NET MVC (New),Multi-choice test that measures the knowledge ...,True,17.0,[Knowledge & Skills],False
1,https://www.shl.com/products/product-catalog/v...,.NET MVVM (New),Multi-choice test that measures the knowledge ...,True,5.0,[Knowledge & Skills],False
2,https://www.shl.com/products/product-catalog/v...,.NET Framework 4.5,The.NET Framework 4.5 test measures knowledge ...,True,30.0,[Knowledge & Skills],True
3,https://www.shl.com/products/product-catalog/v...,.NET WPF (New),Multi-choice test that measures the knowledge ...,True,9.0,[Knowledge & Skills],False
4,https://www.shl.com/products/product-catalog/v...,Accounts Payable (New),Multiple-choice test that measures the knowled...,True,9.0,[Knowledge & Skills],False
...,...,...,...,...,...,...,...
446,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Individual 7.0 Solution,Our Workplace Safety - Individual 7.0 solution...,True,16.0,[Biodata & Situational Judgement],False
447,https://www.shl.com/products/product-catalog/v...,Teller 7.0,Teller or cashier positions are integral to fi...,True,35.0,"[Biodata & Situational Judgement, Knowledge & ...",True
448,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.1 (International),The Workplace Safety – Team 7.1 solution is de...,True,20.0,"[Biodata & Situational Judgement, Competencies...",False
449,https://www.shl.com/products/product-catalog/v...,Workplace Safety - Team 7.0 Solution,The Workplace Safety – Team 7.0 solution is de...,True,20.0,[Biodata & Situational Judgement],False


In [15]:
df_clean.to_pickle('df_clean.pkl')

# Training/Validation

In [16]:
!pip install rank-bm25 pandas nltk



## BM25 Search

In [17]:
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from rank_bm25 import BM25Okapi

nltk.download('punkt')
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [18]:
df = pd.read_pickle('df_clean.pkl')

In [19]:
def preprocess(text):
    # Lowercase, replace punctuation with spaces, tokenize, then drop stopwords
    text = str(text).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = word_tokenize(text)
    return [word for word in tokens if word.isalnum() and word not in stop_words]

df['bm_tokens'] = (df['title'] + ' ' + df['description']).apply(preprocess)
df.to_pickle('df_bm25_tokenized.pkl')
df.head(5)

Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support,bm_tokens
0,https://www.shl.com/products/product-catalog/v...,.NET MVC (New),Multi-choice test that measures the knowledge ...,True,17.0,[Knowledge & Skills],False,"[net, mvc, new, multi, choice, test, measures,..."
1,https://www.shl.com/products/product-catalog/v...,.NET MVVM (New),Multi-choice test that measures the knowledge ...,True,5.0,[Knowledge & Skills],False,"[net, mvvm, new, multi, choice, test, measures..."
2,https://www.shl.com/products/product-catalog/v...,.NET Framework 4.5,The.NET Framework 4.5 test measures knowledge ...,True,30.0,[Knowledge & Skills],True,"[net, framework, 4, 5, net, framework, 4, 5, t..."
3,https://www.shl.com/products/product-catalog/v...,.NET WPF (New),Multi-choice test that measures the knowledge ...,True,9.0,[Knowledge & Skills],False,"[net, wpf, new, multi, choice, test, measures,..."
4,https://www.shl.com/products/product-catalog/v...,Accounts Payable (New),Multiple-choice test that measures the knowled...,True,9.0,[Knowledge & Skills],False,"[accounts, payable, new, multiple, choice, tes..."


In [20]:
bm25 = BM25Okapi(df['bm_tokens'].tolist())
def bm25_search(query, top_n = 10):
    query_tokens = preprocess(query)

    scores = bm25.get_scores(query_tokens)
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]

    top_assessments = df.iloc[top_indices]
    return top_assessments
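
To make the scoring behind `bm25_search` more concrete, the following pure-Python sketch computes Okapi BM25 scores for a tokenized query against a tokenized corpus. It is an illustration only: the function name `bm25_scores` is hypothetical, the parameter values `k1=1.5` and `b=0.75` are assumed, and the IDF uses the non-negative `log(1 + ...)` variant, so the numbers will not necessarily match what `rank_bm25` returns.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in corpus_tokens against query_tokens with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term occurs
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalized through b
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Toy corpus of already-tokenized "descriptions"
corpus = [["java", "developer"], ["sales", "manager"], ["java", "java", "test"]]
scores = bm25_scores(["java"], corpus)
best = max(range(len(scores)), key=lambda i: scores[i])
print(best)  # 2: the document that mentions "java" twice
```

The term-frequency factor saturates as `tf` grows, which is why repeating a keyword many times in a description gives diminishing returns.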

In [21]:
bm25_search("Administrative assistant job.  Entry-level role requiring 0-2 years of experience.  Assess candidate's administrative skills, problem-solving abilities, attention to detail, communication skills, teamwork, and organizational skills. Evaluate personality traits relevant to administrative work such as responsibility, dependability, and professionalism.  Situational judgement questions related to administrative tasks.  Questions on basic office software and procedures.  Candidate should demonstrate effective communication (written and verbal).")

Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support,bm_tokens
317,https://www.shl.com/products/product-catalog/v...,Administrative Professional - Short Form,The Administrative Professional solution is fo...,True,36.0,"[Ability & Aptitude, Knowledge & Skills, Perso...",True,"[administrative, professional, short, form, ad..."
133,https://www.shl.com/products/product-catalog/v...,Interpersonal Communications,This adaptive test measures the candidate's kn...,True,25.0,[Knowledge & Skills],True,"[interpersonal, communications, adaptive, test..."
319,https://www.shl.com/products/product-catalog/v...,Bank Administrative Assistant - Short Form,The Administrative Assistant solution is for e...,True,35.0,"[Ability & Aptitude, Biodata & Situational Jud...",False,"[bank, administrative, assistant, short, form,..."
375,https://www.shl.com/products/product-catalog/v...,Insurance Administrative Assistant Solution,The Administrative Assistant solution is for e...,True,24.0,"[Ability & Aptitude, Biodata & Situational Jud...",True,"[insurance, administrative, assistant, solutio..."
49,https://www.shl.com/products/product-catalog/v...,Business Communications,This test measures the candidate's knowledge o...,True,35.0,[Knowledge & Skills],False,"[business, communications, test, measures, can..."
301,https://www.shl.com/products/product-catalog/v...,Workplace Administration Skills (New),Multi-choice test that measures the ability to...,True,12.0,[Knowledge & Skills],False,"[workplace, administration, skills, new, multi..."
54,https://www.shl.com/products/product-catalog/v...,Business Communication (adaptive),This is an adaptive test that measures knowled...,True,24.0,[Knowledge & Skills],True,"[business, communication, adaptive, adaptive, ..."
356,https://www.shl.com/products/product-catalog/v...,General Entry Level – Data Entry 7.0 Solution,Our General Entry Level – Data Entry 7.0 solut...,True,24.0,"[Biodata & Situational Judgement, Knowledge & ...",False,"[general, entry, level, data, entry, 7, 0, sol..."
251,https://www.shl.com/products/product-catalog/v...,SHL Verify Interactive G+,SHL Verify Interactive G+ (SVIG+) is a test of...,True,36.0,[Ability & Aptitude],True,"[shl, verify, interactive, g, shl, verify, int..."
70,https://www.shl.com/products/product-catalog/v...,Customer Service Phone Solution,"As part of Contact Center Simulations, the Cus...",True,30.0,"[Biodata & Situational Judgement, Personality ...",False,"[customer, service, phone, solution, part, con..."


## SBERT + FAISS

In [22]:
!pip install faiss-cpu sentence-transformers



In [23]:
import pandas as pd
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer



In [24]:
df = pd.read_pickle('df_clean.pkl')

In [25]:
descriptions = (df['title'] + ' ' + df['description']).tolist()

model = SentenceTransformer('all-mpnet-base-v2')
embeddings = model.encode(descriptions, show_progress_bar=True)

dimension = embeddings.shape[1]
faiss.normalize_L2(embeddings)
faiss_index = faiss.IndexFlatIP(dimension)
faiss_index.add(np.array(embeddings).astype('float32'))


In [26]:
np.save('sbert_embeddings.npy', embeddings)
faiss.write_index(faiss_index, 'faiss_index.bin')

In [27]:
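def semantic_search(query, top_k=10):
    # Reuses `model` and `faiss_index` created in the cells above
    query_embedding = np.ascontiguousarray(model.encode([query]), dtype='float32')
    # Normalize the query like the indexed vectors so inner products are cosine similarities
    faiss.normalize_L2(query_embedding)
    scores, indices = faiss_index.search(query_embedding, top_k)
    return df.iloc[indices[0]]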
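
The index above stores L2-normalized embeddings in an inner-product index. The small pure-Python sketch below, using made-up 2-dimensional vectors, shows why that works: the inner product of two unit-length vectors equals their cosine similarity. The helper names are hypothetical and the vectors are not real SBERT embeddings.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, computed directly."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


doc_vec = [3.0, 4.0]     # toy "document embedding"
query_vec = [2.0, 0.0]   # toy "query embedding"

ip_of_normalized = dot(l2_normalize(doc_vec), l2_normalize(query_vec))
print(abs(ip_of_normalized - cosine_similarity(doc_vec, query_vec)) < 1e-12)  # True
```

Scaling the query by a positive constant multiplies every inner product by the same factor, so an unnormalized query yields the same ranking; only the returned scores stop being cosine similarities.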

In [28]:
semantic_search("Administrative assistant job.  Entry-level role requiring 0-2 years of experience.  Assess candidate's administrative skills, problem-solving abilities, attention to detail, communication skills, teamwork, and organizational skills. Evaluate personality traits relevant to administrative work such as responsibility, dependability, and professionalism.  Situational judgement questions related to administrative tasks.  Questions on basic office software and procedures.  Candidate should demonstrate effective communication (written and verbal).")


Unnamed: 0,url,title,description,remote_support,duration,test_types,adaptive_support
317,https://www.shl.com/products/product-catalog/v...,Administrative Professional - Short Form,The Administrative Professional solution is fo...,True,36.0,"[Ability & Aptitude, Knowledge & Skills, Perso...",True
375,https://www.shl.com/products/product-catalog/v...,Insurance Administrative Assistant Solution,The Administrative Assistant solution is for e...,True,24.0,"[Ability & Aptitude, Biodata & Situational Jud...",True
319,https://www.shl.com/products/product-catalog/v...,Bank Administrative Assistant - Short Form,The Administrative Assistant solution is for e...,True,35.0,"[Ability & Aptitude, Biodata & Situational Jud...",False
301,https://www.shl.com/products/product-catalog/v...,Workplace Administration Skills (New),Multi-choice test that measures the ability to...,True,12.0,[Knowledge & Skills],False
368,https://www.shl.com/products/product-catalog/v...,Healthcare Aide 7.0 Solution,Our Healthcare Aide 7.0 solution is designed f...,True,22.0,[Biodata & Situational Judgement],False
423,https://www.shl.com/products/product-catalog/v...,Service Associate Solution,The Service Associate is designed for entry-le...,True,38.0,"[Ability & Aptitude, Biodata & Situational Jud...",True
352,https://www.shl.com/products/product-catalog/v...,General Entry Level - All Industries 7.0 Solution,Our General Entry Level – All Industries 7.0 s...,True,19.0,[Biodata & Situational Judgement],False
353,https://www.shl.com/products/product-catalog/v...,General Entry Level - All Industries 7.1(Ameri...,Our General Entry Level – All Industries 7.1 s...,True,19.0,[Biodata & Situational Judgement],False
380,https://www.shl.com/products/product-catalog/v...,Manager + 7.0 Solution,Our Manager + 7.0 solution is designed for can...,True,53.0,"[Ability & Aptitude, Biodata & Situational Jud...",True
394,https://www.shl.com/products/product-catalog/v...,Nursing Assistant Solution,The Nursing Assistant solution is for entry-le...,True,41.0,"[Ability & Aptitude, Biodata & Situational Jud...",False


## Gemini API Call

In [None]:
import google.generativeai as genai
import json


genai.configure(api_key="API_KEY")
genmodel = genai.GenerativeModel("gemini-1.5-flash")

PROMPT_TEMPLATE = """
For the following job description, extract a structured query suitable for retrieving relevant candidate assessments. The output should be optimized for semantic and lexical similarity matching in retrieval systems.

Include:
All relevant job duties and selection criteria.
Seniority level (e.g., entry level, mid, senior).
Whether it's technical or non-technical based on job responsibilities.
Test duration in minutes (a single integer) if stated; otherwise, set to -1.
Most relevant assessment categories from the following list:
  ['Ability & Aptitude', 'Biodata & Situational Judgement', 'Competencies', 'Development & 360', 'Assessment Exercises', 'Knowledge & Skills', 'Personality & Behavior', 'Simulations'].

Format: Return only a valid JSON object:
{{
  "duration": 0,
  "type": "technical" or "non-technical",
  "test_types": [],
  "query": "..."  // Detailed and precise query in natural language suitable for semantic (SBERT) retrieval and with keyword for lexical (BM25)
}}

Only return the JSON. No explanation or extra text.

QUERY:
{query}
"""

def generate_and_parse_query(job_text):
    prompt = PROMPT_TEMPLATE.format(query=job_text)
    
    try:
        response = genmodel.generate_content(prompt)
        content = response.text.strip()
        cleaned = re.sub(r"```json|```", "", content).strip()
        
        parsed_json = json.loads(cleaned)
        return parsed_json

    except json.JSONDecodeError:
        print("Failed to parse JSON. Response was:\n", content)
        return None
    except Exception as e:
        print(f"Error during generation: {e}")
        return None

job_description = """ICICI Bank Assistant Admin,
Experience required 0-2 years, test
should be 30-40 mins long
"""
result = generate_and_parse_query(job_description)

if result:
    print("Parsed JSON:\n", json.dumps(result, indent=2))


Parsed JSON:
 {
  "duration": 35,
  "type": "non-technical",
  "test_types": [
    "Biodata & Situational Judgement",
    "Competencies",
    "Knowledge & Skills",
    "Personality & Behavior"
  ],
  "query": "Administrative assistant role requiring 0-2 years of experience.  Assess candidate's administrative skills, organizational abilities, attention to detail, problem-solving skills, communication skills (written and verbal), teamwork abilities, and ability to work under pressure.  Evaluate personality traits such as dependability, responsibility, and adaptability.  Situational judgment questions assessing handling of administrative tasks and customer interactions.  The assessment should be approximately 35 minutes long and suitable for entry-level administrative positions within a banking environment.  Keywords: administrative assistant, bank, entry level, organizational skills, communication, problem-solving, teamwork,  dependability, responsibility, adaptability, situational judgm

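Since the JSON comes from an LLM, its fields may not always follow the requested schema. Below is a minimal sketch of a defensive post-processing step that could sit between `generate_and_parse_query` and the retrieval functions. The helper `normalize_parsed_query` and its behavior (mapping a missing or non-positive duration to `None`, dropping unknown test types) are assumptions for illustration and are not part of the original pipeline; the category list is copied from the prompt above.

```python
ALLOWED_TEST_TYPES = {
    "Ability & Aptitude", "Biodata & Situational Judgement", "Competencies",
    "Development & 360", "Assessment Exercises", "Knowledge & Skills",
    "Personality & Behavior", "Simulations",
}


def normalize_parsed_query(parsed):
    """Return a cleaned copy of the parsed LLM output, or None if it is unusable.

    Hypothetical helper: not part of the notebook's own code.
    """
    if not isinstance(parsed, dict):
        return None
    query = str(parsed.get("query") or "").strip()
    if not query:
        return None

    # Duration should be a positive integer; the prompt uses -1 for "not stated"
    try:
        duration = int(parsed.get("duration", -1))
    except (TypeError, ValueError):
        duration = -1

    # Keep only categories that exist in the catalog taxonomy
    raw_types = parsed.get("test_types")
    if not isinstance(raw_types, list):
        raw_types = []
    test_types = [t for t in raw_types if t in ALLOWED_TEST_TYPES]

    return {
        "query": query,
        "duration": duration if duration > 0 else None,
        "test_types": test_types,
        "type": parsed.get("type"),
    }


example = {"duration": "35", "type": "non-technical",
           "test_types": ["Competencies", "Made Up Category"],
           "query": "  Administrative assistant, entry level  "}
print(normalize_parsed_query(example)["test_types"])  # ['Competencies']
```

Returning `None` for unusable output mirrors how `generate_and_parse_query` signals failure, so callers can handle both cases the same way.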
## Testing

### Gemini + BM25 / Gemini + SBERT + FAISS Results

In [30]:
import pandas as pd

queries = ["I am hiring for Java developers who can also collaborate effectively with my business teams. Looking for an assessment(s) that can be completed in 40 minutes.", "I want to hire new graduates for a sales role in my company, the budget is for about an hour for each test. Give me some options", "I am looking for a COO for my company in China and I want to see if they are culturally a right fit for our company. Suggest me an assessment that they can complete in about an hour", "Content Writer required, expert in English and SEO.",
    """
    Find me 1 hour long assesment for
the below job at SHL
Job Description
Join a community that is shaping
the future of work! SHL, People
Science. People Answers.
Are you a seasoned QA Engineer
with a flair for innovation? Are you
ready to shape the future of talent
assessment and empower
organizations to unlock their full
potential? If so, we want you to be a
part of the SHL Team! As a QA
Engineer, you will be involved in
creating and implementing software
solutions that contribute to the
development of our groundbreaking products.
An excellent benefit package is
offered in a culture where career
development, with ongoing
manager guidance, collaboration,
flexibility, diversity, and inclusivity
are all intrinsic to our culture. There
is a huge investment in SHL
currently so there’s no better time
to become a part of something
transformational.
What You Will Be Doing
Getting involved in engineering
quality assurance and providing
inputs when required.
Create and develop test plans for
various forms of testing.
Conducts and/or participates in
formal and informal test case
reviews.
Develop and initiate functional
tests and regression tests.
Rolling out improvements for
testing and quality processes.
Essential
What we are looking for from you:
Development experience – Java or
JavaScript, CSS, HTML (Automation)
Selenium WebDriver and page
object design pattern (Automation)
SQL server knowledge
Test case management experience.
Manual Testing
Desirable
Knowledge the basic concepts of
testing
Strong solution-finding experience
Strong verbal and written
communicator.
Get In Touch
Find out how this one-off
opportunity can help you achieve
your career goals by making an
application to our knowledgeable
and friendly Talent Acquisition
team. Choose a new path with SHL.
#CareersAtSHL #SHLHiringTalent
#TechnologyJobs
#QualityAssuranceJobs
#CareerOpportunities
#JobOpportunities
About Us
We unlock the possibilities of
businesses through the power of
people, science and technology.
We started this industry of people
insight more than 40 years ago and
continue to lead the market with
powerhouse product launches,
ground-breaking science and
business transformation.
When you inspire and transform
people’s lives, you will experience
the greatest business outcomes
possible. SHL’s products insights,
experiences, and services can help
achieve growth at scale. 
What SHL Can Offer You
Diversity, equity, inclusion and
accessibility are key threads in the
fabric of SHL’s business and culture
(find out more about DEI and
accessibility at SHL )
Employee benefits package that
takes care of you and your family.
Support, coaching, and on-the-job
development to achieve career
success
A fun and flexible workplace where
you’ll be inspired to do your best
work (find out more LifeAtSHL )
The ability to transform workplaces
around the world for others.
SHL is an equal opportunity
employer. We support and
encourage applications from a
diverse range of candidates. We
can, and do make adjustments to
make sure our recruitment process
is as inclusive as possible.
SHL is an equal opportunity
employer.
    """,
    
    "ICICI Bank Assistant Admin, Experience required 0-2 years, test should be 30-40 mins long", 
    
    """"
    KEY RESPONSIBITILES:
Manage the sound-scape of the station through appropriate
creative and marketing
interventions to Increase or
Maintain the listenership
Acts as an interface between
Programming & sales team, thereby
supporting the sales team by
providing creative inputs in order to
increase the overall ad spends by
clients
Build brand Mirchi by ideating fresh
programming initiatives on air
campaigns, programming led onground events & new properties to
ensure brand differentiation & thus
increase brand recall at station level
Invest time in local RJs to grow &
develop them as local celebrities
Through strong networking, must
focus on identifying the best of local
talent and ensure to bring the
creative minds from the market on
board with Mirchi
Build radio as a category for both
listeners & advertisers
People Management
Identifying the right talent and
investing time in developing them
by frequent feedback on their
performance
Monitor, Coach and mentor team
members on a regular basis
Development of Jocks as per
guidelines
Must have an eye to spot the local
talent to fill up vacancies locally
TECHNICAL SKILLS &
QUALIFICATION REQUIRED:
Graduation / Post Graduation (Any
specialisation) with 8 -12 years of
relevant experience
Experience in digital content
conceptualisation
Strong branding focus
Must be well-read in variety of
areas and must keep up with the
latest events in the city / cluster /
country
Must know to read, write & speak
English
PERSONAL ATTRIBUTES:
Excellent communication skills
Good interpersonal skills
People management
Suggest me some tests for the
above jd. The duration should be at
most 90 mins
    """]

outs = [
    ["https://www.shl.com/solutions/products/product-catalog/view/automata-fix-new/", "https://www.shl.com/solutions/products/product-catalog/view/core-java-entry-level-new/", "https://www.shl.com/solutions/products/product-catalog/view/java-8-new/", "https://www.shl.com/solutions/products/product-catalog/view/core-java-advanced-level-new/", "https://www.shl.com/solutions/products/product-catalog/view/agile-software-development/", "https://www.shl.com/solutions/products/product-catalog/view/technology-professional-8-0-job-focused-assessment/", "https://www.shl.com/solutions/products/product-catalog/view/computer-science-new/"],
    
    ["https://www.shl.com/solutions/products/product-catalog/view/entry-level-sales-7-1/", "https://www.shl.com/solutions/products/product-catalog/view/entry-level-sales-sift-out-7-1/", "https://www.shl.com/solutions/products/product-catalog/view/entry-level-sales-solution/", "https://www.shl.com/solutions/products/product-catalog/view/sales-representative-solution/", "https://www.shl.com/solutions/products/product-catalog/view/sales-support-specialist-solution/", "https://www.shl.com/solutions/products/product-catalog/view/technical-sales-associate-solution/", "https://www.shl.com/solutions/products/product-catalog/view/svar-spoken-english-indian-accent-new/", "https://www.shl.com/solutions/products/product-catalog/view/sales-and-service-phone-solution/", "https://www.shl.com/solutions/products/product-catalog/view/sales-and-service-phone-simulation/", "https://www.shl.com/solutions/products/product-catalog/view/english-comprehension-new/"],
    
    ["https://www.shl.com/solutions/products/product-catalog/view/motivation-questionnaire-mqm5/", "https://www.shl.com/solutions/products/product-catalog/view/global-skills-assessment/", "https://www.shl.com/solutions/products/product-catalog/view/graduate-8-0-job-focused-assessment-4228/"],

    ["https://www.shl.com/solutions/products/product-catalog/view/drupal-new/", "https://www.shl.com/solutions/products/product-catalog/view/search-engine-optimization-new/", "https://www.shl.com/solutions/products/product-catalog/view/administrative-professional-short-form/", "https://www.shl.com/solutions/products/product-catalog/view/entry-level-sales-sift-out-7-1/", "https://www.shl.com/solutions/products/product-catalog/view/general-entry-level-data-entry-7-0-solution/"],

    ["https://www.shl.com/solutions/products/product-catalog/view/automata-selenium/", "https://www.shl.com/solutions/products/product-catalog/view/automata-fix-new/", "https://www.shl.com/solutions/products/product-catalog/view/automata-front-end/", "https://www.shl.com/solutions/products/product-catalog/view/javascript-new/", "https://www.shl.com/solutions/products/product-catalog/view/htmlcss-new/", "https://www.shl.com/solutions/products/product-catalog/view/html5-new/", "https://www.shl.com/solutions/products/product-catalog/view/css3-new/", "https://www.shl.com/solutions/products/product-catalog/view/selenium-new/", "https://www.shl.com/solutions/products/product-catalog/view/sql-server-new/", "https://www.shl.com/solutions/products/product-catalog/view/automata-sql-new/", "https://www.shl.com/solutions/products/product-catalog/view/manual-testing-new/"],

    ["https://www.shl.com/solutions/products/product-catalog/view/administrative-professional-short-form/", "https://www.shl.com/solutions/products/product-catalog/view/verify-numerical-ability/", "https://www.shl.com/solutions/products/product-catalog/view/financial-professional-short-form/", "https://www.shl.com/solutions/products/product-catalog/view/bank-administrative-assistant-short-form/", "https://www.shl.com/solutions/products/product-catalog/view/general-entry-level-data-entry-7-0-solution/", "https://www.shl.com/solutions/products/product-catalog/view/basic-computer-literacy-windows-10-new/"],

    ["https://www.shl.com/solutions/products/product-catalog/view/verify-verbal-ability-next-generation/", "https://www.shl.com/solutions/products/product-catalog/view/shl-verify-interactive-inductive-reasoning/", "https://www.shl.com/solutions/products/product-catalog/view/occupational-personality-questionnaire-opq32r/"]
]

# Drop the "/solutions" path segment so the ground-truth URLs match the URLs returned by the search functions.
outs = [[url.replace("/solutions/", "/") for url in urls] for urls in outs]

outs

[['https://www.shl.com/products/product-catalog/view/automata-fix-new/',
  'https://www.shl.com/products/product-catalog/view/core-java-entry-level-new/',
  'https://www.shl.com/products/product-catalog/view/java-8-new/',
  'https://www.shl.com/products/product-catalog/view/core-java-advanced-level-new/',
  'https://www.shl.com/products/product-catalog/view/agile-software-development/',
  'https://www.shl.com/products/product-catalog/view/technology-professional-8-0-job-focused-assessment/',
  'https://www.shl.com/products/product-catalog/view/computer-science-new/'],
 ['https://www.shl.com/products/product-catalog/view/entry-level-sales-7-1/',
  'https://www.shl.com/products/product-catalog/view/entry-level-sales-sift-out-7-1/',
  'https://www.shl.com/products/product-catalog/view/entry-level-sales-solution/',
  'https://www.shl.com/products/product-catalog/view/sales-representative-solution/',
  'https://www.shl.com/products/product-catalog/view/sales-support-specialist-solution/',
 

In [31]:
def average_precision_at_k(predicted_names, actual_names, k):
    if not actual_names:
        return 0.0

    predicted_names = predicted_names[:k]
    score = 0.0
    num_hits = 0.0

    for i, pred in enumerate(predicted_names):
        if pred in actual_names:
            num_hits += 1.0
            score += num_hits / (i + 1.0)

    return score / min(len(actual_names), k)


def mean_average_precision_at_k(predictions, ground_truth, k):
    all_ap = []
    for pred, actual in zip(predictions, ground_truth):
        ap = average_precision_at_k(pred, actual, k)
        all_ap.append(ap)
    return sum(all_ap) / len(all_ap)

def recall_at_k(predicted_names, actual_names, k):
    if not actual_names:
        return 0.0
    predicted_names = predicted_names[:k]
    hits = sum(1 for pred in predicted_names if pred in actual_names)
    return hits / len(actual_names)


def mean_recall_at_k(predictions, ground_truth, k):
    all_recalls = []
    for pred, actual in zip(predictions, ground_truth):
        rec = recall_at_k(pred, actual, k)
        all_recalls.append(rec)
    return sum(all_recalls) / len(all_recalls)

parsed_queries = [generate_and_parse_query(x)['query'] for x in queries]  # parse each query once so both retrievers see the same text
bm25_results = [list(bm25_search(q)['url']) for q in parsed_queries]
semantic_results = [list(semantic_search(q)['url']) for q in parsed_queries]
print(mean_recall_at_k(bm25_results, outs, 10))
print(mean_average_precision_at_k(bm25_results, outs, 10))
print(mean_recall_at_k(semantic_results, outs, 10))
print(mean_average_precision_at_k(semantic_results, outs, 10))

0.21521335807050093
0.1421201814058957
0.20754483611626465
0.1516893424036281
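
As a sanity check on the metric definitions above, the following minimal sketch recomputes AP@k and Recall@k for a single toy query. The item labels (`a`, `b`, ...) are made up for illustration, and the two functions are compact restatements of the ones defined in the cell above, not a replacement for them.

```python
def average_precision_at_k(predicted, actual, k):
    """AP@k: sum of precision values at each rank holding a relevant item,
    divided by min(number of relevant items, k)."""
    if not actual:
        return 0.0
    score, hits = 0.0, 0
    for rank, pred in enumerate(predicted[:k], start=1):
        if pred in actual:
            hits += 1
            score += hits / rank
    return score / min(len(actual), k)


def recall_at_k(predicted, actual, k):
    """Recall@k: fraction of relevant items found in the top k predictions."""
    if not actual:
        return 0.0
    hits = sum(1 for pred in predicted[:k] if pred in actual)
    return hits / len(actual)


# Toy data (hypothetical): relevant items are a, b, c; the system returns a, x, b, y.
predicted = ["a", "x", "b", "y"]
actual = ["a", "b", "c"]

ap = average_precision_at_k(predicted, actual, k=10)
rec = recall_at_k(predicted, actual, k=10)
print(round(ap, 4), round(rec, 4))
```

Hits occur at ranks 1 and 3, so AP@10 = (1/1 + 2/3) / 3 ≈ 0.556 and Recall@10 = 2/3 ≈ 0.667. The missing item `c` lowers both scores, while the irrelevant item at rank 2 lowers only AP.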


### Without Gemini API

In [32]:
bm25_results = [list(bm25_search(x)['url']) for x in queries]
semantic_results = [list(semantic_search(x)['url']) for x in queries]
print(mean_recall_at_k(bm25_results, outs, 10))
print(mean_average_precision_at_k(bm25_results, outs, 10))
print(mean_recall_at_k(semantic_results, outs, 10))
print(mean_average_precision_at_k(semantic_results, outs, 10))

0.14557823129251699
0.0707482993197279
0.22183055040197894
0.10915532879818593


### With RRF

In [33]:
def rrf_fusion(df1, df2, rrf_k=20):
    """Fuse two ranked result frames with Reciprocal Rank Fusion and return the top 10 URLs."""
    scores = {}

    # Each ranked list contributes 1 / (rrf_k + rank) per URL, with rank starting at 1.
    for df in (df1, df2):
        for rank, url in enumerate(df['url'], start=1):
            scores[url] = scores.get(url, 0) + 1 / (rrf_k + rank)

    ranked_urls = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [url for url, _ in ranked_urls[:10]]


In [35]:
parsed_queries = [generate_and_parse_query(x)['query'] for x in queries]  # parse each query once so both retrievers see the same text
results = [rrf_fusion(semantic_search(q, 60), bm25_search(q, 60)) for q in parsed_queries]
results

[['https://www.shl.com/products/product-catalog/view/java-frameworks-new/',
  'https://www.shl.com/products/product-catalog/view/java-8-new/',
  'https://www.shl.com/products/product-catalog/view/core-java-advanced-level-new/',
  'https://www.shl.com/products/product-catalog/view/core-java-entry-level-new/',
  'https://www.shl.com/products/product-catalog/view/java-platform-enterprise-edition-7-java-ee-7/',
  'https://www.shl.com/products/product-catalog/view/java-web-services-new/',
  'https://www.shl.com/products/product-catalog/view/java-design-patterns-new/',
  'https://www.shl.com/products/product-catalog/view/java-2-platform-enterprise-edition-1-4-fundamental/',
  'https://www.shl.com/products/product-catalog/view/enterprise-java-beans-new/',
  'https://www.shl.com/products/product-catalog/view/ai-skills/'],
 ['https://www.shl.com/products/product-catalog/view/contact-center-sales-and-service-8-0/',
  'https://www.shl.com/products/product-catalog/view/sales-and-service-phone-solu

In [36]:
print(mean_recall_at_k(results, outs, 10))
print(mean_average_precision_at_k(results, outs, 10))

0.2975881261595547
0.2023469387755102
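
To show how the fused scores arise, here is a minimal sketch of Reciprocal Rank Fusion on plain Python lists. The labels `a` to `d` are hypothetical stand-ins for catalog URLs, and `rrf_scores` is an illustrative helper rather than the notebook's `rrf_fusion`; it uses the same constant, `rrf_k=20`.

```python
def rrf_scores(rankings, rrf_k=20):
    """Sum 1 / (rrf_k + rank) over every ranked list an item appears in (rank starts at 1)."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (rrf_k + rank)
    return scores


# Toy rankings (hypothetical labels, not real catalog URLs).
bm25_ranking = ["a", "b", "c"]
semantic_ranking = ["b", "d", "a"]

scores = rrf_scores([bm25_ranking, semantic_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)
```

In this toy case `b` comes out first because it sits near the top of both lists (1/22 + 1/21), `a` follows (1/21 + 1/23), and `d` and `c` trail because each appears in only one list.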


## Filtering + RRF

In [37]:
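def filter_by_duration(df, max_duration):
    # A missing, non-numeric, or non-positive duration means "no limit": return df unfiltered.
    try:
        max_duration = int(max_duration)
        if max_duration <= 0:
            return df
        return df[df['duration'] <= max_duration]
    except (TypeError, ValueError):
        return df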

def filter_by_test_type(df, target_types):
    def overlap(row_types, target):
        if isinstance(row_types, str):
            row_types = [t.strip() for t in row_types.split(",")]
        return len(set(row_types).intersection(set(target))) > 0

    return df[df['test_types'].apply(lambda x: overlap(x, target_types))]

In [39]:
bm, sm = [], []

for x in queries:
    q = generate_and_parse_query(x)
    query_text = q['query']
    duration = q['duration']
    test_types = q['test_types']

    bm_df = bm25_search(query_text, 20)
    sm_df = semantic_search(query_text, 20)

    bm_df = filter_by_duration(bm_df, duration)
    bm_df = filter_by_test_type(bm_df, test_types)

    sm_df = filter_by_duration(sm_df, duration)
    sm_df = filter_by_test_type(sm_df, test_types)

    bm.append(list(bm_df['url'])[:10])
    sm.append(list(sm_df['url'])[:10])

print(mean_recall_at_k(bm, outs, 10))
print(mean_average_precision_at_k(bm, outs, 10))
print(mean_recall_at_k(sm, outs, 10))
print(mean_average_precision_at_k(sm, outs, 10))


0.20012368583797155
0.11808390022675737
0.2586270871985158
0.1897675736961451


In [40]:
rrf = []

for x in queries:
    q = generate_and_parse_query(x)
    bm_df = filter_by_test_type(filter_by_duration(bm25_search(q['query'], 60), q['duration']), q['test_types'])
    sm_df = filter_by_test_type(filter_by_duration(semantic_search(q['query'], 60), q['duration']), q['test_types'])

    rrf.append(rrf_fusion(bm_df, sm_df))

print(mean_recall_at_k(rrf, outs, 10))
print(mean_average_precision_at_k(rrf, outs, 10))

0.2941249226963513
0.18679138321995462


## Two-Stage (BM25-SBERT) Hybrid Retrieval System

In [47]:
from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi  # used below; imported here so the cell runs on its own
import numpy as np
import faiss
import pandas as pd
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
df = pd.read_pickle('df_bm25_tokenized.pkl')
model = SentenceTransformer('all-mpnet-base-v2')

def preprocess(text):
    # Lowercase, replace punctuation with spaces, then tokenize and drop stopwords.
    text = re.sub(r"[^a-z0-9\s]", " ", str(text).lower())
    tokens = word_tokenize(text)
    return [word for word in tokens if word.isalnum() and word not in stop_words]

bm25 = BM25Okapi(df['bm_tokens'].tolist())

def bm25_search(query, top_n=50):
    query_tokens = preprocess(query)
    scores = bm25.get_scores(query_tokens)
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]
    return df.iloc[top_indices].copy()

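def semantic_rerank(bm25_results, query, top_k=10):
    # Filtering can leave no candidates; return the empty frame instead of encoding nothing.
    if bm25_results.empty:
        return bm25_results

    texts = (bm25_results['title'] + ' ' + bm25_results['description']).tolist()

    # With normalized embeddings, the dot product equals cosine similarity.
    doc_embeddings = model.encode(texts, normalize_embeddings=True)
    query_embedding = model.encode([query], normalize_embeddings=True)

    scores = np.dot(doc_embeddings, query_embedding[0])
    sorted_indices = np.argsort(scores)[::-1][:top_k]

    return bm25_results.iloc[sorted_indices]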

def hybrid_search(query, maxdur, test_types, bm25_top_n=50, final_top_k=10):
    bm25_results = filter_by_test_type(filter_by_duration(bm25_search(query, top_n=bm25_top_n), maxdur), test_types)
    return semantic_rerank(bm25_results, query, top_k=final_top_k)

In [51]:
res = []
for x in queries:
    q = generate_and_parse_query(x)
    res.append(list(hybrid_search(q['query'], q['duration'], q['test_types'])['url']))

print(mean_recall_at_k(res, outs, 10))
print(mean_average_precision_at_k(res, outs, 10))

0.2954854669140384
0.20915532879818596
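
The reranking step reduces to sorting candidates by cosine similarity to the query. The sketch below shows that arithmetic in pure Python, assuming tiny 2-D vectors in place of real SBERT embeddings; the names `doc_a`, `doc_b`, `doc_c` and the helper functions are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rerank(candidates, query_vec, top_k=2):
    """Order (name, vector) candidates by cosine similarity to the query vector."""
    q = normalize(query_vec)
    scored = [
        (name, sum(a * b for a, b in zip(normalize(vec), q)))
        for name, vec in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 2-D "embeddings" standing in for SBERT vectors of BM25 candidates.
candidates = [("doc_a", [1.0, 0.0]), ("doc_b", [1.0, 1.0]), ("doc_c", [0.0, 1.0])]
query_vec = [0.9, 1.1]

top = rerank(candidates, query_vec)
print([name for name, _ in top])
```

Here `doc_b` points in nearly the same direction as the query and ranks first, followed by `doc_c`. The order in which the first stage returned the candidates plays no role once they are rescored.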


# Result
I evaluated two ways of combining the retrievers: Reciprocal Rank Fusion (RRF) over filtered BM25 and SBERT results, and a two-stage hybrid pipeline that reranks filtered BM25 candidates using SBERT embeddings. Recall@10 was nearly identical (0.294 for RRF vs. 0.295 for the hybrid), but the hybrid achieved a higher MAP@10 (0.209 vs. 0.187), meaning it places relevant assessments nearer the top of the list. The hybrid encodes every BM25 candidate at query time, which adds latency, so I chose not to use it in the real-time deployment. Instead, I opted for the RRF approach, as it offers a more practical trade-off between relevance and responsiveness.
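
The notebook does not report timings, so the following is only a sketch of how the latency of two pipelines could be compared. `median_latency_ms` and `dummy_search` are hypothetical names; in practice `dummy_search` would be replaced by a callable wrapping the RRF or hybrid pipeline.

```python
import time


def median_latency_ms(search_fn, queries, repeats=3):
    """Return the median (upper middle for even counts) wall-clock time per call, in milliseconds."""
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[len(timings) // 2]


# Stand-in for a real pipeline; replace with the RRF or hybrid search callable.
def dummy_search(query):
    return sorted(query.split())


latency = median_latency_ms(dummy_search, ["java developer", "sales associate"])
print(f"median latency: {latency:.3f} ms")
```

Reporting the median rather than the mean keeps a single slow call (for example, a cold model load) from dominating the comparison.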

# Conclusion
The best configurations achieve a Recall@10 of **~0.30** and a MAP@10 of **~0.20-0.21** on the official test set: RRF without filtering scores 0.298 / 0.202, and the BM25-to-SBERT reranker scores 0.295 / 0.209. The final setup combines BM25 and SBERT-based semantic search, filters candidates by predicted test types and duration, and blends the two rankings with Reciprocal Rank Fusion (RRF); it scores 0.294 / 0.187. With only seven test queries, these metrics carry noticeable variance, so small differences between configurations should be read with caution.

While there’s room for improvement, the system performs reasonably well without any supervised training or fine-tuning and adheres to the input-output constraints provided.

## Potential Improvements
### Model Fine-tuning
Fine-tuning the embedding model (e.g., SBERT) on task-specific job description–assessment pairs could improve semantic relevance. A domain-adapted model would likely outperform the general-purpose `all-mpnet-base-v2` embeddings used here.

### Supervised Re-ranking
Training a lightweight ranking model on known job-assessment pairs (if available) would allow the system to learn patterns that unsupervised methods miss.

### Reinforcement Learning
Where labeled training data is scarce, a reinforcement learning approach could iteratively optimize ranking quality, using user feedback as the reward signal, or Recall@K / MAP@K where evaluation samples exist.

### Better Query Parsing and Extraction
More accurate parsing of the job description (or query) to extract constraints such as required skills, duration, or seniority level could improve filtering and downstream match quality.
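
As one illustration, a rule-based fallback could recover the duration constraint when the LLM parser returns nothing usable. The sketch below is a heuristic under stated assumptions, not the notebook's parser: `extract_max_duration` and its regular expression are hypothetical, and it simply takes the largest duration mentioned, using the upper bound of a range.

```python
import re

# Matches "40 mins", "30-40 minutes", "1 hour", "1.5 hrs"; for a range, only the upper bound is captured.
_DURATION_RE = re.compile(
    r"(?:\d+(?:\.\d+)?\s*(?:-|to)\s*)?(\d+(?:\.\d+)?)\s*(hours?|hrs?|minutes?|mins?)\b",
    re.IGNORECASE,
)


def extract_max_duration(text):
    """Return the largest duration mentioned in the text, in minutes, or None if none is found."""
    durations = []
    for value, unit in _DURATION_RE.findall(text):
        minutes = float(value)
        if unit.lower().startswith("h"):
            minutes *= 60
        durations.append(int(minutes))
    return max(durations) if durations else None


print(extract_max_duration("Experience required 0-2 years, test should be 30-40 mins long"))
print(extract_max_duration("The duration should be at most 90 mins"))
```

For these two inputs the function returns 40 and 90; the "0-2 years" phrase is ignored because only hour and minute units are matched. A phrase such as "40 hours per week" would be misread as a test duration, so a rule like this is better suited as a fallback than as the primary parser.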

### Ensemble with LLM-based Reranker
Using a small LLM to re-rank the top-k retrieved assessments, with a short justification for each choice, could improve final precision at the cost of added latency.