# SHL – Assessment Recommendation Engine (Option 1)

This notebook builds a **hybrid recommender** that suggests the most relevant SHL assessments for a given **job title / skills / job description**.

**Outputs**
- Top-K recommended assessments
- Similarity score
- Short explanation (“why recommended”)



## 0) Install & imports

In [1]:
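# If running locally, install deps once:
# !pip install pandas numpy scikit-learn requests beautifulsoup4 lxml
# Section 5 (optional API + submission cell) additionally needs:
# !pip install fastapi uvicorn faiss-cpu sentence-transformers openpyxl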

import re
import json
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


### Add category + skills (light manual tagging)

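Even if you scrape name/description, **add `category` and `skills` yourself**.
Both fields are folded into the searchable text and drive the rule boosts, so careful tagging directly improves recommendation quality and shows you understand SHL use-cases.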



Manual CSV (recommended if you want full control)

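Create `data/catalog.csv` with 30–80 assessments, picking items from the SHL Product Catalog.
The later cells read the columns `assessment_id`, `name`, `url`, `description`, `job_levels`, `category` and `skills`. Then load the file here.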
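Before loading, it can help to check that a hand-built file has the columns the later cells read. The sketch below assumes the seven column names that appear in the catalog preview; `validate_catalog_rows` is a hypothetical helper, not part of the notebook, and uses only the standard library.

```python
import csv
import io

REQUIRED_COLUMNS = ["assessment_id", "name", "url", "description",
                    "job_levels", "category", "skills"]

def validate_catalog_rows(csv_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file looks usable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems  # the row checks below need every column to be present

    seen_ids = set()
    for row_no, row in enumerate(reader, start=1):  # data rows, header not counted
        aid = (row["assessment_id"] or "").strip()
        if not aid:
            problems.append(f"row {row_no}: empty assessment_id")
        elif aid in seen_ids:
            problems.append(f"row {row_no}: duplicate assessment_id {aid}")
        seen_ids.add(aid)
        # These fields feed the searchable text and the rule boosts
        for col in ("name", "category", "skills"):
            if not (row[col] or "").strip():
                problems.append(f"row {row_no}: empty {col}")
    return problems

# Tiny in-memory example (made-up rows, not real catalog data)
sample = (
    "assessment_id,name,url,description,job_levels,category,skills\n"
    'X001,Example Test,https://example.com/a,Some text,All Levels,Cognitive,"reasoning, numerical"\n'
    "X001,Another Test,https://example.com/b,Other text,Entry Level,,typing\n"
)
print(validate_catalog_rows(sample))
```

To check the real file, pass its contents, for example `validate_catalog_rows(open("data/catalog.csv", encoding="utf-8").read())`.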


In [3]:
import os

# Path where you'll keep catalog.csv
CATALOG_PATH = "data/catalog.csv"

os.makedirs("data", exist_ok=True)

# If you built 'df' from scraping, save it as starter CSV
if 'df' in globals() and isinstance(df, pd.DataFrame) and len(df)>0:
    df.to_csv(CATALOG_PATH, index=False)
    print("Saved starter catalog to", CATALOG_PATH)

# Load catalog
catalog = pd.read_csv(CATALOG_PATH)
catalog.head()


Unnamed: 0,assessment_id,name,url,description,job_levels,category,skills
0,A001,Realistic Job and Culture Previews (RJP),https://www.shl.com/products/assessments/behav...,Scenario-based previews that help candidates u...,All Levels,Behavioral,"role preview, culture fit, engagement, expecta..."
1,A002,Situational Judgement Tests (SJT),https://www.shl.com/products/assessments/behav...,Interactive scenarios to assess judgement and ...,All Levels,Behavioral,"situational judgement, decision making, behavi..."
2,A003,Universal Competency Framework (UCF),https://www.shl.com/products/assessments/behav...,Framework mapping role requirements to behavio...,All Levels,Behavioral,"competency framework, behavioral competencies,..."
3,A004,Virtual Assessment & Development Centers,https://www.shl.com/products/assessments/asses...,End-to-end digital assessment and development ...,All Levels,Virtual Assessment Center,"assessment center, simulations, dashboards, ca..."
4,A005,Occupational Personality Questionnaire (OPQ),https://www.shl.com/products/assessments/perso...,Personality assessment measuring working prefe...,All Levels,Personality,"work preferences, behavioral style, teamwork, ..."


## 2) Build the recommender (TF‑IDF + cosine similarity + small rule boosts)
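
In symbols, the score that the code below assigns to assessment $i$ for a query $q$ is the TF‑IDF cosine similarity plus the rule boost (the notation $\mathbf{q}$, $\mathbf{d}_i$, $b_i$ is introduced here for illustration only):

$$
\text{score}_i(q) = \frac{\mathbf{q}\cdot\mathbf{d}_i}{\lVert\mathbf{q}\rVert\,\lVert\mathbf{d}_i\rVert} + b_i(q)
$$

Here $\mathbf{q}$ and $\mathbf{d}_i$ are the TF‑IDF vectors of the query and of the assessment's combined `doc` text, and $b_i(q)$ is the value returned by `rule_boost`. Because the boost is additive, an assessment with no term overlap at all can still end up with a positive score.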

In [4]:
def normalize_text(s: str) -> str:
    s = s or ""
    s = re.sub(r"\s+", " ", s.strip())
    return s

# Combine fields into one searchable document
catalog = catalog.fillna("")
catalog["doc"] = (
    catalog["name"].astype(str) + " \n" +
    catalog["description"].astype(str) + " \n" +
    catalog["skills"].astype(str) + " \n" +
    catalog["category"].astype(str) + " \n" +
    catalog["job_levels"].astype(str)
).map(normalize_text)

vectorizer = TfidfVectorizer(
    stop_words="english",
    ngram_range=(1,2),
    min_df=1,
    max_features=40000
)

X = vectorizer.fit_transform(catalog["doc"])  # matrix of assessments

def rule_boost(query: str, row: pd.Series) -> float:
    """Small additive boost based on simple heuristics."""
    q = (query or "").lower()
    cat = (row.get("category", "") or "").lower()
    skills = (row.get("skills", "") or "").lower()
    lvl = (row.get("job_levels", "") or "").lower()
    
    boost = 0.0
    
    # Leadership / seniority hints
    if any(k in q for k in ["manager", "lead", "leadership", "stakeholder", "strategy"]):
        if any(k in cat for k in ["behavior", "sjt", "personality"]):
            boost += 0.03
    
    # Data / analytics hints
    if any(k in q for k in ["data", "analytics", "sql", "python", "statistics", "ml", "machine learning"]):
        if any(k in skills for k in ["problem", "reasoning", "analytical", "numerical", "data"]):
            boost += 0.03
        if "cognitive" in cat:
            boost += 0.02
    
    # Communication / language hints
    if any(k in q for k in ["english", "communication", "writing", "grammar"]):
        if "language" in cat or "english" in skills:
            boost += 0.05
    
    # Entry level hints
    if any(k in q for k in ["fresher", "entry", "junior", "graduate"]):
        if "entry" in lvl:  # lvl is already lower-cased above
            boost += 0.03
    
    return boost

def top_contributing_terms(query: str, row_doc: str, top_n: int = 6):
    """Explain recommendation via overlapping TF-IDF terms (simple, fast)."""
    q_vec = vectorizer.transform([query])
    r_vec = vectorizer.transform([row_doc])
    # elementwise product -> contributions
    contrib = q_vec.multiply(r_vec)
    if contrib.nnz == 0:
        return []
    # get top indices by weight
    coo = contrib.tocoo()
    pairs = sorted(zip(coo.col, coo.data), key=lambda x: x[1], reverse=True)[:top_n]
    feature_names = vectorizer.get_feature_names_out()  # build the vocabulary array once, not per term
    return [feature_names[i] for i, _ in pairs]

def recommend(job_title: str = "", skills=None, job_description: str = "", top_k: int = 10):
    skills = skills or []
    query = " ".join([job_title, job_description, " ".join(skills)]).strip()
    if not query:
        raise ValueError("Provide at least job_title or skills or job_description.")
    
    q_vec = vectorizer.transform([query])
    sims = cosine_similarity(q_vec, X).ravel()
    
    # Apply rule boosts
    boosts = np.array([rule_boost(query, catalog.iloc[i]) for i in range(len(catalog))])
    final = sims + boosts
    
    idx = np.argsort(-final)[:top_k]
    out = catalog.iloc[idx][["assessment_id","name","url","category","job_levels","skills","description"]].copy()
    out["score"] = final[idx]
    out["why"] = [", ".join(top_contributing_terms(query, catalog.iloc[i]["doc"])) for i in idx]
    return out.reset_index(drop=True)

# Quick sanity check
recommend(job_title="Data Analyst Intern", skills=["SQL","Excel","Python","statistics"], job_description="" , top_k=5)


Unnamed: 0,assessment_id,name,url,category,job_levels,skills,description,score,why
0,A008,Technical Skills Assessments,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"technical skills, cloud, data engineering, devops",MCQ-based assessments covering 200+ technical ...,0.17412,data
1,A007,SHL Cognitive Assessments,https://www.shl.com/products/assessments/cogni...,Cognitive,All Levels,"cognitive ability, reasoning, learning potential",Measures learning potential reasoning and abil...,0.05,
2,A009,Coding Simulations,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"coding, algorithms, problem solving",Real-life coding problems solved in an online ...,0.03,
3,A018,Job-Focused Assessment – Professional Roles,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Mid Level,"professional skills, problem solving, reskilling",Evaluates job-specific skills and reskilling p...,0.03,
4,A001,Realistic Job and Culture Previews (RJP),https://www.shl.com/products/assessments/behav...,Behavioral,All Levels,"role preview, culture fit, engagement, expecta...",Scenario-based previews that help candidates u...,0.0,
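
In the table above, rows 1–3 have an empty `why` and scores of exactly 0.05 or 0.03: their cosine similarity is zero, so the score comes only from `rule_boost`. If that is not desired, one option is to rank only items that already share some text with the query. The sketch below shows the idea on plain lists; `rank_with_floor` and its `min_sim` parameter are hypothetical and not part of the notebook's `recommend`.

```python
def rank_with_floor(sims, boosts, top_k=5, min_sim=0.0):
    """Rank items by sim + boost, dropping items whose similarity is not above min_sim.

    sims and boosts are equal-length sequences of floats (one entry per catalog row).
    Returns a list of (row_index, final_score) pairs, best first.
    """
    if len(sims) != len(boosts):
        raise ValueError("sims and boosts must have the same length")
    scored = [
        (i, s + b)
        for i, (s, b) in enumerate(zip(sims, boosts))
        if s > min_sim  # a boost alone is not enough to be recommended
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Illustrative numbers only: items 0 and 3 overlap with the query, items 1 and 2 do not.
sims = [0.14, 0.0, 0.0, 0.02]
boosts = [0.03, 0.05, 0.03, 0.0]
print(rank_with_floor(sims, boosts))
```

With the default floor, items 1 and 2 are dropped even though their boosted scores would outrank item 3. Whether that trade-off is right depends on how much the hand-written rules are trusted relative to the text match.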


## 3) Demo queries (put 3–5 of these in your README screenshots)

In [5]:
queries = [
    dict(job_title="Research AI Intern", skills=["NLP","machine learning","python","experimentation"],
         job_description="Build models, evaluate trade-offs, explain results", top_k=10),
    dict(job_title="Customer Support Associate", skills=["English communication","typing","attention to detail"],
         job_description="Handle chats and emails, quality writing", top_k=10),
    dict(job_title="Sales Executive", skills=["communication","negotiation","customer handling"],
         job_description="Meet targets, handle objections", top_k=10)
]

for q in queries:
    print("\n===", q['job_title'], "===")
    display(recommend(**q).head(5))



=== Research AI Intern ===


Unnamed: 0,assessment_id,name,url,category,job_levels,skills,description,score,why
0,A007,SHL Cognitive Assessments,https://www.shl.com/products/assessments/cogni...,Cognitive,All Levels,"cognitive ability, reasoning, learning potential",Measures learning potential reasoning and abil...,0.226355,learning
1,A017,Job-Focused Assessment – Graduate Roles,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"learning agility, potential, adaptability",Measures learning agility and future potential...,0.193801,learning
2,A010,Language Proficiency Tests,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"spoken english, written english, grammar, fluency",AI-powered assessments of spoken and written l...,0.111999,ai
3,A008,Technical Skills Assessments,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"technical skills, cloud, data engineering, devops",MCQ-based assessments covering 200+ technical ...,0.03,
4,A009,Coding Simulations,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"coding, algorithms, problem solving",Real-life coding problems solved in an online ...,0.03,



=== Customer Support Associate ===


Unnamed: 0,assessment_id,name,url,category,job_levels,skills,description,score,why
0,A010,Language Proficiency Tests,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"spoken english, written english, grammar, fluency",AI-powered assessments of spoken and written l...,0.192152,english
1,A014,Job-Focused Assessment – Contact Center,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"customer service, resilience, communication",Predicts success in contact center roles using...,0.12845,"communication, customer"
2,A016,Job-Focused Assessment – Manufacturing,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"safety, reliability, quality focus",Assesses reliability safety awareness and qual...,0.120558,quality
3,A011,Call Center Simulations,https://www.shl.com/products/assessments/skill...,Skills & Simulations,Entry Level,"contact center, customer service, task simulation",Simulated contact center environment to assess...,0.105242,"quality, customer"
4,A015,Job-Focused Assessment – Retail,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"customer focus, adaptability, teamwork",Evaluates customer focus and adaptability for ...,0.097993,customer



=== Sales Executive ===


Unnamed: 0,assessment_id,name,url,category,job_levels,skills,description,score,why
0,A014,Job-Focused Assessment – Contact Center,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"customer service, resilience, communication",Predicts success in contact center roles using...,0.211262,"communication, customer"
1,A015,Job-Focused Assessment – Retail,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"customer focus, adaptability, teamwork",Evaluates customer focus and adaptability for ...,0.16117,customer
2,A011,Call Center Simulations,https://www.shl.com/products/assessments/skill...,Skills & Simulations,Entry Level,"contact center, customer service, task simulation",Simulated contact center environment to assess...,0.077528,customer
3,A010,Language Proficiency Tests,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"spoken english, written english, grammar, fluency",AI-powered assessments of spoken and written l...,0.05,
4,A017,Job-Focused Assessment – Graduate Roles,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"learning agility, potential, adaptability",Measures learning agility and future potential...,0.0,


## 4) Export results (optional)

Some recruiters like a downloadable CSV of recommendations for sample roles.


In [6]:
out = recommend(job_title="Data Analyst Intern", skills=["SQL","Excel","Python","statistics"], top_k=10)
out.to_csv("sample_recommendations.csv", index=False)
print("Saved: sample_recommendations.csv")
out


Saved: sample_recommendations.csv


Unnamed: 0,assessment_id,name,url,category,job_levels,skills,description,score,why
0,A008,Technical Skills Assessments,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"technical skills, cloud, data engineering, devops",MCQ-based assessments covering 200+ technical ...,0.17412,data
1,A007,SHL Cognitive Assessments,https://www.shl.com/products/assessments/cogni...,Cognitive,All Levels,"cognitive ability, reasoning, learning potential",Measures learning potential reasoning and abil...,0.05,
2,A009,Coding Simulations,https://www.shl.com/products/assessments/skill...,Skills & Simulations,All Levels,"coding, algorithms, problem solving",Real-life coding problems solved in an online ...,0.03,
3,A018,Job-Focused Assessment – Professional Roles,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Mid Level,"professional skills, problem solving, reskilling",Evaluates job-specific skills and reskilling p...,0.03,
4,A001,Realistic Job and Culture Previews (RJP),https://www.shl.com/products/assessments/behav...,Behavioral,All Levels,"role preview, culture fit, engagement, expecta...",Scenario-based previews that help candidates u...,0.0,
5,A017,Job-Focused Assessment – Graduate Roles,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"learning agility, potential, adaptability",Measures learning agility and future potential...,0.0,
6,A016,Job-Focused Assessment – Manufacturing,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"safety, reliability, quality focus",Assesses reliability safety awareness and qual...,0.0,
7,A015,Job-Focused Assessment – Retail,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"customer focus, adaptability, teamwork",Evaluates customer focus and adaptability for ...,0.0,
8,A014,Job-Focused Assessment – Contact Center,https://www.shl.com/solutions/talent-acquisiti...,Job Focused,Entry Level,"customer service, resilience, communication",Predicts success in contact center roles using...,0.0,
9,A013,Job-Focused Assessments (JFA),https://www.shl.com/products/assessments/job-f...,Job Focused,All Levels,"job readiness, role fit, performance prediction",Short job-relevant assessments predicting role...,0.0,


## 5) (Optional but strong) FastAPI service
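Automated tests often hit an API endpoint. You can deploy to Render/Railway.
If you skip deployment, still include this file in the repo for completeness.

Unlike the TF‑IDF recommender above, `src/api.py` embeds the query with `sentence-transformers` (`all-MiniLM-L6-v2`) and searches a prebuilt FAISS index, so `data/catalog.index` and `data/catalog.pkl` must exist before the server starts.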
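
The files `data/catalog.index` and `data/catalog.pkl` that `src/api.py` reads at startup are not created anywhere in this notebook. A minimal sketch of one way to produce them follows. It assumes `faiss-cpu` and `sentence-transformers` are installed, that the index is a flat inner-product index over normalized embeddings (consistent with the comment in the API code), and that each row is embedded from the same five text fields used for the TF‑IDF `doc`. `build_doc` and `build_index` are hypothetical names.

```python
from pathlib import Path

DOC_FIELDS = ["name", "description", "skills", "category", "job_levels"]

def build_doc(row: dict) -> str:
    """Join the text fields of one catalog row into a single string to embed."""
    parts = [str(row.get(f, "") or "").strip() for f in DOC_FIELDS]
    return " ".join(" ".join(p for p in parts if p).split())

def build_index(catalog_csv="data/catalog.csv",
                index_path="data/catalog.index",
                meta_path="data/catalog.pkl",
                model_name="all-MiniLM-L6-v2"):
    """Embed every catalog row and write the FAISS index plus the row metadata."""
    # Heavy imports live here so build_doc can be used without them installed.
    import faiss
    import numpy as np
    import pandas as pd
    from sentence_transformers import SentenceTransformer

    catalog = pd.read_csv(catalog_csv).fillna("")
    docs = [build_doc(r) for r in catalog.to_dict(orient="records")]

    embedder = SentenceTransformer(model_name)
    # Unit-length vectors make the inner product equal to cosine similarity.
    emb = np.asarray(embedder.encode(docs, normalize_embeddings=True), dtype="float32")

    index = faiss.IndexFlatIP(emb.shape[1])  # exact inner-product search
    index.add(emb)

    Path(index_path).parent.mkdir(parents=True, exist_ok=True)
    faiss.write_index(index, str(index_path))
    catalog.to_pickle(meta_path)  # row order must match the order of vectors in the index
    return len(docs)

print(build_doc({"name": "Coding Simulations", "category": "Skills & Simulations",
                 "skills": "coding, algorithms"}))
# To write the files (downloads the model on first use):
# build_index()
```

The pickle and the index have to be rebuilt together whenever the catalog changes, because the API looks rows up by their position in the index.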


In [1]:
# This cell writes a minimal FastAPI app to src/api.py
# Run locally:
#   uvicorn src.api:app --reload --port 8000

api_code = r'''
from fastapi import FastAPI
from pydantic import BaseModel
import pandas as pd
import numpy as np
import re

import faiss
from sentence_transformers import SentenceTransformer
from pathlib import Path

# ---------- Paths ----------
ROOT = Path(__file__).resolve().parents[1]
INDEX_PATH = ROOT / "data" / "catalog.index"
META_PATH  = ROOT / "data" / "catalog.pkl"

# ---------- App ----------
app = FastAPI(title="SHL Assessment Recommendation Engine (RAG)")

class RecommendRequest(BaseModel):
    job_title: str = ""
    skills: list[str] = []
    job_description: str = ""
    top_k: int = 10

def normalize_text(s: str) -> str:
    s = s or ""
    return re.sub(r"\s+", " ", s.strip())

# ---------- Load catalog + embedder + FAISS ----------
catalog = pd.read_pickle(META_PATH).fillna("")
embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index(str(INDEX_PATH))

def rule_boost(query: str, row: pd.Series) -> float:
    q = (query or "").lower()
    cat = (row.get("category", "") or "").lower()
    skills = (row.get("skills", "") or "").lower()
    lvl = (row.get("job_levels", "") or "").lower()

    boost = 0.0

    # ---------- Leadership / manager intent ----------
    if any(k in q for k in ["manager", "lead", "leadership", "stakeholder", "strategy"]):
        if any(k in cat for k in ["behavior", "sjt", "personality", "job focused"]):
            boost += 0.08
        if any(k in lvl for k in ["senior", "manager"]):
            boost += 0.06

    # ---------- AI / Tech intent ----------
    ai_intent = any(k in q for k in ["ai", "ml", "machine learning", "nlp", "deep learning", "data scientist", "research"])
    tech_intent = any(k in q for k in ["python", "sql", "coding", "programming", "developer", "engineer", "data", "analytics", "statistics"])

    if ai_intent or tech_intent:
        # Strong boosts: what we WANT on top for AI intern roles
        if any(k in skills for k in ["coding", "python", "algorithms", "machine learning", "data science", "data engineering", "problem solving"]):
            boost += 0.35

        if "cognitive" in cat:
            boost += 0.25

        if ("skills" in cat) or ("simulation" in cat):
            boost += 0.15

        if any(k in lvl for k in ["entry", "graduate", "intern", "junior"]):
            boost += 0.12

        # Strong penalties: what we DON'T want on top for AI intern roles
        if any(k in skills for k in ["business skills", "computer literacy", "workplace productivity"]):
            boost -= 0.45

        if any(k in cat for k in ["behavioral", "personality", "virtual assessment center"]):
            boost -= 0.30

        if any(k in lvl for k in ["senior", "manager"]):
            boost -= 0.60

        # Language tests: only if explicitly asked
        if ("language" in cat) or ("english" in skills):
            if not any(k in q for k in ["english", "communication", "writing", "grammar", "spoken"]):
                boost -= 0.35

    # ---------- Explicit language intent ----------
    if any(k in q for k in ["english", "communication", "writing", "grammar", "spoken"]):
        if ("language" in cat) or ("english" in skills):
            boost += 0.20

    # ---------- Entry-level intent ----------
    if any(k in q for k in ["fresher", "entry", "junior", "graduate", "intern"]):
        if "entry" in lvl:
            boost += 0.06
        if "graduate" in lvl:
            boost += 0.05

    return boost


@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/recommend")
def recommend(req: RecommendRequest):
    query = " ".join([req.job_title, req.job_description, " ".join(req.skills)]).strip()
    if not query:
        return {"error": "Provide job_title or job_description or skills"}

    # Embed query (cosine similarity via normalized embeddings + inner product index)
    q_emb = embedder.encode([query], normalize_embeddings=True)
    q_emb = np.asarray(q_emb, dtype="float32")

    k = max(1, int(req.top_k))
    scores, idxs = index.search(q_emb, k)  # scores: (1,k), idxs: (1,k)

    results = []
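    for rank, i in enumerate(idxs[0], 1):
        if i < 0:
            # FAISS pads with -1 when the index holds fewer than k vectors;
            # without this guard, iloc[-1] would silently return the last row
            continue
        row = catalog.iloc[int(i)].to_dict()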

        boost = rule_boost(query, pd.Series(row))
        final_score = float(scores[0][rank - 1] + boost)

        desc = row.get("description", "") or ""
        evidence = desc[:220] + ("..." if len(desc) > 220 else "")

        results.append({
            "rank": rank,
            "assessment_id": row.get("assessment_id",""),
            "name": row.get("name",""),
            "url": row.get("url",""),
            "category": row.get("category",""),
            "job_levels": row.get("job_levels",""),
            "skills": row.get("skills",""),
            "score": round(final_score, 4),
            "evidence": evidence
        })
    # Re-rank once, after boosts have been applied to every candidate
    results = sorted(results, key=lambda x: x["score"], reverse=True)
    for j, r in enumerate(results, 1):
        r["rank"] = j

    return {"query": query, "results": results}

@app.post("/recommend/pretty")
def recommend_pretty(req: RecommendRequest):
    out = recommend(req)
    if "results" not in out:
        return out

    lines = []
    for r in out["results"]:
        lines.append(
            f"{r['rank']}. {r['name']} ({r['category']} | {r['job_levels']})\n"
            f"   Evidence: {r.get('evidence','')}\n"
            f"   Link: {r['url']}\n"
        )

    return {
        "query": out["query"],
        "summary": "\n".join(lines),
        "results": out["results"]
    }

    
'''
import os
os.makedirs("src", exist_ok=True)
with open("src/api.py", "w", encoding="utf-8") as f:
    f.write(api_code)

print("Wrote src/api.py")


Wrote src/api.py
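
The keyword checks in `rule_boost` use substring tests (`k in q`), so short keywords can fire inside unrelated words: `"ai"` is a substring of `"emails"`, and `"ml"` of `"html"`. A possible alternative is whole-word matching with a regular expression. `has_keyword` below is a hypothetical helper, sketched with only the standard library; swapping it in would change which boosts apply, so the demo queries should be re-checked afterwards.

```python
import re

def has_keyword(text: str, keywords: list[str]) -> bool:
    """True if any keyword occurs in text as a whole word or phrase (case-insensitive)."""
    for kw in keywords:
        # \b anchors keep "ai" from matching inside "emails";
        # re.escape guards keywords that contain regex metacharacters
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False

query = "Handle chats and emails, quality writing"
print(any(k in query.lower() for k in ["ai", "ml"]))   # substring test -> True
print(has_keyword(query, ["ai", "ml"]))                # whole-word test -> False
print(has_keyword("Research AI Intern", ["ai", "ml"]))                 # -> True
print(has_keyword("machine learning engineer", ["machine learning"]))  # -> True
```

Whole-word matching is stricter in the other direction too: `"manager"` no longer matches `"managers"`, so plural or inflected forms would need to be listed explicitly.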


In [5]:
# Note: run this cell only after the API server has been started
import pandas as pd
import requests

#INPUT_XLSX = "Gen_AI Dataset.xlsx"      # if the file is in the notebook root
INPUT_XLSX = "data/Gen_AI Dataset.xlsx"

API_URL = "http://127.0.0.1:8000/recommend"
TOP_K = 3

df = pd.read_excel(INPUT_XLSX)

out_rows = []

for q in df["Query"].astype(str).tolist():
    payload = {
        "job_title": q,
        "skills": [],
        "job_description": "",
        "top_k": TOP_K
    }
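    resp = requests.post(API_URL, json=payload, timeout=60)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = resp.json()

    results = data.get("results", [])[:TOP_K]

    # Pad with empty URLs so every query yields exactly TOP_K rows
    while len(results) < TOP_K:
        results.append({"url": ""})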

    for r in results:
        out_rows.append({
            "Query": q,
            "Assessment_url": r.get("url", "")
        })

submission_df = pd.DataFrame(out_rows)
submission_df.to_csv("submission.csv", index=False)

submission_df.head(10)


Unnamed: 0,Query,Assessment_url
0,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...
1,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/behav...
2,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...
3,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...
4,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/behav...
5,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...
6,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...
7,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/behav...
8,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...
9,I am hiring for Java developers who can also c...,https://www.shl.com/products/assessments/skill...


## 6) README checklist (copy-paste)

- Problem: Recommend SHL assessments for a role/JD.
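- Data: Public SHL product catalog URLs → structured catalog CSV (`data/catalog.csv`).
- Method: TF‑IDF + cosine similarity + light rule boosts (notebook); sentence embeddings + FAISS search + rule boosts (optional API).
- Explainability: top overlapping terms + skills match (`why` column); description excerpt (`evidence`) in the API.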
- How to run:
  - notebook
  - (optional) API: uvicorn src.api:app --reload
- Example outputs (screenshots / sample JSON)
- Limitations + future work:
  - personalization with feedback
  - better embeddings
  - fairness considerations
