## ***PG-AGI: AI/ML Intern Assignment (Pre-Hiring)***



### **Assignment Overview**

*You are tasked with developing an intelligent Hiring Assistant chatbot for "TalentScout," a fictional recruitment agency specializing in technology placements. The chatbot should assist in the initial screening of candidates by gathering essential information and posing relevant technical questions based on the candidate's declared tech stack. This project will allow you to demonstrate your understanding of LLMs.*

----

### **Purpose of Prompting**
The candidate will design prompts that guide the underlying language model to:

1. Gather Initial Candidate Information: Collect essential details such as name, contact
information, years of experience, and desired positions.
2. Generate Technical Questions: Based on the candidate’s specified tech stack (e.g.,
programming languages, frameworks, tools), generate relevant technical questions to
assess their proficiency.
3. Ensure Coherent and Context-Aware Interactions: Maintain the flow of conversation
and context to provide a seamless user experience.


---




#### *Prototype built by Deepak Kaura*

## **Installing Libraries/Frameworks**

In [1]:
!pip install txtai

Successfully installed faiss-cpu-1.12.0 txtai-9.0.0


In [None]:
!pip install langchain



In [None]:
!pip install langchain-community


In [None]:
!pip install googletrans

Successfully installed googletrans-4.0.2


### *Loading the Quantized LLM*

In [None]:

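# Load a 2-bit (Q2_K) quantized GGUF build of the instruction-tuned Gemma 3n E4B model.
# txtai runs GGUF files through llama.cpp (llama-cpp-python) on the available hardware.
from txtai import LLM as TxtaiLLM

llm = TxtaiLLM("muranAI/gemma-3n-E4B-it-GGUF/gemma-3n-e4b-it-q2_k.gguf")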
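The GGUF file is about 2.76 GB, which makes every end-to-end test slow. The rest of the notebook only calls the model as `llm(prompt)` and reads back text, so for offline checks of the requested `<Index>. <Tech Name> → <Question>` format a stand-in callable can be used instead. The sketch below is a hypothetical harness, not part of the prototype: `StubLLM`, `parse_questions`, `LINE_RE` and the canned reply are all illustrative, and the parser is deliberately stricter than the prototype's sanitizer (it accepts only the "→" separator).

```python
import re

# Line format both prompt templates request: "<Index>. <Tech Name> → <Question>"
LINE_RE = re.compile(r"^\s*(\d+)\.\s*(.+?)\s*→\s*(.+?)\s*$")


class StubLLM:
    """Hypothetical stand-in for the txtai model: called with a prompt, returns text."""

    def __init__(self, reply: str):
        self.reply = reply
        self.prompts: list[str] = []  # every prompt received, for later inspection

    def __call__(self, prompt: str, **kwargs) -> str:
        self.prompts.append(prompt)
        return self.reply


def parse_questions(llm_output: str) -> list[tuple[int, str, str]]:
    """Return (index, tech, question) for each line that follows the arrow format."""
    rows = []
    for line in llm_output.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), m.group(3)))
    return rows


stub = StubLLM(
    "Here are your questions:\n"
    "1. Python → How do generators differ from lists?\n"
    "2. SQL: When would you add a composite index?\n"  # wrong separator, so it is skipped
    "3. Docker → What does a multi-stage build save?"
)
for row in parse_questions(stub("Candidate's Tech Stack: Python, SQL, Docker")):
    print(row)
```

Counting how many declared technologies come back in the strict format gives a quick measure of how well a given quantized model follows the prompt.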





## ***Gradio Deployment: TalentScout Hiring Assistant Chatbot***

In [None]:
# =========================
# TalentScout - Hiring Assistant (English + Hindi + Sentiment Summary + Overall Impression)
# =========================

import re
import gradio as gr
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.schema import StrOutputParser
from txtai.pipeline import LLM as TxtaiLLM
from langchain.llms.base import LLM as LangChainLLM

# -------------------------
# Sentiment Analysis
# -------------------------
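try:
    from transformers import pipeline
    # Pin the model explicitly (the default that transformers picked in the run log
    # below) so results do not change if the library default ever changes.
    _sentiment_pipeline = pipeline(
        "sentiment-analysis",
        model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    )
    SENTIMENT_ENABLED = True
except Exception:
    # transformers missing or model download failed: every answer is labelled NEUTRAL
    _sentiment_pipeline = None
    SENTIMENT_ENABLED = False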

def analyze_sentiment(text: str):
    if _sentiment_pipeline is None:
        return "NEUTRAL", 0.0
    try:
        res = _sentiment_pipeline(text)[0]
        return (res.get("label", "NEUTRAL"), float(res.get("score", 0.0)))
    except Exception:
        return "NEUTRAL", 0.0

def map_sentiment_label(label, lang="en"):
    if lang == "hi":
        mapping = {
            "POSITIVE": "आत्मविश्वासी",
            "NEGATIVE": "अनिश्चित",
            "NEUTRAL": "तटस्थ"
        }
    else:
        mapping = {
            "POSITIVE": "Confident",
            "NEGATIVE": "Uncertain",
            "NEUTRAL": "Neutral"
        }
    return mapping.get(label.upper(), "Neutral" if lang=="en" else "तटस्थ")

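# -------------------------
# Wrap txtai LLM
# -------------------------
from typing import Any

class TxtaiLangChainLLM(LangChainLLM):
    """Minimal LangChain adapter that forwards prompts to a txtai LLM instance."""
    backend: Any = None  # the txtai model loaded in the earlier cell

    @property
    def _llm_type(self) -> str:
        # Identifier required by LangChain's LLM base class
        return "txtai"

    def _call(self, prompt: str, stop: list[str] | None = None, **kwargs) -> str:
        # Use the stored txtai model, not the global `llm`: once `llm` is rebound to
        # this wrapper, a global lookup would call the wrapper itself and recurse.
        # Stop sequences are not forwarded.
        return str(self.backend(prompt))

# Keep the raw txtai model under its own name so re-running this cell never wraps the wrapper
txtai_model = globals().get("txtai_model", llm)
llm = TxtaiLangChainLLM(backend=txtai_model)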

# -------------------------
# Prompt Templates
# -------------------------
tech_prompt_en = PromptTemplate.from_template("""
You are TalentScout Hiring Assistant.

Candidate's Tech Stack: {tech_stack}

TASK:
- For EACH technology in {tech_stack}, generate ONE unique, practical interview question.
- KEEP technology names in English.
- Write the question text in English.
- STRICTLY use this format:
<Index>. <Tech Name> → <Question>
Do not use ":" or "-" separators. Only use "→".
Do not add explanations or extra text.
""")

tech_prompt_hi = PromptTemplate.from_template("""
आप TalentScout Hiring Assistant हैं।

उम्मीदवार की टेक स्टैक: {tech_stack}

कार्य:
- {tech_stack} की प्रत्येक तकनीक के लिए एक अद्वितीय और व्यावहारिक इंटरव्यू प्रश्न बनाइए।
- टेक्नोलॉजी के नाम अंग्रेज़ी में ही रखें।
- प्रश्न हिंदी में लिखें।
- केवल इस प्रारूप का उपयोग करें:
<Index>. <Tech Name> → <Question>
":" या "-" का उपयोग न करें। केवल "→" का उपयोग करें।
अतिरिक्त व्याख्या या टेक्स्ट न जोड़ें।
""")

tech_chain_en = LLMChain(llm=llm, prompt=tech_prompt_en, output_parser=StrOutputParser())
tech_chain_hi = LLMChain(llm=llm, prompt=tech_prompt_hi, output_parser=StrOutputParser())

# -------------------------
# Sanitizer for consistency
# -------------------------
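def enforce_one_question_per_tech(tech_stack: str, llm_output: str) -> str:
    """Return exactly one numbered "<Tech> → <Question>" line per declared technology."""
    techs = [t.strip() for t in re.split(r"[;,]+", tech_stack) if t.strip()]
    lines = [ln.strip() for ln in llm_output.splitlines() if ln.strip()]
    # Candidate answer lines: "→" is the requested separator, ":" is tolerated
    qa_lines = [ln for ln in lines if "→" in ln or ":" in ln]
    results = []
    for i, tech in enumerate(techs):
        # Prefer the line that names the technology ...
        match = next((ln for ln in qa_lines if tech.lower() in ln.lower()), None)
        # ... else fall back to position, e.g. when the stack was typed in Devanagari
        # but the model (as instructed) wrote the technology names in English
        if match is None and i < len(qa_lines):
            match = qa_lines[i]
        q = ""
        if match:
            sep = "→" if "→" in match else ":"
            q = match.split(sep, 1)[1].strip()
        if not q:
            q = "What key skill demonstrates proficiency with this technology?"
        results.append(f"{tech} → {q}")
    return "\n".join(f"{i+1}. {row}" for i, row in enumerate(results))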

def generate_questions(tech_stack: str, lang: str) -> str:
    if not tech_stack.strip():
        return "⚠️ No tech stack provided."
    if lang == "hi":
        raw = tech_chain_hi.run({"tech_stack": tech_stack})
    else:
        raw = tech_chain_en.run({"tech_stack": tech_stack})
    return enforce_one_question_per_tech(tech_stack, raw)

# -------------------------
# Sentiment Summary + Overall Impression
# -------------------------
def summarize_sentiments(sentiment_log, lang="en"):
    if not sentiment_log:
        return ""

    counts = {"Confident": 0, "Neutral": 0, "Uncertain": 0}

    if lang == "hi":
        summary = "📊 उम्मीदवार के उत्तरों का संक्षिप्त सारांश:\n\n"
        for q, label in sentiment_log:
            mapped = map_sentiment_label(label, "hi")
            summary += f"- {q} → {mapped}\n"
            if mapped == "आत्मविश्वासी": counts["Confident"] += 1
            elif mapped == "तटस्थ": counts["Neutral"] += 1
            else: counts["Uncertain"] += 1
    else:
        summary = "📊 Candidate Response Summary:\n\n"
        for q, label in sentiment_log:
            mapped = map_sentiment_label(label, "en")
            summary += f"- {q} → {mapped}\n"
            counts[mapped] += 1

    # Determine overall impression
    overall = max(counts, key=counts.get)
    if lang == "hi":
        if overall == "Confident":
            impression = "समग्र धारणा: मजबूत और आत्मविश्वासी"
        elif overall == "Neutral":
            impression = "समग्र धारणा: संतुलित"
        else:
            impression = "समग्र धारणा: स्पष्टता की आवश्यकता है"
    else:
        if overall == "Confident":
            impression = "Overall Impression: Strong & Confident"
        elif overall == "Neutral":
            impression = "Overall Impression: Balanced"
        else:
            impression = "Overall Impression: Needs Clarity"

    summary += f"\n📌 {impression}"
    return summary

# -------------------------
# Hiring Flow Questions
# -------------------------
QUESTIONS_EN = [
    "Full Name",
    "Email",
    "Phone",
    "Years of Experience",
    "Desired Position",
    "Location",
    "Tech Stack"
]

QUESTIONS_HI = [
    "पूरा नाम",
    "ईमेल",
    "फ़ोन नंबर",
    "अनुभव (सालों में)",
    "इच्छित पदवी",
    "स्थान",
    "टेक स्टैक"
]

WELCOME_LANG = (
    "👋 Hi! Welcome to TalentScout.\n\n"
    "Before we begin, please choose your preferred language:\n"
    "👉 English\n👉 Hindi"
)

# -------------------------
# Chatbot Logic
# -------------------------
def chatbot(user_text, history, candidate_info, idx, done, chat_lang, sentiment_log):
    # Step 1: Language choice
    if chat_lang is None:
        choice = user_text.strip().lower()
        # Accept both common Devanagari spellings of "Hindi" (हिन्दी / हिंदी)
        if "hindi" in choice or "हिन्दी" in choice or "हिंदी" in choice:
            chat_lang = "hi"
        else:
            chat_lang = "en"

        reply = "✅ Got it! We’ll continue in your preferred language.\n\n" + (
            f"📋 Please provide your {QUESTIONS_EN[0]}:" if chat_lang == "en" else f"📋 कृपया अपना {QUESTIONS_HI[0]} बताइए:"
        )
        history.append((user_text, reply))
        return history, candidate_info, 1, done, chat_lang, sentiment_log

    # Step 2: Exit
    if user_text.lower().strip() in {"exit", "quit", "bye"}:
        reply = "👋 Conversation ended. Thank you!"
        history.append((user_text, reply))
        return history, candidate_info, idx, True, chat_lang, sentiment_log

    # Step 3: Collect answers + sentiment
    if 1 <= idx <= len(QUESTIONS_EN):
        q_label = QUESTIONS_EN[idx - 1] if chat_lang == "en" else QUESTIONS_HI[idx - 1]
        candidate_info[q_label] = user_text.strip()
        label, _ = analyze_sentiment(user_text)
        sentiment_log.append((q_label, label))

    # Step 4: Ask next question
    if idx < len(QUESTIONS_EN):
        next_q = QUESTIONS_EN[idx] if chat_lang == "en" else QUESTIONS_HI[idx]
        idx += 1
        reply = f"📋 Please provide your {next_q}:" if chat_lang == "en" else f"📋 कृपया अपना {next_q} बताइए:"
        history.append((user_text, reply))
        return history, candidate_info, idx, done, chat_lang, sentiment_log

    # Step 5: Generate interview questions + summary
    if not done:
        done = True
        tech_stack = candidate_info.get(
            QUESTIONS_EN[-1] if chat_lang == "en" else QUESTIONS_HI[-1], ""
        )
        qs = generate_questions(tech_stack, chat_lang)
        summary = summarize_sentiments(sentiment_log, chat_lang)

        reply = (
            f"✅ Thank you {candidate_info.get(QUESTIONS_EN[0] if chat_lang=='en' else QUESTIONS_HI[0],'Candidate')}!\n\n"
            f"Here are your tailored interview questions:\n\n{qs}\n\n{summary}\n\n"
            "👋 We will contact you soon with the next steps."
        )
        history.append((user_text, reply))
        return history, candidate_info, idx, done, chat_lang, sentiment_log

    # Step 6: Already done
    reply = "✅ Session completed. Please click **Restart** for a new candidate."
    history.append((user_text, reply))
    return history, candidate_info, idx, done, chat_lang, sentiment_log

# -------------------------
# Reset
# -------------------------
def reset_chat():
    return {}, 0, False, [(None, WELCOME_LANG)], None, []

# -------------------------
# Gradio UI
# -------------------------
with gr.Blocks(
    theme="default",
    css="""
      .chat-container {max-width: 750px; margin: auto;}
      .chatbot-box {height: 520px !important;}
    """
) as demo:
    with gr.Column(elem_classes="chat-container"):
        gr.Markdown("## 🎯 TalentScout Hiring Assistant")
        gr.Markdown(f"**Features:** Sentiment: {'ON' if SENTIMENT_ENABLED else 'OFF'} · Multilingual: English & Hindi")

        chat = gr.Chatbot(elem_classes="chatbot-box")
        msg = gr.Textbox(placeholder="Type your answer here…", label="Your Response")
        with gr.Row():
            submit = gr.Button("Submit", variant="primary")
            restart = gr.Button("🔄 Restart")

        candidate_info = gr.State({})
        idx = gr.State(0)
        done = gr.State(False)
        chat_lang = gr.State(None)
        sentiment_log = gr.State([])

        def on_submit(user_text, history, candidate_info, idx, done, chat_lang, sentiment_log):
            history, candidate_info, idx, done, chat_lang, sentiment_log = chatbot(
                user_text, history, candidate_info, idx, done, chat_lang, sentiment_log
            )
            return history, "", candidate_info, idx, done, chat_lang, sentiment_log

        demo.load(
            fn=reset_chat,
            inputs=None,
            outputs=[candidate_info, idx, done, chat, chat_lang, sentiment_log]
        )

        submit.click(
            on_submit,
            [msg, chat, candidate_info, idx, done, chat_lang, sentiment_log],
            [chat, msg, candidate_info, idx, done, chat_lang, sentiment_log]
        )

        restart.click(
            reset_chat,
            outputs=[candidate_info, idx, done, chat, chat_lang, sentiment_log]
        ).then(lambda: "", None, msg)

if __name__ == "__main__":
    demo.launch()


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
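The default model named above is a binary SST-2 classifier, so in practice it returns only `POSITIVE` or `NEGATIVE`; `NEUTRAL` reaches `summarize_sentiments` only through the fallback in `analyze_sentiment`. Note also that `max(counts, key=counts.get)` returns the first key among equal counts, so a tie between "Confident" and "Uncertain" is reported as "Strong & Confident". If that is not the intended policy, a tie-aware majority vote is one option. The sketch below uses `collections.Counter`; the function name `overall_impression` and the "ties become Neutral" rule are assumptions, not the prototype's behaviour.

```python
from collections import Counter


def overall_impression(labels: list[str]) -> str:
    """Majority vote over per-answer labels; ties fall back to "Neutral" (assumed policy)."""
    if not labels:
        return "Neutral"
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    # A tie between the leading labels is reported as Neutral instead of
    # favouring whichever label happens to be checked first.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "Neutral"
    return top_label


print(overall_impression(["Confident", "Uncertain", "Confident"]))  # Confident
print(overall_impression(["Confident", "Uncertain"]))               # Neutral
```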




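Example answers for testing the Hindi flow (reply "Hindi" to the language prompt first; English glosses in parentheses):

- Name: चमन कुमार (Chaman Kumar)
- Email: चा@अमेज़न.कॉम ("cha@amazon.com" written in Devanagari)
- Mobile Number: १२३४५६७ (1234567)
- Years of Experience: १.६ (1.6)
- Desired Position: एआई डेवलपर (AI Developer)
- Location: दिल्ली-एनसीआर (Delhi-NCR)
- Tech Stack: साइकीट-लर्न, एक्सजीबूस्ट, लाइटजीबीएम, कैटबूस्ट, पायकैरेट-ऑटोएमएल, फेसबुक प्रोफेट, बर्ट, लैंगचेन, टेक्स्टएआई (scikit-learn, XGBoost, LightGBM, CatBoost, PyCaret-AutoML, Facebook Prophet, BERT, LangChain, txtai)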
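The chatbot stores each answer verbatim, so Devanagari numerals such as `१२३४५६७` or `१.६` in the Hindi example reach `candidate_info` unchanged. If later processing needs ASCII digits, the standard library's `unicodedata` module can map any Unicode decimal digit. This is a minimal sketch of that step, which the prototype does not perform; the helper name `to_ascii_digits` is hypothetical.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit (e.g. Devanagari ०-९) with its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


phone = to_ascii_digits("१२३४५६७")     # kept as a string, since leading zeros matter
years = float(to_ascii_digits("१.६"))  # numeric value for comparisons
print(phone, years)
```

Keeping the phone number as a string while converting the experience value to a number reflects how each field is likely to be used downstream.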