# **Multi-Level Prompting Experiments for AI Resume Screening**

## **Overview**
This notebook explores **multi-level automation in AI resume screening** using two techniques: **Meta-Prompting** and **Multi-Step Iterative Evaluation**. The goal is to test whether these techniques make AI-assisted screening **fairer, more accurate, and more structured**.

## **Experimental Setup**
The notebook tests **two key multi-level prompt engineering techniques**:
1. **Meta-Prompting (Level-1 Automation)** – AI **generates an optimized prompt** before performing resume evaluation.
2. **Multi-Step Iterative Resume Evaluation (Level-2 Automation)** – AI **performs multiple evaluation cycles**, refining hiring decisions.
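
The Meta-Prompting test case in this notebook only generates a prompt. One way the generated prompt could then drive an actual evaluation is sketched below. This is a minimal illustration under stated assumptions, not the notebook's method: `generate` is a hypothetical callable that sends a prompt to a model and returns its text, `META_TEMPLATE` is a shortened stand-in for the notebook's `meta_prompt`, and `fake_generate` is a stub used so the example runs without a model server.

```python
from typing import Callable

# Shortened, hypothetical stand-in for the notebook's meta_prompt template
META_TEMPLATE = (
    "You are an AI prompt engineer. Generate the best possible prompt for "
    "screening resumes for the role: {job_title}"
)

def screen_with_meta_prompt(
    generate: Callable[[str], str],
    job_title: str,
    job_description: str,
    resume_text: str,
) -> str:
    """Chain two model calls: write the prompt, then evaluate with it."""
    # Call 1: the model writes the evaluation prompt
    generated_prompt = generate(META_TEMPLATE.format(job_title=job_title))
    # Call 2: the generated prompt is combined with the actual documents
    evaluation_prompt = (
        f"{generated_prompt.strip()}\n\n"
        f"Job Description: {job_description}\n"
        f"Candidate Resume: {resume_text}"
    )
    return generate(evaluation_prompt)

def fake_generate(prompt: str) -> str:
    """Stub model (assumption): returns canned text instead of calling an LLM."""
    if "prompt engineer" in prompt:
        return "Compare the resume to the job description and give a match score."
    return "Match score: 80%"

result = screen_with_meta_prompt(
    fake_generate,
    job_title="Senior Data Scientist - AI & Machine Learning",
    job_description="Senior Data Scientist with 5+ years of experience.",
    resume_text="Data Scientist with 6 years of experience.",
)
print(result)
```

In a real run, `generate` would wrap whatever request helper the notebook uses; the stub only demonstrates the two-call flow.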

### **Key Features of This Experiment**
- **Automated prompt generation**: Meta-Prompting has the AI **write a structured evaluation prompt** before any resume is analyzed.
- **Iterative resume evaluation**: AI **self-refines decisions** using gap analysis and structured scoring.
- **Model and parameter coverage**: Tests **three models** (Mistral, Llama3, Phi3) at **two temperature settings**, with context size and output length held fixed.

## **Optimized Parameters**
To keep the experiment small and the runs comparable, we use:
- **Temperature**: `{0.5, 1.0}` – Testing **lower and higher randomness**.
- **Context Size**: `1024` tokens – A single fixed value, intended to fit the prompt, job description, and resume.
- **Output Length**: `300` tokens – A single fixed cap, intended to leave room for **structured recommendations**.
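
To make the size of the experiment concrete, the parameter grid above can be enumerated with `itertools.product`. The values are taken from this notebook; the variable names `grid`, `n_test_cases`, and `total_runs` are illustrative only.

```python
from itertools import product

models = ["mistral", "llama3", "phi3"]
temperatures = [0.5, 1.0]
context_sizes = [1024]
num_predict_values = [300]
n_test_cases = 2  # Meta-Prompting and Multi-Step Iterative Evaluation

# Every (model, temperature, context size, output length) combination
grid = list(product(models, temperatures, context_sizes, num_predict_values))
total_runs = n_test_cases * len(grid)

print(len(grid), total_runs)  # 6 12
```

With only one context size and one output length, temperature and model are the only factors that vary, so any difference between runs of the same test case comes from those two settings or from sampling noise.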

## **Expected Insights**
- **How does AI-generated prompt quality impact hiring decisions?**
- **Can multi-step evaluations improve accuracy over a single-step analysis?**
- **Which configurations balance efficiency and hiring decision accuracy?**

**Next Steps:** Analyze test results to identify **optimal AI resume screening strategies**.
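
Comparing runs requires turning free-text responses into something measurable. The sketch below shows one possible heuristic for pulling a match score out of a response. It assumes the model writes the phrase "match score" followed shortly by a percentage, which is not guaranteed; `extract_match_score` and its pattern are hypothetical helpers, not part of the notebook's pipeline.

```python
import re
from typing import Optional

# Heuristic (assumption): "match score", up to 40 non-digit characters, then "NN%"
SCORE_PATTERN = re.compile(r"match score[^0-9]{0,40}(\d{1,3})\s*%", re.IGNORECASE)

def extract_match_score(response: str) -> Optional[int]:
    """Return the first match score in 0..100 found in the response, else None."""
    match = SCORE_PATTERN.search(response)
    if match is None:
        return None
    score = int(match.group(1))
    # Discard values outside the 0-100% range the prompt asks for
    return score if 0 <= score <= 100 else None

print(extract_match_score("Overall Match Score: 85%. Recommendation: interview."))  # 85
print(extract_match_score("The candidate looks strong."))  # None
```

Responses that come back as `None` are worth inspecting by hand, since they may indicate a truncated output or a format the heuristic does not cover.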

In [1]:
import time
import pandas as pd
from _pipeline import create_payload, model_req

# Define models to test (covering different architectures)
models_to_test = ["mistral", "llama3", "phi3"]

# Parameter variations
temperature_values = [0.5, 1.0]  # Test lower and higher randomness
context_sizes = [1024]  # Single fixed context size (tokens)
num_predict_values = [300]  # Single fixed output length cap (tokens)

# **Level-1 Automation: Meta-Prompting for Resume Screening**
meta_prompt = """
    You are an AI prompt engineer. Before performing resume screening, generate the **best possible prompt** 
    for an AI system to analyze and evaluate a candidate's resume based on a given job description.

    **Example Prompt Format:**  
    "Analyze the following job description and resume. Extract key job requirements, candidate qualifications, 
    compare them, and provide a structured match score along with hiring recommendations."

    Now, generate a high-quality prompt for resume screening and evaluation for: 
    **{job_title}**
"""

# **Level-2 Automation: Multi-Step Iterative Resume Evaluation**
multi_step_prompt = """
    You are an AI hiring assistant performing **iterative resume screening**. 

    **Step 1: Initial Candidate Evaluation**
    - Extract key job requirements.
    - Extract candidate qualifications.
    - Identify skill matches and gaps.

    **Step 2: Self-Reflection & Gap Analysis**
    - Identify missing or vague areas.
    - Determine if further clarifications are needed (e.g., missing certifications, unclear experience level).

    **Step 3: Resume Refinement & Decision Making**
    - Improve initial analysis by filling gaps.
    - Optimize scoring for accuracy and fairness.
    - Provide a structured match score (0-100%) and hiring recommendation.

    **Job Description:** {job_description}
    **Candidate Resume:** {resume_text}
"""

# Define test cases
test_cases = [
    {
        "name": "Meta-Prompting - AI Generates Resume Evaluation Prompt",
        "prompt": meta_prompt.format(job_title="Senior Data Scientist - AI & Machine Learning"),
    },
    {
        "name": "Multi-Step Iterative Resume Evaluation",
        "prompt": multi_step_prompt.format(
            job_description="""
                We are hiring a Senior Data Scientist with expertise in AI, Machine Learning, and Python. 
                The candidate must have at least 5 years of experience and strong statistical modeling skills. 
                Experience with cloud platforms (AWS, GCP) and NLP is a plus.
            """,
            resume_text="""
                John Doe is a Data Scientist with 6 years of experience. Skilled in Python, Machine Learning, and AI. 
                He has worked on predictive modeling, NLP applications, and statistical analysis. 
                Familiar with AWS, but lacks experience with GCP.
            """
        ),
    }
]

# Function to run test cases across multiple models & parameter variations
def run_tests(test_cases, models, temperatures, context_sizes, num_predict_values):
    results = []

    for test in test_cases:
        for model in models:
            for temp in temperatures:
                for ctx_size in context_sizes:
                    for num_predict in num_predict_values:
                        print(f"\n🚀 Running test: {test['name']} | Model: {model} | Temp: {temp} | Context: {ctx_size} | Predict: {num_predict}...\n")

                        # Create payload
                        payload = create_payload(
                            target="ollama",
                            model=model,
                            prompt=test["prompt"],
                            temperature=temp,
                            num_ctx=ctx_size,
                            num_predict=num_predict
                        )

                        # Measure wall-clock time around the request.
                        # model_req also returns its own timing, which is not used here.
                        start_time = time.perf_counter()
                        _, response = model_req(payload=payload)
                        elapsed = round(time.perf_counter() - start_time, 3)

                        # Store results (the full response is kept under "Response Sample")
                        results.append({
                            "Test Case": test["name"],
                            "Model": model,
                            "Temperature": temp,
                            "Context Size": ctx_size,
                            "Num Predict": num_predict,
                            "Time Taken (s)": elapsed,
                            "Response Sample": response
                        })

                        # Print result summary
                        print(f"\n✅ **Test Completed: {test['name']} | Model: {model}**")
                        print(f"🕒 **Time Taken:** {elapsed} seconds")
                        print(f"📌 **Response:** {response}\n")  # Print full response

    return results

# Run all test cases
test_results = run_tests(test_cases, models_to_test, temperature_values, context_sizes, num_predict_values)
test_results


🚀 Running test: Meta-Prompting - AI Generates Resume Evaluation Prompt | Model: mistral | Temp: 0.5 | Context: 1024 | Predict: 300...

{'model': 'mistral', 'prompt': '\n    You are an AI prompt engineer. Before performing resume screening, generate the **best possible prompt** \n    for an AI system to analyze and evaluate a candidate\'s resume based on a given job description.\n\n    **Example Prompt Format:**  \n    "Analyze the following job description and resume. Extract key job requirements, candidate qualifications, \n    compare them, and provide a structured match score along with hiring recommendations."\n\n    Now, generate a high-quality prompt for resume screening and evaluation for: \n    **Senior Data Scientist - AI & Machine Learning**\n', 'stream': False, 'options': {'temperature': 0.5, 'num_ctx': 1024, 'num_predict': 300}}

✅ **Test Completed: Meta-Prompting - AI Generates Resume Evaluation Prompt | Model: mistral**
🕒 **Time Taken:** 36.067 seconds
📌 **Response:**  "

[{'Test Case': 'Meta-Prompting - AI Generates Resume Evaluation Prompt',
  'Model': 'mistral',
  'Temperature': 0.5,
  'Context Size': 1024,
  'Num Predict': 300,
  'Time Taken (s)': 36.067,
  'Response Sample': ' "Evaluate the provided Senior Data Scientist - AI & Machine Learning job description and candidate\'s resume. Identify crucial job requirements such as technical skills (e.g., programming languages, machine learning frameworks), academic background, years of experience, specific projects or achievements relevant to AI/ML, and any other essential qualifications. Compare these requirements with the candidate\'s qualifications and experiences. Provide a structured comparison in the form of a match score for each requirement (e.g., 100% match, 80%, 50%) and offer hiring recommendations based on the overall match score and any other relevant factors."'},
 {'Test Case': 'Meta-Prompting - AI Generates Resume Evaluation Prompt',
  'Model': 'mistral',
  'Temperature': 1.0,
  'Context 