# JobSage: AI Mock Interview Coach with Granular Feedback & Job Recommendations

**Problem:** Standard interview prep is often generic, lacks deep, actionable feedback, and can feel unengaging.
**Solution:** This notebook implements JobSage, an AI-powered mock interview simulator designed for roles like FAANG Data Science. It provides granular feedback (content, clarity, depth), dynamic follow-up questions, skill tracking, and **gamification (points, badges)**. Users receive benchmarking against simulated norms, targeted study recommendations, potential **resume tweak suggestions**, and relevant job matches based on CV analysis and performance.

**Capstone Project for Gen AI Intensive Course 2025Q1**

In [1]:
# ==================================================
# Cell 1: Setup & Configuration (Gemini-Only Version)
# ==================================================

# --- 1. Installations ---
print("Installing necessary libraries...")
!pip install sentence-transformers pandas numpy scipy google-generativeai --quiet
print("Libraries installed.")

# --- 2. Imports ---
print("Importing libraries...")
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer, util
from scipy.stats import norm
import json
import os
import warnings
from kaggle_secrets import UserSecretsClient
import google.generativeai as genai
print("Libraries imported.")

# --- 3. API Key Setup ---
print("Setting up API keys...")
gemini_model = None
embedding_model = None

try:
    user_secrets = UserSecretsClient()
    # The Kaggle secret must be named exactly "GOOGLE_API_KEY"
    # and must contain your Gemini API key.
    GEMINI_API_KEY = user_secrets.get_secret("GOOGLE_API_KEY")
    print("Gemini API key retrieved successfully from Kaggle Secrets.")

    genai.configure(api_key=GEMINI_API_KEY)
    # Consider trying 'gemini-pro' if 'gemini-1.5-flash' gives suboptimal results
    gemini_model = genai.GenerativeModel('gemini-1.5-flash')
    print("Gemini client configured and model 'gemini-1.5-flash' initialized.")

except Exception as e:
    print(f"ERROR retrieving GOOGLE_API_KEY or initializing Gemini client: {e}")
    print("Please ensure 'GOOGLE_API_KEY' is added as a Kaggle Secret with the correct key.")
    print("AI features requiring Gemini will fail.")

# --- 4. Initialize Embedding Model ---
print("Loading embedding model (all-MiniLM-L6-v2)... This may take a minute.")
try:
    embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
    print("Embedding model loaded successfully.")
except Exception as e:
    print(f"ERROR loading embedding model: {e}")
    embedding_model = None

# --- 5. Constants ---
FAANG_MEAN_SCORE = 7.5
FAANG_STD_DEV = 1.0
PASSING_THRESHOLD = 7.0
print("Constants defined.")

print("-" * 30)
print("SETUP CELL COMPLETE (Gemini Only)")
print("-" * 30)

Installing necessary libraries...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. Th

2025-04-11 18:50:31.108418: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744397431.378993      13 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744397431.457754      13 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Libraries imported.
Setting up API keys...
Gemini API key retrieved successfully from Kaggle Secrets.
Gemini client configured and model 'gemini-1.5-flash' initialized.
Loading embedding model (all-MiniLM-L6-v2)... This may take a minute.



Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`



Embedding model loaded successfully.
Constants defined.
------------------------------
SETUP CELL COMPLETE (Gemini Only)
------------------------------


## 2. Data Loading / Preparation

This section sets up the core data needed for JobSage to function. We'll create pandas DataFrames (structured tables) to hold:

1.  **Questions (`questions_df`):** Contains interview questions, hints about ideal answers (keywords/points), and associated skill tags. This powers the mock interview content.
2.  **Job Listings (`jobs_df`):** A sample list of job openings with titles, companies, required skills, descriptions, and application links. This is used for the job recommendation feature. We will also pre-calculate embeddings for the job descriptions here for faster lookup later.
3.  **Study Recommendations (`study_rec_df`):** Maps specific skills to actionable study advice. Used to provide targeted guidance after the interview.
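4.  **Leaderboard (`leaderboard_df`):** A simulated leaderboard with fictional user scores to provide context for the user's performance (gamification).
5.  **Negotiation Scenarios (`negotiation_scenarios_df`):** Sample recruiter offers paired with example counter-offers and feedback hints. Used for the salary negotiation simulation.
6.  **Benchmarks (`benchmarks_df`):** Placeholder average, minimum, and maximum scores by company and role.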
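
Pre-computing the job embeddings pays off at recommendation time, when only the query text (for example, a CV summary) has to be embedded and compared against the stored vectors. The following is a minimal sketch of that lookup, assuming cosine similarity as the ranking metric. The three-dimensional vectors and the `cosine` and `rank_jobs` helpers are hypothetical stand-ins; in the notebook the vectors would come from `embedding_model.encode`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)

def rank_jobs(query_vec, job_vecs, top_k=2):
    """Return (job_title, similarity) pairs, most similar first."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in job_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy stand-ins for pre-computed job-description embeddings (hypothetical values).
job_vecs = {
    "Data Scientist": [0.9, 0.1, 0.0],
    "Data Engineer": [0.1, 0.9, 0.0],
    "Data Analyst": [0.5, 0.5, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # stand-in for an embedded CV

top_matches = rank_jobs(query_vec, job_vecs)
print(top_matches)
```

Because the job vectors are computed once, each recommendation request costs one embedding call plus a handful of dot products.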

In [2]:
# ==================================================
# Cell 4: Data Loading / Preparation & Pre-computation
# ==================================================
# Re-import pandas & numpy just in case kernel restarted, though usually not needed
import pandas as pd
import numpy as np
try:
    from IPython.display import display
except ImportError:
    display = print

print("Creating placeholder DataFrames...")

# --- 1. Placeholder Question Bank ---
# Using structure compatible with our functions
questions_data = {
    'question_id': [1, 2, 3, 4, 5],
    'question_text': [
        "Explain the difference between L1 and L2 regularization in machine learning models.",
        "How would you handle missing data in a dataset?",
        "Design an A/B testing framework for a new feature on an e-commerce website.",
        "Explain the bias-variance tradeoff in machine learning.",
        "Write a SQL query to find the top 3 departments with the highest average salary."
    ],
    'ideal_answer_points': [ # Using the text descriptions as ideal points for embedding/LLM eval
        "L1 regularization (Lasso) adds absolute coefficient values to loss, promoting sparsity/feature selection. L2 regularization (Ridge) adds squared coefficient values, shrinking coefficients but keeping most non-zero, handling multicollinearity.",
        "Methods include Deletion (listwise/pairwise if MCAR/small loss), Imputation (mean/median/mode - simple, affects variance; regression/KNN - more accurate), using algorithms tolerant to NaNs, or creating indicator variables. Choice depends on missingness pattern (MCAR/MAR/MNAR) and data amount.",
        "Define clear metrics (conversion, revenue). Calculate sample size via power analysis. Randomize users into control/treatment. Implement tracking. Run for fixed duration (e.g., 2 weeks). Analyze with statistical tests (t-test/chi-square). Consider novelty effects, seasonality, contamination.",
        "Tradeoff between model simplicity (high bias/underfitting - misses patterns) and complexity (high variance/overfitting - fits noise). Goal is optimal balance minimizing total error (bias^2 + variance + irreducible error) for good generalization.",
        "SELECT d.department_name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.department_id GROUP BY d.department_name ORDER BY avg_salary DESC LIMIT 3;"
    ],
    'skill_tags': [ # Aligning with categories used later if possible
        "Machine Learning,Regularization,Model Tuning",
        "Data Preprocessing,Data Cleaning",
        "Experimental Design,Statistics,A/B Testing",
        "Machine Learning,Model Tuning,Evaluation",
        "SQL,Data Analysis"
    ],
     'category': [ # Adding category from your example
        "Machine Learning", "Data Preprocessing", "Experimental Design", "Machine Learning", "SQL"
     ],
     'difficulty': [ # Adding difficulty from your example
        "Medium", "Easy", "Hard", "Medium", "Medium"
     ]
}
questions_df = pd.DataFrame(questions_data)
print("\n--- Questions DataFrame ---")
display(questions_df)

# --- 2. Placeholder Job Listings ---
# Using structure compatible with our functions
jobs_data = {
    'job_id': [1, 2, 3, 4, 5],
    'title': ["Data Scientist", "ML Engineer", "Data Analyst", "Research Scientist", "Data Engineer"],
    'company': ["TechCorp", "AIStartup", "FinTech Inc", "PharmaLabs", "BigData Co"],
    'description': [ # Used for embedding
        "Looking for a data scientist with strong ML skills to work on recommendation algorithms. Requires Python, SQL, Machine Learning, Statistics, A/B Testing.",
        "Develop and deploy machine learning models for computer vision applications using PyTorch, TensorFlow, MLOps, and Python.",
        "Analyze financial data to provide insights and create dashboards using SQL, Excel, Tableau, Statistics, and Financial Analysis skills.",
        "Apply advanced statistical methods to analyze clinical trial data using R, Statistics, Experimental Design, Causal Inference.",
        "Design and implement data pipelines for big data processing using Spark, Hadoop, SQL, Python, and Cloud Platforms (AWS/GCP)."
    ],
     'link': ["techcorp.com/careers", "aistartup.com/jobs", "fintech.com/apply", "pharmalabs.com/research", "bigdataco.com/careers"],
     # 'skills_required' list can be generated from description or kept separate
     # Let's keep it separate for clarity, matching your example structure
     'skills_required': [
         ["Python", "SQL", "Machine Learning", "Statistics", "A/B Testing"],
         ["PyTorch", "TensorFlow", "Computer Vision", "MLOps", "Python"],
         ["SQL", "Excel", "Tableau", "Statistics", "Financial Analysis"],
         ["R", "Statistics", "Experimental Design", "Causal Inference"],
         ["Spark", "Hadoop", "SQL", "Python", "AWS", "GCP"]
     ]
}
jobs_df = pd.DataFrame(jobs_data)
print("\n--- Jobs DataFrame ---")
display(jobs_df)

# --- 3. Pre-compute Job Embeddings ---
print("\nPre-computing embeddings for job descriptions...")
jobs_df['embeddings'] = [None] * len(jobs_df) # Initialize column

# Use globals().get() for safer access in case cell execution order changes
loaded_embedding_model = globals().get('embedding_model')

if loaded_embedding_model is not None:
    if not jobs_df.empty:
        if 'description' in jobs_df.columns:
            jobs_df['desc_safe'] = jobs_df['description'].fillna('')
            # Use the embedding model directly here for simplicity
            try:
                jobs_df['embeddings'] = jobs_df['desc_safe'].apply(lambda x: loaded_embedding_model.encode(x) if x else None)
                print(f"Job embeddings computed for {jobs_df['embeddings'].notna().sum()} jobs.")
            except Exception as e:
                 print(f"ERROR computing job embeddings: {e}")
            finally:
                 # Drop the temporary column even if embedding fails
                 jobs_df = jobs_df.drop(columns=['desc_safe'], errors='ignore')
        else:
            print("Warning: 'description' column not found. Cannot compute embeddings.")
    else:
         print("Warning: jobs_df is empty. Cannot compute embeddings.")
else:
    print("Warning: Embedding model not loaded (check the setup cell, Cell 1). Cannot compute job embeddings.")


# --- 4. Placeholder Study Recommendations ---
# Using structure compatible with our functions, slightly adapted from your example
study_rec_data_list = []
temp_study_data = [ # From your example structure
    {"category": "Machine Learning", "resources": [{"title": "ML Mastery", "url": "...", "type": "Blog"}, {"title": "PRML Book", "url": "...", "type": "Book"}]},
    {"category": "SQL", "resources": [{"title": "SQL for DS", "url": "...", "type": "Course"}, {"title": "Leetcode SQL", "url": "...", "type": "Practice"}]},
    {"category": "Data Preprocessing", "resources": [{"title": "Feature Eng Book", "url": "...", "type": "Book"}, {"title": "Kaggle Data Cleaning", "url": "...", "type": "Practice"}]},
    {"category": "Experimental Design", "resources": [{"title": "Trustworthy Exp Book", "url": "...", "type": "Book"}, {"title": "Udacity A/B Course", "url": "...", "type": "Course"}]},
     {"category": "Statistics", "resources": [{"title": "Stats Thinking Course", "url": "...", "type": "Course"}, {"title": "Practical Stats Book", "url": "...", "type": "Book"}]}
]
# Convert to our target structure (skill_tag, recommendation_text)
for item in temp_study_data:
     category = item['category']
     # Create a combined recommendation text
     rec_text = f"Explore resources for {category}: "
     rec_text += ", ".join([f"{res['title']} ({res['type']})" for res in item['resources']])
     study_rec_data_list.append({'skill_tag': category, 'recommendation_text': rec_text})

study_rec_df = pd.DataFrame(study_rec_data_list)
print("\n--- Study Recommendations DataFrame ---")
display(study_rec_df)


# --- 5. Placeholder Leaderboard ---
# Using our previous structure
leaderboard_data = {
    'Rank': [1, 2, 3, 4, 5],
    'User': ["AI_Legend", "CodeNinja", "DataGuru", "StatsWizard", "ProbSolver"],
    'Score': [9.8, 9.5, 9.1, 8.8, 8.5], # Example overall score / 10
    'Badges': ["Passed!,Top Performer", "Passed!,Top Performer", "Passed!", "Passed!", "Passed!"]
}
leaderboard_df = pd.DataFrame(leaderboard_data).set_index('Rank')
print("\n--- Simulated Leaderboard ---")
display(leaderboard_df)


# --- 6. Placeholder Negotiation Scenarios ---
# Using our previous structure
negotiation_data = {
    'scenario_id': [1, 2, 3, 4],
    'role_type': ['FAANG', 'Startup', 'FAANG', 'Startup'],
    'recruiter_statement': [
        "We're prepared to offer you a base salary of $120,000.",
        "Our standard offer for this role includes a base of $90,000 and 0.1% equity.",
        "Based on your experience, the salary band allows us to offer $135,000.",
        "We can offer $100,000 base salary, plus standard benefits."
    ],
    'good_user_counter_example': [
        "Thank you for the offer! Based on my research for similar roles at this level and my competing opportunities, I was expecting a base closer to $135,000. Is there flexibility?",
        "I appreciate the offer and the equity component. Given my experience and the market rate, I'd be looking for a base salary around $105,000. Can we discuss the equity vesting schedule as well?",
        "That's a strong offer, thank you. Considering the cost of living and market data, I believe my value aligns more with the $145,000-$150,000 range. Are sign-on bonuses or performance bonuses negotiable?",
        "Thanks! Could you provide details on the bonus structure and typical total compensation? I'm aiming for a total package value around $120,000."
    ],
    'feedback_hint': [
        "User countered reasonably, referenced market/other offers, asked about flexibility.",
        "User acknowledged equity, provided target base, asked clarifying question.",
        "User expressed thanks, justified higher range, opened discussion on other compensation components.",
        "User asked clarifying questions about total compensation instead of just focusing on base."
    ]
}
negotiation_scenarios_df = pd.DataFrame(negotiation_data)
print("\n--- Negotiation Scenarios DataFrame ---")
display(negotiation_scenarios_df)

# --- 7. Placeholder Benchmarks ---
# Using the structure from your provided code
benchmarks_data = [
    {"company": "Google", "role": "Data Scientist", "avg_score": 16.8, "min_score": 14.5, "max_score": 19.0},
    {"company": "Facebook", "role": "Data Scientist", "avg_score": 16.2, "min_score": 14.0, "max_score": 18.5},
    # Add other benchmarks...
     {"company": "Google", "role": "ML Engineer", "avg_score": 17.0, "min_score": 15.0, "max_score": 19.2},
]
benchmarks_df = pd.DataFrame(benchmarks_data)
print("\n--- Benchmarks DataFrame ---")
display(benchmarks_df)


print("-" * 30)
print("DATA PREPARATION CELL COMPLETE")
print("-" * 30)

Creating placeholder DataFrames...

--- Questions DataFrame ---


Unnamed: 0,question_id,question_text,ideal_answer_points,skill_tags,category,difficulty
0,1,Explain the difference between L1 and L2 regul...,L1 regularization (Lasso) adds absolute coeffi...,"Machine Learning,Regularization,Model Tuning",Machine Learning,Medium
1,2,How would you handle missing data in a dataset?,Methods include Deletion (listwise/pairwise if...,"Data Preprocessing,Data Cleaning",Data Preprocessing,Easy
2,3,Design an A/B testing framework for a new feat...,"Define clear metrics (conversion, revenue). Ca...","Experimental Design,Statistics,A/B Testing",Experimental Design,Hard
3,4,Explain the bias-variance tradeoff in machine ...,Tradeoff between model simplicity (high bias/u...,"Machine Learning,Model Tuning,Evaluation",Machine Learning,Medium
4,5,Write a SQL query to find the top 3 department...,"SELECT d.department_name, AVG(e.salary) as avg...","SQL,Data Analysis",SQL,Medium



--- Jobs DataFrame ---


Unnamed: 0,job_id,title,company,description,link,skills_required
0,1,Data Scientist,TechCorp,Looking for a data scientist with strong ML sk...,techcorp.com/careers,"[Python, SQL, Machine Learning, Statistics, A/..."
1,2,ML Engineer,AIStartup,Develop and deploy machine learning models for...,aistartup.com/jobs,"[PyTorch, TensorFlow, Computer Vision, MLOps, ..."
2,3,Data Analyst,FinTech Inc,Analyze financial data to provide insights and...,fintech.com/apply,"[SQL, Excel, Tableau, Statistics, Financial An..."
3,4,Research Scientist,PharmaLabs,Apply advanced statistical methods to analyze ...,pharmalabs.com/research,"[R, Statistics, Experimental Design, Causal In..."
4,5,Data Engineer,BigData Co,Design and implement data pipelines for big da...,bigdataco.com/careers,"[Spark, Hadoop, SQL, Python, AWS, GCP]"



Pre-computing embeddings for job descriptions...



Job embeddings computed for 5 jobs.

--- Study Recommendations DataFrame ---


Unnamed: 0,skill_tag,recommendation_text
0,Machine Learning,Explore resources for Machine Learning: ML Mas...
1,SQL,Explore resources for SQL: SQL for DS (Course)...
2,Data Preprocessing,Explore resources for Data Preprocessing: Feat...
3,Experimental Design,Explore resources for Experimental Design: Tru...
4,Statistics,Explore resources for Statistics: Stats Thinki...



--- Simulated Leaderboard ---


Rank,User,Score,Badges
1,AI_Legend,9.8,"Passed!,Top Performer"
2,CodeNinja,9.5,"Passed!,Top Performer"
3,DataGuru,9.1,Passed!
4,StatsWizard,8.8,Passed!
5,ProbSolver,8.5,Passed!



--- Negotiation Scenarios DataFrame ---


Unnamed: 0,scenario_id,role_type,recruiter_statement,good_user_counter_example,feedback_hint
0,1,FAANG,We're prepared to offer you a base salary of $...,Thank you for the offer! Based on my research ...,"User countered reasonably, referenced market/o..."
1,2,Startup,Our standard offer for this role includes a ba...,I appreciate the offer and the equity componen...,"User acknowledged equity, provided target base..."
2,3,FAANG,"Based on your experience, the salary band allo...","That's a strong offer, thank you. Considering ...","User expressed thanks, justified higher range,..."
3,4,Startup,"We can offer $100,000 base salary, plus standa...",Thanks! Could you provide details on the bonus...,User asked clarifying questions about total co...



--- Benchmarks DataFrame ---


Unnamed: 0,company,role,avg_score,min_score,max_score
0,Google,Data Scientist,16.8,14.5,19.0
1,Facebook,Data Scientist,16.2,14.0,18.5
2,Google,ML Engineer,17.0,15.0,19.2


------------------------------
DATA PREPARATION CELL COMPLETE
------------------------------


## 3. Core Logic Functions

This section defines the core Python functions that implement JobSage's intelligence. These functions handle tasks like:

*   Selecting interview questions from the predefined bank by category and **difficulty**; a few-shot generation with the **Gemini API** is also attempted, but only as a capability demo, and the generated question is not used in the flow.
*   Converting text (answers, job descriptions, CVs) into numerical embeddings using **Sentence Transformers** to understand semantic meaning.
*   Evaluating user answers based on content similarity (embeddings) and AI assessment of clarity/depth using the **Gemini API**, potentially adjusting scores based on question **difficulty**.
*   Generating dynamic follow-up questions using the **Gemini API**.
*   Calculating the user's performance benchmark against simulated norms.
*   Recommending specific study topics based on identified weak areas (using simple RAG).
*   Extracting potential skills from CV text (using keywords or potentially Gemini).
*   Recommending relevant jobs based on CV skills and performance (using RAG/vector search).
*   Simulating a basic **salary negotiation** scenario using the **Gemini API**.
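
The benchmarking step in the list above can be illustrated with the constants from the setup cell. This is a minimal sketch, assuming the simulated norm is a normal distribution with mean `FAANG_MEAN_SCORE` and standard deviation `FAANG_STD_DEV`. The `benchmark_percentile` helper is hypothetical, and the standard library's `statistics.NormalDist` stands in for `scipy.stats.norm`, which the notebook imports.

```python
from statistics import NormalDist

FAANG_MEAN_SCORE = 7.5   # simulated mean score out of 10 (from the setup cell)
FAANG_STD_DEV = 1.0      # simulated standard deviation (from the setup cell)
PASSING_THRESHOLD = 7.0  # passing score out of 10 (from the setup cell)

def benchmark_percentile(score, mean=FAANG_MEAN_SCORE, std=FAANG_STD_DEV):
    """Percentile (0-100) of `score` under the assumed normal distribution."""
    return NormalDist(mu=mean, sigma=std).cdf(score) * 100

user_score = 8.5  # hypothetical overall interview score out of 10
percentile = benchmark_percentile(user_score)
passed = user_score >= PASSING_THRESHOLD
print(f"Score {user_score}/10 -> percentile {percentile:.1f}, passed: {passed}")
```

A score equal to the simulated mean lands at the 50th percentile, and each additional standard deviation moves the user further into the upper tail.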

In [3]:
# ==================================================
# Cell 5: Core Logic Function Definitions
# (Gemini-Only, Difficulty, Negotiation & Resume Tweaks Added)
# ==================================================
print("Defining core logic functions...")
import re
import numpy as np
import pandas as pd
from sentence_transformers import util
import json # For parsing Gemini JSON output
import random # For random choices
from sklearn.metrics.pairwise import cosine_similarity # Alternative for similarity calculation

# Ensure access to global variables defined in setup/data prep
# These lines ensure the functions can find the models/dataframes loaded earlier
# If running cells out of order, this might grab None, hence the checks inside functions
gemini_model = globals().get('gemini_model')
embedding_model = globals().get('embedding_model')
questions_df = globals().get('questions_df', pd.DataFrame())
jobs_df = globals().get('jobs_df', pd.DataFrame())
study_rec_df = globals().get('study_rec_df', pd.DataFrame())
negotiation_scenarios_df = globals().get('negotiation_scenarios_df', pd.DataFrame()) # Added

# Constants from setup
FAANG_MEAN_SCORE = globals().get('FAANG_MEAN_SCORE', 7.5) # Score out of 10
FAANG_STD_DEV = globals().get('FAANG_STD_DEV', 1.0)
MAX_POINTS_PER_QUESTION = globals().get('MAX_POINTS_PER_QUESTION', 20) # Not set in the setup cell, so the fallback of 20 applies
PASSING_THRESHOLD = globals().get('PASSING_THRESHOLD', 7.0) # Score out of 10

# --- Embedding Function ---
def get_text_embedding(text):
    """Generates sentence embedding for given text using the loaded model."""
    global embedding_model # Explicitly use global
    if embedding_model is None:
        print("Error: Embedding model not available.")
        return None
    if not isinstance(text, str): text = str(text)
    if not text.strip():
         print("Warning: Embedding empty string.")
         # Depending on model, might return zero vector or a default embedding
         # return np.zeros(embedding_model.get_sentence_embedding_dimension())
         pass # Let model handle it
    try:
        embedding = embedding_model.encode(text)
        return embedding
    except Exception as e:
        print(f"Error during embedding encoding: {e}")
        return None

# --- Question Generation (Gemini Few-Shot Attempt + Predefined Selection + Difficulty) ---
def generate_question(category=None, difficulty=None, excluded_ids=(), max_examples=3):
    """
    Selects a predefined question matching category/difficulty, avoiding excluded IDs.
    Optionally attempts Gemini few-shot generation for demo purposes.

    Args:
        category (str, optional): Target category. Defaults to None.
        difficulty (str, optional): Target difficulty ('Easy', 'Medium', 'Hard'). Defaults to None.
        excluded_ids (list or tuple, optional): question_ids to exclude. Defaults to () (immutable, so the default is never shared between calls).
        max_examples (int, optional): Max few-shot examples for Gemini prompt. Defaults to 3.

    Returns:
        tuple: (question_text, ideal_answer_points, skill_tags, question_id, difficulty)
               Returns error strings and codes (-2, -3) on failure.
    """
    global questions_df, gemini_model # Explicitly use globals

    if questions_df.empty:
        return "Error: Questions DataFrame is empty.", "N/A", "N/A", -3, "N/A"

    # --- Filter predefined questions ---
    filtered_questions = questions_df.copy()
    if category:
        # Case-insensitive category matching
        filtered_questions = filtered_questions[filtered_questions['category'].str.lower() == category.lower()]
    if difficulty:
        # Case-insensitive difficulty matching
        filtered_questions = filtered_questions[filtered_questions['difficulty'].str.lower() == difficulty.lower()]
    if excluded_ids:
        filtered_questions = filtered_questions[~filtered_questions['question_id'].isin(excluded_ids)]

    if filtered_questions.empty:
        # Fallback logic if no exact match found
        print(f"Warning: No questions found for Cat='{category}', Diff='{difficulty}'. Broadening search...")
        filtered_questions = questions_df.copy() # Start fresh
        if category: # Try keeping category if possible
             filtered_questions = filtered_questions[filtered_questions['category'].str.lower() == category.lower()]
        if excluded_ids: # Always exclude asked questions
             filtered_questions = filtered_questions[~filtered_questions['question_id'].isin(excluded_ids)]
        # If still empty, try ignoring category but keep difficulty
        if filtered_questions.empty and difficulty:
             print(f"Broadening further: Ignoring category, keeping Diff='{difficulty}'...")
             filtered_questions = questions_df.copy()
             filtered_questions = filtered_questions[filtered_questions['difficulty'].str.lower() == difficulty.lower()]
             if excluded_ids:
                  filtered_questions = filtered_questions[~filtered_questions['question_id'].isin(excluded_ids)]
        # Final fallback: Any available question
        if filtered_questions.empty:
             print("Broadening further: Using any available question...")
             filtered_questions = questions_df[~questions_df['question_id'].isin(excluded_ids)]
             if filtered_questions.empty:
                  return "No more unique questions available in the bank.", "End.", "End", -2, "End"

    # --- Select the question we WILL return ---
    chosen_q = filtered_questions.sample(1).iloc[0]
    print(f"Selected predefined question ID {chosen_q['question_id']} (Category: {chosen_q['category']}, Difficulty: {chosen_q['difficulty']}).")

    # --- Attempt Gemini generation *only if client exists* (for capability demo) ---
    if gemini_model:
        print("Attempting Gemini candidate generation (for demo purposes)...")
        examples_for_prompt = questions_df.sample(min(max_examples, len(questions_df)))
        few_shot_prompt_text = "Example technical interview questions:\n\n"
        for _, row in examples_for_prompt.iterrows():
             few_shot_prompt_text += f"---\nCategory: {row['category']}\nDifficulty: {row['difficulty']}\nQuestion: {row['question_text']}\n---\n"

        target_cat_gen = category if category else random.choice(questions_df['category'].unique())
        target_diff_gen = difficulty if difficulty else random.choice(questions_df['difficulty'].unique())
        prompt = few_shot_prompt_text + f"\nGenerate one new, unique data science interview question.\nDesired Category: '{target_cat_gen}'\nDesired Difficulty: '{target_diff_gen}'\nRespond ONLY with the question text."

        gen_response = None  # Defined before the try so the except block can inspect it safely
        try:
            safety_settings_med = [ {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
            gen_response = gemini_model.generate_content(prompt, safety_settings=safety_settings_med)
            generated_text = gen_response.text.strip()
            print(f"Gemini generated candidate: '{generated_text}' (Candidate NOT used in flow).")
        except Exception as e:
            print(f"Gemini candidate generation failed: {e}.")
            try: # Check for block reason (only possible if a response object was returned)
                if gen_response is not None and gen_response.prompt_feedback and gen_response.prompt_feedback.block_reason:
                     print(f"Generation blocked. Reason: {gen_response.prompt_feedback.block_reason}")
            except Exception: pass
    else:
        print("Gemini model not available, skipping candidate generation demo.")

    # Return the reliably selected predefined question details
    return chosen_q['question_text'], chosen_q['ideal_answer_points'], chosen_q['skill_tags'], chosen_q['question_id'], chosen_q['difficulty']

# --- LLM Score Parsing Helper ---
def parse_llm_score(response_text, scale=10):
    """Helper function to extract a numerical score from LLM text output."""
    if not isinstance(response_text, str): return 0.0
    num_pat = r'(\d+(?:\.\d+)?)'
    # Match the scale exactly, e.g. '10' but not the leading '10' of '100'
    scale_pat = re.escape(str(int(scale))) + r'(?!\d)'
    # Patterns are tried from most to least specific; the first match wins.
    patterns = [
        r'score[:\s]*?' + num_pat + r'\s*/\s*' + scale_pat,    # Priority 1: "Score: X/Scale"
        num_pat + r'\s*/\s*' + scale_pat,                      # Priority 2: bare "X/Scale"
        r'\b(?:score|rating|value)[:\s]*?' + num_pat + r'\b',  # Priority 3: "Score: X", "Rating: X", etc.
        r'\b' + num_pat + r'\b',                               # Priority 4: any standalone number
    ]
    for pattern in patterns:
        match = re.search(pattern, response_text, re.IGNORECASE)
        if match:
            # Clamp the parsed value to the range [0, scale]
            try: return max(0.0, min(float(scale), float(match.group(1))))
            except ValueError: pass

    print(f"Warning: Could not parse score (scale/{scale}) from LLM response snippet: '{response_text[:100]}...'")
    return 0.0 # Default score

# --- Answer Evaluation (Embeddings + Gemini LLM + Difficulty Scaling) ---
def evaluate_answer(question_text, user_answer, ideal_answer_points, difficulty="Medium"):
    """Evaluates user answer using embeddings and Gemini LLM, scaling points by difficulty."""
    global embedding_model, gemini_model, MAX_POINTS_PER_QUESTION # Explicitly use globals

    # Initialize defaults
    similarity_score = 0.0
    content_score_llm = 0.0
    clarity_score_llm = 0.0
    depth_score_llm = 0.0
    qualitative_feedback = "Evaluation unavailable."
    points = 0.0

    # 1. Similarity Score (Embeddings)
    if embedding_model:
        user_embedding = get_text_embedding(user_answer)
        ideal_embedding = get_text_embedding(ideal_answer_points)
        if user_embedding is not None and ideal_embedding is not None:
            try:
                similarity_score = util.pytorch_cos_sim(user_embedding, ideal_embedding).item() * 10
                similarity_score = max(0.0, min(10.0, similarity_score))
            except Exception as e: print(f"Error calculating embedding similarity: {e}")
    else: print("Error: Embedding model not available for similarity scoring.")

    # 2. Use Gemini for Content, Clarity, Depth Scores & Feedback
    if gemini_model:
        try:
            prompt = f"""Task: Evaluate the user's interview answer based on Content, Clarity, and Depth.
Instructions:
1. Content Score: Evaluate accuracy/correctness against expected points. Score 0-10. Provide 1 sentence rationale then 'Content Score: X/10'.
2. Clarity Score: Evaluate clear phrasing/structure (ignore accuracy). Score 0-10. Provide 1 sentence rationale then 'Clarity Score: X/10'.
3. Depth Score: Evaluate thoroughness/nuance beyond basics. Score 0-10. Provide 1 sentence rationale then 'Depth Score: X/10'.
4. Overall Feedback: Provide 2-3 sentences of concise, constructive overall feedback. Start with 'Overall Feedback:'.

Question: "{question_text}"
Expected Answer Points/Keywords: "{ideal_answer_points}"
Candidate's Answer: "{user_answer}"

Response Format (Use EXACT keywords 'Content Score:', 'Clarity Score:', 'Depth Score:', 'Overall Feedback:'):
Content Score: [score_float]/10. [Rationale]
Clarity Score: [score_float]/10. [Rationale]
Depth Score: [score_float]/10. [Rationale]
Overall Feedback: [Feedback text]
"""
            print("Requesting evaluation from Gemini...")
            safety_settings_med = [ {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
            response = gemini_model.generate_content(prompt, safety_settings=safety_settings_med)
            response_text = response.text

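            # Parse each score by its keyword. The fallback helper returns the first
            # score-like number in the whole response, so it may belong to another dimension.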
            content_match = re.search(r'Content Score[:\s]*?(\d+(\.\d+)?)\s*/\s*10', response_text, re.IGNORECASE)
            if content_match: content_score_llm = max(0.0, min(10.0, float(content_match.group(1))))
            else: content_score_llm = parse_llm_score(response_text, 10) # Fallback parsing

            clarity_match = re.search(r'Clarity Score[:\s]*?(\d+(\.\d+)?)\s*/\s*10', response_text, re.IGNORECASE)
            if clarity_match: clarity_score_llm = max(0.0, min(10.0, float(clarity_match.group(1))))
            else: clarity_score_llm = parse_llm_score(response_text, 10)

            depth_match = re.search(r'Depth Score[:\s]*?(\d+(\.\d+)?)\s*/\s*10', response_text, re.IGNORECASE)
            if depth_match: depth_score_llm = max(0.0, min(10.0, float(depth_match.group(1))))
            else: depth_score_llm = parse_llm_score(response_text, 10)

            # Extract qualitative feedback
            feedback_match = re.search(r'Overall Feedback[:\s]*(.*)', response_text, re.IGNORECASE | re.DOTALL)
            if feedback_match: qualitative_feedback = feedback_match.group(1).strip()
            else: qualitative_feedback = response_text # Fallback to full text if keyword not found

            print("Gemini evaluation successful.")

        except Exception as e:
            error_msg = f"Error during Gemini evaluation: {e}"
            print(error_msg)
            qualitative_feedback = error_msg
            try: # Check block reason
                if response and response.prompt_feedback and response.prompt_feedback.block_reason:
                     qualitative_feedback += f" (Block Reason: {response.prompt_feedback.block_reason})"
            except Exception: pass # Best-effort: ignore failures while reading block details
    else:
        print("Warning: Gemini model not available for evaluation.")
        qualitative_feedback = "LLM evaluation skipped (model unavailable)."
        # Use similarity score for basic feedback when the LLM is unavailable
        if similarity_score >= 8.0: qualitative_feedback += "\nContent similarity to expected answer is high."
        elif similarity_score >= 5.0: qualitative_feedback += "\nContent similarity is moderate."
        else: qualitative_feedback += "\nContent similarity to expected answer is low."
        # Use similarity score as a proxy for the content score (clarity and depth stay at 0.0)
        content_score_llm = similarity_score


    # 3. Calculate Final Points (Weighted Average + Difficulty Scaling)
    # Weighting: 50% content, 25% clarity, 25% depth (Scores / 10)
    base_points_ratio = (content_score_llm / 10 * 0.50 +
                         clarity_score_llm / 10 * 0.25 +
                         depth_score_llm / 10 * 0.25)

    # Apply difficulty multiplier
    difficulty_multiplier = {'easy': 0.9, 'medium': 1.0, 'hard': 1.1}
    difficulty_key = difficulty.lower() if isinstance(difficulty, str) else 'medium'
    scaled_points_ratio = base_points_ratio * difficulty_multiplier.get(difficulty_key, 1.0)

    points = scaled_points_ratio * MAX_POINTS_PER_QUESTION
    points = round(max(0, min(MAX_POINTS_PER_QUESTION, points)), 1)

    return {
        'similarity_score': round(similarity_score, 1),
        'content_score': round(content_score_llm, 1),
        'clarity_score': round(clarity_score_llm, 1),
        'depth_score': round(depth_score_llm, 1),
        'qualitative_feedback': qualitative_feedback,
        'points': points
    }

# --- Follow-up Question Generation (Gemini) ---
def generate_follow_up(question_text, user_answer):
    """Generates a relevant follow-up question using Gemini."""
    global gemini_model # Explicitly use global

    if gemini_model is None:
        print("Info: Gemini model not available for follow-up question generation.")
        return None

    prompt = f"""Based on the original interview question and the user's answer provided below, ask ONE relevant and concise follow-up question (ending with '?'). Probe deeper or clarify a specific point. Ask only the question itself, without any preamble.

Original Question: "{question_text}"
User's Answer: "{user_answer}"

Follow-up Question:"""
    try:
        print("Requesting follow-up question from Gemini...")
        safety_settings_med = [ {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
        response = gemini_model.generate_content(prompt, safety_settings=safety_settings_med)
        follow_up_question = response.text.strip()

        if not follow_up_question or len(follow_up_question) < 10: return None # Basic validation
        if not follow_up_question.endswith('?'): follow_up_question += "?"
        # Avoid returning instructions or refusals
        if "question:" in follow_up_question.lower() or "sorry" in follow_up_question.lower(): return None

        print(f"Gemini generated follow-up: {follow_up_question}")
        return follow_up_question
    except Exception as e:
        print(f"Error generating follow-up question with Gemini: {e}")
        try: # Check block reason
            if response and response.prompt_feedback and response.prompt_feedback.block_reason:
                 print(f"Gemini follow-up blocked. Reason: {response.prompt_feedback.block_reason}")
        except Exception: pass # Best-effort: ignore failures while reading block details
        return None

# --- Benchmarking Calculation (Score out of 10) ---
def calculate_benchmark(overall_session_score_10, mean=FAANG_MEAN_SCORE, std_dev=FAANG_STD_DEV):
    """Returns X such that the user is in the top X% of the simulated norm distribution (score out of 10)."""
    if not isinstance(overall_session_score_10, (int, float)): return 0
    try:
        cdf_value = norm.cdf(overall_session_score_10, loc=mean, scale=std_dev)
        percentile_top = (1 - cdf_value) * 100
        return int(max(0, min(100, percentile_top)))
    except Exception as e: print(f"Error calculating benchmark: {e}"); return 0

# --- Study Topic Recommendation (RAG) ---
def recommend_study_topics(weakest_skills):
    """Recommends study topics based on weakest skills using RAG lookup on study_rec_df."""
    global study_rec_df # Explicitly use global
    recommendations = []
    if study_rec_df.empty: return recommendations
    if not isinstance(weakest_skills, list): return recommendations

    try: # Ensure lowercase column exists for matching
        if 'skill_tag_lower' not in study_rec_df.columns:
             study_rec_df['skill_tag_lower'] = study_rec_df['skill_tag'].str.lower()
    except Exception as e: print(f"Error adding lowercase skill column: {e}"); return []

    for skill in weakest_skills:
        if not isinstance(skill, str): continue
        match = study_rec_df[study_rec_df['skill_tag_lower'] == skill.lower().strip()]
        if not match.empty:
            recommendations.append(match['recommendation_text'].iloc[0])
    # Don't drop column here if function might be called again
    # study_rec_df = study_rec_df.drop(columns=['skill_tag_lower'], errors='ignore')
    return list(dict.fromkeys(recommendations)) # De-duplicate while preserving order

# --- CV Skill Extraction Fallback (Keywords) ---
def extract_cv_skills_keyword_fallback(cv_text):
    """Fallback: Extracts skills using a predefined keyword list."""
    print("Using keyword fallback for skill extraction.")
    if not isinstance(cv_text, str): return []
    keywords = [ "python", "sql", "java", "c++", "c#", "r", "scala", "javascript", "typescript", "php", "swift", "kotlin", "go", "ruby", "perl", "bash", "powershell", "pandas", "numpy", "scipy", "matplotlib", "seaborn", "plotly", "bokeh", "scikit-learn", "sklearn", "tensorflow", "keras", "pytorch", "torch", "jax", "theano", "caffe", "xgboost", "lightgbm", "catboost", "statsmodels", "nltk", "spacy", "gensim", "hugging face", "transformers", "opencv", "pillow", "aws", "azure", "gcp", "google cloud", "amazon web services", "cloud computing", "hadoop", "spark", "pyspark", "mapreduce", "hive", "pig", "impala", "kafka", "rabbitmq", "flink", "storm", "docker", "kubernetes", "k8s", "openshift", "terraform", "ansible", "ci/cd", "jenkins", "gitlab ci", "postgresql", "mysql", "sqlite", "sql server", "oracle", "mongodb", "cassandra", "redis", "neo4j", "elasticsearch", "nosql", "database design", "data modeling", "data warehousing", "etl", "tableau", "power bi", "qlik", "looker", "d3.js", "excel", "statistics", "probability", "econometrics", "calculus", "linear algebra", "discrete math", "machine learning", "ml", "deep learning", "dl", "artificial intelligence", "ai", "natural language processing", "nlp", "computer vision", "cv", "speech recognition", "reinforcement learning", "rl", "data analysis", "data mining", "predictive modeling", "forecasting", "optimization", "operations research", "a/b testing", "experiment design", "causal inference", "algorithms", "data structures", "object-oriented programming", "oop", "functional programming", "system design", "distributed systems", "microservices", "api design", "rest", "graphql", "communication", "presentation", "leadership", "teamwork", "collaboration", "problem-solving", "critical thinking", "agile", "scrum", "project management", "product management" ]
    cv_lower = cv_text.lower()
    # Lookarounds instead of \b so keywords ending in symbols (e.g. "c++", "c#") can match
    found_skills = [k for k in keywords if re.search(r'(?<!\w)' + re.escape(k) + r'(?!\w)', cv_lower)]
    return sorted(list(set(found_skills)))

# --- CV Skill Extraction (Using Gemini) ---
def extract_cv_skills(cv_text):
    """Extracts skills from a CV using Gemini API."""
    global gemini_model # Explicitly use global
    if gemini_model is None:
        print("Error: Gemini model not available for skill extraction.")
        return extract_cv_skills_keyword_fallback(cv_text) # Fallback
    if not isinstance(cv_text, str) or not cv_text.strip(): return []

    prompt = f"""Analyze the following CV text and extract a comprehensive list of professional skills. Include programming languages, libraries, frameworks, tools, platforms (cloud), methodologies, core concepts (e.g., machine learning, statistics), and relevant soft skills (e.g., communication).
Format the output strictly as a JSON array of unique strings. Example: ["Python", "SQL", "Scikit-learn", "AWS", "Agile", "Communication"]

CV Text:
---
{cv_text}
---

JSON Skill Array:"""
    try:
        print("Requesting CV skill extraction from Gemini...")
        safety_settings_med = [ {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
        response = gemini_model.generate_content(prompt, safety_settings=safety_settings_med)
        cleaned_response_text = re.sub(r'```json\s*([\s\S]*?)\s*```', r'\1', response.text, flags=re.IGNORECASE)
        skills = json.loads(cleaned_response_text)
        if isinstance(skills, list) and all(isinstance(s, str) for s in skills):
            print(f"Gemini extracted {len(skills)} skills.")
            return sorted(list(set(skills)))
        else:
            print("Warning: Gemini skill extraction did not return a valid list of strings.")
            return extract_cv_skills_keyword_fallback(cv_text)
    except json.JSONDecodeError as e:
         print(f"Error parsing Gemini JSON skill response: {e}")
         print(f"Raw Response: {response.text[:500] if 'response' in locals() else 'N/A'}")
         return extract_cv_skills_keyword_fallback(cv_text)
    except Exception as e:
        print(f"Error extracting skills with Gemini: {e}")
        try: # Check block reason
            if response and response.prompt_feedback and response.prompt_feedback.block_reason:
                 print(f"Skill extraction blocked. Reason: {response.prompt_feedback.block_reason}")
        except Exception: pass # Best-effort: ignore failures while reading block details
        return extract_cv_skills_keyword_fallback(cv_text)


# --- Job Recommendation (RAG via Embedding Similarity) ---
def recommend_jobs(cv_skills_embedding, jobs_df, top_n=3):
    """Recommends jobs based on cosine similarity between CV embedding and pre-computed job embeddings."""
    # Note: No need for global jobs_df as it's passed as argument
    if cv_skills_embedding is None: return []
    if jobs_df.empty or 'embeddings' not in jobs_df.columns or jobs_df['embeddings'].isnull().all(): return []

    valid_jobs_df = jobs_df.dropna(subset=['embeddings']).copy()
    if valid_jobs_df.empty: return []

    try: job_embeddings = np.stack(valid_jobs_df['embeddings'].values)
    except Exception as e: print(f"Error stacking job embeddings: {e}"); return []

    try:
        if not isinstance(cv_skills_embedding, np.ndarray): cv_skills_embedding = np.array(cv_skills_embedding)
        similarities = util.pytorch_cos_sim(cv_skills_embedding, job_embeddings)[0].numpy()
        valid_jobs_df['similarity_score'] = similarities
        actual_top_n = min(top_n, len(valid_jobs_df))
        top_jobs = valid_jobs_df.nlargest(actual_top_n, 'similarity_score')
        recommended_jobs = top_jobs[['title', 'company', 'link', 'similarity_score']].to_dict('records')
        for job in recommended_jobs: job['similarity_score'] = round(job['similarity_score'], 3)
        return recommended_jobs
    except Exception as e: print(f"Error during job recommendation calc: {e}"); return []

# --- Negotiation Simulator (Using Gemini) ---
def simulate_negotiation(job_title, years_experience, current_salary):
    """Simulates a salary negotiation conversation using Gemini."""
    global gemini_model # Explicitly use global
    if gemini_model is None: return "Negotiation simulation unavailable (Model not loaded)."

    try: # Input validation
        years_exp_float = float(years_experience)
        current_salary_float = float(current_salary)
        if not isinstance(job_title, str) or not job_title.strip(): raise ValueError("Job Title missing")
    except (ValueError, TypeError) as e: return f"Error: Invalid input. Details: {e}"

    # Basic salary estimation
    base_multiplier = 1.2 if job_title.strip().lower() in ["data scientist", "ml engineer", "research scientist"] else 1.0
    experience_factor = min(years_exp_float * 0.1, 0.5)
    target_min = int(current_salary_float * (1.1 + base_multiplier * experience_factor))
    target_max = int(current_salary_float * (1.2 + base_multiplier * experience_factor))

    prompt = f"""Simulate a brief, realistic salary negotiation dialogue (4-6 conversational turns total) for a '{job_title}' position.

Candidate Profile:
- Current approximate salary: ${current_salary_float:,.0f}
- Years of relevant experience: {years_exp_float:.1f}
- Reasonable target salary range based on profile: ${target_min:,.0f} - ${target_max:,.0f}

Instructions for Dialogue:
1. Start with the Hiring Manager making an initial verbal offer.
2. Include realistic back-and-forth. The 'Candidate' should express gratitude, possibly counter-offer, ask clarifying questions, or discuss other benefits.
3. Conclude the dialogue.

Instructions for Feedback:
4. After the dialogue, provide exactly 3 specific, actionable feedback points analyzing the CANDIDATE's negotiation strategy shown ONLY in the dialogue you generated. Start this section clearly with '--- FEEDBACK ---'.

Dialogue & Feedback:"""
    try:
        print(f"Requesting negotiation simulation from Gemini (Target: ${target_min:,.0f}-${target_max:,.0f})...")
        safety_settings_med = [ {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
        response = gemini_model.generate_content(prompt, safety_settings=safety_settings_med)
        if response.text and len(response.text) > 70 and "FEEDBACK" in response.text:
             print("Negotiation simulation generated.")
             return response.text.strip()
        else:
             block_reason = "Response invalid/short"
             try:
                 if response and response.prompt_feedback and response.prompt_feedback.block_reason: block_reason = response.prompt_feedback.block_reason
             except Exception: pass # Best-effort: ignore failures while reading block details
             print(f"Negotiation simulation from Gemini seems invalid or blocked (Reason: {block_reason}). Response: {response.text[:100] if response.text else 'Empty'}")
             return "Error: Could not generate a valid negotiation simulation."
    except Exception as e: print(f"Error during negotiation simulation: {e}"); return f"Error generating negotiation simulation: {e}"


# --- Resume Tweak Suggestion (Using Gemini) ---
def suggest_resume_tweaks(weakest_skills, cv_text):
    """Suggests potential resume improvements based on identified weak skills using Gemini."""
    global gemini_model # Explicitly use global
    if gemini_model is None: return ["Resume tweak suggestions unavailable (Model not loaded)."]
    if not weakest_skills: return []
    if not isinstance(cv_text, str) or len(cv_text) < 50: return ["CV text too short or invalid for analysis."]

    skills_to_focus = weakest_skills[:2] # Focus on top 1-2
    tweaks = []
    print(f"Requesting resume tweak suggestions from Gemini for skills: {skills_to_focus}...")
    try:
        for skill in skills_to_focus:
            prompt = f"""Analyze the following CV. The candidate showed weakness in the skill '{skill}' during a mock interview.
Provide ONE specific, actionable suggestion (1-2 sentences) for how the candidate could improve their CV to better showcase potential experience or knowledge related to '{skill}'. If the skill seems completely absent and unaddressable, state that politely.

CV Text:
---
{cv_text}
---

Suggestion for '{skill}':"""
            safety_settings_med = [ {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
            response = gemini_model.generate_content(prompt, safety_settings=safety_settings_med)
            tweak = response.text.strip()
            if tweak and len(tweak) > 10: tweaks.append(tweak)
            else: print(f"Gemini provided no valid tweak for {skill}. Response: {tweak[:100]}")
        print("Resume tweak suggestions generated.")
        return tweaks
    except Exception as e:
        print(f"Error generating resume tweaks with Gemini: {e}")
        try: # Check block reason
            if response and response.prompt_feedback and response.prompt_feedback.block_reason:
                 print(f"Tweak generation blocked. Reason: {response.prompt_feedback.block_reason}")
        except Exception: pass # Best-effort: ignore failures while reading block details
        return ["Error occurred while generating resume suggestions."]


print("-" * 30)
print("CORE LOGIC FUNCTIONS DEFINED (Gemini-Only, Difficulty, Negotiation, Resume Tweaks)")
print("-" * 30)

Defining core logic functions...
------------------------------
CORE LOGIC FUNCTIONS DEFINED (Gemini-Only, Difficulty, Negotiation, Resume Tweaks)
------------------------------
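
As a quick sanity check on the point calculation in `evaluate_answer`, the weighted average and difficulty scaling can be restated as a small standalone sketch. The weights and multipliers are the ones used above. The helper name `score_to_points` is hypothetical, and the cap of 20 points is an assumption taken from the fallback default for `MAX_POINTS_PER_QUESTION` used later in the notebook.

```python
# Weights and multipliers mirror evaluate_answer; MAX_POINTS is an assumed cap.
WEIGHTS = {"content": 0.50, "clarity": 0.25, "depth": 0.25}
DIFFICULTY_MULTIPLIER = {"easy": 0.9, "medium": 1.0, "hard": 1.1}
MAX_POINTS = 20.0


def score_to_points(content, clarity, depth, difficulty="Medium"):
    """Convert three 0-10 scores into difficulty-scaled points (hypothetical helper)."""
    base_ratio = (content / 10 * WEIGHTS["content"]
                  + clarity / 10 * WEIGHTS["clarity"]
                  + depth / 10 * WEIGHTS["depth"])
    key = difficulty.lower() if isinstance(difficulty, str) else "medium"
    points = base_ratio * DIFFICULTY_MULTIPLIER.get(key, 1.0) * MAX_POINTS
    # Clamp so a strong answer to a Hard question cannot exceed the cap.
    return round(max(0.0, min(MAX_POINTS, points)), 1)


print(score_to_points(8, 6, 6, "Medium"))    # ratio 0.70, multiplier 1.0
print(score_to_points(8, 6, 6, "Hard"))      # ratio 0.70, multiplier 1.1
print(score_to_points(10, 10, 10, "Hard"))   # clamped to MAX_POINTS
```

Because the result is clamped, the Hard multiplier only helps answers that are not already near the maximum; a perfect Hard answer earns the same points as a perfect Medium answer.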


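The benchmark returned by `calculate_benchmark` is a "top X%" figure derived from a normal CDF. A minimal sketch of the same arithmetic follows, assuming hypothetical norm parameters (mean 6.5, standard deviation 1.5; the notebook's `FAANG_MEAN_SCORE` and `FAANG_STD_DEV` are defined in an earlier cell) and using the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so it runs without extra dependencies.

```python
from statistics import NormalDist

# Hypothetical norm parameters, chosen only for illustration.
ASSUMED_MEAN = 6.5
ASSUMED_STD_DEV = 1.5


def top_percent(score_10, mean=ASSUMED_MEAN, std_dev=ASSUMED_STD_DEV):
    """Return X such that the score sits in the top X% of the assumed normal distribution."""
    cdf_value = NormalDist(mu=mean, sigma=std_dev).cdf(score_10)
    # Share of the distribution above the score, truncated to a whole percent.
    return int(max(0, min(100, (1 - cdf_value) * 100)))


print(top_percent(6.5))  # score equal to the mean
print(top_percent(8.0))  # one standard deviation above the mean
```

A lower number is better here: a score at the assumed mean lands in the top 50%, while a score one standard deviation above it lands in roughly the top 15%.
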
## 4. Interview Simulation Flow / Demo

Now that the core functions are defined, this section simulates a user going through a mock interview session with JobSage. It demonstrates how the different components work together:

1.  **Initialization:** Sets up variables to track the session state (points, history, skills). Simulates CV input and processes it (skill extraction, embedding).
2.  **Interview Loop:** Iterates through a set number of questions:
    *   Determines the **difficulty** for the next question (potentially adapting based on performance).
    *   Generates a question of the target difficulty (using `generate_question`).
    *   Simulates a user's answer.
    *   Evaluates the answer using embeddings and the Gemini API, considering question **difficulty** (calling `evaluate_answer`).
    *   Calculates and accumulates points (gamification).
    *   Tracks performance against skills tagged for the question.
    *   Potentially generates a follow-up question (using `generate_follow_up`).
3.  **Post-Interview Analysis:** After the loop, it calculates and displays:
    *   Overall performance score (the average embedding similarity across all questions, on a 0-10 scale).
    *   Total points earned.
    *   Badges awarded based on performance.
    *   Benchmark comparison against simulated norms (using `calculate_benchmark`).
    *   A summary of performance per skill, identifying weak areas.
    *   Targeted study recommendations based on weak skills (using `recommend_study_topics`).
    *   **Resume tweak suggestions** based on weak skills and CV content (using `suggest_resume_tweaks`).
    *   Relevant job recommendations if the passing threshold is met (using `recommend_jobs`).
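
The adaptive difficulty step in the interview loop can be sketched on its own. This is a minimal illustration, not the notebook's exact code: the names `LEVELS` and `next_difficulty` are hypothetical, while the thresholds (80% of the maximum points to move up, below 50% to move down) and the assumed maximum of 20 points match the values used in the simulation cell.

```python
LEVELS = ["Easy", "Medium", "Hard"]


def next_difficulty(current, points, max_points=20, up=0.8, down=0.5):
    """Step the difficulty one level up or down based on the last question's points."""
    idx = LEVELS.index(current)
    if points >= max_points * up:
        idx = min(idx + 1, len(LEVELS) - 1)  # Already Hard: stay Hard
    elif points < max_points * down:
        idx = max(idx - 1, 0)                # Already Easy: stay Easy
    return LEVELS[idx]


print(next_difficulty("Medium", 16))   # strong answer
print(next_difficulty("Medium", 12))   # middling answer
print(next_difficulty("Medium", 9.9))  # weak answer
```

Moving one level at a time keeps a single unusually good or bad answer from swinging the session between Easy and Hard.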

In [4]:
# ==================================================
# Cell 7: Interview Simulation Implementation
# (Adaptive Difficulty & Resume Tweaks Added)
# ==================================================
import numpy as np
import random # Keep for potential future randomization

# Ensure access to global variables/functions defined earlier
questions_df = globals().get('questions_df', pd.DataFrame())
jobs_df = globals().get('jobs_df', pd.DataFrame())
leaderboard_df = globals().get('leaderboard_df', pd.DataFrame())
# Functions from Cell 5
generate_question = globals().get('generate_question')
evaluate_answer = globals().get('evaluate_answer')
generate_follow_up = globals().get('generate_follow_up')
calculate_benchmark = globals().get('calculate_benchmark')
recommend_study_topics = globals().get('recommend_study_topics')
extract_cv_skills = globals().get('extract_cv_skills')
recommend_jobs = globals().get('recommend_jobs')
suggest_resume_tweaks = globals().get('suggest_resume_tweaks')
# Constants
MAX_POINTS_PER_QUESTION = globals().get('MAX_POINTS_PER_QUESTION', 20)
PASSING_THRESHOLD = globals().get('PASSING_THRESHOLD', 7.0) # Score out of 10

print("Starting Interview Simulation with Adaptive Difficulty & Resume Tweaks...\n")

# --- 4.1 User CV Input (Simulated) ---
# Using the detailed example CV
cv_text = """
John Doe - Data Scientist
Email: john.doe@email.com | Phone: 123-456-7890 | LinkedIn: /in/johndoe | GitHub: /johndoe

Summary:
Data Scientist with 3+ years of experience leveraging machine learning, statistical analysis, and data visualization to drive business insights. Proven ability to build end-to-end predictive models and communicate complex findings clearly. Seeking challenging roles in AI development.

Experience:
Data Scientist | Tech Solutions Inc. | 2022 - Present
- Developed customer churn prediction models using Python (Scikit-learn, XGBoost), improving retention by 15%.
- Performed A/B testing and statistical analysis to optimize marketing campaigns.
- Built interactive dashboards using Tableau and SQL to track key business metrics.

Junior Data Analyst | Data Corp | 2020 - 2022
- Cleaned and analyzed large datasets using Pandas and NumPy.
- Created reports and visualizations to support business decisions.

Education:
M.S. in Data Science | University of Advanced Tech | 2020
B.S. in Statistics | State University | 2018

Skills:
Programming: Python (Expert), SQL (Expert), R (Intermediate)
Libraries/Frameworks: Pandas, NumPy, Scikit-learn, TensorFlow, Keras, PyTorch, XGBoost, Matplotlib, Seaborn, Plotly
Tools: Tableau, Power BI, Excel, Git, Docker
Concepts: Machine Learning, Deep Learning, Statistics, Probability, A/B Testing, Data Visualization, ETL, Algorithms, Communication
Cloud: Basic experience with AWS Sagemaker
"""
print("--- CV Input ---")
print("Simulated CV provided (snippet):")
print(cv_text[:200] + "...\n")

# --- 4.2 Initial Skill Extraction & Embedding ---
print("--- CV Processing ---")
extracted_skills = []
cv_embedding = None
if extract_cv_skills: # Check if function exists
    extracted_skills = extract_cv_skills(cv_text)
    print(f"Extracted Skills from CV: {extracted_skills}")
else:
    print("Warning: extract_cv_skills function not defined.")

if callable(globals().get('get_text_embedding')): # Check if function exists without raising NameError
    cv_embedding = get_text_embedding(cv_text)
    if cv_embedding is not None:
        print(f"CV Embedding generated (shape: {cv_embedding.shape})")
    else:
        print("Warning: Could not generate CV embedding.")
else:
     print("Warning: get_text_embedding function not defined.")


# --- 4.3 Interview Loop Initialization ---
print("\n--- Starting Mock Interview Loop ---")
interview_history = []
session_skills_performance = {} # Tracks similarity score per skill
num_questions = 4 # Number of questions
asked_question_ids = []
total_session_points = 0.0
earned_badges = set()
current_difficulty = "Medium" # Start difficulty

# --- Check if necessary functions are defined ---
if not all([generate_question, evaluate_answer, generate_follow_up, calculate_benchmark, recommend_study_topics, recommend_jobs, suggest_resume_tweaks]):
     print("\nERROR: One or more core logic functions are not defined. Please ensure Cell 5 ran correctly. Aborting simulation.")
     # Set interview_history to empty to trigger the skip condition later
     interview_history = []
else:
    # --- Interview Loop ---
    for i in range(num_questions):
        print(f"\n--- Question {i+1} of {num_questions} ---")
        print(f"(Targeting Difficulty: {current_difficulty})")

        # 1. Generate Question
        q_text, q_ideal, q_skills, q_id, q_difficulty = "Error", "N/A", "N/A", -1, "N/A" # Defaults
        try:
            q_text, q_ideal, q_skills, q_id, q_difficulty = generate_question(
                difficulty=current_difficulty,
                excluded_ids=asked_question_ids
            )
            if q_id == -2: print("No more unique questions available."); break
            if q_id == -3: print("Question bank error."); break
            asked_question_ids.append(q_id)
            print(f"Question (ID: {q_id}, Actual Diff: {q_difficulty}): {q_text}")
            print(f"(Relevant Skills: {q_skills})")
        except Exception as e:
            print(f"Error generating question {i+1}: {e}")
            continue # Skip to next question

        # 2. Simulate User Answer
        placeholder_answers = [
            "L1 adds an L1 penalty (sum of absolute weights) forcing some weights to zero for feature selection. L2 adds an L2 penalty (sum of squared weights), shrinking weights but keeping them non-zero, good for multicollinearity.",
            "Analyze missingness pattern (MCAR/MAR/MNAR). Can delete if MCAR & small %. Imputation (mean/median/mode, regression, KNN) preserves data but check distortions. Indicator variables or algorithms handling NaNs are other options.",
            "Define metric (e.g., conversion). Power analysis for sample size. Random assignment (A/B). Track results. Run fixed duration. Analyze with t-test/chi-square. Check validity.",
            "Bias is error from wrong assumptions (underfitting). Variance is error from sensitivity to training data (overfitting). Aim for minimum total error for generalization."
        ]
        user_answer = placeholder_answers[i % len(placeholder_answers)]
        print(f"\nYour Answer:\n{user_answer}")

        # 3. Evaluate Answer
        print("\n--- Evaluating Answer ---")
        evaluation = {'points': 0.0, 'similarity_score': 0.0} # Default
        try:
            evaluation = evaluate_answer(q_text, user_answer, q_ideal, q_difficulty)
            print(f"Evaluation Results (Difficulty: {q_difficulty}):")
            print(f"  - Content Score (LLM):  {evaluation.get('content_score', 'N/A')}/10")
            print(f"  - Clarity Score (LLM):  {evaluation.get('clarity_score', 'N/A')}/10")
            print(f"  - Depth Score (LLM):    {evaluation.get('depth_score', 'N/A')}/10")
            print(f"  - Similarity (Embed):   {evaluation.get('similarity_score', 'N/A')}/10")
            print(f"  - Points Earned (Scaled): {evaluation.get('points', 'N/A')} / {MAX_POINTS_PER_QUESTION}")
            print(f"\n  - Qualitative Feedback:\n{evaluation.get('qualitative_feedback', 'N/A')}")
        except Exception as e:
            print(f"Error evaluating answer {i+1}: {e}")

        # 4. Update Session State
        interview_history.append({'question': q_text, 'answer': user_answer, 'difficulty': q_difficulty, 'evaluation': evaluation})
        total_session_points += evaluation.get('points', 0.0)

        # Track skill performance using similarity score
        current_q_sim_score = evaluation.get('similarity_score', 0.0)
        if q_skills and isinstance(q_skills, str):
            for skill in q_skills.split(','):
                skill = skill.strip().lower() # Use lowercase for consistency
                if not skill: continue
                if skill not in session_skills_performance: session_skills_performance[skill] = []
                session_skills_performance[skill].append(current_q_sim_score)

        # 5. Generate Follow-up?
        if evaluation.get('points', MAX_POINTS_PER_QUESTION) < (MAX_POINTS_PER_QUESTION * 0.75):
            try:
                print("\n--- Checking for Follow-up Question ---")
                follow_up = generate_follow_up(q_text, user_answer)
                if follow_up: print(f"\nFollow-up Question: {follow_up}")
                else: print("No follow-up question generated for this answer.")
            except Exception as e: print(f"Error generating follow-up: {e}")

        # 6. Adapt Difficulty for Next Question
        last_q_points = evaluation.get('points', 0.0)
        # Define point thresholds relative to max points
        upper_threshold = MAX_POINTS_PER_QUESTION * 0.8
        lower_threshold = MAX_POINTS_PER_QUESTION * 0.5
        print(f"(Points: {last_q_points:.1f}, UpperThr: {upper_threshold:.1f}, LowerThr: {lower_threshold:.1f})") # Debug print

        if last_q_points >= upper_threshold: # Good score -> Increase difficulty
            if current_difficulty == "Easy": current_difficulty = "Medium"; print("--> Difficulty increased to Medium")
            elif current_difficulty == "Medium": current_difficulty = "Hard"; print("--> Difficulty increased to Hard")
            # If already Hard, stay Hard
        elif last_q_points < lower_threshold: # Low score -> Decrease difficulty
            if current_difficulty == "Hard": current_difficulty = "Medium"; print("--> Difficulty decreased to Medium")
            elif current_difficulty == "Medium": current_difficulty = "Easy"; print("--> Difficulty decreased to Easy")
            # If already Easy, stay Easy
        else: # Keep current difficulty
            print(f"--> Difficulty remains {current_difficulty}")


# --- Check if interview loop ran ---
if not interview_history:
    print("\n--- Interview loop did not execute any questions. Skipping Analysis. ---")
else:
    # --- 4.4 Post-Interview Analysis ---
    print("\n" + "="*30)
    print("--- Interview Complete: Analysis ---")
    print("="*30 + "\n")

    # Calculate overall score (Avg Similarity Score - 0-10 scale)
    all_similarity_scores = [item['evaluation'].get('similarity_score', 0.0) for item in interview_history]
    overall_score = np.mean(all_similarity_scores) if all_similarity_scores else 0.0
    print(f"Overall Performance Score (Avg. Content Similarity): {overall_score:.1f}/10")

    # Display Gamification Results
    print(f"Total Points Earned this Session: {total_session_points:.1f}")
    # Award Badges
    if overall_score >= PASSING_THRESHOLD: earned_badges.add("Passed!")
    if total_session_points >= (num_questions * MAX_POINTS_PER_QUESTION * 0.8): earned_badges.add("High Scorer!")
    # Consistency badge: every question scored above 60% of the maximum points
    min_points = min(item['evaluation'].get('points', 0.0) for item in interview_history)
    if min_points > (MAX_POINTS_PER_QUESTION * 0.6): earned_badges.add("Consistent Performer")
    print(f"Badges Earned: {', '.join(sorted(earned_badges)) if earned_badges else 'None this session'}")

    # Benchmark Calculation (uses 0-10 overall_score)
    benchmark_percentile = calculate_benchmark(overall_score)
    print(f"\nPerformance Benchmark: Estimated Top {benchmark_percentile}% (compared to simulated FAANG norms)")

    # Leaderboard Context
    print("\n--- Simulated Leaderboard Context ---")
    try:
        if not leaderboard_df.empty:
            display(leaderboard_df)
            user_rank_info = "Below Top 5"
            # Compare user's 0-10 score to leaderboard's 0-10 score
            for rank_index, leaderboard_score in enumerate(leaderboard_df['Score'], 1):
                if overall_score >= leaderboard_score:
                    user_rank_info = f"Comparable to Rank {rank_index} or higher!"
                    break
            print(f"Your score ({overall_score:.1f}/10) places you: {user_rank_info}")
        else: print("Leaderboard data not loaded.")
    except Exception as e: print(f"Error displaying leaderboard context: {e}")

    # Skill Performance Analysis
    print("\n--- Skill Performance Summary ---")
    weakest_skills = []
    if session_skills_performance:
        print("Average Score (Similarity) per Skill Area:")
        sorted_skills = sorted(session_skills_performance.keys())
        skill_avg_scores = {} # Store avg scores for later use
        for skill in sorted_skills:
            scores = session_skills_performance[skill]
            avg_score = np.mean(scores) if scores else 0.0
            skill_avg_scores[skill] = avg_score
            print(f"  - {skill.capitalize()}: {avg_score:.1f}/10 (from {len(scores)} question(s))")
        # Identify weakest based on threshold (e.g., < 6.5 out of 10)
        weakest_skills = [skill for skill, avg_score in skill_avg_scores.items() if avg_score < 6.5]
        if not weakest_skills: print("No specific weak skill areas identified (all areas >= 6.5 avg score).")
    else: print("No skill performance data tracked.")

    # Study Recommendations
    if weakest_skills:
        print("\n--- Recommended Study Areas ---")
        # Ensure recommend_study_topics function exists (a bare name test would raise NameError if it were missing)
        if callable(globals().get('recommend_study_topics')):
            study_recommendations = recommend_study_topics(weakest_skills)
            if study_recommendations:
                for rec in study_recommendations: print(f"  - {rec}")
            else: print("Could not find specific study recommendations.")
        else: print("Warning: recommend_study_topics function not defined.")

    # Resume Tweak Suggestions
    if weakest_skills:
        print("\n--- Resume Tweak Suggestions ---")
        # Ensure suggest_resume_tweaks function exists (a bare name test would raise NameError if it were missing)
        if callable(globals().get('suggest_resume_tweaks')):
            resume_tweaks = suggest_resume_tweaks(weakest_skills, cv_text)
            if resume_tweaks:
                for tweak in resume_tweaks: print(f"  - {tweak}")
            else: print("No specific resume tweaks suggested based on analysis.")
        else: print("Warning: suggest_resume_tweaks function not defined.")


    # Job Recommendations
    print("\n--- Job Recommendations ---")
    # Ensure recommend_jobs function exists (a bare name test would raise NameError if it were missing)
    if callable(globals().get('recommend_jobs')):
        if overall_score >= PASSING_THRESHOLD:
            if cv_embedding is not None:
                print(f"Congrats on meeting the threshold ({PASSING_THRESHOLD:.1f}/10)! Searching for job recommendations...")
                recommended_jobs = recommend_jobs(cv_embedding, jobs_df, top_n=3)
                if recommended_jobs:
                    print("\nTop Job Matches Found:")
                    for i, job in enumerate(recommended_jobs):
                        print(f"  {i+1}. {job['title']} @ {job['company']}")
                        print(f"     Link: {job['link']}")
                        print(f"     (Match Score: {job['similarity_score']:.3f})") # Show match score
                else: print("No suitable job recommendations found in our current database.")
            else: print("Cannot generate job recommendations (CV embedding failed).")
        else: print(f"Score ({overall_score:.1f}/10) below threshold ({PASSING_THRESHOLD:.1f}/10). Keep practicing!")
    else: print("Warning: recommend_jobs function not defined.")

print("\n--- Simulation Complete ---")

Starting Interview Simulation with Adaptive Difficulty & Resume Tweaks...

--- CV Input ---
Simulated CV provided (snippet):

John Doe - Data Scientist
Email: john.doe@email.com | Phone: 123-456-7890 | LinkedIn: /in/johndoe | GitHub: /johndoe

Summary:
Data Scientist with 3+ years of experience leveraging machine learning, ...

--- CV Processing ---
Requesting CV skill extraction from Gemini...
Gemini extracted 32 skills.
Extracted Skills from CV: ['A/B Testing', 'AWS Sagemaker', 'Algorithms', 'Communication', 'Data Analysis', 'Data Cleaning', 'Data Visualization', 'Deep Learning', 'Docker', 'ETL', 'Excel', 'Git', 'Keras', 'Machine Learning', 'Matplotlib', 'NumPy', 'Pandas', 'Plotly', 'Power BI', 'Predictive Modeling', 'Probability', 'PyTorch', 'Python', 'R', 'Report Creation', 'SQL', 'Scikit-learn', 'Seaborn', 'Statistics', 'Tableau', 'TensorFlow', 'XGBoost']


CV Embedding generated (shape: (384,))

--- Starting Mock Interview Loop ---

--- Question 1 of 4 ---
(Targeting Difficulty: Medium)
Selected predefined question ID 5 (Category: SQL, Difficulty: Medium).
Attempting Gemini candidate generation (for demo purposes)...
Gemini generated candidate: 'Explain how you would approach a classification problem with highly imbalanced classes, and what evaluation metrics you would prioritize.' (Candidate NOT used in flow).
Question (ID: 5, Actual Diff: Medium): Write a SQL query to find the top 3 departments with the highest average salary.
(Relevant Skills: SQL,Data Analysis)

Your Answer:
L1 adds an L1 penalty (sum of absolute weights) forcing some weights to zero for feature selection. L2 adds an L2 penalty (sum of squared weights), shrinking weights but keeping them non-zero, good for multicollinearity.

--- Evaluating Answer ---


Requesting evaluation from Gemini...
Gemini evaluation successful.
Evaluation Results (Difficulty: Medium):
  - Content Score (LLM):  0.0/10
  - Clarity Score (LLM):  8.0/10
  - Depth Score (LLM):    7.0/10
  - Similarity (Embed):   0.0/10
  - Points Earned (Scaled): 7.5 / 20

  - Qualitative Feedback:
The candidate completely missed the question, providing an answer related to machine learning rather than SQL. While the machine learning explanation is understandable, it's crucial to address the question asked in an interview.  Focus on demonstrating skills relevant to the job description.

--- Checking for Follow-up Question ---
Requesting follow-up question from Gemini...
Gemini generated follow-up: That's a description of L1 and L2 regularization, not a SQL query.  Can you please answer the original question about the SQL query?

Follow-up Question: That's a description of L1 and L2 regularization, not a SQL query.  Can you please answer the original question about the SQL query?
(P

Requesting evaluation from Gemini...
Gemini evaluation successful.
Evaluation Results (Difficulty: Easy):
  - Content Score (LLM):  8.0/10
  - Clarity Score (LLM):  9.0/10
  - Depth Score (LLM):    7.0/10
  - Similarity (Embed):   8.2/10
  - Points Earned (Scaled): 14.4 / 20

  - Qualitative Feedback:
The candidate demonstrates a good understanding of techniques for handling missing data.  To improve, focus on adding more detail regarding the selection of appropriate methods based on the characteristics of the missing data and the potential impact of each approach.  More explanation on the differences between listwise and pairwise deletion would also strengthen the response.

--- Checking for Follow-up Question ---
Requesting follow-up question from Gemini...
Gemini generated follow-up: Could you elaborate on when you would choose KNN imputation over regression imputation?

Follow-up Question: Could you elaborate on when you would choose KNN imputation over regression imputation?
(Poin

Requesting evaluation from Gemini...
Gemini evaluation successful.
Evaluation Results (Difficulty: Medium):
  - Content Score (LLM):  0.0/10
  - Clarity Score (LLM):  8.0/10
  - Depth Score (LLM):    0.0/10
  - Similarity (Embed):   0.0/10
  - Points Earned (Scaled): 4.0 / 20

  - Qualitative Feedback:
The answer is entirely off-topic.  It demonstrates a misunderstanding of the question and a failure to address the core concepts of L1 and L2 regularization.  The candidate should review fundamental machine learning concepts.

--- Checking for Follow-up Question ---
Requesting follow-up question from Gemini...
Gemini generated follow-up: Your answer describes A/B testing; how does this relate to L1 and L2 regularization?

Follow-up Question: Your answer describes A/B testing; how does this relate to L1 and L2 regularization?
(Points: 4.0, UpperThr: 16.0, LowerThr: 10.0)

--- Question 4 of 4 ---
(Targeting Difficulty: Easy)
Selected predefined question ID 3 (Category: Experimental Design,

Requesting evaluation from Gemini...
Gemini evaluation successful.
Evaluation Results (Difficulty: Hard):
  - Content Score (LLM):  0.0/10
  - Clarity Score (LLM):  8.0/10
  - Depth Score (LLM):    2.0/10
  - Similarity (Embed):   1.2/10
  - Points Earned (Scaled): 5.5 / 20

  - Qualitative Feedback:
The response demonstrates a misunderstanding of the question.  The candidate focused on concepts from machine learning (bias-variance tradeoff) instead of A/B testing methodology.  They should review the principles of A/B testing and experimental design.

--- Checking for Follow-up Question ---
Requesting follow-up question from Gemini...
Gemini generated follow-up: How would you account for and mitigate bias and variance in your A/B testing framework design?

Follow-up Question: How would you account for and mitigate bias and variance in your A/B testing framework design?
(Points: 5.5, UpperThr: 16.0, LowerThr: 10.0)

--- Interview Complete: Analysis ---

Overall Performance Score (Avg. C

Rank  User         Score  Badges
1     AI_Legend    9.8    Passed!,Top Performer
2     CodeNinja    9.5    Passed!,Top Performer
3     DataGuru     9.1    Passed!
4     StatsWizard  8.8    Passed!
5     ProbSolver   8.5    Passed!


Your score (2.3/10) places you: Below Top 5

--- Skill Performance Summary ---
Average Score (Similarity) per Skill Area:
  - A/b testing: 1.2/10 (from 1 question(s))
  - Data analysis: 0.0/10 (from 1 question(s))
  - Data cleaning: 8.2/10 (from 1 question(s))
  - Data preprocessing: 8.2/10 (from 1 question(s))
  - Experimental design: 1.2/10 (from 1 question(s))
  - Machine learning: 0.0/10 (from 1 question(s))
  - Model tuning: 0.0/10 (from 1 question(s))
  - Regularization: 0.0/10 (from 1 question(s))
  - Sql: 0.0/10 (from 1 question(s))
  - Statistics: 1.2/10 (from 1 question(s))

--- Recommended Study Areas ---
  - Explore resources for SQL: SQL for DS (Course), Leetcode SQL (Practice)
  - Explore resources for Statistics: Stats Thinking Course (Course), Practical Stats Book (Book)
  - Explore resources for Machine Learning: ML Mastery (Blog), PRML Book (Book)
  - Explore resources for Experimental Design: Trustworthy Exp Book (Book), Udacity A/B Course (Course)

--- Resume Tweak 
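
The skill summary and study recommendations in the Cell 7 output come from a simple rule in the analysis code: similarity scores are averaged per skill, and any skill averaging below 6.5 out of 10 is treated as weak. The following is a minimal standalone sketch of that rule. The helper name `summarize_skills` and the sample dictionary are illustrative assumptions, not part of the notebook; the sample values are three of the per-skill scores printed above.

```python
from statistics import mean


def summarize_skills(performance, weak_threshold=6.5):
    """Average the 0-10 similarity scores per skill and flag weak areas.

    `performance` maps a skill name to the list of scores collected for it,
    mirroring the notebook's `session_skills_performance` dictionary.
    (`summarize_skills` itself is a hypothetical helper for illustration.)
    """
    averages = {
        skill: (mean(scores) if scores else 0.0)
        for skill, scores in sorted(performance.items())
    }
    # Same cut-off as the notebook: anything averaging below 6.5/10 is weak.
    weakest = [skill for skill, avg in averages.items() if avg < weak_threshold]
    return averages, weakest


# Sample data taken from three rows of the skill summary printed above.
session = {"sql": [0.0], "data cleaning": [8.2], "statistics": [1.2]}
averages, weakest = summarize_skills(session)
for skill, avg in averages.items():
    print(f"  - {skill.capitalize()}: {avg:.1f}/10")
print("Weak areas:", weakest)
```

Because each skill here was covered by a single question, one off-topic answer is enough to flag it; with more questions per skill the average becomes a steadier signal.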

## 5. Optional: Interactive Demo with Gradio

Cell 7 simulates a full interview and prints the complete analysis. This section adds an optional interactive demo built with the Gradio library, which renders a simple web interface directly in the notebook's output with the following tabs:

*   **Interview Simulator Tab:** Practice answering one question at a time and receive immediate evaluation and feedback.
*   **Salary Negotiation Tab:** Simulate a negotiation scenario based on your inputs.
*   **Results & Recommendations Tab:** Placeholder structure only. Dynamic updates from the interview tab require more advanced Gradio techniques; refer to the Cell 7 output for the full results analysis.

*(Note: Gradio apps may fail to launch or behave inconsistently inside Kaggle notebooks, depending on network/proxy settings. If the interface doesn't launch correctly or has limited functionality, the comprehensive simulation output in Cell 7 serves as the primary demonstration.)*
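
The interview tab applies the same per-answer rules as the Cell 7 simulation: a follow-up question is requested when an answer earns less than 75% of the maximum points, and the target difficulty moves up one level at 80% or more and down one level below 50%. A minimal standalone sketch of those two rules follows. The function names `next_difficulty` and `needs_follow_up` are hypothetical and do not appear in the notebook; the default of 20 maximum points matches the fallback value used for `MAX_POINTS_PER_QUESTION` in the Gradio cell.

```python
LEVELS = ["Easy", "Medium", "Hard"]


def next_difficulty(current, points, max_points=20):
    """Step the target difficulty one level up or down based on points earned."""
    upper = max_points * 0.8  # at or above this: increase difficulty
    lower = max_points * 0.5  # below this: decrease difficulty
    idx = LEVELS.index(current)
    if points >= upper:
        idx = min(idx + 1, len(LEVELS) - 1)  # already Hard stays Hard
    elif points < lower:
        idx = max(idx - 1, 0)  # already Easy stays Easy
    return LEVELS[idx]


def needs_follow_up(points, max_points=20):
    """A follow-up is requested when the answer scores under 75% of the maximum."""
    return points < max_points * 0.75


print(next_difficulty("Medium", 7.5))   # low score on a Medium question
print(next_difficulty("Medium", 16.0))  # score exactly at the upper threshold
print(needs_follow_up(14.4))
```

Keeping the rules in small pure functions like these would let the notebook loop and the Gradio handler share one definition instead of repeating the threshold logic in both places.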

In [5]:
# ==================================================
# Cell 9: Gradio Interface Definition
# (Including Negotiation Tab - Force Install Added)
# ==================================================
print("Setting up Gradio interface (including Negotiation Tab)...")

# --- Force-install Gradio right before use ---
print("Attempting to install Gradio again just in case...")
!pip install gradio --quiet
print("Forced Gradio install attempt complete.")

# --- 1. Import Gradio ---
try:
    import gradio as gr
    print("Gradio imported successfully AFTER forced install attempt.")
except ImportError:
    print("ERROR: Gradio still not found even after forced install attempt in Cell 9.")
    gr = None
except Exception as e_import:
    print(f"ERROR during Gradio import: {e_import}") # Catch other import errors
    gr = None

if gr: # Only proceed if Gradio was imported successfully
    try:
        # --- Ensure Core Logic Functions are Accessible ---
        generate_question_func = globals().get('generate_question')
        evaluate_answer_func = globals().get('evaluate_answer')
        generate_follow_up_func = globals().get('generate_follow_up')
        extract_cv_skills_func = globals().get('extract_cv_skills')
        simulate_negotiation_func = globals().get('simulate_negotiation')
        MAX_POINTS_PER_QUESTION = globals().get('MAX_POINTS_PER_QUESTION', 20)

        if not all([generate_question_func, evaluate_answer_func, generate_follow_up_func, extract_cv_skills_func, simulate_negotiation_func]):
             raise NameError("One or more required core logic functions (from Cell 5) are not defined.")

        # --- 2. Define Gradio Handler Functions ---

        # Handles ONE question-answer cycle in the Interview Tab
        def run_interview_cycle(user_answer_input, current_state):
            """Processes submitted answer, gets evaluation, prepares next question."""
            print(f"Gradio submit answer clicked. Current state index: {current_state['current_question_idx']}")
            idx = current_state["current_question_idx"]
            q_text = current_state["current_q_text"]
            q_ideal = current_state["current_q_ideal"]
            q_difficulty = current_state["current_q_difficulty"]
            num_q = current_state["total_questions"]

            if not isinstance(user_answer_input, str) or not user_answer_input.strip():
                gr.Warning("Please provide an answer before submitting.")
                return {feedback_display: gr.update(value="**Error:** Please provide an answer.")}

            # --- Evaluate Answer ---
            try:
                evaluation = evaluate_answer_func(q_text, user_answer_input, q_ideal, q_difficulty)
                current_state["session_scores"].append(evaluation)
                current_points = evaluation.get('points', 0.0)
                current_state["session_points"] += current_points

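                # Format scores defensively: applying ':.1f' to a missing or
                # non-numeric score (such as an 'N/A' default) raises an error.
                def _fmt(key):
                    try:
                        return f"{float(evaluation.get(key)):.1f}"
                    except (TypeError, ValueError):
                        return "N/A"

                feedback_text = f"""--- Evaluation (Difficulty: {q_difficulty}) ---
Content Score: {_fmt('content_score')}/10 | Clarity Score: {_fmt('clarity_score')}/10 | Depth Score: {_fmt('depth_score')}/10

AI Feedback:
{evaluation.get('qualitative_feedback', 'N/A')}"""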
                print("Gradio evaluation complete.")
            except Exception as e:
                 print(f"Error during Gradio answer evaluation: {e}")
                 return { feedback_display: gr.update(value=f"**Error evaluating answer:** {e}") }

            # --- Generate Follow-up ---
            follow_up_text = ""
            if current_points < (MAX_POINTS_PER_QUESTION * 0.75): # Condition for follow-up
                try:
                    follow_up = generate_follow_up_func(q_text, user_answer_input)
                    if follow_up:
                        follow_up_text = f"**Follow-up Question:** {follow_up}"
                        print("Gradio generated follow-up.")
                    else:
                        print("Gradio: No follow-up triggered or generated.")
                except Exception as e:
                    print(f"Error during Gradio follow-up generation: {e}")

            # --- Prepare Updates dictionary ---
            updates = {
                feedback_display: feedback_text,
                follow_up_display: follow_up_text,
                current_q_score_display: current_points,
                total_points_display: round(current_state["session_points"], 1)
            }

            # --- Prepare for next question OR end interview ---
            next_idx = idx + 1
            if next_idx >= num_q:
                # --- END INTERVIEW ---
                print("Gradio interview finished.")
                all_sim_scores = [s.get('similarity_score', 0.0) for s in current_state["session_scores"]]
                overall_score = np.mean(all_sim_scores) if all_sim_scores else 0.0
                final_analysis_text = f"\n\n--- FINAL ANALYSIS ---\nOverall Performance (Avg Similarity): {overall_score:.1f}/10\nTotal Points: {current_state['session_points']:.1f}"

                updates[question_display] = f"## Interview Complete! ({num_q} questions asked)\n\nTotal Points: {current_state['session_points']:.1f}. See Cell 7 output for full analysis & recommendations."
                updates[answer_input] = gr.update(interactive=False, value="Interview Finished.")
                updates[submit_btn] = gr.update(interactive=False)
                updates[progress_display] = f"Finished {num_q} questions."
                updates[feedback_display] += final_analysis_text # Append summary
                current_state["interview_complete"] = True
            else:
                # --- NEXT QUESTION ---
                print(f"Preparing next question (Index: {next_idx})")
                next_difficulty = current_state["current_q_difficulty"]
                upper_threshold = MAX_POINTS_PER_QUESTION * 0.8
                lower_threshold = MAX_POINTS_PER_QUESTION * 0.5
                if current_points >= upper_threshold:
                     if next_difficulty == "Easy": next_difficulty = "Medium"
                     elif next_difficulty == "Medium": next_difficulty = "Hard"
                elif current_points < lower_threshold:
                     if next_difficulty == "Hard": next_difficulty = "Medium"
                     elif next_difficulty == "Medium": next_difficulty = "Easy"

                try:
                    asked_ids = [q['id'] for q in current_state.get("questions_asked", [])]
                    nq_text, nq_ideal, nq_skills, nq_id, nq_difficulty = generate_question_func(
                         excluded_ids=asked_ids, difficulty=next_difficulty
                    )
                    if nq_id == -2:
                         print("No more suitable questions found.")
                         updates[question_display] = "## Interview Complete!\n\nNo more suitable questions found."
                         updates[answer_input] = gr.update(interactive=False, value="")
                         updates[submit_btn] = gr.update(interactive=False)
                         current_state["interview_complete"] = True
                    else:
                         current_state["questions_asked"].append({'id': nq_id, 'text': nq_text, 'ideal': nq_ideal, 'skills': nq_skills, 'difficulty': nq_difficulty})
                         current_state["current_q_id"] = nq_id
                         current_state["current_q_text"] = nq_text
                         current_state["current_q_ideal"] = nq_ideal
                         current_state["current_q_difficulty"] = nq_difficulty

                         question_md = f"## Question {next_idx + 1} of {num_q}\n**Category:** {nq_skills} | **Difficulty:** {nq_difficulty}\n\n{nq_text}"
                         updates[question_display] = question_md
                         updates[answer_input] = gr.update(value="")
                         updates[progress_display] = f"Question {next_idx + 1} of {num_q}"
                         updates[follow_up_display] = "" # Clear follow-up

                except Exception as e:
                    print(f"Error getting next Gradio question: {e}")
                    updates[question_display] = f"Error loading next question: {e}"
                    updates[answer_input] = gr.update(interactive=False)
                    updates[submit_btn] = gr.update(interactive=False)
                    current_state["interview_complete"] = True

            current_state["current_question_idx"] = next_idx
            updates[interview_state] = current_state
            return updates
        # --- End of run_interview_cycle ---


        # Handles the start/restart button click
        def start_interview_logic(cv_text):
            """Initializes or resets the interview state and gets the first question."""
            print("Start/Restart interview button clicked.")
            num_q = 3
            new_state = {
                "current_question_idx": 0, "questions_asked": [], "current_q_id": None,
                "current_q_text": "", "current_q_ideal": "", "current_q_difficulty": "Medium",
                "total_questions": num_q, "session_scores": [], "session_points": 0.0,
                "cv_skills": [], "performance_by_category": {}, "interview_complete": False
            }
            if isinstance(cv_text, str) and cv_text.strip():
                if extract_cv_skills_func: new_state["cv_skills"] = extract_cv_skills_func(cv_text)
                else: print("Warning: CV skill extraction function unavailable.")

            try:
                q_text, q_ideal, q_skills, q_id, q_difficulty = generate_question_func(excluded_ids=[], difficulty="Medium")
                if q_id < 0 :
                    return { question_display: f"Error: Could not get first question (Code: {q_id}).", interview_state: new_state }

                new_state["questions_asked"] = [{'id': q_id, 'text': q_text, 'ideal': q_ideal, 'skills': q_skills, 'difficulty': q_difficulty}]
                new_state["current_q_id"] = q_id
                new_state["current_q_text"] = q_text
                new_state["current_q_ideal"] = q_ideal
                new_state["current_q_difficulty"] = q_difficulty

                question_md = f"## Question 1 of {num_q}\n**Category:** {q_skills} | **Difficulty:** {q_difficulty}\n\n{q_text}"

                return {
                    question_display: question_md,
                    answer_input: gr.update(interactive=True, value=""),
                    submit_btn: gr.update(interactive=True),
                    start_btn: gr.update(value="Restart Interview Session"),
                    interview_state: new_state,
                    progress_display: f"Question 1 of {num_q}",
                    feedback_display: "(Feedback will appear here after submitting answer)",
                    follow_up_display: "",
                    current_q_score_display: 0,
                    total_points_display: 0
                }
            except Exception as e:
                 print(f"Error in start_interview_logic: {e}")
                 return { question_display: f"Error starting interview: {e}", interview_state: new_state }
        # --- End of start_interview_logic ---


        # Handles the negotiation button click
        def run_negotiation_logic(job_title, years_exp, current_salary):
             """Calls the core negotiation simulation function."""
             print("Negotiate button clicked.")
             if simulate_negotiation_func:
                 # Use yield for streaming-like effect (shows message first)
                 yield gr.update(value="Simulating negotiation with AI... Please wait.")
                 result = simulate_negotiation_func(job_title, years_exp, current_salary)
                 yield gr.update(value=result)
             else:
                 yield gr.update(value="Error: Negotiation simulation function not available.")
        # --- End of run_negotiation_logic ---


        # --- 3. Create Gradio Interface Layout ---
        print("Creating Gradio interface layout...")
        with gr.Blocks(theme=gr.themes.Soft(primary_hue=gr.themes.colors.blue), title="JobSage AI Coach") as app:
            gr.Markdown("# JobSage: AI Mock Interview Coach")
            gr.Markdown("Navigate tabs for Interview Practice and Salary Negotiation Simulation.")

            # --- Tab 1: Interview Simulator ---
            with gr.Tab("Interview Simulator"):
                interview_state = gr.State({
                    "current_question_idx": -1, "questions_asked": [], "current_q_id": None,
                    "current_q_text": "", "current_q_ideal": "", "current_q_difficulty": "Medium",
                    "total_questions": 3, "session_scores": [], "session_points": 0.0,
                    "cv_skills": [], "performance_by_category": {}, "interview_complete": False
                })
                with gr.Row():
                    with gr.Column(scale=3):
                        cv_input_interview = gr.Textbox(label="Paste Your CV Text Here (Optional)", placeholder="Needed for skill analysis & job recs...", lines=5)
                        start_btn = gr.Button("Start / Restart Interview Session", variant="primary")
                        progress_display = gr.Markdown("Press 'Start' to begin.")
                        question_display = gr.Markdown("...")
                        answer_input = gr.Textbox(label="Your Answer", placeholder="Type your answer here...", lines=7, interactive=False)
                        submit_btn = gr.Button("Submit Answer", interactive=False)
                    with gr.Column(scale=2):
                        gr.Markdown("### Evaluation & Feedback")
                        current_q_score_display = gr.Number(label="Points this Q", value=0, interactive=False)
                        total_points_display = gr.Number(label="Total Session Points", value=0, interactive=False)
                        feedback_display = gr.Textbox(label="Feedback", lines=10, interactive=False)
                        follow_up_display = gr.Markdown("*Follow-up questions may appear here.*")

            # --- Tab 2: Results & Recommendations ---
            with gr.Tab("Results & Recommendations"):
                 gr.Markdown("*(This tab shows placeholder structures. Full analysis results are printed in the notebook output of Cell 7 after the simulation finishes.)*")
                 gr.Markdown("See Cell 7 output for Detailed Performance Summary, Skill Analysis, Benchmarking, Job Recommendations, Study Recommendations, and Resume Tweaks.")

            # --- Tab 3: Salary Negotiation Simulator ---
            with gr.Tab("Salary Negotiation Simulator"):
                gr.Markdown("Enter details about the role and your current situation to simulate a negotiation.")
                with gr.Row():
                    with gr.Column(scale=1):
                        job_title_input = gr.Dropdown(choices=["Data Scientist", "ML Engineer", "Data Analyst", "Research Scientist", "Data Engineer", "Software Engineer"], label="Job Title" , value = "Data Scientist")
                        years_exp_input = gr.Slider(minimum=0, maximum=20, step=1, label="Years of Experience", value=3)
                        current_salary_input = gr.Number(label="Current Approx Salary ($)", value=100000)
                        negotiate_btn = gr.Button("Simulate Negotiation", variant="primary")
                    with gr.Column(scale=2):
                        negotiation_display = gr.Textbox(label="Negotiation Simulation & Feedback", lines=15, interactive=False) # Use Textbox


            # --- 4. Connect Event Handlers ---
            start_btn.click(
                 fn=start_interview_logic,
                 inputs=[cv_input_interview],
                 outputs=[
                      question_display, answer_input, submit_btn, start_btn,
                      interview_state, progress_display, feedback_display,
                      follow_up_display, current_q_score_display, total_points_display
                 ]
            )
            submit_btn.click(
                 fn=run_interview_cycle,
                 inputs=[answer_input, interview_state],
                 outputs=[
                      feedback_display, follow_up_display, current_q_score_display,
                      total_points_display, question_display, answer_input,
                      submit_btn, progress_display, interview_state
                 ]
            )
            # run_negotiation_logic is a generator: it yields a "please wait" message first, then the final result
            negotiate_btn.click(
                fn=run_negotiation_logic,
                inputs=[job_title_input, years_exp_input, current_salary_input],
                outputs=[negotiation_display]
            )


        # --- 5. Launch Interface ---
        print("Launching Gradio interface... Please wait for the UI to appear below.")
        app.launch(share=True, debug=False)
        print("Gradio launch command executed.")


    except NameError as e:
        print("\nERROR during Gradio setup: A required function or variable might be missing.")
        print(f"Please ensure Cell 5 (Core Logic Functions) ran successfully. Details: {e}")
    except Exception as e:
        print(f"\nERROR setting up or launching Gradio interface: {e}")
        import traceback
        traceback.print_exc()
        print("The static simulation from Cell 7 serves as the primary demo if Gradio fails.")

else:
    print("Gradio library not loaded. Skipping Gradio interface setup.")

print("-" * 30)
print("GRADIO DEMO CELL COMPLETE (Attempted Force Install)")
print("-" * 30)

Setting up Gradio interface (including Negotiation Tab)...
Attempting to install Gradio again just in case...


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Forced Gradio install attempt complete.
Gradio imported successfully AFTER forced install attempt.
Creating Gradio interface layout...
Launching Gradio interface... Please wait for the UI to appear below.
* Running on local URL:  http://127.0.0.1:7860
* Running on public URL

Gradio launch command executed.
------------------------------
GRADIO DEMO CELL COMPLETE (Attempted Force Install)
------------------------------


## 6. GenAI Capabilities Used

This project demonstrates four Generative AI capabilities covered in the course, exceeding the minimum requirement of three:

1.  **Embeddings:**
    *   **What:** Dense vector representations of text capturing semantic meaning.
    *   **How Used:** The `sentence-transformers` library (`all-MiniLM-L6-v2` model), loaded via `SentenceTransformer`, is used in the `get_text_embedding` function.
        *   **Answer Evaluation:** Embeddings of the user's answer and ideal answer points are compared using cosine similarity within `evaluate_answer` to calculate the `similarity_score`, providing a quantitative measure of content overlap.
        *   **Job Recommendation (RAG Component):** The user's CV text is embedded. This embedding acts as a query vector compared against pre-computed embeddings of job descriptions (`jobs_df['embeddings']`) using cosine similarity in `recommend_jobs`. This semantic search retrieves relevant job matches.

2.  **Few-Shot Prompting:**
    *   **What:** Guiding an LLM by providing examples of the desired task format within the prompt.
    *   **How Used:** The `generate_question` function builds a few-shot prompt for the **Gemini API**. The prompt includes several example questions (category, difficulty, text) drawn from `questions_df` to illustrate the desired output style and domain, then asks Gemini to generate a *new*, similar question for the target parameters. *(Note: to keep the demo evaluation consistent, a predefined question is ultimately selected, but the few-shot prompt is still sent to Gemini to showcase the technique.)*

3.  **Retrieval Augmented Generation (RAG):**
    *   **What:** Improving AI responses by retrieving relevant external information and providing it as context. This project uses simplified forms of RAG focused on the *retrieval* aspect based on semantic understanding or derived context.
    *   **How Used:**
        *   **Job Recommendation:** This uses **vector similarity search** (a core RAG component). The user's derived CV embedding (query) is used to retrieve the most similar job descriptions from our vector "knowledge base" (the pre-computed `jobs_df['embeddings']`). The `recommend_jobs` function performs this retrieval.
        *   **Study Recommendation:** Uses **context-based retrieval**. The `recommend_study_topics` function takes the `weakest_skills` identified during the session (derived context) and retrieves corresponding actionable advice from a structured knowledge base (`study_rec_df`).

4.  **Structured Output / Controlled Generation / GenAI Evaluation:**
    *   **What:** Using an LLM for tasks beyond freeform text generation, requiring specific output formats or performing evaluation.
    *   **How Used:**
        *   **Answer Evaluation (`evaluate_answer`):** The **Gemini API** is prompted to act as an expert evaluator. It is explicitly instructed to assess specific dimensions (Content, Clarity, Depth) and return **structured output** with numerical scores (out of 10) and textual feedback in a defined format (keyword-prefixed lines such as `Content Score:` and `Clarity Score:`, in some cases intended for JSON parsing). This demonstrates both GenAI Evaluation and controlled generation.
        *   **Follow-up Questions (`generate_follow_up`):** The Gemini prompt constrains the LLM to generate *only* a relevant *question*, preventing conversational filler.
        *   **CV Skill Extraction (`extract_cv_skills`):** Gemini is prompted to analyze CV text and return skills specifically formatted as a **JSON array**, demonstrating structured output generation.
        *   **Negotiation Simulation (`simulate_negotiation`):** Gemini is prompted to generate a multi-turn dialogue *and* structured feedback points, following specific formatting instructions.
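
The embedding-based retrieval described in items 1 and 3 comes down to ranking stored vectors by cosine similarity to a query vector. The sketch below illustrates that ranking step only. It is dependency-free and uses toy 3-dimensional vectors in place of real `all-MiniLM-L6-v2` output; the names `cosine_similarity` and `top_k_matches` are hypothetical and are not the notebook's `recommend_jobs` implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_matches(query_vec, doc_vecs, k=3):
    """Return up to k (index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy stand-ins for job-description embeddings and a CV embedding (assumed values).
job_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
cv_embedding = [1.0, 0.1, 0.0]

matches = top_k_matches(cv_embedding, job_embeddings, k=2)
print(matches)  # indices of the two closest jobs with their similarity scores
```

With real embeddings the same ranking would typically be done in vectorised form, but the ordering logic is unchanged.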

This implementation showcases the integration of **Embeddings, Few-Shot Prompting, RAG (Retrieval focus), and Structured/Controlled Generation (including GenAI Evaluation)**, meeting the Capstone requirements.
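
Structured output is only useful if the response can be parsed defensively. The following is a minimal sketch of how keyword-prefixed scores and a JSON skill array might be extracted from raw LLM text. The helper names (`parse_scores`, `parse_skill_array`), the regular expression, and the sample strings are assumptions for illustration; the notebook's own parsing inside `evaluate_answer` and `extract_cv_skills` may differ.

```python
import json
import re

# Matches lines such as "Content Score: 8/10" or "Depth Score: 6.5".
SCORE_PATTERN = re.compile(
    r"^(Content|Clarity|Depth) Score:\s*(\d+(?:\.\d+)?)",
    re.MULTILINE | re.IGNORECASE,
)

def parse_scores(response_text):
    """Extract dimension scores from keyword-prefixed lines, clamped to the 0-10 range."""
    scores = {}
    for name, value in SCORE_PATTERN.findall(response_text):
        scores[name.lower()] = min(max(float(value), 0.0), 10.0)
    return scores

def parse_skill_array(response_text):
    """Pull the first-to-last bracketed span out of the text and parse it as a JSON list of strings."""
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if not match:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # caller can fall back to keyword matching
    if not isinstance(parsed, list):
        return []
    return [item.strip() for item in parsed if isinstance(item, str) and item.strip()]

# Hypothetical model responses used only to exercise the parsers.
sample_eval = "Content Score: 8/10\nClarity Score: 7\nDepth Score: 6.5/10\nFeedback: Good structure."
sample_skills = 'Skills: ["Python", "SQL", " A/B testing "]'

print(parse_scores(sample_eval))
print(parse_skill_array(sample_skills))
```

Returning an empty result instead of raising lets the caller decide when to switch to a fallback, such as keyword-based skill matching.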

## 7. Limitations & Future Work

While JobSage demonstrates several powerful GenAI capabilities, this notebook implementation has certain limitations:

**Current Limitations:**

1.  **Static Data:** The question bank (`questions_df`), job listings (`jobs_df`), study recommendations (`study_rec_df`), and negotiation scenarios (`negotiation_scenarios_df`) are based on static, predefined DataFrames. A production system would require dynamic databases and potentially API integrations (e.g., for live job listings).
2.  **Simulated Benchmarking:** The "FAANG Benchmarking" uses hardcoded mean/standard deviation values (`FAANG_MEAN_SCORE`, `FAANG_STD_DEV`). Real-world benchmarking would require anonymized data from actual applicants, which is difficult to obtain. This feature is currently illustrative.
3.  **Text-Only Interaction:** The simulation and Gradio demo rely solely on text input/output. Voice interaction would enhance realism but is complex for a notebook environment.
4.  **Basic RAG & Skill Extraction:** Job recommendations rely primarily on embedding similarity of the full description/CV. Study recommendation RAG is a simple lookup. CV skill extraction relies on Gemini's interpretation or keyword fallback. More sophisticated NLP/RAG techniques could improve these.
5.  **Limited Session State & No Persistence:** The notebook simulates a single session. Features like tracking progress over time, persistent leaderboards across users, or saving user profiles require a database backend. Gradio state is temporary.
6.  **LLM Variability & Cost:** Gemini API responses are non-deterministic, so scores and feedback for the same answer can vary between runs. Relying on an external API also introduces potential costs and a service dependency. API errors (e.g., blocked prompts, network issues) are handled with fallbacks but can still degrade the experience.
7.  **Simplified Difficulty Adaptation:** The adaptive difficulty logic is basic (based only on the immediately preceding question's score). More sophisticated adaptation could consider overall performance or specific skill weaknesses.
8.  **Negotiation MVP:** The negotiation simulator uses Gemini to generate a full dialogue based on a prompt; it doesn't involve interactive RAG against the scenario data in this version.
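
To make limitation 2 concrete, a benchmark built on a hardcoded mean and standard deviation amounts to evaluating a normal CDF. The sketch below uses the standard library's `statistics.NormalDist` rather than the `scipy.stats.norm` import used in the notebook (`norm.cdf(score, loc=mean, scale=std_dev)` would be the equivalent call). The numeric values assigned to `FAANG_MEAN_SCORE` and `FAANG_STD_DEV` here are placeholders, not the notebook's actual constants, and `benchmark_percentile` is a hypothetical name.

```python
from statistics import NormalDist

# Placeholder values for illustration only; the notebook defines its own constants.
FAANG_MEAN_SCORE = 7.0
FAANG_STD_DEV = 1.5

def benchmark_percentile(score, mean=FAANG_MEAN_SCORE, std_dev=FAANG_STD_DEV):
    """Percentile (0-100) of a score under an assumed normal distribution of applicant scores."""
    return 100.0 * NormalDist(mu=mean, sigma=std_dev).cdf(score)

# A score one standard deviation above the assumed mean.
print(f"{benchmark_percentile(8.5):.1f}th percentile")
```

Because both parameters are assumptions rather than measured data, the resulting percentile is illustrative, which is exactly the limitation described above.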

**Future Work:**

1.  **Database Integration:** Transition to a database (e.g., PostgreSQL, Firestore) for users, content, history, and leaderboards.
2.  **Live Job Feeds:** Integrate with job board APIs for real-time job recommendations with filtering (location, experience).
3.  **Voice I/O:** Implement speech-to-text and text-to-speech in a full web application deployment.
4.  **Enhanced RAG & NLP:** Use vector databases (e.g., ChromaDB, Pinecone) for job matching, add reranking, use LLMs or NER for better CV parsing, and potentially ground negotiation replies in stored tactics via RAG.
5.  **Richer Gamification:** Implement more sophisticated badge logic, points visualization, and possibly peer-comparison features.
6.  **Implement Resume Tweaks (Robustly):** Use the LLM suggestions (`suggest_resume_tweaks`) more effectively, perhaps allowing users to accept/reject tweaks that update a stored CV profile.
7.  **User Authentication & Profiles:** Add secure login and profile management.
8.  **Evaluation & Refinement:** Conduct user testing to refine prompts, difficulty adaptation, scoring weights, and overall user experience. Explore GenAI evaluation metrics for assessing the quality of generated feedback/questions.
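
As one possible direction for refining difficulty adaptation, the sketch below bases the next difficulty on a rolling average of recent scores instead of only the preceding question. Everything here is an assumption for illustration: the level labels, the window size, the thresholds, and the function name `next_difficulty` are hypothetical and do not reflect the notebook's current logic.

```python
# Hypothetical difficulty labels, ordered from easiest to hardest.
DIFFICULTY_LEVELS = ["Easy", "Medium", "Hard"]

def next_difficulty(current, recent_scores, window=3, raise_at=7.5, lower_at=5.0):
    """Step difficulty up or down based on the mean of the last `window` scores (0-10 scale)."""
    if current not in DIFFICULTY_LEVELS:
        raise ValueError(f"Unknown difficulty: {current!r}")
    if not recent_scores:
        return current  # no evidence yet, keep the current level
    window_scores = recent_scores[-window:]
    average = sum(window_scores) / len(window_scores)
    index = DIFFICULTY_LEVELS.index(current)
    if average >= raise_at:
        index = min(index + 1, len(DIFFICULTY_LEVELS) - 1)
    elif average <= lower_at:
        index = max(index - 1, 0)
    return DIFFICULTY_LEVELS[index]

print(next_difficulty("Medium", [9, 8, 8]))     # strong recent run
print(next_difficulty("Medium", [9, 3, 4, 5]))  # earlier high score falls outside the window
```

Averaging over a window smooths out a single unusually good or bad answer, at the cost of reacting more slowly; the thresholds would need tuning through the user testing mentioned in item 8.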

## 8. Conclusion

JobSage demonstrates the integration of multiple Generative AI capabilities, including **Embeddings**, **Few-Shot Prompting**, **Retrieval Augmented Generation (RAG)**, and **Structured LLM Generation/Evaluation** via the Gemini API, to create a mock interview coaching tool within a Kaggle Notebook.

By moving beyond simple Q&A, JobSage provides personalized value through:
*   Granular, multi-dimensional feedback (Content, Clarity, Depth).
*   Adaptive question difficulty.
*   Context-aware follow-up questions.
*   Performance benchmarking against simulated norms.
*   Gamification elements (Points, Badges).
*   Actionable recommendations for study, potential resume improvements, and relevant job applications.
*   A simulated salary negotiation practice module.

This project serves as a robust proof-of-concept, illustrating how modern AI can address key pain points in technical interview preparation. While this notebook version has limitations, particularly around data persistence and real-time data feeds, it establishes a strong foundation and showcases a clear vision for a tool that could significantly empower job seekers in competitive fields.