# 📘 Table of Contents

1. [🔧 Import Libraries](#-import-libraries)
2. [🚀 Initialize Resume Screening System](#-initialize-resume-screening-system)
3. [📤 Resume Upload and Text Extraction](#-resume-upload-and-text-extraction)
4. [🧠 Embedding Generation & FAISS Indexing](#-embedding-generation--faiss-indexing)
5. [📝 Resume Scoring & Evaluation](#-resume-scoring--evaluation)
6. [📊 Ranking, Fit Score & Recommendation](#-ranking-fit-score--recommendation)
7. [📁 Exporting Results & Summarization](#-exporting-results--summarization)
8. [💬 Interactive Resume Q&A](#-interactive-resume-qa)
9. [⚠️ Skill Gap Detection](#-skill-gap-detection)
10. [🔍 Resume Similarity & Plagiarism](#-resume-similarity--plagiarism)
11. [📧 Recruiter Notification & Alerts](#-recruiter-notification--alerts)
12. [📆 Interview Scheduling with Google Calendar](#-interview-scheduling-with-google-calendar)
13. [🔗 LinkedIn Profile Extraction](#-linkedin-profile-extraction)


# 🧠 Ultra-Clean Resume Screening System  
This notebook implements all 22 resume screening features using GPT-4o, FAISS, hybrid scoring, and a full RAG pipeline.
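
The retrieval step behind the RAG pipeline can be pictured with a small, dependency-free sketch. It is not the notebook's implementation (that uses SentenceTransformer embeddings and a FAISS index); the toy vectors, file names, and the helper names `normalize` and `top_k` are assumptions made for illustration, and vectors are assumed to be non-zero.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, docs, k=2):
    """Return the k document names most similar to the query, best first."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), name)
        for name, vec in docs.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy 3-dimensional "embeddings"; real ones come from a sentence encoder.
resumes = {
    "alice.pdf": [0.9, 0.1, 0.0],
    "bob.pdf": [0.1, 0.9, 0.2],
    "cara.pdf": [0.7, 0.3, 0.1],
}
job = [1.0, 0.2, 0.0]

context_names = top_k(job, resumes, k=2)
# In the real pipeline the resume texts, not the names, are joined into the prompt.
prompt = "Answer based on resumes:\n" + "\n".join(context_names)
print(context_names)
```

The retrieved items become the context of the GPT prompt, which is the pattern the Q&A, skill-gap, and improvement features rely on.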


## 🔧 Install Libraries


In [None]:
# @title
# ===============================
# ✅ INSTALL ALL REQUIRED PACKAGES (ONCE)
# ===============================
!pip install -q faiss-cpu pdfplumber pytesseract python-docx fpdf \
sentence-transformers langchain langchain-community pymupdf \
langchain-openai openai matplotlib pycountry

# ✅ INSTALL SYSTEM DEPENDENCIES FOR OCR SUPPORT
!apt-get install -y poppler-utils tesseract-ocr > /dev/null 2>&1


## 🔧 Install & Import Libraries


In [None]:

# ===============================
# ✅ UPGRADE FAST-MOVING PACKAGES
# (the install cell above already covers everything else)
# ===============================
!pip install -q --upgrade openai langchain langchain-community langchain-openai

# ===============================
# ✅ FINAL CLEAN IMPORT BLOCK (NO DUPLICATES)
# ===============================

# Core
import os, json, re, numpy as np, pandas as pd

# NLP & ML
import faiss
from sentence_transformers import SentenceTransformer, util
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# PDF & OCR
import pdfplumber, pytesseract, fitz  # fitz = PyMuPDF
from PIL import Image
from fpdf import FPDF
from docx import Document
from io import BytesIO

# Visuals
import matplotlib.pyplot as plt

# OpenAI & LangChain
from langchain.agents import initialize_agent, Tool
from langchain.agents.agent_types import AgentType
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI  # provided by the langchain-openai package
from openai import OpenAI


# Country name lookup
import pycountry


# Optional: Colab-friendly display tools
from IPython.display import display, Markdown






## ✅ Initialize OpenAI API Key

In [None]:



# ✅ Load the OpenAI API key without hardcoding it in the notebook
import os
from getpass import getpass

# Reuse a key that is already set in the environment; otherwise prompt for it.
# Never paste a real key into a cell that may be saved or shared.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")






## 🚀 Initialize Resume Screening System


In [None]:
# ✅ STEP 1: Define your class first

class ResumeScreeningSystem:
    def __init__(self, embed_model='all-mpnet-base-v2', llm_model='gpt-4o'):
        self.model = SentenceTransformer(embed_model)
        self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        self.util = util
        self.faiss = faiss
        self.re = re
        self.pd = pd
        self.TfidfVectorizer = TfidfVectorizer
        self.cosine_similarity = cosine_similarity
        self.resume_texts = {}
        self.resume_embeddings = []
        self.resume_names = []
        self.index = None
        self.tfidf = None
        self.tfidf_matrix = None
        self.job_description = ""
        self.job_embedding = None
        self.LLM_MODEL = llm_model
        self.latest_results = []

 # Resume Upload and Text Extraction

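    def extract_text(self, file, filename="uploaded_file"):
        name = filename.lower()
        if name.endswith(".pdf"):
            text = ""
            try:
                with pdfplumber.open(file) as pdf:
                    # Extract each page once and skip pages without a text layer
                    pages = [p.extract_text() for p in pdf.pages]
                    text = "\n".join(t for t in pages if t)
            except Exception as e:
                print(f"⚠️ pdfplumber failed ({e}); falling back to OCR.")
            if text.strip():
                return text
            # Scanned PDFs have no text layer: rewind the stream and try OCR
            if hasattr(file, "seek"):
                file.seek(0)
            return self.extract_with_ocr(file)
        elif name.endswith(".docx"):
            return "\n".join(p.text for p in Document(file).paragraphs)
        return ""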



    def extract_with_ocr(self, file):
        try:
            pdf = fitz.open(stream=file.read(), filetype="pdf")
            text = []
            for page in pdf:
                img = Image.open(BytesIO(page.get_pixmap().tobytes()))
                text.append(pytesseract.image_to_string(img))
            return "\n".join(text)
        except Exception as e:
            print(f"❌ OCR failed: {e}")
            return ""

     # Embedding Generation & FAISS Indexing

    def generate_resume_embeddings(self):
        self.resume_names = list(self.resume_texts.keys())
        self.resume_embeddings = [self.model.encode(text) for text in self.resume_texts.values()]

    def build_faiss_index(self):
        self.generate_resume_embeddings()
        if not self.resume_embeddings:
            raise ValueError("No resumes loaded, so there is nothing to index.")
        dim = len(self.resume_embeddings[0])
        self.index = self.faiss.IndexFlatIP(dim)  # Inner product equals cosine similarity on unit vectors

        # ✅ Normalize resume embeddings (crucial for cosine similarity)
        self.resume_embeddings = [v / np.linalg.norm(v) for v in self.resume_embeddings]
        # ✅ Add normalized vectors to FAISS, which expects a float32 matrix
        self.index.add(np.array(self.resume_embeddings, dtype="float32"))



    def build_tfidf_matrix(self):
        texts = list(self.resume_texts.values()) + [self.job_description]
        self.tfidf = self.TfidfVectorizer(stop_words='english')
        self.tfidf_matrix = self.tfidf.fit_transform(texts)

         # ✅ Save FAISS index + resume data
    def save_faiss_index(self, index_path="faiss_index.index"):
        if self.index:
            faiss.write_index(self.index, index_path)
            np.save("resume_embeddings.npy", self.resume_embeddings)
            with open("resume_names.json", "w") as f:
                 json.dump(self.resume_names, f)
            print("✅ FAISS index and metadata saved.")

    # ✅ Load saved FAISS index + resume data
    def load_faiss_index(self, index_path="faiss_index.index"):
        if os.path.exists(index_path):
            self.index = faiss.read_index(index_path)
            # Plain float arrays load without pickle; keep them as a list like the in-memory version
            self.resume_embeddings = list(np.load("resume_embeddings.npy"))
            with open("resume_names.json") as f:
                self.resume_names = json.load(f)
            print("✅ FAISS index and metadata loaded.")
        else:
            print("⚠️ No saved FAISS index found.")


    # ✅ Multi-job matching loop
    def run_multi_job_evaluation_loop(self, job_list):
        all_results = {}
        for job in job_list:
            print(f"\n============================\n📌 Job: {job}")
            self.job_description = job
            results = self.evaluate_resumes()
            top = sorted(results, key=lambda x: -x['Similarity'])[:5]
            for r in top:
                print(f"- {r['Name']}: {r['Fit']} ({r['Similarity']})")
            all_results[job] = top
        return all_results


# Resume Scoring & Evaluation

    def evaluate_resumes(self, semantic_weight=0.7):
    # Step 1: Build indexes
        self.build_faiss_index()
        self.build_tfidf_matrix()

    # Step 2: Encode the job description
        self.job_embedding = self.model.encode(self.job_description)

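    # Step 3: Semantic similarity using FAISS (cosine)
        normalized_job_embed = self.job_embedding / np.linalg.norm(self.job_embedding)
        query = np.array([normalized_job_embed], dtype="float32")
        scores, idx = self.index.search(query, len(self.resume_embeddings))
        # FAISS returns hits sorted by score, so map them back to resume order
        # before combining them with the TF-IDF scores (which are in resume order)
        semantic_sim = np.zeros(len(self.resume_names), dtype="float32")
        semantic_sim[idx[0]] = scores[0]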


    # Step 4: Keyword similarity using TF-IDF cosine
        keyword_sim = self.cosine_similarity(
        self.tfidf_matrix[-1],
        self.tfidf_matrix[:-1]
    ).flatten()

    # Step 5: Combine both similarities using the semantic weight
        hybrid = semantic_weight * semantic_sim + (1 - semantic_weight) * keyword_sim
        hybrid = hybrid.tolist()  # ✅ Ensure hybrid is iterable

    # Step 6: Initialize result container
        self.latest_results = []

    # Step 7: Loop through scores and evaluate
        for i, score in enumerate(hybrid):
            category = self.categorize_score(score)
            print(f"📄 {self.resume_names[i]} – Score: {score:.4f} → {category}")

            self.latest_results.append({
                "Name": self.resume_names[i],
                "Similarity": round(score, 3),
                "Fit": category,
                "Recommendation": self.tag_highly_recommended([score])[0],  # expects a list
            })

        return self.latest_results


    def categorize_score(self, score):
        if score >= 0.70:
             return "Excellent"
        elif score >= 0.55:
              return "Good"
        elif score >= 0.40:
              return "Fair"
        return "Poor"


    def tag_highly_recommended(self, scores, threshold=0.80):
        return ["Highly Recommended" if s >= threshold else "Standard" for s in scores]


# Interactive Resume Q&A
#RAG (Retrieval-Augmented Generation)
# 🆕 Helper: Get top-N resume contexts for RAG


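    def get_top_resume_contexts(self, top_n=3):
        if self.index is None:
            raise ValueError("FAISS index not built yet.")
        if self.job_embedding is None or len(self.job_embedding) == 0:
            self.job_embedding = self.model.encode(self.job_description)

        # Normalize the query like the indexed vectors, and never request more
        # hits than the index holds (FAISS pads missing hits with index -1)
        query = self.job_embedding / np.linalg.norm(self.job_embedding)
        top_n = min(top_n, self.index.ntotal)
        scores, idx = self.index.search(np.array([query], dtype="float32"), top_n)
        context = "\n\n".join([self.resume_texts[self.resume_names[i]] for i in idx[0]])
        return context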



    def run_gpt_prompt(self, prompt):
        try:
            response = self.client.chat.completions.create(  # ✅ INVOKE the GPT model
                model=self.LLM_MODEL,
                messages=[{"role": "user", "content": prompt}]
           )
            return response.choices[0].message.content     # ✅ COMPLETION: get the model's reply
        except Exception as e:
            print(f"❌ GPT Error: {e}")
            return "⚠️ GPT failed to respond."



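    def rag_query_answering(self, query):
        q_embed = self.model.encode(query)
        q_embed = q_embed / np.linalg.norm(q_embed)  # match the normalized index vectors
        k = min(3, self.index.ntotal)  # avoid padded -1 hits when fewer than 3 resumes exist
        scores, idx = self.index.search(np.array([q_embed], dtype="float32"), k)
        context = "\n\n".join([self.resume_texts[self.resume_names[i]] for i in idx[0]])
        prompt = f"Answer based on resumes:\n{context}\n\nQuestion: {query}"
        return self.run_gpt_prompt(prompt)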


    def rag_resume_similarity_analysis(self):
        embs = self.model.encode(list(self.resume_texts.values()), convert_to_tensor=True)
        sim_matrix = self.util.cos_sim(embs, embs).cpu().numpy()

        names = list(self.resume_texts.keys())
        results = []

        for i in range(len(names)):
            for j in range(i + 1, len(names)):  # avoid duplicate pairs
                sim_score = sim_matrix[i][j]
                percentage = round(float(sim_score) * 100, 2)
                results.append((names[i], names[j], percentage))  # ✅ correct format

        results.sort(key=lambda x: x[2], reverse=True)  # sort by similarity
        return results  # ✅ returns list of tuples



    # ✅ Auto-improve the weakest resumes, using the strongest ones as reference
    def rag_resume_improvement(self, dummy_input=""):
        try:
            # Get top 3 strongest resumes based on FAISS similarity
            top_scores, top_indices = self.index.search(np.array([self.job_embedding]), len(self.resume_names))
            top_names = [self.resume_names[i] for i in top_indices[0][:3]]

            # Exclude top resumes, sort others by similarity (lowest = weakest)
            non_top = [r for r in self.latest_results if r["Name"] not in top_names]
            weak_resumes = sorted(non_top, key=lambda r: r["Similarity"])[:2]  # 2 lowest-similarity resumes

            # Get top resume text context for comparison
            context = "\n\n".join([self.resume_texts[name] for name in top_names])

            # Generate improvement suggestions
            output = ""
            for r in weak_resumes:
                name = r["Name"]
                resume_text = self.resume_texts[name]
                prompt = (
                    f"You are a professional resume coach. The job description is:\n\n"
                    f"{self.job_description}\n\n"
                    f"Here are 3 strong resumes:\n{context}\n\n"
                    f"Now suggest specific improvements for the following resume (Name: {name}):\n\n"
                    f"{resume_text}\n\n"
                    f"Guidelines:\n"
                    f"- Write 3–5 bullet points\n"
                    f"- Be specific and helpful\n"
                    f"- Focus on clarity, missing content, formatting, or skills\n"
                )
                result = self.run_gpt_prompt(prompt)
                output += f"\n📄 {name} – Suggested Improvements:\n{result}\n{'-'*80}\n"

            return output or "⚠️ No weak resumes found to improve."
        except Exception as e:
            return f"❌ RAG improvement failed: {e}"









# ✅ RAG: Detect Missing Skills (Compare with top resumes)


    def detect_missing_skills(self):
        try:
            context = self.get_top_resume_contexts(top_n=3)  # 👈 You can change to top_n=5 if needed
            print(f"📄 Total resumes found: {len(self.resume_texts)}")
            output = ""
            for name, resume_text in self.resume_texts.items():
                prompt = (
                    f"🧠 You are a highly precise resume screening expert.\n\n"
                    f"🎯 TASK: Identify **only the clearly missing skills** in the target resume.\n"
                    f"DO NOT hallucinate or invent skills that are not explicitly mentioned in the job or top resumes.\n"
                    f"DO NOT guess or include generic soft skills.\n"
                    f"DO NOT explain anything.\n"
                    f"DO NOT include skills that are already present.\n"
                    f"Format the output exactly like below.\n\n"
                    f"📌 Job Description:\n{self.job_description}\n\n"
                    f"📌 Top 3 Candidate Resumes:\n{context}\n\n"
                    f"📄 Target Resume (Name: {name}):\n{resume_text}\n\n"
                    f"✅ Output Format:\n"
                    f"{name} Resume – Missing Skills: Skill1, Skill2, Skill3"
            )
                result = self.run_gpt_prompt(prompt)
                output += f"\n🧾 Analyzing Resume: {name}\n{result}\n"
            return output
        except Exception as e:
            return f"❌ RAG skill detection failed: {e}"





    # ✅ Generate mock interview questions based on job, top resumes, and target resume
    def generate_mock_questions(self, resume_text):
        try:
            context = self.get_top_resume_contexts(top_n=3)  # Top 3 candidates as context
            prompt = (
            "You are a technical interviewer creating mock interview questions for a candidate.\n\n"
            "Here is the job description:\n"
            f"{self.job_description}\n\n"
            "Here are examples of top candidate resumes:\n"
            f"{context}\n\n"
            "Here is the target candidate's resume:\n"
            f"{resume_text}\n\n"
            "Based on the job requirements and top resumes, generate 5 strong mock interview questions tailored to this candidate. "
            "Focus on skills, experience, and any competitive gaps.\n\n"
            "Format your output as a numbered list."
           )
            return self.run_gpt_prompt(prompt)
        except Exception as e:
            return f"❌ Error generating mock questions: {e}"



    # ✅ RAG: Recommend Job Roles

    def recommend_roles(self, dummy_input=""):
        try:
            # ✅ Rank all resumes against the job (hits come back sorted best-first)
            query = self.job_embedding / np.linalg.norm(self.job_embedding)
            scores, indices = self.index.search(
                np.array([query], dtype="float32"), len(self.resume_names)
            )
            # Map each resume name to its own similarity, as a percentage
            sim_by_name = {
                self.resume_names[i]: float(s) * 100
                for s, i in zip(scores[0], indices[0])
            }

            # ✅ Get top 3 resumes for RAG context
            top_names = [self.resume_names[i] for i in indices[0][:3]]
            context = "\n\n".join([self.resume_texts[name] for name in top_names])

            # ✅ Prepare a list to hold Excel data
            excel_data = []

            # ✅ Loop through ALL resumes
            for name, resume_text in self.resume_texts.items():
                # 🔢 Determine level
                if name in top_names:
                    level = "TOP"
                elif sim_by_name.get(name, 0) >= 70:
                    level = "MID"
                else:
                    level = "LOW"

                # ✅ GPT prompt for job roles (one call per resume)
                prompt = (
                    f"You are an expert career advisor. "
                    f"Based on this resume, the job description, and the top resumes, "
                    f"recommend exactly 3 job roles that best match the resume’s skills and experience.\n\n"
                    f"Job Description:\n{self.job_description}\n\n"
                    f"Top Resumes:\n{context}\n\n"
                    f"Target Resume (Name: {name}):\n{resume_text}\n\n"
                    f"Return only 3 job titles, each on its own line."
                )
                result = self.run_gpt_prompt(prompt)

                # ✅ Clean and split roles
                roles = [r.strip("-• \n") for r in result.split("\n") if r.strip()]
                while len(roles) < 3:  # ensure always 3 roles
                    roles.append("")
                role1, role2, role3 = roles[:3]

                # ✅ Add to Excel data
                excel_data.append({
                    "Resume Name": name,
                    "Level": level,
                    "Role 1": role1,
                    "Role 2": role2,
                    "Role 3": role3
                })

            # ✅ Convert to DataFrame and save to Excel
            df = pd.DataFrame(excel_data)
            excel_file = "/content/job_role_recommendations.xlsx"
            df.to_excel(excel_file, index=False)

            return f"✅ Excel exported: {excel_file}\n📂 Check the left sidebar to download."
        except Exception as e:
            return f"❌ RAG role recommendation failed: {e}"


##############################################################
# Export & Summarization

    def generate_resume_summary_pdf(self, top_n=3, filename="top_resumes.pdf"):
        top = sorted(self.latest_results, key=lambda x: -x['Similarity'])[:top_n]
        pdf = FPDF()
        pdf.add_page()
        pdf.set_font("Arial", size=12)
        for r in top:
            summary = self.run_gpt_prompt(f"Summarize this resume:\n\n{self.resume_texts[r['Name']]}")
            block = f"Name: {r['Name']}\n\nSummary:\n{summary}\n\n---\n"
            # The built-in Arial font only handles Latin-1 text, so replace anything outside it
            block = block.encode("latin-1", "replace").decode("latin-1")
            pdf.multi_cell(0, 10, block)
        pdf.output(filename)

    def export_resumes_to_pdf(self, results=None, filename="results.pdf"):
        if not results: results = self.latest_results
        pdf = FPDF(); pdf.add_page(); pdf.set_font("Arial", size=12)
        for r in results:
            pdf.multi_cell(0, 10, f"Name: {r['Name']}\nScore: {r['Similarity']}\nFit: {r['Fit']}\n---\n")
        pdf.output(filename)

    def export_resumes_to_excel(self, results=None, filename="results.xlsx"):
        if not results: results = self.latest_results
        self.pd.DataFrame(results).to_excel(filename, index=False)

#   Analytics & Visualization

    def show_dashboard(self):
        df = self.pd.DataFrame(self.latest_results)
        counts = df['Fit'].value_counts()
        counts.plot(kind='bar')
        plt.title("Resume Fit Distribution")
        plt.xlabel("Fit Category")
        plt.ylabel("Number of Resumes")
        plt.grid(True)
        plt.show()


    def generate_summary_pdf_for_top_n(self, top_n):


        top_resumes = sorted(self.latest_results, key=lambda x: x.get('Similarity', 0), reverse=True)[:top_n]

        if not top_resumes:
            return "❌ No resume data found. Please evaluate resumes first."

        pdf = FPDF()
        pdf.add_page()
        pdf.set_font("Arial", size=12)
        pdf.cell(200, 10, txt=f"Summary of Top {top_n} Resumes", ln=True, align='C')
        pdf.ln(5)

        for i, resume in enumerate(top_resumes, start=1):
            pdf.multi_cell(0, 10, txt=f"""
{i}. Name: {resume['Name']}
   Fit: {resume['Fit']}
   Similarity: {resume['Similarity']}
   Recommendation: {resume['Recommendation']}
""")
            pdf.ln(2)

        output_path = "/content/top_resume_summary.pdf"
        pdf.output(output_path)

        return output_path

 # Recruiter Interaction
    def generate_invite_email(self, name, job):
        return self.run_gpt_prompt(f"Write an interview invitation to {name} for {job} role.")

    def generate_job_description(self, title):
        return self.run_gpt_prompt(f"Write job description for: {title}")


    def auto_alert_recruiter(self, job_title="N/A"):
        top_resumes = [r for r in self.latest_results if r['Recommendation'] == "Highly Recommended"]
        if not top_resumes:
            print("📭 No Highly Recommended resumes found.")
            return

        print("\n📢 --- Recruiter Alert ---")
        print(f"🔔 Top matching resumes found for job: {job_title}")
        print(f"Number of top resumes: {len(top_resumes)}")
        print("Suggested action: Contact candidates or schedule interviews.")
        print("Names of top candidates:")
        for r in top_resumes:
            print(f"✅ {r['Name']} — Fit: {r['Fit']}, Score: {r['Similarity']}")
        print("\n📬 Manually alert recruiter by email or internal system.")

    def schedule_interview(self, candidate_email, candidate_name, job_title, date_time_str):
        subject = f"Interview Invitation for {job_title}"
        message = self.generate_invite_email(candidate_name, job_title)

        print("\n📧 --- Interview Invitation Email ---")
        print(f"To: {candidate_email}")
        print(f"Subject: {subject}")
        print("Message:\n")
        print(message)

        print("\n📅 --- Calendar Event Info ---")
        print(f"Event: Interview for {job_title}")
        print(f"Candidate: {candidate_name}")
        print(f"Date & Time (UTC): {date_time_str}")
        print("📌 Copy & paste this into your Gmail and Google Calendar manually.")

 # Utility Functions

    def extract_skill_profile(self, text):
        return self.run_gpt_prompt(f"List skills (tech + soft):\n\n{text}")

    def extract_contact_links(self, text):
        email = self.re.findall(r"[\w.-]+@[\w.-]+", text)
        linkedin = self.re.findall(r"https?://(?:www\.)?linkedin\.com/in/[\w-]+", text)
        github = self.re.findall(r"https?://(?:www\.)?github\.com/[\w-]+", text)
        return {"emails": list(set(email)), "linkedin": list(set(linkedin)), "github": list(set(github))}


    def detect_country_from_text(self, text):
        return list(set([c.name for c in pycountry.countries if c.name.lower() in text.lower()]))

    def show_system_status(self):
        print(f"✅ Resumes Loaded: {len(self.resume_texts)}")
        print(f"✅ Job Description Set: {'Yes' if self.job_description else 'No'}")
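        print(f"✅ FAISS Index: {'Yes' if self.index is not None else 'No'}")
        # len() works for both a list and the NumPy array produced by load_faiss_index
        print(f"✅ Embeddings Ready: {'Yes' if len(self.resume_embeddings) > 0 else 'No'}")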


    ### cleanup command to allow re-running without kernel restart:

    def reset_system(self):
        self.resume_texts = {}
        self.resume_embeddings = []
        self.resume_names = []
        self.index = None
        self.tfidf = None
        self.tfidf_matrix = None
        self.latest_results = []
        self.job_description = ""
        self.job_embedding = None
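
The hybrid score used in `evaluate_resumes` blends a semantic similarity with a TF-IDF keyword similarity and then maps the result to a fit label. The sketch below shows that arithmetic in plain Python, under the assumption that the semantic hits arrive the way a FAISS search returns them (sorted best-first, each with the position of the matching resume). The function name `hybrid_scores` and the numbers are made up for illustration; the thresholds are copied from `categorize_score`.

```python
def hybrid_scores(hit_scores, hit_indices, keyword_sims, semantic_weight=0.7):
    """Blend semantic and keyword similarity, one score per resume.

    hit_scores / hit_indices mimic one row of a FAISS search result:
    scores sorted best-first, with the resume position of each hit.
    keyword_sims is already in resume order (as the TF-IDF matrix is).
    """
    semantic = [0.0] * len(keyword_sims)
    for score, pos in zip(hit_scores, hit_indices):
        if pos >= 0:  # FAISS pads missing hits with -1
            semantic[pos] = score
    return [
        semantic_weight * s + (1 - semantic_weight) * k
        for s, k in zip(semantic, keyword_sims)
    ]


def categorize_score(score):
    """Same thresholds as ResumeScreeningSystem.categorize_score."""
    if score >= 0.70:
        return "Excellent"
    if score >= 0.55:
        return "Good"
    if score >= 0.40:
        return "Fair"
    return "Poor"


# Made-up numbers: resume 2 is the best semantic match, then 0, then 1.
scores = hybrid_scores(
    hit_scores=[0.9, 0.6, 0.3],
    hit_indices=[2, 0, 1],
    keyword_sims=[0.5, 0.1, 0.8],
)
labels = [categorize_score(s) for s in scores]
print(labels)
```

Re-ordering the hits by position before blending matters: the keyword scores are in resume order, so mixing them with scores that are still sorted best-first would pair each keyword score with the wrong resume.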

## 🚀 Run & Evaluate the Resume Screening System


In [None]:
# ✅ Run & Evaluate the Resume Screening System (all in one cell)

# ✅ STEP 2: Create one instance of the system (this loads the embedding model once)
system = ResumeScreeningSystem()

# ✅ Test if OpenAI GPT-4o integration is working
reply = system.client.chat.completions.create(
    model=system.LLM_MODEL,
    messages=[{"role": "user", "content": "Say hi"}]
)
print("✅ GPT-4o Response:", reply.choices[0].message.content)

# ✅ STEP 3: Automatically load PDF resumes from sidebar
import io  # Only needs to be imported once


resume_folder = "/content"  # PDFs should be uploaded in sidebar (left pane)
pdf_files = [f for f in os.listdir(resume_folder) if f.endswith(".pdf")]

if not pdf_files:
    print("⚠️ No PDF resumes found in /content. Please upload them using the left sidebar (Files tab).")
else:
    for filename in pdf_files:
        path = os.path.join(resume_folder, filename)
        print(f"📂 Processing: {filename} ...", end=" ")
        try:
            with open(path, "rb") as f:
                file = io.BytesIO(f.read())
                text = system.extract_text(file, filename)
                if text.strip():
                    system.resume_texts[filename] = text
                    print("✅ Success")
                else:
                    print("⚠️ Empty content after extraction.")
        except Exception as e:
            print(f"❌ Failed: {e}")


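# ✅ STEP 4: Build the FAISS index from the loaded resumes
# (build_faiss_index raises ValueError when no resumes are loaded)
if system.resume_texts:
    system.build_faiss_index()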


# ✅ STEP 5: Run evaluation on multiple job descriptions
job_list = [
    "AI Engineer with deployment experience",
    "Data Scientist with Python and ML",
    "NLP Researcher in Arabic Language",
]

# ✅ Add safety check before evaluating
if not system.resume_texts:
    print("❌ No resumes loaded. Please upload resumes before evaluating.")
else:
    system.run_multi_job_evaluation_loop(job_list)

# ✅ STEP 6: Save FAISS index after evaluation
system.save_faiss_index()






✅ GPT-4o Response: Hi there! How can I assist you today?
📂 Processing: Adam John I.pdf ... ⚠️ Empty content after extraction.
📂 Processing: Jessica Thompson.pdf ... ✅ Success
📂 Processing: James  Patel .pdf ... ✅ Success
📂 Processing: Maria Lopez.pdf ... ✅ Success
📂 Processing: John  Carter .pdf ... ✅ Success
📂 Processing: Carlos Mendoza .pdf ... ✅ Success
📂 Processing: Amanda Hughes.pdf ... ✅ Success
📂 Processing: Emily Zhao.pdf ... ✅ Success
📂 Processing: Isaac Roberts .pdf ... ✅ Success
📂 Processing: Elizabeth Johnson.pdf ... ✅ Success
📂 Processing: Marcus Fields.pdf ... ✅ Success
📂 Processing: Michael Carter.pdf ... ✅ Success
📂 Processing: Ethan Morales .pdf ... ✅ Success
📂 Processing: David Nguyen.pdf ... ✅ Success
📂 Processing: Amanda Hughes - Copy.pdf ... ✅ Success
📂 Processing: Laura Kim .pdf ... ✅ Success
📂 Processing: Jonathan Lee.pdf ... ✅ Success
📂 Processing: Daniel Evans.pdf ... ✅ Success
📂 Processing: Ethan Clark.pdf ... ✅ Success

📌 Job: AI Engineer with deployment expe

## 📦 LangChain Tools Integration and Agent Initialization Block

In [None]:
# ✅ Define tools


from langchain.tools import tool

# 📊 Resume Evaluation
@tool
def evaluate_resumes(job_description: str) -> str:
    """Evaluate resumes against a job description and return top candidates."""
    system.job_description = job_description
    results = system.evaluate_resumes()
    top = sorted(results, key=lambda x: -x['Similarity'])[:5]
    return "\n".join([f"{r['Name']}: {r['Fit']} ({r['Similarity']})" for r in top])



@tool
def show_dashboard(dummy: str = "") -> str:
    """Display a bar chart of fit categories from the latest evaluation."""
    system.show_dashboard()
    return "📊 Dashboard displayed."



@tool
def show_highly_recommended(dummy: str = "") -> str:
    """Show resumes that are marked as 'Highly Recommended'."""
    top = [r for r in system.latest_results if r['Recommendation'] == "Highly Recommended"]
    if not top:
        return "📭 No Highly Recommended resumes found."
    return "\n".join([f"{r['Name']}: {r['Fit']} ({r['Similarity']})" for r in top])


@tool
def export_resumes_to_excel(dummy: str = "") -> str:
    """Export current resume evaluation results to Excel."""
    system.export_resumes_to_excel()
    return "✅ Excel export completed."


@tool
def export_resumes_to_pdf(dummy: str = "") -> str:
    """Export current resume evaluation results to PDF."""
    system.export_resumes_to_pdf()
    return "✅ PDF export completed."





@tool
def generate_resume_summary_pdf(tool_input: str) -> str:
    """Generate a summary PDF for the top N resumes and save it."""
    cleaned = tool_input.strip()

    if not cleaned.isdigit():
        return "⚠️ Enter a valid number like '3'."

    top_n = int(cleaned)
    if top_n <= 0:
        return "⚠️ Please enter a number greater than 0."

    # ✅ Build the PDF; the call returns the output path, or an error message
    # when no evaluation results exist yet
    result = system.generate_summary_pdf_for_top_n(top_n)
    if result.startswith("❌"):
        return result

    return f"✅ Summary PDF generated for top {top_n} resumes.\n📎 Check left sidebar (Files tab) to download: {result}"





# 📧 Email + Calendar

@tool
def generate_invite_email(data: str) -> str:
    """Generate an interview invitation email for a given candidate and job."""
    try:
        name = data.split(",")[0].split(":")[1].strip()
        job = data.split(",")[1].split(":")[1].strip()
        return system.generate_invite_email(name, job)
    except IndexError:  # raised when the "key: value, key: value" shape is missing
        return "⚠️ Format: name: John, job: Data Scientist"


@tool
def schedule_interview(dummy: str = "") -> str:
    """Display interview invitation details manually."""
    return "📅 Copy & paste interview info from system.schedule_interview() manually"


# 🧠 GPT-Based Intelligence
@tool
def generate_job_description(title: str) -> str:
    """Generate a job description using GPT based on a title."""
    return system.generate_job_description(title)


@tool
def rag_resume_similarity_analysis(dummy: str = "") -> str:
    """Find resume pairs with similarity percentage and export to CSV."""
    try:
        results = system.rag_resume_similarity_analysis()

        # ✅ Fix ambiguous truth value error
        if results is None or len(results) == 0:
            return "⚠️ Not enough resumes to compare."

        import pandas as pd

        # ✅ Save results to CSV
        df = pd.DataFrame(results, columns=["Resume A", "Resume B", "Similarity (%)"])
        csv_path = "/content/resume_similarity_results.csv"
        df.to_csv(csv_path, index=False)

        # ✅ Format top 10 most similar for CLI
        top_matches = results[:10]
        formatted = "\n".join([
            f"✅ {a} ↔ {b}: {score}%" for a, b, score in top_matches
        ])

        return f"🔍 Top Resume Similarities:\n{formatted}\n\n📁 Full results saved to: {csv_path}"

    except Exception as e:
        return f"⚠️ Error computing similarities: {e}"





@tool
def detect_missing_skills(tool_input: str = "") -> str:
    """Detect missing skills in each resume based on job and top candidates."""
    return system.detect_missing_skills()


@tool
def generate_mock_questions(resume_text: str) -> str:
    """Generate 5 mock interview questions based on a resume, job description, and top candidates."""
    return system.generate_mock_questions(resume_text)


@tool
def rag_resume_improvement(resume_text: str) -> str:
    """Suggest resume improvements using top candidates as context."""
    return system.rag_resume_improvement(resume_text)

@tool
def recommend_roles(resume_text: str) -> str:
    """Recommend 3 job roles that best fit the resume content."""
    return system.recommend_roles(resume_text)

@tool
def extract_skill_profile(resume_text: str) -> str:
    """Extract technical and soft skills from a resume."""
    return system.extract_skill_profile(resume_text)

# 🔍 Contact / Metadata / System

@tool
def extract_contact_links(resume_text: str) -> str:
    """Extract email, LinkedIn, and GitHub links from a resume."""
    links = system.extract_contact_links(resume_text)
    return f"✉️ Emails: {links['emails']}\n🔗 LinkedIn: {links['linkedin']}\n🐙 GitHub: {links['github']}"


@tool
def reset_system(dummy: str = "") -> str:
    """Reset the system to clear loaded resumes and state."""
    system.reset_system()
    return "🔁 System reset completed."



@tool
def show_system_status(dummy: str = "") -> str:
    """Display internal system status (e.g., embeddings ready, index loaded)."""
    from io import StringIO
    output = StringIO()
    embeddings = system.resume_embeddings
    print("✅ Resumes Loaded:", len(system.resume_texts), file=output)
    print("✅ Job Description Set:", bool(system.job_description), file=output)
    print("✅ FAISS Index:", system.index is not None, file=output)
    # len() sidesteps the ambiguous truth value of a multi-element NumPy array
    print("✅ Embeddings Ready:", embeddings is not None and len(embeddings) > 0, file=output)
    return output.getvalue()



@tool
def detect_country_from_text(resume_text: str) -> str:
    """Detect countries mentioned in the resume text."""
    countries = system.detect_country_from_text(resume_text)
    return f"🌍 Countries Detected: {countries}" if countries else "🌐 No country found."


@tool
def auto_alert_recruiter(job_title: str = "") -> str:
    """Trigger an alert with top candidate names for a specific job title."""
    system.auto_alert_recruiter(job_title=job_title)
    return f"📢 Recruiter alert issued for job: {job_title}"



@tool
def rag_query_answering(query: str) -> str:
    """Ask a natural language question and get an answer based on top resumes."""
    return system.rag_query_answering(query)

@tool
def run_multi_job_evaluation(job_text: str) -> str:
    """
    Run resume screening for multiple job descriptions (comma-separated).
    Example: "Data Scientist, NLP Engineer, AI Researcher"
    """
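    # Drop empty entries such as a trailing comma
    job_list = [j.strip() for j in job_text.split(",") if j.strip()]
    if not job_list:
        return "⚠️ Provide at least one job title, separated by commas."
    system.run_multi_job_evaluation_loop(job_list)
    return "✅ Multi-job evaluation completed. See printed results above."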



@tool
def save_faiss_index(dummy: str = "") -> str:
    """Save the FAISS index and metadata to disk."""
    system.save_faiss_index()
    return "💾 FAISS index and metadata saved."



@tool
def load_faiss_index(dummy: str = "") -> str:
    """Load a previously saved FAISS index and metadata from disk."""
    system.load_faiss_index()
    return "🔁 FAISS index and metadata loaded."



@tool
def exit(dummy: str = "") -> str:
    """Exit the session."""
    return "👋 Session ended. Type Ctrl+C or close the notebook to fully stop."




tools = [
    evaluate_resumes,
    rag_query_answering,
    show_dashboard,
    show_highly_recommended,
    export_resumes_to_excel,
    export_resumes_to_pdf,
    generate_resume_summary_pdf,
    generate_invite_email,
    schedule_interview,
    generate_job_description,
    rag_resume_similarity_analysis,
    detect_missing_skills,
    generate_mock_questions,
    rag_resume_improvement,
    recommend_roles,
    extract_skill_profile,
    extract_contact_links,
    reset_system,
    show_system_status,
    detect_country_from_text,
    auto_alert_recruiter,
    run_multi_job_evaluation,
    save_faiss_index,
    load_faiss_index,
    exit,
]



# ✅ LangChain agent setup


llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    openai_api_key=os.environ["OPENAI_API_KEY"],
)


# ✅ Initialize memory
memory = ConversationBufferMemory(memory_key="chat_history")

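# ✅ Initialize the agent with tools, memory, and verbosity
# Note: the default ZERO_SHOT_REACT_DESCRIPTION prompt has no {chat_history} slot,
# so the memory below is recorded but not shown to the model.
# AgentType.CONVERSATIONAL_REACT_DESCRIPTION is the variant that reads it.
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)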


# ✅ STEP 1–3: Job Evaluation, Recruiter Alert, and Interview Scheduling
# Helper that runs a preliminary evaluation pipeline before entering the CLI.
def run_initial_pipeline():
    # ✅ STEP 1: Enter Job Description and Run Evaluation
    system.job_description = input("📄 Enter the job description:\n")
    results = system.evaluate_resumes()
    if results is None or len(results) == 0:
        print("⚠️ No resumes were evaluated.")
        return

    # ✅ Show Top Candidates (rank once, highest similarity first)
    ranked = sorted(results, key=lambda x: x["Similarity"], reverse=True)
    print("\n📊 Top Candidates:")
    for r in ranked[:5]:
        print(f"{r['Name']}: {r['Fit']} ({r['Similarity']})")

    # ✅ STEP 2: Notify Recruiter
    system.auto_alert_recruiter(job_title="Data Scientist")

    # ✅ STEP 3: Generate Email + Schedule Interview for the top-ranked candidate
    # (results[0] is only the top candidate if the list is already sorted)
    top_candidate = ranked[0]
    contact_info = system.extract_contact_links(system.resume_texts[top_candidate["Name"]])
    candidate_email = contact_info["emails"][0] if contact_info["emails"] else "placeholder@example.com"

    system.schedule_interview(
        candidate_email=candidate_email,
        candidate_name=top_candidate["Name"],
        job_title="Data Scientist",
        date_time_str="2025-07-05T15:00:00"
    )
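
The pipeline above ranks evaluation results by `Similarity` and falls back to a placeholder address when no email is found. A minimal standalone sketch of those two steps follows. The records, the `Fit` labels other than `Poor`, and the helper names `rank_by_similarity` and `first_email` are made up for illustration; the real records come from `system.evaluate_resumes()`.

```python
# Hypothetical records standing in for the output of system.evaluate_resumes().
results = [
    {"Name": "a.pdf", "Fit": "Poor", "Similarity": 0.32},
    {"Name": "b.pdf", "Fit": "Good", "Similarity": 0.71},
    {"Name": "c.pdf", "Fit": "Average", "Similarity": 0.55},
]


def rank_by_similarity(records, top_n=5):
    """Return up to top_n records, highest Similarity first."""
    return sorted(records, key=lambda r: r["Similarity"], reverse=True)[:top_n]


def first_email(contact_info, fallback="placeholder@example.com"):
    """Pick the first extracted email, or the fallback when none was found."""
    emails = contact_info.get("emails") or []
    return emails[0] if emails else fallback


ranked = rank_by_similarity(results)
top_candidate = ranked[0] if ranked else None
print([r["Name"] for r in ranked])
print(first_email({"emails": []}))
```

Ranking once and reusing the sorted list keeps the printed leaderboard and the candidate chosen for scheduling consistent with each other.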



# 🛠️ Interactive CLI Agent Interface
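
A numbered menu that maps each choice to one tool call can also be written as a dispatch table instead of a long `if`/`elif` chain. The sketch below is one possible shape, not the notebook's implementation: `show_status` and `echo_query` are stand-ins for the tools' `.run` methods, and `COMMANDS` and `dispatch` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple


# Stand-in handlers; in the notebook these would wrap the tools' .run methods.
def show_status(_: str) -> str:
    return "status ok"


def echo_query(text: str) -> str:
    return f"query: {text}"


# Menu number -> (prompt to show, or None when no input is needed; handler).
COMMANDS: Dict[str, Tuple[Optional[str], Callable[[str], str]]] = {
    "2": ("Enter your query:\n", echo_query),
    "19": (None, show_status),
}


def dispatch(cmd: str, read: Callable[[str], str] = input) -> str:
    """Look up a menu number, collect its input if any, and run the handler."""
    entry = COMMANDS.get(cmd.strip())
    if entry is None:
        return "⚠️ Invalid number."
    prompt, handler = entry
    arg = read(prompt) if prompt else ""
    return handler(arg)


print(dispatch("19"))
print(dispatch("2", read=lambda prompt: "python skills"))
```

Passing `read` as a parameter makes the lookup testable without typing into a prompt; adding a command then means adding one table entry.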

In [None]:
# ✅ FIX: Run initial pipeline in background so CLI doesn't get blocked
import threading

def safe_run_initial_pipeline():
    try:
        run_initial_pipeline()
        print("✅ Initial pipeline finished.")
    except Exception as e:
        print(f"❌ Pipeline error: {e}")

# 🔄 Run in background
threading.Thread(target=safe_run_initial_pipeline, daemon=True).start()



📄 Enter the job description:
data science
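
The cell above starts the pipeline on a daemon thread and prints any exception from inside that thread. Because `run_initial_pipeline` calls `input()`, its prompt can interleave with the CLI's own prompt. If the goal is to collect the outcome later rather than print it from the worker, a `concurrent.futures` future is one option. This is a sketch under that assumption; `run_in_background`, `describe`, and `flaky` are hypothetical names, and unlike a daemon thread the worker here is allowed to finish before the interpreter exits.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_in_background(fn, *args) -> Future:
    """Start fn(*args) on a worker thread and return its future."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    executor.shutdown(wait=False)  # let the submitted job finish; accept no new ones
    return future


def describe(future: Future) -> str:
    """Summarize a future without raising; blocks until the job is done."""
    error = future.exception()
    if error is not None:
        return f"❌ Pipeline error: {error}"
    return f"✅ Finished: {future.result()}"


def flaky():
    raise ValueError("no resumes loaded")


print(describe(run_in_background(lambda: 42)))
print(describe(run_in_background(flaky)))
```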


In [None]:
# === CLI Interface ===

print("\n🎯 Tool-Based Resume Screening Agent CLI - 25 Features in Order")
print("Type the number (1-25) of the command you want to run:")
print("""
 1.  evaluate_resumes
 2.  rag_query_answering
 3.  show_dashboard
 4.  show_highly_recommended
 5.  export_resumes_to_excel
 6.  export_resumes_to_pdf
 7.  generate_resume_summary_pdf
 8.  generate_invite_email
 9.  schedule_interview
10.  generate_job_description
11.  rag_resume_similarity_analysis
12.  detect_missing_skills
13.  generate_mock_questions
14.  rag_resume_improvement
15.  recommend_roles
16.  extract_skill_profile
17.  extract_contact_links
18.  reset_system
19.  show_system_status
20.  detect_country_from_text
21.  auto_alert_recruiter
22.  run_multi_job_evaluation
23.  save_faiss_index
24.  load_faiss_index
25.  exit
""")

while True:
    try:
        cmd = input("\n🔧 Enter command number (1-25): ").strip()

        if cmd == "1":
            job = input("Enter job description:\n")
            print(evaluate_resumes.run(job))

        elif cmd == "2":
            query = input("Enter your query:\n")
            print(rag_query_answering.run(query))

        elif cmd == "3":
            print(show_dashboard.run(tool_input="show me dashboard"))

        elif cmd == "4":
            print(show_highly_recommended.run(tool_input=""))

        elif cmd == "5":
            print(export_resumes_to_excel.run(tool_input=""))

        elif cmd == "6":
            print(export_resumes_to_pdf.run(tool_input=""))

        elif cmd == "7":
            top_n = input("How many top resumes? (e.g. 3): ").strip()
            print(generate_resume_summary_pdf.run(tool_input=top_n))

        elif cmd == "8":
            data = input("Format: name: John, job: Data Scientist\n")
            print(generate_invite_email.run(data))

        elif cmd == "9":
            scheduled = 0
            for resume in system.latest_results:
                if resume["Recommendation"] == "Highly Recommended":
                    name = resume["Name"]
                    email = resume.get("Email", f"{name.replace(' ', '').lower()}@example.com")
                    job = "Interview Candidate"
                    time = f"2025-07-21 {10 + scheduled}:00"  # Example times: 10:00, 11:00, etc.

                    system.schedule_interview(email, name, job, time)
                    scheduled += 1

            if scheduled == 0:
                print("⚠️ No 'Highly Recommended' candidates found.")
            else:
                print(f"✅ Scheduled interviews for {scheduled} candidates.")


        elif cmd == "10":
            title = input("Enter job title:\n")
            print(generate_job_description.run(title))

        elif cmd == "11":
            print(rag_resume_similarity_analysis.run(tool_input=""))

        elif cmd == "12":
            print("🔍 Detecting missing skills for each resume based on your entered job...\n")
            print(detect_missing_skills.run(tool_input=""))

        elif cmd == "13":
            text = input("Paste resume text:\n")
            print(generate_mock_questions.run(text))

        elif cmd == "14":
            print(rag_resume_improvement.run(tool_input=""))

        elif cmd == "15":
            print("🎯 Recommending job roles for ALL resumes (PDF table)...")
            print(recommend_roles.run(tool_input=""))
            print("📂 Check the sidebar for job_role_recommendations.pdf")

        elif cmd == "16":
            text = input("Paste resume text:\n")
            print(extract_skill_profile.run(text))

        elif cmd == "17":
            text = input("Paste resume text:\n")
            print(extract_contact_links.run(text))

        elif cmd == "18":
            print(reset_system.run(tool_input=""))

        elif cmd == "19":
            print(show_system_status.run(tool_input=""))

        elif cmd == "20":
            text = input("Paste resume text:\n")
            print(detect_country_from_text.run(text))

        elif cmd == "21":
            job_title = input("Enter job title:\n")
            print(auto_alert_recruiter.run(job_title))

        elif cmd == "22":
            jobs = input("Comma-separated jobs (e.g. Data Scientist, NLP Engineer):\n")
            print(run_multi_job_evaluation.run(jobs))

        elif cmd == "23":
            print(save_faiss_index.run(tool_input=""))

        elif cmd == "24":
            print(load_faiss_index.run(tool_input=""))

        elif cmd == "25":
            print("👋 Exiting CLI.")
            break

        else:
            print("⚠️ Invalid number. Please enter 1 to 25.")

    except Exception as e:
        print(f"❌ Error: {e}")



📄 Jessica Thompson.pdf – Score: 0.3671 → Poor
📄 James  Patel .pdf – Score: 0.3219 → Poor
📄 Maria Lopez.pdf – Score: 0.3861 → Poor
📄 John  Carter .pdf – Score: 0.3195 → Poor
📄 Carlos Mendoza .pdf – Score: 0.3150 → Poor
📄 Amanda Hughes.pd
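
Command 8 expects input shaped like `name: John, job: Data Scientist`. Positional `split` parsing breaks if the fields are reordered or a value contains a colon. The sketch below shows a more tolerant parser; `parse_fields` and `parse_invite` are hypothetical helpers, not part of the notebook's `system` object, and values that themselves contain commas are still not supported.

```python
def parse_fields(text: str) -> dict:
    """Parse 'key: value, key: value' into a dict with lowercase keys."""
    fields = {}
    for part in text.split(","):
        # partition splits on the first colon only, so later colons stay in the value
        key, sep, value = part.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def parse_invite(text: str):
    """Return (name, job), or raise ValueError with the expected format."""
    fields = parse_fields(text)
    if not fields.get("name") or not fields.get("job"):
        raise ValueError("Format: name: John, job: Data Scientist")
    return fields["name"], fields["job"]


print(parse_invite("name: John, job: Data Scientist"))
print(parse_invite("job: ML Engineer, Name: Ana"))
```

Raising `ValueError` with the format hint lets the caller decide whether to print the hint or re-prompt.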

In [None]:
print(system.resume_texts.keys())


dict_keys(['Elizabeth Johnson.pdf', 'Daniel Evans.pdf', 'Jonathan Lee.pdf', 'Emily Zhao.pdf', 'Isaac Roberts .pdf', 'Amanda Hughes - Copy.pdf', 'David Nguyen.pdf', 'Emily Zhang .pdf', 'Isabella Torres.pdf', 'Ethan Morales .pdf', 'Carlos Mendoza .pdf', 'John  Carter .pdf', 'James  Patel .pdf', 'Amanda Hughes.pdf', 'Ethan Clark.pdf', 'Kevin O’Brien .pdf', 'Jessica Thompson.pdf', 'Ava Johnson.pdf'])


In [None]:
print("✅ Total resumes stored:", len(system.resume_texts))
print("✅ Resume names:", list(system.resume_texts.keys()))


✅ Total resumes stored: 18
✅ Resume names: ['Elizabeth Johnson.pdf', 'Daniel Evans.pdf', 'Jonathan Lee.pdf', 'Emily Zhao.pdf', 'Isaac Roberts .pdf', 'Amanda Hughes - Copy.pdf', 'David Nguyen.pdf', 'Emily Zhang .pdf', 'Isabella Torres.pdf', 'Ethan Morales .pdf', 'Carlos Mendoza .pdf', 'John  Carter .pdf', 'James  Patel .pdf', 'Amanda Hughes.pdf', 'Ethan Clark.pdf', 'Kevin O’Brien .pdf', 'Jessica Thompson.pdf', 'Ava Johnson.pdf']
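
The listing above includes keys with trailing or doubled spaces (for example `'John  Carter .pdf'`) and one `' - Copy'` duplicate. The following sketch shows one way such filenames might be normalized into display names and how copies could be grouped before a similarity check. Both helpers, `candidate_name` and `group_copies`, are assumptions for illustration and are not methods of the notebook's `system` object.

```python
import re
from collections import defaultdict
from pathlib import Path


def candidate_name(filename: str) -> str:
    """Drop the extension and collapse stray whitespace in a resume filename."""
    stem = Path(filename).stem
    return re.sub(r"\s+", " ", stem).strip()


def group_copies(filenames):
    """Group filenames that share a name once a trailing ' - Copy' is removed."""
    groups = defaultdict(list)
    for filename in filenames:
        base = re.sub(r"\s*-\s*Copy$", "", candidate_name(filename), flags=re.IGNORECASE)
        groups[base].append(filename)
    # Keep only names that map to more than one file
    return {name: files for name, files in groups.items() if len(files) > 1}


names = ["John  Carter .pdf", "Amanda Hughes - Copy.pdf", "Amanda Hughes.pdf"]
print([candidate_name(n) for n in names])
print(group_copies(names))
```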
