<a href="https://colab.research.google.com/github/cherypallysaisurya/ResuVerse/blob/main/FINAL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install required packages if needed
!pip install pdfplumber transformers scikit-learn

import pdfplumber
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# ---------------------------
# Utility Functions
# ---------------------------

def extract_text_from_pdf(path):
    text = ""
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            if page.extract_text():
                text += page.extract_text() + "\n"
    return text.strip()

def summarize_text(text, model, max_chunk_words=500):
    words = text.split()
    if len(words) <= 100:
        return text

    summaries = []
    chunk = []
    for word in words:
        chunk.append(word)
        if len(chunk) >= max_chunk_words:
            input_text = " ".join(chunk)
            summary = model(input_text, max_length=150, min_length=50, do_sample=False)[0]['summary_text']
            summaries.append(summary)
            chunk = []
    if chunk:
        input_text = " ".join(chunk)
        summary = model(input_text, max_length=150, min_length=50, do_sample=False)[0]['summary_text']
        summaries.append(summary)
    return "\n".join(summaries)

def compute_similarity(text1, text2):
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([text1, text2])
    similarity = cosine_similarity(vectors[0:1], vectors[1:2])[0][0]
    return round(similarity * 100, 2)

# ---------------------------
# Load Summarization Model
# ---------------------------

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# ---------------------------
# Load and Process PDFs
# ---------------------------

jd_text = extract_text_from_pdf("/content/STAFF-8600.pdf")
resume_text = extract_text_from_pdf("/content/experience.pdf")

jd_summary = summarize_text(jd_text, summarizer)
resume_summary = summarize_text(resume_text, summarizer)

# ---------------------------
# Matching Score
# ---------------------------

match_percentage = compute_similarity(jd_summary, resume_summary)

# ---------------------------
# Display Summaries and Match
# ---------------------------

print("📄 Job Description Summary:\n", jd_summary)
print("\n👤 Resume Summary:\n", resume_summary)
print(f"\n📊 Match Score: {match_percentage}%")

if match_percentage >= 75:
    print("✅ Strong match! Your experience aligns well with the job.")
elif match_percentage >= 50:
    print("⚠️ Moderate match. Some alignment, but some gaps too.")
else:
    print("❌ Low match. You may need to tailor your resume more closely.")

# ---------------------------
# Q&A Using FLAN-T5 or Similar
# ---------------------------
# ✅ Upgrade to FLAN-T5-XL for smart generative answers
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else -1
qa_generator = pipeline("text2text-generation", model="google/flan-t5-large", device=device)

def ask_flan(question, context):
    prompt = f"""You are a helpful assistant. Based on the job description below, answer the question in detail.

Job Description:
{context}

Question:
{question}

Answer in full sentences:"""
    response = qa_generator(prompt, max_length=256, do_sample=False)[0]['generated_text']
    return response.strip()

# 💬 Q&A Loop
print("\n💬 Ask your questions about the job description (type 'quit' to stop):")
while True:
    user_question = input("Your question: ").strip()
    if user_question.lower() == "quit":
        break
    answer = ask_flan(user_question, jd_summary)  # Use summary for focus
    print("\nAnswer:", answer)




Device set to use cpu


📄 Job Description Summary:
 The Department of Business and Economic Affairs, Office of Workforce Opportunity (BEA/OWO) is issuing this Request for Information (RFI) inviting vendors to submit their capabilities, vision and interests in the NH Works System One Stop Operator. The SWIB is mandated through the Workforce Innovation and Opportunity Act (WIOA) of 2014.
WIOA also includes the following workforce programs as One-Stop Partners which may or may not be co-located at the NH Works Offices: • Family Literacy and Adult Education Act • Vocational Rehabilitation • Career and Technical Education (Perkins Act) • Community Services Block Grant.
The State retains the right to promote transparency and to place this RFI into public domain and to make copy of the RFI available as a provision of New Hampshire access to public records laws. Please do not include any information in your RFI response that is confidential or proprietary, as the State assumes no responsibility for excluding informat

config.json:   0%|          | 0.00/662 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.13G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Device set to use cpu



💬 Ask your questions about the job description (type 'quit' to stop):
Your question: What are the Job responsibilites?

Answer: Demonstrating continuous improvement principles which include the interactive process of plan, do, check, act. The ability to meet the workforce development needs of participants and the employment needs of local employers.
Your question: What is the Contract Duration?

Answer: The State retains the right to promote transparency and to place this RFI into public domain and to make copy of the RFI available as a provision of New Hampshire access to public records laws. Please do not include any information in your RFI response that is confidential or proprietary, as the State assumes no responsibility for excluding information.
Your question: Thats not right

Answer: Please do not include any information in your RFI response that is confidential or proprietary, as the State assumes no responsibility for excluding information.
Your question: quit


#REFINED VERSION

In [1]:
!pip install --quiet pdfplumber transformers sentence-transformers scikit-learn sentencepiece

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.8/42.8 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m48.2/48.2 kB[0m [31m826.4 kB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.2/60.2 kB[0m [31m2.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m5.6/5.6 MB[0m [31m15.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.9/2.9 MB[0m [31m15.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m1.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m17.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m5.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [3]:

jd_path = "/content/STAFF-8600.pdf"
resume_path = "/content/experience.pdf"

# ✅ Step 3: Import required libraries
import pdfplumber
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# ✅ Step 4: Utility functions
def extract_text_from_pdf(path):
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() for page in pdf.pages if page.extract_text())

def extract_scope_sections(full_text):
    lines = full_text.split('\n')
    relevant = []
    capture = False

    include_keywords = ['Job Description', 'Roles', 'Responsibilities', 'Scope of Work', 'Duties', 'Position Summary']
    end_keywords = ['Qualifications', 'Requirements', 'Skills', 'Education', 'Benefits', 'Compensation']

    for line in lines:
        lower = line.lower().strip()
        if any(kw.lower() in lower for kw in include_keywords):
            capture = True
        elif any(kw.lower() in lower for kw in end_keywords):
            capture = False
        if capture:
            relevant.append(line)
    return "\n".join(relevant) if relevant else "\n".join(lines[:100])

def summarize_text(text, model, max_chunk_words=500):
    words = text.split()
    if len(words) <= 100:
        return text
    summaries = []
    chunk = []
    for word in words:
        chunk.append(word)
        if len(chunk) >= max_chunk_words:
            input_text = " ".join(chunk)
            summary = model(input_text, max_length=150, min_length=50, do_sample=False)[0]['summary_text']
            summaries.append(summary)
            chunk = []
    if chunk:
        input_text = " ".join(chunk)
        summary = model(input_text, max_length=150, min_length=50, do_sample=False)[0]['summary_text']
        summaries.append(summary)
    return "\n".join(summaries)

def compute_similarity(text1, text2):
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([text1, text2])
    return round(cosine_similarity(vectors[0:1], vectors[1:2])[0][0] * 100, 2)

# ✅ Step 5: Smart Hybrid Q&A Class
class SmartJDChatbot:
    def __init__(self):
        from transformers import pipeline
        import torch
        device = 0 if torch.cuda.is_available() else -1
        self.generator = pipeline("text2text-generation", model="google/flan-t5-large", device=device)

    def find_relevant_sentences(self, context, question, top_k=3):
        from sentence_transformers import SentenceTransformer, util
        model = SentenceTransformer('all-MiniLM-L6-v2')
        sentences = [s.strip() for s in context.split('.') if len(s.strip()) > 20]
        sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
        question_embedding = model.encode(question, convert_to_tensor=True)
        similarities = util.pytorch_cos_sim(question_embedding, sentence_embeddings)[0]
        top_results = similarities.argsort(descending=True)[:top_k]
        return ". ".join([sentences[i] for i in top_results])

    def ask_question(self, context, question):
        try:
            q = question.strip().lower()

            # 1️⃣ Rule-based direct answers
            if 'contract duration' in q or 'how long' in q:
                for line in context.split('\n'):
                    if any(word in line.lower() for word in ['duration', 'term', 'remain in effect', 'contract end']):
                        return line.strip()

            if any(key in q for key in ['skills', 'requirements']):
                skills = [line.strip() for line in context.split('\n') if any(word in line.lower() for word in ['skill', 'requirement', 'qualification'])]
                if skills:
                    return "Some listed skills/requirements include:\n" + "\n".join(skills[:5])

            if any(key in q for key in ['responsibilities', 'duties', 'scope']):
                resp = [line.strip() for line in context.split('\n') if any(word in line.lower() for word in ['responsible', 'duties', 'scope', 'services', 'perform'])]
                if resp:
                    return "Main responsibilities:\n" + "\n".join(resp[:5])

            # 2️⃣ Semantic retrieval
            relevant_context = self.find_relevant_sentences(context, question)

            # 3️⃣ Generative answer
            prompt = f"""Based on the following job description:

{relevant_context}

Answer the question clearly and professionally:
{question}"""
            answer = self.generator(prompt, max_length=200, do_sample=False)[0]['generated_text']
            return answer.strip()

        except Exception as e:
            return f"⚠️ Error: {e}"

# ✅ Step 6: Run the analysis

# Load and summarize
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

jd_text_full = extract_text_from_pdf(jd_path)
resume_text = extract_text_from_pdf(resume_path)

jd_scope_text = extract_scope_sections(jd_text_full)
jd_summary = summarize_text(jd_scope_text, summarizer)
resume_summary = summarize_text(resume_text, summarizer)

# Match score
score = compute_similarity(jd_summary, resume_summary)

# Output
print("\n📄 Job Description Summary:\n", jd_summary)
print("\n👤 Resume Summary:\n", resume_summary)
print(f"\n📊 Match Score: {score}%")
if score >= 75:
    print("✅ Strong match!")
elif score >= 50:
    print("⚠️ Moderate match.")
else:
    print("❌ Low match.")

# Q&A
chatbot = SmartJDChatbot()
print("\n💬 Ask questions about the job description (type 'quit' to stop):")
while True:
    question = input("Your question: ").strip()
    if question.lower() == "quit":
        break
    answer = chatbot.ask_question(jd_scope_text, question)
    print("\nAnswer:", answer)


Device set to use cpu



📄 Job Description Summary:
 The State of New Hampshire has defined the role of the Operator to be that of a coordinator. The Operator will serve as an intermediary to the public One-Stop Partners. The State retains the right to promote transparency and to place this RFI into public domain.
All inquiries concerning this RFI, including but not limited to, requests for clarifications, questions, and any changes to the RFI shall be submitted via email to the following RFI designated Points of Contact: TO: Joseph.A.Doiron@livefree.nh.gov Inquiries must be received by the Agency’s RFI Point of Contact no later than the conclusion of the Vendor Inquiry Period.

👤 Resume Summary:
 This staffing initiative is designed to deliver a team of skilled professionals who will provide comprehensive support for the development, enhancement, testing, and maintenance of high-quality software applications. The assigned staff will ensure that all software projects adhere to the highest standards of perform

config.json:   0%|          | 0.00/662 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.13G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Device set to use cpu



💬 Ask questions about the job description (type 'quit' to stop):
Your question: What is the expected process for vendors to respond to this RFI?


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]


Answer: inquiries concerning this RFI, including but not limited to, requests for clarifications, questions, and any changes to the RFI, shall be submitted via email to the RFI designated Points of Contact: TO: Joseph. gov Inquiries must be received by the Agency’s RFI Point of Contact no later than the conclusion of the Vendor Inquiry Period. Please do not include any information in your RFI response that is confidential or proprietary, as the State assumes no responsibility for excluding information in response to records requests
Your question: quit
