**Project Goal** - **AI-Powered Resume Evaluator**

The goal of this project is to build an AI-driven tool that analyzes resumes (PDF or DOCX), extracts structured information from different sections (such as Education, Skills, Experience, Projects, etc.), and intelligently compares them with a target job description.

By leveraging transformer-based language models, the system provides section-wise, human-like feedback: it highlights gaps, suggests improvements when needed, and confirms when a section already aligns well with the job requirements.

In [2]:
!pip install transformers torch PyMuPDF pdfplumber python-docx scikit-learn

Collecting PyMuPDF
  Downloading pymupdf-1.26.3-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (3.4 kB)
Collecting pdfplumber
  Downloading pdfplumber-0.11.7-py3-none-any.whl.metadata (42 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.8/42.8 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting python-docx
  Downloading python_docx-1.2.0-py3-none-any.whl.metadata (2.0 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.wh

In [3]:
# importing necessary libraries
import os
import fitz              # PyMuPDF for PDF
import pdfplumber        # fallback for tricky PDFs
import docx              # python-docx for DOCX
import re
from transformers import pipeline

**Step 1: Robust File Parser**

In [4]:
def extract_text_from_docx(file_path):
    """
    Extracts and concatenates text from all paragraphs in a DOCX file.
    """
    doc = docx.Document(file_path)
    return "\n".join([para.text for para in doc.paragraphs if para.text.strip()])

In [5]:
def extract_text_from_pdf(file_path):
    """
    Extracts text from PDF using PyMuPDF.
    Falls back to pdfplumber if needed.
    """
    text = ""
    try:
        doc = fitz.open(file_path)
        for page in doc:
            text += page.get_text() + "\n"
    except Exception:
        print("⚠️ PyMuPDF failed, trying pdfplumber...")
        with pdfplumber.open(file_path) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text()
                if page_text:
                    text += page_text + "\n"
    return text

In [6]:
def extract_text(file_path):
    """
    Detect file extension (.pdf or .docx) and extract text accordingly.
    """
    ext = os.path.splitext(file_path)[1].lower()
    if ext == '.pdf':
        return extract_text_from_pdf(file_path)
    elif ext == '.docx':
        return extract_text_from_docx(file_path)
    else:
        raise ValueError(f"Unsupported file type: {ext}")

**Step 2: Build LLM prompt to parse & review in one shot**

In [7]:
def build_prompt(resume_text):
    return f"""
You are an expert resume reviewer and parser.

Below is the raw text of a candidate's resume.

Please do two things:
1️⃣ Parse it into these sections:
- Summary
- Objective
- Education
- Skills
- Experience
- Projects
- Certifications
- Achievements

2️⃣ For each section, add:
- Feedback: short, actionable suggestions to improve (or say 'Looks good' if strong)
- Rewrite: suggest a better version of the text

If a section is missing, write "Section missing".

Please format the output cleanly like this:

### Summary
Original:
...
Feedback:
...
Rewrite:
...

### Objective
Original:
...
Feedback:
...
Rewrite:
...

(and so on for each section)

Resume text:
\"\"\"
{resume_text}
\"\"\"
"""


**Step 3: Load LLM pipeline**

In [8]:
os.environ["HF_TOKEN"] = "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"

In [9]:
llm_pipeline = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",  # replace with llama model if you prefer
    token=os.environ["HF_TOKEN"],
    device_map="auto",
    max_new_tokens=2048
)

config.json:   0%|          | 0.00/601 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.55G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/141k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

Device set to use cuda:0


Step 4: Full Pipeline

In [10]:
import json

In [11]:
def parse_and_review_resume_with_llm(resume_text):
    prompt = build_prompt(resume_text)

    # Call local mistral pipeline
    output = llm_pipeline(
        prompt,
        do_sample=False
    )[0]['generated_text']

    return output

In [15]:
def parse_and_review_resume_with_llm2(resume_text):
    prompt = build_prompt(resume_text)

    # Call local mistral pipeline
    output = llm_pipeline(
        prompt,
        do_sample=False
    )[0]['generated_text']

    answer_start = output.find("### Summary")
    if answer_start == -1:
        answer_start = output.find("Summary")  # fallback if no markdown header

    if answer_start != -1:
        clean_output = output[answer_start:].strip()
    else:
        clean_output = output.strip()

    print("\n✅ 📄 Resume analysis report:\n")


    return clean_output

Step 5: Full pipeline

In [12]:
def review_resume_pipeline(file_path):
    # Step 1: Extract text
    resume_text = extract_text(file_path)

    # Step 2: Parse & review using LLM
    result = parse_and_review_resume_with_llm2(resume_text)

    print(result)

In [16]:
from google.colab import  files
upload = files.upload()

Saving Aastha_Singh.pdf to Aastha_Singh (1).pdf


In [17]:
resume_file = "Aastha_Singh.pdf"  # or .docx

result = review_resume_pipeline(resume_file)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



✅ 📄 Resume analysis report:

### Summary
Original:
...
Feedback:
...
Rewrite:
...

### Objective
Original:
...
Feedback:
...
Rewrite:
...

(and so on for each section)

Resume text:
"""
AASTHA
SINGH
MACHINE LEARNING
ENGINEER
aasvi7738@gmail.com
7905850191
Lucknow, IN 226203
Aspiring AI Engineer with a strong foundation in computer science
and a growing expertise in machine learning and generative AI.
Passionate about solving real-world problems using data-driven
solutions and intelligent systems. Eager to contribute my technical
skills, analytical thinking, and curiosity in an entry-level role that
fosters continuous learning, innovation, and impactful contributions
in the field of artificial intelligence.
PROFESSIONAL SUMMARY
Programming Languages: Python,
C, HTML, CSS
Python Libraries: NumPy, Pandas,
Scikit-Learn, Seaborn
Database: MySQL
Machine Learning: Supervised and
Unsupervised Algorithms
Generative AI: LLM, Transformers,
Vector databases, word embeddings,
Langchain, RAG, Deep 

In [18]:
def review_resume_pipeline2(file_path):
    # Step 1: Extract text
    resume_text = extract_text(file_path)

    # Step 2: Parse & review using LLM
    result = parse_and_review_resume_with_llm2(resume_text)

    print(result)

In [19]:
review_resume_pipeline2(resume_file)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



✅ 📄 Resume analysis report:

### Summary
Original:
...
Feedback:
...
Rewrite:
...

### Objective
Original:
...
Feedback:
...
Rewrite:
...

(and so on for each section)

Resume text:
"""
AASTHA
SINGH
MACHINE LEARNING
ENGINEER
aasvi7738@gmail.com
7905850191
Lucknow, IN 226203
Aspiring AI Engineer with a strong foundation in computer science
and a growing expertise in machine learning and generative AI.
Passionate about solving real-world problems using data-driven
solutions and intelligent systems. Eager to contribute my technical
skills, analytical thinking, and curiosity in an entry-level role that
fosters continuous learning, innovation, and impactful contributions
in the field of artificial intelligence.
PROFESSIONAL SUMMARY
Programming Languages: Python,
C, HTML, CSS
Python Libraries: NumPy, Pandas,
Scikit-Learn, Seaborn
Database: MySQL
Machine Learning: Supervised and
Unsupervised Algorithms
Generative AI: LLM, Transformers,
Vector databases, word embeddings,
Langchain, RAG, Deep 