<a href="https://colab.research.google.com/github/Satish-970/ATS_Checker_Job-Description/blob/main/ATS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ATS Resume Matcher: Your Guide to a Better Resume

This tool evaluates how well your resume matches a job description, providing an ATS score, keyword analysis, and suggestions for improvement. It supports resume uploads (PDF/DOCX) and job descriptions via text input or URL scraping.

**Features**:
- Upload resume in PDF or DOCX format.
- Provide job description via text or URL (scrapes job posting from websites).
- Get ATS match score, keyword overlap, and improvement suggestions.
- Perform grammar and spelling checks if no job description is provided.
- Visualize results with charts.
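
To make the match score and keyword overlap concrete, here is a minimal standard-library sketch. It is a simplified stand-in, not the notebook's implementation: it uses cosine similarity over raw word counts instead of TF-IDF weighting, and the helper names (`tokenize`, `cosine_match_score`, `missing_keywords`) and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_match_score(resume_text, job_text):
    """Return a 0-100 cosine similarity over raw term counts (no TF-IDF weighting)."""
    resume_counts = Counter(tokenize(resume_text))
    job_counts = Counter(tokenize(job_text))
    shared = set(resume_counts) & set(job_counts)
    dot = sum(resume_counts[word] * job_counts[word] for word in shared)
    resume_norm = math.sqrt(sum(c * c for c in resume_counts.values()))
    job_norm = math.sqrt(sum(c * c for c in job_counts.values()))
    norm = resume_norm * job_norm
    # An empty document has zero norm; report no match instead of dividing by zero.
    return 100.0 * dot / norm if norm else 0.0


def missing_keywords(resume_text, job_text):
    """Return job-description words that never appear in the resume, sorted."""
    return sorted(set(tokenize(job_text)) - set(tokenize(resume_text)))


# Hypothetical sample texts, for illustration only.
resume = "Python developer with SQL and Docker experience"
job = "Seeking Python developer with SQL and Kubernetes skills"
print(f"Match score: {cosine_match_score(resume, job):.1f}%")
print("Missing from resume:", missing_keywords(resume, job))
```

Because common words such as "with" and "and" count toward the score here, this sketch tends to overstate matches; the stop-word removal and TF-IDF weighting used in the notebook reduce that effect.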

**How to Run**:
1. Run the first cell to install dependencies.
2. Run the second cell to set up Java for grammar checking.
3. Run the third cell to define functions.
4. Run the fourth cell to start the checker, upload your resume, and input the job description (text or URL).

**Notes**:
- Use text-based (not scanned) PDF/DOCX files for accurate extraction.
- For URLs, ensure the job posting is publicly accessible. Some dynamic websites may require manual text input.
- No sign-in is required; files are uploaded to the Colab runtime and deleted after analysis.
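
The delete-after-analysis behavior in the last note can be sketched with `try`/`finally`, which removes the file even when the analysis raises. This is an illustrative pattern under stated assumptions, not the notebook's exact code: `analyze_then_delete` and `count_words` are hypothetical helpers, and a temporary text file stands in for an uploaded resume.

```python
import os
import tempfile


def analyze_then_delete(file_path, analyze):
    """Run analyze(file_path) and delete the file afterwards, even on error."""
    try:
        return analyze(file_path)
    finally:
        if os.path.exists(file_path):
            os.remove(file_path)


def count_words(file_path):
    """Toy analysis step: count whitespace-separated words in the file."""
    with open(file_path, encoding="utf-8") as handle:
        return len(handle.read().split())


# A throwaway file standing in for an uploaded resume.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as handle:
    handle.write("Experienced Python developer")
    demo_path = handle.name

word_count = analyze_then_delete(demo_path, count_words)
print(word_count, os.path.exists(demo_path))
```

Keeping the cleanup in one place avoids repeating the `os.path.exists` / `os.remove` pair on every early-return path.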

In [1]:
# Step 1: Install and Import Libraries

# Install necessary libraries
!pip install spacy PyPDF2 python-docx scikit-learn matplotlib language-tool-python requests beautifulsoup4 selenium
!python -m spacy download en_core_web_sm
!apt-get update && apt-get install -y chromium-chromedriver

# Import libraries
import spacy
import PyPDF2
from docx import Document
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
from google.colab import files
import os
import language_tool_python
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

# Set up Selenium for dynamic content
def setup_selenium():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    try:
        driver = webdriver.Chrome(options=chrome_options)
        return driver
    except Exception as e:
        print(f"Error setting up Selenium: {e}")
        return None

print("Setup complete!")

In [2]:
# Step 2: Install Java (required for LanguageTool)
!apt-get update
!apt-get install -y openjdk-17-jre-headless


In [3]:
# Step 3: Define Core Functions

# Load the spaCy model
try:
    nlp = spacy.load('en_core_web_sm')
except OSError:
    print("Downloading spaCy model...")
    !python -m spacy download en_core_web_sm
    nlp = spacy.load('en_core_web_sm')

# Initialize LanguageTool for grammar and spell checking
try:
    tool = language_tool_python.LanguageTool('en-US')
except Exception as e:
    print(f"Error initializing LanguageTool: {e}")
    tool = None

# Function to extract text from PDF
def extract_pdf_text(pdf_path):
    text = ''
    try:
        with open(pdf_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            for page in reader.pages:
                extracted_page_text = page.extract_text()
                if extracted_page_text:
                    text += extracted_page_text + ' '
    except Exception as e:
        print(f"Error extracting text from PDF: {e}")
        return ""
    return text.strip()

# Function to extract text from DOCX
def extract_docx_text(docx_path):
    text = ''
    try:
        doc = Document(docx_path)
        for paragraph in doc.paragraphs:
            text += paragraph.text + ' '
    except Exception as e:
        print(f"Error extracting text from DOCX: {e}")
        return ""
    return text.strip()

# Function to extract text from uploaded resume
def extract_resume_text(file_path):
    file_extension = file_path.lower().split('.')[-1]
    if file_extension == 'pdf':
        return extract_pdf_text(file_path)
    elif file_extension == 'docx':
        return extract_docx_text(file_path)
    else:
        # python-docx cannot read legacy .doc files, so they are rejected here.
        print("Unsupported file format. Please upload PDF or DOCX files only.")
        return ""

# Function to extract text from a webpage
def extract_text_from_url(url):
    # Try Selenium for dynamic content
    soup = None
    driver = setup_selenium()
    if driver:
        try:
            driver.get(url)
            time.sleep(3)  # Allow page to load
            soup = BeautifulSoup(driver.page_source, 'html.parser')
        except Exception as e:
            print(f"Selenium failed: {e}")
        finally:
            # Always release the browser, whether or not the page loaded
            driver.quit()

    # Fallback to requests/BeautifulSoup for static content
    if not soup:
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')
        except requests.exceptions.RequestException as e:
            print(f"Error fetching job description from URL: {e}")
            return ""

    # Remove scripts, styles, and non-content tags
    for tag in soup(["script", "style", "header", "footer", "nav", "aside"]):
        tag.decompose()
    # Prefer elements whose class suggests a job description
    job_sections = soup.find_all(['div', 'section', 'article'], class_=['job-description', 'job-details', 'description', 'content'])
    if job_sections:
        # separator=' ' keeps words from adjacent tags from running together
        text = ' '.join(element.get_text(separator=' ', strip=True) for element in job_sections)
    else:
        # Fall back to the whole page; collecting nested tags one by one would duplicate text
        text = soup.get_text(separator=' ', strip=True)
    return text.strip() if text else ""

# Function to extract keywords
def extract_keywords(text):
    doc = nlp(text.lower())
    keywords = set()

    # Expanded list of technical and professional skills
    tech_skills = {
        'python', 'java', 'javascript', 'sql', 'html', 'css', 'react', 'angular', 'vue', 'node.js',
        'django', 'flask', 'spring', 'hibernate', 'mongodb', 'postgresql', 'mysql', 'oracle', 'aws',
        'azure', 'gcp', 'docker', 'kubernetes', 'jenkins', 'git', 'github', 'gitlab', 'agile',
        'scrum', 'kanban', 'jira', 'confluence', 'tensorflow', 'pytorch', 'keras', 'pandas',
        'numpy', 'scipy', 'matplotlib', 'seaborn', 'tableau', 'powerbi', 'excel', 'r', 'scala',
        'spark', 'hadoop', 'kafka', 'elasticsearch', 'redis', 'graphql', 'rest', 'api',
        'microservices', 'devops', 'ci/cd', 'linux', 'windows', 'macos', 'bash', 'powershell',
        'communication', 'leadership', 'teamwork', 'management', 'analytics', 'problem-solving',
        'debugging', 'testing', 'automation', 'typescript', 'ruby', 'php', 'go', 'rust', 'swift',
        'kotlin', 'ansible', 'terraform', 'sql server', 'nosql', 'etl', 'data warehousing',
        'blockchain', 'ai', 'ml', 'nlp', 'computer vision', 'data visualization'
    }

    # Extract nouns, proper nouns, and technical skills
    for token in doc:
        if token.pos_ in ['NOUN', 'PROPN'] and len(token.text) > 2 and token.text.isalpha():
            keywords.add(token.text)
        elif token.text in tech_skills:
            keywords.add(token.text)

    # Expanded multi-word skills
    text_lower = text.lower()
    multi_word_skills = [
        'machine learning', 'data analysis', 'data science', 'project management',
        'artificial intelligence', 'deep learning', 'natural language processing',
        'computer vision', 'big data', 'business intelligence', 'data mining',
        'software development', 'web development', 'mobile development', 'full stack',
        'front end', 'back end', 'database management', 'cloud computing',
        'cyber security', 'network security', 'quality assurance', 'user experience',
        'user interface', 'api development', 'data engineering', 'devsecops',
        'agile methodology', 'software engineering', 'system administration'
    ]

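    # Multi-word or punctuated entries in tech_skills (e.g. 'sql server', 'ci/cd')
    # never equal a single spaCy token, so match them as substrings as well.
    phrase_skills = multi_word_skills + [s for s in tech_skills if not s.isalpha()]
    for skill in phrase_skills:
        if skill in text_lower:
            keywords.add(skill)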

    return keywords

# Function to calculate ATS score
def calculate_ats_score(resume_text, job_description):
    if not resume_text or not job_description:
        print("Error: Empty resume or job description text.")
        return 0.0
    documents = [resume_text, job_description]
    vectorizer = TfidfVectorizer(
        stop_words='english',
        ngram_range=(1, 2),
        max_features=1000,
        lowercase=True
    )
    try:
        tfidf_matrix = vectorizer.fit_transform(documents)
        similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]
        return similarity * 100
    except Exception as e:
        print(f"Error calculating ATS score: {e}")
        return 0.0

# Function to provide improvement suggestions
def get_suggestions(resume_keywords, job_keywords):
    missing = job_keywords - resume_keywords
    if missing:
        missing_list = sorted(list(missing))[:10]
        return f"Consider adding these keywords to your resume: {', '.join(missing_list)}"
    return "Excellent! Your resume contains all the key skills from the job description."

# Function to provide score interpretation
def get_score_feedback(score):
    if score >= 80:
        return "Excellent match! Your resume aligns very well with this job."
    elif score >= 65:
        return "Good match! Minor improvements could boost your score."
    elif score >= 50:
        return "Moderate match. Review suggestions to improve alignment."
    elif score >= 35:
        return "Below average match. Consider significant improvements."
    else:
        return "Low match. Major revisions recommended to align with job requirements."

# Function to visualize the score
def visualize_score(score):
    if score == 0.0:
        print("No visualization generated due to invalid ATS score.")
        return
    # Color coding based on score ranges
    if score >= 80:
        color = 'green'
        performance = 'Excellent'
    elif score >= 65:
        color = 'lightgreen'
        performance = 'Good'
    elif score >= 50:
        color = 'orange'
        performance = 'Moderate'
    elif score >= 35:
        color = 'red'
        performance = 'Below Average'
    else:
        color = 'darkred'
        performance = 'Low'

    plt.figure(figsize=(10, 6))
    bars = plt.bar(['ATS Match Score'], [score], color=color, alpha=0.8, width=0.6)

    plt.ylim(0, 100)
    plt.ylabel('Match Percentage (%)', fontsize=12)
    plt.title('Resume ATS Match Analysis', fontsize=14, fontweight='bold')

    plt.text(0, score + 2, f'{score:.1f}%\n({performance})',
             ha='center', va='bottom', fontsize=14, fontweight='bold')

    plt.axhline(y=80, color='green', linestyle='--', alpha=0.3, label='Excellent (80%+)')
    plt.axhline(y=65, color='lightgreen', linestyle='--', alpha=0.3, label='Good (65%+)')
    plt.axhline(y=50, color='orange', linestyle='--', alpha=0.3, label='Moderate (50%+)')
    plt.axhline(y=35, color='red', linestyle='--', alpha=0.3, label='Below Average (35%+)')

    plt.legend(loc='upper right')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# Function to visualize keyword overlap
def visualize_keyword_overlap(resume_keywords, job_keywords):
    resume_count = len(resume_keywords)
    job_count = len(job_keywords)
    overlap_count = len(resume_keywords & job_keywords)

    # Plot the three counts as bars; matplotlib has no built-in Venn diagram.
    labels = ['Resume only', 'Overlap', 'Job description only']
    sizes = [resume_count - overlap_count, overlap_count, job_count - overlap_count]

    plt.figure(figsize=(8, 5))
    bars = plt.bar(labels, sizes, color=['#36A2EB', '#FFCE56', '#FF6384'], alpha=0.8)
    for bar, size in zip(bars, sizes):
        # Print each count just above its bar
        plt.text(bar.get_x() + bar.get_width() / 2, size, str(size), ha='center', va='bottom')
    plt.ylabel('Number of keywords')
    plt.title('Keyword Overlap Analysis')
    plt.tight_layout()
    plt.show()

# Function to perform general resume check
def perform_general_check(resume_text, file_name):
    print("\n" + "=" * 60)
    print("GENERAL RESUME HEALTH CHECK RESULTS")
    print("=" * 60)

    # Grammar and Spelling Analysis
    if tool:
        print("\nGrammar and Spelling Analysis:")
        matches = tool.check(resume_text)
        if matches:
            print(f"- Found {len(matches)} potential issue(s). Review these suggestions:\n")
            for i, match in enumerate(matches[:5]):
                print(f"  {i+1}. Issue: '{match.ruleId}', Suggestion: '{match.replacements}', Context: '{match.context}'")
            if len(matches) > 5:
                print("  ... and more. Please proofread your resume carefully.")
        else:
            print("- No major grammar or spelling issues detected. Great job! 👍")
    else:
        print("- Grammar checking unavailable due to LanguageTool initialization failure.")

    # Formatting and Structure Analysis
    print("\nFormatting and Structure Tips:")
    resume_lower = resume_text.lower()
    sections = ['experience', 'skills', 'education', 'projects', 'summary']
    found_sections = [sec for sec in sections if sec in resume_lower]

    if len(found_sections) >= 3:
        print(f"- Your resume uses standard section headings ({', '.join(found_sections)}), which is great for ATS parsing. ✅")
    else:
        print(f"- Consider using standard headings like 'Experience', 'Skills', and 'Education'. Missing: {', '.join(sec for sec in sections if sec not in found_sections)}.")

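    # Heuristic only: this looks for these words in the extracted text, not for actual layout elements.
    if any(keyword in resume_lower for keyword in ['table', 'column', 'image']):
        print("- Warning: Your resume text mentions tables, columns, or images. If it uses such formatting, ATS parsing may suffer.")
    else:
        print("- No mentions of tables, columns, or images found in the text. Keep the layout simple for ATS parsing.")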

    # Keyword and Content Analysis
    print("\nKeyword & Content Recommendations:")
    action_verbs = ['managed', 'developed', 'created', 'implemented', 'analyzed', 'led', 'designed', 'optimized', 'improved']
    found_verbs = [verb for verb in action_verbs if verb in resume_lower]

    if len(found_verbs) > 5:
        print("- Good use of action verbs to describe your experience.")
    else:
        print(f"- Use stronger action verbs like: {', '.join(action_verbs[:5])}.")

    # Look for a digit attached to '%' or '$' in the same token (e.g. '15%', '$5000').
    if any(any(c.isdigit() for c in tok) and ('%' in tok or '$' in tok) for tok in resume_text.split()):
        print("- Quantifiable results found, showcasing your impact. Keep it up! 💪")
    else:
        print("- Add quantifiable results (e.g., 'Increased sales by 15%') for stronger impact.")

    resume_keywords = extract_keywords(resume_text)
    print(f"- Your resume contains {len(resume_keywords)} keywords. Ensure a mix of technical and soft skills.")

    # Cleanup
    if os.path.exists(file_name):
        os.remove(file_name)
        print(f"\nTemporary file '{file_name}' removed.")

    print("\n" + "=" * 60)
    print("Analysis Complete! Good luck! 🎉")
    print("=" * 60)

print("All functions defined successfully.")

In [4]:
# Step 4: Run the ATS Checker

def run_ats_checker():
    print("ATS Resume Checker")
    print("=" * 50)
    print("This tool analyzes your resume against a job description or performs a general health check.")
    print("Supported formats: PDF, DOCX")
    print("\nStep 1: Upload your resume file")
    print("Click 'Choose Files' below to select your resume...")

    # Upload resume file
    try:
        uploaded_files = files.upload()
    except Exception as e:
        print(f"Error during file upload: {e}")
        return

    if not uploaded_files:
        print("No file uploaded. Please try again.")
        return

    file_name = list(uploaded_files.keys())[0]
    print(f"Successfully uploaded: {file_name}")

    # Extract text from resume
    try:
        resume_text = extract_resume_text(file_name)
        if not resume_text or len(resume_text.strip()) < 100:
            print("Warning: Resume text is too short or unreadable. Ensure it's a text-based PDF/DOCX.")
            if os.path.exists(file_name):
                os.remove(file_name)
            return
    except Exception as e:
        print(f"Error reading file: {e}")
        print("Please upload a valid PDF or DOCX file.")
        if os.path.exists(file_name):
            os.remove(file_name)
        return

    print("\n" + "=" * 50)
    print("Step 2: Provide the job description")
    print("- Enter a URL to a job posting (e.g., https://example.com/job) and press Enter.")
    print("- Or, leave the URL blank and press Enter, then paste the job description text in the next prompt.")
    print("- Or, leave both blank (press Enter twice) for a general resume health check.")

    job_desc_url = input("\nJob Description URL (leave blank to paste text): ")
    job_desc = ""

    if job_desc_url.strip():
        print(f"\nFetching job description from: {job_desc_url}")
        job_desc = extract_text_from_url(job_desc_url)
        if not job_desc:
            print("Failed to extract job description from URL. Try pasting the job description text.")
            job_desc = input("\nJob Description (paste here, or leave blank for health check): ")
    else:
        job_desc = input("\nJob Description (paste here, or leave blank for health check): ")

    if not job_desc.strip():
        print("\nNo job description provided. Performing general resume health check...")
        perform_general_check(resume_text, file_name)
    else:
        if len(job_desc.strip()) < 50:
            print("Warning: Job description is very short. Consider providing a complete job posting.")

        print("\nAnalyzing resume against job description...")

        try:
            resume_keywords = extract_keywords(resume_text)
            job_keywords = extract_keywords(job_desc)
            ats_score = calculate_ats_score(resume_text, job_desc)

            print("\n" + "=" * 60)
            print("ATS MATCH ANALYSIS RESULTS")
            print("=" * 60)

            print(f"\nATS Match Score: {ats_score:.2f}%")
            print(f"Score Category: {get_score_feedback(ats_score)}")

            print(f"\nKeyword Analysis:")
            print(f"- Resume keywords: {len(resume_keywords)}")
            print(f"- Job description keywords: {len(job_keywords)}")
            matching_keywords = resume_keywords & job_keywords
            print(f"- Matching keywords: {len(matching_keywords)}")

            if matching_keywords:
                print(f"\nMatching skills: {', '.join(sorted(list(matching_keywords))[:15])}")

            print(f"\nImprovement Suggestions:")
            print(get_suggestions(resume_keywords, job_keywords))

            visualize_score(ats_score)
            visualize_keyword_overlap(resume_keywords, job_keywords)

            print("\nAdditional Tips to Boost Your Score:")
            if ats_score < 65:
                print("- Use exact keywords from the job description.")
                print("- Add a dedicated skills section.")
                print("- Tailor experience descriptions to job requirements.")
                print("- Use industry-standard terminology.")

            print("- Use a simple, ATS-friendly resume format.")
            print("- Include standard headings (Experience, Skills, Education).")
            print("- Avoid images, tables, and complex formatting.")

        except Exception as e:
            print(f"Error during analysis: {e}")

        finally:
            if os.path.exists(file_name):
                os.remove(file_name)
                print(f"\nTemporary file '{file_name}' removed.")

            print("\n" + "=" * 60)
            print("Analysis Complete! Good luck! 🎉")
            print("=" * 60)

# Run the ATS checker
run_ats_checker()

## Saving to GitHub

To store this notebook on GitHub:
1. Save this notebook as `ATS.ipynb`, the filename the Colab badge at the top links to.
2. Upload to a GitHub repository (e.g., `ATS_Checker_Job-Description`).
3. Open in Colab using the link in the first cell.

**Notes**:
- Runs in Google Colab; no Streamlit required.
- Ensure resume files are text-based.
- For URLs, use publicly accessible job postings. Dynamic websites may require manual text input.
- Temporary files are deleted after analysis.