<a href="https://colab.research.google.com/github/DivyaShreeK-dev/sdc/blob/main/Resume_analyzer_with_ai_q4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Step 1: Install required libraries and the small English spaCy model
!pip install transformers spacy nltk
!python -m spacy download en_core_web_sm

# Import necessary libraries
import spacy
import nltk
from transformers import pipeline
from IPython.display import display
import ipywidgets as widgets

# Download NLTK stopwords and the punkt tokenizer
# (not used below: spaCy already handles tokenization and sentence splitting)
nltk.download('stopwords')
nltk.download('punkt')

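# Step 2: Initialize NLP models
nlp = spacy.load('en_core_web_sm')
# Name the summarization model explicitly (the model the pipeline defaulted to in the log below)
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")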

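# Step 3: Process the resume (text input)
def analyze_resume(resume_text):
    # Tokenization, sentence splitting and Named Entity Recognition (NER) using spaCy
    doc = nlp(resume_text)

    # Candidate skills: spaCy's pretrained NER has no "skill" label, so ORG, GPE and PRODUCT
    # entities are used as a rough proxy; this also picks up schools and employers.
    # dict.fromkeys() drops repeated entities while keeping their original order.
    skill_labels = {"GPE", "ORG", "PRODUCT"}
    skills = list(dict.fromkeys(ent.text for ent in doc.ents if ent.label_ in skill_labels))

    # Education: sentences mentioning "university" or "degree", matched case-insensitively
    education = [sent.text for sent in doc.sents
                 if "university" in sent.text.lower() or "degree" in sent.text.lower()]

    # Return extracted information
    return skills, education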

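# Step 4: Summarize the resume (DistilBART summarization model, not GPT)
def generate_suggestions(resume_text):
    # The summarizer condenses the resume text; it does not generate improvement advice.
    # truncation=True keeps inputs longer than the model's 1024-token limit from failing.
    feedback = summarizer(resume_text, max_length=150, min_length=50, do_sample=False, truncation=True)
    return feedback[0]['summary_text']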

# Create a widget for user input (multiline text box)
resume_input = widgets.Textarea(
    value='',
    placeholder='Paste your resume text here...',
    description='Resume Text:',
    disabled=False,
    layout=widgets.Layout(width='100%', height='300px')
)

# Display the widget
display(resume_input)

# Callback that runs the analysis when the button is clicked
def on_button_click(b):
    # Get the resume text from the input widget
    resume_text = resume_input.value

    if not resume_text:
        print("Please input a resume.")
        return

    # Analyze the resume
    skills, education = analyze_resume(resume_text)

    # Generate suggestions
    suggestions = generate_suggestions(resume_text)

    # Output the results
    print("\nExtracted Skills:", skills)
    print("\nExtracted Education:", education)
    print("\nSuggestions for improvement:", suggestions)

# Create a button to trigger the analysis
analyze_button = widgets.Button(description="Analyze Resume")
analyze_button.on_click(on_button_click)

# Display the button
display(analyze_button)




[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu



Your max_length is set to 150, but your input_length is only 109. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=54)
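The warning above is raised because this resume is only 109 tokens long while `max_length` is fixed at 150. Below is a minimal sketch that derives both lengths from the input size instead. The 0.5 ratio is an assumption; it reproduces the warning's own suggestion of 54 for 109 tokens. `summary_lengths` is a hypothetical helper, not part of `transformers`.

```python
def summary_lengths(input_tokens, max_cap=150, min_cap=50, ratio=0.5, floor=10):
    """Return (max_length, min_length) for a summarizer call.

    The target length is `ratio` of the input, clipped to [floor, max_cap].
    min_length is half of max_length, capped at min_cap, so it never exceeds max_length.
    """
    if input_tokens < 1:
        raise ValueError("input_tokens must be positive")
    max_length = max(floor, min(max_cap, int(input_tokens * ratio)))
    min_length = min(min_cap, max_length // 2)
    return max_length, min_length

print(summary_lengths(109))   # → (54, 27)
print(summary_lengths(1000))  # → (150, 50)
```

In `generate_suggestions`, the token count could come from `len(summarizer.tokenizer(resume_text)["input_ids"])`. The returned pair would then be passed as `max_length` and `min_length`.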



Extracted Skills: ['MIT', 'Computer Science', 'JavaScript', 'React', 'Node.js', 'SQL', 'XYZ Corp', 'React', 'Node.js', 'MIT']
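This list has two problems. Some entries repeat ('React', 'Node.js' and 'MIT' each appear twice). It also mixes technologies with a school and an employer ('MIT', 'XYZ Corp'). That happens because spaCy's pretrained NER has no skill label, so the code treats ORG, GPE and PRODUCT entities as skills. One common alternative is matching against a curated skill vocabulary.

Below is a minimal sketch in plain Python. `SKILL_VOCAB` is a hypothetical list that you would replace with your own. spaCy's `PhraseMatcher` can do the same kind of matching on larger vocabularies.

```python
import re

# Hypothetical vocabulary: replace with the skills relevant to your domain.
SKILL_VOCAB = ["JavaScript", "React", "Node.js", "SQL", "MongoDB", "Python", "Java", "C++"]

def extract_skills(text, vocab=SKILL_VOCAB):
    """Return vocabulary skills found in `text`, ordered by first appearance, without duplicates."""
    found = []
    for skill in vocab:
        # Case-insensitive whole-token match. Lookarounds are used instead of \b so that
        # skills ending in symbols ("C++") still match and "Java" does not match in "JavaScript".
        pattern = r"(?<![\w.+#])" + re.escape(skill) + r"(?![\w+#])"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), skill))
    # Sort by position in the text and return the canonical vocabulary spelling.
    return [skill for _, skill in sorted(found)]

sample = "Developed apps with React and Node.js. Skills: JavaScript, SQL, React."
print(extract_skills(sample))  # → ['React', 'Node.js', 'JavaScript', 'SQL']
```

Because each vocabulary entry is reported at most once, duplicates disappear without a separate de-duplication step. Only terms in the vocabulary are reported, so schools and employers never appear.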

Extracted Education: []
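This output came from a keyword check that compared case-sensitively against only "University" and "Degree". A lowercase "degree" or a line such as "graduated from MIT" was therefore skipped, and the summary below suggests the resume contains both.

Below is a minimal plain-Python sketch of a broader, case-insensitive filter. The keyword list is an assumption. The naive regex sentence splitter stands in for spaCy's `doc.sents` only to keep the example self-contained.

```python
import re

# Assumed keyword list (lowercase); extend it for your own resumes.
EDUCATION_KEYWORDS = ("university", "college", "degree", "bachelor", "diploma", "graduated")

# One case-insensitive pattern; \b makes each keyword start at a word boundary.
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, EDUCATION_KEYWORDS)) + r")", re.IGNORECASE)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_education(text):
    """Return the sentences that mention any education keyword, ignoring case."""
    return [s for s in split_sentences(text) if KEYWORD_RE.search(s)]

sample = "Led a team of four developers. Earned a Bachelor's degree in Mathematics. Fluent in Spanish."
print(extract_education(sample))  # → ["Earned a Bachelor's degree in Mathematics."]
```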

Suggestions for improvement:  John Doe is a software engineer with 5 years of experience in developing web applications . He graduated from MIT with a degree in Computer Science . He worked at XYZ Corp (2019 - Present) He developed web applications using React and Node.js and worked with MongoDB .
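Despite its label, this output is a condensed restatement of the resume. A summarization model only shortens its input; it does not critique it. If actual improvement advice is wanted, one option is to print a few rule-based checks next to the summary.

Below is a minimal sketch. The expected section names, the use of digits as a proxy for quantified achievements, and the 800-word limit are illustrative assumptions, not established resume rules.

```python
import re

# Illustrative assumptions: adjust section names and thresholds to taste.
EXPECTED_SECTIONS = ("experience", "education", "skills")
MAX_WORDS = 800

def rule_based_suggestions(text):
    """Return a list of simple, heuristic improvement suggestions for a resume."""
    suggestions = []
    lowered = text.lower()
    # Flag expected section names that never appear anywhere in the text.
    for section in EXPECTED_SECTIONS:
        if section not in lowered:
            suggestions.append(f"Consider adding a section titled '{section.title()}'.")
    # Digits (e.g. "30%", "5 years") are a rough proxy for quantified achievements.
    if not re.search(r"\d", text):
        suggestions.append("Quantify achievements with numbers (e.g. percentages, team sizes).")
    if len(text.split()) > MAX_WORDS:
        suggestions.append(f"The resume is over {MAX_WORDS} words; consider trimming it.")
    return suggestions

for tip in rule_based_suggestions("Skills: Python, SQL. Built dashboards for the sales team."):
    print("-", tip)
```

For that sample, the checks flag the missing 'Experience' and 'Education' sections and the absence of any numbers.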


