<a href="https://colab.research.google.com/github/Bipulz/smart-resume-bipulbhandari/blob/main/Coursework.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [18]:
# This cell imports all necessary libraries for data processing, feature extraction, and machine learning models
import pandas as pd
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [20]:
# This cell loads the dataset from Google Drive and performs initial exploration including shape, columns, missing values, and distributions
file_path = '/content/drive/MyDrive/Colab Notebooks/dataset.csv'

print("Loading dataset...")
df = pd.read_csv(file_path)

print("Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print("\nFirst few rows:")
display(df.head())

print("\nMissing values per column:")
missing_percent = (df.isnull().sum() / len(df) * 100).sort_values(ascending=False)
print(missing_percent[missing_percent > 0])

print("\nDecision distribution:")
print(df['Decision'].value_counts())

print(f"\nUnique job roles: {df['Role'].nunique()}")
print("Job roles:")
print(df['Role'].unique())

Loading dataset...
Dataset loaded successfully!
Shape: (10174, 5)

Columns: ['Role', 'Resume', 'Decision', 'Reason_for_decision', 'Job_Description']

First few rows:


Unnamed: 0,Role,Resume,Decision,Reason_for_decision,Job_Description
0,E-commerce Specialist,Here's a professional resume for Jason Jones:\...,reject,Lacked leadership skills for a senior position.,Be part of a passionate team at the forefront ...
1,Game Developer,Here's a professional resume for Ann Marshall:...,select,Strong technical skills in AI and ML.,Help us build the next-generation products as ...
2,Human Resources Specialist,Here's a professional resume for Patrick Mccla...,reject,Insufficient system design expertise for senio...,We need a Human Resources Specialist to enhanc...
3,E-commerce Specialist,Here's a professional resume for Patricia Gray...,select,Impressive leadership and communication abilit...,Be part of a passionate team at the forefront ...
4,E-commerce Specialist,Here's a professional resume for Amanda Gross:...,reject,Lacked leadership skills for a senior position.,We are looking for an experienced E-commerce S...



Missing values per column:
Series([], dtype: float64)

Decision distribution:
Decision
reject    5114
select    5060
Name: count, dtype: int64

Unique job roles: 45
Job roles:
['E-commerce Specialist' 'Game Developer' 'Human Resources Specialist'
 'Mobile App Developer' 'UX Designer' 'Cloud Engineer'
 'Digital Marketing Specialist' 'AI Researcher' 'UI Engineer'
 'AR/VR Developer' 'Machine Learning Engineer' 'Database Administrator'
 'Data Engineer' 'Cybersecurity Analyst' 'Robotics Engineer'
 'Business Analyst' 'Data Analyst' 'Cloud Architect' 'Data Architect'
 'QA Engineer' 'System Administrator' 'DevOps Engineer' 'Product Manager'
 'Data Scientist' 'Full Stack Developer' 'Blockchain Developer'
 'Software Engineer' 'Content Writer' 'IT Support Specialist'
 'UI Designer' 'Cybersecurity Specialist' 'HR Specialist'
 'Network Engineer' 'Graphic Designer' 'UI/UX Designer' 'AI Engineer'
 'Project Manager' 'Software Developer' 'product manager'
 'software engineer' 'data engineer' 'ui engin
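
The role list above contains case variants of the same label ('Product Manager' and 'product manager', 'Software Engineer' and 'software engineer'), so the count of 45 unique roles overstates the number of distinct jobs, and a classifier trained on the raw column would treat each variant as a separate class. A minimal sketch of one way to collapse them follows. The helper name `normalize_role` is hypothetical and not part of the notebook, and the sample labels are a hand-picked subset of the printed list.

```python
from collections import Counter

def normalize_role(label):
    """Collapse case and whitespace variants of a role label into one key."""
    return " ".join(str(label).split()).lower()

# Hand-picked subset of the role labels printed above
roles = [
    "Product Manager", "product manager",
    "Software Engineer", "software engineer",
    "Data Engineer", "data engineer",
    "UX Designer",
]

raw_counts = Counter(roles)
norm_counts = Counter(normalize_role(r) for r in roles)

print(len(raw_counts), "distinct raw labels")
print(len(norm_counts), "distinct normalized labels")
```

In the notebook this could be applied with something like `df['Role'].apply(normalize_role)` before filtering and training. It only merges spelling variants; synonyms such as 'HR Specialist' and 'Human Resources Specialist' would still need an explicit mapping, if merging them is desired at all.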

In [22]:
# This cell defines the text cleaning function and applies it to the resume column
print("\nStarting text cleaning...")

def clean_text(text):
    if pd.isna(text):
        return ""
    text = str(text).lower()
    text = re.sub(r'[^a-z\s]', ' ', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

df['cleaned_resume'] = df['Resume'].apply(clean_text)

print("Text cleaning completed!")


Starting text cleaning...
Text cleaning completed!
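
To make the effect of `clean_text` concrete, the sketch below runs the same two regex steps on a made-up sentence and then applies the default `TfidfVectorizer` token pattern (tokens of two or more word characters) by hand. The sample string is an assumption for illustration, and the pandas NaN check is left out so the block runs on the standard library alone.

```python
import re

def clean_text(text):
    # Same steps as the notebook's clean_text, minus the pandas NaN check
    text = str(text).lower()
    text = re.sub(r'[^a-z\s]', ' ', text)
    return re.sub(r'\s+', ' ', text).strip()

sample = "Led 12+ apps in C++ & Node.js!"  # made-up example text
cleaned = clean_text(sample)

# TfidfVectorizer's default token_pattern keeps only tokens
# made of two or more word characters
tokens = re.findall(r"(?u)\b\w\w+\b", cleaned)

print(cleaned)
print(tokens)
```

Digits and symbols are dropped, so "12+" disappears and "C++" is reduced to a single letter that the tokenizer then discards. For resumes where such tokens carry signal (languages like C++ or C#, years of experience), a less aggressive cleaning pattern together with a custom `token_pattern` may be worth trying.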


In [23]:
# This cell filters the dataframe to use only resumes marked as 'select' for training
print("\nFiltering high-quality resumes marked as 'select' for training...")

select_df = df[df['Decision'].str.lower() == 'select'].copy()

print(f"Training on {len(select_df)} selected resumes")
print("\nJob role distribution in training data:")
print(select_df['Role'].value_counts())


Filtering high-quality resumes marked as 'select' for training...
Training on 5060 selected resumes

Job role distribution in training data:
Role
Data Scientist                  277
Software Engineer               250
Product Manager                 224
Data Engineer                   205
UI Engineer                     183
Data Analyst                    170
data engineer                   148
product manager                 147
software engineer               140
E-commerce Specialist           138
Digital Marketing Specialist    138
DevOps Engineer                 137
Robotics Engineer               135
data scientist                  134
Blockchain Developer            133
Human Resources Specialist      132
Cloud Architect                 130
IT Support Specialist           128
Full Stack Developer            128
UX Designer                     126
Mobile App Developer            124
Game Developer                  119
Business Analyst                119
Cybersecurity Analyst    

In [24]:
# This cell creates TF-IDF features from the cleaned resumes
print("\nCreating TF-IDF features...")

tfidf = TfidfVectorizer(
    stop_words='english',
    ngram_range=(1, 2),
    max_features=8000
)

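# Note: the vectorizer is fit on all selected resumes before the train/test split,
# so the vocabulary and IDF weights also reflect the test documents
X = tfidf.fit_transform(select_df['cleaned_resume'])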
y = select_df['Role']

print(f"TF-IDF matrix shape: {X.shape}")


Creating TF-IDF features...
TF-IDF matrix shape: (5060, 8000)


In [25]:
# This cell splits the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")

Training samples: 4048
Test samples: 1012
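
Because the TF-IDF vectorizer was fit before this split, the test resumes contributed to the vocabulary and IDF weights. One common way to avoid that is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn pipeline, so that `fit` only ever sees the training portion. The sketch below illustrates the idea on a tiny made-up corpus; the texts, labels, and variable names are assumptions and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for select_df['cleaned_resume'] and select_df['Role'] (made-up data)
texts = [
    "figma wireframes prototyping usability testing",
    "user research personas journey mapping figma",
    "wireframes mockups typography interaction design",
    "usability testing user interviews prototypes",
    "python spark airflow etl pipelines sql",
    "etl pipelines data warehouse sql kafka",
    "airflow spark data lake python sql",
    "kafka streaming etl sql data warehouse",
]
roles = ["UX Designer"] * 4 + ["Data Engineer"] * 4

# Split the raw text first, so the test documents stay unseen
text_train, text_test, y_train, y_test = train_test_split(
    texts, roles, test_size=0.25, random_state=42, stratify=roles
)

# The vectorizer is fit inside the pipeline, on the training texts only
pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2), max_features=8000),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
pipeline.fit(text_train, y_train)

vocab = pipeline.named_steps["tfidfvectorizer"].vocabulary_
print(f"Vocabulary size (training texts only): {len(vocab)}")
print(f"Test accuracy: {pipeline.score(text_test, y_test):.2%}")
```

A pipeline also keeps inference simple: `pipeline.predict([cleaned_text])` handles vectorization and classification in one call, and the same object can be passed to `cross_val_score` without leaking test folds into the vectorizer.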


In [26]:
# This cell trains three one-vs-rest classifiers: Logistic Regression, a linear SVM (LinearSVC), and Random Forest
print("\nTraining machine learning models...")

logreg = OneVsRestClassifier(LogisticRegression(class_weight='balanced', max_iter=1000))
logreg.fit(X_train, y_train)

svm = OneVsRestClassifier(LinearSVC(class_weight='balanced'))
svm.fit(X_train, y_train)

rf = OneVsRestClassifier(RandomForestClassifier(
    n_estimators=300,
    class_weight='balanced',
    random_state=42
))
rf.fit(X_train, y_train)

print("All 3 models trained successfully!")


Training machine learning models...
All 3 models trained successfully!


In [27]:
# This cell evaluates the performance of each model on the test set
print("\nModel Performance on Test Set:\n")

models = [
    ("Logistic Regression", logreg),
    ("Support Vector Machine (SVM)", svm),
    ("Random Forest", rf)
]

for name, model in models:
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"--- {name} ---")
    print(f"Accuracy: {acc:.2%}")
    print("Classification Report:")
    print(classification_report(y_test, y_pred, zero_division=0))
    print()


Model Performance on Test Set:

--- Logistic Regression ---
Accuracy: 96.05%
Classification Report:
                              precision    recall  f1-score   support

                 AI Engineer       1.00      1.00      1.00         2
               AI Researcher       1.00      1.00      1.00        23
             AR/VR Developer       1.00      1.00      1.00        23
        Blockchain Developer       1.00      1.00      1.00        27
            Business Analyst       1.00      1.00      1.00        24
             Cloud Architect       1.00      1.00      1.00        26
              Cloud Engineer       1.00      1.00      1.00        20
              Content Writer       1.00      1.00      1.00        23
       Cybersecurity Analyst       0.96      1.00      0.98        24
    Cybersecurity Specialist       0.00      0.00      0.00         1
                Data Analyst       1.00      1.00      1.00        34
              Data Architect       1.00      1.00      1.0
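
The report above shows why accuracy alone can hide problems: 'Cybersecurity Specialist' has a support of 1 and an F1 of 0.00, yet overall accuracy is still about 96%. A macro-averaged F1, which weights every class equally, makes such misses visible. The sketch below computes it by hand on made-up labels that mimic this situation; it is meant to show the arithmetic, and scikit-learn's `f1_score(..., average='macro')` would normally be used instead.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from true/false positives and false negatives."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Made-up labels: one rare class that the model always misses
y_true = ["Data Analyst"] * 8 + ["Cybersecurity Analyst"] * 3 + ["Cybersecurity Specialist"]
y_pred = ["Data Analyst"] * 8 + ["Cybersecurity Analyst"] * 3 + ["Cybersecurity Analyst"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_f1(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)
support = Counter(y_true)

print(f"Accuracy: {accuracy:.2%}")
print(f"Macro F1: {macro_f1:.2%}")
for label, f1 in scores.items():
    print(f"{label:<26} f1={f1:.2f} support={support[label]}")
```

With classes this small, a single test example decides whether a class scores 0 or 1, so per-class numbers for roles with a support of 1 or 2 should be read with caution.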

In [30]:
# This cell defines the function to recommend a job role based on resume text using majority voting from the three models
def recommend_job(resume_text):
    if not resume_text or len(resume_text.strip()) == 0:
        print("Resume has not been added")
        return None

    cleaned = clean_text(resume_text)

    if len(cleaned.split()) < 5:
        print("Resume text is too short for accurate recommendation")
        return None

    features = tfidf.transform([cleaned])

    pred1 = logreg.predict(features)[0]
    pred2 = svm.predict(features)[0]
    pred3 = rf.predict(features)[0]

    predictions = [pred1, pred2, pred3]
    counts = {}
    for p in predictions:
        counts[p] = counts.get(p, 0) + 1

    recommended = max(counts, key=counts.get)
    votes = counts[recommended]

    print("FINAL RECOMMENDATION")
    print("-----------------------------------")
    print(f"Recommended Job Role: {recommended}")
    print(f"Model Agreement: {votes}/3")
    print(f"Confidence: {'High' if votes == 3 else 'Medium' if votes == 2 else 'Low'}")
    print("-----------------------------------")
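
The voting step in `recommend_job` can be isolated into a small helper, which makes the tie-breaking rule explicit and lets the result be returned to the caller as well as printed. The sketch below is one possible shape; `majority_vote` and `confidence_label` are hypothetical names that do not appear in the notebook.

```python
from collections import Counter

def majority_vote(predictions):
    """Return (label, votes) for the most common prediction.

    Ties go to the label that appears first in `predictions`,
    which matches the behavior of the dict-based loop above.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes

def confidence_label(votes, n_models=3):
    """Map the number of agreeing models to a coarse confidence label."""
    if votes == n_models:
        return "High"
    return "Medium" if votes > n_models / 2 else "Low"

# Example with made-up predictions from three models
label, votes = majority_vote(["UX Designer", "UI Designer", "UX Designer"])
print(label, votes, confidence_label(votes))
```

When all three models disagree, the first prediction in the list wins, so in `recommend_job` the Logistic Regression output acts as the fallback. Returning `label` from `recommend_job` (the function currently returns nothing on the success path) would let callers use the recommendation programmatically.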

In [31]:
# Test the recommendation system with a realistic sample resume that implies UI/UX expertise without directly mentioning "UI" or "UX"
test_resume = """
Bipul Bhandari

Email: bhandaribipul345@gmail.com | Phone: +977-98XXXXXXXXXX | LinkedIn: linkedin.com/in/bipul-bhandari | Portfolio: behance.net/bipulbhandari | Location: Kathmandu, Nepal

Professional Summary
Creative and detail-oriented designer with 5+ years of experience crafting intuitive digital interfaces for web and mobile applications. Skilled in user research, wireframing, prototyping, visual design, and usability testing. Passionate about creating seamless and engaging experiences through clean layouts, typography, color theory, and interaction patterns. Successfully improved user satisfaction and engagement metrics by over 35% through iterative design and data-informed decisions.

Professional Experience

Senior Designer
Tech Innovations Nepal, Kathmandu, Nepal
February 2021 – Present
- Led the complete design process for 12+ mobile apps and web platforms from concept to launch
- Conducted user interviews, surveys, and usability sessions with 150+ participants to identify pain points
- Created low-fidelity wireframes, interactive prototypes, and high-fidelity mockups using Figma
- Developed and maintained reusable component libraries and design systems for consistent branding
- Collaborated closely with developers to ensure accurate implementation across iOS, Android, and web
- Performed A/B testing on key screens, resulting in 30% higher task completion rates

Designer
Creative Studio Kathmandu, Lalitpur, Nepal
August 2018 – January 2021
- Designed responsive web experiences and native mobile screens for e-commerce and startup clients
- Built user flows, information architecture, and navigation structures for complex applications
- Created visual assets including icons, illustrations, and marketing materials
- Used analytics tools to track user behavior and inform design iterations
- Participated in daily stand-ups and sprint reviews in Agile teams

Education
Bachelor of Information Technology
Tribhuvan University, Kathmandu, Nepal
Graduated: 2018

Skills
- Design Tools: Figma (Advanced), Sketch, Adobe Photoshop, Adobe Illustrator, InVision
- Prototyping: Interactive prototypes, clickable mockups, animation
- Research: User interviews, surveys, personas, journey mapping, usability testing
- Visual Design: Typography, color theory, layout, iconography, branding
- Collaboration: Design handoff, developer communication, version control
- Other: HTML/CSS basics, accessibility standards, Agile methodology, Miro, Notion

Certifications
- Google Design Professional Certificate
- Interaction Design Specialization
- Figma Advanced Certification

Languages
Hindi (Fluent), Nepali (Native), English (Professional Proficiency)

Interests
Digital trends, typography, minimalist design, hiking in the Himalayas, sketching
"""

recommend_job(test_resume)

FINAL RECOMMENDATION
-----------------------------------
Recommended Job Role: UX Designer
Model Agreement: 3/3
Confidence: High
-----------------------------------
