## Bridging the Gap: Automated Job Role Classification

This project tackles the challenge of automatically classifying job roles from unlabelled resume data using Hugging Face models. Starting with a dataset of 155 resumes, our primary goal was to assign accurate job role labels based purely on the skills listed. We explored a progression of methods, from an initial rule-based system that provided the foundational `rule_label` column to pretrained Large Language Models (LLMs).

Initial attempts with `flan-t5-small` and zero-shot classification using `facebook/bart-large-mnli` highlighted the limitations of generalized models for this specific task, producing heavily skewed predictions and low agreement with the rule-based labels.

The breakthrough came with fine-tuning a `roberta-base` model. Trained on our `rule_label` dataset, the fine-tuned RoBERTa model reached 67.7% accuracy and a weighted F1-score of 0.5471 on a held-out evaluation split, compared with roughly 9.7% accuracy and a weighted F1-score of 0.11 for the zero-shot baseline. This project demonstrates the impact of fine-tuning for domain-specific tasks when turning raw skill data into job role labels.

Presented By

Irfan Faisal - 23224532

Labib Mashfiq Rahman - 23229564

Hossain Muhtasim Tahmid - 23229549

Abrer Rahat Hossain - 23229502


# Resume Data Loading

We took this dataset from Hugging Face: https://huggingface.co/datasets/bhuvanmdev/resume_parser
It contains 155 resumes without job-role labels.


In [None]:
import pandas as pd

# Load the parquet file into a pandas DataFrame
df = pd.read_parquet('/content/train-00000-of-00001.parquet')

# Display the first 5 rows of the DataFrame
display(df.head())

df.info()

Unnamed: 0.1,Unnamed: 0,resume,name,contact,skills,companies,total_years
0,0,Resume Text:\nVamsi Krishna Kondapuneni Senior...,Vamsi Krishna Kondapuneni,vk245@outlook.com,"SIEM, QRadar, McAfee ESM, ArcSight, Splunk, M...","WWW, Société Générale, McAfee Inc, Sattri...",7 years
1,1,Resume Text:\nSummary Sandhya Upadhyayula M 91...,Sandhya Upadhyayula,"91 8748993169, sandhyasanjali@gmail.com","Agile Methodologies, Microsoft Excel, Word, P...",Barter Technologies Pty Ltd,3
2,2,Resume Text:\nBHIM PRAKASH SINGH bhimprakashsi...,Bhim Prakash Singh,"bhimprakashsingh@yahoocoin, 918791552799","Microsoft development technologies, Net Frame...","Zibal Technologies Pvt Ltd, Rigil Stratsoft P...",13 Years
3,3,Resume Text:\nSOUMYARANJAN PATRA soumyapatrade...,Soumyaranjan Patra,"PH9964190772, soumyapatradev@gmail.com","Microsoft Net Technologies, C#, MVC, ASP.NET ...","Fareportal Technologies, ITC INFOTECH Bangalo...",4 years
4,4,Resume Text:\nHanmant Telange EmailIDhanmantmt...,Hanmant Telange,EmailIDhanmantmtelange@gmailcomContactNo91966...,"Javascript, Nodejs, MySQL, PostgreSQL, MongoD...","XoriantsystempvtltdPune, Harbingersystempvtlt...",6 years


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 155 entries, 0 to 154
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Unnamed: 0   155 non-null    int64 
 1   resume       155 non-null    object
 2   name         155 non-null    object
 3   contact      155 non-null    object
 4   skills       155 non-null    object
 5   companies    154 non-null    object
 6   total_years  153 non-null    object
dtypes: int64(1), object(6)
memory usage: 8.6+ KB


# The core issue



The resume dataset did not contain any predefined job-role labels. To generate baseline labels, we implemented a rule-based classifier that scans each resume's `skills` text and assigns a job category based on the presence of keywords associated with known roles.

For example, resumes whose skills mention SIEM, SOC, forensic, or malware are classified as SOC Analyst, while those containing .NET, C#, or ASP.NET are labeled as .NET Developer. Similar keyword groups are used for roles such as Full Stack Developer, Data Engineer, DevOps Engineer, Cloud Engineer, and Project Coordinator. The rules are checked in order and the first match wins; if no rule matches, the resume is assigned to the Other category.

In [None]:
df['skills']

Unnamed: 0,skills
0,"SIEM, QRadar, McAfee ESM, ArcSight, Splunk, M..."
1,"Agile Methodologies, Microsoft Excel, Word, P..."
2,"Microsoft development technologies, Net Frame..."
3,"Microsoft Net Technologies, C#, MVC, ASP.NET ..."
4,"Javascript, Nodejs, MySQL, PostgreSQL, MongoD..."
...,...
150,".Net Core, Angular9, Angular6, Kubernetes, Do..."
151,"Manual Testing, Mobile Application Testing, A..."
152,"Good written and verbal communication, Sales ..."
153,"Linux, Scripting languages (Bash, Python), So..."


In [None]:
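def rule_based_role(skills_text):
    """Return the first job role whose keyword list matches the skills text.

    Matching is a case-insensitive substring test and the rules are checked
    top to bottom, so earlier roles take precedence over later ones.
    """
    skills_text_lower = str(skills_text).lower()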

    if any(keyword in skills_text_lower for keyword in ['soc', 'siem', 'forensic', 'malware']):
        return 'SOC Analyst'
    elif any(keyword in skills_text_lower for keyword in ['cybersecurity', 'security', 'vulnerability', 'compliance']):
        return 'Cybersecurity Analyst'
    elif any(keyword in skills_text_lower for keyword in ['.net', 'c#', 'asp.net']):
        return '.NET Developer'
    elif any(keyword in skills_text_lower for keyword in ['java', 'python', 'c++', 'software', 'developer', 'engineering', 'sdlc']):
        return 'Software Engineer'
    elif any(keyword in skills_text_lower for keyword in ['full stack', 'frontend', 'backend', 'api']):
        return 'Full Stack Developer'
    elif any(keyword in skills_text_lower for keyword in ['javascript', 'react', 'angular', 'vue']):
        return 'JavaScript Developer'
    elif any(keyword in skills_text_lower for keyword in ['node.js', 'express.js', 'typescript']):
        return 'Node.js Developer'
    elif any(keyword in skills_text_lower for keyword in ['cloud', 'aws', 'azure', 'gcp', 'kubernetes', 'docker']):
        return 'Cloud Engineer'
    elif any(keyword in skills_text_lower for keyword in ['devops', 'ci/cd', 'jenkins', 'ansible']):
        return 'DevOps Engineer'
    elif any(keyword in skills_text_lower for keyword in ['project management', 'jira', 'scrum', 'agile']):
        return 'Project Coordinator'
    elif any(keyword in skills_text_lower for keyword in ['agile', 'scrum master', 'product owner', 'business analysis']):
        return 'Agile Analyst'
    elif any(keyword in skills_text_lower for keyword in ['database', 'sql', 'nosql', 'mongodb', 'postgresql']):
        return 'Database Developer'
    elif any(keyword in skills_text_lower for keyword in ['data engineering', 'etl', 'airflow', 'spark', 'hadoop']):
        return 'Data Engineer'
    else:
        return 'Other'

print("The 'rule_based_role' function has been defined.")

The 'rule_based_role' function has been defined.
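Because the rules above use plain substring checks, a short keyword can fire inside a longer word: `'java'` is found in "Javascript", and `'soc'` in "Social". The following is a minimal sketch of a whole-token alternative, not the matching used to produce the labels in this notebook. The helper name `has_keyword` is hypothetical.

```python
import re

def has_keyword(skills_text, keyword):
    """Return True if `keyword` occurs as a whole token in `skills_text`."""
    # Lookarounds are used instead of \b so that keywords ending in
    # punctuation, such as 'c#' or 'c++', are still handled correctly.
    pattern = r"(?<![a-z0-9])" + re.escape(keyword.lower()) + r"(?![a-z0-9])"
    return re.search(pattern, str(skills_text).lower()) is not None

skills = "Javascript, Nodejs, MySQL"
print("java" in skills.lower())      # substring test, as used by the rules
print(has_keyword(skills, "java"))   # whole-token test
```

Switching the rules to this kind of matching would change the resulting `rule_label` distribution, so every downstream metric would need to be recomputed.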


In [None]:
df['rule_label'] = df['skills'].apply(rule_based_role)
print("The 'rule_based_role' function has been applied to the 'skills' column, and results are stored in 'rule_label'.")

The 'rule_based_role' function has been applied to the 'skills' column, and results are stored in 'rule_label'.


In [None]:
print("Displaying 'skills' and 'rule_label' columns for the first 5 rows:")
display(df[['skills', 'rule_label']].head())

Displaying 'skills' and 'rule_label' columns for the first 5 rows:


Unnamed: 0,skills,rule_label
0,"SIEM, QRadar, McAfee ESM, ArcSight, Splunk, M...",SOC Analyst
1,"Agile Methodologies, Microsoft Excel, Word, P...",Full Stack Developer
2,"Microsoft development technologies, Net Frame...",Software Engineer
3,"Microsoft Net Technologies, C#, MVC, ASP.NET ...",.NET Developer
4,"Javascript, Nodejs, MySQL, PostgreSQL, MongoD...",Software Engineer


In [None]:
print("\nDistribution of 'rule_label' column:")
display(df['rule_label'].value_counts())



Distribution of 'rule_label' column:


Unnamed: 0_level_0,count
rule_label,Unnamed: 1_level_1
Software Engineer,96
.NET Developer,17
Project Coordinator,12
Full Stack Developer,8
SOC Analyst,8
Other,5
Cloud Engineer,3
JavaScript Developer,2
Cybersecurity Analyst,2
Database Developer,1
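As a reference point for the accuracy figures reported later, it can help to know how much of the data the largest class covers, since a classifier that always predicted that class would agree with the rule labels on the same share. The sketch below does this arithmetic with the three largest counts copied from the table above and the total of 155 rows from `df.info()`; the variable names are illustrative.

```python
# Counts copied from the value_counts() output above (three largest classes).
rule_label_counts = {
    "Software Engineer": 96,
    ".NET Developer": 17,
    "Project Coordinator": 12,
}
total_resumes = 155  # RangeIndex size reported by df.info()

majority_label = max(rule_label_counts, key=rule_label_counts.get)
majority_share = rule_label_counts[majority_label] / total_resumes
print(f"{majority_label}: {majority_share:.1%} of all resumes")
```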


# Labelling using LLM

We first used the flan-t5-small model to label job roles automatically. The idea was to have it classify each resume into a specific job role based on the provided skills.

# Model initialization

This code imports the Hugging Face AutoTokenizer and loads the tokenizer for the google/flan-t5-small model. The tokenizer is responsible for converting input text into tokens so the model can understand and process it. By running this code, we initialize and prepare the tokenizer needed to interact with the FLAN-T5-small LLM.

Link to the model: https://huggingface.co/google/flan-t5-small


In [None]:
from transformers import AutoTokenizer

model = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model)
print(f"Tokenizer for {model} initialized successfully.")

Tokenizer for google/flan-t5-small initialized successfully.


Here we display the tokenizer's vocabulary size and the model's total parameter count, followed by a confirmation message that both the tokenizer and the model have been loaded and are ready for use.

In [None]:
from transformers import AutoModelForSeq2SeqLM

# 'model' is currently the string "google/flan-t5-small" from a previous cell.
# To get the number of parameters, we need the actual model object.
loaded_model = AutoModelForSeq2SeqLM.from_pretrained(model)

print(f"Tokenizer vocabulary size: {tokenizer.vocab_size}")
print(f"Model total parameters: {loaded_model.num_parameters()}")
print("Both tokenizer and model are confirmed to be loaded and ready for use.")

Tokenizer vocabulary size: 32100
Model total parameters: 76961152
Both tokenizer and model are confirmed to be loaded and ready for use.


In [None]:
df["skills"] = df["skills"].astype(str)

# Automatic Job Role Labeling Using an LLM
We converted the skills column into plain text and created a prompt that instructs an LLM to classify each resume into one specific job role based solely on its skillset. Using this prompt, we passed each resume’s skills through the model, which analyzed the skill patterns and selected the most appropriate job role from the predefined list. The predicted role was then added as a new column (predicted_job_role) in the DataFrame, effectively generating job-role labels for an otherwise unlabelled resume dataset.


In [None]:
df["skills"] = df["skills"].astype(str)

PROMPT = """
You are an expert technical recruiter and your job is to identify the most likely job role for a candidate based only on their SKILLS.

Allowed job roles:

- SOC Analyst
- Cybersecurity Analyst
- .NET Developer
- Software Engineer
- Full Stack Developer
- JavaScript Developer
- Node.js Developer
- Cloud Engineer
- DevOps Engineer
- Project Coordinator
- Agile Analyst
- Database Developer
- Data Engineer
- Other

Instructions:
- Read the skillset carefully.
- Focus on the strongest cluster of skills.
- Output ONLY the job role name from the list above.
- Do NOT explain your answer.

Skillset:
{skills}

Answer with only ONE job role:
"""

# Define the list of allowed job roles explicitly
ALLOWED_JOB_ROLES = [
    'SOC Analyst',
    'Cybersecurity Analyst',
    '.NET Developer',
    'Software Engineer',
    'Full Stack Developer',
    'JavaScript Developer',
    'Node.js Developer',
    'Cloud Engineer',
    'DevOps Engineer',
    'Project Coordinator',
    'Agile Analyst',
    'Database Developer',
    'Data Engineer',
    'Other'
]

def classify_job_role(skills_text):
    prompt = PROMPT.format(skills=skills_text)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = loaded_model.generate(**inputs, max_length=50)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

    # Ensure the predicted role is one of the allowed roles
    if result not in ALLOWED_JOB_ROLES:
        return 'Other' # Default to 'Other' if the model predicts an invalid role
    return result

df["predicted_job_role"] = df["skills"].apply(classify_job_role)
print("Predicted job roles have been added to the DataFrame.")

Predicted job roles have been added to the DataFrame.
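The validity check in `classify_job_role` is an exact string comparison, so an answer such as "soc analyst." would be mapped to 'Other'. A minimal sketch of a more lenient normalization step is shown below. The helper `normalize_role` is hypothetical and was not used to produce the results in this notebook.

```python
ALLOWED_JOB_ROLES = [
    'SOC Analyst', 'Cybersecurity Analyst', '.NET Developer',
    'Software Engineer', 'Full Stack Developer', 'JavaScript Developer',
    'Node.js Developer', 'Cloud Engineer', 'DevOps Engineer',
    'Project Coordinator', 'Agile Analyst', 'Database Developer',
    'Data Engineer', 'Other',
]

# Lower-cased lookup so that case differences do not cause a miss.
_ROLE_LOOKUP = {role.lower(): role for role in ALLOWED_JOB_ROLES}

def normalize_role(raw_output):
    """Map raw model text to an allowed role, falling back to 'Other'."""
    cleaned = " ".join(str(raw_output).split())   # collapse whitespace
    cleaned = cleaned.strip("\"' ")               # drop surrounding quotes
    # Only trailing punctuation is removed, so the leading dot of
    # '.NET Developer' is preserved.
    cleaned = cleaned.rstrip(".,:; ").lower()
    return _ROLE_LOOKUP.get(cleaned, "Other")

print(normalize_role("soc analyst."))
print(normalize_role("Angular8"))
```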


## Display Labeled Data Sample


In [None]:
display(df[['skills', 'predicted_job_role']])

Unnamed: 0,skills,predicted_job_role
0,"SIEM, QRadar, McAfee ESM, ArcSight, Splunk, M...",SOC Analyst
1,"Agile Methodologies, Microsoft Excel, Word, P...",SOC Analyst
2,"Microsoft development technologies, Net Frame...",SOC Analyst
3,"Microsoft Net Technologies, C#, MVC, ASP.NET ...",SOC Analyst
4,"Javascript, Nodejs, MySQL, PostgreSQL, MongoD...",SOC Analyst
...,...,...
150,".Net Core, Angular9, Angular6, Kubernetes, Do...",Other
151,"Manual Testing, Mobile Application Testing, A...",SOC Analyst
152,"Good written and verbal communication, Sales ...",SOC Analyst
153,"Linux, Scripting languages (Bash, Python), So...",SOC Analyst


The output of the flan-t5-small model was heavily imbalanced: 144 of the 155 resumes were classified as 'SOC Analyst', and the other raw outputs listed in the distribution below were strings outside the allowed list, such as `a.k.a.` and `Angular8`. This skew suggested that flan-t5-small, even with a carefully crafted prompt, was not distinguishing between the skill sets of different job roles and was biased towards a single category.

This led us to explore an alternative approach, zero-shot classification with the facebook/bart-large-mnli model, to establish a more objective performance baseline and check whether another model could provide more balanced classifications before proceeding to fine-tuning.

In [None]:
print("\nDistribution of 'predicted_job_role' column:")
display(df['predicted_job_role'].value_counts())


Distribution of 'predicted_job_role' column:


Unnamed: 0_level_0,count
predicted_job_role,Unnamed: 1_level_1
SOC Analyst,144
a.k.a.,2
a.k.a. a sc.,1
Angular8,1
OOPS,1
Servlets,1
Salesforce,1
Angular 8,1
Angular 12,1
Dataiku ML Practitioner,1


## Perform Zero-Shot Classification with a BART-based NLI Model

The facebook/bart-large-mnli model is used for zero-shot classification as our initial benchmark. Since it is already pre-trained on large amounts of text, we leveraged its prior knowledge to classify job roles directly from skill descriptions without training it on our dataset. This helped us set a performance baseline, allowing us to observe how well a strong general-purpose model performs on our task.

Finally, we evaluate the zero-shot classification performance by calculating accuracy, precision, recall, and F1-score, comparing 'zero_shot_predicted_job_role' against 'rule_label', and summarize the findings to determine whether fine-tuning `roberta-base` is warranted.

Link to the model: https://huggingface.co/facebook/bart-large-mnli




### Model Load:
We load the pre-trained `facebook/bart-large-mnli` checkpoint (BART-large fine-tuned on the MultiNLI dataset) through the `zero-shot-classification` pipeline from Hugging Face Transformers. The pipeline is applied to the 'skills' column of the DataFrame with our `ALLOWED_JOB_ROLES` as candidate labels, and the predicted labels are stored in a new column called 'zero_shot_predicted_job_role'.


In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print("Zero-shot classification pipeline initialized with facebook/bart-large-mnli.")

Device set to use cuda:0


Zero-shot classification pipeline initialized with facebook/bart-large-mnli.


This defines a function classify_zero_shot that uses the pre-initialized facebook/bart-large-mnli zero-shot classifier to predict the most likely job role from ALLOWED_JOB_ROLES for a given skills_text. It then applies this function to the skills column of the DataFrame to populate a new column, zero_shot_predicted_job_role, with these zero-shot predictions.

In [None]:
def classify_zero_shot(skills_text):
    # The pipeline outputs a dictionary with 'labels' and 'scores'
    result = classifier(skills_text, candidate_labels=ALLOWED_JOB_ROLES, multi_label=False)
    # The first label in the 'labels' list is the top prediction when multi_label=False
    return result['labels'][0]

df['zero_shot_predicted_job_role'] = df['skills'].apply(classify_zero_shot)
print("Zero-shot predicted job roles have been added to the DataFrame.")

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Zero-shot predicted job roles have been added to the DataFrame.
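To make the zero-shot step less of a black box: the pipeline turns each candidate label into a hypothesis sentence (the default template is "This example is {}."), scores it against the input with the NLI model, and, with `multi_label=False`, applies a softmax over the entailment scores of all labels. The sketch below reproduces only that final ranking step in plain Python. The entailment logits are made-up numbers for illustration, not outputs of `facebook/bart-large-mnli`.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

candidate_labels = ["SOC Analyst", ".NET Developer", "Cloud Engineer"]
hypotheses = [f"This example is {label}." for label in candidate_labels]

# Illustrative entailment logits, one per hypothesis (assumed values).
entailment_logits = [2.1, -0.4, 0.3]

probs = softmax(entailment_logits)
ranked = sorted(zip(candidate_labels, probs), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # label with the highest probability
```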


In [None]:
print("Displaying 'skills' and 'zero_shot_predicted_job_role' columns for the first 5 rows:")
display(df[['skills', 'zero_shot_predicted_job_role']].head())

Displaying 'skills' and 'zero_shot_predicted_job_role' columns for the first 5 rows:


Unnamed: 0,skills,zero_shot_predicted_job_role
0,"SIEM, QRadar, McAfee ESM, ArcSight, Splunk, M...",Cybersecurity Analyst
1,"Agile Methodologies, Microsoft Excel, Word, P...",Full Stack Developer
2,"Microsoft development technologies, Net Frame...",Node.js Developer
3,"Microsoft Net Technologies, C#, MVC, ASP.NET ...",Full Stack Developer
4,"Javascript, Nodejs, MySQL, PostgreSQL, MongoD...",Node.js Developer


To understand the distribution of the zero-shot classification results, here are the value counts for the 'zero_shot_predicted_job_role' column.



In [None]:
print("\nDistribution of 'zero_shot_predicted_job_role' column:")
display(df['zero_shot_predicted_job_role'].value_counts())


Distribution of 'zero_shot_predicted_job_role' column:


Unnamed: 0_level_0,count
zero_shot_predicted_job_role,Unnamed: 1_level_1
Full Stack Developer,78
JavaScript Developer,30
Node.js Developer,11
Software Engineer,9
Agile Analyst,9
Database Developer,5
Other,3
SOC Analyst,3
Data Engineer,3
Cybersecurity Analyst,1


**Evaluation of bart-large-mnli**:

Here we calculate and display the accuracy, weighted precision, weighted recall, and weighted F1-score by comparing the zero-shot predicted job roles (y_pred) against the rule-based labels (y_true). These metrics quantify how closely the facebook/bart-large-mnli predictions agree with the rule-based labels.

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import numpy as np

print("Metrics imported successfully.")

Metrics imported successfully.


In [None]:
y_true = df['rule_label']
y_pred = df['zero_shot_predicted_job_role']

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision (weighted): {precision:.4f}")
print(f"Recall (weighted): {recall:.4f}")
print(f"F1-score (weighted): {f1:.4f}")


Accuracy: 0.0968
Precision (weighted): 0.5717
Recall (weighted): 0.0968
F1-score (weighted): 0.1110


## Summary of Zero-Shot Classification Performance

The zero-shot classification using `facebook/bart-large-mnli` to predict job roles based on skills yielded the following metrics:

*   **Accuracy:** 0.0968
*   **Precision (weighted):** 0.5717
*   **Recall (weighted):** 0.0968
*   **F1-score (weighted):** 0.1110

**Analysis of Results:**

The accuracy of approximately 9.7% indicates that the zero-shot model correctly predicted the `rule_label` in a very small fraction of cases. The F1-score of about 0.11 (weighted) further confirms the poor performance, suggesting a significant mismatch between the zero-shot model's predictions and the rule-based labels.

While weighted precision is noticeably higher (0.57), it is misleading on its own. Weighted recall is 0.0968, identical to accuracy, because support-weighted recall in a single-label multi-class setting reduces to the overall fraction of correct predictions. The gap between precision and recall, coupled with the low accuracy and F1-score, shows that the model's predictions rarely agree with the `rule_label` baseline.
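The weighted recall and the accuracy reported above are the same number, and this is not a coincidence. Weighting each class's recall by its share of the true labels cancels the per-class denominators, leaving the total number of correct predictions divided by the number of samples. The sketch below checks this on a small made-up example using plain Python rather than scikit-learn; the labels are illustrative only.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    return sum((support[c] / n) * (correct[c] / support[c]) for c in support)

# Toy labels, not taken from the resume dataset.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "B", "C", "B", "A", "A"]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```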

**Conclusion and Next Steps:**

Given the very low accuracy, recall, and F1-score, the zero-shot classification model `facebook/bart-large-mnli` is **not performing well** on this specific task with the current rule-based labels as ground truth. This suggests that the generalized knowledge of the BART-large-MNLI model, in a zero-shot setting, does not align well with the specific keyword-based logic of the `rule_label` assignments.

Therefore, **fine-tuning a model like `roberta-base` is highly warranted**. Fine-tuning would allow the model to learn the specific patterns and nuances of the skill-to-job-role mapping derived from our `rule_label` dataset, potentially leading to significantly improved performance. Alternatively, a deeper analysis of the discrepancies between the zero-shot predictions and rule-based labels could inform better prompt engineering or a more robust rule-based system if the goal is to align with the current rule logic.

## Initialize RoBERTa for Sequence Classification

The roberta-base model, although a strong general-purpose language model, was not originally trained for the highly specific task of classifying job roles from resume skills using our custom rule-based labels. Fine-tuning was therefore essential.

First, it enabled domain adaptation, allowing the model to adjust from general text patterns to the unique terminology and structure found in resumes and job descriptions.

Second, our dataset defines explicit keyword-to-role mappings, and fine-tuning allowed the model to learn these exact relationships rather than relying on broad prior knowledge.

Third, zero-shot evaluation using facebook/bart-large-mnli demonstrated that generalized models perform poorly on this task, highlighting inherent limitations without domain-specific supervision.

By training on labeled data, the fine-tuned roberta-base model was able to significantly improve accuracy and F1 performance, confirming that targeted adaptation greatly enhances classification quality for our use case.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Get the number of unique job roles
num_unique_job_roles = len(ALLOWED_JOB_ROLES)

# Initialize the tokenizer for roberta-base
roberta_tokenizer = AutoTokenizer.from_pretrained('roberta-base')
print("RoBERTa tokenizer initialized successfully.")

# Initialize the model for sequence classification for roberta-base
roberta_model = AutoModelForSequenceClassification.from_pretrained('roberta-base', num_labels=num_unique_job_roles)
print(f"RoBERTa model initialized successfully with {num_unique_job_roles} labels.")

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

RoBERTa tokenizer initialized successfully.


model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


RoBERTa model initialized successfully with 14 labels.


# Prepare the data for fine-tuning

The next logical step after initializing the RoBERTa tokenizer and model is to prepare the dataset for fine-tuning. This involves mapping the string labels from 'rule_label' to numerical IDs and then tokenizing the 'skills' text using the `roberta_tokenizer`.



In [None]:
label_to_id = {label: i for i, label in enumerate(ALLOWED_JOB_ROLES)}
id_to_label = {i: label for i, label in enumerate(ALLOWED_JOB_ROLES)}

df['label_id'] = df['rule_label'].map(label_to_id)

# Tokenize the 'skills' column
encodings = roberta_tokenizer(list(df['skills'].values), truncation=True, padding=True)

print("Labels mapped to IDs and 'skills' column tokenized.")

Labels mapped to IDs and 'skills' column tokenized.
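One thing worth checking after this step is that every `rule_label` value actually appears in `ALLOWED_JOB_ROLES`, because pandas `map` with a dictionary yields NaN for any key it cannot find. The sketch below shows the same lookup on a plain list. The label "QA Engineer" is a hypothetical value added to demonstrate the unmapped case and does not come from the dataset.

```python
ALLOWED_JOB_ROLES = [
    'SOC Analyst', 'Cybersecurity Analyst', '.NET Developer',
    'Software Engineer', 'Full Stack Developer', 'JavaScript Developer',
    'Node.js Developer', 'Cloud Engineer', 'DevOps Engineer',
    'Project Coordinator', 'Agile Analyst', 'Database Developer',
    'Data Engineer', 'Other',
]

label_to_id = {label: i for i, label in enumerate(ALLOWED_JOB_ROLES)}
id_to_label = {i: label for label, i in label_to_id.items()}

# "QA Engineer" is a made-up label used to show what a miss looks like.
rule_labels = ["SOC Analyst", "Other", "QA Engineer"]
label_ids = [label_to_id.get(label) for label in rule_labels]
unmapped = [label for label, idx in zip(rule_labels, label_ids) if idx is None]

print(label_ids)
print(unmapped)
```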


**Reasoning**:
With the labels mapped to IDs and the 'skills' column tokenized, the next step is to prepare the dataset for PyTorch. This involves creating a custom `Dataset` class that can hold the tokenized inputs and numerical labels, making it compatible with `DataLoader` for batch processing during fine-tuning.



In [None]:
import torch
from torch.utils.data import Dataset

class ResumeDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

# Create the dataset
dataset = ResumeDataset(encodings, df['label_id'].tolist())

print("Custom Dataset 'ResumeDataset' created from tokenized inputs and labels.")

Custom Dataset 'ResumeDataset' created from tokenized inputs and labels.


## Prepare Dataset for Fine-tuning

We split the dataset into training and evaluation sets using an 80/20 ratio and define a `compute_metrics` function that the `Trainer` uses when evaluating the model.


In [None]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and evaluation sets
train_dataset, eval_dataset = train_test_split(
    dataset, test_size=0.2, random_state=42
)

print(f"Training set size: {len(train_dataset)}")
print(f"Evaluation set size: {len(eval_dataset)}")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=-1)
    accuracy = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions, average='weighted', zero_division=0)
    return {"accuracy": accuracy, "f1": f1}

print("Dataset split into training and evaluation sets, and 'compute_metrics' function defined.")

Training set size: 124
Evaluation set size: 31
Dataset split into training and evaluation sets, and 'compute_metrics' function defined.


## Fine-tune RoBERTa Model

We configure `TrainingArguments` for the fine-tuning process, specifying the learning rate, batch size, number of epochs, and logging interval. We then initialize a `Trainer` with the model, training arguments, the training and evaluation datasets, and our `compute_metrics` function, and start fine-tuning.


In [None]:
from transformers import TrainingArguments, Trainer

# Configure TrainingArguments
training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    logging_dir='./logs',
    logging_steps=10,
    report_to='none' # Disable reporting to services like Weights & Biases
)

# Initialize Trainer
trainer = Trainer(
    model=roberta_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=roberta_tokenizer,
    compute_metrics=compute_metrics
)

# Start fine-tuning
trainer.train()

print("RoBERTa model fine-tuning initiated.")


Step,Training Loss
10,2.3419
20,1.6592
30,1.5165
40,1.6009


RoBERTa model fine-tuning initiated.
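The four rows in the training-loss table follow from the configuration. The sketch below works through the arithmetic, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept (the `TrainingArguments` defaults as far as we know).

```python
import math

train_size = 124        # training set size printed after the split
batch_size = 8          # per_device_train_batch_size
num_epochs = 3          # num_train_epochs
logging_steps = 10      # logging_steps

steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * num_epochs
logged_steps = list(range(logging_steps, total_steps + 1, logging_steps))

print(steps_per_epoch, total_steps, logged_steps)
```

With only 48 optimizer steps in total, the model sees each resume three times, which is a very short training run for a 14-way classifier.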


In [None]:
print("Evaluating the fine-tuned RoBERTa model...")
eval_results = trainer.evaluate()
print(f"Evaluation results: {eval_results}")


Evaluating the fine-tuned RoBERTa model...


Evaluation results: {'eval_loss': 1.3426216840744019, 'eval_accuracy': 0.6774193548387096, 'eval_f1': 0.5471464019851118, 'eval_runtime': 0.6669, 'eval_samples_per_second': 46.486, 'eval_steps_per_second': 5.998, 'epoch': 3.0}


## Compare Fine-tuned RoBERTa with Zero-Shot Performance

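Here we directly compare the performance metrics of the fine-tuned RoBERTa model against the zero-shot `facebook/bart-large-mnli` results. Accuracy and weighted F1-score are available for both models; precision and recall are reported for the zero-shot model only. The zero-shot metrics were computed on all 155 resumes, whereas the fine-tuned metrics come from the 31-resume evaluation split. The comparison highlights the impact of fine-tuning on our job role classification task.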


In [None]:
print("\n--- Model Performance Comparison ---")
print("Zero-Shot Classification (facebook/bart-large-mnli):")
print(f"  Accuracy: {accuracy:.4f}")
print(f"  Precision (weighted): {precision:.4f}")
print(f"  Recall (weighted): {recall:.4f}")
print(f"  F1-score (weighted): {f1:.4f}")

fine_tuned_accuracy = eval_results['eval_accuracy']
fine_tuned_f1 = eval_results['eval_f1']
# Note: Precision and Recall are not directly available from trainer.evaluate() by default without further configuration,
# so we'll compare with available metrics.

print("\nFine-tuned RoBERTa Model:")
print(f"  Accuracy: {fine_tuned_accuracy:.4f}")
print(f"  F1-score (weighted): {fine_tuned_f1:.4f}")

print("\n--- Impact of Fine-tuning ---")
print(f"Accuracy improvement: {fine_tuned_accuracy - accuracy:.4f}")
print(f"F1-score improvement: {fine_tuned_f1 - f1:.4f}")


--- Model Performance Comparison ---
Zero-Shot Classification (facebook/bart-large-mnli):
  Accuracy: 0.0968
  Precision (weighted): 0.5717
  Recall (weighted): 0.0968
  F1-score (weighted): 0.1110

Fine-tuned RoBERTa Model:
  Accuracy: 0.6774
  F1-score (weighted): 0.5471

--- Impact of Fine-tuning ---
Accuracy improvement: 0.5806
F1-score improvement: 0.4362
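The two accuracy figures can be read as simple fractions. The counts of correct predictions below are inferred from the reported accuracies and the set sizes (155 resumes for zero-shot, 31 for the evaluation split); they are not printed anywhere in the notebook.

```python
# Inferred counts: the only whole numbers consistent with the reported values.
zero_shot_correct, zero_shot_total = 15, 155
fine_tuned_correct, fine_tuned_total = 21, 31

zero_shot_accuracy = zero_shot_correct / zero_shot_total
fine_tuned_accuracy = fine_tuned_correct / fine_tuned_total

print(f"Zero-shot:  {zero_shot_accuracy:.4f}")
print(f"Fine-tuned: {fine_tuned_accuracy:.4f}")
print(f"Difference: {fine_tuned_accuracy - zero_shot_accuracy:.4f}")
```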


## Summary:

**Zero-Shot Classification (using facebook/bart-large-mnli):**

*   **Accuracy:** 0.0968
*   **Precision (weighted):** 0.5717
*   **Recall (weighted):** 0.0968
*   **F1-score (weighted):** 0.1110

**Fine-tuned RoBERTa Model:**

*   **Accuracy:** 0.6774
*   **F1-score (weighted):** 0.5471

**Impact of Fine-tuning:**

*   **Accuracy improvement:** 0.5806 (from 0.0968 to 0.6774)
*   **F1-score improvement:** 0.4362 (from 0.1110 to 0.5471)

### Conclusion

Fine-tuning the RoBERTa model resulted in a significant improvement in performance compared to the zero-shot classification. The accuracy rose from approximately 9.7% to 67.7%, and the weighted F1-score from 0.11 to 0.55. This demonstrates the effectiveness of fine-tuning a model on specific, labeled data for this job role classification task.


### Insights or Next Steps

*   **Address Data Sparsity/Imbalance:** Stratified splitting is not possible while some classes have a single member (Database Developer has one resume), which points to class imbalance and limited data for some job roles. Further analysis of the class distribution and techniques like oversampling (e.g., SMOTE) or undersampling could be beneficial.

*   **Hyperparameter Tuning and Regularization:** Explore different learning rates, batch sizes, and regularization techniques (e.g., weight decay, dropout) during fine-tuning to potentially achieve higher accuracy and F1-scores and improve generalization.
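One simple way to make a stratified split feasible, sketched below, is to fold classes with too few examples into the existing 'Other' category before splitting. This is an illustrative option rather than something done in this notebook. The helper `merge_rare_classes` and the toy labels are hypothetical.

```python
from collections import Counter

def merge_rare_classes(labels, min_count=2, fallback="Other"):
    """Replace labels that occur fewer than `min_count` times with `fallback`."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else fallback for label in labels]

# Toy labels chosen to mimic the imbalance, not the real dataset.
labels = (
    ["Software Engineer"] * 4
    + [".NET Developer"] * 2
    + ["Other"] * 2
    + ["Database Developer"]
)

merged = merge_rare_classes(labels)
# Note: the fallback class itself must still end up with at least
# `min_count` members for stratification to work.
print(Counter(merged))
```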
