## EQF Classification

### Load the pre-trained model

In [47]:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

pipeline = transformers.pipeline(
    "text-generation", 
    model=model_id, 
    model_kwargs={"torch_dtype": torch.bfloat16}, 
    device_map="auto"
)

### Load Data

In [50]:
import pandas as pd
from sklearn.model_selection import train_test_split

csv_file = './job_descriptions.csv'
df = pd.read_csv(csv_file)

# Keep a tiny sample (0.001% of the rows), stratified by qualification level.
_, sampled_df = train_test_split(df, test_size=0.00001, stratify=df['Qualifications'], random_state=42)

### Prepare Data

In [51]:
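# Merge the free-text columns into one string per job offer, skipping missing values.
text_columns = ['Job Title', 'Job Description', 'skills', 'Responsibilities']
sampled_df = sampled_df.copy()  # work on an explicit copy of the sampled rows
sampled_df['Concatenated'] = sampled_df[text_columns].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)

texts = sampled_df['Concatenated'].tolist()
labels = sampled_df['Qualifications'].tolist()

# train_test_split returns the (train, test) parts in that order. With test_size=0.8,
# the first 20% become the queries to classify and the remaining 80% the few-shot pool.
test_texts, val_texts, test_labels, val_labels = train_test_split(texts, labels, test_size=0.8, random_state=42)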

In [52]:
print("test_texts", len(test_texts))
print("val_texts", len(val_texts))

test_texts 3
val_texts 14


### Build Prompt

In [53]:
import random

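def build_prompt(query, shot_texts, shot_labels, num_shots=4):
    """Build a few-shot prompt from num_shots random examples, ending with the query."""
    # Select random examples (num_shots must not exceed len(shot_texts))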
    selected_indices = random.sample(range(len(shot_texts)), num_shots)
    selected_texts = [shot_texts[i] for i in selected_indices]
    selected_labels = [shot_labels[i] for i in selected_indices]

    # Format the prompt
    prompt = "Complete the academic requirement: \n"
    for text, label in zip(selected_texts, selected_labels):
        prompt += f"Job Offer: {text}\n\nRequirement: {label}\n\n***\n\n"
    prompt += f"Job Offer: {query}\n\nRequirement: "
    return prompt
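
To make the prompt layout easier to inspect, here is a minimal sketch that renders the same template on toy data. The helper name `format_few_shot_prompt` and the sample job offers are hypothetical and not part of the notebook; the helper takes already selected examples instead of sampling them, so its output is deterministic.

```python
def format_few_shot_prompt(query, examples):
    """Render (text, label) pairs and a final query in the layout used by build_prompt."""
    prompt = "Complete the academic requirement: \n"
    for text, label in examples:
        prompt += f"Job Offer: {text}\n\nRequirement: {label}\n\n***\n\n"
    # The prompt ends with an open "Requirement: " slot for the model to fill in.
    prompt += f"Job Offer: {query}\n\nRequirement: "
    return prompt


# Hypothetical toy examples, for illustration only.
toy_examples = [
    ("Web Designer builds responsive pages", "Bachelor's Degree"),
    ("Market Research Analyst studies consumer trends", "Master's Degree"),
]
toy_prompt = format_few_shot_prompt("QA Analyst tests software releases", toy_examples)
print(toy_prompt)
```

Each example is separated by `***`, and the final block leaves the requirement empty, which is what nudges a base model to complete it with a label.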

### Classify

In [54]:
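responses = []

for test_text in test_texts:
    prompt = build_prompt(test_text, val_texts, val_labels, 12)
    # The pipeline returns a list of dicts; keep only the newly generated text,
    # not the prompt (whose few-shot examples already contain labels).
    output = pipeline(prompt, return_full_text=False)
    responses.append(output[0]["generated_text"])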
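
A text-generation model usually keeps writing after the label, so matching against the whole completion can pick up a label that only appears later in the text. One possible mitigation, sketched here, is to look only at the first non-empty line of the completion. The helper `extract_label` is hypothetical, and `KNOWN_LABELS` is an assumption based on the values visible in this notebook's output; the full dataset may contain other qualification levels.

```python
# Assumed label set, taken from the values visible in this notebook's output.
KNOWN_LABELS = ["Bachelor's Degree", "Master's Degree", "PhD or Doctorate"]


def extract_label(generated_text, known_labels=KNOWN_LABELS):
    """Return the known label found on the first non-empty line, or None."""
    for line in generated_text.splitlines():
        line = line.strip()
        if not line:
            continue
        for label in known_labels:
            if label.lower() in line.lower():
                return label
        return None  # the first non-empty line holds no known label
    return None  # the completion was empty or whitespace only


# Toy completions, for illustration only.
clean = " Master's Degree\n\n***\n\nJob Offer: Sales Associate\n\nRequirement: Bachelor's Degree"
chatty = "Here is the completed table:\n| Web Designer | Bachelor's Degree |"
print(extract_label(clean))
print(extract_label(chatty))
```

Returning `None` for a completion that does not start with a label keeps such cases countable as failures instead of silently matching a label further down the text.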

### Evaluate

In [55]:
# Evaluate the responses
correct_predictions = 0

for response, true_label in zip(responses, test_labels):
    print(response, "-", true_label)
    # A prediction counts as correct if the true label appears in the generated text.
    if true_label.lower() in response.lower():
        correct_predictions += 1

# Calculate accuracy
accuracy = correct_predictions / len(test_labels)
print(f"Accuracy: {accuracy:.4f}")

Here is the completed academic requirement table:

| Job Offer | Requirements |
| --- | --- |
| Account Manager | Master's Degree |
| Digital Marketing Specialist (Email) | Bachelor's Degree |
| Digital Marketing Specialist (Email) | PhD or Doctorate |
| Email Marketing Specialist | Master's Degree |
| QA Analyst | PhD or Doctorate |
| Market Research Analyst | Master's Degree |
| Sales Associate | Bachelor's Degree |
| Account Executive | Bachelor's Degree |
| Web Designer | Bachelor's Degree |
| Electrical Engineer | Bachelor's Degree |
| HR Coordinator (Training) | Master's Degree |
| Speech Therapist | Bachelor's Degree |
| Supply Chain Manager |  |

Let me know if you have any further questions! - Bachelor's Degree
Here is the completed academic requirement section:

***

Job Offer: Web Designer
Frontend Web Designers create the visual elements and user interfaces of websites. They use HTML, CSS, and JavaScript to design responsive, user-friendly web pages, ensuring a seamless and