<a href="https://www.kaggle.com/code/aabdollahii/jobrecom?scriptVersionId=266637996" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="background-color:#1e1e1e; color:#f2f2f2; padding:20px; font-family:Arial, sans-serif">
    <h2 style="color:#00c8ff;">Step 1: Load & Inspect Dataset</h2>
    <p>In this step we aim to:</p>
    <ul>
        <li>Load the dataset from the Kaggle environment.</li>
        <li>Preview the first few rows to understand its structure.</li>
        <li>Generate a statistical summary of all columns.</li>
        <li>Save these insights as a <b>Dark Theme HTML report</b> for easy viewing.</li>
    </ul>
    <p>This inspection is essential to understand the dataset before any preprocessing or model building.</p>
</div>


In [12]:
import pandas as pd

# Path to dataset in Kaggle environment
path = "/kaggle/input/job-descriptions-2025-tech-and-non-tech-roles/job_dataset.csv"

# Load dataset
df = pd.read_csv(path)

# Preview first 5 rows
print("Preview of dataset:")
print(df.head())

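# General info (df.info() prints its report directly and returns None,
# so wrapping it in print() would emit a stray "None")
print("\nDataset Info:")
df.info()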

# Summary statistics
print("\nSummary Statistics:")
print(df.describe(include='all'))


Preview of dataset:
       JobID           Title ExperienceLevel YearsOfExperience  \
0  NET-F-001  .NET Developer         Fresher               0-1   
1  NET-F-002  .NET Developer         Fresher               0-1   
2  NET-F-003  .NET Developer         Fresher               0-1   
3  NET-F-004  .NET Developer         Fresher               0-1   
4  NET-F-005  .NET Developer         Fresher               0-1   

                                              Skills  \
0  C#; VB.NET basics; .NET Framework; .NET Core f...   
1  C#; .NET Framework basics; ASP.NET; Razor; HTM...   
2  C#; VB.NET basics; .NET Core; ASP.NET MVC; HTM...   
3  C#; .NET Framework; ASP.NET basics; SQL Server...   
4  C#; ASP.NET; MVC; Entity Framework basics; SQL...   

                                    Responsibilities  \
0  Assist in coding and debugging applications; L...   
1  Write simple C# programs under guidance; Suppo...   
2  Contribute to development of small modules; As...   
3  Support in software
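
The cell above prints the preview and summary but does not write the dark-theme HTML report listed among the goals of this step. A minimal sketch of how that could be done is shown below. It assumes a small toy DataFrame in place of the real dataset, and both the helper name `build_dark_report` and the output file name `dataset_report.html` are hypothetical choices made for illustration.

```python
import pandas as pd

# Toy stand-in for the real dataset; column names mirror the preview above.
toy_df = pd.DataFrame({
    "JobID": ["NET-F-001", "NET-F-002"],
    "Title": [".NET Developer", ".NET Developer"],
    "ExperienceLevel": ["Fresher", "Fresher"],
})

def build_dark_report(frame: pd.DataFrame, n_rows: int = 5) -> str:
    """Return an HTML page with a preview and a summary table on a dark background."""
    preview_html = frame.head(n_rows).to_html(index=False)
    summary_html = frame.describe(include="all").to_html()
    return (
        "<html><head><meta charset='utf-8'><style>"
        "body{background-color:#1e1e1e;color:#f2f2f2;"
        "font-family:Arial,sans-serif;padding:20px}"
        "h2{color:#00c8ff}"
        "th,td{padding:4px 10px;border-bottom:1px solid #444}"
        "</style></head><body>"
        f"<h2>Preview</h2>{preview_html}"
        f"<h2>Summary Statistics</h2>{summary_html}"
        "</body></html>"
    )

report_html = build_dark_report(toy_df)

# Hypothetical output file name, written to the current working directory.
with open("dataset_report.html", "w", encoding="utf-8") as f:
    f.write(report_html)
```

In the notebook itself, passing `df` instead of `toy_df` should produce the same kind of report for the full dataset.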

<div style="background-color:#1e1e1e; color:#f2f2f2; padding:20px; font-family:Arial, sans-serif">
    <h2 style="color:#00c8ff;">Step 2: Preprocess Skills Column</h2>
    <p>In this step we will:</p>
    <ul>
        <li>Clean and normalize the <b>Skills</b> column to prepare for vectorization.</li>
        <li>Convert all text to lowercase for uniformity.</li>
        <li>Replace different separators (e.g., semicolons) with spaces.</li>
        <li>Remove extra spaces to ensure clean tokenization later.</li>
    </ul>
    <p>The results will be saved in a new column called <code>Skills_clean</code>, which we will use for embeddings and cosine similarity calculations.</p>
</div>


In [13]:
# Create cleaned skills column
df['Skills_clean'] = (
    df['Skills']
    .astype(str)                           # Ensure string type (NaN becomes the string "nan")
    .str.lower()                           # Lowercase
    .str.replace(';', ' ', regex=False)    # Replace semicolons with spaces
    .str.strip()                           # Trim leading/trailing whitespace
    .str.replace(r'\s+', ' ', regex=True)  # Collapse runs of whitespace into one space
)

# Preview cleaned skills
print("Preview of cleaned skills:")
print(df[['Skills', 'Skills_clean']].head())


Preview of cleaned skills:
                                              Skills  \
0  C#; VB.NET basics; .NET Framework; .NET Core f...   
1  C#; .NET Framework basics; ASP.NET; Razor; HTM...   
2  C#; VB.NET basics; .NET Core; ASP.NET MVC; HTM...   
3  C#; .NET Framework; ASP.NET basics; SQL Server...   
4  C#; ASP.NET; MVC; Entity Framework basics; SQL...   

                                        Skills_clean  
0  c# vb.net basics .net framework .net core fund...  
1  c# .net framework basics asp.net razor html cs...  
2  c# vb.net basics .net core asp.net mvc html cs...  
3  c# .net framework asp.net basics sql server ht...  
4  c# asp.net mvc entity framework basics sql ser...  
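
Because the preview truncates each row, the effect of the cleaning chain is hard to see in full. The sketch below applies the same four steps to a single string using only the standard library. The function name `clean_skills` and the input string are made up for illustration and are not taken from the dataset.

```python
import re

def clean_skills(raw: str) -> str:
    """Apply the same steps as the pandas chain to one string."""
    text = str(raw).lower()            # lowercase
    text = text.replace(";", " ")      # semicolons become spaces
    text = text.strip()                # trim leading/trailing whitespace
    return re.sub(r"\s+", " ", text)   # collapse runs of whitespace

example = "C#;  VB.NET basics ; .NET Core"  # made-up input
cleaned = clean_skills(example)
print(cleaned)  # prints: c# vb.net basics .net core
```

Note that the boundaries between individual skills disappear after this step. That is usually acceptable when the whole string is embedded as one sentence, but it is worth keeping in mind if per-skill matching is needed later.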



<div style="background-color:#1e1e1e; color:#f2f2f2; padding:20px; font-family:Arial, sans-serif">
    <h2 style="color:#00c8ff;">Step 3: Create Embeddings for Skills</h2>
    <p>In this step we will:</p>
    <ul>
        <li>Use the <code>SentenceTransformer</code> model <b>paraphrase-MiniLM-L6-v2</b> to convert each job's cleaned skill list into a numerical vector (embedding).</li>
        <li>These embeddings capture the semantic meaning of skills, enabling better matching beyond exact keywords.</li>
        <li>We will store these embeddings in memory for fast similarity searches later in our recommendation engine.</li>
    </ul>
    <p>This transformation is a core part of building our cosine similarity-based job recommendation system.</p>
</div>
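
To make the cosine-similarity idea concrete, here is a minimal sketch that ranks jobs against a query vector. It uses hand-written 3-dimensional toy vectors rather than real model output, and the names `cosine_similarity`, `rank_jobs`, and the job identifiers are assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def rank_jobs(query_vec, job_vectors, top_k=2):
    """Return the top_k (job_id, score) pairs, highest score first."""
    scored = [
        (job_id, cosine_similarity(query_vec, vec))
        for job_id, vec in job_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy "embeddings"; real ones would come from model.encode(...)
job_vectors = {
    "job_a": [1.0, 0.0, 0.0],
    "job_b": [0.6, 0.8, 0.0],
    "job_c": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.0, 0.0]

top_matches = rank_jobs(query_vec, job_vectors)
for job_id, score in top_matches:
    print(job_id, round(score, 3))
```

With real embeddings, the same ranking could likely be done in one call with `sentence_transformers.util.cos_sim`, which operates on whole tensors at once. The pure-Python version is shown here only to expose the arithmetic.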


In [14]:
!pip install -q sentence-transformers

from sentence_transformers import SentenceTransformer

# Load a Kaggle-safe model
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Encode skills into embeddings
skills_embeddings = model.encode(df['Skills_clean'].tolist(), convert_to_tensor=True)

print("✅ Embeddings created successfully.")
print("Embedding tensor shape:", skills_embeddings.shape)



RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-68e6a388-1c45e67d48514beb0359d3c1;96650862-a38e-4907-8496-d9ce1f616559)

Entry Not Found for url: https://huggingface.co/api/models/sentence-transformers/paraphrase-MiniLM-L6-v2/tree/main/additional_chat_templates?recursive=false&expand=false.
additional_chat_templates does not exist on "main"