In [None]:
import os

# Move up one directory so relative paths like "data/..." resolve from the project root
os.chdir("..")

In [None]:
# List the contents of the current working directory
os.listdir()

## 📦 Step 0 – Load the Data

I'm starting by loading the resumes dataset using pandas. It includes columns like name, job title, experience, skills, and education.


In [30]:
# Import pandas to work with data
import pandas as pd

# Load my dataset
df = pd.read_csv("data/resumes.csv")  # path matches my project folder structure

# Peek at the first few rows
df.head()


FileNotFoundError: [Errno 2] No such file or directory: 'data/resumes.csv'
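
The `FileNotFoundError` above means the relative path does not resolve from the current working directory. As a minimal sketch of one way to make the load less dependent on where the notebook is launched, I can search the working directory and its parents for the file. `find_data_file` is a hypothetical helper I'm introducing here (it is not part of pandas), and I'm assuming the file really lives at `data/resumes.csv` somewhere above the notebook.

```python
from pathlib import Path


def find_data_file(relative_path, start=None):
    """Search `start` and each of its parent folders for `relative_path`.

    Hypothetical helper: returns the first existing match, or None.
    """
    start = Path(start).resolve() if start is not None else Path.cwd()
    for folder in [start, *start.parents]:
        candidate = folder / relative_path
        if candidate.exists():
            return candidate
    return None


# Assumed location of the dataset relative to the project root
csv_path = find_data_file("data/resumes.csv")
if csv_path is None:
    print("data/resumes.csv not found starting from", Path.cwd())
else:
    print("Found dataset at", csv_path)
```

If a path is found, it can be passed straight to `pd.read_csv(csv_path)`. If not, the message at least shows which directory the search started from.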

## 👀 Step 1 – Initial Data Preview

I'm checking the overall structure of the dataset and seeing if any values are missing or badly formatted.


In [None]:
# Check data structure and non-null values
df.info()

# Check for missing values
df.isnull().sum()
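
Raw missing-value counts are hard to judge without the row count next to them. Here is a small sketch that shows counts and percentages side by side. The `sample` frame is made-up data standing in for the real resumes, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration; the real data comes from data/resumes.csv
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "job_title": ["Analyst", None, "Engineer", "Analyst"],
    "skills": ["Python;SQL", None, None, "Excel"],
})

# isnull().sum() counts missing cells; isnull().mean() gives the fraction per column
missing = pd.DataFrame({
    "missing_count": sample.isnull().sum(),
    "missing_pct": (sample.isnull().mean() * 100).round(1),
})

print(missing.sort_values("missing_count", ascending=False))
```

Swapping `sample` for `df` should give the same summary for the full dataset.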


## 🔍 Step 2 – Job Titles & Experience Overview

I want to see how many distinct job titles there are, how often each one appears, and how years of experience are distributed across the dataset.


In [None]:
# Count job titles
df['job_title'].value_counts()


In [None]:
# Describe experience years
df['experience_years'].describe()
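
To connect the two views, experience can also be summarized per job title. This is only a sketch on made-up rows. It reuses the `job_title` and `experience_years` column names from above, but the values are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "job_title": ["Analyst", "Engineer", "Analyst", "Engineer", "Designer"],
    "experience_years": [2, 5, 4, 9, 3],
})

# Number of resumes plus mean and median experience for each job title
by_title = (
    sample.groupby("job_title")["experience_years"]
    .agg(["count", "mean", "median"])
    .sort_values("count", ascending=False)
)

print(by_title)
```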


## 🛠️ Step 3 – Top Skills Analysis

The `skills` column contains many skills in a single string separated by `;`. I'm going to split them and count how often each skill shows up.


In [None]:
from collections import Counter

all_skills = []

# Split each skills string on ';', skipping missing values and empty entries
for skill_str in df['skills'].dropna():
    all_skills.extend([s.strip() for s in skill_str.split(';') if s.strip()])

# Count frequency of each skill
skill_counts = Counter(all_skills)

# Display top 10 skills
skill_counts.most_common(10)
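
The same split-and-count can also be done with pandas string methods instead of a Python loop. This is an alternative sketch, not a replacement for the `Counter` approach. The `sample` frame is made-up data that includes a missing value and stray whitespace to show how they are handled.

```python
import pandas as pd

# Made-up rows for illustration, including a missing value and a trailing ';'
sample = pd.DataFrame({
    "skills": ["Python; SQL", "SQL;Excel", None, "Python;SQL; "],
})

skill_series = (
    sample["skills"]
    .dropna()          # skip rows with no skills listed
    .str.split(";")    # each row becomes a list of skills
    .explode()         # one skill per row
    .str.strip()       # remove surrounding whitespace
)

# Drop empty strings left behind by trailing separators
skill_series = skill_series[skill_series != ""]

skill_freq = skill_series.value_counts()
print(skill_freq.head(10))
```

`value_counts()` returns the frequencies already sorted in descending order, so `head(10)` plays the same role as `most_common(10)`.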


## 📊 Step 4 – Visualizing Top 10 Skills

To get a clearer view of the top skills, I'm creating a horizontal bar chart. This makes it easy to see at a glance which skills appear most often and how large the gaps between them are.


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Prepare top 10 skills
top_skills = skill_counts.most_common(10)
skills, counts = zip(*top_skills)

# Create bar plot
plt.figure(figsize=(10, 5))
sns.barplot(x=list(counts), y=list(skills))
plt.title('Top 10 Skills in the Dataset')
plt.xlabel('Frequency')
plt.ylabel('Skill')
plt.tight_layout()
plt.show()
