
# Day 4 – PART 2 Practice Solutions

This notebook covers:

## 🧠 Level 1: Basic Preprocessing
- Label Encoding (Gender)
- One-Hot Encoding (City)
- Min-Max Scaling (Age)
- Standardization (ExperienceInCurrentDomain)

## ⚙️ Level 2: Feature Engineering
- AgeGroup
- YearsInCompany
- Correlation with LeaveOrNot

## 🔥 Level 3: Data Scientist Thinking

## 🧪 Bonus
Reusable preprocessing function.


In [None]:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

df = pd.read_csv("cleaned_employee_attrition.csv")
df.columns = df.columns.str.lower()

df.head()


## 🧠 Level 1: Basic Preprocessing

In [None]:

le = LabelEncoder()
df["gender_encoded"] = le.fit_transform(df["gender"])

df[["gender", "gender_encoded"]].head()
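To see which integer each label received, the fitted encoder's `classes_` attribute can be inspected. The sketch below uses a small made-up frame (`toy`, not the employee dataset) and a hypothetical `gender_mapping` name; it assumes the column holds only the two labels shown.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data, made up for illustration only
toy = pd.DataFrame({"gender": ["Male", "Female", "Female", "Male"]})

le_demo = LabelEncoder()
toy["gender_encoded"] = le_demo.fit_transform(toy["gender"])

# classes_ is sorted; a label's position in it is its integer code
gender_mapping = {str(label): code for code, label in enumerate(le_demo.classes_)}
print(gender_mapping)
```

Because the codes follow sorted label order, they carry no meaning beyond identity, which is acceptable for a binary column like this one.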


In [None]:

df = pd.get_dummies(df, columns=["city"], drop_first=True)
df.head()


In [None]:

minmax = MinMaxScaler()
df["age_minmax"] = minmax.fit_transform(df[["age"]])

df[["age", "age_minmax"]].head()


In [None]:

scaler = StandardScaler()
df["experience_standardized"] = scaler.fit_transform(
    df[["experienceincurrentdomain"]]
)

df[["experienceincurrentdomain", "experience_standardized"]].head()
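Standardization subtracts the column mean and divides by the standard deviation. One detail worth knowing: `StandardScaler` uses the population standard deviation (`ddof=0`), whereas pandas' `Series.std()` defaults to `ddof=1`. The sketch below checks this on a made-up `toy` frame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration only
toy = pd.DataFrame({"experienceincurrentdomain": [0, 2, 3, 5]})
col = toy["experienceincurrentdomain"]

scaled = StandardScaler().fit_transform(toy[["experienceincurrentdomain"]]).ravel()

# Manual z-score; ddof=0 matches what StandardScaler computes
manual = ((col - col.mean()) / col.std(ddof=0)).to_numpy()

print(np.allclose(scaled, manual))
```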


## ⚙️ Level 2: Feature Engineering

In [None]:

def create_age_group(age):
    if age < 30:
        return "Young"
    elif age <= 45:
        return "Mid"
    else:
        return "Senior"

df["agegroup"] = df["age"].apply(create_age_group)

df[["age", "agegroup"]].head()
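The row-wise `apply` above is easy to read. On larger frames, a vectorized alternative is `np.select`, which evaluates the conditions in order and keeps the first match. The sketch below uses the same thresholds on a made-up `toy` frame; it is one possible rewrite, not a required change.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration only; includes the boundary ages
toy = pd.DataFrame({"age": [24, 29, 30, 45, 46]})

# Conditions are checked in order, mirroring the if / elif / else chain
conditions = [toy["age"] < 30, toy["age"] <= 45]
choices = ["Young", "Mid"]

toy["agegroup"] = np.select(conditions, choices, default="Senior")
print(toy)
```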


In [None]:

# Tenure relative to a fixed reference year (2025 is hardcoded here)
df["yearsincompany"] = 2025 - df["joiningyear"]
df[["joiningyear", "yearsincompany"]].head()
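Since the result depends on the year subtracted from, it can help to make that year an explicit argument. The helper below is a hypothetical sketch (`add_years_in_company` and `reference_year` are illustrative names, not part of the exercise), shown on made-up data.

```python
import pandas as pd

def add_years_in_company(frame, reference_year):
    """Return a copy of `frame` with tenure measured against `reference_year`."""
    out = frame.copy()
    out["yearsincompany"] = reference_year - out["joiningyear"]
    return out

# Toy data, made up for illustration only
toy = pd.DataFrame({"joiningyear": [2012, 2017, 2018]})
result = add_years_in_company(toy, reference_year=2018)
print(result)
```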


In [None]:

if df["leaveornot"].dtype == "object":
    df["leaveornot"] = LabelEncoder().fit_transform(df["leaveornot"])

correlation = df.corr(numeric_only=True)["leaveornot"].sort_values(ascending=False)
correlation
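The sorted series above lists the target itself first (correlation 1.0) and pushes strong negative correlations to the bottom. When the goal is to rank features by strength of association, one option is to drop the target and sort by absolute value, as in this sketch on made-up data.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "age": [25, 32, 41, 28, 36, 45],
    "paymenttier": [3, 3, 1, 2, 3, 1],
    "leaveornot": [0, 0, 1, 1, 0, 1],
})

# Correlation of every numeric column with the target, target row removed
corr = toy.corr(numeric_only=True)["leaveornot"].drop("leaveornot")

# Strongest relationships first, regardless of sign
ranked = corr.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```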



## 🔥 Level 3: Data Scientist Thinking

### Which features look most important?
ExperienceInCurrentDomain, YearsInCompany, PaymentTier, and Age look most important.

### Which engineered features improved understanding?
YearsInCompany and AgeGroup improved interpretability.

### What features might cause data leakage?
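Anything recorded after the decision to leave, such as post-resignation features, exit interview data, and final performance reviews, because it would not be available at prediction time.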

### What features would you add with more data?
Salary, promotions, satisfaction score, overtime, and manager rating.


## 🧪 Bonus: Preprocessing Function

In [None]:

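def preprocess_employee_data(df):
    """Return a cleaned, encoded, and scaled copy of the employee data."""
    df = df.copy()
    df.columns = df.columns.str.lower()

    # Fill missing numeric values with each column's median
    df = df.fillna(df.median(numeric_only=True))

    # Label-encode the binary categoricals, one fresh encoder per column
    for col in ["gender", "everbenched"]:
        if col in df.columns:
            df[col] = LabelEncoder().fit_transform(df[col])

    # One-hot encode only the columns still present: `city` was already
    # expanded earlier in this notebook, and get_dummies raises a KeyError
    # when asked for a missing column
    onehot_cols = [c for c in ["city", "education"] if c in df.columns]
    if onehot_cols:
        df = pd.get_dummies(df, columns=onehot_cols, drop_first=True)

    # Standardize the numeric columns that exist in this frame
    for col in ["age", "experienceincurrentdomain"]:
        if col in df.columns:
            df[col] = StandardScaler().fit_transform(df[[col]])

    return df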

processed_df = preprocess_employee_data(df)
processed_df.head()
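The function above fits its scaler on whatever frame it receives. If the data is later split for modelling, a common precaution is to fit the scaler on the training rows only and reuse it on the test rows, so that test statistics do not influence the transformation. The sketch below shows that pattern on made-up data; the split size and `random_state` are arbitrary assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "age": [23, 27, 31, 35, 39, 43, 47, 51],
    "leaveornot": [0, 1, 0, 0, 1, 0, 1, 1],
})

train, test = train_test_split(toy, test_size=0.25, random_state=0)
train, test = train.copy(), test.copy()

# Learn mean and standard deviation from the training rows only
age_scaler = StandardScaler().fit(train[["age"]])

# Apply the same fitted parameters to both splits
train["age_scaled"] = age_scaler.transform(train[["age"]]).ravel()
test["age_scaled"] = age_scaler.transform(test[["age"]]).ravel()

print(train.head())
print(test.head())
```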
