1. Introduction:

Human Resource (HR) Data of a Multi-national Corporation (MNC)

This dataset contains HR information for employees of a multinational corporation (MNC). It includes 2 million (20 lakh) employee records with details on personal identifiers, job-related attributes, performance, employment status, and salary.
The dataset can be used for HR analytics, including workforce distribution, attrition analysis, salary trends, and performance evaluation.

2. ASK:

*What are the purposes of this analysis?*
- Workforce Distribution: Analyzing the demographics and structure of the workforce across different departments, locations, or roles. This could help management understand the composition of their staff and identify any imbalances.

- Attrition Analysis: Studying employee turnover to understand why employees are leaving the company. By analyzing data on terminated employees, an organization can identify trends and potential root causes of attrition, such as low job satisfaction or poor management.

- Salary Trends: Examining salary data to identify patterns, ensure fair compensation, and detect potential pay gaps. This can be used for benchmarking salaries against industry standards or for making informed decisions on compensation adjustments.

- Performance Evaluation: Assessing employee performance to identify high-achievers, pinpoint areas for improvement, and link performance to other factors like salary or retention.

*And they will be achieved by answering these questions:*

Q.1) What is the distribution of Employee Status (Active, Resigned, Retired, Terminated) ?

Q.2) What is the distribution of work modes (On-site, Remote) ?

Q.3) How many employees are there in each department ?

Q.4) What is the average salary by Department ?

Q.5) Which job title has the highest average salary ?

Q.6) What is the average salary in different Departments based on Job Title ?

Q.7) How many employees Resigned & Terminated in each department ?

Q.8) How does salary vary with years of experience ?

Q.9) What is the average performance rating by department ?

Q.10) Which country has the highest concentration of employees ?

Q.11) Is there a correlation between performance rating and salary ?

Q.12) How has the number of hires changed over time (per year) ?

Q.13) Compare salaries of Remote vs. On-site employees — is there a significant difference ?

Q.14) Find the top 10 employees with the highest salary in each department.

Q.15) Identify departments with the highest attrition rate (Resigned %).

3. PREPARE:
*Data observation:*
- License: Database Contents License (DbCL) v1.0
This license is highly permissive, but it gives no warranty of the data's accuracy, freedom from errors, or fitness for any purpose, so the dataset should be used for practice only.
- Validation Date: Last update was August 23, 2025.

*Process preparation:*
Setup library and import dataset

In [2]:
# Import libraries
import polars as pl
import seaborn as sns
import matplotlib.pyplot as plt
import os

# The helpers from the local `utils` module (check_missing_values, check_information,
# check_duplicates, display_duplicates) are not imported: the module is not available
# in this environment (ModuleNotFoundError), and the checks below use Polars built-ins.

# Import dataset
df = pl.read_csv("HR_Data_MNC_Data Science Lovers.csv")

# Exchange rate: INR per 1 USD
n = 87.63

In [None]:
print("Show dataframe structure")
print("---------------------------------------------------------")
display(df.head(50))
print("check_missing_values")
print("---------------------------------------------------------")
display(df.null_count())
print("check_information")
print("---------------------------------------------------------")
display(df.describe())
print("check_duplicates")
# Count fully duplicated rows; df.unique(subset=['column']) can be used to check specific columns
duplicates = df.is_duplicated().sum()
display(duplicates)
if duplicates > 0:
    print(f"Duplicate rows found: {duplicates}")
else:
    print("No duplicate rows found.")

*Initial Data Exploration:*
1) Unnamed: 0 – Index column (auto-generated, not useful for analysis, will be deleted).
2) Employee_ID – Unique identifier assigned to each employee (e.g., EMP0000001).
3) Full_Name – Full name of the employee.
4) Department – Department in which the employee works (e.g., IT, HR, Marketing, Operations).
5) Job_Title – Designation or role of the employee (e.g., Software Engineer, HR Manager).
6) Hire_Date – The date when the employee was hired by the company.
7) Location – Geographical location of the employee (city, country).
8) Performance_Rating – Performance evaluation score (numeric scale, higher is better).
9) Experience_Years – Number of years of professional experience the employee has.
10) Status – Current employment status (e.g., Active, Resigned).
11) Work_Mode – Mode of working (e.g., On-site, Hybrid, Remote).
12) Salary_INR – Annual salary of the employee in Indian Rupees.

*Data Cleaning*
The checks above for missing values, duplicate rows, data-entry consistency, and column names found no problems.

4. PROCESS:

In [None]:
df = df.with_columns([
    pl.col("Hire_Date").str.to_date(),
    pl.col('Salary_INR').cast(pl.Float64)
])

In [None]:
display(df)
# List the distinct values of each column to spot inconsistent entries
for col in df.columns:
    print(f'The column {col} has unique values: {df[col].unique()}')

In [None]:
display(df['Job_Title'].unique().to_list())

In [None]:
# Define job levels based on common corporate hierarchy
level = {
    'C-level': ['CTO', 'CFO', 'CEO'],
    'V-level': ['VP'],
    'D-level': ['Director'],
    'B-level': ['Manager'],
    'Employee': ['Specialist', 'Engineer', 'Accountant', 'Technician', 'Scientist', 'Executive', 'Analyst', 'Developer', 'Coordinator', 'Strategist']
}

CLevel_cond = pl.col('Job_Title').str.contains('|'.join(level['C-level']))
VLevel_cond = pl.col('Job_Title').str.contains('|'.join(level['V-level']))
DLevel_cond = pl.col('Job_Title').str.contains('|'.join(level['D-level']))
BLevel_cond = pl.col('Job_Title').str.contains('|'.join(level['B-level']))
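# Employee_cond is defined for completeness but is not used below: the chain falls back to
# 'Employee' through otherwise(), so titles matching none of the keywords also get that label.
Employee_cond = pl.col('Job_Title').str.contains('|'.join(level['Employee']))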

df_with_level = df.with_columns(
    pl.when(CLevel_cond).then(pl.lit('C-level'))
    .when(VLevel_cond).then(pl.lit('V-level'))
    .when(DLevel_cond).then(pl.lit('D-level'))
    .when(BLevel_cond).then(pl.lit('B-level'))
    .otherwise(pl.lit('Employee'))
    .alias('Level')
)
display(df_with_level)

In [None]:
df_with_level = df_with_level.with_columns(
    pl.col('Hire_Date').dt.year().alias('Hire_Year'),
    pl.col('Hire_Date').dt.month().alias('Hire_Month'),
    pl.col('Hire_Date').dt.weekday().alias('Hire_Weekday'),  # ISO weekday: 1 = Monday, 7 = Sunday
    (pl.col('Salary_INR')/n).alias('Salary_USD')  # n is the INR-per-USD rate defined above
)
display(df_with_level)

Notable values in several fields:
1) Department: Finance, IT, Operations, HR, Marketing, Sales, and R&D.
2) Management levels: ranging from C-level down to Employee.

In [None]:
# Export to csv
#df_with_level.write_csv('HR_Data_MNC_with_levels.csv')

5. ANALYSIS:

In [None]:
%%HTML

<div class='tableauPlaceholder' id='viz1756313292855' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;HR&#47;HR_Data_MNC&#47;Sheet1&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='HR_Data_MNC&#47;Sheet1' /><param name='tabs' value='yes' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;HR&#47;HR_Data_MNC&#47;Sheet1&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-GB' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1756313292855');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

6. SHARING & ACT: