In [2]:
# Load the 'Employee Productivity and Satisfaction HR Data' dataset
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('hr_dashboard_data.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'hr_dashboard_data.csv'
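
The FileNotFoundError above usually means the CSV is not in the notebook's working directory. Below is a minimal sketch of a more defensive load, assuming the file name used in the cell above. The helper `load_hr_data` and its `path` parameter are hypothetical names introduced here for illustration only.

```python
from pathlib import Path

import pandas as pd


def load_hr_data(path="hr_dashboard_data.csv"):
    """Read the HR CSV, failing early with a message that shows where we looked."""
    csv_path = Path(path).expanduser().resolve()
    if not csv_path.is_file():
        # Report both the resolved path and the working directory to ease debugging.
        raise FileNotFoundError(
            f"Expected the dataset at {csv_path}; "
            f"current working directory is {Path.cwd()}"
        )
    return pd.read_csv(csv_path)
```

Once the file is in place, `df = load_hr_data()` would stand in for the plain `pd.read_csv` call.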

# 1. Distribution of Employee Salary

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))
sns.kdeplot(df['Salary'], color='green', ax=ax, fill=True)  # 'shade' is deprecated in favor of 'fill'

# add vertical line at mean
mean = df['Salary'].mean()
ax.axvline(mean, color='black', linestyle='--')

# Add title using ax.set_title()
ax.set_title('Distribution of Employee Salary')

plt.show()

# Bimodality

In [None]:
df[['Salary']].describe()

Interpretation: Salaries range from about 30k to 120k. The KDE plot is bimodal, with two dominant groups: one at roughly 40k-60k and another at 100k-120k. Between the two groups the density dips around the median, at about 80k. This pattern raises the question of which factors contribute to higher compensation and which features are most strongly associated with salary in this dataset.
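
The two-peak reading of the plot can also be checked numerically. Below is a minimal sketch that counts the prominent peaks of a Gaussian KDE. The helper `kde_modes`, its `grid_size` and `min_prominence` parameters, and the synthetic salaries are all assumptions made for illustration; the real check would pass `df['Salary']` instead.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde


def kde_modes(values, grid_size=512, min_prominence=0.05):
    """Return the x-positions of the prominent peaks of a Gaussian KDE."""
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min(), values.max(), grid_size)
    density = gaussian_kde(values)(grid)
    # Ignore ripples whose prominence is below a fraction of the tallest peak.
    peaks, _ = find_peaks(density, prominence=min_prominence * density.max())
    return grid[peaks]


# Synthetic stand-in for df['Salary'] (NOT the real data): two clusters
# placed near the 40k-60k and 100k-120k groups described above.
rng = np.random.default_rng(0)
synthetic_salary = np.concatenate([
    rng.normal(50_000, 5_000, 100),
    rng.normal(110_000, 5_000, 100),
])
modes = kde_modes(synthetic_salary)
print(modes.round(-3))
```

Two returned positions would support the bimodal reading. The result depends on the KDE bandwidth, so it is a sanity check rather than a formal test.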

# 2. Distribution of Salary by Gender

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))
sns.kdeplot(data=df, x='Salary', hue='Gender', fill=True, common_norm=False, alpha=0.4, ax=ax)
# Add title using ax.set_title()
ax.set_title('Distribution of Salary by Gender')
plt.show()


In [None]:
df.groupby('Gender')[['Salary']].describe()

Interpretation: The dataset covers 200 employees, split evenly between 100 males and 100 females. This balance is consistent with fair and unbiased hiring practices, although head counts alone cannot confirm it. Female salaries range from 30k to 117k and male salaries from 31k to 120k. In the plot, females are slightly more concentrated in the higher range above 100k, and males slightly more concentrated around 60k. Overall, the two distributions are nearly equivalent, with a mean salary of about 76k. This suggests no gender pay disparity in the data, which is consistent with equal opportunity for compensation and promotion in similar roles with comparable experience.
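
The visual comparison can be backed by a simple significance test. The sketch below uses Welch's t-test, which is a choice made here and not part of the original analysis. The helper `salary_gap_by_gender` is hypothetical, the group labels `'Male'` and `'Female'` are assumed to match the values in the `Gender` column, and the toy frame is made up purely to show the call.

```python
import pandas as pd
from scipy.stats import ttest_ind


def salary_gap_by_gender(frame, a="Male", b="Female"):
    """Return (mean difference a - b, Welch t-test p-value) for Salary."""
    sal_a = frame.loc[frame["Gender"] == a, "Salary"]
    sal_b = frame.loc[frame["Gender"] == b, "Salary"]
    # equal_var=False selects Welch's test, which does not assume equal variances.
    _, p_value = ttest_ind(sal_a, sal_b, equal_var=False)
    return sal_a.mean() - sal_b.mean(), p_value


# Tiny made-up frame, only to show the call; the real check would pass df.
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Salary": [40_000, 76_000, 112_000, 40_000, 76_000, 112_000],
})
gap, p_value = salary_gap_by_gender(toy)
print(f"Mean gap: {gap:.0f}, p-value: {p_value:.3f}")
```

A large p-value on the real data would agree with the reading above. Because the salary distribution is bimodal, a rank-based test may be a reasonable alternative.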

# 3. Distribution of Salary by Positions

In [None]:
order = df.groupby("Position")["Salary"].mean().sort_values(ascending=False).index

In [None]:
# KDE plot with hues
fig, ax = plt.subplots(figsize=(8, 6))
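# Use the mean-salary ordering computed above so the legend runs from highest- to lowest-paid position
sns.kdeplot(data=df, x='Salary', hue='Position', hue_order=list(order), fill=True, common_norm=False, alpha=0.4, ax=ax)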
# Add title using ax.set_title()
ax.set_title('Distribution of Salary by Position')
plt.show()

In [None]:
df.groupby('Position')[['Salary']].describe()

Interpretation: The salary distribution by job title shows Managers at the top of the compensation range, at 100k-120k. Team Leads overlap slightly with Managers, at 90k-108k. Surprisingly, Senior Developers do not reach the 100k group; their salaries range from 80k to 90k. Below them, Analysts range from 60k to 75k, followed by Junior Developers at 45k-60k. Interns earn the least, at 30k-40k. Recalling the bimodal salary distribution shown earlier, Junior Developers make up the dominant group at 40k-60k and Managers the other dominant group at 100k-120k.
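
The salary bands quoted above can be read off a compact table instead of the full `describe()` output. This is a minimal sketch; the helper `salary_bands` is a hypothetical name and the toy rows are made up for illustration.

```python
import pandas as pd


def salary_bands(frame):
    """Min, max and mean Salary per Position, highest-paid positions first."""
    bands = frame.groupby("Position")["Salary"].agg(["min", "max", "mean"])
    return bands.sort_values("mean", ascending=False)


# Made-up rows for illustration only; pass df to see the real bands.
toy = pd.DataFrame({
    "Position": ["Manager", "Manager", "Intern", "Intern", "Analyst"],
    "Salary": [100_000, 120_000, 30_000, 40_000, 65_000],
})
print(salary_bands(toy))
```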

# 4. Distribution of Employees by Position

In [None]:
count_Pos = df['Position'].value_counts().reset_index()
count_Pos.columns=['Position','Number']
count_Pos

In [None]:
order = count_Pos.groupby("Position")["Number"].mean().sort_values(ascending=False).index

In [None]:
# Create a palette mapping for the unique positions
positions = df['Position'].unique()
palette = dict(zip(positions, sns.color_palette()))

In [None]:
# 'ci' is deprecated; errorbar=None (seaborn >= 0.12) turns off the error bars
sns.barplot(data=count_Pos, x='Position', y='Number', order=order, palette=palette, errorbar=None, alpha=0.7)
plt.xticks(rotation=90)
# Add title to the plot
plt.title('Distribution of Employees by Position')
plt.show()

Interpretation: The bar plot shows the number of employees in each position. Managers lead with 40 employees, followed by Junior Developers at 35. The remaining positions have roughly equal head counts in the 30+ range. Managers are therefore the largest group and also hold the highest salaries, and the top ten percentile of salaries consists entirely of Managers. Interestingly, no Team Leads or Senior Developers make it into the top ten percentile.
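
The claim that the top ten percentile is made up entirely of Managers can be checked directly by counting positions above the 90th-percentile cutoff. This is a minimal sketch; the helper `top_decile_positions` is a hypothetical name and the toy frame is made up for illustration.

```python
import pandas as pd


def top_decile_positions(frame, q=0.9):
    """Count positions among employees whose Salary exceeds the q-th quantile."""
    cutoff = frame["Salary"].quantile(q)
    return frame.loc[frame["Salary"] > cutoff, "Position"].value_counts()


# Made-up rows for illustration only; pass df for the real composition.
toy = pd.DataFrame({
    "Position": ["Intern", "Intern", "Junior Developer", "Junior Developer",
                 "Analyst", "Senior Developer", "Team Lead", "Manager",
                 "Manager", "Manager"],
    "Salary": [30_000, 35_000, 50_000, 55_000, 70_000,
               85_000, 95_000, 105_000, 110_000, 120_000],
})
print(top_decile_positions(toy))
```

If the result on `df` lists only `Manager`, the statement holds for this dataset.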

# 5. Distribution of Top Ten Percentile Salary by Department

In [None]:
top_ten_quan = df.Salary.quantile(0.9)
dfTopTen = df[df['Salary'] > top_ten_quan]
dfTopTen

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))
sns.kdeplot(data=dfTopTen, x='Salary', hue='Department', fill=True, common_norm=False, alpha=0.4, ax=ax)
# Add title using ax.set_title()
ax.set_title('Top Ten Percentile Salary by Department')
plt.show()

In [None]:
dfTopTen.groupby('Department')[['Salary']].describe()

Interpretation: The top ten percentile consists of 20 Managers across five departments: five in Finance, three in Marketing, three in IT, six in Sales, and three in HR. In Finance, Marketing, IT, and Sales, the highest salaries reach 118k-119k. The three HR salaries are tightly clustered around a mean of 116k. HR therefore appears to have a consistent pay structure, whereas compensation in the other departments is more spread out.
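
How tightly each department's top salaries are clustered can be quantified with the standard deviation and range per group. This is a minimal sketch; the helper `salary_spread` and its `by` parameter are hypothetical, and the toy values are made up (the real call would pass `dfTopTen`).

```python
import pandas as pd


def salary_spread(frame, by="Department"):
    """Head count, mean, standard deviation and range of Salary per group."""
    return frame.groupby(by)["Salary"].agg(
        n="count",
        mean="mean",
        std="std",
        spread=lambda s: s.max() - s.min(),  # range: max minus min
    )


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Department": ["HR", "HR", "HR", "Sales", "Sales", "Sales"],
    "Salary": [116_000, 116_000, 116_000, 112_000, 115_000, 119_000],
})
print(salary_spread(toy))
```

With only three to six employees per department, these statistics are rough indicators rather than firm evidence of a pay policy.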

# 6. Relationship Between Salary and Projects Completed

In [None]:
df.plot.scatter(x='Projects Completed', y='Salary', title='Projects Completed vs. Salary')

In [None]:
df.groupby('Position')[['Projects Completed']].describe()


In [None]:
from scipy.stats import pearsonr
corr, p_value = pearsonr(df['Salary'], df['Projects Completed'])
print(f"Correlation: {corr}, P-value: {p_value}")

Interpretation: Salary and projects completed show a strong positive correlation of 0.87, which is visible in the scatter plot. This could reflect performance-based compensation, in which employees are rewarded for completing projects. If so, it may motivate employees to work harder and strive for excellence, knowing that their efforts will be rewarded. It will be interesting to see how employees' productivity rate (%) and satisfaction rate (%) relate to each other and how they affect salary.
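
Section 3 shows that salary is largely tiered by position, so the pooled correlation may partly reflect seniority and not only pay for performance. One way to probe this is to compute the correlation inside each position. This is a minimal sketch; the helper `correlation_within_position` is a hypothetical name and the toy frame is made up for illustration.

```python
import pandas as pd


def correlation_within_position(frame):
    """Pearson r between Salary and Projects Completed inside each Position."""
    return pd.Series({
        position: group["Salary"].corr(group["Projects Completed"])
        for position, group in frame.groupby("Position")
    })


# Made-up rows for illustration only; pass df for the real values.
toy = pd.DataFrame({
    "Position": ["A", "A", "A", "B", "B", "B"],
    "Salary": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "Projects Completed": [1, 2, 3, 3, 2, 1],
})
print(correlation_within_position(toy))
```

If the within-position correlations on `df` stay high, the performance-based reading gains support. If they drop sharply, position is likely driving much of the 0.87.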