# **EMPLOYEE PERFORMANCE ANALYSIS for INX FUTURE Inc.**

<div align="center">

| **Filing Data**              | **Details**                             |
|------------------------------|-----------------------------------------|
| **Candidate Name**           | Norman Gwangwava                        |
| **Candidate Email**          | eng.normie@gmail.com                    |
| **Project Code**             | 10281                                   |
| **REP Name**                 | DataMites™ Solutions Pvt Ltd            |
| **Venue Name**               | Open Project                            |
| **Exam Country**             | India                                   |
| **Assessment ID**            | E10901-PR2-V18                          |
| **Module**                   | Certified Data Scientist - Project      |
| **Language**                 | English                                 |
| **Exam Format**              | Open Project - IABAC™ Submission        |
| **Registered Trainer**       | Ashok Kumar A                           |
| **Project Assessment**       | IABAC™                                  |
| **Project Documents**        | [Project Scenario](http://www.iabac.org/exam/p2/CDS_Project_2_INX_Future_Emp_Data_V1.6.pdf), [Project Submission Guidelines](http://www.iabac.org/exam/p2/IABAC_CDS_Project_Submission_Guidelines_V1.2.pdf) |
| **Submission Deadline Date** | 18-Jan-2025 @23:59 Hrs [IST]            |


</div>


---


# **Problem Statement**


---

INX Future Inc. (referred to as INX) is one of the leading data analytics and automation solutions providers, with over 15 years of global business presence. INX has consistently been rated among the top 20 best employers for the past 5 years. INX human resource policies are considered employee-friendly and are widely perceived as best practices in the industry.

In recent years, the employee performance indexes have not been healthy, and this is becoming a growing concern among top management. Escalations on service delivery have increased, and client satisfaction levels have come down by 8 percentage points.

The CEO, Mr. Brain, decided to initiate a data science project that analyses the current employee data and finds the core underlying causes of these performance issues. Project findings are expected to help the company take the right course of action and provide clear indicators of non-performing employees, so that any penalization of non-performing employees, if required, does not significantly affect the morale of other employees.

The original dataset for this analysis is from [IABAC](http://data.iabac.org/exam/p2/data/INX_Future_Inc_Employee_Performance_CDS_Project2_Data_V1.8.xls).

Expected insights are:

- Department-wise performances,
- Top 3 important factors affecting employee performance,
- A trained model which can predict employee performance based on input factors. This will be used to hire employees,
- Recommendations to improve employee performance based on insights from the analysis.


---


# **Solution Approach**


---



The project solution approach follows a standard Machine Learning project methodology described below:

* **Data Preprocessing**:
    - Clean and preprocess dataset. Handle missing values, outliers, and any inconsistencies.
    - Convert features to appropriate formats.
    - Extract relevant features.

* **Feature Engineering**:
    - Creating new features that might impact employee performance. Steps taken are as below:
        - Convert categorical to numerical
        - Check outliers & Impute outliers
        - Feature transformation
        - Feature scaling

* **Exploratory Data Analysis (EDA)**:
    - Visualizing data to understand patterns, correlations, and distributions.
    - Identifying any trends, or anomalies.

* **Model Building/Training/Testing**:
    - Training predictive models to conduct employee performance analysis.
    - Some relevant models include:
        - **Logistic Regression**: Supervised machine learning algorithm used for binary or multi-class classification problems.
        - **Random Forests**: Handle non-linear relationships and feature importance.
        - **Gradient Boosting**: Ensemble method for accurate predictions.
        - **Neural Networks**: Deep learning models for complex patterns.

* **Evaluation and Validation**:
    - Evaluating model performance using classification metrics, primarily Training and Testing Accuracy, since Performance Rating is a categorical target.

* **Implementation**:
    - Once a reliable model is identified, it is integrated into the employee performance management system.
    - The model is continuously monitored and updated as new data becomes available.
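
As a rough illustration of the feature engineering steps listed above (categorical to numerical conversion, outlier imputation, and feature scaling), the sketch below uses pandas on a made-up six-row frame. The column names are taken from this report, but the values, the helper function names, the 1.5 × IQR fence, and the choice of median imputation are all assumptions made for the example, not the exact procedure used in the project notebook.

```python
import pandas as pd

# Hypothetical toy frame: column names mirror this report, values are invented.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "OverTime": ["No", "Yes", "No", "No", "Yes", "No"],
    "YearsSinceLastPromotion": [0, 1, 2, 1, 15, 3],
    "EmpLastSalaryHikePercent": [11, 12, 14, 13, 15, 12],
})


def encode_categoricals(frame):
    """Convert every non-numeric column to integer category codes."""
    out = frame.copy()
    for col in out.select_dtypes(exclude="number").columns:
        out[col] = out[col].astype("category").cat.codes
    return out


def impute_outliers_iqr(series, k=1.5):
    """Replace values outside the IQR fences with the column median."""
    s = series.astype(float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    inside = s.between(q1 - k * iqr, q3 + k * iqr)
    return s.where(inside, s.median())


def standardize(series):
    """Scale a column to zero mean and unit variance (population std)."""
    return (series - series.mean()) / series.std(ddof=0)


numeric_cols = ["YearsSinceLastPromotion", "EmpLastSalaryHikePercent"]

encoded = encode_categoricals(df)

cleaned = encoded.copy()
for col in numeric_cols:
    cleaned[col] = impute_outliers_iqr(cleaned[col])

scaled = cleaned.copy()
for col in numeric_cols:
    scaled[col] = standardize(cleaned[col])

print(scaled)
```

In this toy data the value 15 in `YearsSinceLastPromotion` falls outside the upper fence and is replaced by the column median before scaling. Scaling is applied last so that the outlier does not distort the mean and standard deviation.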



---

# **Analysis and Insights**


---



**Understanding the Dataset**


* The employee performance dataset consists of 1200 records, each containing 28 features (columns).

* The dataset consists of the following features:
 - 16 categorical features,
 - 11 numeric features, and
 - 1 alphanumeric feature [EmpNumber], which is not useful for analysis and was therefore dropped.

 _**NB:** Performance Rating is the target feature_
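
A minimal sketch of how the structure described above could be inspected with pandas is shown below. The three-row frame is a hypothetical miniature of the dataset: the column names come from this report, while the values (including the `EmpNumber` identifiers) are invented. A dtype-based split is only a first pass, since integer-coded ordinal features such as EmpEducationLevel would be reported as numeric.

```python
import pandas as pd

# Hypothetical miniature of the dataset: real column names, invented values.
df = pd.DataFrame({
    "EmpNumber": ["E001", "E002", "E003"],
    "Gender": ["Male", "Female", "Male"],
    "EmpDepartment": ["Sales", "Development", "Sales"],
    "EmpLastSalaryHikePercent": [12, 14, 11],
    "PerformanceRating": [3, 4, 2],
})

# EmpNumber is only an identifier, so it is dropped before analysis.
features = df.drop(columns=["EmpNumber"])

# Split the remaining columns by dtype as a first pass.
numeric_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = features.select_dtypes(exclude="number").columns.tolist()

print(features.shape)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```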




---


**General Observation and Insights**

* Approximately **46%** of employees are married, **22%** are divorced, whilst **32%** are single.

* Three departments **(Sales, Development, Research and Development)** out of the total of six constitute **89%** of the total employees.

* About **71%** of total employees rarely travel.

* Only **29%** of total employees work overtime.

* There is a low employee attrition of about **15%**.

* Employees come from **six** unique educational backgrounds.

* **Nineteen** unique employee job roles are present in this company.

* Most employees have an education level of **3**.

* The job satisfaction level is high for the majority of employees.

* Only **11%** of employees in the company achieved a performance rating of **level 4**.

* The employee age ranges from **18** to **60**, with most employees between **25** and **40**.

* The distance from home to office ranges from **0** to **30** units, with most employees living within **0** to **5** units.

* Employees have previously worked in up to **8** companies, with most having worked for up to **2** companies before joining INX Future Inc.

* The hourly rate ranges from **65** to **95** for the majority of employees.

* Work experience ranges from **0** to **40** years, with most employees having between **5** and **10** years.

* In general, most employees have spent **5** years working for INX Future Inc.

* Most employees receive a salary hike of **11%** to **15%**.

* **72.8%** of all employees have a performance rating of **3**, whilst only **11%** have a rating of **4** and **16.2%** a rating of **2**.

* The average performance rating across all departments is approximately **3**.

* Employee education level ranges from **1** to **5** [below college to doctor], with the majority of employees at levels **3** and **4**. Education levels **1, 2 and 5** account for **150, 250, and 50** employees respectively.

* Employee environment satisfaction level ranges from **1 to 4**, with levels **3 and 4** constituting the highest proportion of employees. Levels **1 and 2** constitute the smallest numbers of employees (approximately **230 and 250** respectively).

* The majority of employees (**700**) have a work-life balance score of **3 (Better)**. The remainder are distributed across levels 1 (Bad), 2 (Good) and 4 (Best).
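
Percentage figures like the ones above can be obtained with a one-line pandas helper. The sketch below is illustrative only: the helper name `percent_share` is hypothetical, and the rating counts are an assumption reconstructed from the reported 72.8% / 16.2% / 11% split over 1200 employees rather than read from the data file.

```python
import pandas as pd


def percent_share(series):
    """Percentage of rows in each category, rounded to one decimal place."""
    return (series.value_counts(normalize=True) * 100).round(1)


# Assumed counts (874 / 194 / 132) chosen to reproduce the reported split
# over 1200 employees; they are not taken from the original file.
ratings = pd.Series([3] * 874 + [2] * 194 + [4] * 132, name="PerformanceRating")

shares = percent_share(ratings)
print(shares)
```

The same helper applies to any categorical column, for example marital status, business travel frequency, or overtime.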



---


**Impact of Other Features on Performance Rating (Target Feature)**

* Gender vs Performance Rating:
 - The majority of both male and female employees have a performance rating of **3**,
 - The fewest employees have a rating of **4**, for both males and females.

* EducationBackground vs Performance Rating:
 - The majority of employees with life sciences and medical education backgrounds have a performance rating of 3,
 - Employees with life sciences and medical backgrounds also make up the largest groups with a performance rating of 4.

* MaritalStatus vs Performance Rating:
 - The majority of employees in each of the three marital status categories have a rating of 3,
 - The Married category also has a larger number of employees with a performance rating of 2 than the other categories.

* Business Travel Frequency vs Performance Rating:
 - The majority of employees who rarely travel have a rating of **3**,
 - Frequent travellers and non-travellers show a similar distribution of performance ratings, though both groups are small.

* Overtime vs Performance Rating:
 - The majority of employees do not work overtime,
 - The highest number of employees have a rating of **3**, both among those who work overtime and those who do not,
 - The fewest employees have a rating of **4** in both categories.

* Attrition vs Performance Rating:
 - Employees likely to leave (Attrition - Yes) and those not likely to leave (Attrition - No) show a similar distribution: the majority have a rating of 3, followed by 2, with 4 being the least common.

* EmpEducationLevel vs Performance Rating:
 - There is an equal distribution of employee education level across the performance bands [3, 4 & 2].

* EmpEnvironmentSatisfaction vs Performance Rating:
 - The majority of employees with a performance rating of 3 are in the EmpEnvironmentSatisfaction range 2 - 4, with a median of 3,
 - Employees with a performance rating of 4 are in the EmpEnvironmentSatisfaction range 3 - 4,
 - Employees with a performance rating of 2 are in the EmpEnvironmentSatisfaction range 1 - 2.

* EmpJobInvolvement vs Performance Rating:
 - There is an equal distribution of employee job involvement level across the performance bands [3, 4 & 2],
 - Employees with a job involvement of 2 or 3 mostly have a performance rating of 3.

* EmpJobLevel vs Performance Rating:
 - Performance ratings 2 and 3 have the highest proportion of employees, with job levels 1 - 3 and a median job level of 2,
 - Performance rating 4 has the smallest proportion of employees, with job levels 1 and 2, as well as some outliers in job levels 4 and 5.

* EmpJobSatisfaction vs Performance Rating:
 - There is an equal distribution of employee job satisfaction across the performance bands [3, 4 & 2],
 - Job satisfaction mostly ranges between 2 and 4, with a median of 3.

* EmpWorkLifeBalance vs Performance Rating:
 - There is an equal distribution of work-life balance across the performance bands [3 & 2], with EmpWorkLifeBalance values of 2 and 3,
 - A very small proportion falls under performance rating 4, with a median EmpWorkLifeBalance of 3 and some outliers at EmpWorkLifeBalance 2 and 4.

* YearsSinceLastPromotion vs Performance Rating:
 - Performance rating 2 shows a large proportion of employees in the range of 0 - 5 years since last promotion, extending up to an upper quartile of about 13 years,
 - Performance rating 3 is dominated by recently promoted employees, with a median of about 1 year and an upper quartile of 5 years,
 - Performance rating 4 follows a similar pattern to rating 3, except that there are very few employees in the outlier range of 5 to 15 years since last promotion.
 - _**NB:** employees who have stayed in the same grade for a long time tend to perform worse._
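
The comparisons above are the kind of summary a box plot or grouped bar chart encodes. As a hedged sketch, the same numbers can be computed directly with pandas: per-rating quartiles for a numeric feature, and a row-normalised cross-tabulation for an ordinal one. The nine rows below are invented purely to make the example runnable and do not reproduce the real distributions.

```python
import pandas as pd

# Hypothetical rows: real column names, invented values.
df = pd.DataFrame({
    "PerformanceRating": [2, 2, 2, 3, 3, 3, 3, 4, 4],
    "YearsSinceLastPromotion": [1, 5, 13, 0, 1, 1, 5, 0, 2],
    "EmpEnvironmentSatisfaction": [1, 2, 1, 3, 2, 4, 3, 4, 3],
})

# Quartiles of a numeric feature within each rating band
# (the values a box plot would draw).
quartiles = (
    df.groupby("PerformanceRating")["YearsSinceLastPromotion"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# Share of each satisfaction level within each rating band (rows sum to 1).
satisfaction_by_rating = pd.crosstab(
    df["PerformanceRating"],
    df["EmpEnvironmentSatisfaction"],
    normalize="index",
)

print(quartiles)
print(satisfaction_by_rating.round(2))
```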



---


# **Findings**


---

The following seven models were evaluated:

* Logistic Regression

* Decision Tree

* Random Forest

* Support Vector Machine

* Artificial Neural Network (ANN - MLP)

* K-Nearest Neighbors (KNN)

* Naive Bayes

* Decision Tree, Random Forest and Support Vector Machine models perform best on Training Accuracy [scoring 100%]. However, all three score lower on Testing Accuracy.

* These three models also show a relatively large gap between their Training and Testing Accuracy scores, which suggests overfitting to the training data.

* The Artificial Neural Network (ANN) model performs best overall, with Training and Testing Accuracy scores of **98.07%** and **96.80%** respectively.

* The Artificial Neural Network - Multilayer Perceptron (ANN - MLP) model also shows less variation between Training and Testing Accuracy, and is therefore recommended as the best model for predicting Employee Performance Rating for INX Future Inc.
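
The train versus test comparison described above can be sketched with scikit-learn as follows. This is not the project notebook: it uses a synthetic three-class dataset from `make_classification` (an assumption standing in for the employee data), only two of the seven models, and default hyperparameters. It is meant to show how the gap between Training and Testing Accuracy exposes overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the employee data: 3 classes with 10% label noise.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    n_classes=3,
    flip_y=0.1,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on seen data
    test_acc = model.score(X_test, y_test)     # accuracy on held-out data
    results[name] = (train_acc, test_acc, train_acc - test_acc)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f} "
          f"gap={train_acc - test_acc:.3f}")
```

An unpruned decision tree memorises the training set, so its training accuracy reaches 100% while its testing accuracy is lower. A model is preferable when both scores are high and the gap between them is small.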



---


# **Challenges**


---


* In real corporate practice, data privacy issues limit how intensively the project team can explore the data.

* There was no interface between the data owners and the project team, so a proper needs analysis could not be conducted.

* The dataset is a post-mortem dump. This is not sufficient to provide a real-world solution where data is live.

* The solution is limited to a notebook demonstration and remains far from solving the business problem, since an enterprise-scale deployment would need a feature pipeline and a suitable MLOps platform.



---


# **Recommendations to Improve Employee Performance Rating**


---


* The following features were found to have a significant impact on Employee Performance Rating, in descending order:
 - EmpEnvironmentSatisfaction,
 - EmpLastSalaryHikePercent,
 - YearsSinceLastPromotion,
 - YearsWithCurrManager,
 - ExperienceYearsAtThisCompany,
 - ExperienceYearsInCurrentRole,
 - EmpWorkLifeBalance.

* The least impactful features are:
 - TrainingTimesLastYear,
 - EmpDepartment,
 - EmpJobRole,
 - Gender,
 - EmpJobSatisfaction.

* **The top 3 factors affecting employee performance are:**
 - _EmpEnvironmentSatisfaction_,
 - _EmpLastSalaryHikePercent_,
 - _YearsSinceLastPromotion_.

 **NB:**
 - INX Future Inc. must focus on the top critical factors affecting performance rating.

 - The company must strive to provide a better work environment in order to boost performance rating.

 - The company should regularly promote employees so that they stay motivated to perform better.

 - The company should increase employee salaries regularly, since stagnant salaries likely lead to poor employee performance. With stagnant salaries, employees will also likely struggle to maintain a good work-life balance.
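
The report does not state which method produced the feature ranking above. One common option, shown here purely as an assumed illustration, is to rank features by the impurity-based importances of a fitted random forest. The data below is synthetic, and the target is constructed to depend only on `EmpEnvironmentSatisfaction`, so the resulting ranking demonstrates the technique rather than reproducing the project's findings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic features: real column names, randomly generated values.
X = pd.DataFrame({
    "EmpEnvironmentSatisfaction": rng.integers(1, 5, n),
    "EmpLastSalaryHikePercent": rng.integers(11, 26, n),
    "Gender": rng.integers(0, 2, n),
})

# Synthetic target: by construction it depends only on environment satisfaction.
y = np.where(X["EmpEnvironmentSatisfaction"] >= 3, 3, 2)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort to get a descending ranking.
ranking = pd.Series(model.feature_importances_, index=X.columns)
ranking = ranking.sort_values(ascending=False)

print(ranking)
```

Impurity-based importances can favour features with many distinct values, so a ranking obtained this way is best cross-checked against another method before acting on it.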




---


# **Results and conclusion**


---


The project aimed to build a predictive model to enable INX Future Inc. to predict employee performance rating. A dataset for employee performance was provided. Feature engineering was conducted to derive insightful features that impact performance rating. The following seven models were evaluated:

* Logistic Regression

* Decision Tree

* Random Forest

* Support Vector Machine

* Artificial Neural Network (ANN - MLP)

* K-Nearest Neighbors (KNN)

* Naive Bayes

**Artificial Neural Network (ANN - MLP)** proved to be the best model based on training and testing accuracy scores - **98.07% and 96.80%** respectively.

It is concluded that the company should provide a better work environment, as this significantly increases employee performance rating. The company should also regularly increase employee salaries and offer promotions. This helps employees maintain a work-life balance, leading to better performance ratings.



---


# **References and Acknowledgements**


---


During the course of the project, the author consulted several sources, including the ones listed below:

1. [IABAC Case Study Data](http://data.iabac.org/exam/p2/data/INX_Future_Inc_Employee_Performance_CDS_Project2_Data_V1.8.xls)

2. [Automated Feature Engineering Basics](https://www.kaggle.com/code/willkoehrsen/automated-feature-engineering-basics)