# **Project Summary**

## **Company Background:**
INX Future Inc., a global player in analytics and automation for over 15 years, has recently been facing a noticeable dip in employee performance. This decline is starting to impact client satisfaction and delivery quality, prompting leadership to dig deeper into what’s driving these changes. Complaints have increased, and client satisfaction ratings have dropped by around 8%.

With performance closely tied to business outcomes, INX decided it was time to get actionable insights from its own employee data. The aim was not only to understand the root causes of the performance dip but also to forecast the performance potential of future hires, while making any course corrections without hurting team morale.

#### **Key Goals**

* Compare how departments are performing across the board

* Identify the top 3 factors that influence employee performance

* Build a predictive model to assist in hiring decisions

* Recommend practical steps to help improve employee output and engagement

#### **Why This Analysis Matters:**
* Employee attrition leads to hiring and training costs.

* Losing good talent or misjudging poor performers can hurt the business.

* Data-driven decisions will help management take fair, strategic actions while preserving company culture.

## **Data Requirements**

The INX Future Inc. employee performance data can be downloaded from the link below:
http://data.iabac.org/exam/p2/data/INX_Future_Inc_Employee_Performance_CDS_Project2_Data_V1.8.xls


This Excel file includes a mix of numerical and categorical features related to employees’ demographics, job roles, satisfaction levels, and performance ratings.

## **Deep Dive into the Data**

#### Dataset Overview
* 19 numerical columns

* 9 categorical columns

* Target variable: PerformanceRating (ordinal, e.g. ratings 2–4)


We’re working with labeled data, so we know the outcome for each employee and can train supervised models to predict it.
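A minimal pandas sketch of the column split above. It assumes the sheet has already been read into a DataFrame (reading the `.xls` file with `pd.read_excel` requires the `xlrd` package); the toy rows and every column name except `PerformanceRating` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for the INX sheet; in practice this would come from
# pd.read_excel(...) on the downloaded .xls file.
df = pd.DataFrame({
    "Age": [32, 41, 28, 36],
    "Department": ["Development", "Sales", "R&D", "Sales"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "SalaryHikePercent": [12, 18, 14, 21],
    "PerformanceRating": [3, 2, 4, 3],
})

target = "PerformanceRating"
features = df.drop(columns=target)

# Partition the predictors by dtype: numeric vs. text/category columns.
numerical_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = features.select_dtypes(exclude="number").columns.tolist()
print(len(numerical_cols), "numerical |", len(categorical_cols), "categorical")

# Ordinal target: check which rating levels actually occur and how often.
print(df[target].value_counts().sort_index())
```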

#### **Exploration Highlights**

##### **Univariate Observations:**
* Most employees are aged between 25 and 45, with an average age of around 36.

* A significant portion of the workforce lives close to the office, likely improving punctuality and availability.

* The majority of employees rated their job environment and involvement at level 3, indicating a neutral to moderately positive sentiment.

* A large number of employees had limited prior job experience and received salary hikes in the range of 0-12%, reflecting moderate compensation adjustments.
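Univariate figures like these come from simple summaries. A sketch with pandas on a hypothetical toy frame (column names assumed, numbers illustrative only):

```python
import pandas as pd

# Hypothetical toy columns; names and values are assumptions.
df = pd.DataFrame({
    "Age": [24, 29, 33, 36, 38, 41, 44, 52],
    "EnvironmentSatisfaction": [3, 3, 2, 4, 3, 1, 3, 4],
})

# count, mean, std, min, quartiles, max
print(df["Age"].describe())

# Share of employees aged 25 to 45 (both ends inclusive).
share_25_45 = df["Age"].between(25, 45).mean()
print(f"aged 25-45: {share_25_45:.0%}")

# Distribution of an ordinal survey score as proportions per level.
env_share = df["EnvironmentSatisfaction"].value_counts(normalize=True).sort_index()
print(env_share)
```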

##### **Bivariate Insights:**
* Development and R&D teams consistently show stronger performance compared to others.

* Roles like Data Scientists and Tech Leads are performing above average.

* Long tenure with the same manager may dampen motivation.

* Those who received higher salary hikes tend to perform better.

* Performance appears to dip when promotion is delayed beyond 2 years.

* A good work-life balance and relationship satisfaction seem to be clear indicators of better performance.

* Some gender-based differences in ratings emerged, possibly reflecting workforce composition rather than bias.
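The department and promotion findings rest on grouped comparisons. A hedged sketch on hypothetical toy data: `Department` and `YearsSinceLastPromotion` are assumed column names, and the 2-year cut-off simply mirrors the observation above.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the INX schema.
df = pd.DataFrame({
    "Department": ["Development", "Development", "R&D", "Sales", "Sales", "R&D"],
    "YearsSinceLastPromotion": [0, 1, 3, 5, 2, 1],
    "PerformanceRating": [4, 4, 3, 2, 3, 4],
})

# Department comparison: mean rating and headcount, strongest first.
dept_summary = (
    df.groupby("Department")["PerformanceRating"]
      .agg(["mean", "count"])
      .sort_values("mean", ascending=False)
)
print(dept_summary)

# Promotion gap: split at the 2-year mark, (-1, 2] vs. (2, inf), and compare.
df["PromotionGap"] = pd.cut(
    df["YearsSinceLastPromotion"],
    bins=[-1, 2, float("inf")],
    labels=["<=2 yrs", ">2 yrs"],
)
gap_summary = df.groupby("PromotionGap", observed=True)["PerformanceRating"].mean()
print(gap_summary)
```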


##### **Multivariate Visualization Insight:**
To gain deeper insight into how Age, Hourly Rate, and Years Since Last Promotion relate to Attrition, I used a pairplot:


* The plot revealed distinct clusters and differences in spread between employees who stayed and those who left.

* For instance, employees with low hourly rates and longer durations since their last promotion were more likely to appear in the attrition group.

* Younger employees with less than 2 years since promotion formed the bulk of those who stayed, suggesting that early-career motivation and recent growth act as retention factors.

This multivariate view confirmed many of the bivariate patterns and added visual intuition about attrition-linked variables, strengthening the case for targeted HR interventions.

##### **Data Preparation:**

* The dataset has no missing values, so no imputation was needed.

* Converted all categorical fields to integer codes using Label Encoding.

* Detected and addressed outliers using either the IQR method or the 3-sigma rule, depending on each feature's distribution.

* Standardized all numerical features (zero mean, unit variance) with StandardScaler.
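The preparation steps could be sketched as below. Two details are assumptions, not taken from the project: outliers are capped (clipped to the fence values) rather than dropped, and the "based on feature distribution" choice is modelled as a hypothetical skewness threshold of 0.5 (IQR for skewed columns, 3-sigma for roughly symmetric ones). `LabelEncoder` is applied per column here to match the text; scikit-learn's `OrdinalEncoder` is the feature-side equivalent.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def cap_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def cap_3sigma(s: pd.Series) -> pd.Series:
    """Clip values lying more than 3 standard deviations from the mean."""
    mu, sd = s.mean(), s.std()
    return s.clip(lower=mu - 3 * sd, upper=mu + 3 * sd)


def treat_outliers(df: pd.DataFrame, cols, skew_threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical rule: IQR capping for skewed columns, 3-sigma capping otherwise."""
    out = df.copy()
    for col in cols:
        s = out[col].astype(float)
        out[col] = cap_iqr(s) if abs(s.skew()) > skew_threshold else cap_3sigma(s)
    return out


# Hypothetical toy frame; column names are assumptions.
df = pd.DataFrame({
    "Department": ["Sales", "R&D", "Development", "Sales", "R&D", "Sales"],
    "DistanceFromHome": [2, 5, 3, 4, 1, 60],  # 60 is an obvious outlier
    "Age": [30, 35, 41, 28, 39, 33],
})
num_cols = ["DistanceFromHome", "Age"]

df = treat_outliers(df, num_cols)

# Label encoding: each category becomes an integer code (alphabetical order).
for col in ["Department"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Standardize numerical features to zero mean and unit variance.
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
print(df)
```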

##### **Feature Engineering & Selection:**

* Checked correlation coefficients to identify impactful variables.

* Dropped EmployeeNumber (acts as a unique ID, not useful for modeling).

* Final selected features include:

  * Department

  * Job Role

  * Environment Satisfaction

  * Salary Hike Percent

  * Work-Life Balance

  * Tenure with company, manager, and in role

Top 3 performance influencers identified:

1. Environment Satisfaction

2. Last Salary Hike Percent

3. Work-Life Balance
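Dropping the ID column and ranking features by correlation with the target can be sketched as follows. The data is synthetic, with a target deliberately driven by two of the features (so the ranking is true by construction), and the column names other than `EmployeeNumber` and `PerformanceRating` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic frame: all columns numeric (e.g. after label encoding).
rng = np.random.default_rng(1)
n = 200
env = rng.integers(1, 5, n)
hike = rng.integers(11, 26, n)
df = pd.DataFrame({
    "EmployeeNumber": np.arange(1001, 1001 + n),
    "EnvironmentSatisfaction": env,
    "SalaryHikePercent": hike,
    "DistanceFromHome": rng.integers(1, 30, n),
})
# Synthetic ordinal target (2-4) driven by satisfaction and salary hike plus noise.
raw = 1.0 + 0.3 * env + 0.08 * hike + rng.normal(0, 0.3, n)
df["PerformanceRating"] = np.clip(np.rint(raw), 2, 4).astype(int)

# Drop the unique identifier: it carries no signal about performance.
df = df.drop(columns=["EmployeeNumber"])

# Rank features by absolute Pearson correlation with the target.
corr_with_target = (
    df.corr()["PerformanceRating"]
      .drop("PerformanceRating")
      .abs()
      .sort_values(ascending=False)
)
print(corr_with_target)
```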

## **Model Building & Evaluation:**

To build a reliable prediction system for employee performance, I experimented with a variety of classification algorithms, from traditional methods like Logistic Regression to more complex models like Random Forests and Artificial Neural Networks.

Since the original dataset had a slight class imbalance, I first applied SMOTE (Synthetic Minority Oversampling Technique) to balance the target variable distribution before model training. The dataset was then split into a 75% training set and a 25% testing set.
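A minimal, hand-rolled sketch of SMOTE's core step: each synthetic row is placed at a random point on the segment between a minority-class sample and one of its k nearest same-class neighbours. In practice imbalanced-learn's `SMOTE(...).fit_resample(X, y)` does this. Unlike the sequence described above, this sketch splits first and oversamples only the training portion, which keeps synthetic rows out of the 25% test set. The data is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def smote(X, y, k=5, random_state=0):
    """Minimal SMOTE sketch: oversample each minority class up to the majority
    count by interpolating between a sample and a random one of its k nearest
    same-class neighbours (Euclidean distance)."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        if n == target or n < 2:
            continue  # majority class, or too few samples to interpolate
        Xc = X[y == cls]
        # Pairwise distances within the class; exclude each point from its own neighbours.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        k_eff = min(k, n - 1)
        neighbours = np.argsort(dist, axis=1)[:, :k_eff]
        n_new = target - n
        base = rng.integers(0, n, size=n_new)
        nb = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced stand-in for the employee data (ratings 2-4).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = np.array([3] * 150 + [2] * 30 + [4] * 20)

# 75/25 split first, stratified so the test set keeps the original class mix.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Oversample the training portion only.
X_train_bal, y_train_bal = smote(X_train, y_train)
print(np.unique(y_train_bal, return_counts=True))
```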

##### **Key Observations:**
 * Logistic Regression performed reasonably well and served as a good baseline model.

 * Decision Tree initially overfit the training data. After hyperparameter tuning, generalization improved, but it still underperformed slightly compared with the ensemble models.

 * Random Forest (tuned) delivered excellent results: high performance with less overfitting than its untuned version.

 * Gradient Boosting showed competitive scores, nearly matching the Random Forest in both accuracy and F1 metrics.

 * SVC had balanced scores but slightly underperformed on unseen data compared to ensemble models.

 * MLP (ANN) had strong training performance but slightly lower test scores, suggesting potential overfitting or sensitivity to hyperparameters.
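A sketch of the comparison loop, using a synthetic 3-class dataset from `make_classification` in place of the employee data. Default hyperparameters are used throughout, and weighted F1 is an assumed averaging choice since the text does not specify one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the (already prepared) employee data.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "SVC": SVC(),
    "MLP": MLPClassifier(max_iter=2000, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Compare train vs. test scores: a large gap signals overfitting.
    results[name] = {
        "train_acc": accuracy_score(y_train, model.predict(X_train)),
        "test_acc": accuracy_score(y_test, y_pred),
        "test_f1": f1_score(y_test, y_pred, average="weighted"),
    }

for name, r in results.items():
    print(f"{name:18s} train={r['train_acc']:.3f} "
          f"test={r['test_acc']:.3f} F1={r['test_f1']:.3f}")
```

The unpruned `DecisionTreeClassifier` typically scores 1.0 on its own training data, which is the kind of train/test gap that motivates tuning.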

##### **Final Model Selection: Random Forest Classifier:**

 After evaluating all the models, the tuned Random Forest Classifier was selected as the final model due to its strong performance on both training and testing data:

* High generalization capability

* Robust against overfitting

* Good interpretability via feature importance

* Consistently high F1 score

This model not only performs well but also allows us to understand the weight of each feature in decision-making, making it a practical choice for business use cases like performance forecasting and recruitment filtering.
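Tuning the forest and reading its feature importances could look like the sketch below. The grid values are illustrative placeholders (the project's search space is not given), the data is synthetic, and the feature names only echo the selected features listed earlier. `feature_importances_` is scikit-learn's impurity-based measure, which can favour high-cardinality features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical feature names echoing the selected features; data is synthetic.
feature_names = ["Department", "JobRole", "EnvironmentSatisfaction",
                 "SalaryHikePercent", "WorkLifeBalance", "YearsAtCompany",
                 "YearsWithCurrManager", "YearsInCurrentRole"]
X, y = make_classification(n_samples=500, n_features=len(feature_names),
                           n_informative=4, n_classes=3, random_state=0)
X = pd.DataFrame(X, columns=feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Small illustrative grid; not the project's actual search space.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1_weighted",
    cv=3,
)
search.fit(X_train, y_train)  # refits the best combination on all training data
best_rf = search.best_estimator_
print("best params:", search.best_params_)
print("test F1 (weighted):", round(search.score(X_test, y_test), 3))

# Impurity-based importances: how much each feature reduces node impurity.
importances = pd.Series(best_rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(3))
```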

## **Recommendations to Improve Employee Performance:**
* **Focus on Environment:** Improve workplace atmosphere and employee satisfaction.

* **Shuffle Managers Periodically:** Consider reassigning managers every 2-3 years to avoid stagnation.

* **Reward Progress:** Timely promotions (every 1–2 years) and meaningful salary hikes boost performance.

* **Support Work-Life Balance:** Offer flexibility or wellness initiatives; work-life balance correlates directly with ratings.

* **Hiring Insight:** When recruiting for HR roles, consider gender patterns: female employees in some roles have shown better performance trends.

* **Tap into Silent Performers:** Some employees with mid-level satisfaction still deliver excellent performance; don’t overlook them.