# Project: Human Resources Dataset Analysis

## Week 2: (1) Exploratory Data Analysis (EDA) and Determining the Data Analysis Questions

In the second week of this project, we focused on exploratory data analysis (EDA) to identify key questions related to employee retention. This phase aimed to determine how various factors, such as salary, job satisfaction, and overtime, impact the likelihood of employees leaving the company.


**The table below lists each column name and its description:**

| **#** | **Column Name**                    | **Description**                                                |
|-------|-------------------------------------|---------------------------------------------------------------|
| 1     | EmployeeID                         | Unique identifier for each employee                           |
| 2     | FirstName                          | The first name of the employee                                |
| 3     | LastName                           | The last name of the employee                                 |
| 4     | Gender                             | Gender of the employee (Male, Female, Non-binary)            |
| 5     | Age                                | Age of the employee (in years)                               |
| 6     | BusinessTravel                     | Frequency of business travel (e.g., Some Travel, Frequent Traveller) |
| 7     | Department                         | The department where the employee works                       |
| 8     | DistanceFromHome                   | Distance from the employee's home to work (in kilometers)    |
| 9     | State                              | The state where the employee resides                          |
| 10    | Ethnicity                          | Ethnicity of the employee                                     |
| 11    | EducationField                     | Field of education of the employee (e.g., IT, Marketing)    |
| 12    | JobRole                            | Job title of the employee (e.g., Software Engineer, Sales Executive) |
| 13    | MaritalStatus                      | Marital status of the employee (e.g., Married, Single)       |
| 14    | Salary                             | Salary of the employee (in local currency)                   |
| 15    | StockOptionLevel                   | Level of stock options granted to the employee (number of shares) |
| 16    | OverTime                           | Whether the employee works overtime (Yes or No)              |
| 17    | HireDate                           | Date the employee was hired                                   |
| 18    | Attrition                          | Whether the employee left the company (Yes or No)            |
| 19    | YearsAtCompany                     | Number of years the employee has been with the company       |
| 20    | YearsInMostRecentRole              | Number of years the employee has been in the most recent role |
| 21    | YearsSinceLastPromotion             | Number of years since the last promotion of the employee      |
| 22    | YearsWithCurrManager               | Number of years the employee has worked with the current manager |
| 23    | EducationLevel                     | Level of education (e.g., Bachelor's, Master's)              |
| 24    | PerformanceID                      | Performance evaluation identifier for the employee            |
| 25    | ReviewDate                         | Date of the last performance review for the employee          |
| 26    | TrainingOpportunitiesWithinYear    | Number of training opportunities available within the year    |
| 27    | TrainingOpportunitiesTaken         | Number of training opportunities taken                         |
| 28    | EnvironmentSatisfactionLevel       | Level of satisfaction with the work environment (Scale 1 to 5) |
| 29    | JobSatisfactionLevel               | Level of satisfaction with the job (Scale 1 to 5)            |
| 30    | RelationshipSatisfactionLevel      | Level of satisfaction with workplace relationships (Scale 1 to 5) |
| 31    | WorkLifeBalanceLevel               | Level of work-life balance (Scale 1 to 5)                     |
| 32    | SelfRatingLevel                    | Self-rating of the employee (Scale 1 to 5)                    |
| 33    | ManagerRatingLevel                 | Manager's rating of the employee (Scale 1 to 5)               |


In [1]:
# Import libraries
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import pyodbc

pd.options.display.max_rows = None

pd.options.display.max_columns = None
sns.set()


In [2]:
import pyodbc
import pandas as pd


def read_sql_query(query):
    
    server = 'DESKTOP-0CQ5N9B' 
    database = 'HR_system' 
    
    # Windows Authentication (Trusted_Connection=yes), not SQL Server logins
    connection_string = (
        f"Driver={{ODBC Driver 17 for SQL Server}};" 
        f"Server={server};" 
        f"Database={database};"
        f"Trusted_Connection=yes;" 
    )

    # Create the connection; start from None so cleanup is safe if connect() fails
    connection = None
    try:
        connection = pyodbc.connect(connection_string)
        print("Connection successful!")

        # Use a cursor to execute the query
        cursor = connection.cursor()
        cursor.execute(query)

        # Fetch the rows and read the column names from the cursor metadata
        rows = cursor.fetchall()
        columns = [column[0] for column in cursor.description]

        # Build the DataFrame
        df = pd.DataFrame.from_records(rows, columns=columns)

        return df

    except Exception as e:
        print(f"Error: {e}")
        return None

    finally:
        # Close the connection exactly once, and only if it was opened
        if connection is not None:
            connection.close()



# Employees

In [3]:
pth = "../00-Dataset_Data_Model/"
# df_employees = pd.read_csv(f"{pth}06-All_Data_Employees.csv")

# Load the All_Data_Employees dataset from the view "FullEmployeePerformanceView"
query = """SELECT * FROM FullEmployeePerformanceView;"""

df_employees = read_sql_query(query)

df_employees.head()

Connection successful!


Unnamed: 0,EmployeeID,FirstName,LastName,Gender,Age,BusinessTravel,Department,DistanceFromHome,State,Ethnicity,EducationField,JobRole,MaritalStatus,Salary,StockOptionLevel,OverTime,HireDate,Attrition,YearsAtCompany,YearsInMostRecentRole,YearsSinceLastPromotion,YearsWithCurrManager,EducationLevel,PerformanceID,ReviewDate,TrainingOpportunitiesWithinYear,TrainingOpportunitiesTaken,EnvironmentSatisfactionLevel,JobSatisfactionLevel,RelationshipSatisfactionLevel,WorkLifeBalanceLevel,SelfRatingLevel,ManagerRatingLevel
0,001A-8F88,Christy,Jumel,Male,22,Some Travel,Technology,40,CA,White,Information Systems,Software Engineer,Married,27763.0,0,No,2021-09-05,No,1,0,1,0,Masters,,NaT,,,,,,,,
1,005C-E0FB,Fin,O'Halleghane,Non-Binary,24,Frequent Traveller,Sales,17,CA,White,Marketing,Sales Executive,Married,56155.0,1,No,2017-08-26,No,5,2,2,0,Masters,PR4067,2020-06-17,1.0,2.0,Neutral,Neutral,Dissatisfied,Dissatisfied,Exceeds Expectation,Meets Expectation
2,005C-E0FB,Fin,O'Halleghane,Non-Binary,24,Frequent Traveller,Sales,17,CA,White,Marketing,Sales Executive,Married,56155.0,1,No,2017-08-26,No,5,2,2,0,Masters,PR5070,2021-06-17,1.0,1.0,Satisfied,Satisfied,Very Satisfied,Very Satisfied,Meets Expectation,Meets Expectation
3,005C-E0FB,Fin,O'Halleghane,Non-Binary,24,Frequent Traveller,Sales,17,CA,White,Marketing,Sales Executive,Married,56155.0,1,No,2017-08-26,No,5,2,2,0,Masters,PR6165,2022-06-17,3.0,0.0,Neutral,Satisfied,Very Satisfied,Satisfied,Exceeds Expectation,Exceeds Expectation
4,00A3-2445,Wyatt,Ziehm,Male,30,Some Travel,Technology,6,CA,Black or African American,Computer Science,Machine Learning Engineer,Married,126238.0,0,No,2012-03-08,No,10,3,6,6,High School,PR1165,2016-06-19,2.0,2.0,Satisfied,Very Satisfied,Satisfied,Very Satisfied,Exceeds Expectation,Meets Expectation
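
The preview above suggests that the view returns one row per performance review, so an employee such as `005C-E0FB` appears several times. Counting attrition on these rows would weight employees by their number of reviews. The following is a minimal sketch of collapsing the data to one row per employee. It assumes that employee-level columns such as `Attrition` are identical across an employee's rows. The helper name `one_row_per_employee` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def one_row_per_employee(df: pd.DataFrame) -> pd.DataFrame:
    """Keep each employee's most recent review row (hypothetical helper)."""
    # Sort by review date so the latest review comes last; rows without a
    # review (NaT) are placed first and are kept only if nothing else exists.
    ordered = df.sort_values("ReviewDate", na_position="first")
    deduplicated = ordered.drop_duplicates(subset="EmployeeID", keep="last")
    return deduplicated.reset_index(drop=True)


# Toy data shaped like the preview (illustrative values only)
toy = pd.DataFrame({
    "EmployeeID": ["001A-8F88", "005C-E0FB", "005C-E0FB", "005C-E0FB"],
    "Attrition": ["No", "No", "No", "No"],
    "ReviewDate": pd.to_datetime([None, "2020-06-17", "2021-06-17", "2022-06-17"]),
})

employees = one_row_per_employee(toy)
print(employees)
```

Whether to keep the latest review, the first review, or an average of the ratings depends on the question being asked; the sketch only shows one option.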



## Summary statistics

1. **Are employees who live more than 20 kilometers from work (`DistanceFromHome`) more likely to leave the company (`Attrition`)?**

No

2. **Are employees who are more satisfied with the work environment (`EnvironmentSatisfactionLevel`) more likely to stay at the company (`Attrition`)?**

No

3. **Are employees who take more training opportunities (`TrainingOpportunitiesTaken`) less likely to leave the company (`Attrition`)?**

No

4. **Is a higher salary (`Salary`) associated with better employee retention (`Attrition`)?**

Yes

5. **Does working overtime (`OverTime`) negatively affect employee retention (`Attrition`)?**

Yes

6. **Is there a relationship between an employee’s gender (`Gender`) and the retention rate (`Attrition`)?**

Yes

7. **Are employees who frequently travel for business (`BusinessTravel`) more likely to leave the company (`Attrition`) than others?**

Yes
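
Each question above compares the share of leavers between groups. The following is a minimal sketch of that comparison with pandas. It assumes `Attrition` holds the strings `"Yes"` and `"No"` as in the data dictionary. The helper `attrition_rate_by`, the derived column `DistanceBand`, and the toy values are hypothetical and do not reproduce the notebook's actual numbers.

```python
import pandas as pd


def attrition_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of rows with Attrition == 'Yes' within each group of `column`."""
    left = df["Attrition"].eq("Yes")  # boolean flag: True if the employee left
    return left.groupby(df[column]).mean().sort_values(ascending=False)


# Toy data (illustrative values only)
toy = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "DistanceFromHome": [5, 30, 12, 25, 8, 40],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Question 5: attrition rate for employees with and without overtime
overtime_rates = attrition_rate_by(toy, "OverTime")

# Question 1: split at the threshold of 20 (in the dataset's distance unit)
toy["DistanceBand"] = (toy["DistanceFromHome"] > 20).map({True: "> 20", False: "<= 20"})
distance_rates = attrition_rate_by(toy, "DistanceBand")

print(overtime_rates, distance_rates, sep="\n\n")
```

The same helper can be applied to `Gender`, `BusinessTravel`, or a binned version of `Salary`; a numeric column needs to be grouped into bands first, as done for the distance.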

__________________________

### Results

1. Living more than 20 kilometers from work (`DistanceFromHome`) does not have a significant impact on employee turnover.
   
2. Higher job satisfaction (`JobSatisfactionLevel`) does not show a clear influence on employees' decision to remain with the company.

3. Taking additional training opportunities (`TrainingOpportunitiesTaken`) does not significantly affect employee retention.

4. A higher `Salary` is associated with better retention: employees with higher salaries are more likely to stay.

5. `OverTime` negatively affects employee retention and is associated with a higher likelihood of turnover.

6. There is a notable relationship between `Gender` and retention rates. Female employees are less likely to leave the company than their male counterparts.

7. Employees who travel frequently for business (`BusinessTravel`) are more likely to leave the company than those who travel less often.
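
The results above describe some effects as significant. The notebook does not state which test, if any, was used, so the following is only one possible check and not the author's method: a chi-squared test of independence between a categorical column and `Attrition`, using `scipy.stats.chi2_contingency`. The helper name `attrition_independence_test` and the toy counts are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def attrition_independence_test(df: pd.DataFrame, column: str):
    """Chi-squared test of independence between `column` and Attrition."""
    # Contingency table: rows are the groups, columns are Attrition No/Yes
    table = pd.crosstab(df[column], df["Attrition"])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof


# Toy data (illustrative counts only): 40 employees with overtime, 60 without
toy = pd.DataFrame({
    "OverTime": ["Yes"] * 40 + ["No"] * 60,
    "Attrition": ["Yes"] * 20 + ["No"] * 20 + ["Yes"] * 6 + ["No"] * 54,
})

chi2, p_value, dof = attrition_independence_test(toy, "OverTime")
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value indicates that attrition rates differ between the groups more than chance alone would explain; it does not show that the factor causes employees to leave.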


### Summary

The analysis indicates that `Salary` and `Gender` are clearly related to employee retention: employees with higher salaries are more likely to stay, and female employees are less likely to leave the company. Job satisfaction (`JobSatisfactionLevel`), `DistanceFromHome`, and training opportunities (`TrainingOpportunitiesTaken`) do not show a clear or strong impact on retention. `OverTime` and frequent `BusinessTravel` are associated with higher turnover rates.

While these findings provide useful insights, further in-depth analysis and verification are needed to better understand the complex dynamics between these variables and employee retention.

# END