**Import required libraries**

In [101]:
import pandas as pd
import matplotlib.pyplot as plt

**Load the dataset using the built-in pd.read_csv function**

In [107]:
sample_df = pd.read_csv("general_data.csv")
sample_df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeID,Gender,JobLevel,JobRole,MaritalStatus,MonthlyIncome,NumCompaniesWorked,Over18,PercentSalaryHike,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,YearsAtCompany,YearsSinceLastPromotion,YearsWithCurrManager
0,51,No,Travel_Rarely,Sales,6,2,Life Sciences,1,1,Female,1,Healthcare Representative,Married,131160,1.0,Y,11,8,0,1.0,6,1,0,0
1,31,Yes,Travel_Frequently,Research & Development,10,1,Life Sciences,1,2,Female,1,Research Scientist,Single,41890,0.0,Y,23,8,1,6.0,3,5,1,4
2,32,No,Travel_Frequently,Research & Development,17,4,Other,1,3,Male,4,Sales Executive,Married,193280,1.0,Y,15,8,3,5.0,2,5,0,3
3,38,No,Non-Travel,Research & Development,2,5,Life Sciences,1,4,Male,3,Human Resources,Married,83210,3.0,Y,11,8,3,13.0,5,8,7,5
4,32,No,Travel_Rarely,Research & Development,10,1,Medical,1,5,Male,1,Sales Executive,Single,23420,4.0,Y,12,8,2,9.0,2,6,0,4


**Convert the categorical Attrition variable to a numerical type so that its relationship with the other variables is easier to analyze**

In [108]:
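# get_dummies() converts a categorical variable into dummy/indicator variables.
# drop_first=True drops the "No" column, leaving a single "Yes" indicator;
# dtype=int keeps the 0/1 output shown below (newer pandas defaults to True/False).
attr_df = pd.get_dummies(sample_df['Attrition'], drop_first=True, dtype=int)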
attr_df.head()

Unnamed: 0,Yes
0,0
1,1
2,0
3,0
4,0
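
For a two-valued column such as Attrition, an explicit mapping is a possible alternative to dummy variables. The sketch below uses the five Attrition values visible in the `head()` output above; the column name `AttritionFlag` is a hypothetical choice for illustration, not something the notebook defines.

```python
import pandas as pd

# The five Attrition values shown in sample_df.head() above.
toy = pd.DataFrame({"Attrition": ["No", "Yes", "No", "No", "No"]})

# Option 1: explicit mapping into a descriptively named column
# ("AttritionFlag" is a hypothetical name used only in this sketch).
toy["AttritionFlag"] = toy["Attrition"].map({"No": 0, "Yes": 1})

# Option 2: the notebook's approach, with dtype=int to force 0/1 output.
dummies = pd.get_dummies(toy["Attrition"], drop_first=True, dtype=int)

print(toy["AttritionFlag"].tolist())
print(dummies["Yes"].tolist())
```

Both options yield the same indicator here; the mapping simply makes the meaning of 0 and 1 explicit in the code.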


**Merge the dummy dataframe created above with the original dataframe, then remove the Attrition column from the original**

In [109]:
#Merge above dataframe with original dataframe. concat function helps here to achieve this.
sample_df = pd.concat([sample_df, attr_df], axis=1)
sample_df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeID,Gender,JobLevel,JobRole,MaritalStatus,MonthlyIncome,NumCompaniesWorked,Over18,PercentSalaryHike,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,YearsAtCompany,YearsSinceLastPromotion,YearsWithCurrManager,Yes
0,51,No,Travel_Rarely,Sales,6,2,Life Sciences,1,1,Female,1,Healthcare Representative,Married,131160,1.0,Y,11,8,0,1.0,6,1,0,0,0
1,31,Yes,Travel_Frequently,Research & Development,10,1,Life Sciences,1,2,Female,1,Research Scientist,Single,41890,0.0,Y,23,8,1,6.0,3,5,1,4,1
2,32,No,Travel_Frequently,Research & Development,17,4,Other,1,3,Male,4,Sales Executive,Married,193280,1.0,Y,15,8,3,5.0,2,5,0,3,0
3,38,No,Non-Travel,Research & Development,2,5,Life Sciences,1,4,Male,3,Human Resources,Married,83210,3.0,Y,11,8,3,13.0,5,8,7,5,0
4,32,No,Travel_Rarely,Research & Development,10,1,Medical,1,5,Male,1,Sales Executive,Single,23420,4.0,Y,12,8,2,9.0,2,6,0,4,0


In [110]:
#after converting the attrition column from category to numerical, we can remove the original attrition value
sample_df.drop('Attrition', inplace=True, axis=1)
sample_df.head()

Unnamed: 0,Age,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeID,Gender,JobLevel,JobRole,MaritalStatus,MonthlyIncome,NumCompaniesWorked,Over18,PercentSalaryHike,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,YearsAtCompany,YearsSinceLastPromotion,YearsWithCurrManager,Yes
0,51,Travel_Rarely,Sales,6,2,Life Sciences,1,1,Female,1,Healthcare Representative,Married,131160,1.0,Y,11,8,0,1.0,6,1,0,0,0
1,31,Travel_Frequently,Research & Development,10,1,Life Sciences,1,2,Female,1,Research Scientist,Single,41890,0.0,Y,23,8,1,6.0,3,5,1,4,1
2,32,Travel_Frequently,Research & Development,17,4,Other,1,3,Male,4,Sales Executive,Married,193280,1.0,Y,15,8,3,5.0,2,5,0,3,0
3,38,Non-Travel,Research & Development,2,5,Life Sciences,1,4,Male,3,Human Resources,Married,83210,3.0,Y,11,8,3,13.0,5,8,7,5,0
4,32,Travel_Rarely,Research & Development,10,1,Medical,1,5,Male,1,Sales Executive,Single,23420,4.0,Y,12,8,2,9.0,2,6,0,4,0


**Create two new dataframes: one with attrition == 1 (employees who left) and one with attrition == 0 (employees who stayed)**

In [112]:
# Create a dataframe of employees whose attrition value is 1 (left the company)
attr_yes = sample_df[sample_df['Yes']==1]
attr_yes.head()

Unnamed: 0,Age,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeID,Gender,JobLevel,JobRole,MaritalStatus,MonthlyIncome,NumCompaniesWorked,Over18,PercentSalaryHike,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,YearsAtCompany,YearsSinceLastPromotion,YearsWithCurrManager,Yes
1,31,Travel_Frequently,Research & Development,10,1,Life Sciences,1,2,Female,1,Research Scientist,Single,41890,0.0,Y,23,8,1,6.0,3,5,1,4,1
6,28,Travel_Rarely,Research & Development,11,2,Medical,1,7,Male,2,Sales Executive,Single,58130,2.0,Y,20,8,1,5.0,2,0,0,0,1
13,47,Non-Travel,Research & Development,1,1,Medical,1,14,Male,1,Research Scientist,Married,57620,1.0,Y,11,8,2,10.0,4,10,9,9,1
28,44,Travel_Frequently,Research & Development,1,2,Medical,1,29,Male,2,Research Scientist,Divorced,103330,3.0,Y,14,8,1,19.0,2,1,0,0,1
30,26,Travel_Rarely,Research & Development,4,3,Medical,1,31,Male,3,Research Scientist,Divorced,68540,2.0,Y,11,8,0,5.0,5,3,0,2,1


In [113]:
# Create another dataframe of employees whose attrition value is 0 (stayed)
attr_no = sample_df[sample_df['Yes']==0]
attr_no.head()

Unnamed: 0,Age,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeID,Gender,JobLevel,JobRole,MaritalStatus,MonthlyIncome,NumCompaniesWorked,Over18,PercentSalaryHike,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,YearsAtCompany,YearsSinceLastPromotion,YearsWithCurrManager,Yes
0,51,Travel_Rarely,Sales,6,2,Life Sciences,1,1,Female,1,Healthcare Representative,Married,131160,1.0,Y,11,8,0,1.0,6,1,0,0,0
2,32,Travel_Frequently,Research & Development,17,4,Other,1,3,Male,4,Sales Executive,Married,193280,1.0,Y,15,8,3,5.0,2,5,0,3,0
3,38,Non-Travel,Research & Development,2,5,Life Sciences,1,4,Male,3,Human Resources,Married,83210,3.0,Y,11,8,3,13.0,5,8,7,5,0
4,32,Travel_Rarely,Research & Development,10,1,Medical,1,5,Male,1,Sales Executive,Single,23420,4.0,Y,12,8,2,9.0,2,6,0,4,0
5,46,Travel_Rarely,Research & Development,8,3,Life Sciences,1,6,Female,4,Research Director,Married,40710,3.0,Y,13,8,0,28.0,5,7,7,7,0


**Check the means of these two new dataframes**

In [138]:
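# numeric_only=True skips text columns such as Department, which newer pandas versions refuse to average
attr_yes.mean(numeric_only=True)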

Age                           33.607595
DistanceFromHome               9.012658
Education                      2.877637
EmployeeCount                  1.000000
EmployeeID                  2191.767932
JobLevel                       2.037975
MonthlyIncome              61682.616034
NumCompaniesWorked             2.936351
PercentSalaryHike             15.481013
StandardHours                  8.000000
StockOptionLevel               0.780591
TotalWorkingYears              8.255289
TrainingTimesLastYear          2.654008
YearsAtCompany                 5.130802
YearsSinceLastPromotion        1.945148
YearsWithCurrManager           2.852321
Yes                            1.000000
dtype: float64

In [139]:
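# numeric_only=True skips text columns such as Department, which newer pandas versions refuse to average
attr_no.mean(numeric_only=True)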

Age                           37.561233
DistanceFromHome               9.227088
Education                      2.919708
EmployeeCount                  1.000000
EmployeeID                  2208.139497
JobLevel                       2.068938
MonthlyIncome              65672.595296
NumCompaniesWorked             2.648480
PercentSalaryHike             15.157340
StandardHours                  8.000000
StockOptionLevel               0.796431
TotalWorkingYears             11.860780
TrainingTimesLastYear          2.827251
YearsAtCompany                 7.369019
YearsSinceLastPromotion        2.234388
YearsWithCurrManager           4.367397
Yes                            0.000000
dtype: float64
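
The two sets of means can also be placed side by side with a single `groupby`, which may make the differences easier to scan. This is a minimal sketch on the five rows shown in `sample_df.head()` (three columns only); the labels `stayed` and `left` are assumptions added for readability.

```python
import pandas as pd

# Three columns from the five rows of sample_df.head() shown earlier.
toy = pd.DataFrame({
    "Age": [51, 31, 32, 38, 32],
    "YearsWithCurrManager": [0, 4, 3, 5, 4],
    "Department": ["Sales"] + ["Research & Development"] * 4,
    "Yes": [0, 1, 0, 0, 0],
})

# One row per attrition group; numeric_only=True ignores the text column.
group_means = toy.groupby("Yes").mean(numeric_only=True)

# Transpose so each variable is a row and each group is a column.
comparison = group_means.T.rename(columns={0: "stayed", 1: "left"})
print(comparison)
```

On the full dataset the same call would be `sample_df.groupby('Yes').mean(numeric_only=True)`.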

**From the two tables above (attr_yes.mean() and attr_no.mean()), we can frame the following hypotheses**

### **Hypothesis 1:** <br>
The mean age of the employees who left the company is less than 36<br>

H0: $\mu \le 36$ <br>
H1: $\mu > 36$
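
One common way to evaluate a hypothesis of this form is a one-sample t-test. The notebook does not state which test is intended, so this is an assumption. With sample mean $\bar{x}$, sample standard deviation $s$, and sample size $n$ for the employees who left ($s$ and $n$ do not appear in the output above), the test statistic for Hypothesis 1 would be:

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{33.607595 - 36}{s / \sqrt{n}}
$$

Because $\bar{x} < 36$, $t$ is negative. A right-tailed test rejects H0 only when $t$ exceeds the positive critical value $t_{\alpha,\,n-1}$, so under this framing H0 cannot be rejected at any conventional significance level.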

### **Hypothesis 2:**<br>

The mean number of years employees have spent with their current manager is less than 3<br><br>
H0: $\mu \le 3$ <br>
H1: $\mu > 3$

### **Hypothesis 3:** <br>

The mean total working years of employees who stayed at the company is more than 10 years<br>

H0: $\mu \ge 10$ <br>
H1: $\mu < 10$

### **Hypothesis 4:** <br>

The mean number of years worked at company XYZ by employees who stayed is more than 5 years<br>

H0: $\mu \ge 5$ <br>
H1: $\mu < 5$

### **Hypothesis 5:** <br>

The mean number of years since the last promotion for employees who left the company is less than 2 years.<br>

H0: $\mu \le 2$ <br>
H1: $\mu > 2$



### **Hypothesis 6:** <br>

The mean monthly income of employees who left the company is 61,000.<br>

H0: $\mu = 61{,}000$ <br>
H1: $\mu \ne 61{,}000$



### **Hypothesis 7:** <br>

The mean number of companies worked at by employees who stayed at the company is less than 2<br>

H0: $\mu \le 2$ <br>
H1: $\mu > 2$

### **Hypothesis 8:** <br>
The mean EmployeeID of employees who stayed at the company is 2200<br>

H0: $\mu = 2200$ <br>
H1: $\mu \ne 2200$



### **Hypothesis 9:** <br>
The mean education level of employees who stayed at company XYZ corresponds to a bachelor's degree (Level = 3)

H0: $\mu = 3$ <br>
H1: $\mu \ne 3$


### **Hypothesis 10:** <br>
The mean number of trainings last year (TrainingTimesLastYear) for employees who left company XYZ is less than 3

H0: $\mu \le 3$ <br>
H1: $\mu > 3$
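
Hypotheses like these could be checked with a one-sample t-test. The sketch below assumes SciPy (version 1.6 or later, for the `alternative` argument) is available. The notebook itself does not import it or name a test, so both the library and the choice of test are assumptions. The data are only the five `attr_yes` rows shown earlier, far too few for a real conclusion. `one_sample_t` is a hypothetical helper and `alpha = 0.05` is an assumed significance level.

```python
from scipy import stats

# Age and MonthlyIncome from the five attr_yes rows shown earlier.
ages_left = [31, 28, 47, 44, 26]
income_left = [41890, 58130, 57620, 103330, 68540]


def one_sample_t(values, mu0, alternative):
    """Hypothetical helper: return (t statistic, p-value) against mu0."""
    result = stats.ttest_1samp(values, popmean=mu0, alternative=alternative)
    return float(result.statistic), float(result.pvalue)


# Hypothesis 1 as written: H0: mu <= 36 vs H1: mu > 36 (right-tailed).
t_age, p_age = one_sample_t(ages_left, 36, "greater")

# Hypothesis 6: H0: mu = 61,000 vs H1: mu != 61,000 (two-sided).
t_inc, p_inc = one_sample_t(income_left, 61000, "two-sided")

alpha = 0.05  # assumed significance level
for name, t, p in [("Age", t_age, p_age), ("MonthlyIncome", t_inc, p_inc)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: t = {t:.3f}, p = {p:.3f} -> {decision}")
```

On the full dataset, the lists would be replaced by columns such as `attr_yes['Age'].dropna()`, and the `alternative` argument would follow the direction of each H1.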
