#### **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
from scipy import stats

#### **Import Data**

In [None]:
data = pd.read_csv('FILE PATH')
data.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,...,1,80,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,...,4,80,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,...,2,80,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,...,3,80,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,...,4,80,1,6,3,3,2,2,2,2


### **Understand Data**

Next, we check whether the dataset contains any null values:

In [None]:
data.isnull().sum()

Unnamed: 0,0
Age,0
Attrition,0
BusinessTravel,0
DailyRate,0
Department,0
DistanceFromHome,0
Education,0
EducationField,0
EmployeeCount,0
EmployeeNumber,0
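With many columns, the per-column listing gets long. A minimal sketch of how the same check could be condensed, using a small hypothetical frame (`toy` and its values are made up for illustration and are not the attrition data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `data`; the values are invented.
toy = pd.DataFrame({
    "Age": [41, 49, np.nan],
    "Attrition": ["Yes", "No", "Yes"],
    "DailyRate": [1102, np.nan, np.nan],
})

null_counts = toy.isnull().sum()                 # missing values per column
total_nulls = int(null_counts.sum())             # missing values in the whole frame
cols_with_nulls = null_counts[null_counts > 0]   # keep only columns that have gaps

print(total_nulls)
print(cols_with_nulls)
```

Applied to `data`, a total of zero would confirm in one number that no column has missing values.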


Then, for some columns, we can check the distinct values and how often each one occurs:

In [None]:
data['Attrition'].value_counts()

Attrition,count
No,1233
Yes,237
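Raw counts can be easier to read as proportions. The sketch below rebuilds an `Attrition`-like column from the two counts printed above (the `attrition` series is a stand-in, not the original column) and uses `normalize=True` to get relative frequencies:

```python
import numpy as np
import pandas as pd

# Stand-in column rebuilt from the printed counts: 1233 "No" and 237 "Yes".
attrition = pd.Series(np.repeat(["No", "Yes"], [1233, 237]), name="Attrition")

# normalize=True returns relative frequencies instead of raw counts.
shares = attrition.value_counts(normalize=True)
print(shares.round(3))
```

On the real data, `data['Attrition'].value_counts(normalize=True)` should give the same shares, which show that the two classes are imbalanced.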


In [None]:
data['JobSatisfaction'].value_counts()

JobSatisfaction,count
4,459
3,442
1,289
2,280




---



### **Question**

The airport manager wants to know whether there is a relationship between `Attrition` and `JobSatisfaction`.

So, as a first step, we create a contingency table (cross-tabulation):

In [None]:
CrossTab = pd.crosstab(data['Attrition'], data['JobSatisfaction'], margins=True)
CrossTab

Attrition \ JobSatisfaction,1,2,3,4,All
No,223,234,369,407,1233
Yes,66,46,73,52,237
All,289,280,442,459,1470
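Because the satisfaction levels have different group sizes, the counts are easier to compare as within-level shares. A minimal sketch, with the cell counts copied from the table above into a hypothetical `counts` frame:

```python
import pandas as pd

# Cell counts copied from the contingency table above (without the "All" margins).
counts = pd.DataFrame(
    [[223, 234, 369, 407],
     [66, 46, 73, 52]],
    index=pd.Index(["No", "Yes"], name="Attrition"),
    columns=pd.Index([1, 2, 3, 4], name="JobSatisfaction"),
)

# Divide each column by its total: share of "No"/"Yes" within each satisfaction level.
col_shares = counts / counts.sum(axis=0)
attrition_rate = col_shares.loc["Yes"]
print(attrition_rate.round(3))
```

On the raw data, `pd.crosstab(data['Attrition'], data['JobSatisfaction'], normalize='columns')` should produce the same shares directly.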


For example, this table shows that 52 employees have `Attrition` = `Yes` and a `JobSatisfaction` level of 4.

Next, we only need the cell counts, not the `All` row and column, so we slice them off:

In [None]:
# Drop the last row and last column (the "All" margins) and keep the raw counts
Value_CrossTab = CrossTab.iloc[:-1, :-1].to_numpy()
Value_CrossTab

array([[223, 234, 369, 407],
       [ 66,  46,  73,  52]])

### **Assess Hypothesis**

We have two hypotheses:

$H_0$: *The variables are independent (there is NO relationship)*

$H_1$: *The variables are NOT independent (there is a relationship)*

Finally, we apply the chi-square test of independence to check for a relationship between the categorical variable `Attrition` and the ordinal variable `JobSatisfaction`, which the test treats as categorical.

In [None]:
stats.chi2_contingency(Value_CrossTab)

Chi2ContingencyResult(statistic=17.505077010348, pvalue=0.0005563004510387556, dof=3, expected_freq=array([[242.40612245, 234.85714286, 370.73877551, 384.99795918],
       [ 46.59387755,  45.14285714,  71.26122449,  74.00204082]]))

Because `p ≈ 0.00056 < 0.05`, we reject $H_0$: the data provide evidence of a relationship between `Attrition` and `JobSatisfaction`.
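The numbers reported by `chi2_contingency` can be reproduced by hand, which makes it clearer what the test measures. The sketch below recomputes the expected frequencies, the statistic, and the p-value from the observed counts; the variable names (`observed`, `expected`, `chi2_manual`, `p_manual`) are illustrative only.

```python
import numpy as np
from scipy import stats

# Observed counts from the contingency table (rows: Attrition No/Yes, columns: levels 1-4).
observed = np.array([[223, 234, 369, 407],
                     [66, 46, 73, 52]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)
n = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected = row_totals * col_totals / n

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution with `dof` degrees of freedom.
p_manual = stats.chi2.sf(chi2_manual, dof)

print(chi2_manual, dof, p_manual)
```

The match with SciPy is exact here because `chi2_contingency` only applies its default continuity correction when there is one degree of freedom, and this table has three.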