## Notebook for hypothesis testing

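H0: A given factor has no significant association with employee resignation.<br>
H1: The factor is significantly associated with employee resignation.

Each factor is tested separately against `Is_resigned` at a significance level of α = 0.05.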
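Stated per test, the hypotheses take a slightly different form depending on the factor type. Writing $R$ for `Is_resigned`, a categorical factor $X$ is tested for independence from $R$ (chi-square test), and a numeric factor $Y$ is tested for equal means in the active and resigned groups (one-way ANOVA):

$$
\text{Categorical } X:\quad H_0: X \perp R \qquad H_1: X \not\perp R
$$

$$
\text{Numeric } Y:\quad H_0: \mu_{Y \mid R=0} = \mu_{Y \mid R=1} \qquad H_1: \mu_{Y \mid R=0} \neq \mu_{Y \mid R=1}
$$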

In [3]:
import pandas as pd

df = pd.read_csv("employee_preprocess_08.csv")

df.head()

| Unnamed: 0 | Employee_No | Employee_Code | Title | Gender | Marital_Status | Date_Joined | Date_Resigned | Status | Inactive_Date | Employment_Category | Employment_Type | Religion | Designation | Year_of_Birth | Is_resigned | Basic Salary | Net Salary | Attendance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 347 | 6 | Mr | Male | Married | 1993-12-08 |  | Active |  | Staff | Permanant | Buddhist | Driver | 1965 | 0 | 28500.0 | 34218.52 | 473 |
| 1 | 348 | 33 | Mr | Male | Married | 1995-03-14 |  | Active |  | Staff | Permanant | Buddhist | Driver | 1973 | 0 | 27200.0 | 59098.18 | 618 |
| 2 | 349 | 53 | Mr | Male | Married | 1988-01-27 | 2021-06-28 | Inactive | 2021-06-28 | Staff | Permanant | Buddhist | Account Clerk | 1974 | 1 | 14400.0 | 31618.54 | 69 |
| 3 | 351 | 77 | Ms | Female | Married | 1999-10-01 | 2022-01-31 | Inactive | 2022-01-31 | Staff | Permanant | Catholic | Purchasing Officer | 1974 | 1 | 38600.0 | 45578.64 | 140 |
| 4 | 352 | 88 | Mr | Male | Married | 2001-01-26 |  | Active |  | Staff | Permanant | Buddhist | Store Keeper | 1980 | 0 | 25350.0 | 33220.82 | 560 |


In [15]:
from scipy.stats import chi2_contingency

categorical_columns = ['Gender', 'Marital_Status', 'Employment_Category', 'Employment_Type', 'Religion', 'Designation']
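for column in categorical_columns:
    # Cross-tabulate the factor's levels against resignation status (0 = active, 1 = resigned)
    contingency_table = pd.crosstab(df[column], df['Is_resigned'])
    
    # Perform chi-square test of independence
    # (SciPy applies Yates' continuity correction only when dof == 1, i.e. for 2x2 tables)
    chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)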
    
    # Print results
    print(f"Chi-square test results for {column}:")
    print("Chi-square statistic:", chi2_stat)
    print("p-value:", p_val)
    print()

Chi-square test results for Gender:
Chi-square statistic: 0.20794056771626263
p-value: 0.648386376671389

Chi-square test results for Marital_Status:
Chi-square statistic: 1.4859072313181287
p-value: 0.22285256340521536

Chi-square test results for Employment_Category:
Chi-square statistic: 40.1279521388897
p-value: 1.933418669876751e-09

Chi-square test results for Employment_Type:
Chi-square statistic: 2.310741281079861
p-value: 0.1284827639796097

Chi-square test results for Religion:
Chi-square statistic: 9.843058181653872
p-value: 0.019948406511455206

Chi-square test results for Designation:
Chi-square statistic: 187.03691360497564
p-value: 0.0005208002949986373



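At α = 0.05, three factors have p-values below the threshold: Employment_Category (p ≈ 1.9e-09), Designation (p ≈ 0.0005), and Religion (p ≈ 0.02). <br>
Therefore we have enough evidence to reject the null hypothesis for these three factors: each is significantly associated with employee resignation. <br>
For Gender, Marital_Status, and Employment_Type (p > 0.05) we fail to reject the null hypothesis. <br>
Note that a significant chi-square result indicates association, not causation, and the test is less reliable when many expected cell counts are small (below about 5), which may be the case for a factor with many levels such as Designation.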
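To make the statistic behind these p-values concrete, here is a minimal sketch that computes Pearson's chi-square by hand from a contingency table and checks it against `scipy.stats.chi2_contingency`. The data frame `toy` is hypothetical and only illustrates the arithmetic; it is not drawn from `employee_preprocess_08.csv`. The sketch also reports the share of expected counts below 5, a common rule-of-thumb check on whether the chi-square approximation is trustworthy.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2, chi2_contingency

# Hypothetical toy data (NOT from the employee dataset): one categorical factor vs. Is_resigned
toy = pd.DataFrame({
    "Factor": ["A"] * 30 + ["B"] * 30 + ["C"] * 30,
    "Is_resigned": [1] * 5 + [0] * 25 + [1] * 12 + [0] * 18 + [1] * 20 + [0] * 10,
})

# Observed counts: rows = factor levels, columns = Is_resigned (0, 1)
observed = pd.crosstab(toy["Factor"], toy["Is_resigned"]).to_numpy()

# Expected counts under H0 (independence): E_ij = row_total_i * col_total_j / n
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected = row_totals * col_totals / n

# Pearson chi-square statistic, degrees of freedom, and upper-tail p-value
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)

# Same test via SciPy, with the continuity correction explicitly disabled
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed, correction=False)

# Rule of thumb: the approximation is questionable if many expected counts are below 5
share_small = (expected < 5).mean()

print(f"chi2 = {stat:.4f}, dof = {dof}, p = {p_value:.4g}")
print(f"Matches SciPy: {np.isclose(stat, ref_stat) and np.isclose(p_value, ref_p)}")
print(f"Share of expected counts below 5: {share_small:.0%}")
```

Applying the same `expected < 5` check to the `expected` array returned for `pd.crosstab(df['Designation'], df['Is_resigned'])` would show how sparse that table actually is.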

In [14]:
from scipy.stats import f_oneway

df_copy = df.copy()
df_copy['Join_Year'] = pd.to_datetime(df_copy['Date_Joined']).dt.year

numerical_columns = ['Net Salary', 'Attendance', 'Join_Year', 'Year_of_Birth']
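for column in numerical_columns:
    # Split the column into active (Is_resigned == 0) and resigned (Is_resigned == 1) groups
    active = df_copy.loc[df_copy['Is_resigned'] == 0, column]
    resigned = df_copy.loc[df_copy['Is_resigned'] == 1, column]

    # Perform one-way ANOVA comparing the two groups
    f_stat, p_val = f_oneway(active, resigned)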
    
    # Print results
    print(f"One-way ANOVA results for {column}:")
    print("F-statistic:", f_stat)
    print("p-value:", p_val)
    print()

One-way ANOVA results for Net Salary:
F-statistic: 12.35082898833844
p-value: 0.0004606332619948499

One-way ANOVA results for Attendance:
F-statistic: 125.55237631928553
p-value: 1.5824113770828567e-27

One-way ANOVA results for Join_Year:
F-statistic: 39.65750353515916
p-value: 4.5342742736806017e-10

One-way ANOVA results for Year_of_Birth:
F-statistic: 1.5805726311506618
p-value: 0.20897324664807276
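
At α = 0.05, Net Salary (p ≈ 0.0005), Attendance (p ≈ 1.6e-27), and Join_Year (p ≈ 4.5e-10) differ significantly in mean between resigned and active employees, so we reject the null hypothesis for these factors. <br>
Year_of_Birth (p ≈ 0.21) shows no significant difference, so we fail to reject the null hypothesis for it.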
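Because each ANOVA above compares only two groups, it is equivalent to a pooled-variance (Student's) two-sample t-test: the F-statistic equals the squared t-statistic and the p-values coincide. The sketch below checks this on hypothetical synthetic data standing in for one numeric column; if the two groups have clearly different variances, Welch's test (`ttest_ind(..., equal_var=False)`) may be a more suitable choice.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Hypothetical synthetic data standing in for one numeric column split by Is_resigned
rng = np.random.default_rng(0)
active = rng.normal(loc=500, scale=120, size=200)    # stands in for Is_resigned == 0
resigned = rng.normal(loc=420, scale=120, size=80)   # stands in for Is_resigned == 1

# One-way ANOVA on the two groups
f_stat, p_anova = f_oneway(active, resigned)

# Pooled-variance two-sample t-test on the same groups
t_stat, p_ttest = ttest_ind(active, resigned, equal_var=True)

# With exactly two groups, F equals t squared and the two p-values coincide
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print(f"ANOVA p = {p_anova:.4g}, t-test p = {p_ttest:.4g}")
```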

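Ten tests are run on the same data, so the chance of at least one false rejection at α = 0.05 is higher than 5%. A minimal sketch of the Bonferroni and Holm corrections applied to the p-values reported in the outputs above is shown here; the `holm_bonferroni` helper is a hypothetical illustration written by hand (libraries such as statsmodels provide `multipletests` for the same purpose).

```python
# p-values copied from the chi-square and ANOVA outputs (10 tests in total)
p_values = {
    "Gender": 0.648386376671389,
    "Marital_Status": 0.22285256340521536,
    "Employment_Category": 1.933418669876751e-09,
    "Employment_Type": 0.1284827639796097,
    "Religion": 0.019948406511455206,
    "Designation": 0.0005208002949986373,
    "Net Salary": 0.0004606332619948499,
    "Attendance": 1.5824113770828567e-27,
    "Join_Year": 4.5342742736806017e-10,
    "Year_of_Birth": 0.20897324664807276,
}


def holm_bonferroni(pvals, alpha=0.05):
    """Hypothetical helper: return the factors whose H0 is rejected by Holm's step-down procedure."""
    ordered = sorted(pvals.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = set()
    for rank, (name, p) in enumerate(ordered):
        # Compare the (rank+1)-th smallest p-value against alpha / (m - rank)
        if p <= alpha / (m - rank):
            rejected.add(name)
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return rejected


alpha = 0.05
uncorrected = {name for name, p in p_values.items() if p < alpha}
bonferroni = {name for name, p in p_values.items() if p <= alpha / len(p_values)}
holm = holm_bonferroni(p_values, alpha)

print("Uncorrected:", sorted(uncorrected))
print("Bonferroni:", sorted(bonferroni))
print("Holm:", sorted(holm))
```

With these p-values, both corrections keep Employment_Category, Designation, Net Salary, Attendance, and Join_Year, while Religion (p ≈ 0.02) no longer clears the adjusted threshold.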