# HumanForYou Project: Predicting Employee Attrition

This notebook presents the full Machine Learning pipeline for predicting which employees are likely to leave the company.
We will:
1. Load the cleaned and normalized data.
2. Analyse which factors are associated with leaving or staying.
3. Train 9 different models (including XGBoost and a neural network).
4. Compare their performance (recall, F1-score).
5. Visualize the errors and the importance of the variables.
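Steps 3 and 4 amount to fitting each model on the same split and scoring it on the held-out set with the metrics imported in the setup cell. Below is a minimal sketch of that comparison loop. It assumes a dict of unfitted scikit-learn-compatible classifiers. The helper name `compare_models` is hypothetical and does not come from the original notebook. Results are sorted by recall because missing a leaver is the costly error here.

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return its test-set metrics, best recall first.

    Hypothetical helper: `models` maps a display name to an unfitted
    scikit-learn-style classifier.
    """
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # ROC-AUC needs a continuous score: use probabilities when the model
        # exposes them, otherwise fall back to decision_function
        # (e.g. Perceptron, or SVC without probability=True).
        if hasattr(model, "predict_proba"):
            scores = model.predict_proba(X_test)[:, 1]
        else:
            scores = model.decision_function(X_test)
        rows.append({
            "model": name,
            "recall": recall_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
            "roc_auc": roc_auc_score(y_test, scores),
            "accuracy": accuracy_score(y_test, y_pred),
        })
    return (pd.DataFrame(rows)
            .sort_values("recall", ascending=False)
            .reset_index(drop=True))
```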

Setup (imports and configuration)

In [28]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
import warnings

# Machine Learning imports
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, accuracy_score, confusion_matrix

# Configuration
warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid")

## 1. Loading and Exploratory Analysis
We start by loading the CSV file and computing each variable's correlation with `Attrition`. Positive correlations point to variables associated with leaving, and negative ones to variables associated with staying (retention).

In [36]:
def charger_donnees(chemin):
    """Load the CSV file; return None if it cannot be found."""
    print(f" Loading file: {chemin}...")
    try:
        df = pd.read_csv(chemin)
    except FileNotFoundError:
        print(f" Error: the file '{chemin}' was not found.")
        return None
    # Drop the row-index column written by an earlier to_csv() call, if present
    df = df.drop(columns=['Unnamed: 0'], errors='ignore')
    print(f" Data loaded: {df.shape[0]} rows, {df.shape[1]} columns.")
    return df

def analyser_facteurs_influents(df):
    """
    Plot the variables most correlated with 'Attrition':
    - Positive (red) = associated with leaving
    - Negative (green) = associated with staying
    Correlation measures association, not causation.
    """
    print("\n Analysing influence factors (correlation)...")
    
    # Correlation of every numeric column with 'Attrition' (index column excluded)
    corr = df.corr(numeric_only=True)['Attrition'].sort_values(ascending=False)
    corr = corr.drop(['Attrition', 'Unnamed: 0'], errors='ignore')
    
    # Top 10 positive and top 10 negative correlations
    top_corr = pd.concat([corr.head(10), corr.tail(10)])
    
    # Plot: red bar if positive, green if negative (hue + legend=False needs seaborn >= 0.13)
    plt.figure(figsize=(12, 6))
    colors = ['red' if x > 0 else 'green' for x in top_corr.values]
    sns.barplot(x=top_corr.values, y=top_corr.index, hue=top_corr.index,
                palette=colors, legend=False)
    plt.title("Influence factors: red = associated with leaving | green = associated with staying")
    plt.xlabel("Correlation with Attrition")
    plt.axvline(x=0, color='black', linestyle='--')
    plt.show()
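The ranking step inside `analyser_facteurs_influents` can be checked without drawing the plot. Below is a minimal sketch. It assumes a numeric `Attrition` column, and the helper name `top_correlations` and its `exclude` argument are hypothetical. The helper ignores a saved index column such as `Unnamed: 0` before correlating. It also removes duplicates when there are fewer than `2 * k` features, because in that case `head(k)` and `tail(k)` overlap.

```python
import pandas as pd


def top_correlations(df, target="Attrition", k=10, exclude=("Unnamed: 0",)):
    """Return the k most positive and k most negative correlations with `target`.

    Hypothetical helper mirroring the ranking step of analyser_facteurs_influents,
    without the plot. Columns listed in `exclude` (e.g. a saved index) are ignored.
    """
    numeric = df.drop(columns=list(exclude), errors="ignore").select_dtypes("number")
    corr = numeric.corr()[target].drop(target).sort_values(ascending=False)
    top = pd.concat([corr.head(k), corr.tail(k)])
    # With fewer than 2*k features, head and tail overlap: keep each feature once.
    return top[~top.index.duplicated()]
```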

XGBoost

In [30]:
# Load the CSV file (expected in the data/ subfolder next to the notebook)
df = pd.read_csv('data/processed_hr_data_encoded_normalized.csv')

# Check that the data loaded correctly
print("Dataset shape:", df.shape)
df.head()

Dataset shape: (4410, 45)


| Unnamed: 0 | Age | Attrition | BusinessTravel | DistanceFromHome | Education | JobLevel | MonthlyIncome | NumCompaniesWorked | PercentSalaryHike | StockOptionLevel | ... | JobRole_Laboratory Technician | JobRole_Manager | JobRole_Manufacturing Director | JobRole_Research Director | JobRole_Research Scientist | JobRole_Sales Executive | JobRole_Sales Representative | MaritalStatus_Divorced | MaritalStatus_Married | MaritalStatus_Single |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.785714 | 0.0 | 0.5 | 0.178571 | 0.25 | 0.0 | 0.637546 | 0.111111 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 0.309524 | 1.0 | 1.0 | 0.321429 | 0.0 | 0.0 | 0.167457 | 0.0 | 0.857143 | 0.333333 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 0.333333 | 0.0 | 1.0 | 0.571429 | 0.75 | 0.75 | 0.964666 | 0.111111 | 0.285714 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.47619 | 0.0 | 0.0 | 0.035714 | 1.0 | 0.5 | 0.385045 | 0.333333 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0.333333 | 0.0 | 0.5 | 0.321429 | 0.0 | 0.0 | 0.070195 | 0.444444 | 0.071429 | 0.666667 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |

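Before training XGBoost or any of the other models, the features have to be separated from the target and split into training and test sets. The sketch below is minimal and rests on two assumptions. First, `Attrition` is the 0/1 target. Second, `Unnamed: 0` is a row index saved by an earlier `to_csv` call rather than a real feature. The helper name `split_features_target`, the 20% test size and the seed are illustrative choices. Stratifying keeps the same share of leavers in both sets, which matters because recall is the headline metric. The helper also returns the negative-to-positive ratio of the training labels. The XGBoost documentation suggests this ratio as a typical starting value for the `scale_pos_weight` parameter.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target="Attrition", test_size=0.2, seed=42):
    """Drop the saved index column, separate X/y and make a stratified split.

    Hypothetical helper; returns (X_train, X_test, y_train, y_test, neg_pos_ratio).
    """
    data = df.drop(columns=["Unnamed: 0"], errors="ignore")
    X = data.drop(columns=[target])
    y = data[target].astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Stayers / leavers in the training set, e.g. for
    # XGBClassifier(scale_pos_weight=neg_pos_ratio).
    neg_pos_ratio = (y_train == 0).sum() / (y_train == 1).sum()
    return X_train, X_test, y_train, y_test, neg_pos_ratio
```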

KNN
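KNN classifies an employee by majority vote among the k nearest employees in feature space, so its distances depend on feature scaling. Judging from the filename and the preview above, the CSV appears to be min-max scaled to [0, 1]. Below is a minimal sketch of choosing k by cross-validated recall. It assumes `X_train`/`y_train` come from a stratified split. The helper name `tune_knn`, the candidate k values and the 5-fold setup are illustrative and are not taken from the notebook.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier


def tune_knn(X_train, y_train, k_values=(3, 5, 7, 9, 11), seed=42):
    """Return a KNN refitted with the k that maximises mean cross-validated recall."""
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": list(k_values)},
        scoring="recall",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
    )
    # refit=True (the default) retrains the best k on the whole training set
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_["n_neighbors"]
```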

Perceptron
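The Perceptron is a linear classifier trained by error-driven weight updates. scikit-learn's `Perceptron` produces no probabilities, so ROC-AUC has to be computed from `decision_function` scores, which are signed distances to the separating hyperplane. Below is a minimal sketch. It uses `class_weight="balanced"` to counter the minority of leavers, which is an illustrative choice and not necessarily the notebook's setting. The helper name `fit_perceptron` is hypothetical.

```python
from sklearn.linear_model import Perceptron
from sklearn.metrics import recall_score, roc_auc_score


def fit_perceptron(X_train, y_train, X_test, y_test, seed=42):
    """Fit a class-balanced Perceptron and return it with test recall and ROC-AUC."""
    model = Perceptron(class_weight="balanced", random_state=seed)
    model.fit(X_train, y_train)
    # No probabilities: rank employees by their signed distance to the hyperplane.
    scores = model.decision_function(X_test)
    recall = recall_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, scores)
    return model, recall, auc
```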

Random Forest
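A Random Forest averages many decision trees, each grown on a bootstrap sample with random feature subsets. It exposes impurity-based `feature_importances_`, which serve step 5 (importance of the variables). Below is a minimal sketch. It assumes `X_train` is a pandas DataFrame so that column names are preserved. `n_estimators=300`, `class_weight="balanced"` and the helper name `rf_feature_importance` are illustrative choices. Impurity importances can favour continuous, high-cardinality features such as `MonthlyIncome`, so `sklearn.inspection.permutation_importance` is a common cross-check.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rf_feature_importance(X_train, y_train, top=15, seed=42):
    """Fit a Random Forest and return it with its `top` most important features."""
    model = RandomForestClassifier(
        n_estimators=300, class_weight="balanced", random_state=seed
    )
    model.fit(X_train, y_train)
    # Impurity-based importances, one per column, sorted from most to least important
    importances = pd.Series(model.feature_importances_, index=X_train.columns)
    return model, importances.sort_values(ascending=False).head(top)
```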