# EMPLOYEE LIKELINESS TO LEAVE - FORECASTS

## EARLY-STAGES MODEL DEPLOYMENT 

This file is an early-stage deployment of the model developed in the Salifort employee turnover analysis.  

It lets the user choose which model to use (among those implemented) and enter an employee's features. It then returns that employee's probability of leaving and whether they are likely to leave.  
  
The champion model is **stacking_model.pickle**, so it is set as the default. To keep the deployment robust to future development, other models are accepted as predictors too. The available models are:  
  
- random forest: rf_model.pickle
- xgboosting: xgb_model.pickle
- random xgboosting: xgb_random.pickle
- stacking model: stacking_model.pickle (current champion model)
- Support Vector Machines (SVM): svm_model.pickle
- Neural Network: nn_model.h5
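
The list above could also be kept as a small lookup table, so that a model is selected by a short name instead of a typed file name. This is a minimal sketch, not part of this file: `MODEL_FILES`, `resolve_model_file`, and the short keys are hypothetical names chosen for illustration.

```python
# Hypothetical lookup table built from the list of available models.
MODEL_FILES = {
    "rf": "rf_model.pickle",
    "xgb": "xgb_model.pickle",
    "xgb_random": "xgb_random.pickle",
    "stacking": "stacking_model.pickle",  # champion model
    "svm": "svm_model.pickle",
    "nn": "nn_model.h5",
}


def resolve_model_file(short_name="stacking"):
    """Return the file name for a short model name, or raise on unknown names."""
    try:
        return MODEL_FILES[short_name.lower()]
    except KeyError:
        valid = ", ".join(sorted(MODEL_FILES))
        raise ValueError(f"Unknown model '{short_name}'. Choose one of: {valid}") from None


print(resolve_model_file())      # stacking_model.pickle
print(resolve_model_file("NN"))  # nn_model.h5
```

An unknown name raises a `ValueError` that lists the valid keys, which surfaces a typo before any file is opened.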


### ---------------- SETTING UP THE ENVIRONMENT ----------------

Load and import the required packages.

In [11]:
#import the needed packages 
#For data manipulation
import numpy as np
import pandas as pd
import os

# For data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

#for data encoding
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

# For data modeling
from xgboost import XGBClassifier, XGBRegressor, plot_importance, plot_tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Lasso, LassoCV
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC #SVM models

#for Neural Network
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Input
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.callbacks import EarlyStopping
import keras_tuner as kt
from tensorflow.keras.metrics import AUC, Precision, Recall
from tensorflow.keras.models import load_model as keras_load_model

#for PCA
from sklearn.decomposition import PCA

# For metrics
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import classification_report, accuracy_score, precision_score, \
                            recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay, \
                            roc_auc_score, roc_curve
from sklearn.tree import plot_tree
import shap
from sklearn.inspection import permutation_importance
from sklearn.metrics import precision_recall_curve #precision-recall tradeoff

# for standardization
from sklearn.preprocessing import StandardScaler

# For saving models
import pickle

Define the function that loads the selected model (i.e., the one used for predictions).

In [12]:
#load the  model
def load_model(modelname = "stacking_model.pickle"):
    path = "/Users/lucaalbertini/Desktop/MLCourse/ML/Salifort_CaseStudy/models/"
    full_path = os.path.join(path, modelname)

    if modelname.endswith(('.h5', '.keras')):
        model = keras_load_model(full_path)
    elif modelname.endswith(('.pkl', '.pickle')):
        with open(full_path, 'rb') as file:
            model = pickle.load(file)
    else:
        raise ValueError("Unsupported model file format. Use .pkl, .pickle, .h5, or .keras")

    return model
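
The function above reads from a fixed directory. A possible variant takes the directory as a parameter, which makes the pickle branch easy to try on any machine. The sketch below assumes a hypothetical helper named `load_pickled_model`, handles only `.pkl` and `.pickle` files (the Keras branch is left out), and uses a stand-in dictionary in place of a fitted model because the real model files are not available here.

```python
import os
import pickle
import tempfile


def load_pickled_model(models_dir, modelname="stacking_model.pickle"):
    """Load a pickled model from models_dir; other extensions are rejected."""
    if not modelname.endswith((".pkl", ".pickle")):
        raise ValueError("This sketch only handles .pkl and .pickle files")
    full_path = os.path.join(models_dir, modelname)
    with open(full_path, "rb") as file:
        return pickle.load(file)


# Round trip with a stand-in object written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    stand_in = {"name": "stand-in for a fitted model"}
    with open(os.path.join(tmp_dir, "stacking_model.pickle"), "wb") as file:
        pickle.dump(stand_in, file)
    loaded = load_pickled_model(tmp_dir)

print(loaded == stand_in)  # True
```

Because `pickle.load` can execute arbitrary code while deserializing, model files should only be loaded from a trusted location.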


Define a function that takes an employee's features as input and returns their probability of leaving and whether they are likely to leave the company.

In [13]:
#function to check employee quitting probability
def employee_quitting_probability(model, threshold=0.5):
    # Input the variables
    satisfaction_level = float(input("Satisfaction level (0-1): "))
    last_evaluation = float(input("Last evaluation (0-1): "))
    number_project = int(input("Number of projects: "))
    average_monthly_hours = float(input("Average Monthly Working Hours: "))
    time_spend_company = int(input("Years spent at the company: "))
    Work_accident = bool(int(input("Did they have a work accident? (0 = no; 1 = yes): ")))
    promotion_last_5years = bool(int(input("Were they promoted in the last 5 years? (0 = no; 1 = yes): ")))
    department = input("Input employee department: ").lower()
    salary = input("Input employee's salary (low, medium, high): ").lower()

    # Map salary
    salary_mapping = {"low": 0, "medium": 1, "high": 2}
    salary_encoded = salary_mapping.get(salary, 1)  # default to medium if typo

    # Encode department
    grouped_dept_Tech = int(department in ["technical", "support", "it", "product_mng"])
    grouped_dept_admin = int(department in ["hr", "accounting", "management"])

    # Compute features
    satisfaction_gap = last_evaluation - satisfaction_level
    work_intensity = average_monthly_hours / number_project
    overworking = int(average_monthly_hours > 174)

    # Create DataFrame
    new_employee = pd.DataFrame([{
        "time_spend_company": time_spend_company,
        "work_accident": Work_accident,
        "promotion_last_5years": promotion_last_5years,
        "salary": salary_encoded,
        "grouped_dept_Tech": grouped_dept_Tech,
        "grouped_dept_admin": grouped_dept_admin,
        "work_intensity": work_intensity,
        "satisfaction_gap": satisfaction_gap,
        "overworking": overworking,
    }])

    # Predict probabilities
    if hasattr(model, "predict_proba"):
        prob = model.predict_proba(new_employee)[:, 1][0]
    elif "keras" in str(type(model)).lower():
        prob = model.predict(new_employee).flatten()[0]
    else:
        prob = model.predict(new_employee)[0]

    # Determine outcome
    if isinstance(prob, (int, float, np.integer, np.floating)):  # np.integer covers class labels from predict()
        outcome = "Leave" if prob >= threshold else "Stay"
    else:
        outcome = "Unknown"

    # Output results
    print("\nEmployee's features:")
    print(new_employee.to_string(index=False))
    print("\nProbability of resignation/being fired:")
    print(f"{prob:.2%}")
    print("\nThe employee is likely to:")
    print(outcome)
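
The feature engineering inside the function above can be checked without typing values at a prompt. The sketch below mirrors the same encoding rules in a plain function that returns a dictionary; `build_features` is a hypothetical name and the input values are illustrative, not taken from the Salifort data.

```python
def build_features(satisfaction_level, last_evaluation, number_project,
                   average_monthly_hours, time_spend_company, work_accident,
                   promotion_last_5years, department, salary):
    """Return the model features as a dict, using the same rules as above."""
    salary_mapping = {"low": 0, "medium": 1, "high": 2}
    department = department.lower()
    return {
        "time_spend_company": time_spend_company,
        "work_accident": bool(work_accident),
        "promotion_last_5years": bool(promotion_last_5years),
        # Unknown salary labels fall back to medium (1), as in the function above.
        "salary": salary_mapping.get(salary.lower(), 1),
        "grouped_dept_Tech": int(department in ["technical", "support", "it", "product_mng"]),
        "grouped_dept_admin": int(department in ["hr", "accounting", "management"]),
        "work_intensity": average_monthly_hours / number_project,
        "satisfaction_gap": last_evaluation - satisfaction_level,
        "overworking": int(average_monthly_hours > 174),
    }


# Illustrative employee: 200 monthly hours spread over 2 projects.
features = build_features(0.7, 0.5, 2, 200, 4, 1, 1, "IT", "medium")
print(features["work_intensity"])  # 100.0
print(features["overworking"])     # 1
```

As in the original function, passing `number_project=0` raises a `ZeroDivisionError`, and a misspelled salary is silently treated as medium, so both inputs are worth validating before prediction.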

### ---------------------------------------------------------------

### -------------- PREDICT LIKELINESS TO LEAVE ----------------

Run the code cell below and you will be prompted for the employee's features. Once they are entered, it returns that employee's probability of leaving along with whether they are likely to resign.  
  
Note: to change the model (the default is the stacking model), pass the model file name (in quotes) to the load_model function.

In [15]:
#load the model
model = load_model()
#compute employee likeliness to quit
employee_quitting_probability(model, threshold=0.3619)


Employee's features:
 time_spend_company  work_accident  promotion_last_5years  salary  grouped_dept_Tech  grouped_dept_admin  work_intensity  satisfaction_gap  overworking
                  4           True                   True       1                  1                   0           100.0              -0.2            1

Probability of resignation/being fired:
1.64%

The employee is likely to:
Stay
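
The call above passes `threshold=0.3619` rather than the default of 0.5, so employees with a predicted probability between roughly 36% and 50% are also flagged as likely to leave. How that value was chosen is not shown in this file. The sketch below isolates the decision rule; `classify` is a hypothetical helper and 0.40 is an illustrative probability, not a model output.

```python
def classify(prob, threshold=0.5):
    """Label a probability of leaving with the same rule as the function above."""
    return "Leave" if prob >= threshold else "Stay"


print(classify(0.0164, threshold=0.3619))  # Stay
print(classify(0.40, threshold=0.3619))    # Leave
print(classify(0.40))                      # Stay
```

Lowering the threshold flags more employees as likely to leave, which generally catches more true leavers at the cost of more false alarms.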
