GitHub - AlejandroNap/EmployeeAttritionML: WGU Capstone Project

Western Governors University – Computer Science Capstone

Author: Alejandro Napoles
ID: 011244229
Project Type: Predictive Data Product (Machine Learning + Dashboard)
Dataset: IBM HR Analytics Employee Attrition Dataset

This project demonstrates a fully functional data product that predicts the likelihood of employee attrition (whether an employee is likely to leave the company). It was developed as part of the WGU Capstone requirements for Part C.

The project consists of two main components:

HR_Attrition_Analysis.ipynb (Jupyter Notebook)
- Handles data cleaning, feature engineering, model training, and evaluation.
- Trains a Logistic Regression model using IBM's Employee Attrition dataset.
- Produces a reusable predictive model and scaler (.pkl files).
- Includes three visualization types (bar, heatmap, scatter).

app.py (Streamlit Web Application)
- Loads the pretrained model created in the notebook (no retraining).
- Provides an interactive dashboard for exploring attrition risks.
- Allows users to input employee details and get a live prediction.

Installation and Setup

Download the Project:

git clone https://github.com/AlejandroNap/EmployeeAttritionML.git
cd EmployeeAttritionML

Create a Virtual Environment:

py -m venv .venv

.\.venv\Scripts\activate

Install Dependencies:

pip install -r requirements.txt

Run the Application:

streamlit run app.py

The Streamlit app will open in your default web browser at: http://localhost:8501

Streamlit App Overview

Page 1 – Predict Single Employee
- Enter attributes such as Department, Job Role, Overtime, Job Satisfaction, Age, etc.
- The app loads the trained model and scaler to predict attrition risk.
- Displays:
  - Predicted risk score (0–1)
  - Predicted label (Likely to Stay / Likely to Leave)
  - Suggested HR action based on risk factors
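The single-employee prediction on Page 1 could look roughly like the sketch below. It is not taken from app.py: the tiny training table, the 0.5 threshold, and the helper name predict_employee are assumptions made so the example runs without the .pkl files. The point it illustrates is that a one-row input has to be one-hot encoded and then aligned to the training columns before scaling.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical training set so the sketch runs without the saved artifacts.
train = pd.DataFrame({
    "Age": [25, 32, 41, 29, 50, 38, 27, 45],
    "JobSatisfaction": [1, 4, 3, 1, 4, 2, 1, 3],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No"],
    "Attrition": [1, 0, 0, 1, 0, 0, 1, 0],
})
X_train = pd.get_dummies(train.drop(columns="Attrition"), drop_first=True).astype(float)
feature_columns = list(X_train.columns)  # column order the model expects
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), train["Attrition"])


def predict_employee(employee: dict, threshold: float = 0.5):
    """Return (risk score, label) for one employee; threshold is an assumption."""
    # Encode without drop_first: a single row has only one category per column,
    # so drop_first would discard it. Alignment happens in the reindex below.
    row = pd.get_dummies(pd.DataFrame([employee]))
    # Add missing training columns as 0 and drop columns the model never saw.
    row = row.reindex(columns=feature_columns, fill_value=0).astype(float)
    risk = float(model.predict_proba(scaler.transform(row))[0, 1])
    label = "Likely to Leave" if risk >= threshold else "Likely to Stay"
    return risk, label


risk, label = predict_employee({"Age": 28, "JobSatisfaction": 1, "OverTime": "Yes"})
print(f"risk={risk:.2f} label={label}")
```

In the real app the model and scaler would presumably be loaded from attrition_model.pkl and scaler.pkl instead of being fit inline, and the list of training columns would need to be stored alongside them.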

Page 2 – Dashboard
- Interactive filtering by Department, Overtime, and risk threshold.
- Displays:
  - Bar chart: Attrition distribution
  - Heatmap: Numeric correlations
  - Scatter plot: Attrition Risk vs. Job Satisfaction
- Includes a top-20 high-risk employee table for decision support.
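The filtering and top-20 table described for the dashboard can be sketched with plain pandas. The column names (EmployeeNumber, AttritionRisk) and the helper high_risk_table are hypothetical; the actual layout of employees_with_risk.csv may differ.

```python
import pandas as pd

# Hypothetical rows shaped like employees_with_risk.csv; column names are assumptions.
employees = pd.DataFrame({
    "EmployeeNumber": [1, 2, 3, 4, 5, 6],
    "Department": ["Sales", "Sales", "Research & Development",
                   "Sales", "Human Resources", "Sales"],
    "OverTime": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
    "AttritionRisk": [0.82, 0.15, 0.64, 0.41, 0.08, 0.93],
})


def high_risk_table(df, departments, overtime, threshold, top_n=20):
    """Filter by department, overtime, and minimum risk, then keep the riskiest rows."""
    mask = (
        df["Department"].isin(departments)
        & df["OverTime"].isin(overtime)
        & (df["AttritionRisk"] >= threshold)
    )
    return df[mask].sort_values("AttritionRisk", ascending=False).head(top_n)


# Sales employees working overtime with a risk of at least 0.5.
print(high_risk_table(employees, ["Sales"], ["Yes"], 0.5))
```

In a Streamlit page, the three filter arguments would typically come from widgets such as a multiselect and a slider, with the resulting frame passed to a table display.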

Machine Learning Summary:
- Algorithm: Logistic Regression
- Scaler: StandardScaler (fit on training data)
- Encoding: One-Hot Encoding with pd.get_dummies(drop_first=True)
- Target Variable: Attrition (Yes/No -> binary 1/0)
- Accuracy: ≈ 0.85 on the IBM dataset
- Saved Artifacts:
  - attrition_model.pkl – Trained model
  - scaler.pkl – Feature scaler
  - employees_with_risk.csv – Predicted probabilities for each employee
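The summary above maps onto a short scikit-learn workflow. The sketch below is an illustration under stated assumptions, not the notebook's code: it uses a small synthetic table in place of the IBM dataset, an assumed 75/25 stratified split, and the standard pickle module for saving (the notebook may use joblib or something else). The function names are hypothetical.

```python
import pickle
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_toy_frame(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in for the IBM dataset (values are made up)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame({
        "Age": rng.integers(20, 60, n_rows),
        "JobSatisfaction": rng.integers(1, 5, n_rows),
        "OverTime": rng.choice(["Yes", "No"], n_rows),
        "Department": rng.choice(
            ["Sales", "Research & Development", "Human Resources"], n_rows),
    })
    # Synthetic target: overtime and low satisfaction raise the odds of leaving.
    logit = -1.5 + 1.2 * (frame["OverTime"] == "Yes") - 0.4 * (frame["JobSatisfaction"] - 2.5)
    prob = 1 / (1 + np.exp(-logit))
    frame["Attrition"] = np.where(rng.random(n_rows) < prob, "Yes", "No")
    return frame


def train(frame: pd.DataFrame):
    """Encode, split, scale, and fit; returns model, scaler, columns, test accuracy."""
    y = frame["Attrition"].map({"Yes": 1, "No": 0})  # Yes/No -> 1/0
    X = pd.get_dummies(frame.drop(columns="Attrition"), drop_first=True).astype(float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y)
    scaler = StandardScaler().fit(X_train)  # fit on training data only
    model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
    accuracy = model.score(scaler.transform(X_test), y_test)
    return model, scaler, list(X.columns), accuracy


def save_artifacts(model, scaler, out_dir: Path) -> None:
    """Write the two .pkl artifacts named in the summary (pickle is an assumption)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "attrition_model.pkl", "wb") as fh:
        pickle.dump(model, fh)
    with open(out_dir / "scaler.pkl", "wb") as fh:
        pickle.dump(scaler, fh)


model, scaler, columns, accuracy = train(make_toy_frame())
print(f"test accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects the synthetic data only and says nothing about the ≈ 0.85 figure reported for the IBM dataset. Fitting the scaler on the training split alone keeps test-set statistics out of the model.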

Requirements:

Install all dependencies via:

pip install -r requirements.txt

Notes for Evaluators:

The model was trained in HR_Attrition_Analysis.ipynb using IBM’s HR Analytics dataset. The Streamlit app (app.py) uses the pretrained Logistic Regression model to make real-time predictions. All rubric components from C1–C12 are addressed.

The project demonstrates a complete machine-learning lifecycle: data ingestion → feature engineering → model training → evaluation → deployment.

This project is for educational purposes under WGU academic use. Dataset courtesy of IBM’s public HR Analytics sample.

Repository contents:
- .ipynb_checkpoints
- data
- models
- reports
- visuals
- HR_Attrition_Analysis (1).html
- HR_Attrition_Analysis.ipynb
- README.md
- app.py
- requirements.txt
