Predicting Loan Payment Defaults

Project Overview

This project predicts loan payment defaults with several classification algorithms. The models are trained on data from past loans, with the goal of minimizing financial risk and supporting sound credit decisions.

Objective

Develop and evaluate classifiers to predict defaults for 500 new clients.
Identify the most effective classifier to balance risk mitigation and operational efficiency.

Workflow

Data Exploration
- Examined variables to identify predictors of defaults.
- Eliminated redundant or non-informative features (e.g., client ID and constant categories).
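
As a rough illustration of the elimination step, the sketch below drops an identifier column and any column holding a single distinct value. The data frame, its values, and the column names `id`, `region`, and `defaut` are hypothetical and are not taken from projet.csv.

```r
# Toy data frame; names and values are illustrative assumptions.
loans <- data.frame(
  id     = 1:6,
  age    = c(25, 41, 33, 58, 29, 47),
  emploi = c(2, 15, 7, 30, 1, 12),
  region = rep("A", 6),            # constant category
  defaut = c(1, 0, 0, 0, 1, 0)     # hypothetical target column
)

# A column is treated as non-informative if it holds one distinct value.
is_constant <- vapply(loans, function(col) length(unique(col)) <= 1, logical(1))

# The identifier differs on every row and carries no predictive signal.
drop_cols <- union("id", names(loans)[is_constant])
loans_clean <- loans[, !(names(loans) %in% drop_cols), drop = FALSE]

print(names(loans_clean))
```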
Data Preprocessing
- Imputed missing values using:
  - Median substitution.
  - missForest for enhanced accuracy.
- Transformed education into numeric values for compatibility with algorithms.
- Created new features (e.g., DE = debcarte/emploi) to improve model accuracy.
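
The preprocessing steps above might look roughly like the following base R sketch. It covers only the median-substitution path (missForest is not shown). The toy values, the education labels and their ordering, and the handling of `emploi == 0` are assumptions made for illustration.

```r
# Toy data; values and education labels are illustrative assumptions.
loans <- data.frame(
  age       = c(25, NA, 33, 58, 29),
  debcarte  = c(1.2, 0.4, NA, 3.0, 0.8),
  emploi    = c(2, 15, 7, 30, 0),
  education = c("secondary", "bachelor", "primary", "master", "bachelor"),
  stringsAsFactors = FALSE
)

# Median substitution: replace each NA with the column median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
num_cols <- c("age", "debcarte", "emploi")
loans[num_cols] <- lapply(loans[num_cols], impute_median)

# Ordinal encoding of education; the level order is an assumption.
edu_levels <- c("primary", "secondary", "bachelor", "master")
loans$education <- as.integer(factor(loans$education, levels = edu_levels))

# DE = debcarte / emploi. Setting DE to NA when emploi is 0 is an
# assumption of this sketch, made to avoid infinite ratios.
loans$DE <- ifelse(loans$emploi > 0, loans$debcarte / loans$emploi, NA_real_)

print(loans)
```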
Clustering
- Identified pivotal variables (e.g., age, adresse) through clustering.
- Created boolean variables for values around these pivots.
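
One possible reading of the clustering step is sketched below: a two-cluster k-means on each variable gives two centers, their midpoint serves as the pivot, and a boolean flag marks which side of the pivot each client falls on. The use of `kmeans`, the midpoint rule, the toy values, and the `_below_pivot` naming are all assumptions. The original script may derive its pivots differently.

```r
# Toy data; values are illustrative assumptions.
loans <- data.frame(
  age     = c(22, 27, 35, 41, 58, 24, 49, 31),
  adresse = c(1, 3, 8, 12, 20, 2, 15, 6)
)

set.seed(1)  # k-means starts from random centers

# Pivot = midpoint between the two cluster centers of each variable.
pivots <- vapply(
  loans,
  function(x) mean(kmeans(x, centers = 2)$centers),
  numeric(1)
)

# Boolean variable: TRUE when the value lies below the pivot.
for (v in names(pivots)) {
  loans[[paste0(v, "_below_pivot")]] <- loans[[v]] < pivots[[v]]
}

print(pivots)
print(loans)
```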
Classifier Development
- Tested models using the caret package:
  - Decision Trees (C5.0, rpart)
  - Random Forest (rf, ranger, Rborist)
  - Neural Networks (avNNet, nnet, pcaNNet)
  - Naive Bayes
  - Support Vector Machines (svmLinear2, svmPoly, svmRadial)
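
A minimal sketch of how several caret methods could be trained under the same resampling scheme is shown below. It runs only when caret and rpart are installed, uses synthetic data, and trains only `rpart` to keep dependencies light. The 5-fold cross-validation, the toy columns, and the target name `defaut` are assumptions rather than the project's actual settings.

```r
# Synthetic data; columns and the target rule are illustrative assumptions.
set.seed(42)
n <- 200
toy <- data.frame(
  age    = round(runif(n, 20, 60)),
  emploi = round(runif(n, 0, 30)),
  DE     = runif(n, 0, 2)
)
toy$defaut <- factor(
  ifelse(toy$DE + rnorm(n, sd = 0.5) > 1.2, "yes", "no"),
  levels = c("no", "yes")
)

# Extend with other caret method names (e.g. "C5.0", "rf", "nnet",
# "svmRadial") once their backing packages are installed.
methods <- c("rpart")

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("rpart", quietly = TRUE)) {
  library(caret)
  ctrl <- trainControl(method = "cv", number = 5)  # assumed resampling scheme
  fits <- lapply(methods, function(m) {
    train(defaut ~ ., data = toy, method = m, trControl = ctrl)
  })
  names(fits) <- methods
  # Best cross-validated accuracy reached by each method.
  print(sapply(fits, function(f) max(f$results$Accuracy)))
}
```

Using one shared `trainControl` object keeps the resampling scheme identical across methods, which makes their scores comparable.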
Evaluation
- Metrics used:
  - AUC (Area Under Curve) for global performance.
  - Positive Predictive Value (PPV) to focus on risk reduction.
- Balanced sampling ensured unbiased evaluations.
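
For reference, the two metrics and a simple form of balanced sampling can be written in base R as below. The labels and scores are made up, the 0.5 threshold is arbitrary, and downsampling the majority class is only one way to balance a sample. The project may have used a different scheme.

```r
# AUC via the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen positive scores higher than a randomly chosen negative.
auc <- function(scores, labels) {
  r <- rank(scores)  # ties receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# PPV = TP / (TP + FP): share of predicted defaults that are real defaults.
ppv <- function(predicted, labels) {
  tp <- sum(predicted == 1 & labels == 1)
  fp <- sum(predicted == 1 & labels == 0)
  tp / (tp + fp)
}

# Made-up example: 1 = default, 0 = no default.
labels <- c(1, 1, 1, 0, 0, 0, 0, 0)
scores <- c(0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05)
predicted <- as.integer(scores >= 0.5)

auc_value <- auc(scores, labels)
ppv_value <- ppv(predicted, labels)

# Balanced sample: downsample the majority class to the minority size.
set.seed(1)
pos_idx <- which(labels == 1)
neg_idx <- which(labels == 0)
n_min <- min(length(pos_idx), length(neg_idx))
balanced_idx <- c(
  pos_idx[sample.int(length(pos_idx), n_min)],
  neg_idx[sample.int(length(neg_idx), n_min)]
)

print(c(AUC = auc_value, PPV = ppv_value))
print(table(labels[balanced_idx]))
```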
Optimal Classifier Selection
- Chose Naive Bayes with the SansrevQ dataset as the optimal model, balancing AUC and PPV.
Prediction
- Applied the optimal classifier to predict defaults for new clients.

Key Results

Best Model: Naive Bayes on the SansrevQ dataset.
Feature Engineering:
- New variable DE significantly improved model accuracy.
- Removing revenus simplified models without compromising performance.

Files

Code: Code_DataMining_project.R
Dataset: projet.csv
New Predictions Dataset: projet_new.csv

Usage

Load and preprocess data from projet.csv.
Train and evaluate classifiers using the provided R script.
Apply the optimal model to projet_new.csv for predictions.

Future Improvements

Integrate financial impact analysis to weigh false negatives and positives.
Enhance feature engineering with domain-specific insights.
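
As a sketch of what the financial impact analysis could look like, the example below assigns a cost to each error type and compares decision thresholds by total cost. The cost figures, the labels, and the scores are hypothetical placeholders, not values from the project.

```r
# Hypothetical unit costs (assumptions for illustration only).
cost_fn <- 5000  # loss when a defaulting client is approved (false negative)
cost_fp <- 500   # margin lost when a reliable client is rejected (false positive)

# Total cost of a set of predictions (1 = predicted default).
total_cost <- function(predicted, labels) {
  fn <- sum(predicted == 0 & labels == 1)
  fp <- sum(predicted == 1 & labels == 0)
  fn * cost_fn + fp * cost_fp
}

# Made-up labels and model scores.
labels <- c(1, 1, 1, 0, 0, 0, 0, 0)
scores <- c(0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05)

# Compare candidate thresholds by the cost of the errors they produce.
thresholds <- c(0.25, 0.5, 0.75)
costs <- sapply(thresholds, function(t) {
  total_cost(as.integer(scores >= t), labels)
})
best_threshold <- thresholds[which.min(costs)]

print(data.frame(threshold = thresholds, cost = costs))
```

With false negatives priced well above false positives, the cheapest threshold tends to be a low one, since flagging more clients as risky costs less than missing a default.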

Authors

Tristan Gonçalves
Pierre-François Pinelli
