Capstone Project for Nanodegree in Data Science
Read blog post at: https://mauricio-jac2.medium.com/udacity-data-science-nanodegree-capstone-project-fd365c9ba059
LIME - MLI for classification models
As part of the Udacity Data Science Nanodegree, we are required to develop a final project using relevant public or provided data. I decided to use the GiveMeSomeCredit dataset, given my current role in the organization financing my studies. The results of this project are summarized in the file Capstone_project.html.
The full set of files in this repo is public and free to use.
There are two notebooks available here to showcase the work related to the project. The first notebook, Credit_EDA, is an exploratory data analysis (EDA) of the dataset, where we seek to understand the data and plot our insights. The second notebook, Capstone_Project, aims to answer the main question: how to implement explanations at the correct level of granularity.
To recap what we've done, we:
- Used relevant Credit Risk data from a global competition
- Explored and processed the data
- Trained a Random Forest classifier and then iterated to find the best parameters
- Displayed Variable Importance for the results
- Implemented a Machine Learning Interpretability technique to fully explain the effect of each feature on the predicted probability using LIME
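The training, tuning, and variable importance steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: it assumes scikit-learn, uses a synthetic imbalanced dataset as a stand-in for GiveMeSomeCredit, and the feature names and parameter grid are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the credit data (assumption: not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=3, weights=[0.9, 0.1], random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the search space used in the notebook may differ.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Impurity-based variable importance, sorted from most to least important.
importances = sorted(
    zip(feature_names, best_model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

Variable importance of this kind is global: it ranks features across the whole model, which is why a record-level technique such as LIME is needed to explain individual predictions.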
The last part proved the most difficult. LIME is still in its early releases: it is not yet fully scalable, and the GUI elements of the explainer are still somewhat inflexible. Even so, the power of explaining at the record level is impressive.
The results are rendered in an HTML file named Capstone_Project.html.
Credit goes to Kaggle for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!
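To give a feel for what "explaining at the record level" means, here is a simplified sketch of the idea behind LIME: perturb a single record, ask the model for predictions on the perturbed points, and fit a locally weighted linear model whose coefficients act as the explanation. This is not the lime package and not the notebook's code. The function `explain_record`, the synthetic data, and the kernel settings are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Synthetic data and model standing in for the credit classifier (assumption).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def explain_record(model, X_train, row, n_samples=1000, kernel_width=None, seed=0):
    """Hypothetical helper: fit a weighted linear surrogate around one record."""
    rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    # Perturb the record with Gaussian noise scaled to each feature's spread.
    samples = row + rng.normal(size=(n_samples, row.shape[0])) * scale
    # Query the black-box model for the positive-class probability.
    probs = model.predict_proba(samples)[:, 1]
    # Weight perturbed points by closeness to the record (exponential kernel).
    dist = np.linalg.norm((samples - row) / scale, axis=1)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(row.shape[0])
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # The surrogate's coefficients approximate each feature's local effect.
    surrogate = Ridge(alpha=1.0).fit(samples, probs, sample_weight=weights)
    return surrogate.coef_


coefs = explain_record(model, X, X[0])
for i, c in enumerate(coefs):
    print(f"feature_{i}: {c:+.4f}")
```

A positive coefficient suggests that increasing the feature near this record raises the predicted probability, and a negative one lowers it. The explanation holds only locally, so a different record can yield a different ranking of features.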