Loan Repayment Prediction

This project compares automated feature engineering using Featuretools with manual feature engineering on the Home Credit Default Risk machine learning competition hosted on Kaggle.

Notebooks

The notebooks are as follows:

  1. Manual Loan Repayment.ipynb
  2. Automated Loan Repayment.ipynb
  3. Featuretools on Dask.ipynb
  4. Semi-Automated Loan Repayment.ipynb
  5. Feature Selection.ipynb
  6. Results.ipynb

utils.py contains helper functions, and random_search.py in the scripts directory implements the random search. To generate the final feature matrix, either use the Featuretools on Dask notebook or run the ft.py script. The script takes nearly a full day to run, while the notebook can finish in a few hours, depending on your system.
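
For readers unfamiliar with random search, the idea can be sketched in a few lines of plain Python. This is not the code in random_search.py; it is a minimal illustration under stated assumptions. The function name random_search, the toy objective, and the hyperparameter names learning_rate and num_leaves are all hypothetical stand-ins for a real cross-validated model score.

```python
import random


def random_search(objective, param_space, n_iter=20, seed=0):
    """Sample n_iter random configurations and return the best one found.

    objective: callable that takes a dict of hyperparameters and returns
        a score, where higher is better.
    param_space: dict mapping each hyperparameter name to a list of
        candidate values.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical score standing in for cross-validated model performance."""
    return (
        -(params["learning_rate"] - 0.05) ** 2
        - (params["num_leaves"] - 31) ** 2 / 1e4
    )


# Hypothetical search space; the names are illustrative only.
space = {
    "learning_rate": [0.01, 0.05, 0.1, 0.5],
    "num_leaves": [15, 31, 63, 127],
}

best_params, best_score = random_search(toy_objective, space, n_iter=50, seed=42)
print(best_params, best_score)
```

Unlike a grid search, the number of evaluations is fixed by n_iter rather than by the size of the search space, which is what makes the approach practical when each evaluation means training a model.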

Data

The data can be downloaded from the Home Credit Default Risk competition page on Kaggle.

To run the notebooks, place the following data files in the input directory: application_train.csv, application_test.csv, bureau.csv, bureau_balance.csv, POS_CASH_balance.csv, credit_card_balance.csv, previous_application.csv, and installments_payments.csv. The HomeCredit_columns_description.csv file may also be helpful, as it contains the data descriptions.
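
Before starting a notebook, it can save time to confirm that every required file is in place. The snippet below is a small sketch that is not part of the repository. The helper name missing_input_files is hypothetical, and it assumes the script is run from the directory that contains input.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "application_train.csv",
    "application_test.csv",
    "bureau.csv",
    "bureau_balance.csv",
    "POS_CASH_balance.csv",
    "credit_card_balance.csv",
    "previous_application.csv",
    "installments_payments.csv",
]


def missing_input_files(input_dir="input"):
    """Return the required CSV names that are not present in input_dir."""
    directory = Path(input_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_input_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required data files found.")
```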
