
Lending-Club

This project was developed as part of STAT 542: Practical Statistical Learning at the University of Illinois Urbana-Champaign.

We were provided with historical loan data from Lending Club. The target variable is categorical, so this is a supervised classification problem. We worked on a filtered dataset provided by the course staff that contains only 30 features and 3 target classes (Default, Charged Off, Fully Paid).

Programming Language:

Python

Source

The data comes from the following Kaggle dataset: https://www.kaggle.com/wordsforthewise/lending-club

These are the steps I followed:

  1. We start by reading the train and test data into memory and setting the id field as the index, since it uniquely identifies every observation.
  2. We treat Default and Charged Off as one class, so I replaced every occurrence of Default with Charged Off; the target variable then has only 2 classes (see the loading sketch after this list).
  3. Next, I created a function to keep track of the various types of variables in the dataset (categorical, ordinal, nominal, discrete, continuous).
  4. Next, I imputed missing values with the mean for continuous variables and the mode for categorical variables (see the imputation sketch after this list).
  5. Then I transformed some features, such as extracting the integer part from term and emp_length, and created an ordinal encoding dictionary for grade and sub_grade (steps 5-8 are sketched together after this list).
  6. I decided to drop 'title', 'zip_code', and 'emp_title' because they contain many levels, and one-hot encoding them would blow up the design matrix.
  7. I extracted the year and month from earliest_cr_line and used them as additional features.
  8. Next, I one-hot encoded all remaining categorical variables in the dataset.
  9. I used the interquartile range to clip extreme values in continuous variables (see the clipping sketch after this list).
  10. Next, I encoded the target classes as 0 for Fully Paid and 1 for Charged Off.
  11. I used a standard scaler to scale all variables (steps 10-11 are sketched after this list).
  12. Next, I built 3 models. The first was XGBoost with 70 boosting rounds, a binary logistic objective, a maximum tree depth of 7, and a learning rate of 0.27; these parameters were chosen by grid search and cross-validation on the training data (steps 12-16 are sketched after this list).
  13. My second model was a random forest with a tree depth of 18 and 80 trees; again, the parameters were chosen by grid search and cross-validation on the training data.
  14. My third model was logistic regression with default parameters.
  15. Of the 3 models, Model 1 (XGBoost) worked best and gave the lowest log loss.
  16. Predictions from every model are saved to disk in the format specified in the instructions.
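Below are small, hedged sketches of how each group of steps could look in code. They are illustrative only, not excerpts from mymain.py; column names such as id and loan_status follow the standard Lending Club schema and are assumptions.

A sketch of steps 1-2 (loading the data and collapsing Default into Charged Off):

```python
import pandas as pd

# Read train/test and use the id column as the index (it uniquely identifies each row).
train = pd.read_csv("train.csv", index_col="id")
test = pd.read_csv("test.csv", index_col="id")

# Collapse the 3 target classes into 2 by treating Default as Charged Off.
train["loan_status"] = train["loan_status"].replace("Default", "Charged Off")
```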
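A sketch of steps 3-4 (tracking variable types and imputing missing values); splitting columns by pandas dtype is an assumed simplification of the bookkeeping function:

```python
import pandas as pd

def split_variable_types(df: pd.DataFrame):
    """Rough bookkeeping of variable types based on pandas dtypes."""
    continuous = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return continuous, categorical

def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    continuous, categorical = split_variable_types(df)
    df = df.copy()
    # Mean imputation for continuous variables.
    df[continuous] = df[continuous].fillna(df[continuous].mean())
    # Mode imputation for categorical variables.
    for col in categorical:
        if df[col].isna().any():
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df
```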
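A sketch of steps 5-8 (extracting integers from term and emp_length, ordinal-encoding grade and sub_grade, dropping high-cardinality columns, date features, one-hot encoding); the regex extraction and the "%b-%Y" date format are assumptions about how the raw strings look:

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()

    # Extract the integer part of term (e.g. "36 months" -> 36) and emp_length (e.g. "10+ years" -> 10).
    df["term"] = df["term"].str.extract(r"(\d+)", expand=False).astype(float)
    df["emp_length"] = df["emp_length"].str.extract(r"(\d+)", expand=False).astype(float)

    # Ordinal encoding for grade / sub_grade (A is best, G is worst).
    grade_map = {g: i for i, g in enumerate("ABCDEFG", start=1)}
    df["grade"] = df["grade"].map(grade_map)
    sub_grades = [f"{g}{n}" for g in "ABCDEFG" for n in range(1, 6)]
    df["sub_grade"] = df["sub_grade"].map({sg: i for i, sg in enumerate(sub_grades, start=1)})

    # Drop high-cardinality text columns that would blow up the one-hot design matrix.
    df = df.drop(columns=["title", "zip_code", "emp_title"], errors="ignore")

    # Extract year and month from earliest_cr_line (assumed format like "Aug-2003").
    ecl = pd.to_datetime(df["earliest_cr_line"], format="%b-%Y", errors="coerce")
    df["earliest_cr_year"] = ecl.dt.year
    df["earliest_cr_month"] = ecl.dt.month
    df = df.drop(columns=["earliest_cr_line"])

    # One-hot encode the remaining categorical variables.
    return pd.get_dummies(df)
```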
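A sketch of step 9 (IQR clipping); the 1.5x multiplier is the conventional choice, not something stated in the write-up:

```python
import pandas as pd

def clip_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip each continuous column to [Q1 - k*IQR, Q3 + k*IQR]."""
    df = df.copy()
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return df
```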
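A sketch of steps 10-11 (target encoding and scaling), continuing from the earlier sketches and assuming the same preprocessing has been applied to both train and test; fitting the scaler on train and reusing it for test is also an assumption:

```python
from sklearn.preprocessing import StandardScaler

# Encode the target: Fully Paid -> 0, Charged Off -> 1.
y_train = train["loan_status"].map({"Fully Paid": 0, "Charged Off": 1})
X_train = train.drop(columns=["loan_status"])
X_test = test[X_train.columns]  # assumes test went through the same preprocessing

# Fit the scaler on the training features and reuse it for test.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```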
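A sketch of steps 12-16 (the three models and saving predictions), continuing from the scaled matrices above and using the hyperparameters quoted in the list; the xgb.train/DMatrix usage and the submission file names are assumptions:

```python
import pandas as pd
import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Model 1: XGBoost with 70 boosting rounds, binary logistic objective, depth 7, learning rate 0.27.
dtrain = xgb.DMatrix(X_train_scaled, label=y_train)
dtest = xgb.DMatrix(X_test_scaled)
params = {"objective": "binary:logistic", "max_depth": 7, "eta": 0.27}
bst = xgb.train(params, dtrain, num_boost_round=70)
pred1 = bst.predict(dtest)

# Model 2: random forest with 80 trees of maximum depth 18.
rf = RandomForestClassifier(n_estimators=80, max_depth=18, n_jobs=-1)
rf.fit(X_train_scaled, y_train)
pred2 = rf.predict_proba(X_test_scaled)[:, 1]

# Model 3: logistic regression with default parameters.
lr = LogisticRegression()
lr.fit(X_train_scaled, y_train)
pred3 = lr.predict_proba(X_test_scaled)[:, 1]

# Save each model's predicted probability of Charged Off, keyed by the test ids
# (file names are placeholders, not necessarily those required by the assignment).
for fname, pred in [("mysubmission1.txt", pred1), ("mysubmission2.txt", pred2), ("mysubmission3.txt", pred3)]:
    pd.DataFrame({"id": test.index, "prob": pred}).to_csv(fname, index=False)
```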

Results:

The evaluation metric was Log-Loss:

Model 1 (XGBoost): 0.448233
Model 2 (Random Forest): 0.454100
Model 3 (Logistic Regression): 0.458367
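For reference, a minimal sketch of how binary log loss can be computed with scikit-learn (the numbers here are illustrative, not taken from the grading script):

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 0, 1])           # illustrative labels (0 = Fully Paid, 1 = Charged Off)
y_prob = np.array([0.1, 0.8, 0.3, 0.6])   # illustrative predicted probabilities of Charged Off
print(log_loss(y_true, y_prob))
```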

Running Time:

Script executed successfully in 7 minutes.

Steps to Execute:

All the modelling code is in mymain.py. Execute this Python file in a directory that contains train.csv and test.csv.
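For example, from that directory:

```
python mymain.py
```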
