
LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform.


Data merge and preliminary cleaning

1. Data collection, import and concatenation.ipynb

This notebook includes:

  1. Download the raw data from LendingClub
  2. Concatenate the data from each year/quarter into one dataset
  3. Drop features with more than 27% missing data, which also reduces memory use and speeds up computation
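
The concatenation and missing-data filter described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper name `concat_and_drop_sparse` and the tiny in-memory frames are hypothetical, and in practice each frame would presumably come from something like `pd.read_csv` on a downloaded quarterly file.

```python
import pandas as pd

def concat_and_drop_sparse(frames, max_missing=0.27):
    """Stack per-period frames, then drop columns whose share of
    missing values exceeds max_missing (hypothetical helper)."""
    combined = pd.concat(frames, ignore_index=True, sort=False)
    # isna().mean() gives the fraction of missing values per column
    missing_share = combined.isna().mean()
    keep = missing_share[missing_share <= max_missing].index
    return combined[keep]

# Toy stand-ins for two quarterly files (values are made up)
q1 = pd.DataFrame({"loan_amnt": [1000, 2000], "desc": ["house", None]})
q2 = pd.DataFrame({"loan_amnt": [3000, 4000], "desc": [None, "car"]})

cleaned = concat_and_drop_sparse([q1, q2])
print(cleaned.columns.tolist())
```

Here `desc` is missing in half of the rows, so it is dropped, while `loan_amnt` is kept.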

Get to know all the features

2. Feature exploration

This notebook includes:

  1. Explore the meaning of each LendingClub feature and divide the features into two categories: borrower-related and loan-related.
  2. Drop a few more unrelated features once the dataset is better understood.
  3. Encode the target feature by classifying each loan as good or bad based on loan_status.
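
One way the target encoding could look is sketched below. Which `loan_status` values count as "bad" is an assumption made for this example (the README does not say), and `BAD_STATUSES`, `encode_target`, and the `bad_loan` column name are hypothetical.

```python
import pandas as pd

# Assumption: these statuses are treated as bad loans; the notebook may
# draw the line differently (e.g. for late or in-grace-period loans).
BAD_STATUSES = {"Charged Off", "Default", "Late (31-120 days)"}

def encode_target(df):
    """Return a copy with a binary bad_loan column (1 = bad, 0 = good)."""
    out = df.copy()
    out["bad_loan"] = out["loan_status"].isin(BAD_STATUSES).astype(int)
    return out

loans = pd.DataFrame(
    {"loan_status": ["Fully Paid", "Charged Off", "Current", "Default"]}
)
encoded = encode_target(loans)
print(encoded)
```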

Missing data imputation

3. Missing data imputation

This notebook includes:

  1. Split the data into training and test sets.
  2. Impute the missing data with a method that depends on the category of each feature.
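
A rough sketch of these two steps, using pandas only, is given below. The imputation rule (median for numeric columns, most frequent value otherwise) is an assumption chosen for illustration, as are the helper names and the toy data. Fill values are computed on the training set only, which is presumably why the split comes first.

```python
import pandas as pd

def split_train_test(df, test_frac=0.2, seed=0):
    """Randomly hold out test_frac of the rows (hypothetical helper)."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test

def fit_fill_values(train):
    """Pick one fill value per column from the training data only:
    median for numeric columns, most frequent value otherwise."""
    fills = {}
    for col in train.columns:
        if pd.api.types.is_numeric_dtype(train[col]):
            fills[col] = train[col].median()
        else:
            fills[col] = train[col].mode(dropna=True).iloc[0]
    return fills

# Toy data with gaps (values are made up)
df = pd.DataFrame({
    "annual_inc": [40000.0, None, 60000.0, 80000.0, 50000.0,
                   None, 70000.0, 90000.0, 30000.0, 65000.0],
    "home_ownership": ["RENT", "OWN", None, "RENT", "MORTGAGE",
                       "RENT", None, "OWN", "RENT", "MORTGAGE"],
})

train, test = split_train_test(df)
fills = fit_fill_values(train)
# The same training-derived values fill both sets
train_filled = train.fillna(fills)
test_filled = test.fillna(fills)
```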

EDA (Exploratory data analysis)

4. Categorical variable encoding: Explore each categorical feature and encode it if its order matters
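
For a feature whose order matters, an ordinal encoding might look like the sketch below. It assumes the loan `grade` column with levels A through G as the example; the `GRADE_ORDER` list and the `encode_ordinal` helper are hypothetical, and the notebook may encode other features or use a different method.

```python
import pandas as pd

# Assumption: grades run from A (first) to G (last)
GRADE_ORDER = ["A", "B", "C", "D", "E", "F", "G"]

def encode_ordinal(series, order):
    """Map each value to its position in `order`.
    Values not listed in `order` (including NaN) become -1."""
    cat = pd.Categorical(series, categories=order, ordered=True)
    return pd.Series(cat.codes, index=series.index, name=series.name)

grades = pd.Series(["B", "A", "G", "C"], name="grade")
codes = encode_ordinal(grades, GRADE_ORDER)
print(codes.tolist())
```

Integer codes preserve the ranking between levels, which a one-hot encoding would discard.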

To be continued

