LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform.
1. Data collection, import and concatenation.ipynb
This notebook includes:
- Download the raw data from LendingClub
- Concatenate the data from each year/quarter into one dataset
- Delete features with more than 27% missing data, which also reduces memory use and speeds up computation
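The concatenation and pruning steps above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the helper name `concat_and_prune`, the toy frames, and the column names are illustrative assumptions. Only the 27% threshold comes from the description above.

```python
import pandas as pd

MISSING_THRESHOLD = 0.27  # share of missing values above which a column is dropped


def concat_and_prune(frames, threshold=MISSING_THRESHOLD):
    """Stack per-period frames and drop columns whose missing share exceeds threshold.

    Hypothetical helper for illustration only.
    """
    combined = pd.concat(frames, ignore_index=True, sort=False)
    missing_share = combined.isna().mean()  # fraction of NaN values per column
    keep = missing_share[missing_share <= threshold].index
    return combined[keep]


# Toy stand-ins for two quarterly files (made-up values, not real LendingClub data)
q1 = pd.DataFrame({"loan_amnt": [1000, 2000], "desc": [None, None], "grade": ["A", "B"]})
q2 = pd.DataFrame({"loan_amnt": [1500, 3000], "desc": ["x", None], "grade": ["C", None]})

pruned = concat_and_prune([q1, q2])
print(list(pruned.columns))
```

In the toy data, `desc` is missing in 3 of 4 rows and is dropped, while `grade` is missing in 1 of 4 rows (25%) and is kept.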
2. Feature explore
This notebook includes:
- Explore the meaning of each LendingClub feature and divide the features into two categories: borrower-related and loan-related.
- Remove a few more irrelevant features once the dataset is better understood.
- Encode the target feature by classifying each loan as good or bad based on its loan_status.
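As an illustration of the target encoding step, the sketch below derives a binary label from loan_status. Which statuses count as "bad" is a modelling choice that this README does not specify, so the `BAD_STATUSES` and `GOOD_STATUSES` sets, the `bad_loan` column name, and the decision to drop loans with an unresolved status are all assumptions.

```python
import pandas as pd

# Assumed grouping of statuses; the notebook may draw the line differently.
BAD_STATUSES = {"Charged Off", "Default"}
GOOD_STATUSES = {"Fully Paid"}


def encode_target(df):
    """Keep loans with a resolved status and add a 0/1 bad_loan column (hypothetical helper)."""
    resolved = df["loan_status"].isin(BAD_STATUSES | GOOD_STATUSES)
    labelled = df[resolved].copy()
    labelled["bad_loan"] = labelled["loan_status"].isin(BAD_STATUSES).astype(int)
    return labelled


loans = pd.DataFrame({"loan_status": ["Fully Paid", "Charged Off", "Current", "Default"]})
encoded = encode_target(loans)
print(encoded)
```

Here the "Current" loan is excluded because its outcome is not yet known. Treating such loans differently would only require changing the two sets.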
3. Missing data imputation
This notebook includes:
- Split the data into training and test sets.
- Impute the missing data, depending on the category of each feature.
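One way to sketch the split-then-impute order is shown below, using pandas only. The rule used here (median for numeric features, most frequent value for the rest) is an assumption, since the README does not say how each category of feature is imputed. The helper names and toy values are hypothetical. Fill values are computed on the training set only, so no information from the test set leaks into them.

```python
import pandas as pd


def split_train_test(df, test_frac=0.2, seed=0):
    """Randomly hold out test_frac of the rows as a test set."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


def fit_fill_values(train):
    """Compute one fill value per column from the training data only.

    Assumed rule: median for numeric columns, mode for everything else.
    """
    fill = {}
    for col in train.columns:
        observed = train[col].dropna()
        if observed.empty:
            continue  # nothing to learn from; leave the column untouched
        if pd.api.types.is_numeric_dtype(train[col]):
            fill[col] = observed.median()
        else:
            fill[col] = observed.mode().iloc[0]
    return fill


# Toy data with gaps (made-up values)
data = pd.DataFrame({
    "annual_inc": [50.0, None, 70.0, 60.0, None, 80.0, 55.0, 65.0, 75.0, 90.0],
    "home_ownership": ["RENT", "OWN", None, "RENT", "MORTGAGE",
                       "RENT", None, "OWN", "RENT", "RENT"],
})

train, test = split_train_test(data)
fill = fit_fill_values(train)
train_filled = train.fillna(fill)
test_filled = test.fillna(fill)  # test set reuses the training fill values
```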
4. Categorical variable encode: Explore each categorical feature and encode it if its order matters.
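For a feature whose order matters, an ordinal encoding can be sketched as follows. The example assumes a loan grade running from A (best) to G (worst). The `GRADE_ORDER` list and the `encode_ordinal` helper are illustrative and not taken from the notebook.

```python
import pandas as pd

# Assumed ordering for illustration: A is the best grade, G the worst.
GRADE_ORDER = ["A", "B", "C", "D", "E", "F", "G"]


def encode_ordinal(series, ordered_levels):
    """Map each value to its position in ordered_levels.

    Values not listed in ordered_levels (including missing ones) become -1.
    """
    cat = pd.Categorical(series, categories=ordered_levels, ordered=True)
    return pd.Series(cat.codes, index=series.index, name=series.name)


grades = pd.Series(["B", "A", "G", "C"], name="grade")
codes = encode_ordinal(grades, GRADE_ORDER)
print(codes.tolist())
```

Integer codes keep the ranking between levels, which is the reason to use them only for features where that ranking is meaningful.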