6.2 Data cleaning and preparation

Slides

Notes

In this section we clean and prepare the dataset for modeling. This involves the following steps:

  • Download the data from the given link.
  • Reformat the categorical columns (status, home, marital, records, and job) by mapping their encoded values to readable labels.
  • Replace the maximum value of the income, assets, and debt columns with NaNs.
  • Replace the NaNs in the dataframe with 0 (will be shown in the next lesson).
  • Keep only the rows whose status value is either ok or default.
  • Split the data in two steps, with the random seed set to 11, so that the final distribution is 60% train, 20% validation, and 20% test.
  • Prepare target variable status by converting it from categorical to binary, where 0 represents ok and 1 represents default.
  • Finally, delete the target variable from the train, validation, and test dataframes.
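
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: it runs on a small synthetic dataframe instead of the downloaded file, and the code-to-label mappings, the 99999999 placeholder, and the use of scikit-learn's `train_test_split` are assumptions made for the example. Only two of the categorical columns are shown to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (hypothetical values).
rng = np.random.default_rng(0)
n = 101
df = pd.DataFrame({
    "status": rng.choice([1, 2], size=n),
    "home": rng.choice([1, 2, 3], size=n),
    "income": rng.integers(50, 300, size=n),
    "assets": rng.integers(0, 10000, size=n),
    "debt": rng.integers(0, 5000, size=n),
})
df.loc[0, "status"] = 0  # one row with an unknown status

# Assumed placeholder that encodes a missing value; it is also the column maximum.
MISSING = 99999999
df.loc[1, "income"] = MISSING
df.loc[2, "assets"] = MISSING
df.loc[3, "debt"] = MISSING

# Map encoded categorical values to readable labels (assumed mappings).
status_values = {0: "unk", 1: "ok", 2: "default"}
home_values = {1: "rent", 2: "owner", 3: "other"}
df["status"] = df["status"].map(status_values)
df["home"] = df["home"].map(home_values)

# Replace the placeholder maximum with NaN in the numeric columns.
for col in ["income", "assets", "debt"]:
    df[col] = df[col].replace(to_replace=MISSING, value=np.nan)

# Keep only rows whose status is ok or default.
df = df[df["status"].isin(["ok", "default"])].reset_index(drop=True)

# Two-step split: 20% test first, then 25% of the remaining 80% as validation.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=11)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=11)

df_train = df_train.reset_index(drop=True)
df_val = df_val.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

# Binary target: 1 for default, 0 for ok.
y_train = (df_train["status"] == "default").astype(int).values
y_val = (df_val["status"] == "default").astype(int).values
y_test = (df_test["status"] == "default").astype(int).values

# Remove the target from the feature dataframes.
del df_train["status"]
del df_val["status"]
del df_test["status"]
```

The second split uses `test_size=0.25` because 25% of the remaining 80% equals 20% of the full dataset, which gives the 60/20/20 distribution.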

Add notes from the video (PRs are welcome)

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
