This repository collects methods for handling common data problems: missing values, categorical values, duplicate values, and outliers.
Each notebook explains one problem and its possible solutions. Every solution is explained, and its pros and cons are discussed.
Missing Values:
- Delete missing rows
- Delete features that contain more than X% of missing values
- Replace with the next valid value (backward fill)
- Replace with mean (for numerical)
- Replace with mode (for categorical)
- Replace with median (for numerical)
- Impute with KNN Imputer
- Impute with Iterative Imputer
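
The strategies above could be sketched roughly as follows with pandas and scikit-learn. This is a minimal illustration and not the notebooks' actual code: the toy DataFrame, its column names, the 30% threshold, and `n_neighbors=2` are all assumptions chosen for the example. Note that `IterativeImputer` is still marked experimental in scikit-learn and needs the explicit enabling import.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0, np.nan, 28.0],
    "income": [50.0, 62.0, np.nan, 80.0, 58.0, 54.0],
    "city": ["Oslo", "Rome", None, "Rome", "Rome", "Oslo"],
})

# Delete rows that contain at least one missing value.
rows_dropped = df.dropna()

# Delete features with more than X% missing values (X = 30, an assumed value).
threshold = 0.30
cols_kept = df.loc[:, df.isna().mean() <= threshold]

# Replace each missing value with the next valid value (backward fill).
next_filled = df.bfill()

# Replace with the mean or median (numerical) and the mode (categorical).
age_mean_filled = df["age"].fillna(df["age"].mean())
income_median_filled = df["income"].fillna(df["income"].median())
city_mode_filled = df["city"].fillna(df["city"].mode()[0])

# KNN and iterative imputation work on numerical columns only.
numeric = df[["age", "income"]]
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
iter_filled = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
```

Backward fill leaves a gap when the last row is missing, so it is usually worth checking `isna()` again after any of these steps.
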
Categorical Values:
- Ordinal Encoding
- One-hot encoding
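
As a rough sketch of the two encodings, assuming a hypothetical DataFrame with one ordered column (`size`) and one unordered column (`color`). The category order passed to `OrdinalEncoder` is an assumption made for the example; without it, scikit-learn sorts the categories alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; the column names and categories are illustrative only.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "color": ["red", "blue", "red", "green"],
})

# Ordinal encoding: map each category to an integer that respects its order.
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_encoded = encoder.fit_transform(df[["size"]]).ravel()

# One-hot encoding: one indicator column per category, with no implied order.
color_onehot = pd.get_dummies(df["color"], prefix="color")
```

Ordinal encoding suits categories with a natural order, while one-hot encoding avoids implying an order at the cost of one extra column per category.
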
Duplicate Values:
- Remove duplicates
- Keep duplicates
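
Either choice usually starts with finding the duplicates. A minimal pandas sketch, assuming a hypothetical DataFrame in which a duplicate means a row that is identical in every column:

```python
import pandas as pd

# Hypothetical toy data; the third row repeats the first one exactly.
df = pd.DataFrame({
    "user": ["a", "b", "a", "c"],
    "score": [1, 2, 1, 3],
})

# Flag every row that repeats an earlier row.
duplicate_mask = df.duplicated()
n_duplicates = int(duplicate_mask.sum())

# Option 1: remove the repeated rows, keeping the first occurrence.
deduplicated = df.drop_duplicates().reset_index(drop=True)

# Option 2: keep the data as is and only inspect the flagged rows,
# for example when repeats are legitimate repeated observations.
flagged_rows = df[duplicate_mask]
```
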
Outliers:
- Z-score test
- Local Outlier Factor
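
Both detection methods could be sketched as follows. The data, the z-score cutoff of 3, and `n_neighbors=20` are assumptions made for the illustration, not values taken from the notebooks.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical data: 100 evenly spaced values plus one extreme value.
values = np.concatenate([np.linspace(40.0, 60.0, 100), [120.0]])

# Z-score test: flag points more than 3 standard deviations from the mean.
z_scores = (values - values.mean()) / values.std()
z_outliers = np.abs(z_scores) > 3

# Local Outlier Factor: flag points whose local density is much lower
# than that of their neighbours. fit_predict returns -1 for outliers.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(values.reshape(-1, 1))
lof_outliers = labels == -1
```

The z-score test assumes roughly normal data and is itself distorted by extreme values, since they inflate the mean and standard deviation. Local Outlier Factor makes no distributional assumption but is sensitive to the choice of `n_neighbors`.
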