Data Filling and Cleaning Techniques

  1. Dropping rows/columns -> when more than 15-20% of the data is missing
  2. Measures of central tendency -> mean/median
  3. Measures of central tendency per class -> mean and median
  4. Measures of central tendency per class -> mode
  5. SimpleImputer -> scikit-learn module
  6. ColumnTransformer & Pipeline -> scikit-learn module
  7. KNNImputer (gave higher accuracy than SimpleImputer in this project) -> scikit-learn module
  8. One-hot encoding (especially for categorical columns)
  9. Random sample imputation
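
Several of the techniques listed above can be sketched in a few lines. The example below is only an illustration, not the code from the notebooks: the DataFrame, its column names (`age`, `city`, `sparse`, `label`), and the 20% threshold are assumptions made up for the demo. It shows dropping sparse columns, filling by per-class median and mode, and a `ColumnTransformer` that combines `SimpleImputer` with one-hot encoding.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; column names are made up for illustration.
df = pd.DataFrame({
    "age":    [22.0, np.nan, 35.0, 58.0, 30.0, 41.0],
    "city":   ["A", "B", np.nan, "B", "B", "B"],
    "sparse": [np.nan, np.nan, np.nan, 1.0, np.nan, np.nan],
    "label":  [0, 0, 1, 1, 0, 1],
})

# 1. Drop columns whose share of missing values exceeds the threshold.
missing_share = df.isna().mean()
kept = df.loc[:, missing_share <= 0.20]

# 3-4. Fill per class: median for the numeric column, mode for the categorical one.
filled = kept.copy()
filled["age"] = filled["age"].fillna(
    filled.groupby("label")["age"].transform("median")
)
filled["city"] = filled["city"].fillna(
    filled.groupby("label")["city"].transform(lambda s: s.mode().iloc[0])
)

# 5, 6, 8. SimpleImputer and one-hot encoding inside a ColumnTransformer.
numeric = Pipeline([("impute", SimpleImputer(strategy="median"))])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric, ["age"]), ("cat", categorical, ["city"])],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(kept.drop(columns="label"))
```

Wrapping the imputers in a `Pipeline` means the fill values are learned in `fit` and reused in `transform`, so the same statistics can be applied to a held-out set without recomputing them.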

Note: You can run the KNN imputer and random sample imputation notebooks using the train.csv files.
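
For readers without train.csv at hand, here is a minimal sketch of the two techniques named in the note. The small DataFrame is made-up stand-in data, and `random_sample_impute` is a hypothetical helper written for this example, not a function from the notebooks or from scikit-learn. `KNNImputer` is the real scikit-learn class.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric data standing in for columns of train.csv.
df = pd.DataFrame({
    "age":  [22.0, 25.0, np.nan, 58.0, 60.0, np.nan],
    "fare": [7.0, 8.0, 7.5, 80.0, 85.0, 82.0],
})

# KNN imputation: a missing value is replaced by the mean of that feature
# over the n_neighbors rows that are closest on the observed features.
knn = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(knn.fit_transform(df), columns=df.columns)


def random_sample_impute(series, seed=0):
    """Fill NaNs with values drawn at random from the observed entries."""
    out = series.copy()
    missing = out.isna()
    draws = out.dropna().sample(
        n=int(missing.sum()), replace=True, random_state=seed
    )
    out.loc[missing] = draws.to_numpy()
    return out


random_filled = random_sample_impute(df["age"])
```

Random sample imputation tends to preserve the spread of the observed values, whereas filling with a mean or median concentrates the imputed entries at a single point. Fixing the seed keeps the result reproducible.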