FilipeGood/Data-Cleaning

Data-Cleaning

This repository collects methods for handling common data problems: missing values, categorical values, duplicate values, and outliers.

Each notebook explains a particular problem and its possible solutions. Each solution is explained, and its pros and cons are discussed.

Missing Values:

  • Delete missing rows
  • Delete features that contain more than X% of missing values
  • Replace with the next value
  • Replace with mean (for numerical)
  • Replace with mode (for categorical)
  • Replace with median (for numerical)
  • Impute with KNN Imputer
  • Impute with Iterative Imputer
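
The simpler strategies in the list above can be sketched with pandas. This is a minimal illustration, not the notebooks' own code: the toy DataFrame, its column names, and the 50% threshold are all assumptions, and the KNN and iterative imputers are left out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Porto", "Lisbon", None, "Lisbon"],
    "notes": [None, None, None, "ok"],
})

# Delete every row that contains at least one missing value.
rows_dropped = df.dropna()

# Delete features with more than X% missing values (X = 50 is an assumption).
max_missing_fraction = 0.5
keep = df.columns[df.isna().mean() <= max_missing_fraction]
cols_dropped = df[keep]

# Replace each gap with the next observed value (backward fill).
next_filled = df["age"].bfill()

# Replace with the mean or median (numerical) or the mode (categorical).
age_mean = df["age"].fillna(df["age"].mean())
age_median = df["age"].fillna(df["age"].median())
city_mode = df["city"].fillna(df["city"].mode().iloc[0])
```

Deleting rows is the most destructive option here: only one of the four toy rows survives, whereas the fill-based strategies keep every row at the cost of inventing values.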

Categorical Values:

  • Ordinal Encoding
  • One hot encoding
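
As a rough sketch of the two encodings, assuming pandas and a made-up DataFrame (the columns `size` and `color` and the category order are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "color": ["red", "blue", "red", "green"],
})

# Ordinal encoding: map each category to an integer that preserves
# a meaningful order (the order below is an assumption).
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# One-hot encoding: one 0/1 indicator column per category,
# suitable when the categories have no natural order.
color_dummies = pd.get_dummies(df["color"], prefix="color", dtype=int)
```

Ordinal encoding keeps a single column but implies an ordering, so it fits features like `size`; one-hot encoding avoids that implication at the cost of one column per category.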

Duplicate Values:

  • Remove
  • Don't remove
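
Whether to remove duplicates usually depends on how many there are and whether they are genuine repeats. A minimal sketch, assuming pandas and a hypothetical toy DataFrame, that inspects the duplicates first and then removes them:

```python
import pandas as pd

# Hypothetical toy data; the third row repeats the second one exactly.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "value": ["a", "b", "b", "c"],
})

# Flag rows that repeat an earlier row (the first occurrence is not flagged).
duplicate_mask = df.duplicated()
n_duplicates = int(duplicate_mask.sum())

# Remove the repeats, keeping the first occurrence of each row.
deduplicated = df.drop_duplicates().reset_index(drop=True)
```

Keeping the original `df` alongside `deduplicated` leaves the "don't remove" option open if the repeats turn out to be legitimate observations.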

Outliers:

  • Z-score test
  • Local Outlier Factor
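
The z-score test can be sketched with NumPy as follows. The sample values and the threshold of 3 are assumptions chosen for illustration; Local Outlier Factor is not covered in this sketch.

```python
import numpy as np

# Hypothetical sample: 19 values near 10 plus one extreme value.
values = np.array([10, 12, 11, 13, 9, 10, 11, 12, 10, 11,
                   9, 13, 12, 10, 11, 10, 12, 11, 9, 100], dtype=float)

# z-score: distance from the mean in units of (population) standard deviation.
z_scores = (values - values.mean()) / values.std()

# Flag points whose absolute z-score exceeds the threshold
# (3 is a common convention, used here as an assumption).
threshold = 3.0
outlier_indices = np.flatnonzero(np.abs(z_scores) > threshold)
```

Note that the outlier itself inflates the mean and standard deviation, so on very small samples the test may fail to flag anything.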
