Data Preparation is one of the first steps in data analysis: it produces a clean, error-free, clearly formatted dataset to train and test a model on.

Tanishk-Sharma/Data-Preparation

Data Preparation

Data Preparation is an important step in the life cycle of any Data Science project. It acts as a precursor to the Model Planning step.

Since in the model planning step we explore relationships in the dataset, we need a good dataset to begin with.

So, in the Data Preparation step, we perform different operations on the data so that our model can easily learn from it.

Even in my own work, I regularly catch data feeds with problems that need to be fixed in this step before moving on, such as:

  • NULL values
  • Misspelled headers
  • Unrealistic values (out of bounds; for example, 500 m cannot be the height of a person 😆)
  • Leading/Trailing whitespace
  • Invalid values ("Random String" cannot be the price of a product.)

So, in this repo, we will deal with these issues, and by the end we should have a good, clean dataset to build our model on! :)
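
To make the issues listed above concrete, here is a minimal sketch of what such a cleaning pass might look like. It is not the code from this repo: the sample feed, the column names, the misspelled header `hieght_m`, the height bounds, and the helper names `to_float` and `clean` are all assumptions made up for illustration. It uses only the Python standard library, and it simply drops any row that still has a missing or invalid value, which is just one possible policy.

```python
import csv
import io

# Hypothetical raw feed showing the problems listed above. The columns, the
# misspelled header "hieght_m", and the values are illustrative assumptions.
RAW = """name , hieght_m,price
 Alice ,1.65,10.5
Bob,500,20
Carol,,Random String
Dave,1.80,
"""

HEADER_FIXES = {"hieght_m": "height_m"}  # assumed misspelling -> intended name
HEIGHT_BOUNDS = (0.3, 2.8)               # assumed plausible human height, in metres


def to_float(text):
    """Return text as a float, or None if it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(raw):
    """Return a list of cleaned rows (dicts) parsed from raw CSV text."""
    reader = csv.reader(io.StringIO(raw))

    # Strip whitespace from the headers, then repair known misspellings.
    header = [h.strip() for h in next(reader)]
    header = [HEADER_FIXES.get(h, h) for h in header]

    rows = []
    for values in reader:
        if not values:  # skip blank lines
            continue
        # Strip leading/trailing whitespace from every cell.
        row = dict(zip(header, (v.strip() for v in values)))

        # Empty (NULL) cells and non-numeric strings both become None.
        height = to_float(row["height_m"])
        price = to_float(row["price"])

        # Treat out-of-bounds heights as missing.
        low, high = HEIGHT_BOUNDS
        if height is not None and not (low <= height <= high):
            height = None

        # Policy chosen for this sketch: drop rows with any missing value.
        if height is None or price is None:
            continue
        rows.append({"name": row["name"], "height_m": height, "price": price})
    return rows


cleaned = clean(RAW)
print(cleaned)
```

With this sample feed, only Alice's row survives: Bob's height is out of bounds, Carol has a NULL height and an invalid price, and Dave's price is missing. On a real dataset you might prefer to fill in or flag such values rather than drop the whole row.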
