Data-cleaning

Model Prediction

Costa Rican Household Poverty Level Prediction

Roadmap

I spent the last couple of months analyzing data from sensors, surveys, and logs. No matter how many charts I created or how sophisticated the algorithms were, the results were always misleading.

Throwing a random forest at dirty data is like injecting it with a virus, one whose only effect is to corrupt your insights: garbage in, garbage out.

Even worse, you show your new findings to the CEO, and guess what? They find a flaw. Something doesn't smell right, and your discoveries don't match their understanding of the domain. After all, they are domain experts who know the field better than you do as an analyst or developer.

Right away, the blood rushes to your face, your hands shake, there is a moment of silence, followed, probably, by an apology.

That outcome isn’t bad at all compared to the alternative: what if your findings were taken as a guarantee, and your company ended up making a decision based on them?

You ingested a bunch of dirty data, didn’t clean it up, and told your company to act on results that turned out to be wrong. To avoid that, here is a simple but very effective way I clean data, no matter how big it is.
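The snippet below is a minimal sketch of that kind of cleaning pass using pandas; the file name, the 50% threshold, and the median fill strategy are illustrative assumptions, not the exact steps used in this repository.

```python
import pandas as pd

# Load the raw data (file name is a placeholder).
df = pd.read_csv("raw_data.csv")

# 1. Look before you touch anything: shape, dtypes, and the worst missing-value offenders.
print(df.shape)
print(df.dtypes)
print(df.isnull().sum().sort_values(ascending=False).head(10))

# 2. Drop exact duplicate rows.
df = df.drop_duplicates()

# 3. Drop columns that are mostly empty (the 50% threshold is an assumption).
df = df.dropna(axis=1, thresh=int(0.5 * len(df)))

# 4. Fill remaining numeric gaps with the column median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 5. Sanity-check the result before any model sees it.
print(df.describe())
```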

Languages and Utilities Used

  • Python
  • Anaconda
  • Jupyter Notebook

Environments Used

  • Windows 10 (21H2)

Program walk-through:

Import data: (screenshot placeholder)
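As a sketch of this step, the code below loads the Kaggle Costa Rican Household Poverty training file and takes a first look at it; the file name train.csv comes from the Kaggle competition and is an assumption here, not confirmed by this repository.

```python
import pandas as pd

# train.csv is the Kaggle competition file name (assumed, not confirmed by this repo).
train = pd.read_csv("train.csv")

# First look: dimensions, a few rows, and the columns with the most missing values.
print(train.shape)
print(train.head())
print(train.isnull().sum().sort_values(ascending=False).head(10))
```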
