Data-cleaning

Model Prediction

Costa Rican Household Poverty Level Prediction

Roadmap

I spent the last couple of months analyzing data from sensors, surveys, and logs. No matter how many charts I created or how sophisticated the algorithms were, the results were always misleading.

Throwing a random forest at dirty data is like injecting it with a virus, one whose only effect is to corrupt your insights: garbage in, garbage out.

Even worse, you show your new findings to the CEO, and guess what? They find a flaw. Something doesn't smell right, and your discoveries don't match their understanding of the domain. After all, they are domain experts who know the field better than you do as an analyst or developer.

Right away, the blood rushes to your face, your hands shake, there is a moment of silence, followed, probably, by an apology.

That outcome isn’t bad at all compared to the alternative: what if your findings were taken as a guarantee, and your company ended up making a decision based on them?

You ingested a bunch of dirty data, didn’t clean it up, and told your company to act on results that turned out to be wrong. To avoid that, here is a simple but very effective way I clean data, no matter how big it is.
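The snippet below is a minimal sketch of that kind of cleaning pass using pandas; the file name, the 50% threshold, and the median fill strategy are illustrative assumptions, not the exact steps used in this repository.

```python
import pandas as pd

# Load the raw data (file name is a placeholder).
df = pd.read_csv("raw_data.csv")

# 1. Look before you touch anything: shape, dtypes, and the worst missing-value offenders.
print(df.shape)
print(df.dtypes)
print(df.isnull().sum().sort_values(ascending=False).head(10))

# 2. Drop exact duplicate rows.
df = df.drop_duplicates()

# 3. Drop columns that are mostly empty (the 50% threshold is an assumption).
df = df.dropna(axis=1, thresh=int(0.5 * len(df)))

# 4. Fill remaining numeric gaps with the column median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 5. Sanity-check the result before any model sees it.
print(df.describe())
```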

Languages and Utilities Used

  • Python
  • Anaconda
  • Jupyter Notebook

Environments Used

  • Windows 10 (21H2)

Program walk-through:

Import data: (screenshot placeholder)
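As a sketch of this step, the code below loads the Kaggle Costa Rican Household Poverty training file and takes a first look at it; the file name train.csv comes from the Kaggle competition and is an assumption here, not confirmed by this repository.

```python
import pandas as pd

# train.csv is the Kaggle competition file name (assumed, not confirmed by this repo).
train = pd.read_csv("train.csv")

# First look: dimensions, a few rows, and the columns with the most missing values.
print(train.shape)
print(train.head())
print(train.isnull().sum().sort_values(ascending=False).head(10))
```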
