Type detection #48

Open
Olamyy opened this issue Apr 5, 2019 · 1 comment

Olamyy commented Apr 5, 2019

Hi @amueller, I'd like to try my hand at the Type Detection tasks in the todo and was wondering if you could shed a little more light on them.

Am I correct to assume that we're looking for a Detection object with methods that detect the various things in that list?

amueller (Collaborator) commented Apr 5, 2019

All the type detection is done in detect_types right now.
How to store the results is a different question. For now I'd recommend sticking to the existing structure until I come up with something better.

One thing that would be quite easy to add is detection of duplicate rows and duplicate columns in detect_types, or potentially in clean if that makes more sense. If two continuous features or two categorical features are perfectly correlated (even with different category labels), we should detect that and mark one of them as "useless".
Marking it in detect_types is a bit dirty, though, so maybe doing it in clean would be nicer?
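
To make the idea concrete, here is a minimal sketch of both checks in plain Python. It is not how detect_types or clean work today; the helper names (`canonical_codes`, `find_redundant_categoricals`, `count_duplicate_rows`) and the dict-of-lists input are assumptions made for illustration. Two categorical columns are treated as perfectly correlated when they split the rows into the same groups, regardless of the labels used.

```python
from collections import Counter


def canonical_codes(values):
    """Relabel values by order of first appearance: ['b', 'a', 'b'] -> (0, 1, 0)."""
    codes = {}
    return tuple(codes.setdefault(v, len(codes)) for v in values)


def find_redundant_categoricals(columns):
    """Map each redundant column name to the earlier column it duplicates.

    `columns` is a dict of column name -> list of hashable values
    (hypothetical input format). Two columns match when they induce the
    same grouping of rows, so exact copies and relabeled copies are
    both caught.
    """
    first_seen = {}
    redundant = {}
    for name, values in columns.items():
        key = canonical_codes(values)
        if key in first_seen:
            redundant[name] = first_seen[key]
        else:
            first_seen[key] = name
    return redundant


def count_duplicate_rows(columns):
    """Count rows that repeat an earlier row."""
    rows = Counter(zip(*columns.values()))
    return sum(n - 1 for n in rows.values())


data = {
    "color": ["red", "blue", "red", "green"],
    "color_code": ["R", "B", "R", "G"],  # same grouping, different labels
    "shape": ["sq", "ci", "sq", "ci"],
}
redundant = find_redundant_categoricals(data)
n_duplicate_rows = count_duplicate_rows(data)
```

This grouping test only makes sense for categorical columns: any two columns whose values are all distinct would trivially match, so continuous features would need an exact-equality or correlation check instead.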

If one column is a monotonic transformation of another column, we might also want to warn, but maybe not drop.
It would be nice to have an option for dropping it, since tree-based models only use the ordering of a feature's values and so lose nothing when it is dropped.
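
One possible way to check for a monotonic relationship is to compare ranks: if one column is a strictly increasing function of the other, both have identical ranks, and if it is strictly decreasing, the ranks are mirrored. The sketch below assumes that approach; `dense_ranks` and `monotonic_relation` are hypothetical helpers, not existing functions in the project.

```python
def dense_ranks(values):
    """Rank values from 0, giving tied values the same rank."""
    order = {v: i for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


def monotonic_relation(x, y):
    """Return 'increasing', 'decreasing', or None.

    'increasing' means y could be a strictly increasing function of x
    (and vice versa); 'decreasing' is the mirrored case. Assumes x and y
    have equal length and contain no missing values.
    """
    rx, ry = dense_ranks(x), dense_ranks(y)
    if rx == ry:
        return "increasing"
    top = max(rx)
    if all(a + b == top for a, b in zip(rx, ry)):
        return "decreasing"
    return None


x = [3.0, 1.0, 4.0, 2.0]
squared = [v ** 2 for v in x]   # increasing transformation on positive values
negated = [-v for v in x]       # decreasing transformation
shuffled = [1.0, 2.0, 3.0, 4.0]  # unrelated ordering

relations = [
    monotonic_relation(x, squared),
    monotonic_relation(x, negated),
    monotonic_relation(x, shuffled),
]
```

A check like this could drive a warning by default, with dropping left as an opt-in for the tree-based case.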

Let me know if this clarifies things.
