Type detection #48

Olamyy · 2019-04-05T07:11:41Z

Hi @amueller, I'd like to try my hand at the Type Detection tasks in the todo and was wondering if you could shed a little more light on them.

Am I correct to assume that we're looking for a Detection object with methods that detect each of the items on that list?


amueller · 2019-04-05T15:33:35Z

All the type detection is done in detect_types right now.
How to store the results is a different question. For now I'd recommend sticking to the existing structure until I come up with something better.

One thing that would be quite easy to add is detection of duplicate rows and duplicate columns in detect_types, or potentially in clean if that makes more sense. If two continuous features or two categorical features are perfectly correlated (even with different categories), we should detect that and mark one as "useless".
That's a bit dirty, and maybe doing it in clean would be nicer?
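To make the idea concrete, here is a rough sketch of what such a check could look like with plain pandas. The function names find_duplicate_rows and find_redundant_columns are hypothetical and not part of the existing code. The sketch also makes a simplifying assumption: numeric columns only count as duplicates when their values are exactly equal (it does not catch scaled or shifted copies), while non-numeric columns are compared by first-appearance codes so that a pure relabeling of categories is caught.

```python
import numpy as np
import pandas as pd


def find_duplicate_rows(df):
    """Return a boolean mask marking rows that repeat an earlier row."""
    return df.duplicated(keep="first")


def find_redundant_columns(df):
    """Map each redundant column to the earlier column it duplicates.

    Hypothetical helper, not an existing API. Numeric columns are compared
    by value; all other columns are compared by their first-appearance
    codes, so relabeled categories ("a"/"b" vs "x"/"y") still match.
    """
    redundant = {}
    seen = []  # (column name, is_numeric, comparable values)
    for name in df.columns:
        col = df[name]
        is_numeric = pd.api.types.is_numeric_dtype(col)
        if is_numeric:
            values = col.to_numpy(dtype=float)
        else:
            # factorize assigns integer codes in order of first appearance
            values = pd.factorize(col)[0]
        for other_name, other_numeric, other_values in seen:
            if other_numeric == is_numeric and np.array_equal(
                values, other_values, equal_nan=is_numeric
            ):
                redundant[name] = other_name
                break
        else:
            # no earlier column matched, so keep this one as a reference
            seen.append((name, is_numeric, values))
    return redundant
```

Whether the result is used to mark a column as "useless" in detect_types or to drop it in clean is left open here; the helper only reports which column duplicates which.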

If one column is a monotonic transformation of another column, we might also want to warn, but maybe not drop.
It would be nice to have an option for dropping it, since for tree-based models we could drop it.
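A possible way to check for this, sketched under the assumption that comparing ranks is acceptable: two numeric columns whose ranks agree exactly (or agree after reversing one of them) are strictly monotonic functions of each other on the observed data. The name find_monotonic_pairs is hypothetical, and the sketch drops rows with missing numeric values before ranking.

```python
import numpy as np
import pandas as pd


def find_monotonic_pairs(df):
    """Return (col_a, col_b, direction) for numeric column pairs whose
    ranks match exactly on the observed data.

    Hypothetical helper, not an existing API.
    """
    # Only numeric columns are considered; rows with missing values are dropped.
    numeric = df.select_dtypes(include="number").dropna()
    ascending = numeric.rank(method="average")
    descending = numeric.rank(method="average", ascending=False)
    cols = list(numeric.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            rank_a = ascending[a].to_numpy()
            if np.array_equal(rank_a, ascending[b].to_numpy()):
                pairs.append((a, b, "increasing"))
            elif np.array_equal(rank_a, descending[b].to_numpy()):
                pairs.append((a, b, "decreasing"))
    return pairs
```

The returned pairs could feed a warning by default, with dropping as an opt-in. Note that two constant columns also have identical ranks, so they would be reported as well.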

Let me know if this clarifies things.
