All the type detection is done in detect_types right now.
How to store the results is a different question. For now I'd recommend sticking to the existing structure until I come up with something better.
One thing that would be quite easy to add to detect_types (or potentially to clean, if that makes more sense) is detection of duplicate rows and duplicate columns. If two continuous features or two categorical features are perfectly correlated (even if the categorical ones use different category labels), we should detect that and mark one of them as "useless".
That's a bit dirty, and maybe doing it in clean would be nicer?
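A minimal sketch of these checks follows. It assumes a pandas DataFrame with no missing values, and it assumes the caller already knows which columns are continuous and which are categorical. The helper names `find_duplicate_rows` and `find_redundant_columns` are hypothetical and are not part of the existing `detect_types`. Two categorical columns count as perfectly correlated when their values map one-to-one, so exact duplicates and relabeled copies are both caught. Two continuous columns count as perfectly correlated when their Pearson correlation is ±1.

```python
import itertools

import numpy as np
import pandas as pd


def find_duplicate_rows(df):
    """Index labels of rows that exactly repeat an earlier row."""
    return df.index[df.duplicated(keep="first")]


def _one_to_one(a, b):
    """True if the values of a and b map one-to-one, i.e. the two columns
    encode the same grouping even when the labels differ."""
    pairs = pd.DataFrame({"a": a.to_numpy(), "b": b.to_numpy()}).drop_duplicates()
    return pairs["a"].is_unique and pairs["b"].is_unique


def find_redundant_columns(df, continuous, categorical):
    """Map each redundant ("useless") column to the earlier column it duplicates.

    Hypothetical helper; assumes no missing values.
    """
    useless = {}
    # Categorical: identical up to relabeling (includes exact duplicates).
    for a, b in itertools.combinations(categorical, 2):
        if a in useless or b in useless:
            continue  # already matched to an earlier column
        if _one_to_one(df[a], df[b]):
            useless[b] = a
    # Continuous: perfect linear relationship, |Pearson r| == 1.
    for a, b in itertools.combinations(continuous, 2):
        if a in useless or b in useless:
            continue
        if df[a].nunique() < 2 or df[b].nunique() < 2:
            continue  # correlation is undefined for constant columns
        r = np.corrcoef(df[a], df[b])[0, 1]
        if np.isclose(abs(r), 1.0):
            useless[b] = a
    return useless
```

The function returns a dict rather than dropping anything. That leaves open whether `detect_types` only marks these columns or `clean` removes them.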
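If one column is a monotonic transformation of another column, we might also want to warn, but maybe not drop it.
It would be nice to have an option for dropping such columns, though: tree-based models are invariant to monotonic transformations of a feature, so for them the second column adds no information and could be dropped.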
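One possible way to detect this is sketched below. It assumes the relationship of interest is strictly monotonic over the observed values. Two columns are related by a strictly increasing function exactly when their ranks coincide. They are related by a strictly decreasing function when the ranks of one match the ranks of the other's negation. The names `monotonic_pairs` and `check_monotonic` and the `drop` flag are hypothetical illustrations of the option discussed above, not existing API.

```python
import itertools
import warnings

import pandas as pd


def monotonic_pairs(df, continuous):
    """Return (a, b, direction) for each pair of continuous columns where b is
    a strictly monotonic transformation of a, e.g. b = log(a) or b = -a**3."""
    found = []
    for a, b in itertools.combinations(continuous, 2):
        if df[a].nunique() < 2 or df[b].nunique() < 2:
            continue  # skip constant columns
        ranks_a = df[a].rank()
        if ranks_a.equals(df[b].rank()):
            found.append((a, b, "increasing"))
        elif ranks_a.equals((-df[b]).rank()):
            found.append((a, b, "decreasing"))
    return found


def check_monotonic(df, continuous, drop=False):
    """Warn about monotonic pairs; with the hypothetical drop=True (e.g. for
    tree-based models), also drop the second column of each pair."""
    pairs = monotonic_pairs(df, continuous)
    for a, b, direction in pairs:
        warnings.warn(f"{b!r} is a strictly {direction} transformation of {a!r}")
    if drop:
        df = df.drop(columns=sorted({b for _, b, _ in pairs}))
    return df
```

This follows the same idea as a Spearman rank correlation of ±1. However, it compares the ranks exactly rather than relying on a floating-point tolerance.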
Hi @amueller, I want to try my hand at the Type Detection tasks in the todo list and was wondering if you could shed a little more light on them. Am I correct to assume that we're looking for a Detection object with methods that detect the various things in that list?