Update detect_problem_types implementation #1476

Merged
merged 3 commits into from
Nov 30, 2020
@bchen1116 bchen1116 commented Nov 30, 2020

fix #1469

Updated this implementation to catch Int64 dtypes. Previously, when using analyze_metadata in looking_glass, a target with the nullable Int64 dtype was classified as multiclass whenever it had more than 2 unique values, because that dtype did not match the `y.dtype in numeric_dtypes` check.

After using is_numeric_dtype: [screenshot]

Since we drop NaN values, treating the nullable Boolean dtype as numeric is fine: the binary case is caught before we decide between regression and multiclass.
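To make the ordering argument concrete, here is a minimal sketch of the control flow described above. It is not the evalml implementation: the function name, the string return values, and the `ASSUMED_REGRESSION_MIN_UNIQUE` cutoff are hypothetical and chosen only for illustration. Only the error message, the binary check, and the `is_numeric_dtype` call come from the diff in this PR.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical cutoff, picked only for this illustration.
ASSUMED_REGRESSION_MIN_UNIQUE = 10


def sketch_detect_problem_type(y):
    """Illustrative stand-in for detect_problem_type; returns plain strings."""
    # Drop missing values first, so a nullable Boolean target holding
    # True, False and <NA> is left with exactly two unique values.
    y = pd.Series(y).dropna()
    num_classes = y.nunique()
    if num_classes < 2:
        raise ValueError("Less than 2 classes detected! Target unusable for modeling")
    # The binary case returns before any dtype check runs.
    if num_classes == 2:
        return "binary"
    # Only targets with more than two unique values reach the numeric check.
    if is_numeric_dtype(y.dtype) and num_classes > ASSUMED_REGRESSION_MIN_UNIQUE:
        return "regression"
    return "multiclass"


boolean_target = pd.Series([True, False, None, True], dtype="boolean")
int64_target = pd.Series(list(range(20)), dtype="Int64")
string_target = pd.Series(["a", "b", "c"])
```

With this ordering, a nullable Boolean target never reaches the numeric branch, so reporting it as numeric cannot turn it into a regression problem.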

@bchen1116 bchen1116 self-assigned this Nov 30, 2020

codecov bot commented Nov 30, 2020

Codecov Report

Merging #1476 (833b0f1) into main (94a816d) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1476     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         223      223             
  Lines       15019    15024      +5     
=========================================
+ Hits        15012    15017      +5     
  Misses          7        7             
| Impacted Files | Coverage Δ |
|---|---|
| evalml/problem_types/utils.py | 100.0% <100.0%> (ø) |
| ...lml/tests/problem_type_tests/test_problem_types.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bchen1116 bchen1116 marked this pull request as ready for review November 30, 2020 18:17

@jeremyliweishih jeremyliweishih left a comment

good catch! LGTM


@freddyaboulton freddyaboulton left a comment

@bchen1116 This looks good to me!

@bchen1116 bchen1116 merged commit 314cb63 into main Nov 30, 2020
@@ -46,7 +45,7 @@ def detect_problem_type(y):
         raise ValueError("Less than 2 classes detected! Target unusable for modeling")
     if num_classes == 2:
         return ProblemTypes.BINARY
-    if y.dtype in numeric_dtypes:
+    if is_numeric_dtype(y.dtype):
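The effect of this one-line change can be checked with a small sketch. The contents of `assumed_numeric_dtypes` are an assumption made for illustration (the PR does not show the old `numeric_dtypes` list); `is_numeric_dtype` is the real `pandas.api.types` function.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Assumed stand-in for the old list of dtype names; not shown in this PR.
assumed_numeric_dtypes = ["int64", "float64"]

plain = pd.Series([1, 2, 3], dtype="int64")     # NumPy-backed integers
nullable = pd.Series([1, 2, 3], dtype="Int64")  # pandas nullable integers

# Old style: membership in a list of dtype names.
old_plain = plain.dtype in assumed_numeric_dtypes
old_nullable = nullable.dtype in assumed_numeric_dtypes

# New style: ask pandas whether the dtype is numeric.
new_plain = is_numeric_dtype(plain.dtype)
new_nullable = is_numeric_dtype(nullable.dtype)

print(old_plain, old_nullable, new_plain, new_nullable)
```

Under these assumptions the list-membership check accepts `int64` but misses `Int64`, because the extension dtype's name does not equal the lowercase string, while `is_numeric_dtype` accepts both.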
Got it, thanks @bchen1116!

I think the ultimate goal is to update this (and all our utilities) to standardize on woodwork (#1229) and then check whether the "numeric" semantic tag has been applied to the target. (@angela97lin FYI)

@dsherry dsherry mentioned this pull request Dec 1, 2020
@freddyaboulton freddyaboulton deleted the bc_1469_problem_types branch May 13, 2022 14:58
Successfully merging this pull request may close these issues.

Update numeric_dtypes