Update `detect_problem_types` implementation #1476

bchen1116 · 2020-11-30T17:05:02Z

Updated this implementation to catch `Int64` (nullable integer) dtypes. Previously, when using `analyze_metadata` in looking_glass, a target with `Int64` data was classified as multiclass whenever it had more than 2 unique values.

After using is_numeric_dtype:

Because we drop NaN data, it is fine that `is_numeric_dtype` classifies nullable Boolean as a numeric dtype: the binary case is caught before we decide between regression and multiclass.
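A minimal sketch of the dtype behavior described above, using only pandas. The `numeric_dtype_names` allow-list and the `old_check` helper are hypothetical stand-ins (the actual contents of evalml's `numeric_dtypes` are not shown in this PR); the point is only that a name-based allow-list misses the nullable `Int64` dtype, while `pandas.api.types.is_numeric_dtype` accepts it.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical allow-list standing in for the old `numeric_dtypes` check.
numeric_dtype_names = {"int16", "int32", "int64", "float16", "float32", "float64"}


def old_check(y):
    """Assumed shape of the previous check: membership in a fixed dtype list."""
    return str(y.dtype) in numeric_dtype_names


y_numpy = pd.Series([1, 2, 3, 4], dtype="int64")      # NumPy-backed integers
y_nullable = pd.Series([1, 2, 3, 4], dtype="Int64")   # pandas nullable integers
y_bool = pd.Series([True, False, True], dtype="boolean")  # pandas nullable Boolean

old_numpy = old_check(y_numpy)          # found in the allow-list
old_nullable = old_check(y_nullable)    # "Int64" is not in the allow-list

new_numpy = is_numeric_dtype(y_numpy.dtype)
new_nullable = is_numeric_dtype(y_nullable.dtype)
new_bool = is_numeric_dtype(y_bool.dtype)  # nullable Boolean also counts as numeric
```

Since nullable Boolean also passes `is_numeric_dtype`, the ordering of the checks matters: a two-class target has to be returned as binary before the numeric check is reached.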

codecov · 2020-11-30T17:13:17Z

Codecov Report

Merging #1476 (833b0f1) into main (94a816d) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1476     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         223      223             
  Lines       15019    15024      +5     
=========================================
+ Hits        15012    15017      +5     
  Misses          7        7

| Impacted Files | Coverage Δ |
|---|---|
| evalml/problem_types/utils.py | `100.0% <100.0%> (ø)` |
| ...lml/tests/problem_type_tests/test_problem_types.py | `100.0% <100.0%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jeremyliweishih

good catch! LGTM

freddyaboulton

@bchen1116 This looks good to me!

dsherry · 2020-11-30T23:29:28Z

evalml/problem_types/utils.py

@@ -46,7 +45,7 @@ def detect_problem_type(y):
        raise ValueError("Less than 2 classes detected! Target unusable for modeling")
    if num_classes == 2:
        return ProblemTypes.BINARY
-    if y.dtype in numeric_dtypes:
+    if is_numeric_dtype(y.dtype):


Got it, thanks @bchen1116 !

I think the ultimate goal is to update this (and all our utilities) to standardize on woodwork (#1229) and then check whether the "numeric" semantic tag has been applied to the target. (@angela97lin FYI)
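For readers following the hunk above, the order of the checks can be sketched roughly as follows. This is a simplified illustration, not evalml's actual function: `detect_problem_type_sketch` is a hypothetical name, it returns plain strings rather than `ProblemTypes` members, and the body of the numeric branch is not visible in the hunk, so returning regression directly is an assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def detect_problem_type_sketch(y):
    """Hypothetical, simplified version of the check order in the hunk."""
    y = pd.Series(y).dropna()  # NaN values are dropped before counting classes
    num_classes = y.nunique()
    if num_classes < 2:
        raise ValueError("Less than 2 classes detected! Target unusable for modeling")
    if num_classes == 2:
        # Caught first, so two-class nullable Boolean targets never reach
        # the numeric check below.
        return "binary"
    if is_numeric_dtype(y.dtype):
        # Assumption: the real numeric branch is not shown in the hunk.
        return "regression"
    return "multiclass"


int64_result = detect_problem_type_sketch(pd.Series([1, 2, 3, 4, 5], dtype="Int64"))
bool_result = detect_problem_type_sketch(pd.Series([True, False, None], dtype="boolean"))
str_result = detect_problem_type_sketch(pd.Series(["a", "b", "c"]))
```

With this ordering, an `Int64` target with more than two unique values reaches the numeric branch instead of falling through to multiclass.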

evalml/tests/problem_type_tests/test_problem_types.py

change implementation

334ad25

bchen1116 self-assigned this Nov 30, 2020

update release notes

6fffe29

lint

833b0f1

bchen1116 marked this pull request as ready for review November 30, 2020 18:17

bchen1116 requested review from dsherry, angela97lin, freddyaboulton, christopherbunn, eccabay and jeremyliweishih and removed request for dsherry and angela97lin November 30, 2020 18:18

jeremyliweishih approved these changes Nov 30, 2020

freddyaboulton approved these changes Nov 30, 2020

bchen1116 merged commit 314cb63 into main Nov 30, 2020

dsherry reviewed Nov 30, 2020

dsherry mentioned this pull request Dec 1, 2020

Release v0.16.1 #1486

Merged

freddyaboulton deleted the bc_1469_problem_types branch May 13, 2022 14:58
