New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make usage of data types consistent across codebase #1039
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1039 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 183 183
Lines 10143 10147 +4
=======================================
+ Hits 10134 10138 +4
Misses 9 9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin Looks good! I wonder if we should create a datetime_dtypes = [np.datetime64]
too. This might be useful for future time series work.
@freddyaboulton I think that's a great suggestion! Will update to include :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I still see 'category'
showing up in some SimpleImputer
and DateTimeFeaturizer
and the numeric types in IDColumnsDataCheck
. Should those be included as well? Otherwise looks great!
@jeremyliweishih Yeah, for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -815,20 +819,20 @@ def test_results_getter(mock_fit, mock_score, caplog, X_y_binary): | |||
|
|||
|
|||
@pytest.mark.parametrize("automl_type", [ProblemTypes.BINARY, ProblemTypes.MULTICLASS]) | |||
@pytest.mark.parametrize("target_type", ["categorical", "string", "bool", "float", "int"]) | |||
@pytest.mark.parametrize("target_type", numeric_and_boolean_dtypes + categorical_dtypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So "string" is not included here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is a little confusing but previously, the "string" parametrized case was covering the default case (aka no conversion to another type), where the default target type is object. So yup, numeric_and_boolean_dtypes
contains "object" instead of "string" but since there are no checks to convert data types if target_type == object, this still covers the same case :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin thank you for cleaning this up!! I left one question on a test
Closes #1006 by cleaning up some of our calls to selecting dtypes to use what we have in place in
gen_utils
instead.