MAINT: fix needless ndarray creation #63
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   99.51%   99.76%   +0.25%
==========================================
  Files           3        3
  Lines         413      432      +19
==========================================
+ Hits          411      431      +20
+ Misses          2        1       -1
==========================================
```
Continue to review full report at Codecov.
Why not this implementation?

```python
dtype = (
    None
    if isinstance(X, np.ndarray) and X.dtype.kind != "O"
    else tf.keras.backend.floatx()
)
X = check_array(X, allow_nd=True, dtype=dtype)
```
I think this would not work for …
Why not this implementation?

```python
def _check_array_dtype(arr):
    if not isinstance(arr, np.ndarray):
        return _check_array_dtype(np.asarray(arr))
    if arr.dtype.kind == "O":
        return tf.keras.backend.floatx()
    else:
        return None  # check_array won't do any casting with dtype=None
```
That is a bit cleaner; the single recursion helps. Thank you.
```python
def _validate_data(self, X, y=None, reset=True):
    """Convert y to float, regressors cannot accept int."""
    if y is not None:
        y = check_array(y, ensure_2d=False)
    return super()._validate_data(X=X, y=y, reset=reset)
```
This can actually be removed now, thanks to the changes in `BaseWrapper._validate_data`!
```diff
-dtype_y_true = np.dtype(y_true.dtype.as_numpy_dtype())
-dtype_y_pred = np.dtype(y_pred.dtype.as_numpy_dtype())
-dest_dtype = np.promote_types(dtype_y_pred, dtype_y_true)
-y_true = tf.cast(y_true, dtype=dest_dtype)
-y_pred = tf.cast(y_pred, dtype=dest_dtype)
+# y_pred will always be float32 so we cast y_true to float32
+y_true = tf.cast(y_true, dtype=y_pred.dtype)
```
Always casting to `float32` works. I think that trying to match the dtypes here was overkill and probably less efficient.
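For context, a NumPy-only sketch of what was dropped: `np.promote_types` looks up the smallest dtype that both inputs can be safely cast to, whereas the simplified code just reuses `y_pred`'s `float32` and accepts the (usually harmless) narrowing when `y_true` is wider:

```python
import numpy as np

# What the removed code computed: the common promoted dtype.
assert np.promote_types("float32", "int64") == np.dtype("float64")
assert np.promote_types("float32", "uint8") == np.dtype("float32")

# What the new code does: cast y_true straight to y_pred's dtype.
y_true = np.arange(4, dtype="int64")
y_true = y_true.astype("float32")  # analogue of tf.cast(y_true, y_pred.dtype)
assert y_true.dtype == np.dtype("float32")
```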
"X_dtype", ["float32", "float64", "int64", "int32", "uint8", "uint16"] | ||
"X_dtype", | ||
["float32", "float64", "int64", "int32", "uint8", "uint16", "object"], | ||
) | ||
@pytest.mark.parametrize( | ||
"y_dtype", ["float32", "float64", "int64", "int32", "uint8", "uint16"] | ||
"y_dtype", | ||
["float32", "float64", "int64", "int32", "uint8", "uint16", "object"], | ||
) | ||
@pytest.mark.parametrize( | ||
"s_w_dtype", ["float32", "float64", "int64", "int32", "uint8", "uint16"] | ||
"s_w_dtype", | ||
["float32", "float64", "int64", "int32", "uint8", "uint16", "object"], |
These tests now take a long time with all of these dtypes. I think we need to split this up into:

1. Test a single dtype at a time for all inputs (i.e. `y`, `X` and `sw` all have `float32`, etc.).
2. Test that mixed dtypes work (we can do 1 float and 1 int for each input, giving 6 tests).
I'm not sure where/how we need to be testing for `run_eagerly`, but if we can reduce that parametrization as well I think that would be nice.

I'll leave this for now, but I'll probably make a separate PR to try to address this.
Yeah, I've noticed that these tests take a long time too.

I think `run_eagerly` should be tested, certainly with different input types. `run_eagerly` controls whether the program is compiled by TensorFlow ahead of time, and whether TF can do dynamic type inference.
Do you think it needs to be tested against both of the tests proposed above, or can we pick the higher-risk one?
I think `run_eagerly` only needs to be tested on (2) above. But you could include one set of parameters with the same dtype just to make sure it passes:
```python
@pytest.mark.parametrize("y_dtype", ["float32", "float64", "uint8", "int16"])
@pytest.mark.parametrize("run_eagerly", [True, False])
def test_mixed_dtypes(y_dtype, run_eagerly):
    n, d = 100, 10
    n_classes = 4
    X = np.random.uniform(size=(n, d)).astype("float32")
    y = np.random.choice(n_classes, size=n).astype(y_dtype)
    est = KerasClassifier(..., run_eagerly=run_eagerly)
    est.fit(X, y)
    est.score(X, y)
```
I like that. Let's do that in the next PR.
By the way @stsievert, if you want tests to run faster, you can do:

```shell
pip install pytest-xdist
pytest -n auto
```
I think this is ready to merge, @stsievert?
Yup, this looks good to me.
@stsievert, let me know if there's anything else you catch from #59. I should have waited for you to merge, sorry about that!