
MAINT: fix needless ndarray creation #63

Merged: 4 commits into master from hotfix/fix-needless-ndarray-creation, Aug 18, 2020

Conversation

adriangb
Owner

As per:

@stsievert let me know if there's anything else you catch from #59. I should have waited for you to merge, sorry about that!

@codecov-commenter

codecov-commenter commented Aug 17, 2020

Codecov Report

Merging #63 into master will increase coverage by 0.25%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   99.51%   99.76%   +0.25%     
==========================================
  Files           3        3              
  Lines         413      432      +19     
==========================================
+ Hits          411      431      +20     
+ Misses          2        1       -1     
Impacted Files          Coverage Δ
scikeras/_utils.py      98.70% <100.00%> (+0.03%) ⬆️
scikeras/wrappers.py   100.00% <100.00%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e616c1...637fc04.

@stsievert
Collaborator

Why not this implementation of get_dtype? If I'm correct, it has the same features expressed in #59 (comment):

dtype = (
    None
    if isinstance(X, np.ndarray) and X.dtype.kind != "O"
    else tf.keras.backend.floatx()
)
X = check_array(X, allow_nd=True, dtype=dtype)

@adriangb
Owner Author

adriangb commented Aug 17, 2020

Why not this implementation of get_dtype? If I'm correct, it has the same features expressed in #59 (comment):

dtype = (
    None
    if isinstance(X, np.ndarray) and X.dtype.kind != "O"
    else tf.keras.backend.floatx()
)
X = check_array(X, allow_nd=True, dtype=dtype)

I think this would not work for X = [np.array([0], dtype=np.int32)] (i.e., a list of arrays): dtype would be tf.keras.backend.floatx() when it should be int32.
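
To illustrate, a minimal sketch of that failure mode (assuming the standard np/tf import aliases; the print is only for demonstration):

import numpy as np
import tensorflow as tf

X = [np.array([0], dtype=np.int32)]  # a list of arrays, not an ndarray

# isinstance(X, np.ndarray) is False, so the one-liner falls through to floatx()
dtype = (
    None
    if isinstance(X, np.ndarray) and X.dtype.kind != "O"
    else tf.keras.backend.floatx()
)
print(dtype)  # "float32", even though the underlying data is int32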

@stsievert
Collaborator

stsievert commented Aug 18, 2020

Why not this implementation?

import numpy as np
import tensorflow as tf


def _check_array_dtype(arr):
    if not isinstance(arr, np.ndarray):
        # Coerce lists (including lists of arrays) to an ndarray first
        return _check_array_dtype(np.asarray(arr))

    if arr.dtype.kind == "O":
        return tf.keras.backend.floatx()
    else:
        return None  # check_array won't do any casting with dtype=None
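
For context, a sketch of how this helper would pair with check_array (the combined call is my assumption, not necessarily the exact PR code; check_array comes from sklearn.utils.validation):

from sklearn.utils.validation import check_array

# dtype=None preserves numeric dtypes; object-dtype input gets cast to floatx()
X = [np.array([0], dtype=np.int32)]
X = check_array(X, allow_nd=True, dtype=_check_array_dtype(X))
print(X.dtype)  # int32, preserved rather than cast to float32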

@adriangb
Owner Author

Why not this implementation?

That is a bit cleaner; the single recursion helps. Thank you.

Comment on lines -990 to -995

def _validate_data(self, X, y=None, reset=True):
    """Convert y to float, regressors cannot accept int."""
    if y is not None:
        y = check_array(y, ensure_2d=False)
    return super()._validate_data(X=X, y=y, reset=reset)

adriangb (Owner Author)

This can actually be removed now, thanks to the changes in BaseWrapper._validate_data!

Comment on lines -1064 to +1052

- dtype_y_true = np.dtype(y_true.dtype.as_numpy_dtype())
- dtype_y_pred = np.dtype(y_pred.dtype.as_numpy_dtype())
- dest_dtype = np.promote_types(dtype_y_pred, dtype_y_true)
- y_true = tf.cast(y_true, dtype=dest_dtype)
- y_pred = tf.cast(y_pred, dtype=dest_dtype)
+ # y_pred will always be float32 so we cast y_true to float32
+ y_true = tf.cast(y_true, dtype=y_pred.dtype)
adriangb (Owner Author)

Always casting to float32 works. I think that trying to match the dtypes here was overkill and probably less efficient.
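
A toy illustration of the simplified cast (the tensor values are made up; assumes eager TF 2.x):

import tensorflow as tf

y_true = tf.constant([0, 1, 2], dtype=tf.int64)
y_pred = tf.constant([[0.1, 0.8, 0.1]] * 3)  # Keras model outputs default to float32

# A single cast to y_pred's dtype replaces the old promote_types dance
y_true = tf.cast(y_true, dtype=y_pred.dtype)
print(y_true.dtype)  # <dtype: 'float32'>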

Comment on lines -299 to +308
"X_dtype", ["float32", "float64", "int64", "int32", "uint8", "uint16"]
"X_dtype",
["float32", "float64", "int64", "int32", "uint8", "uint16", "object"],
)
@pytest.mark.parametrize(
"y_dtype", ["float32", "float64", "int64", "int32", "uint8", "uint16"]
"y_dtype",
["float32", "float64", "int64", "int32", "uint8", "uint16", "object"],
)
@pytest.mark.parametrize(
"s_w_dtype", ["float32", "float64", "int64", "int32", "uint8", "uint16"]
"s_w_dtype",
["float32", "float64", "int64", "int32", "uint8", "uint16", "object"],
adriangb (Owner Author) commented Aug 18, 2020

These tests now take a long time with all of these dtypes. I think we need to split this up into:

  1. Test a single dtype at a time for all inputs (i.e. y, X, and sw all float32, etc.).
  2. Test that mixed dtypes work (we can do 1 float and 1 int for each input, giving 6 tests).

I'm not sure where/how we need to be testing for run_eagerly, but if we can reduce that parametrization as well I think that would be nice.

I'll leave this for now, but I'll probably make a separate PR to try to address this.

stsievert (Collaborator)

Yeah, I've noticed that these tests take a long time too.

I think run_eagerly should be tested, certainly with different input types. run_eagerly controls whether TensorFlow compiles the program ahead of time, and whether TF can do dynamic type inference.
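
For reference, run_eagerly is the standard flag on Model.compile; a minimal sketch (the tiny model here is made up for illustration):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4, activation="softmax")])
# run_eagerly=True bypasses tf.function graph compilation, so each
# batch executes op-by-op and dtypes are resolved dynamically
model.compile(loss="sparse_categorical_crossentropy", run_eagerly=True)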

adriangb (Owner Author)

Do you think it needs to be tested against both of the tests proposed above, or can we pick the higher-risk one?

stsievert (Collaborator)

I think run_eagerly only needs to be tested on (2) above. But you could include one set of parameters with the same dtype just to make sure it passes:

@pytest.mark.parametrize("y_dtype", ["float32", "float64", "uint8", "int16"])
@pytest.mark.parametrize("run_eagerly", [True, False])
def test_mixed_dtypes(y_dtype, run_eagerly):
    n, d = 100, 10
    n_classes = 4
    X = np.random.uniform(size=(n, d)).astype("float32")
    y = np.random.choice(n_classes, size=n).astype(y_dtype)
    est = KerasClassifier(..., run_eagerly=run_eagerly)
    est.fit(X, y)
    est.score(X, y)

adriangb (Owner Author)

I like that. Let's do that next PR.

adriangb (Owner Author)

By the way @stsievert, if you want tests to run faster, you can do:

pip install pytest-xdist
pytest -n auto

@adriangb
Owner Author

I think this is ready to merge, @stsievert?

@stsievert
Collaborator

Yup, this looks good to me.

@adriangb merged commit 6156c29 into master Aug 18, 2020
@adriangb deleted the hotfix/fix-needless-ndarray-creation branch Aug 18, 2020 14:13
@stsievert mentioned this pull request Aug 27, 2020