[BUG] FROLS creates nan entries and then fails #191

Jacob-Stevens-Haas · 2022-06-01T21:07:08Z

While refactoring AxesArray to carry around axes labels around pysindy, I ran into a failing FROLS test test_ensemble_pdes(optimizer). I cut down the data to a size that could be posted here, but in that test, x.shape is (100, 3).

MWE

from pysindy.optimizers.frols import FROLS
import numpy as np
opt = FROLS(normalize_columns=True)
x = np.array([[ 0.1       ,  0.10836917,  0.0534757 ,  0.07025105,  0.01762627,
         0.10836917,  0.0534757 , -0.17826644,  0.13281419,  0.24178254,
        -0.26582002, -0.37734489,  0.28005519,  0.07025105,  0.06386157,
        -0.22463415,  0.16391345,  0.24497855, -0.34461872, -0.53928166,
         0.41497356,  0.06386157,  0.01762627, -0.09327816,  0.0498191 ,
         0.14082049, -0.11306455, -0.23267339,  0.15422543,  0.03857632,
         0.04649183, -0.17129227,  0.13303191,  0.1544556 , -0.27712502,
        -0.44314542,  0.33900964,  0.02227141,  0.00461738, -0.02784466,
         0.01474788,  0.05662464, -0.0340879 , -0.10003883,  0.05383173]])
y = np.array([[-0.46592807, -0.40880365]])
opt.fit(x, y)

Error message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 opt.fit(x,y)

File ~/github/pysindy/pysindy/optimizers/base.py:171, in BaseOptimizer.fit(self, x_, y, sample_weight, **reduce_kws)
    [167](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=166)     self.coef_ = self.initial_guess
    [169](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=168) self.history_ = [self.coef_]
--> [171](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=170) self._reduce(x_normed, y, **reduce_kws)
    [172](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=171) self.ind_ = np.abs(self.coef_) > 1e-14
    [174](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=173) # Rescale coefficients to original units

File ~/github/pysindy/pysindy/optimizers/frols.py:212, in FROLS._reduce(self, x, y)
    [210](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=209) if i != 0:
    [211](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=210)     kw = self.ridge_kw or {}
--> [212](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=211)     alpha = ridge_regression(A[:i, :i], g_global[:i], self.alpha, **kw)
    [213](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=212) else:
    [214](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=213)     alpha = []

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py:451, in ridge_regression(X, y, alpha, sample_weight, solver, max_iter, tol, verbose, positive, random_state, return_n_iter, return_intercept, check_input)
    [305](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=304) def ridge_regression(
    [306](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=305)     X,
    [307](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=306)     y,
   (...)
    [319](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=318)     check_input=True,
    [320](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=319) ):
    [321](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=320)     """Solve the ridge equation by the method of normal equations.
    [322](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=321) 
    [323](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=322)     Read more in the :ref:`User Guide <ridge_regression>`.
   (...)
    [449](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=448)     This function won't compute the intercept.
    [450](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=449)     """
--> [451](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=450)     return _ridge_regression(
    [452](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=451)         X,
    [453](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=452)         y,
    [454](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=453)         alpha,
    [455](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=454)         sample_weight=sample_weight,
    [456](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=455)         solver=solver,
    [457](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=456)         max_iter=max_iter,
    [458](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=457)         tol=tol,
    [459](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=458)         verbose=verbose,
    [460](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=459)         positive=positive,
    [461](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=460)         random_state=random_state,
    [462](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=461)         return_n_iter=return_n_iter,
    [463](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=462)         return_intercept=return_intercept,
    [464](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=463)         X_scale=None,
    [465](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=464)         X_offset=None,
    [466](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=465)         check_input=check_input,
    [467](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=466)     )

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py:531, in _ridge_regression(X, y, alpha, sample_weight, solver, max_iter, tol, verbose, positive, random_state, return_n_iter, return_intercept, X_scale, X_offset, check_input)
    [529](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=528)     _accept_sparse = _get_valid_accept_sparse(sparse.issparse(X), solver)
    [530](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=529)     X = check_array(X, accept_sparse=_accept_sparse, dtype=_dtype, order="C")
--> [531](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=530)     y = check_array(y, dtype=X.dtype, ensure_2d=False, order=None)
    [532](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=531) check_consistent_length(X, y)
    [534](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=533) n_samples, n_features = X.shape

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py:800, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    [794](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=793)         raise ValueError(
    [795](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=794)             "Found array with dim %d. %s expected <= 2."
    [796](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=795)             % (array.ndim, estimator_name)
    [797](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=796)         )
    [799](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=798)     if force_all_finite:
--> [800](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=799)         _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    [802](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=801) if ensure_min_samples > 0:
    [803](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=802)     n_samples = _num_samples(array)

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py:114, in _assert_all_finite(X, allow_nan, msg_dtype)
    [107](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=106)     if (
    [108](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=107)         allow_nan
    [109](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=108)         and np.isinf(X).any()
    [110](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=109)         or not allow_nan
    [111](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=110)         and not np.isfinite(X).all()
    [112](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=111)     ):
    [113](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=112)         type_err = "infinity" if allow_nan else "NaN, infinity"
--> [114](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=113)         raise ValueError(
    [115](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=114)             msg_err.format(
    [116](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=115)                 type_err, msg_dtype if msg_dtype is not None else X.dtype
    [117](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=116)             )
    [118](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=117)         )
    [119](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=118) # for object dtype data, we only check for NaNs (GH-13254)
    [120](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=119) elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

PySINDy/Python version information:

Working in my own branch off of master, but haven't modified anything in pysindy.optimizers (and pysindy.optimizers doesn't import from outside the package, e.g. utils.)

That said, tested on 1.7 and MWE still raises error.

The text was updated successfully, but these errors were encountered:

Jacob-Stevens-Haas · 2022-06-01T21:10:52Z

It appears that line 194 of frols.py tries to orthogonalize against a column that has not been initialized yet (and is thus all zeros). Qs becomes all NaNs then, select_function() returns NaNs, and then a few lines later sklearn.linear_model.ridge_regression() triggers the error.

@akaptano , @jcallaham , looks like you guys were the ones who worked on FROLS... any ideas?

akaptano · 2022-06-02T01:36:24Z

@jcallaham is our resident expert on FROLS... I'll let him take this one if he is available.

jcallaham · 2022-06-02T12:02:10Z

Sure, I can take a look. From your MWE it seems like the problem isn't that the column hasn't been initialized yet (since _orthogonalize should do nothing on the first iteration), but that the features aren't linearly independent: x has shape (1, 45). So whichever "column" FROLS picks first, when it orthogonalizes the other features with respect to that the other features all become zero.

I'm not sure what this test does, or how the ensembling works exactly - would you expect linearly dependent features to be common for that use case? I'd think it would be unusual for standard SINDy, but we could add an exception or warning for the edge case of rank-deficient libraries.

I'm not getting any test failures on the latest master branch, and can run the example notebook that uses FROLS okay (14_cavity_flow). I also can't find the test_ensemble_pdes function in any of the tests I have... would you mind posting the full test where x is 100x3? Or permalinking to it?

Jacob-Stevens-Haas · 2022-06-02T23:47:38Z

Current test_ensemble_pdes test

The test failed while refactoring to get #185 to pass on the notebooks... It looks like I was creating ten extra columns: x had shape (100, 45), but debugging it in master shows it should've been (100, 35). I must have been passing in redundant library terms, which would explain why columns were linearly dependent.

If we want to raise a more explicit error, this version of _normed_cov() seems to work.

    def _normed_cov(self, a, b):
        with warnings.catch_warnings():
            warnings.filterwarnings("error", category=RuntimeWarning)
            try:
                return np.vdot(a, b) / np.vdot(a, a)
            except RuntimeWarning:
                raise ValueError("Trying to orthogonalize linearly dependent columns created NaNs")

But not sure whether it's better to raise it here or in _orthogonalize(). A test for this is simple:

opt = FROLS(normalize_columns=True)
x = np.array([[1., 1.]])
y = np.array([[1., 1.]])
opt.fit(x, y)

jcallaham · 2022-06-03T10:45:57Z

That makes sense. Yeah I had run into a similar issue trying to debug a related issue a while ago - I like your explicit test.

I think having it in _normed_cov also makes sense, since it's the lowest-level place where this would happen, and testing for the RuntimeWarning is cheaper than a rank test.

Jacob-Stevens-Haas mentioned this issue Jun 3, 2022

ENH: Raise explicit error when FROLS has linearly dependent columns. #193

Merged

akaptano closed this as completed in #193 Jun 5, 2022

ghost mentioned this issue Jul 26, 2022

Polynomial Library Error #233

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] FROLS creates nan entries and then fails #191

[BUG] FROLS creates nan entries and then fails #191

Jacob-Stevens-Haas commented Jun 1, 2022

Jacob-Stevens-Haas commented Jun 1, 2022 •

edited

akaptano commented Jun 2, 2022

jcallaham commented Jun 2, 2022

Jacob-Stevens-Haas commented Jun 2, 2022

jcallaham commented Jun 3, 2022

[BUG] FROLS creates nan entries and then fails #191

[BUG] FROLS creates nan entries and then fails #191

Comments

Jacob-Stevens-Haas commented Jun 1, 2022

MWE

Error message:

PySINDy/Python version information:

Jacob-Stevens-Haas commented Jun 1, 2022 • edited

akaptano commented Jun 2, 2022

jcallaham commented Jun 2, 2022

Jacob-Stevens-Haas commented Jun 2, 2022

jcallaham commented Jun 3, 2022

Jacob-Stevens-Haas commented Jun 1, 2022 •

edited