Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] FROLS creates nan entries and then fails #191

Closed
Jacob-Stevens-Haas opened this issue Jun 1, 2022 · 5 comments · Fixed by #193
Closed

[BUG] FROLS creates nan entries and then fails #191

Jacob-Stevens-Haas opened this issue Jun 1, 2022 · 5 comments · Fixed by #193

Comments

@Jacob-Stevens-Haas
Copy link
Collaborator

While refactoring AxesArray to carry around axes labels around pysindy, I ran into a failing FROLS test test_ensemble_pdes(optimizer). I cut down the data to a size that could be posted here, but in that test, x.shape is (100, 3).

MWE

from pysindy.optimizers.frols import FROLS
import numpy as np
opt = FROLS(normalize_columns=True)
x = np.array([[ 0.1       ,  0.10836917,  0.0534757 ,  0.07025105,  0.01762627,
         0.10836917,  0.0534757 , -0.17826644,  0.13281419,  0.24178254,
        -0.26582002, -0.37734489,  0.28005519,  0.07025105,  0.06386157,
        -0.22463415,  0.16391345,  0.24497855, -0.34461872, -0.53928166,
         0.41497356,  0.06386157,  0.01762627, -0.09327816,  0.0498191 ,
         0.14082049, -0.11306455, -0.23267339,  0.15422543,  0.03857632,
         0.04649183, -0.17129227,  0.13303191,  0.1544556 , -0.27712502,
        -0.44314542,  0.33900964,  0.02227141,  0.00461738, -0.02784466,
         0.01474788,  0.05662464, -0.0340879 , -0.10003883,  0.05383173]])
y = np.array([[-0.46592807, -0.40880365]])
opt.fit(x, y)

Error message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 opt.fit(x,y)

File ~/github/pysindy/pysindy/optimizers/base.py:171, in BaseOptimizer.fit(self, x_, y, sample_weight, **reduce_kws)
    [167](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=166)     self.coef_ = self.initial_guess
    [169](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=168) self.history_ = [self.coef_]
--> [171](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=170) self._reduce(x_normed, y, **reduce_kws)
    [172](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=171) self.ind_ = np.abs(self.coef_) > 1e-14
    [174](file:///home/xenophon/github/pysindy/pysindy/optimizers/base.py?line=173) # Rescale coefficients to original units

File ~/github/pysindy/pysindy/optimizers/frols.py:212, in FROLS._reduce(self, x, y)
    [210](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=209) if i != 0:
    [211](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=210)     kw = self.ridge_kw or {}
--> [212](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=211)     alpha = ridge_regression(A[:i, :i], g_global[:i], self.alpha, **kw)
    [213](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=212) else:
    [214](file:///home/xenophon/github/pysindy/pysindy/optimizers/frols.py?line=213)     alpha = []

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py:451, in ridge_regression(X, y, alpha, sample_weight, solver, max_iter, tol, verbose, positive, random_state, return_n_iter, return_intercept, check_input)
    [305](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=304) def ridge_regression(
    [306](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=305)     X,
    [307](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=306)     y,
   (...)
    [319](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=318)     check_input=True,
    [320](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=319) ):
    [321](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=320)     """Solve the ridge equation by the method of normal equations.
    [322](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=321) 
    [323](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=322)     Read more in the :ref:`User Guide <ridge_regression>`.
   (...)
    [449](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=448)     This function won't compute the intercept.
    [450](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=449)     """
--> [451](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=450)     return _ridge_regression(
    [452](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=451)         X,
    [453](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=452)         y,
    [454](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=453)         alpha,
    [455](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=454)         sample_weight=sample_weight,
    [456](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=455)         solver=solver,
    [457](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=456)         max_iter=max_iter,
    [458](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=457)         tol=tol,
    [459](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=458)         verbose=verbose,
    [460](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=459)         positive=positive,
    [461](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=460)         random_state=random_state,
    [462](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=461)         return_n_iter=return_n_iter,
    [463](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=462)         return_intercept=return_intercept,
    [464](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=463)         X_scale=None,
    [465](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=464)         X_offset=None,
    [466](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=465)         check_input=check_input,
    [467](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=466)     )

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py:531, in _ridge_regression(X, y, alpha, sample_weight, solver, max_iter, tol, verbose, positive, random_state, return_n_iter, return_intercept, X_scale, X_offset, check_input)
    [529](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=528)     _accept_sparse = _get_valid_accept_sparse(sparse.issparse(X), solver)
    [530](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=529)     X = check_array(X, accept_sparse=_accept_sparse, dtype=_dtype, order="C")
--> [531](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=530)     y = check_array(y, dtype=X.dtype, ensure_2d=False, order=None)
    [532](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=531) check_consistent_length(X, y)
    [534](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py?line=533) n_samples, n_features = X.shape

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py:800, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    [794](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=793)         raise ValueError(
    [795](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=794)             "Found array with dim %d. %s expected <= 2."
    [796](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=795)             % (array.ndim, estimator_name)
    [797](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=796)         )
    [799](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=798)     if force_all_finite:
--> [800](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=799)         _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    [802](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=801) if ensure_min_samples > 0:
    [803](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=802)     n_samples = _num_samples(array)

File ~/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py:114, in _assert_all_finite(X, allow_nan, msg_dtype)
    [107](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=106)     if (
    [108](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=107)         allow_nan
    [109](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=108)         and np.isinf(X).any()
    [110](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=109)         or not allow_nan
    [111](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=110)         and not np.isfinite(X).all()
    [112](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=111)     ):
    [113](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=112)         type_err = "infinity" if allow_nan else "NaN, infinity"
--> [114](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=113)         raise ValueError(
    [115](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=114)             msg_err.format(
    [116](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=115)                 type_err, msg_dtype if msg_dtype is not None else X.dtype
    [117](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=116)             )
    [118](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=117)         )
    [119](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=118) # for object dtype data, we only check for NaNs (GH-13254)
    [120](file:///home/xenophon/github/pysindy/env/lib/python3.8/site-packages/sklearn/utils/validation.py?line=119) elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

PySINDy/Python version information:

Working in my own branch off of master, but haven't modified anything in pysindy.optimizers (and pysindy.optimizers doesn't import from outside the package, e.g. utils.)

That said, tested on 1.7 and MWE still raises error.

@Jacob-Stevens-Haas
Copy link
Collaborator Author

Jacob-Stevens-Haas commented Jun 1, 2022

It appears that line 194 of frols.py tries to orthogonalize against a column that has not been initialized yet (and is thus all zeros). Qs becomes all NaNs then, select_function() returns NaNs, and then a few lines later sklearn.linear_model.ridge_regression() triggers the error.

@akaptano , @jcallaham , looks like you guys were the ones who worked on FROLS... any ideas?

@akaptano
Copy link
Collaborator

akaptano commented Jun 2, 2022

@jcallaham is our resident expert on FROLS... I'll let him take this one if he is available.

@jcallaham
Copy link
Collaborator

Sure, I can take a look. From your MWE it seems like the problem isn't that the column hasn't been initialized yet (since _orthogonalize should do nothing on the first iteration), but that the features aren't linearly independent: x has shape (1, 45). So whichever "column" FROLS picks first, when it orthogonalizes the other features with respect to that the other features all become zero.

I'm not sure what this test does, or how the ensembling works exactly - would you expect linearly dependent features to be common for that use case? I'd think it would be unusual for standard SINDy, but we could add an exception or warning for the edge case of rank-deficient libraries.

I'm not getting any test failures on the latest master branch, and can run the example notebook that uses FROLS okay (14_cavity_flow). I also can't find the test_ensemble_pdes function in any of the tests I have... would you mind posting the full test where x is 100x3? Or permalinking to it?

@Jacob-Stevens-Haas
Copy link
Collaborator Author

Current test_ensemble_pdes test

The test failed while refactoring to get #185 to pass on the notebooks... It looks like I was creating ten extra columns: x had shape (100, 45), but debugging it in master shows it should've been (100, 35). I must have been passing in redundant library terms, which would explain why columns were linearly dependent.

If we want to raise a more explicit error, this version of _normed_cov() seems to work.

    def _normed_cov(self, a, b):
        with warnings.catch_warnings():
            warnings.filterwarnings("error", category=RuntimeWarning)
            try:
                return np.vdot(a, b) / np.vdot(a, a)
            except RuntimeWarning:
                raise ValueError("Trying to orthogonalize linearly dependent columns created NaNs")

But not sure whether it's better to raise it here or in _orthogonalize(). A test for this is simple:

opt = FROLS(normalize_columns=True)
x = np.array([[1., 1.]])
y = np.array([[1., 1.]])
opt.fit(x, y)

@jcallaham
Copy link
Collaborator

That makes sense. Yeah I had run into a similar issue trying to debug a related issue a while ago - I like your explicit test.

I think having it in _normed_cov also makes sense, since it's the lowest-level place where this would happen, and testing for the RuntimeWarning is cheaper than a rank test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants