Improve the check for multicollinear columns in .covariance_matrix() (#658)

* Improve checking multicollinearity in covariance_matrix()

* Handle CategoricalMatrix.sandwich returning sparse

* Use _safe_toarray instead of todense
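The last two bullets concern `np.linalg.cond`, which expects a dense array. Per the commit message, `CategoricalMatrix.sandwich` can return a sparse result, so that result has to be densified first. Below is a minimal sketch of such a conversion helper. It assumes the helper behaves roughly like glum's `_safe_toarray`, whose implementation is not part of this commit, and the name `to_ndarray` is hypothetical. A likely reason to prefer `.toarray()` over `.todense()` is that the latter returns an `np.matrix` rather than a plain `np.ndarray`.

```python
import numpy as np
from scipy import sparse


def to_ndarray(A) -> np.ndarray:
    """Hypothetical stand-in for a helper like glum's _safe_toarray (assumed behavior)."""
    if sparse.issparse(A):
        # .toarray() yields a plain ndarray; .todense() would yield np.matrix.
        return A.toarray()
    # np.asarray drops ndarray subclasses such as np.matrix (subok is False).
    return np.asarray(A)


# Example: a sparse p x p result, standing in for what a sandwich product might return.
gram = sparse.csr_matrix(np.eye(3))
print(type(to_ndarray(gram)).__name__)  # ndarray
print(type(gram.todense()).__name__)    # matrix
```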
MartinStancsicsQC committed Jul 14, 2023
1 parent 4cb5164 commit 28aad1f
Showing 2 changed files with 10 additions and 6 deletions.
CHANGELOG.rst: 4 additions, 0 deletions

@@ -14,6 +14,10 @@ Changelog
 
 - Added the complementary log-log (`cloglog`) link function.
 
+**Other changes:**
+
+- When computing the covariance matrix, check for ill-conditionedness for all types of input. Furthermore, do it in a more efficient way.
+
 2.5.2 - 2023-06-02
 ------------------
 
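The changelog does not say where the efficiency gain comes from. One plausible reading is sketched here with plain NumPy, under the assumption that tabmat's `X.sandwich(np.ones(n))` computes the Gram matrix `X.T @ X`. On that reading, the condition number is now taken of a p × p matrix instead of the n × p design matrix. Because the 2-norm condition number of `X.T @ X` is the square of that of `X`, the threshold has to be squared as well. The sizes `n` and `p` and the random data are illustrative only.

```python
import numpy as np

# Illustrative tall design matrix: n rows, p columns (values are arbitrary).
rng = np.random.default_rng(0)
n, p = 10_000, 5
X = rng.standard_normal((n, p))

# Gram matrix X^T X; assumed equivalent to tabmat's X.sandwich(np.ones(n)).
gram = X.T @ X
print(gram.shape)  # (5, 5): independent of n

cond_X = np.linalg.cond(X)        # SVD of an n x p matrix
cond_gram = np.linalg.cond(gram)  # SVD of a p x p matrix

# In the 2-norm, cond(X^T X) == cond(X) ** 2, hence 1/eps becomes 1/eps**2.
print(np.isclose(cond_gram, cond_X**2, rtol=1e-8))  # True
```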
src/glum/_glm.py: 6 additions, 6 deletions

@@ -1490,13 +1490,13 @@ def covariance_matrix(
                 method="pearson",
             )
 
-        if not (
-            sparse.issparse(X) or isinstance(X, (tm.SplitMatrix, tm.CategoricalMatrix))
+        if (
+            np.linalg.cond(_safe_toarray(X.sandwich(np.ones(X.shape[0]))))
+            > 1 / sys.float_info.epsilon**2
         ):
-            if np.linalg.cond(X) > 1 / sys.float_info.epsilon:
-                raise np.linalg.LinAlgError(
-                    "Matrix is singular. Cannot estimate standard errors."
-                )
+            raise np.linalg.LinAlgError(
+                "Matrix is singular. Cannot estimate standard errors."
+            )
 
         if robust or clusters is not None:
             if expected_information:
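The new check can be imitated outside glum for NumPy and SciPy sparse inputs. In this sketch, `check_not_singular` is a hypothetical name. The Gram matrix is formed directly with `X.T @ X`, on the assumption that this is what `X.sandwich(np.ones(X.shape[0]))` returns for tabmat matrices, since plain NumPy and SciPy objects have no `sandwich` method. The check no longer branches on the input type, so dense and sparse inputs follow the same path, consistent with the changelog's "all types of input".

```python
import sys

import numpy as np
from scipy import sparse


def check_not_singular(X) -> float:
    """Hypothetical sketch: raise if X^T X is too ill-conditioned, else return its cond."""
    # Assumption: equivalent to tabmat's X.sandwich(np.ones(n)) with unit weights.
    gram = X.T @ X
    if sparse.issparse(gram):
        gram = gram.toarray()  # np.linalg.cond needs a dense ndarray
    cond = np.linalg.cond(np.asarray(gram))
    # cond(X^T X) == cond(X) ** 2, so the threshold is 1 / eps**2 rather than 1 / eps.
    if cond > 1 / sys.float_info.epsilon**2:
        raise np.linalg.LinAlgError(
            "Matrix is singular. Cannot estimate standard errors."
        )
    return cond


check_not_singular(np.eye(3))                     # well conditioned: no error
check_not_singular(sparse.csr_matrix(np.eye(3)))  # sparse input takes the same path
try:
    # Second column scaled by 1e-20, so cond(X^T X) is about 1e40.
    check_not_singular(np.array([[1.0, 0.0], [0.0, 1e-20]]))
except np.linalg.LinAlgError as err:
    print(err)  # Matrix is singular. Cannot estimate standard errors.
```

The threshold 1 / eps² is about 2e31 for double precision, so the last example, with a Gram-matrix condition number near 1e40, is rejected.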
