
How to project new data using existing VSP loadings? #68

Closed
ravwojdyla opened this issue Jul 6, 2022 · 13 comments

@ravwojdyla

Thank you so much for a great paper and this library!

I'm limited to Python and have reimplemented VSP in Python on top of sklearn's PCA by adding the varimax rotation. Following your exact implementation, I get the same results in Python. But I have a follow-up use case: what should I do when I want to project new data using existing VSP loadings? This question is related to https://stats.stackexchange.com/questions/59213/how-to-compute-varimax-rotated-principal-components-in-r but in the context of VSP.

Would appreciate your help or hints.

@ravwojdyla
Author

To follow up: I have tried a couple of approaches without success. I'm testing this by "fitting" VSP on X to obtain the loadings/rotation matrices, and then, using those (and X), trying to reproduce Z without s$u.
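Concretely (my reading of the vsp code, see the implementation I post below): the scores are Z = sqrt(n) * s$u %*% R_U and the loadings are Y = sqrt(d) * s$v %*% R_V, so what I'm after is a way to recover Z for new rows without access to s$u.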

@alexpghayes
Collaborator

Can you share a code example? My guess is that you aren't matching the pre-processing and post-processing steps, and I can provide some pointers on how to do this.

@ravwojdyla
Author

Hi @alexpghayes, thanks for the prompt response, I appreciate that. Here's our current code as a sklearn transformer:

from typing import Any, Literal, Optional, Union

import numpy as np
import numpy.typing as npt
from sklearn.decomposition import PCA


class VSP(PCA):  # type:ignore[misc]
    """
    R VSP implementation is at: https://github.com/RoheLab/vsp
    """

    def __init__(
        self,
        n_components: Union[None, int, float, str] = None,
        *,
        copy: bool = True,
        svd_solver: str = "auto",
        tol: float = 0.0,
        iterated_power: str = "auto",
        random_state: Optional[float] = None,
        rotation: Literal["varimax", "quartimax"] = "varimax"
    ):
        """
        NOTE:
          * R tol/eps is higher by default
          * VSP will always use svds
            https://github.com/RoheLab/vsp/blob/d57d6e1858444035e1ab78fed371eb68ebdee0eb/R/vsp.R#L126
          * VSP is whitening by default?
            https://github.com/RoheLab/vsp/blob/d57d6e1858444035e1ab78fed371eb68ebdee0eb/R/vsp.R#L141-L142
        """
        super().__init__(
            n_components=n_components,
            copy=copy,
            whiten=True,
            svd_solver=svd_solver,
            tol=tol,
            iterated_power=iterated_power,
            random_state=random_state,
        )
        self.rotation = rotation

    @classmethod
    def _ortho_matrix(
        cls,
        components: Any,
        method: str = "varimax",
        tol: float = 1e-6,
        max_iter: int = 100,
    ) -> npt.ArrayLike:
        """
        Returns rotation matrix.

        `components` is either components or scores. See:
        https://github.com/RoheLab/vsp/blob/d57d6e1858444035e1ab78fed371eb68ebdee0eb/R/vsp.R#L141

        This is copied from sklearn.decomposition._factor_analysis._ortho_rotation
        but doesn't perform final dot product, which we will do inside fit/transform.
        """
        nrow, ncol = components.shape
        rotation_matrix = np.eye(ncol)
        var = 0

        for _ in range(max_iter):
            comp_rot = np.dot(components, rotation_matrix)
            if method == "varimax":
                tmp = comp_rot * np.transpose((comp_rot**2).sum(axis=0) / nrow)
            elif method == "quartimax":
                tmp = 0
            u, s, v = np.linalg.svd(np.dot(components.T, comp_rot**3 - tmp))
            rotation_matrix = np.dot(u, v)
            var_new = np.sum(s)
            if var != 0 and var_new < var * (1 + tol):
                break
            var = var_new

        return rotation_matrix

    def _fit(
        self, X: npt.ArrayLike
    ) -> tuple[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike]:
        U, S, _ = super()._fit(X)
        # NOTE: R_v is NOT normalized, as opposed to default R's varimax(rawLoadings)$rot
        R_v = self._ortho_matrix(self.components_.T, method=self.rotation, tol=self.tol)
        U_c = U[:, : self.n_components_]
        R_u = self._ortho_matrix(U_c, method=self.rotation, tol=self.tol)
        # NOTE: same as varimax(rawLoadings, normalize = F)$loadings, but also
        #       includes the multiplication by sqrt(n)
        Y = np.sqrt(self.n_features_) * np.dot(self.components_.T, R_v)
        self.rot_components_ = Y.T
        self.R_v_ = R_v
        self.R_u_ = R_u
        return U, S, Y

    def fit_transform(
        self, X: npt.ArrayLike, y: Optional[npt.ArrayLike] = None
    ) -> npt.ArrayLike:
        U, S, Vt = self._fit(X)
        U = U[:, : self.n_components_]  # type:ignore
        # TODO: make_skew_positive(fa)
        return np.sqrt(self.n_samples_) * np.dot(U, self.R_u_)

    def transform(self, X: npt.ArrayLike) -> npt.ArrayLike:
        # TODO: ???
        raise NotImplementedError()

    def inverse_transform(self, X: npt.ArrayLike) -> npt.ArrayLike:
        raise NotImplementedError()

To be clear, fit_transform returns exactly the same results as your R code, but I couldn't come up with a formula for transform that lets me fit and then project a different (or the same) X.

Here's usage on Iris:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()["data"]

x = StandardScaler().fit_transform(iris)
# NOTE: use 1e-5 tol to match R
x_scores = VSP(2, tol=1e-5).fit_transform(x)

# would like to:
VSP(2, tol=1e-5).fit(x).transform(x)
# or `new_x`, like `.transform(new_x)`

@ravwojdyla
Author

ravwojdyla commented Jul 6, 2022

And x_scores above will match this R code:

library(datasets)
library(vsp)

data(iris)

x = scale(data.matrix(iris[1:4]), scale = TRUE)
x_scores = vsp(x, rank=2, degree_normalize = FALSE)$Z

@alexpghayes
Collaborator

Roughly you'll want something like

    def transform(self, X: npt.ArrayLike) -> npt.ArrayLike:
        return np.dot(super().transform(X), self.R_u_)

It looks like there is a little bit of a difference between VSP(2, tol=1e-5).fit_transform(x) and VSP(2, tol=1e-5).fit(x).transform(x) under this implementation, and I'm not sure if it comes down to numerical stuff or to not quite understanding how super().transform() works in full glory. The TL;DR is: project onto the principal components like you usually would, and then rotate. The trick is to be careful about what is scaled by S or S^{1/2} in the process. I think this implementation should be fine since you passed whiten=True to PCA(), but I would test it a little before committing to it.
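In pseudo-ish form (a sketch of my mental model, not tested against your class; R_u here is a placeholder for the fitted rotation):

import numpy as np
from sklearn.decomposition import PCA

# Standalone sketch: with whiten=True, PCA.transform() returns the left singular
# vectors up to a constant scaling, so projecting new data is "whiten, then rotate".
rng = np.random.default_rng(0)
X_fit, X_new = rng.normal(size=(100, 6)), rng.normal(size=(20, 6))

pca = PCA(n_components=2, whiten=True).fit(X_fit)
R_u = np.eye(2)  # placeholder: in your class this would be the fitted self.R_u_

Z_new = pca.transform(X_new) @ R_u  # project onto the PCs, then varimax-rotate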

@ravwojdyla
Author

@alexpghayes this is great, thank you so much! The difference is too big to be numerical error 🤔

@alexpghayes
Collaborator

It might come from the transformation of singular values into variance explained and back in the whitening logic. Personally, I found the sklearn implementation rather frustrating to read this morning, and I would just roll a Python carbon copy of vsp() if it were me. That may or may not be a good idea for you, depending on whether you're relying on any other PCA() methods.
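To unpack that a bit, here is a toy check of my reading of sklearn's whitening (a sketch, not taken from either codebase):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n = X.shape[0]
ev = S**2 / (n - 1)                   # sklearn's explained_variance_
whitened = (Xc @ Vt.T) / np.sqrt(ev)  # what PCA(whiten=True).transform() does
print(np.allclose(whitened, np.sqrt(n - 1) * U))  # True: sqrt(n-1) * U, not sqrt(n) * U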

@ravwojdyla
Author

@alexpghayes I think you are right: the problem was explained_variance_. The sklearn implementation is:

explained_variance_ = (S**2) / (n_samples - 1)

from https://github.com/scikit-learn/scikit-learn/blob/dfaef0c6c3aef0d00c72573728c90c1d542e2957/sklearn/decomposition/_pca.py#L532

if we adjust the denominator to n_samples_ instead of (n_samples - 1):

explained_variance_ = ((S ** 2) / self.n_samples_)[:self.n_components_]

the results are the same.
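With that change, a quick consistency check (a sketch, reusing the Iris setup from above and the updated class posted below):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

x = StandardScaler().fit_transform(load_iris()["data"])

vsp = VSP(2, tol=1e-5)
z_fit = vsp.fit_transform(x)
z_proj = vsp.transform(x)          # same data projected through fit + transform
print(np.allclose(z_fit, z_proj))  # should now be True (up to numerical error)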

@ravwojdyla
Author

Thanks again @alexpghayes, will close this now.

@kennyjoseph

kennyjoseph commented Apr 11, 2023

@ravwojdyla I've been looking at this thread and am wondering if you'd be willing/able to share any updates to your Python code here (for academic research). The only return would be that we are trying to get this to work with sparse input (afaik sklearn PCA does not handle sparse input matrices? EDIT: see here), and obviously we can share back!

@ravwojdyla
Author

ravwojdyla commented Apr 11, 2023

@kennyjoseph sure, here it is:

from typing import Any, Literal

import numpy as np
import numpy.typing as npt
from sklearn.decomposition import PCA


class VSP(PCA):  # type:ignore[misc]
    """
    R VSP implementation is at: https://github.com/RoheLab/vsp
    """

    def __init__(
        self,
        n_components: None | int | float | str = None,
        *,
        copy: bool = True,
        svd_solver: str = "auto",
        tol: float = 0.0,
        iterated_power: str = "auto",
        random_state: float | None = None,
        rotation: Literal["varimax", "quartimax"] = "varimax"
    ):
        """
        NOTE:
          * R tol/eps is higher by default
          * VSP will always use svds
            https://github.com/RoheLab/vsp/blob/d57d6e1858444035e1ab78fed371eb68ebdee0eb/R/vsp.R#L126
          * VSP is whitening by default?
            https://github.com/RoheLab/vsp/blob/d57d6e1858444035e1ab78fed371eb68ebdee0eb/R/vsp.R#L141-L142
        """
        super().__init__(
            n_components=n_components,
            copy=copy,
            whiten=True,
            svd_solver=svd_solver,
            tol=tol,
            iterated_power=iterated_power,
            random_state=random_state,
        )
        self.rotation = rotation

    @classmethod
    def _ortho_matrix(
        cls,
        components: Any,
        method: str = "varimax",
        tol: float = 1e-6,
        max_iter: int = 100,
    ) -> npt.ArrayLike:
        """
        Returns rotation matrix.

        `components` is either components or scores. See:
        https://github.com/RoheLab/vsp/blob/d57d6e1858444035e1ab78fed371eb68ebdee0eb/R/vsp.R#L141

        This is copied from sklearn.decomposition._factor_analysis._ortho_rotation
        but doesn't perform final dot product, which we will do inside fit/transform.
        """
        nrow, ncol = components.shape
        rotation_matrix = np.eye(ncol)
        var = 0

        for _ in range(max_iter):
            comp_rot = np.dot(components, rotation_matrix)
            if method == "varimax":
                tmp = comp_rot * np.transpose((comp_rot**2).sum(axis=0) / nrow)
            elif method == "quartimax":
                tmp = 0
            u, s, v = np.linalg.svd(np.dot(components.T, comp_rot**3 - tmp))
            rotation_matrix = np.dot(u, v)
            var_new = np.sum(s)
            if var != 0 and var_new < var * (1 + tol):
                break
            var = var_new

        return rotation_matrix

    def _fit(
        self, X: npt.ArrayLike
    ) -> tuple[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike]:
        U, S, _ = super()._fit(X)

        explained_variance = (S**2) / self.n_samples_
        total_var = explained_variance.sum()
        explained_variance_ratio = explained_variance / total_var

        # NOTE: R_v is NOT normalized, as opposed to default R's varimax(rawLoadings)$rot
        R_v = self._ortho_matrix(self.components_.T, method=self.rotation, tol=self.tol)
        U_c = U[:, : self.n_components_]
        R_u = self._ortho_matrix(U_c, method=self.rotation, tol=self.tol)
        # NOTE: same as varimax(rawLoadings, normalize = F)$loadings, but also includes
        #       the multiplication by sqrt(n)
        Y = np.sqrt(self.n_features_) * np.dot(self.components_.T, R_v)
        self.rot_components_ = Y.T
        self.R_v_ = R_v
        self.R_u_ = R_u
        self.explained_variance_ = explained_variance[: self.n_components_]
        self.total_var_ = total_var
        self.explained_variance_ratio_ = explained_variance_ratio[: self.n_components_]
        return U, S, Y

    def fit_transform(
        self, X: npt.ArrayLike, y: npt.ArrayLike | None = None
    ) -> npt.ArrayLike:
        U, S, Vt = self._fit(X)
        U = U[:, : self.n_components_]  # type:ignore
        # TODO: make_skew_positive(fa)
        return np.sqrt(self.n_samples_) * np.dot(U, self.R_u_)  # type:ignore[no-any-return]

    def transform(self, X: npt.ArrayLike) -> npt.ArrayLike:
        return np.dot(super().transform(X), self.R_u_)  # type:ignore[no-any-return]

    def inverse_transform(self, X: npt.ArrayLike) -> npt.ArrayLike:
        raise NotImplementedError()

@kennyjoseph

Ah awesome, thanks so much!

@ravwojdyla
Copy link
Author

@kennyjoseph np! If you have any updates or notice any issues, please let us know; I would also love to hear updates about the sparse input.
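(For what it's worth, R's vsp computes the truncated SVD with RSpectra::svds; a possible Python analogue for sparse input is scipy.sparse.linalg.svds. A rough, untested sketch:)

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Hypothetical starting point: truncated SVD without densifying the matrix.
# Note vsp(x, degree_normalize = FALSE) does no centering, so none is done here.
A = sp.random(1000, 500, density=0.01, format="csr", random_state=0)
U, S, Vt = svds(A, k=2)

# svds returns singular values in ascending order; flip for the usual convention
order = np.argsort(S)[::-1]
U, S, Vt = U[:, order], S[order], Vt[order, :]
# U and Vt.T could then be fed to the varimax step (_ortho_matrix) above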
