Prince MCA transformation error on new data #135

Closed
EVAUTOAI opened this issue Nov 24, 2022 · 1 comment

Comments

@EVAUTOAI

I am using the newest version of prince (0.7.1), but this seems to be an issue on previous versions as well.

When I run the code below, I get no error:
import prince

mca = prince.MCA(n_components=2)
mca.fit(train_data)
x = mca.transform(train_data)

But when I try to apply the same fitted model to the test data:
y = mca.transform(test_data)

I get the following error:
ValueError Traceback (most recent call last)
in
----> 1 y = mca.transform(test_data)
2 print(y)

/usr/local/anaconda3/envs/upe-pipeline/lib/python3.8/site-packages/prince/mca.py in transform(self, X)
48 if self.check_input:
49 utils.check_array(X, dtype=[str, np.number])
---> 50 return self.row_coordinates(X)
51
52 def plot_coordinates(self, X, ax=None, figsize=(6, 6), x_component=0, y_component=1,

/usr/local/anaconda3/envs/upe-pipeline/lib/python3.8/site-packages/prince/mca.py in row_coordinates(self, X)
36 if not isinstance(X, pd.DataFrame):
37 X = pd.DataFrame(X)
---> 38 return super().row_coordinates(pd.get_dummies(X))
39
40 def column_coordinates(self, X):

/usr/local/anaconda3/envs/upe-pipeline/lib/python3.8/site-packages/prince/ca.py in row_coordinates(self, X)
132
133 return pd.DataFrame(
--> 134 data=X @ sparse.diags(self.col_masses_.to_numpy() ** -0.5) @ self.V_.T,
135 index=row_names
136 )

/usr/local/anaconda3/envs/upe-pipeline/lib/python3.8/site-packages/scipy/sparse/base.py in __rmatmul__(self, other)
564 raise ValueError("Scalar operands are not allowed, "
565 "use '*' instead")
--> 566 return self.__rmul__(other)
567
568 ####################

/usr/local/anaconda3/envs/upe-pipeline/lib/python3.8/site-packages/scipy/sparse/base.py in __rmul__(self, other)
548 except AttributeError:
549 tr = np.asarray(other).transpose()
--> 550 return (self.transpose() * tr).transpose()
551
552 #######################

/usr/local/anaconda3/envs/upe-pipeline/lib/python3.8/site-packages/scipy/sparse/base.py in __mul__(self, other)
514
515 if other.shape[0] != self.shape[1]:
--> 516 raise ValueError('dimension mismatch')
517
518 result = self._mul_multivector(np.asarray(other))

ValueError: dimension mismatch

I am not able to apply the same transformation to new data. Could you please help with this?

@MaxHalford
Owner

Hello there 👋

I apologise for not answering earlier. I was not maintaining Prince anymore. However, I have just refactored the entire codebase. This refactoring should have fixed many bugs.

I don’t have the time and energy to check whether this fixes your issue, but there is a good chance it does. Feel free to reopen this issue if the problem persists after installing the new version (0.8.0 or later).
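
For anyone still on 0.7.x: the traceback above shows `row_coordinates` calling `pd.get_dummies(X)` on whatever is passed to `transform`. A plausible cause of the `dimension mismatch` (an assumption, not confirmed in this thread) is that the test data contains a different set of categories than the training data, so the one-hot matrix has a different number of columns than the one seen during `fit`. The sketch below uses pandas only, with hypothetical toy data and a hypothetical helper `align_to_training`, to show the effect and one way to make the columns line up.

```python
import pandas as pd

# Hypothetical toy data: the test set lacks "green"/"blue"/"S" and adds "purple".
train_data = pd.DataFrame({"color": ["red", "green", "blue"], "size": ["S", "M", "S"]})
test_data = pd.DataFrame({"color": ["red", "purple"], "size": ["M", "M"]})

train_dummies = pd.get_dummies(train_data)
test_dummies = pd.get_dummies(test_data)

# Different category sets produce different one-hot columns.
print(list(train_dummies.columns))
print(list(test_dummies.columns))


def align_to_training(test_df, train_df):
    """Recast each test column as a categorical with the training categories.

    Hypothetical helper for illustration. Values not seen in training
    become NaN and are encoded as an all-zero block by get_dummies.
    """
    aligned = test_df.copy()
    for col in train_df.columns:
        categories = sorted(train_df[col].dropna().unique())
        aligned[col] = pd.Categorical(aligned[col], categories=categories)
    return aligned


# get_dummies emits one column per declared category, observed or not.
aligned_dummies = pd.get_dummies(align_to_training(test_data, train_data))
print(list(aligned_dummies.columns))
```

Whether `mca.transform` in 0.7.1 accepts categorical columns like these has not been checked here, since its input validation may reject them. Comparing `pd.get_dummies(train_data).columns` with `pd.get_dummies(test_data).columns` is still a quick way to confirm whether mismatched categories are behind the error before upgrading.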
