Is there a way to transform new data after fitting with FAMD? #56

Closed
kuhanw opened this issue Mar 14, 2019 · 12 comments

@kuhanw

kuhanw commented Mar 14, 2019

Hello,

I just discovered this package and it seems very interesting. I was wondering whether there is a way to apply the transform function to new, unseen data after calling FAMD fit, analogous to how PCA works in sklearn.

When I try to do this I get an error:

X)
102 X = self.scaler_.transform(X)
103
--> 104 return pd.DataFrame(data=X.dot(self.V_.T), index=index)
105
106 def row_standard_coordinates(self, X):

ValueError: shapes (2,20) and (49,2) not aligned: 20 (dim 1) != 49 (dim 0)

Basically, it looks like it doesn't understand that there is a different number of "training examples" than when the fit occurred.

Cheers,

Kuhan

@MaxHalford
Owner

Hey,

I'm not exactly sure what you're trying to do. Could you possibly provide a more complete example? Normally, if you call the transform method of the FAMD object, it should work as long as the number of columns is the same in the new dataset.

@kuhanw
Author

kuhanw commented Mar 14, 2019

Hi Max,

In sklearn I could have:

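from sklearn.decomposition import PCA

a = ...  # placeholder for some 2-D numeric data; n is the split index

a0 = a[:n]  # rows used for fitting
a1 = a[n:]  # unseen rows
pca = PCA()
pca.fit(a0)
pca.transform(a1)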

Basically performing the PCA on part of the data and then using it to transform another unseen portion of data. Is it possible to do this with the MCA or FAMD modules in Prince?

Cheers,

Kuhan
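
For reference, the fit-then-transform pattern described above can be sketched without either library. The `TinyPCA` class below is a hypothetical NumPy stand-in, not sklearn's or Prince's implementation; it only illustrates that a fitted projection accepts any number of new rows as long as the column count matches the fitting data.

```python
import numpy as np


class TinyPCA:
    """Hypothetical stand-in for a fit/transform PCA (not sklearn's or Prince's code)."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Remember the column means and principal axes learned from the fitting rows.
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]  # shape (n_components, n_columns)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Any number of rows is fine; only the column count has to match the fit.
        return (X - self.mean_) @ self.components_.T


rng = np.random.default_rng(0)
a = rng.normal(size=(10, 4))  # made-up data: 10 rows, 4 numeric columns
n = 6
a0, a1 = a[:n], a[n:]

pca = TinyPCA(n_components=2).fit(a0)
projected = pca.transform(a1)  # 4 unseen rows projected onto 2 components
print(projected.shape)
```

Passing rows with a different number of columns to `transform` would fail with a shape error, which is the kind of mismatch reported in this issue.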

@MaxHalford
Owner

It should be possible, but it looks like something is going wrong... I'll look into it now.

@kuhanw
Author

kuhanw commented Mar 14, 2019

Thanks for the reply. Let us know how it goes!

Cheers,

Kuhan

@MaxHalford
Owner

Well, I think I fixed the issue for the FAMD. Is that what you had a problem with? If not, can you copy/paste the entire code you're having a problem with?

@kuhanw
Author

kuhanw commented Mar 15, 2019

Hi Max,

No. See below:

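import pandas as pd
import prince

X = pd.DataFrame(
    data=[
        ['A', 'A', 'A', 2, 5, 7, 6, 3, 6, 7],
        ['A', 'A', 'A', 4, 4, 4, 2, 4, 4, 3],
        ['B', 'A', 'B', 5, 2, 1, 1, 7, 1, 1],
        ['B', 'A', 'B', 7, 2, 1, 2, 2, 2, 2],
        ['B', 'B', 'B', 3, 5, 6, 5, 2, 6, 6],
        ['B', 'B', 'A', 3, 5, 4, 5, 1, 7, 5]
    ],
    columns=['E1 fruity', 'E1 woody', 'E1 coffee',
             'E2 red fruit', 'E2 roasted', 'E2 vanillin', 'E2 woody',
             'E3 fruity', 'E3 butter', 'E3 woody'],
    index=['Wine {}'.format(i+1) for i in range(6)]
)

# Not shown in the original post: famd is assumed to be a prince.FAMD instance
# with 2 components, which is consistent with the shapes in the traceback below.
famd = prince.FAMD(n_components=2)

famd.fit(X[:4])  # fit on the first four wines

famd.transform(X[4:])  # transform the remaining two; this raises the error below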


ValueError Traceback (most recent call last)
in ()
----> 1 famd.transform(X[4:])

c:\users\wangku\appdata\local\continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\prince\mfa.py in transform(self, X)
121 def transform(self, X):
122 """Returns the row principal coordinates of a dataset."""
--> 123 return self.row_coordinates(X)
124
125 def _row_coordinates_from_global(self, X_global):

c:\users\wangku\appdata\local\continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\prince\mfa.py in row_coordinates(self, X)
135 # Prepare input
136 X = self._prepare_input(X)
--> 137 return self._row_coordinates_from_global(self._build_X_global(X))
138
139 def row_contributions(self, X):

c:\users\wangku\appdata\local\continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\prince\mfa.py in _row_coordinates_from_global(self, X_global)
125 def _row_coordinates_from_global(self, X_global):
126 """Returns the row principal coordinates."""
--> 127 return len(X_global) ** 0.5 * super().row_coordinates(X_global)
128
129 def row_coordinates(self, X):

c:\users\wangku\appdata\local\continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\prince\pca.py in row_coordinates(self, X)
102 X = self.scaler_.transform(X)
103
--> 104 return pd.DataFrame(data=X.dot(self.V_.T), index=index)
105
106 def row_standard_coordinates(self, X):

ValueError: shapes (2,11) and (12,2) not aligned: 11 (dim 1) != 12 (dim 0)

If you fit on the first three rows, X[:3], and try to transform the last three, X[3:], it will work. Is it something to do with the shape?

Cheers,

Kuhan
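
One reading of the two shapes in this traceback, assuming the categorical columns are one-hot encoded again on each call using only the categories present in that call (an assumption about the installed version, not something confirmed in this thread), is that the encoded width depends on which rows are passed. The sketch below counts those columns for the wine data in plain Python; `encoded_width` and `N_NUMERIC` are illustrative names only.

```python
# The three categorical columns of the wine data (E1 fruity, E1 woody, E1 coffee).
rows = [
    ["A", "A", "A"],  # Wine 1
    ["A", "A", "A"],  # Wine 2
    ["B", "A", "B"],  # Wine 3
    ["B", "A", "B"],  # Wine 4
    ["B", "B", "B"],  # Wine 5
    ["B", "B", "A"],  # Wine 6
]
N_NUMERIC = 7  # the seven numeric columns keep their width under encoding


def encoded_width(cat_rows):
    """Count the columns obtained when each categorical column is one-hot
    encoded from the categories present in cat_rows, plus the numeric columns."""
    n_dummies = sum(len(set(col)) for col in zip(*cat_rows))
    return n_dummies + N_NUMERIC


for split in (4, 3):
    fit_width = encoded_width(rows[:split])
    new_width = encoded_width(rows[split:])
    print(f"fit on X[:{split}]: {fit_width} columns, transform X[{split}:]: {new_width} columns")
```

Under that assumption, fitting on X[:4] gives 12 encoded columns while X[4:] gives 11, which matches the (2,11) and (12,2) shapes in the error. For the X[:3] / X[3:] split both sides happen to be 12 wide, so the product goes through, although the dummy columns would not necessarily correspond to the same categories.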

@MaxHalford
Owner

MaxHalford commented Mar 15, 2019

I think I know what this is due to. I'll fix it over the weekend :)

@MaxHalford
Owner

@kuhanw sorry for the delay, can you install the latest code from GitHub and tell me if it works? It should do. You can install it by running pip install git+https://github.com/MaxHalford/Prince.

@kuhanw
Author

kuhanw commented Mar 26, 2019

I did some quick testing and it seemed to work. I will try it on some more complicated datasets later and see if it holds. Thanks for looking into this.

Cheers,

Kuhan

@ggjuancamilo

Hi, I'm getting the same issue.

@pauloeddias

This is also happening to me now with FAMD

@msalem7777

This issue is back in 0.7.1
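
For anyone hitting the same shape mismatch, one general way around it is to remember the categories seen at fit time and reuse them when encoding new rows, so the encoded width cannot change. The sketch below shows the idea in plain Python; `FittedOneHot` is a hypothetical helper, not part of Prince, and whether a given Prince version already does this internally is not established in this thread.

```python
class FittedOneHot:
    """Hypothetical helper (not part of Prince): a one-hot encoder that
    remembers the categories of each column seen during fit."""

    def fit(self, cat_rows):
        # One sorted category list per column, learned from the fitting rows only.
        self.categories_ = [sorted(set(col)) for col in zip(*cat_rows)]
        return self

    def transform(self, cat_rows):
        encoded = []
        for row in cat_rows:
            out = []
            for value, cats in zip(row, self.categories_):
                # A category unseen at fit time becomes all zeros,
                # so the encoded width never changes.
                out.extend(1 if value == c else 0 for c in cats)
            encoded.append(out)
        return encoded


# Categorical columns of the wine data from the earlier comment.
rows = [
    ["A", "A", "A"],
    ["A", "A", "A"],
    ["B", "A", "B"],
    ["B", "A", "B"],
    ["B", "B", "B"],
    ["B", "B", "A"],
]

encoder = FittedOneHot().fit(rows[:4])
encoded = encoder.transform(rows[4:])
print(encoded)
```

Both new rows come out five columns wide, the same width as the fitting rows, even though 'E1 woody' takes a value ('B') that was never seen during fit.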
