
FAMD implementation #16

Closed
thusithaC opened this issue Dec 17, 2017 · 10 comments

@thusithaC

Hi,

Any updates on the FAMD? I'm trying to get some statistical analysis work done using Python, but unfortunately can't find many tools. I appreciate the effort you have put into this package, though!

@MaxHalford
Owner

Hey, I just got back into the project and refactored all the code. FAMD is, I promise, imminent.

@mlisovyi

mlisovyi commented Aug 9, 2018

Is the FAMD implementation complete by now? It seems to be a high-level wrapper around MFA treating each feature as a separate group. While this seems to be reasonable for categorical features, what does it mean for numerical features?
PS Thanks a lot for developing this package! It is awesome to have these tools with a convenient interface. I was not able to find another Python package with an FAMD implementation.

@MaxHalford
Owner

I'm pretty sure the implementation is wrong. Instead of using one group per variable, I should be using one group for the numerical variables and one group for the categorical ones. I'll fix this ASAP.

No worries! It is a bit difficult to find reference implementations to compare with so sometimes I get things wrong. FactoMineR in R is nice but the source code is very difficult to read and there are barely any comments.
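For reference, the textbook FAMD recipe (not prince's actual code, just a toy sketch under standard assumptions): standardize the numeric columns, one-hot encode the categorical ones with each dummy column weighted by 1/sqrt of its proportion of ones, then run an ordinary PCA on the concatenated table. The function name and the SVD-based PCA are illustrative choices, not part of prince's API.

```python
import numpy as np
import pandas as pd

def famd_sketch(df: pd.DataFrame, n_components: int = 2) -> np.ndarray:
    """Toy FAMD: PCA on a mixed table where numeric columns are
    z-scored and one-hot dummies are weighted by 1 / sqrt(p_j)."""
    num = df.select_dtypes(np.number)
    cat = df.select_dtypes(exclude=np.number)

    # Numeric part: center and scale to unit variance
    Z_num = (num - num.mean()) / num.std(ddof=0)

    # Categorical part: one-hot, weight each dummy, then center
    dummies = pd.get_dummies(cat).astype(float)
    p = dummies.mean()            # proportion of ones per dummy column
    Z_cat = dummies / np.sqrt(p)
    Z_cat = Z_cat - Z_cat.mean()

    # Principal components of the combined table via SVD
    Z = pd.concat([Z_num, Z_cat], axis=1).to_numpy()
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * s[:n_components]
```

The weighting makes rare categories count more per cell, which is what puts categorical and numeric variables on a comparable footing before the PCA step.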

@MaxHalford reopened this Aug 9, 2018
@mlisovyi

mlisovyi commented Aug 9, 2018

Indeed, there are packages in R, but I have 0 knowledge of R so far, unfortunately :(

I also bumped into the GLRM approach, which is documented here: https://web.stanford.edu/~boyd/papers/glrm.html. The paper is long, but the main idea is that instead of an eigenvector decomposition, they solve a minimisation problem with a loss function that differs between numerical and categorical features. They also provide Python, Julia and Spark implementations. The native Python implementation is not advised for medium or large datasets (I think they mention O(100x100), but one should look it up in the paper). But there is a Python wrapper around the Julia implementation available here: https://github.com/udellgroup/pyglrm, which is claimed to work on large (in-memory) datasets. I do not have hands-on experience with it, but maybe it would be useful for you.
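To make the minimisation idea concrete, here is a bare-bones alternating least squares sketch for the simplest GLRM instance, a quadratic loss with no regularizers. The full GLRM framework swaps in a different loss per column (e.g. hinge or logistic for categorical features), which this toy version does not do; the function name and iteration count are made up for illustration.

```python
import numpy as np

def glrm_quadratic(A: np.ndarray, k: int, n_iters: int = 50):
    """Alternating least squares for min ||A - X @ Y||_F^2
    over rank-k factors X (m x k) and Y (k x n)."""
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((k, A.shape[1]))
    for _ in range(n_iters):
        # Fix Y, solve the least-squares problem for X
        X = np.linalg.lstsq(Y.T, A.T, rcond=None)[0].T
        # Fix X, solve the least-squares problem for Y
        Y = np.linalg.lstsq(X, A, rcond=None)[0]
    return X, Y
```

Each half-step is a convex least-squares solve, which is why the scheme scales well and why per-column losses can be swapped in without changing the overall structure.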

@MaxHalford
Owner

Okay the implementation should be good in version 0.4.5. I'll close this issue once everyone seems happy with it and once I've added some more documentation.

Thanks for the paper, I didn't know about it.

@Arne-He

Arne-He commented Aug 19, 2018

Hi,

what is the intended behaviour on datasets containing all-zero columns?
If I run FAMD on a mixed dataset like the one below (based on the docs), it crashes.

...
data=[
         ['A', 'A', 'A', 2, 5, 7, 0, 3, 6, 7],
         ['A', 'A', 'A', 4, 4, 4, 0, 4, 4, 3],
         ['B', 'A', 'B', 5, 2, 1, 0, 7, 1, 1],
         ['B', 'A', 'B', 7, 2, 1, 0, 2, 2, 2],
         ['B', 'B', 'B', 3, 5, 6, 0, 2, 6, 6],
         ['B', 'B', 'A', 3, 5, 4, 0, 1, 7, 5]
     ],
...

@MaxHalford
Owner

Hey @Arne-He,

The data you provided crashes because one column only contains zeros. This causes a division by zero in the following piece of code in MFA.py:

if self.normalize:
    # Scale continuous variables to unit variance
    num = X.select_dtypes(np.number).columns
    normalize = lambda x: x / np.sqrt((x ** 2).sum())
    X.loc[:, num] = (X.loc[:, num] - X.loc[:, num].mean()).apply(normalize, axis='rows')

We should however be checking for this. I've started a dev branch where it's fixed. It will be available in Prince's next release.
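One possible guard (a sketch, not the actual fix on the dev branch) is to detect columns whose centered norm is zero and leave them untouched, so the normalization never divides by zero:

```python
import numpy as np
import pandas as pd

def scale_numeric(X: pd.DataFrame) -> pd.DataFrame:
    """Center numeric columns and scale them to unit norm,
    skipping constant (e.g. all-zero) columns to avoid 0 / 0."""
    X = X.copy()
    num = X.select_dtypes(np.number).columns
    centered = X[num] - X[num].mean()
    norms = np.sqrt((centered ** 2).sum())
    # Constant columns have norm 0; dividing by 1 keeps their
    # centered values (all zeros) instead of producing NaNs
    safe = norms.replace(0, 1)
    X.loc[:, num] = centered / safe
    return X
```

An alternative design would be to raise a warning or drop such columns outright, since a constant column carries no information for the decomposition anyway.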

@Arne-He

Arne-He commented Aug 20, 2018

Thanks for the quick reply and fix!

@srinikprem

Hi,

How can we determine the variance explained by each original variable in a given FAMD component?

@MaxHalford
Owner

Hey @srinikprem,

This hasn't been implemented yet, I'm sorry.

By the way, I'm going to close this issue because it seems to be going stale. Feel free to open new issues if you have questions or bugs. I'm swamped at the moment, but I plan to get back to Prince and implement some more features. Please try to be precise in your requests.
