FAMD implementation #16
Comments
Hey, I just got back into the project and refactored all the code. FAMD is, I promise, imminent. |
Is the FAMD implementation complete by now? It seems to be a high-level wrapper around MFA, treating each feature as a separate group. While this seems reasonable for categorical features, what does it mean for numerical features? |
I'm pretty sure the implementation is wrong. Instead of using one group per variable, I should be using one group for the numerical variables and one group for the categorical ones. I'll fix this ASAP. No worries! It is a bit difficult to find reference implementations to compare with, so sometimes I get things wrong. FactoMineR in R is nice, but the source code is very difficult to read and there are barely any comments. |
Indeed, there are packages in R, but unfortunately I have no knowledge of R so far :( I also bumped into the GLRM approach, which is documented here: https://web.stanford.edu/~boyd/papers/glrm.html. The paper is long, but the main idea is that instead of an eigenvector decomposition, they solve a minimisation problem, with the loss function being different for numerical and categorical features. They also have Python, Julia and Spark implementations. The native Python implementation is not advised for medium or large datasets (I think they mention O(100x100), but one should look it up in the paper). There is, however, a Python wrapper around the Julia implementation available here: https://github.com/udellgroup/pyglrm, which is claimed to work on large (in-memory) datasets. I do not have hands-on experience with it, but maybe it would be useful for you. |
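To make the GLRM idea above concrete, here is a minimal sketch (my own illustration, not the paper's or pyglrm's code) of the core trick: replace the eigendecomposition with an explicit minimisation of a loss, here plain quadratic loss solved by alternating least squares. Real GLRMs swap in per-column losses (e.g. hinge loss for categorical columns); `glrm_quadratic` is a hypothetical helper name.

```python
import numpy as np

def glrm_quadratic(A, k, n_iters=50, seed=0):
    """Rank-k approximation of A via alternating least squares.

    Illustrates the GLRM framing: solve min_{X,Y} ||A - X @ Y||_F^2
    directly, instead of computing an eigendecomposition. Per-column
    losses (the actual GLRM generalisation) are omitted for brevity.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((k, n))
    for _ in range(n_iters):
        # Fix Y, solve the least-squares problem for X
        X = np.linalg.lstsq(Y.T, A.T, rcond=None)[0].T
        # Fix X, solve the least-squares problem for Y
        Y = np.linalg.lstsq(X, A, rcond=None)[0]
    return X, Y

# A is exactly rank 2, so the rank-2 factorisation recovers it
A = np.arange(20, dtype=float).reshape(4, 5)
X, Y = glrm_quadratic(A, k=2)
print(np.allclose(A, X @ Y, atol=1e-6))  # → True
```

Each sub-problem is convex given the other factor fixed, which is why alternating minimisation is the standard workhorse here.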
Okay, the implementation should be good in version 0.4.5. I'll close this issue once everyone seems happy with it and once I've added some more documentation. Thanks for the paper, I didn't know about this. |
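For anyone following along, the usual FAMD weighting (which the grouping fix above is about) can be sketched as follows. This is my own illustration, not the library's code, and `famd_matrix` is a hypothetical helper: numerical columns are standardised, and each categorical level's indicator column is scaled by the inverse square root of its frequency and centred, after which a plain SVD gives the components.

```python
import numpy as np
import pandas as pd

def famd_matrix(df):
    """Build the weighted, centred matrix whose SVD gives FAMD coordinates."""
    parts = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            x = df[col].astype(float)
            # Numerical column: standardise to zero mean, unit variance
            parts.append((x - x.mean()) / x.std(ddof=0))
        else:
            dummies = pd.get_dummies(df[col]).astype(float)
            p = dummies.mean()  # frequency of each level
            # Indicator columns: divide by sqrt(frequency), then centre
            parts.append(dummies / np.sqrt(p) - np.sqrt(p))
    return pd.concat(parts, axis=1)

df = pd.DataFrame({
    'height': [1.6, 1.7, 1.8, 1.75],
    'colour': ['red', 'blue', 'red', 'blue'],
})
Z = famd_matrix(df)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
coords = U * s  # row coordinates on the principal components
print(coords.shape)  # → (4, 3)
```

With this weighting, numerical and categorical variables contribute on a comparable scale, which is the point of treating them as two groups rather than one group per variable.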
Hi, what is the intended behaviour on datasets containing columns of all zeros? |
Hey @Arne-He, The data you provided crashes because one column only contains 0s. This causes a division by zero in the following piece of code:

```python
if self.normalize:
    # Scale continuous variables to unit variance
    num = X.select_dtypes(np.number).columns
    normalize = lambda x: x / np.sqrt((x ** 2).sum())
    X.loc[:, num] = (X.loc[:, num] - X.loc[:, num].mean()).apply(normalize, axis='rows')
```

We should however be checking for this. I've started a |
Thanks for the quick reply and fix! |
Hi, how can we determine the variance explained by each original variable in a given FAMD component? |
Hey @srinikprem, this hasn't been implemented yet, I'm sorry. By the way, I'm going to close this issue because it seems to be going stale. Feel free to open other issues if you have questions/bugs. I'm swamped at the moment but I plan to get back to
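In the meantime, the quantity asked about can be approximated by hand: once you have the centred, weighted matrix, each column's contribution to a component is its squared loading, normalised per component. This is a generic PCA-style sketch under that assumption, not library code, and `column_contributions` is a hypothetical helper name.

```python
import numpy as np

def column_contributions(Z):
    """Fraction each column contributes to each principal component,
    computed from the squared loadings (rows of V^T) of the SVD of
    the column-centred matrix Z."""
    U, s, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    sq = Vt ** 2  # one row per component, one column per variable
    return sq / sq.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Z = rng.standard_normal((10, 3))
contrib = column_contributions(Z)
print(np.allclose(contrib.sum(axis=1), 1.0))  # → True
```

For a true FAMD, the contributions of a categorical variable would then be the sum of the contributions of its indicator columns.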
Hi,
Any updates on the FAMD? Trying to get some statistical analysis work done using Python, but unfortunately I can't find many tools. Appreciate the effort you have put into this package though! |