LCA - Data order #61
By “other settings same”, do you mean you’re shuffling the order of the variables? Shuffling should not affect the fit quality of the overall model, but it could affect the order of the parameters. It would be really helpful if you could provide a minimal example reproducing what you observed, perhaps with one of the datasets in
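To illustrate the point about parameter order: two runs can assign different label numbers to the clusters and yet describe exactly the same partition of the samples, which would permute the columns of a crosstab without changing fit quality. Below is a minimal sketch with made-up assignments (hypothetical data, not output from StepMix):

```python
import numpy as np

# Hypothetical cluster assignments for 6 samples with true classes 0/1.
true_class = np.array([0, 0, 0, 1, 1, 1])
labels_run1 = np.array([2, 2, 2, 0, 0, 0])  # run on the original data order
labels_run2 = np.array([0, 0, 0, 2, 2, 2])  # run on shuffled data: labels permuted

# Both runs induce the same partition: a pair of samples is grouped
# together in run 1 if and only if it is grouped together in run 2.
same1 = labels_run1[:, None] == labels_run1[None, :]
same2 = labels_run2[:, None] == labels_run2[None, :]
print(np.array_equal(same1, same2))  # → True
```

The crosstabs against `true_class` would have their columns swapped between the two runs, even though the clustering itself is identical.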
Sorry, I just realised that I made one mistake yesterday, so I have updated the example I used, please check @sachaMorin.
Every time I run the code, it gives me different crosstab results. For example:

No 1.
0 | 50 | 0 | 0 | 0

No 2.
0 | 0 | 50 | 0 | 0
If my understanding is correct: if an LCA model with fixed hyperparameters always reaches convergence after shuffling the data, then shuffling won't change the crosstab results. However, if the LCA can't reach convergence, then shuffling does change the results. I tried n_components = 2 or 3 and shuffling did not change the results; once I changed it to 5, as the example above shows, it changed the results. Am I correct?
Looking at your previous results, the clusterings still look good: each cluster captures a class (or part of one, if you have more clusters than classes). It's also possible that this is caused by numerical issues. For example, the sum of an ndarray may vary slightly if you shuffle the elements, because floating-point addition depends on the summation order. See the following program:

```python
import numpy as np

np.random.seed(123)
a = np.random.random(100)
b = np.copy(a)
np.random.shuffle(b)

sum_a = np.sum(a)
sum_b = np.sum(b)
print(sum_a)
print(sum_b)
print(sum_a == sum_b)
```

On a typical run, the two sums differ in their last few bits and the final comparison prints False.
Given the numerous sums and means taken in the StepMix estimation, those small differences can compound over time and could potentially explain what we're seeing here. I'm not sure and would be interested in seeing how other libraries behave.
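One way to confirm that only the summation order is responsible (a side sketch, not part of the original thread): `math.fsum` computes an exactly rounded sum, so it is independent of element order for the same multiset of values, while `np.sum` is not.

```python
import math
import numpy as np

np.random.seed(123)
a = np.random.random(100)
b = np.copy(a)
np.random.shuffle(b)

# np.sum uses pairwise summation whose result depends on element
# order, so the difference below is typically a few ULPs, not zero.
print(np.sum(a) - np.sum(b))

# math.fsum tracks exact partial sums, so shuffling cannot change it.
print(math.fsum(a) == math.fsum(b))  # → True
```

If an estimation pipeline built on ordinary floating-point sums is run on shuffled data, these tiny discrepancies can steer EM toward a different local optimum when the problem is poorly conditioned.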
Hi,
I have recently conducted a series of experiments and found it tricky that the results changed when I shuffled the data (all other settings the same).
I am curious: LCA should give the same results, but shuffled data may change the convergence? Am I right? If we want the same results, we may need to change the parameters of the LCA.
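One practical way to check whether shuffling really changes the clustering (a sketch using scikit-learn's `GaussianMixture` as a stand-in for an LCA model, since the model and data here are hypothetical) is to fit on the original and the shuffled data, then compare predictions on the same sample ordering with `adjusted_rand_score`, which is invariant to cluster relabeling:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three well-separated blobs so the fit is easy and stable.
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2))
               for c in (0.0, 3.0, 6.0)])

perm = rng.permutation(len(X))
gm1 = GaussianMixture(n_components=3, random_state=0).fit(X)
gm2 = GaussianMixture(n_components=3, random_state=0).fit(X[perm])

# Predict on the SAME ordering so labels line up sample-by-sample;
# ARI ignores which integer each cluster happens to receive.
ari = adjusted_rand_score(gm1.predict(X), gm2.predict(X))
print(ari)
```

An ARI near 1.0 means the two fits found the same partition (possibly with permuted labels); a lower value indicates shuffling genuinely moved the optimizer to a different solution.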