
Tricky Results - Potential Bug #60 (Closed)

yuanjames opened this issue Apr 11, 2024 · 4 comments

yuanjames commented Apr 11, 2024

Hi,

I recently ran LCA with measurement="binary". The results show 13 classes in total; however, I found that 6 of them (classes 1, 2, 4, 5, 6, 9) are exactly the same according to model.get_mm_df(). Then I looked at model.predict(X) and found that class labels 1, 2, 4, 5, and 9 were missing: no data (X) were assigned to these classes. So I manually merged them.

Also, I checked the crosstab, and the aforementioned classes were missing there as well. The total number of classes was identified by grid search; I assume 13 produced a better metric value, but in fact only 8 classes were used in total.

Does anyone know the reason?
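
For reference, here is roughly what I ran (a sketch of my setup; the random X below is only a stand-in for my real binary data, and the parameter values are the ones from my run):

```python
import numpy as np
import pandas as pd
from stepmix.stepmix import StepMix

# Stand-in for my real binary indicators (placeholder data only)
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.integers(0, 2, size=(500, 10)))

# Fit with the number of classes selected by the grid search (13 in my case)
model = StepMix(n_components=13, measurement="binary", random_state=42)
model.fit(X)

# Measurement parameters per class: in my run, classes 1, 2, 4, 5, 6 and 9 look identical
print(model.get_mm_df())

# Predicted labels: several of the 13 classes never show up
labels = model.predict(X)
print(pd.Series(labels).value_counts().sort_index())
```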

yuanjames changed the title from "Tricky Results" to "Tricky Results - Potential Bug" on Apr 11, 2024
sachaMorin (Collaborator) commented:

Thanks for reporting this.

  1. Can you check the observations from classes 1,2,4,5,6,9? Specifically, are they identical or extremely similar?
  2. Have you tried fitting an estimator with fewer classes? I would consider setting n_components=8.
  3. Some classes never getting predicted can happen. The class prediction is an argmax over the probability of belonging to each class. You can check those probabilities directly with predict_proba (see the sketch below).
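
For example, a rough sketch of what that check could look like (assuming model is your fitted StepMix estimator and X is the same data you passed to predict):

```python
import pandas as pd

# Posterior probability of each class for every observation (n_samples x n_components)
proba = model.predict_proba(X)

# predict() is the argmax over these probabilities
labels = proba.argmax(axis=1)

# For observations assigned to class 6, compare the columns of the classes you
# found to be identical in get_mm_df(); if the parameters really are the same,
# these probabilities should be (almost) equal and 6 only wins by a tiny margin.
duplicates = [1, 2, 4, 5, 6, 9]
print(pd.DataFrame(proba[labels == 6][:, duplicates], columns=duplicates).head())
```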

yuanjames (Author) commented:

> Thanks for reporting this.
>
>   1. Can you check the observations from classes 1,2,4,5,6,9? Specifically, are they identical or extremely similar?
>   2. Have you tried fitting an estimator with fewer classes? I would consider setting n_components=8.
>   3. Some classes never getting predicted can happen. The class prediction is an argmax over the probability of belonging to each class. You can check those probabilities directly with predict_proba.

Hi,

  1. I could not check the observations from classes 1, 2, 4, 5, and 9, because no observation is classified with these labels. I checked the observations in class 6, and yes, they are identical.
  2. Yes, I grid-searched the number of classes, and it shows 13 is the best. I also tried 8, but then only 5 classes appear in the crosstab.
  3. Thanks for your answer, I will check. Much appreciated for the great work, I like StepMix.

sachaMorin (Collaborator) commented:

Given that the 6 classes are identical in terms of parameters, you should see very similar probabilities in predict_proba for the observations that get assigned to class 6. I suspect 6 gets predicted essentially because it's numerically slightly more likely.

What seems to be happening here is that multiple classes latch on to the same data cluster.

I would consider testing different validation metrics, including AIC or BIC, which penalize unnecessarily complex models. You can also plot validation metrics for different numbers of components (we did something similar in this tutorial). 13 components might get selected as the best fit, but you may observe an elbow at some n_components < 13 followed by a plateau with negligible improvements.
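
A rough sketch of that kind of comparison (here I assume StepMix exposes bic() in the style of scikit-learn's mixture models; if your version does not, swap in whichever validation metric your grid search already optimizes, and X is your data):

```python
import matplotlib.pyplot as plt
from stepmix.stepmix import StepMix

n_range = range(1, 16)
bic_values = []
for k in n_range:
    model = StepMix(n_components=k, measurement="binary", random_state=42)
    model.fit(X)
    # bic() is assumed here; use whatever criterion you already compute
    bic_values.append(model.bic(X))

# Plot the criterion against the number of components and look for an elbow:
# past it, extra components give only negligible improvement.
plt.plot(list(n_range), bic_values, marker="o")
plt.xlabel("n_components")
plt.ylabel("BIC")
plt.show()
```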

sachaMorin (Collaborator) commented:

@yuanjames are you still stuck with this? I will close, but feel free to reopen if needed.
