
Tricky Results - Potential Bug #60 (Closed)

yuanjames opened this issue Apr 11, 2024 · 4 comments

yuanjames commented Apr 11, 2024

Hi,

I recently ran LCA with measurement="binary". The results show 13 classes in total; however, I found that 6 of them (classes 1, 2, 4, 5, 6, 9) are exactly the same according to model.get_mm_df(). Then I looked at model.predict(X) and found that class labels 1, 2, 4, 5, and 9 were missing: no data (X) were assigned to these classes. So I manually merged them.

Also, I checked the crosstab, and the aforementioned classes were missing there as well. The total number of classes was identified by grid search; I assume 13 produced a better metric value, but in fact only 8 classes were used in total.

Does anyone know the reason?
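
For reference, here is roughly what I ran (a sketch of my setup; the random X below is only a stand-in for my real binary data, and the parameter values are the ones from my run):

```python
import numpy as np
import pandas as pd
from stepmix.stepmix import StepMix

# Stand-in for my real binary indicators (placeholder data only)
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.integers(0, 2, size=(500, 10)))

# Fit with the number of classes selected by the grid search (13 in my case)
model = StepMix(n_components=13, measurement="binary", random_state=42)
model.fit(X)

# Measurement parameters per class: in my run, classes 1, 2, 4, 5, 6 and 9 look identical
print(model.get_mm_df())

# Predicted labels: several of the 13 classes never show up
labels = model.predict(X)
print(pd.Series(labels).value_counts().sort_index())
```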

yuanjames changed the title from "Tricky Results" to "Tricky Results - Potential Bug" on Apr 11, 2024
sachaMorin (Collaborator) commented:

Thanks for reporting this.

  1. Can you check the observations from classes 1,2,4,5,6,9? Specifically, are they identical or extremely similar?
  2. Have you tried fitting an estimator with fewer classes? I would consider setting n_components=8.
  3. Some classes never getting predicted can happen. The class prediction is an argmax over the probability of belonging to each class. You can check those probabilities directly with predict_proba (see the sketch below).
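
For example, a rough sketch of what that check could look like (assuming model is your fitted StepMix estimator and X is the same data you passed to predict):

```python
import pandas as pd

# Posterior probability of each class for every observation (n_samples x n_components)
proba = model.predict_proba(X)

# predict() is the argmax over these probabilities
labels = proba.argmax(axis=1)

# For observations assigned to class 6, compare the columns of the classes you
# found to be identical in get_mm_df(); if the parameters really are the same,
# these probabilities should be (almost) equal and 6 only wins by a tiny margin.
duplicates = [1, 2, 4, 5, 6, 9]
print(pd.DataFrame(proba[labels == 6][:, duplicates], columns=duplicates).head())
```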

yuanjames (Author) commented:

> Thanks for reporting this.
>
>   1. Can you check the observations from classes 1,2,4,5,6,9? Specifically, are they identical or extremely similar?
>   2. Have you tried fitting an estimator with fewer classes? I would consider setting n_components=8.
>   3. Some classes never getting predicted can happen. The class prediction is an argmax over the probability of belonging to each class. You can check those probabilities directly with predict_proba.

Hi,

  1. I could not check the observations from classes 1, 2, 4, 5, and 9, because no observation is classified with these labels. I checked the observations in class 6, and yes, they are identical.
  2. Yes, I grid-searched the number of classes, and it shows 13 is the best. I also tried 8, but then only 5 classes appear in the crosstab.
  3. Thanks for your answer, I will check. Much appreciated for the great work, I like StepMix.

sachaMorin (Collaborator) commented:

Given that the 6 classes are identical in terms of parameters, you should see very similar probabilities in predict_proba for the observations that get assigned to class 6. I suspect 6 gets predicted essentially because it's numerically slightly more likely.

What seems to be happening here is that multiple classes latch on to the same data cluster.

I would consider testing different validation metrics, including AIC or BIC, which penalize unnecessarily complex models. You can also plot validation metrics for different numbers of components (we did something similar in this tutorial). 13 components might get selected as the best fit, but you may observe an elbow at some n_components < 13 followed by a plateau with negligible improvements.
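
A rough sketch of that kind of comparison (here I assume StepMix exposes bic() in the style of scikit-learn's mixture models; if your version does not, swap in whichever validation metric your grid search already optimizes, and X is your data):

```python
import matplotlib.pyplot as plt
from stepmix.stepmix import StepMix

n_range = range(1, 16)
bic_values = []
for k in n_range:
    model = StepMix(n_components=k, measurement="binary", random_state=42)
    model.fit(X)
    # bic() is assumed here; use whatever criterion you already compute
    bic_values.append(model.bic(X))

# Plot the criterion against the number of components and look for an elbow:
# past it, extra components give only negligible improvement.
plt.plot(list(n_range), bic_values, marker="o")
plt.xlabel("n_components")
plt.ylabel("BIC")
plt.show()
```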

sachaMorin (Collaborator) commented:

@yuanjames are you still stuck with this? I will close, but feel free to reopen if needed.
