
Mixture model converges to wrong result (randomly) #7

Open

hberger opened this issue Feb 21, 2023 · 1 comment

hberger commented Feb 21, 2023

Hi,
I have been trying to model an approximately Gaussian mixture (image data: background + foreground intensities) using normalmixEM with 2 or 3 components.

While this generally works, I quite often get results that make no sense given the input data distribution. In particular, the input distribution is clearly bimodal with an approximately 30/70 split in proportions, but the fitted components match neither of the modes, nor does the sum of the components fit the overall distribution well.

For certain data sets this happens about 1 in 5 runs for k=2 and about 1 in 100-200 runs for k=3. I can reproduce the behavior deterministically with specific random seeds for both k=2 and k=3. It does not seem to depend much on the data itself: removing data points or subsampling from 100k down to 1k points does not change the behavior, and neither does trimming the long right tail. I also tried varying the initial lambda values, without much effect.

Tested with mixtools 1.2.0 on R 4.1.2 (OpenBLAS) and mixtools 2.0 on R 4.2.2 (standard R BLAS), both on Linux.

To reproduce (example data have been attached below):

library(mixtools)

# d <- readRDS("100k_vector_for_normalmixEM.rds")  # full 100k data set
# 1k subsample from the 100k data
d <- readRDS("1k_vector_for_normalmixEM.rds")

hist(d, breaks = 100)

# k=2: problem occurs roughly 1 in 5 runs without setting a random seed
set.seed(1238)
mixmdl <- normalmixEM(d, k = 2)
summary(mixmdl)
plot(mixmdl, which = 2)  # density plot of the fitted mixture

# k=3: much less frequent without setting a random seed
set.seed(10319)
mixmdl <- normalmixEM(d, k = 3)
summary(mixmdl)
plot(mixmdl, which = 2)
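
Note that the log-likelihoods of the two runs below differ noticeably, which suggests EM is getting stuck in a poor local optimum. A minimal workaround sketch (my own, not part of mixtools): run several randomly initialized fits and keep the one with the highest log-likelihood. It assumes d is loaded as above.

n_restarts <- 10
best_fit <- NULL
for (i in seq_len(n_restarts)) {
  fit <- try(normalmixEM(d, k = 2), silent = TRUE)  # random initialization each run
  if (!inherits(fit, "try-error") &&
      (is.null(best_fit) || fit$loglik > best_fit$loglik)) {
    best_fit <- fit  # keep the fit with the highest log-likelihood so far
  }
}
summary(best_fit)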

Results from the 100k data set (1k looks similar) for k=2:

summary of normalmixEM object:
          comp 1     comp 2
lambda 0.8223187 0.17768126
mu     0.0119736 0.01614824
sigma  0.0050708 0.00156856
loglik at estimate:  391868.6 

[image: density plot of the bad k=2 fit from plot(mixmdl, which = 2)]

Output from a 'good' run (100k, k=2):

summary of normalmixEM object:
           comp 1     comp 2
lambda 0.29123448 0.70876552
mu     0.00687830 0.01511378
sigma  0.00119344 0.00370195
loglik at estimate:  402918.3 

example_data.zip

hberger commented Sep 21, 2023

Just an additional comment: I can fix the problem by using custom initialization values derived from k-means on the data:

# initialize from a 2-cluster k-means solution
km <- kmeans(d, centers = 2)
mu_km <- km$centers[, 1]            # cluster means as starting mu
sd_km <- tapply(d, km$cluster, sd)  # within-cluster SDs as starting sigma
mixmdl <- normalmixEM(d, k = 2, fast = TRUE, mu = mu_km, sigma = sd_km)

This yields stable estimates of the correct population parameters.
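
The same idea could presumably be extended to the mixing proportions (a sketch; using the k-means cluster sizes as starting lambda is my assumption, not something I have tested):

lambda_km <- as.vector(table(km$cluster)) / length(d)  # cluster proportions as starting lambda
mixmdl <- normalmixEM(d, k = 2, fast = TRUE,
                      lambda = lambda_km, mu = mu_km, sigma = sd_km)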
