Comparison with kmeans clustering #23

kadyb · 2023-02-07T12:34:07Z

Have you tried comparing supercells with other raster segmentation methods? In my experience, simple clustering with pixel coordinates (rows, columns) gave very good results in a soil mapping project.


kadyb · 2023-02-07T21:04:50Z

I adapted your example from the vignette; here are my results with code. Basically, I used kmeans on the pixel coordinates plus the raster values, followed by pixel smoothing. If we omit scaling, the coordinates have more influence than the raster values (RGB bands), so the results look more like supercells. This approach should also work for multiple bands and time steps. Of course, the number of clusters and the amount of smoothing can be tuned.

library("terra")
library("supercells")
set.seed(1)

ortho = rast(system.file("raster/ortho.tif", package = "supercells"))
df = as.data.frame(ortho, xy = TRUE, na.rm = FALSE)
idx = which(complete.cases(df))

## scale() standardizes all columns (x, y, bands);
## without data scaling X and Y have more influence on the results in kmeans
df_omit = scale(df[idx, ])

mdl = kmeans(df_omit, centers = 100)

vec = rep(NA_integer_, ncell(ortho))
vec[idx] = mdl$cluster
rcl = rast(ortho, nlyrs = 1, vals = vec)

rcl = focal(rcl, w = 5, fun = "modal") # smooth cluster labels with a 5 x 5 modal (majority) filter

vect = as.polygons(rcl)

plot(ortho)
plot(vect, add = TRUE)

[Figure: segmentation result, data not scaled]
[Figure: segmentation result, data scaled (it better detects larger / homogeneous objects)]
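
To put numbers on the scaling remark, here is a minimal sketch on synthetic data. The column ranges are assumptions chosen to mimic map coordinates and 8-bit bands; they are not taken from ortho.tif. It measures how much of the total variance (and therefore, on average, of the squared Euclidean distances that kmeans minimizes) comes from the coordinate columns before and after scale().

```r
set.seed(1)
n = 1000

# synthetic stand-in for the pixel table (assumed ranges, not real data)
df = data.frame(
  x = runif(n, 0, 5000),  # map coordinates
  y = runif(n, 0, 5000),
  R = runif(n, 0, 255),   # 8-bit bands
  G = runif(n, 0, 255),
  B = runif(n, 0, 255)
)

# share of the total variance contributed by the coordinate columns
coord_share = function(m) {
  v = apply(m, 2, var)
  sum(v[c("x", "y")]) / sum(v)
}

share_raw = coord_share(as.matrix(df))  # coordinates dominate
share_scaled = coord_share(scale(df))   # every column has variance 1

print(c(raw = share_raw, scaled = share_scaled))
```

After scaling, each of the five columns contributes equally, so the two coordinates account for 2/5 of the total. Multiplying the scaled coordinate columns by a weight would be one way to move between the two extremes.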

kadyb · 2023-02-07T21:09:33Z

The following questions arise:

Does any method create better superpixels / clusters (what are the differences)?
What is the performance of these methods?
Can they be applied to huge datasets?

Nowosad · 2023-02-08T12:17:44Z

@kadyb thanks, it looks interesting. I have some initial comments and code, but I will need a few days to prepare them (given other responsibilities). Could you also try to prepare a larger example (e.g., 10000 by 10000 cells)?

kadyb · 2023-02-08T12:46:18Z

> Could you also try to prepare a larger example (e.g, 10000 by 10000 cells)?

Do you have such a dataset? If not, we can use a Sentinel-2 image (R, G, B, NIR bands at 10 m resolution) or a Landsat scene (7 bands, 30 m resolution).

Edit: Here is a link to a Landsat 8 scene. It is a very nice example because it contains clouds, snow, ice, shadows, rivers, black water and bright water, but no buildings.

Nowosad · 2023-02-08T13:07:04Z

👍🏻

kadyb · 2023-02-08T13:58:53Z

Some notes:

I think that for larger datasets we should train models on a smaller sample and then predict on the whole dataset.
The kmeans algorithm is probably not the best choice.
Maybe it would be better to use data.table instead of data.frame.
Reduce dimensionality using, e.g., PCA.
Downsample the input raster.
Use collapse::fscale() for fast data scaling.
Maybe there should be a maximum image size? If an image exceeds the limit, segmentation would be performed in independent smaller blocks.
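
The first note (train on a sample, then predict on everything) could look roughly like the sketch below. It uses synthetic data and a hypothetical helper, predict_kmeans(), which is not part of stats or supercells. The helper assigns each row to its nearest centre chunk by chunk, so the full distance matrix is never held in memory.

```r
set.seed(1)

# synthetic stand-in for a large pixel table (rows = pixels, columns = bands)
pixels = matrix(runif(20000 * 3), ncol = 3,
                dimnames = list(NULL, c("B1", "B2", "B3")))

# 1. fit kmeans on a random sample of the pixels
smp = pixels[sample(nrow(pixels), 2000), ]
mdl = kmeans(smp, centers = 10, iter.max = 50, nstart = 3)

# 2. assign every pixel to its nearest centre, in chunks
#    (hypothetical helper, written for this illustration only)
predict_kmeans = function(x, centers, chunk = 5000) {
  out = integer(nrow(x))
  c2 = rowSums(centers^2)  # squared norm of each centre
  for (start in seq(1, nrow(x), by = chunk)) {
    i = start:min(start + chunk - 1, nrow(x))
    # squared distance up to a per-row constant: -2 * x.c + |c|^2
    d = sweep(-2 * x[i, , drop = FALSE] %*% t(centers), 2, c2, "+")
    out[i] = max.col(-d, ties.method = "first")  # column with the smallest distance
  }
  out
}

cl = predict_kmeans(pixels, mdl$centers)
table(cl)
```

Because the distances are computed with one matrix product per chunk, this avoids looping over individual pixels in R. With a real raster, the labels in cl would be written back into a single-layer raster in the same way as vec in the first example.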

kadyb · 2023-02-09T09:55:08Z

So I tested my workflow on the Landsat scene. Segmentation took ~1 hour on a raster with 7 bands (8261 x 8201 pixels; not scaled) and 2000 clusters (kmeans algorithm). A preview of the result is below. The scripts are here: https://github.com/kadyb/image-segmentation

kadyb · 2023-02-09T13:26:56Z

Some of my observations from the comparison:

It seems supercells is ~10x faster than what I proposed. This is mainly because the prediction function is very slow. Maybe it would be better to use hierarchical clustering (or rewrite this function in C++).
With the same number of polygons after vectorization, kmeans creates larger areas on homogeneous surfaces, while supercells creates more, smaller polygons (with compactness = 10). This is because only 2000 clusters were set in kmeans. It would also probably be better to scale the spectral bands in kmeans.
In both methods, the object shapes are sometimes strange (i.e., a human would draw the boundaries differently), but this is not surprising because the methods are automatic. One more thing: supercells detected the river in the north, while kmeans didn't. I suspect this is because the river pixels were not in the training set.

## supercells
library("terra")
library("supercells")
start_time = Sys.time()
files = list.files("LO08_L1TP_067017_20130722_20200925_02_T1/",
                   pattern = ".+B[1-7]\\.TIF$", full.names = TRUE)
ras = rast(files)
names(ras) = paste0("B", 1:7)
k = 180000 # eventually there should be 82845 polygons
slic = supercells(ras, k = k, compactness = 10)
end_time = Sys.time()
end_time - start_time #> Time difference of 6.097648 mins

Nowosad · 2023-02-09T19:47:56Z

See my calculations and some comments regarding the first example at kadyb/image-segmentation#1.

I will try to look at the large data examples sometime next week. If you want to discuss anything directly -- feel free to call me on Monday.

kadyb · 2023-02-09T22:31:17Z

Thanks! One more thing: in the distant future it would be nice to consider more advanced approaches, e.g. region growing (in GRASS) or OBIA.

Nowosad · 2023-02-11T16:20:29Z

@kadyb you may also be interested in https://r.geocompx.org/gis.html#saga

kadyb changed the title ~~Comparison with other methods~~ Comparison with kmeans clustering Feb 9, 2023
