---
title: "Experimental features of the supercells package"
author: Jakub Nowosad
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Experimental features of the supercells package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: "`r system.file('refs.bib', package = 'supercells')`"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6.5#,
  # fig.height = 5
)
```
Superpixels are a family of segmentation concepts that group pixels with similar characteristics.
They are often used in computer vision to delineate parts of RGB images that are more meaningful and easier to analyze.
When applied to RGB images, each superpixel contains similar colors that could also represent real-world objects.
A large number of methods for creating superpixels have been developed over the last decades, with the SLIC algorithm (Achanta et al. (2012), <doi:10.1109/TPAMI.2012.120>) being the most prominent.
The **supercells** package aims to apply the concept of superpixels to a variety of spatial data.
This package works on spatial data with one variable (e.g., continuous raster), many variables (e.g., RGB rasters), and spatial patterns (e.g., areas in categorical rasters).
Therefore, it makes it possible not only to find areas that look similar on an RGB (satellite) image, but also to regionalize areas with comparable values of one or more variables.
This vignette shows some experimental features of the **supercells** package.
To reproduce the following results on your own computer, install and attach the packages:
```{r, message=FALSE}
library(supercells) # superpixels for spatial data
library(terra) # spatial raster data reading and handling
library(sf) # spatial vector data reading and handling
```
The first step is to read the input data, `ortho.tif`, which is included in the **supercells** package and contains three layers representing red, green, and blue satellite bands^[It also has an empty square in the left part of the image for testing purposes.].
```{r}
ortho = rast(system.file("raster/ortho.tif", package = "supercells"))
plot(ortho)
```
# Large data support
The **supercells** package supports calculations on datasets that do not fit into computer memory (RAM).^[Many thanks to Micha Silver; https://github.com/Nowosad/supercells/issues/10.]
This is done by splitting the input data into smaller chunks, and then reading each chunk into memory separately.
To turn this feature on, you need to use the `chunks` argument:

- `chunks = FALSE` - the default. Chunking is not used.
- `chunks = TRUE` - only large input data is split into chunks of an automatically determined size.
- `chunks = (a numeric value)` - the input raster data is split into chunks of a user-defined size, e.g., `chunks = 150` means that each chunk is at most 150 by 150 cells.
```{r}
sc_ortho = supercells(ortho, k = 100, compactness = 1, chunks = 150)
```
The effect of the user-defined chunk size can be seen in the visualization of the results below:
```{r}
plot(sc_ortho)
```
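How a numeric `chunks` value could translate into tiles can be sketched in base R. This is only an illustration under stated assumptions: `chunk_grid()` is a hypothetical helper, the 400 by 500 cell grid is made up, and the way **supercells** derives its chunks internally may differ.

```r
# Hypothetical helper (not part of supercells): split a grid of
# nrows x ncols cells into tiles of at most size x size cells
chunk_grid = function(nrows, ncols, size) {
  # first row/column index of every tile
  row_start = seq(1, nrows, by = size)
  col_start = seq(1, ncols, by = size)
  tiles = expand.grid(row_start = row_start, col_start = col_start)
  # last row/column index, clipped to the grid extent
  tiles$row_end = pmin(tiles$row_start + size - 1, nrows)
  tiles$col_end = pmin(tiles$col_start + size - 1, ncols)
  tiles
}

# Made-up grid dimensions, used only for illustration
tiles = chunk_grid(nrows = 400, ncols = 500, size = 150)
nrow(tiles) # number of tiles
head(tiles)
```

Tiles on the bottom and right edges are smaller than the requested size, which is why the chunk size is described as an upper limit.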
# Parallel calculations
The large data support can be extended and used for parallel computations.
In this approach, we need to:

1. Attach the **future** package.
2. Specify the parallelization strategy (the `plan()` function).
3. Use `chunks = TRUE` or set `chunks` to some numeric value in `supercells()`.
4. Set `future = TRUE` in `supercells()`.
```{r, message=FALSE}
library(future)
plan(multisession, workers = 2)
sc_ortho2 = supercells(ortho, k = 100, compactness = 1,
                       chunks = 150, future = TRUE)
```
The code above will divide the whole area into six chunks, and use two separate R sessions (`workers = 2`) to process the chunks in parallel.
The result of the calculations is the same as in the previous example; however, this approach should decrease computation time for large datasets.^[Note: parallelization should not be used for small data, as it could add unnecessary overhead.]
```{r}
plot(sc_ortho2)
```
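The idea that chunk-wise work gives the same result whether it runs sequentially or on several workers can be sketched with toy data. This is an illustration only: it uses base R's **parallel** package instead of **future** to stay self-contained, and `summarize_chunk()` and the toy `chunks` list are hypothetical stand-ins, not part of **supercells**.

```r
library(parallel)

# Hypothetical stand-in for a per-chunk computation
summarize_chunk = function(chunk) mean(chunk)

# Six toy "chunks", each a numeric vector of 100 values
set.seed(1)
chunks = split(runif(600), rep(1:6, each = 100))

# Sequential processing, one chunk after another
res_seq = lapply(chunks, summarize_chunk)

# Parallel processing on two separate R worker sessions
cl = makeCluster(2)
res_par = parLapply(cl, chunks, summarize_chunk)
stopCluster(cl)

# Both approaches should give the same per-chunk results
all.equal(res_seq, res_par)
```

For such a tiny task, starting the workers likely costs more than the computation itself, which mirrors the note above about overhead on small data.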
# Custom cluster centers
By default, the original SLIC algorithm uses regularly distributed cluster centers, where each of the initial cluster centers has an overlapping "search window" with its neighbors.
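A regular layout of initial centers can be sketched in base R. The grid interval follows the SLIC definition, S = sqrt(N / k), from Achanta et al. (2012); the 300 by 400 cell grid, the half-step offset, and all variable names are assumptions made for illustration, and the exact placement used by **supercells** may differ.

```r
# Made-up grid size and requested number of superpixels
nrows = 300
ncols = 400
k = 100

# Grid interval S = sqrt(N / k), rounded to whole cells
step = round(sqrt(nrows * ncols / k))

# Regularly spaced centers, offset by about half a step from the edges
# (the offset is an assumption of this sketch)
center_rows = seq(ceiling(step / 2), nrows, by = step)
center_cols = seq(ceiling(step / 2), ncols, by = step)
centers = expand.grid(row = center_rows, col = center_cols)

step          # spacing between neighboring centers, in cells
nrow(centers) # number of initial centers, close to k
```

In the original SLIC, each center searches a window of roughly 2S by 2S cells, so with centers only S cells apart, neighboring windows overlap.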
The next experimental feature in **supercells** is the possibility of specifying custom cluster centers.^[Thanks to Johannes Heisig; https://github.com/Nowosad/supercells/issues/15.]
This can be done by providing an `sf` object with any number of points.
The `sf` object can be read from a file, or created manually, as in the example below:
```{r}
set.seed(2021-11-21) # note: R evaluates this as arithmetic, i.e., set.seed(1989)
custom_centers = sf::st_as_sfc(sf::st_bbox(ortho))
custom_centers = sf::st_sample(custom_centers, 100, type = "random")
custom_centers = sf::st_sf(geom = custom_centers)
```
```{r}
plot(ortho)
plot(st_geometry(custom_centers), add = TRUE)
```
To use custom cluster centers, we need to set both the `k` and `step` arguments:

- `k` - an `sf` object with any number of points.
- `step` - the size of the "search window".
The impact of setting different values of `step` can be seen in the examples below.
With `step` set to 10, many locations in our area are not segmented:
```{r}
ortho_slic1 = supercells(ortho, k = custom_centers, step = 10,
                         compactness = 1, clean = FALSE)
```
```{r}
plot(ortho)
plot(st_geometry(ortho_slic1), add = TRUE, border = "red")
```
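One way to see why a small `step` leaves gaps is to count how many cells fall inside at least one search window. The sketch below is a toy model under stated assumptions: the 200 by 200 grid, the random centers, and the `coverage()` helper are hypothetical, and the window is assumed to reach `step` cells from its center in each direction (as in the 2S by 2S window of the original SLIC), which may not match the exact window used by **supercells**.

```r
# Toy setup: 100 random centers on a 200 x 200 grid (made-up values)
set.seed(42)
n = 200
centers = data.frame(row = sample(n, 100, replace = TRUE),
                     col = sample(n, 100, replace = TRUE))

# Hypothetical helper: share of cells inside at least one search window,
# assuming each window reaches `step` cells from its center
coverage = function(centers, n, step) {
  covered = matrix(FALSE, nrow = n, ncol = n)
  for (i in seq_len(nrow(centers))) {
    # window extent, clipped to the grid
    rows = max(1, centers$row[i] - step):min(n, centers$row[i] + step)
    cols = max(1, centers$col[i] - step):min(n, centers$col[i] + step)
    covered[rows, cols] = TRUE
  }
  mean(covered)
}

cov10 = coverage(centers, n, step = 10)
cov20 = coverage(centers, n, step = 20)
c(step10 = cov10, step20 = cov20)
```

Cells outside every window cannot be assigned to any center, so in this model the uncovered share can only shrink as `step` grows.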
Increasing the `step` value allows more locations to be included in the segmentation process.
However, it also results in many detached areas (tiny polygons not connected directly to the main superpixels):
```{r}
ortho_slic2 = supercells(ortho, k = custom_centers, step = 20,
                         compactness = 1, clean = FALSE)
```
```{r}
plot(ortho)
plot(st_geometry(ortho_slic2), add = TRUE, border = "red")
```
The result can be made smoother by enforcing the supercells' connectivity (`clean = TRUE`):
```{r}
ortho_slic3 = supercells(ortho, k = custom_centers, step = 20,
                         compactness = 1, clean = TRUE, minarea = 8)
```
```{r}
plot(ortho)
plot(st_geometry(ortho_slic3), add = TRUE, border = "red")
```