# Labeling for Pixel-wise Classification with Sentinel-2 Satellite Imagery

Michael Mommert, Stuttgart University of Applied Sciences, 2025

This Notebook introduces the process of data annotation or labeling for a subsequent classification task. You will learn how to label satellite images with a web-based tool and how to prepare the resulting data for a pixel-wise classification task. We will showcase the process for a [tiny Sentinel-2 sample dataset](https://zenodo.org/records/12819787). For more details on the supervised learning techniques used in this Notebook, please refer to the Notebook [*Pixel-wise Classification with Machine Learning Methods for Sentinel-2 Satellite Imagery*](https://github.com/Hochschule-fuer-Technik-Stuttgart/teaching-mommert/blob/main/classification/pixel-wise/ml/sentinel-2/classification_pixel-wise_ml_sentinel2.ipynb).

In [None]:
%pip install numpy \
    scipy \
    shapely \
    matplotlib \
    rasterio \
    seaborn \
    scikit-learn

In [None]:
import os
import json
import numpy as np
from shapely.geometry import Polygon
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from rasterio.features import rasterize
from sklearn.model_selection import train_test_split
import zipfile

## Data Download and Preprocessing

We download the Sentinel-2 sample dataset used in this Notebook.

In [None]:
# download dataset
!wget "https://zenodo.org/records/14990200/files/sentinel2.zip?download=1" -O sentinel2.zip

# extract dataset zipfile
with zipfile.ZipFile('sentinel2.zip', 'r') as zip_ref:
    zip_ref.extractall('./')

The dataset contains 5 different scenes in a coastal setup. Each image contains all 12 Sentinel-2 Level-2A bands.

Due to the small size of the dataset, we can read in the entire dataset into a NumPy array.

In [None]:
data = []
filenames = sorted(os.listdir('data/'))
for filename in filenames:
    if filename.endswith('.npy'):
        # pass the path directly so that NumPy opens and closes the file itself
        data.append(np.load(os.path.join('data', filename), allow_pickle=True))
data = np.array(data)

The data is now stored as a NumPy array, following the shape convention `[scene, band, height, width]`.
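
As a quick sanity check of this shape convention, the sketch below writes a few synthetic arrays to a temporary directory and loads them in the same way. The file names and array contents are made up for illustration; only the loading pattern mirrors the cell above.

```python
import os
import tempfile
import numpy as np

# synthetic stand-in for the dataset: 3 scenes, 12 bands, 120 x 120 pixels
rng = np.random.default_rng(0)
tmpdir = tempfile.mkdtemp()
for k in range(3):
    np.save(os.path.join(tmpdir, 'scene_{:03d}.npy'.format(k)), rng.random((12, 120, 120)))

# load all .npy files in sorted order and stack them along a new first axis
demo_filenames = sorted(f for f in os.listdir(tmpdir) if f.endswith('.npy'))
demo_data = np.stack([np.load(os.path.join(tmpdir, f)) for f in demo_filenames])

print(demo_data.shape)  # (scene, band, height, width)
```

Unlike `np.array`, `np.stack` raises an error if the scenes do not all share the same shape, which can help to catch a corrupt file early.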

Let's display one of the images. In order to do so, we have to do two things:

1. we have to change the shape to `[height, width, bands]` (this particular shape is expected by matplotlib)
2. we have to normalize the pixel values (which vary over a large range) to a range from 0 to 1.

In [None]:
i = 1  # image id

# first, we extract the R, G and B bands and stack them into the shape [120, 120, 3]
img = np.dstack([data[i][3], data[i][2], data[i][1]])

# then we normalize the pixel values in such a way that they range from 0 (min) to 1 (max)
img = (img-np.min(img, axis=(0,1)))/(np.max(img, axis=(0,1)) - np.min(img, axis=(0,1)))

# now we can plot the image
plt.imshow(img)
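
Since the same two steps (stacking the RGB bands and normalizing them) are repeated several times in this Notebook, they could be wrapped in a small helper. The following is a minimal sketch on synthetic data; the function name `to_rgb` is hypothetical and not part of the original Notebook, and it assumes that no channel is constant (otherwise the division would fail).

```python
import numpy as np

def to_rgb(scene, bands=(3, 2, 1)):
    """Stack the given bands of a [band, height, width] array and scale each channel to [0, 1]."""
    img = np.dstack([scene[b] for b in bands]).astype(float)
    lo = img.min(axis=(0, 1))  # per-channel minimum
    hi = img.max(axis=(0, 1))  # per-channel maximum
    return (img - lo) / (hi - lo)

# synthetic scene with 12 bands and 120 x 120 pixels
demo_scene = np.random.default_rng(0).integers(0, 10000, size=(12, 120, 120))
demo_img = to_rgb(demo_scene)
print(demo_img.shape, demo_img.min(), demo_img.max())
```

Note that each channel is normalized independently, which stretches the contrast per band but can shift the apparent colors of a scene.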

Our goal is now to label different land cover classes in our dataset. Potential classes are `water`, `forest`, `grassland` and `sand`, all of which are present in the image shown above. Labeling means to assign image regions to those different classes. We could define these areas within Python, but that is cumbersome.

Instead, we will use a web-based tool for generating the labels. In order to use this tool, we have to create simple image files (such as `.png`) that the tool can read. Since such image files cannot hold all 12 bands, we extract the RGB bands and save the resulting images as `.png` files. One more thing to pay attention to is that each resulting image file should have the same dimensions as the original data (120 x 120 pixels); if this is not enforced, the resulting labels must be transformed to the correct image size.

In [None]:
# create a `pngs` directory if it does not yet exist
os.makedirs('pngs', exist_ok=True)

# loop over all images in the dataset
png_filenames = []
for i in range(len(data)):

    # extract RGB bands and normalize
    img = np.dstack([data[i][3], data[i][2], data[i][1]])
    img = (img-np.min(img, axis=(0,1)))/(np.max(img, axis=(0,1)) - np.min(img, axis=(0,1)))

    f, ax = plt.subplots(1, 1, figsize=(5, 5))  # create an image canvas of a fixed size (5 inches x 5 inches)
    ax.imshow(img)  # plot image
    plt.axis('off')  # remove axes labels
    plt.tight_layout(pad=0)  # remove padding around the image
    png_filenames.append('img_{:03d}.png'.format(i))
    plt.savefig(os.path.join('pngs', png_filenames[-1]), dpi=img.shape[0]/5)  # write file; define dpi value to force correct image size
    plt.close()  # close plot to clear memory
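
The pixel size of a saved figure is the figure size in inches times the `dpi` value, which is why `dpi=img.shape[0]/5` yields 120 x 120 pixels for a 5 inch canvas. If a PNG ends up with a different size anyway, polygon coordinates drawn on it can be scaled back to the data grid. A minimal sketch with made-up numbers (the 600 pixel PNG size and the coordinates are hypothetical):

```python
import numpy as np

figsize_inches = 5
target_pixels = 120
dpi = target_pixels / figsize_inches
print(figsize_inches * dpi)  # pixel size of the saved image along one axis

# hypothetical case: labels were drawn on a 600 x 600 pixel PNG instead
png_size = 600
demo_coords_png = np.array([(100.0, 50.0), (300.0, 50.0), (300.0, 400.0)])

# scale (x, y) coordinates from PNG pixels to data pixels
demo_coords_data = demo_coords_png * (target_pixels / png_size)
print(demo_coords_data)
```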

The images can now be found in the `pngs/` directory. Before we can start with the labeling process, you have to **download** the `pngs/` directory to your local computer.

## Labeling

We will use the [*ImgLab*](https://solothought.com/imglab/) web tool to perform the labeling. *ImgLab* is rather simple, but it offers all the functionality that we will need in the following. Other tools, such as [*Label Studio*](https://labelstud.io/), offer more functionality and convenience, but they require local installation. For large-scale labeling campaigns, I would definitely recommend *Label Studio*, but for the purpose of this tutorial, *ImgLab*'s browser-based app is easier to use.

Follow these steps:
1. Open [*ImgLab*](https://solothought.com/imglab/) in your browser.
2. Click on the **folder symbol** in the bottom left corner. This will allow you to **import images from a folder**. Select the `pngs/` directory that you downloaded. Once the images have been imported, you see the five images on the bottom of the screen.
3. Click on the first image. It will be displayed in the main area of the screen. Since our images are rather small (120 x 120 pixels), it makes sense to zoom in. Use the **zoom function** in the bottom left corner of the screen. Click on the magnifying glass. The magnifying factor will appear at the top of the screen. Increase the magnification until the image details are easy to see for you.
4. We begin the labeling of the first image. Click on the **Polygon symbol** on the left; your cursor will turn into a crosshair. Pick an area that you would like to label and create polygon nodes by following and clicking on its outline. Once your polygon is complete, hit the **Enter key**. If you made a mistake and would like to remove the polygon, simply click on it and hit the **Delete key**. If the polygon is fine, click on it and **select a category name** in the top right corner of the screen. This will assign a class name to the polygon; simply type in the name. Repeat this step to label a number of areas in the image and in the other images. Make sure to use consistent class names.
5. Once you're done with labeling, you can **export the labels**. Different formats are available. For our purposes, please download the labels as **COCO JSON**.
6. Finally, please upload the resulting `.json` file to your Notebook environment.

## Label processing

In the following, you can use your own label file. Simply replace the filename in the next code cell. Alternatively, you can use a pre-built label file called `coastal_labels.json`.

Let's have a look at the `.json` file.

In [None]:
rawlabels = json.load(open('coastal_labels.json', 'r'))
rawlabels

The file contains a lot of information. Let's have a look at the main attributes, which are stored as the keys of the resulting dictionary:

In [None]:
rawlabels.keys()

What do those attributes mean?

* `images` contains the list of image filenames used in the labeling. For each image, it contains its filename, dimensions and an id number.
* `types` defines the type of labels; in our case, we provide instance labels (each instance is labeled separately).
* `annotations` is the most important attribute and contains a list of polygons that you created. For each polygon, it contains a list of the node coordinates (`segmentation`), the `image_id`, class id (`category_id`) and other attributes.
* `categories` lists the different classes that are available. For each class, it contains the name (what you provided in the labeling process), a unique id number and a supercategory (which we don't use here).
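
To make this structure more tangible, the sketch below parses a tiny hand-written label file and groups its annotations by image. The content is hypothetical (it was not produced by *ImgLab*) and uses only the field names that appear in this Notebook.

```python
import json
from collections import defaultdict

# hypothetical, hand-written label file content
demo_json = '''
{
  "images": [{"id": 1, "file_name": "img_000.png", "width": 120, "height": 120}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "segmentation": [[10, 10, 50, 10, 50, 40]]},
    {"id": 2, "image_id": 1, "category_id": 2, "segmentation": [[60, 60, 100, 60, 100, 90, 60, 90]]}
  ],
  "categories": [{"id": 1, "name": "water", "supercategory": "none"},
                 {"id": 2, "name": "sand", "supercategory": "none"}]
}
'''
demo_labels = json.loads(demo_json)

# look-up tables: class id -> class name, image id -> list of annotations
demo_names = {c['id']: c['name'] for c in demo_labels['categories']}
by_image = defaultdict(list)
for ann in demo_labels['annotations']:
    by_image[ann['image_id']].append(ann)

# print class name and number of polygon nodes for each annotation of image 1
for ann in by_image[1]:
    n_nodes = len(ann['segmentation'][0]) // 2
    print(demo_names[ann['category_id']], n_nodes)
```

Grouping the annotations once avoids looping over the full annotation list for every image.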

Let's extract one polygon and reassemble it using the `shapely` module.

In [None]:
coordsraw = rawlabels['annotations'][0]['segmentation'][0]  # extract raw coordinate list (x_0, y_0, x_1, y_1, x_2...)
coords = [(coordsraw[i], coordsraw[i+1]) for i in range(0, len(coordsraw), 2)]  # split coordinates by x and y
Polygon(coords)  # turn coordinates into Polygon
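
The same split of the flat coordinate list into (x, y) pairs can be written as a NumPy reshape. The sketch below uses a made-up rectangle and also computes its area with the shoelace formula; for a simple polygon like this one, the value should agree with the `area` attribute of the corresponding `shapely` polygon.

```python
import numpy as np

demo_coordsraw = [10, 10, 50, 10, 50, 40, 10, 40]  # made-up (x_0, y_0, x_1, y_1, ...)

# list comprehension, as in the cell above
pairs_loop = [(demo_coordsraw[k], demo_coordsraw[k + 1]) for k in range(0, len(demo_coordsraw), 2)]

# equivalent NumPy reshape: one row per node, columns are x and y
pairs_np = np.asarray(demo_coordsraw).reshape(-1, 2)

# shoelace formula for the enclosed area (in square pixels)
x, y = pairs_np[:, 0], pairs_np[:, 1]
area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
print(pairs_np.shape, area)
```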

This looks like a polygon. Let's extract all polygons from one of the images and plot them on the image.

But before we do so, let's assemble the different class names and assign colors to them.

In [None]:
# extract class names
class_names = {}
for c in rawlabels['categories']:
    class_names[c['id']] =  c['name']

# define class colors (RGB values)
class_colors = np.array([
    (0, 0, 0), # background should be black
    (0, 0, 1),  # class 1 (water)
    (1, 1, 0.8),  # class 2 (sand)
    (0.2, 0.8, 0.2),  # class 3 (grassland)
    (0.1, 0.5,0.1)])  # class 4 (forest)
class_cmap_nobackground = mpl.colors.ListedColormap(class_colors[1:])
class_cmap = mpl.colors.ListedColormap(class_colors)

Now we plot an image with the corresponding polygon labels.

In [None]:
i = 2 # image index

# identify image_id used in rawlabels for this image
filename = png_filenames[i]
image_id = None
for imgfile in rawlabels['images']:
    if imgfile['file_name'] == filename:
        image_id = imgfile['id']

# extract RGB bands and normalize, plot image
img = np.dstack([data[i][3], data[i][2], data[i][1]])
img = (img-np.min(img, axis=(0,1)))/(np.max(img, axis=(0,1)) - np.min(img, axis=(0,1)))
plt.imshow(img)

# identify annotations that correspond to this image
for j in range(len(rawlabels['annotations'])):
    if rawlabels['annotations'][j]['image_id'] == image_id:
        # extract coordinates and class
        coordsraw = rawlabels['annotations'][j]['segmentation'][0]
        coords = np.array([(coordsraw[m], coordsraw[m+1]) for m in range(0, len(coordsraw), 2)])
        class_id = rawlabels['annotations'][j]['category_id']

        # plot polygon based on coordinates
        plt.fill(*coords.transpose(), color=class_colors[class_id], label=class_names[class_id], edgecolor='black', linewidth=2, alpha=0.5)

plt.legend()

This looks good. Now let's turn this into masks for each image and each class.

In [None]:
# store labels as array with shape [scene, height, width]
# note that we add a background class for those image areas that are not labeled
labels = np.zeros((len(data), data.shape[-2], data.shape[-1]))

# for each image...
for i in range(len(data)):

    # extract image data
    imgdata = data[i]

    # identify image_id used in rawlabels for this image
    filename = png_filenames[i]
    image_id = None
    for imgfile in rawlabels['images']:
        if imgfile['file_name'] == filename:
            image_id = imgfile['id']

    # identify annotations that correspond to this image
    for j in range(len(rawlabels['annotations'])):
        if rawlabels['annotations'][j]['image_id'] == image_id:
            # extract coordinates and class
            coordsraw = rawlabels['annotations'][j]['segmentation'][0]
            coords = np.array([(coordsraw[m], coordsraw[m+1]) for m in range(0, len(coordsraw), 2)])
            class_id = rawlabels['annotations'][j]['category_id']

            if len(coords) < 3:
                # if there's less than 3 points, it's not a polygon
                continue

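            # create a polygon and rasterize it onto the image grid
            polygon = Polygon(coords)
            m = rasterize([polygon], out_shape=labels[i].shape)
            # assign the class id; where polygons overlap, the later polygon overwrites the earlier one
            labels[i][m == 1] = class_id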
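
When polygons overlap, it matters how the rasterized masks are combined. The following sketch uses two hand-made binary masks (standing in for the output of `rasterize`) to compare adding class-weighted masks with assigning class ids directly.

```python
import numpy as np

# hand-made binary masks standing in for the output of `rasterize`
mask_a = np.zeros((6, 6), dtype=np.uint8)
mask_b = np.zeros((6, 6), dtype=np.uint8)
mask_a[1:4, 1:4] = 1  # polygon of class 1
mask_b[2:5, 2:5] = 1  # polygon of class 2, overlapping the first one

# adding class-weighted masks: the overlap ends up with 1 + 2 = 3
summed = mask_a * 1 + mask_b * 2

# assigning class ids: the later polygon overwrites the earlier one
assigned = np.zeros((6, 6))
assigned[mask_a == 1] = 1
assigned[mask_b == 1] = 2

print(np.unique(summed), np.unique(assigned))
```

With addition, overlapping pixels receive a value that belongs to neither polygon's class, so assignment is the safer choice whenever labels may overlap.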

We have now generated masks that show which of the labeled pixels belong to which class. Let's have a look at one of the masks.

In [None]:
i = 2 # image index

f, ax = plt.subplots(1, 2, figsize=(10,5))

# extract RGB bands and normalize, plot image
img = np.dstack([data[i][3], data[i][2], data[i][1]])
img = (img-np.min(img, axis=(0,1)))/(np.max(img, axis=(0,1)) - np.min(img, axis=(0,1)))
ax[0].imshow(img)

# plot segmentation mask
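# fix the value range so that each class id maps to its color, even if a class is missing in this image
ax[1].imshow(labels[i], cmap=class_cmap, vmin=0, vmax=len(class_colors)-1)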

Note that the vast majority of pixels are black because those pixels are not labeled. We have to be careful to consider only labeled pixels in the training process. Let's extract the labeled pixels.

In [None]:
X, y = [], []
for c in range(1, len(class_names)+1):
    for i in range(len(data)):
        _X = np.dstack(data[i])[labels[i] == c]  # extract spectral properties
        _y = np.array([c for _ in range(len(_X))])  # extract class index
        # append results
        if len(X) == 0:
            X = _X
            y = _y
        else:
            X = np.concatenate([X, _X], axis=0)
            y = np.concatenate([y, _y], axis=0)

X.shape, y.shape
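
The same extraction can also be expressed with a single boolean mask instead of nested loops. This is a sketch on synthetic arrays, not the Notebook's own method; note that the resulting samples are ordered by scene and pixel position rather than by class.

```python
import numpy as np

rng = np.random.default_rng(0)
demo_data = rng.random((2, 12, 8, 8))  # synthetic [scene, band, height, width]
demo_labels = np.zeros((2, 8, 8))      # 0 = unlabeled background
demo_labels[0, :3, :3] = 1             # 9 pixels of class 1 in scene 0
demo_labels[1, 4:, 4:] = 2             # 16 pixels of class 2 in scene 1

labeled = demo_labels > 0                          # boolean mask of shape [scene, height, width]
X_demo = demo_data.transpose(0, 2, 3, 1)[labeled]  # move bands last, then select labeled pixels
y_demo = demo_labels[labeled].astype(int)

print(X_demo.shape, y_demo.shape, np.unique(y_demo, return_counts=True))
```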

Great, now we have a table of labeled pixels with corresponding classes. Before we can use the data, we have to split it into training, validation and test sets.

In [None]:
# we split the entire dataset into a training (70%) and remain (30%) split; the remain fraction will be split into validation (50%) and test (50%)
X_train, X_remain, y_train, y_remain = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_remain, y_remain, train_size=0.5, shuffle=True, random_state=42, stratify=y_remain)

X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
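
To see what `stratify` does, the sketch below applies the same two-step split to synthetic, imbalanced labels and prints the size and class-1 fraction of each part. The data are made up; only the call pattern follows the cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic, imbalanced labels: 150 samples of class 1 and 50 samples of class 2
y_demo = np.array([1] * 150 + [2] * 50)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_rem, y_tr, y_rem = train_test_split(
    X_demo, y_demo, train_size=0.7, shuffle=True, random_state=42, stratify=y_demo)
X_va, X_te, y_va, y_te = train_test_split(
    X_rem, y_rem, train_size=0.5, shuffle=True, random_state=42, stratify=y_rem)

# size of each part and its fraction of class 1 (0.75 in the full dataset)
for name, part in [('train', y_tr), ('val', y_va), ('test', y_te)]:
    print(name, len(part), np.mean(part == 1))
```

Each part should keep a class-1 fraction close to that of the full dataset, which is the point of stratified splitting for imbalanced labels.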

## k-Nearest Neighbor Classification

Now we can use a k-NN classifier (or any other classifier) to classify our dataset. We use $k=5$, but this is just a guess.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# instantiate the model
model = KNeighborsClassifier(5)

# "train" the model on the training dataset
model.fit(X_train, y_train)
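
Rather than guessing $k$, the validation split could be used to compare several candidate values. The following is a minimal sketch on synthetic two-class data; the helper `make_demo_data` and the candidate values are assumptions for illustration, and in the Notebook you would use `X_train`, `y_train`, `X_val` and `y_val` instead.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_demo_data(n):
    """Return n samples per class with 12 features, drawn from two shifted normal distributions."""
    X = np.vstack([rng.normal(0.0, 1.0, (n, 12)), rng.normal(1.5, 1.0, (n, 12))])
    y = np.array([1] * n + [2] * n)
    return X, y

X_tr, y_tr = make_demo_data(100)  # stands in for the training split
X_va, y_va = make_demo_data(50)   # stands in for the validation split

# fit one model per candidate k and score it on the validation data
val_scores = {}
for k in (1, 3, 5, 7, 9):
    val_scores[k] = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_va, y_va)

best_k = max(val_scores, key=val_scores.get)
print(val_scores, best_k)
```

The test split stays untouched during this selection, so that it can still provide an unbiased estimate of the final model's accuracy.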

We plot the prediction for an entire scene next to its RGB image:

In [None]:
i = 2

# predict classes for each pixel
pred = model.predict(np.dstack(data[i]).reshape(-1, 12))

f, ax = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(12, 6))

img = np.dstack([data[i][3], data[i][2], data[i][1]])  # we extract the R, G, B bands for this scene
img = (img-np.min(img, axis=(0,1)))/(np.max(img, axis=(0,1)) - np.min(img, axis=(0,1)))
ax[0].imshow(img)

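# fix the value range so that class ids 1..4 map to their colors, even if a class is not predicted in this scene
ax[1].imshow(pred.reshape(120, 120), cmap=class_cmap_nobackground, vmin=1, vmax=len(class_colors)-1)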
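
The prediction step relies on flattening a scene into one row per pixel and reshaping the per-pixel result back into an image. The sketch below checks this round trip on a small synthetic array (the 4 x 5 pixel size is made up).

```python
import numpy as np

demo_scene = np.arange(12 * 4 * 5).reshape(12, 4, 5)  # synthetic [band, height, width]

# move bands last and flatten the spatial axes: [height*width, band]
pixels = np.dstack(demo_scene).reshape(-1, 12)
print(pixels.shape)

# row r of `pixels` corresponds to the image position (r // width, r % width)
row, col = 2, 3
r = row * demo_scene.shape[2] + col
same_pixel = np.array_equal(pixels[r], demo_scene[:, row, col])

# a per-pixel result (here simply the first band) reshapes back to [height, width]
per_pixel = pixels[:, 0]
same_image = np.array_equal(per_pixel.reshape(4, 5), demo_scene[0])
print(same_pixel, same_image)
```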

The qualitative result looks very good! What about the accuracy metric?

In [None]:
from sklearn.metrics import accuracy_score

pred = model.predict(X_test)
accuracy_score(y_test, pred)  # argument order is (y_true, y_pred)

This also looks very good.

We can look at the mistakes the model makes using a confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

conf_matrix = confusion_matrix(y_test, pred)
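# class ids in the same sorted order that confusion_matrix uses for its rows and columns
cm_labels = np.unique(np.concatenate([y_test, pred]))
disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=[class_names[int(c)] for c in cm_labels])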
disp.plot()

There is very little confusion between the classes. In fact, the only confusion is between the grassland and forest classes, which makes sense.
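
Per-class metrics can be read directly off a confusion matrix. The sketch below uses made-up numbers (not the results of this Notebook) and follows the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# made-up confusion matrix (rows: true class, columns: predicted class)
demo_names = ['water', 'sand', 'grassland', 'forest']
demo_cm = np.array([[50, 0, 0, 0],
                    [0, 30, 0, 0],
                    [0, 0, 18, 2],
                    [0, 0, 3, 27]])

accuracy = np.trace(demo_cm) / demo_cm.sum()        # correct predictions / all predictions
recall = np.diag(demo_cm) / demo_cm.sum(axis=1)     # fraction of each true class that was found
precision = np.diag(demo_cm) / demo_cm.sum(axis=0)  # fraction of each predicted class that is correct

for name, r, p in zip(demo_names, recall, precision):
    print('{:10s} recall={:.2f} precision={:.2f}'.format(name, r, p))
print('accuracy={:.3f}'.format(accuracy))
```

Per-class values are more informative than overall accuracy when the classes contain very different numbers of labeled pixels.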

**Exercise**: Label more polygons and train a model based on the combined dataset. Will the accuracy improve even more?

In [None]:
# use this cell for the exercise