# Creating Labeled Data from a Planet GeoTIFF with Label Maker



In this notebook, we create labeled data for training a machine learning algorithm. As inputs, we use [OpenStreetMap](https://www.openstreetmap.org/#map=4/38.01/-95.84) as the ground truth source and a Planet GeoTIFF as the image source. Development Seed's [Label Maker](https://developmentseed.org/blog/2018/01/11/label-maker/) tool is used to download and prepare the ground truth data, chip the Planet imagery, and package the two to feed into the training process.

The primary interface for Label Maker is through the command-line interface (cli). It is configured through the creation of a configuration file. More information about that configuration file and command line usage can be found in the Label Maker repo [README](https://github.com/developmentseed/label-maker/blob/master/README.md).

The goal of this tutorial is to demonstrate labeling data from a local GeoTIFF as well as a Cloud-Optimized GeoTIFF (COG) available via the Planet API. This is inspired by the fact that label-maker now supports both local GeoTIFFs and remote COGs ([blog post](https://www.developmentseed.org/blog/2018/04/09/label-maker-update/)). When only a portion of a scene is needed, accessing it as a remote COG can save time, bandwidth, and local storage.

NOTE: Currently, label-maker supports only 8-bit RGB imagery. Therefore, the `visual` asset is best for use with label-maker.

**RUNNING NOTE**

This notebook is meant to be run in a docker image specific to this folder. The docker image must be built from the custom [Dockerfile](Dockerfile) according to the directions below.

In label-data directory:
```
docker build -t planet-notebooks:label .
```

Then start up the docker container as you usually would, specifying `planet-notebooks:label` as the image.

There is currently an incompatibility between the URL Planet uses for COGs (which does not use a geotiff name along with the `tif` extension) and the released version of label-maker. The released version looks for the `tif` extension in the url before treating it as a COG. See the [issue](https://github.com/developmentseed/label-maker/issues/80) for more information. There is a fixed version at [jreiberkyle/label-maker](https://github.com/jreiberkyle/label-maker/tree/geotiff-download-80). 

This image installs the fixed version of label-maker along with its dependencies.


## Install Dependencies

In addition to the python packages imported below, the label-maker and planet python packages are also a dependency. However, in this notebook, both packages are accessed through their command-line interface.

In [90]:
import json
import os

import geojson
import ipyleaflet as ipyl
import ipywidgets as ipyw
from IPython.display import Image
import numpy as np
from planet import Session, DataClient, Auth
import rasterio
from shapely.geometry import shape

## Setup

In this section, we create the local data directory and set the scene specifications.

As noted in the intro, label-maker only supports 8-bit imagery, so we want to make sure to use the `visual` asset.

In [91]:
# if your Planet API Key is not set as an environment variable, you can paste it below
API_KEY = os.getenv('PL_API_KEY', 'PASTE_YOUR_KEY_HERE')

client = Auth.from_key(API_KEY)

In [92]:
# create data directory
data_dir = os.path.join('data', 'label-maker-geotiff')
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)

In [93]:
# scene specifications
item_id = '760818_4848718_2017-09-17_0e2f'
item_type = 'PSOrthoTile'
asset_type = 'visual'

## Label Local GeoTIFF

In this portion of the tutorial, we create labeled data using a local GeoTIFF downloaded from Planet.

### Download Scene

There are a few steps involved in order to download an asset (in our case, the visual scene) using the Planet Python Client:

* **Get Asset:** Get a desscription of our asset based on the specifications we're looking for
* **Activate Asset:** Activate the asset with the given description
* **Wait Asset:** Wait for the asset to be activated
* **Download Asset:** Now our asset is ready for download!

Let's go through these steps below.

In [94]:
async with Session() as sess:
    cl = DataClient(sess)
    # Get Asset
    asset_desc = await cl.get_asset(item_type_id=item_type,item_id=item_id, asset_type_id=asset_type)
    # Activate Asset
    await cl.activate_asset(asset=asset_desc)
    # Wait Asset
    await cl.wait_asset(asset=asset_desc)
    # Download Asset
    asset_path = await cl.download_asset(asset=asset_desc, directory=data_dir, overwrite=True)

data/label-maker-geotiff/760818_4848718_2017-09-17_0e2f_RGB_Visual.tif: 100%|█| 90.0k/90.0k [00


In [95]:
asset_desc

{'_links': {'_self': 'https://api.planet.com/data/v1/assets/eyJpIjogIjc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsICJjIjogIlBTT3J0aG9UaWxlIiwgInQiOiAidmlzdWFsIiwgImN0IjogIml0ZW0tdHlwZSJ9',
  'activate': 'https://api.planet.com/data/v1/assets/eyJpIjogIjc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsICJjIjogIlBTT3J0aG9UaWxlIiwgInQiOiAidmlzdWFsIiwgImN0IjogIml0ZW0tdHlwZSJ9/activate',
  'type': 'https://api.planet.com/data/v1/asset-types/visual'},
 '_permissions': ['download'],
 'expires_at': '2022-12-06T16:34:02.874778',
 'location': 'https://api.planet.com/data/v1/download?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJaallKYTlaX0tNd3ZCNjRqd0YteDdLTUFjRFgwdVFEQ2ktaXowZHhvXzhMNVNoLW5mN01NLUxLY0daVzRsWGZUVWJDcWFxWnlOZEJFZmZqQ0xrY0tWUT09IiwiZXhwIjoxNjcwMzQ0NDQyLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsIml0ZW1fdHlwZV9pZCI6IlBTT3J0aG9UaWxlIiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.ysiUDEvZn3EmkvHZAJCARPAIAo58PTfHUY3iwJgZDQgVvJiJrZr0MShy_SIWVrBsiOj2HlTi

In [41]:
# Here's the relative path to our newly-downloaded asset!
path_str = str(asset_path)
path_str

'data/label-maker-geotiff/760818_4848718_2017-09-17_0e2f_RGB_Visual.tif'

### Create Label Maker Config File

Label maker is behavior is specified through a configuration file. The configuration file we use in this tutorial was pulled from the label-maker [tutorial on mapping buildings in Vietnam](https://developmentseed.org/blog/2018/01/19/sagemaker-label-maker-case/) and then customized to utilize a local GeoTIFF. The imagery url is set to the GeoTIFF filename. We also changed the bounds to an area of interest fully contained within the GeoTIFF. This is because I am not sure how label maker handles masked pixels.

See the label-maker [README](https://github.com/developmentseed/label-maker/blob/master/README.md) for a description of the config entries.

In [18]:
# define AOI
bounds_geom = {'type': 'Polygon',
 'coordinates': [[[105.81775409169494, 20.84015810005586],
   [105.9111433289945, 20.84015810005586],
   [105.9111433289945, 20.925748489914824],
   [105.81775409169494, 20.925748489914824],
   [105.81775409169494, 20.84015810005586]]]}

bounding_box = shape(bounds_geom).bounds
bounding_box

(105.81775409169494, 20.84015810005586, 105.9111433289945, 20.925748489914824)

In [45]:
# define location relative to data_dir
geotiff_filename = '760818_4848718_2017-09-17_0e2f_RGB_Visual.tif'

# create config file
local_config = {
    "country": "vietnam",
    "bounding_box": bounding_box,
    "zoom": 17,
    "classes": [
    { "name": "Buildings", "filter": ["has", "building"] }
    ],
    "imagery": geotiff_filename,
    "background_ratio": 1,
    "ml_type": "classification"
}

# define project files and folders
local_config_name = 'config_local.json'
local_config_filename = os.path.join(data_dir, local_config_name)

# write config file
with open(local_config_filename, 'w') as cfile:
    cfile.write(json.dumps(local_config))

print('wrote config to {}'.format(local_config_filename))

wrote config to data/label-maker-geotiff/config_local.json


### Run label-maker

In this section, we use label-maker to download and prepare the OSM label data and tile the GeoTIFF.

For more details on running label-maker, see the [README](https://github.com/developmentseed/label-maker/blob/master/README.md).

In [43]:
!cd $data_dir && label-maker download --config $local_config_name

zsh:1: command not found: label-maker


In [8]:
!cd $data_dir && label-maker labels --config $local_config_name

Determining labels for each tile
---
Buildings: 18 tiles
Total tiles: 1225
Writing out labels to data/labels.npz


In [9]:
# skip preview because it fails due to imagery-offset arg
# https://github.com/developmentseed/label-maker/issues/79
# !cd $data_dir && label-maker preview -n 3 --config $local_config_name

In [34]:
tiles_dir = os.path.join(data_dir, 'data', 'tiles')
print(tiles_dir)

data/label-maker-geotiff/data/tiles


In [35]:
!ls $tiles_dir

In [37]:
# clean out tiles directory if it exists
!rm -rf $tiles_dir

In [38]:
%time !cd $data_dir && label-maker images --config $local_config_name

Downloading 36 tiles to data/tiles
CPU times: user 40 ms, sys: 30 ms, total: 70 ms
Wall time: 3.5 s


In [39]:
# look at three tiles that were generated
num_tiles = 3
for img in os.listdir(tiles_dir)[:num_tiles]:
    img_filename = os.path.join(tiles_dir, img)
    print(img_filename)
    display(Image(filename=img_filename))

data/label-maker-geotiff/data/tiles/104064-57754-17.jpg


<IPython.core.display.Image object>

data/label-maker-geotiff/data/tiles/104087-57749-17.jpg


<IPython.core.display.Image object>

data/label-maker-geotiff/data/tiles/104080-57759-17.jpg


<IPython.core.display.Image object>

In [40]:
# will not be able to open image tiles that weren't generated because the label tiles contained no classes
!cd $data_dir && label-maker package --config $local_config_name

Saving packaged file to data/data.npz


### Check labeled data package

In [41]:
data_file = os.path.join(data_dir, 'data', 'data.npz')
data = np.load(data_file)

In [42]:
for k in data.keys():
    print('data[\'{}\'] shape: {}'.format(k, data[k].shape))

data['x_train'] shape: (28, 256, 256, 3)
data['y_train'] shape: (28, 2)
data['x_test'] shape: (8, 256, 256, 3)
data['y_test'] shape: (8, 2)


28 x (image) and y (label) datasets were created in the train set, and 8 x and y datasets were created in the test set, adding up to 36 sets total. Not enough to train a classifier, but this is only one image in a daily image stream, so looking at an image stack would allow us to build up an excellent labeled training dataset quickly!

## Remote Cloud-Optimized GeoTIFF

In this portion of this tutorial, we are accessing a portion of the GeoTIFF directly from the download endpoint. This way we only download the pixels that we need.

For another tutorial covering accessing Planet COGs, see the [Download a Subarea tutorial](https://developers.planet.com/tutorials/download-a-subarea/).

### Activate Scene & Get Download URL

Because activations do not last very long, we need to activate our asset again, right before we access the scene. Let's run this again to activate our asset. This time, we'll set `overwrite=False` because we don't need to actually download our image again, just activate it to get the download URL.

In [96]:
async with Session() as sess:
    cl = DataClient(sess)
    # Activate Asset
    await cl.activate_asset(asset=asset_desc)
    # Wait Asset
    await cl.wait_asset(asset=asset_desc)
    # Download Asset
    asset_path = await cl.download_asset(asset=asset_desc, directory=data_dir, overwrite=False)

In [97]:
# Let's check out our asset description. The 'location' is the download URL we're looking for.
asset_desc

{'_links': {'_self': 'https://api.planet.com/data/v1/assets/eyJpIjogIjc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsICJjIjogIlBTT3J0aG9UaWxlIiwgInQiOiAidmlzdWFsIiwgImN0IjogIml0ZW0tdHlwZSJ9',
  'activate': 'https://api.planet.com/data/v1/assets/eyJpIjogIjc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsICJjIjogIlBTT3J0aG9UaWxlIiwgInQiOiAidmlzdWFsIiwgImN0IjogIml0ZW0tdHlwZSJ9/activate',
  'type': 'https://api.planet.com/data/v1/asset-types/visual'},
 '_permissions': ['download'],
 'expires_at': '2022-12-06T16:34:02.874778',
 'location': 'https://api.planet.com/data/v1/download?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJaallKYTlaX0tNd3ZCNjRqd0YteDdLTUFjRFgwdVFEQ2ktaXowZHhvXzhMNVNoLW5mN01NLUxLY0daVzRsWGZUVWJDcWFxWnlOZEJFZmZqQ0xrY0tWUT09IiwiZXhwIjoxNjcwMzQ0NDQyLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsIml0ZW1fdHlwZV9pZCI6IlBTT3J0aG9UaWxlIiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.ysiUDEvZn3EmkvHZAJCARPAIAo58PTfHUY3iwJgZDQgVvJiJrZr0MShy_SIWVrBsiOj2HlTi

In [98]:
download_url = asset_desc['location']

### Check downloading COG within bounding box

GDAL uses the vsicurl driver to access the COG. To envoke the vsicurl driver, we prepend the url with `/vsicurl/`.

In [122]:
!rio info 'https://api.planet.com/data/v1/download?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJaallKYTlaX0tNd3ZCNjRqd0YteDdLTUFjRFgwdVFEQ2ktaXowZHhvXzhMNVNoLW5mN01NLUxLY0daVzRsWGZUVWJDcWFxWnlOZEJFZmZqQ0xrY0tWUT09IiwiZXhwIjoxNjcwMzQ0NDQyLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsIml0ZW1fdHlwZV9pZCI6IlBTT3J0aG9UaWxlIiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.ysiUDEvZn3EmkvHZAJCARPAIAo58PTfHUY3iwJgZDQgVvJiJrZr0MShy_SIWVrBsiOj2HlTiZFHWc7D-94187Q'

{"blockxsize": 512, "blockysize": 512, "bounds": [571500.0, 2303500.0, 596500.0, 2328500.0], "colorinterp": ["red", "green", "blue", "alpha"], "compress": "lzw", "count": 4, "crs": "EPSG:32648", "descriptions": [null, null, null, null], "driver": "GTiff", "dtype": "uint8", "height": 8000, "indexes": [1, 2, 3, 4], "interleave": "pixel", "lnglat": [105.80791666321292, 20.942539124513797], "mask_flags": [["per_dataset", "alpha"], ["per_dataset", "alpha"], ["per_dataset", "alpha"], ["all_valid"]], "nodata": null, "res": [3.125, 3.125], "shape": [8000, 8000], "tiled": true, "transform": [3.125, 0.0, 571500.0, 0.0, -3.125, 2328500.0, 0.0, 0.0, 1.0], "units": [null, null, null, null], "width": 8000}


In [109]:
# check if COG url is valid
!gdalinfo $vsicurl_url

zsh:1: no matches found: /vsicurl/https://api.planet.com/data/v1/download?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJaallKYTlaX0tNd3ZCNjRqd0YteDdLTUFjRFgwdVFEQ2ktaXowZHhvXzhMNVNoLW5mN01NLUxLY0daVzRsWGZUVWJDcWFxWnlOZEJFZmZqQ0xrY0tWUT09IiwiZXhwIjoxNjcwMzQ0NDQyLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsIml0ZW1fdHlwZV9pZCI6IlBTT3J0aG9UaWxlIiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.ysiUDEvZn3EmkvHZAJCARPAIAo58PTfHUY3iwJgZDQgVvJiJrZr0MShy_SIWVrBsiOj2HlTiZFHWc7D-94187Q


In [115]:
# write geojson file
geojson_str = geojson.dumps(geojson.Feature(geometry=bounds_geom))
geojson_file = os.path.join(data_dir, 'bounds.geojson')
with open(geojson_file, 'w') as cfile:
    cfile.write(geojson_str)

In [116]:
output_file = os.path.join(data_dir, item_id + '_bounds.tif')

In [120]:
%time !gdalwarp -cutline $geojson_file -crop_to_cutline -overwrite $vsicurl_url $output_file

ERROR 1: An error occurred while creating a virtual connection to the DAP server:Error while reading the URL: https://api.planet.com/data/v1/download.ver.
The OPeNDAP server returned the following message:
Not Found: The data source or server could not be found.
        Often this means that the OPeNDAP server is missing or needs attention;
        Please contact the server administrator.
ERROR 1: Driver DODS is considered for removal in GDAL 3.5. You are invited to convert any dataset in that format to another more common one. If you need this driver in future GDAL versions, create a ticket at https://github.com/OSGeo/gdal (look first for an existing one first) to explain how critical it is for you (but the GDAL project may still remove it), and to enable it now, set the GDAL_ENABLE_DEPRECATED_DRIVER_DODS configuration option / environment variable to YES.
CPU times: user 59.5 ms, sys: 28.1 ms, total: 87.6 ms
Wall time: 3.78 s


In [118]:
# load local visual module
# autoreload because visual is in development

%load_ext autoreload
%autoreload 2

import visual

In [119]:
def load_rgb(filename):
    with rasterio.open(filename, 'r') as src:
        # visual band ordering: red, green, blue, alpha
        r, g, b, a = src.read() 
        # mask wherever the alpha band is zero
        mask = a == 0  
    bands = [np.ma.array(band, mask=mask) for band in [r,g,b]]
    return bands

rgb_bands = load_rgb(output_file)
visual.plot_image(rgb_bands, title='Cropped Scene')

RasterioIOError: data/label-maker-geotiff/760818_4848718_2017-09-17_0e2f_bounds.tif: No such file or directory

### Create Config File

In [54]:
# create config file
config = local_config.copy()
config['imagery'] = download_url

# define project files and folders
config_filename = os.path.join(data_dir, 'config.json')

# write config file
with open(config_filename, 'w') as cfile:
    cfile.write(json.dumps(config))

print('wrote config to {}'.format(config_filename))

wrote config to data/label-maker-geotiff/config.json


### Run Label Maker

The only label maker commands that interact with the imagery are `preview` and `images`. We have already run `download` and `label` above, so we don't need to run them again. We do, however, need to clear out the tiles directory (created by `images`) of the tiles created from the local GeoTIFF.

NOTE: This section requires the container be from the `planet-notebooks:label` image. See introduction for building instructions.

In [55]:
# skip preview because it fails due to imagery-offset arg
# https://github.com/developmentseed/label-maker/issues/79
# !cd $data_dir && label-maker preview -n 3

In [56]:
# clear tiles directory
!cd $tiles_dir && rm -R *

In [59]:
# download image tiles
# Note: if this doesn't work, there are two possibilities:
# 1. the activation code has timed out. re-activate if it has been over an hour
# 2. this notebook is not being run in planet-notebooks:label image, which implements a fix to label-maker.
!cd $data_dir && label-maker images

Downloading 36 tiles to data/tiles


In [60]:
# look at three tiles that were generated
num_tiles = 3
for img in os.listdir(tiles_dir)[:num_tiles]:
    img_filename = os.path.join(tiles_dir, img)
    print(img_filename)
    display(Image(filename=img_filename))

data/label-maker-geotiff/data/tiles/104095-57761-17.jpg


<IPython.core.display.Image object>

data/label-maker-geotiff/data/tiles/104070-57772-17.jpg


<IPython.core.display.Image object>

data/label-maker-geotiff/data/tiles/104080-57759-17.jpg


<IPython.core.display.Image object>