<a href="https://colab.research.google.com/github/agroimpacts/nmeo/blob/class%2Ff2023/materials/code/notebooks/planet_basemap_cluster_segment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Analyzing Planet tiles

In this exercise, we are going to perform some basic analyses on the Planet images we reprojected and retiled from NICFI quads over Malawi.  

We will calculate NDVI, perform a cluster analysis (unsupervised classification), and segment the image.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

## Set up

### Installs and imports

In [None]:
%%capture
%pip install leafmap
%pip install localtileserver
%pip install pysnic
%pip install rioxarray
%pip install rio-cogeo

Restart runtime to enable imports

In [None]:
root = '/content/gdrive'
import os
import sys
import re
from subprocess import run
from pathlib import Path
import pandas as pd
import leafmap.leafmap as leafmap
# import leafmap.foliumap as leafmap
import localtileserver
import numpy as np
import geopandas as gpd
import rioxarray as rxr

import rasterio
from rasterio.plot import show, reshape_as_raster, reshape_as_image
from rasterio.windows import Window
import random

from matplotlib import pyplot as plt
from pysnic.algorithms.snic import snic
from sklearn.cluster import KMeans

### Paths and files

In [None]:
proj_path = f"{root}/MyDrive/data/nmeo"  # main output path

quad_dir = f"{proj_path}/quads"  # for downloaded NICFI quads
tile_dir = f"{proj_path}/tiles"  # for output tiles
analyses_dir = f"{proj_path}/analyses"  # for analysis outputs
os.makedirs(analyses_dir, exist_ok=True)

tile_path = f"{proj_path}/inputs/malawi_tiles_buf179.geojson"

### Get Planet tiles

In [None]:
cog_tiles = [f"{tile_dir}/{tile}" for tile in os.listdir(tile_dir)
             if ".xml" not in tile]
cog_tiles

## Calculate a VI using xarray/rioxarray

Instead of rasterio, this time around we will work with the image using [`xarray`](https://docs.xarray.dev/en/stable/getting-started-guide/why-xarray.html) and [`rioxarray`](https://corteva.github.io/rioxarray/stable/readme.html).

Here we will simply read in one of the images, calculate NDVI from it, and plot.

In [None]:
img = rxr.open_rasterio(cog_tiles[0])

# calculate ndvi (cast to float first, so that integer bands do not wrap
# around where red > NIR)
nir = img[3].astype("float32")
red = img[2].astype("float32")
ndvi = (nir - red) / (nir + red)

fig, ax_arr = plt.subplots(1, 2, sharex=True, figsize=(20, 10))
ax1, ax2 = ax_arr.ravel()
img[[3,2,1]].plot.imshow(ax=ax1, vmin=0, vmax=3000)
ax1.set_title("NRG")
ndvi.plot.imshow(ax=ax2, add_colorbar=False, vmin=-1, vmax=1)
ax2.set_title("NDVI")
None
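NDVI is (NIR - red) / (NIR + red), so it should always fall between -1 and 1. One thing to watch for: if the bands are stored as unsigned integers (an assumption here; check `img.dtype`), the subtraction wraps around wherever red exceeds NIR. The sketch below uses made-up 2 x 2 values, not the Planet tile, to show the effect and the float cast that avoids it.

```python
import numpy as np

# Hypothetical 2x2 reflectance values stored as unsigned 16-bit integers
nir = np.array([[3000, 500], [2500, 800]], dtype=np.uint16)
red = np.array([[1000, 900], [500, 800]], dtype=np.uint16)

# Subtracting in uint16 wraps around where red > NIR (500 - 900 -> 65136)
wrapped = nir - red

# Casting to float first keeps negative differences intact
nir_f = nir.astype("float32")
red_f = red.astype("float32")
ndvi = (nir_f - red_f) / (nir_f + red_f)

print(wrapped[0, 1])
print(ndvi.round(2))
```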

## Cluster an image

Here we will use k-means clustering to perform an unsupervised classification of the imagery.

### Read in image

We are collecting just the innermost 2000 x 2000 pixels of the image, removing the overlap between each tile and its neighbors.


In [None]:
with rasterio.open(cog_tiles[0]) as src:
    window = Window(179, 179, 2000, 2000)
    dst_transform = src.window_transform(window)
    dst_meta = src.meta.copy()

    img = src.read(window=window)

# Same thing, with rioxarray (offsets match the 179-pixel window above)
# img = rxr.open_rasterio(cog_tiles[0])\
#     .isel(x=slice(179, 2179), y=slice(179, 2179))\
#     .load()

# get height and width of image
_, h, w = img.shape



### Reshape and sample the data

K-means needs the data arranged with one row per pixel and one column per band. For that we use a function to flatten the image.

In [None]:
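def get_flat(array):
    """Flatten a (band, row, col) array to (row * col, band)."""
    _, h, w = array.shape
    data = np.empty((h * w, len(array)))
    for i in range(len(array)):
        # each band becomes one column, pixels in row-major order
        data[:, i] = array[i, :, :].flatten()

    return data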
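The same flattening can be done without a loop. The sketch below uses a small made-up array rather than the Planet tile. It reshapes (band, row, col) to (pixel, band), then shows how a vector with one label per pixel folds back into (row, col), which is what happens to the cluster predictions later on.

```python
import numpy as np

# Hypothetical small image: 4 bands, 3 rows, 5 columns
array = np.arange(4 * 3 * 5).reshape(4, 3, 5)

# Collapse rows and columns into one axis, then put bands last
flat = array.reshape(array.shape[0], -1).T
print(flat.shape)  # (15, 4)

# Row 0 holds the four band values of the top-left pixel
print(flat[0])

# A vector with one value per pixel folds back to (rows, cols)
labels = np.arange(flat.shape[0])
label_img = labels.reshape(3, 5)
print(label_img.shape)  # (3, 5)
```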

And then apply it to the data. We then sample 1000 pixels from that dataset to train the model, as opposed to using all 4,000,000 pixels (each with 4 band values).

In [None]:
data = get_flat(img).tolist()
random.seed(1) # this makes sure the same random sample is collected
data_sample = random.sample(data, 1000)
print(f"{len(data)} pixels in image, {len(data_sample)} pixels in sample")
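Converting the array to a list lets `random.sample` draw whole rows, but it is slow for an image with millions of pixels. A possible alternative is to sample row indices with NumPy and index the array directly. This is only a sketch on made-up data, and it draws a different (though still reproducible) sample from the cell above.

```python
import numpy as np

# Hypothetical flattened image: 10,000 pixels by 4 bands
data = np.random.default_rng(0).integers(0, 3000, size=(10_000, 4))

# Draw 1000 distinct row indices, seeded for reproducibility
rng = np.random.default_rng(1)
idx = rng.choice(data.shape[0], size=1000, replace=False)
data_sample = data[idx]

print(data_sample.shape)  # (1000, 4)
```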

### Fit the model and make a prediction
Now let's fit the model. We will specify that there should be 7 clusters, or classes, identified.

In [None]:
# fit the model with a random_state value, to ensure reproducibility
model = KMeans(n_clusters=7, random_state=1)
model.fit(data_sample)

Run the prediction, applying the model to the full reshaped image data.

In [None]:
out = model.predict(data).reshape((h, w)).astype(np.uint8)

Let's write the image out to a geotiff. We first have to prepare the necessary metadata: the number of bands, the height and width, and the spatial transform of the image. Note that when we read in the tile image initially, we collected image metadata in `dst_meta`, and collected the spatial transform for the subset of the image we captured with the `Window` by applying the `window_transform` function.

We update the `dst_meta` object with those values, and then write out the geotiff.

In [None]:
i = 0
dst_meta["transform"] = dst_transform
dst_meta["height"] = h
dst_meta["width"] = w
dst_meta["count"] = 1 # only one output band
dst_meta["dtype"] = np.uint8 # reduces the size of image on disk
# dst_meta["nodata"] = -128

out_file = Path(analyses_dir) / f"cluster7_{i}.tif"
with rasterio.open(out_file, "w+", **dst_meta) as dst:
    dst.write(out, 1)
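The transform returned by `window_transform` is the tile's transform with its origin shifted to the upper-left corner of the window. For a north-up image the arithmetic can be sketched as below. The origin and pixel size are made-up values for illustration, not those of the Malawi tiles.

```python
# Hypothetical north-up tile: upper-left corner and pixel size (map units)
origin_x, origin_y = 500000.0, 8500000.0
pixel_size = 5.0

# Window offsets in pixels, as in Window(179, 179, 2000, 2000)
col_off, row_off = 179, 179

# x increases with column; y decreases with row in a north-up image
window_origin_x = origin_x + col_off * pixel_size
window_origin_y = origin_y - row_off * pixel_size

print(window_origin_x, window_origin_y)  # 500895.0 8499105.0
```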

### Have a look

We can look quickly at the output classification using `rasterio`'s `show` function.

In [None]:
show(rasterio.open(out_file))
None

Let's compare the clusters to the image using leafmap. Note that the cluster image might throw errors and fail to display. It worked previously, but now seems to raise an error related to more than one data type.

In [None]:
m = leafmap.Map()
m.add_basemap()
m.add_basemap("SATELLITE")
m.add_raster(cog_tiles[0], vmin=0, vmax=2500, layer_name="Tile")
m.add_raster(str(out_file), layer_name="Clusters", palette="Spectral",
             vmin=0, vmax=7, zoom_to_layer=True)
m

## Cluster all images

We are now going to make a larger model that can cluster all the images, and then apply that to cluster all of the collected tiles.

### Collect samples from each image

In [None]:
data_samples = []
for cog in cog_tiles:
    print(f"Processing {os.path.basename(cog)}")
    with rasterio.open(cog) as src:
        window = Window(179, 179, 2000, 2000)
        img = src.read(window=window)

    data = get_flat(img).tolist()
    random.seed(1) # this makes sure the same random sample is collected
    data_samples.extend(random.sample(data, 500))

### Fit the model

In [None]:
model = KMeans(n_clusters=7, random_state=1)
model.fit(data_samples)

### Make a prediction on each image

In [None]:
for i in range(len(cog_tiles)):
# for i in range(2):
    print(i)
    with rasterio.open(cog_tiles[i]) as src:
        window = Window(179, 179, 2000, 2000)
        dst_transform = src.window_transform(window)
        dst_meta = src.meta.copy()
        img = src.read(window=window)

    # get height and width of image
    _, h, w = img.shape

    # reshape
    data = get_flat(img).tolist()

    # predict
    out = model.predict(data).reshape((h, w)).astype(np.uint8)

    # write out
    dst_meta["transform"] = dst_transform
    dst_meta["height"] = h
    dst_meta["width"] = w
    dst_meta["count"] = 1 # only one output band
    dst_meta["dtype"] = np.uint8 # reduces the size of image on disk

    out_file = str(Path(analyses_dir) / f"cluster7_2_{i}.tif")
    with rasterio.open(out_file, "w+", **dst_meta) as dst:
        dst.write(out, 1)


### Mosaic and COGify the predictions


In [None]:
from rasterio.merge import merge

# get list of predictions
cluster_files = [
    f"{analyses_dir}/{clust}" for clust in os.listdir(analyses_dir)
    if "_2_" in clust
]

# read them into a list
files_to_mosaic = []
for file in cluster_files:
    src = rasterio.open(file)
    files_to_mosaic.append(src)

# mosaic/merge them
mosaic, out_trans = merge(files_to_mosaic)

# Update metadata and write to disk
dst_meta = src.meta.copy()
dst_meta.update({
    "height": mosaic.shape[1],
    "width": mosaic.shape[2],
    "transform": out_trans
    # "count": 1
})

out_file = str(Path(analyses_dir) / "cluster7_mosaic.tif")
with rasterio.open(out_file, "w", **dst_meta) as dst:
    dst.write(mosaic)

# close the input datasets now that the mosaic is written
for src in files_to_mosaic:
    src.close()


COGify

In [None]:
cmd = ['rio', 'cogeo', 'create', '-b', '1', str(out_file), str(out_file)]
p = run(cmd, capture_output=True)
msg = p.stderr.decode().split('\n')
print(f'...{msg[-2]}')

cmd = ['rio', 'cogeo', 'validate', str(out_file)]
p = run(cmd, capture_output = True)
msg = p.stdout.decode().split('\n')
print(f'...{msg[0]}')

### Inspect

In [None]:
import re
tids = [int(re.sub("tile", "", os.path.basename(tile).split("_")[0]))
        for tile in cog_tiles]
tiles = gpd.read_file(tile_path)

In [None]:
m = leafmap.Map()
m.add_basemap()
m.add_basemap("SATELLITE")
m.add_raster(str(out_file), layer_name="Clusters", vmin=0, vmax=7,
             palette="Spectral", zoom_to_layer=True)
m.add_gdf(tiles[tiles.tile.isin(tids)])
m

## Segment an image

We are going to use [SNIC](https://www.epfl.ch/labs/ivrl/research/snic-superpixels/), a segmentation algorithm, to segment the Planet tile.

We are going to test it on just one image, as it can run rather slowly. SNIC requires the image to be reshaped from (band, row, col) to (row, col, band). We will use just the NIR, red, and green channels.

As before we take a windowed read of the first input image, selecting bands 4,3,2 (note rasterio uses 1-based indexing for the bands), and apply the function `reshape_as_image` to place bands last.


In [None]:
with rasterio.open(cog_tiles[0]) as src:
    window = Window(179, 179, 2000, 2000)
    dst_transform = src.window_transform(window)
    dst_meta = src.meta.copy()
    img = src.read([4,3,2], window=window)
    print(f"rasterio shape: {img.shape}")

img = reshape_as_image(img)
print(f"shape needed for SNIC: {img.shape}")
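Moving the band axis from first to last is an axis reordering, which can be sketched in plain NumPy. The array here is a made-up placeholder, not the Planet tile.

```python
import numpy as np

# Hypothetical array in rasterio order: (band, row, col)
img = np.zeros((3, 4, 5))

# Move the band axis to the end: (row, col, band)
img_last = np.transpose(img, (1, 2, 0))
print(img_last.shape)  # (4, 5, 3)

# Reverse the operation to get back to rasterio order
img_first = np.transpose(img_last, (2, 0, 1))
print(img_first.shape)  # (3, 4, 5)
```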

Segment the image using SNIC, specifying that we want 2000 segments with a compactness of 10.

In [None]:
segmentation, _, centroids = snic(img, 2000, 10)

Write the segmented image to a geotiff.

In [None]:
dst_meta["transform"] = dst_transform
dst_meta["height"] = 2000
dst_meta["width"] = 2000
dst_meta["count"] = 1 # only one output band
dst_meta["dtype"] = np.int32 # segment IDs exceed the uint8 range; int32 is ample

i = 0
out_file = Path(analyses_dir) / f"segmentation_{i}.tif"
with rasterio.open(out_file, "w+", **dst_meta) as dst:
    dst.write(np.array(segmentation, dtype=np.int32), 1)

And plot

In [None]:
fig, ax_arr = plt.subplots(1, 2, figsize=(20, 20))
ax1, ax2 = ax_arr.ravel()
show(rasterio.open(cog_tiles[0]).read([4,3,2], window=window),
     adjust=True, ax=ax1)
show(segmentation, ax=ax2)
None

### On your own

The segmented image assigns a unique ID to each object it detects in the image. Try to work out how to combine the segments with the clusters so that you isolate the segments representing a distinct class (e.g. the ones that look most like crops).