## Unsupervised ML on the Descartes Labs Platform: Deploying a KMeans Classifier with Batch Compute
This notebook demonstrates a typical way to deploy an ML model using the Descartes Labs Platform APIs. The general steps covered in this notebook are:
* Create a new product to store results, including a single classified band
* Split up our larger study area into DLTiles
* Define and submit an asynchronous Batch Compute Function which runs once per tile and:
    * Searches corresponding Sentinel-2 data
    * Retrieves the saved ML model
    * Runs predictions on the scene's imagery
    * Stores the predictions as a new image

Optionally, move on to [04c Interactive Deployment with Dynamic Compute.ipynb]() for more ways to interface with our APIs.

In [None]:
import descarteslabs as dl
from descarteslabs.catalog import Blob, ClassBand, Image, Product, properties as p
from descarteslabs.compute import Function, Job
from descarteslabs.vector import Table

In [None]:
import os, pickle
import numpy as np

from datetime import datetime

import matplotlib.pyplot as plt

Global variables:

In [None]:
org = dl.auth.Auth().payload["org"]
user_id = dl.auth.Auth().namespace

In [None]:
func_name = f"Run KMeans Model Inference {datetime.today().strftime('%Y-%m-%d')}"
s2_pid = "esa:sentinel-2:l2a:v1"
bands = ["nir", "red", "green"]
resolution = 10.0
n_classes = 5

If we have already run this example in the past, delete the existing Product and everything attached to it:

In [None]:
kmeans_pid = f"{org}:kmeans-results-{user_id}"

In [None]:
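# Product.get returns None when the product does not exist, so nothing
# is created just to be deleted and real errors are not swallowed.
kmeans_product = Product.get(kmeans_pid)
if kmeans_product is None:
    print("No Product exists")
else:
    # Remove bands and images first; this may run as an asynchronous task
    status = kmeans_product.delete_related_objects()
    if status:
        status.wait_for_completion()
    kmeans_product.delete()
    print("Deleted")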

In [None]:
kmeans_product = Product.get_or_create(kmeans_pid)
kmeans_product.name = "Testing KMeans Outputs"
kmeans_product.tags = ["examples"]
kmeans_product.readers = []
kmeans_product.save()
kmeans_product

Next we create a classified band:

In [None]:
band = ClassBand.get_or_create(
    id=f"{kmeans_product.id}:class",
    band_index=0,
    data_type=dl.catalog.DataType.BYTE,
    data_range=[0, n_classes],
    display_range=[0, n_classes],
    nodata=n_classes + 1,
    resolution=dl.catalog.Resolution(
        value=resolution, unit=dl.catalog.ResolutionUnit.METERS
    ),
)
band.save()
band

Next we will search Vector for all counties in the US state of Vermont (state FIPS code 50):

In [None]:
counties_table = Table.get(
    "descarteslabs:hifld:us-counties",
    columns=["STATEFP", "NAME", "geometry"],
    property_filter=(p.STATEFP == "50"),
)
counties_gdf = counties_table.collect()
counties_gdf.plot()

Then call .dissolve() and pass the unioned geometry in for tiling:

In [None]:
geom = counties_gdf.dissolve()["geometry"][0]
dltiles = dl.geo.DLTile.from_shape(geom, resolution=resolution, tilesize=1024, pad=0)
len(dltiles)
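
As a rough sanity check on the tile count, note that a tile with `tilesize=1024`, `pad=0`, and a 10 m resolution covers 10.24 km per side. The sketch below is plain arithmetic and does not call the platform. The helper names and the 25,000 km² area are hypothetical round values chosen for illustration. The real count from `DLTile.from_shape` will be higher, because tiles along the boundary are only partly filled by the study area.

```python
import math


def tile_side_km(tilesize, resolution_m, pad=0):
    """Side length of one square tile in kilometres, including padding."""
    return (tilesize + 2 * pad) * resolution_m / 1000.0


def min_tiles_for_area(area_km2, tilesize, resolution_m):
    """Lower bound on the tiles needed to cover an area.

    Ignores partly filled tiles on the boundary, so the true count is larger.
    """
    side = tile_side_km(tilesize, resolution_m)
    return math.ceil(area_km2 / (side * side))


side = tile_side_km(tilesize=1024, resolution_m=10.0)
lower_bound = min_tiles_for_area(25_000, tilesize=1024, resolution_m=10.0)
print(f"Tile side: {side} km, at least {lower_bound} tiles")
```

If `len(dltiles)` is far below or far above an estimate like this, the geometry or the tiling parameters are worth a second look before submitting any jobs.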

Now we will define a Python function to submit to the Batch Compute service. The inputs here are:
* A DLTile key
* Our KMeans Product ID


The overall steps are as follows:
1. Retrieve and download our ML classifier from a blob
2. Re-create a DLTile from the passed key
3. Search Sentinel-2 using Catalog and our DLTile as the spatial intersection
4. Mosaic the returned ImageCollection
5. Run clf.predict() from our retrieved model
6. Create and upload our predictions as a new image in our output product

In [None]:
def write_kmeans_to_catalog(dltile_key, out_pid):
    import descarteslabs as dl
    import pickle, os
    import numpy as np
    from descarteslabs.catalog import (
        Blob,
        Product,
        Image,
        OverviewResampler,
        properties as p,
    )

    org = dl.auth.Auth().payload["org"]
    user_id = dl.auth.Auth().namespace

    blob = Blob.get(namespace=f"{org}:{user_id}", name="training_kmeans_model")

    blob.download("kmeans.pickle")

    print("Downloaded classifier")
    # Use a context manager so the file handle is closed after loading
    with open("kmeans.pickle", "rb") as f:
        clf = pickle.load(f)

    # Getting DLTile, finding scenes
    dltile = dl.geo.DLTile.from_key(dltile_key)

    s2_pid = "esa:sentinel-2:l2a:v1"
    bands = ["nir", "red", "green"]

    s2_prod = Product.get(s2_pid)
    search = s2_prod.images()
    ic = (
        search.intersects(dltile)
        .filter("2023-06-01" < p.acquired < "2023-09-01")
        .filter(p.cloud_fraction < 0.1)
        .limit(None)
    ).collect()

    print(ic)
    # Return early if no imagery matched the search
    if len(ic) == 0:
        print("No imagery here")
        return

    # Downloading as ndarray
    mosaic = ic.mosaic(
        bands=bands,
        geocontext=dltile,
        bands_axis=-1,
    )

    print("Retrieved imagery")

    # Reshaping for sklearn: one row per pixel, one column per band
    ny, nx, nbands = mosaic.shape
    in_data = mosaic.reshape(-1, nbands)

    # predicting
    preds = clf.predict(in_data).reshape(ny, nx)
    print("Ran predictions")

    out_product = Product.get_or_create(out_pid)
    print(f"Writing to {out_product.id}")

    # Creating an image - note the required unique id corresponding to the DLTile
    image = Image(
        product=out_product,
        id=f"{out_product.id}:{dltile_key.replace(':', '_')}",
    )
    print(f"Created {image.id}")

    # Setting image geotransform + projection from dltile info
    image.geotrans = dltile.geotrans
    image.projection = dltile.proj4
    image.acquired = "2023-11-15"  # Make sure this is accurate
    print("Uploading image")

    image.upload_ndarray(
        ndarray=preds.astype("uint8"),
        overviews=[2, 4, 8, 16, 32, 64],
        overview_resampler=OverviewResampler.NEAREST,
        overwrite=True,
    )

    print("Complete")

    # cleaning up
    os.remove("kmeans.pickle")
    return image.id
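
The reshape around `clf.predict()` is the step most likely to go wrong, so here is a minimal sketch of that round trip on synthetic data. It assumes a bands-last array, as produced by `bands_axis=-1`. The `predict` function and the `centers` array are hypothetical stand-ins for the fitted KMeans model (nearest centre by Euclidean distance), not the model stored in the blob.

```python
import numpy as np

# Synthetic stand-in for a mosaic with the bands on the last axis
ny, nx, nbands = 4, 3, 3
rng = np.random.default_rng(0)
mosaic = rng.random((ny, nx, nbands))

# Hypothetical cluster centres standing in for a fitted model
centers = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]])


def predict(samples, centers):
    """Assign each row of `samples` to its nearest centre."""
    diffs = samples[:, None, :] - centers[None, :, :]
    return np.linalg.norm(diffs, axis=-1).argmin(axis=1)


# Flatten to (n_pixels, n_bands): one row per pixel, one column per band
flat = mosaic.reshape(-1, nbands)

# Predict per pixel, then fold the labels back onto the image grid
preds = predict(flat, centers).reshape(ny, nx)

# With the default row-major order, pixel (row, col) is flat row * nx + col
row, col = 2, 1
single = predict(flat[row * nx + col][None, :], centers)[0]
print(preds.shape, preds[row, col] == single)
```

Because both reshapes use the same row-major order, each label lands back on the pixel it was computed from.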

Now we format a list of arguments to iterate over:

In [None]:
args = [(dltile.key, kmeans_pid) for dltile in dltiles]
len(args)

Testing things out locally:

In [None]:
img_id = write_kmeans_to_catalog(*args[0])

img = Image.get(img_id)
ndarray = img.ndarray("class")
plt.imshow(ndarray[0])
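
The local test is also a good moment to estimate how much memory one tile needs, which informs the `cpus` and `memory` values requested when the function is submitted. The sketch below is back-of-the-envelope arithmetic only. It assumes, as a labeled guess, that the raw imagery arrives as 16-bit integers and may be converted to 64-bit floats during prediction; check `mosaic.dtype` in a real run.

```python
def array_mib(ny, nx, nbands, bytes_per_value):
    """Size of a dense array in MiB."""
    return ny * nx * nbands * bytes_per_value / 2**20


# One 1024 x 1024 tile with three bands
raw_mib = array_mib(1024, 1024, 3, 2)    # assumed 16-bit input values
float_mib = array_mib(1024, 1024, 3, 8)  # possible 64-bit float copy
preds_mib = array_mib(1024, 1024, 1, 1)  # uint8 class labels

print(f"raw: {raw_mib} MiB, float copy: {float_mib} MiB, labels: {preds_mib} MiB")
```

Under these assumptions the arrays for a single tile occupy tens of MiB, so a small memory request leaves ample headroom for the Python runtime and libraries.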

Once we are happy with the performance of our function, we can submit it to the Batch Compute service:

In [None]:
async_func = Function(
    write_kmeans_to_catalog,
    name=func_name,
    image="python3.9:latest",
    cpus=1,
    memory=2,
    timeout=900,
    maximum_concurrency=50,
    retry_count=2,
    requirements=["scikit-learn"],
)
async_func.save()
print(f"Saved {async_func.id}")

__Take note of your Function ID!__

And map args to our Function:

In [None]:
jobs = async_func.map(args)
len(jobs)
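
Conceptually, `map` creates one job per tuple in `args` and passes each tuple to the function as positional arguments, just as the local test did with `*args[0]`. The sketch below is a purely local stand-in for that fan-out, not the Compute API. `fake_task`, `local_map`, the tile keys, and the product ID are all hypothetical; the keys only mimic the `tilesize:pad:resolution:zone:ti:tj` layout of a DLTile key.

```python
def fake_task(dltile_key, out_pid):
    """Hypothetical stand-in for the real task; only builds the image ID."""
    return f"{out_pid}:{dltile_key.replace(':', '_')}"


def local_map(func, arg_tuples):
    """Call func once per tuple, unpacking each tuple as positional arguments."""
    return [func(*arg_tuple) for arg_tuple in arg_tuples]


# Made-up tile keys and product ID, for illustration only
demo_keys = ["1024:0:10.0:18:64:480", "1024:0:10.0:18:65:480"]
demo_args = [(key, "my-org:kmeans-results-demo") for key in demo_keys]

demo_ids = local_map(fake_task, demo_args)
for image_id in demo_ids:
    print(image_id)
```

Since every tile key is distinct, each job writes to its own image ID, which is what lets jobs run concurrently, and be retried, without overwriting one another's output.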

Navigate to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) to track your progress!