# Unsupervised ML on the Descartes Labs Platform: Deploying a KMeans Classifier with Batch Compute
__________________

This notebook will demonstrate a typical example of how to deploy a machine learning model using Descartes Labs Platform APIs. 

The general steps covered in this notebook are:
* Create a new [`Product`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/product.html#descarteslabs.catalog.Product) to store results, including a single classified [`Band`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/band.html#descarteslabs.catalog.Band)
* Split up the state of Vermont into [`DLTile`](https://docs.descarteslabs.com/descarteslabs/geo/readme.html#descarteslabs.geo.DLTile)s, using the [`Vector API`](https://docs.descarteslabs.com/api/vector.html) to retrieve the input geometry
* Define and submit an asynchronous Batch Compute [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) that accepts a tile ID and Product ID as input arguments and:
    * Searches intersecting Sentinel-2 data for the input tile
    * Retrieves the saved machine learning model [`Blob`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/blob.html#descarteslabs.catalog.Blob)
    * Runs inference on the **nir**, **red**, and **green** bands
    * Saves the predictions as a new [`Image`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/image.html)

Optionally, move on to [01c Interactive Deployment with Dynamic Compute.ipynb](01c%20Interactive%20Deployment%20with%20Dynamic%20Compute.ipynb) to interactively deploy this model to new AOIs.

_Note:_ In order to run this example you must first complete the steps outlined in [01a Training an Unsupervised Classifier.ipynb](01a%20Training%20an%20Unsupervised%20Classifier.ipynb).

In [None]:
import descarteslabs as dl
from descarteslabs.catalog import Blob, ClassBand, Image, Product, properties as p
from descarteslabs.compute import Function
from descarteslabs.vector import Table

In [None]:
import os, pickle, sys
import numpy as np

from datetime import datetime

import matplotlib.pyplot as plt

First we define global variables for reference throughout this example, including the current user's org and user ID:

In [None]:
org = dl.auth.Auth().payload["org"]
user_id = dl.auth.Auth().namespace

We also define a unique name for our function and the product search metadata:

In [None]:
func_name = f"Run KMeans Model Inference {datetime.today().strftime('%Y-%m-%d')}"
s2_pid = "esa:sentinel-2:l2a:v1"
bands = ["nir", "red", "green"]
resolution = 10.0
n_classes = 5

In [None]:
major = sys.version_info.major
minor = sys.version_info.minor
compute_image = f"python{major}.{minor}:latest"
compute_image

## Creating an Output Product

Next we create a new product, starting with a unique ID:

In [None]:
kmeans_pid = f"{org}:kmeans-results-{user_id}"

#### **_Note on Product Creation:_** 
The following cell deletes and recreates the product on every run. This is not always necessary: the notebook is designed for demonstration purposes, where we do not care about preserving each prior product.

In practice, as long as your product has a **unique** ID, you may skip the next cell and continue with the one after it.

In [None]:
try:
    kmeans_product = Product.get(kmeans_pid)
    print("Product already exists, deleting old iteration")
    status = kmeans_product.delete_related_objects()
    if status:
        status.wait_for_completion()
    print("Deleted related objects")
    kmeans_product.delete()
    print("Deleted Product")
except Exception:
    # Avoid a bare except so KeyboardInterrupt and SystemExit still propagate
    print("No Product exists")

Here we'll create our output product:

In [None]:
kmeans_product = Product.get_or_create(kmeans_pid)
kmeans_product.name = "Testing KMeans Outputs"
kmeans_product.tags = ["examples"]
kmeans_product.readers = []
kmeans_product.save()
kmeans_product

Next we create a single [`ClassBand`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/band.html#descarteslabs.catalog.ClassBand), which includes:
* Unique ID
* Data type
* Valid data range
* Display range
* No data value
* Resolution

In [None]:
band = ClassBand.get_or_create(
    id=f"{kmeans_product.id}:class",
    band_index=0,
    data_type=dl.catalog.DataType.BYTE,
    data_range=[0, n_classes],
    display_range=[0, n_classes],
    nodata=n_classes + 1,
    resolution=dl.catalog.Resolution(
        value=resolution, unit=dl.catalog.ResolutionUnit.METERS
    ),
)
band.save()
band
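
As a quick sanity check on the band definition, the sketch below confirms that the class labels and the nodata value both fit in an unsigned 8-bit band and do not collide. It assumes, as with scikit-learn's KMeans, that a model with `n_classes` clusters predicts labels `0` through `n_classes - 1`. The variable names mirror the cell above, and nothing here touches the Catalog.

```python
import numpy as np

n_classes = 5
nodata = n_classes + 1  # same convention as the ClassBand above

# Labels a KMeans model with n_classes clusters can emit: 0 .. n_classes - 1
labels = np.arange(n_classes)

# Range of an unsigned 8-bit integer (0 .. 255)
byte_info = np.iinfo(np.uint8)
fits_in_byte = bool(
    byte_info.min <= labels.min() and max(labels.max(), nodata) <= byte_info.max
)

# The nodata value must not be one of the valid class labels
nodata_is_free = nodata not in labels

print(fits_in_byte, nodata_is_free)
```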

## Setting the Global Study Area

In the previous notebook [01a Training an Unsupervised Classifier.ipynb](01a%20Training%20an%20Unsupervised%20Classifier.ipynb) we trained our model over the Burlington area. Here we will retrieve all the counties in the state of Vermont as our global AOI over which we scale the model's inference.

We search for all counties with **STATEFP == 50**, Vermont's state FIPS code:

In [None]:
counties_table = Table.get(
    "descarteslabs:hifld:us-counties",
    columns=["STATEFP", "NAME", "geometry"],
    property_filter=(p.STATEFP == "50"),
)
counties_gdf = counties_table.collect()
counties_gdf.plot()

Next, call [`.dissolve()`](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.dissolve.html#geopandas.GeoDataFrame.dissolve) to merge the counties into a single geometry, then pass that geometry to [`DLTile.from_shape()`](https://docs.descarteslabs.com/descarteslabs/geo/readme.html#descarteslabs.geo.DLTile):

In [None]:
geom = counties_gdf.dissolve()["geometry"][0]
dltiles = dl.geo.DLTile.from_shape(geom, resolution=resolution, tilesize=1024, pad=0)
len(dltiles)

_Note on spatial tiling grids:_

This example was designed for demonstration purposes. You should modify the **resolution**, **tilesize**, and **pad** according to your input dataset.
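
To get a feel for how these three parameters interact, the sketch below computes the footprint of a single square tile. `tile_footprint` is a hypothetical helper, not part of the Descartes Labs client, and it assumes that `pad` adds that many pixels to every edge of the tile.

```python
def tile_footprint(resolution, tilesize, pad):
    """Return (pixels_per_side, metres_per_side, area_km2) for one square tile."""
    pixels_per_side = tilesize + 2 * pad          # padding applies to both edges
    metres_per_side = pixels_per_side * resolution
    area_km2 = (metres_per_side / 1000.0) ** 2
    return pixels_per_side, metres_per_side, area_km2


# Same settings as the tiling cell above
pixels, metres, area = tile_footprint(resolution=10.0, tilesize=1024, pad=0)
print(pixels, metres, round(area, 2))
```

With these settings each tile spans roughly 10 km on a side, so halving `tilesize` would produce about four times as many tiles (and jobs) over the same area.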

## Defining the Batch Compute Function

Now we can define a Python function to submit to the [`Batch Compute`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html) service. The inputs here are:
* A DLTile key
* Output Product ID


The overall steps are as follows:
1. Retrieve our trained classifier as a blob
2. Re-create a tile from the passed key
3. Search Sentinel-2 using Catalog and our tile as the spatial intersection
4. Mosaic the returned imagery as a numpy array
5. Run predict from our retrieved model
6. Create and upload our predictions to the output product
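
Step 1 relies on a pickle round trip. A minimal sketch of that pattern follows, using a plain dictionary as a stand-in for the trained classifier and a temporary directory instead of a Catalog blob; both are assumptions made so the snippet runs anywhere.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object works for this sketch
model = {"n_clusters": 5, "bands": ["nir", "red", "green"]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "kmeans.pickle")

    # Save the object to disk
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # A context manager closes the file handle even if loading fails
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

As with any pickle file, only load models that come from a source you trust.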

In [None]:
def write_kmeans_to_catalog(dltile_key, out_pid):
    import descarteslabs as dl
    import pickle, os
    import numpy as np
    from datetime import datetime
    from descarteslabs.catalog import (
        Blob,
        Product,
        Image,
        OverviewResampler,
        properties as p,
    )

    # Variables
    org = dl.auth.Auth().payload["org"]
    user_id = dl.auth.Auth().namespace

    s2_pid = "esa:sentinel-2:l2a:v1"
    bands = ["nir", "red", "green"]
    # Download blob
    blob = Blob.get(name="training_kmeans_model", namespace=f"{org}:{user_id}")
    blob.download("kmeans.pickle")

    print("Downloaded classifier")
    # Load classifier (the context manager closes the file handle)
    with open("kmeans.pickle", "rb") as f:
        clf = pickle.load(f)

    # Getting DLTile, finding imagery
    dltile = dl.geo.DLTile.from_key(dltile_key)

    s2_prod = Product.get(s2_pid)
    search = s2_prod.images()
    ic = (
        search.intersects(dltile)
        .filter("2023-06-01" < p.acquired < "2023-09-01")
        .filter(p.cloud_fraction < 0.1)
        .limit(None)
    ).collect()

    print(ic)

    # Return early with a message if no imagery matched the search
    if len(ic) == 0:
        print("No imagery here")
        return "No imagery"

    # Downloading as ndarray
    mosaic = ic.mosaic(
        bands=bands,
        geocontext=dltile,
        bands_axis=-1,
    )

    print("Retrieved imagery")

    # Reshaping for sklearn:
    ny, nx, nbands = mosaic.shape
    in_data = mosaic.reshape(-1, nbands)

    # predicting
    preds = clf.predict(in_data).reshape(ny, nx)
    print("Ran predictions")
    # Getting Product
    out_product = Product.get_or_create(out_pid)
    print(f"Writing to {out_product.id}")

    # Creating an image
    # note the required *unique* id corresponding to the DLTile
    image = Image(
        product=out_product,
        id=f"{out_product.id}:{dltile_key.replace(':', '_')}",
    )

    print(f"Created {image.id}")

    # Setting image geotransform + projection from dltile info
    image.geotrans = dltile.geotrans
    image.projection = dltile.proj4

    # Record today's date as the acquisition date of the prediction
    image.acquired = datetime.today().strftime("%Y-%m-%d")

    print("Uploading image")
    # Uploading as array
    upload = image.upload_ndarray(
        ndarray=preds.astype("uint8"),
        overviews=[2, 4, 8, 16, 32, 64],
        overview_resampler=OverviewResampler.NEAREST,
        overwrite=True,
    )
    # Waiting for completion
    upload.wait_for_completion()
    print("Complete")

    # cleaning up
    os.remove("kmeans.pickle")
    return image.id
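
The reshape steps inside the function can be checked in isolation. The sketch below flattens a toy `(ny, nx, nbands)` array into one row per pixel, classifies each row, and folds the labels back into the image grid. `predict_nearest_centroid` is a hypothetical stand-in for `clf.predict`, and the centroids and pixel values are made up for illustration.

```python
import numpy as np


def predict_nearest_centroid(pixels, centroids):
    """Hypothetical stand-in for clf.predict: index of the closest centroid per row."""
    # distances has shape (n_pixels, n_centroids)
    distances = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=-1)
    return distances.argmin(axis=1)


# Toy "mosaic" with bands on the last axis, as with bands_axis=-1
ny, nx, nbands = 2, 3, 3
mosaic = np.zeros((ny, nx, nbands))
mosaic[0, :, :] = 0.9  # top row is bright in every band

centroids = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

# One row per pixel, one column per band: (ny * nx, nbands)
in_data = mosaic.reshape(-1, nbands)

# Classify every pixel, then restore the (ny, nx) image layout
preds = predict_nearest_centroid(in_data, centroids).reshape(ny, nx)
print(preds)
```

Because `reshape` preserves row-major order in both directions, each label lands back on the pixel it was computed from.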

Now we format a list of arguments to iterate over:

In [None]:
args = [(dltile.key, kmeans_pid) for dltile in dltiles]
len(args)
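
Each argument tuple pairs a tile key with the output product ID, and the function derives a per-tile image ID from that pair. The sketch below reproduces the ID construction on its own. The product ID and tile keys are hypothetical, and the keys are assumed to follow a colon-separated `tilesize:pad:resolution:zone:ti:tj` layout.

```python
def image_id_for_tile(product_id, dltile_key):
    """Build a per-tile image ID by replacing ':' in the tile key with '_'."""
    return f"{product_id}:{dltile_key.replace(':', '_')}"


# Hypothetical product ID and tile keys, for illustration only
product_id = "my-org:kmeans-results-someuser"
tile_keys = ["1024:0:10.0:18:65:483", "1024:0:10.0:18:66:483"]

# Same (key, product ID) ordering as the args list above
example_args = [(key, product_id) for key in tile_keys]
image_ids = [image_id_for_tile(pid, key) for key, pid in example_args]

print(image_ids[0])
```

Since every tile key is distinct, the resulting image IDs are distinct too, so rerunning a tile overwrites its own image rather than another tile's.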

And test things out locally:

In [None]:
img_id = write_kmeans_to_catalog(*args[0])

img = Image.get(img_id)
ndarray = img.ndarray("class")
dl.utils.display(ndarray[0], size=5)

Once we are happy with the performance of our function we can save it to our Compute service. 

Note that scikit-learn must be listed as a requirement, since unpickling the classifier and calling `predict` both depend on it:

In [None]:
async_func = Function(
    write_kmeans_to_catalog,
    name=func_name,
    image=compute_image,
    cpus=1,
    memory=2,
    timeout=900,
    maximum_concurrency=50,
    retry_count=2,
    requirements=["scikit-learn"],
)
async_func.save()
print(f"Saved {async_func.id}")

**_Take note of your Function ID!_**

Finally, map `args` onto our [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function), which submits one [`Job`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Job) per argument tuple:

In [None]:
jobs = async_func.map(args)
len(jobs)
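
A local analogue of this mapping, using `itertools.starmap` from the standard library rather than the Compute service, is sketched below. `fake_task` is a hypothetical stand-in for the deployed function and does no I/O; the point is only that each tuple is unpacked into positional arguments, one call per tuple.

```python
from itertools import starmap


def fake_task(dltile_key, out_pid):
    """Hypothetical stand-in for the deployed function; does no I/O."""
    return f"{out_pid} <- {dltile_key}"


# Hypothetical (tile key, product ID) tuples, for illustration only
example_args = [
    ("1024:0:10.0:18:65:483", "my-org:kmeans-results-someuser"),
    ("1024:0:10.0:18:66:483", "my-org:kmeans-results-someuser"),
]

# One call per tuple, with the tuple unpacked into positional arguments
results = list(starmap(fake_task, example_args))
print(len(results))
```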

Navigate to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) to track your progress! Or wait programmatically via:

In [None]:
# async_func.wait_for_completion()

Once this function completes, you can navigate to [Explorer](https://app.descarteslabs.com/explorer) to view your results, or move on to [01c Interactive Deployment with Dynamic Compute.ipynb](01c%20Interactive%20Deployment%20with%20Dynamic%20Compute.ipynb) for more interactivity with [`Dynamic Compute`](https://docs.descarteslabs.com/api/dynamic-compute.html).