## Supervised ML on Descartes Labs Platform: Deploying a Random Forest Classifier
__________________
This example demonstrates how to deploy a simple supervised classifier model and save its results.

The general steps covered in this notebook are:
* Create a new [`Product`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/product.html#descarteslabs.catalog.Product) and associated [`Classified Band`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/band.html#descarteslabs.catalog.ClassBand)
* Split up the training AOI into tiles
* Define an asynchronous [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) which iterates over each tile and:
    * Retrieves the saved ML model from [02b Training a Supervised Classifier.ipynb](02b%20Training%20a%20Supervised%20Classifier.ipynb)
    * Searches and rasters NAIP imagery as an ndarray
    * Runs inference on retrieved imagery
    * Stores predictions as a new [`Image`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/image.html#descarteslabs.catalog.Image)


Optionally move on to [02d Interactive Deployment with Dynamic Compute.ipynb](02d%20Interactive%20Deployment%20with%20Dynamic%20Compute.ipynb) where we will use [`Dynamic Compute`](https://docs.descarteslabs.com/api/dynamic-compute.html) to visualize results and interactively deploy this model to new AOIs.

**_Note:_** In order to run this example you must first complete the steps outlined in [02a Generate Training Data.ipynb](02a%20Generate%20Training%20Data.ipynb) and [02b Training a Supervised Classifier.ipynb](02b%20Training%20a%20Supervised%20Classifier.ipynb).

In [None]:
import datetime
import uuid
import yaml

import descarteslabs as dl
import descarteslabs.compute
import descarteslabs.vector as dl_vector
import geopandas as gpd
import matplotlib.pyplot as plt
import shapely.geometry as sgeom

First we load the global variables referenced throughout this example from `config.yaml`, including the NAIP product ID, a list of bands, a start and end date, the resolution, and a name for our function:

In [None]:
with open("config.yaml", "r") as file:
    config = yaml.load(file, yaml.FullLoader)

We also retrieve our training features:

In [None]:
table = dl_vector.Table.get(config["training_table_name"])
gdf = table.collect()

## Creating an Output Product
Next we create a new product by first setting a unique ID:

In [None]:
org = dl.auth.Auth().payload["org"]
result_product_id = f"{org}:training-rfc-test-outputs"

#### _Note on Product Creation:_
We do not always need to delete and overwrite our product on every iteration as in the following cell. This notebook is designed for demonstration purposes, where we do not care about preserving each prior product.

In practice, as long as your product has a **unique** ID you may ignore the next cell and skip to the following.

In [None]:
result_product = dl.catalog.Product.get_or_create(result_product_id)

if result_product.state == dl.catalog.DocumentState.SAVED:
    status = result_product.delete_related_objects()
    if status:
        status.wait_for_completion()
    result_product.delete()

result_product = dl.catalog.Product.get_or_create(result_product_id)
result_product.name = "Testing RFC Outputs"
result_product.tags = ["examples"]
result_product.readers = []
result_product.save()
result_product

Next we create a single band, which includes:
* Unique ID
* Data type
* Valid data range
* Display range
* No data value
* Resolution

In [None]:
preds_max = int(gdf.category_int.max())
preds_min = int(gdf.category_int.min())

In [None]:
# Creating a band
band = dl.catalog.ClassBand.get_or_create(
    id=f"{result_product.id}:class",
    band_index=0,
    data_type=dl.catalog.DataType.BYTE,
    data_range=[preds_min, preds_max],
    display_range=[preds_min, preds_max],
    nodata=preds_max + 1,
    colormap_name="terrain",
    resolution=dl.catalog.Resolution(
        value=config["resolution_m"], unit=dl.catalog.ResolutionUnit.METERS
    ),
)
band.save()
band
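
The band above stores class labels as `BYTE` (unsigned 8-bit) values and reserves `preds_max + 1` as the nodata value, so both the labels and the nodata value need to fit in the range 0 to 255. A minimal sketch of that check follows; `pick_nodata` is a hypothetical helper written for illustration, not part of the Descartes Labs client, and the labels in the example are made up.

```python
# Hypothetical helper, not part of the Descartes Labs client.
UINT8_MAX = 255  # largest value an unsigned 8-bit (BYTE) band can hold


def pick_nodata(preds_min: int, preds_max: int) -> int:
    """Return a nodata value one above the largest class label.

    Raises ValueError if the labels or the nodata value do not fit in uint8.
    """
    if preds_min < 0:
        raise ValueError(f"class labels must be non-negative, got {preds_min}")
    if preds_min > preds_max:
        raise ValueError("preds_min must not exceed preds_max")
    nodata = preds_max + 1
    if nodata > UINT8_MAX:
        raise ValueError(
            f"nodata value {nodata} does not fit in a BYTE band (max {UINT8_MAX})"
        )
    return nodata


# Example with made-up labels 0..4: the nodata value becomes 5.
print(pick_nodata(0, 4))
```

If the check fails, one option would be a wider data type for the band, with the array cast in the upload step changed to match.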

## Tiling Input AOI
Here we will once again split the extent of our input features into tiles:

In [None]:
gdf_geom = sgeom.box(*gdf["geometry"].total_bounds)
dltiles = dl.geo.DLTile.from_shape(
    gdf_geom, resolution=config["resolution_m"], tilesize=2048, pad=0
)
len(dltiles)

This time there is no need to filter out tiles that don't intersect features!

In [None]:
fig, ax = plt.subplots(figsize=(5, 5))
dltile_gdf = gpd.GeoDataFrame(
    {
        "geometry": [dltile.geometry for dltile in dltiles],
    },
    crs=4326,
)
dltile_gdf.plot(ax=ax, facecolor="none", edgecolor="grey", linewidth=0.5)
gdf.plot(ax=ax, column="category")
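
To get a feel for the tile size chosen above, the ground footprint of a tile can be worked out from `tilesize`, `pad`, and the resolution. The sketch below is plain arithmetic and does not call the platform; the function names and example numbers are assumptions for illustration, and the count is only a rough estimate because real tiles sit on a fixed grid.

```python
import math


def tile_side_m(resolution_m: float, tilesize: int, pad: int = 0) -> float:
    """Side length of one square tile in meters, with padding on both sides."""
    return (tilesize + 2 * pad) * resolution_m


def rough_tile_count(
    width_m: float, height_m: float, resolution_m: float, tilesize: int
) -> int:
    """Rough number of unpadded tiles needed to cover a width x height box.

    The real count can be higher when the box straddles extra grid rows
    or columns.
    """
    side = tilesize * resolution_m
    return math.ceil(width_m / side) * math.ceil(height_m / side)


# Illustrative numbers only: 1 m pixels and 2048-pixel tiles.
print(tile_side_m(1.0, 2048))                   # 2048.0
print(rough_tile_count(5000, 3000, 1.0, 2048))  # 6
```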

## Defining the Batch Compute Function
Now we can define a self-contained Python function that writes our outputs to our new product. The inputs here are:

* A DLTile key
* Output Product ID

The overall steps are as follows:

* Re-create a tile from the passed key
* Retrieve our trained classifier as a blob
* Search NAIP using Catalog over our tile
* Mosaic the returned imagery as a numpy array
* Run predict from our retrieved model
* Create and upload our predictions to the output product

In [None]:
def write_rfc_to_catalog(dltile_key, product_id):

    import datetime
    import os
    import pickle

    import descarteslabs as dl
    import numpy as np

    org = dl.auth.Auth().payload["org"]
    namespace = dl.auth.Auth().namespace

    print(f"Processing: {dltile_key}")
    # Getting DLTile, finding scenes
    dltile = dl.geo.DLTile.from_key(dltile_key)

    blob = dl.catalog.Blob.get(name="training_rfc_model")
    blob.download("rfc.pickle")
    print("Retrieved Classifier")

    # Load the pickled classifier, closing the file handle when done
    with open("rfc.pickle", "rb") as f:
        clf = pickle.load(f)

    pid = "usda:naip:v1"
    bands = ["nir", "red", "green"]
    start = "2020-01-01"
    end = "2021-01-01"

    print("Searching Images...")
    # NAIP over this DLTile
    naip_ic = (
        dl.catalog.Product.get(pid)
        .images()
        .intersects(dltile)
        .filter(start <= dl.catalog.properties.acquired < end)
        .sort("acquired")
        .limit(None)
    ).collect()
    print(naip_ic)

    naip_arr, raster_info = naip_ic.mosaic(bands=bands, bands_axis=-1, raster_info=True)
    print("Rastered imagery")
    # Reshaping for sklearn: flatten (rows, cols, bands) to one row per pixel
    n_rows, n_cols, n_bands = naip_arr.shape
    in_ras_arr = naip_arr.reshape(-1, n_bands)

    preds = clf.predict(in_ras_arr).reshape(n_rows, n_cols)
    print("Predictions Complete")
    # Getting the output product from the ID passed in as product_id
    out_product = dl.catalog.Product.get_or_create(product_id)

    # Creating an image - note the required unique id corresponding to the DLTile
    image = dl.catalog.Image(
        product=out_product,
        id="{}:class_{}".format(out_product.id, dltile_key.replace(":", "_")),
    )
    print(f"Writing to: {image.id}")

    image.acquired = datetime.datetime.now().isoformat()

    upload = image.upload_ndarray(
        ndarray=preds.astype("uint8"),
        raster_meta=raster_info,
        overviews=[2, 4, 8, 16, 32, 64],
        overview_resampler=dl.catalog.OverviewResampler.NEAREST,
        overwrite=True,
    )
    upload.wait_for_completion()
    print("Complete")
    # Cleaning up
    os.remove("rfc.pickle")
    return image.id
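
The reshape step in the function above is easier to follow on a small array. The sketch below uses a synthetic array in place of the NAIP mosaic and a hypothetical `toy_predict` function in place of `clf.predict`; only the reshaping logic is meant to carry over.

```python
import numpy as np

# Synthetic stand-in for the mosaic: 4 rows x 5 columns x 3 bands.
rows, cols, n_bands = 4, 5, 3
arr = np.arange(rows * cols * n_bands, dtype="uint8").reshape(rows, cols, n_bands)

# One row per pixel, one column per band: the (n_samples, n_features)
# layout that scikit-learn estimators expect.
pixels = arr.reshape(-1, n_bands)


def toy_predict(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for clf.predict: 1 where the first band exceeds 30."""
    return (features[:, 0] > 30).astype("uint8")


# One label per pixel, folded back onto the original grid.
preds = toy_predict(pixels).reshape(rows, cols)

print(pixels.shape)  # (20, 3)
print(preds.shape)   # (4, 5)
```

Because both reshapes use the same row-major order, the label at `preds[r, c]` corresponds to the band values at `arr[r, c]`.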

Now we format our input arguments:

In [None]:
args = [(dltile.key, result_product.id) for dltile in dltiles]
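
Each entry in `args` is a `(dltile_key, product_id)` tuple, and the function derives a distinct image ID from each key. The sketch below mirrors that ID format with plain strings; `image_id_for` is a hypothetical helper, and the product ID and tile keys are made-up values for illustration.

```python
def image_id_for(product_id: str, dltile_key: str) -> str:
    """Mirror the image ID format used in write_rfc_to_catalog."""
    return "{}:class_{}".format(product_id, dltile_key.replace(":", "_"))


# Made-up values for illustration only.
product_id = "my-org:training-rfc-test-outputs"
tile_keys = ["2048:0:1.0:15:-3:2275", "2048:0:1.0:15:-2:2275"]

args = [(key, product_id) for key in tile_keys]

# This is what *args[0] unpacks into when the function is called.
first_key, first_product = args[0]
print(image_id_for(first_product, first_key))

# Distinct tile keys give distinct image IDs.
image_ids = {image_id_for(pid, key) for key, pid in args}
print(len(image_ids) == len(args))
```

Since each tile maps to exactly one image ID, rerunning a tile targets the same image rather than creating a duplicate, which fits with the `overwrite=True` argument in the upload call.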

And we can test our function locally:

In [None]:
img_id = write_rfc_to_catalog(*args[0])

In [None]:
img = dl.catalog.Image.get(img_id)
ndarr = img.ndarray("class")
plt.imshow(ndarr[0])
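
Alongside the plot, a quick count of pixels per class can help confirm that the predictions look plausible. This is a minimal sketch on a synthetic array shaped like the retrieved band, with a leading band axis; the values are made up.

```python
import numpy as np

# Synthetic stand-in for img.ndarray("class"): shape (1, rows, cols).
ndarr = np.array([[[0, 0, 1], [2, 1, 0]]], dtype="uint8")

# Count how many pixels fall into each class label.
labels, counts = np.unique(ndarr[0], return_counts=True)
class_counts = {int(label): int(count) for label, count in zip(labels, counts)}
print(class_counts)  # {0: 3, 1: 2, 2: 1}
```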

Once we are happy with the performance of our function we can save it to our Compute service.

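Note here that we must pass the specific scikit-learn _version_ as a requirement. It should match the version used to train the model, since pickled scikit-learn models are not guaranteed to load correctly under a different version: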

In [None]:
async_func = dl.compute.Function(
    write_rfc_to_catalog,
    name=config["pred_func_name"],
    image="python3.9:latest",
    cpus=1,
    memory=2,
    timeout=900,
    maximum_concurrency=20,
    retry_count=1,
    requirements=["scikit-learn==1.3.2"],
)

async_func.save()
print(f"Saved {async_func.id}")

**_Take note of your Function ID!_**
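
On the scikit-learn pin in `requirements` above: rather than typing the version by hand, it can be read from the current environment, assuming this is the same environment the model was trained in. `pin_requirement` below is a hypothetical helper built on the standard library's `importlib.metadata`; it falls back to a given pin when the package is not installed.

```python
from importlib import metadata


def pin_requirement(package: str, fallback: str) -> str:
    """Return 'package==<installed version>', or `fallback` if not installed."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return fallback


# The fallback here is the pin used in this notebook.
requirements = [pin_requirement("scikit-learn", fallback="scikit-learn==1.3.2")]
print(requirements)
```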

Finally, we map our args to the Function, which returns a set of jobs:

In [None]:
jobs = async_func.map(args)
len(jobs)

Navigate to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) to track your progress! Or wait programmatically via:

In [None]:
# async_func.wait_for_completion()

Once this function completes, you can navigate to [Explorer](https://app.descarteslabs.com/explorer) to view your results or move on to [02d Interactive Deployment with Dynamic Compute.ipynb](02d%20Interactive%20Deployment%20with%20Dynamic%20Compute.ipynb) where we will use [`Dynamic Compute`](https://docs.descarteslabs.com/api/dynamic-compute.html) to visualize our new product and interactively deploy this model to new AOIs.