## Supervised ML on Descartes Labs Platform: Deploying a Random Forest Classifier
__________________
This example demonstrates how to deploy a simple supervised classifier model and save its results.

The general steps covered in this notebook are:
* Create a new [`Product`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/product.html#descarteslabs.catalog.Product) and associated [`Classified Band`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/band.html#descarteslabs.catalog.ClassBand)
* Split up the training AOI into tiles
* Define an asynchronous [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) which iterates over each tile and:
    * Retrieves the saved ML model from [02b Training a Supervised Classifier.ipynb](02b%20Training%20a%20Supervised%20Classifier.ipynb)
    * Searches and rasters NAIP imagery as an ndarray
    * Runs inference on retrieved imagery
    * Stores predictions as a new [`Image`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/image.html#descarteslabs.catalog.Image)


Optionally move on to [02d Interactive Deployment with Dynamic Compute.ipynb](02d%20Interactive%20Deployment%20with%20Dynamic%20Compute.ipynb) where we will use [`Dynamic Compute`](https://docs.descarteslabs.com/api/dynamic-compute.html) to visualize results and define new areas for model inference interactively.

_Note:_ In order to run this example you must first complete the steps outlined in [02a Generate Training Data.ipynb](02a%20Generate%20Training%20Data.ipynb) and [02b Training a Supervised Classifier.ipynb](02b%20Training%20a%20Supervised%20Classifier.ipynb).

In [None]:
import descarteslabs as dl
from descarteslabs.catalog import (
    Blob,
    ClassBand,
    Image,
    OverviewResampler,
    Product,
    properties as p,
)
from descarteslabs.compute import Function, Job
from descarteslabs.vector import Table

In [None]:
import geopandas as gpd

from datetime import datetime
from shapely.geometry import box

import matplotlib.pyplot as plt

%matplotlib inline

We define global variables for use throughout this example: the NAIP product ID, a list of bands, a start and end date, the resolution, and a name for our function:

In [None]:
pid = "usda:naip:v1"
bands = ["nir", "red", "green"]
start = "2020-01-01"
end = "2021-01-01"
resolution = 1.0  # meters
func_name = f"Run RFC Model Inference {datetime.today().strftime('%Y-%m-%d')}"

And current user namespace information:

In [None]:
org = dl.auth.Auth().payload["org"]
user_id = dl.auth.Auth().namespace

## Tiling Input AOI
Here we will retrieve the input training features and split the extent into tiles:

In [None]:
table_id = "descarteslabs:austin-landcover-training-data"
table = Table.get(table_id)
gdf = table.collect()

In [None]:
gdf_geom = box(*gdf["geometry"].total_bounds)
dltiles = dl.geo.DLTile.from_shape(
    gdf_geom, resolution=resolution, tilesize=2048, pad=0
)
len(dltiles)
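
To build intuition for the `tilesize` and `resolution` arguments, the sketch below computes the ground footprint of one tile edge and a rough upper bound on how many tiles could cover a bounding box. The box dimensions are made-up values and `max_tiles` is a hypothetical helper; it assumes a regular grid with no padding, so the count returned by `DLTile.from_shape` remains the source of truth.

```python
import math


def tile_side_m(tilesize, resolution):
    """Ground length of one unpadded tile edge: pixels times meters per pixel."""
    return tilesize * resolution


def max_tiles(width_m, height_m, tilesize, resolution):
    """Upper bound on tiles touching a box of the given size on a regular grid.

    A span of length w can overlap at most ceil(w / step) + 1 grid cells,
    because the box is generally not aligned with the grid.
    """
    step = tile_side_m(tilesize, resolution)
    cols = math.ceil(width_m / step) + 1
    rows = math.ceil(height_m / step) + 1
    return cols * rows


side = tile_side_m(2048, 1.0)
# Illustrative 10 km x 8 km bounding box (not the actual training AOI)
estimate = max_tiles(10_000, 8_000, 2048, 1.0)
print(side, estimate)
```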

In [None]:
fig, ax = plt.subplots(figsize=(5, 5))
dltile_gdf = gpd.GeoDataFrame(
    {
        "geometry": [dltile.geometry for dltile in dltiles],
    },
    crs=4326,
)
dltile_gdf.plot(ax=ax, facecolor="none", edgecolor="grey", linewidth=0.5)
gdf.plot(ax=ax, column="category")

## Creating an Output Product
Next we create a new product by first setting a unique ID:

In [None]:
rfc_pid = f"{org}:training-rfc-test-outputs"

#### _Note on Product Creation:_
We do not always need to delete and overwrite our product on every iteration as in the following cell. This notebook is designed for demonstration purposes, where we do not care about preserving each prior product.

In practice, as long as your product has a **unique** ID you may ignore the next cell and skip to the following.

In [None]:
try:
    rfc_product = Product.get(rfc_pid)
    print("Product already exists, deleting old iteration")
    status = rfc_product.delete_related_objects()
    if status:
        status.wait_for_completion()
    print("Deleted related objects")
    rfc_product.delete()
    print("Deleted Product")
except Exception:
    # Avoid a bare except so KeyboardInterrupt and SystemExit still propagate
    print("No Product exists")

Here we'll create our output product:

In [None]:
# Creating catalog product
rfc_product = Product.get_or_create(rfc_pid)
rfc_product.name = "Testing RFC Outputs"
rfc_product.tags = ["examples"]
rfc_product.readers = []
rfc_product.save()
rfc_product

Next we create a single band, which includes:
* Unique ID
* Data type
* Valid data range
* Display range
* No data value
* Resolution

In [None]:
# Get min/max of classes for the Band
class_values = gdf["category_int"].unique()

preds_min = int(class_values.min())
preds_max = int(class_values.max())

In [None]:
# Creating a band
band = ClassBand.get_or_create(
    id=f"{rfc_product.id}:class",
    band_index=0,
    data_type=dl.catalog.DataType.BYTE,
    data_range=[preds_min, preds_max],
    display_range=[preds_min, preds_max],
    nodata=preds_max + 1,
    colormap_name="terrain",
    resolution=dl.catalog.Resolution(
        value=resolution, unit=dl.catalog.ResolutionUnit.METERS
    ),
)
band.save()
band
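
Because the band is stored as `BYTE` (unsigned 8-bit, 0 to 255), a `nodata` value of `preds_max + 1` only works while it stays inside that range. The check below is a minimal sketch of that constraint; `check_byte_band` is a hypothetical helper and the class values are illustrative, not read from the training table.

```python
def check_byte_band(class_values):
    """Return (data_min, data_max, nodata) for a uint8 class band.

    Raises ValueError if the classes or the nodata value (max + 1)
    fall outside the unsigned byte range 0..255.
    """
    data_min, data_max = min(class_values), max(class_values)
    nodata = data_max + 1
    if data_min < 0 or nodata > 255:
        raise ValueError(
            f"values {data_min}..{nodata} do not fit in an unsigned byte"
        )
    return data_min, data_max, nodata


# Illustrative class codes only
band_range = check_byte_band([0, 1, 2, 3, 4])
print(band_range)
```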

## Defining the Batch Compute Function
Now we can define a self-contained Python function which writes our outputs to our new product. The inputs here are:

* A DLTile key
* Output Product ID

The overall steps are as follows:

* Re-create a tile from the passed key
* Retrieve our trained classifier as a blob
* Search NAIP using Catalog over our tile
* Mosaic the returned imagery as a numpy array
* Run predict from our retrieved model
* Create and upload our predictions to the output product

In [None]:
def write_rfc_to_catalog(dltile_key, product_id):
    import descarteslabs as dl
    import os, pickle
    import numpy as np
    from descarteslabs.catalog import (
        Blob,
        Product,
        Image,
        OverviewResampler,
        properties as p,
    )

    org = dl.auth.Auth().payload["org"]
    namespace = dl.auth.Auth().namespace

    print(f"Processing: {dltile_key}")
    # Re-create the DLTile from its key
    dltile = dl.geo.DLTile.from_key(dltile_key)

    blob = Blob.get(namespace=f"{org}:{namespace}", name="training_rfc_model")
    blob.download("rfc.pickle")
    print("Retrieved Classifier")

    # Use a context manager so the file handle is closed after loading
    with open("rfc.pickle", "rb") as f:
        clf = pickle.load(f)

    pid = "usda:naip:v1"
    bands = ["nir", "red", "green"]
    start = "2020-01-01"
    end = "2021-01-01"

    print("Searching Images...")
    # Search NAIP imagery intersecting this DLTile
    naip_ic = (
        Product.get(pid)
        .images()
        .intersects(dltile)
        .filter(start <= p.acquired < end)
        .sort("acquired")
        .limit(None)
    ).collect()
    print(naip_ic)

    naip_arr = naip_ic.mosaic(
        bands=bands,
        bands_axis=-1,
    )
    print("Rastered imagery")
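    # Reshaping for sklearn: (rows, cols, bands) -> (pixels, bands)
    nrows, ncols, nbands = naip_arr.shape
    in_ras_arr = naip_arr.reshape(-1, nbands)

    # Predict one class per pixel, then restore the 2D tile shape
    preds = clf.predict(in_ras_arr).reshape(nrows, ncols)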
    print("Predictions Complete")
    # Getting the output Product from the product_id argument
    out_product = dl.catalog.Product.get_or_create(product_id)

    # Creating an image - note the required unique id corresponding to the DLTile
    image = Image(
        product=out_product,
        id="{}:class_{}".format(out_product.id, dltile_key.replace(":", "_")),
    )
    print(f"Writing to: {image.id}")
    # Setting image geotransform + projection from dltile info
    image.geotrans = dltile.geotrans
    image.projection = dltile.proj4
    image.acquired = "2022-08-09"  # Make sure this is accurate

    upload = image.upload_ndarray(
        ndarray=preds.astype("uint8"),
        overviews=[2, 4, 8, 16, 32, 64],
        overview_resampler=OverviewResampler.NEAREST,
        overwrite=True,
    )
    upload.wait_for_completion()
    print("complete")
    os.remove("rfc.pickle")
    return image.id
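
The reshaping step inside the function can be seen in isolation with plain NumPy. This is a sketch under stated assumptions: `BrightestBandClassifier` is a hypothetical stand-in for the pickled model (it only mimics a `predict` method), and the random array stands in for a rastered tile with the band axis last.

```python
import numpy as np


class BrightestBandClassifier:
    """Stand-in model: labels each pixel with the index of its largest band."""

    def predict(self, X):
        # X has shape (n_pixels, n_bands); return one label per pixel
        return np.argmax(X, axis=1)


rows, cols, n_bands = 4, 3, 3
rng = np.random.default_rng(0)
tile = rng.integers(0, 255, size=(rows, cols, n_bands))

# Flatten the spatial dimensions so each row is one pixel's band values
pixels = tile.reshape(-1, n_bands)

# Predict per pixel, then restore the original 2D layout
labels = BrightestBandClassifier().predict(pixels)
class_map = labels.reshape(rows, cols).astype("uint8")
print(class_map.shape, class_map.dtype)
```

Because `reshape` preserves row-major order in both directions, the label at `class_map[i, j]` corresponds to the pixel at `tile[i, j]`.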

Now we format our input arguments:

In [None]:
args = [(dltile.key, rfc_pid) for dltile in dltiles]
args[0]
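
Each tuple carries the DLTile key that the function also uses to build the output image ID. The sketch below isolates that string formatting so the resulting ID is easy to inspect; `image_id_for_tile` is a hypothetical helper, and both the product ID and the key are illustrative values rather than real ones.

```python
def image_id_for_tile(product_id, dltile_key):
    """Build a per-tile image ID, swapping the key's colons for underscores."""
    return "{}:class_{}".format(product_id, dltile_key.replace(":", "_"))


# Illustrative values only; a real key comes from `dltile.key`
example_key = "2048:0:1.0:14:300:1630"
example_id = image_id_for_tile("my-org:training-rfc-test-outputs", example_key)
print(example_id)
```

Since every tile has a distinct key, each tile maps to a distinct image ID, and rerunning a tile targets the same image.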

And we can test our function locally:

In [None]:
img_id = write_rfc_to_catalog(*args[0])
img = Image.get(img_id)
ndarr = img.ndarray("class")
plt.imshow(ndarr[0])

Once we are happy with the performance of our function we can save it to our Compute service.

Note here that we must pass scikit-learn as a requirement:

In [None]:
async_func = Function(
    write_rfc_to_catalog,
    name=func_name,
    image="python3.9:latest",
    cpus=1,
    memory=2,
    timeout=900,
    maximum_concurrency=20,
    retry_count=1,
    requirements=["descarteslabs-vector", "geopandas", "scikit-learn"],
)

async_func.save()
print(f"Saved {async_func.id}")

**_Take note of your Function ID!_**

And finally map args to our Function to return a set of jobs:

In [None]:
jobs = async_func.map(args)
len(jobs)
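
Conceptually, `map` fans the list of argument tuples out into one job per tuple, with each tuple unpacked as positional arguments. The following is only a local, synchronous analogue of that fan-out, not how Compute schedules work; `run_locally` and `fake_task` are hypothetical names used for illustration.

```python
def run_locally(func, args):
    """Call func once per argument tuple, unpacking each tuple positionally."""
    return [func(*a) for a in args]


def fake_task(dltile_key, product_id):
    """Placeholder for the real function; just reports what it received."""
    return f"{product_id} <- {dltile_key}"


# Illustrative (key, product ID) tuples
demo_args = [("key-a", "org:prod"), ("key-b", "org:prod")]
results = run_locally(fake_task, demo_args)
print(results)
```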

Navigate to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) to track your progress! Or wait programmatically via:

In [None]:
# async_func.wait_for_completion()

Once this function completes, you can navigate to [Explorer](https://app.descarteslabs.com/explorer) to view your results, or move on to [02d Interactive Deployment with Dynamic Compute.ipynb](02d%20Interactive%20Deployment%20with%20Dynamic%20Compute.ipynb), where we will use [`Dynamic Compute`](https://docs.descarteslabs.com/api/dynamic-compute.html) to visualize results and define new areas for model inference interactively.