# Creating new Product Images with Batch Compute
__________________

The `Compute` module provides scalable, out-of-the-box resources to parallelize your computations across nearly any spatio-temporal scale. `Compute` lets you package and execute your Python code on nodes hosted on Descartes Labs' cloud infrastructure, with high-throughput access to imagery to fast-track your analyses.

In this example notebook, we will create a [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) that calculates NDVI from Sentinel-2 L2A imagery and writes it to a new product, then scale it over Yakima County in Washington. The Yakima Valley contains ~75% of the total US hop acreage.

First, we'll start by importing the descarteslabs Python client and the `Compute` module's main classes, the `Function` and `Job`:

In [None]:
import descarteslabs as dl
from descarteslabs.catalog import Product, SpectralBand, properties as p

In [None]:
from descarteslabs.compute import Function, Job

Now that we have the `Compute` module imported, let's look at the primary objects we'll be working with: `Function` and `Job`.
 * `Function:` a dynamically created, serverless function containing user-submitted, compiled code, to which you can submit many jobs.
 * `Job:` a submitted request for a single invocation of a created `Function`.
 
As a hands-on example, we'll create a local Python function to search for Sentinel-2 imagery over a given AOI, calculate NDVI from the red and near-infrared bands, and upload the NDVI image to our catalog product using our `Catalog` module. Then, we'll create a `Compute Function` object wrapping our NDVI function to scale across the entire county. 

First, we'll create a new `Product` to write our results into, including a `SpectralBand`:

In [None]:
# Add Unique ID to prevent conflicting products across your organization
from uuid import uuid4

# Get your org for namespace
org = dl.auth.Auth().payload["org"]
# Create NDVI Catalog product
product = Product.get_or_create(
    id=f"{org}:sentinel-2_ndvi-{uuid4()}",
    name="Sentinel-2 L2A NDVI",
)
product.save()

In [None]:
# Create NDVI band for product
band = SpectralBand(
    product=product,
    name="ndvi",
    band_index=0,
    data_type="Float64",
    nodata=0,
    data_range=(0.0, 1.0),
    display_range=(0.0, 0.4),
)
band.save()
print(f"Saved {band}")

In [None]:
print("Product status: ", product.state)  # check that product is 'saved'
print("Product ID: ", product.id)  # Get product ID that we will pass to function later

Next we'll define our Python function to wrap into a `Compute Function` below. The general methodology is as follows:
* Inputs:
    * DLTile Key
    * Start Date
    * End Date
    * Product ID

* Steps:
    1. Create a `DLTile` object from our DLTile Key
    2. Search our Sentinel-2 L2A `Product` over our specified date range for low-cloud images intersecting our `DLTile`
    3. `Mosaic` our resulting `ImageCollection` to retrieve the red and NIR bands
    4. Calculate NDVI on our `ndarray`
    5. Create a new `Image` object
    6. Write our `ndarray` to our new `Image` via `upload_ndarray`

In [None]:
# NDVI method
def create_ndvi_image(tile_key, start_date, end_date, product_id):
    import descarteslabs as dl

    # Import Catalog module methods
    from descarteslabs.catalog import Image, Product, properties as p

    # Get DLTile GeoContext for AOI
    dltile = dl.geo.DLTile.from_key(tile_key)

    # Find Sentinel-2 imagery over AOI
    print("Searching for imagery")
    images = (
        Product.get("esa:sentinel-2:l2a:v1")
        .images()
        .intersects(dltile)
        .filter(start_date < p.acquired <= end_date)
        .filter(p.cloud_fraction < 0.1)
    ).collect()
    print(f"Found {len(images)} images")

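    # Mosaic the NIR and red bands into a single (band, y, x) array
    mosaic, raster_info = images.mosaic(["nir", "red"], raster_info=True)

    # Cast to float so that (nir - red) cannot wrap around on unsigned integer data
    nir = mosaic[0].astype("float64")
    red = mosaic[1].astype("float64")

    # Calculate NDVI
    print("Calculating NDVI")
    ndvi = (nir - red) / (nir + red)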

    # Create image for upload
    ndvi_image = Image(
        name=tile_key.replace(":", "_"),
        geometry=dltile.geometry,
        product_id=product_id,
        acquired=end_date,
    )

    # Upload image to catalog product
    upload = ndvi_image.upload_ndarray(ndvi, raster_meta=raster_info, overwrite=True)
    upload.wait_for_completion()

    print("NDVI image upload ID:", upload.id)

    return upload.id

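The NDVI step is ordinary array arithmetic, so it can be checked in isolation without touching the platform. The following is a minimal sketch on small synthetic arrays; the values and the helper name `safe_ndvi` are illustrative and not part of the notebook. It assumes the bands may arrive as unsigned integers, which is why it casts to float before subtracting, and it masks pixels where `nir + red` is zero.

```python
import numpy as np


def safe_ndvi(nir, red):
    """Return NDVI as a float64 masked array; pixels with nir + red == 0 are masked."""
    nir = np.ma.asarray(nir, dtype="float64")
    red = np.ma.asarray(red, dtype="float64")
    denom = np.ma.masked_equal(nir + red, 0)  # avoid dividing by zero
    return (nir - red) / denom


# Synthetic 2x2 "bands" (made-up reflectance values)
nir = np.array([[3000, 500], [0, 2000]], dtype="uint16")
red = np.array([[1000, 1500], [0, 2000]], dtype="uint16")

ndvi = safe_ndvi(nir, red)
print(ndvi)
```

On the raw `uint16` arrays, `nir - red` would wrap around to a large positive value for the pixel where red exceeds NIR; after the cast that pixel comes out negative, as expected.
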
### Define AOI: Yakima County, Washington state
Now that we have our function, let's define our AOI to calculate NDVI over. First we'll read in our GeoJSON file as a `GeoDataFrame`:

In [None]:
import geopandas as gpd

# Get Yakima County GeoJSON File
yak = gpd.read_file("../catalog/data/yakima.geojson")

Next we will create a list of `DLTile`s from our input geometry:

In [None]:
from shapely.geometry import box

# Create bounding box to get DLTiles over
bbox = box(*yak.total_bounds)
# Create DLTile GeoContext objects to iterate over for NDVI function
dltiles = dl.geo.DLTile.from_shape(
    bbox, resolution=30, tilesize=256, pad=0  # resolution is 30 meters per pixel
)
# Get list of DLTile keys
dltile_keys = [tile.key for tile in dltiles]

print("Number of DLTiles: ", len(dltiles))
print("Single DLTile example: ")
print(dltiles[0])

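For a sense of scale, each tile above spans `tilesize * resolution` = 256 × 30 m = 7,680 m on a side. The sketch below estimates how many such tiles are needed to cover a rectangular extent. The helper name `estimate_tile_count` and the extent in the example are made up, and the real count from `DLTile.from_shape` can be somewhat higher because tiles sit on a fixed UTM grid rather than being aligned to the bounding box.

```python
import math


def estimate_tile_count(width_m, height_m, resolution=30, tilesize=256):
    """Rough number of square tiles needed to cover a width x height extent in meters."""
    tile_side_m = tilesize * resolution  # 256 px * 30 m/px = 7680 m
    cols = math.ceil(width_m / tile_side_m)
    rows = math.ceil(height_m / tile_side_m)
    return cols * rows


# Hypothetical 100 km x 60 km extent
print(estimate_tile_count(100_000, 60_000))  # 14 columns * 8 rows = 112
```

An estimate like this is mainly useful for judging how many jobs a run will submit before creating the tiles.
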
### Testing function
Let's test the NDVI method locally and see the images uploaded to the Catalog product we created:

In [None]:
# Test start and end dates
start_date = "2021-06-01"
end_date = "2021-06-15"
# Submit request for NDVI image upload
ndvi_test = create_ndvi_image(
    dltile_keys[0], product_id=product.id, start_date=start_date, end_date=end_date
)

Now that we've completed the first tile's upload locally, we should see a single `Image` returned in our new `Product`:

In [None]:
# Compare upload ID
product.image_uploads().collect()
# Check if new images are in Catalog product
img = product.images().filter(start_date < p.acquired <= end_date).collect()
img

Note that our resulting `ndarray` is of the same shape as our `DLTile`:

In [None]:
test_arr = img[0].ndarray("ndvi")
test_arr.shape

In [None]:
# Plot example of image
dl.utils.display(test_arr, title="NDVI test", size=4, colormap="viridis")

### Create Compute function
Now that we're happy with the results of our locally run function, we can create our `Batch Compute Function`. Here we will create a new `Function` object by passing in our Python function as the first input argument, with the following keyword arguments:
* Name
* Image, which should always be `python3.X:latest`
* CPUs
* Memory
* Timeout, in seconds
* Maximum Concurrency
* Retry Count

In [None]:
async_func = Function(
    create_ndvi_image,
    name="NDVI-from-sentinel",
    image="python3.9:latest",
    cpus=1,
    memory=2,
    timeout=60 * 10,  # in seconds (10 min); must be less than 900
    maximum_concurrency=25,
    retry_count=2,
)
async_func.save()

In [None]:
async_func.id

### Submit jobs to Compute function

Now that we have a `Function` built, we can test submitting a `Job`. 

First, we'll generate the arguments to pass into our `Function`:

In [None]:
# Get the DLTile key for the first tile
key = dltiles[0].key
# Set a start_date and end_date
start_date = "2021-06-01"
end_date = "2021-06-15"

Next we will create a single `Job` object by passing in our `Function ID` and `args` as a list:

In [None]:
# Create the job; saving it submits it for execution
job = Job(async_func.id, args=[key, start_date, end_date, product.id])
job.save()

We can also block until the job finishes by calling `wait_for_completion()` on it, if we choose.
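A hedged sketch of that step: the helper below calls `wait_for_completion()` on whatever job object it is given and then reads a `status` attribute. The helper name `wait_and_report` is hypothetical, and the `status` attribute is an assumption about the `Job` object rather than something shown in this notebook, so it is read defensively.

```python
def wait_and_report(job):
    """Block until the job finishes, then return its status if one is exposed."""
    job.wait_for_completion()  # returns once the job is no longer running
    # `status` is an assumed attribute; fall back to None if it is absent
    return getattr(job, "status", None)


# In this notebook it would be called on the Job created above:
# print(wait_and_report(job))
```

Blocking like this is convenient for a single test job; for many jobs it is usually easier to submit them all first and wait afterwards.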

### Submit multiple jobs
We can also submit multiple jobs to the same function.

##### Create dictionary of arguments to pass as Jobs


In [None]:
# Convenience function for building a list of identical kwarg dicts, one per job
def get_bulk_kwargs(jobs, **kwargs):
    bulk_kwargs = []
    for _ in range(jobs):
        bulk_kwargs.append(dict(kwargs))

    return bulk_kwargs

In [None]:
## Collect args to submit to Function
# Get a list of the DLTile keys
dltile_keys = [
    [tile.key] for tile in dltiles
]  # First iterable argument needs to be list of lists - List[List]
# Test start and end dates
start_date = "2021-06-01"
end_date = "2021-06-15"

# Build kwarg dict
kwargs = get_bulk_kwargs(
    len(dltiles), start_date=start_date, end_date=end_date, product_id=product.id
)
print("Example of Key word args")
kwargs[0:5]

In [None]:
dltile_keys[:5]

In [None]:
# Submit multiple Jobs using map
jobs = async_func.map(dltile_keys, kwargs)

In [None]:
len(jobs)

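To make the argument layout concrete, here is a local stand-in for the pairing that `map` is given: the i-th list of positional arguments goes with the i-th kwargs dict, one pair per job. This is only an illustration under that assumption. `local_map` and `describe` are hypothetical helpers that run in-process and do not submit anything to Compute, and the tile keys and product ID are made up.

```python
def local_map(func, args, kwargs):
    """Call func once per (positional args, keyword args) pair, in order."""
    if len(args) != len(kwargs):
        raise ValueError("args and kwargs must have the same length")
    return [func(*a, **k) for a, k in zip(args, kwargs)]


def describe(tile_key, start_date, end_date, product_id):
    """Stand-in for create_ndvi_image that only reports what it was called with."""
    return f"{tile_key} {start_date}..{end_date} -> {product_id}"


keys = [["tile-a"], ["tile-b"]]  # list of lists, shaped like dltile_keys
shared = dict(start_date="2021-06-01", end_date="2021-06-15", product_id="org:demo")
kwargs_list = [dict(shared) for _ in keys]  # one independent dict per job

calls = local_map(describe, keys, kwargs_list)
print(calls[0])  # tile-a 2021-06-01..2021-06-15 -> org:demo
```

Running the same pairing locally on two or three tiles is a cheap way to catch a mismatched argument name before submitting hundreds of jobs.
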
### Submit Multiple jobs by creating multiple Job objects

Jobs can also be created directly and will be executed once the object is saved, i.e., with `job.save()`.
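A minimal sketch of that pattern, reusing the `Job(function_id, args=[...])` form from the single-job cell earlier in this notebook. The helper `submit_jobs` is hypothetical; it takes the job class as a parameter so the loop can be exercised with a stand-in, and in this notebook `job_cls` would be `Job`.

```python
def submit_jobs(job_cls, function_id, tile_keys, start_date, end_date, product_id):
    """Create and save one job per tile key; return the saved job objects."""
    jobs = []
    for key in tile_keys:
        job = job_cls(function_id, args=[key, start_date, end_date, product_id])
        job.save()  # saving is what submits the job for execution
        jobs.append(job)
    return jobs


# In this notebook (tile keys as plain strings, not nested lists):
# jobs = submit_jobs(Job, async_func.id, [t.key for t in dltiles],
#                    start_date, end_date, product.id)
```

Compared with `map`, this form makes one request per job, so it fits best when only a handful of jobs differ from one another.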

### Waiting for Completion
Now that we've mapped our arguments to `Job`s, we can follow our `Function`'s progress until it completes, either by navigating to [app.descarteslabs.com/monitor](https://app.descarteslabs.com/monitor) or by embedding that page in the notebook:

In [None]:
from IPython.display import IFrame

IFrame("https://app.descarteslabs.com/monitor", width=700, height=350)