# Creating new Product Images with Batch Compute
__________________

The Compute module provides scalable, out-of-the-box resources to parallelize your computations across nearly any spatio-temporal scale. Compute lets you package and execute your Python code on nodes hosted on Descartes Labs' cloud infrastructure, with high-throughput access to imagery to fast-track your analyses.

In this example notebook, we will create a [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) to calculate and create an NDVI product from Sentinel-2 L2A imagery, and then scale it over Yakima County in Washington. The Yakima Valley contains ~75% of the total US hop acreage.

As a hands-on example, we'll create a local Python function to search for Sentinel-2 imagery over a given AOI, calculate NDVI from the red and near-infrared bands, and upload the NDVI image to a new catalog product. Then, we'll create a Compute Function object wrapping our NDVI function to scale across the entire county.

For more on creating and managing products, review [Catalog 02 Creating and Managing Products.ipynb](../catalog/02%20Creating%20and%20Managing%20Products.ipynb).

In [None]:
import descarteslabs as dl
from descarteslabs.compute import Function, Job
from descarteslabs.catalog import Image, Product, SpectralBand, properties as p

First, we'll create a new product to write our results into, including its single NDVI band:

In [None]:
org = dl.auth.Auth().payload["org"]
user_id = dl.auth.Auth().namespace

*Note: Since other users within your organization may have run this notebook before, we will create a unique Product ID using the current user's ID. This is not required for your own work.*

In [None]:
# Create NDVI Catalog product
product = Product.get_or_create(
    id=f"sentinel-2_ndvi-{user_id}",
    name="Sentinel-2 L2A NDVI",
)
product.save()

In [None]:
# Create NDVI band for product
band = SpectralBand.get_or_create(
    id=f"{product.id}:ndvi",
    band_index=0,
    data_type="Float64",
    nodata=0,
    data_range=(0.0, 1.0),
    display_range=(0.0, 0.4),
)
band.save()
print(f"Saved {band}")

In [None]:
print("Product status: ", product.state)  # check that product is 'saved'
print("Product ID: ", product.id)  # Get product ID that we will pass to function later

Next we'll define our Python function to wrap into a Compute Function below. The general methodology is as follows:
* Inputs:
    * DLTile Key
    * Start Date
    * End Date
    * Product ID

* Steps:
    1. Create a DLTile object from our DLTile Key
    2. Search our Sentinel-2 L2A Product for images within our specified date range that intersect our DLTile
    3. Mosaic our resulting ImageCollection to retrieve the red and NIR bands
    4. Calculate NDVI on our ndarray
    5. Create a new Image object
    6. Write our ndarray to our new Image via upload_ndarray

In [None]:
# NDVI method
def create_ndvi_image(tile_key, start_date, end_date, product_id):
    import descarteslabs as dl

    # Import Catalog module methods
    from descarteslabs.catalog import Image, Product, properties as p

    # Get DLTile GeoContext for AOI
    dltile = dl.geo.DLTile.from_key(tile_key)

    # Find Sentinel-2 imagery over AOI
    print("Searching for imagery")
    images = (
        Product.get("esa:sentinel-2:l2a:v1")
        .images()
        .intersects(dltile)
        .filter(start_date < p.acquired <= end_date)
        .filter(p.cloud_fraction < 0.1)
    ).collect()
    print(f"Found {len(images)} images")

    # Create stack of red and nir bands for NDVI calc
    mosaic, raster_info = images.mosaic(["nir", "red"], raster_info=True)

    nir = mosaic[0]
    red = mosaic[1]

    # Calculate NDVI
    print("Calculating NDVI")
    ndvi = (nir - red) / (nir + red)

    # Create image for upload
    ndvi_image = Image(
        name=tile_key.replace(":", "_"),
        geometry=dltile.geometry,
        product_id=product_id,
        acquired=end_date,
    )

    # Upload image to catalog product
    upload = ndvi_image.upload_ndarray(ndvi, raster_meta=raster_info, overwrite=True)
    print("NDVI image upload ID:", upload.id)

    upload.wait_for_completion()

    print("NDVI image ID:", ndvi_image.id)

    return ndvi_image.id
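
The NDVI expression above divides by `nir + red`, which is zero wherever both bands are zero, and if the bands arrive as unsigned integers the subtraction `nir - red` can wrap around where red exceeds NIR. The sketch below shows one way to guard against both, using plain NumPy arrays with made-up values. `safe_ndvi` is a hypothetical helper, not part of the notebook or the Descartes Labs client, and it ignores any mask the mosaic may carry.

```python
import numpy as np


def safe_ndvi(nir, red, nodata=0.0):
    """Hypothetical helper: NDVI in float64, with zero-denominator pixels set to nodata."""
    # Cast first so unsigned integer inputs cannot wrap around on subtraction
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    # Pre-fill with nodata, then divide only where the denominator is non-zero
    out = np.full(denom.shape, nodata, dtype="float64")
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Made-up reflectance values in an unsigned integer dtype
nir = np.array([[4000, 0], [1000, 3000]], dtype="uint16")
red = np.array([[1000, 0], [3000, 3000]], dtype="uint16")
ndvi = safe_ndvi(nir, red)
print(ndvi)
```

Note that the nodata value used here (0) matches the band definition earlier in the notebook, so a pixel whose true NDVI is exactly 0 is indistinguishable from nodata.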

### Define AOI: Yakima County, Washington state
Now that we have our function, let's define our AOI to calculate NDVI over. First we'll read in our GeoJSON file as a geodataframe:

In [None]:
import geopandas as gpd

yak = gpd.read_file("../catalog/data/yakima.geojson")

Next we will create a list of DLTiles from our input geometry by [`DLTile.from_shape`](https://docs.descarteslabs.com/descarteslabs/geo/readme.html#descarteslabs.geo.DLTile):

In [None]:
from shapely.geometry import box

# Create bounding box to get DLTiles over
bbox = box(*yak.total_bounds)
# Create DLTile GeoContext objects to iterate over for NDVI function
dltiles = dl.geo.DLTile.from_shape(
    bbox, resolution=30, tilesize=256, pad=0  # 30 meters
)
# Get list of DLTile keys
dltile_keys = [tile.key for tile in dltiles]

print("Number of DLTiles: ", len(dltiles))
print("Single DLTile example: ")
print(dltiles[0])
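
To get a feel for how much ground each tile covers, the arithmetic can be sketched in plain Python. This assumes a tile's pixel dimensions are `tilesize + 2 * pad` on each side, with `resolution` in meters per pixel; `tile_side_m` is a hypothetical helper used only for illustration.

```python
def tile_side_m(resolution, tilesize, pad=0):
    """Hypothetical helper: side length of one square tile in meters."""
    return resolution * (tilesize + 2 * pad)


# Same parameters as the DLTile.from_shape call above
side_m = tile_side_m(resolution=30, tilesize=256, pad=0)
area_km2 = (side_m / 1000) ** 2
print(f"Tile side: {side_m} m, tile area: {area_km2:.2f} km^2")
```

Multiplying the per-tile area by `len(dltiles)` gives a rough upper bound on the area processed, since tiles on the edge of the bounding box extend past it.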

### Testing function
Let's test the NDVI method locally and see the images uploaded to the Catalog product we created:

In [None]:
# Test start and end dates
start_date = "2021-06-01"
end_date = "2021-06-15"
# Submit request for NDVI image upload
create_ndvi_image(
    dltile_keys[0], product_id=product.id, start_date=start_date, end_date=end_date
)

Now that we've completed the first tile's upload locally, we should see a single image returned in our new product:

In [None]:
# Compare Upload ID
print([upload.id for upload in product.image_uploads()])
# Check if new images are in Catalog product
images = product.images().filter(start_date < p.acquired <= end_date).collect()
print([image.id for image in images])
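
The filter `start_date < p.acquired <= end_date` excludes the start date and includes the end date. Because `create_ndvi_image` sets `acquired=end_date`, the inclusive upper bound is what lets the new image match. A minimal sketch of that boundary behavior with plain `datetime.date` values follows; `in_window` is a hypothetical helper, and how the Catalog compares timestamps at sub-day precision is not covered here.

```python
from datetime import date


def in_window(acquired, start, end):
    """Hypothetical helper mirroring the chained comparison: start exclusive, end inclusive."""
    return start < acquired <= end


start, end = date(2021, 6, 1), date(2021, 6, 15)

on_start = in_window(date(2021, 6, 1), start, end)   # boundary: start date itself
on_end = in_window(date(2021, 6, 15), start, end)    # boundary: end date itself
inside = in_window(date(2021, 6, 10), start, end)    # strictly inside the window
print(on_start, on_end, inside)
```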

Note that our resulting ndarray is of the same shape as our DLTile:

In [None]:
test_arr = images[0].ndarray("ndvi")
test_arr.shape

And let's have a look at the image we created:

In [None]:
# Plot example image
dl.utils.display(test_arr, title="NDVI test", size=4, colormap="viridis")

To avoid duplication, let's delete the image that we created:

In [None]:
images[0].delete()

### Create Compute function
Now that we're happy with the results of our locally run function, we can create our Batch Compute Function. We create a new Function object by passing in our Python function as the first argument, along with the following keyword arguments:
* __name__
* __image__, which should be __python3.X:latest__, with X matching the Python minor version of your environment
* __cpus__, number of CPUs
* __memory__
* __timeout__, in seconds
* __maximum_concurrency__, or number of parallel Jobs running at a time
* __retry_count__, number of times to retry failed Jobs

For more information on __memory__ and __cpus__ combinations, visit our [Documentation page](https://docs.descarteslabs.com/guides/quota.html).

In [None]:
async_func = Function(
    create_ndvi_image,
    name="NDVI-from-sentinel",
    image="python3.9:latest",
    cpus=1,
    memory=2,
    timeout=600,
    maximum_concurrency=25,
    retry_count=2,
)
async_func.save()

In [None]:
async_func.id

### Submit jobs to Compute function

Now that we have a Function built, we can test submitting a Job. 

First, we'll generate the arguments to pass into our Function:

In [None]:
# Get the DLTile key for the first tile
key = dltiles[0].key
# Set a start_date and end_date
start_date = "2021-06-01"
end_date = "2021-06-15"

Next we will create a single Job object by calling our Function with these arguments:

In [None]:
# Create a Job by calling the Function
job = async_func(key, start_date, end_date, product.id)

We can [`wait_for_completion()`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Job.wait_for_completion) programmatically if we choose. Note that some time will pass between creating the Function above and completion of the first job.

    job.wait_for_completion()

Or visit our [Compute Monitor](https://app.descarteslabs.com/compute) to track your Function's progress.

Let's check what was created:

In [None]:
image_id = job.result()
print(image_id)

image = Image.get(image_id)
print(image)
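
The printed ID can be anticipated from the inputs. The sketch below assumes a Catalog image ID is the product ID and the image name joined by a colon, and reuses the `replace(":", "_")` step from `create_ndvi_image`. The tile key and product ID shown are made-up placeholders, and `expected_image_id` is a hypothetical helper.

```python
def expected_image_id(product_id, tile_key):
    """Hypothetical helper: predict the image ID, assuming the form '<product_id>:<name>'."""
    # Colons are replaced in the name, as in create_ndvi_image
    name = tile_key.replace(":", "_")
    return f"{product_id}:{name}"


# Made-up placeholder values for illustration only
example_key = "256:0:30.0:10:13:176"
example_product = "sentinel-2_ndvi-abc"
predicted = expected_image_id(example_product, example_key)
print(predicted)
```

Replacing the colons keeps the name free of the separator, so the ID splits cleanly into product and name at its last colon.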

And finally, to again avoid any duplication, we will delete both the job and the image we created.

In [None]:
job.delete(delete_result=True)
image.delete()

### Submit multiple jobs
We can also submit multiple jobs to the same function. This is the most typical pattern for creating and running large numbers of jobs, and is more efficient than creating jobs one by one, unless there is non-trivial computation required to generate the arguments to your Function.

In [None]:
## Collect args to submit to Function
# Get a list of the DLTile keys
dltile_keys = [
    [tile.key] for tile in dltiles
]  # First iterable argument needs to be list of lists - List[List]
# Test start and end dates
start_date = "2021-06-01"
end_date = "2021-06-15"

In [None]:
from itertools import repeat

# Submit multiple Jobs using map
jobs = async_func.map(
    dltile_keys,
    repeat(
        {"product_id": product.id, "start_date": start_date, "end_date": end_date},
        len(dltile_keys),
    ),
)
print(len(jobs))
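
To see what each Job receives, the pairing of positional and keyword arguments can be sketched locally. This assumes `map` pairs the i-th list of positional arguments with the i-th dictionary of keyword arguments, which is how the cell above is written. The keys and product ID are made-up placeholders, and `describe_call` is a hypothetical stand-in for `create_ndvi_image`.

```python
from itertools import repeat


def describe_call(tile_key, start_date, end_date, product_id):
    """Hypothetical stand-in with the same signature as create_ndvi_image."""
    return f"{tile_key} {start_date}..{end_date} -> {product_id}"


# One inner list of positional arguments per job (placeholder keys)
keys = [["key-a"], ["key-b"], ["key-c"]]
# The same keyword arguments repeated once per job
shared = {"product_id": "my-product", "start_date": "2021-06-01", "end_date": "2021-06-15"}

calls = [
    describe_call(*args, **kwargs)
    for args, kwargs in zip(keys, repeat(shared, len(keys)))
]
for call in calls:
    print(call)
```

Wrapping each key in its own list is what allows it to be unpacked as the positional arguments of a single call.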

### Waiting for Completion
Now that we've mapped our arguments to Jobs, we can wait for our Function to complete by either navigating to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) or programmatically via:

In [None]:
# print new image ids as they are completed
for job in async_func.as_completed(jobs):
    print(job.result())
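
The practical consequence of iterating over jobs as they complete is that results arrive in finishing order, not submission order. As a local analogy only, the standard library's `concurrent.futures.as_completed` follows the same pattern. The sketch below is not the Compute API: `fake_job` and its sleep durations are made up to imitate jobs of differing length.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_job(tile_index, seconds):
    """Stand-in for a remote job: sleep, then return a made-up image ID."""
    time.sleep(seconds)
    return f"image-{tile_index}"


durations = [0.3, 0.1, 0.2]  # made-up run times, in seconds
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_job, i, s) for i, s in enumerate(durations)]
    # Collect each result as soon as its job finishes
    finished = [future.result() for future in as_completed(futures)]
print(finished)
```

If the order of results matters downstream, keep the tile key alongside each result rather than relying on position.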

or:

In [None]:
# wait for everything to finish
async_func.wait_for_completion()

### Verify
We can check for successful completion of all jobs, and verify our images exist.

In [None]:
async_func.refresh(include="job.statistics")
print(async_func.job_statistics)
print(product.images().count())

### Cleaning up

When we are done, it is always good hygiene to clean up!

In [None]:
# remove function and jobs
async_func.delete_jobs(delete_results=True)
async_func.delete()

# remove product and images
task = product.delete_related_objects()
if task:
    task.wait_for_completion()
product.delete()