# Imagery Generation with Batch Compute
__________________

This notebook will cover a typical pattern of using Compute to scalably generate new imagery across large areas of interest (AOIs). 

In this hands-on example we will calculate a simple Normalized Difference Vegetation Index (NDVI) map over Yakima County in Washington, which contains ~75% of the total US hop acreage. 

The general steps covered in this example are as follows:
* Create a new [`Product`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/product.html#descarteslabs.catalog.Product) and [`Band`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/band.html#descarteslabs.catalog.Band) to save results
* Define a self-contained Python function that accepts a tile key, start date, end date, and output product ID, and performs the following:
    * Generates a [`DLTile`](https://docs.descarteslabs.com/descarteslabs/geo/readme.html#descarteslabs.geo.DLTile) from the passed key
    * Searches for Sentinel-2 imagery that intersects the passed tile within the specified date range and has less than 10% cloud fraction
    * Calculates NDVI from the **nir** and **red** bands
    * Saves the NDVI calculation as a new [`Image`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/image.html#descarteslabs.catalog.Image)
* Wrap the local function into a [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) to scale asynchronously across the entire county

*__Note__*: For more on creating and managing products, review Catalog tutorial [02 Creating and Managing Products.ipynb](../catalog/02%20Creating%20and%20Managing%20Products.ipynb).

In [None]:
import descarteslabs as dl
from descarteslabs.compute import Function, Job
from descarteslabs.catalog import Image, Product, SpectralBand, properties as p

Defining global variables for reference throughout this example, including the current user's org and user ID:

In [None]:
org = dl.auth.Auth().payload["org"]
user_id = dl.auth.Auth().namespace

In [None]:
# Test start and end dates
start_date = "2023-10-15"
end_date = "2023-10-25"

## Creating the Output Product
First we'll create a new product to write our results to, including a single NDVI band.

In [None]:
pid = f"{org}:yakima-county-ndvi-{user_id}"

#### **_Note on Product Creation:_** 
We do not always need to delete and overwrite our product on every iteration as in the following cell. This notebook is designed for demonstration purposes, where we do not care about preserving each prior product. 

In practice, as long as your product has a **unique** ID you may ignore the next cell and skip to the following.

In [None]:
try:
    product = Product.get(pid)
    print("Product already exists, deleting old iteration")
    status = product.delete_related_objects()
    if status:
        status.wait_for_completion()
    print("Deleted related objects")
    product.delete()
    print("Deleted Product")
except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    print("No Product exists")

Now we create the product:

In [None]:
# Create NDVI Catalog product
product = Product.get_or_create(
    id=pid,
    name="Yakima County NDVI",
)
product.tags = ["examples"]
product.save()
product

And a single band:

In [None]:
# Create NDVI band for product
band = SpectralBand.get_or_create(
    id=f"{product.id}:ndvi",
    band_index=0,
    data_type="Float64",
    nodata=0,
    data_range=(0.0, 1.0),
    display_range=(0.0, 0.4),
)
band.save()
print(f"Saved {band}")

In [None]:
print("Product status: ", product.state)  # check that product is 'saved'
print("Product ID: ", product.id)  # Get product ID that we will pass to function later

## Local Function Definition
Next we'll define our self-contained Python function to send to Compute below. The general methodology is as follows:
* Inputs:
    * DLTile Key
    * Start Date
    * End Date
    * Output Product ID

* Steps:
    1. Create a DLTile object from our DLTile Key
    2. Search our Sentinel-2 Product for imagery intersecting our DLTile and date range
    3. Mosaic our resulting ImageCollection to retrieve the **red** and **nir** bands
    4. Calculate NDVI on our ndarray
    5. Create a new Image object
    6. Write our ndarray to our new Image via upload_ndarray

In [None]:
# NDVI method
def create_ndvi_image(tile_key, start_date, end_date, product_id):
    import descarteslabs as dl

    # Import Catalog module methods
    from descarteslabs.catalog import Image, Product, properties as p

    # Get DLTile GeoContext for AOI
    dltile = dl.geo.DLTile.from_key(tile_key)

    # Find Sentinel-2 imagery over AOI
    print("Searching for imagery")
    images = (
        Product.get("esa:sentinel-2:l2a:v1")
        .images()
        .intersects(dltile)
        .filter(start_date < p.acquired <= end_date)
        .filter(p.cloud_fraction < 0.1)
    ).collect()
    print(f"Found {len(images)} images")

    # Create stack of red and nir bands for NDVI calc
    mosaic, raster_info = images.mosaic(["nir", "red"], raster_info=True)

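    # Cast to float so the subtraction below cannot wrap around on unsigned integer data
    nir = mosaic[0].astype("float64")
    red = mosaic[1].astype("float64")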

    # Calculate NDVI
    print("Calculating NDVI")
    ndvi = (nir - red) / (nir + red)

    # Create image for upload
    ndvi_image = Image(
        name=tile_key.replace(":", "_"),
        geometry=dltile.geometry,
        product_id=product_id,
        acquired=end_date,
    )

    # Upload image to catalog product
    upload = ndvi_image.upload_ndarray(ndvi, raster_meta=raster_info, overwrite=True)

    print("NDVI image upload ID:", upload.id)

    upload.wait_for_completion()

    print("NDVI image ID:", ndvi_image.id)

    return ndvi_image.id
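
The NDVI arithmetic in the function above can be tried in isolation. The following is a minimal NumPy sketch on synthetic values; the arrays are invented for illustration and are not real Sentinel-2 data. The sketch casts to float before subtracting, since unsigned integer values wrap around when a subtraction goes negative, and it masks pixels where `nir + red` is zero.

```python
import numpy as np

# Synthetic values standing in for the nir and red bands (illustrative only)
nir = np.ma.masked_array(np.array([[3000, 500], [0, 1200]], dtype="uint16"))
red = np.ma.masked_array(np.array([[1000, 1500], [0, 1200]], dtype="uint16"))

# Cast to float so that (nir - red) can be negative instead of wrapping around
nir_f = nir.astype("float64")
red_f = red.astype("float64")

# Mask pixels where the denominator is zero to avoid dividing by zero
denominator = np.ma.masked_equal(nir_f + red_f, 0)
ndvi = (nir_f - red_f) / denominator

print(ndvi)
```

Pixels where red exceeds nir produce negative NDVI values, and the pixel with no signal in either band stays masked.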

### Local Iteration
Now that we have our function, let's define the AOI to calculate NDVI over. Here we'll read in a local GeoJSON file as a GeoDataFrame:

In [None]:
import geopandas as gpd

yak = gpd.read_file("../catalog/data/yakima.geojson")
yak.plot(figsize=(5, 5))

Next we will create a list of tiles from our input geometry using [`DLTile.from_shape()`](https://docs.descarteslabs.com/descarteslabs/geo/readme.html#descarteslabs.geo.DLTile):

In [None]:
# Create DLTile GeoContext objects covering the county geometry
# to iterate over with the NDVI function
dltiles = dl.geo.DLTile.from_shape(
    yak.iloc[0]["geometry"], resolution=10.0, tilesize=4096, pad=0  # 10 meters
)
# Get list of DLTile keys
dltile_keys = [tile.key for tile in dltiles]

print("Number of DLTiles: ", len(dltiles))
print("Single DLTile example: ")
print(dltiles[0])
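
As a rough check on the tiling parameters, the ground footprint of one tile follows from the resolution and tile size. The sketch below is plain arithmetic using the values passed to `from_shape()` above, and assumes that `pad` adds that many pixels to each side of the tile.

```python
# Values mirror the from_shape() call above
resolution_m = 10.0  # meters per pixel
tilesize_px = 4096   # pixels per side, before padding
pad_px = 0           # padding pixels on each side (assumed to apply to every edge)

# Pixels per side including padding, then ground distance and area
side_px = tilesize_px + 2 * pad_px
side_km = side_px * resolution_m / 1000
area_km2 = side_km**2

print(f"{side_px} px per side")           # 4096 px per side
print(f"{side_km:.2f} km per side")       # 40.96 km per side
print(f"{area_km2:.1f} km^2 per tile")    # 1677.7 km^2 per tile
```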

### Testing the Function
Let's test the NDVI method locally and see the images uploaded to the Catalog product we created:

In [None]:
# Submit request for NDVI image upload
image_id = create_ndvi_image(
    dltile_keys[0], product_id=product.id, start_date=start_date, end_date=end_date
)

Now that we've completed the first tile's upload locally, we should see a single image returned in our new product:

In [None]:
image = Image.get(image_id)
ndarr = image.ndarray("ndvi")
dl.utils.display(ndarr, size=5)

Note that our resulting ndarray is of the same shape as our DLTile!

In [None]:
ndarr.shape

### Create Compute Function
Now that we're happy with the results of our locally run function, we can create our Batch Compute Function. Here we will create a new Function object by passing in our Python function as the first input argument, with the following keyword arguments:
* __name__
* __image__, which should always be __python3.X:latest__ corresponding to your environment
* __cpus__, number of CPUs
* __memory__
* __timeout__, in seconds
* __maximum_concurrency__, or number of parallel Jobs running at a time
* __retry_count__, number of times to retry failed Jobs

For more information on __memory__ and __cpus__ combinations, visit our [Documentation page](https://docs.descarteslabs.com/guides/quota.html).

In [None]:
async_func = Function(
    create_ndvi_image,
    name="NDVI Yakima County",
    image="python3.9:latest",
    cpus=0.25,
    memory=512,
    timeout=600,
    maximum_concurrency=25,
    retry_count=2,
)
async_func.save()

In [None]:
async_func.id

### Submit Individual Jobs to Compute

Now that we have a Function built, we can test submitting a Job. 

First, we'll generate our argument to pass into our Function:

In [None]:
# Get the DLTile key for the first tile
key = dltiles[0].key

Next we will create a single Job object by calling our Function with the args:

In [None]:
# Submit a single Job by calling the Function
job = async_func(key, start_date, end_date, product.id)

We can [`wait_for_completion()`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Job.wait_for_completion) programmatically if we choose. Note that it will take some time between creating the Function above, and completion of the first job.

    job.wait_for_completion()

Or visit our [Compute Monitor](https://app.descarteslabs.com/compute) to track our Function's progress.

### Submit Multiple Jobs
We can also submit multiple jobs to the same function. This is the most typical pattern for creating and running large numbers of jobs, and is more efficient than creating jobs one by one, unless there is non-trivial computation required to generate the arguments to your Function.

In [None]:
## Collect args to submit to Function
# Get a list of the DLTile keys
dltile_keys = [
    [tile.key] for tile in dltiles
]  # First iterable argument needs to be list of lists - List[List]

In [None]:
from itertools import repeat

# Submit multiple Jobs using map
jobs = async_func.map(
    dltile_keys,
    repeat(
        {"product_id": product.id, "start_date": start_date, "end_date": end_date},
        len(dltile_keys),
    ),
)
print(len(jobs))
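
To see how the two iterables line up, here is a small sketch in plain Python. It only illustrates the pairing of positional and keyword arguments, on the assumption that each inner list of positional arguments is matched with the keyword dictionary at the same position; the tile keys and product ID are hypothetical placeholders, and nothing is submitted to Compute.

```python
from itertools import repeat

# Hypothetical placeholder values for illustration
tile_keys = ["tile-a", "tile-b", "tile-c"]
shared_kwargs = {
    "product_id": "my-org:my-product",
    "start_date": "2023-10-15",
    "end_date": "2023-10-25",
}

# One inner list of positional arguments per job
args = [[key] for key in tile_keys]
# The same keyword arguments repeated once per job
kwargs = repeat(shared_kwargs, len(args))

# Pair them up the way one call per job would receive them
calls = list(zip(args, kwargs))
for positional, keyword in calls:
    print(positional, keyword["product_id"])
```

Only the tile key changes from job to job, so it is the only value that needs its own list; the dates and product ID are shared.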

### Waiting for Completion
Now that we've mapped our arguments to Jobs, we can wait for our Function to complete by either navigating to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) or programmatically via:

    for job in async_func.as_completed(jobs):
        print(job.result())

or:

    async_func.wait_for_completion()
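
The "handle results as they finish" pattern can be tried locally without submitting anything. The sketch below is an analogy only: it uses `concurrent.futures` from Python's standard library, not the Compute API, and `fake_tile_job` is a hypothetical stand-in for a remote job. It shows one way to separate successes from failures while iterating over completed work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_tile_job(tile_key):
    """Hypothetical stand-in for a remote job; returns a made-up image ID."""
    if tile_key == "tile-b":
        raise RuntimeError("simulated failure")
    return f"image-for-{tile_key}"


tile_keys = ["tile-a", "tile-b", "tile-c"]
succeeded, failed = {}, {}

with ThreadPoolExecutor(max_workers=2) as pool:
    # Map each future back to the tile key it was submitted with
    futures = {pool.submit(fake_tile_job, key): key for key in tile_keys}
    for future in as_completed(futures):
        key = futures[future]
        try:
            succeeded[key] = future.result()
        except RuntimeError as error:
            failed[key] = str(error)

print("succeeded:", sorted(succeeded))
print("failed:", sorted(failed))
```

Keeping a record of which tiles failed makes it straightforward to resubmit only those tiles afterwards.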

### Verification

We can check for successful completion of all jobs, and verify our images exist.

In [None]:
async_func.refresh(include="job.statistics")
print(async_func.job_statistics)
print(product.images().count())

### Cleaning up

When we are done, it is always good hygiene to clean up!

In [None]:
# remove function and jobs
async_func.delete_jobs(delete_results=True)
async_func.delete()

# remove product and images
task = product.delete_related_objects()
if task:
    task.wait_for_completion()
product.delete()