# Batch Compute Timeseries Analysis
In this notebook we demonstrate how to use the `Batch Compute API` to scalably generate arbitrary timeseries statistics using `Catalog`. The notebook defines a `Function` that calculates daily Sentinel-2 NDVI statistics, masked to the Cropland Data Layer Hops class and summarized by week, in 4 major hop-producing areas in the Northwestern United States.

The general methodology is as follows:
1. Read in a GeoJSON of Census Tract boundary data and store it in `Descartes Labs Catalog`
2. Define a Compute Function that:
    * Takes a GEOID and date as input parameters
    * Searches Sentinel-2 scenes acquired on the specified date
    * Mosaics Sentinel-2 Images and the Cropland Data Layer as ndarrays
    * Calculates NDVI and masks to the Hops class
    * Returns the mean NDVI value for the specified date, or NaN if no data is present
3. Submit arguments to the Compute Function and monitor the status
4. Retrieve the time series results as a pandas DataFrame

In [None]:
import descarteslabs as dl
from descarteslabs.catalog import Blob, Image, Product, properties as p
from descarteslabs.compute import Function, Job

In [None]:
import json

import pandas as pd
import geopandas as gpd
import numpy as np

from datetime import datetime

import matplotlib.pyplot as plt

Setting global variables:

In [None]:
pid = "esa:sentinel-2:l2a:v1"
cdl_pid = "usda:cdl:v1"
hop_cdl_value = 56
start_date = "2022-10-01"
end_date = "2022-11-01"
func_name = f"Get NDVI Timeseries Values {datetime.today().strftime('%Y-%m-%d')}"
func_name

Reading in our input GeoJSON:

In [None]:
gdf = gpd.read_file("data/hop_tracts.geojson")
gdf.plot()

### Function Methodology
These next few cells walk through the methodology used inside our `Compute Function`, which is defined below. The general steps are as follows:
1. Create an `AOI` from our input geometry
2. Search for a Sentinel-2 L2A `ImageCollection` using the `Catalog API`, filtered to the provided AOI and date parameters
3. Create an `Image` object of the Cropland Data Layer's 2022 classification
4. Retrieve the associated `nir` and `red` bands for Sentinel-2 and `class` band for CDL as `ndarray`s
5. Calculate NDVI, mask to the hops class in the CDL array
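
For reference, step 5 uses the standard normalized difference of the near-infrared and red bands, written here with the band names from step 4:

$$\mathrm{NDVI} = \frac{\mathrm{nir} - \mathrm{red}}{\mathrm{nir} + \mathrm{red}}$$

The ratio is undefined where $\mathrm{nir} + \mathrm{red} = 0$, so pixels with no signal in either band produce NaN values rather than a valid index.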

Defining an `AOI` object out of our GeoJSON, alongside output raster metadata parameters (resolution, output CRS):

In [None]:
moxee_geom = gdf.loc[gdf["GEOID"] == "53077001702"]["geometry"].values[0]
aoi = dl.geo.AOI(moxee_geom, resolution=30.0, crs="EPSG:3857")
aoi

Searching `Catalog` for Sentinel-2 Imagery, chaining `intersects` and `filter` methods:

In [None]:
s2_ic = (
    Product.get(pid)
    .images()
    .intersects(aoi)
    .filter("2022-10-01" <= p.acquired <= "2022-10-02")
    .sort("acquired")
    .limit(None)
).collect()
s2_ic

Retrieve an `Image` of our 2022 Cropland Data Layer:

In [None]:
cdl_image = Image.get("usda:cdl:v1:meta_2022_30m_cdls_v1")
cdl_image

Retrieve our pixel data as `ndarray`s via `.mosaic` on our `ImageCollection` and `.ndarray` on our `Image`. Note that we use the _same `AOI`_ in both cases to ensure a matching geotransform:

In [None]:
s2_arr = s2_ic.mosaic(bands=["nir", "red"], data_type="Float32")

cdl_arr = cdl_image.ndarray(bands=["class"], geocontext=s2_ic.geocontext)

s2_arr.shape, cdl_arr.shape

Calculate NDVI and mask to the Hop CDL value:

In [None]:
# Calculating NDVI
nir = s2_arr[0, :, :]
red = s2_arr[1, :, :]
cdl = cdl_arr[0, :, :]

ndvi = (nir - red) / (nir + red)

hop_ndvi_msk = np.ma.masked_where(cdl != hop_cdl_value, ndvi)
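
The division above is undefined wherever `nir + red` is zero. The following is a minimal sketch of one way to guard against that, using small synthetic arrays in place of the Sentinel-2 and CDL rasters. The `safe_ndvi` helper is hypothetical and is not part of this notebook or of the Descartes Labs client:

```python
import numpy as np


def safe_ndvi(nir, red):
    """Return NDVI as a masked array, masking pixels where nir + red == 0."""
    nir = np.asarray(nir, dtype="float32")
    red = np.asarray(red, dtype="float32")
    denom = nir + red
    # Substitute 1 for zero denominators to avoid a divide-by-zero warning,
    # then mask those pixels so they never contribute to statistics.
    ndvi = (nir - red) / np.where(denom == 0, 1, denom)
    return np.ma.masked_where(denom == 0, ndvi)


# Synthetic 2x2 example: the bottom-right pixel has no signal in either band.
nir = np.array([[0.6, 0.5], [0.4, 0.0]])
red = np.array([[0.2, 0.5], [0.1, 0.0]])
cdl = np.array([[56, 56], [1, 56]])  # 56 is the CDL hops class

ndvi = safe_ndvi(nir, red)
# masked_where keeps the existing mask on `ndvi` and adds the non-hops pixels.
hop_ndvi = np.ma.masked_where(cdl != 56, ndvi)
hop_mean = float(hop_ndvi.mean())
```

Only the two valid hops pixels in the top row contribute to `hop_mean` here; the zero-signal pixel and the non-hops pixel are both excluded.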

Plotting our results:

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 10))
ax[0].imshow(ndvi)
ax[0].set_title("NDVI")
ax[1].imshow(cdl, cmap="terrain")
ax[1].set_title("CDL")
ax[2].imshow(hop_ndvi_msk)
ax[2].set_title("Hop Masked NDVI")

### Preparing Batch Compute Function
Now that we've settled on a methodology, we can put together the Python function to pass to `Batch Compute`. First, we store our GeoDataFrame as GeoJSON in a `DL Storage Blob` so that it can be retrieved in our concurrent environments:

In [None]:
org = dl.auth.Auth().payload["org"]
try:
    # Create a new Blob object
    blob = Blob(
        name="hop_tracts",
        namespace="ndvi_timeseries_example_notebook",
        readers=[f"org:{org}"],
        tags=["examples"],
    )
    # Upload our DataFrame to this Blob:
    blob.upload_data(json.dumps(gdf.to_json()))
    blob.save()

except Exception:
    # Most likely the Blob already exists within your org, so fetch it instead
    blob = Blob.get(name="hop_tracts", namespace="ndvi_timeseries_example_notebook")
    print("Blob already exists")
blob
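
Note that `gdf.to_json()` already returns a GeoJSON string, so wrapping it in `json.dumps` encodes it a second time. This is why the function defined below calls `loads` once and passes the resulting string to `gpd.read_file`. Here is a minimal standard-library sketch of that round trip, assuming the blob returns the uploaded bytes unchanged (the empty feature collection is a stand-in for the real data):

```python
import json

# Stand-in for gdf.to_json(), which returns a GeoJSON *string*, not a dict.
geojson_str = json.dumps({"type": "FeatureCollection", "features": []})

# What the upload cell stores: the string encoded as JSON a second time.
payload = json.dumps(geojson_str)

# On retrieval, a single loads() call yields the original GeoJSON string.
# json.loads accepts bytes, which is the assumed return type of the blob data.
decoded = json.loads(payload.encode("utf-8"))

# A second parse (or a GeoJSON reader) is needed to get at the features.
feature_collection = json.loads(decoded)
```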

Next we define our Python function. It combines the steps outlined in the cells above into a single function:

In [None]:
def return_ndvi(args):
    import numpy as np
    import geopandas as gpd

    from datetime import datetime, timedelta
    from json import dumps, loads

    import descarteslabs as dl
    from descarteslabs.catalog import Blob, Image, Product, properties as p

    # Unpack args
    geoid, date = args

    # Getting this and next date
    date = datetime.strptime(date, "%Y-%m-%d")
    next_date = date + timedelta(days=1)

    # Retrieve our GDF
    geom_blob = Blob.get(
        name="hop_tracts", namespace="ndvi_timeseries_example_notebook"
    )

    geom_data = loads(geom_blob.data())
    gdf = gpd.read_file(geom_data)

    print("Pulled down GDF from Storage Blob")

    in_geom = gdf[gdf["GEOID"] == geoid].iloc[0]["geometry"]

    # Create AOI from GDF
    aoi = dl.geo.AOI(in_geom, resolution=30.0, crs="EPSG:3857")
    # Search Sentinel2 for our date ranges
    pid = "esa:sentinel-2:l2a:v1"
    cdl_pid = "usda:cdl:v1"

    print(f"Searching {date.strftime('%Y-%m-%d')} to {next_date.strftime('%Y-%m-%d')}")

    s2_ic = (
        Product.get(pid)
        .images()
        .intersects(aoi)
        .filter(date <= p.acquired < next_date)
        .sort("acquired")
        .limit(None)
    ).collect()

    # Return NaN early if no imagery satisfies our filter conditions
    if len(s2_ic) == 0:
        result_dict = {
            "GEOID": geoid,
            "mean_ndvi": np.nan,
            "date": date.strftime("%Y-%m-%d"),
        }
        return result_dict

    print(f"Found {len(s2_ic)} images")

    # Get CDL Image
    cdl_image = Image.get("usda:cdl:v1:meta_2022_30m_cdls_v1")

    # Mosaic and rasterize our image data to the _same GeoContext_
    s2_arr = s2_ic.mosaic(bands=["nir", "red"], data_type="Float32", geocontext=aoi)

    cdl_arr = cdl_image.ndarray(bands=["class"], geocontext=aoi)
    print("Rasterized imagery")

    # Calculating NDVI, masking to Hop Class 56
    nir = s2_arr[0, :, :]
    red = s2_arr[1, :, :]
    cdl = cdl_arr[0, :, :]

    ndvi = (nir - red) / (nir + red)
    hop_ndvi_msk = np.ma.masked_where(cdl != 56, ndvi)

    # Returning mean
    mean_ndvi = float(hop_ndvi_msk.mean())

    print("Completed calculation")

    result_dict = {
        "GEOID": geoid,
        "mean_ndvi": mean_ndvi,
        "date": date.strftime("%Y-%m-%d"),
    }

    return result_dict
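
When imagery exists but no hops pixels are left unmasked, the function above relies on NumPy converting the masked mean to `nan`. The following is a minimal sketch of making that case explicit. The `masked_mean_or_nan` helper is hypothetical, and the arrays are synthetic:

```python
import numpy as np


def masked_mean_or_nan(values, keep):
    """Mean of `values` where `keep` is True, or NaN if nothing is kept."""
    masked = np.ma.masked_where(~np.asarray(keep), values)
    # count() is the number of unmasked elements.
    if masked.count() == 0:
        return float("nan")
    return float(masked.mean())


ndvi = np.array([[0.2, 0.4], [0.6, 0.8]])
cdl = np.array([[56, 1], [56, 1]])

with_hops = masked_mean_or_nan(ndvi, cdl == 56)     # mean of 0.2 and 0.6
without_hops = masked_mean_or_nan(ndvi, cdl == 99)  # no pixels match
```

Returning NaN explicitly keeps the result dictionary consistent with the no-imagery branch, so both cases are skipped in the same way when the weekly means are computed.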

Next we'll define our input arguments for our `Compute Function`:
1. Generate a list of dates between our start and end dates
2. Generate a tuple of (GEOID, date) for each GEOID in our Census Tracts GDF

In [None]:
date_list = pd.date_range(start_date, end_date).strftime("%Y-%m-%d").tolist()

In [None]:
params = [(geoid, date) for geoid in gdf["GEOID"].tolist() for date in date_list]
len(params)
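
As a quick check on the expected job count, the sketch below rebuilds the argument list with the notebook's date range. `pd.date_range` includes both endpoints by default, so the range covers 32 days. The second GEOID is a made-up placeholder, not one of the tracts in the input file:

```python
import pandas as pd

# Both endpoints are included: 31 days of October plus November 1.
date_list = pd.date_range("2022-10-01", "2022-11-01").strftime("%Y-%m-%d").tolist()

# The first GEOID appears earlier in the notebook; the second is a placeholder.
geoids = ["53077001702", "00000000000"]

# One (GEOID, date) tuple per job: len(geoids) * len(date_list) in total.
params = [(geoid, date) for geoid in geoids for date in date_list]
```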

Let's test a run of this function locally, before we submit to `Compute`:

In [None]:
return_ndvi(params[10])

### Creating Batch Compute Function
We can now create our `Function` object. Below we will pass several scaling parameters and `save` our `Function`:

In [None]:
async_func = Function(
    return_ndvi,
    name=func_name,
    image="python3.9:latest",
    cpus=1,
    memory=2,
    timeout=900,
    maximum_concurrency=50,
    retry_count=2,
    requirements=["geopandas"],
)
async_func.save()
print(f"Saved {async_func.id}")

Now we can submit `Job`s to our `Function`:

In [None]:
jobs = []
for param in params:
    job = async_func(param)
    jobs.append(job)
len(jobs)

### Waiting for Completion
Now that we've mapped our arguments to `Job`s, we wait for our `Function` to complete. We can block programmatically at the `Function` level (waiting on individual `Job`s is also possible):

In [None]:
async_func.wait_for_completion()

Alternatively, we can monitor progress at [app.descarteslabs.com/monitor](https://app.descarteslabs.com/monitor), embedded here:

In [None]:
from IPython.display import IFrame

IFrame("https://app.descarteslabs.com/monitor", width=1000, height=350)

### Retrieving Results
Once our `Function` has completed, we can retrieve our result dictionaries from `Storage Blob`s. The result blobs are located using the following identifiers:
* User Org
* User Hash
* Function ID
* Job ID

In [None]:
orgname = dl.auth.Auth().payload["org"]
user_hash = dl.auth.Auth().namespace

In [None]:
print(f"Results for {async_func.id}")
res_list = []
for b in (
    Blob.search()
    .filter(p.namespace == f"{orgname}:{user_hash}")
    .filter(p.name.startswith(async_func.id))
    .filter(p.storage_type == "compute")
):
    print(f"ID: {b.id}")
    res_list.append(json.loads(b.data()))

Voila!

In [None]:
res_df = pd.DataFrame(res_list)
fig, ax = plt.subplots()
res_df["date"] = pd.to_datetime(res_df["date"])

week_df = (
    res_df.groupby("GEOID")
    .resample("W-Mon", on="date")["mean_ndvi"]
    .mean()
    .reset_index()
)
for geoid, geoid_df in week_df.groupby("GEOID"):
    geoid_df.plot("date", "mean_ndvi", ax=ax, label=f"GEOID:{geoid}")
ax.set_title("NDVI");
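
To see how the weekly summary treats missing days, here is a minimal sketch on synthetic results. It uses `pd.Grouper` as a similar alternative to the `groupby(...).resample(...)` chain above. The GEOID `"A"` and the NDVI values are invented for illustration:

```python
import numpy as np
import pandas as pd

res_df = pd.DataFrame(
    {
        "GEOID": ["A", "A", "A"],
        "date": pd.to_datetime(["2022-10-04", "2022-10-05", "2022-10-11"]),
        "mean_ndvi": [0.2, np.nan, 0.6],  # NaN marks a day with no usable data
    }
)

# "W-MON" bins are weeks ending on Monday, labeled by that Monday.
week_df = (
    res_df.groupby(["GEOID", pd.Grouper(key="date", freq="W-MON")])["mean_ndvi"]
    .mean()  # NaN days are skipped rather than propagated
    .reset_index()
)
```

The first two rows fall in the week ending Monday 2022-10-10, and the NaN day is ignored by the mean. Unlike `resample`, grouping with `pd.Grouper` does not emit rows for weeks that contain no observations.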