# Deploying a Vessel Detection Pipeline with Batch Compute and Scheduled Events
__________________
#### _Objective:_
Demonstrate how the Platform makes it easy to deploy low-latency, event-driven image analysis at scale. Here we deploy a sample scheduled pipeline that analyzes open-access [Sentinel-2 data](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-2) with [an open-source ship detection model hosted by geoai](https://github.com/opengeos/geoai).

This pipeline will be deployed over [Alang, Gujarat, the largest shipbreaking yard in the world](https://www.chemistryworld.com/features/the-toxic-tide-of-ship-breaking/4015158.article).

#### _What will we cover?_
* Searching [`Catalog`](https://docs.descarteslabs.com/descarteslabs/catalog/readme.html) to retrieve imagery and input into a [pre-trained AI model](https://github.com/opengeos/geoai/blob/main/docs/examples/ship_detection.ipynb)
* Creating a [`Function`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Function) which will trigger nightly at midnight US Eastern Time and write detected vessels alongside other metadata into a [`Table`](https://docs.descarteslabs.com/descarteslabs/vector/readme.html)
* Defining an [`EventSubscription`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/event_subscription.html#descarteslabs.catalog.EventSubscription) which will invoke the function nightly

#### Requirements
Note that to use the ship detection model in this example, we must first install the [`geoai` library](https://github.com/opengeos/geoai/tree/main), which contains a series of open-source pre-trained AI models:

In [None]:
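# The library is imported as `geoai` but installed as `geoai-py` (the same name used in the Function requirements below)
# %pip install geoai-py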

In [None]:
import geoai

In [None]:
import descarteslabs as dl
from descarteslabs.catalog import (
    EventSchedule,
    ScheduledEventSubscription,
    EventSubscriptionComputeTarget,
    Placeholder,
    Product, 
    properties as p
)
from descarteslabs.compute import Function
from descarteslabs.vector import Table

In [None]:
import json
import os
import rasterio
import sys

import numpy as np
import geopandas as gpd
import pandas as pd

from datetime import timedelta
from dateutil import parser
import matplotlib.pyplot as plt

from rasterio.plot import reshape_as_image
from rasterio.transform import Affine
from shapely import wkt

_**Note:** For brevity, this helper function is imported. For reference, please see [utils.py](utils.py)_

In [None]:
from utils import create_s2_table

Setting global variables:

In [None]:
# For Batch Compute
major = sys.version_info.major
minor = sys.version_info.minor

In [None]:
auth = dl.auth.Auth.get_default_auth()
org = auth.payload['org']
user_hash = auth.namespace

The input Vector [`Table`](https://docs.descarteslabs.com/descarteslabs/vector/readme.html#descarteslabs.vector.Table) ID to write results to:

In [None]:
tid = f"geoai-vessel-detections-demo-table:{user_hash}"
tid = create_s2_table(tid)

An input AOI, such as a [well-known text polygon](https://shapely.readthedocs.io/en/2.0.4/manual.html#well-known-formats):

In [None]:
alang_wkt = '''POLYGON ((72.1633092891017 21.376538520693956, 72.17463606558397 21.395910851691355,72.18566940225975 21.402167375179133,
72.1952942278717 21.399271375754935, 72.18916133594212 21.37940764547419, 72.18602153002692 21.372849563342783, 72.16612631123255 21.374133877561107, 
72.1633092891017 21.376538520693956))'''
alang = wkt.loads(alang_wkt)
alang
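
As a quick sanity check on the AOI, the sketch below parses the same WKT string with the standard library only, confirms that the ring is closed, and estimates the bounding-box size in kilometres. The helper names (`wkt_polygon_coords`, `bbox_size_km`) are hypothetical, and the 111.32 km-per-degree factor is a rough approximation rather than a geodesic calculation.

```python
import math
import re

# AOI copied from the cell above (longitude latitude pairs, EPSG:4326).
ALANG_WKT = """POLYGON ((72.1633092891017 21.376538520693956, 72.17463606558397 21.395910851691355,
72.18566940225975 21.402167375179133, 72.1952942278717 21.399271375754935,
72.18916133594212 21.37940764547419, 72.18602153002692 21.372849563342783,
72.16612631123255 21.374133877561107, 72.1633092891017 21.376538520693956))"""


def wkt_polygon_coords(wkt_text):
    """Extract (lon, lat) pairs from a single-ring WKT polygon (hypothetical helper)."""
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", wkt_text)]
    return list(zip(numbers[0::2], numbers[1::2]))


def bbox_size_km(coords):
    """Approximate bounding-box width and height in km (rough spherical estimate)."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    mid_lat = math.radians((min(lats) + max(lats)) / 2)
    km_per_degree = 111.32  # approximate length of one degree of latitude
    width = (max(lons) - min(lons)) * km_per_degree * math.cos(mid_lat)
    height = (max(lats) - min(lats)) * km_per_degree
    return width, height


coords = wkt_polygon_coords(ALANG_WKT)
ring_is_closed = coords[0] == coords[-1]
width_km, height_km = bbox_size_km(coords)
print(f"closed ring: {ring_is_closed}, bbox ~ {width_km:.1f} km x {height_km:.1f} km")
```

An AOI of only a few kilometres per side keeps each nightly mosaic small, which helps the download and inference steps stay fast.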

And lastly, a sample date to monitor:

In [None]:
end = "2025-01-02"
start = parser.parse(end) - timedelta(days=1)
print(f"Searching imagery from {start.strftime('%Y-%m-%d')} to {end}")

## Methodology
Below, we can iterate on the methodology we want to deploy:
* Search and download Sentinel-2 imagery as a GeoTIFF
* Instantiate our AI model
* Vectorize the thresholded results

In [None]:
sentinel2 = dl.catalog.Product.get("esa:sentinel-2:l2a:v1")

ic = (
    sentinel2.images()
    .intersects(alang)
    .filter(start < p.acquired < end)
    .filter(p.cloud_fraction < 0.3)
).collect()

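# Make sure the output directory exists before writing the mosaic
os.makedirs("data", exist_ok=True)
ic.download_mosaic(['red', 'green', 'blue'], dest="data/rgb.tif")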

And inspect our scene:

In [None]:
with rasterio.open("data/rgb.tif", "r") as in_ds:
    plt.imshow(reshape_as_image(in_ds.read()))

Here, we import the [geoai Ship Detection model](https://github.com/opengeos/geoai/blob/main/docs/examples/ship_detection.ipynb): 

In [None]:
detector = geoai.ShipDetector()

And infer on our input image:

In [None]:
masks_path = detector.generate_masks(
    "data/rgb.tif",
    output_path="data/msk_outputs.tif",
    confidence_threshold=0.8,
    mask_threshold=0.7,
    overlap=0.25,
    chip_size=(256, 256),
    batch_size=4
)
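
To build intuition for `chip_size`, `overlap`, and `batch_size`, the sketch below counts how many chips a sliding window would produce for a given image size. It assumes that `overlap` is a fraction of the chip size and that the stride is `chip * (1 - overlap)`; this is an illustrative model only and may not match how `geoai` tiles images internally. The function names are hypothetical.

```python
import math


def chip_origins(size, chip, overlap):
    """Start offsets of chips along one axis, assuming stride = chip * (1 - overlap)."""
    stride = max(1, int(chip * (1 - overlap)))
    origins = list(range(0, max(size - chip, 0) + 1, stride))
    # Add a final chip flush with the edge if the regular grid leaves a gap
    if origins[-1] + chip < size:
        origins.append(size - chip)
    return origins


def count_chips(width, height, chip_size=(256, 256), overlap=0.25):
    """Total number of chips covering a width x height image."""
    cols = chip_origins(width, chip_size[0], overlap)
    rows = chip_origins(height, chip_size[1], overlap)
    return len(cols) * len(rows)


n_chips = count_chips(512, 512)
n_batches = math.ceil(n_chips / 4)  # batch_size=4, as in the call above
print(f"{n_chips} chips in {n_batches} batches")
```

Under these assumptions, a larger overlap raises the chip count (and inference time), while a larger batch size reduces the number of forward passes at the cost of memory.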

Finally, vectorize our predicted feature masks:

In [None]:
gdf = detector.vectorize_masks(
    "data/msk_outputs.tif",
    output_path="data/ships_masks.geojson",
    confidence_threshold=0.8,
    min_object_area=10,
    max_object_size=10000,
)
gdf.plot()

## Scaling with Batch Compute
Here we define a local function to send to our compute service which:
* Accepts a date as the input argument
* Pulls down any newly acquired image data and applies the vessel detection methodology
* Writes the output GeoDataFrame to a table as [`Feature`](https://docs.descarteslabs.com/descarteslabs/vector/readme.html#descarteslabs.vector.Feature)s

In [None]:
def s2_vessel_deployment(end, write_to_table=True):
    import descarteslabs as dl
    from descarteslabs.vector import Table
    from descarteslabs.catalog import Product, properties as p
    import os
    import geoai
    import datetime
    import json
    import numpy as np
    import geopandas as gpd
    from dateutil import parser
    from shapely import wkt

    auth = dl.auth.Auth.get_default_auth()
    org = auth.payload['org']
    user_hash = auth.namespace
    
    end = parser.parse(end)
    start = end - datetime.timedelta(days=1)
    print(f"Searching imagery from {start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}")
    alang_wkt = '''POLYGON ((72.1633092891017 21.376538520693956, 72.17463606558397 21.395910851691355, 72.18566940225975
    21.402167375179133, 72.1952942278717 21.399271375754935, 72.18916133594212 21.37940764547419, 72.18602153002692 21.372849563342783,
    72.16612631123255 21.374133877561107, 72.1633092891017 21.376538520693956))'''
    alang = wkt.loads(alang_wkt)
    
    sentinel2 = dl.catalog.Product.get("esa:sentinel-2:l2a:v1")
    tid = f"{org or user_hash}:geoai-vessel-detections-demo-table:{user_hash}"
    
    ic = (
        sentinel2.images()
        .intersects(alang)
        .filter(start < p.acquired < end)
        .filter(p.cloud_fraction < 0.3)
    ).collect()
    print(f"Found {len(ic)} images today")
    if len(ic) == 0:
        # Nothing was acquired in this window, so there is nothing to process
        print("No images today")
        return None
    
    ic.download_mosaic(
        ["red", "green", "blue"],
        dest="rgb.tif"
    )
    print("Downloaded image")
    detector = geoai.ShipDetector()
    print("Downloaded model")
    masks_path = detector.generate_masks(
        "rgb.tif",
        output_path="msk_outputs.tif",
        confidence_threshold=0.8,
        mask_threshold=0.7, overlap=0.25,
        chip_size=(256, 256),
        batch_size=4
    )
    print("Completed feature masks")
    gdf = detector.vectorize_masks(
        "msk_outputs.tif",
        output_path="ships_masks.geojson",
        confidence_threshold=0.8,
        min_object_area=10,
        max_object_size=10000,
    ).to_crs(4326)
    print(f"Vectorized {len(gdf)} ships today")
    gdf['date']=end.strftime("%Y-%m-%d")
    if write_to_table:
        gdf = Table.get(tid).add(gdf[['date', 'confidence', 'geometry']])
    # Cleaning up
    os.remove("rgb.tif")
    os.remove("msk_outputs.tif")
    os.remove("ships_masks.geojson")
    print("Complete:)")
    return gdf.to_json()

Next, we'll test the methodology out locally (without writing the output rows to the table):

In [None]:
s2_vessel_deployment(end, write_to_table=False)

And lastly submit our compute function alongside several scaling parameters, such as:
* Number of **CPUs**
* **Memory** allocated to each job
* **Max Concurrency** of running jobs
* **Timeout**, the maximum time in seconds a job may run before it is stopped
* **Requirements** for extra packages to install, such as `geoai`

_**Note:** Some dependencies can make the environment slow to build or difficult to solve. If you run into trouble building your Compute Function, try pinning package versions, as in the example below._

In [None]:
async_func = Function(
    s2_vessel_deployment,
    name="Test Vessel Detector S2 Pipeline",
    image=f"python{major}.{minor}:latest",
    cpus=4,
    memory=8,
    maximum_concurrency=20,
    timeout=300,
    retry_count=0,
    requirements=[
        'geoai-py',
        'torch==2.6.0'
    ]
)
async_func.save()
print(f"Created: {async_func.id}")
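
One way to pin versions is to read them from the environment where the methodology was tested locally. The sketch below uses only the standard library; `pin_requirements` is a hypothetical helper, and packages that are not installed locally are left unpinned rather than guessed.

```python
from importlib import metadata


def pin_requirements(packages):
    """Return 'name==version' for installed packages, or the bare name otherwise."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed locally: leave it unpinned instead of inventing a version
            pinned.append(name)
    return pinned


# The result could be passed as the `requirements` argument of a Function
print(pin_requirements(["geoai-py", "torch"]))
```

Pinning to the locally tested versions keeps the remote environment close to the one in which the function was validated.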

Here, we'll submit our first [`Job`](https://docs.descarteslabs.com/descarteslabs/compute/readme.html#descarteslabs.compute.Job) to get things started:

In [None]:
job = async_func(end)
job

And pass a few more dates to generate a "backlog":

In [None]:
ic = (
    sentinel2.images()
    .intersects(alang)
    .filter("2025-01-02" < p.acquired < "2025-03-01")
    .filter(p.cloud_fraction < 0.3)
).collect()
date_list = list(set(ic.each.acquired.strftime("%Y-%m-%d")))
len(date_list)

In [None]:
jobs = async_func.map([(d,) for d in date_list])
len(jobs)
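
The backlog logic reduces to two steps: collapse acquisition timestamps to unique dates, then wrap each date in a one-element tuple so that it is passed as a single positional argument per job. A minimal standard-library sketch follows; the timestamps are made up for illustration and the helper names are hypothetical. Unlike `list(set(...))`, it sorts the dates so that the submission order is reproducible.

```python
from datetime import datetime


def unique_dates(acquired_times):
    """Collapse timestamps to sorted, unique YYYY-MM-DD strings."""
    return sorted({t.strftime("%Y-%m-%d") for t in acquired_times})


def as_job_args(dates):
    """Wrap each date in a tuple: one positional argument per job."""
    return [(d,) for d in dates]


# Hypothetical acquisition times: two scenes on the same day, one on another day
acquired = [
    datetime(2025, 1, 5, 5, 40),
    datetime(2025, 1, 5, 5, 41),
    datetime(2025, 1, 10, 5, 40),
]
job_args = as_job_args(unique_dates(acquired))
print(job_args)
```

Deduplicating first matters because several scenes acquired on the same day would otherwise trigger redundant jobs over the same search window.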

### Tracking Running Functions
Navigate to [app.descarteslabs.com/compute](https://app.descarteslabs.com/compute) to track your function's build progress and any running jobs. We will refer to this page for the remainder of the notebook.

## Scheduling Events
Now that a function is defined with predefined inputs, we can set up an [`EventSubscription`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/event_subscription.html#descarteslabs.catalog.EventSubscription) which can be configured to fire at user-specified intervals of time. Here, we will:
* Create an [`EventSchedule`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/event_schedule.html#descarteslabs.catalog.EventSchedule) which defines a nightly schedule at midnight US Eastern Time
* Create a [`ScheduledEventSubscription`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/event_subscription.html#descarteslabs.catalog.ScheduledEventSubscription) which passes these nightly triggers to our running function

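First, we'll clear any old schedules and subscriptions left over from previous runs of this notebook. Note that the cell below deletes every schedule and scheduled subscription you own, not only those created here: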

In [None]:
for schedule in EventSchedule.search().filter(p.owners==f"user:{user_hash}").collect():
    print(f"Deleting old {schedule.id}")
    schedule.delete()
for subscription in (ScheduledEventSubscription.search()
                     .filter(p.owners==f"user:{user_hash}")
                     .filter(p.event_type=="scheduled")
                    ).collect():
    print(f"Deleting old {subscription.id}")
    subscription.delete()

Next, we'll create an [`EventSchedule`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/event_schedule.html#descarteslabs.catalog.EventSchedule), which accepts:
* **Schedule**, a [cron or rate expression](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-scheduled-rule-pattern.html)
* **Schedule Timezone**, the timezone in which a cron expression is evaluated

In [None]:
# Create a schedule that fires daily at midnight New York time
schedule = EventSchedule(
    name="schedule_daily_s2_geoai",
    namespace=f"{org or user_hash}:demo-scheduler",
    schedule="cron(0 0 * * ? *)",
    schedule_timezone="America/New_York",
)
schedule.save()
print(f"Saved Schedule: {schedule.id}")
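
The cron expression above follows the six-field format described in the linked EventBridge documentation: minutes, hours, day of month, month, day of week, and year. The sketch below splits an expression into named fields to make that layout explicit; `parse_cron` is a hypothetical helper for illustration and does not validate the field values.

```python
import re

# Field order used by six-field cron expressions of the form cron(...)
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_cron(expression):
    """Split 'cron(a b c d e f)' into a dict of named fields."""
    match = re.fullmatch(r"cron\((.+)\)", expression.strip())
    if match is None:
        raise ValueError(f"Not a cron(...) expression: {expression!r}")
    parts = match.group(1).split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


fields = parse_cron("cron(0 0 * * ? *)")
print(fields)
```

Here minute `0` and hour `0` select midnight, and the `?` in the day-of-week position means no specific value, so the schedule fires every day.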

In [None]:
subscription = ScheduledEventSubscription(
    schedule.id,
    name="s2_nightly_event_schedule",
    targets=[
        EventSubscriptionComputeTarget(
            async_func.id,
            Placeholder("event.detail.scheduled_time"),
            title="Nightly S2 Image Search",
        )
    ]
)
subscription.save()
print(f"Saved Subscription: {subscription.id}")
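
Each nightly event passes its scheduled time to the function as the `end` argument, and the function searches the preceding day. The sketch below mirrors that window logic with the standard library. It assumes the placeholder resolves to an ISO 8601 timestamp string, which is an assumption and not confirmed here; the example timestamp and the `search_window` helper are hypothetical.

```python
from datetime import datetime, timedelta


def search_window(scheduled_time, days=1):
    """Return (start, end) datetimes for the search window ending at scheduled_time."""
    # Older Python versions do not accept a trailing 'Z', so normalise it first
    window_end = datetime.fromisoformat(scheduled_time.replace("Z", "+00:00"))
    window_start = window_end - timedelta(days=days)
    return window_start, window_end


# Midnight in New York during winter corresponds to 05:00 UTC
window_start, window_end = search_window("2025-01-03T05:00:00Z")
print(f"Searching imagery from {window_start.isoformat()} to {window_end.isoformat()}")
```

Because the window ends at the scheduled time, each run only picks up imagery acquired since the previous run.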

## Deploying the Scheduled Event
Now, we have effectively deployed our nightly scheduled pipeline. To check on your running jobs, you can view them in the [Compute Monitor](https://app.descarteslabs.com/compute):

In [None]:
print(len(async_func.jobs.collect()))

Or we could wait on them programmatically:

    async_func.wait_for_completion()

Once complete, we can collect the resulting table's data as a GeoDataFrame and plot it:

In [None]:
res_gdf = Table.get(tid).collect()
res_gdf.plot(column='date', legend=True, legend_kwds={'loc': 'upper left'}, figsize=(10,10))

Or save for export:

In [None]:
# Export the detections collected from the table (not the local test result)
res_gdf.to_parquet("data/ship_detections.geoparquet")

### Cleaning Up
_Is always best practice!_

In [None]:
schedule.delete()
subscription.delete()

In [None]:
async_func.delete_jobs(delete_results=True)
async_func.delete()

In [None]:
os.remove("data/rgb.tif")
os.remove("data/msk_outputs.tif")
os.remove("data/ships_masks.geojson")
os.remove("data/ship_detections.geoparquet")