# How to run Clay over custom AOIs

This notebook shows, in a few steps, how to run the Clay model for custom areas of interest (AOIs) and custom date ranges.

## Download and open global list of MGRS tiles

In [None]:
import os
from pathlib import Path

# The repo home is our working directory.
wd = Path.cwd().parent
os.chdir(wd)
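# Ensure data directories exist. parents=True also creates the top-level
# "data" folder if it is missing, which a plain mkdir would fail on.
Path("data/mgrs").mkdir(parents=True, exist_ok=True)
Path("data/chips").mkdir(parents=True, exist_ok=True)
Path("data/checkpoints").mkdir(parents=True, exist_ok=True)
Path("data/embeddings").mkdir(parents=True, exist_ok=True)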

In [None]:
import geopandas as gpd

!wget https://clay-mgrs-samples.s3.amazonaws.com/mgrs_full.fgb \
    -O data/mgrs/mgrs_full.fgb

In [None]:
mgrs = gpd.read_file("data/mgrs/mgrs_full.fgb")
mgrs

In [None]:
mgrs.crs

## Create a GeoPandas dataframe with the AOI

This example uses a string with a single polygon over the area around Puri, India.

In [None]:
aoi_src = """{
    "type": "FeatureCollection",
    "name": "puri",
    "crs": {
        "type": "name",
        "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" }
    },
    "features": [
        {
            "type": "Feature",
            "properties": {
                "Region": "Puri"
            },
            "geometry": {
                "type": "Polygon",
                "coordinates": [ [
                    [ 85.050328768992927, 19.494998302498729 ],
                    [ 85.169749958884921, 19.441346370958311 ],
                    [ 85.968028516820738, 19.799945705366934 ],
                    [ 86.029742823006544, 20.15322442052609 ],
                    [ 86.104280881127053, 20.516965603502392 ],
                    [ 85.404584916189307, 20.564248936237991 ],
                    [ 84.945334300027469, 20.339711117597215 ],
                    [ 84.816295296184407, 19.799945705366934 ],
                    [ 85.050328768992927, 19.494998302498729 ]
                ] ]
            }
        }
    ]
}"""
aoi = gpd.read_file(aoi_src, driver="GeoJSON")
aoi.geometry[0]

## Intersect the AOI with the MGRS tile layer

This selects the MGRS tiles that intersect with your AOI. Processing then happens for each of those MGRS tiles. This will most likely provide slightly more data than the AOI itself, as the data for the whole tile will be downloaded for each matched MGRS tile.

Store the intersected tiles in a file. It will be used by the `datacube.py` script.

In [None]:
mgrs_aoi = mgrs.overlay(aoi)
# Rename the "Name" column to lowercase so that the datacube script can
# pick up the MGRS tile name.
mgrs_aoi = mgrs_aoi.rename(columns={"Name": "name"})
mgrs_aoi.geometry[2]

In [None]:
mgrs_aoi.to_file("data/mgrs/mgrs_aoi.fgb")
mgrs_aoi

## Use the datacube.py script to download imagery

Each run of the datacube script takes an index as input, which is the index of the geometry within the input file. This is why the data is downloaded in a loop, with one run per tile.

A list of date ranges can be specified. The script will look for the least cloudy Sentinel-2 scene for each date range, and match Sentinel-1 dates near the identified Sentinel-2 dates.

The output folder can be specified as a local folder, or a bucket can be specified to upload the data to S3.
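The date range format can be checked before starting any downloads. The sketch below parses a comma-separated list of `start/end` pairs, as used for the `--dateranges` argument in this notebook. The helper name `parse_dateranges` is hypothetical and not part of `datacube.py`; how the script itself validates its input is not assumed here.

```python
from datetime import date


def parse_dateranges(spec):
    """Parse 'start/end,start/end' into a list of (start, end) date pairs.

    Hypothetical helper for illustration; not part of datacube.py.
    """
    ranges = []
    for chunk in spec.split(","):
        start_text, end_text = chunk.strip().split("/")
        start = date.fromisoformat(start_text)
        end = date.fromisoformat(end_text)
        if start > end:
            raise ValueError(f"Start date after end date in range: {chunk!r}")
        ranges.append((start, end))
    return ranges


ranges = parse_dateranges("2020-01-01/2020-04-01,2021-06-01/2021-09-15")
for start, end in ranges:
    # Print each range with its length in days.
    print(start, end, (end - start).days, "days")
```

A malformed date or a reversed range raises a `ValueError` here, which is cheaper to catch than a failed download run.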

In [None]:
# Print the help of the script to get a sense of the input parameters.
! python scripts/datacube.py --help

In the example below, we limit data volume to a pixel window of 1024x1024 pixels to speed up processing. With the subsetting, this should provide up to 4 image "chips" per MGRS tile and date range, depending on data availability. Remove the subset argument for a real use case, where all the data should be downloaded.
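The relation between the pixel window and the number of chips can be sketched as follows. Two assumptions are made that the text does not state explicitly: the subset string is ordered as `xmin,ymin,xmax,ymax`, and chips are 512x512 pixels (inferred from a 1024x1024 window yielding up to 4 chips). The function name `chips_in_window` is hypothetical.

```python
def chips_in_window(subset, chip_size=512):
    """Count whole chips that fit in a pixel window.

    Assumes the window is given as 'xmin,ymin,xmax,ymax' and that chips are
    square with side chip_size. Both are assumptions for illustration.
    """
    xmin, ymin, xmax, ymax = (int(value) for value in subset.split(","))
    width, height = xmax - xmin, ymax - ymin
    if width <= 0 or height <= 0:
        raise ValueError(f"Empty pixel window: {subset!r}")
    # Only complete chips are counted in each direction.
    return (width // chip_size) * (height // chip_size)


print(chips_in_window("1500,1500,2524,2524"))
```

For the window used in this notebook, this gives 4, the upper bound per tile and date range. Actual counts can be lower, depending on data availability.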

In [None]:
for index, row in mgrs_aoi.iterrows():
    print(index, row)
    ! python scripts/datacube.py \
        --sample data/mgrs/mgrs_aoi.fgb \
        --subset 1500,1500,2524,2524 \
        --localpath data/chips  \
        --index {index} \
        --dateranges 2020-01-01/2020-04-01,2021-06-01/2021-09-15
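Outside a notebook, the `!` shell escape is not available. The sketch below shows one way the same loop could be written with `subprocess`, using only the command line flags that appear in the cell above. The helper names `build_datacube_command` and `run_for_tiles` are hypothetical, and the example runs in dry-run mode so that it only prints the commands.

```python
import subprocess
import sys


def build_datacube_command(index, sample, localpath, dateranges, subset=None):
    """Assemble the argument list for one datacube.py run (hypothetical helper)."""
    cmd = [
        sys.executable,
        "scripts/datacube.py",
        "--sample", sample,
        "--localpath", localpath,
        "--index", str(index),
        "--dateranges", dateranges,
    ]
    # The subset window is optional; leave it out to download full tiles.
    if subset is not None:
        cmd += ["--subset", subset]
    return cmd


def run_for_tiles(indices, dry_run=True, **kwargs):
    """Run datacube.py once per tile index; only print commands on a dry run."""
    for index in indices:
        cmd = build_datacube_command(index, **kwargs)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop on the first failing tile.
            subprocess.run(cmd, check=True)


# Dry run for two placeholder indices; in the notebook, pass mgrs_aoi.index.
run_for_tiles(
    range(2),
    sample="data/mgrs/mgrs_aoi.fgb",
    localpath="data/chips",
    dateranges="2020-01-01/2020-04-01,2021-06-01/2021-09-15",
    subset="1500,1500,2524,2524",
)
```

Passing the arguments as a list avoids shell quoting issues, and `dry_run=False` would execute the downloads from the repository root.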

## Create the embeddings for each training chip

The checkpoints can be downloaded from Hugging Face.

In [None]:
! wget \
    "https://huggingface.co/made-with-clay/Clay/resolve/main/Clay_v0.1_epoch-24_val-loss-0.46.ckpt?download=true" \
    -O data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt
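
A failed download can leave an empty or very small file at the target path, so a quick check before running the model may save time. The sketch below tests only that the file exists and exceeds a minimum size. The helper name `checkpoint_looks_valid` and the 1 MB threshold are arbitrary assumptions, not values from the Clay project.

```python
from pathlib import Path


def checkpoint_looks_valid(path, min_bytes=1_000_000):
    """Return True if the file exists and is at least min_bytes in size.

    Hypothetical helper; the default threshold is an arbitrary assumption.
    """
    path = Path(path)
    return path.is_file() and path.stat().st_size >= min_bytes


ckpt = "data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt"
if checkpoint_looks_valid(ckpt):
    print("Checkpoint found:", ckpt)
else:
    print("Checkpoint missing or suspiciously small:", ckpt)
```

This does not verify the file contents; it only catches the most common download failures.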

After downloading the model weights, embeddings can be created with just one command.

In [None]:
! wandb disabled
! python trainer.py predict \
    --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \
    --trainer.precision=16-mixed \
    --data.data_dir=data/chips \
    --data.batch_size=2 \
    --data.num_workers=8