# How to run clay over custom AOIs

This notebook shows, in a few simple steps, how to run the Clay model for custom areas of interest (AOIs) and custom date ranges.

## Download and open global list of MGRS tiles

In [None]:
import os
from pathlib import Path

# The repo home is our working directory.
wd = Path.cwd().parent
os.chdir(wd)
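# Ensure data directories exist. parents=True also creates "data" if it is missing.
Path("data/mgrs").mkdir(parents=True, exist_ok=True)
Path("data/chips").mkdir(parents=True, exist_ok=True)
Path("data/checkpoints").mkdir(parents=True, exist_ok=True)
Path("data/embeddings").mkdir(parents=True, exist_ok=True)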

In [None]:
# get the MGRS grid
mgrs_file = Path("data/mgrs/mgrs_full.fgb")
if not mgrs_file.exists():
    print("Downloading MGRS grid...")
    !wget https://clay-mgrs-samples.s3.amazonaws.com/mgrs_full.fgb \
        -O data/mgrs/mgrs_full.fgb
    print("Done.")

In [None]:
import geopandas as gpd

mgrs = gpd.read_file("data/mgrs/mgrs_full.fgb")
print(f"Loaded {len(mgrs)} MGRS grid cells.")

## Create a Geopandas dataframe with AOI

> You can specify a GeoJSON of a Point or Polygon.

This example uses a GeoJSON string with a point in Puri, India.

In [None]:
aoi_src = """{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "coordinates": [
          85.82371323098533,
          19.80224809538211
        ],
        "type": "Point"
      }
    }
  ]
}"""
aoi = gpd.read_file(aoi_src, driver="GeoJSON")

## Intersect the AOI with the MGRS tile layer

This selects the MGRS tiles that intersect with your AOI. Each intersecting tile is downloaded in full, and adjacent MGRS tiles overlap in most cases, so you will also download some data that lies outside your AOI.

Store the intersected tiles in a file. It will be used by the `datacube.py` script.

In [None]:
mgrs_aoi = mgrs.sjoin(aoi)
mgrs_aoi = mgrs_aoi.rename(columns={"Name": "name"}).reset_index(drop=True)
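
Because adjacent tiles overlap, a single point can match more than one tile. The sketch below illustrates this with plain Python and bounding boxes. It is a simplified stand-in for the spatial join, not the geopandas call itself, and the tile names and coordinates are made up for illustration.

```python
# Hypothetical tiles as (name, min_x, min_y, max_x, max_y).
# The values are invented so that neighbouring tiles overlap by one unit.
tiles = [
    ("tile_a", 0.0, 0.0, 10.0, 10.0),
    ("tile_b", 9.0, 0.0, 19.0, 10.0),
    ("tile_c", 18.0, 0.0, 28.0, 10.0),
]


def tiles_containing(point, tiles):
    """Return the names of all tiles whose bounding box contains the point."""
    x, y = point
    return [
        name
        for name, min_x, min_y, max_x, max_y in tiles
        if min_x <= x <= max_x and min_y <= y <= max_y
    ]


# A point in the overlap zone matches two tiles, so both would be downloaded.
print(tiles_containing((9.5, 5.0), tiles))  # ['tile_a', 'tile_b']
# A point away from the tile edge matches a single tile.
print(tiles_containing((5.0, 5.0), tiles))  # ['tile_a']
```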

In [None]:
import contextily as ctx
import matplotlib.pyplot as plt

# Print the number of intersecting tiles and compare the AOI and tile areas.
# Areas in EPSG:3857 are distorted away from the equator, so treat them as rough.
print(f"Number of MGRS tiles covering AOI: {len(mgrs_aoi)}")
aoi_area = aoi.to_crs(epsg=3857).area[0] / 1e6
mgrs_area = mgrs_aoi.to_crs(epsg=3857).area.sum() / 1e6
print(f"Area of AOI: {aoi_area:.2f} sq km")
print(f"Area of MGRS tiles covering AOI: {mgrs_area:.2f} sq km")
# A Point AOI has zero area, so only report the ratio for polygons.
if aoi_area > 0:
    print(f"MGRS tile area relative to AOI area: {100 * mgrs_area / aoi_area:.2f}%")


fig, ax = plt.subplots(figsize=(10, 10))
mgrs_aoi.to_crs(epsg=3857).plot(ax=ax, color="blue", alpha=0.5, label="MGRS")
for idx, row in mgrs_aoi.to_crs(epsg=3857).iterrows():
    x = float(row.geometry.centroid.x)
    y = float(row.geometry.centroid.y)
    ax.annotate(text=row["name"], xy=(x, y), color="white")
aoi.to_crs(epsg=3857).plot(ax=ax, color="red", alpha=0.5, label="AOI")
# Take the centroid of the first AOI feature explicitly instead of calling float() on a Series.
aoi_centroid = aoi.to_crs(epsg=3857).geometry.centroid.iloc[0]
x, y = aoi_centroid.x, aoi_centroid.y
ax.annotate(text="AOI", xy=(x, y), color="white")
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)
ax.set_axis_off()
plt.show()

In [None]:
mgrs_aoi.to_file("data/mgrs/mgrs_aoi.fgb")
mgrs_aoi.head()

## Use the datacube.py script to download imagery

Each run of the datacube script takes an index as input, which is the index of a geometry within the input file. This is why the data is downloaded in a loop, one tile per run.

A list of date ranges can be specified. The script will look for the least cloudy Sentinel-2 scene for each date range, and match Sentinel-1 dates near the identified Sentinel-2 dates.
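
The `--dateranges` value used in this notebook is a comma-separated list of `start/end` pairs. As a minimal sketch of that format, the helper below splits such a string into date pairs. `parse_dateranges` is a hypothetical name used for illustration only. It is not taken from `datacube.py`, which may parse the argument differently.

```python
from datetime import date


def parse_dateranges(value):
    """Split 'start/end,start/end' into a list of (start, end) date pairs."""
    ranges = []
    for item in value.split(","):
        start_str, end_str = item.strip().split("/")
        start = date.fromisoformat(start_str)
        end = date.fromisoformat(end_str)
        if start > end:
            raise ValueError(f"Start date is after end date in {item!r}")
        ranges.append((start, end))
    return ranges


ranges = parse_dateranges("2018-01-01/2018-04-01,2023-06-01/2023-12-15")
for start, end in ranges:
    # Print each range with its length in days.
    print(start, end, (end - start).days)
```

Checking the ranges before starting the download loop catches typos such as a swapped start and end date early.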

The output folder can be specified as a local folder, or a bucket can be specified to upload the data to S3.

Note: You may need to set up a Planetary Computer access key. Get it on the [PC developer portal](https://planetarycomputer.developer.azure-api.net/) and set it with:
```bash
planetarycomputer configure
```

In [None]:
# Print the help of the script to get a sense of the input parameters.
! python scripts/datacube.py --help

To speed up a trial run, you can limit the data volume to a pixel window of 1024x1024 pixels with the `--subset` argument, which is commented out in the example below. With the subsetting, this should provide up to 4 image "chips" per MGRS tile and date range, depending on data availability. Leave the subset argument out for a real use case, where all the data should be downloaded.

In [None]:
for index, row in mgrs_aoi.iterrows():
    print(index, row)
    ! python scripts/datacube.py \
        --sample data/mgrs/mgrs_aoi.fgb \
        --localpath data/chips  \
        --index {index} \
        --dateranges "2018-01-01/2018-04-01,2023-06-01/2023-12-15"
    # For a quick test, append `--subset 1500,1500,2524,2524` to the command above.
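
The arithmetic behind "up to 4 chips" can be sketched as follows. This assumes the subset string is ordered as `min_x,min_y,max_x,max_y` in pixels and that chips are 512x512 pixels. Both are assumptions inferred from the numbers in this notebook, not confirmed from `datacube.py`, and `chips_in_window` is a hypothetical helper.

```python
def chips_in_window(subset, chip_size=512):
    """Count the whole chips that fit in a pixel window.

    Assumes subset is 'min_x,min_y,max_x,max_y' and square chips of
    chip_size pixels. Both are assumptions made for this illustration.
    """
    min_x, min_y, max_x, max_y = (int(v) for v in subset.split(","))
    width = max_x - min_x
    height = max_y - min_y
    return (width // chip_size) * (height // chip_size)


# The window 1500..2524 is 1024 pixels wide and high: 2 x 2 chips.
print(chips_in_window("1500,1500,2524,2524"))  # 4
```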

## Create the embeddings for each training chip

The checkpoints can be downloaded from Hugging Face.

In [None]:
model_file = Path("data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt")
model_url = "https://huggingface.co/made-with-clay/Clay/resolve/main/Clay_v0.1_epoch-24_val-loss-0.46.ckpt"
if not model_file.exists():
    print("Downloading model...")
    !wget $model_url \
        -O $model_file
    print("Done.")

After downloading the model weights, embeddings can be created with just one command.

In [None]:
! wandb disabled
! python trainer.py predict \
    --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \
    --trainer.precision=16-mixed \
    --data.data_dir=data/chips

You should see your embedding files in the folder `data/embeddings`.

> Note that `chips` are stored one per file (many per MGRS tile), but `embeddings` are stored in a single file per MGRS tile. See [Documentation](https://clay-foundation.github.io/model/model_embeddings.html).
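
As a quick sanity check, you can count the files in both output folders. This is a minimal sketch that counts every regular file regardless of extension, since the exact file names and formats are not assumed here. `count_files` is a hypothetical helper, not part of the Clay code base.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files below a folder; return 0 if the folder is missing."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for path in folder.rglob("*") if path.is_file())


n_chips = count_files("data/chips")
n_embeddings = count_files("data/embeddings")
print(f"{n_chips} chip files, {n_embeddings} embedding files")
```

Given the storage layout described above, expect far fewer embedding files than chip files.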
