Preparing the dataset
===

<hr style="border: 2px solid gray">

# Step 1: find dates and locations of sandstorms

[NASA WorldView](https://worldview.earthdata.nasa.gov/?v=42.53496283204216,20.198112059324743,73.78467362462987,34.23809673633836&l=Reference_Labels_15m(hidden),Reference_Features_15m(hidden),Coastlines_15m(hidden),VIIRS_SNPP_CorrectedReflectance_TrueColor&lg=true&t=2022-01-21-T13%3A06%3A55Z)

![Screenshot from 2024-05-09 15-51-52.png](attachment:188bad92-58ec-4678-abc3-2350eaf76ae2.png)
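
One way to keep track of what this step produces is a small list of events, each a date plus a latitude/longitude bounding box. The sketch below is only an illustration: `SandstormEvent` and its fields are hypothetical names, and the single entry reuses the date from the WorldView link above and the bounding box used in Step 2.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SandstormEvent:
    """One sandstorm sighting (hypothetical helper, not part of any library)."""
    day: date
    ymin: float  # southern latitude, degrees
    xmin: float  # western longitude, degrees
    ymax: float  # northern latitude, degrees
    xmax: float  # eastern longitude, degrees

    def __post_init__(self):
        # Reject boxes that are empty, flipped, or outside the lat/lon range.
        if not (-90 <= self.ymin < self.ymax <= 90):
            raise ValueError(f'bad latitude range: {self.ymin}..{self.ymax}')
        if not (-180 <= self.xmin < self.xmax <= 180):
            raise ValueError(f'bad longitude range: {self.xmin}..{self.xmax}')

    @property
    def key(self) -> str:
        """Identifier combining the date and the box, usable as a file stem."""
        box = f'{self.ymin},{self.xmin},{self.ymax},{self.xmax}'
        return f'{self.day.isoformat()}_{box}'


# The one event taken from this notebook; more would be appended as they are found.
events = [
    SandstormEvent(date(2022, 1, 21), ymin=24, xmin=54, ymax=34, xmax=64),
]

for event in events:
    print(event.key)
```

Validating the box at construction time is a design choice here: a flipped latitude range is easy to introduce when copying coordinates by hand from a map viewer.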

# Step 2: download imagery via [GIBS API](https://nasa-gibs.github.io/gibs-api-docs/)

In [None]:
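import numpy as np
from skimage import io
from rastervision.core.box import Box


def get_img(layer_name: str, date: str | None, bbox: Box, height: int,
            width: int, crs: str) -> np.ndarray:
    """Download one image from the GIBS WMS endpoint as a NumPy array.

    ``bbox`` is serialized in the order in which the ``Box`` iterates, so that
    order must match the axis order the server expects for ``crs``. The
    endpoint below is the EPSG:4326 one, so ``crs`` should be ``'EPSG:4326'``.
    If ``date`` is None, the TIME parameter is omitted from the request.
    """
    bbox_str = ','.join(map(str, bbox))
    # Parameters of a WMS 1.3.0 GetMap request.
    args = dict(
        version='1.3.0',
        service='WMS',
        request='GetMap',
        format='image/png',
        STYLE='default',
        bbox=bbox_str,
        CRS=crs,
        HEIGHT=height,
        WIDTH=width,
        layers=layer_name,
    )
    if date is not None:
        args['TIME'] = date
    # Values are not URL-escaped here; they are assumed to be URL-safe.
    query_string = '&'.join(f'{k}={v}' for k, v in args.items())
    url = f'https://gibs.earthdata.nasa.gov/wms/epsg4326/best/wms.cgi?{query_string}'
    # skimage fetches the URL and decodes the PNG response into an array.
    arr = io.imread(url)
    return arr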
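
To see what request `get_img` sends without touching the network, the URL can be built and printed on its own. The sketch below is a variant, not the function above: `build_getmap_url` is a hypothetical helper, it takes the bounding box as a plain `(ymin, xmin, ymax, xmax)` tuple instead of a `Box`, and it uses `urllib.parse.urlencode` so that values containing special characters are escaped. Commas, colons, and slashes are marked as safe so the result stays readable.

```python
from urllib.parse import urlencode

# Same endpoint as in get_img above.
GIBS_WMS = 'https://gibs.earthdata.nasa.gov/wms/epsg4326/best/wms.cgi'


def build_getmap_url(layer_name, bbox, height, width, crs='EPSG:4326',
                     date=None):
    """Build a WMS 1.3.0 GetMap URL (hypothetical helper for inspection).

    bbox is assumed to be a (ymin, xmin, ymax, xmax) tuple in degrees.
    """
    ymin, xmin, ymax, xmax = bbox
    args = {
        'version': '1.3.0',
        'service': 'WMS',
        'request': 'GetMap',
        'format': 'image/png',
        'STYLE': 'default',
        'bbox': f'{ymin},{xmin},{ymax},{xmax}',
        'CRS': crs,
        'HEIGHT': height,
        'WIDTH': width,
        'layers': layer_name,
    }
    if date is not None:
        args['TIME'] = date
    # Keep ',', ':' and '/' unescaped; everything else unsafe is percent-encoded.
    query = urlencode(args, safe=',:/')
    return f'{GIBS_WMS}?{query}'


url = build_getmap_url(
    'VIIRS_SNPP_CorrectedReflectance_TrueColor',
    (24, 54, 34, 64),
    height=1200,
    width=1200,
    date='2022-01-21',
)
print(url)
```

Pasting the printed URL into a browser is a quick way to check the layer name, date, and bounding box before downloading in bulk.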

In [None]:
layer_name = 'VIIRS_SNPP_CorrectedReflectance_TrueColor'
bbox = Box(ymin=24, xmin=54, ymax=34, xmax=64)
date = '2022-01-21'

arr = get_img(
    layer_name=layer_name,
    date=date,
    bbox=bbox,
    height=1200,
    width=1200,
    crs='EPSG:4326',
)

In [None]:
from matplotlib import pyplot as plt

plt.imshow(arr)
plt.show()

In [None]:
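from os.path import join
from rastervision.pipeline.file_system.utils import make_dir
from rastervision.core.data.utils import write_bbox

# Name the file after the date and bounding box so each download gets its own path.
bbox_str = ','.join(map(str, bbox))
tiff_path = join('data/gibs/img/', f'{date}_{bbox_str}.tif')
# Create the parent directory of tiff_path if it does not exist yet.
make_dir(tiff_path, use_dirname=True)
# Save the array as a GeoTIFF georeferenced to bbox in EPSG:4326.
write_bbox(tiff_path, arr, bbox, crs_wkt='epsg:4326')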
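
Georeferencing the array amounts to a mapping from pixel indices to coordinates, fixed by the bounding box and the image size. The sketch below reproduces that arithmetic in plain Python under stated assumptions: a north-up image in which row 0 is the northern edge, and a `(ymin, xmin, ymax, xmax)` tuple in degrees. `pixel_size_deg` and `pixel_to_lonlat` are hypothetical helpers, not rastervision or rasterio functions.

```python
def pixel_size_deg(bbox, height, width):
    """Return (x_res, y_res): degrees of longitude/latitude per pixel."""
    ymin, xmin, ymax, xmax = bbox
    return (xmax - xmin) / width, (ymax - ymin) / height


def pixel_to_lonlat(row, col, bbox, height, width):
    """Return (lon, lat) of the center of pixel (row, col).

    Assumes a north-up image: rows increase southward, columns eastward.
    """
    ymin, xmin, ymax, xmax = bbox
    x_res, y_res = pixel_size_deg(bbox, height, width)
    lon = xmin + (col + 0.5) * x_res
    lat = ymax - (row + 0.5) * y_res
    return lon, lat


# Values used in this notebook: a 10 x 10 degree box rendered at 1200 x 1200 px.
bbox_tuple = (24, 54, 34, 64)
x_res, y_res = pixel_size_deg(bbox_tuple, height=1200, width=1200)
print(f'pixel size: {x_res:.5f} x {y_res:.5f} degrees')
print('top-left pixel center:', pixel_to_lonlat(0, 0, bbox_tuple, 1200, 1200))
```

The same numbers make it easy to sanity-check a written GeoTIFF: the pixel size reported by a GIS tool should match `x_res` and `y_res` for the chosen box and image size.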

# Step 3: label data

> Become one with the data

― Andrej Karpathy, [_A Recipe for Training Neural Networks_](https://karpathy.github.io/2019/04/25/recipe/)

![Screenshot from 2024-05-10 10-43-12.png](attachment:a2f4dbd7-6754-47eb-a599-fee569b07663.png)

![Screenshot from 2024-05-09 15-58-40.png](attachment:9f13014e-85e2-42b4-83f7-53bfe7fa19f3.png)

![Screenshot from 2024-05-09 16-01-24.png](attachment:38c57afc-2f0d-471f-8dd8-ddc2c3ec02cd.png)

## Training set

![centroids_train_labels.png](attachment:feabbba5-bbb7-4e07-8b19-ed915dbc16f3.png)

![dist_train_labels.png](attachment:d88a5c4e-2252-448b-b357-2306bbffe504.png)

## Test set

![centroids_test_labels.png](attachment:20805153-a099-4e7f-9dd1-f251ab3ecdec.png)

![dist_test_labels.png](attachment:b85b5403-b124-49d2-9c83-b983a6a94992.png)