# South Sudan Data Layers

This notebook prepares the data layers for South Sudan. The layers are processed and then uploaded: raster layers to an S3 bucket and vector layers to Mapbox.

## Data Hierarchy

The data hierarchy is documented in [this spreadsheet](https://docs.google.com/spreadsheets/d/1RdJCjygAiWu2zBMGRF0ayigzrA2WhaObWMNdlkllgSQ/edit?usp=sharing).

## Data Access
**Input Data**

The input data, sourced from GMV, is stored in the following Google Cloud Storage bucket:
- https://console.cloud.google.com/storage/browser/wbhydross_deliverables

**Output Data**
- Raster layers: S3 bucket
- Vector layers: Mapbox


## Setup

### Library import


In [1]:
# imports
import os
import sys
from pathlib import Path
from pprint import pprint

# Add the local source directories to the import path so the project modules below can be imported
sys.path.append("../src/")
sys.path.append("../src/datasets")
sys.path.append("../src/helpers")
sys.path.append("../src/datasets/factory")

from datasets.datasets import dataset_database
from datasets.processing import LayerProcessing
from helpers.mapbox_uploader import upload_to_mapbox
from helpers.s3_uploader import upload_directory_to_s3
from helpers.settings import get_settings

In [2]:
# Load settings with environment variables
settings = get_settings()
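
The `get_settings` helper itself is not shown in this notebook. As a rough illustration of what a loader of this kind might do, the sketch below reads the two attributes used later in the notebook (`MAPBOX_USER` and `MAPBOX_TOKEN`) from environment variables. The `Settings` class and the `load_settings_from_env` function are hypothetical names introduced here, and the assumption that the environment variables share the attribute names is not confirmed by the project code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the object returned by get_settings()."""

    MAPBOX_USER: str
    MAPBOX_TOKEN: str


def load_settings_from_env(env=None) -> Settings:
    """Build Settings from a mapping of environment variables.

    Raises RuntimeError early if a required variable is missing or empty,
    so a misconfigured environment fails before any upload starts.
    """
    env = os.environ if env is None else env
    required = ("MAPBOX_USER", "MAPBOX_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(MAPBOX_USER=env["MAPBOX_USER"], MAPBOX_TOKEN=env["MAPBOX_TOKEN"])
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.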

# Data Acquisition

## Dataset information

In [3]:
datasets = dataset_database.datasets()
pprint(datasets)

{'Agricultural drought exposure': <datasets.datasets.Dataset object at 0x7f8c35ef17f0>,
 'Agricultural drought hazard': <datasets.datasets.Dataset object at 0x7f8c35ef1760>,
 'Boundaries': <datasets.datasets.Dataset object at 0x7f8c35ef1790>,
 'Contextual layers': <datasets.datasets.Dataset object at 0x7f8c35ef1730>,
 'EO-based flood exposure': <datasets.datasets.Dataset object at 0x7f8c35ef1700>,
 'EO-based flood hazard': <datasets.datasets.Dataset object at 0x7f8c35ef16d0>,
 'Hydrographic data': <datasets.datasets.Dataset object at 0x7f8c35ef16a0>,
 'Hydrometeorological Data': <datasets.datasets.Dataset object at 0x7f8c35ef1670>,
 'Meteorological drought exposure': <datasets.datasets.Dataset object at 0x7f8c35ef1640>,
 'Meteorological drought hazard': <datasets.datasets.Dataset object at 0x7f8c35ef1610>,
 'Model-based flood exposure': <datasets.datasets.Dataset object at 0x7f8c35ef15e0>,
 'Model-based flood hazard': <datasets.datasets.Dataset object at 0x7f8c35ef1520>,
 'Populated in

## Floods and Droughts Layers
### Create layers

In [4]:
datasets_list = [
    "Model-based flood hazard",
    "EO-based flood hazard",
    "Meteorological drought hazard",
    "Agricultural drought hazard",
    "Model-based flood exposure",
    "EO-based flood exposure",
    "Meteorological drought exposure",
    "Agricultural drought exposure",
]

dict_path = "../data/processed/datasets_dict.json"

layer_processing = LayerProcessing(datasets, datasets_list, dict_path)
layer_processing.create_layers()

  0%|          | 0/8 [00:00<?, ?it/s]

Model-based flood hazard


100%|██████████| 10/10 [00:00<00:00, 105384.52it/s]


EO-based flood hazard


100%|██████████| 2/2 [00:00<00:00, 57456.22it/s]


Meteorological drought hazard


100%|██████████| 1/1 [00:00<00:00, 39568.91it/s]


Agricultural drought hazard


INFO:helpers.raster_processor:Applying styles
Application path not initialized
Application path not initialized
Application path not initialized
Application path not initialized


Processing Combined SNDVI and SMA indices from Agricultural drought hazard


Application path not initialized
Application path not initialized
INFO:helpers.raster_processor:Converting to GeoTIFF
INFO:helpers.raster_processor:Converting to Cloud-Optimized GeoTIFF
Reading input: /home/iker/Vizzuality/Proiektuak/wims-south-sudan/data-processing/data/processed/RasterLayers/ADH_combined_sndvi_and_sma_indices.tif

Adding overviews...
Updating dataset tags...
Writing output to: /home/iker/Vizzuality/Proiektuak/wims-south-sudan/data-processing/data/processed/RasterLayers/ADH_combined_sndvi_and_sma_indices.tif
INFO:helpers.raster_processor:Processing complete. Output saved to ../data/processed/RasterLayers/ADH_combined_sndvi_and_sma_indices.tif


Creating tiles ...


100%|██████████| 1/1 [02:43<00:00, 163.32s/it]
 50%|█████     | 4/8 [02:43<02:43, 40.83s/it]

Model-based flood exposure


100%|██████████| 20/20 [00:00<00:00, 671088.64it/s]


EO-based flood exposure


100%|██████████| 10/10 [00:00<00:00, 391991.03it/s]


Meteorological drought exposure


100%|██████████| 8/8 [00:00<00:00, 316551.25it/s]


Agricultural drought exposure


100%|██████████| 8/8 [00:00<00:00, 316551.25it/s]
100%|██████████| 8/8 [02:43<00:00, 20.42s/it]


### Raster layers

#### Upload raster tiles to S3 bucket

In [None]:
directory_path = Path("../data/processed/RasterTiles/")
bucket_name = "wims-ss-staging-assets-bucket"
bucket_folder = "raster-tiles"

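# Upload each sub-directory of RasterTiles under its own prefix in the bucket
all_folders = os.listdir(directory_path)

for folder in all_folders:
    local_directory = directory_path / folder
    # Skip stray files; only directories are uploaded
    if os.path.isdir(local_directory):
        upload_directory_to_s3(local_directory, bucket_name, f"{bucket_folder}/{folder}")
        print(folder)  # report the folder once its upload call has returned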
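
Before running the upload it can help to preview which folders would be sent and where. The sketch below mirrors the loop above without calling `upload_directory_to_s3`: it only pairs each sub-directory with the key prefix built from `bucket_folder`. The function name `plan_s3_uploads` is hypothetical and not part of the project helpers.

```python
from pathlib import Path


def plan_s3_uploads(directory_path, bucket_folder):
    """Return (local_directory, s3_prefix) pairs, one per sub-directory.

    Files directly inside directory_path are ignored, matching the
    isdir() check in the upload loop.
    """
    root = Path(directory_path)
    return [
        (child, f"{bucket_folder}/{child.name}")
        for child in sorted(root.iterdir())
        if child.is_dir()
    ]
```

For example, `plan_s3_uploads("../data/processed/RasterTiles/", "raster-tiles")` lists the prefixes that the loop would write to, which can be checked by eye before any data leaves the machine.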

### Vector layers

#### Upload layers to Mapbox

In [5]:
directory_path = Path("../data/processed/VectorLayers/")

# Every entry in VectorLayers is uploaded to Mapbox
all_files = os.listdir(directory_path)

for file_name in all_files:
    local_file = directory_path / file_name

    # Upload to Mapbox using the credentials loaded in `settings`
    upload_name = upload_to_mapbox(
        local_file,
        file_name,
        settings.MAPBOX_USER,
        settings.MAPBOX_TOKEN,
    )
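
`os.listdir` returns every entry in the directory, including sub-directories and hidden files such as `.gitkeep`. If that is a concern, the entries could be filtered before they are passed to `upload_to_mapbox`. The sketch below is one possible filter; `list_uploadable_files` is a hypothetical helper, and whether the project needs it depends on what the `VectorLayers` directory actually contains.

```python
from pathlib import Path


def list_uploadable_files(directory_path):
    """Return regular, non-hidden files in directory_path, sorted by name."""
    root = Path(directory_path)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and not path.name.startswith(".")
    )
```

Each returned `Path` could then stand in for `local_file` in the loop above, with `path.name` in place of `file_name`.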

## Animated raster data

In [None]:
raster_layers = {}
for dataset_name, dataset in datasets.items():
    layers = dataset.layers()
    dataset_layers = {}
    for layer_name, layer in layers.items():
        if layer.type == "raster" and layer.format == "Zarr":
            dataset_layers[layer_name] = layer
    if dataset_layers:
        raster_layers[dataset_name] = dataset_layers

raster_layers
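
The filtering logic in the cell above can be tried out without the real dataset objects. The sketch below wraps the same condition in a function and runs it against stand-in objects built with `SimpleNamespace`; these mimic only the `layers()`, `type` and `format` members used by the filter. The function name `select_zarr_rasters`, the dataset names `"A"` and `"B"`, and the `"GeoJSON"` format value are illustrative assumptions, not taken from the project.

```python
from types import SimpleNamespace


def select_zarr_rasters(datasets):
    """Keep only raster layers stored as Zarr, grouped by dataset name."""
    selected = {}
    for dataset_name, dataset in datasets.items():
        matches = {
            name: layer
            for name, layer in dataset.layers().items()
            if layer.type == "raster" and layer.format == "Zarr"
        }
        if matches:  # datasets without any matching layer are left out
            selected[dataset_name] = matches
    return selected


# Stand-in objects exposing only the attributes the filter reads.
zarr_layer = SimpleNamespace(type="raster", format="Zarr")
vector_layer = SimpleNamespace(type="vector", format="GeoJSON")
fake_datasets = {
    "A": SimpleNamespace(layers=lambda: {"x": zarr_layer, "y": vector_layer}),
    "B": SimpleNamespace(layers=lambda: {"z": vector_layer}),
}
result = select_zarr_rasters(fake_datasets)
```

Here `result` contains only dataset `"A"` with its single Zarr layer `"x"`; dataset `"B"` is dropped because none of its layers match.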

In [None]:
layer = raster_layers["Hydrometeorological Data"]["Evapotranspiration"]
ds = layer.get_data()

In [None]:
ds

In [None]:
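# Select the first step along the `months` dimension
data = ds.isel(months=0)
# Record the coordinate reference system (WGS 84) on the data
data = data.rio.write_crs("EPSG:4326")
# Save as a Cloud-Optimized GeoTIFF (COG)
output_path = "../data/processed/evapotranspiration.tif"
data.rio.to_raster(output_path, driver="COG")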