# South Sudan Data Precalculations

This notebook precalculates zonal statistics of raster datasets for different regions of South Sudan.

## Data Hierarchy

The data hierarchy is documented in [this spreadsheet](https://docs.google.com/spreadsheets/d/1RdJCjygAiWu2zBMGRF0ayigzrA2WhaObWMNdlkllgSQ/edit?usp=sharing).

## Data Access
The data is stored in the following Google Cloud Storage bucket:
- https://console.cloud.google.com/storage/browser/wbhydross_deliverables
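
The layer URLs printed later in this notebook appear in two forms: `gs://` paths for the Zarr stores and percent-encoded `https://storage.googleapis.com/...` links for the Shapefiles. As a minimal sketch of how the two relate, the helper below converts one form into the other. The function name `gs_to_https` and the file name `example.shp` are illustrative assumptions, not part of the project, and the resulting link only resolves if the object is publicly readable.

```python
from urllib.parse import quote


def gs_to_https(gs_url: str) -> str:
    """Convert a gs://bucket/key path into a storage.googleapis.com URL.

    Hypothetical helper for illustration; spaces and other unsafe
    characters in the object key are percent-encoded.
    """
    prefix = "gs://"
    if not gs_url.startswith(prefix):
        raise ValueError(f"Not a gs:// URL: {gs_url!r}")
    # Split "bucket/key/with/slashes" into the bucket name and the object key.
    bucket, _, key = gs_url[len(prefix):].partition("/")
    # quote() keeps "/" unescaped by default, so the folder structure survives.
    return f"https://storage.googleapis.com/{bucket}/{quote(key)}"


# "example.shp" is a made-up object name used only to show the encoding.
print(gs_to_https("gs://wbhydross_deliverables/D3-Database/00- Ancillary Layers/example.shp"))
```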


## Setup

### Library import


In [9]:
# imports
import json
import sys
from pprint import pprint

from dask.distributed import Client

# Make the local modules under ../src importable
sys.path.append("../src/")
sys.path.append("../src/datasets")
sys.path.append("../src/datasets/factory")

from datasets.datasets import dataset_database
from zonal_statistics import ZonalStatistics

**Start Dask Client for Dashboard**

In [4]:
client = Client()
client  # noqa: B018

Client
- Connection method: Cluster object
- Cluster type: distributed.LocalCluster
- Dashboard: http://127.0.0.1:8787/status

LocalCluster
- Status: running
- Using processes: True
- Workers: 4
- Total threads: 16
- Total memory: 14.98 GiB

Scheduler
- Comm: tcp://127.0.0.1:37177
- Started: Just now

Workers

| Comm | Dashboard | Nanny | Total threads | Memory | Local directory |
|---|---|---|---|---|---|
| tcp://127.0.0.1:40879 | http://127.0.0.1:44863/status | tcp://127.0.0.1:44325 | 4 | 3.74 GiB | /tmp/dask-scratch-space/worker-2nvyts25 |
| tcp://127.0.0.1:41275 | http://127.0.0.1:46855/status | tcp://127.0.0.1:33625 | 4 | 3.74 GiB | /tmp/dask-scratch-space/worker-4e8zln_q |
| tcp://127.0.0.1:34125 | http://127.0.0.1:39303/status | tcp://127.0.0.1:41185 | 4 | 3.74 GiB | /tmp/dask-scratch-space/worker-41z0ei8p |
| tcp://127.0.0.1:46051 | http://127.0.0.1:44381/status | tcp://127.0.0.1:45061 | 4 | 3.74 GiB | /tmp/dask-scratch-space/worker-9cdplobo |


### Utils

In [7]:
def print_dict(d):
    """
    Print a dictionary with indentation.
    """
    print(json.dumps(d, indent=2))

# Data Acquisition

## Dataset information

In [18]:
datasets = dataset_database.datasets()
pprint(datasets)

{'Administrative Boundaries': <datasets.datasets.Dataset object at 0x7faa00195940>,
 'Hydrological Basins': <datasets.datasets.Dataset object at 0x7faa2c7014f0>,
 'Meteorological Data': <datasets.datasets.Dataset object at 0x7fa9e9016ea0>,
 'Model-based flood hazard': <datasets.datasets.Dataset object at 0x7fa9e90162d0>}
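
The `datasets.datasets` module is not shown in this notebook, so the sketch below is only a guess at the shape of the interface the cells rely on: a mapping from dataset names to objects whose `layers()` returns named layers, each exposing `to_dict()`. The classes here are hypothetical stand-ins, the `add` method is invented for the example, and the URL is a placeholder. Data loading via `get_data()` is deliberately left out.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Layer:
    """Hypothetical stand-in for a layer entry (not the project's class)."""

    name: str
    type: str    # "vector" or "raster"
    format: str  # e.g. "Shapefile" or "Zarr"
    url: str

    def to_dict(self) -> dict:
        # Mirrors the keys seen in the printed layer dictionaries.
        return asdict(self)


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset holding named layers."""

    name: str
    _layers: dict = field(default_factory=dict)

    def add(self, layer: Layer) -> None:
        # Invented for this sketch; the real registration mechanism is unknown.
        self._layers[layer.name] = layer

    def layers(self) -> dict:
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self._layers)


boundaries = Dataset("Administrative Boundaries")
# Placeholder URL, not a real object in the bucket.
boundaries.add(Layer("adm0", "vector", "Shapefile", "https://example.com/adm0.shp"))

for layer_name, layer in boundaries.layers().items():
    print(layer_name, layer.to_dict())
```

Because these stand-ins are dataclasses, printing them shows their fields rather than a bare memory address like the ones in the output above.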


## Vector Data
**Load the vector data**

In [20]:
dataset = datasets.get("Administrative Boundaries")
vector_layers = dataset.layers()
for layer_name, layer in vector_layers.items():
    print(layer_name)
    print_dict(layer.to_dict())

{
  "name": "adm0",
  "type": "vector",
  "format": "Shapefile",
  "url": "https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_SubnationalAdministrativeBoundaries-adm0_4326_SouthSudan_20230829_20240228.shp"
}
{
  "name": "adm1",
  "type": "vector",
  "format": "Shapefile",
  "url": "https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_SubnationalAdministrativeBoundaries-adm1_4326_SouthSudan_20230829_20240228.shp"
}
{
  "name": "adm2",
  "type": "vector",
  "format": "Shapefile",
  "url": "https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_SubnationalAdministrativeBoundaries-adm2_4326_SouthSudan_20230829_20240228.shp"
}
{
  "name": "adm3",
  "type": "vector",
  "format": "Shapefile",
  "url": "https://st

In [21]:
layer = vector_layers["adm1"]
gdf = layer.get_data()
gdf.head()

Loading data from https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_SubnationalAdministrativeBoundaries-adm1_4326_SouthSudan_20230829_20240228.shp...


| | index | level | adm0_en | adm0_pcode | adm1_en | adm1_pcode | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | South Sudan | SS | Central Equatoria | SS01 | POLYGON ((32.13885 4.70399, 32.12436 4.59871, ... |
| 1 | 1 | 1 | South Sudan | SS | Eastern Equatoria | SS02 | POLYGON ((35.09476 5.7436, 35.1084 5.71964, 35... |
| 2 | 2 | 1 | South Sudan | SS | Jonglei | SS03 | POLYGON ((30.55928 9.50145, 30.56094 9.50109, ... |
| 3 | 3 | 1 | South Sudan | SS | Lakes | SS04 | POLYGON ((30.47487 7.07277, 30.53654 7.06721, ... |
| 4 | 4 | 1 | South Sudan | SS | Northern Bahr el Ghazal | SS05 | POLYGON ((26.69631 9.49, 26.70392 9.48833, 26.... |


## Raster Data
**Load the raster data**

In [23]:
dataset = datasets.get("Meteorological Data")
raster_layers = dataset.layers()
for layer_name, layer in raster_layers.items():
    print(layer_name)
    print_dict(layer.to_dict())

{
  "name": "Precipitation",
  "type": "raster",
  "format": "Zarr",
  "url": "gs://wbhydross_deliverables/D3-Database/02- Meteorological datasets/Rainfall-CHIRPS/WBHYDROSSD_CHIRPS_5km_Precipitation_SouthSudan_1981_2022_20240425.zarr"
}
{
  "name": "Temperature",
  "type": "raster",
  "format": "Zarr",
  "url": "gs://wbhydross_deliverables/D3-Database/02- Meteorological datasets/Temperature-ERA5Land/WBHYDROSSD_ERA5-Land_t2m-D-mean_South_Sudan-EPSG4326_19500101_20231231_20240209091716.zarr"
}


In [24]:
layer = raster_layers["Precipitation"]
ds = layer.get_data()
ds

Loading Zarr data from gs://wbhydross_deliverables/D3-Database/02- Meteorological datasets/Rainfall-CHIRPS/WBHYDROSSD_CHIRPS_5km_Precipitation_SouthSudan_1981_2022_20240425.zarr...


| | Array | Chunk |
|---|---|---|
| Bytes | 2.19 MiB | 3.00 kiB |
| Shape | (12, 191, 251) | (1, 24, 32) |
| Dask graph | 768 chunks in 67 graph layers | 768 chunks in 67 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
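
The figures in the array summary for the precipitation layer can be cross-checked with a little arithmetic. This short sketch uses only the shape, chunk shape, and data type reported in that summary. The 4-byte item size follows from `float32`.

```python
import math

shape = (12, 191, 251)   # array shape from the summary above
chunks = (1, 24, 32)     # chunk shape from the summary above
itemsize = 4             # bytes per float32 value

# Each axis needs ceil(size / chunk) chunks; partial edge chunks still count.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
array_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize

print(n_chunks)                        # 768
print(round(array_bytes / 2**20, 2))   # 2.19 (MiB)
print(chunk_bytes / 2**10)             # 3.0 (KiB)
```

At roughly 2 MiB in total, the array is small enough to load fully into memory.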


# Zonal statistics
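
Zonal statistics summarise the raster cells that fall inside each zone (here, each administrative unit) for every time step. The internals of `ZonalStatistics` are not shown in this notebook, so the following is only a minimal standard-library sketch of the idea. It assumes the zones have already been rasterized onto the same grid as the raster, whereas the notebook passes polygons. The function `zonal_stats` and the toy grids are hypothetical and are not part of the project.

```python
from statistics import mean


def zonal_stats(raster, zones):
    """Compute mean/min/max per zone and time step.

    raster: list of 2D grids (one per time step), values may be None for no data.
    zones:  2D grid of zone ids on the same grid; None marks cells outside all zones.
    """
    zone_ids = sorted({z for row in zones for z in row if z is not None})
    rows = []
    for t, grid in enumerate(raster):
        for zone_id in zone_ids:
            # Collect the valid raster values whose cell belongs to this zone.
            values = [
                grid[i][j]
                for i, row in enumerate(zones)
                for j, z in enumerate(row)
                if z == zone_id and grid[i][j] is not None
            ]
            if not values:
                continue  # zone has no valid cells at this time step
            rows.append(
                {"time": t, "zone": zone_id,
                 "mean": mean(values), "min": min(values), "max": max(values)}
            )
    return rows


# Toy 2x3 grid with two zones and two time steps (made-up numbers).
zones = [["A", "A", "B"],
         ["A", "B", "B"]]
raster = [
    [[1.0, 2.0, 3.0], [3.0, 5.0, 7.0]],
    [[0.0, 0.0, 4.0], [0.0, 4.0, 4.0]],
]
for row in zonal_stats(raster, zones):
    print(row)
```

A real implementation would typically use vectorised masking rather than Python loops; the loops here are kept only to make each step explicit.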

In [26]:
raster_metadata = {
    "Precipitation": {"variable": "precipitation_amount", "time_coord": "month", "unit": "mm"},
    "Temperature": {"variable": "t2m", "time_coord": "month", "unit": "°C"},
}

for raster_name, raster_layer in raster_layers.items():
    print("Computing zonal statistics for raster:", raster_name)
    metadata = raster_metadata[raster_name]
    raster_data = raster_layer.get_data().compute()
    for vector_name, vector_layer in vector_layers.items():
        print("  - Using vector dataset:", vector_name)
        vector_data = vector_layer.get_data()

        zonal_statistics = ZonalStatistics(
            raster_data=raster_data,
            vector_data=vector_data,
            time_coord=metadata["time_coord"],
            unit=metadata["unit"],
        )

        df = zonal_statistics.compute()

        # Save the results; build the file name without overwriting the loop variables
        raster_slug = raster_name.replace(" ", "_")
        vector_slug = vector_name.lower().replace(" ", "_").replace("(", "").replace(")", "")
        df.to_csv(f"../data/processed/{raster_slug}_{vector_slug}.csv", index=False)

Computing zonal statistics for raster: Precipitation
Loading Zarr data from gs://wbhydross_deliverables/D3-Database/02- Meteorological datasets/Rainfall-CHIRPS/WBHYDROSSD_CHIRPS_5km_Precipitation_SouthSudan_1981_2022_20240425.zarr...
  - Using vector dataset: adm0
Loading data from https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_SubnationalAdministrativeBoundaries-adm0_4326_SouthSudan_20230829_20240228.shp...
  - Using vector dataset: adm1
Loading data from https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_SubnationalAdministrativeBoundaries-adm1_4326_SouthSudan_20230829_20240228.shp...
  - Using vector dataset: adm2
Loading data from https://storage.googleapis.com/wbhydross_deliverables/D3-Database/00-%20Ancillary%20Layers/OCHA-SubnationalAdministrativeBoundaries/WBHYDROSSD_OCHA_Subnational