# Download and process soil data into single GeoTIFFs at native resolution
Code based on: https://git.wur.nl/isric/soilgrids/soilgrids.notebooks/-/blob/master/markdown/webdav_from_Python.md

Map categories from http://maps.isric.org/:
- `wrb` (WRB classes and probabilities)
- `bdod` (bulk density)
- `cec` (Cation exchange capacity at pH 7)
- `cfvo` (Coarse fragments volumetric)
- `clay` (Clay content)
- `nitrogen` (Nitrogen)
- `phh2o` (Soil pH in H2O)
- `sand` (Sand content)
- `silt` (Silt content)
- `soc` (Soil organic carbon content)
- `ocs` (Soil organic carbon stock)
- `ocd` (Organic carbon densities)

Depths:
- `0-5cm`
- `5-15cm`
- `15-30cm`
- `30-60cm`
- `60-100cm`
- `100-200cm`

Layers per category per depth:
- `mean`
- `Q0.05`
- `Q0.5`
- `Q0.95`
- `uncertainty`

Construct the full layer identifier as `[category]_[depth]_[layer]`, e.g. `phh2o_0-5cm_mean`.
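As a minimal sketch of how these identifiers combine, the snippet below builds every nominal `[category]_[depth]_[layer]` name from the lists above. The helper name `layer_identifier` is hypothetical and not part of `python_cs_functions`. `wrb` is left out because the download cell further down notes that it uses different subfolders, and the sketch does not check whether the server actually offers every combination.

```python
from itertools import product

# Lists copied from the bullets above ('wrb' omitted, see lead-in)
categories = ['bdod', 'cec', 'cfvo', 'clay', 'nitrogen', 'phh2o',
              'sand', 'silt', 'soc', 'ocs', 'ocd']
depths = ['0-5cm', '5-15cm', '15-30cm', '30-60cm', '60-100cm', '100-200cm']
layers = ['mean', 'Q0.05', 'Q0.5', 'Q0.95', 'uncertainty']

def layer_identifier(category, depth, layer):
    """Hypothetical helper: join the three parts with underscores."""
    return f'{category}_{depth}_{layer}'

# Nominal set of identifiers: every category at every depth for every layer
identifiers = [layer_identifier(c, d, l)
               for c, d, l in product(categories, depths, layers)]

print(identifiers[0])    # bdod_0-5cm_mean
print(len(identifiers))  # 330
```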

In [1]:
import os
import shutil
import sys
from pathlib import Path
sys.path.append(str(Path().absolute().parent))
import python_cs_functions as cs

### Config handling

In [2]:
# Specify where the config file can be found
config_file = '../0_config/config.txt'

In [3]:
# Get the required info from the config file
data_path            = cs.read_from_config(config_file,'data_path')
geospatial_temp_path = cs.read_from_config(config_file,'geospatial_temp_path')
soil_path            = cs.read_from_config(config_file,'soil_path')
soil_url             = cs.read_from_config(config_file,'soil_url')
download_area        = cs.read_from_config(config_file,'geospatial_area')

### Convert geospatial coordinates to specific subsetting coordinates for this data product

In [4]:
subset_window = cs.geospatial_coordinates_to_download_coordinates(download_area, 'soilgrids')

Returning coordinates as type <class 'list'> for use with soilgrids download code.


### Initial folder settings

In [5]:
download_folder = Path(data_path) / geospatial_temp_path / 'soilgrids' / 'download' # downloads go into subfolders here
raw_folder =  Path(data_path) / geospatial_temp_path / 'soilgrids' / 'raw' # final outputs go into subfolders here

In [6]:
download_folder.mkdir(parents=True, exist_ok=True)

In [7]:
raw_folder.mkdir(parents=True, exist_ok=True)

### Processing
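The workflow downloads the global data first (~4.5 GB) and then subsets it to the domain of interest (~500 MB). Doing the download, subsetting, and cleanup for each product inside one loop limits disk space usage, because the downloaded tiles can be deleted as soon as the subset has been written.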
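The pattern can be sketched in isolation as follows. This is only an illustration under stated assumptions: `fake_download` and `fake_process` are hypothetical stand-ins for the real `cs` functions, the product names are examples, and everything runs in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def fake_download(destination):
    """Hypothetical stand-in for the download step: writes two dummy tiles."""
    destination.mkdir(parents=True, exist_ok=True)
    for i in range(2):
        (destination / f'tile_{i}.tif').write_bytes(b'0' * 1024)

def fake_process(source, output_file):
    """Hypothetical stand-in for the merge/subset step: writes one small file."""
    output_file.parent.mkdir(parents=True, exist_ok=True)
    n_tiles = len(list(source.glob('*.tif')))
    output_file.write_text(f'merged {n_tiles} tiles')

root = Path(tempfile.mkdtemp())
download_root = root / 'download'
raw_root = root / 'raw'

for product in ['clay_0-5cm_mean', 'sand_0-5cm_mean']:
    output_file = raw_root / f'{product}.tif'
    if output_file.is_file():
        continue  # already processed in an earlier run
    tile_folder = download_root / product
    fake_download(tile_folder)
    fake_process(tile_folder, output_file)
    shutil.rmtree(tile_folder)  # only one product's tiles are on disk at a time

remaining_downloads = list(download_root.iterdir())
outputs = sorted(p.name for p in raw_root.iterdir())
print(remaining_downloads)  # []
print(outputs)              # ['clay_0-5cm_mean.tif', 'sand_0-5cm_mean.tif']

shutil.rmtree(root)  # remove the temporary sandbox
```

Checking for the output file before downloading also makes the loop restartable: an interrupted run only repeats the product it was working on.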

In [8]:
# Create the download lists
fields = ['bdod','cec','cfvo','clay','nitrogen','phh2o','sand','silt','soc','ocs','ocd'] # wrb and landmask have different subfolders
depths = ['0-5cm','5-15cm','15-30cm','30-60cm','60-100cm','100-200cm']
layers = ['mean','Q0.05','Q0.5','Q0.95','uncertainty']

In [9]:
# Restrict the download lists to the fields and layers needed here (overrides the full lists defined earlier)
fields = ['bdod','cfvo','clay','sand','silt','soc'] # wrb and landmask have different subfolders
depths = ['0-5cm','5-15cm','15-30cm','30-60cm','60-100cm','100-200cm']
layers = ['mean']

In [10]:
for field in fields:
    for depth in depths:
        for layer in layers:

            # Prepare the folders and file names
            product = f'{field}_{depth}_{layer}'
            download_destination = download_folder/product
            output_folder = raw_folder/field
            output_file = f'{product}.tif'

            # Ensure we don't duplicate runs
            if (output_folder/output_file).is_file():
                continue
            
            # Download the data
            download_url = f'{soil_url}{field}/{product}/' # trailing '/' is critical or urljoin will ignore the final part later
            cs.download_all_soilgrids_tiles_into_folder(download_url, download_destination)

            # Process individual tiles into a single GeoTIFF of the domain of interest
            cs.process_soilgrids_tiles_into_single_geotiff(download_destination, 
                                                        output_folder, output_file, 
                                                        to_crs='EPSG:4326', subset_window=subset_window)

            # Clean up the download folder to save disk space
            shutil.rmtree(download_destination)

File C:\Globus endpoint\CAMELS_spat\geospatial_temp\soilgrids\download\bdod_15-30cm_mean\tileSG-000-019_3-3.tif exists and download_url_into_folder() argument overwrite is False. Skipping file.
File C:\Globus endpoint\CAMELS_spat\geospatial_temp\soilgrids\download\bdod_15-30cm_mean\tileSG-000-020_2-2.tif exists and download_url_into_folder() argument overwrite is False. Skipping file.
File C:\Globus endpoint\CAMELS_spat\geospatial_temp\soilgrids\download\bdod_15-30cm_mean\tileSG-000-020_2-3.tif exists and download_url_into_folder() argument overwrite is False. Skipping file.
File C:\Globus endpoint\CAMELS_spat\geospatial_temp\soilgrids\download\bdod_15-30cm_mean\tileSG-000-020_3-1.tif exists and download_url_into_folder() argument overwrite is False. Skipping file.
File C:\Globus endpoint\CAMELS_spat\geospatial_temp\soilgrids\download\bdod_15-30cm_mean\tileSG-000-020_3-2.tif exists and download_url_into_folder() argument overwrite is False. Skipping file.
File C:\Globus endpoint\CAMELS

KeyboardInterrupt: 
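
The comment in the download loop about the trailing `/` in `download_url` refers to how `urllib.parse.urljoin` resolves relative names. The sketch below illustrates that behaviour, assuming the download helper joins tile names onto `download_url` with `urljoin` as the comment suggests. The base URL and tile name are placeholders, not real SoilGrids addresses.

```python
from urllib.parse import urljoin

# Placeholder URLs for illustration only
base_with_slash = 'https://example.org/soilgrids/clay/clay_0-5cm_mean/'
base_without_slash = 'https://example.org/soilgrids/clay/clay_0-5cm_mean'
tile = 'tile_0.tif'

# With the trailing '/', the tile name is appended to the product folder
with_slash = urljoin(base_with_slash, tile)

# Without it, the last path segment is treated as a file and gets replaced
without_slash = urljoin(base_without_slash, tile)

print(with_slash)     # https://example.org/soilgrids/clay/clay_0-5cm_mean/tile_0.tif
print(without_slash)  # https://example.org/soilgrids/clay/tile_0.tif
```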