<a href="https://colab.research.google.com/github/anaguilarar/agwise_data_sourcing/blob/main/GEEMODIS_data_download.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ag-Wise Data Sourcing

## SOIL Downloader

This tutorial shows how to use this notebook to download soil data from Google Earth Engine (GEE).

In [None]:
import os

# Clone the repository on the first run, then work from inside it (paths assume Google Colab).
if not os.path.exists('agwise_data_sourcing'):
  !git clone https://github.com/anaguilarar/agwise_data_sourcing.git
os.chdir('/content/agwise_data_sourcing')

## Workflow Overview
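1. **Country Example Configuration** – Select the area and soil property.
2. **Data Downloading** – Connect to Google Earth Engine and query the soil layers.
3. **Soil Data visualization** – Visualize the soil property on a map.
4. **Download data for a specific administrative level** – Clip the data to a region and save it locally.
5. **Create datacube** – Download all properties and load them with xarray.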

### Country Example Configuration


In this section, you will set the parameters for your analysis. Modify the dictionary below to match your region and product of interest.


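- `ADM0_NAME`, `ADM1_NAME`, and `ADM2_NAME` define the administrative levels (country, first-level, and second-level units).
- `property` sets the SoilGrids soil property to download (for example, `sand`).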


Example: soil extraction for **Kenya – Kericho**

In [3]:
### INITIAL configuration

configuration = {
    'GENERAL_SETTINGS':{
      'ee_project_name': 'ee-anaguilarar'
      },

    'DATA_DOWNLOAD':
    {
      'ADM0_NAME': 'Kenya',
      'ADM1_NAME': 'Kericho',
      'ADM2_NAME': None,
      'property': 'sand',

    },
    'OUTPUT':
      {
        'path': 'soil',
        'resolution': 250
        
      }
}
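Before sending a request, it can help to check that the dictionary contains the keys the later cells read. The helper below is a minimal sketch: `validate_configuration` and `REQUIRED_KEYS` are hypothetical names, not part of `agwise_data_sourcing`, and the list of required keys is inferred only from how this notebook uses the dictionary.

```python
# Hypothetical helper (not part of agwise_data_sourcing).
# The required keys are assumed from the way this notebook reads the dictionary.
REQUIRED_KEYS = {
    'GENERAL_SETTINGS': ['ee_project_name'],
    'DATA_DOWNLOAD': ['ADM0_NAME', 'property'],
    'OUTPUT': ['path', 'resolution'],
}


def validate_configuration(config):
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            # A key that is absent or set to None counts as missing.
            if config[section].get(key) is None:
                problems.append(f"missing value: {section}.{key}")
    return problems


example_config = {
    'GENERAL_SETTINGS': {'ee_project_name': 'my-ee-project'},  # placeholder project name
    'DATA_DOWNLOAD': {'ADM0_NAME': 'Kenya', 'ADM1_NAME': 'Kericho',
                      'ADM2_NAME': None, 'property': 'sand'},
    'OUTPUT': {'path': 'soil', 'resolution': 250},
}
print(validate_configuration(example_config))
```

`ADM1_NAME` and `ADM2_NAME` are deliberately left out of the required keys, since the notebook allows them to be `None`.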


- The first time you run this notebook, GEE will request authentication (`ee.Authenticate()`).
- Depending on your area size, the data request might take a few minutes.
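The authentication behaviour described above can be sketched as a "try, then authenticate and retry" pattern. `ee.Initialize(project=...)` and `ee.Authenticate()` are Earth Engine calls already used or named in this notebook; the wrapper `init_earth_engine`, the broad `except Exception`, and the `FakeEE` stand-in are illustrative assumptions so that the control flow can be tried without credentials.

```python
def init_earth_engine(ee_module, project):
    """Initialize Earth Engine; authenticate and retry only if the first attempt fails."""
    try:
        ee_module.Initialize(project=project)
        return 'initialized'
    except Exception:
        # No usable credentials yet: run the authentication flow, then retry once.
        ee_module.Authenticate()
        ee_module.Initialize(project=project)
        return 'authenticated'


class FakeEE:
    """Stand-in for the ee module, used only to demonstrate the logic offline."""

    def __init__(self):
        self.authenticated = False

    def Authenticate(self):
        self.authenticated = True

    def Initialize(self, project=None):
        if not self.authenticated:
            raise RuntimeError('credentials not found')


print(init_earth_engine(FakeEE(), 'my-ee-project'))
```

In the notebook itself, the equivalent call would be `init_earth_engine(ee, configuration['GENERAL_SETTINGS']['ee_project_name'])` after `import ee`.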

In [4]:
from gee_datasets.gee_data import GEESoilGrids
import ee
import geemap

ee.Initialize(project=configuration['GENERAL_SETTINGS']['ee_project_name'])


### Data Downloading


This section connects to Google Earth Engine, defines your region of interest, and retrieves the SoilGrids layers for the selected soil property.


Steps:
1. **Initialize Google Earth Engine (GEE)** with your project.
2. **Create the downloader object** (`GEESoilGrids`).
3. **Run the query** to retrieve the imagery.

In [5]:
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

data_downloader.list_of_products

{'bdod': 'projects/soilgrids-isric/bdod_mean',
 'cec': 'projects/soilgrids-isric/cec_mean',
 'cfvo': 'projects/soilgrids-isric/cfvo_mean',
 'clay': 'projects/soilgrids-isric/clay_mean',
 'sand': 'projects/soilgrids-isric/sand_mean',
 'silt': 'projects/soilgrids-isric/silt_mean',
 'nitrogen': 'projects/soilgrids-isric/nitrogen_mean',
 'soc': 'projects/soilgrids-isric/soc_mean',
 'phh2o': 'projects/soilgrids-isric/phh2o_mean',
 'wv0010': 'ISRIC/SoilGrids250m/v2_0/wv0010',
 'wv0033': 'ISRIC/SoilGrids250m/v2_0/wv0033',
 'wv1500': 'ISRIC/SoilGrids250m/v2_0/wv1500'}

In [6]:
data_downloader.initialize_query(configuration['DATA_DOWNLOAD']['property'])


In [7]:
band_names = data_downloader.query.bandNames().getInfo()
band_names

['sand_0-5cm_mean',
 'sand_5-15cm_mean',
 'sand_15-30cm_mean',
 'sand_30-60cm_mean',
 'sand_60-100cm_mean',
 'sand_100-200cm_mean']
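Each band name encodes the property and the depth interval it covers. The sketch below splits a name into those parts; the pattern `<property>_<top>-<bottom>cm_mean` is assumed from the six names printed above and may not hold for every product, and `parse_band_name` is a hypothetical helper.

```python
import re

# Pattern assumed from the band names printed above: <property>_<top>-<bottom>cm_mean
BAND_PATTERN = re.compile(r'^(?P<prop>[a-z0-9]+)_(?P<top>\d+)-(?P<bottom>\d+)cm_mean$')


def parse_band_name(band_name):
    """Split a band name into (property, top depth in cm, bottom depth in cm)."""
    match = BAND_PATTERN.match(band_name)
    if match is None:
        raise ValueError(f"unexpected band name: {band_name}")
    return match['prop'], int(match['top']), int(match['bottom'])


example_bands = ['sand_0-5cm_mean', 'sand_5-15cm_mean', 'sand_15-30cm_mean',
                 'sand_30-60cm_mean', 'sand_60-100cm_mean', 'sand_100-200cm_mean']
for name in example_bands:
    prop, top, bottom = parse_band_name(name)
    print(f"{name}: {prop}, {top} to {bottom} cm")
```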

### Soil Data visualization

In [8]:

# Create a map
Map = geemap.Map(center=[-1.37, 38.01], zoom=6) 

# Define visualization parameters (value range and colour palette)
vis_parameters = {
    'min': 50,
    'max': 1000,
    'palette': ['5d5851', '635a4b', '6a5b44', '715c3d', '785e36', '7e5f30', '856129',
                '8c6222', '92641c', '996515', 'a0660e', 'a66808', 'ad6901'],
}
# Add the image layer
Map.addLayer(data_downloader.query.select(band_names[0]), vis_parameters, band_names[0])
# Display the map
Map


### Download data for a specific administrative level

You can target data at different administrative levels using the configuration keys:

1. Set `ADM0_NAME` for the country (required).
2. Set `ADM1_NAME` for the first-level admin (province/state) if you want a subregion.
3. Set `ADM2_NAME` for the district/municipality if available and needed.
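The three rules above amount to "use the most detailed level that has a name". A minimal sketch of that selection follows; `resolve_adm_level` is a hypothetical helper, and whether `get_adm_level_data` accepts `'ADM0'` or `'ADM2'` as `adm_level` is an assumption, since this notebook only shows it called with `'ADM1'`.

```python
def resolve_adm_level(download_cfg):
    """Return (adm_level, feature_name) for the most detailed level that is set."""
    for level in ('ADM2', 'ADM1', 'ADM0'):
        name = download_cfg.get(f'{level}_NAME')
        if name:
            return level, name
    # None of the three names is set; the country name is the minimum requirement.
    raise ValueError('ADM0_NAME is required')


example_cfg = {'ADM0_NAME': 'Kenya', 'ADM1_NAME': 'Coast', 'ADM2_NAME': None}
print(resolve_adm_level(example_cfg))
```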


**Example configuration (Kenya, Coast province):**


```python
configuration['DATA_DOWNLOAD'].update({
    'ADM0_NAME': 'Kenya',
    'ADM1_NAME': 'Coast',
    'ADM2_NAME': None,
})
```


In [9]:

soil_image = data_downloader.get_adm_level_data(adm_level='ADM1', feature_name = configuration['DATA_DOWNLOAD']['ADM1_NAME'])

Map = geemap.Map(center=[-1.37, 38.01], zoom=8) 
Map.addLayer(soil_image.select(band_names[0]), vis_parameters, band_names[0])

# Display the map
Map

data will be processed for: Kericho



In [10]:
# Download the clipped image to local storage
import os

output_dir = configuration['OUTPUT']['path']
output_fn = os.path.join(output_dir, configuration['DATA_DOWNLOAD']['property'] + '.tif')

# Create the output folder if it does not exist yet
os.makedirs(output_dir, exist_ok=True)

data_downloader.download_data(soil_image, output_fn, scale=configuration['OUTPUT']['resolution'])

### Create datacube

In [12]:
properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

data_downloader.download_multiple_properties('soil', properties_todownload, 
                                            adm_level='ADM1', 
                                            feature_name = configuration['DATA_DOWNLOAD']['ADM1_NAME'],
                                            scale = configuration['OUTPUT']['resolution'])


data will be processed for: Kericho
bdod: data was downloaded in soil\bdod.tif
data will be processed for: Kericho
cec: data was downloaded in soil\cec.tif
data will be processed for: Kericho
cfvo: data was downloaded in soil\cfvo.tif
data will be processed for: Kericho
clay: data was downloaded in soil\clay.tif
data will be processed for: Kericho
sand: data was downloaded in soil\sand.tif
data will be processed for: Kericho
silt: data was downloaded in soil\silt.tif
data will be processed for: Kericho
nitrogen: data was downloaded in soil\nitrogen.tif
data will be processed for: Kericho
soc: data was downloaded in soil\soc.tif
data will be processed for: Kericho
phh2o: data was downloaded in soil\phh2o.tif
data will be processed for: Kericho
wv0010: data was downloaded in soil\wv0010.tif
data will be processed for: Kericho
wv0033: data was downloaded in soil\wv0033.tif
data will be processed for: Kericho
wv1500: data was downloaded in soil\wv1500.tif
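After a batch download it is worth confirming that every requested property produced a file. The sketch below assumes the `<property>.tif` naming seen in the log above; `missing_outputs` is a hypothetical helper, and the temporary folder only stands in for the real `soil` directory.

```python
import os
import tempfile


def missing_outputs(folder, properties):
    """Return the properties that have no '<property>.tif' file in folder."""
    return [p for p in properties
            if not os.path.isfile(os.path.join(folder, f'{p}.tif'))]


# Demonstration on a temporary folder that contains only sand.tif
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, 'sand.tif'), 'w').close()
    print(missing_outputs(tmp_dir, ['sand', 'clay']))
```

In the notebook, `missing_outputs('soil', properties_todownload)` would list any property that needs to be requested again.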


In [None]:
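import os
import xarray
import rioxarray as rio

# Collect the GeoTIFFs downloaded in the previous step
raster_list = [os.path.join('soil', fn) for fn in os.listdir('soil') if fn.endswith('.tif')]
xrdata_list = []
for raster_fn in raster_list:
    # Each band is one layer, so rename the 'band' dimension to 'depth'
    xrdata = rio.open_rasterio(raster_fn).rename({'band': 'depth'})
    # Name the array after the soil property (file name without extension)
    xrdata.name = os.path.splitext(os.path.basename(raster_fn))[0]
    print(xrdata.depth.values)

    # isel is positional (0-based): indices 1 to 4 keep the second to fifth layers
    xrdata_list.append(xrdata.isel(depth=[1, 2, 3, 4]))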


[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[1 2 3 4 5 6]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
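Because `isel` selects by position rather than by band number, it is easy to misread which depth intervals `depth=[1, 2, 3, 4]` keeps. The sketch below maps positions to depth labels for the six-layer products only, using the order of the `sand` band names printed earlier. The layout of the 24-layer `wv` products is not described in this notebook, so it is not covered here; `depths_kept` is a hypothetical helper.

```python
# Depth labels in band order, copied from the 'sand' band names printed earlier.
DEPTH_LABELS = ['0-5cm', '5-15cm', '15-30cm', '30-60cm', '60-100cm', '100-200cm']


def depths_kept(positions, labels=DEPTH_LABELS):
    """Return the depth labels selected by 0-based positions, as isel would pick them."""
    return [labels[i] for i in positions]


print(depths_kept([1, 2, 3, 4]))
```

With these positions the topsoil layer (`0-5cm`) and the deepest layer (`100-200cm`) are dropped; use `[0, 1, 2, 3]` instead if the first four layers are wanted.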
