<a href="https://colab.research.google.com/github/anaguilarar/agwise_data_sourcing/blob/main/GEESoilGrids_data_download.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ag-Wise Data Sourcing

## SOIL Downloader

This tutorial shows how to use this notebook to download SoilGrids soil data from Google Earth Engine (GEE).

In [1]:
import os

# Clone the repository only once, then work from inside it
if not os.path.exists('agwise_data_sourcing'):
  !git clone https://github.com/anaguilarar/agwise_data_sourcing.git

os.chdir('/content/agwise_data_sourcing')

Cloning into 'agwise_data_sourcing'...
remote: Enumerating objects: 110, done.
remote: Counting objects: 100% (110/110), done.
remote: Compressing objects: 100% (80/80), done.
remote: Total 110 (delta 55), reused 63 (delta 24), pack-reused 0 (from 0)
Receiving objects: 100% (110/110), 367.56 KiB | 8.17 MiB/s, done.
Resolving deltas: 100% (55/55), done.


## Workflow Overview
1. **Country Example Configuration** – Select area and soil property.
2. **Soil Data visualization** – Visualize the soil property.

### Country Example Configuration


In this section, you will set the parameters for your analysis. Modify the dictionary below to match your region and product of interest.


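- `ADM0_NAME`, `ADM1_NAME`, and `ADM2_NAME` define the administrative levels (country, first-level region, second-level region).
- `property` sets the SoilGrids soil property to download (for example `sand`).
- `depths` lists the depth intervals, in cm, to include as bands.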
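
Before sending a request to GEE, it can help to check the dictionary for typos. The sketch below is one possible check, not part of the `agwise_data_sourcing` package: `check_download_settings` is a hypothetical helper, and the allowed property names and depth codes are simply copied from the product list and example configuration shown in this notebook.

```python
# Property names copied from the `list_of_products` output in this notebook.
KNOWN_PROPERTIES = {
    "bdod", "cec", "cfvo", "clay", "sand", "silt",
    "nitrogen", "soc", "phh2o", "wv0010", "wv0033", "wv1500",
}
# Depth codes copied from the example configuration.
KNOWN_DEPTHS = ["0_5", "5_15", "15_30", "30_60", "60_100", "100_200"]


def check_download_settings(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not settings.get("ADM0_NAME"):
        problems.append("ADM0_NAME (country) is required")
    if settings.get("ADM2_NAME") and not settings.get("ADM1_NAME"):
        problems.append("ADM2_NAME is set but ADM1_NAME is missing")
    if settings.get("property") not in KNOWN_PROPERTIES:
        problems.append(f"unknown property: {settings.get('property')!r}")
    for depth in settings.get("depths", []):
        if depth not in KNOWN_DEPTHS:
            problems.append(f"unknown depth code: {depth!r}")
    return problems
```

Calling `check_download_settings(configuration['DATA_DOWNLOAD'])` on the example configuration should return an empty list.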


Example: soil extraction for **Kericho, Kenya**

In [2]:
### INITIAL configuration

configuration = {
    'GENERAL_SETTINGS':{
      'ee_project_name': 'ee-anaguilarar'
      },
    'DATA_DOWNLOAD':
    {
      'ADM0_NAME': 'Kenya',
      'ADM1_NAME': 'Kericho',
      'ADM2_NAME': None,
      'property': 'sand',
      'depths': ['0_5', '5_15', '15_30', '30_60', '60_100', '100_200']

    },
    'OUTPUT':
      {
        'path': 'soil',
        'resolution': 250

      }
}


- The first time you run this notebook, GEE will request authentication (`ee.Authenticate()`).
- Depending on your area size, the data request might take a few minutes.
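
The authentication note above can be turned into a small helper that only asks for authentication when initialization fails. This is a minimal sketch under stated assumptions: `initialize_earth_engine` is a hypothetical name, the `ee` module is passed in as an argument, and the broad `except` is used because the exact exception raised for missing credentials may differ between `earthengine-api` versions.

```python
def initialize_earth_engine(ee_module, project):
    """Initialize Earth Engine, authenticating first only when needed.

    `ee_module` is the imported `ee` package; passing it in keeps this
    helper importable even where the package is not installed.
    """
    try:
        ee_module.Initialize(project=project)
        return "initialized"
    except Exception:
        # Assumption: any failure here is treated as missing credentials.
        ee_module.Authenticate()
        ee_module.Initialize(project=project)
        return "authenticated"


# Possible usage inside this notebook:
# import ee
# initialize_earth_engine(ee, configuration['GENERAL_SETTINGS']['ee_project_name'])
```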

In [3]:
from gee_datasets.gee_data import GEESoilGrids
import ee
import geemap

ee.Initialize(project=configuration['GENERAL_SETTINGS']['ee_project_name'])


### Data Downloading


This section connects to Google Earth Engine, defines your region of interest, and retrieves the selected soil property for the requested depths.


Steps:
1. **Initialize Google Earth Engine (GEE)** with your project.
2. **Create the downloader object** (`GEESoilGrids`).
3. **Run the query** to retrieve the imagery.

In [4]:
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

data_downloader.list_of_products

{'bdod': 'projects/soilgrids-isric/bdod_mean',
 'cec': 'projects/soilgrids-isric/cec_mean',
 'cfvo': 'projects/soilgrids-isric/cfvo_mean',
 'clay': 'projects/soilgrids-isric/clay_mean',
 'sand': 'projects/soilgrids-isric/sand_mean',
 'silt': 'projects/soilgrids-isric/silt_mean',
 'nitrogen': 'projects/soilgrids-isric/nitrogen_mean',
 'soc': 'projects/soilgrids-isric/soc_mean',
 'phh2o': 'projects/soilgrids-isric/phh2o_mean',
 'wv0010': 'ISRIC/SoilGrids250m/v2_0/wv0010',
 'wv0033': 'ISRIC/SoilGrids250m/v2_0/wv0033',
 'wv1500': 'ISRIC/SoilGrids250m/v2_0/wv1500'}

In [5]:
data_downloader.initialize_query(configuration['DATA_DOWNLOAD']['property'], depths= configuration['DATA_DOWNLOAD']['depths'])
band_names = data_downloader.query.bandNames().getInfo()
band_names

['sand_0-5cm_mean',
 'sand_5-15cm_mean',
 'sand_15-30cm_mean',
 'sand_30-60cm_mean',
 'sand_60-100cm_mean',
 'sand_100-200cm_mean']
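
The band names above appear to follow the pattern `<property>_<top>-<bottom>cm_mean`, with the depth codes from the configuration (`0_5`, `5_15`, ...) supplying the interval. The sketch below rebuilds that pattern in plain Python. `expected_band_names` is a hypothetical helper, and the pattern is inferred only from the `sand` output shown here, so other products (for example the `wv*` assets, which live under a different path) may name their bands differently.

```python
def expected_band_names(soil_property, depths):
    """Build names such as 'sand_0-5cm_mean' from codes such as '0_5'."""
    names = []
    for code in depths:
        top, bottom = code.split("_")
        names.append(f"{soil_property}_{top}-{bottom}cm_mean")
    return names


depths = ["0_5", "5_15", "15_30", "30_60", "60_100", "100_200"]
print(expected_band_names("sand", depths))
```

Comparing this list with `data_downloader.query.bandNames().getInfo()` is a quick way to confirm that every requested depth was found.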

### Soil Data visualization

In [6]:

# Create a map
Map = geemap.Map(center=[-1.37, 38.01], zoom=6)

# Define visualization parameters

vis_parameters = {'min': 50, 'max': 1000,
 'palette': ['5d5851','635a4b','6a5b44','715c3d','785e36','7e5f30','856129','8c6222','92641c','996515','a0660e','a66808','ad6901']}
# Add the image layer
Map.addLayer(data_downloader.query.select(band_names[0]), vis_parameters, band_names[0])
# Display the map
Map

Map(center=[-1.37, 38.01], controls=(WidgetControl(options=['position', 'transparent_bg'], widget=SearchDataGU…

### Download data for a specific administrative level

You can target data at different administrative levels using the configuration keys:

1. Set `ADM0_NAME` for the country (required).
2. Set `ADM1_NAME` for the first-level admin (province/state) if you want a subregion.
3. Set `ADM2_NAME` for the district/municipality if available and needed.
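
The three keys above can be reduced to a single choice: use the most specific level that has a name. The following sketch shows one way to do that. `most_specific_admin_level` is a hypothetical helper, and only `'ADM1'` is shown being passed to `get_adm_level_data` in this notebook, so the `'ADM0'` and `'ADM2'` strings are assumptions.

```python
def most_specific_admin_level(settings):
    """Return (adm_level, feature_name) for the finest level that is set."""
    for level in ("ADM2", "ADM1", "ADM0"):
        name = settings.get(f"{level}_NAME")
        if name:
            return level, name
    raise ValueError("at least ADM0_NAME must be set")


settings = {"ADM0_NAME": "Kenya", "ADM1_NAME": "Kericho", "ADM2_NAME": None}
print(most_specific_admin_level(settings))
# ('ADM1', 'Kericho')
```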


**Example configuration (Kenya, Coast province):**


```python
configuration['DATA_DOWNLOAD'].update({
    'ADM0_NAME': 'Kenya',
    'ADM1_NAME': 'Coast',
    'ADM2_NAME': None,
})
```


In [None]:

soil_image = data_downloader.get_adm_level_data(adm_level='ADM1', feature_name = configuration['DATA_DOWNLOAD']['ADM1_NAME'])

Map = geemap.Map(center=[-1.37, 38.01], zoom=8)
Map.addLayer(soil_image.select(band_names[0]), vis_parameters, band_names[0])

# Display the map
Map

data will be processed for: Kericho


Map(center=[-1.37, 38.01], controls=(WidgetControl(options=['position', 'transparent_bg'], position='topright'…

In [None]:
### Download to local store
import os

## Reproject to epsg 4326

output_fn = os.path.join(configuration['OUTPUT']['path'], configuration['DATA_DOWNLOAD']['property'] + '.tif')
# Create the output folder (and any missing parents) if it does not exist yet
os.makedirs(configuration['OUTPUT']['path'], exist_ok=True)

data_downloader.download_data(soil_image, output_fn,  scale = configuration['OUTPUT']['resolution'])

### Create datacube

In [7]:
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])


properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

data_downloader.download_multiple_properties('soil', properties_todownload,
                                            adm_level='ADM1',
                                            feature_name = configuration['DATA_DOWNLOAD']['ADM1_NAME'],
                                            scale = configuration['OUTPUT']['resolution'],
                                            depths= configuration['DATA_DOWNLOAD']['depths'])


data will be processed for: Kericho
bdod: data was downloaded in soil/bdod.tif
data will be processed for: Kericho
cec: data was downloaded in soil/cec.tif
data will be processed for: Kericho
cfvo: data was downloaded in soil/cfvo.tif
data will be processed for: Kericho
clay: data was downloaded in soil/clay.tif
data will be processed for: Kericho
sand: data was downloaded in soil/sand.tif
data will be processed for: Kericho
silt: data was downloaded in soil/silt.tif
data will be processed for: Kericho
nitrogen: data was downloaded in soil/nitrogen.tif
data will be processed for: Kericho
soc: data was downloaded in soil/soc.tif
data will be processed for: Kericho
phh2o: data was downloaded in soil/phh2o.tif
data will be processed for: Kericho
wv0010: data was downloaded in soil/wv0010.tif
data will be processed for: Kericho
wv0033: data was downloaded in soil/wv0033.tif
data will be processed for: Kericho
wv1500: data was downloaded in soil/wv1500.tif


In [9]:
!pip install rioxarray

Collecting rioxarray
  Downloading rioxarray-0.20.0-py3-none-any.whl.metadata (5.4 kB)
Collecting rasterio>=1.4.3 (from rioxarray)
  Downloading rasterio-1.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.1 kB)
Collecting affine (from rasterio>=1.4.3->rioxarray)
  Downloading affine-2.4.0-py3-none-any.whl.metadata (4.0 kB)
Collecting cligj>=0.5 (from rasterio>=1.4.3->rioxarray)
  Downloading cligj-0.7.2-py3-none-any.whl.metadata (5.0 kB)
Collecting click-plugins (from rasterio>=1.4.3->rioxarray)
  Downloading click_plugins-1.1.1.2-py2.py3-none-any.whl.metadata (6.5 kB)
Downloading rioxarray-0.20.0-py3-none-any.whl (62 kB)
Downloading rasterio-1.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.3 MB)

In [10]:
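import os

import xarray
import rioxarray as rio

# Collect every GeoTIFF written to the 'soil' folder
raster_list = [os.path.join('soil', fn) for fn in os.listdir('soil') if fn.endswith('tif')]
xrdata_list = []
for raster_path in raster_list:
    # Each band is one depth interval, so rename the dimension accordingly
    xrdata = rio.open_rasterio(raster_path).rename({'band': 'depth'})
    # Name the variable after the soil property (file name without '.tif')
    xrdata.name = os.path.basename(raster_path)[:-4]
    xrdata_list.append(xrdata)
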
In [11]:
xarray.merge(xrdata_list).to_netcdf('soil.nc')
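
After renaming `band` to `depth`, the `depth` coordinate still holds the 1-based band indices that `rioxarray` assigns, not the depth intervals themselves. The sketch below builds a lookup from band index to interval, assuming the bands were written in the same order as `configuration['DATA_DOWNLOAD']['depths']` (as the `sand` band names earlier suggest). `depth_lookup` is a hypothetical helper.

```python
def depth_lookup(depths):
    """Map 1-based band index -> (top_cm, bottom_cm, midpoint_cm)."""
    lookup = {}
    for index, code in enumerate(depths, start=1):
        top, bottom = (int(part) for part in code.split("_"))
        lookup[index] = (top, bottom, (top + bottom) / 2)
    return lookup


depths = ["0_5", "5_15", "15_30", "30_60", "60_100", "100_200"]
print(depth_lookup(depths)[3])
# (15, 30, 22.5)
```

If numeric depths are preferred in the datacube, the midpoints could be passed to xarray's `assign_coords(depth=...)` before merging.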

### Extracting data using a coordinate

In [13]:
coordinate = [37.8, -1.4]  # [longitude, latitude]
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

# Query every property at the point; the result has one row per depth interval
df = data_downloader.soildata_using_point(properties_todownload, coordinate,
                                     depths= configuration['DATA_DOWNLOAD']['depths'])

### Convert SoilGrids point data to DSSAT format

In [15]:
!git clone https://github.com/anaguilarar/WeatherSoilDataProcessor.git

#import os
#os.chdir('/content/WeatherSoilDataProcessor')

import sys
import os

# Get the absolute path to the directory you want to add
new_path = os.path.abspath('/content/agwise_data_sourcing/WeatherSoilDataProcessor')

# Add the path to sys.path
sys.path.append(new_path)

!pip install -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt

fatal: destination path 'WeatherSoilDataProcessor' already exists and is not an empty directory.
Collecting cdsapi (from -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt (line 2))
  Downloading cdsapi-0.7.7-py2.py3-none-any.whl.metadata (3.1 kB)
Collecting DSSATTools==2.1.6 (from -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt (line 3))
  Downloading DSSATTools-2.1.6-py3-none-any.whl.metadata (14 kB)
Collecting rosetta-soil==0.1.1 (from -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt (line 4))
  Downloading rosetta_soil-0.1.1-py3-none-any.whl.metadata (1.0 kB)
Collecting fortranformat==1.2.2 (from -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt (line 5))
  Downloading fortranformat-1.2.2.tar.gz (22 kB)
  Preparing metadata (setup.py) ... done
Collecting geopandas==1.0.1 (from -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt (line 6))
  Downloa

In [25]:
import os
os.chdir('/content/agwise_data_sourcing')

import ee
from gee_datasets.gee_data import GEESoilGrids

configuration = {
    'GENERAL_SETTINGS':{
      'ee_project_name': 'ee-anaguilarar'
      },
    'DATA_DOWNLOAD':
    {
      'ADM0_NAME': 'Kenya',
      'ADM1_NAME': 'Kericho',
      'ADM2_NAME': None,
      'property': 'sand',
      'depths': ['0_5', '5_15', '15_30', '30_60', '60_100', '100_200']

    },
    'OUTPUT':
      {
        'path': 'soil',
        'resolution': 250

      }
}

# Initialize Earth Engine only after `configuration` is defined in this cell
ee.Initialize(project=configuration['GENERAL_SETTINGS']['ee_project_name'])

coordinate = [37.8, -1.4] # long and lat
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

df = data_downloader.soildata_using_point(properties_todownload, coordinate,
                                     depths= configuration['DATA_DOWNLOAD']['depths'])

## Standardize units
for column_name in df.columns:
  if column_name.startswith('wv'):
    df[column_name] = df[column_name] * 1000
  if column_name == 'nitrogen':
    df[column_name] = df[column_name] / 10

df

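| | depth | bdod | x | y | cec | cfvo | clay | sand | silt | nitrogen | soc | phh2o | wv0010 | wv0033 | wv1500 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0-5 | 131 | 37.8 | -1.4 | 154 | 41 | 304 | 501 | 195 | 128.6 | 173 | 64 | 342.0 | 285.0 | 129.0 |
| 1 | 100-200 | 134 | 37.8 | -1.4 | 137 | 161 | 422 | 411 | 168 | 44.4 | 64 | 66 | 354.0 | 293.0 | 177.0 |
| 2 | 15-30 | 134 | 37.8 | -1.4 | 135 | 61 | 377 | 442 | 181 | 85.8 | 81 | 64 | 343.0 | 280.0 | 133.0 |
| 3 | 30-60 | 135 | 37.8 | -1.4 | 137 | 90 | 417 | 416 | 166 | 66.3 | 63 | 65 | 350.0 | 284.0 | 166.0 |
| 4 | 5-15 | 133 | 37.8 | -1.4 | 129 | 45 | 311 | 492 | 197 | 118.0 | 107 | 64 | 336.0 | 273.0 | 127.0 |
| 5 | 60-100 | 134 | 37.8 | -1.4 | 134 | 125 | 408 | 416 | 175 | 52.5 | 70 | 65 | 350.0 | 293.0 | 168.0 |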
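
The rows above are ordered alphabetically by depth label (`0-5`, `100-200`, `15-30`, ...), not from the surface downward. If a later step expects top-to-bottom order, the labels can be sorted by their upper bound. The sketch below shows the idea in plain Python. `depth_top` is a hypothetical helper.

```python
def depth_top(label):
    """Return the upper bound in cm of a label such as '15-30'."""
    return int(label.split("-")[0])


labels = ["0-5", "100-200", "15-30", "30-60", "5-15", "60-100"]
print(sorted(labels, key=depth_top))
# ['0-5', '5-15', '15-30', '30-60', '60-100', '100-200']
```

With pandas, the same key could likely be applied as `df.sort_values('depth', key=lambda s: s.map(depth_top))`.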


In [26]:
import sys
import os

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Get the absolute path to the directory you want to add
new_path = os.path.abspath('/content/agwise_data_sourcing/WeatherSoilDataProcessor')
# Add the path to sys.path
sys.path.append(new_path)

from crop_modeling.dssat.files_export import from_soil_to_dssat

from_soil_to_dssat(
    df,
    outputpath='/content/agwise_data_sourcing/runs',
    outputfn='SOL',
    soil_id='TRAN00001',
    country=configuration['DATA_DOWNLOAD']['ADM0_NAME'],
    site=configuration['DATA_DOWNLOAD']['ADM1_NAME'],
)


sand 50.1 clay 30.400000000000002
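
The printed line reports sand and clay as 50.1 and 30.4, while the 0-5 cm row of the point table lists 501 and 304. This suggests the texture values are stored in g/kg and converted to percent by dividing by 10. The sketch below reproduces that arithmetic as a sanity check. `g_per_kg_to_percent` is a hypothetical helper, and the g/kg unit is an assumption based on these numbers, so verify it against the SoilGrids documentation before relying on it.

```python
def g_per_kg_to_percent(value):
    """Convert a mass fraction from g/kg to percent (1 % = 10 g/kg)."""
    return value / 10


# 0-5 cm row from the point table in this notebook
top_layer = {"sand": 501, "clay": 304, "silt": 195}
percent = {name: g_per_kg_to_percent(v) for name, v in top_layer.items()}
print(percent)
# {'sand': 50.1, 'clay': 30.4, 'silt': 19.5}

# The three texture fractions should add up to roughly 100 %
total = sum(percent.values())
print(abs(total - 100) < 1e-6)
```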
