<a href="https://colab.research.google.com/github/anaguilarar/agwise_data_sourcing/blob/main/GEESoilGrids_data_download.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ag-Wise Data Sourcing

## SOIL Downloader

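This notebook shows how to download soil property data from Google Earth Engine (GEE) with the `GEESoilGrids` downloader. It also shows how to turn the results into a datacube, point profiles, and DSSAT soil files.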

In [None]:
import os

# Clone the repository once (Colab), then work from its root folder
if not os.path.exists('/content/agwise_data_sourcing'):
  !git clone https://github.com/anaguilarar/agwise_data_sourcing.git
os.chdir('/content/agwise_data_sourcing')

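## Workflow Overview
1. **Country Example Configuration** – Select the area and soil property.
2. **Data Downloading** – Build the soil property query in GEE.
3. **Soil Data visualization** – Visualize the soil property.
4. **Download and datacube** – Export a subregion to GeoTIFF and stack several properties into one NetCDF file.
5. **Point extraction** – Retrieve soil profiles at a coordinate and convert them to DSSAT format.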

### Country Example Configuration


In this section, you will set the parameters for your analysis. Modify the dictionary below to match your region and soil property of interest.


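- `ee_project_name` is the Google Cloud project registered for Earth Engine (replace it with your own).
- `ADM0_NAME`, `ADM1_NAME`, and `ADM2_NAME` define the administrative area: country, first-level region (province/state), and district. Set unused levels to `None`.
- `property` selects the soil property to download (`data_downloader.list_of_products` lists the available products).
- `depths` lists the depth intervals to retrieve, as `top_bottom` in centimetres (for example `'0_20'`).
- `OUTPUT` sets the output folder (`path`) and the pixel size in metres (`resolution`).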


Example: phosphorus extraction at 0-20 cm and 20-50 cm for **Kenya, Kericho (ADM1)**

In [None]:
### INITIAL configuration

configuration = {
    'GENERAL_SETTINGS':{
      'ee_project_name': 'ee-anaguilarar'
      },
    'DATA_DOWNLOAD':
    {
      'ADM0_NAME': 'Kenya',
      'ADM1_NAME': 'Kericho',
      'ADM2_NAME': None,
      'property': 'phosphorus',
      'depths': ['0_20', '20_50']

    },
    'OUTPUT':
      {
        'path': 'soil',
        'resolution': 250

      }
}
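Before running the downloads, it can help to check the dictionary for the mistakes that are easiest to make, such as setting `ADM2_NAME` without `ADM1_NAME`. The sketch below is a hypothetical helper (`check_configuration` is not part of `gee_datasets`). It only encodes the conventions described in this notebook and assumes depth labels of the form `top_bottom` in centimetres; `'my-project'` is a placeholder project name.

```python
def check_configuration(cfg: dict) -> list[str]:
    """Return a list of problems found in the configuration (empty if none).

    Hypothetical helper: it only checks the conventions used in this notebook.
    """
    problems = []
    for section in ("GENERAL_SETTINGS", "DATA_DOWNLOAD", "OUTPUT"):
        if section not in cfg:
            problems.append(f"missing section '{section}'")
    if problems:
        return problems

    dl = cfg["DATA_DOWNLOAD"]
    if not dl.get("ADM0_NAME"):
        problems.append("ADM0_NAME (country) is required")
    # A district only makes sense inside a first-level region
    if dl.get("ADM2_NAME") and not dl.get("ADM1_NAME"):
        problems.append("ADM2_NAME is set but ADM1_NAME is not")

    # Depth labels are assumed to look like '0_20' (top_bottom, in cm)
    for depth in dl.get("depths", []):
        parts = depth.split("_")
        valid = len(parts) == 2 and all(p.isdigit() for p in parts)
        if not valid or int(parts[0]) >= int(parts[1]):
            problems.append(f"invalid depth label '{depth}'")

    if cfg["OUTPUT"].get("resolution", 0) <= 0:
        problems.append("OUTPUT resolution must be a positive number of metres")
    return problems


example = {
    "GENERAL_SETTINGS": {"ee_project_name": "my-project"},
    "DATA_DOWNLOAD": {"ADM0_NAME": "Kenya", "ADM1_NAME": "Kericho", "ADM2_NAME": None,
                      "property": "phosphorus", "depths": ["0_20", "20_50"]},
    "OUTPUT": {"path": "soil", "resolution": 250},
}
print(check_configuration(example))  # → []
```

In the notebook you would call `check_configuration(configuration)` and fix anything it reports before initializing Earth Engine.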


- Earth Engine requires authentication. If `ee.Initialize()` fails because no credentials are found, run `ee.Authenticate()` once and then re-run the cell.
- Depending on the size of your area, the data request might take a few minutes.

In [None]:
from gee_datasets.soil import GEESoilGrids
import ee
import geemap

ee.Initialize(project=configuration['GENERAL_SETTINGS']['ee_project_name'])


### Data Downloading


This section creates the downloader for your country and builds the soil property query in Google Earth Engine.


Steps:
1. **Initialize Google Earth Engine (GEE)** with your project (done in the previous cell).
2. **Create the downloader object** (`GEESoilGrids`) for the selected country.
3. **Run the query** to build the soil property image for the requested depths.

In [None]:
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

data_downloader.list_of_products

In [None]:
data_downloader.initialize_query(configuration['DATA_DOWNLOAD']['property'], depths= configuration['DATA_DOWNLOAD']['depths'])
band_names = data_downloader.query.bandNames().getInfo()
band_names
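The band names printed above combine property and depth. For reference, the ISRIC SoilGrids images shared on Earth Engine (for example `projects/soilgrids-isric/clay_mean`) name their bands like `clay_0-5cm_mean`. Assuming that convention, the hypothetical helpers below convert between this notebook's depth labels (`'0_5'`) and such band names. `GEESoilGrids` may name its bands differently (especially for non-SoilGrids products), so compare with the `band_names` output before relying on them.

```python
def soilgrids_band_name(prop: str, depth: str, statistic: str = "mean") -> str:
    """Build a SoilGrids-style band name such as 'clay_0-5cm_mean'.

    Assumes depth labels use the notebook's 'top_bottom' form, in cm.
    """
    top, bottom = depth.split("_")
    return f"{prop}_{top}-{bottom}cm_{statistic}"


def parse_band_name(band: str) -> tuple[str, str, str]:
    """Split 'clay_0-5cm_mean' back into ('clay', '0_5', 'mean')."""
    prop, depth, statistic = band.split("_")
    top, bottom = depth.removesuffix("cm").split("-")
    return prop, f"{top}_{bottom}", statistic


print(soilgrids_band_name("clay", "0_5"))      # → clay_0-5cm_mean
print(parse_band_name("bdod_100-200cm_mean"))  # → ('bdod', '100_200', 'mean')
```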

### Soil Data visualization

In [None]:

# Create a map
Map = geemap.Map(center=[-1.37, 38.01], zoom=6)

# Define visualization parameters

vis_parameters = {'min': 7, 'max': 14,
 'palette': ['5d5851','635a4b','6a5b44','715c3d','785e36','7e5f30','856129','8c6222','92641c','996515','a0660e','a66808','ad6901']}
# Add the image layer
Map.addLayer(data_downloader.query.select(band_names[0]), vis_parameters, band_names[0])
# Display the map
Map
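In `vis_parameters`, Earth Engine stretches pixel values linearly between `min` and `max` and spreads the `palette` colours evenly over that range; values below 7 or above 14 are shown with the end colours. The plain-Python sketch below approximates that mapping to show roughly which colour a value gets. It is an illustration for intuition, not Earth Engine's rendering code, and `value_to_colour` is a hypothetical name.

```python
def value_to_colour(value: float, vmin: float, vmax: float, palette: list[str]) -> str:
    """Approximate the hex colour a pixel value gets under a linear min/max stretch."""
    if len(palette) < 2:
        raise ValueError("palette needs at least two colours")
    # Clamp to [vmin, vmax], then rescale to [0, 1]
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    # Position along the palette, between colour i and colour i + 1
    pos = t * (len(palette) - 1)
    i = min(int(pos), len(palette) - 2)
    frac = pos - i
    c0 = [int(palette[i][k:k + 2], 16) for k in (0, 2, 4)]
    c1 = [int(palette[i + 1][k:k + 2], 16) for k in (0, 2, 4)]
    # Linear interpolation between the two neighbouring palette colours
    rgb = [round(a + (b - a) * frac) for a, b in zip(c0, c1)]
    return "".join(f"{v:02x}" for v in rgb)


palette = ['5d5851', '635a4b', '6a5b44', '715c3d', '785e36', '7e5f30', '856129',
           '8c6222', '92641c', '996515', 'a0660e', 'a66808', 'ad6901']
print(value_to_colour(10.5, 7, 14, palette))  # → 856129 (the middle colour)
```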

### Download data for a specific administrative level

You can target data at different administrative levels using the configuration keys:

1. Set `ADM0_NAME` for the country (required).
2. Set `ADM1_NAME` for the first-level admin (province/state) if you want a subregion.
3. Set `ADM2_NAME` for the district/municipality if available and needed.
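The list above implies a simple rule: use the most specific level that is set. A hypothetical helper (`deepest_adm_level` is not part of the library) could pick the level and name to pass to `get_adm_level_data(adm_level=..., feature_name=...)`. This notebook only calls that method with `'ADM1'`, so whether it also accepts `'ADM0'` or `'ADM2'` is an assumption to check.

```python
def deepest_adm_level(download_cfg: dict) -> tuple[str, str]:
    """Return (adm_level, feature_name) for the most specific level that is set.

    Hypothetical helper: ADM0 is required; ADM1 and ADM2 are optional refinements.
    """
    if not download_cfg.get("ADM0_NAME"):
        raise ValueError("ADM0_NAME (country) is required")
    for level in ("ADM2", "ADM1"):
        name = download_cfg.get(f"{level}_NAME")
        if name:
            return level, name
    return "ADM0", download_cfg["ADM0_NAME"]


print(deepest_adm_level({"ADM0_NAME": "Kenya", "ADM1_NAME": "Kericho", "ADM2_NAME": None}))
# → ('ADM1', 'Kericho')
```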


**Example configuration (Kenya, Coast province):**


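```python
configuration['DATA_DOWNLOAD'].update({
    'ADM0_NAME': 'Kenya',
    'ADM1_NAME': 'Coast',
    'ADM2_NAME': None,
})
```

Run this update in a code cell before the next step; otherwise the value from the initial configuration (Kericho) is used.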


In [None]:

soil_image = data_downloader.get_adm_level_data(adm_level='ADM1', feature_name = configuration['DATA_DOWNLOAD']['ADM1_NAME'])

Map = geemap.Map(center=[-1.37, 38.01], zoom=8)
Map.addLayer(soil_image.select(band_names[0]), vis_parameters, band_names[0])

# Display the map
Map

In [None]:
### Download to local store
import os

# Output GeoTIFF: <OUTPUT path>/<property>.tif (EPSG:4326)
output_dir = configuration['OUTPUT']['path']
output_fn = os.path.join(output_dir, configuration['DATA_DOWNLOAD']['property'] + '.tif')
os.makedirs(output_dir, exist_ok=True)

# scale is the pixel size in metres
data_downloader.download_data(soil_image, output_fn, scale=configuration['OUTPUT']['resolution'])
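The `resolution` value is passed as `scale`, the pixel size in metres, so halving it roughly quadruples the number of pixels Earth Engine has to export. A back-of-the-envelope estimate is plain arithmetic. The 2,500 km² below is only an illustrative number, not the area of any region in this notebook. Because a GeoTIFF is rectangular and covers the bounding box of the region, the real count is usually somewhat higher.

```python
import math


def estimate_pixels(area_km2: float, scale_m: float, n_bands: int = 1) -> int:
    """Rough count of pixel values needed to cover area_km2 at scale_m metres per pixel."""
    pixel_area_km2 = (scale_m / 1000) ** 2
    return math.ceil(area_km2 / pixel_area_km2) * n_bands


# Illustrative area only; substitute the area of your own region
print(estimate_pixels(2500, 250, n_bands=2))  # → 80000
print(estimate_pixels(2500, 125, n_bands=2))  # → 320000
```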

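### Create datacube

Download several soil properties for the selected region, then stack them into a single NetCDF file.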

In [None]:
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])


properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

data_downloader.download_multiple_properties('soil', properties_todownload,
                                            adm_level='ADM1',
                                            feature_name = configuration['DATA_DOWNLOAD']['ADM1_NAME'],
                                            scale = configuration['OUTPUT']['resolution'],
                                            depths= configuration['DATA_DOWNLOAD']['depths'])


In [None]:
!pip install rioxarray

In [None]:
import os
import xarray
import rioxarray as rio

# Collect the GeoTIFFs written to the 'soil' folder
raster_list = [os.path.join('soil', fn) for fn in os.listdir('soil') if fn.endswith('.tif')]

xrdata_list = []
for raster_path in raster_list:
    # Each band holds one depth interval; name the layer after the file (the property)
    xrdata = rio.open_rasterio(raster_path).rename({'band': 'depth'})
    xrdata.name = os.path.splitext(os.path.basename(raster_path))[0]
    xrdata_list.append(xrdata)


In [None]:
xarray.merge(xrdata_list).to_netcdf('soil.nc')
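After `rename({'band': 'depth'})`, the `depth` coordinate still holds the band indices (1, 2, ...) rather than depth labels. Assuming each GeoTIFF stores one band per requested depth, in the same order as `depths`, you could attach the labels before merging. The sketch below uses synthetic arrays in place of the downloaded files; `label_depths` is a hypothetical helper, while `assign_coords` and `xr.merge` are standard xarray calls.

```python
import numpy as np
import xarray as xr


def label_depths(da: xr.DataArray, depths: list[str]) -> xr.DataArray:
    """Replace the integer band indices on 'depth' with labels such as '0_20'.

    Assumes the bands are stored in the same order as `depths`.
    """
    if da.sizes["depth"] != len(depths):
        raise ValueError(f"{da.name}: {da.sizes['depth']} bands but {len(depths)} depths")
    return da.assign_coords(depth=list(depths))


# Two synthetic 2-band rasters standing in for downloaded GeoTIFFs
coords = {"depth": [1, 2], "y": [0.5, 1.5], "x": [0.5, 1.5]}
clay = xr.DataArray(np.full((2, 2, 2), 30.0), coords=coords, dims=("depth", "y", "x"), name="clay")
sand = xr.DataArray(np.full((2, 2, 2), 45.0), coords=coords, dims=("depth", "y", "x"), name="sand")

depths = ["0_20", "20_50"]
cube = xr.merge([label_depths(clay, depths), label_depths(sand, depths)])
print(cube["clay"].sel(depth="0_20").mean().item())  # → 30.0
```

In the notebook this would be `xrdata_list = [label_depths(da, configuration['DATA_DOWNLOAD']['depths']) for da in xrdata_list]`; the size check raises an error if a file has a different number of bands than the configured depths.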

### Extracting data at a coordinate

In [None]:
coordinate = [37.8, -1.4]  # [longitude, latitude]
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

# Standard SoilGrids depth intervals (cm)
configuration['DATA_DOWNLOAD']['depths'] = ['0_5', '5_15', '15_30', '30_60', '60_100', '100_200']

df = data_downloader.soildata_using_point(properties_todownload, coordinate,
                                          depths=configuration['DATA_DOWNLOAD']['depths'])

df
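Depth labels such as `'0_5'` encode a layer's top and bottom. Soil models such as DSSAT describe each layer by its bottom depth (the `SLB` column of a `.SOL` file), so it is worth checking that the intervals stack without gaps. Whether `from_soil_to_dssat` does this internally is not shown here; the hypothetical helper below only assumes the labels are `top_bottom` in centimetres.

```python
def depth_layers(depths: list[str]) -> list[dict]:
    """Convert labels like '0_5' into top, bottom and thickness (assumed to be cm).

    Raises ValueError if consecutive layers leave a gap or overlap.
    """
    layers = []
    for label in depths:
        top, bottom = (int(v) for v in label.split("_"))
        layers.append({"label": label, "top_cm": top, "bottom_cm": bottom,
                       "thickness_cm": bottom - top})
    # Consecutive layers must share a boundary
    for upper, lower in zip(layers, layers[1:]):
        if upper["bottom_cm"] != lower["top_cm"]:
            raise ValueError(f"gap or overlap between {upper['label']} and {lower['label']}")
    return layers


standard = ["0_5", "5_15", "15_30", "30_60", "60_100", "100_200"]
print([layer["bottom_cm"] for layer in depth_layers(standard)])  # → [5, 15, 30, 60, 100, 200]
```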

### Convert SoilGrids point data to DSSAT format

In [None]:
!git clone https://github.com/anaguilarar/WeatherSoilDataProcessor.git


!pip install -r /content/agwise_data_sourcing/WeatherSoilDataProcessor/requirements.txt

In [None]:
import os
import warnings
import sys
import ee

warnings.filterwarnings("ignore", category=DeprecationWarning)

# Make the cloned WeatherSoilDataProcessor package importable
new_path = os.path.abspath('/content/agwise_data_sourcing/WeatherSoilDataProcessor')

# Add the path to sys.path
sys.path.append(new_path)

os.chdir('/content/agwise_data_sourcing')

from gee_datasets.soil import GEESoilGrids



configuration = {
    'GENERAL_SETTINGS':{
      'ee_project_name': 'ee-anaguilarar'
      },
    'DATA_DOWNLOAD':
    {
      'ADM0_NAME': 'Kenya',
      'ADM1_NAME': 'Kericho',
      'ADM2_NAME': None,
      'property': 'sand',
      'depths': ['0_5', '5_15', '15_30', '30_60', '60_100', '100_200']

    },
    'OUTPUT':
      {
        'path': 'soil',
        'resolution': 250

      }
}
ee.Initialize(project=configuration['GENERAL_SETTINGS']['ee_project_name'])

coordinate = [37.8, -1.4] # long and lat
data_downloader = GEESoilGrids(configuration['DATA_DOWNLOAD']['ADM0_NAME'])

properties_todownload = ['bdod', 'cec', 'cfvo', 'clay', 'sand', 'silt', 'nitrogen', 'soc', 'phh2o', 'wv0010', 'wv0033', 'wv1500']

df = data_downloader.soildata_using_point(properties_todownload, coordinate,
                                     depths= configuration['DATA_DOWNLOAD']['depths'])

## Standardize units: water-retention columns (wv*) ×1000, nitrogen ÷10
for column_name in df.columns:
  if column_name.startswith('wv'):
    df[column_name] = df[column_name] * 1000
  if column_name == 'nitrogen':
    df[column_name] = df[column_name] / 10

df
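The cell above hard-codes two conversions. If more are added, keeping them in one table makes them easier to review. This hypothetical sketch (`UNIT_RULES` and `standardize_row` are illustrative names) applies the same factors as the notebook to one record stored as a dict and returns a new dict, so the input is never modified. The factors are copied from the notebook as-is; check them against the units expected by `from_soil_to_dssat` before changing anything.

```python
# Conversion rules copied from the cell above: (column test, conversion)
UNIT_RULES = [
    (lambda col: col.startswith("wv"), lambda v: v * 1000),  # water-retention columns
    (lambda col: col == "nitrogen", lambda v: v / 10),       # nitrogen
]


def standardize_row(row: dict) -> dict:
    """Return a copy of one soil record with each matching conversion applied once."""
    out = dict(row)
    for col, value in row.items():
        for matches, convert in UNIT_RULES:
            if matches(col):
                out[col] = convert(value)
                break
    return out


print(standardize_row({"wv0033": 0.25, "nitrogen": 12, "clay": 30}))
# → {'wv0033': 250.0, 'nitrogen': 1.2, 'clay': 30}
```

Assuming `df` is a pandas DataFrame, it could be applied row-wise with `df.apply(lambda r: pd.Series(standardize_row(r.to_dict())), axis=1)` after `import pandas as pd`.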

In [None]:
data_downloader.country.title()

In [None]:
import os
import warnings

warnings.filterwarnings("ignore", category=DeprecationWarning)

from crop_modeling.dssat.files_export import from_soil_to_dssat

# Folder for the DSSAT soil file
output_dir = '/content/agwise_data_sourcing/runs'
os.makedirs(output_dir, exist_ok=True)

# Write the point soil profile (df) to a DSSAT soil file
from_soil_to_dssat(df, outputpath=output_dir, outputfn='SOL', soil_id='TRAN00001',
                   country=configuration['DATA_DOWNLOAD']['ADM0_NAME'],
                   site=configuration['DATA_DOWNLOAD']['ADM1_NAME'])


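### Run the download as a script

The repository also provides `download_soildata.py`, which runs the download from a YAML configuration file.

In [None]: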
import os
import sys

path = os.path.abspath('/content/agwise_data_sourcing/WeatherSoilDataProcessor')
sys.path.append(path)

os.chdir('/content/agwise_data_sourcing')

!python download_soildata.py -config yaml_configurations/soil_data_download.yaml