## Soil Data Collection and Preprocessing

[**Nutrient retention capacity (SQ2)**](https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/SQ/sq2.tif)

Nutrient retention capacity is particularly important for the effectiveness of fertilizer applications and is therefore especially relevant under intermediate and high input levels of cropping. It refers to the capacity of the soil to retain added nutrients against losses caused by leaching. Plant nutrients are held in the soil on the exchange sites provided by the clay fraction, organic matter and the clay-humus complex. Losses vary with the intensity of leaching, which is determined by the rate at which soil moisture drains through the soil profile. Soil texture affects nutrient retention capacity in two ways: through the exchange sites available on the clay minerals and through soil permeability.

[**Rooting conditions (SQ3)**](https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/SQ/sq3.tif)

[**Oxygen availability to roots (SQ4)**](https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/SQ/sq4.tif)
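
The cells below read these three rasters from a local `soil_data/` folder. A minimal download sketch follows, assuming the linked files are publicly readable over HTTPS. `SQ_URLS` and `download_rasters` are hypothetical names, and the `fetch` argument exists only so the download step can be swapped out (for example with a stub in tests).

```python
import os
import urllib.request

# Hypothetical table of the soil quality rasters linked above (code -> URL)
SQ_URLS = {
    "sq2": "https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/SQ/sq2.tif",
    "sq3": "https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/SQ/sq3.tif",
    "sq4": "https://storage.googleapis.com/fao-maps-catalog-data/geonetwork/gsoc/SQ/sq4.tif",
}


def download_rasters(urls, dest_dir="soil_data", fetch=urllib.request.urlretrieve):
    """Download each raster to <dest_dir>/<code>.tif and return the local paths.

    Files that already exist are not downloaded again. `fetch(url, path)` defaults
    to urllib.request.urlretrieve and can be replaced, e.g. by a stub in tests.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for code, url in urls.items():
        path = os.path.join(dest_dir, f"{code}.tif")
        if not os.path.exists(path):
            fetch(url, path)
        paths.append(path)
    return paths
```

Calling `download_rasters(SQ_URLS)` would return the three local paths in the order sq2, sq3, sq4, matching the `tif_files` list defined further down.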

## Data source
Based on soil parameters from the **Harmonized World Soil Database (HWSD)**, seven key soil qualities important for crop production have been derived: nutrient availability, nutrient retention capacity, rooting conditions, oxygen availability to roots, excess salts, toxicities, and workability. Soil qualities relate to the agricultural use of the soil, and more specifically to the requirements and tolerances of individual crops. To illustrate the soil qualities, maize was selected as the reference crop because of its global importance and wide geographical distribution.
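
Only SQ2 to SQ4 are processed in this notebook. Assuming the codes SQ1 to SQ7 follow the order in which the seven qualities are listed above (SQ2, SQ3 and SQ4 agree with the linked rasters), a single lookup table can produce both the input paths and the band titles used for plotting. `SOIL_QUALITIES` and `soil_quality_inputs` are hypothetical names, and the `soil_data/<code>.tif` layout mirrors the paths used further down.

```python
import os

# Assumed mapping: SQ codes numbered in the order the qualities are listed above
SOIL_QUALITIES = {
    "sq1": "Nutrient Availability",
    "sq2": "Nutrient Retention Capacity",
    "sq3": "Rooting Conditions",
    "sq4": "Oxygen Availability To Roots",
    "sq5": "Excess Salts",
    "sq6": "Toxicities",
    "sq7": "Workability",
}


def soil_quality_inputs(codes, data_dir="soil_data"):
    """Return the raster paths and the band-title mapping for the selected codes.

    Raises KeyError for a code that is not in SOIL_QUALITIES.
    """
    tif_files = [os.path.join(data_dir, f"{code}.tif") for code in codes]
    titles = {code: SOIL_QUALITIES[code] for code in codes}
    return tif_files, titles
```

Keeping the names in one table means the file list and the plot titles cannot drift apart when another quality is added.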

In [31]:
# Import libraries
import pandas as pd
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.mask import mask
import os
import matplotlib.pyplot as plt
from PIL import Image

## Clip The Data For Each Country

In [32]:
def clip_and_combine_tiffs(tif_files, shapefile_path, output_dir):
    """
    Clips multiple TIFF files to the extent of a shapefile and combines the data bands into a single TIFF file.
    
    Parameters:
    tif_files (list of str): List of paths to the input TIFF files.
    shapefile_path (str): Path to the shapefile for clipping.
    output_dir (str): Directory where the output TIFF file will be saved.
    
    Returns:
    None
    """
    # Load the shapefile
    shapefile = gpd.read_file(shapefile_path)
    
    # Initialize lists to store clipped data, band names, and metadata
    clipped_data = []
    band_names = []
    out_meta = None
    
    for tif_file in tif_files:
        with rasterio.open(tif_file) as src:
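            # Clip the raster with the shapefile, reprojecting the shapes to the
            # raster CRS first when the shapefile defines one
            shapes = shapefile.to_crs(src.crs).geometry if shapefile.crs is not None else shapefile.geometry
            out_image, out_transform = mask(src, shapes, crop=True)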
            out_meta = src.meta.copy()
            
            # Update the metadata to reflect the new shape
            out_meta.update({
                "driver": "GTiff",
                "height": out_image.shape[1],
                "width": out_image.shape[2],
                "transform": out_transform
            })
            
            clipped_data.append(out_image)
            
            # Extract the filename without extension to use as band name
            band_name = os.path.splitext(os.path.basename(tif_file))[0]
            band_names.append(band_name)
    
    # Stack the clipped data arrays along the first axis (bands); this assumes
    # all inputs share the same grid, so that the clipped arrays line up
    combined_data = np.concatenate(clipped_data, axis=0)
    
    # Update the number of bands in the metadata
    out_meta.update({"count": combined_data.shape[0]})
    
    # Define the output file path, creating the output directory if needed
    os.makedirs(output_dir, exist_ok=True)
    output_filename = "clipped_combined.tif"
    output_path = os.path.join(output_dir, output_filename)
    
    # Write the combined data to a new TIFF file
    with rasterio.open(output_path, "w", **out_meta) as dest:
        dest.write(combined_data)
        dest.descriptions = tuple(band_names)
    
    # Provide message to the user
    print(f"Output saved to: {output_path}")
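
With `crop=True`, `mask` trims the output to the pixels covering the shapes' bounding box and returns a matching `out_transform`. The arithmetic behind that crop can be sketched in plain Python for a north-up raster. This is a simplified illustration, not rasterio's own implementation (which also supports options such as `pad`); `bbox_to_window` and its `(a, c, e, f)` transform tuple are hypothetical, following the affine convention `x = c + col * a`, `y = f + row * e` with a negative `e`.

```python
import math


def bbox_to_window(bbox, transform, width, height):
    """Return (row_off, col_off, n_rows, n_cols) of the pixels covering bbox.

    bbox: (minx, miny, maxx, maxy) in the raster's CRS.
    transform: (a, c, e, f) of a north-up geotransform, where
               x = c + col * a and y = f + row * e (e is negative).
    """
    minx, miny, maxx, maxy = bbox
    a, c, e, f = transform
    # Convert map coordinates to fractional pixel indices, rounding outwards
    col_start = math.floor((minx - c) / a)
    col_stop = math.ceil((maxx - c) / a)
    row_start = math.floor((maxy - f) / e)  # top edge -> first row (e < 0)
    row_stop = math.ceil((miny - f) / e)    # bottom edge -> last row
    # Clamp the window to the raster extent
    col_start, col_stop = max(col_start, 0), min(col_stop, width)
    row_start, row_stop = max(row_start, 0), min(row_stop, height)
    if col_start >= col_stop or row_start >= row_stop:
        raise ValueError("bounding box does not overlap the raster")
    return row_start, col_start, row_stop - row_start, col_stop - col_start
```

After cropping, the new origin is `(c + col_off * a, f + row_off * e)`, which is the shift that `out_transform` encodes relative to the source transform.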

## Extract The Data For Each District

In [41]:
def extract_soil_data_to_csv(tiff_file, shapefile_path, output_dir):
    """
    Extracts soil data from the TIFF file, computes the mean of each band within every district,
    and saves the results to a CSV file and a GeoJSON file.
    
    Parameters:
    tiff_file (str): Path to the input TIFF file.
    shapefile_path (str): Path to the district shapefile.
    output_dir (str): Directory where the output files will be saved.
    
    Returns:
    pd.DataFrame: Dataframe with region, district, and mean values of each band.
    """
    # Load the district shapefile
    districts = gpd.read_file(shapefile_path)
    
    # Determine the name of the region column
    if 'region' in districts.columns:
        region_column = 'region'
    elif 'province' in districts.columns:
        region_column = 'province'
    else:
        raise ValueError("The shapefile must contain a 'region' or 'province' column.")
    
    # Ensure there is a 'district' column
    if 'district' not in districts.columns:
        raise ValueError("The shapefile must contain a 'district' column.")
    
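    # Load the TIFF file
    with rasterio.open(tiff_file) as src:
        # Reproject the districts to the raster CRS for clipping only; the original
        # geometries are kept for the GeoJSON output
        districts_raster_crs = districts.to_crs(src.crs) if districts.crs is not None else districts
        
        band_data = []
        for _, district in districts_raster_crs.iterrows():
            # Clip the raster with the district geometry; filled=False returns a
            # masked array in which nodata and out-of-polygon pixels are masked
            out_image, out_transform = mask(src, [district.geometry], crop=True, filled=False)
            
            # Turn masked pixels into NaN so that nanmean ignores them
            values = out_image.astype('float64').filled(np.nan)
            mean_values = np.nanmean(values, axis=(1, 2))
            
            # Create a dictionary with region, district, and mean values
            data = {
                region_column: district[region_column],
                'district': district['district'],
            }
            for i, mean_value in enumerate(mean_values, start=1):
                # Use the band description as the column name, falling back to "band_<i>"
                band_name = src.descriptions[i-1] or f"band_{i}"
                data[band_name] = mean_value
            
            band_data.append(data)
    
    # Create a DataFrame that shares the shapefile's index
    df = pd.DataFrame(band_data, index=districts.index)
    
    # Define output paths and make sure the output directory exists
    os.makedirs(output_dir, exist_ok=True)
    csv_output_path = os.path.join(output_dir, "soil_data.csv")
    geojson_output_path = os.path.join(output_dir, "soil_data.geojson")
    
    # Save the DataFrame to a CSV file
    df.to_csv(csv_output_path, index=False)
    print(f"The data has been successfully saved to {csv_output_path}")
    
    # Attach the band means to the original geometries by index; merging on
    # names would duplicate rows if a (region, district) pair repeats
    df_geo = districts.join(df.drop(columns=[region_column, 'district']))
    
    # Save the GeoDataFrame to a GeoJSON file
    df_geo.to_file(geojson_output_path, driver='GeoJSON')
    print(f"GeoJSON saved to: {geojson_output_path}")
    
    return df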
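
By default, `mask` returns a regular array in which pixels outside the geometry are set to the dataset's nodata value, or 0 if none is defined. `np.nanmean` skips only NaN, so such fill values are averaged in unless they are converted first. The NumPy sketch below, with a hypothetical `band_means` helper and made-up pixel values, shows the effect of excluding a fill value of 0.

```python
import numpy as np


def band_means(image, nodata):
    """Per-band mean of a (bands, rows, cols) array, ignoring nodata pixels."""
    data = image.astype("float64")
    data[image == nodata] = np.nan  # mark fill pixels so nanmean skips them
    return np.nanmean(data, axis=(1, 2))


if __name__ == "__main__":
    # Two bands of 2 x 3 pixels; 0 marks pixels outside the district
    example = np.array([[[1, 2, 0], [3, 0, 0]],
                        [[4, 4, 4], [4, 0, 4]]])
    print(example.mean(axis=(1, 2)))  # band 1 naive mean: 6 / 6 = 1.0
    print(band_means(example, 0))     # masked means: 2.0 and 4.0
```

The naive mean of the first band is halved because three of its six pixels are fill values; excluding them recovers the mean of the real pixels.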


## Visualization Function

In [52]:
def create_gif_from_tiff(tiff_file, output_gif_path):
    """
    Creates a GIF file displaying all the bands in a TIFF file with their respective titles.
    
    Parameters:
    tiff_file (str): Path to the input TIFF file.
    output_gif_path (str): Path where the output GIF file will be saved.
    """
    # Title mapping based on band names
    title_mapping = {
        'sq2': 'Nutrient Retention Capacity',
        'sq3': 'Rooting Conditions',
        'sq4': 'Oxygen Availability To Roots'
    }
    
    frames = []
    temp_image_paths = []
    
    with rasterio.open(tiff_file) as src:
        for band in range(1, src.count + 1):
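            # Read as a masked array so nodata pixels are neither drawn nor
            # included in the colour scale
            data = src.read(band, masked=True)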
            band_name = src.descriptions[band-1]
            title = title_mapping.get(band_name, f"Band {band}")
            
            # Create a plot for the band
            fig, ax = plt.subplots(figsize=(10, 10))
            cax = ax.imshow(data, cmap='viridis')
            ax.set_title(title)
            plt.colorbar(cax, ax=ax, orientation='vertical')
            
            # Save the plot to a temporary image file
            temp_image_path = f"band_{band}.png"
            fig.savefig(temp_image_path)
            plt.close(fig)
            
            # Load the image into memory and close the file, so the temporary
            # PNG can be deleted afterwards (including on Windows)
            with Image.open(temp_image_path) as img:
                frames.append(img.copy())
            temp_image_paths.append(temp_image_path)
    
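    # Make sure the output folder exists, then save the frames as a GIF
    os.makedirs(os.path.dirname(output_gif_path) or ".", exist_ok=True)
    frames[0].save(output_gif_path, format='GIF', append_images=frames[1:], save_all=True, duration=1000, loop=0)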
    
    # Remove the temporary image files
    for temp_image_path in temp_image_paths:
        os.remove(temp_image_path)
    
    print(f"GIF saved to: {output_gif_path}")
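
`create_gif_from_tiff` round-trips every frame through a PNG file on disk. A minimal alternative, assuming the frames are already available as 2-D `uint8` arrays, keeps everything in memory with Pillow. `frames_to_gif` is a hypothetical helper; rendered matplotlib figures could be fed in the same way by calling `fig.savefig(buf, format='png')` on an `io.BytesIO` buffer and opening it with `Image.open`.

```python
import io

import numpy as np
from PIL import Image


def frames_to_gif(arrays, duration_ms=1000):
    """Encode a list of 2-D uint8 arrays as a looping animated GIF; return the bytes."""
    frames = [Image.fromarray(a) for a in arrays]  # 2-D uint8 -> greyscale ("L") images
    buf = io.BytesIO()
    frames[0].save(buf, format="GIF", save_all=True, append_images=frames[1:],
                   duration=duration_ms, loop=0)
    return buf.getvalue()


if __name__ == "__main__":
    # Three synthetic frames with different grey levels
    demo = [np.full((4, 4), value, dtype=np.uint8) for value in (0, 100, 200)]
    gif_bytes = frames_to_gif(demo)
    print(Image.open(io.BytesIO(gif_bytes)).n_frames)
```

Skipping the temporary files avoids name clashes in the working directory and the need to clean up afterwards.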

In [37]:
sq2 = 'soil_data/sq2.tif' # Nutrient Retention Capacity
sq3 = 'soil_data/sq3.tif' # Rooting Conditions
sq4 = 'soil_data/sq4.tif' # Oxygen Availability To Roots
tif_files = [sq2, sq3, sq4]
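
`clip_and_combine_tiffs` stacks the clipped arrays with `np.concatenate`, which only requires the array shapes to match, and it copies the metadata of the last file. Before combining, it can be worth confirming that the inputs share one grid. The sketch below is an assumption-level check on plain metadata dictionaries (for example `src.meta` from `rasterio.open`), using the hypothetical helper `grid_mismatches`; it is deliberately strict and reports any difference in CRS, transform or size.

```python
GRID_KEYS = ("crs", "transform", "width", "height")


def grid_mismatches(metas):
    """List (index, key) pairs where a raster's grid differs from the first raster's."""
    reference = metas[0]
    return [
        (i, key)
        for i, meta in enumerate(metas[1:], start=1)
        for key in GRID_KEYS
        if meta[key] != reference[key]
    ]
```

With rasterio, the list can be built as `[rasterio.open(p).meta for p in tif_files]` (ideally inside `with` blocks); an empty result means the bands can be stacked safely.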

## Extract Soil Data of Tanzania

In [38]:
tz_dir = 'tanzania_data/'
country_shapefile = tz_dir + 'shapefiles/tz_country.shp'
output_dir = tz_dir + 'soil_quality_data'

#clip_and_combine_tiffs(tif_files, country_shapefile, output_dir)

In [58]:
# Create a GIF of the soil data for Tanzania
tiff_file = tz_dir + 'soil_quality_data/clipped_combined.tif'
output_gif_path = tz_dir + 'soil_quality_data/processed/tz_soil_data.gif'
#create_gif_from_tiff(tiff_file, output_gif_path)

## Extract the Soil Data of Each District in Tanzania 

In [59]:
tiff_file = tz_dir + 'soil_quality_data/clipped_combined.tif'
shapefile_path = tz_dir + 'shapefiles/tz_districts.shp'
output_dir = tz_dir + 'soil_quality_data/processed'
#extract_soil_data_to_csv(tiff_file, shapefile_path, output_dir)

## Extract Soil Data of Rwanda

In [60]:
rw_dir = 'rwanda_data/'
country_shapefile = rw_dir + 'shapefiles/rw_country.shp'
output_dir = rw_dir + 'soil_quality_data'
#clip_and_combine_tiffs(tif_files, country_shapefile, output_dir)

In [61]:
# Create a GIF of the soil data for Rwanda
tiff_file = rw_dir + 'soil_quality_data/clipped_combined.tif'
output_gif_path = rw_dir + 'soil_quality_data/processed/rw_soil_data.gif'
#create_gif_from_tiff(tiff_file, output_gif_path)

## Extract the Soil Data of Each District in Rwanda 

In [63]:
tiff_file = rw_dir + 'soil_quality_data/clipped_combined.tif'
shapefile_path = rw_dir + 'shapefiles/rw_district.shp'
output_dir = rw_dir + 'soil_quality_data/processed'
#extract_soil_data_to_csv(tiff_file, shapefile_path, output_dir)
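
The Tanzania and Rwanda cells repeat the same path construction with a different folder, file prefix and district shapefile name. A small configuration helper, a hypothetical refactoring rather than part of the original workflow, builds those paths once per country so that the three processing functions can be called in a loop. `country_paths` and `COUNTRIES` are illustrative names.

```python
import os


def country_paths(country_dir, code, district_shapefile):
    """Build the input and output paths used by the per-country cells.

    country_dir: e.g. 'tanzania_data'; code: file prefix such as 'tz';
    district_shapefile: file name under <country_dir>/shapefiles/.
    """
    soil_dir = os.path.join(country_dir, "soil_quality_data")
    processed_dir = os.path.join(soil_dir, "processed")
    return {
        "country_shapefile": os.path.join(country_dir, "shapefiles", f"{code}_country.shp"),
        "district_shapefile": os.path.join(country_dir, "shapefiles", district_shapefile),
        "soil_dir": soil_dir,
        "combined_tiff": os.path.join(soil_dir, "clipped_combined.tif"),
        "processed_dir": processed_dir,
        "gif": os.path.join(processed_dir, f"{code}_soil_data.gif"),
    }


COUNTRIES = {
    "Tanzania": country_paths("tanzania_data", "tz", "tz_districts.shp"),
    "Rwanda": country_paths("rwanda_data", "rw", "rw_district.shp"),
}

# Possible driver loop using the functions defined above:
# for paths in COUNTRIES.values():
#     clip_and_combine_tiffs(tif_files, paths["country_shapefile"], paths["soil_dir"])
#     create_gif_from_tiff(paths["combined_tiff"], paths["gif"])
#     extract_soil_data_to_csv(paths["combined_tiff"], paths["district_shapefile"], paths["processed_dir"])
```

Adding a third country then only needs one more `country_paths` entry instead of three new cells.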