# Soil attributes

This notebook creates the file `CAMELS_DE_soil_attributes.csv`.

Columns in CAMELS-GB:

- gauge_id
- sand_perc
- sand_perc_missing
- silt_perc
- silt_perc_missing
- clay_perc
- clay_perc_missing
- organic_perc
- organic_perc_missing
- bulkdens
- bulkdens_missing
- bulkdens_5
- bulkdens_50
- bulkdens_95
- tawc
- tawc_missing
- tawc_5
- tawc_50
- tawc_95
- porosity_cosby
- porosity_cosby_missing
- porosity_cosby_5
- porosity_cosby_50
- porosity_cosby_95
- porosity_hypres
- porosity_hypres_missing
- porosity_hypres_5
- porosity_hypres_50
- porosity_hypres_95
- conductivity_cosby
- conductivity_cosby_missing
- conductivity_cosby_5
- conductivity_cosby_50
- conductivity_cosby_95
- conductivity_hypres
- conductivity_hypres_missing
- conductivity_hypres_5
- conductivity_hypres_50
- conductivity_hypres_95
- root_depth
- root_depth_missing
- root_depth_5
- root_depth_50
- root_depth_95
- soil_depth_pelletier
- soil_depth_pelletier_missing
- soil_depth_pelletier_5
- soil_depth_pelletier_50
- soil_depth_pelletier_95
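
The column names above follow a regular pattern: the texture and organic fractions come with a `_missing` companion only, while the remaining variables also carry `_5`, `_50` and `_95` columns. A minimal sketch that rebuilds the list from that pattern, which can help when comparing a CAMELS-DE table against the CAMELS-GB layout. The helper name `camels_gb_soil_columns` is hypothetical and not part of this notebook.

```python
def camels_gb_soil_columns():
    """Rebuild the CAMELS-GB soil column names listed above (hypothetical helper)."""
    # variables that only have a "_missing" companion column
    fraction_vars = ["sand_perc", "silt_perc", "clay_perc", "organic_perc"]

    # variables that additionally have "_5", "_50" and "_95" columns
    distribution_vars = [
        "bulkdens",
        "tawc",
        "porosity_cosby",
        "porosity_hypres",
        "conductivity_cosby",
        "conductivity_hypres",
        "root_depth",
        "soil_depth_pelletier",
    ]

    columns = ["gauge_id"]
    for var in fraction_vars:
        columns += [var, f"{var}_missing"]
    for var in distribution_vars:
        columns += [var, f"{var}_missing", f"{var}_5", f"{var}_50", f"{var}_95"]
    return columns


columns = camels_gb_soil_columns()
print(len(columns))
```

The generated list has the same order as the bullet list above, so it can be compared directly against `list(df.columns)` of a loaded table.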

In [1]:
import os
from glob import glob
import pandas as pd

from camelsp.util import OUTPUT_PATH



In [2]:
# get camels_ids from hydromet timeseries
camels_ids = [camels_id.split("_")[-1].split(".csv")[0] for camels_id in glob("../output_data/camels_de/timeseries/*.csv")]

# sort camels_ids
camels_ids = sorted(camels_ids)

print(f"Total number of stations in CAMELS-DE v1: {len(camels_ids)}")

Total number of stations in CAMELS-DE v1: 1460
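
The ID extraction above relies on each timeseries file name ending in `_<camels_id>.csv`. A minimal sketch of the same parsing with `pathlib`, run on two made-up file names. Both the file names and the helper `extract_camels_id` are assumptions for illustration, not the actual CAMELS-DE naming.

```python
from pathlib import Path


def extract_camels_id(path):
    """Return the part of the file name after the last underscore (hypothetical helper)."""
    # Path.stem drops the directory and the ".csv" extension
    return Path(path).stem.split("_")[-1]


# made-up file names that follow the assumed "<prefix>_<camels_id>.csv" pattern
example_files = [
    "../output_data/camels_de/timeseries/example_timeseries_DE110010.csv",
    "../output_data/camels_de/timeseries/example_timeseries_DE110000.csv",
]

example_ids = sorted(extract_camels_id(f) for f in example_files)
print(example_ids)
```

Note that this only works as long as the IDs themselves contain no underscore.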


## Read and save soil data

The soil attributes were extracted from the ISRIC SoilGrids dataset beforehand; here we read the extracted files, merge them on `camels_id`, and save the result.

In [28]:
# get all SoilGrids variable files (sorted so the column order is reproducible)
files = sorted(glob(os.path.join(OUTPUT_PATH, "raw_catchment_attributes/soils/isric/*")))

soil_data = pd.DataFrame(columns=["camels_id"], data=camels_ids)

# read files and merge, join on camels_id
for file in files:
    # read file
    data = pd.read_csv(file)

    # merge
    soil_data = soil_data.merge(data, how="left", on="camels_id")


# filter for camels_ids and sort by camels_id
soil_data = soil_data[soil_data["camels_id"].isin(camels_ids)].sort_values("camels_id").reset_index(drop=True)

# save to csv
soil_data.to_csv("../output_data/camels_de/CAMELS_DE_soil_attributes.csv", index=False)
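
Because the files are joined with `how="left"`, a station that is absent from one file ends up with `NaN` in that file's columns, and a duplicated `camels_id` in a file would silently duplicate rows. A minimal sketch of how both cases could be checked, using small in-memory tables instead of the real files. The IDs, column names and values below are made up for illustration.

```python
import pandas as pd

# made-up station IDs and per-variable tables standing in for the real files
camels_ids_demo = ["DE1", "DE2", "DE3"]
base = pd.DataFrame({"camels_id": camels_ids_demo})

clay = pd.DataFrame({"camels_id": ["DE1", "DE2"], "clay_demo": [20.0, 35.0]})
sand = pd.DataFrame({"camels_id": ["DE1", "DE2", "DE3"], "sand_demo": [40.0, 30.0, 55.0]})

merged = base
for table in (clay, sand):
    # validate="one_to_one" raises pandas.errors.MergeError
    # if camels_id is duplicated on either side of the merge
    merged = merged.merge(table, how="left", on="camels_id", validate="one_to_one")

# number of stations without a value, per attribute column
missing_per_column = merged.drop(columns="camels_id").isna().sum()
print(missing_per_column)
```

Here `DE3` is missing from the `clay` table, so `clay_demo` reports one missing station while `sand_demo` reports none.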