## SoilStats demo

With `soilstats`, you can easily retrieve datasets from [the SoilGrids REST API](https://rest.isric.org/soilgrids/v2.0/) and use them to calculate soil properties. This notebook shows how.

### Set up a collection

In this example, we want to collect 50 data points from within the bounding box whose corners are (56.225297, 8.662215) and (55.958103, 9.354390), given as (latitude, longitude).
In other words, the latitude bounds are 55.958103 and 56.225297, and the longitude bounds are 8.662215 and 9.354390.
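The notebook does not say how `soilstats` places the points inside the box. Since the coordinates in the query URLs below look scattered, one plausible approach is uniform random sampling. A minimal sketch under that assumption; `sample_points` is a hypothetical helper, not part of `soilstats`:

```python
import random


def sample_points(lat_bounds, lon_bounds, n, seed=None):
    """Draw n (lat, lon) pairs uniformly inside a lat/lon bounding box.

    Hypothetical helper: assumes uniform random sampling, which soilstats
    may or may not use internally.
    """
    rng = random.Random(seed)
    # Sort each pair so the bounds can be given in either order,
    # as in lat_bounds=[56.225297, 55.958103].
    lat_lo, lat_hi = sorted(lat_bounds)
    lon_lo, lon_hi = sorted(lon_bounds)
    return [(rng.uniform(lat_lo, lat_hi), rng.uniform(lon_lo, lon_hi))
            for _ in range(n)]


points = sample_points([56.225297, 55.958103], [8.662215, 9.354390], n=50, seed=1)
```

Drawing uniformly in degrees is not exactly uniform in area, but over a box only about 0.27 degrees of latitude tall the distortion is small.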

We want to collect clay, sand, silt, and ocs (organic carbon stock) for the topsoil (0-30cm).
The SoilGrids API offers several depth intervals within that range: 0-5cm, 5-15cm, 15-30cm, and 0-30cm.
The value we are interested in is the mean.
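Picking the depth intervals that fall inside the top 30 cm is a simple interval check on the labels. A sketch, assuming labels of the form `"<top>-<bottom>cm"`; the list of available intervals below is the standard SoilGrids set plus the 0-30cm interval mentioned above, and `depths_within` is a hypothetical helper:

```python
def depths_within(labels, top_cm=0, bottom_cm=30):
    """Keep depth labels such as '5-15cm' whose interval lies inside [top_cm, bottom_cm]."""
    selected = []
    for label in labels:
        # '5-15cm' -> ('5', '15')
        top, bottom = label.removesuffix("cm").split("-")
        if int(top) >= top_cm and int(bottom) <= bottom_cm:
            selected.append(label)
    return selected


# Standard SoilGrids intervals, plus the 0-30cm interval listed above.
available = ["0-5cm", "5-15cm", "15-30cm", "30-60cm", "60-100cm", "100-200cm", "0-30cm"]
print(depths_within(available))  # ['0-5cm', '5-15cm', '15-30cm', '0-30cm']
```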

To set up the collection, we use the `SoilCollect` class from `soilstats`:

```python
from soilstats import SoilCollect

# Create a SoilCollect object (this only prepares the queries)
collect = SoilCollect(
    lat_bounds=[56.225297, 55.958103],
    lon_bounds=[8.662215, 9.354390],
    properties=['clay', 'sand', 'silt', 'ocs'],
    depths=['0-5cm', '5-15cm', '15-30cm', '0-30cm'],
    values='mean',
    n=50,
)
```

Creating the object only prepares the collection; the API is not called yet.
To verify the setup, we can inspect the query URLs generated for the first ten data points:

```python
[point.url for point in collect.soildatapoints[:10]]
```

```
['https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.208731567334121&lat=55.97950897417508&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.144549067984933&lat=56.119503043452404&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.904997017519737&lat=56.22014397312186&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.005866662100203&lat=56.14485639479264&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.987752139816028&lat=56.21024808094749&property=clay&
```
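The URLs listed above follow a simple pattern: one `lon` and `lat` pair, then a repeated `property` and `depth` parameter for each requested item, then `value`. A standard-library sketch that rebuilds the first URL in that list; `build_query_url` is a hypothetical helper, not part of `soilstats`:

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.isric.org/soilgrids/v2.0/properties/query"


def build_query_url(lat, lon, properties, depths, value):
    """Build a SoilGrids point-query URL in the same shape as the URLs above."""
    params = [("lon", lon), ("lat", lat)]
    # Several properties/depths are requested by repeating the key.
    params += [("property", p) for p in properties]
    params += [("depth", d) for d in depths]
    params.append(("value", value))
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url(
    55.97950897417508, 9.208731567334121,
    ["clay", "sand", "silt", "ocs"],
    ["0-5cm", "5-15cm", "15-30cm", "0-30cm"],
    "mean",
)
print(url)
```

Passing a list of `(key, value)` pairs to `urlencode` keeps the parameter order and allows repeated keys, which a plain dict would not.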

To call the API and retrieve the data, we use the `get_data()` method:

```python
df = collect.get_data()
```

`get_data()` returns the data and also stores it on the object, so you can access it later as `collect.df`.
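The layout of the table built by `get_data()` is not shown here. To illustrate what flattening one API response involves, the sketch below assumes the JSON layout of a SoilGrids v2.0 point query (`properties.layers[].depths[].values`); check that assumption against a real response. `flatten_response` is hypothetical, and the sample dict is hand-made with placeholder values, not real data:

```python
def flatten_response(response):
    """Turn one parsed SoilGrids point-query response into long-format rows.

    Assumes the layout: properties -> layers[] -> depths[] -> values{}.
    """
    lon, lat = response["geometry"]["coordinates"]
    rows = []
    for layer in response["properties"]["layers"]:
        for depth in layer["depths"]:
            for stat, value in depth["values"].items():
                rows.append({"lat": lat, "lon": lon, "property": layer["name"],
                             "depth": depth["label"], "stat": stat, "value": value})
    return rows


# A real response could be fetched with, for example:
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       response = json.load(resp)

# Hand-made sample with the assumed shape (placeholder values).
sample = {
    "geometry": {"type": "Point", "coordinates": [9.2, 56.0]},
    "properties": {"layers": [
        {"name": "clay", "depths": [{"label": "0-5cm", "values": {"mean": 1}}]},
        {"name": "sand", "depths": [{"label": "0-5cm", "values": {"mean": 2}}]},
    ]},
}
rows = flatten_response(sample)
```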

### Calculate soil properties
For each point, we want to find the dominant soil type, that is, which of the three properties sand, clay, and silt has the highest value.

```python
top = collect.top_property(['clay', 'sand', 'silt'])

top
```
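Conceptually, picking the top property is a row-wise argmax. A stdlib sketch, assuming each point's values for one depth are available as a mapping from property name to value; how `top_property` itself handles several depths or ties is not stated here, and `dominant_property` is a hypothetical helper:

```python
def dominant_property(row, properties=("clay", "sand", "silt")):
    """Return the property with the highest value; ties go to the first listed."""
    return max(properties, key=lambda p: row[p])


# Made-up values for two points, for illustration only.
example_rows = [
    {"clay": 120, "sand": 700, "silt": 180},
    {"clay": 450, "sand": 200, "silt": 350},
]
print([dominant_property(r) for r in example_rows])  # ['sand', 'clay']
```

With a pandas DataFrame in wide format, `df[['clay', 'sand', 'silt']].idxmax(axis=1)` expresses the same idea.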

### Correlate soil properties

To investigate how the soil properties relate to each other, we fit a linear regression model.

The formula we use is `clay + sand + silt ~ ocs`, and we pass it to the `regression` method:

```python
model = collect.regression(formula="clay + sand + silt ~ ocs")
```

Running the model should print summary statistics to the screen.
The main statistics are also stored in the model's `stats` attribute:

```python
model.stats
```