## Before starting
Go to https://cemsbv.crux-nuclei.com/#/ to create an account if you have not done so yet.

To check the available endpoints and their definitions, go to https://cemsbv.crux-nuclei.com/#/gef-model/

# CPT interpretation
______________________________________________________________________

This notebook shows how the CPT model can be used to classify CPTs. The model is a neural network that maps both the CPT's location and standard CPT features to a soil distribution. The features it uses are:

- Cone resistance: $q_c$
- Depth
- Friction ratio: $R_f$
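
The friction ratio is conventionally defined as sleeve friction divided by cone resistance, expressed as a percentage: $R_f = f_s / q_c \cdot 100\%$. The following is a minimal sketch of that calculation, assuming `qc` and `fs` share the same unit. The helper name `friction_ratio` is hypothetical and not part of `nuclei` or `pygef`, and the model's own preprocessing may differ.

```python
def friction_ratio(qc, fs):
    """Return the friction ratio R_f [%] for paired qc and fs readings.

    Hypothetical helper for illustration; qc and fs must share the same unit.
    Readings with qc <= 0 yield None, since the ratio is undefined there.
    """
    return [100.0 * f / q if q > 0 else None for q, f in zip(qc, fs)]


qc = [10.0, 2.0, 0.0]    # cone resistance [MPa], made-up values
fs = [0.05, 0.08, 0.01]  # sleeve friction [MPa], made-up values
print(friction_ratio(qc, fs))
```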

The model is trained on a number of location clusters and has learned location-specific soil biases per cluster. It is advised to use this information when making predictions, but you are free to turn it off. In this notebook we'll go through getting inference results from the model.

Run the next cell to import the required libraries. We are using:

- `nuclei` https://github.com/cemsbv/nuclei
- `pygef` https://github.com/cemsbv/pygef

In [None]:
# First we import the libraries that are needed
from nuclei import call_endpoint, get_endpoints
from pygef import Cpt
import numpy as np

In [None]:
# The soil classification app is called gef-model
APP = "gef-model"

# The endpoints exposed by the model
get_endpoints(APP)

## GEF file
In the next cell we parse the GEF file with the help of `pygef` and convert it into a `cpt_object`.

In [None]:
# Parse the GEF file into a pygef Cpt object
cpt = Cpt("cpt_test.gef")

# Fall back to the reference level (zid) when no groundwater level is available
if not cpt.groundwater_level:
    cpt.groundwater_level = cpt.zid

# Create the cpt_object for CPT classification
cpt_object = {
    "name": cpt.test_id,
    "x": cpt.x,
    "y": cpt.y,
    "ref": cpt.zid,
    "index": np.array(cpt.df["elevation_with_respect_to_nap"], dtype=float),
    "qc": np.array(cpt.df["qc"], dtype=float).clip(0),
    "fs": np.array(cpt.df["fs"], dtype=float).clip(0),
    "groundwater_level": cpt.groundwater_level,
}
# The API does not yet work properly with cpt_object, so for the time being we use cpt_content.
# Keep in mind that cpt_object will be the future input.

cpt_content = cpt.s

## Schema

The [API console](https://cemsbv.crux-nuclei.com/#/gef-model/) for this app describes the schema you need for calling the API.
The schema defines the following parameters:

- cpt_content
- include_features (optional, default: True)
- include_location (optional, default: True)

In [None]:
schema = {
    "cpt_content": cpt_content,
    "include_features": True,
    "include_location": True,
}

call_endpoint(APP, "/plot", schema)

## Result

As you can see, the plot shows a plausible classification.

The classification colors are:

- GREY: Gravel
- YELLOW: Sand
- BLUE: Silt
- GREEN: Clay
- BROWN: Peat

The plot is only a picture. To get numeric results, call the `/classify` endpoint with the same schema. The result is a dictionary containing (among others) the following keys:

- cluster_distances (list): Distances to the N closest clusters, in meters.
- in_cluster (boolean): Whether the CPT was taken in a known cluster or not.
- prediction (pandas DataFrame): Prediction result.

By keeping an eye on `in_cluster` and `cluster_distances`, we know whether the prediction is based on data from the CPT's own cluster or from surrounding clusters. If `in_cluster` evaluates to True, the model bases its prediction on that cluster only. If not, it takes a weighted average of surrounding clusters, with weights determined by the distance to the cluster centroids. It is recommended to use 3 clusters as the standard.

Below you see the boundaries and centroids of the current clusters.

![](img/clusters.png)
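
The exact weighting formula is not spelled out in this notebook. Purely as an illustration, the sketch below assumes inverse-distance weighting: each of the nearest clusters gets a weight proportional to 1/distance, normalised so the weights sum to 1. The function name `inverse_distance_weights` and the example distances are hypothetical, and the service may use a different scheme.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Return normalised weights proportional to 1 / distance.

    Assumption for illustration only: inverse-distance weighting.
    `eps` avoids division by zero when a CPT lies exactly on a centroid.
    """
    inverse = [1.0 / max(d, eps) for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


# Made-up distances [m] to the 3 closest cluster centroids
cluster_distances = [1000.0, 2000.0, 4000.0]
weights = inverse_distance_weights(cluster_distances)
print([round(w, 3) for w in weights])
```

Under this assumption the closest cluster dominates the average, and clusters further away contribute progressively less.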

In [None]:
result = call_endpoint(APP, "/classify", schema)
print(result.keys())

In [None]:
result["prediction"].head()
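
To make the role of `in_cluster` and `cluster_distances` concrete, here is a small sketch that turns them into a one-line summary. It assumes `in_cluster` is a boolean and `cluster_distances` is a list of distances in meters, as described above. The helper `describe_cluster_basis` and the example values are hypothetical and not part of the API.

```python
def describe_cluster_basis(result):
    """Summarise which clusters a prediction is based on.

    Assumes `in_cluster` is a bool and `cluster_distances` is a list of
    distances in meters.
    """
    if result["in_cluster"]:
        return "CPT lies inside a known cluster; prediction uses that cluster only."
    nearest = min(result["cluster_distances"])
    return (
        "CPT lies outside the known clusters; prediction is a weighted "
        f"average of surrounding clusters (nearest: {nearest:.0f} m)."
    )


# Made-up response fragment, for illustration only
example = {"in_cluster": False, "cluster_distances": [1250.0, 3400.0, 5100.0]}
print(describe_cluster_basis(example))
```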

## Location bias

You can choose to turn the features off. In that case the model predicts based on the location information only. This gives you insight into the location bias of the model.

In [None]:
schema["include_features"] = False

call_endpoint(APP, "/plot", schema)

## No location

You can also choose to turn off location information during embedding. This is not recommended as a default, but if you are afraid that the location bias has too much influence at a certain location, you can turn it off. Below we'll see that result. The prediction now shows more gravel than when the model is conditioned on the location.

In [None]:
schema = {
    "cpt_content": cpt_content,
    "include_features": True,
    "include_location": False,
}

call_endpoint(APP, "/plot", schema)

## Grouping

We can also aggregate layers by setting `aggregate_layers_penalty` > 0. This parameter controls how many layers are defined: a higher value leads to fewer layers. Don't set this value too high, as aggregating throws away information. If you choose to group your input, a value between 1 and 3 is recommended.

In [None]:
schema = {
    "aggregate_layers_penalty": 3,
    "cpt_content": cpt_content,
    "include_features": True,
    "include_location": True,
}

call_endpoint(APP, "/plot", schema)
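
To build intuition for why a higher penalty gives fewer layers, the sketch below runs a generic penalised segmentation on a toy signal. This is not the service's algorithm, which is not documented here. It is a textbook least-squares changepoint search in which every extra layer costs a fixed `penalty`. The function `segment` and the toy data are hypothetical, and the penalty scale does not correspond to the 1 to 3 range recommended for the API.

```python
def segment(values, penalty):
    """Split `values` into layers by penalised least squares.

    Minimises the sum of squared deviations from each layer mean plus
    `penalty` per layer, using dynamic programming (optimal partitioning).
    Returns the start index of every layer after the first.
    """
    n = len(values)
    # Prefix sums give the squared-error cost of any slice in O(1)
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        # Squared deviation of values[i:j] from its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: minimal cost of values[:j]
    last = [0] * (n + 1)               # last[j]: start of the final layer in values[:j]
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j) + penalty
            if candidate < best[j]:
                best[j] = candidate
                last[j] = i

    # Walk back through the stored layer starts
    changepoints = []
    j = n
    while j > 0:
        j = last[j]
        if j > 0:
            changepoints.append(j)
    return sorted(changepoints)


# Toy signal with one small jump (index 5) and one large jump (index 10)
signal = [1.0] * 5 + [1.5] * 5 + [10.0] * 5
print(segment(signal, penalty=0.1))  # low penalty keeps both jumps
print(segment(signal, penalty=3.0))  # higher penalty merges the small jump away
```

The small jump disappears first as the penalty grows, which is the information loss the text above warns about.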

## Check the NEN classification

After grouping, you can also get the **NEN classification** from the `/classify` endpoint. This can be useful for a fast first classification.

In [None]:
schema = {
    "aggregate_layers_penalty": 3,
    "cpt_object": cpt_object,
    "include_features": True,
    "include_location": True,
    "merge_nen_table": True,
    "interpolate_nen_table_values": True,
}

result = call_endpoint(APP, "/classify", schema)
result["layer_table"]

Let's check the new result keys:

In [None]:
result.keys()

In [None]:
result["changepoint_depths"]