# Building an E-Model for a single-compartment cell

This notebook provides a simple example of how to run the pipeline to create an e-model of a single-compartment cell with two free parameters. In this instance, we will use "rheobase-independent" optimisation.

When the optimisation is "rheobase-dependent", we first need to calculate the rheobase (the minimum current required to make the neuron fire) and use it to normalise the stimulus amplitudes of the traces. The normalisation expresses each amplitude as a percentage of the rheobase. This scales the traces relative to the neuron’s threshold current, providing a consistent basis for comparison.
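The percentage-of-rheobase normalisation described above can be sketched in a few lines. This is only an illustration: the function name and the numbers are hypothetical and are not taken from the pipeline.

```python
def amplitude_as_percent_of_rheobase(amplitude_nA, rheobase_nA):
    """Express an injected current amplitude as a percentage of the rheobase."""
    if rheobase_nA <= 0:
        raise ValueError("the rheobase must be a positive current")
    return 100.0 * amplitude_nA / rheobase_nA


# Hypothetical values, chosen only for illustration
rheobase = 0.25  # nA
percent = amplitude_as_percent_of_rheobase(0.5, rheobase)
print(percent)  # 200.0, i.e. a step at twice the threshold current
```

In the rheobase-independent mode used in this notebook, no such conversion is needed: amplitudes stay in absolute units (nA).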

For "rheobase-dependent" optimisation or additional details, please refer to the [L5PC](./../L5PC/) example.

In [1]:
import json

from bluepyemodel.emodel_pipeline.emodel_pipeline import EModel_pipeline
from bluepyemodel.efeatures_extraction.targets_configurator import TargetsConfigurator

In [None]:
# Clear any existing checkpoints to avoid conflicts with previous runs
!rm -rf ./checkpoints

## Setting up the pipeline

The pipeline setup involves five key steps: extraction of e-features from electrophysiological recordings; optimisation of a NEURON cell model based on these e-features; storing the optimised model parameters; validating the models against specified protocols; and plotting the results, including traces, e-feature scores, and parameter distributions.
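The order of these steps can be sketched as a list of the pipeline methods that this notebook calls later on. The helper `run_steps` is hypothetical and is not part of BluePyEModel; it calls every method without arguments, whereas the notebook passes `only_validated=False` to `plot` explicitly.

```python
# Method names as they are called further down in this notebook
PIPELINE_STEPS = [
    "extract_efeatures",
    "optimise",
    "store_optimisation_results",
    "validation",
    "plot",
]


def run_steps(pipeline, steps=PIPELINE_STEPS):
    """Call each pipeline method in order and return the names that ran."""
    done = []
    for name in steps:
        getattr(pipeline, name)()  # raises AttributeError if a step is missing
        done.append(name)
    return done
```

In this notebook the steps are run one cell at a time instead, so that the intermediate outputs can be inspected.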

The [recipes.json](./config/recipes.json) file (displayed below) contains the key settings for the various stages of the e-model building pipeline.

- **`morph_path`**: Directory containing the morphologies.
- **`morphology`**: A list of pairs, each holding an arbitrary morphology name and the name of the corresponding file in `morph_path`.
- **`params`**: Path to the file that specifies the mechanisms, locations, distributions, and parameters.
- **`features`**: Path to the file with the extraction outputs.
- **`pipeline_settings`**: Sets up the pipeline with several configuration options, including:
  - `extract_absolute_amplitudes`: Set to `true` to perform the optimisation independently of the rheobase (threshold current).
  - `optimiser`: Specifies the optimisation algorithm, set to `"SO-CMA"` (a single-objective Covariance Matrix Adaptation algorithm).
  - `max_ngen`: Defines the maximum number of generations for the optimisation process, set to `5`.
  - `optimisation_params`: Additional optimisation parameters, such as `offspring_size` set to `20`, the number of solutions generated per generation.
  - `validation_protocols`: Lists the protocols used for validation, here `["IDrest_0.4"]`.
  - `morph_modifiers`: Set to an empty list `[]`, meaning that no modifications are applied to the morphologies.

In [3]:
recipes_path = "./config/recipes.json"
with open(recipes_path, 'r') as file:
    recipe = json.load(file)

print(json.dumps(recipe, indent=4))

{
    "simplecell": {
        "morph_path": "./morphologies/",
        "morphology": [
            [
                "simple",
                "simple.swc"
            ]
        ],
        "params": "config/params/simple.json",
        "features": "config/features/simplecell.json",
        "pipeline_settings": {
            "path_extract_config": "config/extract_config/simplecell_config.json",
            "extract_absolute_amplitudes": true,
            "optimiser": "SO-CMA",
            "max_ngen": 5,
            "optimisation_params": {
                "offspring_size": 20
            },
            "validation_protocols": [
                "IDrest_0.4"
            ],
            "morph_modifiers": []
        }
    }
}


To begin, we need to instantiate the pipeline using the ``EModel_pipeline`` class. This class is responsible for loading the ``recipes.json`` file and configuring the pipeline settings based on its content. The following are the minimal required parameters:

- **`emodel`**: Name of the e-model

- **`etype`**: Electrical type, `"cADpyr"`, indicating continuous adapting pyramidal cells.

- **`species`**: Biological species, `"rat"`, for which the model is developed.

- **`brain_region`**: Target brain region, `"SSCX"` (somatosensory cortex).

- **`morphology`**: Name of the morphology, `"simple"`, whose file is located in the `./morphologies` folder. The cell is a single compartment with a diameter and length of 10 micrometers.

- **`morphology_format`**: Format of the morphology file, `"swc"`. SWC, ASC, and H5 formats are supported.

In [4]:
emodel = "simplecell"
etype = "cADpyr"
species = "rat"
brain_region = "SSCX"
morphology = "simple"
morphology_format = "swc"

In [5]:
pipeline = EModel_pipeline(
    emodel=emodel,
    etype=etype,
    species=species,
    brain_region=brain_region,
    recipes_path=recipes_path,
    data_access_point="local",
)

## Extracting the features

We need to download the required data using the script [download_ephys_data.sh](./download_ephys_data.sh). This dataset features continuous adapting pyramidal cells (cADpyr) e-type models from the rat somatosensory cortex. The data can be obtained from this [repository](<https://github.com/BlueBrain/SSCxEModelExamples/tree/main/feature_extraction/input-traces/C060109A1-SR-C1>).

In [None]:
!sh ./download_ephys_data.sh

In [7]:
filenames = [
    "./ephys_data/C060109A1-SR-C1/X_IDrest_ch0_326.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDrest_ch0_327.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDrest_ch0_328.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDrest_ch0_329.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDrest_ch0_330.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDthresh_ch0_349.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDthresh_ch0_350.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDthresh_ch0_351.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDthresh_ch0_352.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IDthresh_ch0_353.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IV_ch0_266.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IV_ch0_267.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IV_ch0_268.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IV_ch0_269.ibw",
    "./ephys_data/C060109A1-SR-C1/X_IV_ch0_270.ibw"
]

We can now define the targets, which contain the protocols (ecodes) and the features. We define two protocols, IDrest and IV.

For the IDrest protocol, at amplitudes of 0.256 nA and 0.4 nA, we select `Spikecount`, `mean_frequency`, and `voltage_base` to capture the neuron's spiking activity and resting potential. For the IV protocol, at an amplitude of -0.147 nA, `voltage_base` and `ohmic_input_resistance_vb_ssse` assess the neuron's baseline potential and input resistance.

The tolerance of 0.1 specifies the allowable deviation from the target amplitudes when extracting e-features from traces, meaning values within ±0.1 of the specified amplitudes are considered acceptable.
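The tolerance window can be illustrated with a short stand-alone sketch. The helper `matches_target` and the recorded amplitudes are hypothetical, and treating the window as inclusive at its edges is an assumption; the extraction library may handle the boundaries differently.

```python
def matches_target(recorded_amplitude, target_amplitude, tolerance):
    """Return True if a recorded amplitude lies within target +/- tolerance."""
    # Assumption: the window is inclusive at both edges
    return abs(recorded_amplitude - target_amplitude) <= tolerance


# Hypothetical step amplitudes (nA), chosen only for illustration
recorded = [0.10, 0.20, 0.30, 0.45, 0.60]
selected = [a for a in recorded if matches_target(a, 0.256, 0.1)]
print(selected)  # [0.2, 0.3]
```

Note that with targets at 0.256 and 0.4 and a tolerance of 0.1, the two windows overlap, so under this reading a recording at, say, 0.32 would fall inside both.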

In [8]:
targets = {
    "IDrest": {
        "amplitudes": [0.256, 0.4],
        "efeatures": [
            "Spikecount",
            "mean_frequency",
            "voltage_base",
        ],
    },
    "IV": {
        "amplitudes": [-0.147],
        "efeatures": [
            "voltage_base",
            "ohmic_input_resistance_vb_ssse",
        ],
    }
}

tolerance = 0.1

The ecodes_metadata dictionary defines parameters for each protocol: ljp is the liquid junction potential correction (14.0 mV), while ton and toff represent the start and stop times (in ms) for current injection.
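As a rough illustration of what these fields mean, the sketch below derives the stimulus duration from `ton` and `toff` and applies a junction potential correction to a few voltage samples. Both helper functions are hypothetical, and subtracting `ljp` from the recorded voltage is an assumed sign convention, not something stated in this notebook.

```python
def stimulus_duration(meta):
    """Duration of the current injection in ms, from its start and stop times."""
    return meta["toff"] - meta["ton"]


def correct_ljp(voltages_mV, ljp_mV):
    """Shift recorded voltages by the liquid junction potential.

    Assumption: the correction is applied by subtraction.
    """
    return [v - ljp_mV for v in voltages_mV]


idrest_meta = {"ljp": 14.0, "ton": 700, "toff": 2700}
iv_meta = {"ljp": 14.0, "ton": 20, "toff": 1020}

print(stimulus_duration(idrest_meta))  # 2000
print(stimulus_duration(iv_meta))      # 1000
print(correct_ljp([-60.0, -58.5], idrest_meta["ljp"]))  # [-74.0, -72.5]
```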

In [9]:
ecodes_metadata = {
    "IDthresh": {"ljp": 14.0, "ton": 700, "toff": 2700},
    "IDrest": {"ljp": 14.0, "ton": 700, "toff": 2700},
    "IV": {"ljp": 14.0, "ton": 20, "toff": 1020},
}

We save these targets in an extraction targets configuration (ETC), built with the `TargetsConfigurator` class imported above, which will serve as the input for the feature extractor.

In [10]:
files_metadata = []
for filename in filenames:
    # e.g. "./ephys_data/C060109A1-SR-C1/X_IDrest_ch0_326.ibw"
    cell_name, fn = filename.split("/")[-2:]
    for ecode in ecodes_metadata:
        # Match each file to its protocol through the ecode name in the file name
        if ecode in fn:
            files_metadata.append(
                {
                    "cell_name": cell_name,
                    "filename": fn.split(".")[0],
                    "ecodes": {ecode: ecodes_metadata[ecode]},
                    "other_metadata": {
                        # The current is read from the ch0 file, the voltage from the ch1 file
                        "i_file": filename,
                        "v_file": filename.replace("ch0", "ch1"),
                        "i_unit": "A",
                        "v_unit": "V",
                        "t_unit": "s",
                    },
                }
            )


# One target per (protocol, amplitude, e-feature) combination
targets_formatted = []
for ecode in targets:
    for amplitude in targets[ecode]["amplitudes"]:
        for efeature in targets[ecode]["efeatures"]:
            targets_formatted.append(
                {
                    "efeature": efeature,
                    "protocol": ecode,
                    "amplitude": amplitude,
                    "tolerance": tolerance,
                }
            )


# Create the targets configuration and save it through the access point
configurator = TargetsConfigurator(pipeline.access_point)
configurator.new_configuration(files_metadata, targets_formatted)
configurator.save_configuration()
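As a sanity check on the flattening of the targets, the stand-alone sketch below repeats the same loop as a list comprehension, with the `targets` dictionary copied in, and counts the entries. The variable name `flat_targets` is introduced here only for illustration.

```python
targets = {
    "IDrest": {
        "amplitudes": [0.256, 0.4],
        "efeatures": ["Spikecount", "mean_frequency", "voltage_base"],
    },
    "IV": {
        "amplitudes": [-0.147],
        "efeatures": ["voltage_base", "ohmic_input_resistance_vb_ssse"],
    },
}
tolerance = 0.1

# One entry per (protocol, amplitude, e-feature) combination
flat_targets = [
    {
        "efeature": efeature,
        "protocol": ecode,
        "amplitude": amplitude,
        "tolerance": tolerance,
    }
    for ecode, spec in targets.items()
    for amplitude in spec["amplitudes"]
    for efeature in spec["efeatures"]
]

print(len(flat_targets))  # 8 = 2 amplitudes x 3 features + 1 amplitude x 2 features
print(flat_targets[0])
```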

We can now proceed to extract the e-features.

In [None]:
pipeline.extract_efeatures()

The results of the feature extraction are stored in [simplecell.json](./config/features/simplecell.json). Let's take a look at the extracted features.

In [12]:
fcc_path = "./config/features/simplecell.json"
with open(fcc_path, 'r') as file:
    fcc = json.load(file)

print(json.dumps(fcc, indent=4))

{
    "efeatures": [
        {
            "efel_feature_name": "Spikecount",
            "protocol_name": "IDrest_0.256",
            "recording_name": "soma.v",
            "threshold_efeature_std": null,
            "default_std_value": 0.001,
            "mean": 6.6,
            "original_std": 4.5431266766402185,
            "sample_size": 5,
            "efeature_name": "Spikecount",
            "weight": 1.0,
            "efel_settings": {
                "interp_step": 0.025,
                "strict_stiminterval": true,
                "Threshold": -50.11250305175781
            }
        },
        {
            "efel_feature_name": "mean_frequency",
            "protocol_name": "IDrest_0.256",
            "recording_name": "soma.v",
            "threshold_efeature_std": null,
            "default_std_value": 0.001,
            "mean": 4.6549665308108965,
            "original_std": 1.7066939148229534,
            "sample_size": 4,
            "efeature_name": "mean_frequency"

## Setting the parameters for the optimisation

The parameters for the optimisation are defined in [simple.json](./config/params/simple.json). In this case, we are optimising two parameters: ``gnabar_hh`` and ``gkbar_hh``. They set the maximum conductances of the sodium and potassium channels in the Hodgkin-Huxley model, with optimisation ranges of 0.05–0.125 S/cm² and 0.01–0.075 S/cm², respectively.
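The parameter file follows the convention stated in its own `__comment` field: constants are single values, and parameters to optimise are `[lower, upper]` bounds. A minimal sketch of separating the two kinds is shown below. The helper `split_parameters` is hypothetical, and the list of entries is a hand-written excerpt built from the values quoted above, not the full file.

```python
def split_parameters(entries):
    """Separate free parameters (given as bounds) from fixed constants."""
    free, fixed = {}, {}
    for entry in entries:
        val = entry["val"]
        if isinstance(val, (list, tuple)):
            lower, upper = val  # bounds of the optimisation range
            free[entry["name"]] = (lower, upper)
        else:
            fixed[entry["name"]] = val
    return free, fixed


# Hand-written excerpt, for illustration only
somatic = [
    {"name": "cm", "val": 1},
    {"name": "gnabar_hh", "val": [0.05, 0.125]},
    {"name": "gkbar_hh", "val": [0.01, 0.075]},
]

free, fixed = split_parameters(somatic)
print(free)   # {'gnabar_hh': (0.05, 0.125), 'gkbar_hh': (0.01, 0.075)}
print(fixed)  # {'cm': 1}
```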

In [13]:
# EMC
emc_path = "./config/params/simple.json"
with open(emc_path, 'r') as file:
    emc = json.load(file)

print(json.dumps(emc, indent=4))

{
    "mechanisms": {
        "somatic": {
            "mech": [
                "hh"
            ]
        }
    },
    "distributions": {},
    "parameters": {
        "__comment": "define constants as single values and params to optimise as tuples of bounds: [lower, upper]",
        "global": [
            {
                "name": "v_init",
                "val": -80
            },
            {
                "name": "celsius",
                "val": 34
            }
        ],
        "somatic": [
            {
                "name": "Ra",
                "val": 100
            },
            {
                "name": "cm",
                "val": 1
            },
            {
                "name": "ena",
                "val": 50
            },
            {
                "name": "ek",
                "val": -90
            },
            {
                "name": "gnabar_hh",
                "val": [
                    0.05,
                    0.125
                ]
  

## Running the optimisation

We can now run the optimisation through the pipeline object. With the settings in the recipe, it will run for 5 generations (`max_ngen`), with 20 offspring per generation (`offspring_size`).

In [None]:
pipeline.optimise()

Once the model has been fitted, we can store the results in the [final.json](./final.json) file.

In [None]:
pipeline.store_optimisation_results()

## Validating the model

We can also run the validation of the e-model, executing the protocols defined under the `validation_protocols` key in `recipes.json`.

In [None]:
pipeline.validation()

## Plotting the results

The plots are stored in the `./figures/` directory, organised into the following subfolders:

- `efeatures_extraction`: e-feature figures, by protocol
- `distributions`: parameter distributions
- `optimisation`: optimisation curves and progress
- `parameter_evolution`: parameter changes over generations
- `scores`: z-scores of the optimised e-features
- `traces`: traces of the optimised e-model
- `currentscape`: Currentscape plots

In [None]:
pipeline.plot(only_validated=False)
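To help read the figures in the `scores` folder, the sketch below shows one common way of computing a z-score for an e-feature: the absolute distance between the model value and the experimental mean, in units of the experimental standard deviation. This definition is an assumption about the scoring and the function `zscore` is hypothetical. The mean and standard deviation are the `Spikecount` values from the extraction output shown earlier, while the model value is made up.

```python
def zscore(model_value, exp_mean, exp_std):
    """Absolute deviation from the experimental mean, in units of its std."""
    if exp_std <= 0:
        raise ValueError("the standard deviation must be positive")
    return abs(model_value - exp_mean) / exp_std


exp_mean = 6.6                # "mean" of Spikecount for IDrest_0.256
exp_std = 4.5431266766402185  # "original_std" of the same feature
model_spikecount = 8          # hypothetical model response

score = zscore(model_spikecount, exp_mean, exp_std)
print(round(score, 3))
```

A score below 1 means that the model value lies within one standard deviation of the experimental mean.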