# Feature Extraction

This notebook illustrates the feature extraction step for cells of the continuous adapting pyramidal cell (cADpyr) e-type.

The feature extraction step is performed with BluePyEfe, a software package that extracts electrical features from a group of cells.

In [1]:
import json
from pathlib import Path

import bluepyefe as bpefe
import pandas as pd

In [2]:
etype = "cADpyr"

Load the configuration file specifying features to be extracted and the voltage traces to be used.

In [3]:
with open("feature_extraction_config.json", "r") as json_file:
    config = json.load(json_file)
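
Before running the extraction, it can help to confirm that the loaded configuration has the shape this notebook relies on. The sketch below assumes only what is visible in this notebook: a top-level "features" key that maps protocol names to lists of feature names. The helper name `check_features_section` and the miniature `example_config` are hypothetical and are not part of BluePyEfe or of the real configuration file.

```python
def check_features_section(config: dict) -> list:
    """Return the sorted protocol names after checking the "features" section."""
    features = config.get("features")
    if not isinstance(features, dict):
        raise ValueError('config must contain a "features" mapping')
    for protocol, names in features.items():
        # Each protocol is expected to map to a list of feature-name strings.
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise ValueError(f"features for {protocol!r} must be a list of strings")
    return sorted(features)


# Hypothetical miniature config, NOT the real feature_extraction_config.json.
example_config = {
    "features": {
        "IV": ["voltage_deflection"],
        "APWaveform": ["AP_height", "AP_width"],
    }
}
protocols = check_features_section(example_config)
print(protocols)
```

With the real file, the same helper could be called on the `config` dictionary loaded above.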

Responses to fixed electrophysiological protocols were recorded for each biological neuron.

Feature extraction is performed on the recordings of the protocols listed below.

In [4]:
config["features"].keys()

dict_keys(['APWaveform', 'IDrest', 'IDthresh', 'IV', 'SpikeRec', 'Step'])

The following features are extracted from the IDthresh protocol responses.

In [5]:
print(config["features"]["IDthresh"])

['adaptation_index2', 'mean_frequency', 'time_to_first_spike', 'ISI_log_slope', 'ISI_log_slope_skip', 'time_to_last_spike', 'inv_time_to_first_spike', 'inv_first_ISI', 'inv_second_ISI', 'inv_third_ISI', 'inv_fourth_ISI', 'inv_fifth_ISI', 'inv_last_ISI', 'voltage_deflection', 'voltage_deflection_begin', 'steady_state_voltage', 'decay_time_constant_after_stim']


The following features are extracted from the APWaveform protocol responses.

In [6]:
print(config["features"]["APWaveform"])

['AP_height', 'AHP_slow_time', 'doublet_ISI', 'AHP_depth_abs_slow', 'AP_width', 'time_to_first_spike', 'AHP_depth_abs', 'AHP_depth', 'fast_AHP', 'AHP_time_from_peak', 'AP1_peak', 'AP2_AP1_peak_diff', 'AP2_width', 'AP1_begin_width', 'AP2_peak', 'AHP2_depth_from_peak', 'AP1_width', 'AP2_begin_width', 'AP2_AP1_begin_width_diff', 'AHP1_depth_from_peak', 'AP1_amp', 'AP2_amp', 'AP_amplitude', 'AP1_amp', 'APlast_amp', 'AP_duration_half_width', 'fast_AHP', 'AHP_time_from_peak']
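
The APWaveform list printed above contains a few feature names more than once. How BluePyEfe treats repeated entries is not stated in this notebook, so the following is only a sketch of how such repeats could be detected and removed with the standard library; the variable names are illustrative.

```python
from collections import Counter

# Feature names copied from the APWaveform output above.
apwaveform_features = [
    "AP_height", "AHP_slow_time", "doublet_ISI", "AHP_depth_abs_slow",
    "AP_width", "time_to_first_spike", "AHP_depth_abs", "AHP_depth",
    "fast_AHP", "AHP_time_from_peak", "AP1_peak", "AP2_AP1_peak_diff",
    "AP2_width", "AP1_begin_width", "AP2_peak", "AHP2_depth_from_peak",
    "AP1_width", "AP2_begin_width", "AP2_AP1_begin_width_diff",
    "AHP1_depth_from_peak", "AP1_amp", "AP2_amp", "AP_amplitude", "AP1_amp",
    "APlast_amp", "AP_duration_half_width", "fast_AHP", "AHP_time_from_peak",
]

# Names that occur more than once in the list.
counts = Counter(apwaveform_features)
duplicates = sorted(name for name, count in counts.items() if count > 1)
print(duplicates)

# Order-preserving de-duplication: dict keys keep first-seen order.
unique_features = list(dict.fromkeys(apwaveform_features))
print(len(apwaveform_features), len(unique_features))
```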


The features are extracted with the Extractor class.

The primary use case of the Extractor class is to produce the efeatures and protocols JSON files that can be used as input for single-cell model building with BluePyOpt.

In [7]:
extractor = bpefe.Extractor(etype, config)
extractor.disable_extra_feature_plots()
extractor.create_dataset()

# does not produce output, stores object attributes
extractor.extract_features(threshold=-30)
extractor.mean_features()

extractor.analyse_threshold()
extractor.feature_config_cells()
extractor.feature_config_all()

INFO:root: Filling dataset
INFO:root: Extracting features
INFO:root: Setting spike threshold to -30.00 mV


100%|██████████| 6/6 [00:27<00:00,  4.60s/it]

INFO:root: Calculating mean features
INFO:root: C060109A1-SR-C1 threshold amplitude: 0.180531 hypamp: -0.024070
INFO:root: C060109A2-SR-C1 threshold amplitude: 0.306797 hypamp: -0.202258
INFO:root: C060109A3-SR-C1 threshold amplitude: 0.225251 hypamp: -0.122463
INFO:root: C070109A4-C1 threshold amplitude: 0.334056 hypamp: -0.267563
INFO:root: C080501A5-SR-C1 threshold amplitude: 0.296731 hypamp: -0.278196
INFO:root: C080501B2-SR-C1 threshold amplitude: 0.325019 hypamp: -0.141169




INFO:root: Analysing threshold and hypamp and saving files to cADpyr/
INFO:root: Saving config files to cADpyr/C060109A1-SR-C1/
INFO:root: Saving config files to cADpyr/C060109A2-SR-C1/
INFO:root: Saving config files to cADpyr/C060109A3-SR-C1/
INFO:root: Saving config files to cADpyr/C070109A4-C1/
INFO:root: Saving config files to cADpyr/C080501A5-SR-C1/
INFO:root: Saving config files to cADpyr/C080501B2-SR-C1/
INFO:root: Saving config files to cADpyr/


The extracted features are written to the './cADpyr' folder.
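
To see which cells produced output, the per-cell files can be listed from that folder. This is a minimal sketch that assumes the layout visible in the log above and in the cells that follow: one sub-folder per cell, each holding a `features.json`, next to a group-level `features.json` at the top of the folder. The helper name `list_cell_feature_files` is hypothetical.

```python
from pathlib import Path


def list_cell_feature_files(output_dir) -> dict:
    """Map each cell folder name to the features.json file found inside it."""
    root = Path(output_dir)
    # "*/features.json" matches exactly one folder level, so the group-level
    # features.json stored directly in output_dir is not included.
    return {path.parent.name: path for path in sorted(root.glob("*/features.json"))}


cell_files = list_cell_feature_files("cADpyr")
for cell_name, path in cell_files.items():
    print(cell_name, "->", path)
```

If the folder does not exist yet, the mapping is simply empty.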

## Features extracted from single cells

In [8]:
with open(Path(etype) / "C060109A1-SR-C1" / "features.json", "r") as features_file:
    single_cell_features = json.load(features_file)

We are going to use the following function to display the features.

In [9]:
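def features_df(features_config: dict, protocol: str) -> pd.DataFrame:
    """Return a DataFrame containing the features for the given protocol."""
    df = pd.DataFrame(features_config[protocol]["soma.v"])
    # Each "val" entry holds two numbers: the first is shown as the mean,
    # the second as the variance.
    df["mean"] = df["val"].apply(lambda x: x[0])
    df["variance"] = df["val"].apply(lambda x: x[1])
    # Variance divided by the absolute value of the mean.
    df["relative_variance"] = df["variance"] / df["mean"].abs()
    # Drop the raw values and the columns that are not displayed.
    df = df.drop(["val", "fid", "strict_stim"], axis=1)
    return df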

In [10]:
step_120_protocol = "Step_120"
apwaveform_280_protocol = "APWaveform_280"
idrest_all_protocol = "IDrest_all"

The features extracted from the responses to the Step_120 protocol are shown in the DataFrame below.

"n" is the number of responses used to compute each feature.

The relative variance is the variance divided by the absolute value of the mean; see the index of dispersion:
https://en.wikipedia.org/wiki/Index_of_dispersion

In [11]:
step_df = features_df(single_cell_features, step_120_protocol)
step_df

|   | feature | n | mean | variance | relative_variance |
|---|---|---|---|---|---|
| 0 | AP_height | 8 | 18.557 | 1.0302 | 0.055515 |
| 1 | AHP_slow_time | 8 | 0.2468 | 0.0225 | 0.091167 |
| 2 | ISI_CV | 8 | 0.1303 | 0.1329 | 1.019954 |
| 3 | doublet_ISI | 8 | 305.95 | 184.2179 | 0.602118 |
| 4 | adaptation_index2 | 8 | -0.0467 | 0.1095 | 2.344754 |
| 5 | mean_frequency | 8 | 4.1766 | 0.788 | 0.18867 |
| 6 | AHP_depth_abs_slow | 8 | -77.4808 | 0.6311 | 0.008145 |
| 7 | AP_width | 8 | 1.6853 | 0.0208 | 0.012342 |
| 8 | time_to_first_spike | 8 | 54.8125 | 7.3155 | 0.133464 |
| 9 | AHP_depth_abs | 8 | -74.2768 | 1.845 | 0.02484 |
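
As a quick check of the relative_variance column, the ratio can be recomputed by hand from a few Step_120 rows. The sketch below uses plain Python rather than pandas; the function name `relative_variance` is illustrative, and the numbers are copied from the Step_120 output for cell C060109A1-SR-C1.

```python
def relative_variance(mean: float, variance: float) -> float:
    """Variance divided by the absolute value of the mean."""
    return variance / abs(mean)


# (mean, variance) pairs copied from the Step_120 output above.
step_120_values = {
    "AP_height": (18.557, 1.0302),
    "adaptation_index2": (-0.0467, 0.1095),
    "AHP_depth_abs": (-74.2768, 1.845),
}

recomputed = {
    name: round(relative_variance(mean, variance), 6)
    for name, (mean, variance) in step_120_values.items()
}
print(recomputed)
```

Taking the absolute value keeps the ratio positive for features with a negative mean, such as adaptation_index2 and AHP_depth_abs. A mean close to zero inflates the ratio, which is why adaptation_index2 stands out in the output.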


Similarly, the features extracted from the APWaveform_280 responses are shown below.

In [12]:
apwaveform_df = features_df(single_cell_features, apwaveform_280_protocol)
apwaveform_df

|   | feature | n | mean | variance | relative_variance |
|---|---|---|---|---|---|
| 0 | AP_height | 3 | 22.7563 | 0.4487 | 0.019718 |
| 1 | doublet_ISI | 3 | 33.6333 | 1.7308 | 0.051461 |
| 2 | AP_width | 3 | 1.4 | 0.001 | 0.000714 |
| 3 | time_to_first_spike | 3 | 11.7 | 0.1633 | 0.013957 |
| 4 | AHP_depth_abs | 3 | -61.4458 | 0.1875 | 0.003051 |
| 5 | AHP_depth | 3 | 22.2649 | 0.2012 | 0.009037 |
| 6 | fast_AHP | 3 | -25.8708 | 0.1827 | 0.007062 |
| 7 | AHP_time_from_peak | 3 | 3.1667 | 0.2625 | 0.082894 |
| 8 | AP1_peak | 3 | 21.9938 | 0.4467 | 0.02031 |
| 9 | AP2_AP1_peak_diff | 3 | 1.525 | 0.3909 | 0.256328 |


The "IDrest_all" protocol contains all responses retrieved from the various configurations of the "IDrest" protocol, such as "IDrest_120" and "IDrest_150".
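
Protocol keys in this notebook follow a `<name>_<suffix>` pattern, for example "Step_120" or "IDrest_all". The sketch below shows one way to group keys by their base name. The helper `split_protocol_name` is hypothetical, and the key list is assembled from names mentioned in this notebook; it is not read from the actual features.json file, whose full set of keys is not shown here.

```python
def split_protocol_name(protocol: str) -> tuple:
    """Split "IDrest_120" into ("IDrest", "120"); names without "_" get an empty suffix."""
    base, separator, suffix = protocol.rpartition("_")
    if not separator:
        return protocol, ""
    return base, suffix


# Illustrative keys taken from names mentioned in this notebook (an assumption,
# not the real contents of features.json).
protocol_keys = ["IDrest_120", "IDrest_150", "IDrest_all", "Step_120", "APWaveform_280"]

# Keep only the keys whose base name is "IDrest".
idrest_variants = [key for key in protocol_keys if split_protocol_name(key)[0] == "IDrest"]
print(idrest_variants)
```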

In [13]:
idrest_all_df = features_df(single_cell_features, idrest_all_protocol)
idrest_all_df

|   | feature | n | mean | variance | relative_variance |
|---|---|---|---|---|---|
| 0 | AP_height | 31 | 17.0403 | 1.807 | 0.106043 |
| 1 | AHP_slow_time | 31 | 0.3168 | 0.0618 | 0.195076 |
| 2 | ISI_CV | 30 | 0.0707 | 0.0912 | 1.289958 |
| 3 | doublet_ISI | 31 | 129.571 | 256.4417 | 1.97916 |
| 4 | adaptation_index2 | 30 | -0.0082 | 0.061 | 7.439024 |
| 5 | mean_frequency | 31 | 11.6463 | 5.1806 | 0.444828 |
| 6 | AHP_depth_abs_slow | 31 | -71.926 | 3.346 | 0.04652 |
| 7 | AP_width | 31 | 1.9139 | 0.2142 | 0.111918 |
| 8 | time_to_first_spike | 31 | 26.0484 | 47.8691 | 1.837698 |
| 9 | AHP_depth_abs | 31 | -71.0222 | 2.7827 | 0.039181 |


## Features extracted from a group of cells

In this section we look at the features extracted from a group of cells of the cADpyr e-type.

In [14]:
with open(Path(etype) / "features.json", "r") as features_file:
    etype_features = json.load(features_file)

The "n" column in this DataFrame is the number of cells used in the feature extraction.

In [15]:
etype_step_df = features_df(etype_features, step_120_protocol)
etype_step_df

|   | feature | n | mean | variance | relative_variance |
|---|---|---|---|---|---|
| 0 | AP_height | 6 | 18.0617 | 5.6467 | 0.312634 |
| 1 | AHP_slow_time | 6 | 0.1613 | 0.0568 | 0.352139 |
| 2 | ISI_CV | 5 | 0.2399 | 0.1884 | 0.785327 |
| 3 | doublet_ISI | 6 | 648.1057 | 349.2538 | 0.538884 |
| 4 | adaptation_index2 | 5 | -0.1367 | 0.1574 | 1.151426 |
| 5 | mean_frequency | 6 | 8.4229 | 2.3862 | 0.283299 |
| 6 | AHP_depth_abs_slow | 6 | -71.8349 | 4.7759 | 0.066484 |
| 7 | AP_width | 6 | 2.1144 | 0.618 | 0.292281 |
| 8 | time_to_first_spike | 6 | 50.8854 | 7.418 | 0.145779 |
| 9 | AHP_depth_abs | 6 | -70.3445 | 4.2667 | 0.060654 |


Similarly, the features extracted from the APWaveform_280 responses are shown below.

In [16]:
etype_apwaveform_df = features_df(etype_features, apwaveform_280_protocol)
etype_apwaveform_df

|   | feature | n | mean | variance | relative_variance |
|---|---|---|---|---|---|
| 0 | AP_height | 5 | 19.4467 | 6.327 | 0.325351 |
| 1 | doublet_ISI | 5 | 23.9 | 7.1127 | 0.297603 |
| 2 | AP_width | 5 | 2.01 | 0.6168 | 0.306866 |
| 3 | time_to_first_spike | 5 | 12.0533 | 0.9368 | 0.077721 |
| 4 | AHP_depth_abs | 5 | -57.4887 | 3.2991 | 0.057387 |
| 5 | AHP_depth | 5 | 25.3753 | 2.873 | 0.11322 |
| 6 | fast_AHP | 5 | -37.2592 | 9.6825 | 0.259869 |
| 7 | AHP_time_from_peak | 5 | 5.7667 | 2.3573 | 0.408778 |
| 8 | AP1_peak | 5 | 18.8471 | 7.136 | 0.378626 |
| 9 | AP2_AP1_peak_diff | 5 | 1.1992 | 2.5558 | 2.131254 |


The features extracted from the "IDrest_all" protocol for the e-type are shown below.

In [17]:
etype_idrest_all_df = features_df(etype_features, idrest_all_protocol)
etype_idrest_all_df

|   | feature | n | mean | variance | relative_variance |
|---|---|---|---|---|---|
| 0 | AP_height | 6 | 17.168 | 4.4762 | 0.260729 |
| 1 | AHP_slow_time | 6 | 0.305 | 0.0219 | 0.071803 |
| 2 | ISI_CV | 6 | 0.092 | 0.0316 | 0.343478 |
| 3 | doublet_ISI | 6 | 84.7086 | 30.3353 | 0.358114 |
| 4 | adaptation_index2 | 6 | -0.0028 | 0.0073 | 2.607143 |
| 5 | mean_frequency | 6 | 13.5539 | 1.873 | 0.138189 |
| 6 | AHP_depth_abs_slow | 6 | -63.1301 | 6.3405 | 0.100435 |
| 7 | AP_width | 6 | 3.0183 | 1.3588 | 0.450187 |
| 8 | time_to_first_spike | 6 | 20.6241 | 3.9515 | 0.191596 |
| 9 | AHP_depth_abs | 6 | -63.057 | 6.0201 | 0.095471 |
