# Feature Extraction

This notebook illustrates the feature extraction step applied to cells of the continuous adapting pyramidal (cADpyr) e-type.

The feature extraction step is performed with BluePyEfe, which extracts electrical features from a group of cells.

In [None]:
import json
from pathlib import Path

import bluepyefe as bpefe
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import pandas as pd
import seaborn as sns


%matplotlib inline
sns.set_style("darkgrid")
plt.rcParams["figure.figsize"] = (12,12)

In [None]:
etype = "cADpyr"

Load the configuration file specifying features to be extracted and the voltage traces to be used.

In [None]:
with open("feature_extraction_config.json", "r") as json_file:
    config = json.load(json_file)
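
The schema of feature_extraction_config.json is not shown in this notebook. As a minimal sketch, assuming only what the following cells rely on (a top-level "features" key that maps protocol names to collections of feature names), the structure can be checked before extraction. The feature names in the example and the `check_config` helper are hypothetical placeholders, not taken from the real configuration file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, minimal configuration: only the "features" key is inferred
# from the notebook; the feature names listed here are placeholders.
example_config = {
    "features": {
        "IDthresh": ["AP_amplitude", "Spikecount"],
        "APWaveform": ["AP_amplitude", "AP_duration_half_width"],
    }
}


def check_config(config: dict) -> list:
    """Return the sorted protocol names, raising if "features" is missing or empty."""
    features = config.get("features")
    if not isinstance(features, dict) or not features:
        raise ValueError('config must contain a non-empty "features" mapping')
    return sorted(features)


# Round-trip through a temporary file to mimic loading the real config.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "feature_extraction_config.json"
    config_path.write_text(json.dumps(example_config))
    with open(config_path, "r") as json_file:
        loaded_config = json.load(json_file)

protocol_names = check_config(loaded_config)
print(protocol_names)
```

Failing early on a malformed file is cheaper than discovering the problem after the extraction has started.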

Responses to fixed electrophysiological protocols were recorded for each biological neuron.

Feature extraction is performed on the recordings of the protocols listed below.

In [None]:
config["features"].keys()

The features extracted from the IDthresh protocol responses are listed below.

In [None]:
print(config["features"]["IDthresh"])

The features extracted from the APWaveform protocol responses are listed below.

In [None]:
print(config["features"]["APWaveform"])

The features are extracted using the Extractor class.

The primary use case of the Extractor class is to produce the efeatures and protocols JSON files that serve as input for single-cell model building with BluePyOpt.

In [None]:
extractor = bpefe.Extractor(etype, config)
extractor.disable_extra_feature_plots()
extractor.create_dataset()

# does not produce output, stores object attributes
extractor.extract_features(threshold=-30)
extractor.mean_features()

extractor.analyse_threshold()
extractor.feature_config_cells()
extractor.feature_config_all()

The features are extracted into the './cADpyr' folder.
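
Judging from the paths opened later in this notebook, the output folder holds one features.json for the whole e-type and one per cell subfolder. The sketch below locates these files. It runs against a temporary directory with placeholder cell names (`cell_A` and `cell_B` are hypothetical), so it does not depend on the extraction having been run, and `find_feature_files` is an illustrative helper rather than part of BluePyEfe.

```python
import tempfile
from pathlib import Path


def find_feature_files(output_dir: Path) -> dict:
    """Map the e-type name and each cell name to the features.json found for it."""
    found = {}
    etype_file = output_dir / "features.json"
    if etype_file.is_file():
        found[output_dir.name] = etype_file
    # One level below the output folder: one subfolder per cell.
    for cell_file in sorted(output_dir.glob("*/features.json")):
        found[cell_file.parent.name] = cell_file
    return found


with tempfile.TemporaryDirectory() as tmp_dir:
    output_dir = Path(tmp_dir) / "cADpyr"
    for cell_name in ("cell_A", "cell_B"):  # hypothetical cell names
        (output_dir / cell_name).mkdir(parents=True)
        (output_dir / cell_name / "features.json").write_text("{}")
    (output_dir / "features.json").write_text("{}")

    feature_files = find_feature_files(output_dir)
    found_names = list(feature_files)
    print(found_names)
```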

## Features extracted from single cells

In [None]:
with open(Path(etype) / "C060109A1-SR-C1" / "features.json", "r") as features_file:
    single_cell_features = json.load(features_file)

The following helper functions are used to tabulate and plot the features.

In [None]:
def features_df(features_config: dict, protocol: str) -> pd.DataFrame:
    """Returns the dataframe containing features for the given protocol."""
    df = pd.DataFrame(features_config[protocol]["soma.v"])
    # Each "val" entry is a pair; this notebook reads it as (mean, variance).
    df["mean"] = df["val"].apply(lambda x: x[0])
    df["variance"] = df["val"].apply(lambda x: x[1])
    df["relative_variance"] = df["variance"] / abs(df["mean"])
    df = df.drop(columns=["val", "fid", "strict_stim"])
    return df

def feature_plot(df: pd.DataFrame, protocol: str) -> None:
    """Plots the features of a dataframe containing features extracted from a protocol."""
    _, axs = plt.subplots()
    axs.errorbar(
        x=df["mean"],
        y=range(len(df)),
        xerr=df["relative_variance"],
        fmt="o",
        color="midnightblue",
        ecolor="steelblue",
        elinewidth=2.5,
        capsize=6,
    )
    axs.set_yticks(range(len(df)))
    axs.set_yticklabels(df["feature"])
    axs.xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
    axs.set_xlabel("feature values")
    plt.title(f"Features extracted on {protocol} protocol responses")
    plt.show()

def drop_large_values_features(df: pd.DataFrame) -> pd.DataFrame:
    """Returns a dataframe with the features that have a mean value smaller than 100."""
    return df[df["mean"] < 100]
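
The features_df helper assumes that each entry under "soma.v" is a record whose "val" field is a two-element list, which this notebook reads as the mean followed by the variance. The sketch below reproduces the same flattening in plain Python on a synthetic record, so the expected layout is visible without running the extraction. All numeric values are made up, the key names are the ones features_df itself uses, and `flatten_features` is a hypothetical stand-in for it.

```python
# Synthetic stand-in for the content of features.json; all values are made up.
synthetic_features = {
    "Step_120": {
        "soma.v": [
            {"feature": "Spikecount", "val": [8.0, 2.0], "n": 3,
             "fid": 0, "strict_stim": True},
            {"feature": "voltage_base", "val": [-80.0, 4.0], "n": 3,
             "fid": 1, "strict_stim": True},
        ]
    }
}


def flatten_features(features_config: dict, protocol: str) -> list:
    """Plain-Python counterpart of features_df: one flat dict per feature."""
    rows = []
    for record in features_config[protocol]["soma.v"]:
        mean, variance = record["val"]
        rows.append({
            "feature": record["feature"],
            "n": record["n"],
            "mean": mean,
            "variance": variance,
            "relative_variance": variance / abs(mean),
        })
    return rows


rows = flatten_features(synthetic_features, "Step_120")
for row in rows:
    print(row)
```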

In [None]:
step_120_protocol = "Step_120"
apwaveform_280_protocol = "APWaveform_280"
idrest_all_protocol = "IDrest_all"

The features extracted from the responses of the Step_120 protocol are contained in the DataFrame below.

The "n" column gives the number of responses used to compute each feature.


In [None]:
step_df = features_df(single_cell_features, step_120_protocol)
step_df

Before plotting, we drop the features with a mean value of 100 or more.
Otherwise the large mean values stretch the axis and the variances of the remaining features become very hard to see.
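
Note that the filter compares the signed mean, so a feature with a large negative mean passes it. A small sketch on made-up values shows the difference between the signed comparison used in drop_large_values_features and a hypothetical magnitude-based variant. The feature names and numbers are placeholders.

```python
# Made-up (feature, mean) pairs for illustration only.
feature_means = {"feature_a": 12.5, "feature_b": 450.0, "feature_c": -300.0}

# Signed comparison, as in drop_large_values_features: keeps feature_c.
kept_signed = [name for name, mean in feature_means.items() if mean < 100]

# Hypothetical alternative: compare magnitudes, which also drops feature_c.
kept_by_magnitude = [
    name for name, mean in feature_means.items() if abs(mean) < 100
]

print(kept_signed)
print(kept_by_magnitude)
```

Whether the magnitude-based variant is preferable depends on the features being plotted. The notebook keeps the signed comparison.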

In [None]:
step_df = drop_large_values_features(step_df)

The figure below shows the feature means and relative variances computed from the Step_120 protocol responses.

The relative variance of a feature is its variance divided by the absolute value of its mean, as described in the Wikipedia article on the index of dispersion:
https://en.wikipedia.org/wiki/Index_of_dispersion
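
As a worked example of this ratio, assuming (as features_df does) that the second element of "val" is the variance: a feature with mean -50 and variance 5 has a relative variance of 5 / |-50| = 0.1. The `relative_variance` helper below is a hypothetical sketch that also makes the zero-mean case explicit, which the DataFrame expression would otherwise turn into an infinite or undefined value.

```python
import math


def relative_variance(mean: float, variance: float) -> float:
    """Variance divided by the absolute mean; NaN when the mean is zero."""
    if mean == 0:
        return math.nan
    return variance / abs(mean)


example_value = relative_variance(-50.0, 5.0)
zero_mean_value = relative_variance(0.0, 1.0)
print(example_value)
print(math.isnan(zero_mean_value))
```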

In [None]:
feature_plot(step_df, step_120_protocol)

Similarly, the features extracted from the APWaveform_280 responses are shown below.

In [None]:
apwaveform_df = features_df(single_cell_features, apwaveform_280_protocol)
apwaveform_df = drop_large_values_features(apwaveform_df)
apwaveform_df.head(10)

In [None]:
feature_plot(apwaveform_df, apwaveform_280_protocol)

The "IDrest_all" protocol contains all responses retrieved from the various configurations of the "IDrest" protocol, such as "IDrest_120", "IDrest_150", etc.

In [None]:
idrest_all_df = features_df(single_cell_features, idrest_all_protocol)
idrest_all_df = drop_large_values_features(idrest_all_df)
idrest_all_df.head(10)

In [None]:
feature_plot(idrest_all_df, idrest_all_protocol)

## Features extracted from a group of cells

In this section we look at the features extracted from a group of cells of the cADpyr e-type.

In [None]:
with open(Path(etype) / "features.json", "r") as features_file:
    etype_features = json.load(features_file)

The "n" column in this dataframe stands for the number of cells used in feature extraction.

In [None]:
etype_step_df = features_df(etype_features, step_120_protocol)
etype_step_df = drop_large_values_features(etype_step_df)
etype_step_df.head(10)

In [None]:
feature_plot(etype_step_df, step_120_protocol)

Similarly, the features extracted from the APWaveform_280 responses are shown below.

In [None]:
etype_apwaveform_df = features_df(etype_features, apwaveform_280_protocol)
etype_apwaveform_df = drop_large_values_features(etype_apwaveform_df)
etype_apwaveform_df.head(10)

In [None]:
feature_plot(etype_apwaveform_df, apwaveform_280_protocol)

The features extracted from the "IDrest_all" protocol responses for the e-type are shown below.

In [None]:
etype_idrest_all_df = features_df(etype_features, idrest_all_protocol)
etype_idrest_all_df = drop_large_values_features(etype_idrest_all_df)
etype_idrest_all_df.head(10)

In [None]:
feature_plot(etype_idrest_all_df, idrest_all_protocol)