# 02 Features basics

Let's start from an existing analysis configuration, which we copy into a temporary location to use as the working directory.

In [1]:
from pathlib import Path
import shutil
import tempfile

workdir = Path(tempfile.gettempdir(), "blueetl_tmp")
workdir.mkdir(exist_ok=True)

config_file = Path("../../../tests/functional/data/sonata/config/analysis_config_09.yaml")
config_file = Path(shutil.copy(config_file, workdir))
print(config_file)
print(config_file.read_text())

/var/folders/9y/pv21h2ld5h76ph0hplxwcy_17tvc86/T/blueetl_tmp/analysis_config_09.yaml
# simple configuration with extraction and analysis, and combination of parameters
version: 3
simulation_campaign: /gpfs/bbp.cscs.ch/project/proj12/NSE/blueetl/data/sim-campaign-sonata/a04addca-bda3-47d7-ad2d-c41187252a2b/config.json
output: analysis_output
analysis:
  spikes:
    extraction:
      report:
        type: spikes
      neuron_classes:
        Rt_EXC: {layer: [Rt], synapse_class: [EXC]}
        VPL_EXC: {layer: [VPL], synapse_class: [EXC]}
        Rt_INH: {layer: [Rt], synapse_class: [INH]}
        VPL_INH: {layer: [VPL], synapse_class: [INH]}
      limit: 1000
      population: thalamus_neurons
      node_set: null
      windows:
        w1: {bounds: [20, 90], window_type: spontaneous}
        w2: {bounds: [10, 70], initial_offset: 10, n_trials: 3, trial_steps_value: 10}
    features:
    - type: multi
      groupby: [simulation_id, circuit_id, neuron_class, window]
      function: blueet

We can now initialize a MultiAnalyzer object with the following code, where you can specify different parameters if needed:

In [2]:
from blueetl.analysis import run_from_file

ma = run_from_file(
    config_file,
    extract=False,
    calculate=False,
    show=False,
    clear_cache=True,
    loglevel="ERROR",
)
print(ma)

<blueetl.analysis.MultiAnalyzer object at 0x115f47550>


Since we passed `extract=False` to the previous call, we have to extract the repository explicitly:

In [3]:
ma.extract_repo()

And since we passed `calculate=False` to the previous call, we have to calculate the features explicitly:

In [4]:
ma.calculate_features()

We can now inspect the list of analyses in the MultiAnalyzer object:

In [5]:
ma.names

['spikes']

and access each of them as an Analyzer object:

In [6]:
ma.spikes

<blueetl.analysis.Analyzer at 0x104cf2bd0>

Each Analyzer object provides two special attributes, `repo` and `features`, which give access to the extracted data and to the calculated features.

You can inspect the list of extracted and calculated DataFrames through the `names` attribute of each of them, as shown below:

In [7]:
ma.spikes.repo.names

['simulations',
 'neurons',
 'neuron_classes',
 'trial_steps',
 'windows',
 'report']

In [8]:
ma.spikes.features.names

['by_gid',
 'by_gid_0_0__0',
 'by_gid_0_0__1',
 'by_gid_0_1__0',
 'by_gid_0_1__1',
 'by_gid_1_0__0',
 'by_gid_1_0__1',
 'by_gid_1_1__0',
 'by_gid_1_1__1',
 'by_gid_2_0__0',
 'by_gid_2_0__1',
 'by_gid_2_1__0',
 'by_gid_2_1__1',
 'by_gid_and_trial',
 'by_gid_and_trial_0_0__0',
 'by_gid_and_trial_0_0__1',
 'by_gid_and_trial_0_1__0',
 'by_gid_and_trial_0_1__1',
 'by_gid_and_trial_1_0__0',
 'by_gid_and_trial_1_0__1',
 'by_gid_and_trial_1_1__0',
 'by_gid_and_trial_1_1__1',
 'by_gid_and_trial_2_0__0',
 'by_gid_and_trial_2_0__1',
 'by_gid_and_trial_2_1__0',
 'by_gid_and_trial_2_1__1',
 'by_neuron_class',
 'by_neuron_class_0_0__0',
 'by_neuron_class_0_0__1',
 'by_neuron_class_0_1__0',
 'by_neuron_class_0_1__1',
 'by_neuron_class_1_0__0',
 'by_neuron_class_1_0__1',
 'by_neuron_class_1_1__0',
 'by_neuron_class_1_1__1',
 'by_neuron_class_2_0__0',
 'by_neuron_class_2_0__1',
 'by_neuron_class_2_1__0',
 'by_neuron_class_2_1__1',
 'by_neuron_class_and_trial',
 'by_neuron_class_and_trial_0_0__0',
 'by_

You can access the wrapped DataFrames using the `df` attribute on each object:

In [9]:
ma.spikes.repo.report.df

,time,gid,window,trial,simulation_id,circuit_id,neuron_class
0,40.475,32622,w1,0,0,0,Rt_INH
1,44.200,31621,w1,0,0,0,Rt_INH
2,45.600,30061,w1,0,0,0,Rt_INH
3,47.225,29823,w1,0,0,0,Rt_INH
4,47.675,31448,w1,0,0,0,Rt_INH
...,...,...,...,...,...,...,...
2187,68.325,42493,w2,2,1,0,VPL_INH
2188,68.550,42474,w2,2,1,0,VPL_INH
2189,69.425,42479,w2,2,1,0,VPL_INH
2190,69.650,42508,w2,2,1,0,VPL_INH


The DataFrames of features can be accessed in the same way:

In [10]:
ma.spikes.features.by_neuron_class_0_0__0.df

simulation_id,circuit_id,neuron_class,window,mean_of_mean_spike_counts,mean_of_mean_firing_rates_per_second,std_of_mean_firing_rates_per_second,mean_of_spike_times_normalised_hist_1ms_bin,min_of_spike_times_normalised_hist_1ms_bin,max_of_spike_times_normalised_hist_1ms_bin,argmax_spike_times_hist_1ms_bin
0,0,Rt_INH,w1,0.091,1.3,6.95054,0.0013,0.0,0.007,35
0,0,Rt_INH,w2,0.090333,1.505556,8.007633,0.001506,0.0,0.004333,31
0,0,VPL_EXC,w1,0.054,0.771429,5.589421,0.000771,0.0,0.005,22
0,0,VPL_EXC,w2,0.048667,0.811111,5.83382,0.000811,0.0,0.002333,10
0,0,VPL_INH,w1,0.45614,6.516291,17.493663,0.006516,0.0,0.023392,29
0,0,VPL_INH,w2,0.374269,6.237817,16.636431,0.006238,0.0,0.01462,9
1,0,Rt_INH,w1,0.092,1.314286,7.471524,0.001314,0.0,0.006,39
1,0,Rt_INH,w2,0.090333,1.505556,8.522952,0.001506,0.0,0.003667,32
1,0,VPL_EXC,w1,0.048,0.685714,5.339418,0.000686,0.0,0.004,21
1,0,VPL_EXC,w2,0.043,0.716667,5.671999,0.000717,0.0,0.002,10


In this case, the `attrs` dictionary attached to the DataFrame is also populated with the parameters used for the computation:

In [11]:
ma.spikes.features.by_neuron_class_0_0__0.df.attrs

{'config': {'type': 'multi',
  'name': None,
  'groupby': ['simulation_id', 'circuit_id', 'neuron_class', 'window'],
  'function': 'blueetl.external.bnac.calculate_features.calculate_features_multi',
  'neuron_classes': [],
  'windows': [],
  'params': {'export_all_neurons': True,
   'ratio': 0.25,
   'nested_example': {'params': {'bin_size': 1}},
   'param1': 10,
   'param2': 11},
  'params_product': {},
  'params_zip': {},
  'suffix': '_0_0__0'}}

The parameters are calculated automatically by combining `params`, `params_product`, and `params_zip` from the original configuration.
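As a rough illustration of how such a combination can work, the sketch below takes the Cartesian product of the `params_product` values and zips the `params_zip` values. The input values are assumptions inferred from the `params` table and from the feature names shown in this notebook (the printed configuration is truncated), and the `expand` helper is hypothetical, not part of blueetl.

```python
from itertools import product

# Assumed inputs, inferred from the outputs of this notebook (illustrative only).
params = {"export_all_neurons": True}
params_product = {"ratio": [0.25, 0.5, 0.75], "nested_example.params.bin_size": [1, 2]}
params_zip = {"param1": [10, 20], "param2": [11, 21]}


def expand(params, params_product, params_zip):
    """Return a list of (suffix, combined_params), one item per combination."""
    product_keys = list(params_product)
    zip_keys = list(params_zip)
    # Iterate over positions instead of values, so the suffix can record them.
    product_ranges = [range(len(params_product[key])) for key in product_keys]
    zip_length = min(len(values) for values in params_zip.values())
    combinations = []
    for product_idx in product(*product_ranges):
        for zip_idx in range(zip_length):
            combined = dict(params)
            combined.update(
                {key: params_product[key][i] for key, i in zip(product_keys, product_idx)}
            )
            combined.update({key: params_zip[key][zip_idx] for key in zip_keys})
            suffix = "_" + "_".join(str(i) for i in product_idx) + f"__{zip_idx}"
            combinations.append((suffix, combined))
    return combinations


combinations = expand(params, params_product, params_zip)
print(len(combinations))
for suffix, combined in combinations[:3]:
    print(suffix, combined)
```

With these assumed inputs the sketch yields 3 × 2 × 2 = 12 combinations, consistent with the twelve suffixes from `_0_0__0` to `_2_1__1` listed by `features.names`.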

In this case, it may be convenient to access a single DataFrame containing the concatenation of the features of the same type, where the varying parameters are added as new columns.

The name of this DataFrame is the same as that of the split DataFrames, without the suffix:

In [12]:
ma.spikes.features.by_neuron_class.df

simulation_id,circuit_id,neuron_class,window,mean_of_mean_spike_counts,mean_of_mean_firing_rates_per_second,std_of_mean_firing_rates_per_second,mean_of_spike_times_normalised_hist_1ms_bin,min_of_spike_times_normalised_hist_1ms_bin,max_of_spike_times_normalised_hist_1ms_bin,argmax_spike_times_hist_1ms_bin,params_id,ratio,bin_size,param1,param2
0,0,Rt_INH,w1,0.091000,1.300000,6.950540,0.001300,0.000000,0.007000,35,0,0.25,1,10,11
0,0,Rt_INH,w2,0.090333,1.505556,8.007633,0.001506,0.000000,0.004333,31,0,0.25,1,10,11
0,0,VPL_EXC,w1,0.054000,0.771429,5.589421,0.000771,0.000000,0.005000,22,0,0.25,1,10,11
0,0,VPL_EXC,w2,0.048667,0.811111,5.833820,0.000811,0.000000,0.002333,10,0,0.25,1,10,11
0,0,VPL_INH,w1,0.456140,6.516291,17.493663,0.006516,0.000000,0.023392,29,0,0.25,1,10,11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1,0,Rt_INH,w2,0.090333,1.505556,8.522952,0.001506,0.000000,0.003667,32,11,0.75,2,20,21
1,0,VPL_EXC,w1,0.048000,0.685714,5.339418,0.000686,0.000000,0.004000,21,11,0.75,2,20,21
1,0,VPL_EXC,w2,0.043000,0.716667,5.671999,0.000717,0.000000,0.002000,10,11,0.75,2,20,21
1,0,VPL_INH,w1,0.461988,6.599833,17.901046,0.006600,0.000000,0.017544,26,11,0.75,2,20,21
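
A concatenation of this kind can be sketched with plain pandas. The toy values, the `mean_rate` column, and the `split` and `split_params` dictionaries below are made up for illustration; this is not necessarily how blueetl builds the DataFrame internally.

```python
import pandas as pd

# Toy index mimicking the groupby keys used in this notebook.
index = pd.MultiIndex.from_tuples(
    [(0, 0, "Rt_INH", "w1"), (0, 0, "Rt_INH", "w2")],
    names=["simulation_id", "circuit_id", "neuron_class", "window"],
)

# One small DataFrame per parameter combination (hypothetical values).
split = {
    0: pd.DataFrame({"mean_rate": [1.3, 1.5]}, index=index),
    1: pd.DataFrame({"mean_rate": [1.4, 1.6]}, index=index),
}
# Parameters that vary between the combinations, keyed by params_id.
split_params = {
    0: {"ratio": 0.25, "bin_size": 1},
    1: {"ratio": 0.25, "bin_size": 2},
}

# Add the varying parameters as new columns, then stack the pieces.
parts = [
    df.assign(params_id=params_id, **split_params[params_id])
    for params_id, df in split.items()
]
combined = pd.concat(parts)
print(combined)
```

Selecting a single combination back is then a simple filter, for example `combined[combined["params_id"] == 1]`.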


Note that the names of the parameter columns in the previous DataFrame have been shortened. The mapping from the full names to the shortened ones is available in the `aliases` DataFrame:

In [13]:
ma.spikes.features.by_neuron_class.aliases

,column,alias
0,ratio,ratio
1,nested_example.params.bin_size,bin_size
2,param1,param1
3,param2,param2
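
The aliasing rule itself is not spelled out here. One plausible scheme, shown as an assumption and not as blueetl's actual algorithm, keeps the shortest dotted suffix of each name that is still unambiguous. The `shorten_names` helper is hypothetical.

```python
def shorten_names(columns):
    """Map each dotted column name to its shortest unambiguous dotted suffix."""
    aliases = {}
    for column in columns:
        parts = column.split(".")
        alias = column  # fall back to the full name
        for n in range(1, len(parts) + 1):
            candidate = ".".join(parts[-n:])
            # A candidate is ambiguous if another column equals it or ends with it.
            ambiguous = any(
                other != column
                and (other == candidate or other.endswith("." + candidate))
                for other in columns
            )
            if not ambiguous:
                alias = candidate
                break
        aliases[column] = alias
    return aliases


columns = ["ratio", "nested_example.params.bin_size", "param1", "param2"]
aliases = shorten_names(columns)
for column, alias in aliases.items():
    print(f"{column} -> {alias}")
```

With the four parameter names of this notebook, the sketch reproduces the aliases listed in the table above. Names that share the same last component, such as `a.x` and `b.x`, would keep their full form.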


You can also inspect all the parameters that were used for the computation by accessing the `params` attribute:

In [14]:
ma.spikes.features.by_neuron_class.params

params_id,export_all_neurons,ratio,nested_example.params.bin_size,param1,param2
0,True,0.25,1,10,11
1,True,0.25,1,20,21
2,True,0.25,2,10,11
3,True,0.25,2,20,21
4,True,0.5,1,10,11
5,True,0.5,1,20,21
6,True,0.5,2,10,11
7,True,0.5,2,20,21
8,True,0.75,1,10,11
9,True,0.75,1,20,21


During the extraction and the computation, some files are created to be used as a cache.

Usually you don't need to access them directly: if they are deleted, they are created again at the next run, and they may be deleted automatically when the cache is invalidated.

In [15]:
!tree {workdir}

/var/folders/9y/pv21h2ld5h76ph0hplxwcy_17tvc86/T/blueetl_tmp
├── analysis_config_09.yaml
└── analysis_output
    └── spikes
        ├── config
        │   ├── analysis_config.cached.yaml
        │   ├── checksums.cached.yaml
        │   └── simulations_config.cached.yaml
        ├── features
        │   ├── by_gid_0_0__0.parquet
        │   ├── by_gid_0_0__1.parquet
        │   ├── by_gid_0_1__0.parquet
        │   ├── by_gid_0_1__1.parquet
        │   ├── by_gid_1_0__0.parquet
        │   ├── by_gid_1_0__1.parquet
        │   ├── by_gid_1_1__0.parquet
        │   ├── by_gid_1_1__1.parquet
        │   ├── by_gid_2_0__0.parquet
        │   ├── by_gid_2_0__1.parquet
        │   ├── by_gid_2_1__0.parquet
        │   ├── by_gid_2_1__1.parquet
        │   ├── by_gid_and_trial_0_0__0.pa
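
How the cache is invalidated is not described in this notebook. As a generic illustration only, a checksum-based validity check (the file name `checksums.cached.yaml` hints at something along these lines) could look like the sketch below. The helper names and the file layout are hypothetical and do not reflect blueetl's implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of the content of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_cache_valid(source: Path, checksum_file: Path) -> bool:
    """Return True if the stored checksum matches the current content of source."""
    if not checksum_file.exists():
        return False
    return checksum_file.read_text().strip() == file_checksum(source)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "config.yaml")
    checksum_file = Path(tmp, "config.sha256")

    source.write_text("version: 3\n")
    first_check = is_cache_valid(source, checksum_file)  # no checksum stored yet
    checksum_file.write_text(file_checksum(source))
    second_check = is_cache_valid(source, checksum_file)  # content unchanged
    source.write_text("version: 4\n")
    third_check = is_cache_valid(source, checksum_file)  # content changed

print(first_check, second_check, third_check)
```

In a scheme like this, a failed check would lead to deleting the stale cached files and recomputing them at the next run.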

You can remove the full working directory if you don't need it anymore:

In [16]:
shutil.rmtree(workdir)