# Configuring a topological analysis of circuit subvolumes.

Here we describe how a scientist can configure a complete analysis of subtargets
in the circuit's flatmap.

Each individual step in a topological analysis is specified in a master configuration.
Here we describe the various sections in such a configuration.

We have implemented config loaders, so let us use them to load our current working
configuration.

In [1]:
from importlib import reload
from pathlib import Path
from connsense.io import logging, read_config, write_results

LOG = logging.get_logger("TopoAnalysis Configurations", "INFO")


 2021-11-12 11:48:32,949: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
 2021-11-12 11:48:32,950: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
 2021-11-12 11:48:32,950: NumExpr defaulting to 8 threads.


In [2]:
reload(read_config); reload(write_results)
proj83 = Path("/gpfs/bbp.cscs.ch/project/proj83")
path_results = (proj83 
                /"home/sood/analyses/manuscript/topological-analysis-subvolumes"
                /"notebooks" / "results")

raw_config = read_config.read(path_results / "develop-config.json", raw=True)
LOG.info("Load a configuration from %s", path_results)

config = read_config.read(path_results / "develop-config.json")

 2021-11-12 11:48:33,576: Load a configuration from /gpfs/bbp.cscs.ch/project/proj83/home/sood/analyses/manuscript/topological-analysis-subvolumes/notebooks/results


The dict `raw_config` loaded above is the configuration exactly as specified in the JSON.
When the configuration is loaded without `raw=True`, the *paths* it provides are parsed
and resolved to absolute paths.

The JSON defines the following sections:

In [4]:
LOG.info("Sections defined in the configuration:")
for label, section in raw_config.items():
    LOG.info("\t%s: %s", label, type(section))

2021-11-07 17:39:08 INFO     Sections defined in the configuration:
2021-11-07 17:39:08 INFO     	paths: <class 'dict'>
2021-11-07 17:39:08 INFO     	parameters: <class 'dict'>


Currently we have only two sections, which we discuss next.

## Paths
To run an analysis, we need the paths to the inputs and the paths where the results of the
analysis must be saved.
Input and output paths are provided in the *paths* section of the configuration.


### Input paths


As inputs, a topological analysis requires the circuit and its flatmap.
Inputs to the pipeline must not be confused with the *input parameters* for
each of its individual steps, which the config specifies in a separate section.

Input paths are specified in the JSON as separate entries for `circuit` and `flatmap`
in the `paths` section.
In our working config, the paths for the input circuits are specified as follows:

In [5]:
for label, specified in raw_config["paths"]["circuit"].items():
    LOG.info("%s: %s", label, specified)

2021-11-07 17:39:12 INFO     root: /gpfs/bbp.cscs.ch/project/proj83/circuits
2021-11-07 17:39:12 INFO     files: {'Bio_M': '20200805/CircuitConfig_TC_WM'}


The dict format with a separate entry for `root` allows loading
several circuits, listed in the mapping `files`, that are stored under `root`.
This feature will be useful in the analysis of more than one SSCx variant.
If no `root` is provided, the paths specified in `files` will be assumed to be
absolute paths to `bluepy.Circuit`'s `CircuitConfig`s.

The paths specified in the config are parsed to be absolute paths when loaded:

In [6]:
LOG.info("input paths %s", config["paths"]["circuit"])

2021-11-07 17:39:14 INFO     input paths {'Bio_M': '/gpfs/bbp.cscs.ch/project/proj83/circuits/20200805/CircuitConfig_TC_WM'}
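
The resolution of `root` and `files` into absolute paths can be sketched as follows. This is only an illustration of the behaviour described above, not the `connsense` implementation: the helper `resolve_circuit_paths` is a hypothetical name, and treating a missing `root` as "use the paths in `files` unchanged" is an assumption based on the text.

```python
from pathlib import PurePosixPath


def resolve_circuit_paths(spec):
    """Resolve a `circuit` entry of the `paths` section to absolute paths.

    Hypothetical helper for illustration, not part of `connsense`.
    """
    files = spec["files"]
    root = spec.get("root")
    if root is None:
        # Assumption: without a `root`, entries in `files` are already absolute.
        return {label: str(PurePosixPath(path)) for label, path in files.items()}
    # With a `root`, each entry in `files` is taken relative to it.
    return {label: str(PurePosixPath(root) / path) for label, path in files.items()}


# The `circuit` entry as logged from the raw config above.
spec = {"root": "/gpfs/bbp.cscs.ch/project/proj83/circuits",
        "files": {"Bio_M": "20200805/CircuitConfig_TC_WM"}}
resolved = resolve_circuit_paths(spec)
print(resolved)
```

Adding a second circuit variant would then only require another label in `files`, with `root` shared between them.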


A similar entry allows specifying a single flatmap for all analyzed circuits,
or one for each circuit.
The entry for `flatmap` may be omitted if the analyzed circuits already have a flatmap
entered in their `atlas`.

### Paths for pipeline steps

The circuit and flatmap inputs can be used to run the first step of the pipeline,
*i.e.* `define-subtargets`. The output of `define-subtargets` is piped to
the next step, `extract-neurons`, and so on.
Thus the pipeline requires a location to store the results of each step.
We have decided to use a single HDF5 archive for all the results.
The paths of the pipeline steps must therefore be configured by specifying a `root` and `groups`
in a JSON hash:

In [7]:
LOG.info("Analysis data will be saved at \n\t: %s\n",
         raw_config["paths"]["steps"]["root"])

LOG.info("Analysis step HDF5 groups: \n")
for step, group in raw_config["paths"]["steps"]["groups"].items():
    LOG.info("%s: %s", step, group)

2021-11-07 17:39:16 INFO     Analysis data will be saved at 
	: /gpfs/bbp.cscs.ch/project/proj83/home/sood/analyses/manuscript/define_subtargets/notebooks/results/topological_sampling.h5

2021-11-07 17:39:16 INFO     Analysis step HDF5 groups: 

2021-11-07 17:39:16 INFO     define-subtargets: subtargets
2021-11-07 17:39:16 INFO     extract-neurons: neurons
2021-11-07 17:39:16 INFO     evaulate-subtargets: subtarget_quality
2021-11-07 17:39:16 INFO     extract-connectivity: con_mats/original
2021-11-07 17:39:16 INFO     randomize-matrices: con_mats/randomized
2021-11-07 17:39:16 INFO     analyze-connecttivity: analysis
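
Given this layout, locating the output of a step amounts to pairing the archive in `root` with the step's entry in `groups`. The sketch below illustrates that lookup. The function `locate_step_output` is a hypothetical name, and the archive file name is shortened for the example; the group names are the ones logged above.

```python
def locate_step_output(paths, step):
    """Return (archive, group) where the output of `step` is stored.

    Hypothetical helper for illustration, not part of `connsense`.
    """
    steps = paths["steps"]
    try:
        group = steps["groups"][step]
    except KeyError:
        raise KeyError(f"No HDF5 group configured for step {step!r}") from None
    return (steps["root"], group)


# A shortened `paths` section; the archive name is illustrative.
paths = {"steps": {"root": "topological_sampling.h5",
                   "groups": {"define-subtargets": "subtargets",
                              "extract-neurons": "neurons",
                              "extract-connectivity": "con_mats/original"}}}

archive, group = locate_step_output(paths, "extract-connectivity")
print(archive, group)
```

The pair is what an HDF5 reader needs: the file to open and the key of the group within it.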


Not all steps need to be specified, but the code will complain if the path of a required step is missing. Note that the config must provide paths not only for the steps to be run, but also for the steps whose output is required as input to the step being run. In the future we will auto-run all the steps whose output is required and not already available in the archive.
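
A check of this kind could look like the sketch below. The dependency table `REQUIRES` is an assumption made for the example (the actual dependencies between steps are defined by the pipeline code), and `missing_step_paths` is a hypothetical name.

```python
# Assumed dependencies, for illustration only: each step lists the steps
# whose output it reads from the archive.
REQUIRES = {
    "define-subtargets": [],
    "extract-neurons": ["define-subtargets"],
    "extract-connectivity": ["define-subtargets", "extract-neurons"],
}


def missing_step_paths(groups, step, requires=REQUIRES):
    """List the steps needed to run `step` that have no group configured."""
    needed = [step] + list(requires.get(step, []))
    return [s for s in needed if s not in groups]


# A config that forgot to provide a group for `extract-neurons`.
groups = {"define-subtargets": "subtargets",
          "extract-connectivity": "con_mats/original"}

missing = missing_step_paths(groups, "extract-connectivity")
if missing:
    print("Missing paths for steps:", ", ".join(missing))
```

Reporting every missing step at once, rather than failing on the first, lets the config be fixed in a single pass.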


The group `con_mats/con_mats_original` indicates that all connection matrices will be stored under one group, `con_mats`. A family of connection matrices will be stored under subgroups of `con_mats`, with the original connection matrices extracted in this step finding a home under the subgroup `con_mats_original`. The config also specifies another group, `con_mats/randomized`, for storing randomized connection matrices.

**TODO** Should `con_mats/randomized` be `con_mats/con_mats_randomized`, or should `con_mats/con_mats_original` be renamed to `con_mats/original`?

To run the extraction of connections, we will need the output of `define-subtargets`.

## Parameters

**TODO** Describe the *parameters* section of the config.