The configuration file

drivenbyentropy edited this page Feb 5, 2018 · 13 revisions

Graphical User Interface

The configuration file is generated automatically when a new experiment is created, and it is updated with the corresponding parameters whenever a particular algorithm is run. Once created, the file is compatible with the command line version, so the two can be used interchangeably.

Command Line Interface

Since AptaSUITE comprises a collection of tools, passing all relevant configuration parameters on the command line would be time-consuming and inefficient. Hence, a central configuration file, with sensible default values where applicable, is used and passed to AptaSUITE with each call. This page describes the general architecture of the file and lists the default parameters that provide a basic configuration for AptaSUITE. Additional required parameters for each subroutine are discussed in their corresponding sections of the manual.

Each parameter in the file takes the following form:

# Optional comment
Category.property = value
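
To make the line format concrete, here is a minimal Python sketch of how such a file could be read. It is not AptaSUITE's own parser; the function name `parse_config` and the decision to keep values verbatim (including any surrounding quotes) are assumptions made for illustration. Entries are kept in an ordered list rather than a dictionary because some keys, such as `SelectionCycle.name`, appear more than once in a file.

```python
def parse_config(text):
    """Parse 'Category.property = value' lines.

    Returns an ordered list of (category, property, value) tuples.
    Blank lines and lines starting with '#' are skipped.
    Hypothetical helper, not part of AptaSUITE.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        category, dot, prop = key.strip().partition(".")
        if not dot or not prop:
            raise ValueError(f"expected 'Category.property', got {key.strip()!r}")
        # The value is kept as written; quote handling is left to the caller.
        entries.append((category, prop, value.strip()))
    return entries


sample = """
# Optional comment
Experiment.primer5 = GGGAGGACGATGCGG
SelectionCycle.name = Round0
SelectionCycle.name = Round1
"""
entries = parse_config(sample)
```

Keeping the entries in file order preserves the grouping of repeated keys, which matters for the selection cycle blocks described further down.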

Mandatory Parameters

A minimal configuration file must contain at least a description of the SELEX experiment to be processed. This includes a name for the experiment (e.g. the target it was selected against), a short description of the experiment, a project path in which all data will be stored, and the primer sequences used for the amplification stages of the protocol. Note that the 3' primer must be specified in the 5' to 3' direction of the aptamer sequence as used during the selection.

# Experiment configuration
Experiment.name = "SELEX against target X"
Experiment.description = "5 rounds of selection including the initial pool"
# On Windows, use C:/path/to/project/folder
Experiment.projectPath = /path/to/project/folder

Experiment.primer5 = GGGAGGACGATGCGG
# OPTIONAL, only specify if the 3' primer was part of the sequenced data.
# If it is not specified, the randomized region size must be given instead.
Experiment.primer3 = CAGACGACTCGCCCGA
# Experiment.randomizedRegionSize = 40
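
The rules above can be summarized as a small pre-flight check. The following Python sketch is only an illustration of those rules under the assumption that the `Experiment.*` properties have already been collected into a dictionary; `check_experiment` and `REQUIRED` are hypothetical names, and AptaSUITE performs its own validation independently of this code.

```python
# Properties the text above lists as part of a minimal configuration.
REQUIRED = ("name", "description", "projectPath", "primer5")


def check_experiment(props):
    """Return a list of problems found in a dict of Experiment.* properties.

    Hypothetical helper: it only mirrors the rules stated in this section.
    """
    problems = [f"missing Experiment.{key}" for key in REQUIRED if not props.get(key)]
    # The 3' primer is optional, but then the randomized region size is needed.
    if not props.get("primer3") and not props.get("randomizedRegionSize"):
        problems.append(
            "specify Experiment.primer3 or Experiment.randomizedRegionSize"
        )
    return problems


experiment = {
    "name": "SELEX against target X",
    "description": "5 rounds of selection including the initial pool",
    "projectPath": "/path/to/project/folder",
    "primer5": "GGGAGGACGATGCGG",
    "randomizedRegionSize": "40",
}
problems = check_experiment(experiment)
```

An empty list means the dictionary satisfies the minimal requirements described here.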

Next, for each sequenced selection cycle, AptaSUITE requires a name, the round number, and information regarding control/counter selection to be specified.

### Selection Cycle Information ###
SelectionCycle.name = Round0
SelectionCycle.round = 0
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = Round1
SelectionCycle.round = 1
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = Round3
SelectionCycle.round = 3
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False

SelectionCycle.name = Round5
SelectionCycle.round = 5
SelectionCycle.isControlSelection = False
SelectionCycle.isCounterSelection = False
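
Because every sequenced cycle needs the same four lines, writing them by hand becomes repetitive for larger experiments. As a convenience, the blocks could be generated with a few lines of Python. This is a sketch only; `selection_cycle_block` is a hypothetical helper and the `Round<N>` naming simply follows the example above.

```python
def selection_cycle_block(name, round_number, is_control=False, is_counter=False):
    """Build the four configuration lines for one selection cycle."""
    return "\n".join([
        f"SelectionCycle.name = {name}",
        f"SelectionCycle.round = {round_number}",
        # Python formats booleans as True/False, matching the listing above.
        f"SelectionCycle.isControlSelection = {is_control}",
        f"SelectionCycle.isCounterSelection = {is_counter}",
    ])


# Only the rounds that were actually sequenced are listed.
rounds = [0, 1, 3, 5]
config_text = "\n\n".join(selection_cycle_block(f"Round{r}", r) for r in rounds)
```

The resulting `config_text` can be pasted into the configuration file below the experiment section.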

Default Parameters

In addition, the following optional parameters can be changed from their default values. This should only be necessary if AptaSUITE runs into performance or memory issues.

# Specifies the maximum number of CPU cores to be used in the multi-threaded parts
# of AptaSUITE. If this value is larger than the number of available cores, the
# smaller of the two values is used.
Performance.maxNumberOfCores = 30
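
The rule in the comment above amounts to taking the minimum of the configured and the available core count. The Python sketch below illustrates that rule; `effective_cores` is a hypothetical name, and how AptaSUITE determines the number of available cores internally is not shown here.

```python
import os


def effective_cores(configured, available=None):
    """Return the number of cores that would be used under the stated rule."""
    if available is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        available = os.cpu_count() or 1
    return min(configured, available)


# With the default of 30 on an 8-core desktop, 8 cores would be used.
cores_on_desktop = effective_cores(30, available=8)
```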

AptaSUITE is designed with flexibility in mind. It provides APIs for storage, retrieval, and manipulation of aptamer sequences, selection cycle composition, structure information, as well as cluster information which can be implemented by different data structures. These back-ends can be chosen with the following parameters:

# The default back-end for storing aptamer sequence information
AptamerPool.backend = MapDBAptamerPool

# The default back-end for storing the counts of each aptamer in a
# particular selection cycle
SelectionCycle.backend = MapDBSelectionCycle

# The default back-end for storing the secondary structure information
StructurePool.backend = MapDBStructurePool

Currently, the only available back-ends for these APIs are the ones indicated above; additional back-ends are planned for future releases. The current implementations are based on MapDB and designed for memory-efficient storage of structured data. In fact, using MapDB allows AptaSUITE to run efficiently on modern desktop systems instead of having to rely on high-performance, memory-intensive server solutions.

The parameters specific to the MapDB back-ends, together with their default values, are described below.

# In order to avoid time-consuming disk I/O, a Bloom filter is used to record
# whether an aptamer is present in the pool or not. This value should be at least as large as
# the total number of reads that were sequenced.
MapDBAptamerPool.bloomFilterCapacity = 500000000

# The corresponding collision (false-positive) probability of the Bloom filter. The smaller
# this value, the more memory this data structure will consume.
MapDBAptamerPool.bloomFilterCollisionProbability = 0.001

# To prevent the creation of large files on disk which would yield slower lookup times,
# the pool is partitioned into smaller units with the capacity as specified below.
MapDBAptamerPool.maxTreeMapCapacity = 1000000

# The corresponding collision probability of the bloom filter used for the selection 
# cycle implementation. The capacity has the same value as MapDBAptamerPool.bloomFilterCapacity
MapDBSelectionCycle.bloomFilterCollisionProbability = 0.001

# The corresponding collision probability of the bloom filter used for the structure 
# information. The capacity has the same value as MapDBAptamerPool.bloomFilterCapacity
MapDBStructurePool.bloomFilterCollisionProbability = 0.001

# To prevent the creation of large files on disk which would yield slower lookup times,
# the structure information is partitioned into smaller units with the capacity as specified below.
MapDBStructurePool.maxTreeMapCapacity = 500000
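
To get a feeling for how the capacity and the collision probability trade off against memory, the textbook sizing formulas for a Bloom filter can be used: for n expected items and false-positive probability p, an optimally sized filter needs about -n ln(p) / (ln 2)^2 bits and about (bits / n) ln 2 hash functions. The Python sketch below applies these formulas to the default values. It is an estimate under the assumption of an optimally sized standard Bloom filter; the actual implementation used by AptaSUITE may allocate memory differently, and the function names are hypothetical.

```python
import math


def bloom_filter_bits(capacity, false_positive_probability):
    """Bits needed by an optimally sized standard Bloom filter."""
    return math.ceil(
        -capacity * math.log(false_positive_probability) / math.log(2) ** 2
    )


def bloom_filter_hashes(capacity, bits):
    """Number of hash functions that minimizes the false-positive rate."""
    return max(1, round(bits / capacity * math.log(2)))


# Defaults from the listing above.
capacity = 500_000_000   # MapDBAptamerPool.bloomFilterCapacity
probability = 0.001      # MapDBAptamerPool.bloomFilterCollisionProbability

bits = bloom_filter_bits(capacity, probability)
gigabytes = bits / 8 / 1e9
hashes = bloom_filter_hashes(capacity, bits)
```

Under these assumptions the default settings correspond to roughly 0.9 GB per filter. Lowering the capacity to match the actual number of sequenced reads, or raising the probability, reduces this figure, which is the main lever if memory is tight.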