Sergey Aganezov jr edited this page Oct 30, 2016 · 10 revisions

CAMSA is written primarily in Python and is distributed via PyPI, which makes installation and subsequent usage straightforward. During installation, all runnable scripts are automatically added to the PATH of the user (or system) for which CAMSA is installed.

If the scaffold assemblies that one wishes to process with CAMSA are not in the CAMSA format of sets of assembly points, the input can be converted into a CAMSA-suitable form using the built-in conversion scripts. Please refer to the input wiki page for more details.

Running CAMSA

Once the input is prepared in the CAMSA assembly points format, running CAMSA (with no additional options) is straightforward: f1.camsa.points [f2.camsa.points ...] -o output_dir

We emphasize that the input assemblies can all be represented in a single file or be spread across multiple files. The origin column specifies the name of the assembly that each assembly point comes from. We find it convenient to keep different input scaffold assemblies in different files, but CAMSA is flexible in terms of how the input is spread across files.
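To make the role of the origin column concrete, here is a minimal sketch that counts assembly points per assembly in a points table. It assumes the file is tab-separated with a header row that contains a column named origin; the other column names and the helper name count_points_per_origin are hypothetical placeholders, not part of CAMSA.

```python
import csv
import io
from collections import Counter


def count_points_per_origin(handle, origin_column="origin"):
    """Count rows (assembly points) per value of the origin column.

    Assumes a tab-separated table with a header row (hypothetical helper).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[origin_column] for row in reader)


# In-memory stand-in for a points file that mixes two assemblies.
# Only the "origin" column matters here; "seq1"/"seq2" are placeholders.
example = io.StringIO(
    "origin\tseq1\tseq2\n"
    "sga\tc1\tc2\n"
    "sga\tc2\tc3\n"
    "sspace\tc1\tc2\n"
)
counts = count_points_per_origin(example)
for origin, n_points in counts.items():
    print(origin, n_points)
```

The same function works whether the assemblies live in one file or in several: call it once per file and add the resulting counters together.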

Below is the output of CAMSA run with the -h/--help flag, which lists all available command line arguments along with their descriptions:

  usage: [-h] [-c CONFIG] [--c-cw-exact C_CW_EXACT]
                    [--c-cw-candidate C_CW_CANDIDATE]
                    [--c-merging-cw-min C_MERGING_CW_MIN]
                    [--c-merging-strategy {greedy,maximal-matching}]
                    [--c-merging-cycles] [--version] [-o O_DIR]
                    [--o-merged-format O_MERGED_FORMAT]
                    [--o-subgroups-format O_SUBGROUPS_FORMAT]
                    [--o-collapsed-format O_COLLAPSED_FORMAT]
                    [--o-original-format O_ORIGINAL_FORMAT]
                    [--c-logging-level {0,10,20,30,40,50}]
                    [--c-logging-formatter-entry C_LOGGING_FORMATTER_ENTRY]
                    points [points ...]
  Args that start with '--' (eg. --c-cw-exact) can also be set in a config file (camsa/run_camsa.ini or
  /camsa/logging.ini or specified via -c). The recognized syntax for setting (key, value) pairs is based on the
  INI and YAML formats (e.g. key=value or foo=TRUE). For full documentation of the differences from the 
  standards please refer to the ConfigArgParse documentation. If an arg is specified in more than one place, 
  then command line values override config file values which override defaults.

  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Config file path with settings for CAMSA to run with.
                        Overwrites the default CAMSA configuration file.
                        Values in config file can be overwritten by command line arguments.
  --c-cw-exact C_CW_EXACT
                        A confidence weight value assigned to oriented assembly points and respective exact assembly edges,
                        in case "?" is specified as the respective assembly point confidence weight.
                        DEFAULT: 1.0
  --c-cw-candidate C_CW_CANDIDATE
                        A confidence weight value assigned to semi/un-oriented assembly points and respective candidate assembly edges,
                        in case "?" is specified as the respective assembly point confidence weight.
                        DEFAULT: 0.75
  --c-merging-cw-min C_MERGING_CW_MIN
                        A threshold for the minimum cumulative confidence weight for merged assembly edges in MSAG.
                        Edges with confidence weight below are not considered in the "merged" assembly construction.
                        DEFAULT: 0.0
  --c-merging-strategy {greedy,maximal-matching}
                        A strategy to produced a merged assembly from the given ones.
                        DEFAULT: maximal-matching
  --c-merging-cycles    Whether to allow cycles in the produced merged assembly.
                        DEFAULT: False
  --version             show program's version number and exit
  -o O_DIR, --o-dir O_DIR
                        A directory, where CAMSA will store all of the produced output (report, assets, etc).
                        DEFAULT: camsa_{date}
  --o-merged-format O_MERGED_FORMAT
                        The CAMSA-out formatting for the merged scaffold assemblies in a form of CAMSA points.
  --o-subgroups-format O_SUBGROUPS_FORMAT
                        The CAMSA-out formatting for the merged scaffold assemblies in a form of CAMSA points.
  --o-collapsed-format O_COLLAPSED_FORMAT
                        The CAMSA-out formatting for the collapsed assembly points and their computed conflicts.
  --o-original-format O_ORIGINAL_FORMAT
                        The CAMSA-out formatting for the non-collapsed assembly points and their computed conflicts.

  --c-logging-level {0,10,20,30,40,50}
                        Logging level for CAMSA.
                        DEFAULT: 20
  --c-logging-formatter-entry C_LOGGING_FORMATTER_ENTRY
                        Format string for python logger.
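
The numeric choices for --c-logging-level coincide with the numeric levels of Python's standard logging module. Assuming CAMSA passes the value to that module unchanged (an assumption; the help text only mentions a "python logger"), the names behind the numbers can be listed as follows:

```python
import logging

# Map each numeric level accepted by --c-logging-level to its standard name.
level_names = {
    level: logging.getLevelName(level)
    for level in (0, 10, 20, 30, 40, 50)
}

for level, name in level_names.items():
    print(level, name)
```

Under that assumption, the default of 20 corresponds to INFO, and lower numbers produce more verbose output.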

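The help text states that command line values override config file values, which in turn override defaults. The sketch below illustrates only that lookup order using collections.ChainMap; it is not how ConfigArgParse is implemented, and the config file and command line values shown are made up for the example.

```python
from collections import ChainMap

# Defaults as listed in the help output above.
defaults = {
    "c-cw-exact": 1.0,
    "c-cw-candidate": 0.75,
    "c-merging-strategy": "maximal-matching",
}
# Hypothetical values read from a config file.
config_file = {"c-cw-candidate": 0.5, "c-merging-strategy": "greedy"}
# Hypothetical values given on the command line.
command_line = {"c-cw-candidate": 0.9}

# ChainMap searches its mappings left to right, so the earliest one wins.
effective = ChainMap(command_line, config_file, defaults)

for key in defaults:
    print(key, effective[key])
```

Here c-cw-candidate comes from the command line, c-merging-strategy from the config file, and c-cw-exact falls back to its default.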

The CAMSA package ships with several examples. Here we demonstrate how to run CAMSA on the GAGE1 example: a set of 4 scaffold assemblies obtained by ScaffMatch, SOAPdenovo2, SGA, and SSPACE on the S. aureus contig assembly (produced by Allpaths-LG) using the same shortjump library.

All of the corresponding FASTA files were translated into CAMSA assembly points using the following command: contigs.fasta scaffmatch.fasta sga.fasta soap2.fasta sspace.fasta -o .

where contigs.fasta is the Allpaths-LG contig-stage assembly of S. aureus and the other xxx.fasta files are the scaffold assemblies obtained by the corresponding xxx scaffolder.

This script, using the NUCmer software, infers assembly points between pairs of scaffolds in each assembly and creates 4 files: soap2.camsa.points, sga.camsa.points, sspace.camsa.points, scaffmatch.camsa.points. The corresponding .camsa.points files can be found in the CAMSA distribution in the camsa/examples/gage/exp1/ subfolder. From now on we assume that this subfolder is the working directory. Once these files are located, CAMSA can be run: sga.camsa.points sspace.camsa.points scaffmatch.camsa.points soap2.camsa.points -o output_dir

We remark that the conversion script by default creates an individual file with assembly points for each input scaffold assembly, but CAMSA does not care whether scaffold assemblies are separated into different files or not, as long as the origin column reflects the source of each assembly point. So the CAMSA command above produces the same results as the command below: as.camsa.points -o output_dir

where the as.camsa.points file contains the assembly points of all 4 input assemblies. We also note that we use the .camsa.points extension for the files that store assembly points, but, as stated in the input wiki page, these files are nothing more than tab-separated text files with headers.
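
Since the points files are plain tab-separated tables with headers, a combined file such as as.camsa.points could be assembled by concatenating the per-assembly files. The following is a minimal sketch of that idea, not a utility shipped with CAMSA: the function name merge_points is hypothetical, and it assumes every input has exactly the same header, in the same column order.

```python
import csv
import io


def merge_points(sources, destination):
    """Concatenate tab-separated tables that share one header (hypothetical helper)."""
    writer = None
    for source in sources:
        reader = csv.DictReader(source, delimiter="\t")
        if reader.fieldnames is None:
            continue  # skip completely empty inputs
        if writer is None:
            # Write the header once, taken from the first non-empty input.
            writer = csv.DictWriter(
                destination,
                fieldnames=reader.fieldnames,
                delimiter="\t",
                lineterminator="\n",
            )
            writer.writeheader()
        elif list(reader.fieldnames) != list(writer.fieldnames):
            raise ValueError("input files do not share the same header")
        for row in reader:
            writer.writerow(row)


# In-memory stand-ins for two per-assembly files; column names other than
# "origin" are placeholders.
sga = io.StringIO("origin\tseq1\tseq2\nsga\tc1\tc2\n")
sspace = io.StringIO("origin\tseq1\tseq2\nsspace\tc2\tc3\n")
merged = io.StringIO()
merge_points([sga, sspace], merged)
print(merged.getvalue(), end="")
```

With real files, the same function can be given handles opened with open(path, newline=""), which is the mode the csv module documentation recommends.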

Once CAMSA has finished running (for this example CAMSA finishes in less than 2(!!!) seconds), the output_dir will contain all the analysis and merging information produced by CAMSA. For more details about the CAMSA output please refer to the output wiki page; here we will just show how to obtain the merged assembly results in FASTA format. The command shown below translates the CAMSA assembly points that constitute the merged assembly into FASTA sequences, using the contigs.fasta file as the source of the actual sequence information: --points output_dir/merged/merged.camsa.points --fasta contigs.fasta -o merged.fasta

For more details on the different conversion utilities provided with CAMSA, please refer to the dedicated utils wiki page.