lryaninsilico edited this page Dec 13, 2017 · 20 revisions

SafeSeqS Installation

Use pip to install the safeseqs package:

$ pip install safeseqs

SafeSeqS depends on several other packages; pip installs them automatically if they are not already present.

Required packages:

  • scipy

Contents

Install
Quick Start
Example

Quick Start

Running SafeSeqS requires (1) a set of input files in the Study Directory and (2) a settings JSON file, located in the SafeSeqS Data Directory, that defines the parameters for the run.
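Before starting a run, it can help to confirm that both prerequisites exist. The sketch below is hypothetical (`missing_inputs` and `REQUIRED_STUDY_FILES` are not part of the safeseqs package). It assumes you pass the data directory's location yourself, because this page does not give its path. It also checks for primers.txt, which the Required Input Files list marks as required in the Study Directory.

```python
from pathlib import Path

# Hypothetical helper, not part of safeseqs.
# Files the Study Directory must always contain. barcodemap.txt is not
# checked here because its name is read from safeseqs.json.
REQUIRED_STUDY_FILES = ("safeseqs.json", "primers.txt")


def missing_inputs(study_dir, data_dir, settings_file):
    """Return the paths of required input files that do not exist."""
    study = Path(study_dir)
    expected = [study / name for name in REQUIRED_STUDY_FILES]
    # The settings file is looked up in the data directory, not the Study Directory.
    expected.append(Path(data_dir) / settings_file)
    return [path for path in expected if not path.is_file()]
```

An empty list means every checked file is present. Any returned paths should be fixed before launching the controller.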

Runtime parameter descriptions

Parameter  Status    Description
-d         Required  Directory containing the Study input files.
-r         Required  Run directory. It will be created under the Study Directory.
-sf        Required  Settings file that defines the parameters for this run. The file must exist in the SafeSeqS Data Directory.
-w         Optional  Number of concurrent worker sub-processes to run. Default: 1
-s         Optional  Start step name. Used when restarting a partially completed run; processing begins at the named step.
-e         Optional  End step name. Used when only a partial run is wanted; processing ends before the named step.

The following example assumes the input files are in C:\SAFESEQS\input. Results are stored in the run subdirectory Dec2017 under C:\SAFESEQS\input. The file SettingsTemplate.json must exist in the SafeSeqS Data Directory and supplies the run-time settings. Up to 6 worker sub-processes may run concurrently.

python -m safeseqs_controller -d C:\SAFESEQS\input -r Dec2017 -sf SettingsTemplate.json -w 6
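When runs are scripted (for example, restarting a partial run with -s), it can be convenient to build the command in Python. This is a minimal sketch that uses only the flags documented above. `build_controller_command` and `run_controller` are hypothetical helpers, and because valid step names for -s and -e are not listed on this page, they are passed through unchecked.

```python
import subprocess
import sys


def build_controller_command(study_dir, run_dir, settings_file,
                             workers=1, start_step=None, end_step=None):
    """Build the argument list for `python -m safeseqs_controller` (hypothetical helper).

    Only the flags documented on this page are used. Step names for
    -s/-e are passed through unchanged; valid names are not listed here.
    """
    cmd = [
        sys.executable, "-m", "safeseqs_controller",
        "-d", str(study_dir),       # Study Directory with the input files
        "-r", str(run_dir),         # run directory, created under the Study Directory
        "-sf", str(settings_file),  # settings file in the data directory
        "-w", str(workers),         # concurrent worker sub-processes (default 1)
    ]
    if start_step is not None:
        cmd += ["-s", start_step]   # resume a partial run at this step
    if end_step is not None:
        cmd += ["-e", end_step]     # stop before this step
    return cmd


def run_controller(**kwargs):
    """Run the controller and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_controller_command(**kwargs), check=True)
```

For example, `run_controller(study_dir=r"C:\SAFESEQS\input", run_dir="Dec2017", settings_file="SettingsTemplate.json", workers=6)` is equivalent to the command above, except that it uses the current interpreter (`sys.executable`) instead of whichever `python` is on the PATH.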

Required Input Files in the Study Directory

  1. safeseqs.json - Contains information about the study data: it identifies the FASTQ input files, the name of the well barcode association file, the UID length used in the study data, and the ASCII adjustment used in the study data. The file format is:

{
  "reads_pattern" : "",
  "barcodes_pattern" : "",
  "reads_files" : ["studydata_R1_001.fastq.gz", "studydata_R1_002.fastq.gz", ... ],
  "barcodes_files" : ["studydata_I1_001.fastq.gz", "studydata_I1_002.fastq.gz", ...],
  "barcodemap" : "barcodemap.txt",
  "uidlength" : 14,
  "ascii_adj" : 33
}

  2. Study Data FASTQ Files - One or more pairs of reads and barcodes files must be in the Study Directory; every reads file needs a matching barcodes file. Identify the files either by listing their names in the 'reads_files' and 'barcodes_files' parameters, or by giving a text pattern that is unique to each kind of file in the 'reads_pattern' and 'barcodes_pattern' parameters. So that files pair correctly, the patterns should be the only difference between the names of a reads file and its barcodes file. The SafeSeqS controller gathers every FASTQ file in the Study Directory whose name contains the pattern.
  3. barcodemap.txt - Tab-delimited file mapping samples to barcodes and primer sets. An example file is provided in the \example subdirectory.
  4. primers.txt - Tab-delimited file containing the primers for the study. An example file is provided in the \example subdirectory.
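If you prefer the explicit-list form of safeseqs.json, a small script can pair the FASTQ files and write the file for you. This sketch rests on an assumption: that a barcodes file's name equals its reads file's name with reads_pattern replaced by barcodes_pattern. That is one reading of the "only difference" rule above, not confirmed controller behavior. `pair_fastq_files` and `write_safeseqs_json` are hypothetical helpers. The patterns "_R1_" and "_I1_" used in the usage note are illustrative, and the defaults uidlength=14 and ascii_adj=33 are copied from the example format and should match your study.

```python
import json
from pathlib import Path


def pair_fastq_files(study_dir, reads_pattern, barcodes_pattern):
    """Hypothetical helper: pair reads/barcodes FASTQ files by name.

    Assumes a barcodes file name is the reads file name with
    reads_pattern replaced by barcodes_pattern.
    """
    names = {p.name for p in Path(study_dir).iterdir()
             if p.is_file() and ".fastq" in p.name}
    pairs = []
    for reads in sorted(names):
        if reads_pattern not in reads:
            continue  # not a reads file
        barcodes = reads.replace(reads_pattern, barcodes_pattern)
        if barcodes not in names:
            # Every reads file must have a barcodes file.
            raise FileNotFoundError(f"no barcodes file for {reads} (expected {barcodes})")
        pairs.append((reads, barcodes))
    return pairs


def write_safeseqs_json(study_dir, pairs, barcodemap="barcodemap.txt",
                        uidlength=14, ascii_adj=33):
    """Hypothetical helper: write safeseqs.json using explicit file lists."""
    config = {
        "reads_pattern": "",      # empty because files are listed explicitly
        "barcodes_pattern": "",
        "reads_files": [reads for reads, _ in pairs],
        "barcodes_files": [barcodes for _, barcodes in pairs],
        "barcodemap": barcodemap,
        "uidlength": uidlength,   # example value; use your study's UID length
        "ascii_adj": ascii_adj,   # example value; use your study's ASCII adjustment
    }
    path = Path(study_dir) / "safeseqs.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

For files named like the example, `write_safeseqs_json(study_dir, pair_fastq_files(study_dir, "_R1_", "_I1_"))` would list studydata_R1_001.fastq.gz with studydata_I1_001.fastq.gz, and so on. Raising an error on a missing partner surfaces unpaired files before the controller runs.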

Parameters in Settings File

Parameters in the settings file control decision-making during processing runs. A SettingsTemplate.json file is delivered with the package and is located in the SafeSeqS Data Directory. It can be copied and modified to create different run scenarios.

Parameter                           Status       Description
fh_limit                            Required     Integer. File handle limit for the system. Example: 2048
max_mismatches_for_used_reads       Required     Integer. Example: 3
max_indels_for_used_reads           Required     Integer. Example: 1
max_amp_per_UID_family              Required     Integer. Example: 1
min_good_reads_usable_family        Required     Integer. Example: 2
min_perc_good_reads_per_UID_family  Required     Integer percentage. Example: 95 (= 95%)
super_mut_perc_homegeneity          Required     Integer percentage. Example: 90 (= 90%)
default_indel_rate                  Required     Float. Example: 0.001
default_sbs_rate                    Required     Float. Example: 0.001
mark_UIDs_with_Ns_UnUsable          Optional     Yes or No. Default: No
perform_opt_dup_removal             Optional     Yes or No. Default: No
opt_dup_distance                    Conditional  Integer. Required with perform_opt_dup_removal. Example: 5000
load_bad_bc                         Optional     Yes or No. Default: No
load_not_used_bc                    Optional     Yes or No. Default: No
save_merge                          Optional     Yes or No. Default: No
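Because the template is meant to be cloned, a short script can copy it with a few overrides and catch obvious mistakes first. This sketch is hypothetical (`validate_settings` and `clone_settings` are not part of safeseqs). It checks only what the table states: required keys, Yes/No values, and opt_dup_distance when perform_opt_dup_removal is Yes. It assumes Yes/No values are stored as JSON strings and does not check numeric types or ranges. Key names, including super_mut_perc_homegeneity, are copied exactly from the table.

```python
import json
from pathlib import Path

# Keys the settings table marks as Required (spelling copied from the table).
REQUIRED_KEYS = (
    "fh_limit", "max_mismatches_for_used_reads", "max_indels_for_used_reads",
    "max_amp_per_UID_family", "min_good_reads_usable_family",
    "min_perc_good_reads_per_UID_family", "super_mut_perc_homegeneity",
    "default_indel_rate", "default_sbs_rate",
)
# Optional keys whose valid values are Yes or No (assumed to be JSON strings).
YES_NO_KEYS = (
    "mark_UIDs_with_Ns_UnUsable", "perform_opt_dup_removal",
    "load_bad_bc", "load_not_used_bc", "save_merge",
)


def validate_settings(settings):
    """Hypothetical check: return a list of problems (empty if none found)."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in settings]
    for key in YES_NO_KEYS:
        if key in settings and settings[key] not in ("Yes", "No"):
            problems.append(f"{key} must be Yes or No, got {settings[key]!r}")
    if settings.get("perform_opt_dup_removal") == "Yes" and "opt_dup_distance" not in settings:
        problems.append("opt_dup_distance is required when perform_opt_dup_removal is Yes")
    return problems


def clone_settings(template_path, new_path, **overrides):
    """Hypothetical helper: copy a settings file, apply overrides, validate, then write."""
    settings = json.loads(Path(template_path).read_text())
    settings.update(overrides)
    problems = validate_settings(settings)
    if problems:
        raise ValueError("; ".join(problems))
    Path(new_path).write_text(json.dumps(settings, indent=2))
    return settings
```

For example, run from the data directory, `clone_settings("SettingsTemplate.json", "Strict.json", max_mismatches_for_used_reads=1)` would write a stricter variant that could then be passed with -sf Strict.json. Validating before writing means an invalid scenario file is never created.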

Optional Data Files in the Data Directory

  1. COSMIC.txt
  2. SNP.txt
