lryaninsilico edited this page Dec 12, 2017 · 20 revisions

SafeSeqS Installation

Use pip to install the safeseqs package:

$ pip install safeseqs

SafeSeqS depends on the packages listed below. pip installs any that are not already present.

Required packages:

  • scipy
  • pywin32 (on Windows machines)

Quick Start

Running SafeSeqS requires (1) a set of input files in the Study Directory and (2) a settings JSON file identifying the study parameters.

Runtime parameter descriptions

Flag  Required?  Description
-d    Required   Directory containing the Study input files.
-r    Required   Run directory. Will be created under the Study directory.
-sf   Required   Settings file that identifies the parameters for this run. The settings file must exist in the SafeSeqS Data Directory.
-w    Optional   Number of concurrent worker sub-processes to run. Default: 1
-s    Optional   Start step name. Used when restarting a partially completed run. Processing begins at the named step.
-e    Optional   End step name. Used when a partial run is desired. Processing ends before the named step.

Example: the input files are in the directory C:\SAFESEQS\input, and the results are to be stored in the sub-directory Nov09 under that directory. The file SettingsTemplate.json must exist in the SafeSeqS Project Data directory and determines the run-time settings. Up to 6 worker sub-processes may run concurrently.

python -m safeseqs_controller -d C:\SAFESEQS\input -r Nov09 -sf SettingsTemplate.json -w 6
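When runs are launched from a script, the same command line can be assembled programmatically. The following is a minimal sketch, not part of the safeseqs package: the helper name build_safeseqs_command is hypothetical, and it uses only the flags listed under Runtime parameter descriptions.

```python
import sys


def build_safeseqs_command(study_dir, run_dir, settings_file,
                           workers=1, start_step=None, end_step=None):
    """Return the argument list for one safeseqs_controller run.

    Hypothetical helper: it only assembles the documented flags and does
    not check that the directories or the settings file exist.
    """
    cmd = [
        sys.executable, "-m", "safeseqs_controller",
        "-d", str(study_dir),       # Study directory with the input files
        "-r", str(run_dir),         # run directory, created under the Study directory
        "-sf", str(settings_file),  # settings file name
        "-w", str(workers),         # concurrent worker sub-processes
    ]
    if start_step is not None:
        cmd += ["-s", start_step]   # processing begins at this step
    if end_step is not None:
        cmd += ["-e", end_step]     # processing ends before this step
    return cmd


cmd = build_safeseqs_command(r"C:\SAFESEQS\input", "Nov09",
                             "SettingsTemplate.json", workers=6)
print(cmd)
```

Passing the list to subprocess.run(cmd, check=True) from the standard library would start the run and raise an error if the controller exits with a non-zero status.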

Required Input Files in the Study Directory

  1. safeseqs.json - File identifying the Fastq input files, the barcodemap filename, and the ascii adjustment being used in the study data. Format of the file is:

{
  "reads_pattern" : "",
  "barcodes_pattern" : "",
  "reads_files" : ["fastq file 1 (just file name - must be in Study Directory)", "fastq file 2", ... ],
  "barcodes_files" : ["barcode file 1", "barcode file 2", ...],
  "barcodemap" : "well barcode association filename",
  "uidlength" : 14,
  "ascii_adj" : 33
}

  2. Study Data Fastq Files - One or more sets of reads and barcode files.
  3. barcodemap.txt - Tab-delimited file mapping samples to barcodes and primer sets. BarcodeMapRecord format: 'barcodeNumber', 'barcode', 'wbcPlateNumber', 'template', 'purpose', 'gEsWellOrTotalULUsed', 'mutOrTotalGEsWell', 'ampMatchName', 'row', 'col'.
  4. primers.txt - Tab-delimited file containing the primers for the study. PrimerRecord format: 'ampMatchName', 'gene', 'read1', 'read2', 'ampSeq', 'target_len', 'chrom', 'readStrand', 'hg19_start', 'hg19_end'.
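Before starting a long run it can help to confirm that the Study Directory is complete. The sketch below is an illustration only and is not part of the safeseqs package. The function name check_study_dir is hypothetical. It assumes that every key except the two pattern keys is mandatory and that all files named in safeseqs.json sit directly in the Study Directory. Neither pattern key is interpreted.

```python
import json
from pathlib import Path

# Assumption: these safeseqs.json keys are treated as mandatory by this sketch.
REQUIRED_KEYS = ("reads_files", "barcodes_files", "barcodemap",
                 "uidlength", "ascii_adj")


def check_study_dir(study_dir):
    """Return a list of problems found in the Study Directory (empty if none)."""
    study = Path(study_dir)
    config_path = study / "safeseqs.json"
    if not config_path.is_file():
        return ["missing safeseqs.json"]

    config = json.loads(config_path.read_text())
    problems = [f"safeseqs.json: missing key {key!r}"
                for key in REQUIRED_KEYS if key not in config]

    # Collect every file the study needs: reads, barcodes, barcode map, primers.
    named = list(config.get("reads_files", []))
    named += list(config.get("barcodes_files", []))
    if "barcodemap" in config:
        named.append(config["barcodemap"])
    named.append("primers.txt")

    for name in named:
        if not (study / name).is_file():
            problems.append(f"missing file {name}")
    return problems
```

An empty list means every file named in safeseqs.json, plus primers.txt, was found. The check does not validate the contents of the Fastq, barcode map, or primer files.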

Parameters in Settings File

Parameters in the settings file control decision making during processing runs. A SettingsTemplate.json file is delivered with the package and is located in the Study Data Directory. Copy and modify it to create different run scenarios.

Parameter                           Required?                              Value
fh_limit                            Required                               File handle limit for the system. Example: 2048
max_mismatches_for_used_reads       Required                               Integer. Example: 3
max_indels_for_used_reads           Required                               Integer. Example: 1
max_amp_per_UID_family              Required                               Integer. Example: 1
min_good_reads_usable_family        Required                               Integer. Example: 2
min_perc_good_reads_per_UID_family  Required                               Integer. Example: 95 (= 95%)
super_mut_perc_homegeneity          Required                               Integer. Example: 90 (= 90%)
default_indel_rate                  Required                               Float. Example: 0.001
default_sbs_rate                    Required                               Float. Example: 0.001
mark_UIDs_with_Ns_UnUsable          Optional                               Valid values: Yes or No. Default: No
perform_opt_dup_removal             Optional                               Valid values: Yes or No. Default: No
opt_dup_distance                    Required with perform_opt_dup_removal  Integer. Example: 5000
load_bad_bc                         Optional                               Valid values: Yes or No. Default: No
load_not_used_bc                    Optional                               Valid values: Yes or No. Default: No
save_merge                          Optional                               Valid values: Yes or No. Default: No
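Cloning the template and checking the result against the rules above can be scripted. The following is a minimal sketch under stated assumptions, not the package's own validation: it assumes the settings file is a flat JSON object keyed by the parameter names listed above, and that Yes/No values are stored as the strings "Yes" and "No". The function names validate_settings and clone_settings are hypothetical.

```python
import json

REQUIRED = (
    "fh_limit", "max_mismatches_for_used_reads", "max_indels_for_used_reads",
    "max_amp_per_UID_family", "min_good_reads_usable_family",
    "min_perc_good_reads_per_UID_family", "super_mut_perc_homegeneity",
    "default_indel_rate", "default_sbs_rate",
)
YES_NO = (
    "mark_UIDs_with_Ns_UnUsable", "perform_opt_dup_removal",
    "load_bad_bc", "load_not_used_bc", "save_merge",
)


def validate_settings(settings):
    """Return a list of rule violations (empty if the settings look valid)."""
    problems = [f"missing required parameter {key}"
                for key in REQUIRED if key not in settings]
    for key in YES_NO:
        # Optional flags default to "No" when absent.
        if settings.get(key, "No") not in ("Yes", "No"):
            problems.append(f"{key} must be Yes or No")
    # opt_dup_distance is only required when optical duplicate removal is on.
    if (settings.get("perform_opt_dup_removal", "No") == "Yes"
            and "opt_dup_distance" not in settings):
        problems.append("opt_dup_distance is required with perform_opt_dup_removal")
    return problems


def clone_settings(template_path, new_path, **overrides):
    """Copy a settings file, apply overrides, validate, and write the result."""
    with open(template_path) as fh:
        settings = json.load(fh)
    settings.update(overrides)
    problems = validate_settings(settings)
    if problems:
        raise ValueError("; ".join(problems))
    with open(new_path, "w") as fh:
        json.dump(settings, fh, indent=2)
    return settings
```

For example, clone_settings("SettingsTemplate.json", "StrictRun.json", max_mismatches_for_used_reads=1) would write a stricter scenario, where StrictRun.json is an illustrative file name. The new file name can then be passed to the controller with -sf.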

Optional Data Files in the Data Directory

  1. COSMIC.txt
  2. SNP.txt
