Quick Start

SafeSeqS accepts the run-time parameters listed below and requires (1) a set of input files in the Study Directory and (2) a results directory.

Run Time Parameter Descriptions

Parameter	Required/Optional	Description
-d	Required	Full path to the Study Directory containing the input FASTQ files.
-r	Required	Results directory. This directory will contain all result files and will be created in the Study Directory.
-sf	Optional	Settings file containing the SafeSeqS configuration parameters. The settings file must exist in the safeseqs\data folder of the directory where SafeSeqS is installed (usually the Python site-packages directory). A default settings.json file is delivered with the package. If different configuration scenarios are needed, create multiple settings files and pass the desired file with this parameter at run time.
-w	Optional	Number of concurrent processes to run to speed up processing (usually set to the number of cores on the machine). Default: 1
-s	Optional	Start step name. Used when restarting a partially completed run. Processing begins at the named step.
-e	Optional	End step name. Used when a partial run is desired. Processing ends before the named step.
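The -s and -e descriptions (begin at the named step, end before the named step) behave like a half-open slice over the ordered list of pipeline steps. The sketch below only illustrates that rule: `select_steps` is a hypothetical helper and the step names are placeholders, because the actual SafeSeqS step names are not listed on this page.

```python
from typing import Optional, Sequence


def select_steps(steps: Sequence[str], start: Optional[str] = None,
                 end: Optional[str] = None) -> list[str]:
    """Return the steps that would run for given -s/-e values (hypothetical helper).

    `start` is inclusive and `end` is exclusive, mirroring the parameter table.
    list.index raises ValueError if a step name is not in `steps`.
    """
    first = steps.index(start) if start is not None else 0
    stop = steps.index(end) if end is not None else len(steps)
    if stop < first:
        raise ValueError("end step comes before start step")
    return list(steps[first:stop])


# Placeholder step names, for illustration only.
pipeline = ["step_a", "step_b", "step_c", "step_d"]
print(select_steps(pipeline, start="step_b", end="step_d"))  # ['step_b', 'step_c']
```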

Example: python -m safeseqs_controller -d C:\labName\studyName -r run121217 -sf mySettings.json
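When runs are scripted, the same command line can be assembled in Python. This is a minimal sketch that uses only the flags documented above; `build_command` is a hypothetical helper, not part of SafeSeqS, and it assumes safeseqs_controller is importable by the interpreter running the script.

```python
import sys
from typing import Optional


def build_command(study_dir: str, results_dir: str, settings: Optional[str] = None,
                  workers: int = 1, start: Optional[str] = None,
                  end: Optional[str] = None) -> list[str]:
    """Build the argument list for a SafeSeqS run (hypothetical helper)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Required parameters: -d (Study Directory) and -r (results directory).
    cmd = [sys.executable, "-m", "safeseqs_controller", "-d", study_dir, "-r", results_dir]
    # Optional parameters are only added when they differ from the defaults.
    if settings is not None:
        cmd += ["-sf", settings]
    if workers != 1:
        cmd += ["-w", str(workers)]
    if start is not None:
        cmd += ["-s", start]
    if end is not None:
        cmd += ["-e", end]
    return cmd


print(build_command(r"C:\labName\studyName", "run121217", settings="mySettings.json"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the run; `os.cpu_count()` is one way to pick a value for `workers`.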

Required Input Files in the Study Directory

An example of each file is provided in the \example subdirectory of the SafeSeqS installation directory.

safeseqs.json - A file named "safeseqs.json" must be present in the Study Directory identified by the -d option. It describes the study data: the paired read/barcode FASTQ files, the well-barcode-to-sample association file, the UID length used in the study data, and the ASCII offset for read quality scores.
Study Data FASTQ Files - One or more pairs of read and barcode files must be in the Study Directory. Both uncompressed FASTQ and compressed FASTQ.gz files are supported.
barcodemap.txt - Tab-delimited file mapping samples to barcodes and primer sets.
primers.txt - Tab-delimited file containing the primers for the study.
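Before starting a run it can help to confirm that the Study Directory holds the files listed above. A minimal sketch, assuming the fixed file names shown in this list (safeseqs.json may in fact point to a differently named barcode file) and the .fastq/.fastq.gz extensions; it does not check that read and barcode files are correctly paired. `check_study_dir` is a hypothetical helper.

```python
from pathlib import Path

# Assumed fixed names, taken from the list above.
REQUIRED_FILES = ("safeseqs.json", "barcodemap.txt", "primers.txt")


def check_study_dir(study_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    root = Path(study_dir)
    if not root.is_dir():
        return [f"study directory not found: {root}"]
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: FASTQ files use the .fastq or .fastq.gz extension.
    fastqs = [p for p in root.iterdir()
              if p.is_file() and p.name.lower().endswith((".fastq", ".fastq.gz"))]
    if not fastqs:
        problems.append("no .fastq or .fastq.gz files found")
    return problems
```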
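The ASCII offset recorded in safeseqs.json determines how FASTQ quality characters map to numeric Phred scores: the score is the character code minus the offset (33 is the common modern offset; 64 was used by older Illumina pipelines). A minimal sketch of that conversion; `phred_scores` is a hypothetical helper, and how SafeSeqS itself applies the scores is not described on this page.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (score = ord(char) - offset)."""
    scores = [ord(ch) - offset for ch in quality]
    # A negative score means the string was encoded with a smaller offset.
    if any(s < 0 for s in scores):
        raise ValueError(f"quality string has characters below offset {offset}")
    return scores


print(phred_scores("II5#"))  # [40, 40, 20, 2]
```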

Settings File in the safeseqs/data Directory

A settings file must be identified at run time with the -sf argument. Parameters in the settings file control decision making during processing runs. A default settings.json file is delivered with the package; it can be copied and modified to create different run-settings profiles.
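Because the file passed with -sf must live in the safeseqs/data directory, creating a new profile amounts to copying settings.json within that directory. A minimal sketch: `create_profile` and `default_data_dir` are hypothetical helpers, and `default_data_dir` only guesses the location from the interpreter's site-packages path (via `sysconfig`), which may not match a user or virtual-environment install.

```python
import shutil
import sysconfig
from pathlib import Path
from typing import Optional


def default_data_dir() -> Path:
    """Guess <site-packages>/safeseqs/data for the running interpreter (assumption)."""
    return Path(sysconfig.get_paths()["purelib"]) / "safeseqs" / "data"


def create_profile(new_name: str, data_dir: Optional[Path] = None,
                   template: str = "settings.json") -> Path:
    """Copy the template settings file to a new profile in the data directory."""
    data_dir = Path(data_dir) if data_dir is not None else default_data_dir()
    src = data_dir / template
    dst = data_dir / new_name
    # Refuse to overwrite an existing profile.
    if dst.exists():
        raise FileExistsError(dst)
    shutil.copyfile(src, dst)
    return dst
```

After editing the copy, pass its file name at run time, for example -sf mySettings.json.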

Setting	Required/Optional	Description
max_mismatches_for_used_reads	Required	Integer. Example 3
max_indels_for_used_reads	Required	Integer. Example 1
max_amp_per_UID_family	Required	Integer. Example 1
min_good_reads_usable_family	Required	Integer. Example 2
min_perc_good_reads_per_UID_family	Required	Integer. Example 95 = 95%
super_mut_perc_homegeneity	Required	Integer. Example 90 = 90%
default_indel_rate	Required	Float. Example 0.001
default_sbs_rate	Required	Float. Example 0.001
mark_UIDs_with_Ns_UnUsable	Optional	Valid values: Yes or No. Default: No
perform_opt_dup_removal	Optional	Valid values: Yes or No. Default: No
opt_dup_distance	Required with perform_opt_dup_removal	Integer. Example 5000
fh_limit	Required	File handle limit for the system. Example 2048
load_bad_bc	Optional	Valid values: Yes or No. Default: No
load_not_used_bc	Optional	Valid values: Yes or No. Default: No
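The table above can double as a checklist for a settings file. This sketch checks value types, fills the documented Yes/No defaults, keeps the two percentage settings within 0-100, and requires opt_dup_distance only when perform_opt_dup_removal is Yes (one reading of "Required with"). It assumes the file is a flat JSON object with numbers stored as JSON numbers and flags stored as the strings "Yes"/"No"; the delivered settings.json may store values differently. `validate_settings` is hypothetical, not a SafeSeqS function.

```python
import json
from typing import Any

# Key names come from the settings table; the value types are assumptions.
REQUIRED_INT = (
    "max_mismatches_for_used_reads", "max_indels_for_used_reads",
    "max_amp_per_UID_family", "min_good_reads_usable_family",
    "min_perc_good_reads_per_UID_family", "super_mut_perc_homegeneity", "fh_limit",
)
REQUIRED_FLOAT = ("default_indel_rate", "default_sbs_rate")
PERCENT_KEYS = ("min_perc_good_reads_per_UID_family", "super_mut_perc_homegeneity")
YES_NO_DEFAULTS = {
    "mark_UIDs_with_Ns_UnUsable": "No", "perform_opt_dup_removal": "No",
    "load_bad_bc": "No", "load_not_used_bc": "No",
}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(text: str) -> dict[str, Any]:
    """Parse settings JSON, fill Yes/No defaults, and raise ValueError listing all problems."""
    settings = json.loads(text)
    if not isinstance(settings, dict):
        raise ValueError("settings file must contain a JSON object")
    errors: list[str] = []
    for key in REQUIRED_INT:
        if not _is_int(settings.get(key)):
            errors.append(f"{key}: required integer")
    for key in REQUIRED_FLOAT:
        value = settings.get(key)
        if not (_is_int(value) or isinstance(value, float)):
            errors.append(f"{key}: required number")
    for key in PERCENT_KEYS:
        value = settings.get(key)
        if _is_int(value) and not 0 <= value <= 100:
            errors.append(f"{key}: percentage must be between 0 and 100")
    for key, default in YES_NO_DEFAULTS.items():
        settings.setdefault(key, default)
        if settings[key] not in ("Yes", "No"):
            errors.append(f"{key}: must be Yes or No")
    # opt_dup_distance is only enforced when optical duplicate removal is enabled.
    if settings["perform_opt_dup_removal"] == "Yes" and not _is_int(settings.get("opt_dup_distance")):
        errors.append("opt_dup_distance: required integer when perform_opt_dup_removal is Yes")
    if errors:
        raise ValueError("; ".join(errors))
    return settings
```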

Optional Reference Data Files in the safeseqs/data directory

An example of each file is provided in the \data subdirectory of the SafeSeqS installation directory. Each example file is empty except for the appropriate column headers.

COSMIC.txt - Tab-delimited file containing known COSMIC single-base-pair mutations.
dbSNP.txt - Tab-delimited file containing dbSNP single-base-pair mutations. It is used to allow for expected variants when calculating mismatches.
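To illustrate how a list of known variants can be allowed for when calculating mismatches, this sketch counts substitutions between an aligned read and its reference while skipping any (position, base) pair listed as a known variant; the count could then be compared with max_mismatches_for_used_reads. It assumes a gap-free alignment and 0-based positions, ignores indels, and does not parse dbSNP.txt, whose column layout is not given here. `count_mismatches` is hypothetical and is not the SafeSeqS algorithm.

```python
from typing import Iterable


def count_mismatches(read: str, reference: str,
                     known_variants: Iterable[tuple[int, str]] = ()) -> int:
    """Count substitutions between an aligned read and reference, skipping known variants.

    `known_variants` holds (0-based position, alternate base) pairs, for example
    as they might be loaded from dbSNP.txt (loading is not shown).
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must be the same length (gap-free alignment)")
    allowed = set(known_variants)
    return sum(
        1
        for pos, (read_base, ref_base) in enumerate(zip(read, reference))
        if read_base != ref_base and (pos, read_base) not in allowed
    )


reference = "ACGTACGT"
read = "ACTTACGA"  # differs at position 2 (G->T) and position 7 (T->A)
print(count_mismatches(read, reference))              # 2
print(count_mismatches(read, reference, [(2, "T")]))  # 1
```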
