Quick Start
SafeSeqS accepts the following run-time parameters and requires (1) a set of input files in the Study Directory and (2) a results directory.
| Parameter | Required/Optional | Description |
|---|---|---|
| -d | Required | Full path to the Study Directory containing the input FASTQ files. |
| -r | Required | Results directory. This directory will contain all result files and is created in the Study Directory. |
| -sf | Optional | Settings file containing the SafeSeqS configuration parameters. The settings file must exist in the safeseqs\data folder of the directory where SafeSeqS is installed (usually the Python site-packages directory). A default settings.json file is delivered with the package. If different configuration scenarios are needed, create multiple settings files and pass the desired one with this parameter at run time. |
| -w | Optional | Number of concurrent processes to run to speed up processing (usually set to the number of cores on the machine). Default: 1 |
| -s | Optional | Start step name. Used when restarting a partially completed run; processing begins at the named step. |
| -e | Optional | End step name. Used when a partial run is desired; processing ends before the named step. |
Example:

```
python -m safeseqs_controller -d C:\labName\studyName -r run121217 -sf mySettings.json
```
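For readers wiring SafeSeqS into their own scripts, the flags in the table above can be mirrored with Python's standard `argparse` module. This is a hypothetical sketch, not the actual parser inside `safeseqs_controller`: the helper names `build_parser` and `results_path` are illustrative, and `-sf` is deliberately given no default because its fallback behavior is determined by SafeSeqS itself.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented SafeSeqS flags (not the real one)."""
    parser = argparse.ArgumentParser(prog="safeseqs_controller")
    parser.add_argument("-d", required=True,
                        help="Full path to the Study Directory containing the input FASTQ files")
    parser.add_argument("-r", required=True,
                        help="Results directory name, created inside the Study Directory")
    # No default here: what happens when -sf is omitted is up to SafeSeqS.
    parser.add_argument("-sf", default=None,
                        help="Settings file name in the safeseqs data folder")
    parser.add_argument("-w", type=int, default=1,
                        help="Number of concurrent processes (default: 1)")
    parser.add_argument("-s", default=None,
                        help="Step name at which processing starts")
    parser.add_argument("-e", default=None,
                        help="Step name before which processing ends")
    return parser


def results_path(args: argparse.Namespace) -> str:
    """The results directory lives inside the Study Directory."""
    return os.path.join(args.d, args.r)


if __name__ == "__main__":
    ns = build_parser().parse_args()
    print(results_path(ns))
```

Because the `-w` description suggests matching the machine's core count, `os.cpu_count()` is one convenient way to choose a value when launching runs from a script.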
An example of each of the following input files is provided in the \example sub-directory of the SafeSeqS installation directory.
- safeseqs.json - A "safeseqs.json" file must be present in the Study Directory identified by the -d option. It describes the study data, including the paired read/barcode FASTQ files, the file associating well barcodes with samples, the UID length used in the study data, and the ASCII offset used to encode read quality scores.
- Study Data FASTQ Files - One or more pairs of read and barcode files must be in the Study Directory. Both uncompressed FASTQ and compressed FASTQ.gz files are supported.
- barcodemap.txt - Tab-delimited file mapping samples to barcodes and primer sets.
- primers.txt - Tab-delimited file containing the primers for the study.
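The ASCII offset recorded in safeseqs.json is the value subtracted from each quality character's code point to obtain its Phred score (commonly 33 for current Illumina data and 64 for some older pipelines). The sketch below is illustrative only and does not reflect SafeSeqS internals: the function names are hypothetical, and it assumes standard four-line FASTQ records. It reads either a plain FASTQ or a gzip-compressed FASTQ.gz file and decodes quality strings with a given offset.

```python
import gzip
from typing import Iterator, List, Tuple


def open_fastq(path: str):
    """Open a FASTQ file in text mode, transparently handling .gz compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each four-line FASTQ record."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def phred_scores(qual: str, ascii_offset: int) -> List[int]:
    """Convert a FASTQ quality string to Phred scores using the study's ASCII offset."""
    return [ord(ch) - ascii_offset for ch in qual]
```

With an offset of 33, the quality string `II#` decodes to Phred scores 40, 40 and 2; using the wrong offset shifts every score by the difference, which is why the offset is recorded per study.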
A settings file must be identified at run time with the -sf argument. Parameters in the settings file control decision making during processing runs. A settings.json file is delivered with the package; it can be copied and modified to create different run-settings profiles.
| Parameter | Required/Optional | Description |
|---|---|---|
| max_mismatches_for_used_reads | Required | Integer. Example: 3 |
| max_indels_for_used_reads | Required | Integer. Example: 1 |
| max_amp_per_UID_family | Required | Integer. Example: 1 |
| min_good_reads_usable_family | Required | Integer. Example: 2 |
| min_perc_good_reads_per_UID_family | Required | Integer percentage. Example: 95 (= 95%) |
| super_mut_perc_homegeneity | Required | Integer percentage. Example: 90 (= 90%) |
| default_indel_rate | Required | Float. Example: 0.001 |
| default_sbs_rate | Required | Float. Example: 0.001 |
| mark_UIDs_with_Ns_UnUsable | Optional | Valid values: Yes or No. Default: No |
| perform_opt_dup_removal | Optional | Valid values: Yes or No. Default: No |
| opt_dup_distance | Required if perform_opt_dup_removal is Yes | Integer. Example: 5000 |
| fh_limit | Required | File handle limit for the system. Example: 2048 |
| load_bad_bc | Optional | Valid values: Yes or No. Default: No |
| load_not_used_bc | Optional | Valid values: Yes or No. Default: No |
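Checking a settings file before launching a run can catch missing or mistyped parameters early. The sketch below validates a settings dictionary against the table above. It assumes, as an unverified illustration, that the settings file is a flat JSON object whose integer and float parameters are stored as JSON numbers and whose Yes/No options are stored as the strings "Yes" and "No"; `validate_settings` and `load_settings` are hypothetical helpers, not part of the safeseqs package. Key names are copied exactly as they appear in the table.

```python
import json
from typing import Any, Dict

# Key lists taken from the settings table; value encodings are assumptions.
REQUIRED_INT = [
    "max_mismatches_for_used_reads",
    "max_indels_for_used_reads",
    "max_amp_per_UID_family",
    "min_good_reads_usable_family",
    "min_perc_good_reads_per_UID_family",
    "super_mut_perc_homegeneity",
    "fh_limit",
]
REQUIRED_FLOAT = ["default_indel_rate", "default_sbs_rate"]
OPTIONAL_YES_NO = [
    "mark_UIDs_with_Ns_UnUsable",
    "perform_opt_dup_removal",
    "load_bad_bc",
    "load_not_used_bc",
]


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with Yes/No defaults applied; raise ValueError on problems."""
    result = dict(settings)
    for key in REQUIRED_INT:
        if not _is_int(result.get(key)):
            raise ValueError(f"{key} must be an integer")
    for key in REQUIRED_FLOAT:
        value = result.get(key)
        if not (_is_int(value) or isinstance(value, float)):
            raise ValueError(f"{key} must be a number")
    for key in OPTIONAL_YES_NO:
        value = result.setdefault(key, "No")  # documented default is No
        if value not in ("Yes", "No"):
            raise ValueError(f"{key} must be Yes or No")
    # opt_dup_distance is only required when duplicate removal is enabled.
    if result["perform_opt_dup_removal"] == "Yes" and not _is_int(result.get("opt_dup_distance")):
        raise ValueError("opt_dup_distance is required when perform_opt_dup_removal is Yes")
    return result


def load_settings(path: str) -> Dict[str, Any]:
    """Read a JSON settings file and validate it."""
    with open(path) as handle:
        return validate_settings(json.load(handle))
```

Returning a copy rather than mutating the caller's dictionary keeps the defaults explicit without altering the parsed file contents.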
An example of each of the following files is provided in the \data sub-directory of the SafeSeqS installation directory. Each example file is empty apart from the appropriate file headers.
- COSMIC.txt - Tab-delimited file containing known COSMIC single-base-pair mutations.
- dbSNP.txt - Tab-delimited file containing dbSNP single-base-pair mutations. It is used to allow for expected variants when calculating mismatches.