Home
Use pip to install the safeseqs package:
$ pip install safeseqs
SafeSeqS depends on several other packages and these will be installed automatically by pip if they are not already installed.
Required packages:
- scipy
- pywin32 (on Windows machines)
Running SafeSeqS requires (1) a set of input files in the Study Directory and (2) a settings JSON file that identifies the study parameters.
| Argument | Required/Optional | Description |
| --- | --- | --- |
| -d | Required | Directory containing the Study input files. |
| -r | Required | Run directory. It will be created under the Study directory. |
| -sf | Required | Settings file that identifies the parameters for this run. The settings file must exist in the SafeSeqS Data Directory. |
| -w | Optional | Number of concurrent worker sub-processes to run. Default: 1 |
| -s | Optional | Start step name. Used when restarting a partially completed run. Processing will begin at the named step. |
| -e | Optional | End step name. Used when a partial run is desired. Processing will end before the named step. |
Example: the input files are in the directory C:\SAFESEQS\input, and the results are to be stored in the sub-directory Nov09 under C:\SAFESEQS\input. The file SettingsTemplate.json must exist in the SafeSeqS Project Data directory and determines the run-time settings. Up to 6 sub-processes may run concurrently.
python -m safeseqs_controller -d C:\SAFESEQS\input -r Nov09 -sf SettingsTemplate.json -w 6
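When several runs are scripted, the same command line can be assembled from Python. The following is a minimal sketch that uses only the flags listed in the table above; the helper name `build_safeseqs_command` is hypothetical and is not part of the safeseqs package.

```python
import sys


def build_safeseqs_command(study_dir, run_dir, settings_file, workers=1,
                           start_step=None, end_step=None):
    """Return the argument list for one safeseqs_controller run.

    Hypothetical helper: it only mirrors the documented flags
    (-d, -r, -sf, -w, -s, -e) and does not validate step names.
    """
    cmd = [
        sys.executable, "-m", "safeseqs_controller",
        "-d", str(study_dir),      # Study directory with the input files
        "-r", run_dir,             # run directory, created under the Study directory
        "-sf", settings_file,      # settings JSON file for this run
        "-w", str(workers),        # number of concurrent worker sub-processes
    ]
    if start_step is not None:
        cmd += ["-s", start_step]  # begin processing at this step
    if end_step is not None:
        cmd += ["-e", end_step]    # stop before this step
    return cmd


if __name__ == "__main__":
    cmd = build_safeseqs_command(r"C:\SAFESEQS\input", "Nov09",
                                 "SettingsTemplate.json", workers=6)
    print(" ".join(cmd))
    # To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with Windows paths.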
- safeseqs.json - File identifying the Fastq input files, the barcodemap filename, and the ASCII adjustment used in the study data. Format of the file is:
{"reads_pattern" : "", "barcodes_pattern" : "", "reads_files" : ["fastq file 1 (just file name - must be in Study Directory)", "fastq file 2", ... ], "barcodes_files" : ["barcode file 1", "barcode file 2", ...], "barcodemap" : "well barcode association file", "ascii_adj" : 33}
- Study Data Fastq Files - One or more sets of reads and barcode files.
- barcodemap.txt - Tab-delimited file containing the mapping of samples to barcodes and primer sets. BarcodeMapRecord format: 'barcodeNumber', 'barcode', 'wbcPlateNumber', 'template', 'purpose', 'gEsWellOrTotalULUsed', 'mutOrTotalGEsWell', 'ampMatchName', 'row', 'col'.
- primers.txt - Tab-delimited file containing the primers for the study. PrimerRecord format: 'ampMatchName', 'gene', 'read1', 'read2', 'ampSeq', 'target_len', 'chrom', 'readStrand', 'hg19_start', 'hg19_end'.
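The input files listed above can be prepared and sanity-checked with a short script. This is a sketch under stated assumptions, not the package's own loader: the function names are hypothetical, primers.txt is assumed to have no header row, and every field is kept as a string.

```python
import csv
import json
from collections import namedtuple
from pathlib import Path

# Field order copied from the PrimerRecord format described above.
PRIMER_FIELDS = ["ampMatchName", "gene", "read1", "read2", "ampSeq",
                 "target_len", "chrom", "readStrand", "hg19_start", "hg19_end"]
PrimerRecord = namedtuple("PrimerRecord", PRIMER_FIELDS)


def write_study_config(study_dir, reads_files, barcodes_files,
                       barcodemap="barcodemap.txt", ascii_adj=33):
    """Write safeseqs.json into the Study Directory and return its path.

    File names are given without a directory; the files themselves
    must be in the Study Directory.
    """
    config = {
        "reads_pattern": "",
        "barcodes_pattern": "",
        "reads_files": list(reads_files),
        "barcodes_files": list(barcodes_files),
        "barcodemap": barcodemap,
        "ascii_adj": ascii_adj,
    }
    path = Path(study_dir) / "safeseqs.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def read_primers(path):
    """Read a tab-delimited primers file into PrimerRecord tuples.

    Assumption: no header row. Blank lines are skipped, and a row with
    the wrong number of columns raises ValueError.
    """
    records = []
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row:
                continue
            if len(row) != len(PRIMER_FIELDS):
                raise ValueError(
                    f"line {line_no}: expected {len(PRIMER_FIELDS)} columns, got {len(row)}")
            records.append(PrimerRecord(*row))
    return records
```

Checking the column count before a run catches malformed rows early, which is cheaper than discovering them partway through processing.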
Parameters in the settings file control decision making during processing runs. A SettingsTemplate.json file is delivered with the package and is located in the Study Data Directory. It can be cloned and modified to create different run scenarios.
| Parameter | Required/Optional | Description |
| --- | --- | --- |
| uidlength | Required | Length of the UID in the study data. |
| max_mismatches_for_used_reads | Required | Integer. Example: 3 |
| max_indels_for_used_reads | Required | Integer. Example: 1 |
| max_amp_per_UID_family | Required | Integer. Example: 1 |
| min_good_reads_usable_family | Required | Integer. Example: 2 |
| min_perc_good_reads_per_UID_family | Required | Integer. Example: 95 = 95% |
| super_mut_perc_homegeneity | Required | Integer. Example: 90 = 90% |
| default_indel_rate | Required | Float. Example: 0.001 |
| default_sbs_rate | Required | Float. Example: 0.001 |
| mark_UIDs_with_Ns_UnUsable | Optional | Valid values: Yes or No. Default: No |
| perform_opt_dup_removal | Optional | Valid values: Yes or No. Default: No |
| load_bad_bc | Optional | Valid values: Yes or No. Default: No |
| load_not_used_bc | Optional | Valid values: Yes or No. Default: No |
| save_merge | Optional | Valid values: Yes or No. Default: No |
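A cloned settings file can be checked against the table above before a run starts. The sketch below is not part of safeseqs, and it rests on assumptions the table does not confirm: that the parameters are top-level keys of the JSON object, that numeric values are stored as JSON numbers, that uidlength is an integer, and that Yes/No values are stored as strings.

```python
import json

# Parameter names copied from the settings table; the types are assumptions.
REQUIRED_INT = [
    "uidlength",
    "max_mismatches_for_used_reads",
    "max_indels_for_used_reads",
    "max_amp_per_UID_family",
    "min_good_reads_usable_family",
    "min_perc_good_reads_per_UID_family",
    "super_mut_perc_homegeneity",
]
REQUIRED_FLOAT = ["default_indel_rate", "default_sbs_rate"]
OPTIONAL_YES_NO = [
    "mark_UIDs_with_Ns_UnUsable",
    "perform_opt_dup_removal",
    "load_bad_bc",
    "load_not_used_bc",
    "save_merge",
]


def check_settings(settings):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for key in REQUIRED_INT:
        value = settings.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key}: expected an integer, got {value!r}")
    for key in REQUIRED_FLOAT:
        value = settings.get(key)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key}: expected a number, got {value!r}")
    for key in OPTIONAL_YES_NO:
        value = settings.get(key, "No")  # documented default is No
        if value not in ("Yes", "No"):
            problems.append(f"{key}: expected Yes or No, got {value!r}")
    return problems


def check_settings_file(path):
    """Load a settings JSON file and run check_settings on it."""
    with open(path) as handle:
        return check_settings(json.load(handle))
```

For example, `check_settings_file("MyRun.json")` (a hypothetical clone of SettingsTemplate.json) returns the list of problems found, so a wrapper script can refuse to start a run when the list is not empty.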
- COSMIC.txt
- SNP.txt