Quick Start
SafeSeqS accepts the run time parameters listed below. It requires (1) a set of input files in the Study Directory and (2) a results directory.
| Parameter | Required/Optional | Description |
|---|---|---|
| -d | Required | Full path to the Study Directory containing the input FASTQ files. |
| -r | Required | Results directory. This directory will contain all result files and will be created in the Study Directory. |
| -sf | Optional | Settings file containing the safeseqs configuration parameters. The settings file must exist in the safeseqs\data folder of the directory where safeseqs is installed (usually the Python site-packages directory). A default file, settings.json, is delivered with the package. If different configuration scenarios are needed, create multiple settings files and pass the desired file with this parameter at run time. |
| -w | Optional | Number of concurrent processes to run to speed up processing (usually set to the number of cores on the machine). Default: 1 |
| -s | Optional | Start step name. Used when restarting a partially completed run. Processing will begin at the named step. |
| -e | Optional | End step name. Used when a partial run is desired. Processing will end before the named step. |
Example:

```
python -m safeseqs.safeseqs_controller -d C:\labName\studyName -r run121217
```
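When runs are scripted (for example, one run per settings file), the command line can be assembled from Python. The sketch below is a hypothetical helper, not part of safeseqs: `build_safeseqs_command` and `run_safeseqs` are illustrative names, and it relies only on the `python -m safeseqs.safeseqs_controller` invocation and the flags documented in the table above.

```python
import subprocess
import sys
from typing import List, Optional


def build_safeseqs_command(study_dir: str, results_dir: str,
                           settings_file: Optional[str] = None,
                           workers: Optional[int] = None,
                           start_step: Optional[str] = None,
                           end_step: Optional[str] = None) -> List[str]:
    """Assemble the controller command line from the run time parameters (hypothetical helper)."""
    # -d and -r are required; the optional flags are added only when supplied.
    cmd = [sys.executable, "-m", "safeseqs.safeseqs_controller",
           "-d", study_dir, "-r", results_dir]
    if settings_file is not None:
        cmd += ["-sf", settings_file]  # a settings file name inside safeseqs/data
    if workers is not None:
        if workers < 1:
            raise ValueError("workers (-w) must be at least 1")
        cmd += ["-w", str(workers)]
    if start_step is not None:
        cmd += ["-s", start_step]
    if end_step is not None:
        cmd += ["-e", end_step]
    return cmd


def run_safeseqs(**kwargs) -> None:
    """Launch the controller; subprocess raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_safeseqs_command(**kwargs), check=True)
```

For example, `run_safeseqs(study_dir=r"C:\labName\studyName", results_dir="run121217", workers=os.cpu_count())` would reproduce the command above with `-w` set to the machine's core count (after `import os`).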
The safeseqs controller breaks processing down into multiple steps, listed below. It records a checkpoint as it completes each step and, upon restart, begins processing after the last successful checkpoint. The user may override the checkpoint starting position with the optional '-s' run time parameter, naming the desired starting step. Users may also specify an end step with the '-e' parameter at run time; the controller stops processing before performing that step. Controlling the start and end points can be useful when new optional reference data becomes available or different configuration parameters are desired.
| Step | Description |
|---|---|
| split | Reads all FASTQ input files and splits the reads by barcode. A separate .reads file is then created for each barcode. This step is time consuming and only needs to be performed once. |
| unique | Processes each barcode's .reads file separately. It identifies unique reads for the barcode to reduce the workload for downstream processes. It also associates unique reads with their UID family. |
| align | Compares the test sequence in the read sequence to the target sequence in the primer. It identifies and records any changes. |
| optdup | Optional step to determine whether any read records in the original input were optical duplicates and should be ignored. |
| uidstats | Collects alignment statistics on the UID families to begin assessing whether changes in their reads should be considered super mutants. |
| supermutant | Collects potential super mutant changes for UIDs that fall within the configuration parameter settings. |
| wellsupermutant | Aggregates potential super mutants by well family and evaluates tabulations based on the configuration parameter settings. |
| samplesupermutant | Aggregates statistics for super mutant changes by Sample Name. |
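The interaction between checkpoints, '-s', and '-e' can be sketched as a list slice. This is an illustration of the rules described above, not the controller's code: it assumes the steps run in the order of the table, and `steps_to_run` and `last_checkpoint` are hypothetical names. It does not model whether the optional optdup step is skipped.

```python
from typing import List, Optional

# Assumed execution order: the order of the steps table above.
STEPS = ["split", "unique", "align", "optdup", "uidstats",
         "supermutant", "wellsupermutant", "samplesupermutant"]


def steps_to_run(start: Optional[str] = None,
                 end: Optional[str] = None,
                 last_checkpoint: Optional[str] = None) -> List[str]:
    """Return the steps a run would perform (hypothetical illustration).

    start (-s): begin AT this step, overriding any checkpoint.
    last_checkpoint: otherwise resume AFTER the last successfully completed step.
    end (-e): stop BEFORE this step.
    """
    for name in (start, end, last_checkpoint):
        if name is not None and name not in STEPS:
            raise ValueError(f"unknown step: {name}")
    if start is not None:
        first = STEPS.index(start)
    elif last_checkpoint is not None:
        first = STEPS.index(last_checkpoint) + 1
    else:
        first = 0
    stop = STEPS.index(end) if end is not None else len(STEPS)
    return STEPS[first:stop]
```

For instance, `steps_to_run(start="unique", end="optdup")` returns `["unique", "align"]`: the end step itself is excluded, matching the "stop before" behavior of '-e'.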
An example of each of the following input files is provided in the \example subdirectory where safeseqs is installed.
- safeseqs.json - There must be a "safeseqs.json" file in the Study Directory identified by the -d option. It contains information about the study data, including the paired read/barcode FASTQ files, the well barcode to sample association file, the UID length used in the study data, and the ASCII offset for read quality scores.
- Study Data FASTQ Files - One or more pairs of read and barcode files must be in the Study Directory. Both uncompressed FASTQ and compressed FASTQ.gz files are supported.
- barcodemap.txt - Tab-delimited file mapping samples to barcodes and primer sets.
- primers.txt - Tab-delimited file containing the primers for the study.
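Because both FASTQ and FASTQ.gz inputs are accepted and quality characters depend on the study's ASCII offset, a small reader is handy for spot-checking input files before a long run. This is a standalone sketch using only the Python standard library, not the safeseqs reader; the function names are hypothetical, and the default offset of 33 is the common Phred+33 encoding, so substitute the offset configured in safeseqs.json.

```python
import gzip
from pathlib import Path
from typing import IO, Iterator, List, Tuple


def open_fastq(path) -> IO[str]:
    """Open a FASTQ file as text, decompressing it on the fly if it ends in .gz."""
    path = Path(path)
    if path.suffix.lower() == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(path) -> Iterator[Tuple[str, str, str]]:
    """Yield (read id, sequence, quality string) for each 4-line FASTQ record."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                break  # end of file (or a trailing blank line)
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline().rstrip("\r\n")
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield header[1:], seq, qual


def decode_quality(qual: str, ascii_offset: int = 33) -> List[int]:
    """Convert quality characters to integer scores: score = ord(char) - offset."""
    return [ord(ch) - ascii_offset for ch in qual]
```

With an offset of 33, the quality string `"I#"` decodes to `[40, 2]`.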
A settings file, settings.json, is located in the safeseqs/data directory where safeseqs is installed. Changes made to this file are used the next time safeseqs is run. To keep multiple settings files for different types of processing runs, place them all in the safeseqs/data directory and select one at run time with the -sf command line option.
| Setting | Required/Optional | Description |
|---|---|---|
| max_mismatches_for_used_reads | Required | Integer. Example: 3 |
| max_indels_for_used_reads | Required | Integer. Example: 1 |
| max_amp_per_UID_family | Required | Integer. Example: 1 |
| min_good_reads_usable_family | Required | Integer. Example: 2 |
| min_perc_good_reads_per_UID_family | Required | Integer. Example: 95 = 95% |
| super_mut_perc_homegeneity | Required | Integer. Example: 90 = 90% |
| default_indel_rate | Required | Float. Example: .001 |
| default_sbs_rate | Required | Float. Example: .001 |
| mark_UIDs_with_Ns_UnUsable | Optional | Valid values: Yes or No. Default: No |
| perform_opt_dup_removal | Optional | Valid values: Yes or No. Default: No |
| opt_dup_distance | Required with perform_opt_dup_removal | Integer. Example: 5000 |
| fh_limit | Required | File handle limit for the system. Example: 2048 |
| load_bad_bc | Optional | Valid values: Yes or No. Default: No |
| load_not_used_bc | Optional | Valid values: Yes or No. Default: No |
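Before switching between settings files, a quick check against the table above can catch typos early. The sketch below is hypothetical and not part of safeseqs: it assumes settings.json is a flat JSON object keyed by the setting names above, with numbers stored as JSON numbers and the Yes/No options stored as strings. Inspect the delivered settings.json to confirm this layout. On a standard install, the file may be locatable with `importlib.resources.files("safeseqs") / "data" / "settings.json"`, which is also an assumption about the package layout.

```python
import json
from typing import Any, Dict, List

# Setting names and types taken from the settings table; the JSON layout is assumed.
INT_SETTINGS = ["max_mismatches_for_used_reads", "max_indels_for_used_reads",
                "max_amp_per_UID_family", "min_good_reads_usable_family",
                "min_perc_good_reads_per_UID_family", "super_mut_perc_homegeneity",
                "fh_limit"]
FLOAT_SETTINGS = ["default_indel_rate", "default_sbs_rate"]
YES_NO_SETTINGS = ["mark_UIDs_with_Ns_UnUsable", "perform_opt_dup_removal",
                   "load_bad_bc", "load_not_used_bc"]


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems in a settings dictionary (empty if none found)."""
    problems = []
    for key in INT_SETTINGS:
        if not _is_int(settings.get(key)):
            problems.append(f"{key}: required integer, got {settings.get(key)!r}")
    for key in FLOAT_SETTINGS:
        value = settings.get(key)
        if not (_is_int(value) or isinstance(value, float)):
            problems.append(f"{key}: required number, got {value!r}")
    for key in YES_NO_SETTINGS:
        value = settings.get(key, "No")  # optional; defaults to No
        if value not in ("Yes", "No"):
            problems.append(f"{key}: must be Yes or No, got {value!r}")
    # opt_dup_distance is only required when optical duplicate removal is on.
    if settings.get("perform_opt_dup_removal", "No") == "Yes":
        if not _is_int(settings.get("opt_dup_distance")):
            problems.append("opt_dup_distance: required integer when "
                            "perform_opt_dup_removal is Yes")
    return problems


def load_settings(path) -> Dict[str, Any]:
    """Load a settings file and raise ValueError listing every problem found."""
    with open(path) as fh:
        settings = json.load(fh)
    problems = validate_settings(settings)
    if problems:
        raise ValueError("invalid settings: " + "; ".join(problems))
    return settings
```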
An example of each of the following reference files is provided in the \data subdirectory where safeseqs is installed. Each example file is empty apart from its column headers.
- COSMIC.txt - Tab-delimited file containing known COSMIC single-base-pair mutations.
- dbSNP.txt - Tab-delimited file containing dbSNP single-base-pair mutations. It is used to allow for expected variants when calculating mismatches.
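The idea of allowing for expected variants can be sketched as a set lookup: changes found in a read are only counted as mismatches when they are not listed as known variants. This is a conceptual illustration, not the safeseqs mismatch calculation. The key `(chromosome, position, base_from, base_to)` and the column names `Chrom`, `Position`, `BaseFrom`, and `BaseTo` are hypothetical placeholders; match them to the headers in the delivered dbSNP.txt.

```python
import csv
from typing import Iterable, Set, Tuple

# Hypothetical key for a single-base change: (chromosome, position, base_from, base_to).
Variant = Tuple[str, int, str, str]


def load_known_variants(path, chrom_col="Chrom", pos_col="Position",
                        from_col="BaseFrom", to_col="BaseTo") -> Set[Variant]:
    """Load a tab-delimited variant file into a set of (chrom, pos, from, to) keys.

    The default column names are placeholders, not the real dbSNP.txt headers.
    """
    with open(path, newline="") as fh:
        return {(row[chrom_col], int(row[pos_col]), row[from_col], row[to_col])
                for row in csv.DictReader(fh, delimiter="\t")}


def count_unexpected_mismatches(read_changes: Iterable[Variant],
                                known_variants: Set[Variant]) -> int:
    """Count the changes in a read that are not listed as known variants."""
    return sum(1 for change in read_changes if change not in known_variants)
```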
The safeseqs process identifies super mutants by sample. The controller creates a \superMutants subdirectory under \Study Directory\Results Directory, as specified by the run time parameters. A file is created in this subdirectory for each sample in the barcodemap.txt file provided. Each output file contains the following columns:
| Column | Description |
|---|---|
| Template | - |
| Purpose | - |
| GEsWellOrTotalULUsed | - |
| WellAmpMatchName | - |
| PrimerAmpMatchName | - |
| SampleWells | - |
| SampleWellsPositiveForAmplimer | - |
| SumTotalUIDsForAmplimer | - |
| SumGoodReadsForAmplimer | - |
| MinTotalUIDsForAmplimerPerWell | - |
| AverageTotalUIDsForAmplimerPerWell | - |
| MaxTotalUIDsForAmplimerPerWell | - |
| MinSumGoodReadsForAmplimerPerWell | - |
| AverageSumGoodReadsForAmplimerPerWell | - |
| MaxSumGoodReadsForAmplimerPerWell | - |
| Chrom | - |
| MutType | - |
| BaseFrom | - |
| BaseTo | - |
| MutantWellCount | - |
| MutantDistinctUidCount | - |
| SumMutReadCounts | - |
| SumFamilyGoodReadCountFromMutantFamilies | - |
| MinMutantDistinctUidCountPerWell | - |
| AverageMutantDistinctUidCountPerWell | - |
| MaxMutantDistinctUidCountPerWell | - |
| MinMutCount | - |
| AverageMutCount | - |
| MaxMutCount | - |
| MinGoodReadCountInMutantWells | - |
| AverageFamilyGoodReadCountInMutantWells | - |
| MaxFamilyGoodReadCountInMutantWells | - |
| MinMismatchCountInMutantReads | - |
| AverageMismatchCountInMutantReads | - |
| MaxMismatchCountInMutantReads | - |
| MinCorrectedMismatchCountInMutantReads | - |
| AverageCorrectedMismatchCountInMutantReads | - |
| MaxCorrectedMismatchCountInMutantReads | - |
| MinIndelCountInMutantReads | - |
| AverageIndelCountInMutantReadsInMutantReads | - |
| MaxIndelCountInMutantReadsInMutantReads | - |
| MinInsertedBasesInMutantReads | - |
| AverageInsertedBasesInMutantReads | - |
| MaxInsertedBasesInMutantReads | - |
| MinDeletedBasesInMutantReads | - |
| AverageDeletedBasesInMutantReads | - |
| MaxDeletedBasesInMutantReads | - |
| MinQualityScoreForMutation | - |
| AverageQualityScoreForMutation | - |
| MaxQualityScoreForMutation | - |
| Min%MutantByAllReadsByWellInMutantWells | - |
| Average%MutantByAllReadsByWellInMutantWells | - |
| Max%MutantByAllReadsByWellInMutantWells | - |
| Min%MutantByUidFAmiliesByWellInMutantWells | - |
| Average%MutantByUidFAmiliesByWellInMutantWells | - |
| Max%MutantByUidFAmiliesByWellInMutantWells | - |
| MinSumGoodReadsByWellInMutantWells | - |
| AverageSumGoodReadsByWellInMutantWells | - |
| MaxSumGoodReadsByWellInMutantWells | - |
| MinTotalUIDsByWellInMutantWells | - |
| AverageTotalUIDsByWellInMutantWells | - |
| MaxTotalUIDsByWellInMutantWells | - |
| SumGoodReadsFromMutantWells | - |
| TotalUIDsFromMutantWells | - |
| %WellsWithMutation | - |
| %MutantByAllReads | - |
| %MutantByUidFAmilies | - |
| %BackgroundRate | - |
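For downstream analysis, the per-sample files can be gathered into one structure. This sketch is hypothetical: it assumes each file in the superMutants subdirectory is tab-delimited with a header row of the column names above (the delimiter is not stated here, so check an actual output file), and it keys results by file name stem rather than by a known naming convention. `load_supermutants` and `rows_with_min_wells` are illustrative names.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_supermutants(study_dir, results_dir) -> Dict[str, List[Dict[str, str]]]:
    """Read every file in <study_dir>/<results_dir>/superMutants into row dictionaries.

    Assumes tab-delimited files with a header row; values are left as strings.
    Returns {file name stem: [row, ...]}.
    """
    out_dir = Path(study_dir) / results_dir / "superMutants"
    results: Dict[str, List[Dict[str, str]]] = {}
    for path in sorted(out_dir.iterdir()):
        if not path.is_file():
            continue
        with open(path, newline="") as fh:
            results[path.stem] = list(csv.DictReader(fh, delimiter="\t"))
    return results


def rows_with_min_wells(rows: List[Dict[str, str]], min_wells: int) -> List[Dict[str, str]]:
    """Keep rows whose MutantWellCount is at least min_wells."""
    return [row for row in rows if int(row["MutantWellCount"]) >= min_wells]
```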