diff --git a/README.md b/README.md
index d09449cae..37e069b62 100644
--- a/README.md
+++ b/README.md
@@ -98,15 +98,17 @@ To see details on all the arguments that the program accepts, run:
 design.py --help
 ```
 
-[`design.py`](./bin/design.py) requires one or more `dataset`s that specify input sequence data to target:
+[`design.py`](./bin/design.py) requires one or more `dataset`s that specify input sequence data to target, as well as a path to which the probe sequences are written:
 
 ```bash
-design.py [dataset] [dataset ...]
+design.py [dataset] [dataset ...] -o OUTPUT
 ```
 
 Each `dataset` can be a path to a FASTA file. If you [downloaded](#downloading-viral-sequence-data) viral sequence data, it can also simply be a label for one of [550+ viral datasets](./catch/datasets/README.md) (e.g., `human_immunodeficiency_virus_1` or `zika`) distributed as part of this package.
 Each of these datasets includes all available whole genomes (genome neighbors) in [NCBI's viral genome data](https://www.ncbi.nlm.nih.gov/genome/viruses/) for a species that has human as a host, as of Oct. 2018.
 
+The probe sequences are written to OUTPUT in FASTA format.
+
 Below is a summary of some useful arguments to `design.py`:
 
 * `-pl PROBE_LENGTH`/`-ps PROBE_STRIDE`: Design probes to be PROBE_LENGTH nt long, and generate candidate probes using a stride of PROBE_STRIDE nt.
@@ -136,7 +138,6 @@ If not set, CATCH uses its default model of hybridization based on `-m/--mismatc
 * `--filter-with-lsh-hamming FILTER_WITH_LSH_HAMMING`/`--filter-with-lsh-minhash FILTER_WITH_LSH_MINHASH`: Use locality-sensitive hashing to reduce the space of candidate probes.
 This can significantly improve runtime and memory requirements when the input is especially large and diverse.
 See `design.py --help` for details on using these options and downsides.
-* `-o OUTPUT`: Write probe sequences in FASTA format to OUTPUT.
 
 ### Pooling across many runs ([`pool.py`](./bin/pool.py))
 
diff --git a/bin/design.py b/bin/design.py
index a159cfc2f..f5dd4b5c2 100755
--- a/bin/design.py
+++ b/bin/design.py
@@ -275,9 +275,8 @@ def main(args):
                                       allow_small_seqs=args.small_seq_min)
     pb.design()
 
-    if args.output_probes:
-        # Write the final probes to the file args.output_probes
-        seq_io.write_probe_fasta(pb.final_probes, args.output_probes)
+    # Write the final probes to the file args.output_probes
+    seq_io.write_probe_fasta(pb.final_probes, args.output_probes)
 
     if (args.print_analysis or args.write_analysis_to_tsv or
             args.write_sliding_window_coverage):
@@ -320,6 +319,12 @@ def main(args):
               "'collection:' (e.g., 'collection:viruses_with_human_host'), "
               "then this reads from an available collection of datasets."))
 
+    # Outputting probes
+    parser.add_argument('-o', '--output-probes',
+        required=True,
+        help=("The file to which all final probes should be "
+              "written; they are written in FASTA format"))
+
     # Parameters on probe length and stride
     parser.add_argument('-pl', '--probe-length',
         type=int,
@@ -473,11 +478,7 @@ def check_coverage(val):
               "are blacklisted. See --custom-hybridization-fn for details "
               "of how this function should be implemented and provided."))
 
-    # Outputting probe sequences and coverage analyses
-    parser.add_argument('-o', '--output-probes',
-        help=("(Optional) The file to which all final probes should be "
-              "written; if not specified, the final probes are not "
-              "written to a file"))
+    # Outputting coverage analyses
     parser.add_argument('--print-analysis',
         dest="print_analysis",
         action="store_true",
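
Note on the behavior change: with `required=True`, argparse rejects any invocation that omits `-o`/`--output-probes` before `main()` runs, which is why the `if args.output_probes:` guard can be dropped. The sketch below illustrates this with a stripped-down stand-in parser. `build_parser` is a hypothetical helper, not part of `bin/design.py`, and the `dataset` positional with `nargs='+'` is an assumption about how the real script declares it.

```python
import argparse


def build_parser():
    # Hypothetical, minimal stand-in for the parser built in bin/design.py.
    parser = argparse.ArgumentParser(prog='design.py')
    # Assumption: the real script takes one or more positional datasets.
    parser.add_argument('dataset', nargs='+',
        help="Path to a FASTA file or a dataset label")
    # Mirrors the argument added by this change.
    parser.add_argument('-o', '--output-probes',
        required=True,
        help=("The file to which all final probes should be "
              "written; they are written in FASTA format"))
    return parser


if __name__ == "__main__":
    parser = build_parser()

    # With -o given, the path is available as args.output_probes.
    args = parser.parse_args(['zika', '-o', 'probes.fasta'])
    print(args.output_probes)

    # Without -o, argparse prints a usage error to stderr and raises
    # SystemExit, so the program stops before any probes are designed.
    try:
        parser.parse_args(['zika'])
    except SystemExit as exc:
        print("argparse exited with status", exc.code)
```

Callers that previously ran `design.py` without `-o` (for example, only to print a coverage analysis) would need to pass an output path after this change.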