Merged
7 changes: 4 additions & 3 deletions README.md
@@ -98,15 +98,17 @@ To see details on all the arguments that the program accepts, run:
design.py --help
```

-[`design.py`](./bin/design.py) requires one or more `dataset`s that specify input sequence data to target:
+[`design.py`](./bin/design.py) requires one or more `dataset`s that specify input sequence data to target, as well as a path to which the probe sequences are written:

```bash
-design.py [dataset] [dataset ...]
+design.py [dataset] [dataset ...] -o OUTPUT
```

Each `dataset` can be a path to a FASTA file. If you [downloaded](#downloading-viral-sequence-data) viral sequence data, it can also simply be a label for one of [550+ viral datasets](./catch/datasets/README.md) (e.g., `human_immunodeficiency_virus_1` or `zika`) distributed as part of this package.
Each of these datasets includes all available whole genomes (genome neighbors) in [NCBI's viral genome data](https://www.ncbi.nlm.nih.gov/genome/viruses/) for a species that has human as a host, as of Oct. 2018.

+The probe sequences are written to OUTPUT in FASTA format.
+
Below is a summary of some useful arguments to `design.py`:

* `-pl PROBE_LENGTH`/`-ps PROBE_STRIDE`: Design probes to be PROBE_LENGTH nt long, and generate candidate probes using a stride of PROBE_STRIDE nt.
@@ -136,7 +138,6 @@ If not set, CATCH uses its default model of hybridization based on `-m/--mismatc
* `--filter-with-lsh-hamming FILTER_WITH_LSH_HAMMING`/`--filter-with-lsh-minhash FILTER_WITH_LSH_MINHASH`: Use locality-sensitive hashing to reduce the space of candidate probes.
This can significantly improve runtime and memory requirements when the input is especially large and diverse.
See `design.py --help` for details on using these options and downsides.
-* `-o OUTPUT`: Write probe sequences in FASTA format to OUTPUT.

### Pooling across many runs ([`pool.py`](./bin/pool.py))

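The README change above states that probes are written to OUTPUT in FASTA format. As a rough illustration of consuming such a file, here is a minimal sketch of a FASTA reader in plain Python. The function name `read_fasta` and the record names `probe_1` and `probe_2` are assumptions made for this example; they are not taken from CATCH, and the headers CATCH actually writes may differ.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, ''.join(chunks)
            name, chunks = line[1:], []
        else:
            # Sequence data may be wrapped across several lines
            chunks.append(line)
    if name is not None:
        yield name, ''.join(chunks)


# Hypothetical file contents standing in for the OUTPUT file
example = io.StringIO(">probe_1\nACGT\nACGT\n>probe_2\nTTGCA\n")
probes = list(read_fasta(example))
print(len(probes))  # 2
```

To read a real output file, the same function could be given `open('probes.fasta')` instead of the in-memory handle.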
17 changes: 9 additions & 8 deletions bin/design.py
@@ -275,9 +275,8 @@ def main(args):
allow_small_seqs=args.small_seq_min)
pb.design()

-if args.output_probes:
-    # Write the final probes to the file args.output_probes
-    seq_io.write_probe_fasta(pb.final_probes, args.output_probes)
+# Write the final probes to the file args.output_probes
+seq_io.write_probe_fasta(pb.final_probes, args.output_probes)

if (args.print_analysis or args.write_analysis_to_tsv or
args.write_sliding_window_coverage):
@@ -320,6 +319,12 @@ def main(args):
"'collection:' (e.g., 'collection:viruses_with_human_host'), "
"then this reads from an available collection of datasets."))

+# Outputting probes
+parser.add_argument('-o', '--output-probes',
+    required=True,
+    help=("The file to which all final probes should be "
+          "written; they are written in FASTA format"))
+
# Parameters on probe length and stride
parser.add_argument('-pl', '--probe-length',
type=int,
@@ -473,11 +478,7 @@ def check_coverage(val):
"are blacklisted. See --custom-hybridization-fn for details "
"of how this function should be implemented and provided."))

-# Outputting probe sequences and coverage analyses
-parser.add_argument('-o', '--output-probes',
-    help=("(Optional) The file to which all final probes should be "
-          "written; if not specified, the final probes are not "
-          "written to a file"))
+# Outputting coverage analyses
parser.add_argument('--print-analysis',
dest="print_analysis",
action="store_true",
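The user-visible effect of marking `-o/--output-probes` as `required=True` can be sketched with a stripped-down parser. This is an illustration only: `build_parser` and `exit_code_for` are hypothetical helpers, and the real `design.py` defines many more arguments than the two shown here. The sketch relies on standard `argparse` behavior, which prints a usage error to stderr and exits with status 2 when a required option is missing.

```python
import argparse


def build_parser():
    # Hypothetical, minimal parser; not the full design.py argument set
    parser = argparse.ArgumentParser(prog='design.py')
    parser.add_argument('dataset', nargs='+',
                        help="One or more dataset labels or FASTA paths")
    parser.add_argument('-o', '--output-probes',
                        required=True,
                        help=("The file to which all final probes should be "
                              "written; they are written in FASTA format"))
    return parser


def exit_code_for(argv):
    """Return 0 if argv parses, otherwise the exit code argparse raises."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as e:
        return e.code
    return 0


args = build_parser().parse_args(['zika', '-o', 'probes.fasta'])
print(args.output_probes)        # probes.fasta
print(exit_code_for(['zika']))   # 2 (argparse reports the missing -o on stderr)
```

Because the option is now mandatory, the unconditional call to `seq_io.write_probe_fasta` in `main` no longer needs the `if args.output_probes:` guard that this change removes.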