ENH: adds --p-n-threads option to trim-alignment #174

Merged 4 commits on Jun 5, 2024
11 changes: 7 additions & 4 deletions rescript/plugin_setup.py
@@ -1056,23 +1056,26 @@
'primer_fwd': Str,
'primer_rev': Str,
'position_start': Int % Range(1, None),
-'position_end': Int % Range(1, None)
+'position_end': Int % Range(1, None),
+'n_threads': Int % Range(1, None),
},
outputs=[('trimmed_sequences', FeatureData[AlignedSequence]), ],
input_descriptions={'aligned_sequences': 'Aligned DNA sequences.', },
parameter_descriptions={
'primer_fwd': 'Forward primer used to find the start position '
-'for alignment trimming.',
+'for alignment trimming. Provide as 5\'-3\'.',
'primer_rev': 'Reverse primer used to find the end position '
-'for alignment trimming.',
+'for alignment trimming. Provide as 5\'-3\'.',
'position_start': 'Position within the alignment where the trimming '
'will begin. If not provided, alignment will not'
'be trimmed at the beginning. If forward primer is'
'specified this parameter will be ignored.',
'position_end': 'Position within the alignment where the trimming '
'will end. If not provided, alignment will not be '
'trimmed at the end. If reverse primer is specified '
-'this parameter will be ignored.'
+'this parameter will be ignored.',
+'n_threads': 'The number of threads. (Use `auto` to automatically use '
+'all available cores)'
},
output_descriptions={
'trimmed_sequences': 'Trimmed sequence alignment.', },
15 changes: 10 additions & 5 deletions rescript/trim_alignment.py
@@ -190,11 +190,13 @@ def _trim_alignment(expand_alignment_action,
primer_fwd=None,
primer_rev=None,
position_start=None,
-position_end=None) -> AlignedDNAFASTAFormat:
+position_end=None,
+n_threads=1) -> AlignedDNAFASTAFormat:
"""
Trim alignment based on primer alignment or explicitly specified
positions. When at least one primer sequence is given, primer-based
-trimming will be performed, otherwise position-based trimming is done.
+trimming will be performed via the `addfragments` option of mafft,
+otherwise position-based trimming is done.
Collaborator
As far as I can tell, the n_threads param has no effect when position-based trimming is done, so this should be mentioned in the param description.

Collaborator Author
RE param description: Oooh, good catch! Will fix!

RE multithreading: Yes, this does scale well. The table below is based on ~5k sequences randomly subsampled from SILVA's 50k-position FASTA alignment file, run on my M1 Max laptop, and mirrors @colinbrislawn's results.

| threads | time        |
|---------|-------------|
| 8       | 6m 40.278s  |
| 4       | 9m 20.686s  |
| 2       | 14m 36.920s |
| 1       | 25m 13.873s |

Example command:

qiime rescript trim-alignment \
    --i-aligned-sequences silva-138-1-nr99-aln-dna-ss01.qza \
    --p-primer-fwd GTGYCAGCMGCCGCGGTAA \
    --p-primer-rev CCGYCAATTYMTTTRAGTTT \
    --p-n-threads 8 \
    --o-trimmed-sequences silva-138-1-nr99-aln-dna-ss01-v4v5-trimmed.qza \
    --verbose
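
To make the scaling claim above easier to check, the reported timings can be converted into speedup and parallel efficiency relative to the single-thread run. This is only a small sketch: the helper names `to_seconds` and `scaling_summary` are made up for illustration and are not part of RESCRIPt, and the numbers are just the four wall-clock times from the table.

```python
import re

# Wall-clock times copied from the table above (threads -> reported time).
TIMINGS = {
    8: "6m 40.278s",
    4: "9m 20.686s",
    2: "14m 36.920s",
    1: "25m 13.873s",
}


def to_seconds(text):
    """Convert a string such as '6m 40.278s' into seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def scaling_summary(timings):
    """Return {threads: (seconds, speedup, efficiency)} relative to 1 thread."""
    baseline = to_seconds(timings[1])
    summary = {}
    for threads in sorted(timings):
        seconds = to_seconds(timings[threads])
        speedup = baseline / seconds
        # Efficiency is the speedup divided by the number of threads used.
        summary[threads] = (seconds, speedup, speedup / threads)
    return summary


if __name__ == "__main__":
    for threads, (seconds, speedup, eff) in scaling_summary(TIMINGS).items():
        print(f"{threads} threads: {seconds:8.1f}s  "
              f"speedup {speedup:.2f}x  efficiency {eff:.0%}")
```

On these four measurements the speedup over one thread comes out at roughly 1.7x, 2.7x and 3.8x for 2, 4 and 8 threads, so each doubling still helps but with diminishing returns. A single run per setting on one laptop is a small sample, so these ratios are indicative only.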


Arguments:
expand_alignment_action: qiime action for multiple seq. alignment
@@ -222,7 +224,8 @@ def _trim_alignment(expand_alignment_action,
alignment_with_primers, = expand_alignment_action(
alignment=aligned_sequences,
sequences=primers,
-addfragments=True)
+addfragments=True,
+n_threads=n_threads)

# find trim positions based on primer positions within alignment
trim_positions = _locate_primer_positions(alignment_with_primers)
@@ -244,7 +247,8 @@ def trim_alignment(ctx,
primer_fwd=None,
primer_rev=None,
position_start=None,
-position_end=None):
+position_end=None,
+n_threads=1):
"""
Trim an existing alignment based on provided primers or specific,
pre-defined positions. Primers take precedence over the positions,
@@ -263,6 +267,7 @@ def trim_alignment(ctx,
primer_fwd,
primer_rev,
position_start,
-position_end)
+position_end,
+n_threads=n_threads)

return qiime2.Artifact.import_data('FeatureData[AlignedSequence]', result)
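
The review point about `n_threads` only mattering for primer-based trimming can be illustrated with a simplified stand-in for the dispatch described in the docstring. This is a sketch under stated assumptions, not the RESCRIPt implementation: `trim_alignment_sketch` and the `find_primer_positions` callback are hypothetical, the alignment is modelled as a plain dict of equal-length strings, and positions are assumed to be 1-based and inclusive.

```python
def trim_alignment_sketch(alignment, find_primer_positions,
                          primer_fwd=None, primer_rev=None,
                          position_start=None, position_end=None,
                          n_threads=1):
    """Trim every aligned sequence to the same column range.

    alignment: dict mapping sequence id -> aligned string (equal lengths).
    find_primer_positions: hypothetical callback standing in for the
        MAFFT-backed step; it returns a (start, end) pair.
    """
    if primer_fwd is not None or primer_rev is not None:
        # Primer-based trimming: the only branch that forwards n_threads.
        start, end = find_primer_positions(
            alignment, primer_fwd, primer_rev, n_threads=n_threads)
    else:
        # Position-based trimming: plain slicing, n_threads is never used.
        start, end = position_start, position_end

    # Assumption: positions are 1-based and inclusive; None means "no trim".
    lo = 0 if start is None else start - 1
    hi = None if end is None else end
    return {seq_id: seq[lo:hi] for seq_id, seq in alignment.items()}


if __name__ == "__main__":
    seen_threads = []

    def fake_finder(alignment, fwd, rev, n_threads=1):
        # Records the thread count instead of running a real aligner.
        seen_threads.append(n_threads)
        return 2, 5

    aln = {"s1": "ACGTAC", "s2": "A-GTAC"}
    print(trim_alignment_sketch(aln, fake_finder, primer_fwd="ACG",
                                n_threads=8))
    print(trim_alignment_sketch(aln, fake_finder, position_start=3,
                                n_threads=8))
    print(seen_threads)
```

In this sketch the thread count reaches the callback only on the primer path, which is why the parameter description would benefit from saying so.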