Support for Fasta records with multiple marker definitions #124

Merged: standage merged 3 commits into master from multifasta on May 1, 2023

Conversation

@standage standage commented Apr 24, 2023

This PR updates the FASTA output mode of the microhapdb marker command. MicroHapDB has long been built around a "one locus, one marker" paradigm, and accordingly microhapdb marker --format=fasta generated one FASTA record per marker. As part of MicroHapDB's improved support for alternative marker definitions at a locus, the command now outputs only one sequence per locus, with the SNP list for each marker definition included on the defline for convenience, as requested.

Closes #117.

Comment on lines +89 to +93
+ loci = defaultdict(Locus)
  for marker in markers:
-     print(marker.fasta)
+     loci[marker.locus].markers.append(marker)
+ for locus in loci.values():
+     print(locus.fasta)
Member Author

CLI update: store requested markers by locus, and then output the relevant locus sequences.
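The grouping step described above can be sketched in isolation. This is a minimal illustration, not MicroHapDB's implementation: the `Marker` and `Locus` dataclasses, their fields, and the `defline` property are hypothetical stand-ins, and the defline here omits the genomic coordinates that the real output includes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Marker:
    """Hypothetical stand-in for a marker record (not MicroHapDB's Marker)."""
    name: str
    locus: str
    offsets: list


@dataclass
class Locus:
    """Hypothetical stand-in that collects the markers defined at one locus."""
    markers: list = field(default_factory=list)

    @property
    def defline(self):
        # One "name=offset,offset,..." entry per marker definition
        snp_lists = [
            f"{m.name}={','.join(map(str, m.offsets))}" for m in self.markers
        ]
        return f">{self.markers[0].locus} " + " ".join(snp_lists)


def group_by_locus(markers):
    """Store the requested markers by locus, preserving request order."""
    loci = defaultdict(Locus)
    for marker in markers:
        loci[marker.locus].markers.append(marker)
    return loci


markers = [
    Marker("mh02KK-134.v3", "mh02KK-134", [20, 44, 59]),
    Marker("mh02KK-134.v1", "mh02KK-134", [20, 44, 59, 123]),
    Marker("mh14SHY-003.v3", "mh14SHY-003", [102, 108, 199]),
]
loci = group_by_locus(markers)
deflines = [locus.defline for locus in loci.values()]
for line in deflines:
    print(line)
```

Because `defaultdict` creates an empty `Locus` the first time a locus name is seen, the loop needs no explicit existence check, and each locus is emitted once regardless of how many marker definitions it carries.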

Comment on lines +435 to +437
class Locus:
    def __init__(self, markers=None):
        self.markers = list() if markers is None else markers
Member Author

Most of the code enabling this improved handling is part of this new Locus class. It replicates some of the features originally implemented for the Marker class. I considered removing those configuration features and only allowing configuration of e.g. delta, min length, etc., for Locus objects. But I think we need to maintain these features on a per-marker basis for other output modes.

Comment on lines +667 to +674
>mh02KK-134 GRCh38:chr2:160222879-160223013 mh02KK-134.v3=20,44,59 mh02KK-134.v1=20,44,59,123 mh02KK-134.v2=20,44,59,65,107,123 mh02KK-134.v4=20,44,59,65,72,87,107,123
TACCCTTGGCAGGAACCCTCACTACCTAAGGATGGGCAATGGCTTATGAGTGAGAAACACGGAGCCGTGGGAACTCAGAA
TGACATGCTACCTGGAGATTGTGGTAACGCCCTGTTTTTTTGTGGGCATATCTA
>mh14SHY-003 GRCh38:chr14:57983921-57984213 mh14SHY-003.v1=10,14,16,26,102,108,109,161,192,199 mh14SHY-003.v4=10,14,102,108,199,262,281 mh14SHY-003.v3=102,108,199 mh14SHY-003.v2=102,108,199,262,281
GTAGGAGTGATGTACGGGGCACCTACTTGGGGTTCACATGCTGGCCCCTTTATTGAGTTCATTCTGAATCCAGAAGCTTG
GCAGAGTTCAGCCAGATGGCAGGGTGAGCGCCCTGCCTTCCTGGTAGTCTCTTCTTCTGCAAGGGAATAGGAGGCGTTCA
CCCTCCTTTGTTCAAGAGTCTATTTCTAGGGGCCTATCAGCCCAGGGTCCCTTCTCCAGCTTTCTCAGGAGGCCCCACAT
CATCAGGCAATTAGCTCTCTAGTGGGTATAACTGCTACTGCCACAACCACTG
Member Author

This new test captures what the new output looks like for loci with multiple marker definitions.
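For downstream consumers, a defline in this layout could be split back into its parts roughly as follows. This is a sketch based only on the test output shown above; `parse_defline` is a hypothetical helper, not part of MicroHapDB, and it assumes whitespace-separated fields with the locus name first, the genomic region second, and one `name=offsets` entry per marker definition after that.

```python
def parse_defline(defline):
    """Split a locus defline into (locus, region, {marker name: [offsets]}).

    Assumes the layout seen in the test output above; hypothetical helper.
    """
    fields = defline.lstrip(">").split()
    locus, region = fields[0], fields[1]
    snp_lists = {}
    for entry in fields[2:]:
        name, offsets = entry.split("=")
        snp_lists[name] = [int(x) for x in offsets.split(",")]
    return locus, region, snp_lists


defline = (
    ">mh02KK-134 GRCh38:chr2:160222879-160223013 "
    "mh02KK-134.v3=20,44,59 mh02KK-134.v1=20,44,59,123"
)
locus, region, snp_lists = parse_defline(defline)
print(locus, region, snp_lists)
```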

@standage standage merged commit 1181182 into master May 1, 2023
@standage standage deleted the multifasta branch May 1, 2023 15:52
Development

Successfully merging this pull request may close these issues.

Handling of loci with multiple marker definitions for --format=fasta and --format=offsets