Support for Fasta records with multiple marker definitions #124
loci = defaultdict(Locus)
for marker in markers:
    print(marker.fasta)
    loci[marker.locus].markers.append(marker)
for locus in loci.values():
    print(locus.fasta)
CLI update: store requested markers by locus, and then output the relevant locus sequences.
class Locus:
    def __init__(self, markers=None):
        self.markers = list() if markers is None else markers
Most of the code enabling this improved handling is part of this new Locus class. It replicates some of the features originally implemented for the Marker class. I considered removing those configuration features from Marker and allowing configuration of e.g. delta, min length, etc. only for Locus objects, but I think we need to maintain these features on a per-marker basis for other output modes.
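To illustrate the grouping idea discussed in this comment, here is a minimal sketch, not the actual MicroHapDB implementation. The `Marker` dataclass, its `offsets` field, the `definitions` property, and the `group_by_locus` helper are hypothetical stand-ins introduced only for this example; only the `Locus.__init__` signature and the `defaultdict(Locus)` pattern come from the diff.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Marker:
    # Hypothetical stand-in for MicroHapDB's Marker class
    name: str      # marker definition name, e.g. "mh02KK-134.v3"
    locus: str     # locus the definition belongs to, e.g. "mh02KK-134"
    offsets: list  # SNP list for this definition


class Locus:
    def __init__(self, markers=None):
        self.markers = list() if markers is None else markers

    @property
    def definitions(self):
        # Hypothetical helper: one "name=offset,offset,..." token per marker
        return [
            f"{m.name}={','.join(map(str, m.offsets))}" for m in self.markers
        ]


def group_by_locus(markers):
    # Collect the requested markers under their locus, preserving input order
    loci = defaultdict(Locus)
    for marker in markers:
        loci[marker.locus].markers.append(marker)
    return loci


markers = [
    Marker("mh02KK-134.v3", "mh02KK-134", [20, 44, 59]),
    Marker("mh02KK-134.v1", "mh02KK-134", [20, 44, 59, 123]),
    Marker("mh14SHY-003.v3", "mh14SHY-003", [102, 108, 199]),
]
loci = group_by_locus(markers)
for name, locus in loci.items():
    print(name, *locus.definitions)
```

Because `defaultdict(Locus)` calls `Locus()` with no arguments, each new locus starts with its own empty marker list, which is why the `markers=None` default matters here.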
>mh02KK-134 GRCh38:chr2:160222879-160223013 mh02KK-134.v3=20,44,59 mh02KK-134.v1=20,44,59,123 mh02KK-134.v2=20,44,59,65,107,123 mh02KK-134.v4=20,44,59,65,72,87,107,123
TACCCTTGGCAGGAACCCTCACTACCTAAGGATGGGCAATGGCTTATGAGTGAGAAACACGGAGCCGTGGGAACTCAGAA
TGACATGCTACCTGGAGATTGTGGTAACGCCCTGTTTTTTTGTGGGCATATCTA
>mh14SHY-003 GRCh38:chr14:57983921-57984213 mh14SHY-003.v1=10,14,16,26,102,108,109,161,192,199 mh14SHY-003.v4=10,14,102,108,199,262,281 mh14SHY-003.v3=102,108,199 mh14SHY-003.v2=102,108,199,262,281
GTAGGAGTGATGTACGGGGCACCTACTTGGGGTTCACATGCTGGCCCCTTTATTGAGTTCATTCTGAATCCAGAAGCTTG
GCAGAGTTCAGCCAGATGGCAGGGTGAGCGCCCTGCCTTCCTGGTAGTCTCTTCTTCTGCAAGGGAATAGGAGGCGTTCA
CCCTCCTTTGTTCAAGAGTCTATTTCTAGGGGCCTATCAGCCCAGGGTCCCTTCTCCAGCTTTCTCAGGAGGCCCCACAT
CATCAGGCAATTAGCTCTCTAGTGGGTATAACTGCTACTGCCACAACCACTG
This new test captures what the new output looks like for loci with multiple marker definitions.
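For downstream consumers, a record header in this layout can be split back into its parts with plain string handling. The following is a rough sketch that assumes the layout seen in the test data above (locus name, then a genomic region, then any number of `name=list` tokens separated by whitespace); `parse_defline` is a hypothetical helper, not part of MicroHapDB.

```python
def parse_defline(defline):
    """Split a multi-definition FASTA header into locus, region, and SNP lists.

    Assumes whitespace-separated fields: locus, region, then zero or more
    "name=comma,separated,integers" tokens, as in the test data.
    """
    locus, region, *definitions = defline.lstrip(">").split()
    snp_lists = {}
    for definition in definitions:
        name, _, values = definition.partition("=")
        snp_lists[name] = [int(value) for value in values.split(",")]
    return locus, region, snp_lists


defline = (
    ">mh02KK-134 GRCh38:chr2:160222879-160223013 "
    "mh02KK-134.v3=20,44,59 mh02KK-134.v1=20,44,59,123"
)
locus, region, snp_lists = parse_defline(defline)
print(locus, region, snp_lists)
```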
This PR updates the FASTA output mode of the microhapdb marker command. MicroHapDB has long been built around a "one locus, one marker" paradigm, and accordingly microhapdb marker --format=fasta would generate one FASTA record per marker. As part of MicroHapDB's improved support for alternative marker definitions at a locus, only one sequence is now output per locus, with multiple SNP lists included for convenience, as requested.
Closes #117.