Skip to content

extract_tracts.py: msp scanning excludes window end position #2

@JosieLikesCats

Description

@JosieLikesCats

Dear Tractor-Mix team,

Firstly, thank you for developing such a great tool! I'm looking forward to implementing it in my current dataset. I have a query about extract_tracts.py. When testing it on chr22 of my dataset, I get warning messages that certain positions are not in msp windows. However, I've checked my VCF and these are in msp windows - they just all seem to be the end position of the window. Unless gnomix windows are not end-inclusive, I think this is a bug.

For example, extract_tracts.py gives me the following:
INFO (main 96): # Running script version : 1.2.0
INFO (main 97): # VCF File : admix_chr22_qc.vcf.gz
INFO (main 98): # Prefix of output file names : admix_chr22_qc
INFO (main 99): # VCF File is compressed? : True
INFO (main 100): # Number of Ancestries in VCF : 2
INFO (main 101): # Output Directory : None
INFO (main 109): Creating output files for 2 ancestries
INFO (main 124): Iterating through VCF file
INFO (main 183): VCF position, 11933821 is not in an msp window, skipping site
INFO (main 183): VCF position, 12814864 is not in an msp window, skipping site
INFO (main 183): VCF position, 15860257 is not in an msp window, skipping site
INFO (main 183): VCF position, 16064423 is not in an msp window, skipping site
INFO (main 183): VCF position, 16125206 is not in an msp window, skipping site
INFO (main 183): VCF position, 16167211 is not in an msp window, skipping site

However, !cat admix_chr22_full_results_qc.msp | cut -f 1-3 | head gives:
#chm spos epos
22 10580920 11933821
22 11934161 12814864
22 15166065 15860257
22 15862268 16064423
22 16064594 16125206
22 16125370 16167211
22 16167225 16195316
22 16195357 16230311
22 16230346 16262902

All six flagged positions are the ends of the first six windows (I stopped the script early, but this was true for more windows when I let it run). Grepping for the positions in the dosage output files also shows that they are definitely excluded from the output files.

I was able to solve this in line 94 of def extract_tracts using the following:
while not (row[0] == window[0] and (window[1] <= pos <= window[2])): instead of
while not (row[0] == window[0] and (window[1] <= pos < window[2])):.

I've confirmed that these positions now appear in the dosage output file.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions