# Converting LepMap IDs back to Stacks IDs and retrieving RADtags for linkage-mapped markers

During its analyses, LepMap renumbers all the markers. Here I write a small script to replace the LepMap marker numbers with the original Stacks IDs, which is necessary for mapping the RADtags to the genome down the line.

I will use the final map file and the original VCF that was used to make the LepMap3 input file. The third column of the VCF (the ID column) contains the Stacks ID for each marker.

LepMap essentially just renames markers from this VCF file in order from 1 to N_markers. So marker 100 in the LepMap output is the 100th marker in the VCF. 
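
To make the numbering rule concrete, here is a minimal sketch of the correspondence, assuming that LepMap numbers markers by their 1-based position among the non-header VCF records, as described above. The helper name `vcf_order_to_ids` and the toy VCF records are invented for illustration and are not real data.

```python
def vcf_order_to_ids(vcf_lines):
    """Map each record's 1-based position (as a string) to its VCF ID column."""
    mapping = {}
    position = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta/header lines and blank lines
        position += 1
        mapping[str(position)] = line.split("\t")[2]  # third column = ID
    return mapping


# Hypothetical three-record VCF (values made up for illustration)
toy_vcf = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_1\n",
    "un\t101\t12_34\tA\tG\t.\tPASS\t.\tGT\t0/1\n",
    "un\t147\t12_80\tC\tT\t.\tPASS\t.\tGT\t0/0\n",
    "un\t950\t57_9\tG\tA\t.\tPASS\t.\tGT\t1/1\n",
]

ids = vcf_order_to_ids(toy_vcf)
print(ids["2"])  # LepMap marker 2 is the second record, so this prints 12_80
```

The keys are kept as strings because the marker numbers read from the map file are strings too, which avoids a type conversion at lookup time.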


## Getting Stacks IDs

In [26]:
def LepMap_2_Stacks_IDs(Map_path, VCF_path):

    """
    This function replaces the marker IDs in the map file output by the LepMap3 "OrderMarkers2" module
    with those from the VCF that was used to create the LepMap3 inputs. It was written based on VCFs
    output by Stacks 1.48.
    
    USAGE:
    
        LepMap_2_Stacks_IDs("/full/path/to/map_file", "/full/path/to/VCF")
    
    OUTPUTS:
    
        Creates a new map file, named after the input with "_RealIDs" added before the extension,
        in which the LepMap IDs are replaced by Stacks IDs
    
    """
    
    
    ## Make VCF ID dictionary

    VCF_ID_dict = {}

    tag_index = 1 ## keeps track of the order of loci in the vcf. This is essentially the LepMap ID. 

    with open(VCF_path, 'r') as vcf_file:  ## "with" closes the file once the loop is done
        for line in vcf_file:
            if not line.startswith("#"):
                tag_ID = line.split()[2]

                VCF_ID_dict[str(tag_index)] = tag_ID

                tag_index += 1
            
            
    # Make new file to write the new IDs to - will be the same name as the input, just with "RealIDs" in it.

    new_map_file_path = "%s_%s.%s" % (Map_path.rpartition(".")[0], "RealIDs", Map_path.rpartition(".")[2])
    New_map_file = open(new_map_file_path, 'w')

    ## For each marker in the map file, rewrite the line to the new map file with the Stacks ID instead. 
    
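    with open(Map_path, 'r') as map_file:  ## "with" closes the input file once the loop is done
        for line in map_file:

            ## Copy comment lines (and any blank lines) across unchanged
            if line.startswith("#") or not line.strip():

                New_map_file.write(line)

            else:

                LepMap_ID = line.split()[0]
                Stacks_ID = VCF_ID_dict[LepMap_ID]  ## getting stacks ID here
                rest_of_line = "\t".join(line.split("\t")[1:])
                New_map_file.write("%s\t%s" % (Stacks_ID, rest_of_line))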
            
    New_map_file.close()
    
    print("\nDone\nYour new map file is here: %s\n" % new_map_file_path)  ## parentheses work in Python 2 and 3

In [28]:
my_Map_path = "/home/djeffrie/Data/Genomes/Rtemp_hybrid/ALLMAPS_2019/SbfI/LepMap3_2019/OUTPUTS/map_23.8_js_23_ordered_infmask_23.txt"
my_VCF_path = "/home/djeffrie/Data/Genomes/Rtemp_hybrid/ALLMAPS_2019/SbfI/LepMap3_2019/INPUTS/batch_1.vcf"

In [29]:
LepMap_2_Stacks_IDs(my_Map_path, my_VCF_path)
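
Before trusting the renamed map, it may be worth checking that every marker number in the map file actually corresponds to a record in the VCF, since looking up an unknown number in `VCF_ID_dict` would raise a `KeyError`. The sketch below is one hypothetical way to do this. `unmatched_marker_ids` and the toy map lines are made up for illustration, and the lines are simplified rather than real LepMap3 output.

```python
def unmatched_marker_ids(map_lines, n_vcf_records):
    """Return marker numbers in map_lines that fall outside 1..n_vcf_records."""
    bad = []
    for line in map_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment lines and blank lines
        marker = line.split()[0]
        # A valid marker is a whole number between 1 and the record count
        if not marker.isdigit() or not 1 <= int(marker) <= n_vcf_records:
            bad.append(marker)
    return bad


# Simplified, invented map lines: marker number followed by two positions
toy_map = [
    "# comment line\n",
    "2\t0.000\t0.000\n",
    "5\t1.250\t1.250\n",
    "1\t3.100\t3.100\n",
]

# With only 3 VCF records, marker 5 cannot be matched
print(unmatched_marker_ids(toy_map, 3))
```

An empty list means every marker number can be looked up. Anything else suggests the VCF is not the one used to build the LepMap inputs.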