## Finding markers on the sex chromosome in <i>Hyla orientalis</i>

OK, so far we have done a phylogeographic analysis using ~10,000 markers.

Now I want to map those markers to scaffolds in the <i>Hyla arborea</i> genome which have been mapped to LG1 using a linkage map (I have these scaffold names).

So here I will:

1. Get scaffold names that are on LG1

2. Map all RADtags to these scaffolds. 

3. Filter VCF and .gen files for just these loci

We can then use the filtered files to calculate male and female F-statistics per population, using something like adegenet.



In [13]:
## Getting scaffold names that are on H. arborea Chr 01

# Read the linkage map table; the file is closed automatically on leaving the block.
with open("/home/djeffrie/Data/Genomes/Hyla/hyla_arborea_linkage_maps.txt", 'r') as map_file:
    Harb_maps = map_file.readlines()

Chrom_1_harb_scaffs = set()

for i in Harb_maps[1:]:  # skip the header line
    fields = i.split()
    Xen_scaff = fields[3]
    if Xen_scaff == "scaffold_1":
        Chrom_1_harb_scaffs.add(fields[2])  # H. arborea scaffold name

In [50]:
Chrom_1_harb_scaffs

{'scaffold1000188',
 'scaffold1003840',
 'scaffold1004672',
 'scaffold1004884',
 'scaffold100612',
 'scaffold1007365',
 'scaffold1008769',
 'scaffold1009238',
 'scaffold101',
 'scaffold1024125',
 'scaffold1029538',
 'scaffold1030986',
 'scaffold1033690',
 'scaffold1035724',
 'scaffold103707',
 'scaffold104040',
 'scaffold104917',
 'scaffold1053012',
 'scaffold1053407',
 'scaffold1054739',
 'scaffold1055051',
 'scaffold1057470',
 'scaffold1059263',
 'scaffold106131',
 'scaffold106141',
 'scaffold106565',
 'scaffold1069241',
 'scaffold107021',
 'scaffold1075982',
 'scaffold1076988',
 'scaffold107738',
 'scaffold1078947',
 'scaffold107900',
 'scaffold108194',
 'scaffold1087059',
 'scaffold109008',
 'scaffold1091324',
 'scaffold1091471',
 'scaffold1091907',
 'scaffold109414',
 'scaffold11005',
 'scaffold110263',
 'scaffold1103163',
 'scaffold1103225',
 'scaffold110355',
 'scaffold1112102',
 'scaffold1112303',
 'scaffold1112790',
 'scaffold1118151',
 'scaffold1118394',
 'scaffold113049',
 ...
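
As a quick check of the parsing logic in the cell above, here is a minimal sketch run on made-up rows. The column layout (scaffold name in the third whitespace-separated field, reference scaffold in the fourth) is inferred from the indices used above; the header, the rows and the helper name `scaffolds_on` are invented for illustration and are not taken from the real linkage map file.

```python
# Toy stand-in for hyla_arborea_linkage_maps.txt; header and rows are invented.
toy_map_lines = [
    "marker LG Harb_scaffold Ref_scaffold",
    "m1 1 scaffold101 scaffold_1",
    "m2 1 scaffold101 scaffold_1",    # duplicate scaffold, collapsed by the set
    "m3 2 scaffold999 scaffold_2",    # different reference scaffold, skipped
    "m4 1 scaffold11005 scaffold_1",
]


def scaffolds_on(lines, target, scaff_col=2, ref_col=3):
    """Return the set of scaffold names whose reference column equals target.

    Hypothetical helper: column positions are assumptions based on the
    indices used in the notebook cell.
    """
    keep = set()
    for line in lines[1:]:                      # skip the header row
        fields = line.split()
        if len(fields) > ref_col and fields[ref_col] == target:
            keep.add(fields[scaff_col])
    return keep


toy_chrom1 = scaffolds_on(toy_map_lines, "scaffold_1")
print(sorted(toy_chrom1))
```

Using a set means each scaffold is stored once, however many markers fall on it.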

In [23]:
## Now map all of the RADtags to the genome.

## First get the loci of interest from the catalog and put into a fasta

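with open("/home/djeffrie/Data/RADseq/Hori/Stacks_new/Populations_all_kept_final/batch_1.vcf") as vcf_file:
    vcf = vcf_file.readlines()

vcf_tags = set()

for line in vcf:
    # Skip header lines (and blank lines) rather than assuming exactly 10 header lines.
    if line.startswith("#") or not line.strip():
        continue
    vcf_tags.add(line.split()[2])  # third column holds the locus ID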

In [25]:
print "Number of tags in vcf:", len(vcf_tags)

Number of tags in vcf: 5764


In [30]:
## Making fasta from catalog.

import MISC_RAD_tools as MISC

catalog_path = "/home/djeffrie/Data/RADseq/Hori/Stacks_new/batch_1.catalog.tags.tsv.gz"

MISC.fasta_maka(vcf_tags, catalog_path,"/home/djeffrie/Data/RADseq/Hori/Per_chrom_analyses/Phylogeog_RADtags.fa" )



Help on function fasta_maka in module MISC_RAD_tools:

fasta_maka(whitey, cat, out=None)
    whitey = whitelist (either a python list or a file path) containing locus IDs in the form of "<Tag_ID>_<Position>"
    cat    = path to the catalog file to get sequences from

Number of tags in whitelist: 5764
5764 sequences written to /home/djeffrie/Data/RADseq/Hori/Per_chrom_analyses/Phylogeog_RADtags.fa


In [36]:
# Ok, now map these to the genome. . . 

from Bio.Blast.Applications import NcbiblastnCommandline

tags_fasta = "/home/djeffrie/Data/RADseq/Hori/Per_chrom_analyses/Phylogeog_RADtags.fa"
Genome_db = "/home/djeffrie/Data/Genomes/Hyla/Harb_assembly/HarbK44final_blastn_db"
blast_out_path = "/home/djeffrie/Data/RADseq/Hori/Per_chrom_analyses/Phylogeog_RADtags_blast_outs.xml"
Threads = 7

blastn_cline = NcbiblastnCommandline(query=tags_fasta, db=Genome_db, outfmt=5, out=blast_out_path, num_threads = Threads)
stdout, stderr = blastn_cline()



In [41]:
## Now filter the BLAST hits, keeping only tags that map confidently to the genome.
## (The chromosome 1 subset is taken from these afterwards.)

Genome_fasta = ""
Eval_threshold = 1e-20
best_hit_crit = 1e-5
Window = 10

blastn_filtered_outs = MISC.BlastParseExtra(blast_out_path, Genome_fasta , best_hit_crit, Eval_threshold, Window, 0 )


In [57]:
print len(blastn_filtered_outs), "tags map confidently to the H. arborea genome"

2387 tags map confidently to the H. arborea genome
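
`MISC.BlastParseExtra` is a custom helper and its internals are not shown here. The sketch below is one plausible reading of the two thresholds, not the actual implementation: it assumes `Eval_threshold` is the maximum e-value allowed for the best hit, and `best_hit_crit` is the factor by which the best e-value must beat the second best for the hit to count as unique. The function name `unique_best_hits`, the `"Evalue"` key and the toy hits are all hypothetical.

```python
def unique_best_hits(hits_by_tag, eval_threshold=1e-20, best_hit_crit=1e-5):
    """Keep tags with a strong and unambiguous best BLAST hit.

    hits_by_tag maps a tag ID to a list of (subject_id, e_value) tuples.
    Assumed rule (not confirmed against MISC_RAD_tools):
      1. the best e-value must be <= eval_threshold, and
      2. if there is a second hit, the best e-value must be at most
         best_hit_crit times the second-best e-value.
    """
    kept = {}
    for tag, hits in hits_by_tag.items():
        if not hits:
            continue
        ranked = sorted(hits, key=lambda h: h[1])   # lowest e-value first
        best_id, best_e = ranked[0]
        if best_e > eval_threshold:
            continue                                # best hit too weak
        if len(ranked) > 1 and best_e > ranked[1][1] * best_hit_crit:
            continue                                # runner-up too close
        kept[tag] = {"Ref_hit_id": best_id, "Evalue": best_e}
    return kept


# Invented hits for illustration only.
toy_hits = {
    "101_12": [("scaffold101.1", 1e-40)],                            # single strong hit
    "202_33": [("scaffold5.1", 1e-30), ("scaffold9.1", 1e-28)],      # ambiguous
    "303_7": [("scaffold7.1", 1e-10)],                               # too weak
    "404_50": [("scaffold8.1", 1e-25), ("scaffold11005.1", 1e-45)],  # clear winner
}

toy_filtered = unique_best_hits(toy_hits)
print(sorted(toy_filtered))
```

Under these assumptions, a tag with two near-equal hits is dropped even when both are strong, which is the behaviour you want when repeats could place a RADtag on the wrong scaffold.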


In [54]:
Chrom_01_tags = []
for hit in blastn_filtered_outs:
    if blastn_filtered_outs[hit]["Ref_hit_id"].split(".")[0] in Chrom_1_harb_scaffs:
        Chrom_01_tags.append(hit)

In [58]:
print len(Chrom_01_tags), "tags map confidently to the H. arborea Chromosome 1"

36 tags map confidently to the H. arborea Chromosome 1


So it's not that many, but it might be enough to get some reliable stats. Note that I imagine about 4 times this many could be found if the analysis were performed per clade or population, but that would take quite a lot of time.
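
Step 3 of the plan (filtering the VCF down to these loci) is not shown above. A minimal sketch of how it might look, assuming the third VCF column holds the same IDs that were collected in `Chrom_01_tags`; the function name `filter_vcf_lines` and the toy records are invented for illustration.

```python
def filter_vcf_lines(vcf_lines, keep_ids):
    """Return header lines plus the records whose ID column is in keep_ids."""
    keep_ids = set(keep_ids)
    out = []
    for line in vcf_lines:
        if line.startswith("#"):
            out.append(line)                 # keep all header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] in keep_ids:
            out.append(line)                 # ID is the third VCF column
    return out


# Invented VCF content for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\n",
    "un\t13\t101_12\tA\tG\t.\tPASS\tNS=1\tGT\t0/1\n",
    "un\t34\t202_33\tC\tT\t.\tPASS\tNS=1\tGT\t0/0\n",
    "un\t51\t404_50\tG\tA\t.\tPASS\tNS=1\tGT\t1/1\n",
]
toy_chrom_01_tags = ["101_12", "404_50"]     # stand-in for Chrom_01_tags

toy_filtered_vcf = filter_vcf_lines(toy_vcf, toy_chrom_01_tags)
print("".join(toy_filtered_vcf))
```

With the real data, the lines would come from reading `batch_1.vcf` and the result could be written to a new file with `writelines`; the same list of IDs could also drive the filtering of the .gen file.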