# Get the chimpanzee alleles for each variable position in Y chromosomes

In [1]:
(cd ..; make init)

make[1]: Entering directory `/mnt/expressions/mp/archaic-y'
make[1]: Nothing to be done for `init'.
make[1]: Leaving directory `/mnt/expressions/mp/archaic-y'


In [2]:
cd ../tmp



## Download the UCSC utilities fetchChromSizes and axtToMaf

Both utilities can be downloaded from the [UCSC downloads page](http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads).

### fetchChromSizes

In [3]:
curl -s http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes > fetchChromSizes



In [4]:
chmod +x fetchChromSizes



### axtToMaf

In [5]:
curl -s http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/axtToMaf > axtToMaf



In [6]:
chmod +x axtToMaf



## Download the human-chimp alignment for Y chromosome

The data is in the [AXT format](https://genome.ucsc.edu/goldenPath/help/axt.html) and must be converted to the [MAF format](https://genome.ucsc.edu/FAQ/FAQformat.html#format5) for further processing.

AXT files can be downloaded from the [UCSC human genome downloads page](http://hgdownload.cse.ucsc.edu/downloads.html#human).

In [7]:
curl -s -O http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsPanTro4/axtNet/chrY.hg19.panTro4.net.axt.gz
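
As a quick sanity check on the download, the sketch below tallies the alignment blocks in an AXT stream. It assumes the layout described in the UCSC AXT documentation linked above: one nine-field summary line per block, followed by two sequence lines. The function name `axt_summary` is hypothetical and not part of the original notebook.

```shell
# Hypothetical helper (not in the original notebook): count the alignment
# blocks and the aligned human bases in AXT text read from stdin.
# Assumed summary-line fields (space separated):
#   number, tName, tStart, tEnd, qName, qStart, qEnd, qStrand, score
axt_summary() {
    awk '
        /^#/ { next }                      # skip comment lines
        NF == 9 && $1 ~ /^[0-9]+$/ {       # summary line of one block
            blocks++
            bases += $4 - $3 + 1           # coordinates are 1-based, end included
        }
        END { printf "%d blocks, %d bases\n", blocks, bases }
    '
}

# Usage on the downloaded file:
#   zcat chrY.hg19.panTro4.net.axt.gz | axt_summary
```

The sequence lines contain a single field each, so the `NF == 9` test is enough to tell them apart from summary lines.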



## Get sizes of all chromosomes for hg19 and chimp for `axtToMaf`

The database names to pass to `fetchChromSizes` are listed at https://genome.ucsc.edu/FAQ/FAQreleases.html#release1 (hg19 is straightforward; panTro4 appears to be the most recent chimp database, although I was not sure of the exact name).

### hg19 chromosome sizes

In [8]:
./fetchChromSizes hg19 > hg19.chrom.sizes

INFO: trying WGET /usr/bin/wget for database hg19
url: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes


### chimpanzee chromosome sizes

In [9]:
./fetchChromSizes panTro4 > panTro4.chrom.sizes

INFO: trying WGET /usr/bin/wget for database panTro4
url: http://hgdownload.cse.ucsc.edu/goldenPath/panTro4/bigZips/panTro4.chrom.sizes
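
To confirm that both size files contain the Y chromosome before running the conversion, a lookup along the following lines could be used. It assumes the usual two-column, tab-separated layout of a `.chrom.sizes` file (name, length); `chrom_size` is a hypothetical helper name.

```shell
# Hypothetical helper: print the length of one chromosome from a
# two-column, tab-separated chrom.sizes file.
# Usage: chrom_size <sizes-file> <chromosome-name>
# Exits with a non-zero status if the chromosome is not listed.
chrom_size() {
    awk -F'\t' -v chrom="$2" '
        $1 == chrom { print $2; found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage with the files fetched above:
#   chrom_size hg19.chrom.sizes chrY
#   chrom_size panTro4.chrom.sizes chrY
```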


## Convert AXT file to MAF format

In [10]:
./axtToMaf chrY.hg19.panTro4.net.axt.gz hg19.chrom.sizes panTro4.chrom.sizes chrY.hg19.panTro4.maf -tPrefix=hg19. -qPrefix=panTro4. 
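
A minimal way to check that the conversion produced non-empty output with the expected sequence prefixes is sketched below. It relies only on the MAF conventions that an `a` line starts each alignment block and that the fourth field of an `s` line is the size of the aligned region; `maf_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: count the alignment blocks in MAF text from stdin and
# sum the aligned sizes of all sequences whose name starts with a given
# prefix (default: the -tPrefix value used above).
maf_summary() {
    awk -v prefix="${1:-hg19.}" '
        $1 == "a" { blocks++ }                               # start of a block
        $1 == "s" && index($2, prefix) == 1 { bases += $4 }  # size field
        END { printf "%d blocks, %d bases\n", blocks, bases }
    '
}

# Usage on the converted file:
#   maf_summary hg19. < chrY.hg19.panTro4.maf
```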



# Generate a VCF file with chimpanzee states from a MAF alignment file

In [11]:
chimp_hg19_alignment="../tmp/chrY.hg19.panTro4.maf"



### Lippold et al. sites

In [12]:
lippold_chimp_vcf="../vcf/lippold_chimp.vcf"
lippold_sites="../input/lippold_sites.bed"



In [13]:
echo "##fileformat=VCFv4.1" > ${lippold_chimp_vcf}  # truncate first so a rerun does not duplicate the header
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">" >> ${lippold_chimp_vcf}
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tChimp" >> ${lippold_chimp_vcf}



In [14]:
cut -f1,3 $lippold_sites \
    | /home/pruefer/src/BamSNPTool/BamSNPAddMaf $chimp_hg19_alignment hg19 panTro4 \
    | awk '{print toupper($0)}' \
    | awk -vOFS="\t" '
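        {
            # $3 is the human (reference) base, $4 the chimp base
            if ($3 == "N") {
                next                      # skip sites without a human base
            }
            else if ($4 == "N" || $4 == "-") {
                alt = "."                 # chimp base missing or gapped
                gt = "."
            } else if ($3 == $4) {
                alt = "."                 # chimp matches the reference
                gt = "0"
            } else {
                alt = $4                  # chimp carries a different base
                gt = "1"
            }
            print $1, $2, ".", $3, alt, ".", ".", ".", "GT", gt
        }' \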
    >> $lippold_chimp_vcf
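
The branching in the `awk` step above can be tried on toy input without the alignment or `BamSNPAddMaf`. The sketch below wraps the same logic in a function; the name `chimp_vcf_records` is hypothetical, and the assumed input columns (chromosome, position, human base, chimp base) are inferred from how the pipeline uses `$1` to `$4`, not from documentation of the tool.

```shell
# Hypothetical wrapper around the genotype logic used in the pipeline above.
# Assumed input columns (tab separated): chrom, position, human base, chimp base.
chimp_vcf_records() {
    awk '{ print toupper($0) }' \
    | awk -v OFS="\t" '
        $3 == "N" { next }                                        # no human base: skip
        {
            if ($4 == "N" || $4 == "-") { alt = "."; gt = "." }   # chimp missing
            else if ($3 == $4)          { alt = "."; gt = "0" }   # chimp matches human
            else                        { alt = $4;  gt = "1" }   # chimp differs
            print $1, $2, ".", $3, alt, ".", ".", ".", "GT", gt
        }'
}

# Four toy sites: match, mismatch, gap in chimp, unknown human base.
printf 'Y\t100\ta\ta\nY\t101\tC\tt\nY\t102\tG\t-\nY\t103\tn\tA\n' | chimp_vcf_records
```

The first three sites produce records with genotypes `0`, `1` and `.` respectively, and the fourth site is dropped because the human base is `N`.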



In [15]:
bgzip $lippold_chimp_vcf



In [16]:
tabix $lippold_chimp_vcf.gz
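
Once the file is compressed and indexed, a quick tally of the genotype column shows how many sites match the reference, differ from it, or are missing in chimp. The sketch below assumes the single-sample layout written above, with the genotype in column 10; `gt_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: count genotype values in column 10 of VCF text
# read from stdin, ignoring header lines.
gt_counts() {
    awk -F'\t' '
        /^#/ { next }          # skip meta-information and the column header
        { n[$10]++ }
        END { printf "ref=%d alt=%d missing=%d\n", n["0"], n["1"], n["."] }
    '
}

# Usage on the compressed file:
#   zcat ../vcf/lippold_chimp.vcf.gz | gt_counts
```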



### Exome sites

In [17]:
exome_chimp_vcf="../vcf/exome_chimp.vcf"
exome_sites="../input/exome_sites.bed"



In [18]:
echo "##fileformat=VCFv4.1" > ${exome_chimp_vcf}  # truncate first so a rerun does not duplicate the header
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">" >> ${exome_chimp_vcf}
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tChimp" >> ${exome_chimp_vcf}



In [19]:
cut -f1,3 $exome_sites \
    | /home/pruefer/src/BamSNPTool/BamSNPAddMaf $chimp_hg19_alignment hg19 panTro4 \
    | awk '{print toupper($0)}' \
    | awk -vOFS="\t" '
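        {
            # $3 is the human (reference) base, $4 the chimp base
            if ($3 == "N") {
                next                      # skip sites without a human base
            }
            else if ($4 == "N" || $4 == "-") {
                alt = "."                 # chimp base missing or gapped
                gt = "."
            } else if ($3 == $4) {
                alt = "."                 # chimp matches the reference
                gt = "0"
            } else {
                alt = $4                  # chimp carries a different base
                gt = "1"
            }
            print $1, $2, ".", $3, alt, ".", ".", ".", "GT", gt
        }' \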
    >> $exome_chimp_vcf



In [20]:
bgzip $exome_chimp_vcf



In [21]:
tabix $exome_chimp_vcf.gz

