I want to annotate the VCF files with phyloP scores.

I will convert the bigWig into a format more amenable to `bcftools annotate`. First, I will produce a wig file, which is easier to manipulate.

In [1]:
%%bash
bigWigToWig /home/mcn26/varef/data/Zoonomia/mammal_phyloP/241-mammalian-2020v2.bigWig ./out.wig

Now we will reformat it. From the `bcftools annotate` manpage:

> Bgzip-compressed and tabix-indexed file with annotations. The file can be VCF, BED, or a tab-delimited file with mandatory columns CHROM, POS (or, alternatively, FROM and TO), optional columns REF and ALT, and arbitrary number of annotation columns. BED files are expected to have the ".bed" or ".bed.gz" suffix (case-insensitive), otherwise a tab-delimited file is assumed. Note that in case of tab-delimited file, the coordinates POS, FROM and TO are one-based and inclusive. When REF and ALT are present, only matching VCF records will be annotated. If the END coordinate is present in the annotation file and given on command line as "-c ~INFO/END", then VCF records will be matched also by the INFO/END coordinate. If ID is present in the annotation file and given as "-c ~ID", then VCF records will be matched also by the ID column.

Examining our current format:

In [2]:
%%bash
head out.wig

#bedGraph section chr1:10074-11098
chr1	10074	10075	0.053
chr1	10075	10076	0.064
chr1	10076	10077	0.064
chr1	10077	10078	0.064
chr1	10078	10079	-2.109
chr1	10079	10080	0.053
chr1	10080	10081	0.053
chr1	10081	10082	0.064
chr1	10082	10083	0.064


We see that it is quite similar. Unfortunately, since the first record is not at the beginning of the chromosome, I can't immediately tell whether it is 0- or 1-based. According to [UCSC Genome Browser Blog : The UCSC Genome Browser Coordinate Counting Systems](https://genome-blog.gi.ucsc.edu/blog/2016/12/12/the-ucsc-genome-browser-coordinate-counting-systems/), bigWigs can be 0-start, half-open or 1-start, fully closed. Each record here covers a single base, yet its end is one greater than its start; in a 1-start, fully closed system a single base would have its start equal to its end. So this is 0-start, half-open, which also matches the `#bedGraph section` header, since bedGraph uses that convention.


```
Chr1        T   A   C   G   T
          | | | | | | | | | |
1 based   | 1 | 2 | 3 | 4 | 5

0 based   0   1   2   3   4
```

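For a 0-start, half-open interval that covers a single base, the end coordinate (the third column) is that base's position in 1-based coordinates. For example, the first record, `chr1 10074 10075`, is position 10075 in 1-based coordinates. Note that this only holds when every record is exactly one base wide.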
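
The single-base assumption behind this shortcut can be checked across a whole file rather than only its first ten lines. The following is a minimal sketch, run on a tiny made-up stand-in for `out.wig` (the name `toy.wig` and its three records are illustrative, not real data); it counts data records whose end minus start is not exactly 1.

```bash
# Work in a scratch directory so nothing in the notebook folder is touched.
tmp=$(mktemp -d)

# Hypothetical sample in the same layout as out.wig (comment line + 4 tab-separated columns).
printf '#bedGraph section chr1:10074-10077\n' > "$tmp/toy.wig"
printf 'chr1\t10074\t10075\t0.053\n' >> "$tmp/toy.wig"
printf 'chr1\t10075\t10076\t0.064\n' >> "$tmp/toy.wig"
printf 'chr1\t10076\t10077\t0.064\n' >> "$tmp/toy.wig"

# Skip comment lines; count records whose interval is not exactly one base wide.
wide=$(awk '!/^#/ && $3 - $2 != 1 {n++} END {print n + 0}' "$tmp/toy.wig")
echo "records wider than one base: $wide"
```

A count of 0 means that taking the third column as POS is safe. On the real data, the same `awk` program would be pointed at `out.wig` instead of the toy file.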

In [3]:
# Write the header to the file
!echo $'#CHROM\tPOS\tP_ANNO' > out_processed.tsv

# grep strips the comment lines (the "#bedGraph section" headers)
# awk drops the second column (the 0-based start) and keeps the end as the 1-based POS
!grep --invert-match '^#' out.wig | awk '{print $1, $3, $4}' OFS="\t" >> out_processed.tsv

Examine the file we just made to make sure it is OK.

In [4]:
%%bash
head out_processed.tsv

#CHROM	POS	P_ANNO
chr1	10075	0.053
chr1	10076	0.064
chr1	10077	0.064
chr1	10078	0.064
chr1	10079	-2.109
chr1	10080	0.053
chr1	10081	0.053
chr1	10082	0.064
chr1	10083	0.064
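
If some records did span more than one base, the manpage excerpt quoted earlier points to an alternative: supply FROM and TO columns instead of POS, both one-based and inclusive. Below is a hedged sketch of that coordinate arithmetic on made-up data (the file names and the multi-base record are hypothetical): FROM is the 0-based start plus 1, and TO is the end unchanged.

```bash
# Scratch directory for the illustration.
tmp=$(mktemp -d)

# Hypothetical sample: one single-base record and one three-base record.
printf 'chr1\t10074\t10075\t0.053\n' > "$tmp/toy.wig"
printf 'chr1\t10076\t10079\t0.064\n' >> "$tmp/toy.wig"

# Header line, then CHROM, FROM (= start + 1), TO (= end), score.
printf '#CHROM\tFROM\tTO\tP_ANNO\n' > "$tmp/toy_ranges.tsv"
awk 'BEGIN {OFS = "\t"} !/^#/ {print $1, $2 + 1, $3, $4}' "$tmp/toy.wig" >> "$tmp/toy_ranges.tsv"

cat "$tmp/toy_ranges.tsv"
```

The three-base record `chr1 10076 10079` becomes FROM 10077, TO 10079. This only illustrates the conversion; how the columns are then named on the `bcftools annotate` command line is left to the manpage.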


Delete the wig file we created; it is no longer useful to us.

In [5]:
!rm out.wig