## 4. SNP Calling

*Tools* 

**mpileup** 

* Conda: bundled with `samtools`; producing VCF/BCF from `samtools mpileup` is deprecated, so use `bcftools mpileup` instead
* Manual: http://samtools.sourceforge.net/mpileup.shtml

**bcftools**

* Conda: https://bioconda.github.io/recipes/bcftools/README.html#package-bcftools
* Manual: http://samtools.github.io/bcftools/


**FreeBayes** 
* Conda: https://bioconda.github.io/recipes/freebayes/README.html#package-freebayes
* Manual: https://github.com/ekg/freebayes
* Tutorial: http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html

**VarScan**
* Conda: https://bioconda.github.io/recipes/varscan/README.html#package-varscan
* Manual: http://varscan.sourceforge.net/using-varscan.html

**vcflib** *To manipulate vcf files, filter and stats* 
* Conda: https://bioconda.github.io/recipes/vcflib/README.html#package-vcflib
* Manual: https://github.com/vcflib/vcflib 

**vcftools**
* Conda: https://bioconda.github.io/recipes/vcftools/README.html#package-vcftools
* Manual: https://vcftools.github.io/

#### Useful tools 

**maftools** 

*To plot SNPs (in R):*
* https://bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/maftools.html
* https://www.researchgate.net/post/Tools_for_data_visualization_and_summary-VCF_files

** ** 
*bcftools* 

### Variant calling steps

#### 1. Variant Calling
`bcftools mpileup --redo-BAQ --min-BQ 30 --per-sample-mF \
  --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR \
  -f "{Ref_FASTA}" -Ou \
  "{repBAM1}" "{repBAM2}" "{repBAM3}" "{repBAM4}" | \
bcftools call --multiallelic-caller --variants-only -Ob > out.bcf`

The **bcftools mpileup** command scans every position covered by at least one aligned read and, from the bases observed there, computes the likelihood of each possible genotype in your sample.

For example, consider the first 1000 bases of the reference genome. Suppose position 35 (G in the reference) is covered by 27 reads with a G and two reads with a T, for a total read depth of 29. Here the caller concludes with high probability that the sample genotype is G and that the two T reads are sequencing errors. In contrast, if position 400 is T in the reference but is covered by 66 reads with a G and 2 reads with a C (total read depth 68), the sample most likely carries G at that position, that is, a T>G variant.
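The intuition in this example can be sketched numerically. The snippet below is a deliberately simplified haploid model and not the algorithm that bcftools implements: it assumes one shared per-base error probability (`error_rate`, a made-up value) and that an erroneous base is equally likely to be any of the other three nucleotides. The function name `phred_scaled_likelihoods` is hypothetical.

```python
import math

def phred_scaled_likelihoods(base_counts, error_rate=0.01):
    """Return Phred-scaled, normalised likelihoods for each haploid genotype.

    base_counts maps an observed base to the number of reads showing it.
    Simplifying assumptions (NOT the bcftools model): a single per-base
    error rate, with errors spread evenly over the three other bases.
    """
    log_likelihoods = {}
    for genotype in "ACGT":
        total = 0.0
        for base, count in base_counts.items():
            p = 1.0 - error_rate if base == genotype else error_rate / 3.0
            total += count * math.log10(p)
        log_likelihoods[genotype] = total
    best = max(log_likelihoods.values())
    # Phred scale: -10 * log10(likelihood), shifted so the best genotype is 0
    return {g: round(-10.0 * (ll - best)) for g, ll in log_likelihoods.items()}

# Position 35 from the text: 27 reads with G, 2 reads with T (reference G)
pos35 = phred_scaled_likelihoods({"G": 27, "T": 2})
# Position 400 from the text: 66 reads with G, 2 reads with C (reference T)
pos400 = phred_scaled_likelihoods({"G": 66, "C": 2})

# The most likely genotype is the one with the smallest Phred-scaled value
print(min(pos35, key=pos35.get), min(pos400, key=pos400.get))
```

In both cases G gets the value 0 (most likely) and every other base a large positive value, which mirrors the convention of the `PL` field: smaller means more likely.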

The **bcftools call** command uses the genotype likelihoods generated by `bcftools mpileup` to call genetic variants and outputs all the variants it identifies.

In other words, the intermediate mpileup output contains genotype likelihoods for every covered position in the genome, whereas the `bcftools call` output (with `--variants-only`) contains only the sites found to be variant.

If you are interested in specific sites that were not called by bcftools, you can split the pipeline into two separate steps: write the `bcftools mpileup` output to a file, inspect it, and then run `bcftools call` on that file.

#### 2. SNP Filtering
Options of `vcfutils.pl varFilter` (defaults in square brackets):
*   -Q INT  minimum RMS mapping quality for SNPs [10]
*   -d INT  minimum read depth [2]
*   -D INT  maximum read depth [10000000]
*   -a INT  minimum number of alternate bases [2]
*   -w INT  SNP within INT bp around a gap to be filtered [3]
*   -W INT  window size for filtering adjacent gaps [10]
*   -1 FLOAT    min P-value for strand bias (given PV4) [0.0001]
*   -2 FLOAT    min P-value for baseQ bias [1e-100]
*   -3 FLOAT    min P-value for mapQ bias [0]
*   -4 FLOAT    min P-value for end distance bias [0.0001]
*   -e FLOAT    min P-value for HWE (plus F<0) [0.0001]
*   -p      print filtered variants

`bcftools view -Ov out.bcf | misc/vcfutils.pl varFilter -d 18 -w 1 -W 3 -a 1 \
-1 0.05 -2 0.05 -3 0.05 -4 0.05 \
-e 0.05 -p > out.filt.vcf`

`misc/vcfutils.pl` should come bundled with BCFtools.
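To make the depth options (`-d`, `-D`) concrete, here is a minimal Python sketch of what a depth filter does to VCF records. It is an illustration only, not a reimplementation of `vcfutils.pl varFilter`: it looks only at `INFO/DP`, and the function name `depth_filter` and the abbreviated example records are assumptions made for this sketch.

```python
def depth_filter(vcf_lines, min_depth=2, max_depth=10_000_000):
    """Yield header lines unchanged and records whose INFO/DP lies in range."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]
        # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        depth = int(fields.get("DP", 0))
        if min_depth <= depth <= max_depth:
            yield line

# Abbreviated, made-up records (real ones carry many more INFO keys)
records = [
    "##fileformat=VCFv4.2",
    "chrM\t92\t.\tG\tT\t30\t.\tDP=86;MQ0F=0\tGT\t0/1",
    "chrM\t150\t.\tC\tT\t40\t.\tDP=12;MQ0F=0\tGT\t1/1",
    "chrM\t1665\t.\tC\tG\t25\t.\tDP=163;MQ0F=0\tGT\t0/1",
]

# Keep records with 18 <= DP <= 100: only the record at position 92 survives
kept = list(depth_filter(records, min_depth=18, max_depth=100))
print(len(kept))  # header line + one record
```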

In [3]:
! ls ./P1

human_cambridge_reference_mito.fasta mito_yoruba_reads_pe1.sorted.bam


In [4]:
# Index the bam file with samtools (5. step of Alignment)
! samtools index ./P1/mito_yoruba_reads_pe1.sorted.bam

In [5]:
! ls ./P1

human_cambridge_reference_mito.fasta mito_yoruba_reads_pe1.sorted.bam.bai
mito_yoruba_reads_pe1.sorted.bam


Use `mpileup` and `bcftools` to call the SNPs
* mpileup uses **Base Alignment Quality (BAQ)** to down-weight bases that are likely misaligned (for example near indels) and so reduce false SNP calls

In [6]:
# 1. Call SNP
! samtools mpileup --help

samtools: unrecognized option `--help'

Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

Input options:
  -6, --illumina1.3+      quality is in the Illumina-1.3+ encoding
  -A, --count-orphans     do not discard anomalous read pairs
  -b, --bam-list FILE     list of input BAM filenames, one per line
  -B, --no-BAQ            disable BAQ (per-Base Alignment Quality)
  -C, --adjust-MQ INT     adjust mapping quality; recommended:50, disable:0 [0]
  -d, --max-depth INT     max per-file depth; avoids excessive memory usage [8000]
  -E, --redo-BAQ          recalculate BAQ on the fly, ignore existing BQs
  -f, --fasta-ref FILE    faidx indexed reference sequence file
  -G, --exclude-RG FILE   exclude read groups listed in FILE
  -l, --positions FILE    skip unlisted positions (chr pos) or regions (BED)
  -q, --min-MQ INT        skip alignments with mapQ smaller than INT [0]
  -Q, --min-BQ INT        skip bases with baseQ/BAQ smaller than INT [13]
  -r, --region REG        region in 

**bcftools mpileup**

`bcftools mpileup [OPTIONS] -f ref.fa in.bam [in2.bam […]]`

*Generate VCF or BCF containing genotype likelihoods for one or multiple alignment (BAM or CRAM) files* 

This is based on the original samtools mpileup command (with the -v or -g options) producing genotype likelihoods in VCF or BCF format, but not the textual pileup output. The mpileup command was transferred to bcftools in order to avoid errors resulting from use of incompatible versions of samtools and bcftools when using in the mpileup+bcftools call pipeline.

Individuals are identified from the `SM` tags in the `@RG` header lines. Multiple individuals can be pooled in one alignment file, also one individual can be separated into multiple files. If sample identifiers are absent, each input file is regarded as one sample.
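As a rough illustration of how samples are derived from read groups, the sketch below extracts the `SM` values from `@RG` header lines such as those printed by `samtools view -H`. The header shown is invented for the example and the function name `samples_from_header` is hypothetical.

```python
def samples_from_header(header_lines):
    """Return the sorted, de-duplicated SM values found in @RG header lines."""
    samples = set()
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SM":
                samples.add(value)
    return sorted(samples)

# Hypothetical header: two read groups (lanes) from one individual,
# plus one read group from a second individual
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:lane1\tSM:sampleA\tPL:ILLUMINA",
    "@RG\tID:lane2\tSM:sampleA\tPL:ILLUMINA",
    "@RG\tID:lane3\tSM:sampleB\tPL:ILLUMINA",
]
print(samples_from_header(header))
```

Three read groups collapse into two samples here, which is the behaviour described above: read groups that share an `SM` value are treated as the same individual.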

In [10]:
# Test: producing mpileup file to see structure
! bcftools mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam > ./P1/1.test_mpileup.vcf

[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250


In [11]:
! ls ./P1

1.test_mpileup.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [13]:
! cat ./P1/1.test_mpileup.vcf

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.10.2+htslib-1.10.2
##bcftoolsCommand=mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam
##reference=file://./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test 

gi|251831106|ref|NC_012920.1|	91	.	C	<*>	0	.	DP=90;I16=90,0,0,0,1530,26010,0,0,1800,36000,0,0,1505,32379,0,0;QS=1,0;MQ0F=0	PL	0,255,73
gi|251831106|ref|NC_012920.1|	92	.	G	T,<*>	0	.	DP=86;I16=85,0,1,0,1445,24565,17,289,1700,34000,20,400,1486,31912,25,625;QS=0.988372,0.0116279,0;SGB=-0.379885;RPB=1;MQB=1;BQB=1;MQ0F=0	PL	0,239,70,255,73,70
gi|251831106|ref|NC_012920.1|	93	.	A	C,<*>	0	.	DP=87;I16=86,0,1,0,1462,24854,17,289,1720,34400,20,400,1506,32578,9,81;QS=0.988506,0.0114943,0;SGB=-0.379885;RPB=1;MQB=1;BQB=1;MQ0F=0	PL	0,242,70,255,73,71
gi|251831106|ref|NC_012920.1|	94	.	G	<*>	0	.	DP=89;I16=89,0,0,0,1513,25721,0,0,1780,35600,0,0,1516,32640,0,0;QS=1,0;MQ0F=0	PL	0,255,72
gi|251831106|ref|NC_012920.1|	95	.	A	C,<*>	0	.	DP=90;I16=89,0,1,0,1513,25721,17,289,1780,35600,20,400,1511,32463,4,16;QS=0.988889,0.0111111,0;SGB=-0.379885;RPB=1;MQB=1;BQB=1;MQ0F=0	PL	0,251,71,255,74,71
gi|251831106|ref|NC_012920.1|	96	.	C	G,<*>	0	.	DP=88;I16=86,0,2,0,1462,24854,34,578,1720,34400,40,800,1473,31415,4

gi|251831106|ref|NC_012920.1|	1665	.	C	G,A,<*>	0	.	DP=163;I16=71,88,2,2,2700,45858,68,1156,3180,63600,80,1600,2392,47784,73,1425;QS=0.975434,0.0122832,0.0122832,0;VDB=0.516498;SGB=-0.556411;RPB=0.986275;MQB=1;MQSB=1;BQB=0.99977;MQ0F=0	PL	0,255,197,255,200,197,255,203,203,197
gi|251831106|ref|NC_012920.1|	1666	.	T	A,G,<*>	0	.	DP=161;I16=72,87,1,1,2703,45951,31,485,3180,63600,40,800,2451,48939,4,10;QS=0.988661,0.006218,0.0051207,0;VDB=0.98;SGB=-0.453602;RPB=0.940252;MQB=1;MQSB=1;BQB=0.5;MQ0F=0	PL	0,255,199,255,202,199,255,202,202,199
gi|251831106|ref|NC_012920.1|	1667	.	C	G,<*>	0	.	DP=160;I16=72,87,1,0,2702,45918,17,289,3180,63600,20,400,2418,48002,25,625;QS=0.993748,0.0062523,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,200,255,203,200
gi|251831106|ref|NC_012920.1|	1668	.	T	G,<*>	0	.	DP=157;I16=70,86,0,1,2652,45084,17,289,3120,62400,20,400,2408,47714,25,625;QS=0.993631,0.00636943,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,200,255,203,200
gi|251831106|ref|NC

gi|251831106|ref|NC_012920.1|	1912	.	A	C,T,<*>	0	.	DP=180;I16=81,97,1,1,3024,51378,32,514,3560,71200,40,800,2655,53461,2,4;QS=0.989529,0.00556283,0.00490838,0;VDB=0.06;SGB=-0.453602;RPB=0.0393258;MQB=1;MQSB=1;BQB=0.505618;MQ0F=0	PL	0,255,206,255,209,206,255,209,209,206
gi|251831106|ref|NC_012920.1|	1913	.	G	C,T,<*>	0	.	DP=175;I16=79,93,1,2,2924,49708,51,867,3440,68800,60,1200,2590,51884,62,1394;QS=0.982857,0.0114286,0.00571429,0;VDB=0.301053;SGB=-0.511536;RPB=0.749919;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,202,255,207,203,255,208,206,203
gi|251831106|ref|NC_012920.1|	1914	.	A	T,<*>	0	.	DP=179;I16=83,95,0,1,3026,51442,17,289,3560,71200,20,400,2640,53026,8,64;QS=0.994413,0.00558659,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,207,255,210,207
gi|251831106|ref|NC_012920.1|	1915	.	C	G,A,<*>	0	.	DP=177;I16=80,95,1,1,2975,50575,34,578,3500,70000,40,800,2617,52389,33,569;QS=0.988701,0.00564972,0.00564972,0;VDB=0.76;SGB=-0.453602;RPB=0.854286;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,205,25

gi|251831106|ref|NC_012920.1|	2721	.	G	T,C,<*>	0	.	DP=166;I16=71,92,2,1,2771,47107,49,803,3260,65200,60,1200,2626,54062,28,634;QS=0.982624,0.0113475,0.00602837,0;VDB=0.411594;SGB=-0.511536;RPB=0.128853;MQB=1;MQSB=1;BQB=0.613838;MQ0F=0	PL	0,255,201,255,206,202,255,208,205,202
gi|251831106|ref|NC_012920.1|	2722	.	A	T,<*>	0	.	DP=169;I16=77,91,0,1,2856,48552,17,289,3360,67200,20,400,2659,54975,5,25;QS=0.994083,0.00591716,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,203,255,206,203
gi|251831106|ref|NC_012920.1|	2723	.	A	C,<*>	0	.	DP=165;I16=74,90,1,0,2788,47396,17,289,3280,65600,20,400,2663,55085,17,289;QS=0.993939,0.00606061,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,202,255,205,203
gi|251831106|ref|NC_012920.1|	2724	.	G	T,<*>	0	.	DP=164;I16=73,90,1,0,2771,47107,17,289,3260,65200,20,400,2667,54995,25,625;QS=0.993902,0.00609756,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,202,255,205,203
gi|251831106|ref|NC_012920.1|	2725	.	A	T,C,<*>	0	.	DP=164;I16=

gi|251831106|ref|NC_012920.1|	4046	.	A	C,T,<*>	0	.	DP=151;I16=69,77,4,1,2482,42194,85,1445,2920,58400,100,2000,2150,42528,86,1780;QS=0.966887,0.0264901,0.00662252,0;VDB=0.51241;SGB=-0.590765;RPB=0.707462;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,187,255,194,189,255,199,192,190
gi|251831106|ref|NC_012920.1|	4047	.	T	A,<*>	0	.	DP=150;I16=70,79,1,0,2533,43061,17,289,2980,59600,20,400,2224,44026,19,361;QS=0.993333,0.00666667,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,195,255,198,195
gi|251831106|ref|NC_012920.1|	4048	.	G	T,<*>	0	.	DP=151;I16=72,78,1,0,2550,43350,17,289,3000,60000,20,400,2232,44148,21,441;QS=0.993378,0.00662252,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,194,255,197,194
gi|251831106|ref|NC_012920.1|	4049	.	A	<*>	0	.	DP=151;I16=73,78,0,0,2567,43639,0,0,3020,60400,0,0,2264,44918,0,0;QS=1,0;MQSB=1;MQ0F=0	PL	0,255,195
gi|251831106|ref|NC_012920.1|	4050	.	C	G,<*>	0	.	DP=149;I16=70,77,2,0,2499,42483,34,578,2940,58800,40,800,2252,44988,24,386;QS=0.986577,0.

gi|251831106|ref|NC_012920.1|	5103	.	A	C,T,<*>	0	.	DP=154;I16=79,73,1,1,2584,43928,30,458,3040,60800,40,800,2253,43303,25,625;QS=0.988523,0.00650344,0.00497322,0;VDB=0.84;SGB=-0.453602;RPB=0.565789;MQB=1;MQSB=1;BQB=0.5;MQ0F=0	PL	0,255,190,255,193,190,255,193,193,190
gi|251831106|ref|NC_012920.1|	5104	.	C	G,<*>	0	.	DP=154;I16=80,72,2,0,2584,43928,34,578,3040,60800,40,800,2260,43974,20,202;QS=0.987013,0.012987,0;VDB=0.06;SGB=-0.453602;RPB=0.375;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,189,255,195,190
gi|251831106|ref|NC_012920.1|	5105	.	T	A,<*>	0	.	DP=152;I16=78,70,4,0,2516,42772,68,1156,2960,59200,80,1600,2221,43285,64,1218;QS=0.973684,0.0263158,0;VDB=0.100421;SGB=-0.556411;RPB=0.220822;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,183,255,195,186
gi|251831106|ref|NC_012920.1|	5106	.	A	T,<*>	0	.	DP=151;I16=82,68,0,1,2550,43350,17,289,3000,60000,20,400,2264,44232,25,625;QS=0.993378,0.00662252,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,188,255,191,188
gi|251831106|ref|NC_012920.1|	5107	.

gi|251831106|ref|NC_012920.1|	6736	.	T	A,G,<*>	0	.	DP=180;I16=84,89,4,3,2941,49997,119,2023,3460,69200,140,2800,2666,53254,147,3367;QS=0.961111,0.0222222,0.0166667,0;VDB=0.0536475;SGB=-0.636426;RPB=0.654231;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,195,255,200,196,255,207,205,197
gi|251831106|ref|NC_012920.1|	6737	.	A	T,C,<*>	0	.	DP=181;I16=87,91,1,2,3026,51442,51,867,3560,71200,60,1200,2757,55345,45,869;QS=0.983425,0.0110497,0.00552486,0;VDB=0.185719;SGB=-0.511536;RPB=0.293061;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,202,255,206,202,255,208,205,203
gi|251831106|ref|NC_012920.1|	6738	.	T	G,A,<*>	0	.	DP=179;I16=84,92,1,2,2992,50864,51,867,3520,70400,60,1200,2749,55043,42,782;QS=0.98324,0.0111732,0.00558659,0;VDB=0.559209;SGB=-0.511536;RPB=0.446586;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,202,255,206,203,255,208,206,203
gi|251831106|ref|NC_012920.1|	6739	.	C	G,<*>	0	.	DP=183;I16=85,96,2,0,3077,52309,34,578,3620,72400,40,800,2742,54804,36,746;QS=0.989071,0.010929,0;VDB=0.72;SGB=-0.453602;RPB=0.853591;MQB

gi|251831106|ref|NC_012920.1|	8244	.	A	T,<*>	0	.	DP=175;I16=93,79,0,3,2924,49708,48,774,3440,68800,60,1200,2683,53897,40,822;QS=0.983849,0.0161507,0;VDB=0.82085;SGB=-0.511536;RPB=0.733443;MQB=1;MQSB=1;BQB=0.613462;MQ0F=0	PL	0,255,194,255,203,195
gi|251831106|ref|NC_012920.1|	8245	.	A	T,C,<*>	0	.	DP=177;I16=93,82,1,1,2972,50482,34,578,3500,70000,40,800,2689,54093,41,865;QS=0.988689,0.00565536,0.00565536,0;VDB=0.22;SGB=-0.453602;RPB=0.642857;MQB=1;MQSB=1;BQB=0.994286;MQ0F=0	PL	0,255,197,255,200,197,255,200,200,197
gi|251831106|ref|NC_012920.1|	8246	.	A	C,T,<*>	0	.	DP=175;I16=92,80,2,1,2924,49708,51,867,3440,68800,60,1200,2688,54144,52,1210;QS=0.982857,0.0114286,0.00571429,0;VDB=0.460446;SGB=-0.511536;RPB=0.575144;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,195,255,199,195,255,201,198,196
gi|251831106|ref|NC_012920.1|	8247	.	T	G,A,<*>	0	.	DP=174;I16=92,79,1,2,2907,49419,51,867,3420,68400,60,1200,2687,54493,64,1410;QS=0.982759,0.0114943,0.00574713,0;VDB=0.074936;SGB=-0.511536;RPB=0.571467;MQB=1;


*bcftools mpileup* calculates the genotype likelihoods for each position. It does not call variants! What you see is the position, the REF and observed ALT alleles, some mpileup-specific annotations that are used later in variant calling (for their description, see the header of the output), and the likelihoods for each possible genotype.

* CHROM: contig=ID=gi|251831106|ref|NC_012920.1|,length=16569
* POS
* ID
* REF
* ALT: ID=`*`,Description="Represents allele(s) other than observed."
* QUAL 
* FILTER
* INFO: ID=`INDEL`,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL."
* INFO: ID=`IDV`,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel"
* INFO: ID=`IMF`,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel"
* INFO: ID=`DP`,Number=1,Type=Integer,Description="Raw read depth"
* INFO: ID=`VDB`,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3"
* INFO: ID=`RPB`,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)"
* INFO: ID=`MQB`,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)"
* INFO: ID=`BQB`,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)"
* INFO: ID=`MQSB`,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)"
* INFO: ID=`SGB`,Number=1,Type=Float,Description="Segregation based metric."
* INFO: ID=`MQ0F`,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)"
* INFO: ID=`I16`,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h". The fields (0-based positions within the tag) are:
  * depth fwd: ref (0) and non-ref (2)
  * depth rev: ref (1) and non-ref (3)
  * baseQ: ref (4) and non-ref (6)
  * baseQ^2: ref (5) and non-ref (7)
  * mapQ: ref (8) and non-ref (10)
  * mapQ^2: ref (9) and non-ref (11)
  * minDist: ref (12) and non-ref (14)
  * minDist^2: ref (13) and non-ref (15)
* INFO: ID=`QS`,Number=R,Type=Float,Description="Auxiliary tag used for calling"
* FORMAT: ID=`PL`,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods". The PL values are Phred-scaled likelihoods of the possible genotypes, listed in the order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, and so on. The higher the number, the less likely the genotype.
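The `I16` and `PL` fields are easier to read once they are labelled. The sketch below decodes the record at position 92 from the output above (REF `G`, ALT `T,<*>`). The helper names `decode_i16` and `genotype_labels` are hypothetical, and the mean base quality is a derived value computed here for illustration, not something bcftools reports.

```python
def decode_i16(info_value):
    """Label the strand counts in I16 (0-based indices as in the list above)."""
    values = [float(x) for x in info_value.split(",")]
    ref_depth = values[0] + values[1]
    return {
        "ref_fwd": values[0],
        "ref_rev": values[1],
        "alt_fwd": values[2],
        "alt_rev": values[3],
        # values[4] is the sum of reference base qualities
        "mean_ref_baseq": values[4] / max(ref_depth, 1),
    }

def genotype_labels(n_alleles):
    """Diploid genotypes in VCF order: for each k, every j <= k as 'j/k'."""
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]

# Position 92: three alleles (G, T and the placeholder <*>)
i16 = decode_i16("85,0,1,0,1445,24565,17,289,1700,34000,20,400,1486,31912,25,625")
pl = [0, 239, 70, 255, 73, 70]
by_genotype = dict(zip(genotype_labels(3), pl))

print(i16["ref_fwd"], i16["alt_fwd"], i16["mean_ref_baseq"])
print(by_genotype)
```

The genotype with `PL` equal to 0 (here `0/0`, homozygous reference) is the most likely one, which fits the 85 reference reads against a single non-reference read.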

In [2]:
! ls ./P1

1.test_mpileup.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


**bcftools call**

`bcftools call [OPTIONS] FILE`

This command replaces the former `bcftools view` caller. Some of the original functionality has been temporarily lost in the process of transition under htslib, but will be added back on popular demand. The original calling model can be invoked with the `-c` option.


`-c, --consensus-caller`

the original samtools/bcftools calling method

In [3]:
# 1. SNP calling
# Producing mpileup file and calling the variants
! bcftools mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam | bcftools call -cv > ./P1/2.var.raw.vcf

Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250


In [4]:
! ls ./P1

1.test_mpileup.vcf
2.var.raw.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [5]:
! cat ./P1/2.var.raw.vcf

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.10.2+htslib-1.10.2
##bcftoolsCommand=mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam
##reference=file://./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test 

In [6]:
! wc -l ./P1/1.test_mpileup.vcf

   16608 ./P1/1.test_mpileup.vcf


In [7]:
! wc -l ./P1/2.var.raw.vcf

      90 ./P1/2.var.raw.vcf


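Variants have been called from the mpileup output: the file shrinks from 16608 lines (one record per covered position, plus the header) to 90 lines (header plus variant sites only).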
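Note that `wc -l` counts header lines as well as records, so the line counts above overestimate the number of sites. A small sketch of how the two could be separated follows; the function name `count_vcf_lines` and the three-line example are assumptions made for illustration.

```python
def count_vcf_lines(lines):
    """Return (header_lines, record_lines) for an iterable of VCF text lines."""
    header = records = 0
    for line in lines:
        if line.startswith("#"):
            header += 1
        elif line.strip():
            records += 1
    return header, records

# Made-up miniature VCF: one meta line, the column header, one record
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "chrM\t73\t.\tA\tG\t221\t.\tDP=90\tGT:PL\t1/1:255,255,0",
]
print(count_vcf_lines(example))  # → (2, 1)
```

Because the function accepts any iterable of lines, an open file handle for `./P1/2.var.raw.vcf` can be passed in directly to get the number of called variants without the header.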

**bcftools view**

`bcftools view [OPTIONS] file.vcf.gz [REGION […]]`

*View, subset and filter VCF or BCF files by position and filtering expression*

Convert between VCF and BCF. Former bcftools subset.

In [8]:
# 2. SNP filtering
# Test the use of bcftools view
! bcftools view ./P1/2.var.raw.vcf > ./P1/3.var_call.vcf

In [9]:
! ls ./P1

1.test_mpileup.vcf
2.var.raw.vcf
3.var_call.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [10]:
! cat ./P1/3.var_call.vcf

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.10.2+htslib-1.10.2
##bcftoolsCommand=mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam
##reference=file://./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test 

In [11]:
! wc -l ./P1/3.var_call.vcf

      92 ./P1/3.var_call.vcf


In [13]:
# Filtering SNP
! vcfutils.pl


Usage:   vcfutils.pl <command> [<arguments>]

Command: subsam       get a subset of samples
         listsam      list the samples
         fillac       fill the allele count field
         qstats       SNP stats stratified by QUAL

         hapmap2vcf   convert the hapmap format to VCF
         ucscsnp2vcf  convert UCSC SNP SQL dump to VCF

         varFilter    filtering short variants (*)
         vcf2fq       VCF->fastq (**)

Notes: Commands with description endting with (*) may need bcftools
       specific annotations.



In [14]:
#  2. SNP filtering
! bcftools view ./P1/2.var.raw.vcf | vcfutils.pl varFilter -D100 > ./P1/4.var_mito_yoruba_mpileup.flt.vcf

In [15]:
! ls ./P1

1.test_mpileup.vcf
2.var.raw.vcf
3.var_call.vcf
4.var_mito_yoruba_mpileup.flt.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [16]:
! cat ./P1/4.var_mito_yoruba_mpileup.flt.vcf

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.10.2+htslib-1.10.2
##bcftoolsCommand=mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam
##reference=file://./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test 

In [17]:
! wc -l ./P1/4.var_mito_yoruba_mpileup.flt.vcf

      47 ./P1/4.var_mito_yoruba_mpileup.flt.vcf


** ** 

*Freebayes* 

A SNP-calling program based on Bayesian statistics. It can handle individual samples and populations, as well as pooled and polyploid samples.

In [18]:
! freebayes -h

usage: freebayes [OPTION] ... [BAM FILE] ... 

Bayesian haplotype-based polymorphism discovery.

citation: Erik Garrison, Gabor Marth
          "Haplotype-based variant detection from short-read sequencing"
          arXiv:1207.3907 (http://arxiv.org/abs/1207.3907)

overview:

    To call variants from aligned short-read sequencing data, supply BAM files and
    a reference.  FreeBayes will provide VCF output on standard out describing SNPs,
    indels, and complex variants in samples in the input alignments.

    By default, FreeBayes will consider variants supported by at least 2
    observations in a single sample (-C) and also by at least 20% of the reads from
    a single sample (-F).  These settings are suitable to low to high depth
    sequencing in haploid and diploid samples, but users working with polyploid or
    pooled samples may wish to adjust them depending on the characteristics of
    their sequencing data.

    FreeBayes is capable of calling vari

In [19]:
# Call SNP with default parameters
! freebayes -f ./P1/human_cambridge_reference_mito.fasta -b ./P1/mito_yoruba_reads_pe1.sorted.bam  -v ./P1/1A.var_mito_yoruba_freebayes.vcf


In [20]:
! ls ./P1

1.test_mpileup.vcf
1A.var_mito_yoruba_freebayes.vcf
2.var.raw.vcf
3.var_call.vcf
4.var_mito_yoruba_mpileup.flt.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [21]:
! cat ./P1/1A.var_mito_yoruba_freebayes.vcf

##fileformat=VCFv4.2
##fileDate=20200810
##source=freeBayes v1.3.2-dirty
##reference=./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##phasing=none
##commandline="freebayes -f ./P1/human_cambridge_reference_mito.fasta -b ./P1/mito_yoruba_reads_pe1.sorted.bam -v ./P1/1A.var_mito_yoruba_freebayes.vcf"
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=RO,Number=1,Type=Integer,Descript

In [22]:
! wc -l ./P1/1A.var_mito_yoruba_freebayes.vcf

     137 ./P1/1A.var_mito_yoruba_freebayes.vcf


In [24]:
# Change sample ploidy (haploid)
! freebayes -f ./P1/human_cambridge_reference_mito.fasta -b ./P1/mito_yoruba_reads_pe1.sorted.bam -p 1   -v ./P1/2A.var_mito_yoruba_freebayes_haplo.vcf

In [25]:
! ls ./P1/

1.test_mpileup.vcf
1A.var_mito_yoruba_freebayes.vcf
2.var.raw.vcf
2A.var_mito_yoruba_freebayes_haplo.vcf
3.var_call.vcf
4.var_mito_yoruba_mpileup.flt.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [26]:
! cat ./P1/2A.var_mito_yoruba_freebayes_haplo.vcf

##fileformat=VCFv4.2
##fileDate=20200810
##source=freeBayes v1.3.2-dirty
##reference=./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##phasing=none
##commandline="freebayes -f ./P1/human_cambridge_reference_mito.fasta -b ./P1/mito_yoruba_reads_pe1.sorted.bam -p 1 -v ./P1/2A.var_mito_yoruba_freebayes_haplo.vcf"
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=RO,Number=1,Type=Integ

In [27]:
! wc -l ./P1/2A.var_mito_yoruba_freebayes_haplo.vcf

     137 ./P1/2A.var_mito_yoruba_freebayes_haplo.vcf


The result can be improved with quality parameters such as:

`-m --min-mapping-quality Q` Exclude alignments from analysis if they have a mapping quality less than Q. 
* Default: 30 (defaults differ between FreeBayes releases; confirm with `freebayes -h`)

`-q --min-base-quality Q`    Exclude alleles from analysis if their supporting base quality is less than Q. 
* Default: 20 (defaults differ between FreeBayes releases; confirm with `freebayes -h`)

`-C --min-alternate-count N` Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position. 
* Default: 2 according to the help text shown above ("at least 2 observations in a single sample")
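The effect of these three parameters can be pictured with a small sketch: observations are first filtered by mapping and base quality, and an alternate allele is considered only if enough observations remain. This is a simplified picture and not FreeBayes code; the function name `alt_alleles_passing`, its default thresholds and the pileup below are all invented for illustration.

```python
from collections import Counter

def alt_alleles_passing(observations, ref_base, min_mapq=30, min_baseq=20,
                        min_alt_count=2):
    """Return alternate bases that keep enough support after quality filtering.

    observations: iterable of (base, base_quality, mapping_quality) tuples,
    one per read covering the position. Thresholds are illustrative only.
    """
    counts = Counter(
        base
        for base, baseq, mapq in observations
        if mapq >= min_mapq and baseq >= min_baseq
    )
    return sorted(
        base for base, n in counts.items()
        if base != ref_base and n >= min_alt_count
    )

# Made-up pileup at one position with reference base A
obs = (
    [("A", 35, 60)] * 20   # reference reads
    + [("G", 36, 60)] * 3  # well-supported alternate allele
    + [("T", 12, 60)] * 4  # low base quality, removed by min_baseq
    + [("C", 38, 5)] * 5   # low mapping quality, removed by min_mapq
)
print(alt_alleles_passing(obs, "A", min_alt_count=3))
```

Only `G` survives: `T` and `C` have more raw observations than the threshold, but none of them passes the quality filters.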

In [28]:
# Calling SNPs only if detected in 3 or more reads
! freebayes -f ./P1/human_cambridge_reference_mito.fasta -b ./P1/mito_yoruba_reads_pe1.sorted.bam -p 1 -C 3  -v ./P1/3A.var_mito_yoruba_freebayes_haplo_reads3.vcf

In [29]:
! ls ./P1

1.test_mpileup.vcf
1A.var_mito_yoruba_freebayes.vcf
2.var.raw.vcf
2A.var_mito_yoruba_freebayes_haplo.vcf
3.var_call.vcf
3A.var_mito_yoruba_freebayes_haplo_reads3.vcf
4.var_mito_yoruba_mpileup.flt.vcf
human_cambridge_reference_mito.fasta
human_cambridge_reference_mito.fasta.fai
mito_yoruba_reads_pe1.sorted.bam
mito_yoruba_reads_pe1.sorted.bam.bai


In [31]:
! cat ./P1/3A.var_mito_yoruba_freebayes_haplo_reads3.vcf

##fileformat=VCFv4.2
##fileDate=20200810
##source=freeBayes v1.3.2-dirty
##reference=./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##phasing=none
##commandline="freebayes -f ./P1/human_cambridge_reference_mito.fasta -b ./P1/mito_yoruba_reads_pe1.sorted.bam -p 1 -C 3 -v ./P1/3A.var_mito_yoruba_freebayes_haplo_reads3.vcf"
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth at the locus">
##INFO=<ID=DPB,Number=1,Type=Float,Description="Total read depth per bp at the locus; bases in reads overlapping / bases in haplotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=RO,Number=

In [33]:
! wc -l ./P1/3A.var_mito_yoruba_freebayes_haplo_reads3.vcf

     137 ./P1/3A.var_mito_yoruba_freebayes_haplo_reads3.vcf


** ** 

*VarScan*

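A SNP caller that relies on simpler, heuristic statistics, which may be more robust with extreme read depth, pooled samples, and contaminated or impure samples. VarScan applies thresholds on read depth, base quality, variant allele frequency, etc.

The variant calling features of VarScan for single samples (pileup2snp, pileup2indel, pileup2cns) and multiple samples (mpileup2snp, mpileup2indel, mpileup2cns, and somatic) expect input in mpileup format. This is the text pileup written by `samtools mpileup`, not the VCF that `bcftools mpileup` produces, so the file generated below with `bcftools mpileup` would have to be regenerated with `samtools mpileup` before it can be passed to VarScan.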
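A threshold-based caller of this kind can be sketched in a few lines. The function below is not VarScan's implementation; the name `passes_thresholds` and the cut-off values are assumptions chosen for illustration, and the real tool applies further criteria such as base quality and a p-value.

```python
def passes_thresholds(ref_reads, alt_reads, min_coverage=8, min_alt_reads=2,
                      min_var_freq=0.2):
    """Heuristic SNP call from coverage, supporting reads and allele frequency."""
    coverage = ref_reads + alt_reads
    if coverage < min_coverage or alt_reads < min_alt_reads:
        return False
    return alt_reads / coverage >= min_var_freq

# 85 reference reads and 1 alternate read: too little support
print(passes_thresholds(85, 1))
# 40 reference reads and 45 alternate reads: passes every threshold
print(passes_thresholds(40, 45))
# 3 + 3 reads: allele frequency is high but coverage is below the minimum
print(passes_thresholds(3, 3))
```

Unlike the likelihood-based callers above, nothing here models sequencing error explicitly, which is why the choice of thresholds matters so much with this approach.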

In [34]:
# Generating mpileup file
! bcftools mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam > ./P1/1B.mito_yoruba_reads_pe1.mpileup


[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250


In [35]:
! cat ./P1/1B.mito_yoruba_reads_pe1.mpileup

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.10.2+htslib-1.10.2
##bcftoolsCommand=mpileup -f ./P1/human_cambridge_reference_mito.fasta ./P1/mito_yoruba_reads_pe1.sorted.bam
##reference=file://./P1/human_cambridge_reference_mito.fasta
##contig=<ID=gi|251831106|ref|NC_012920.1|,length=16569>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test 

gi|251831106|ref|NC_012920.1|	1182	.	C	A,G,<*>	0	.	DP=177;I16=87,83,3,4,2890,49130,119,2023,3400,68000,140,2800,2658,54386,122,2666;QS=0.960452,0.0225989,0.0169492,0;VDB=0.194874;SGB=-0.636426;RPB=0.185183;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,192,255,196,192,255,204,202,194
gi|251831106|ref|NC_012920.1|	1183	.	T	A,G,<*>	0	.	DP=176;I16=86,86,2,2,2924,49708,66,1092,3440,68800,80,1600,2736,56458,53,1035;QS=0.977926,0.0113712,0.0107023,0;VDB=0.836469;SGB=-0.556411;RPB=0.839935;MQB=1;MQSB=1;BQB=0.694609;MQ0F=0	PL	0,255,198,255,202,198,255,204,204,198
gi|251831106|ref|NC_012920.1|	1184	.	T	G,A,<*>	0	.	DP=172;I16=86,82,1,3,2856,48552,68,1156,3360,67200,80,1600,2730,56354,72,1620;QS=0.976744,0.0174419,0.00581395,0;VDB=0.43429;SGB=-0.556411;RPB=0.676625;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,194,255,200,195,255,203,198,195
gi|251831106|ref|NC_012920.1|	1185	.	C	G,A,<*>	0	.	DP=174;I16=83,84,5,2,2839,48263,119,2023,3340,66800,140,2800,2688,55800,124,2588;QS=0.95977,0.0287356,0.0114943,0;VDB=0.298692

gi|251831106|ref|NC_012920.1|	2015	.	G	C,<*>	0	.	DP=139;I16=72,65,0,2,2329,39593,34,578,2740,54800,40,800,2056,41870,31,565;QS=0.985611,0.0143885,0;VDB=0.78;SGB=-0.453602;RPB=0.868613;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,183,255,189,184
gi|251831106|ref|NC_012920.1|	2016	.	C	A,<*>	0	.	DP=141;I16=73,67,0,1,2380,40460,17,289,2800,56000,20,400,2077,42373,7,49;QS=0.992908,0.0070922,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,186,255,189,187
gi|251831106|ref|NC_012920.1|	2017	.	T	G,A,<*>	0	.	DP=143;I16=72,68,1,2,2380,40460,49,803,2800,56000,60,1200,2036,41252,48,1154;QS=0.979827,0.0139975,0.00617538,0;VDB=0.607612;SGB=-0.511536;RPB=0.572332;MQB=1;MQSB=1;BQB=0.615013;MQ0F=0	PL	0,255,184,255,189,185,255,190,188,185
gi|251831106|ref|NC_012920.1|	2018	.	G	C,<*>	0	.	DP=144;I16=73,68,0,3,2397,40749,49,803,2820,56400,60,1200,2051,41917,36,626;QS=0.979967,0.0200327,0;VDB=0.938134;SGB=-0.511536;RPB=1;MQB=1;MQSB=1;BQB=0.614955;MQ0F=0	PL	0,255,184,255,193,185
gi|251831106|ref|NC_012920.

gi|251831106|ref|NC_012920.1|	3348	.	A	C,<*>	0	.	DP=164;I16=80,80,4,0,2720,46240,68,1156,3200,64000,80,1600,2545,52551,89,2051;QS=0.97561,0.0243902,0;VDB=0.202231;SGB=-0.556411;RPB=0.999943;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,191,255,203,193
gi|251831106|ref|NC_012920.1|	3349	.	A	T,C,<*>	0	.	DP=168;I16=88,76,1,3,2788,47396,68,1156,3280,65600,80,1600,2557,53287,65,1191;QS=0.97619,0.0178571,0.00595238,0;VDB=0.66971;SGB=-0.556411;RPB=0.809444;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,190,255,196,191,255,199,194,191
gi|251831106|ref|NC_012920.1|	3350	.	T	A,G,<*>	0	.	DP=165;I16=83,77,3,2,2720,46240,85,1445,3200,64000,100,2000,2555,53511,57,713;QS=0.969697,0.0181818,0.0121212,0;VDB=0.919254;SGB=-0.590765;RPB=0.813951;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,189,255,194,190,255,198,196,191
gi|251831106|ref|NC_012920.1|	3351	.	C	G,<*>	0	.	DP=165;I16=86,77,2,0,2771,47107,34,578,3260,65200,40,800,2573,53337,30,650;QS=0.987879,0.0121212,0;VDB=0.64;SGB=-0.453602;RPB=0.579755;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,25

gi|251831106|ref|NC_012920.1|	5023	.	C	A,G,<*>	0	.	DP=166;I16=83,80,1,2,2771,47107,51,867,3260,65200,60,1200,2722,56954,40,736;QS=0.981928,0.0120482,0.0060241,0;VDB=0.833544;SGB=-0.511536;RPB=0.861757;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,194,255,198,194,255,200,197,195
gi|251831106|ref|NC_012920.1|	5024	.	C	A,<*>	0	.	DP=168;I16=87,79,0,2,2822,47974,34,578,3320,66400,40,800,2719,56819,41,881;QS=0.988095,0.0119048,0;VDB=0.32;SGB=-0.453602;RPB=0.563253;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,195,255,201,195
gi|251831106|ref|NC_012920.1|	5025	.	C	G,A,<*>	0	.	DP=170;I16=89,79,1,1,2856,48552,34,578,3360,67200,40,800,2722,56904,33,689;QS=0.988235,0.00588235,0.00588235,0;VDB=0.64;SGB=-0.453602;RPB=0.607143;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,195,255,198,195,255,198,198,195
gi|251831106|ref|NC_012920.1|	5026	.	A	C,<*>	0	.	DP=166;I16=86,79,1,0,2805,47685,17,289,3300,66000,20,400,2723,56647,25,625;QS=0.993976,0.0060241,0;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,196,255,199,196
gi|251831

gi|251831106|ref|NC_012920.1|	6755	.	G	T,C,<*>	0	.	DP=165;I16=77,86,1,1,2767,46979,34,578,3260,65200,40,800,2592,53288,29,601;QS=0.987861,0.00606926,0.00606926,0;VDB=0.48;SGB=-0.453602;RPB=0.47546;MQB=1;MQSB=1;BQB=0.98773;MQ0F=0	PL	0,255,198,255,201,198,255,201,201,198
gi|251831106|ref|NC_012920.1|	6756	.	T	A,G,<*>	0	.	DP=163;I16=74,84,4,1,2686,45662,85,1445,3160,63200,100,2000,2546,52462,62,1186;QS=0.969325,0.0245399,0.00613497,0;VDB=0.60156;SGB=-0.590765;RPB=0.272722;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,192,255,200,195,255,204,198,195
gi|251831106|ref|NC_012920.1|	6757	.	T	A,G,<*>	0	.	DP=162;I16=73,83,4,2,2652,45084,102,1734,3120,62400,120,2400,2512,51882,83,1501;QS=0.962963,0.0246914,0.0123457,0;VDB=0.752618;SGB=-0.616816;RPB=0.794415;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,190,255,196,192,255,202,198,193
gi|251831106|ref|NC_012920.1|	6758	.	T	G,<*>	0	.	DP=161;I16=77,82,0,2,2703,45951,34,578,3180,63600,40,800,2539,52159,44,986;QS=0.987578,0.0124224,0;VDB=0.58;SGB=-0.453602;RPB=0.827044;

gi|251831106|ref|NC_012920.1|	8360	.	A	T,C,<*>	0	.	DP=156;I16=69,82,2,3,2567,43639,85,1445,3020,60400,100,2000,2242,44794,74,1484;QS=0.967949,0.0192308,0.0128205,0;VDB=0.831179;SGB=-0.590765;RPB=0.995908;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,191,255,196,192,255,200,198,193
gi|251831106|ref|NC_012920.1|	8361	.	G	T,<*>	0	.	DP=157;I16=69,86,2,0,2635,44795,34,578,3100,62000,40,800,2293,45723,26,626;QS=0.987261,0.0127389,0;VDB=0.74;SGB=-0.453602;RPB=0.541936;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,198,255,204,199
gi|251831106|ref|NC_012920.1|	8362	.	T	A,G,<*>	0	.	DP=155;I16=69,82,3,1,2567,43639,68,1156,3020,60400,80,1600,2272,45370,56,1174;QS=0.974194,0.0193548,0.00645161,0;VDB=0.865924;SGB=-0.556411;RPB=0.987597;MQB=1;MQSB=1;BQB=1;MQ0F=0	PL	0,255,192,255,198,194,255,201,197,194
gi|251831106|ref|NC_012920.1|	8363	.	G	<*>	0	.	DP=156;I16=74,82,0,0,2652,45084,0,0,3120,62400,0,0,2340,46860,0,0;QS=1,0;MQSB=1;MQ0F=0	PL	0,255,198
gi|251831106|ref|NC_012920.1|	8364	.	A	T,C,<*>	0	.	DP=159;I16=73,82,2,2,




In [36]:
! wc -l ./P1/1B.mito_yoruba_reads_pe1.mpileup

   16608 ./P1/1B.mito_yoruba_reads_pe1.mpileup


In [37]:
# Check VarScan options
! varscan

VarScan v2.4.4

***NON-COMMERCIAL VERSION***

USAGE: java -jar VarScan.jar [COMMAND] [OPTIONS] 

COMMANDS:
	pileup2snp		Identify SNPs from a pileup file
	pileup2indel		Identify indels a pileup file
	pileup2cns		Call consensus and variants from a pileup file
	mpileup2snp		Identify SNPs from an mpileup file
	mpileup2indel		Identify indels an mpileup file
	mpileup2cns		Call consensus and variants from an mpileup file

	somatic			Call germline/somatic variants from tumor-normal pileups
	copynumber			Determine relative tumor copy number from tumor-normal pileups
	readcounts		Obtain read counts for a list of variants from a pileup file

	filter			Filter SNPs by coverage, frequency, p-value, etc.
	somaticFilter		Filter somatic variants for clusters/indels
	fpfilter		Apply the false-positive filter

	processSomatic		Isolate Germline/LOH/Somatic calls from output
	copyCaller		GC-adjust and process copy number changes from VarScan copynumber output
	compare			Compare two 

In [38]:
# Check mpileup2snp options (calls SNPs, but not indels)
! varscan mpileup2snp -h

Only SNPs will be reported
Min coverage:	8
Min reads2:	2
Min var freq:	0.2
Min avg qual:	15
P-value thresh:	0.01
USAGE: java -jar VarScan.jar mpileup2cns [pileup file] OPTIONS
	mpileup file - The SAMtools mpileup file

	OPTIONS:
	--min-coverage	Minimum read depth at a position to make a call [8]
	--min-reads2	Minimum supporting reads at a position to call variants [2]
	--min-avg-qual	Minimum base quality at a position to count a read [15]
	--min-var-freq	Minimum variant allele frequency threshold [0.01]
	--min-freq-for-hom	Minimum frequency to call homozygote [0.75]
	--p-value	Default p-value threshold for calling variants [99e-02]
	--strand-filter	Ignore variants with >90% support on one strand [1]
	--output-vcf	If set to 1, outputs in VCF format
	--vcf-sample-list	For VCF output, a list of sample names in order, one per line
	--variants	Report only variant (SNP/indel) positions [0]


In [39]:
# Call variants
! varscan mpileup2snp ./P1/1B.mito_yoruba_reads_pe1.mpileup --output-vcf 1 > ./P1/2B.var_mito_yoruba_varscan.vcf

Only SNPs will be reported
Min coverage:	8
Min reads2:	2
Min var freq:	0.2
Min avg qual:	15
P-value thresh:	0.01
Reading input from ./P1/1B.mito_yoruba_reads_pe1.mpileup
##fileformat=VCFv4.2



In [40]:
! wc -l ./P1/2B.var_mito_yoruba_varscan.vcf

      24 ./P1/2B.var_mito_yoruba_varscan.vcf


The `--strand-filter` option is enabled by default (`[1]`): variants with >90% support on one strand are ignored.

Repeat the variant calling with the strand filter disabled (`--strand-filter 0`).

In [46]:
# Variant calling with no strand filter
! varscan mpileup2snp ./P1/1B.mito_yoruba_reads_pe1.mpileup --strand-filter 0 --output-vcf 1 > ./P1/3B.var_mito_yoruba_snps_varscan_not_filter_strand.vcf

Only SNPs will be reported
Min coverage:	8
Min reads2:	2
Min var freq:	0.2
Min avg qual:	15
P-value thresh:	0.01
Reading input from ./P1/1B.mito_yoruba_reads_pe1.mpileup
##fileformat=VCFv4.2



In [47]:
! wc -l ./P1/3B.var_mito_yoruba_snps_varscan_not_filter_strand.vcf

      24 ./P1/3B.var_mito_yoruba_snps_varscan_not_filter_strand.vcf


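No difference between the two runs: both output files have 24 lines. Because the input was written by `bcftools mpileup` in VCF format (VarScan echoes its first line, `##fileformat=VCFv4.2`, above) rather than the text pileup from `samtools mpileup` that VarScan expects, these 24 lines are probably only the VCF header, so this comparison does not show whether the strand filter affects the calls.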

** ** 

*calmd*

`samtools calmd` calculates the MD and NM tags:
* `MD`: string encoding mismatched and deleted reference bases
* `NM`: integer edit distance to the reference

If the MD tag is already present, the command warns when the newly generated tag differs from the existing one. Output is SAM by default.

**MD**:Z:`[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*`

String encoding mismatched and deleted reference bases, used in conjunction with the CIGAR and SEQ fields to reconstruct the bases of the reference sequence interval to which the alignment has been mapped. This can enable variant calling without requiring access to the entire original reference.

The MD string consists of the following items, concatenated without additional delimiter characters:
* `[0-9]+`, indicating a run of reference bases that are identical to the corresponding SEQ bases;
* `[A-Z]`, identifying a single reference base that differs from the SEQ base aligned at that position;
* `^[A-Z]+`, identifying a run of reference bases that have been deleted in the alignment.

As shown in the complete regular expression above, numbers alternate with the other items. Thus if two mismatches or deletions are adjacent without a run of identical bases between them, a `0` (indicating a 0-length run) must be used to separate them in the MD string.

Clipping, padding, reference skips, and insertions (`H`, `S`, `P`, `N`, and `I` CIGAR operations) are not represented in the MD string. When reconstructing the reference sequence, inserted and soft-clipped SEQ bases are omitted as determined by tracking `I` and `S` operations in the CIGAR string. (If the CIGAR string contains `N` operations, then the corresponding skipped parts of the reference sequence cannot be reconstructed.)

For example, the string `10A5^AC6` means that, starting from the leftmost reference base in the alignment, there are 10 matches followed by an A in the reference that differs from the aligned read base; the next 5 reference bases are matches, followed by a 2 bp deletion from the reference (the deleted sequence is AC); the last 6 bases are matches.
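
The MD grammar described above can be sketched as a small parser. This is an illustrative helper, not part of samtools; the names `parse_md`, `MD_PATTERN`, and `MD_TOKEN` are assumptions made for the example. It validates the string against the regular expression given above and splits it into match runs, mismatches, and deletions.

```python
import re

# Full-string grammar of the MD tag, as given in the text above.
MD_PATTERN = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")
# One token: a number, a deletion run (^ACGT...), or a single mismatched base.
MD_TOKEN = re.compile(r"([0-9]+)|(\^[A-Z]+)|([A-Z])")


def parse_md(md):
    """Split an MD string into (kind, value) items.

    kind is "match" (value = run length), "mismatch" (value = reference base)
    or "deletion" (value = deleted reference bases). Zero-length match runs,
    which only act as separators, are dropped.
    """
    if MD_PATTERN.fullmatch(md) is None:
        raise ValueError(f"not a valid MD string: {md!r}")
    items = []
    for number, deletion, mismatch in MD_TOKEN.findall(md):
        if number:
            if int(number) > 0:
                items.append(("match", int(number)))
        elif deletion:
            items.append(("deletion", deletion[1:]))  # strip the leading ^
        else:
            items.append(("mismatch", mismatch))
    return items


print(parse_md("10A5^AC6"))
print(parse_md("5^AC0T3"))
```

The second call shows why the `0` separator matters: without it, `^ACT` would read as a 3-base deletion rather than a deletion of `AC` followed by a mismatch at `T`.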


**NM**:i:`count`

Number of differences (mismatches plus inserted and deleted bases) between the sequence and reference, counting only (case-insensitive) A, C, G and T bases in sequence and reference as potential matches, with everything else being a mismatch. 

Note this means that ambiguity codes in both sequence and reference that match each other, such as `N` in both, or compatible codes such as `A` and `R`, are still counted as mismatches. The special sequence base `=` will always be considered to be a match, even if the reference is ambiguous at that point. 

Alignment reference skips, padding, soft and hard clipping (`N`, `P`, `S` and `H` CIGAR operations) do not count as mismatches, but insertions and deletions count as one mismatch per base.
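
As a minimal sketch of this definition, the edit distance can be estimated from a CIGAR string and an MD string: mismatches and deleted bases are read from MD, inserted bases from the `I` operations in CIGAR. The function name `edit_distance` is hypothetical, and the sketch assumes the MD tag is correct and already marks every non-matching base (including ambiguity codes) as a mismatch; it is not how samtools computes NM internally.

```python
import re


def edit_distance(cigar, md):
    """Estimate NM as mismatches + deleted bases (from MD) + inserted bases (from CIGAR)."""
    # Inserted bases: sum of the lengths of all I operations in the CIGAR string.
    inserted = sum(
        int(length)
        for length, op in re.findall(r"([0-9]+)([MIDNSHP=X])", cigar)
        if op == "I"
    )
    # Deleted bases: letters in each ^RUN of the MD string (minus the ^ itself).
    deleted = sum(len(run) - 1 for run in re.findall(r"\^[A-Z]+", md))
    # Mismatches: remaining letters once the deletion runs are removed.
    mismatches = len(re.findall(r"[A-Z]", re.sub(r"\^[A-Z]+", "", md)))
    return mismatches + inserted + deleted


# 16 aligned bases with one mismatch, a 2 bp deletion, then 6 matches.
print(edit_distance("16M2D6M", "10A5^AC6"))
# A 2 bp insertion and no mismatches.
print(edit_distance("5M2I5M", "10"))
```

Clipping, padding, and reference skips (`S`, `H`, `P`, `N`) are ignored by construction, since only `I` operations are read from the CIGAR string.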

*SNP calling before calmd processing*

In [52]:
! ls ./P2

alignments.bam ref.fasta


In [53]:
# 1. Index the alignment
! samtools index ./P2/alignments.bam

In [54]:
! ls ./P2

alignments.bam     alignments.bam.bai ref.fasta


In [73]:
# 2. Call variants with freebayes
! freebayes -f ./P2/ref.fasta --min-base-quality 20  ./P2/alignments.bam | bgzip > ./P2/1.alignments.vcf.gz

In [56]:
! ls ./P2

1.alignments.vcf.gz alignments.bam.bai  ref.fasta.fai
alignments.bam      ref.fasta


**tabix**

Generic indexer for TAB-delimited genome position files. 

Tabix indexes a TAB-delimited genome position file `in.tab.bgz` and creates an index file (`in.tab.bgz.tbi` or `in.tab.bgz.csi`) when region is absent from the command-line. The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface.

After indexing, tabix is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". (Coordinates specified in this region format are 1-based and inclusive.)
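
To make the coordinate convention concrete, the sketch below parses a region string of the form "chr:beginPos-endPos" and converts it from 1-based inclusive coordinates to a 0-based half-open interval, then tests whether a record overlaps it. The function names are hypothetical and the code only illustrates the convention; it does not reproduce tabix's index lookup.

```python
def parse_region(region):
    """Parse "chr:beginPos-endPos" (1-based, inclusive) into (chrom, start0, end0).

    The returned interval is 0-based and half-open, the convention used by
    Python slicing and BED files.
    """
    # rpartition keeps any ":" inside the sequence name itself.
    chrom, _, span = region.rpartition(":")
    begin, _, end = span.partition("-")
    begin, end = int(begin), int(end)
    if not chrom or begin < 1 or end < begin:
        raise ValueError(f"invalid region: {region!r}")
    return chrom, begin - 1, end


def overlaps(record_pos, ref_len, start0, end0):
    """True if a record starting at 1-based record_pos and spanning ref_len
    reference bases overlaps the half-open interval [start0, end0)."""
    rec_start0 = record_pos - 1
    return rec_start0 < end0 and rec_start0 + ref_len > start0


chrom, start0, end0 = parse_region("I:7950020-7950021")
print(chrom, start0, end0)
# A record at position 7950020 whose REF allele is 5 bases long
print(overlaps(7950020, len("GCCAG"), start0, end0))
```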


In [58]:
# Options
! tabix


Version: 1.10.2
Usage:   tabix [OPTIONS] [FILE] [REGION [...]]

Indexing Options:
   -0, --zero-based           coordinates are zero-based
   -b, --begin INT            column number for region start [4]
   -c, --comment CHAR         skip comment lines starting with CHAR [null]
   -C, --csi                  generate CSI index for VCF (default is TBI)
   -e, --end INT              column number for region end (if no end, set INT to -b) [5]
   -f, --force                overwrite existing index without asking
   -m, --min-shift INT        set minimal interval size for CSI indices to 2^INT [14]
   -p, --preset STR           gff, bed, sam, vcf
   -s, --sequence INT         column number for sequence names (suppressed by -p) [1]
   -S, --skip-lines INT       skip first INT lines [0]

Querying and other options:
   -h, --print-header         print also the header lines
   -H, --only-header          print only the header lines
   -l, --list-chroms          list chromosome 

In [74]:
# 3. Index the file 
! tabix -p vcf ./P2/1.alignments.vcf.gz

In [60]:
! ls ./P2

1.alignments.vcf.gz     alignments.bam          ref.fasta
1.alignments.vcf.gz.tbi alignments.bam.bai      ref.fasta.fai


Extract a specific position from the file

In [76]:
! tabix ./P2/1.alignments.vcf.gz I:7950020-7950021

I	7950020	.	GCCAG	GCCCAG	9817.24	.	AB=0;ABP=0;AC=14;AF=1;AN=14;AO=316;CIGAR=1M1I4M;DP=322;DPB=385.4;DPRA=0;EPP=10.047;EPPR=0;GTI=0;LEN=1;MEANALT=1.71429;MQM=51.6234;MQMR=0;NS=7;NUMALT=1;ODDS=47.9169;PAIRED=0.993671;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=11163;QR=0;RO=0;RPL=152;RPP=3.99983;RPPR=0;RPR=164;RUN=1;SAF=170;SAP=6.96843;SAR=146;SRF=0;SRP=0;SRR=0;TYPE=ins;technology.illumina=1	GT:DP:AD:RO:QR:AO:QA:GL	1/1:44:0,44:0:0:44:1578:-139.622,-13.2453,0	1/1:59:0,58:0:0:58:2062:-182.758,-17.4597,0	1/1:53:0,53:0:0:53:1847:-162.406,-15.9546,0	1/1:52:0,51:0:0:51:1843:-162.795,-15.3525,0	1/1:26:0,25:0:0:25:892:-79.1411,-7.52575,0	1/1:43:0,43:0:0:43:1473:-130.344,-12.9443,0	1/1:45:0,42:0:0:42:1468:-129.898,-12.6433,0


*SNP calling after calmd processing*

In [77]:
! samtools calmd

Usage: samtools calmd [-eubrAESQ] <aln.bam> <ref.fasta>
Options:
  -e       change identical bases to '='
  -u       uncompressed BAM output (for piping)
  -b       compressed BAM output
  -S       ignored (input format is auto-detected)
  -A       modify the quality string
  -Q       use quiet mode to output less debug info to stdout
  -r       compute the BQ tag (without -A) or cap baseQ by BAQ (with -A)
  -E       extended BAQ for better sensitivity but lower specificity
  --no-PG  do not add a PG line
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
      --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, 

In [82]:
# 1. Process alignment with calmd
! samtools calmd -Arb ./P2/alignments.bam ./P2/ref.fasta > ./P2/1A.alignments.calmd.bam

In [83]:
! ls ./P2/

1.alignments.vcf.gz     alignments.bam          ref.fasta.fai
1.alignments.vcf.gz.tbi alignments.bam.bai
1A.alignments.calmd.bam ref.fasta


In [86]:
# 2. Index the processed alignment file
! samtools index ./P2/1A.alignments.calmd.bam

In [87]:
! ls ./P2

1.alignments.vcf.gz         alignments.bam
1.alignments.vcf.gz.tbi     alignments.bam.bai
1A.alignments.calmd.bam     ref.fasta
1A.alignments.calmd.bam.bai ref.fasta.fai


In [88]:
# 3. Call the variants with Freebayes
! freebayes -f ./P2/ref.fasta --min-base-quality 20  ./P2/1A.alignments.calmd.bam | bgzip > ./P2/2B.alignments.calmd.vcf.gz

In [89]:
! ls ./P2

1.alignments.vcf.gz         alignments.bam
1.alignments.vcf.gz.tbi     alignments.bam.bai
1A.alignments.calmd.bam     ref.fasta
1A.alignments.calmd.bam.bai ref.fasta.fai
2B.alignments.calmd.vcf.gz


In [90]:
# 4. Index the file
! tabix -p vcf ./P2/2B.alignments.calmd.vcf.gz

In [91]:
! ls ./P2

1.alignments.vcf.gz            2B.alignments.calmd.vcf.gz.tbi
1.alignments.vcf.gz.tbi        alignments.bam
1A.alignments.calmd.bam        alignments.bam.bai
1A.alignments.calmd.bam.bai    ref.fasta
2B.alignments.calmd.vcf.gz     ref.fasta.fai


Look at the same position in the new file

In [94]:
! tabix ./P2/2B.alignments.calmd.vcf.gz I:7950020-7950021

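The query returns no records: the insertion reported at `I:7950020` before calmd processing is not called from the calmd-processed alignment.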

*SNP filtering*

In [1]:
! ls ./P3

ril.vcf.gz


In [4]:
# Index the file
! tabix -p vcf ./P3/ril.vcf.gz

In [5]:
! ls ./P3

ril.vcf.gz     ril.vcf.gz.tbi


In [1]:
! vcffilter

usage: vcffilter [options] <vcf file>

options:
    -f, --info-filter     specifies a filter to apply to the info fields of records,
                          removes alleles which do not pass the filter
    -g, --genotype-filter specifies a filter to apply to the genotype fields of records
    -k, --keep-info       used in conjunction with '-g', keeps variant info, but removes genotype
    -s, --filter-sites    filter entire records, not just alleles
    -t, --tag-pass        tag vcf records as positively filtered with this tag, print all records
    -F, --tag-fail        tag vcf records as negatively filtered with this tag, print all records
    -A, --append-filter   append the existing filter tag, don't just replace it
    -a, --allele-tag      apply -t on a per-allele basis.  adds or sets the corresponding INFO field tag
    -v, --invert          inverts the filter, e.g. grep -v
    -o, --or              use logical OR instead of AND to combine filters
    -r, --regio

In [2]:
# Quality filter 
! vcffilter -f "QUAL > 10" ./P3/ril.vcf.gz | bgzip > ./P3/1.filtered.vcf.gz

In [3]:
! ls ./P3

1.filtered.vcf.gz ril.vcf.gz        ril.vcf.gz.tbi
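
The idea behind the quality filter, keeping only records whose QUAL column exceeds 10, can be sketched in plain Python. This is an illustration and not how `vcffilter` is implemented; the function name, the example records, and the choice to drop records with a missing QUAL (`.`) are all assumptions.

```python
def filter_by_qual(vcf_lines, min_qual=10.0):
    """Yield header lines unchanged and records whose QUAL exceeds min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        qual = line.split("\t")[5]  # QUAL is the sixth VCF column
        # Assumption: a missing QUAL (".") is treated as failing the filter.
        if qual != "." and float(qual) > min_qual:
            yield line


# Hypothetical records, not taken from ril.vcf.gz
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrA\t100\t.\tA\tG\t35.2\t.\t.",
    "chrA\t200\t.\tC\tT\t4.1\t.\t.",
    "chrA\t300\t.\tG\tA\t.\t.\t.",
]
kept = list(filter_by_qual(example))
print(len(kept))
```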


In [4]:
! vcftools


VCFtools (0.1.16)
© Adam Auton and Anthony Marcketta 2009

Process Variant Call Format files

For a list of options, please go to:
	https://vcftools.github.io/man_latest.html

Alternatively, a man page is available, type:
	man vcftools

Questions, comments, and suggestions should be emailed to:
	vcftools-help@lists.sourceforge.net



In [5]:
! man vcftools

vcftools(man)                    2 August 2018                   vcftools(man)



NNAAMMEE
       vcftools  v0.1.16  -  Utilities  for  the variant call format (VCF) and
       binary variant call format (BCF)

SSYYNNOOPPSSIISS
       vvccffttoooollss [ ----vvccff FILE | ----ggzzvvccff FILE | ----bbccff FILE] [ ----oouutt OUTPUT  PRE-
       FIX ] [ FILTERING OPTIONS ]  [ OUTPUT OPTIONS ]

DDEESSCCRRIIPPTTIIOONN
       vcftools  is  a suite of functions for use on genetic variation data in
       the form of VCF and BCF files. The tools provided will be  used  mainly
       to  summarize data, run calculations on data, filter out data, and con-
       vert data into other useful file formats.

EEXXAAMMPPLLEESS
       Output allele frequency for all sites in the input vcf file from  chro-
       mosome 1
         vvccffttoooollss --gzvcf input_file.vcf.gz --freq --chr 1 --out chr1_analysis

     

In [6]:
# Remove SNPs with quality < 10
! vcftools --gzvcf ./P3/ril.vcf.gz --recode --recode-INFO-all --minQ 10 --stdout | gzip -c > ./P3/2.ril_vcf_Q10only.vcf.gz

In [7]:
! ls ./P3

1.filtered.vcf.gz        ril.vcf.gz
2.ril_vcf_Q10only.vcf.gz ril.vcf.gz.tbi


In [9]:
# Remove SNPs with more than 20% missing data (--max-missing 0.8 keeps sites genotyped in at least 80% of individuals)
! vcftools --gzvcf ./P3/ril.vcf.gz --recode --recode-INFO-all --max-missing 0.8 --stdout | gzip -c > ./P3/3.ril_vcf_0.8_missing.vcf.gz

In [10]:
! ls ./P3

1.filtered.vcf.gz            ril.vcf.gz
2.ril_vcf_Q10only.vcf.gz     ril.vcf.gz.tbi
3.ril_vcf_0.8_missing.vcf.gz


In [11]:
# Select SNPs in a given region
! vcftools --gzvcf ./P3/ril.vcf.gz --recode --recode-INFO-all --chr CP4_pseudomolecule00 --from-bp 161000 --to-bp 245000 --stdout |  gzip -c > ./P3/4.ril_vcf_candidate-region.vcf.gz

In [12]:
! ls ./P3

1.filtered.vcf.gz                 4.ril_vcf_candidate-region.vcf.gz
2.ril_vcf_Q10only.vcf.gz          ril.vcf.gz
3.ril_vcf_0.8_missing.vcf.gz      ril.vcf.gz.tbi


In [14]:
# Calculate the observed heterozygosity per individual across all SNPs
! vcftools --gzvcf ./P3/ril.vcf.gz   --het --out ./P3/4.ril


VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--gzvcf ./P3/ril.vcf.gz
	--het
	--out ril

Using zlib version: 1.2.11
After filtering, kept 153 out of 153 Individuals
Outputting Individual Heterozygosity
	Individual Heterozygosity: Only using fully diploid SNPs.
	Individual Heterozygosity: Only using biallelic SNPs.
After filtering, kept 943 out of a possible 943 Sites
Run Time = 0.00 seconds


In [15]:
! ls ./P3

1.filtered.vcf.gz                 4.ril.log
2.ril_vcf_Q10only.vcf.gz          4.ril_vcf_candidate-region.vcf.gz
3.ril_vcf_0.8_missing.vcf.gz      ril.vcf.gz
4.ril.het                         ril.vcf.gz.tbi


In [16]:
! cat ./P3/4.ril.het

INDV	O(HOM)	E(HOM)	N_SITES	F
1_14_1_gbs	102	79.8	112	0.68911
1_17_1_gbs	103	79.8	112	0.72020
1_18_4_gbs	108	79.8	112	0.87565
1_19_4_gbs	108	79.8	112	0.87565
1_26_1_gbs	111	79.8	112	0.96891
1_27_1_gbs	98	79.8	112	0.56476
1_2_2_gbs	99	79.8	112	0.59585
1_35_13_gbs	106	79.8	112	0.81347
1_35_2_gbs	111	79.8	112	0.96891
1_3_2_gbs	104	79.8	112	0.75129
1_50_1_gbs	105	79.8	112	0.78238
1_59_1_gbs	106	79.8	112	0.81347
1_59_2_gbs	90	79.8	112	0.31605
1_63_4_gbs	108	79.8	112	0.87565
1_6_1_gbs	105	79.8	112	0.78238
1_6_2_gbs	110	79.8	112	0.93782
1_70_1_gbs	104	79.8	112	0.75129
1_74_1_gbs	106	79.8	112	0.81347
1_74_2_gbs	106	79.8	112	0.81347
1_79_1_gbs	107	79.8	112	0.84456
1_7_2_gbs	108	79.8	112	0.87565
1_81_10_gbs	109	79.8	112	0.90673
1_86_1_gbs	105	79.8	112	0.78238
1_8_2_gbs	104	79.8	112	0.75129
1_91_2_gbs	107	79.8	112	0.84456
1_94_2_gbs	106	79.8	112	0.81347
1_94_4_gbs	110	79.8	112	0.93782
2_107_1_gbs	109	79.8	112	0.90673
2_10_2_gbs	107	79.8	112	0.84456
2_116_1_gbs	107	79.

In [17]:
# Calculate SNP density
! vcftools --gzvcf ./P3/ril.vcf.gz   --SNPdensity 10000 --out ./P3/5.ril


VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--gzvcf ./P3/ril.vcf.gz
	--out ./P3/5.ril
	--SNPdensity 10000

Using zlib version: 1.2.11
After filtering, kept 153 out of 153 Individuals
Outputting SNP density
After filtering, kept 943 out of a possible 943 Sites
Run Time = 0.00 seconds


In [18]:
! ls ./P3

1.filtered.vcf.gz                 4.ril_vcf_candidate-region.vcf.gz
2.ril_vcf_Q10only.vcf.gz          5.ril.log
3.ril_vcf_0.8_missing.vcf.gz      5.ril.snpden
4.ril.het                         ril.vcf.gz
4.ril.log                         ril.vcf.gz.tbi


In [19]:
! cat ./P3/5.ril.snpden

CHROM	BIN_START	SNP_COUNT	VARIANTS/KB
CP4_pseudomolecule00	10000	6	0.6
CP4_pseudomolecule00	20000	1	0.1
CP4_pseudomolecule00	30000	1	0.1
CP4_pseudomolecule00	40000	6	0.6
CP4_pseudomolecule00	50000	0	0
CP4_pseudomolecule00	60000	3	0.3
CP4_pseudomolecule00	70000	0	0
CP4_pseudomolecule00	80000	0	0
CP4_pseudomolecule00	90000	0	0
CP4_pseudomolecule00	100000	0	0
CP4_pseudomolecule00	110000	1	0.1
CP4_pseudomolecule00	120000	0	0
CP4_pseudomolecule00	130000	0	0
CP4_pseudomolecule00	140000	3	0.3
CP4_pseudomolecule00	150000	5	0.5
CP4_pseudomolecule00	160000	12	1.2
CP4_pseudomolecule00	170000	1	0.1
CP4_pseudomolecule00	180000	1	0.1
CP4_pseudomolecule00	190000	0	0
CP4_pseudomolecule00	200000	0	0
CP4_pseudomolecule00	210000	0	0
CP4_pseudomolecule00	220000	1	0.1
CP4_pseudomolecule00	230000	6	0.6
CP4_pseudomolecule00	240000	2	0.2
CP4_pseudomolecule00	250000	10	1
CP4_pseudomolecule00	260000	1	0.1
CP4_pseudomolecule00	270000	3	0.3
CP4_pseudomolecule00	280000	2	0.2
CP4_pseudo