 ### IGV, Integrative Genomics Viewer
 Have you used IGV before? Probably. It's an awesome way to see your reads stacked up against a reference!
 
 Using the E. coli reference genome, we will compare the output of the different chemistries, the different basecalling algorithms used for the same chemistry, and the reads from a nanopolished file.
 
 IGV is a great way to see what types of errors occur in nanopore reads. Homopolymers cause problems, as you should be able to observe. Most errors in nanopore reads are insertions or deletions, which arise because it is hard to tell whether a change in the current comes from a nucleotide or is just noise.
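
 To know where to look for homopolymer trouble in IGV, it can help to list the homopolymer runs in a stretch of sequence first. The sketch below is only an illustration: the helper name `find_homopolymers`, the `min_length` threshold of 4 and the example sequence are all made up for this notebook and are not part of IGV or samtools.

```python
import re

def find_homopolymers(sequence, min_length=4):
    """Return (start, base, run_length) for each run of one repeated base.

    Hypothetical helper; min_length=4 is an arbitrary illustrative threshold.
    """
    pattern = re.compile(r"(A+|C+|G+|T+)")
    runs = []
    for match in pattern.finditer(sequence.upper()):
        run = match.group(0)
        if len(run) >= min_length:
            runs.append((match.start(), run[0], len(run)))
    return runs

# Made-up sequence, not taken from the E. coli reference.
example_sequence = "ACGTTTTTTGACCCCAGT"
for start, base, length in find_homopolymers(example_sequence):
    print("%s x %d at 0-based position %d" % (base, length, start))
```

 Positions are 0-based here, while IGV displays 1-based coordinates, so add 1 before typing a position into the locus box.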

In [None]:
import os, subprocess, re
from igv import IGV, Reference, Track

In [None]:
# Show paths to the bam files
reference_file = "references/e_coli_k12_mg1655/NC_000913.fna"
bam_dir = "/mnt/shared/PoreCampAU/data/alignment/e_coli_R9/"
my_alignment_dir = "/home/researcher/alignment/"
# List the set of files in the bam directory
for dirpath, dirnames, filenames in os.walk(bam_dir):
    if len(filenames) == 0:  # empty folder
        continue
    for filename in filenames:
        if not filename.endswith(".bam"):  # Not a bam file, maybe an index file.
            continue
        print(os.path.join(dirpath, filename))  # full path to the bam file

In [None]:
IGV(locus="")

It's easy to see the differences in quality between each chemistry and alignment algorithm by eye.
However, quantitative metrics are often easier to explain to someone.
To get them, we'll use the `stats` subcommand of samtools to generate a stats report from the bam file.

In [None]:
bam_file = "/mnt/shared/PoreCampAU/data/alignment/e_coli_R9/nanonet2d/2016-11-15_E_COLI_R9_bwa-mem.sorted.bam"
stats_file = my_alignment_dir + "e_coli_R9_nanonet2d_stats.txt"  # name matches the nanonet2d bam above; rename this for each bam file to stop overwriting.
samtools_stats_command = "samtools stats %s > %s" % (bam_file, stats_file)
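# check_call only returns the exit status, so use Popen to capture what samtools writes to stderr.
process = subprocess.Popen(samtools_stats_command, shell=True, stderr=subprocess.PIPE)
_, stderr = process.communicate()

if process.returncode != 0 or stderr:
    print("Exit status = %d, stderr = %s" % (process.returncode, stderr))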
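
Since every bam file needs its own stats file, one option is to derive the output name from the bam path instead of editing it by hand. This is a minimal sketch, assuming the bam's parent folder names the basecaller (as `nanonet2d` does in the path above). The helper `stats_path_for` and its naming scheme are hypothetical, not something samtools provides.

```python
import os

def stats_path_for(bam_path, output_dir):
    """Build a per-bam stats file path (hypothetical naming scheme)."""
    # Assumption: the parent folder is named after the basecaller, e.g. "nanonet2d".
    basecaller = os.path.basename(os.path.dirname(bam_path))
    stem = os.path.basename(bam_path)
    if stem.endswith(".bam"):
        stem = stem[:-len(".bam")]
    return os.path.join(output_dir, "%s_%s_stats.txt" % (basecaller, stem))

example_bam = "/mnt/shared/PoreCampAU/data/alignment/e_coli_R9/nanonet2d/2016-11-15_E_COLI_R9_bwa-mem.sorted.bam"
print(stats_path_for(example_bam, "/home/researcher/alignment/"))
```

The result could be assigned to `stats_file` before building the samtools command, so two different bam files never write to the same report.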

Cool. This file is pretty big for a summary sheet, though.
Fortunately it's sorted into sections that we can extract using the grep command.
...we could also use Python, because Python is beautiful.

In [None]:
with open(stats_file, 'r') as stats_file_handler:  # the with block closes the file for us
    for line in stats_file_handler:
        if line.startswith("SN\t"):  # SN marks the "Summary Numbers" section
            print(line.rstrip())  # rstrip gets rid of the \n at the end of the line.
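
Printing the SN lines is fine for reading, but comparing chemistries is easier if the numbers live in a dictionary. The sketch below assumes the usual tab-separated SN layout (`SN`, a label ending in a colon, a value, and sometimes a trailing `#` comment). The function name `parse_summary_numbers` is hypothetical and the example lines are made-up values, not real results from these bam files.

```python
def parse_summary_numbers(lines):
    """Collect the SN lines of a samtools stats report into {label: value}.

    Hypothetical helper; values are kept as strings so the caller decides
    how to convert them.
    """
    summary = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip anything that does not look like label + value
        summary[fields[1].rstrip(":")] = fields[2]
    return summary

# Made-up excerpt in the SN layout, for illustration only.
example_lines = [
    "# This is not real data\n",
    "SN\traw total sequences:\t1000\n",
    "SN\treads mapped:\t950\t# illustrative comment\n",
    "SN\terror rate:\t1.5e-01\t# illustrative comment\n",
]
summary = parse_summary_numbers(example_lines)
mapped_fraction = float(summary["reads mapped"]) / float(summary["raw total sequences"])
print("Mapped fraction: %.2f" % mapped_fraction)
```

With a real report, passing the open file handle instead of `example_lines` should work the same way, because the function only iterates over lines.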