#### Alignment:
The good news: Oxford Nanopore reads are extremely long. Repeat regions are no match for long reads, because the variable flanking regions around the repeat can determine where in the genome the read belongs. It also means that ambiguity in one part of an alignment can be resolved by other regions of the same read.
The bad news: Oxford Nanopore reads are less accurate than reads from second-generation sequencing technologies.
Therefore, seed-and-extend algorithms have had to change tack a bit to allow for more mismatches in the seed before dismissing a read entirely.
We will use bwa mem with its `-x ont2d` preset, which sets parameters tuned for Oxford Nanopore reads.
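
To make the seed-and-extend idea concrete, here is a toy sketch. It is not how bwa mem works internally; the function names, the k-mer size and the mismatch threshold are all hypothetical, and insertions and deletions are ignored for simplicity. Every k-mer of the read is tried as a seed, so a single error in one seed does not cause the whole read to be dismissed.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read, reference, k=4, max_mismatch_fraction=0.2):
    """Return (ref_start, mismatches) for the best ungapped placement, or None."""
    index = build_kmer_index(reference, k)
    best = None
    # Try every k-mer of the read as a seed.
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # "Extend": compare the whole read with the reference at this placement.
            window = reference[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if best is None or mismatches < best[1]:
                best = (start, mismatches)
    # Keep the placement only if the read is similar enough overall.
    if best is not None and best[1] <= max_mismatch_fraction * len(read):
        return best
    return None


reference = "ACGTTGCAAGGCTTACGATCG"
noisy_read = "GCAAGGATTACG"  # reference[5:17] with one substituted base
print(seed_and_extend(noisy_read, reference))  # (5, 1)
```

Raising `max_mismatch_fraction` in this sketch plays the role of the more permissive settings that noisy long reads need.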
This tutorial will take you through the basics of converting a set of fastq files to a sam file and then to a sorted bam file.
Many downstream analyses require bam files to be sorted.
Here "sorted" means ordered by mapping position on the reference, not by read name. Bam files are sam files in binary format, so the end result isn't easy to show directly.
We can use the fastq files that we extracted from our fast5 files and compare the accuracies over the different chemistries.
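
Since a bam file cannot be read by eye, a toy illustration of coordinate sorting may help. The records below are made up; by default samtools sort orders alignments by reference sequence (in header order) and then by leftmost position, which this sketch imitates with plain tuples.

```python
# Hypothetical alignments as (read_name, reference_name, leftmost_position).
alignments = [
    ("read_3", "chr2", 150),
    ("read_1", "chr1", 9000),
    ("read_4", "chr1", 42),
    ("read_2", "chr2", 7),
]

# Order of the reference sequences as they would appear in the sam header.
reference_order = {"chr1": 0, "chr2": 1}


def coordinate_sort(records, ref_order):
    """Sort by reference (header order), then by leftmost mapping position."""
    return sorted(records, key=lambda rec: (ref_order[rec[1]], rec[2]))


for record in coordinate_sort(alignments, reference_order):
    print(record)
```

With the reads in this order, a tool can jump straight to any region of the genome once the file has been indexed.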

In [None]:
# Import the modules we need.
import os
import subprocess
from Bio import Entrez, SeqIO

In [None]:
# Set the path to include anaconda2
PATH=os.environ.get('PATH')
HOME=os.environ.get('HOME')

ANACONDA_PATH = HOME + "/programs/anaconda2/bin"
PATH = ANACONDA_PATH + ":" + PATH
os.environ['PATH'] = PATH
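
A quick check that the tools used later are actually reachable on the updated PATH can save some confusion. This is a minimal sketch; `missing_tools` is a hypothetical helper, and `shutil.which` assumes Python 3.3 or newer.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# The two command-line tools this notebook calls.
missing = missing_tools(["samtools", "bwa"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```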

In [None]:
# Set the directories:
FASTQ_DIRECTORY = ""
ALIGNMENT_DIRECTORY = ""
SAMPLE_NAME = ""

# Create the alignment directory if it doesn't already exist.
if not os.path.isdir(ALIGNMENT_DIRECTORY):
    os.mkdir(ALIGNMENT_DIRECTORY)

In [None]:
# Download the reference 
from Bio import Entrez, SeqIO

# Use your own email here
Entrez.email = "alexiswl@student.unimelb.edu.au"

# Create reference directory and file name.
reference_directory = "/home/researcher/references/"
if not os.path.isdir(reference_directory):
    os.mkdir(reference_directory)
reference_name = "Escherichia_coli_k12_MG1655"
reference_file = reference_directory + reference_name + ".fa"
uid = "U00096.3"  # This is the uid for E coli genome strain K-12 MG1655
handle = Entrez.efetch(db="nucleotide", id=uid, rettype="fasta")
fasta_record = SeqIO.read(handle, "fasta")
handle.close()
# Use a with-block so the file is flushed and closed before samtools and bwa read it.
with open(reference_file, "w") as reference_handle:
    SeqIO.write(fasta_record, reference_handle, "fasta")

In [None]:
# Use samtools to generate a fasta index for the reference file.
command = "samtools faidx %s" % reference_file
stderr = subprocess.check_output(command, shell=True, stderr=subprocess.STDOUT)

# check_output returns bytes on Python 3, so test truthiness rather than comparing to "".
if stderr:
    print(stderr)
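
To confirm the index was written, the `.fai` file that samtools faidx creates next to the reference can be read back. It is a tab-separated text file whose first two columns are the sequence name and its length. The sketch below is one way to read it; `read_fai` is a hypothetical helper name.

```python
def read_fai(fai_path):
    """Return {sequence_name: length} from a samtools .fai index file."""
    lengths = {}
    with open(fai_path) as fai:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            # Column 1 is the sequence name, column 2 its length in bases.
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


# Example call, assuming reference_file is set as in the cells above:
# print(read_fai(reference_file + ".fai"))
```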

In [None]:
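# Before we run bwa mem, we will also need to run bwa index on the reference file.
bwa_index_command = "bwa index %s" % reference_file
# bwa index logs its progress to stderr, so this output is informational.
print(subprocess.check_output(bwa_index_command, shell=True, stderr=subprocess.STDOUT))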

In [None]:
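# Now we can align the reads to the reference genome using bwa mem.
# The output is in sam format and is written to standard output,
# so we redirect it to a file using the > sign.
# Assumption: the reads are in FASTQ_DIRECTORY as SAMPLE_NAME.fastq; adjust this to match your files.
fastq_file = FASTQ_DIRECTORY + SAMPLE_NAME + ".fastq"
sam_file = ALIGNMENT_DIRECTORY + SAMPLE_NAME + ".sam"
bam_file = ALIGNMENT_DIRECTORY + SAMPLE_NAME + ".bam"
sorted_bam_file = ALIGNMENT_DIRECTORY + SAMPLE_NAME + ".sorted.bam"
sorted_bam_file_index = ALIGNMENT_DIRECTORY + SAMPLE_NAME + ".sorted.bai"

bwa_command = "bwa mem -x ont2d %s %s > %s" % (reference_file, fastq_file, sam_file)
# stdout goes to the sam file, so only bwa's stderr log is captured here.
stderr = subprocess.check_output(bwa_command, shell=True, stderr=subprocess.STDOUT)

# bwa mem always logs its progress to stderr, so this output is not necessarily an error.
if stderr:
    print("bwa mem log: ", stderr)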
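
One possible alternative to shell redirection is to hand subprocess an open file for stdout and capture stderr separately. This keeps the aligner's progress log out of the sam file without going through the shell. `run_to_file` is a hypothetical helper, and `subprocess.run` assumes Python 3.5 or newer.

```python
import subprocess


def run_to_file(args, output_path):
    """Run a command, write its stdout to output_path and return its stderr text."""
    with open(output_path, "wb") as out:
        result = subprocess.run(args, stdout=out, stderr=subprocess.PIPE, check=True)
    return result.stderr.decode()


# Example call, assuming reference_file and sam_file are set as in the notebook
# and reads_fastq (a hypothetical name) is the path to your reads:
# log = run_to_file(["bwa", "mem", "-x", "ont2d", reference_file, reads_fastq], sam_file)
```

Passing the arguments as a list also avoids quoting problems when a path contains spaces.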

In [None]:
# Now let's turn that sam file into a bam file.
# Options go before the input file: -b requests bam output, -o names the output file.
sam_to_bam_command = "samtools view -b -o %s %s" % (bam_file, sam_file)
stderr = subprocess.check_output(sam_to_bam_command, shell=True, stderr=subprocess.STDOUT)

if stderr:
    print("Error: ", stderr)

In [None]:
# Now sort the bam file
sort_bam_command = "samtools sort -o %s %s" % (sorted_bam_file, bam_file)
stderr = subprocess.check_output(sort_bam_command, shell=True, stderr=subprocess.STDOUT)

if stderr:
    print("Error: ", stderr)

In [None]:
# Now index the bam file
index_bam_command = "samtools index %s %s" % (sorted_bam_file, sorted_bam_file_index)
stderr = subprocess.check_output(index_bam_command, shell=True, stderr=subprocess.STDOUT)

if stderr:
    print("Error: ", stderr)
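
For the accuracy comparison mentioned at the start, one common per-read measure is alignment identity, taken here as 1 - NM / (M + I + D), where NM is the edit distance tag and M, I and D are the CIGAR operation lengths. The sketch below computes it from a single sam record. It assumes the aligner wrote an `NM:i:` tag (bwa mem normally does); `alignment_identity` and the example record are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(sam_line):
    """Return 1 - NM / (M + I + D) for one sam record, or None if unavailable."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    # Flag bit 4 marks an unmapped read.
    if flag & 4 or cigar == "*":
        return None
    # Alignment columns: matches/mismatches (M, =, X), insertions and deletions.
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIDX=")
    # NM is the edit distance: mismatches plus inserted and deleted bases.
    edit_distance = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            edit_distance = int(tag[5:])
    if edit_distance is None or columns == 0:
        return None
    return 1.0 - float(edit_distance) / columns


# A made-up record: 103 alignment columns with an edit distance of 10.
example = "read_1\t0\tU00096.3\t100\t60\t50M2I40M3D8M\t*\t0\t0\t*\t*\tNM:i:10"
print(alignment_identity(example))
```

Averaging this value over the records in each sample's sam file gives a rough figure for comparing chemistries.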