# HISAT (Hierarchical Indexing for Spliced Alignment of Transcripts)

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT’s hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ~64,000 bp.
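
To make the indexing idea above concrete, here is a minimal toy sketch of a Burrows-Wheeler transform and FM-index backward search. It is not HISAT's implementation: the function names `build_bwt` and `fm_locate` are made up for illustration, the suffix array is built naively, and rank queries are answered by rescanning the string, whereas a real FM index uses compact precomputed tables.

```python
from collections import Counter


def build_bwt(text):
    """Return (bwt, suffix_array) for text, using '$' as the end marker."""
    text += "$"
    # Naive suffix array: sort suffix start positions lexicographically.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Each BWT character is the one preceding the corresponding sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    return bwt, suffix_array


def fm_locate(bwt, suffix_array, pattern):
    """Return sorted start positions of pattern, found by backward search."""
    # first_row[c] = number of characters in the text that sort before c.
    first_row, total = {}, 0
    for ch, n in sorted(Counter(bwt).items()):
        first_row[ch] = total
        total += n

    lo, hi = 0, len(bwt)  # half-open range of suffixes matching so far
    for ch in reversed(pattern):
        if ch not in first_row:
            return []
        # bwt[:k].count(ch) is the rank query; real indexes precompute it.
        lo = first_row[ch] + bwt[:lo].count(ch)
        hi = first_row[ch] + bwt[:hi].count(ch)
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


# Toy sequence chosen for illustration only.
bwt, sa = build_bwt("GATTACAGATTACA")
print(fm_locate(bwt, sa, "ATTA"))
```

The search consumes the pattern from its last character to its first, narrowing a range of sorted suffixes at each step, so the cost depends on the pattern length rather than on the genome length.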

We will use yeast (Saccharomyces cerevisiae) as the example organism.

First we use HISAT2 to download the read files and align them. Start with the following cell, which loads the run tables and sets up the output file paths. Select it and run it.

In [None]:
from pandas import read_csv

# Load the RNA-Seq run table and derive one output path per run.
RNASeqSRARunTableFile = '../../../data/RNASeqSRA.tsv'
RNASeqSRATable = read_csv(RNASeqSRARunTableFile, delimiter='\t')
# Cast the accessions to str so that "test/" + Series concatenates element-wise.
RNASeqoutrun = RNASeqSRATable["Run"].astype(str)
RNASeqoutputSam = "test/" + RNASeqoutrun + ".sam"
RNASeqoutputAlignmentSummary = "test/" + RNASeqoutrun + ".txt"
RNASeqoutputMetrics = "test/" + RNASeqoutrun + ".metrics"
RNASeqoutputSortBam = "test/" + RNASeqoutrun + ".sorted.bam"

# Same setup for the ChIP-Seq run table.
ChIPSeqSRARunTableFile = '../../../data/ChIPSeqSRA.tsv'
ChIPSeqSRATable = read_csv(ChIPSeqSRARunTableFile, delimiter='\t')
ChIPSeqoutrun = ChIPSeqSRATable["Run"].astype(str)
ChIPSeqoutputSam = "test/" + ChIPSeqoutrun + ".sam"
ChIPSeqoutputAlignmentSummary = "test/" + ChIPSeqoutrun + ".txt"
ChIPSeqoutputMetrics = "test/" + ChIPSeqoutrun + ".metrics"
ChIPSeqoutputSortBam = "test/" + ChIPSeqoutrun + ".sorted.bam"
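
The RNA-Seq and ChIP-Seq blocks above follow the same naming pattern, so it could be factored into a small helper. The sketch below is one possible way to do that with plain strings. The function name `output_paths` and the accession `SRR0000001` are placeholders for illustration and do not come from the run tables.

```python
def output_paths(run, outdir="test"):
    """Map one SRA run accession to its per-run output file paths."""
    prefix = f"{outdir}/{run}"
    return {
        "sam": prefix + ".sam",
        "summary": prefix + ".txt",
        "metrics": prefix + ".metrics",
        "sorted_bam": prefix + ".sorted.bam",
    }


# "SRR0000001" is a placeholder accession, not a run from the tables above.
paths = output_paths("SRR0000001")
print(paths["sorted_bam"])
```

Keeping the suffixes in one place means a change to the naming scheme only has to be made once for both data sets.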

In [21]:
!bash ../../../scripts/make_yeast_index.sh

--2018-04-17 18:46:39--  ftp://ftp.ensembl.org/pub/release-84/fasta/saccharomyces_cerevisiae/dna//Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
           => ‘Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz’
Resolving ftp.ensembl.org... 193.62.193.8
Connecting to ftp.ensembl.org|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-84/fasta/saccharomyces_cerevisiae/dna/ ... done.
==> SIZE Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz ... 3786708
==> PASV ... done.    ==> RETR Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz ... done.
Length: 3786708 (3.6M) (unauthoritative)


2018-04-17 18:46:41 (5.45 MB/s) - ‘Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz’ saved [3786708]

Running /home/ubuntu/SteveSemick/hisat2-2.1.0/hisat2-build genome.fa genome
Settings:
  Output files: "genome.*.ht2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)


Align the RNA-Seq samples using HISAT2.

In [None]:
# Align each RNA-Seq run; HISAT2 fetches the reads itself via --sra-acc.
for index, run in enumerate(RNASeqoutrun):
    summary = RNASeqoutputAlignmentSummary[index]
    metrics = RNASeqoutputMetrics[index]
    sam = RNASeqoutputSam[index]
    !hisat2 -x /home/ubuntu/hussainather/omicsedu/yeast_index/genome --sra-acc $run --new-summary --summary-file ../../../$summary --met-file ../../../$metrics -S ../../../$sam
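
Outside a notebook, the same HISAT2 call can be assembled as an argument list and passed to `subprocess.run`. The sketch below only builds and prints the command, using the flags that appear in the cell above. The function name `hisat2_command`, the relative index path, and the accession are illustrative assumptions.

```python
import shlex


def hisat2_command(index_prefix, run, summary, metrics, sam):
    """Build the HISAT2 argument list for one SRA run (does not execute it)."""
    return [
        "hisat2",
        "-x", index_prefix,        # basename of the genome.*.ht2 index files
        "--sra-acc", run,          # reads are fetched by SRA accession
        "--new-summary",
        "--summary-file", summary,
        "--met-file", metrics,
        "-S", sam,                 # SAM output file
    ]


# Placeholder index path and accession, for illustration only.
cmd = hisat2_command(
    "yeast_index/genome",
    "SRR0000001",
    "test/SRR0000001.txt",
    "test/SRR0000001.metrics",
    "test/SRR0000001.sam",
)
print(shlex.join(cmd))
```

To actually run the alignment, the list could be handed to `subprocess.run(cmd, check=True)`, which raises an error if HISAT2 exits with a non-zero status.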


Sort the output files and convert them to BAM files.

In [None]:
# Drop unmapped reads (-F4), convert SAM to BAM, and sort each RNA-Seq run.
for index, run in enumerate(RNASeqoutrun):
    sam = RNASeqoutputSam[index]
    bam = RNASeqoutputSortBam[index]
    !samtools view -bSF4 /home/ubuntu/hussainather/omicsedu/$sam | samtools sort -o /home/ubuntu/hussainather/omicsedu/$bam

[bam_sort_core] merging from 9 files and 1 in-memory blocks...
[bam_sort_core] merging from 5 files and 1 in-memory blocks...
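
The `-F4` option in the `samtools view` call above excludes records whose FLAG field has bit 0x4 (read unmapped) set. As a rough illustration of that filter only, the sketch below applies the same test to SAM text lines in pure Python. The function name `drop_unmapped` and the three toy records are invented for the example, and this is not how samtools is implemented.

```python
UNMAPPED = 0x4  # SAM FLAG bit set when the read itself is unmapped


def drop_unmapped(sam_lines):
    """Yield header lines and alignment records whose FLAG lacks bit 0x4."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & UNMAPPED:
            yield line


# Toy records: r1 is mapped, r2 is unmapped, r3 is mapped to the reverse strand.
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tI\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = list(drop_unmapped(toy_sam))
print([line.split("\t")[0] for line in kept])
```

Filtering before sorting keeps the sorted BAM files smaller, since unmapped reads carry no genomic position.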


Do the same thing for ChIP-Seq samples.

In [None]:
# Align each ChIP-Seq run with the same HISAT2 options as the RNA-Seq runs.
for index, run in enumerate(ChIPSeqoutrun):
    summary = ChIPSeqoutputAlignmentSummary[index]
    metrics = ChIPSeqoutputMetrics[index]
    sam = ChIPSeqoutputSam[index]
    !hisat2 -x /home/ubuntu/hussainather/omicsedu/yeast_index/genome --sra-acc $run --new-summary --summary-file ../../../$summary --met-file ../../../$metrics -S ../../../$sam