# HISAT (Hierarchical Indexing for Spliced Alignment of Transcripts)

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT’s hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ~64,000 bp.
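The Burrows-Wheeler transform and the FM index both support one core query: find every occurrence of a pattern by reading it right to left, which is called backward search. The toy sketch below illustrates that query. It also imitates the two index levels by splitting the sequence into overlapping windows, each with its own index. It is a textbook illustration, not HISAT's implementation: it keeps the full suffix array and occurrence table instead of compressed, sampled ones. The names `FMIndex`, `build_local_indexes` and `local_hits`, and the window and overlap sizes, are hypothetical.

```python
"""Toy BWT / FM-index backward search with a two-level (global + local) layout.

Illustrative sketch only: real aligners such as HISAT store compressed,
sampled structures; here every table is kept in full.
"""


def suffix_array(text: str) -> list[int]:
    # Naive construction by sorting suffixes; fine for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


class FMIndex:
    def __init__(self, text: str):
        self.text = text + "$"  # sentinel sorts before A, C, G, T
        self.sa = suffix_array(self.text)
        # BWT: the character preceding each suffix in sorted order.
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        alphabet = sorted(set(self.text))
        # C[c]: number of characters in the text that sort before c.
        self.C = {}
        total = 0
        for c in alphabet:
            self.C[c] = total
            total += self.text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in alphabet}
        for i, ch in enumerate(self.bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def locate(self, pattern: str) -> list[int]:
        """Backward search: narrow the suffix-array interval one char at a time."""
        lo, hi = 0, len(self.bwt)  # half-open interval [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


def build_local_indexes(genome: str, window: int, overlap: int):
    """Hypothetical tiling: overlapping windows, each with its own FMIndex."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    return [(start, FMIndex(genome[start:start + window]))
            for start in range(0, max(len(genome) - overlap, 1), step)]


def local_hits(local_indexes, pattern: str) -> list[int]:
    """Search every local index and convert hits to genome coordinates."""
    hits = set()
    for start, fm in local_indexes:
        hits.update(start + pos for pos in fm.locate(pattern))
    return sorted(hits)


genome = "ACGTACGTTTGACGTA"
print(FMIndex(genome).locate("ACG"))                         # [0, 4, 11]
print(local_hits(build_local_indexes(genome, 8, 3), "ACG"))  # [0, 4, 11]
```

In HISAT, the whole-genome index anchors part of a read, and the much smaller local index around that anchor is then used to extend the alignment. In this toy example, both levels return the same exact-match hits. Because the windows overlap, any pattern no longer than `overlap + 1` characters falls entirely inside at least one window.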

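We will use the budding yeast *Saccharomyces cerevisiae*, strain S288C, whose genome is the reference assembly for the species. The cell below downloads that RefSeq assembly from NCBI using the Entrez Direct utilities (`esearch`, `efetch`, `xtract`), which must be installed and on your `PATH`.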

In [3]:
wget `esearch -db assembly -query "Saccharomyces cerevisiae[ORGN] AND S288C[SB]" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element FtpPath_RefSeq | \
awk -F"/" '{print $0"/"$NF"_genomic.fna.gz"}'`

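The final `awk -F"/" '{print $0"/"$NF"_genomic.fna.gz"}'` stage converts each assembly directory path returned by `xtract` into the URL of that assembly's genomic FASTA file. It does this by appending the last path component plus the `_genomic.fna.gz` suffix. If you would rather do that step in Python, the sketch below performs the same string manipulation, assuming you already have an `FtpPath_RefSeq` value. The example path and the function names are hypothetical.

```python
import urllib.request


def genomic_fasta_url(ftp_path: str) -> str:
    """Same rule as the awk step: <dir>/<last component>_genomic.fna.gz.

    A trailing slash on the directory path is tolerated.
    """
    ftp_path = ftp_path.rstrip("/")
    last = ftp_path.rsplit("/", 1)[-1]
    return f"{ftp_path}/{last}_genomic.fna.gz"


def download(url: str) -> str:
    """Save the file under its own name in the working directory, like wget."""
    filename = url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, filename)
    return filename


# Hypothetical path, shaped like an NCBI assembly directory.
path = "ftp://ftp.example.org/genomes/all/GCF/000/000/000/GCF_000000000.1_Example"
print(genomic_fasta_url(path))
# ftp://ftp.example.org/genomes/all/GCF/000/000/000/GCF_000000000.1_Example/GCF_000000000.1_Example_genomic.fna.gz
```

Only `genomic_fasta_url` runs here. Call `download(genomic_fasta_url(path))` with a real `FtpPath_RefSeq` value to fetch the file.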

Next, we use HISAT2 to fetch the reads for each SRA run accession listed in `SRA.txt` and align them, writing one SAM file per run to the `test/` directory. Select the following cell and run it.

In [4]:
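import os
import subprocess

# NOTE: hisat2-2.1.0/grch38/genome is the human GRCh38 index; to align yeast
# reads, point -x at an index built from the S288C genome with hisat2-build.
INDEX = "hisat2-2.1.0/grch38/genome"

os.makedirs("test", exist_ok=True)  # make sure the output directory exists
# Each non-blank line of SRA.txt starts with an SRA run accession.
with open("SRA.txt") as sra_file:
    for line in sra_file:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        sra_acc = fields[0]
        output = "test/" + sra_acc + ".sam"
        print(output)
        # Fetch the run's reads from SRA by accession and write alignments to `output`.
        subprocess.run(["hisat2", "-x", INDEX, "--sra-acc", sra_acc, "-S", output], check=True)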

test/SRA1.sam
test/SRA2.sam
test/SRA3.sam
test/SRA4.sam
test/SRA5.sam
test/SRA6.sam

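To sanity-check the SAM files written by the loop, the sketch below counts primary alignment records and how many of them are mapped. It uses the FLAG bits defined in the SAM specification: 0x4 for unmapped, 0x100 for secondary, and 0x800 for supplementary. It reads plain-text SAM only, and the function name `mapping_summary` is hypothetical. `samtools flagstat` reports a more complete summary of the same kind.

```python
def mapping_summary(sam_lines):
    """Return (primary_records, mapped_primary) for an iterable of SAM lines."""
    primary = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd column
        if flag & (0x100 | 0x800):  # secondary or supplementary alignment
            continue
        primary += 1
        if not flag & 0x4:  # 0x4 set means the segment is unmapped
            mapped += 1
    return primary, mapped


example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchrI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchrII\t500\t1\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapping_summary(example))  # (2, 1)

# For a real file: with open("test/SRA1.sam") as fh: print(mapping_summary(fh))
```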