## Aligning FASTQ files from transcriptome-wide SHAPE experiments using HiSat2

In this script, we align the FASTQ files from transcriptome-wide SHAPE experiments to the genome and transcriptome using HiSat2.

The goal is to recover reads that overlap intron-exon junctions, together with the reads that span the corresponding exon-exon junctions.
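As a rough illustration of how exon-exon junction reads can be recognized in the aligner output: in the SAM format, a spliced alignment carries an `N` operation (skipped reference region) in its CIGAR string. The sketch below is not part of the original pipeline; the function name `junctions_from_cigar` is hypothetical, and it assumes a 1-based leftmost position as stored in the SAM `POS` field.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def junctions_from_cigar(pos, cigar):
    """Return the skipped (intron) intervals implied by a CIGAR string.

    pos   -- 1-based leftmost reference position (SAM POS field)
    cigar -- CIGAR string, e.g. "50M200N25M"

    Hypothetical helper for illustration; intervals are 1-based and inclusive.
    """
    ref_pos = pos
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # N marks a skipped reference region, i.e. a splice junction.
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in "MDN=X":
            # Only these operations consume reference bases.
            ref_pos += length
    return junctions

print(junctions_from_cigar(100, "50M200N25M"))
print(junctions_from_cigar(100, "75M"))
```

A read with no `N` operation is unspliced; whether it overlaps an intron-exon boundary would still have to be checked against an annotation, which this sketch does not do.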

In [None]:
%%script bash
# Create HiSat2 index for hg38
#hisat2-build ../data/hg38.fa ../data/hg38_hisat2_index/hg38

In [None]:
%%script bash
hisat2 -p 12 \
    --score-min G,20,8 --mp 6,2 --rdg 5,1 --rfg 5,1 \
    -x ../data/hg38_hisat2_index/hg38 \
    -1 ../data/trial/plus/19098PLUS-Yoruban_cell_lines_SHAPE_CGTACTA-GCGTAAG_Merged_R1.fastq \
    -2 ../data/trial/plus/19098PLUS-Yoruban_cell_lines_SHAPE_CGTACTA-GCGTAAG_Merged_R2.fastq \
    -S ../tmp/HiSat2run_sameParmsAsBowtie2/HiSat2_19098PLUS-Yoruban_cell_lines_SHAPE.sam

In [None]:
%%script bash
echo "Convert sam file to bam file"
samtools view -b -o ../tmp/HiSat2run_sameParmsAsBowtie2/HiSat2_19098PLUS-Yoruban_cell_lines_SHAPE.bam ../tmp/HiSat2run_sameParmsAsBowtie2/HiSat2_19098PLUS-Yoruban_cell_lines_SHAPE.sam
echo "Sort bam file"
samtools sort -@ 4 -o ../tmp/HiSat2run_sameParmsAsBowtie2/HiSat2_19098PLUS-Yoruban_cell_lines_SHAPE_sorted.bam ../tmp/HiSat2run_sameParmsAsBowtie2/HiSat2_19098PLUS-Yoruban_cell_lines_SHAPE.bam
echo "Index bam file"
samtools index ../tmp/HiSat2run_sameParmsAsBowtie2/HiSat2_19098PLUS-Yoruban_cell_lines_SHAPE_sorted.bam

In [1]:
%%bash
# Extract the FLAG column (field 2) from the Bowtie2 SAM file
samtools view ../processed_data/alignedToWholeGenome/19098PLUS-Yoruban_cell_lines_SHAPE_CGTACTA-GCGTAAG_Merged.sam | cut -f2 > ../processed_data/alignedToWholeGenome/19098PLUS-Yoruban_cell_lines_SHAPE_CGTACTA-GCGTAAG_Merged.SAMflags.txt


In [3]:
import pandas as pd

In [4]:
# Read in flags file and look at value frequency
flags = pd.read_csv("../processed_data/alignedToWholeGenome/19098PLUS-Yoruban_cell_lines_SHAPE_CGTACTA-GCGTAAG_Merged.SAMflags.txt",header=None)
freq_flags = flags[0].value_counts()
freq_flags

77     33176345
141    33176345
99     25597341
147    25597341
83     25516950
163    25516950
133     3324041
69      2752358
161     2274499
81      2274499
145     2259194
97      2259194
89      1710744
73      1613297
153     1417514
137     1334844
177      507778
113      507778
129      472645
65       472645
dtype: int64
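The flag values tallied above are bitwise combinations of the properties defined in the SAM specification, so they are easier to interpret once decoded. The following is a minimal sketch of such a decoder; the short labels and the function name `decode_flag` are chosen here for illustration and are not part of samtools or pandas.

```python
# Bit values as defined by the SAM specification; labels are illustrative.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]

# Decode the six most frequent flags from the table above.
for flag in (77, 141, 99, 147, 83, 163):
    print(flag, decode_flag(flag))
```

Decoded this way, flags 77 and 141 are the two mates of pairs in which neither read aligned, while 99/147 and 83/163 are the two orientations of properly paired alignments.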

In [5]:
flags.shape

(191762302, 1)
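The row count can be cross-checked against the flag frequencies, and the same counts give a quick estimate of how many records are unmapped. The sketch below copies the counts from the `value_counts()` output into a plain dictionary so that it runs without the original files; the variable names are illustrative only.

```python
# Flag counts copied from the value_counts() output of the Bowtie2 SAM flags.
flag_counts = {
    77: 33176345, 141: 33176345, 99: 25597341, 147: 25597341,
    83: 25516950, 163: 25516950, 133: 3324041, 69: 2752358,
    161: 2274499, 81: 2274499, 145: 2259194, 97: 2259194,
    89: 1710744, 73: 1613297, 153: 1417514, 137: 1334844,
    177: 507778, 113: 507778, 129: 472645, 65: 472645,
}

UNMAPPED = 0x4      # SAM flag bit: this read is unmapped
PROPER_PAIR = 0x2   # SAM flag bit: both mates aligned as a proper pair

total = sum(flag_counts.values())
unmapped = sum(n for flag, n in flag_counts.items() if flag & UNMAPPED)
proper = sum(n for flag, n in flag_counts.items() if flag & PROPER_PAIR)

print(f"total records:   {total}")
print(f"unmapped:        {unmapped} ({unmapped / total:.1%})")
print(f"properly paired: {proper} ({proper / total:.1%})")
```

The total should match the number of rows reported by `flags.shape`, which confirms that every SAM record contributed exactly one flag.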