In [1]:
mkdir data

In [2]:
module avail sra


--------------------------- /share/apps/modulefiles ----------------------------
   sra-tools/2.10.9


In [3]:
module avail kallisto


--------------------------- /share/apps/modulefiles ----------------------------
   kallisto/0.46.1


In [5]:
module avail salmon


--------------------------- /share/apps/modulefiles ----------------------------
   salmon/0.8.0    salmon/1.4.0


In [10]:
module avail fastqc


--------------------------- /share/apps/modulefiles ----------------------------
   fastqc/0.11.9


In [11]:
module avail trimmomatic


--------------------------- /share/apps/modulefiles ----------------------------
   trimmomatic/0.36    trimmomatic/0.39


In [1]:
module purge

module load sra-tools/2.10.9
module load kallisto/0.46.1
module load salmon/1.4.0
module load fastqc/0.11.9
module load trimmomatic/0.39
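
Before starting long downloads, it can help to confirm that the loaded modules actually put their commands on `PATH`. The sketch below is only an illustration: `check_tools` is a hypothetical helper, and the command names are assumed to be the ones these modules provide on this cluster.

```shell
# Report every requested command that cannot be found on PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    local tool
    local missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Commands used later in this notebook. Trimmomatic is left out because it is
# often installed as a jar file rather than as a command (an assumption here).
check_tools fastq-dump kallisto salmon fastqc || echo "load the missing modules before continuing" >&2
```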

In [7]:
## Retrieve RNA-seq data
## https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1679648
## https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR2015713

cd data
fastq-dump --split-3 SRR2015713

Read 60054217 spots for SRR2015713
Written 60054217 spots for SRR2015713
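
A quick sanity check after the download is to confirm that both mate files hold the same number of reads. This is a minimal sketch that assumes plain four-line FASTQ records; `count_reads` and `check_pairs` are hypothetical helpers, not part of sra-tools. With `--split-3`, reads without a mate would be written to a third file, which is not checked here.

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Print the read count of both mate files; succeed only if they match.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# File names as written by the fastq-dump call above (run inside data/).
if [ -f SRR2015713_1.fastq ] && [ -f SRR2015713_2.fastq ]; then
    check_pairs SRR2015713_1.fastq SRR2015713_2.fastq || echo "mate files differ in read count" >&2
fi
```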


## FastQC

In [3]:
fastqc SRR2015713_1.fastq SRR2015713_2.fastq

Started analysis of SRR2015713_1.fastq
Approx 5% complete for SRR2015713_1.fastq
Approx 10% complete for SRR2015713_1.fastq
Approx 15% complete for SRR2015713_1.fastq
Approx 20% complete for SRR2015713_1.fastq
Approx 25% complete for SRR2015713_1.fastq
Approx 30% complete for SRR2015713_1.fastq
Approx 35% complete for SRR2015713_1.fastq
Approx 40% complete for SRR2015713_1.fastq
Approx 45% complete for SRR2015713_1.fastq
Approx 50% complete for SRR2015713_1.fastq
Approx 55% complete for SRR2015713_1.fastq
Approx 60% complete for SRR2015713_1.fastq
Approx 65% complete for SRR2015713_1.fastq
Approx 70% complete for SRR2015713_1.fastq
Approx 75% complete for SRR2015713_1.fastq
Approx 80% complete for SRR2015713_1.fastq
Approx 85% complete for SRR2015713_1.fastq
Approx 90% complete for SRR2015713_1.fastq
Approx 95% complete for SRR2015713_1.fastq
Analysis complete for SRR2015713_1.fastq
Started analysis of SRR2015713_2.fastq
Approx 5% complete for SRR2015713_2.fastq
Approx 10% complete for

In [None]:
## Retrieve human transcriptome
## https://useast.ensembl.org/info/data/ftp/index.html
cd ..
mkdir GRCh38
cd GRCh38
wget http://ftp.ensembl.org/pub/release-103/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
wget http://ftp.ensembl.org/pub/release-103/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz
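
Interrupted downloads leave truncated archives that only fail later, during indexing. One way to catch this early is to test each archive with `gzip -t`. The helper name `verify_gz` is hypothetical, and the sketch assumes the two files sit in the current directory.

```shell
# Test gzip archives without extracting them.
# Returns 0 if every archive passes, 1 otherwise.
verify_gz() {
    local f
    local status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "corrupt or missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Check the two Ensembl downloads if they are present.
for f in Homo_sapiens.GRCh38.cdna.all.fa.gz Homo_sapiens.GRCh38.ncrna.fa.gz; do
    if [ -f "$f" ]; then
        verify_gz "$f"
    fi
done
```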

In [None]:
## Prepare full transcriptome (cDNA + ncRNA)
zcat Homo_sapiens.GRCh38.cdna.all.fa.gz > "Homo_sapiens.GRCh38.all.fa"
zcat Homo_sapiens.GRCh38.ncrna.fa.gz >> "Homo_sapiens.GRCh38.all.fa"
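
After concatenating, counting the FASTA records gives a number that should match the target count the indexers report later. The sketch below also counts repeated transcript IDs, assuming the ID is the header text up to the first space; `count_records` and `count_duplicate_ids` are hypothetical helpers.

```shell
# Number of FASTA records (header lines starting with '>').
count_records() {
    grep -c '^>' "$1"
}

# Number of headers whose ID (text after '>' up to the first whitespace)
# has already been seen earlier in the file.
count_duplicate_ids() {
    awk '/^>/ { id = substr($1, 2); if (seen[id]++) dup++ } END { print dup + 0 }' "$1"
}

# Combined transcriptome produced by the zcat commands above.
if [ -f Homo_sapiens.GRCh38.all.fa ]; then
    echo "records:       $(count_records Homo_sapiens.GRCh38.all.fa)"
    echo "duplicate IDs: $(count_duplicate_ids Homo_sapiens.GRCh38.all.fa)"
fi
```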

## Using Kallisto: Indexing and quantification

https://pachterlab.github.io/kallisto/manual

In [9]:
time kallisto index -i GRCh38 Homo_sapiens.GRCh38.all.fa


[build] loading fasta file Homo_sapiens.GRCh38.all.fa
[build] k-mer length: 31
        from 1977 target sequences
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 1538889 contigs and contains 142407578 k-mers 


real	6m57.983s
user	6m45.349s
sys	0m11.237s


In [5]:
cd ..
mkdir -p result
time kallisto quant -i GRCh38/GRCh38 \
                    -o result/kallisto \
                    data/SRR2015713_1.fastq data/SRR2015713_2.fastq


[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 255,044
[index] number of k-mers: 142,407,578
[index] number of equivalence classes: 1,039,961
[quant] running in paired-end mode
[quant] will process pair 1: data/SRR2015713_1.fastq
                             data/SRR2015713_2.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 60,054,217 reads, 52,305,678 reads pseudoaligned
[quant] estimated average fragment length: 166.922
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,489 rounds


real	10m4.125s
user	9m34.953s
sys	0m27.287s
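
The pseudoalignment rate is a useful first quality indicator for the run. It can be worked out from the two read counts in the log above; `percent` is a hypothetical helper, and the counts are copied from the `[quant] processed` line with the thousands separators removed.

```shell
# Print a / b as a percentage with two decimals.
percent() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}

# pseudoaligned reads / processed reads, from the kallisto log above
percent 52305678 60054217
```

For this sample the result is about 87.1%. kallisto should also record these counts in `run_info.json` inside the output directory, which is easier to parse than the log when processing many samples.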


## Using Salmon: Indexing and quantification
https://salmon.readthedocs.io/en/latest/

In [6]:
cd GRCh38

In [7]:
time salmon index -t Homo_sapiens.GRCh38.all.fa \
                  -i GRCh38.salmon \
                  -k 31

Version Info: This is the most recent version of salmon.
index ["GRCh38.salmon"] did not previously exist  . . . creating it
[2021-03-07 21:36:43.708] [jLog] [info] building index
out : GRCh38.salmon
[2021-03-07 21:36:45.459] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2021-03-07 21:36:58.980] [puff::index::jointLog] [info] Replaced 100,008 non-ATCG nucleotides
[2021-03-07 21:36:58.980] [puff::index::jointLog] [info] Clipped poly-A tails from 1,977 transcripts
wrote 240285 cleaned references
[2021-03-07 21:37:06.458] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2021-03-07 21:37:10.289] [puff::index::jointLog] [info] ntHll estimated 140914689 distinct k-mers, setting filter size to 2^32
Threads = 2
Vertex length = 31
Hash functions = 5
Filter size = 4294967296
Capacity = 2
Files: 
GRCh38.salmon/ref_k31_fixed.fa
--------------------------------------------------------------------------------
Rou

In [2]:
cd ..
mkdir -p result/salmon
time salmon quant -i GRCh38/GRCh38.salmon \
                  --libType A \
                  -1 data/SRR2015713_1.fastq \
                  -2 data/SRR2015713_2.fastq \
                  -o result/salmon

Version Info: This is the most recent version of salmon.
### salmon (selective-alignment-based) v1.4.0
### [ program ] => salmon 
### [ command ] => quant 
### [ index ] => { GRCh38/GRCh38.salmon }
### [ libType ] => { A }
### [ mates1 ] => { data/SRR2015713_1.fastq }
### [ mates2 ] => { data/SRR2015713_2.fastq }
### [ output ] => { result/salmon }
Logs will be written to result/salmon/logs
[2021-03-07 22:11:44.750] [jointLog] [info] setting maxHashResizeThreads to 48
[2021-03-07 22:11:44.750] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2021-03-07 22:11:44.750] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2021-03-07 22:11:44.750] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
[2021-03-07 22:11:44.750] [jointLog] [info] parsing read library format
[2021-03-07 22:11:44.750] [jointLog] [info] There is 1