Aidan Coyle, afcoyle@uw.edu
Roberts Lab, UW-SAFS
2021-02-02

After an initial analysis, I realized that my library groupings were incorrect. I had pooled Day 0 and Day 2 libraries together by temperature treatment, but Day 0 hemolymph was extracted before any temperature treatment began. Here are the changes between this analysis and the previous one:

1. Both individual and pooled libraries are examined, rather than solely individual libraries
2. A balanced sample design is utilized, where an equal number of libraries from each treatment will be examined
3. Day 0 and Day 2 libraries will not be pooled together by temperature treatment, as discussed above
4. Day 2 and Day 17 libraries will be pooled by temperature treatment
5. The day comparison (Day 0 vs 17) will be dropped (for now at least)
6. When downloading transcripts, we will check the downloads against checksums (we failed to do so last time, meaning we must rebuild the kallisto indices)
7. As much as possible will be done remotely on the lab's Roadrunner computer, rather than on a local machine. This means that commands will largely be copied and pasted from the command line, rather than run directly in this Jupyter notebook.




Library IDs are as follows. Asterisks label Day 0 crabs that were part of either the elevated or lowered treatment groups - since at Day 0, they had not yet been exposed to changes away from ambient temperature, they are included as part of the ambient treatment group:

| Crab ID    | Library ID | Day| Temperature |
|-------------|----------------|-------------|----------|
| G        | 272             |   2          |   Elevated       |
| H        | 294             |   2          |   Elevated       |
| I        | 280             |   2          |   Elevated       |
|pooled    | 380825          |   2          |   Elevated       |
| G*       | 173*            |   0*         |   Ambient*       |
| H*       | 72*             |   0*         |   Ambient*       |
| I*       | 127*            |   0*         |   Ambient*       |
| A        | 178             |   0          |   Ambient        |
| A        | 359             |   2          |   Ambient        |
| A        | 463             |   17         |   Ambient        |
| B        | 118             |   0          |   Ambient        |
| B        | 349             |   2          |   Ambient        |
| B        | 481             |   17         |   Ambient        |
| C        | 132             |   0          |   Ambient        |
| C        | 334             |   2          |   Ambient        |
| C        | 485             |   17         |   Ambient        |
| E*       | 151*            |   0*         |   Ambient*       |
| pooled   | 380821          |   2          |   Ambient        |
| E        | 254             |   2          |   Lowered        |
| E        | 445             |   17         |   Lowered        |
| pooled   | 380823          |   2          |   Lowered        |

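The relabelling rule described above can be sketched in code. This is a minimal illustration and not part of the original analysis: the `LIBRARIES` list transcribes only the individual (non-pooled) rows of the table, and the names `LIBRARIES`, `effective_temperature`, and `count_by_group` are hypothetical. Every Day 0 library is treated as ambient regardless of the crab's assigned treatment group, and libraries are then counted per group.

```python
from collections import Counter

# (crab ID, library ID, day, assigned treatment group) for individual libraries,
# transcribed from the table above. Pooled libraries are left out of this sketch.
LIBRARIES = [
    ("G", "272", 2, "Elevated"), ("H", "294", 2, "Elevated"), ("I", "280", 2, "Elevated"),
    ("G", "173", 0, "Elevated"), ("H", "72", 0, "Elevated"), ("I", "127", 0, "Elevated"),
    ("A", "178", 0, "Ambient"), ("A", "359", 2, "Ambient"), ("A", "463", 17, "Ambient"),
    ("B", "118", 0, "Ambient"), ("B", "349", 2, "Ambient"), ("B", "481", 17, "Ambient"),
    ("C", "132", 0, "Ambient"), ("C", "334", 2, "Ambient"), ("C", "485", 17, "Ambient"),
    ("E", "151", 0, "Lowered"), ("E", "254", 2, "Lowered"), ("E", "445", 17, "Lowered"),
]


def effective_temperature(day, assigned_treatment):
    """Day 0 samples predate any temperature change, so they count as ambient."""
    return "Ambient" if day == 0 else assigned_treatment


def count_by_group(libraries):
    """Count libraries per (effective temperature, day) group."""
    return Counter(
        (effective_temperature(day, treatment), day)
        for _crab, _lib, day, treatment in libraries
    )
```

Calling `count_by_group(LIBRARIES)` gives the number of individual libraries in each temperature-by-day group, which is one way to see how balanced a given comparison would be.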

Trimmed individual libraries were downloaded from Gannet, available [here](https://gannet.fish.washington.edu/Atumefaciens/20200318_cbai_RNAseq_fastp_trimming/), at 22:00 PST on 2021-02-02 

Trimmed pooled libraries were downloaded from Gannet, available [here](https://gannet.fish.washington.edu/Atumefaciens/20200414_cbai_RNAseq_fastp_trimming/), at 24:00 PST on 2021-02-02


Transcriptomes used are cbai_transcriptome_v3.0.fasta and cbai_transcriptome_v2.0.fasta, available [here](https://owl.fish.washington.edu/halfshell/genomic-databank/). Neither transcriptome has been filtered to exclude Hematodinium sequences. Transcriptome checksums are available [here](https://github.com/RobertsLab/resources/wiki/Genomic-Resources).

Transcriptomes were downloaded at 01:00 PST on 2021-02-03

Plan to create indices using both transcriptome v3.0 and v2.0

## Download individual libraries

In [None]:
# Download all files in directory
!wget --no-check-certificate --no-parent --recursive --reject "index.html" https://gannet.fish.washington.edu/Atumefaciens/20200318_cbai_RNAseq_fastp_trimming/

In [None]:
# Remove all files that aren't .fq.gz or .md5
!rm *.html
!rm *.zip
!rm index.html*
!rm *.json
!rm *.sh
!rm *.log
!rm *.out
!rm *.txt
!rm -r multiqc*

In [None]:
# Move files from data/gannet.fish.washington.edu/Atumefaciens/20200318_cbai_RNAseq_fastp_trimming into data/libraries
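# Use the %cd magic: a "!cd" runs in a throwaway subshell and does not change the notebook's directory
%cd ..
!mv 20200318_cbai_RNAseq_fastp_trimming/* ../../libraries
# Delete old directory
%cd ../..
!rm -r gannet.fish.washington.edu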

In [None]:
# remove all uninfected libraries, as they won't be part of analysis
!rm 113_R*
!rm 221_R*
!rm 222_R*
!rm 425_R*
!rm 427_R*
!rm 73_R*

In [None]:
# Rename checksum file to clarify it is specific to individual libraries
!mv trimmed_fastq_checksums.md5 trimmed_indivfastq_checksums.md5

In [None]:
# Check that files downloaded properly with checksums
!md5sum -c trimmed_indivfastq_checksums.md5

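Where `md5sum` is not available, the same verification can be sketched in Python. This is a minimal sketch, assuming the checksum file uses the usual `md5sum` layout of one `<hash>  <filename>` pair per line; the helper names `md5_of` and `verify_checksums` are hypothetical, and filenames are resolved relative to the checksum file's directory (an assumption of this sketch, whereas `md5sum -c` resolves them relative to the working directory).

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(md5_file):
    """Compare each '<hash>  <filename>' line in md5_file to the file on disk.

    Returns a dict mapping filename -> 'OK', 'FAILED', or 'MISSING'.
    """
    md5_file = Path(md5_file)
    results = {}
    for line in md5_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = md5_file.parent / name
        if not target.exists():
            results[name] = "MISSING"
        elif md5_of(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

If the checksum file still lists files that were deleted earlier (such as the uninfected libraries), those entries would show up as `MISSING` rather than `FAILED`, which keeps them distinguishable from genuinely corrupted downloads.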
## Download pooled libraries


In [None]:
# Move up a directory to keep download simpler
%cd ..

In [None]:
# Download all files in directory
!wget --no-check-certificate --no-parent --recursive --reject "index.html" https://gannet.fish.washington.edu/Atumefaciens/20200414_cbai_RNAseq_fastp_trimming/

In [None]:
# Move into our new file structure
cd gannet.fish.washington.edu/Atumefaciens/20200414_cbai_RNAseq_fastp_trimming

In [None]:
# Remove all files that aren't .fq.gz or .md5
!rm *.html
!rm *.zip
!rm index.html*
!rm *.json
!rm *.log
!rm *.out
!rm *.txt
!rm -r multiqc*

Interestingly, this directory has 2 checksum files: 20200413_cbai_checkums.md5 (not a typo, the filename is spelled "checkums") and trimmed_fastq_checksums.md5.
Ran diff, and it appears 20200413_cbai_checkums.md5 is a checksum file for the untrimmed fastq files, so it can be safely removed.
We will also rename the trimmed_fastq_checksums.md5 file to clarify that it is specific to pooled libraries.

In [None]:
!rm 20200413_cbai_checkums.md5
!mv trimmed_fastq_checksums.md5 trimmed_pooledfastq_checksums.md5

In [None]:
# Remove all uninfected libraries, as they won't be part of analysis
!rm 380820_*
!rm 380822_*
!rm 380824_*

In [None]:
# Check that files downloaded properly with checksums
!md5sum -c trimmed_pooledfastq_checksums.md5

In [None]:
# Move files from data/gannet.fish.washington.edu/Atumefaciens/20200414_cbai_RNAseq_fastp_trimming into data/libraries
%cd ..
!mv 20200414_cbai_RNAseq_fastp_trimming/* ../../libraries
# Delete old directory
%cd ../..
!rm -r gannet.fish.washington.edu
%cd libraries

In [None]:
# Merge libraries by lanes, removing un-merged files
!cat 380821_S2_L001_R1_001.fastp-trim.202004143925.fq.gz 380821_S2_L002_R1_001.fastp-trim.202004144145.fq.gz > 380821_S2_R1_001.fastp-trim.fq.gz
!cat 380821_S2_L001_R2_001.fastp-trim.202004143925.fq.gz 380821_S2_L002_R2_001.fastp-trim.202004144145.fq.gz > 380821_S2_R2_001.fastp-trim.fq.gz
!rm 380821_S2_L00*
!cat 380823_S4_L001_R1_001.fastp-trim.202004144852.fq.gz 380823_S4_L002_R1_001.fastp-trim.202004145106.fq.gz > 380823_S4_R1_001.fastp-trim.fq.gz
!cat 380823_S4_L001_R2_001.fastp-trim.202004144852.fq.gz 380823_S4_L002_R2_001.fastp-trim.202004145106.fq.gz > 380823_S4_R2_001.fastp-trim.fq.gz
!rm 380823_S4_L00*
!cat 380825_S6_L001_R1_001.fastp-trim.202004145835.fq.gz 380825_S6_L002_R1_001.fastp-trim.202004140109.fq.gz > 380825_S6_R1_001.fastp-trim.fq.gz
!cat 380825_S6_L001_R2_001.fastp-trim.202004145835.fq.gz 380825_S6_L002_R2_001.fastp-trim.202004140109.fq.gz > 380825_S6_R2_001.fastp-trim.fq.gz
!rm 380825_S6_L00*

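The lane merge relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file. A minimal Python sketch of the same step follows; the names `LANE_PATTERN`, `group_lane_files`, and `merge_lanes` are hypothetical, and the filename pattern is an assumption based on the pooled library names used in this notebook. Unlike the cell above, this sketch does not delete the per-lane files.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <sample>_S<n>_L<lane>_R<1|2>_001.fastp-trim.<timestamp>.fq.gz
LANE_PATTERN = re.compile(
    r"^(?P<sample>\d+_S\d+)_L(?P<lane>\d{3})_(?P<read>R[12])_001"
    r"\.fastp-trim\.\d+\.fq\.gz$"
)


def group_lane_files(filenames):
    """Map each merged output name to its per-lane input files, in lane order."""
    groups = defaultdict(list)
    for name in filenames:
        match = LANE_PATTERN.match(name)
        if match:
            merged = f"{match['sample']}_{match['read']}_001.fastp-trim.fq.gz"
            groups[merged].append((match["lane"], name))
    return {merged: [name for _, name in sorted(parts)]
            for merged, parts in groups.items()}


def merge_lanes(directory):
    """Concatenate per-lane gzip files into one file per sample and read."""
    directory = Path(directory)
    groups = group_lane_files(p.name for p in directory.iterdir())
    for merged_name, parts in groups.items():
        with open(directory / merged_name, "wb") as out:
            for part in parts:
                with open(directory / part, "rb") as src:
                    shutil.copyfileobj(src, out)
    return groups
```

Files that do not match the lane pattern, such as the individual libraries, are ignored.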
## Download transcriptomes
Again, downloading transcriptome v2.0 and v3.0. Both are unfiltered by taxonomic group and include genes from both C. bairdi and Hematodinium.

In [None]:
%cd transcriptomes
# Download transcriptome 2.0
!curl -O -k https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v2.0.fasta
# Download transcriptome 3.0
!curl -O -k https://owl.fish.washington.edu/halfshell/genomic-databank/cbai_transcriptome_v3.0.fasta
# Use checksums available at https://github.com/RobertsLab/resources/wiki/Genomic-Resources
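# The checksum commands were not recorded; md5sum is assumed here, as used for the libraries above
# Transcriptome 2.0 checksum: 01adbd54298495c147767b19ee5c0de9
!md5sum cbai_transcriptome_v2.0.fasta
# Matches
# Transcriptome 3.0 checksum: 5516789cbad5fa9009c3566003557875
!md5sum cbai_transcriptome_v3.0.fasta
# Matches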

## Create an index for kallisto
Warning: on a local machine, building the indices could take days. It ran in much less time on Roadrunner.

In [None]:
%cd ../../output/kallisto_indices
# Index for transcriptome 2.0
!kallisto index -i kallisto_bairdihemat_index_v2.0.idx ../../data/transcriptomes/cbai_transcriptome_v2.0.fasta
# Index for transcriptome 3.0
!kallisto index -i kallisto_bairdihemat_index_v3.0.idx ../../data/transcriptomes/cbai_transcriptome_v3.0.fasta

## Run kallisto quantification for all libraries for Transcriptome 2.0, starting with individual libraries
Order is same as table above, with all pooled libraries examined last

In [None]:
# Quantify ID 272
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id272 \
../../data/libraries/272_R1_001.fastp-trim.202003184536.fq.gz \
../../data/libraries/272_R2_001.fastp-trim.202003184536.fq.gz

In [None]:
# Quantify ID 294
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id294 \
../../data/libraries/294_R1_001.fastp-trim.202003180701.fq.gz \
../../data/libraries/294_R2_001.fastp-trim.202003180701.fq.gz

In [None]:
# Quantify ID 280
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id280 \
../../data/libraries/280_R1_001.fastp-trim.202003185124.fq.gz \
../../data/libraries/280_R2_001.fastp-trim.202003185124.fq.gz

In [None]:
# Quantify ID 173
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id173 \
../../data/libraries/173_R1_001.fastp-trim.202003181159.fq.gz \
../../data/libraries/173_R2_001.fastp-trim.202003181159.fq.gz

In [None]:
# Quantify ID 072
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id072 \
../../data/libraries/72_R1_001.fastp-trim.202003181709.fq.gz \
../../data/libraries/72_R2_001.fastp-trim.202003181709.fq.gz

In [None]:
# Quantify ID 127
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id127 \
../../data/libraries/127_R1_001.fastp-trim.202003185538.fq.gz \
../../data/libraries/127_R2_001.fastp-trim.202003185538.fq.gz

In [None]:
# Quantify ID 178
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id178 \
../../data/libraries/178_R1_001.fastp-trim.202003181815.fq.gz \
../../data/libraries/178_R2_001.fastp-trim.202003181815.fq.gz

In [None]:
# Quantify ID 359
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id359 \
../../data/libraries/359_R1_001.fastp-trim.202003182247.fq.gz \
../../data/libraries/359_R2_001.fastp-trim.202003182247.fq.gz

In [None]:
# Quantify ID 463
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id463 \
../../data/libraries/463_R1_001.fastp-trim.202003185732.fq.gz \
../../data/libraries/463_R2_001.fastp-trim.202003185732.fq.gz

In [None]:
# Quantify ID 118
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id118 \
../../data/libraries/118_R1_001.fastp-trim.202003184931.fq.gz \
../../data/libraries/118_R2_001.fastp-trim.202003184931.fq.gz

In [None]:
# Quantify ID 349
# Realized at this point that standard error should be written to a file - started here
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id349 \
../../data/libraries/349_R1_001.fastp-trim.202003181609.fq.gz \
../../data/libraries/349_R2_001.fastp-trim.202003181609.fq.gz \
2> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 481
# Started appending std error to file
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id481 \
../../data/libraries/481_R1_001.fastp-trim.202003180047.fq.gz \
../../data/libraries/481_R2_001.fastp-trim.202003180047.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 132
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id132 \
../../data/libraries/132_R1_001.fastp-trim.202003180140.fq.gz \
../../data/libraries/132_R2_001.fastp-trim.202003180140.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 334
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id334 \
../../data/libraries/334_R1_001.fastp-trim.202003181149.fq.gz \
../../data/libraries/334_R2_001.fastp-trim.202003181149.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 485
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id485 \
../../data/libraries/485_R1_001.fastp-trim.202003181245.fq.gz \
../../data/libraries/485_R2_001.fastp-trim.202003181245.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 151
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id151 \
../../data/libraries/151_R1_001.fastp-trim.202003180619.fq.gz \
../../data/libraries/151_R2_001.fastp-trim.202003180619.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 254
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id254 \
../../data/libraries/254_R1_001.fastp-trim.202003184228.fq.gz \
../../data/libraries/254_R2_001.fastp-trim.202003184228.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify ID 445
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id445 \
../../data/libraries/445_R1_001.fastp-trim.202003185018.fq.gz \
../../data/libraries/445_R2_001.fastp-trim.202003185018.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

Continue with quantification of the pooled libraries

In [None]:
# Quantify 380821
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id380821 \
../../data/libraries/380821_S2_R1_001.fastp-trim.fq.gz \
../../data/libraries/380821_S2_R2_001.fastp-trim.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify 380823
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id380823 \
../../data/libraries/380823_S4_R1_001.fastp-trim.fq.gz \
../../data/libraries/380823_S4_R2_001.fastp-trim.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

In [None]:
# Quantify 380825
!kallisto quant \
-i kallisto_bairdihemat_index_v2.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev2.0/id380825 \
../../data/libraries/380825_S6_R1_001.fastp-trim.fq.gz \
../../data/libraries/380825_S6_R2_001.fastp-trim.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev2.0/std_errortracking.txt

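The cells above all follow one pattern, so the commands could also be generated in a loop rather than written out by hand. The sketch below is one way to do that, not the method used for this run: `READ1_PATTERN` and `kallisto_quant_commands` are hypothetical names, the filename pattern is an assumption based on the library names in this notebook, and library IDs are zero-padded to three digits to match the `id072` output directory used above. Only the `-i` and `-o` options already used in this notebook appear in the generated commands.

```python
import re

# Assumed R1 naming: <library>_[S<n>_]R1_001.fastp-trim.[<timestamp>.]fq.gz
READ1_PATTERN = re.compile(
    r"^(?P<lib>\d+)_(?:S\d+_)?R1_001\.fastp-trim\.(?:\d+\.)?fq\.gz$"
)


def kallisto_quant_commands(library_files, index, out_root,
                            library_dir="../../data/libraries"):
    """Build one `kallisto quant` argument list per paired-end library.

    Returns a dict mapping an output ID such as 'id072' to its command.
    """
    names = set(library_files)
    commands = {}
    for name in sorted(names):
        match = READ1_PATTERN.match(name)
        if not match:
            continue  # skip R2 files, checksum files, and unmerged lane files
        mate = name.replace("_R1_", "_R2_", 1)
        if mate not in names:
            raise FileNotFoundError(f"No R2 mate found for {name}")
        lib_id = "id" + match["lib"].zfill(3)
        commands[lib_id] = [
            "kallisto", "quant",
            "-i", index,
            "-o", f"{out_root}/{lib_id}",
            f"{library_dir}/{name}",
            f"{library_dir}/{mate}",
        ]
    return commands
```

Each command list could then be passed to `subprocess.run(cmd, stderr=log, check=True)`, with `log` being a file opened in append mode, which mirrors the `2>>` redirection used in the cells above.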
## Continue with kallisto counts for transcriptome 3.0
Order has changed - starting with library where we began tracking std error. If 3.0 is clearly worse than 2.0, we will stop running kallisto on libraries with transcriptome 3.0

In [None]:
# Quantify ID 349
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id349 \
../../data/libraries/349_R1_001.fastp-trim.202003181609.fq.gz \
../../data/libraries/349_R2_001.fastp-trim.202003181609.fq.gz \
2> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 481
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id481 \
../../data/libraries/481_R1_001.fastp-trim.202003180047.fq.gz \
../../data/libraries/481_R2_001.fastp-trim.202003180047.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 132
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id132 \
../../data/libraries/132_R1_001.fastp-trim.202003180140.fq.gz \
../../data/libraries/132_R2_001.fastp-trim.202003180140.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 334
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id334 \
../../data/libraries/334_R1_001.fastp-trim.202003181149.fq.gz \
../../data/libraries/334_R2_001.fastp-trim.202003181149.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 485
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id485 \
../../data/libraries/485_R1_001.fastp-trim.202003181245.fq.gz \
../../data/libraries/485_R2_001.fastp-trim.202003181245.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 151
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id151 \
../../data/libraries/151_R1_001.fastp-trim.202003180619.fq.gz \
../../data/libraries/151_R2_001.fastp-trim.202003180619.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 254
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id254 \
../../data/libraries/254_R1_001.fastp-trim.202003184228.fq.gz \
../../data/libraries/254_R2_001.fastp-trim.202003184228.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 445
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id445 \
../../data/libraries/445_R1_001.fastp-trim.202003185018.fq.gz \
../../data/libraries/445_R2_001.fastp-trim.202003185018.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

Continue with quantification of the pooled libraries

In [None]:
# Quantify 380821
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id380821 \
../../data/libraries/380821_S2_R1_001.fastp-trim.fq.gz \
../../data/libraries/380821_S2_R2_001.fastp-trim.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify 380823
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id380823 \
../../data/libraries/380823_S4_R1_001.fastp-trim.fq.gz \
../../data/libraries/380823_S4_R2_001.fastp-trim.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify 380825
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id380825 \
../../data/libraries/380825_S6_R1_001.fastp-trim.fq.gz \
../../data/libraries/380825_S6_R2_001.fastp-trim.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 272
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id272 \
../../data/libraries/272_R1_001.fastp-trim.202003184536.fq.gz \
../../data/libraries/272_R2_001.fastp-trim.202003184536.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 294
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id294 \
../../data/libraries/294_R1_001.fastp-trim.202003180701.fq.gz \
../../data/libraries/294_R2_001.fastp-trim.202003180701.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 280
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id280 \
../../data/libraries/280_R1_001.fastp-trim.202003185124.fq.gz \
../../data/libraries/280_R2_001.fastp-trim.202003185124.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 173
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id173 \
../../data/libraries/173_R1_001.fastp-trim.202003181159.fq.gz \
../../data/libraries/173_R2_001.fastp-trim.202003181159.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 072
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id072 \
../../data/libraries/72_R1_001.fastp-trim.202003181709.fq.gz \
../../data/libraries/72_R2_001.fastp-trim.202003181709.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 127
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id127 \
../../data/libraries/127_R1_001.fastp-trim.202003185538.fq.gz \
../../data/libraries/127_R2_001.fastp-trim.202003185538.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 178
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id178 \
../../data/libraries/178_R1_001.fastp-trim.202003181815.fq.gz \
../../data/libraries/178_R2_001.fastp-trim.202003181815.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 359
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id359 \
../../data/libraries/359_R1_001.fastp-trim.202003182247.fq.gz \
../../data/libraries/359_R2_001.fastp-trim.202003182247.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 463
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id463 \
../../data/libraries/463_R1_001.fastp-trim.202003185732.fq.gz \
../../data/libraries/463_R2_001.fastp-trim.202003185732.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

In [None]:
# Quantify ID 118
!kallisto quant \
-i kallisto_bairdihemat_index_v3.0.idx \
-o ../kallisto_libraries_bairdihemat_transcriptomev3.0/id118 \
../../data/libraries/118_R1_001.fastp-trim.202003184931.fq.gz \
../../data/libraries/118_R2_001.fastp-trim.202003184931.fq.gz \
2>> ../kallisto_libraries_bairdihemat_transcriptomev3.0/std_errortracking.txt

## End of Kallisto quantification
## Begin building transcript expression matrix. Using transcriptome 2.0
Although 3.0 mapped more reads and had a longer fragment length for most libraries, the differences were small. We therefore chose transcriptome 2.0, since it was built from all individual and pooled libraries, whereas 3.0 was built only from pooled libraries.

Now working on local machine. Build matrix to compare Day 0/2 ambient-temperature crabs vs. Day 2 elevated-temperature crabs

In [1]:
!pwd

/mnt/c/Users/acoyl/Documents/GitHub/hemat_bairdii_transcriptome/scripts


In [None]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id178/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id118/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id132/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id359/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id349/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id334/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id272/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id294/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id280/abundance.tsv

In [5]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/elev2_vs_amb02_indiv_only

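Build another matrix to compare all ambient-temperature crabs (including Day 0 crabs from the elevated and lowered treatment groups, because Day 0 was prior to any treatment) with Day 2 elevated-temperature crabs. Both individual and pooled libraries are included.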

In [None]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id178/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id359/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id463/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id118/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id349/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id481/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id132/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id334/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id485/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id151/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id173/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id072/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id127/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id380821/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id272/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id294/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id280/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id380825/abundance.tsv

In [8]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/amb0217_elev0_low0_vs_elev2

Build another matrix to compare infected individual Elevated libraries from Day 0 with individual Elevated libraries from Day 2. Reminder: Day 0 samples were taken when all crabs were held at ambient-temperature waters, and the same crabs were sampled on Day 0 and Day 2.

Effectively, this compares the same infected crab prior to exposure to elevated temps and post-exposure.

In [None]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id173/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id072/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id127/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id272/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id294/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id280/abundance.tsv

In [2]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/elev0_vs_elev2_indiv/

Next, we will build three separate matrices to compare ambient-temperature crabs.

1. Ambient Day 0 vs. Ambient Day 2 (indiv. libraries only)
2. Ambient Day 0 vs. Ambient Day 17 (indiv. libraries only)
3. Ambient Day 2 vs. Ambient Day 17 (indiv. libraries only)

Ambient Day 0 vs. Ambient Day 2

In [None]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id178/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id118/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id132/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id359/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id349/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id334/abundance.tsv

In [4]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/amb0_vs_amb2_indiv/

Ambient Day 0 vs. Ambient Day 17

In [None]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id178/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id118/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id132/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id463/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id481/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id485/abundance.tsv

In [6]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/amb0_vs_amb17_indiv/

Ambient Day 2 vs. Ambient Day 17

In [None]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id359/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id349/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id334/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id463/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id481/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id485/abundance.tsv

In [4]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/amb2_vs_amb17_indiv/

Ambient Day 2 vs. Elevated Day 2, individual libraries only

In [5]:
!../../../GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
--gene_trans_map 'none' \
--out_prefix kallisto \
--name_sample_by_basedir \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id359/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id349/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id334/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id272/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id294/abundance.tsv \
../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id280/abundance.tsv

-reading file: ../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id359/abundance.tsv
-reading file: ../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id349/abundance.tsv
-reading file: ../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id334/abundance.tsv
-reading file: ../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id272/abundance.tsv
-reading file: ../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id294/abundance.tsv
-reading file: ../output/kallisto_libraries_bairdihemat_transcriptomev2.0/id280/abundance.tsv


* Outputting combined matrix.

/mnt/c/Users/acoyl/Documents/GradSchool/RobertsLab/Tools/Trinity/trinityrnaseq-v2.11.0/util/support_scripts/run_TMM_scale_matrix.pl --matrix kallisto.isoform.TPM.not_cross_norm > kallisto.isoform.TMM.EXPR.matrix
CMD: R --no-save --no-restore --no-site-file --no-init-file -q < kallisto.isoform.TPM.not_cross_norm.runTMM.R 1>&2 
/mnt/c/Users/acoyl/Downloads/anaconda3/lib/R/bin/exec/R: error while loading

In [6]:
# Since the script outputs files in the working directory without an 
# option to change output dir, move all output over manually
!mv kallisto.isoform.* ../output/kallisto_matrices/amb2_vs_elev2_indiv/

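Each matrix above is built with the same script and options, differing only in which libraries are listed. As a hedged sketch (not the approach used for this run), the comparisons could be kept in one table and the commands generated from it. The names `COMPARISONS` and `matrix_command` are hypothetical; the script path, options, library IDs, and output directory names are transcribed from the cells above. Each returned list could be passed to `subprocess.run`, after which the `kallisto.isoform.*` outputs would still need to be moved into the matching directory under `../output/kallisto_matrices/`.

```python
TRINITY_MATRIX_SCRIPT = (
    "../../../GradSchool/RobertsLab/Tools/Trinity/"
    "trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl"
)
QUANT_DIR = "../output/kallisto_libraries_bairdihemat_transcriptomev2.0"

# Output directory name -> library IDs, in the order used in the cells above
COMPARISONS = {
    "elev2_vs_amb02_indiv_only": ["178", "118", "132", "359", "349", "334",
                                  "272", "294", "280"],
    "amb0217_elev0_low0_vs_elev2": ["178", "359", "463", "118", "349", "481",
                                    "132", "334", "485", "151", "173", "072",
                                    "127", "380821", "272", "294", "280",
                                    "380825"],
    "elev0_vs_elev2_indiv": ["173", "072", "127", "272", "294", "280"],
    "amb0_vs_amb2_indiv": ["178", "118", "132", "359", "349", "334"],
    "amb0_vs_amb17_indiv": ["178", "118", "132", "463", "481", "485"],
    "amb2_vs_amb17_indiv": ["359", "349", "334", "463", "481", "485"],
    "amb2_vs_elev2_indiv": ["359", "349", "334", "272", "294", "280"],
}


def matrix_command(library_ids, script=TRINITY_MATRIX_SCRIPT, quant_dir=QUANT_DIR):
    """Build the abundance_estimates_to_matrix.pl argument list for one comparison."""
    abundance_files = [f"{quant_dir}/id{lib}/abundance.tsv" for lib in library_ids]
    return [
        script,
        "--est_method", "kallisto",
        "--gene_trans_map", "none",
        "--out_prefix", "kallisto",
        "--name_sample_by_basedir",
        *abundance_files,
    ]
```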
This completes our kallisto analysis and matrix creation. Move to the R file 02_kallisto_to_deseq_to_accessionIDs to begin differential gene expression analysis using DESeq2.