## FISH546 Differential Gene Expression Analysis Project

## Purpose

I am using RNA-seq data from sea cucumbers (*Apostichopus japonicus*) that were exposed to two elevated temperatures (26°C and 30°C). The purpose is to conduct a differential gene expression (DGE) analysis to determine the biological responses that heat stress induces in this organism. The data were generated by Xu et al. (2023) at Qingdao Agricultural University and obtained from the NIH website.

## Background

As climate change progresses, rising water temperature is becoming an increasing threat to marine life (Dahlman & Lindsey, 2023). Researchers have found 30°C to be a lethal temperature for the marine invertebrate *Apostichopus japonicus* (SOURCE). Xu et al. (2023) cite sea cucumber aquaculture as their motivation for testing the effect of heat stress on gene expression. Their study focuses on how heat stress affects heat-responsive proteins.

## Methods

The experimental design consisted of three controls maintained at 18°C, six sub-lethal temperature treatment groups (26°C), and three lethal temperature treatment groups (30°C). The treatment groups were warmed from 18°C to 26°C or 30°C, respectively, at a rate of 2°C per hour using a heating rod. The 26°C treatment was held at that temperature for either 6 hours or 48 hours, creating two groups within the 26°C treatment (6 hr vs. 48 hr). The 30°C treatment was held at 30°C for only 6 hours; the researchers report full mortality when they attempted a 48 hr group at this temperature. Intestine tissue was collected for RNA-seq analysis.

## Code

The code is broken into four parts.

## Part 1: Obtain data & conduct quality check (QC)

FASTQ files were obtained from NCBI under BioProject accession PRJNA848687. More information about the data can be found [here](https://www.ncbi.nlm.nih.gov/bioproject/848687). Go [here](https://www.ncbi.nlm.nih.gov/biosample?LinkName=bioproject_biosample_all&from_uid=848687) for information about the individual files (12 total).
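Because the later pseudo-alignment step expects paired-end reads, it may help to confirm that every `*_1.fastq` file has a matching `*_2.fastq` file before running QC. The following is a minimal sketch; `check_pairs` is a hypothetical helper name, and the `_1`/`_2` suffix convention is taken from the file names used later in this document.

```bash
# Sketch: report any *_1.fastq file that has no matching *_2.fastq mate.
# check_pairs is a hypothetical helper, not part of any toolkit.
check_pairs() {
  fastq_dir="$1"
  missing=0
  for r1 in "$fastq_dir"/*_1.fastq; do
    [ -e "$r1" ] || continue            # glob matched nothing
    r2="${r1%_1.fastq}_2.fastq"         # expected mate file
    if [ ! -e "$r2" ]; then
      echo "missing mate for $(basename "$r1")"
      missing=$((missing + 1))
    fi
  done
  echo "runs without a mate: $missing"
}
```

It could be called as `check_pairs /home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq`; a final count of 0 would suggest that all runs are complete pairs.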

The quality check was done using FastQC:


```{bash}
#| echo: true
/home/shared/8TB_HDD_02/hannia/SeaCucumber/FastQC/fastqc \
/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/*.fastq \
-o /home/shared/8TB_HDD_02/hannia/SeaCucumber/output
```


Results: The FastQC HTML reports did not show any outliers that needed to be removed. Refer to \~SeaCucumber/output_fastqc for the HTML files.
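Checking the HTML reports one at a time can be slow, so a quick text summary may be useful. The sketch below assumes the FastQC `.zip` outputs have been unzipped (or that FastQC was run with `--extract`), so that each report has a `<name>_fastqc/summary.txt` file with tab-separated status, module, and file name columns. `list_fastqc_fails` is a hypothetical helper name.

```bash
# Sketch: list FastQC modules flagged FAIL across all extracted reports.
# Assumes each report was unzipped to <name>_fastqc/summary.txt, whose
# lines are tab-separated: status, module name, input file name.
list_fastqc_fails() {
  qc_dir="$1"
  for summary in "$qc_dir"/*_fastqc/summary.txt; do
    [ -e "$summary" ] || continue       # glob matched nothing
    # Print "file: module" for every module with status FAIL
    awk -F '\t' '$1 == "FAIL" { print $3 ": " $2 }' "$summary"
  done
}
```

Running `list_fastqc_fails` on the FastQC output directory prints one line per failed module; no output would mean that no module was flagged FAIL.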

## Part 2: Pseudo-alignment

The reference genome of *Apostichopus japonicus* for the pseudo-alignment was obtained from NCBI and can be accessed [here](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_037975245.1/). The NCBI RefSeq assembly ID is **GCF_037975245.1**.

Pseudo-alignment was done using kallisto. An index was created first (1), and then the pseudo-alignment was run with `kallisto quant` on the paired-end reads (2).

1. Create the kallisto index:


```{bash}
#| echo: true
/home/shared/kallisto/kallisto index \
-i /home/shared/8TB_HDD_02/hannia/SeaCucumber/index.idx \
/home/shared/8TB_HDD_02/hannia/SeaCucumber/GCF_037975245.1_ref/ncbi_dataset/data/GCF_037975245.1/rna.fna
```


2. Run `kallisto quant` on each pair of FASTQ files:


```{bash}
#| echo: true
# Derive each sample name from its *_1.fastq file, then quantify both mates.
# kallisto quant treats two FASTQ files as paired-end by default
# (only single-end input needs a flag, --single), so no pairing flag is passed.
find /home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/*_1.fastq \
| xargs -n1 basename \
| sed 's/_1\.fastq$//' \
| xargs -I{} /home/shared/kallisto/kallisto quant \
-i /home/shared/8TB_HDD_02/hannia/SeaCucumber/index.idx \
-o /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/{} \
-t 40 \
/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/{}_1.fastq \
/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/{}_2.fastq
```
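After quantification, it may be worth checking how many reads pseudo-aligned in each sample. The sketch below assumes the installed kallisto version writes a `run_info.json` file with a `p_pseudoaligned` field into each sample's output directory; `report_pseudoalignment` is a hypothetical helper name.

```bash
# Sketch: print the percentage of pseudoaligned reads per sample.
# Assumes each sample directory holds kallisto's run_info.json and that
# the file records a "p_pseudoaligned" field.
report_pseudoalignment() {
  quant_dir="$1"
  for info in "$quant_dir"/*/run_info.json; do
    [ -e "$info" ] || continue          # glob matched nothing
    sample=$(basename "$(dirname "$info")")
    # Pull the numeric value that follows "p_pseudoaligned":
    rate=$(sed -n 's/.*"p_pseudoaligned": *\([0-9.]*\).*/\1/p' "$info")
    printf '%s\t%s\n' "$sample" "$rate"
  done
}
```

Calling `report_pseudoalignment /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01` prints one tab-separated line per sample, which makes unusually low rates easy to spot.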


## Part 3: Creating a gene expression matrix


```{bash}
#| echo: true
perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
  --est_method kallisto \
  --gene_trans_map none \
  --out_prefix /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01 \
  --name_sample_by_basedir \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635628/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635629/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635630/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635631/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635632/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635633/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635634/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635635/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635636/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635637/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635638/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635639/abundance.tsv
```
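Typing all twelve `abundance.tsv` paths by hand is easy to get wrong. As an alternative, the paths could be collected from the kallisto output directory. This is a minimal sketch, assuming one sub-directory per sample as produced in Part 2; `list_abundance_files` is a hypothetical helper name.

```bash
# Sketch: print the path of every abundance.tsv found one level below
# the kallisto output directory (one sub-directory per sample).
list_abundance_files() {
  quant_dir="$1"
  for f in "$quant_dir"/*/abundance.tsv; do
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}
```

The result could replace the hand-typed list, for example by ending the `abundance_estimates_to_matrix.pl` call with `$(list_abundance_files /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01)`. This relies on the paths containing no spaces. Piping the function's output to `wc -l` should give 12 if every sample was quantified.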


## Part 4: DESeq2 for DGE analysis 


```{r}
#| echo: true


```


## Results so far (week 8)

## Plan for next 2 weeks

## References

Dahlman, L., & Lindsey, R. (2023). Climate change: Ocean heat content. NOAA Climate.gov. https://www.climate.gov/news-features/understanding-climate/climate-change-ocean-heat-content