# RNA Sequencing and Differential Abundance Analysis Pipeline for Huntington's Disease (GSE270472)

## 1. Rationale Behind Parameter Choices
 ------------------------------------------------------
- FastQC: Quality control to check GC content, per-base quality, and adapter contamination.
- BBDuk2: Trims adapters and low-quality bases, using a Phred quality threshold of Q ≥ 20.
- Bowtie2: Aligns reads to the GRCh38 reference genome with `-X 1000` (maximum fragment length of 1,000 bp for paired-end alignments) and `--un-conc` (writes read pairs that fail to align concordantly to separate files).
- RSEM: Quantifies gene expression and reports abundance in transcripts per million (TPM).
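- DESeq2: Tests for differential expression with the design formula `~ condition + batch`, which models the condition effect while adjusting for batch. Because `results()` tests the last variable in the design by default, the condition comparison has to be requested explicitly (for example with the `contrast` argument).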
- GSEA & gProfiler2: Used for gene ontology enrichment and pathway analysis.
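
The trimming and alignment parameters listed above can be sketched as command lines assembled in Python. This is a minimal sketch under stated assumptions, not the pipeline's actual invocation: the executable name `bbduk.sh`, the adapter file `adapters.fa`, the index prefix `GRCh38_index`, the output file names, and the k-mer settings (`ktrim`, `k`, `mink`, `hdist`) are all illustrative. Only Q ≥ 20, `-X 1000`, and `--un-conc` come from the notes.

```python
def bbduk_cmd(r1, r2, out_dir, adapters="adapters.fa", min_quality=20):
    """Build a BBDuk command that trims adapters and bases below min_quality."""
    return [
        "bbduk.sh",                                # assumed executable name
        f"in1={r1}", f"in2={r2}",
        f"out1={out_dir}/trimmed_R1.fastq.gz",     # hypothetical output names
        f"out2={out_dir}/trimmed_R2.fastq.gz",
        f"ref={adapters}",                         # hypothetical adapter FASTA
        "ktrim=r", "k=23", "mink=11", "hdist=1",   # assumed adapter-matching settings
        "qtrim=rl", f"trimq={min_quality}",        # quality-trim both ends at Q >= 20
    ]


def bowtie2_cmd(index, r1, r2, out_dir, max_fragment=1000, threads=4):
    """Build a paired-end Bowtie2 command using the options named in the notes."""
    return [
        "bowtie2",
        "-x", index,                               # index prefix (assumed name)
        "-1", r1, "-2", r2,
        "-X", str(max_fragment),                   # maximum fragment length
        # Bowtie2 replaces '%' with 1 or 2 to name the per-mate files.
        "--un-conc", f"{out_dir}/unaligned_%.fastq",
        "-p", str(threads),
        "-S", f"{out_dir}/aligned.sam",
    ]


if __name__ == "__main__":
    trim = bbduk_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz", "out")
    align = bowtie2_cmd(
        "GRCh38_index",
        "out/trimmed_R1.fastq.gz",
        "out/trimmed_R2.fastq.gz",
        "out",
    )
    print(" ".join(trim))
    print(" ".join(align))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; building the arguments as a list avoids shell quoting problems with file paths.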

## 2. RNA-Seq Pipeline (Trimming, Alignment, Quantification)
 ---------------------------------------------------------
 - Read Trimming: BBDuk2 removes adapters and low-quality sequences.
 - Read Alignment: Bowtie2 aligns sequences to GRCh38.
 - Read Quantification: RSEM estimates transcript abundance.
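
To make the quantification step concrete, the sketch below computes TPM from per-transcript read counts and effective lengths using the standard definition: each count is divided by its length, and the resulting rates are rescaled to sum to one million. It illustrates the unit only. RSEM itself estimates expected counts with a statistical model, which this sketch does not attempt, and the input numbers are made up.

```python
def tpm(counts, effective_lengths):
    """Convert read counts to transcripts per million (TPM).

    counts            -- estimated reads per transcript
    effective_lengths -- effective transcript lengths in bases
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have the same length")
    if any(length <= 0 for length in effective_lengths):
        raise ValueError("effective lengths must be positive")

    # Reads per base: removes the bias toward longer transcripts.
    rates = [c / length for c, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")

    # Rescale so the values sum to one million within the sample.
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    # Made-up example: the second transcript has twice the reads of the first
    # but is twice as long, so both end up with the same TPM.
    values = tpm([100, 200, 300], [1000, 2000, 1000])
    print([round(v) for v in values])
```

Because TPM values sum to the same total in every sample, they are convenient for comparing relative abundance, whereas DESeq2 expects raw (unnormalized) counts as input.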

## 3. Differential Abundance Analysis (DESeq2, GSEA, gProfiler2)
 --------------------------------------------------------------
 - DESeq2: Genes with a Benjamini-Hochberg adjusted p-value (FDR) < 0.05 are called differentially expressed genes (DEGs).
 - GSEA: Gene set enrichment analysis to detect significantly enriched pathways.
 - gProfiler2: Functional enrichment analysis with GO terms.
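
The FDR cutoff can be illustrated with a small Benjamini-Hochberg sketch. This is a plain-Python illustration of the adjustment, not DESeq2's code: DESeq2 also applies independent filtering before adjusting, which is omitted here. The gene names and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(pvalues_by_gene, alpha=0.05):
    """Return genes whose adjusted p-value is below alpha."""
    genes = list(pvalues_by_gene)
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < alpha]


if __name__ == "__main__":
    # Hypothetical raw p-values for four placeholder genes.
    raw = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.2}
    print(call_degs(raw))
```

The adjustment depends on how many tests are run, so a gene's adjusted p-value changes if genes are added to or removed from the tested set.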

## 4. Biological Questions Addressed
 ------------------------------------
 - How does HTT loss-of-function contribute to HD pathogenesis?
 - Which key pathways and biological processes are altered?
 - What are the differentially expressed genes between HTT knockdown and control groups?

## 5. Limitations and Biases
 -------------------------
 - Batch effects in sequencing runs may introduce variability.
 - Incomplete rRNA removal leaves fewer reads for mRNA, which could reduce effective depth and distort abundance estimates.
 - Read depth variability between samples affects the detection of lowly expressed genes.
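
The read depth point can be made concrete with a short sketch: counts-per-million (CPM) scaling puts libraries of different sizes on a common scale, and a simple count filter shows how a lowly expressed gene can drop out when too few samples reach a minimum count. The thresholds (`min_count=10`, `min_samples=2`) and the example counts are illustrative assumptions, not values taken from this pipeline.

```python
def counts_per_million(counts):
    """Scale one sample's raw counts by its library size."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no reads")
    return [c / total * 1e6 for c in counts]


def keep_expressed(count_matrix, min_count=10, min_samples=2):
    """Keep genes with at least min_count reads in at least min_samples samples.

    count_matrix maps a gene name to its list of raw counts, one per sample.
    """
    return {
        gene: counts
        for gene, counts in count_matrix.items()
        if sum(c >= min_count for c in counts) >= min_samples
    }


if __name__ == "__main__":
    # Hypothetical counts across three samples; the third sample is shallow.
    matrix = {"geneA": [120, 150, 14], "geneB": [11, 12, 1], "geneC": [9, 11, 0]}
    print(sorted(keep_expressed(matrix)))
```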
