# RNA Sequencing and Differential Abundance Analysis Pipeline for Huntington's Disease (GSE270472)

## 1. Rationale Behind Parameter Choices
 ------------------------------------------------------
- FastQC: Quality control to check GC content, per-base quality, and adapter contamination.
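- BBDuk (BBTools): Trims adapters and low-quality bases (Q ≥ 20) from the reads.
- Bowtie2: Aligns reads to the GRCh38 reference genome with `-X 1000` (maximum fragment length of 1000 bp for valid paired-end alignments) and `--un-conc` (writes read pairs that fail to align concordantly to separate files).
- RSEM: Quantifies gene-level expression, reporting expected counts and TPM values. DESeq2 takes the expected counts (rounded to integers), not TPM, as input.
- DESeq2: Tests for differential expression with the design formula `~ condition + batch`, which estimates the condition effect while adjusting for batch. Because `results()` reports the last term of the design by default, the condition contrast should be requested explicitly (or the formula written as `~ batch + condition`).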
- GSEA & gProfiler2: Used for gene ontology enrichment and pathway analysis.
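
To make the TPM values mentioned above concrete, here is a minimal sketch of the standard TPM calculation: counts are divided by effective transcript length and the resulting rates are rescaled to sum to one million. The function name `tpm` and the toy numbers are illustrative assumptions; this is not RSEM's internal code, which estimates the counts with an expectation-maximization procedure first.

```python
def tpm(counts, effective_lengths):
    """Convert per-gene counts to transcripts per million (TPM)."""
    # Length-normalized rate: reads per base of effective length.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    # Rescale so that the values sum to 1e6 within the sample.
    return [r / total * 1e6 for r in rates]


# Toy example (hypothetical numbers): three genes in one sample.
values = tpm([100, 200, 300], [1000, 2000, 1000])
print([round(v) for v in values])
```

Because TPM is rescaled within each sample, it is convenient for comparing genes inside a sample but is not a substitute for counts in count-based models such as DESeq2.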

## 2. RNA-Seq Pipeline (Trimming, Alignment, Quantification)
 ---------------------------------------------------------
 - Read Trimming: BBDuk removes adapters and low-quality sequences.
 - Read Alignment: Bowtie2 aligns sequences to GRCh38.
 - Read Quantification: RSEM estimates transcript abundance.
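
The trimming step can be sketched as a small Python helper that assembles a BBDuk command line for one paired-end sample. This is only a sketch under stated assumptions: the file names, the `adapters.fa` reference path, and the k-mer settings (`k=23 mink=11 hdist=1`, taken from common BBDuk usage examples) are not specified in this notebook; only the Q ≥ 20 threshold (`trimq=20`) comes from the rationale above.

```python
import shlex


def bbduk_command(r1, r2, out1, out2, adapters="adapters.fa", min_quality=20):
    """Build a BBDuk argument list for adapter and quality trimming."""
    return [
        "bbduk.sh",
        f"in={r1}", f"in2={r2}",        # paired-end input
        f"out={out1}", f"out2={out2}",  # trimmed output
        f"ref={adapters}",              # adapter sequences (assumed path)
        "ktrim=r", "k=23", "mink=11", "hdist=1",  # right-side adapter trimming (assumed settings)
        "qtrim=rl", f"trimq={min_quality}",       # trim both ends below the quality threshold
        "tpe", "tbo",                   # keep mates the same length; trim by pair overlap
    ]


# Hypothetical file names for one sample.
cmd = bbduk_command("s1_R1.fastq.gz", "s1_R2.fastq.gz",
                    "s1_trimmed_R1.fastq.gz", "s1_trimmed_R2.fastq.gz")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once BBTools is installed and the paths point at real files.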

## 3. Differential Abundance Analysis (DESeq2, GSEA, gProfiler2)
 --------------------------------------------------------------
 - DESeq2: Identifies differentially expressed genes (DEGs) at a Benjamini-Hochberg adjusted p-value (FDR) < 0.05.
 - GSEA: Gene set enrichment to detect significant pathways.
 - gProfiler2: Functional enrichment analysis with GO terms.
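
The FDR threshold above refers to Benjamini-Hochberg adjusted p-values. As an illustration of what that adjustment does, the following sketch implements the plain Benjamini-Hochberg procedure in pure Python. The function name and the example p-values are made up for demonstration, and the sketch ignores DESeq2's independent filtering step.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.27]
padj = benjamini_hochberg(pvals)
significant = [p < 0.05 for p in padj]
print(significant)
```

In this toy example the third and fourth genes have raw p-values below 0.05 but are no longer significant after adjustment, which is the point of controlling the FDR across many tests.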

## 4. Biological Questions Addressed
 ------------------------------------
 - How does HTT loss-of-function contribute to HD pathogenesis?
 - Which key pathways and biological processes are altered?
 - What are the differentially expressed genes between HTT knockdown and control groups?

## 5. Limitations and Biases
 -------------------------
 - Batch effects in sequencing runs may introduce variability.
 - Incomplete rRNA removal could impact downstream analysis.
 - Variable read depth across samples limits the detection of lowly expressed genes.
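
Differences in read depth are usually handled by normalization before testing. As a rough illustration, the sketch below reimplements the median-of-ratios idea behind DESeq2's default size factors in pure Python. The function name and the toy count matrix are assumptions for demonstration; in practice DESeq2 computes these factors itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of per-sample raw counts.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero count have a zero geometric mean and are skipped.
        if any(c == 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [30, 60], [0, 5]]
factors = size_factors(counts)
print([round(f, 3) for f in factors])
```

Dividing each sample's counts by its size factor puts the samples on a common scale, but it cannot recover genes that were never sampled at low depth.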

print("RNA-Seq pipeline setup complete.")

Shell Script for Nextflow Execution

In [2]:
%%bash
# The %%bash cell magic makes Jupyter run this cell as a shell script instead of Python.
nextflow run main.nf --reads 'path/to/reads/*_R{1,2}.fastq.gz'

Nextflow Pipeline (main.nf)

In [None]:
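// Input locations; replace the placeholder paths before running.
params.reads = 'path/to/reads/*_R{1,2}.fastq.gz'
params.genome = 'path/to/genome.fa'
params.outdir = 'path/to/output/'
// Assumed additions: a prebuilt Bowtie2 rRNA index prefix and an RSEM reference
// prefix (built beforehand from params.genome with rsem-prepare-reference --bowtie2).
params.rrna_index = 'path/to/rRNA_index'
params.rsem_ref = 'path/to/rsem_reference'

process FastQC {
    publishDir "${params.outdir}/fastqc", mode: 'copy'

    input:
    tuple val(sample_id), path(reads)
    output:
    path 'fastqc/'
    script:
    """
    # fastqc does not create the output directory itself
    mkdir -p fastqc
    fastqc -o fastqc/ ${reads}
    """
}

process rRNAMapping {
    input:
    tuple val(sample_id), path(reads)
    output:
    tuple val(sample_id), path('rRNA_unmapped/*.fastq')
    script:
    """
    # Keep only pairs that do NOT align concordantly to rRNA; '%' becomes 1 or 2
    mkdir -p rRNA_unmapped
    bowtie2 -x ${params.rrna_index} -1 ${reads[0]} -2 ${reads[1]} --un-conc rRNA_unmapped/${sample_id}_R%.fastq -S /dev/null
    """
}

process RSEMQuantification {
    publishDir "${params.outdir}/rsem", mode: 'copy'

    input:
    tuple val(sample_id), path(reads)
    output:
    path 'rsem/'
    script:
    """
    # Arguments: mate 1, mate 2, RSEM reference prefix, per-sample output prefix
    mkdir -p rsem
    rsem-calculate-expression --paired-end --bowtie2 --estimate-rspd --append-names --output-genome-bam ${reads[0]} ${reads[1]} ${params.rsem_ref} rsem/${sample_id}
    """
}

workflow {
    // Group R1/R2 files per sample: emits [sample_id, [R1, R2]]
    reads = Channel.fromFilePairs(params.reads, checkIfExists: true)
    FastQC(reads)
    // rRNA filtering runs on the reads themselves, not on the FastQC reports
    rRNA_unmapped = rRNAMapping(reads)
    RSEMQuantification(rRNA_unmapped)
}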
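
The `--reads` glob `*_R{1,2}.fastq.gz` is meant to match both mates of every sample, so the files have to be grouped into per-sample pairs before paired-end tools can use them. As a rough illustration of that grouping (similar in spirit to what Nextflow's `Channel.fromFilePairs` does, but not its implementation), the sketch below pairs file names by their shared prefix. The function name and the example file names are hypothetical.

```python
import re
from collections import defaultdict


def group_read_pairs(filenames):
    """Group *_R1/_R2 FASTQ names into {sample: (R1, R2)}, dropping unpaired files."""
    pattern = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")
    mates = defaultdict(dict)
    for name in filenames:
        match = pattern.match(name)
        if match:
            mates[match["sample"]][match["mate"]] = name
    # Keep only samples where both mates were found.
    return {
        sample: (found["1"], found["2"])
        for sample, found in sorted(mates.items())
        if len(found) == 2
    }


# Hypothetical directory listing with one orphaned R1 file.
files = [
    "ctrl1_R1.fastq.gz", "ctrl1_R2.fastq.gz",
    "kd1_R2.fastq.gz", "kd1_R1.fastq.gz",
    "orphan_R1.fastq.gz",
]
pairs = group_read_pairs(files)
print(pairs)
```

Treating each file as an independent input instead would hand single files to steps that expect `reads[0]` and `reads[1]`, which is why the pairing matters for the paired-end commands above.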