# Commands for the CLI Environment

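Make sure you are using a Linux environment, e.g. **Ubuntu**, since these commands are written for its command line (CLI). In a Jupyter notebook, the leading `!` passes each line to the shell; when typing the commands directly in a terminal, leave out the `!`.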

## Install Necessary Tools  
Before running the pipeline, install all required dependencies.


In [None]:
# Update system packages
!sudo apt update

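# Install required tools (-y answers apt's confirmation prompt, which a notebook cell cannot answer interactively)
# The subread package provides featureCounts
!sudo apt install -y sra-toolkit trimmomatic hisat2 samtools stringtie subread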
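
Installation problems are easier to catch now than halfway through the pipeline. The bash sketch below (run it in a terminal, or in a notebook cell that starts with `%%bash`) only checks whether each executable is on the `PATH`. The helper name `check_tools` is hypothetical, and the command list assumes the Ubuntu packages above install `prefetch`, `fastq-dump`, `hisat2`, `hisat2-build`, `samtools`, `stringtie` and `featureCounts`. Trimmomatic is called through its jar file later, so it is not checked here.

```shell
# Hypothetical helper: print "missing: <name>" for every command that is not on PATH.
check_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
    fi
  done
}

# Executables used later in this pipeline (assumed names; featureCounts ships with subread).
check_tools prefetch fastq-dump hisat2 hisat2-build samtools stringtie featureCounts
```

Any line starting with `missing:` names a tool that still needs to be installed or added to the `PATH`.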


## Download RNA-Seq Dataset
Fetch the sequencing run from the NCBI Sequence Read Archive (SRA) using its accession number.


In [None]:
# Download RNA-Seq data
!prefetch SRR30802871

# Convert to FASTQ format, writing read 1 and read 2 of each pair to separate files
!fastq-dump --split-files SRR30802871

# For this paired-end run, this creates two FASTQ files: SRR30802871_1.fastq (read 1) and SRR30802871_2.fastq (read 2)
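
A quick sanity check after the download is to count the reads in each file: for paired-end data, both files should contain the same number of reads. This bash sketch assumes the usual four-line FASTQ records that `fastq-dump` writes (header, sequence, `+`, qualities); the helper name `fastq_read_count` is hypothetical.

```shell
# Hypothetical helper: a FASTQ record is exactly 4 lines, so reads = lines / 4.
fastq_read_count() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Compare the two mate files downloaded above (skipped if they are not present).
if [ -f SRR30802871_1.fastq ] && [ -f SRR30802871_2.fastq ]; then
  r1=$(fastq_read_count SRR30802871_1.fastq)
  r2=$(fastq_read_count SRR30802871_2.fastq)
  echo "read 1: $r1 reads, read 2: $r2 reads"
  if [ "$r1" -ne "$r2" ]; then
    echo "warning: mate files have different read counts" >&2
  fi
fi
```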

## Trim Low-Quality Reads  
Remove adapter sequences and low-quality bases using Trimmomatic.


In [None]:
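# Run Trimmomatic in paired-end (PE) mode for adapter and quality trimming.
# trimmomatic-0.39.jar and the adapter file TruSeq3-PE.fa are expected in the working directory; adjust both paths to your installation.
# ILLUMINACLIP:<adapter file>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<min adapter length>:<keep both reads>
# LEADING:3 / TRAILING:3 remove low-quality bases (below quality 3) from the start / end of each read; MINLEN:36 drops reads shorter than 36 bases.
!java -jar trimmomatic-0.39.jar PE -threads 8 \
    SRR30802871_1.fastq SRR30802871_2.fastq \
    output_1-paired.fastq output_1-unpaired.fastq \
    output_2-paired.fastq output_2-unpaired.fastq \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36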
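
Trimmomatic prints how many read pairs survived when it finishes. To confirm that the `MINLEN:36` setting took effect, a length summary of the paired output files is a simple check: the minimum length should be at least 36. The helper `fastq_length_stats` is a hypothetical bash/awk sketch that again assumes four-line FASTQ records; it prints the number of reads, then the minimum, maximum and mean read length.

```shell
# Hypothetical helper: print "<reads> <min> <max> <mean>" of sequence lengths in a 4-line FASTQ.
fastq_length_stats() {
  awk 'NR % 4 == 2 {
         len = length($0)
         if (n == 0 || len < min) min = len
         if (n == 0 || len > max) max = len
         sum += len
         n++
       }
       END {
         if (n == 0) { print "0 0 0 0"; exit }
         printf "%d %d %d %.1f\n", n, min, max, sum / n
       }' "$1"
}

# Summarize the paired outputs written by Trimmomatic (files are skipped if absent).
for f in output_1-paired.fastq output_2-paired.fastq; do
  if [ -f "$f" ]; then
    echo "$f: $(fastq_length_stats "$f")"
  fi
done
```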


## Download and Prepare Reference Genome  
Obtain the human reference genome and index it for alignment.


In [None]:
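# Download the reference genome. The samples are human, so we use the Ensembl human genome (GRCh38, release 113).
# For read alignment, the primary_assembly FASTA in the same directory is often preferred, because toplevel also contains haplotype and patch sequences.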
!wget https://ftp.ensembl.org/pub/release-113/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# Unzip the genome file
!gunzip Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# Build the HISAT2 index with 8 threads; the index files share the prefix "index" (index.1.ht2, index.2.ht2 and so on)
!hisat2-build -p 8 Homo_sapiens.GRCh38.dna.toplevel.fa index
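
Because the toplevel FASTA holds more than the primary chromosomes, it can be worth listing the sequences it contains, with their lengths, before spending hours on `hisat2-build`. The awk-based helper `fasta_seq_lengths` is a hypothetical bash sketch; it takes each sequence name as the header text up to the first whitespace, which is how aligners usually report reference names.

```shell
# Hypothetical helper: print "<name>\t<length>" for each sequence in a FASTA file.
fasta_seq_lengths() {
  awk '/^>/ {
         if (name != "") print name "\t" len
         name = substr($1, 2)   # header up to first whitespace, without ">"
         len = 0
         next
       }
       { len += length($0) }
       END { if (name != "") print name "\t" len }' "$1"
}

# List the first few sequences of the genome downloaded above.
if [ -f Homo_sapiens.GRCh38.dna.toplevel.fa ]; then
  fasta_seq_lengths Homo_sapiens.GRCh38.dna.toplevel.fa | head -n 5
fi
```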


## Align Reads to Reference Genome  
Map the trimmed reads to the reference genome using HISAT2.


In [None]:
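# Align the trimmed read pairs and write a SAM file; --dta reports alignments tailored for transcript assemblers such as StringTie
!hisat2 -p 8 --dta -x index -1 output_1-paired.fastq -2 output_2-paired.fastq -S aligned_output.sam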
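
HISAT2 reports an overall alignment rate when it finishes. A comparable per-read figure can be recomputed from the SAM file using the FLAG column defined in the SAM specification: bit `0x4` marks an unmapped read, while bits `0x100` (secondary) and `0x800` (supplementary) mark extra alignments that should not be counted twice. The helper `sam_mapping_rate` is a hypothetical bash/awk sketch; since POSIX awk has no bitwise operators, each bit is tested with integer division and a modulo.

```shell
# Hypothetical helper: count primary alignments and how many of them are mapped in a SAM file.
# FLAG bits (SAM specification): 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary.
sam_mapping_rate() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 == 1) next   # 0x100 set: secondary alignment
      if (int(flag / 2048) % 2 == 1) next  # 0x800 set: supplementary alignment
      primary++
      if (int(flag / 4) % 2 == 0) mapped++ # 0x4 not set: read is mapped
    }
    END {
      rate = (primary > 0) ? 100 * mapped / primary : 0
      printf "primary=%d mapped=%d rate=%.1f%%\n", primary, mapped + 0, rate
    }' "$1"
}

if [ -f aligned_output.sam ]; then
  sam_mapping_rate aligned_output.sam
fi
```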

## Convert and Sort BAM Files  
Convert the SAM file to BAM format and sort it for downstream analysis.


In [None]:
# Convert SAM to BAM format
!samtools view -bS aligned_output.sam > aligned_output.bam

# Sort BAM file
!samtools sort -o sorted_output.bam aligned_output.bam
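
StringTie expects alignments sorted by genomic position, which is what `samtools sort` produces by default. The quickest check is the header: `samtools view -H sorted_output.bam` should show `SO:coordinate` on the `@HD` line. The hypothetical helper `check_coordinate_sorted` below checks the records themselves; it is a bash/awk sketch that reads SAM text on standard input and ignores unmapped records whose reference name is `*`.

```shell
# Hypothetical helper: read SAM records on stdin and report whether they are coordinate-sorted.
# Sorted means positions never decrease within a reference, and a reference never
# reappears after another one has started. Records with RNAME "*" are ignored.
check_coordinate_sorted() {
  awk -F'\t' '
    /^@/ || $3 == "*" { next }
    {
      if ($3 != ref) {
        if ($3 in seen) { print "not sorted at line " NR; bad = 1; exit }
        seen[$3] = 1
        ref = $3
        pos = 0
      }
      if ($4 + 0 < pos) { print "not sorted at line " NR; bad = 1; exit }
      pos = $4 + 0
    }
    END { if (!bad) print "sorted" }'
}

# With samtools installed, stream the sorted BAM as SAM text into the check.
if command -v samtools >/dev/null 2>&1 && [ -f sorted_output.bam ]; then
  samtools view sorted_output.bam | check_coordinate_sorted
fi
```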


## Download Annotation File  
Obtain the gene annotation file (GTF format) for transcriptome assembly.


In [None]:
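# Download the GTF annotation. The samples are human, so we use the Ensembl GRCh38 annotation.
# Note: this file is from Ensembl release 110, while the genome above is from release 113; using the same release for both keeps annotation and sequence consistent.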
!wget ftp://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz

# Unzip the annotation file
!gunzip Homo_sapiens.GRCh38.110.gtf.gz
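
StringTie and featureCounts match annotation and alignments by chromosome name, so the GTF and the genome FASTA must use the same naming (Ensembl files use `1`, `2` and so on, while UCSC-style files use `chr1`, `chr2`). The hypothetical helper `gtf_names_missing_from_fasta` is a bash sketch that lists sequence names found in column 1 of the GTF but not among the FASTA headers; an empty result means the two files agree.

```shell
# Hypothetical helper: list sequence names used in a GTF (column 1) that are absent from a FASTA.
gtf_names_missing_from_fasta() {
  local gtf=$1 fasta=$2 tmp
  tmp=$(mktemp -d)
  grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp/gtf_names"
  grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | sort -u > "$tmp/fasta_names"
  # comm -23 keeps lines that appear only in the first (GTF) list
  comm -23 "$tmp/gtf_names" "$tmp/fasta_names"
  rm -rf "$tmp"
}

if [ -f Homo_sapiens.GRCh38.110.gtf ] && [ -f Homo_sapiens.GRCh38.dna.toplevel.fa ]; then
  gtf_names_missing_from_fasta Homo_sapiens.GRCh38.110.gtf Homo_sapiens.GRCh38.dna.toplevel.fa
fi
```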


## Assemble Transcripts  
Use StringTie to assemble transcripts from aligned reads.


In [None]:
# Run transcriptome assembly
!stringtie sorted_output.bam -G Homo_sapiens.GRCh38.110.gtf -o assembled_transcripts.gtf -p 8
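
Besides the assembled transcript structures, StringTie writes abundance estimates (`cov`, `FPKM` and `TPM`) as attributes on each `transcript` line of its output GTF. The hypothetical helper `stringtie_transcript_tpm` is a bash/awk sketch that extracts the transcript ID and TPM value for a quick look; it assumes the `key "value";` attribute layout in column 9 that StringTie uses.

```shell
# Hypothetical helper: print "transcript_id<TAB>TPM" for every transcript record in a StringTie GTF.
stringtie_transcript_tpm() {
  awk -F'\t' '$3 == "transcript" {
    tid = ""; tpm = ""
    # strip the key, the space and the surrounding quotes from each matched attribute
    if (match($9, /transcript_id "[^"]*"/)) tid = substr($9, RSTART + 15, RLENGTH - 16)
    if (match($9, /TPM "[^"]*"/)) tpm = substr($9, RSTART + 5, RLENGTH - 6)
    print tid "\t" tpm
  }' "$1"
}

if [ -f assembled_transcripts.gtf ]; then
  stringtie_transcript_tpm assembled_transcripts.gtf | head -n 5
fi
```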


## Merge Transcript Assemblies  
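Combine the transcript assemblies of all samples into a single, non-redundant set of transcripts. This step matters when several samples are processed; with the single sample used here, the merge mainly combines its assembly with the reference transcripts passed via `-G`.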


In [None]:
# Merge assembled transcripts
!stringtie --merge -G Homo_sapiens.GRCh38.110.gtf -o merged_transcripts.gtf assembled_transcripts.gtf
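
With several samples, each sample is assembled separately and all assemblies are then merged in one call; `stringtie --merge` accepts a text file that lists the GTF files to merge, one per line. The bash sketch below assumes a hypothetical naming scheme `<sample>_assembled.gtf` for the per-sample assemblies; the helper `write_mergelist` and the file name `mergelist.txt` are also illustrative.

```shell
# Hypothetical helper: write the paths of all <sample>_assembled.gtf files in a directory to a list file.
write_mergelist() {
  local dir=$1 out=$2
  find "$dir" -maxdepth 1 -name '*_assembled.gtf' | sort > "$out"
}

write_mergelist . mergelist.txt

# Run the merge only when there is something to merge and StringTie is installed.
if [ -s mergelist.txt ] && command -v stringtie >/dev/null 2>&1; then
  stringtie --merge -G Homo_sapiens.GRCh38.110.gtf -o merged_transcripts.gtf mergelist.txt
fi
```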


## Generate Gene Count Matrix  
Use featureCounts (part of the Subread package) to count the reads mapped to each gene.


In [None]:
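# Create a count matrix for gene expression analysis
# -p: input is paired-end (newer Subread releases also need --countReadPairs to count read pairs rather than individual reads)
# -T 8: use 8 threads
!featureCounts -p -T 8 -a merged_transcripts.gtf -o count_matrix.txt sorted_output.bam

# The result is count_matrix.txt, a tab-separated table with one row per gene (Geneid, Chr, Start, End, Strand, Length, and the read count for sorted_output.bam); assignment statistics are written to count_matrix.txt.summary.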
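
For a first look at the results, the count column can be pulled out of `count_matrix.txt` and converted to counts per million (CPM): CPM = count / (sum of all counts) × 1,000,000. The hypothetical helper `counts_with_cpm` is a bash/awk sketch for the single-BAM case used here, where the count sits in column 7 after the six annotation columns. The total is taken over assigned reads only, and this simple scaling is meant for inspection, not as a replacement for the normalization in differential-expression tools.

```shell
# Hypothetical helper: turn featureCounts output (one BAM) into "gene<TAB>count<TAB>CPM",
# keeping only genes with at least one read. CPM = count / total * 1e6.
counts_with_cpm() {
  awk -F'\t' '
    /^#/ || $1 == "Geneid" { next }     # skip the command-line comment and the header
    { gene[++n] = $1; count[n] = $7; total += $7 }
    END {
      for (i = 1; i <= n; i++)
        if (count[i] > 0)
          printf "%s\t%d\t%.2f\n", gene[i], count[i], count[i] / total * 1e6
    }' "$1"
}

if [ -f count_matrix.txt ]; then
  counts_with_cpm count_matrix.txt > gene_counts_cpm.tsv
  head -n 5 gene_counts_cpm.tsv
fi
```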
