# 🧬 Genomic Data Processing Commands: Bowtie2, BWA, SAMtools, and BCFtools

This document provides a categorized and well-documented collection of shell commands used in typical genomic data analysis workflows. Tools covered include **Bowtie2**, **BWA**, **SAMtools**, and **BCFtools**, used for tasks such as indexing reference genomes, aligning sequencing reads, converting file formats, generating pileups, and calling variants.

Each command is annotated for clarity, making it easier for researchers and students to understand and adapt them to their own projects.

> 📁 Suitable for organizing and running pipelines in a bioinformatics project directory.



---

### **📦 Indexing Reference Genomes**

```bash
# Create a Bowtie2 index for the HPV reference genome (output prefix: hpv/hpv)
mkdir -p hpv
bowtie2-build HPV_all.fasta hpv/hpv

# Create a BWA index for the HPV reference genome (files are written next to the FASTA)
bwa index HPV_all.fasta
```
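
Before building either index, a quick look at the reference can catch an empty or truncated download. The sketch below uses a hypothetical helper, `fasta_summary` (plain awk, not part of Bowtie2 or BWA), to count the sequences and bases in a multi-sequence FASTA file such as `HPV_all.fasta`.

```bash
# Hypothetical helper: count sequences (header lines starting with ">")
# and total bases (all other lines, whitespace stripped) in a FASTA file.
fasta_summary() {
  awk '/^>/ { n++; next }
       { gsub(/[ \t\r]/, ""); bases += length($0) }
       END { printf "%d sequences, %d bases\n", n, bases }' "$1"
}

# Usage on a tiny made-up FASTA (not real HPV sequence):
demo=$(mktemp)
printf '>seq1\nACGTACGT\nACGT\n>seq2\nGGCC\n' > "$demo"
fasta_summary "$demo"    # prints: 2 sequences, 16 bases
rm -f "$demo"
```

On the real file this would be `fasta_summary HPV_all.fasta`.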
<br />

### **📁 Exploring Files and Indexes**

```bash
ls hpv       # List the Bowtie2 index files generated
ls -lt       # List files sorted by modification time
```
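
Rather than eyeballing the `ls` output, the expected index files can be checked explicitly. `bowtie2-build` writes six files per prefix (`.1.bt2` to `.4.bt2`, `.rev.1.bt2`, `.rev.2.bt2`), and `bwa index` writes `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` next to the FASTA. The helpers below are hypothetical names; they assume a small Bowtie2 index (very large genomes produce `.bt2l` files instead, which this sketch does not handle).

```bash
# Hypothetical helper: verify that all small-index Bowtie2 files exist and are
# non-empty for a given prefix. Returns non-zero and lists any missing file.
check_bowtie2_index() {
  local prefix=$1 missing=0 ext
  for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    [ -s "${prefix}.${ext}" ] || { echo "missing: ${prefix}.${ext}" >&2; missing=1; }
  done
  return "$missing"
}

# Hypothetical helper: same check for the five files written by `bwa index`.
check_bwa_index() {
  local fasta=$1 missing=0 ext
  for ext in amb ann bwt pac sa; do
    [ -s "${fasta}.${ext}" ] || { echo "missing: ${fasta}.${ext}" >&2; missing=1; }
  done
  return "$missing"
}

# Usage with the names from the indexing commands:
check_bowtie2_index hpv/hpv && echo "Bowtie2 index OK"
check_bwa_index HPV_all.fasta && echo "BWA index OK"
```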
<br />

### **🔍 Estimating Number of Reads in a FASTQ File**

```bash
echo $(( $(wc -l < exome.fastq) / 4 ))  # Each FASTQ record spans 4 lines, so reads = lines / 4
```
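
The lines-divided-by-4 rule assumes the standard 4-line FASTQ layout (header, sequence, `+`, qualities) with no wrapped sequence lines. A slightly more defensive sketch, using a hypothetical `count_fastq_reads` helper, also accepts gzip-compressed input and warns when the line count is not a multiple of 4, which usually indicates a truncated file.

```bash
# Hypothetical helper: count reads in a FASTQ file (plain or .gz),
# assuming 4 lines per record. Warns on stderr if the line count is not
# a multiple of 4.
count_fastq_reads() {
  local reader=cat
  case "$1" in
    *.gz) reader="gzip -dc" ;;
  esac
  $reader "$1" | awk 'END {
    if (NR % 4 != 0) print "warning: line count is not a multiple of 4" > "/dev/stderr"
    print int(NR / 4)
  }'
}

# Usage on a tiny made-up FASTQ:
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n' > "$demo"
count_fastq_reads "$demo"   # prints: 2
rm -f "$demo"
```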
<br />

### **🎯 Aligning Reads using Bowtie2**

```bash
# Running bowtie2 with no arguments prints its usage; save stdout & stderr to a file
bowtie2 > bowtie2.log 2>&1

# End-to-end (global) alignment of exome reads to the human genome index, 4 threads
bowtie2 -p 4 -x /data1/igm3/genomes exome.fastq -S exome.bt2.sam

# Local alignment: read ends may be soft-clipped, so reads that only partially match can still align
bowtie2 -p 4 --local -x /data1/igm3/genomes exome.fastq -S exome.bt2.local.sam
```
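
Bowtie2 writes its alignment summary (reads processed, aligned 0 / 1 / >1 times) to stderr, ending with a line such as `95.00% overall alignment rate`. If an alignment command is run with `2> bowtie2.log` appended, the rate can be pulled out for comparing runs (for example end-to-end vs `--local`). The helper name and the sample log numbers below are made up for illustration.

```bash
# Hypothetical helper: print the overall alignment rate (without "%")
# from a Bowtie2 summary captured from stderr.
bowtie2_overall_rate() {
  awk '/overall alignment rate/ { sub(/%.*/, "", $1); print $1 }' "$1"
}

# Usage on a made-up summary in Bowtie2's format:
demo=$(mktemp)
printf '1000 reads; of these:\n  1000 (100.00%%) were unpaired; of these:\n    50 (5.00%%) aligned 0 times\n    900 (90.00%%) aligned exactly 1 time\n    50 (5.00%%) aligned >1 times\n95.00%% overall alignment rate\n' > "$demo"
bowtie2_overall_rate "$demo"   # prints: 95.00
rm -f "$demo"
```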
<br />

### **🧬 Aligning Reads using BWA**

```bash
# Map exome reads to the human genome with BWA-MEM, 4 threads (genomes.fa must already be indexed with `bwa index`)
bwa mem -t 4 /data1/igm3/genomes.fa exome.fastq > exome.bwa.sam

# View SAM output file contents
more exome.bwa.sam
```
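
Scrolling through the SAM file with `more` shows individual records; to get a quick count of mapped vs unmapped reads, the FLAG column (column 2) can be decoded directly. Per the SAM specification, bit 4 marks an unmapped read, 256 a secondary alignment and 2048 a supplementary alignment. The sketch below (hypothetical helper `sam_mapping_counts`) counts primary records only, testing bits with integer arithmetic so it works in any POSIX awk. For BAM input, `samtools flagstat` reports this and more.

```bash
# Hypothetical helper: count primary mapped / unmapped records in a SAM file.
# FLAG bits used: 4 = unmapped, 256 = secondary, 2048 = supplementary.
sam_mapping_counts() {
  awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # not a primary record
      if (int(flag / 4) % 2) unmapped++; else mapped++
    }
    END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }' "$1"
}

# Usage on a tiny made-up SAM file:
demo=$(mktemp)
printf '@HD\tVN:1.6\n' > "$demo"
printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo"
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$demo"
printf 'r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo"
sam_mapping_counts "$demo"   # prints: mapped=1 unmapped=1
rm -f "$demo"
```

On the real output this would be `sam_mapping_counts exome.bwa.sam`.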
<br />

### **🔄 Converting SAM to BAM with SAMtools**

```bash
# Convert Bowtie2 SAM to BAM (-b: BAM output; -T: reference FASTA)
samtools view -bT /data1/igm3/genomes.fa exome.bt2.sam > exome.bt2.bam

# Convert BWA SAM to BAM
samtools view -bT /data1/igm3/genomes.fa exome.bwa.sam > exome.bwa.bam

# Print BAM records as SAM text (pipe to `less` or `head` for large files)
samtools view exome.bwa.bam
```
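
A common slip is dropping `-b`, which writes plain SAM text into a file named `.bam`. BAM files are BGZF-compressed, and BGZF is gzip-compatible, so a rough check is whether the decompressed stream starts with the magic bytes `BAM`. The helper `looks_like_bam` is a hypothetical heuristic, not a full validation; `samtools quickcheck` is the proper tool for that.

```bash
# Hypothetical heuristic: succeed if the file decompresses (as gzip/BGZF)
# to a stream that begins with the BAM magic "BAM".
looks_like_bam() {
  [ "$(gzip -dc "$1" 2>/dev/null | head -c 3)" = "BAM" ]
}

# Usage with a file from the conversion step:
if looks_like_bam exome.bwa.bam; then
  echo "exome.bwa.bam looks like binary BAM"
else
  echo "exome.bwa.bam is missing or not BAM (was -b omitted?)"
fi
```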
<br />

### **📊 Indexing and Inspecting BAM Files**

```bash
samtools index sample.bam       # Create BAM index (.bai); the BAM must be coordinate-sorted
samtools flagstat sample.bam    # Display mapping statistics
```
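
`samtools index` only works on coordinate-sorted BAMs, and aligner output is normally in read order, so a `samtools sort -o sample.sorted.bam sample.bam` step usually comes first. To see what "coordinate-sorted" means, the hypothetical helper below reads SAM text (for example from `samtools view sample.bam`) and checks that records for each reference are contiguous and that positions never decrease within a reference. It ignores unmapped records with RNAME `*` and does not check that references follow the `@SQ` header order.

```bash
# Hypothetical helper: read SAM text on stdin; exit 0 if it looks
# coordinate-sorted, 1 otherwise.
is_coordinate_sorted() {
  awk -F'\t' '
    /^@/ || $3 == "*" { next }                 # skip headers and unplaced reads
    $3 != prev_ref {
      if ($3 in seen) { bad = 1; exit }        # reference reappeared: not contiguous
      seen[$3] = 1; prev_ref = $3; prev_pos = 0
    }
    $4 + 0 < prev_pos { bad = 1; exit }        # position went backwards
    { prev_pos = $4 + 0 }
    END { exit bad }'
}

# Usage (requires samtools on PATH):
# samtools view sample.bam | is_coordinate_sorted && echo "sorted" || echo "run samtools sort first"
```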
<br />

### **🧪 Generating Pileups with SAMtools**

```bash
# Text pileup, one line per covered position (requires a sorted & indexed BAM)
samtools mpileup -f /data1/igm3/genomes.fasta sample.bam > sample.mpileup

# Genotype likelihoods as uncompressed VCF (-v -u); only available in older SAMtools releases
samtools mpileup -v -u -f /data1/igm3/genomes.fasta sample.bam > sample.vcf

# Genotype likelihoods as BCF (-g); older SAMtools releases only,
# newer ones provide this step as `bcftools mpileup`
samtools mpileup -g -f /data1/igm3/genomes.fasta sample.bam > sample.bcf
```
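
In the text pileup, column 4 is the number of reads covering each position (columns 1 to 3 are chromosome, position and reference base; 5 and 6 are the read bases and base qualities). The hypothetical helper below summarizes depth from such a file. By default `samtools mpileup` omits positions with zero coverage, so the mean here is over covered positions only.

```bash
# Hypothetical helper: number of positions, mean depth and max depth
# from a text pileup (column 4 = depth).
pileup_depth_summary() {
  awk -F'\t' '
    { n++; sum += $4; if ($4 + 0 > max) max = $4 + 0 }
    END {
      if (n) { printf "positions=%d mean_depth=%.2f max_depth=%d\n", n, sum / n, max }
      else   { print "positions=0" }
    }' "$1"
}

# Usage on a tiny made-up pileup:
demo=$(mktemp)
printf 'chr1\t100\tA\t2\t.,\tII\nchr1\t101\tC\t4\t..,,\tIIII\nchr1\t102\tG\t3\t.,T\tIII\n' > "$demo"
pileup_depth_summary "$demo"   # prints: positions=3 mean_depth=3.00 max_depth=4
rm -f "$demo"
```

On real output this would be `pileup_depth_summary sample.mpileup`.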
<br />

### **🧪 Variant Calling with BCFtools**

```bash
# View BCF file contents
bcftools view sample.bcf

# Call variants: -m multiallelic caller, -v variant sites only, -O z bgzip-compressed VCF
bcftools call -v -m -O z -o sample.vcf.gz sample.bcf
```
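
Raw calls usually include low-confidence sites. BCFtools can filter natively (for example `bcftools view -i 'QUAL>=20' sample.vcf.gz`); the awk sketch below does the same on plain VCF text to show what is happening: header lines are kept, and records are kept only if QUAL (column 6) meets a threshold. The helper name and the default threshold of 20 are illustrative assumptions, not recommended values; records with a missing QUAL (`.`) are dropped.

```bash
# Hypothetical helper: keep VCF headers plus records with QUAL >= threshold.
# Arguments: VCF file ("-" for stdin), minimum QUAL (default 20).
filter_vcf_by_qual() {
  local min_qual=${2:-20}
  awk -F'\t' -v min="$min_qual" '/^#/ || ($6 != "." && $6 + 0 >= min + 0)' "$1"
}

# Usage on a tiny made-up VCF:
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=20\nchr1\t200\t.\tC\tT\t10\t.\tDP=5\nchr1\t300\t.\tG\tA\t.\t.\tDP=3\n' >> "$demo"
filter_vcf_by_qual "$demo" 20 | grep -v "^#"   # only the chr1:100 record remains
rm -f "$demo"
```

For the compressed output above: `zcat sample.vcf.gz | filter_vcf_by_qual - 20`.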
<br />

### **🔍 Exploring VCF Output**

```bash
zcat sample.vcf.gz                        # View compressed VCF file
zcat sample.vcf.gz | grep -v "^#"        # View only called variants (skip headers)
zcat sample.vcf.gz | grep -v "^#" | wc -l  # Count number of variant entries

grep -v "^#" out.full.mpileup.vcf | cut -f1 | grep -c "^Chr3$"                 # Count entries located on Chr3
grep -v "^#" out.full.mpileup.vcf | cut -f4 | grep -c "^A$"                    # Count entries whose reference (REF) base is 'A'
grep -v "^#" out.full.mpileup.vcf | grep -cP "[\t;]DP=20(;|\t|$)"              # Count entries with a read depth (DP) of exactly 20
grep -v "^#" out.full.mpileup.vcf | grep -c INDEL                              # Count entries that represent indels
grep -v "^#" out.full.mpileup.vcf | cut -f1,2 | grep -cP "^Chr1\t175672$"      # Count entries reported at position 175672 on Chr1
grep -v "^#" out.final.vcf | cut -f1-5 | grep -P "^Chr3\t11937923\t"           # Show REF/ALT (variant type) called at position 11937923 on Chr3
```
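
Each grep pipeline in this section answers one question and rereads the file. As an alternative, a single awk pass can tally records per chromosome and count those flagged `INDEL` in the INFO column (column 8). The helper name is hypothetical, and the order of the per-chromosome lines is not guaranteed (pipe through `sort` if needed).

```bash
# Hypothetical helper: one-pass VCF summary ("-" reads stdin).
# Prints "<chrom>\t<count>" lines, then total and indel counts.
vcf_summary() {
  awk -F'\t' '
    /^#/ { next }
    { per_chrom[$1]++; total++ }
    $8 ~ /(^|;)INDEL(;|$)/ { indels++ }
    END {
      for (c in per_chrom) printf "%s\t%d\n", c, per_chrom[c]
      printf "total\t%d\nindels\t%d\n", total, indels
    }' "$1"
}

# Usage on a tiny made-up VCF:
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf 'Chr1\t100\t.\tA\tG\t50\t.\tDP=20\nChr3\t200\t.\tC\tCT\t40\t.\tINDEL;IDV=3;DP=12\nChr3\t300\t.\tG\tA\t30\t.\tDP=8\n' >> "$demo"
vcf_summary "$demo"
rm -f "$demo"
```

For the files above: `vcf_summary out.full.mpileup.vcf` or `zcat sample.vcf.gz | vcf_summary -`.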
<br />

### **🧪 Full Variant Calling Pipeline**

```bash
ls sample.bam                                   # Check BAM file presence
samtools sort -o sample.sorted.bam sample.bam   # Coordinate-sort BAM
samtools index sample.sorted.bam                # Index sorted BAM (.bai)
samtools mpileup -g -f /data1/igm3/genomes.fasta sample.sorted.bam > sample.bcf
bcftools call -v -m -O z -o sample.vcf.gz sample.bcf
zcat sample.vcf.gz                              # View VCF output (1 variant per line)
```
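
The pipeline steps can be wrapped in a reusable script that stops on the first error and checks that the tools are installed. The sketch below is one possible arrangement, not the original workflow: the function names are hypothetical, and it deliberately swaps `samtools mpileup -g` for `bcftools mpileup` piped into `bcftools call`, since the `-g` option is not available in newer SAMtools releases. Check the options against your installed versions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fail early if any required tool is not on PATH.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical wrapper: sort, index, pileup and call variants for one BAM.
# Arguments: input BAM, reference FASTA, output prefix (default "sample").
call_variants() {
  local bam=$1 ref=$2 prefix=${3:-sample}
  require_tools samtools bcftools
  samtools sort -o "${prefix}.sorted.bam" "$bam"
  samtools index "${prefix}.sorted.bam"
  # bcftools mpileup replaces `samtools mpileup -g`; -Ou streams uncompressed BCF
  bcftools mpileup -f "$ref" -Ou "${prefix}.sorted.bam" \
    | bcftools call -v -m -O z -o "${prefix}.vcf.gz"
}

# Example (placeholder reference path; adjust to your setup):
# call_variants sample.bam /path/to/reference.fasta sample
```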
<br />

### **👀 Visualizing with `tview`**

```bash
# Terminal alignment viewer at chromosome 17, position 7,579,600. Without -d it opens
# an interactive (curses) view; -d T prints a plain-text snapshot to stdout instead
samtools tview -p 17:7579600 -d T sample.bam /data1/igm3/genomes.fasta
```
<br />

