
%%bash
#--------------------
# Introduction to the Genome Mapping Project
# Task: Map Trimmed Reads to a Reference Genome
# By: Bassa Joshua Samuel
# Date: 10/09/2025
# In brief: Map trimmed paired-end reads to a reference genome by repairing them, aligning with BWA, and converting the results to BAM format with Samtools.

# Tools needed: bwa, samtools, bbmap (repair.sh), fastp.
#----------------------------------



Introduction to Genome Mapping

As we navigate the vast expanse of genomic data, the next crucial step in our journey is genome mapping: aligning sequencing reads to a reference genome. Mapping is the compass that guides us through the genomic landscape, assigning each read the reference coordinates where it most likely originated.

The Problem:
Imagine your sequencing reads as puzzle pieces scattered across a table, each a fragment of the biological story encoded in the DNA. The challenge is to place each piece accurately, so that every read aligns to the reference genome at its true position. Misplaced reads introduce alignment errors that propagate downstream and skew the interpretation of the genome.

The Need for Genome Mapping:
Genome mapping is the critical bridge that connects sequencing reads to the known blueprint of a reference genome. It is analogous to reading a map: an accurate alignment identifies the correct genomic location of each read, allowing us to navigate the genomic landscape with confidence. A well-executed genome mapping step is foundational for downstream analyses such as variant calling, structural variant identification, and functional annotation.
So, let's embark on this journey of genomic navigation and discover how genome mapping serves as the key to unlocking the wealth of information embedded in our DNA. Welcome to the Genome Mapping section of the Genomics Data Analysis Pipeline Course!


TYPES OF GENOME MAPPING

- Reference-based mapping: reads are aligned to an existing reference genome.
- De novo assembly: reads are assembled into contigs without a reference genome.

STEPS FOR REFERENCE-BASED GENOME MAPPING

- Correct for disordered reads with BBTools repair.sh
- Build the reference genome index with bwa index
- Perform alignment with bwa mem (per the BWA manual, designed for reads from about 70 bp up to 1 Mbp; for Illumina reads up to about 100 bp, the older BWA-backtrack algorithm, bwa aln, is the alternative)
- Convert the SAM alignment output to compressed BAM with samtools view -b

 Voila, you are done!
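
The repair step comes first because bwa mem, when given two FASTQ files, treats the i-th read of the first file and the i-th read of the second file as mates. The sketch below is one possible way to test that assumption before mapping. The helper names read_ids and check_pair_order are illustrative and are not part of BBTools; the sketch assumes four-line FASTQ records whose IDs match exactly or differ only by a trailing /1 or /2.

```shell
# Hypothetical helpers: check whether two FASTQ files list their reads in the same order.
# Assumes 4-line FASTQ records; accepts plain or gzip-compressed files.

# Print one read ID per record, without the leading "@", without any comment
# after the first whitespace, and without a trailing /1 or /2 mate suffix.
read_ids() {
    case $1 in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk 'NR % 4 == 1 { id = $1; sub(/^@/, "", id); sub(/\/[12]$/, "", id); print id }'
}

# Return 0 if both files carry the same IDs in the same order,
# 1 if they differ, and 2 if either file cannot be read.
check_pair_order() {
    local r1=$1 r2=$2 ids1 ids2 status=0
    if [ ! -r "$r1" ] || [ ! -r "$r2" ]; then
        echo "cannot read: $r1 or $r2" >&2
        return 2
    fi
    ids1=$(mktemp)
    ids2=$(mktemp)
    read_ids "$r1" > "$ids1"
    read_ids "$r2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then
        echo "in sync: $r1 $r2"
    else
        echo "out of sync: $r1 $r2 (run repair.sh before mapping)"
        status=1
    fi
    rm -f "$ids1" "$ids2"
    return "$status"
}

# Example call (commented out; the paths follow the layout used in this notebook):
# check_pair_order trimmed_reads/ACBarrie_R1.fastq.gz trimmed_reads/ACBarrie_R2.fastq.gz
```

A mismatch in read count also shows up as "out of sync", since the two ID lists then differ in length.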

In [None]:
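#!/bin/bash
# Code for automating BWA implementation

# Stop on the first error, on unset variables, and on failures inside a pipe
set -euo pipefail

SAMPLES=(
  "ACBarrie"
  "Alsen"
  "Baxter"
  "Chara"
  "Drysdale"
)

# Build the BWA index once; the index files are written next to the FASTA
bwa index references/reference.fasta

# -p: do not fail if the directories already exist
mkdir -p repaired alignment_map

for SAMPLE in "${SAMPLES[@]}"; do
    echo "Mapping ${SAMPLE} in ${PWD}"

    # Re-pair mates that fell out of order; unpaired reads go to the singleton file
    repair.sh \
        in1="trimmed_reads/${SAMPLE}_R1.fastq.gz" \
        in2="trimmed_reads/${SAMPLE}_R2.fastq.gz" \
        out1="repaired/${SAMPLE}_R1_rep.fastq.gz" \
        out2="repaired/${SAMPLE}_R2_rep.fastq.gz" \
        outsingle="repaired/${SAMPLE}_single.fq"

    # Align the repaired pairs and stream the SAM output straight into BAM
    bwa mem -t 1 \
        references/reference.fasta \
        "repaired/${SAMPLE}_R1_rep.fastq.gz" "repaired/${SAMPLE}_R2_rep.fastq.gz" \
      | samtools view -b -o "alignment_map/${SAMPLE}.bam" -
done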

Task: Map Trimmed Reads to a Reference Genome

Create directories: repaired and alignment_map.

Prepare the reference genome for mapping:
bwa index references/reference.fasta

For a set of paired-end samples (ACBarrie, Alsen, Baxter, Chara, Drysdale), run repair.sh on each sample to repair reads and handle singletons.

Map repaired reads to the reference genome using bwa mem.

Convert the resulting SAM output to BAM format using samtools view -b.

Save each BAM file in alignment_map/ named after the corresponding sample.
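
Before launching the loop, it can help to confirm that every sample named in the task has both read files. The following is a minimal sketch rather than part of the task itself: missing_inputs is a hypothetical helper, and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming inside a single read directory is assumed from the scripts in this notebook.

```shell
# Hypothetical helper: list expected paired-end inputs that are absent or empty.
# Usage: missing_inputs <read_dir> <sample> [<sample> ...]
# Prints one line per problem file; returns 1 if any were found, 0 otherwise.
missing_inputs() {
    local read_dir=$1 sample mate path status=0
    shift
    for sample in "$@"; do
        for mate in R1 R2; do
            path="${read_dir}/${sample}_${mate}.fastq.gz"
            # -s is true only for files that exist and are larger than zero bytes
            if [ ! -s "$path" ]; then
                echo "missing or empty: $path"
                status=1
            fi
        done
    done
    return "$status"
}

# Example call (commented out so the snippet has no side effects):
# missing_inputs trimmed_reads ACBarrie Alsen Baxter Chara Drysdale || exit 1
```

Failing early this way avoids indexing the reference and mapping the first few samples only to stop on a file name typo later in the list.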

In [None]:
#!/bin/bash

# ================================
# Install Required Tools (Linux)
# ================================
# Ensure conda or apt is available. Uncomment the appropriate section.

# Using conda (recommended; Bioconda packages also rely on the conda-forge channel):
# conda install -c conda-forge -c bioconda bwa samtools bbmap fastp

# OR using apt (if available on system):
# sudo apt-get update
# sudo apt-get install -y bwa samtools bbmap fastp

# ================================
# Mapping Workflow
# ================================

# Create output directories
mkdir -p repaired alignment_map

# Index the reference genome
bwa index references/reference.fasta

# Define sample names
samples=(ACBarrie Alsen Baxter Chara Drysdale)

# Process each sample
for SAMPLE in "${samples[@]}"; do
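    # Repair paired-end reads. Inputs are read from qc_reads/ here, whereas the
    # earlier cell reads from trimmed_reads/; point in1/in2 at whichever
    # directory holds your trimmed reads.
    repair.sh \
        in1="qc_reads/${SAMPLE}_R1.fastq.gz" \
        in2="qc_reads/${SAMPLE}_R2.fastq.gz" \
        out1="repaired/${SAMPLE}_R1.fastq.gz" \
        out2="repaired/${SAMPLE}_R2.fastq.gz" \
        outs="repaired/${SAMPLE}_singletons.fastq.gz"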

    # Map repaired reads and convert SAM to BAM ("-" makes samtools read stdin explicitly)
    bwa mem references/reference.fasta \
        "repaired/${SAMPLE}_R1.fastq.gz" \
        "repaired/${SAMPLE}_R2.fastq.gz" | \
    samtools view -b -o "alignment_map/${SAMPLE}.bam" -
done
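
Once the loop finishes, a quick integrity check on the BAM files can catch truncated output before any downstream step. The sketch below is one way to do that, not part of the original task: check_bams is a hypothetical helper, while samtools quickcheck and samtools flagstat are standard samtools subcommands. It assumes the BAM files are named <sample>.bam inside one directory, as in the script above.

```shell
# Hypothetical post-run check: confirm each BAM is intact and summarise its mapping.
# Usage: check_bams <bam_dir> <sample> [<sample> ...]
# Returns 0 if every file passes, 1 if any file fails, 127 if samtools is absent.
check_bams() {
    local bam_dir=$1 sample bam status=0
    shift
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found on PATH" >&2
        return 127
    fi
    for sample in "$@"; do
        bam="${bam_dir}/${sample}.bam"
        # quickcheck exits non-zero when the file cannot be opened, lacks a
        # valid header, or is missing the BAM end-of-file block
        if samtools quickcheck "$bam"; then
            echo "== ${sample} =="
            # flagstat prints read counts by flag category, including mapped reads
            samtools flagstat "$bam"
        else
            echo "failed quickcheck: $bam" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call (commented out so the snippet has no side effects):
# check_bams alignment_map ACBarrie Alsen Baxter Chara Drysdale
```

A very low mapped percentage in the flagstat summary is often a hint that the wrong reference was indexed or that the reads were not repaired correctly.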
