<a href="https://colab.research.google.com/github/SenseiBassa/Bioinformatics-Projects-HackBio-/blob/main/Trimming_Next_Generation_Sequencing_Reads_Project_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

%%bash
#--------------------
# Trimming Next Generation Sequencing Reads Project
# By: Bassa Joshua Samuel
# Date: 09/09/2025
# Quality control in the Bash terminal using Bash scripting and bioinformatics software such as fastp
#----------------------------------

#Trimming NGS Reads
##Introduction to Data Trimming:
Genomic data, acquired through high-throughput sequencing technologies, often comes with its own set of imperfections and biases. Raw sequencing reads may harbor artifacts such as adapter contamination, low-quality bases, or sequencing errors. These imperfections can adversely impact downstream analyses, leading to misinterpretations and compromised results. Herein lies the significance of data trimming: the careful removal of unwanted elements to unveil the true biological signals embedded in the genomic data.
##The Problem:
Imagine a scenario where your genomic dataset is akin to a treasure trove buried beneath layers of noise. Sequencing errors, adapter remnants, and low-quality bases act as obstacles, obscuring the genuine biological information encoded in the DNA. If left unaddressed, these artifacts can misguide downstream analyses, leading to inaccurate variant calling, flawed annotations, and ultimately, skewed biological interpretations.
##The Need for Trimming:
Data trimming serves as the initial sieve in the genomics analysis process. By systematically identifying and excising these unwanted elements, we improve the accuracy of subsequent analytical steps and reduce the volume of data that later steps must process. Trimming matters most for tasks such as variant calling, where the precision of base calls directly influences the reliability of the identified genetic variants. It also lowers the risk of false positives, so that downstream analyses rest on biologically meaningful patterns rather than noise.
So, let's embark on this journey of enhancing data quality and uncovering the true genomic stories that lie beneath the surface. Welcome to the Data Trimming section of the Genomics Data Analysis Pipeline Course!
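
To make "low-quality bases" concrete: a FASTQ file stores one quality character per base, and under the common Phred+33 encoding the score is the character's ASCII code minus 33. The sketch below decodes a quality string and counts the bases under a cutoff. The helper names `phred_scores` and `count_low_quality` are hypothetical, the encoding is assumed to be Phred+33, and the cutoff of 15 mirrors fastp's documented default for a qualified base. It illustrates the idea only and is not how fastp is implemented internally.

```bash
#!/usr/bin/env bash

# Convert a FASTQ quality string (assumed Phred+33) into numeric scores.
phred_scores() {
  local qual="$1" i ch code out=()
  for (( i = 0; i < ${#qual}; i++ )); do
    ch="${qual:i:1}"
    # A leading single quote makes printf emit the character's ASCII code
    printf -v code '%d' "'$ch"
    out+=( $(( code - 33 )) )
  done
  echo "${out[*]}"
}

# Count bases whose score is below a threshold (hypothetical helper).
count_low_quality() {
  local qual="$1" threshold="${2:-15}" n=0 q
  for q in $(phred_scores "$qual"); do
    if (( q < threshold )); then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

phred_scores 'II5+#'          # prints: 40 40 20 10 2
count_low_quality 'II5+#' 15  # prints: 2
```

In this toy read the last two bases fall below Q15, which is the kind of tail a quality trimmer is meant to deal with.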


IMPLEMENTING FastP

#!/bin/bash
# Create the output directory; -p avoids an error if it already exists
mkdir -p qc_reads
SAMPLES=(
  "ACBarrie"
  "Alsen"
  "Baxter"
  "Chara"
  "Drysdale"
)

for SAMPLE in "${SAMPLES[@]}"; do

  fastp \
    -i "$PWD/${SAMPLE}_R1.fastq.gz" \
    -I "$PWD/${SAMPLE}_R2.fastq.gz" \
    -o "qc_reads/${SAMPLE}_R1.fastq.gz" \
    -O "qc_reads/${SAMPLE}_R2.fastq.gz" \
    --html "qc_reads/${SAMPLE}_fastp.html"
done
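
If an input file is absent, fastp fails for that sample only once the loop reaches it. A small pre-flight check, sketched below, can list samples with a missing or empty mate before any trimming starts. `check_inputs` is a hypothetical helper, and the `_R1`/`_R2` file naming is taken from the script above.

```bash
#!/usr/bin/env bash

# Print every sample whose R1 or R2 file is missing or empty in a directory.
# Returns 1 if at least one sample is incomplete, otherwise 0.
# check_inputs is a hypothetical helper, not part of fastp.
check_inputs() {
  local dir="$1"; shift
  local sample missing=0
  for sample in "$@"; do
    if [[ ! -s "$dir/${sample}_R1.fastq.gz" || ! -s "$dir/${sample}_R2.fastq.gz" ]]; then
      echo "$sample"
      missing=1
    fi
  done
  return "$missing"
}

SAMPLES=(ACBarrie Alsen Baxter Chara Drysdale)
if ! check_inputs "$PWD" "${SAMPLES[@]}"; then
  echo "Samples listed above are missing R1 or R2 input" >&2
fi
```

Running the check first keeps a half-processed `qc_reads/` directory from being mistaken for a complete run.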

In [None]:
#Task: Quality Trimming of Multiple Samples

#Create a directory called qc_reads.
#Define an array of sample names: ACBarrie, Alsen, Baxter, Chara, Drysdale.
#Write a Bash script to iterate over the samples and run fastp for paired-end reads:
#Input: ${SAMPLE}_R1.fastq.gz and ${SAMPLE}_R2.fastq.gz
#Output: qc_reads/${SAMPLE}_R1.fastq.gz and qc_reads/${SAMPLE}_R2.fastq.gz
#Generate an HTML report for each sample in qc_reads/

In [None]:
#!/bin/bash

# Create output directory
mkdir -p qc_reads

# Define sample names
samples=(ACBarrie Alsen Baxter Chara Drysdale)

# Run fastp for each sample
for SAMPLE in "${samples[@]}"; do
    fastp \
        -i "${SAMPLE}_R1.fastq.gz" \
        -I "${SAMPLE}_R2.fastq.gz" \
        -o "qc_reads/${SAMPLE}_R1.fastq.gz" \
        -O "qc_reads/${SAMPLE}_R2.fastq.gz" \
        -h "qc_reads/${SAMPLE}_fastp.html" \
        -j "qc_reads/${SAMPLE}_fastp.json"
done
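
Once the loop has finished, one quick sanity check is to compare read counts before and after trimming. The sketch below assumes standard four-line FASTQ records and the file layout used above; `count_reads` and `retained_pct` are hypothetical helper names. The JSON report written with `-j` also carries before and after summaries, so this serves only as an independent cross-check.

```bash
#!/usr/bin/env bash

# Count reads in a gzipped FASTQ file.
# Assumes each record spans exactly four lines (no wrapped sequences).
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Percentage of reads kept: retained_pct RAW.fastq.gz TRIMMED.fastq.gz
retained_pct() {
  local before after
  before=$(count_reads "$1")
  after=$(count_reads "$2")
  awk -v b="$before" -v a="$after" \
    'BEGIN { if (b == 0) print "NA"; else printf "%.1f\n", 100 * a / b }'
}

# Report R1 retention for each sample whose files are present
for SAMPLE in ACBarrie Alsen Baxter Chara Drysdale; do
  raw="${SAMPLE}_R1.fastq.gz"
  trimmed="qc_reads/${SAMPLE}_R1.fastq.gz"
  if [[ -f "$raw" && -f "$trimmed" ]]; then
    echo "$SAMPLE: $(retained_pct "$raw" "$trimmed")% of R1 reads retained"
  fi
done
```

A sample that retains far fewer reads than the others may deserve a closer look at its HTML report.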

In [12]:
%%bash
# Remove existing miniconda directory if it exists
rm -rf "$HOME/miniconda"

# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p "$HOME/miniconda" -u

# Initialize conda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
conda init

PREFIX=/root/miniconda
Unpacking bootstrapper...
Unpacking payload...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Miniconda3: /root/miniconda
no change     /root/miniconda/condabin/conda
no change     /root/miniconda/bin/conda
no change     /root/miniconda/bin/conda-env
no change     /root/miniconda/bin/activate
no change     /root/miniconda/bin/deactivate
no change     /root/miniconda/etc/profile.d/conda.sh
no change     /root/miniconda/etc/fish/conf.d/conda.fish
no change     /root/miniconda/shell/condabin/Conda.psm1
no change     /root/miniconda/shell/condabin/conda-hook.ps1
no change   

--2025-09-09 16:45:01--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.191.158, 104.16.32.241, 2606:4700::6810:bf9e, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.191.158|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 162129736 (155M) [application/octet-stream]
Saving to: ‘miniconda.sh’


In [15]:
%%bash
# Initialize conda in the current shell session
eval "$($HOME/miniconda/bin/conda shell.bash hook)"

# Accept the terms of service
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Install mamba for faster installation
conda install -c conda-forge mamba -y

# Install fastp using mamba
$HOME/miniconda/bin/mamba install -c bioconda fastp -y

accepted Terms of Service for https://repo.anaconda.com/pkgs/main
accepted Terms of Service for https://repo.anaconda.com/pkgs/r
Jupyter detected...
2 channel Terms of Service accepted
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

bioconda/linux-64                                           Using cache
bioconda/noarch                                             Using cache

Pinned packages:

  - python=3.13


Transaction

  Prefix: /root/miniconda

  All requested packages already installed


Transaction starting

Transaction finished
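
Since the log above reports that all requested packages were already installed, it can be worth confirming that the `fastp` executable is actually reachable before running the trimming loop. A minimal sketch, assuming the Miniconda prefix `$HOME/miniconda` used in the install cell; `require_tool` is a hypothetical helper.

```bash
#!/usr/bin/env bash

# Make the Miniconda binaries visible in this shell
# (prefix assumed from the install cell above)
export PATH="$HOME/miniconda/bin:$PATH"

# Return 0 if the named program is on PATH, otherwise report and return 1.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH" >&2
  return 1
}

# Print the installed version only when fastp can be found
if require_tool fastp; then
  fastp --version
fi
```

Each `%%bash` cell starts a fresh shell, so the `PATH` line (or the `conda shell.bash hook` call used above) needs repeating in any cell that calls fastp.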



    
    
