**GATK** was developed to provide tools for variant analysis, so it does not align the reads in a FASTQ file itself. For that step, we run the **BWA** tool on our FASTQ file.

The **Burrows-Wheeler Aligner** (BWA) can be installed on **Ubuntu**. To check whether it is already installed, simply type:

```bash
bwa
```
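
Running `bwa` with no arguments prints its usage text when the tool is installed, and a "command not found" message otherwise. As a minimal sketch, the same check can be scripted with the POSIX `command -v` builtin. The variable name `bwa_status` is only illustrative.

```bash
# Look for the bwa executable on the PATH without actually running it.
if command -v bwa >/dev/null 2>&1; then
    bwa_status="installed"
else
    bwa_status="missing"
fi
echo "bwa is $bwa_status"
```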


If it’s not installed and we want to install it, we can use the following commands:

```bash
sudo apt update
sudo apt install bwa
```


**Tip:** It’s generally a good idea to do everything from the command line in the **Linux shell**. However, if your file is on **Windows**, such as in a folder on the desktop, you can move or copy it to the Linux file system interactively, without necessarily using the command line.

To access Linux folders from Windows, simply open any Windows folder and type the following in the address bar:

```plaintext
\\wsl$
```

You will be directed to the Ubuntu folder, where you can find all the directories you have on Linux. This way, you can visually and interactively locate the folder where you want to place the FASTQ file.

From here, we can return to **Ubuntu**, switch to the command line, and navigate to the directory where our **FASTQ file** is located. Another important note is that to use **BWA**, we need a reference genome in the **.fasta** format.

To download one of the main reference genomes, you can use the following command:

```bash
wget ftp://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
```
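
The downloaded file is gzip-compressed (`.fa.gz`). The sketch below shows one way to decompress a FASTA file and count its sequence headers as a quick sanity check. It uses a tiny toy file in a temporary directory as a stand-in for the real download; the names `toy.fa.gz` and `toy.fa` are made up for illustration.

```bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Build a tiny gzip-compressed FASTA file (toy data, not a real genome).
printf '>chrA\nACGTACGT\n>chrB\nTTGGCCAA\n' | gzip -c > "$workdir/toy.fa.gz"

# Decompress to a new file while keeping the compressed original.
gunzip -c "$workdir/toy.fa.gz" > "$workdir/toy.fa"

# Every FASTA record starts with '>', so counting those lines counts sequences.
n_seqs=$(grep -c '^>' "$workdir/toy.fa")
echo "sequences found: $n_seqs"
```

For the real reference, the same `gunzip -c` call would be applied to the downloaded `.fa.gz` file, writing the result to whatever name you prefer for the reference.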


With the reference file and the **FASTQ** file in the same folder (where we should also be positioned), we can proceed with the first **BWA** command.

Before we run **BWA**, however, we need to perform what’s called **"indexing the reference genome"**. This involves running a command that instructs **BWA** to build several auxiliary index files from the reference file, based on the Burrows-Wheeler transform and a suffix array. The index significantly speeds up the alignment, because the tool can look up where a read matches instead of scanning through the entire reference for every read.
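
To give an intuition for the transform that gives BWA its name, here is a toy sketch that computes the Burrows-Wheeler transform of a short string with standard shell tools. It is an illustration only and not how BWA builds its index internally. The example string `banana$` is arbitrary, with `$` marking the end of the text.

```bash
text='banana$'   # '$' is an end-of-text marker that sorts before the letters

# 1. Print every rotation of the text, one per line.
# 2. Sort the rotations bytewise (LC_ALL=C avoids locale-dependent ordering).
# 3. Join the last character of each sorted rotation.
bwt=$(printf '%s\n' "$text" |
    awk '{ n = length($0); for (i = 1; i <= n; i++) print substr($0, i) substr($0, 1, i - 1) }' |
    LC_ALL=C sort |
    awk '{ printf "%s", substr($0, length($0), 1) }')

echo "$bwt"
```

Identical letters tend to cluster in the output (here `annb$aa`), which is part of what makes the transformed text compact to store and fast to search.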

For each reference genome, you only need to apply the following command once:

```bash
bwa index Reference_File_Name
```


<span style="color:red;">**ATTENTION:**</span> <span style="color:red;">This process can take a while, depending on your computer's performance. It also generates additional large files, between **5–7 GB**, so be prepared for the extra storage requirements.</span>
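
Given these storage requirements, it can help to check the machine before starting. The following is a rough sketch: the 7 GB threshold is taken from the upper end of the estimate above, and reading `/proc/meminfo` assumes a Linux system such as Ubuntu. The RAM figure is printed only for information, since indexing a large genome can also need several GB of memory.

```bash
# Free space (in KB) on the file system holding the current directory.
avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')

# 7 GB expressed in KB, taken from the upper end of the estimate above.
need_kb=$((7 * 1024 * 1024))

if [ "$avail_kb" -ge "$need_kb" ]; then
    echo "disk: enough free space for the index files"
else
    echo "disk: less than 7 GB free, consider freeing space first"
fi

# Total RAM in KB, available on Linux through /proc/meminfo.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "RAM: $((mem_kb / 1024 / 1024)) GB total"
fi
```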


# Report

## Background

While trying to run the indexing on my computer, I ran into a memory issue. Although I had plenty of disk space available, I did not have enough RAM (I have 8 GB), so the process stopped. To resolve this, we decided to transfer the file to a server. That server has no GPU or graphics card, but it does have substantial RAM.

## Solution Approach

To avoid disrupting other server users, I was advised to run everything inside a Docker container. We therefore asked the server administrator to install Docker on the server and grant me the permissions needed to install software.

## Implementation Steps

Once Docker was set up, the process became feasible. The first step was to download the required Docker image. We chose the `pegi3s/bwa` image and pulled it with the following command (executed on the server):

```bash
docker pull pegi3s/bwa
```


## Verification of Installation

After completing the installation, we can verify if it was successful by running the following command:

```bash
docker run --rm pegi3s/bwa bwa
```

## Important Configuration Details

At the time of this procedure, **BWA** reported the following program information:

- **Program**: bwa (alignment via Burrows-Wheeler transformation)
- **Version**: 0.7.17-r1188
- **Contact**: Heng Li (<lh3@sanger.ac.uk>)

## Indexing Command

Now, we can proceed with the indexing step using the following command:

```bash
docker run --rm -v "/home/murilo.aguiar/NGS_Files:/data" pegi3s/bwa bwa index /data/GRCh38.fa
```

## Generated Files from Indexing

The following files were created during the indexing process:

- **GRCh38.fa.amb**
- **GRCh38.fa.ann**
- **GRCh38.fa.bwt**
- **GRCh38.fa.pac**
- **GRCh38.fa.sa**

These files are essential outputs of the indexing step and will be used in subsequent analysis stages.
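
Before moving on, it may be worth confirming that all five files are present and non-empty. The helper below is a small sketch; the function name `check_bwa_index` is hypothetical and not part of BWA.

```bash
# Report which of the five BWA index files are missing (or empty)
# for the reference path given as the first argument.
check_bwa_index() {
    ref="$1"
    missing=""
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and has a size greater than zero.
        [ -s "$ref.$ext" ] || missing="$missing $ext"
    done
    if [ -z "$missing" ]; then
        echo "index complete"
    else
        echo "missing:$missing"
        return 1
    fi
}
```

On the server, it could be called from the host as `check_bwa_index /home/murilo.aguiar/NGS_Files/GRCh38.fa`, since the index files are written to the mounted directory.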