# Conversion of uBAM to FASTQ (Samtools 1.13)

>***Why do we need uBAM to FASTQ conversion?***

Once basecalling and modified-base basecalling had been performed, we needed FASTQ files for adapter trimming. We therefore converted the uBAM output of Dorado basecalling to FASTQ format using Samtools.

```bash
#!/bin/bash

# Create directories to store the merged FASTQ files
mkdir -p /bigvol/omion/01-Basecalling/merged_fastq/{Dorado,Guppy}/{modbasecalling,basecalling}

# Input layout: /bigvol/omion/01-Basecalling/Dorado/<basecalling_type>/<Gd_isolate>/<file>.bam
for i in /bigvol/omion/01-Basecalling/Dorado/*/Gd*/*.bam; do
    # Skip the literal pattern if the glob matches no file
    if [ -f "$i" ]; then
        directory=$(dirname "$i")
        gd_part=$(basename "$directory")

        # Extract the basecalling type (modbasecalling or basecalling): the parent of the Gd* directory
        basecalling_type=$(basename "$(dirname "$directory")")

        # Build the output directory path
        output_dir="/bigvol/omion/01-Basecalling/merged_fastq/Dorado/${basecalling_type}"

        # Convert the uBAM to FASTQ, naming the output after the isolate directory.
        # '>' overwrites, so this expects a single BAM per Gd* directory.
        samtools fastq "$i" > "${output_dir}/${gd_part}.fastq"
    fi
done
```
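As an optional sanity check (not part of the original workflow), the number of reads in each FASTQ can be compared with the number of records in its source uBAM. This is a minimal sketch, assuming `samtools` is on the `PATH` and the FASTQ is uncompressed with one sequence line per record; the function names `count_fastq_reads` and `check_conversion` are hypothetical. The `-F 0x900` filter mirrors the default of `samtools fastq`, which does not write secondary or supplementary records.

```bash
#!/bin/bash
# Hypothetical helpers to check that a uBAM -> FASTQ conversion kept every read.

# Count reads in an uncompressed FASTQ file (4 lines per record).
count_fastq_reads() {
    local fastq="$1"
    awk 'END { print NR / 4 }' "$fastq"
}

# Compare the BAM record count with the FASTQ read count.
# -F 0x900 excludes secondary and supplementary records, like `samtools fastq` does by default.
check_conversion() {
    local bam="$1" fastq="$2"
    local n_bam n_fastq
    n_bam=$(samtools view -c -F 0x900 "$bam")
    n_fastq=$(count_fastq_reads "$fastq")
    if [ "$n_bam" -eq "$n_fastq" ]; then
        echo "OK ${fastq}: ${n_fastq} reads"
    else
        echo "MISMATCH ${fastq}: BAM=${n_bam} FASTQ=${n_fastq}" >&2
        return 1
    fi
}
```

Inside the conversion loop, it could be called right after `samtools fastq` as `check_conversion "$i" "${output_dir}/${gd_part}.fastq"`.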


Note:
FASTQ files from the same isolate basecalled with Guppy were merged using `cat`.

The merged Guppy FASTQ files were then moved to the following folders:

- `/bigvol/omion/01-Basecalling/merged_fastq/Guppy/basecalling` (standard basecalling)
- `/bigvol/omion/01-Basecalling/merged_fastq/Guppy/modbasecalling` (basecalling of modified bases)
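The `cat` merge itself is not shown. The sketch below is one way it could be done, assuming all uncompressed `*.fastq` chunks of one isolate sit directly in a single directory; the real Guppy output layout used here is not documented, and the function name `merge_fastq_dir` is hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch: merge all FASTQ chunks of one isolate into a single file.
# Assumes uncompressed *.fastq chunks located directly inside one input directory.

merge_fastq_dir() {
    local input_dir="$1" output_fastq="$2"
    local chunks=() f
    # Collect the chunks; the [ -f ] test skips the literal pattern when nothing matches.
    # Bash sorts glob results, so chunks are concatenated in a stable order.
    for f in "$input_dir"/*.fastq; do
        if [ -f "$f" ]; then
            chunks+=("$f")
        fi
    done
    if [ "${#chunks[@]}" -eq 0 ]; then
        echo "No FASTQ files found in ${input_dir}" >&2
        return 1
    fi
    mkdir -p "$(dirname "$output_fastq")"
    cat "${chunks[@]}" > "$output_fastq"
}
```

For one isolate, a call would look like `merge_fastq_dir <guppy_output_dir> /bigvol/omion/01-Basecalling/merged_fastq/Guppy/basecalling/<isolate>.fastq`, where the placeholders stand for the actual Guppy output directory and isolate name.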


# Porechop

Porechop is the software we used to trim adapters; it was run on a single merged FASTQ file per isolate. Dorado trims adapters by default unless the `--no-trim` option is specified, and we did not specify it. We nevertheless ran Porechop on the reads basecalled with Dorado because some adapters had not been properly removed, and we also ran it on the reads basecalled with Guppy.

```bash
#!/bin/bash

# Create directories to store the trimmed FASTQ files
mkdir -p /bigvol/omion/03-Porechop/{Dorado,Guppy}/{modbasecalling,basecalling}

# Input layout: /bigvol/omion/01-Basecalling/merged_fastq/<method>/<basecalling_type>/<Gd_isolate>.fastq
for i in /bigvol/omion/01-Basecalling/merged_fastq/*/*/*.fastq; do
    # Skip the literal pattern if the glob matches no file
    if [ -f "$i" ]; then
        directory=$(dirname "$i")
        gd_part=$(basename "$i" .fastq)

        # Extract the basecalling method (Dorado or Guppy) and type (modbasecalling or basecalling)
        basecalling_method=$(basename "$(dirname "$directory")")
        basecalling_type=$(basename "$directory")

        # Build the output directory path
        output_dir="/bigvol/omion/03-Porechop/${basecalling_method}/${basecalling_type}"

        # Trim adapters with Porechop (80 threads) and save the output in the matching directory
        porechop -t 80 -i "$i" -o "${output_dir}/${gd_part}.fastq"
    fi
done
```
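To see how much Porechop actually removed, each merged file can be compared with its trimmed counterpart. This is a minimal sketch (not part of the original workflow), assuming uncompressed FASTQ with one sequence line per record; `fastq_stats` and `trimmed_bases` are hypothetical names. Read counts may also change, since Porechop can split reads at internal adapters, so the base total is the more direct measure.

```bash
#!/bin/bash
# Hypothetical helpers to compare a FASTQ before and after Porechop trimming.

# Print "<reads><TAB><total bases>" for an uncompressed FASTQ file.
fastq_stats() {
    local fastq="$1"
    # Line 2 of each 4-line record is the sequence
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%d\t%d\n", reads, bases }' "$fastq"
}

# Print the number of bases removed by trimming (before minus after).
trimmed_bases() {
    local before_bases after_bases
    before_bases=$(fastq_stats "$1" | cut -f2)
    after_bases=$(fastq_stats "$2" | cut -f2)
    echo $(( before_bases - after_bases ))
}
```

For example, `trimmed_bases "$i" "${output_dir}/${gd_part}.fastq"` could be run after the `porechop` call in the loop.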
