Build a Custom DB

Heewon Seo edited this page Dec 12, 2025 · 1 revision

SLURM scripts

The scripts below show the workflow for a single sample (F0), whose FASTQ file is 7.3 GB. They assume that the folder variables (for example, $FASTQ_FOLDER and $ASSEMBLY_FOLDER) are already set in your environment. You can parallelize the steps or adjust the command-line options as needed for your analysis.
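One possible way to parallelize across samples is a SLURM job array, where each array task picks its own sample name. The sketch below is only an illustration: the file `samples.txt` (one sample name per line), the helper `pick_sample`, and the array range `1-3` are assumptions that are not part of the original workflow. The flye command is printed rather than executed so the selection logic can be checked first.

```shell
#!/bin/bash
#SBATCH --job-name=metaFlye
#SBATCH --array=1-3            # assumption: samples.txt lists three samples
#SBATCH --cpus-per-task=32

# pick_sample FILE INDEX: print the INDEX-th line (1-based) of FILE.
pick_sample() {
    sed -n "${2}p" "$1"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; fall back to 1 when
# the script is run outside SLURM for a quick local check.
SAMPLE_LIST="${SAMPLE_LIST:-samples.txt}"   # hypothetical sample list
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

if [ -f "$SAMPLE_LIST" ]; then
    SAMPLE="$(pick_sample "$SAMPLE_LIST" "$TASK_ID")"
    # Print the command instead of running it; remove 'echo' to execute.
    echo flye --nano-hq "$FASTQ_FOLDER/$SAMPLE.fastq.gz" \
        --out-dir "$ASSEMBLY_FOLDER/$SAMPLE" --meta --threads 32
fi
```

The same pattern should carry over to the later steps by replacing the hard-coded `F0` with `$SAMPLE`.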

Step 01. Assembly of long reads into contigs (contiguous sequences)

#!/bin/bash
#SBATCH --job-name=metaFlye
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --partition=bigmem
#SBATCH --time=24:00:00

flye --nano-hq $FASTQ_FOLDER/F0.fastq.gz --out-dir $ASSEMBLY_FOLDER/F0 --meta --threads 32

Step 02. Polishing (INDEL correction) of the resulting assembly

#!/bin/bash
#SBATCH --job-name=medaka
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --partition=single
#SBATCH --time=24:00:00

medaka_consensus -i $FASTQ_FOLDER/F0.fastq.gz -d $ASSEMBLY_FOLDER/F0/assembly.fasta -o $MEDAKA_FOLDER/F0 -m r1041_e82_400bps_sup_v5.2.0

Step 03. Binning of contigs into MAGs (Metagenome-Assembled Genomes)

#!/bin/bash
#SBATCH --job-name=semibin2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --partition=single
#SBATCH --time=24:00:00

SemiBin2 single_easy_bin -i $MEDAKA_FOLDER/F0/consensus.fasta -b $MEDAKA_FOLDER/F0/calls_to_draft.bam -o $BIN_FOLDER/F0 --environment human_gut --sequencing-type long_read -t 2

Step 04. Quality assessment of MAGs

#!/bin/bash
#SBATCH --job-name=checkm2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --partition=single
#SBATCH --time=24:00:00

export CHECKM2DB="$REF_FOLDER/uniref100.KO.1.dmnd"
checkm2 predict -i $BIN_FOLDER/F0/output_bins --output_directory $CHECKM2_FOLDER/F0 -x fa.gz --threads 8
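The next step reads MAGs from a `fasta_passed` folder, but this page does not show how that folder is filled. One plausible approach is to keep only the bins that pass a quality cutoff in the CheckM2 report. The sketch below assumes that CheckM2 writes `quality_report.tsv` with `Name`, `Completeness`, and `Contamination` as the first three tab-separated columns, and that `Name` is the bin file name without its extension. The helper `passed_bins` and the default cutoffs (completeness of at least 50, contamination of at most 10) are illustrative assumptions, not the author's settings.

```shell
# passed_bins REPORT [MIN_COMPLETENESS] [MAX_CONTAMINATION]
# Print the names of bins that meet both thresholds (header line skipped).
passed_bins() {
    awk -F'\t' -v min_comp="${2:-50}" -v max_cont="${3:-10}" \
        'NR > 1 && ($2 + 0) >= (min_comp + 0) && ($3 + 0) <= (max_cont + 0) { print $1 }' "$1"
}

# Example use (commented out; paths follow the steps on this page):
# mkdir -p "$BIN_FOLDER/F0/fasta_passed"
# passed_bins "$CHECKM2_FOLDER/F0/quality_report.tsv" | while read -r name; do
#     cp "$BIN_FOLDER/F0/output_bins/$name.fa.gz" "$BIN_FOLDER/F0/fasta_passed/"
# done
```

Adjust the two thresholds to whatever quality standard your analysis requires.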

Step 05. Taxonomic classification of MAGs

#!/bin/bash
#SBATCH --job-name=gtdbtk
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=124G
#SBATCH --partition=bigmem
#SBATCH --time=12:00:00

gtdbtk classify_wf --genome_dir $BIN_FOLDER/F0/fasta_passed --out_dir $GTDBTK_FOLDER/F0 --extension fa.gz --cpus 12

Step 06. Download FASTA files from GenBank and add a header for Kraken2

# conda install -c conda-forge ncbi-datasets-cli
datasets download genome accession GCA_XXXXX --include genome --filename GCA_XXXXX.zip
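The step title mentions adding a header for Kraken2, but only the download command is shown. Kraken2 can read a taxonomy ID embedded in the sequence ID in the form `kraken:taxid|<ID>`, so one way to do this is to append that tag to every FASTA header. The helper `add_kraken_taxid` below is a hypothetical name, and the taxonomy ID you pass must be the one that matches your genome.

```shell
# add_kraken_taxid TAXID < in.fasta > out.fasta
# Append "|kraken:taxid|TAXID" to the sequence ID (first word) of every header.
add_kraken_taxid() {
    awk -v taxid="$1" '
        /^>/ {
            # Split the header into the sequence ID and the rest of the line.
            id = $1
            rest = substr($0, length($1) + 1)
            print id "|kraken:taxid|" taxid rest
            next
        }
        { print }   # sequence lines pass through unchanged
    '
}

# Example use (commented out). The path inside the archive is an assumption
# about how the datasets tool lays out its output; check it after unzipping.
# unzip GCA_XXXXX.zip -d GCA_XXXXX
# cat GCA_XXXXX/ncbi_dataset/data/*/*.fna | add_kraken_taxid TAXID > GCA_XXXXX.fasta
```

A header such as `>c1 some description` becomes `>c1|kraken:taxid|562 some description` when the function is called with `562`.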

Step 07. Create a custom DB using the MAGs (Kraken2)

cd $CUSTOMDB_FOLDER
k2 download-taxonomy --db myDB
k2 add-to-library --file GCA_XXXXX.fasta --db myDB
k2 build --db myDB
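When several genomes need to go into the library, the `add-to-library` call can be repeated per file before running `k2 build`. The loop below is a minimal sketch that reuses only the flags shown above; the function name `add_all_to_library` and the `DRY_RUN` switch are assumptions added for illustration.

```shell
# add_all_to_library DIR DB
# Run "k2 add-to-library" once for every *.fasta file in DIR.
# Set DRY_RUN=1 to print the commands instead of running them.
add_all_to_library() {
    for f in "$1"/*.fasta; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo k2 add-to-library --file "$f" --db "$2"
        else
            k2 add-to-library --file "$f" --db "$2"
        fi
    done
}

# Example use (commented out):
# DRY_RUN=1 add_all_to_library "$CUSTOMDB_FOLDER" myDB
```

Running it first with `DRY_RUN=1` lets you confirm the file list before anything is added to the database.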
