Build a Custom DB
Heewon Seo edited this page Dec 12, 2025
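The code below demonstrates the workflow for a single sample (F0) whose FASTQ file is 7.3 GB. The scripts expect the folder variables they reference ($FASTQ_FOLDER, $ASSEMBLY_FOLDER, $MEDAKA_FOLDER, $BIN_FOLDER, $REF_FOLDER, $CHECKM2_FOLDER, $GTDBTK_FOLDER, $CUSTOMDB_FOLDER) to be set beforehand. You can run the steps in parallel across samples and adjust the command-line options to suit your analysis.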
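To process several samples in parallel on SLURM, one option is a job array in which each task picks a single sample. The helper below is a minimal sketch. It assumes every sample lives in $FASTQ_FOLDER as <sample>.fastq.gz (as F0.fastq.gz does) and that the array uses 0-based indices. `sample_for_task` is a hypothetical name, not part of any tool.

```bash
#!/bin/bash
# Hypothetical helper for a SLURM job array: task N processes the N-th sample
# (0-based, in the shell's sorted glob order) found in a FASTQ folder.
# Assumes files are named <sample>.fastq.gz, like F0.fastq.gz on this page.
sample_for_task() {
    local dir=$1 index=$2
    local -a files=("$dir"/*.fastq.gz)
    [[ -e "${files[0]}" ]] || files=()     # glob matched nothing
    if (( index < 0 || index >= ${#files[@]} )); then
        echo "no sample for index $index in $dir" >&2
        return 1
    fi
    local name
    name=$(basename "${files[index]}")
    echo "${name%.fastq.gz}"               # strip the extension -> sample ID
}

# Inside a script submitted with e.g. `sbatch --array=0-3 ...`:
# SAMPLE=$(sample_for_task "$FASTQ_FOLDER" "$SLURM_ARRAY_TASK_ID") || exit 1
# flye --nano-hq "$FASTQ_FOLDER/$SAMPLE.fastq.gz" --out-dir "$ASSEMBLY_FOLDER/$SAMPLE" --meta --threads 32
```

The index-to-sample mapping follows the shell's sorted glob order. It therefore stays stable as long as the folder contents do not change between submissions.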
Step 01. Assembly of reads into contigs
#!/bin/bash
#SBATCH --job-name=metaFlye
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --partition=bigmem
#SBATCH --time=24:00:00
flye --nano-hq $FASTQ_FOLDER/F0.fastq.gz --out-dir $ASSEMBLY_FOLDER/F0 --meta --threads 32

Step 02. Polishing of contigs
#! /bin/bash
#SBATCH --job-name=medaka
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --partition=single
#SBATCH --time=24:00:00
medaka_consensus -i $FASTQ_FOLDER/F0.fastq.gz -d $ASSEMBLY_FOLDER/F0/assembly.fasta -o $MEDAKA_FOLDER/F0 -m r1041_e82_400bps_sup_v5.2.0

Step 03. Binning of contigs into MAGs (Metagenome-Assembled Genomes)
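Binning needs the polished contigs, which in turn need the metaFlye assembly, so the per-sample jobs must run in order. SLURM can enforce this with `--dependency=afterok`. The sketch below assumes the Step 01 to 03 scripts are saved as separate files (the file names in the example are hypothetical) and relies on `sbatch --parsable` to return bare job IDs. `submit_chain` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: submit per-sample scripts in order, each one held by
# SLURM until the previous job finished successfully (afterok).
submit_chain() {
    local prev_id="" job_id script
    for script in "$@"; do
        if [[ -z "$prev_id" ]]; then
            job_id=$(sbatch --parsable "$script") || return 1
        else
            job_id=$(sbatch --parsable --dependency=afterok:"$prev_id" "$script") || return 1
        fi
        job_id=${job_id%%;*}    # --parsable prints "jobid" or "jobid;cluster"
        echo "$script -> $job_id"
        prev_id=$job_id
    done
}

# Script names below are placeholders for the Step 01-03 scripts on this page:
# submit_chain metaflye_F0.sh medaka_F0.sh semibin2_F0.sh
```

If any job fails, SLURM leaves the dependent jobs pending with an unsatisfiable dependency, so they never run on incomplete input.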
#! /bin/bash
#SBATCH --job-name=semibin2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --partition=single
#SBATCH --time=24:00:00
SemiBin2 single_easy_bin -i $MEDAKA_FOLDER/F0/consensus.fasta -b $MEDAKA_FOLDER/F0/calls_to_draft.bam -o $BIN_FOLDER/F0 --environment human_gut --sequencing-type long_read -t 2

Step 04. Quality assessment of MAGs
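CheckM2 reads its DIAMOND database from the `CHECKM2DB` environment variable, as the script below does. A small preflight check can catch a missing database file before CheckM2 starts. This is a sketch that assumes the database sits at the path used on this page. `use_checkm2_db` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical preflight: export CHECKM2DB only if the database file exists
# and is non-empty; otherwise report the problem and return non-zero.
use_checkm2_db() {
    local db=$1
    if [[ ! -s "$db" ]]; then
        echo "CheckM2 database not found: $db" >&2
        echo "It can be fetched with: checkm2 database --download --path <dir>" >&2
        return 1
    fi
    export CHECKM2DB=$db
}

# use_checkm2_db "$REF_FOLDER/uniref100.KO.1.dmnd" || exit 1
```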
#! /bin/bash
#SBATCH --job-name=checkm2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --partition=single
#SBATCH --time=24:00:00
export CHECKM2DB="$REF_FOLDER/uniref100.KO.1.dmnd"
checkm2 predict -i $BIN_FOLDER/F0/output_bins --output_directory $CHECKM2_FOLDER/F0 -x fa.gz --threads 8

Step 05. Taxonomic classification of MAGs
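The GTDB-Tk command reads bins from $BIN_FOLDER/F0/fasta_passed, a folder that none of the earlier commands create. One way to populate it from the CheckM2 output is sketched below. The sketch makes several assumptions:

- CheckM2 wrote `quality_report.tsv` with `Name`, `Completeness`, and `Contamination` columns.
- `Name` is the bin file name without `.fa.gz`.
- Bins with at least 50% completeness and under 10% contamination should pass. These thresholds are placeholders, so substitute your own.

`select_bins` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: copy bins that pass CheckM2 thresholds into the folder
# GTDB-Tk reads. Defaults (>= 50% completeness, < 10% contamination) are an
# assumption, not thresholds taken from this page.
select_bins() {
    local report=$1 bin_dir=$2 out_dir=$3 min_comp=${4:-50} max_cont=${5:-10}
    local name
    mkdir -p "$out_dir" || return 1
    while IFS= read -r name; do
        cp "$bin_dir/$name.fa.gz" "$out_dir/" || return 1
    done < <(awk -F'\t' -v minc="$min_comp" -v maxc="$max_cont" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column numbers
        $(col["Completeness"]) + 0 >= minc && $(col["Contamination"]) + 0 < maxc { print $(col["Name"]) }
    ' "$report")
}

# select_bins "$CHECKM2_FOLDER/F0/quality_report.tsv" \
#     "$BIN_FOLDER/F0/output_bins" "$BIN_FOLDER/F0/fasta_passed"
```

Columns are looked up by header name rather than position, so extra columns added by newer CheckM2 versions do not shift the filter.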
#! /bin/bash
#SBATCH --job-name=gtdbtk
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=124G
#SBATCH --partition=bigmem
#SBATCH --time=12:00:00
gtdbtk classify_wf --genome_dir $BIN_FOLDER/F0/fasta_passed --out_dir $GTDBTK_FOLDER/F0 --extension fa.gz --cpus 12

Step 06. Download FASTA files from GenBank and add a header for Kraken2
# conda install -c conda-forge ncbi-datasets-cli
datasets download genome accession GCA_XXXXX --include genome --filename GCA_XXXXX.zip

Step 07. Create a custom DB using the MAGs (Kraken2)
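`k2 add-to-library` below expects GCA_XXXXX.fasta, but `datasets` delivers a zip archive whose genome FASTA sits under ncbi_dataset/data/. Kraken2 accepts a taxonomy ID embedded in each sequence header in the form `>ID|kraken:taxid|TAXID`. The sketch below rewrites the headers that way. `tag_fasta` is a hypothetical helper, and you must supply the taxid yourself (for example, from the genome's NCBI record).

```bash
#!/bin/bash
# Hypothetical helper: insert "|kraken:taxid|<taxid>" after the first word of
# every FASTA header, keeping any description; sequence lines pass through.
tag_fasta() {
    local taxid=$1 in=$2 out=$3
    awk -v t="$taxid" '
        /^>/ {
            id = substr($1, 2)                  # sequence ID without ">"
            rest = substr($0, length($1) + 1)   # description, incl. leading space
            print ">" id "|kraken:taxid|" t rest
            next
        }
        { print }
    ' "$in" > "$out"
}

# unzip -q GCA_XXXXX.zip -d GCA_XXXXX
# tag_fasta "$TAXID" GCA_XXXXX/ncbi_dataset/data/GCA_XXXXX*/*.fna GCA_XXXXX.fasta
```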
cd $CUSTOMDB_FOLDER
k2 download-taxonomy --db myDB
k2 add-to-library --file GCA_XXXXX.fasta --db myDB
k2 build --db myDB
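A finished Kraken2 database directory contains hash.k2d, opts.k2d, and taxo.k2d. The check below is a minimal sketch that confirms all three exist and are non-empty after `k2 build` returns. The helper name `check_k2_db` is hypothetical.

```bash
#!/bin/bash
# Hypothetical sanity check: a built Kraken2 DB should contain these three
# non-empty files; report each missing one and return non-zero if any is absent.
check_k2_db() {
    local db=$1 f missing=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [[ ! -s "$db/$f" ]]; then
            echo "missing or empty: $db/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# check_k2_db "$CUSTOMDB_FOLDER/myDB" && echo "myDB looks complete"
```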