Build a Custom DB

SLURM scripts

The scripts below show the workflow for a single sample (F0) whose FASTQ file is 7.3 GB. Parallelize the steps or adjust the command-line options as needed for your analysis.
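One way to parallelize across samples is a SLURM job array, in which each array task picks one sample name from a list. The sketch below is an assumption, not part of the original workflow: samples.txt (one sample name per line, for example F0) and the array range 1-3 are hypothetical, and the resource requests simply mirror Step 01.

```shell
#!/bin/bash
#SBATCH --job-name=metaFlye
#SBATCH --array=1-3            # hypothetical: one task per line of samples.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --partition=bigmem
#SBATCH --time=24:00:00

# Print line N of a sample list (hypothetical file, one sample name per line).
# Usage: pick_sample <list_file> <line_number>
pick_sample() {
    sed -n "${2}p" "$1"
}

# Run the assembly only inside an array task, where SLURM sets SLURM_ARRAY_TASK_ID.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    SAMPLE=$(pick_sample samples.txt "$SLURM_ARRAY_TASK_ID")
    flye --nano-hq "$FASTQ_FOLDER/$SAMPLE.fastq.gz" --out-dir "$ASSEMBLY_FOLDER/$SAMPLE" --meta --threads 32
fi
```

Submitted with sbatch, this would start one assembly per listed sample. The same pattern could be applied to the later steps by swapping the command.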

Step 01. Assembly of long reads into contigs (contiguous sequences)

#!/bin/bash
#SBATCH --job-name=metaFlye
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --partition=bigmem
#SBATCH --time=24:00:00

flye --nano-hq $FASTQ_FOLDER/F0.fastq.gz --out-dir $ASSEMBLY_FOLDER/F0 --meta --threads 32

Step 02. Polishing (INDEL correction) of the resulting assembly

#! /bin/bash
#SBATCH --job-name=medaka
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --partition=single
#SBATCH --time=24:00:00

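# Write to a per-sample folder so that Step 03 finds consensus.fasta and calls_to_draft.bam under $MEDAKA_FOLDER/F0
medaka_consensus -i $FASTQ_FOLDER/F0.fastq.gz -d $ASSEMBLY_FOLDER/F0/assembly.fasta -o $MEDAKA_FOLDER/F0 -m r1041_e82_400bps_sup_v5.2.0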

Step 03. Binning of contigs into MAGs (Metagenome-Assembled Genomes)

#! /bin/bash
#SBATCH --job-name=semibin2
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --partition=single
#SBATCH --time=24:00:00

SemiBin2 single_easy_bin -i $MEDAKA_FOLDER/F0/consensus.fasta -b $MEDAKA_FOLDER/F0/calls_to_draft.bam -o $BIN_FOLDER/F0 --environment human_gut --sequencing-type long_read -t 2

Step 04. Quality assessment of MAGs

#! /bin/bash
#SBATCH --job-name=checkm2
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --partition=single
#SBATCH --time=24:00:00

export CHECKM2DB="$REF_FOLDER/uniref100.KO.1.dmnd"
checkm2 predict -i $BIN_FOLDER/F0/output_bins --output_directory $CHECKM2_FOLDER/F0 -x fa.gz --threads 8
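Step 05 reads bins from a fasta_passed directory, and the scripts here do not show how that directory is filled. One plausible approach, offered as an assumption rather than the author's method, is to select bins from the quality_report.tsv that CheckM2 writes to its output directory. The helper name passing_bins and the thresholds in the example (completeness of at least 50, contamination of at most 10) are hypothetical; choose criteria that suit your study.

```shell
# Print the names of bins that meet the given quality thresholds.
# Columns are looked up by header name (Name, Completeness, Contamination).
# Usage: passing_bins <quality_report.tsv> <min_completeness> <max_contamination>
passing_bins() {
    awk -F'\t' -v minc="$2" -v maxc="$3" '
        # Header row: remember the position of each column name.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows: keep bins that satisfy both thresholds.
        ($(col["Completeness"]) + 0 >= minc + 0) && ($(col["Contamination"]) + 0 <= maxc + 0) {
            print $(col["Name"])
        }
    ' "$1"
}

# Example (thresholds are assumptions):
# passing_bins "$CHECKM2_FOLDER/F0/quality_report.tsv" 50 10
```

The printed names can then be used to copy the matching files from output_bins into fasta_passed. Check how the Name column relates to your file names before copying, since the extension may not be included.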

Step 05. Taxonomic classification of MAGs

#! /bin/bash
#SBATCH --job-name=gtdbtk
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=124G
#SBATCH --partition=bigmem
#SBATCH --time=12:00:00

gtdbtk classify_wf --genome_dir $BIN_FOLDER/F0/fasta_passed --out_dir $GTDBTK_FOLDER/F0 --extension fa.gz --cpus 12
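Because each of Steps 01 to 05 consumes the output of the step before it, the scripts can be submitted together with job dependencies instead of being launched by hand one at a time. The following is a minimal sketch under stated assumptions: the function name submit_chain and the script file names in the example are hypothetical, and any manual filtering between steps would have to be scripted as well.

```shell
# Submit scripts in order; each job starts only after the previous one
# has finished successfully (SLURM dependency type "afterok").
# Usage: submit_chain <script1> <script2> ...
submit_chain() {
    local prev="" jobid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        prev=${jobid%%;*}   # --parsable may print "jobid;cluster"; keep only the job ID
        echo "$script -> job $prev"
    done
}

# Example with hypothetical script names:
# submit_chain 01_flye.sh 02_medaka.sh 03_semibin2.sh 04_checkm2.sh 05_gtdbtk.sh
```

If a job fails, the jobs that depend on it will not start, which avoids running later steps on missing or partial input.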

Step 06. Download FASTA files from GenBank and add a header for Kraken2

# conda install -c conda-forge ncbi-datasets-cli
datasets download genome accession GCA_XXXXX --include genome --filename GCA_XXXXX.zip
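The command above covers only the download. For the header step, Kraken2 accepts a taxonomy ID embedded in the sequence ID in the form `|kraken:taxid|<taxid>`. The helper below is a sketch of one way to add it, not necessarily how the original author did it; the function name add_kraken_taxid and the file name input.fna are hypothetical, and you need to supply the correct NCBI taxonomy ID for each genome.

```shell
# Append "|kraken:taxid|<taxid>" to the sequence ID of every FASTA header line.
# Sequence lines and the rest of each header are left unchanged.
# Usage: add_kraken_taxid <in.fasta> <taxid> > <out.fasta>
add_kraken_taxid() {
    awk -v tag="|kraken:taxid|$2" '
        /^>/ { sub(/^>[^ \t]+/, "&" tag) }   # "&" stands for the matched ">ID"
        { print }
    ' "$1"
}

# Example (input.fna is a hypothetical name for the unpacked genome FASTA):
# add_kraken_taxid input.fna <taxid> > GCA_XXXXX.fasta
```

A header such as `>contig_1 length=10` would become `>contig_1|kraken:taxid|<taxid> length=10`, with your taxonomy ID in place of `<taxid>`.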

Step 07. Create a custom DB using the MAGs (Kraken2)

cd $CUSTOMDB_FOLDER
k2 download-taxonomy --db myDB
k2 add-to-library --file GCA_XXXXX.fasta --db myDB
k2 build --db myDB
