# Whole Genome Metagenomic Sequencing Data Analysis

- 16 freshwater sponge metaDNA samples, extracted using the AllPrep DNA/RNA kit
- DNB-Seq by BGI Poland: at least 12 GB of data per sample, with 95% of data over Q20
- Data were basecalled, adapter-trimmed, and filtered for low-quality reads by BGI
- Reads were assembled into contigs by David


### Bioinformatic analysis steps

0. QC of cleaned reads (from BGI) and error-corrected reads
    - Script: DNB_1_1_QC.sh
    - FastQC and MultiQC for general quality metrics

1. Assembly (by David)
    - Scripts in folder: metaWGS_DNB_Seq/metaWGS_assem_David
    - Quality improvement: PhiX174 removal, adapter trimming (Trimmomatic), error correction (BBMap tadpole.sh)
    - Contig assembly (metaSPAdes)

2. Contig Assembly and gene prediction 
    1. File renaming
        1. 1. Contig file renaming 
            - Script: DNB_2_1_ctg_rename.sh
        1. 2. Error-corrected file renaming 
            - Script: DNB_2_1_EC_reads_rename.sh
    2. Contig filtration by 500 bp length
        - Script: DNB_2_2_FILTER_MIN_500BP.sh
        - Perl script: SL_filter_contigs_on_size.pl
    3. Contig ID renaming 
        - Script: DNB_2_3_RENAME_contigs.sh
        - Perl script: SL_rename_fasta_id.pl

    4. Contig Classification  
        - Script: DNB_2_4_DeepMicroClass.sh
        - DeepMicroClass (v1.0.3)
        4. 1. Extraction of prokaryotic contigs
            - Script: DNB_2_4_1_Extract_DMC_PROKS.sh

    5. Prokaryotic gene prediction
        - Script: DNB_2_5_Prodigal_gene_prediction.sh

3. MAG Assembly
    1. Mapping 
        1. 1. Mapping for coverage information 
            - Script: DNB_3_1_BamSamCov_before_MAG.sh
                - BBMap (version: 39.42) for mapping and sorting
                - MetaBAT2 (version: 2.18) for coverage information
            - Output files: .bam, .sorted.bam, coverage.txt 
                - File location: /gxfs_work/geomar/smomw681/BGI_META_WGS_DATA/BGI_DATA_SUNWOO/MAPPING
        1. 2. Generate prokaryotic contig FASTA file mapped to error-corrected reads
            - Script: DNB_3_2_BBmap_proks_ctg_mapping.sh
                - BBMap
            - Output: .EC_Proks_mapped.fq.gz
        1. 3. QC for mapped prokaryotic contig files
    
    2. Binning 
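
Steps 2.2 and 2.3 above (length filtering and contig ID renaming) can be illustrated with a short shell sketch. This is not the content of `SL_filter_contigs_on_size.pl` or `SL_rename_fasta_id.pl`, which are not shown here. The function names, the `sampleA` prefix, and the `<prefix>_<number>` ID scheme are assumptions made for illustration only.

```shell
# Hypothetical helper: keep only FASTA records whose sequence is >= min_len bp.
# Multi-line sequences are joined before the length check, so the output
# carries each sequence on a single line.
filter_contigs_min_len() {
    awk -v min="$1" '
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    '
}

# Hypothetical helper: replace every FASTA header with <prefix>_<running number>.
rename_contig_ids() {
    awk -v p="$1" '
        /^>/ { n++; print ">" p "_" n; next }
        { print }
    '
}

# Toy input: NODE_1 is 8 bp (kept with min_len=5), NODE_2 is 2 bp (dropped).
printf '>NODE_1\nACGT\nACGT\n>NODE_2\nAC\n' \
    | filter_contigs_min_len 5 \
    | rename_contig_ids sampleA
```

On real data the same pipe would use a 500 bp threshold, for example `filter_contigs_min_len 500 < contigs.fasta | rename_contig_ids sampleA > contigs.min500.fasta`, where both file names are placeholders.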

### Samples
- Metagenomic freshwater sponge DNA samples from Sp2 and Sp3 from SPZO from May-August

### Conda Environment Setup


# Assembly
    conda create --name Assembly -y 
    conda activate Assembly    
    conda install conda-forge::libgcc-ng=14.2.0     # for the current version of spades and bbmap
    conda install bioconda::spades=4.0.0 -y   #or v4.1.0
    conda install bioconda::trimmomatic=0.39 -y  
    conda update trimmomatic # 0.40
    conda install bioconda::bbmap # v39.52   # updated 08.01.26
    conda install bioconda::samtools # 1.23     # updated 08.01.26
    conda install conda-forge::pigz=2.8 -y
    conda install bioconda::filtlong #0.3.1

# MAG prep and after
    conda create -n MAG
    conda activate MAG
    #conda install bioconda::deepmicroclass=1.0.3 -y 
    conda install python=3.12.9 -y
    pip install DeepMicroClass # without pytorch: use CPU only
    pip install requests #v2.32.3
    conda install bioconda::checkm-genome -y #=1.2.3
    conda install bioconda::prodigal -y #=2.6.3
    conda install bioconda::pprodigal -y #=1.0.1
    conda install bioconda::gtdbtk -y #=2.4.0
    # db for gtdbtk also downloaded: run script 3_0 
    conda install bioconda/label/cf201901::mash # v2.3

# MAG_construction and check
    conda create -n METABAT2
    conda activate METABAT2
    conda install bioconda::metabat2=2.18   # updated 08.01.26
    # metabat2 must be reinstalled following the instructions in the official repository because of the negative coverage problem
    # https://bitbucket.org/berkeleylab/metabat/src/master/INSTALL.md
    module load gcc12-env/12.3.0
    module load gcc/12.3.0
    module load boost/1.83.0
    module load cmake/3.27.4
    conda install bioconda::checkm2=1.1.0
    # checkm2 database in /gxfs_work/geomar/smomw681/.conda/envs/METABAT2/checkm_data/
    # or download it to a custom directory and set it as an environment variable
        checkm2 database --download --path  /gxfs_work/geomar/smomw681/DATABASES/CheckM_db/
        export CHECKM2DB="/gxfs_work/geomar/smomw681/DATABASES/CheckM_db/CheckM2_database"
        conda env config vars set CHECKM2DB="/gxfs_work/geomar/smomw681/DATABASES/CheckM_db/CheckM2_database"
    conda install bioconda::coverm=0.7.0
    conda install bioconda::drep
    # install dependencies of drep
    conda install bioconda/label/cf201901::mash # v2.1
    module load boost/1.83.0
    module load gcc/12.3.0
    module load gsl/2.7.1
    conda install bioconda::centrifuge -y # v1.0.4.2
    # conda install bioconda/label/cf201901::fastani -y # v1.1
        # The fastANI dependency conflict with gsllib could not be resolved in this environment,
        # so fastANI was installed in a separate environment
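
After activating one of the environments above, it can help to confirm that the expected executables are actually on `PATH`. The sketch below is one possible check. `missing_tools` is a hypothetical helper name, and the tool list in the commented example is only an assumption about which commands matter in the `METABAT2` environment.

```shell
# Hypothetical helper: print the names of the given commands that are not on PATH.
# Prints an empty line if every command was found.
missing_tools() {
    local missing
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing
    printf '%s\n' "${missing# }"
}

# Example (run after `conda activate METABAT2`):
# missing_tools metabat2 checkm2 coverm dRep centrifuge
```

Any name that is printed points to a package that did not install correctly or to an environment that is not active.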