# Bioinformatic analysis for Nanopore 16S Amplicon Sequencing 

In [None]:

    # Set up the conda environment DORADO_1.3.0
    module load gcc12-env/12.3.0
    module load miniconda3/24.11.1
    conda create -n DORADO_1.3.0
    conda activate DORADO_1.3.0

    # Download and unpack Dorado 1.3.0, check the binary, then add it to PATH
    cd /gxfs_work/geomar/smomw681/DORADO
    wget "https://cdn.oxfordnanoportal.com/software/analysis/dorado-1.3.0-linux-x64.tar.gz"
    tar -xzvf dorado-1.3.0-linux-x64.tar.gz
    dorado-1.3.0-linux-x64/bin/dorado --version
    echo 'export PATH="/gxfs_work/geomar/smomw681/DORADO/dorado-1.3.0-linux-x64/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc

    # Download the required basecalling model and unpack it into the Dorado directory
    wget -O /gxfs_work/geomar/smomw681/DORADO/dna_r10.4.1_e8.2_400bps_hac@v5.2.0.zip "https://cdn.oxfordnanoportal.com/software/analysis/dorado/dna_r10.4.1_e8.2_400bps_hac@v5.2.0.zip"
    unzip /gxfs_work/geomar/smomw681/DORADO/dna_r10.4.1_e8.2_400bps_hac@v5.2.0.zip -d /gxfs_work/geomar/smomw681/DORADO



## Bioinformatic analysis process

1. **Basecalling, demultiplexing** and **adapter trimming** using Dorado
    - Dorado v1.3.0
    - Scripts
        - Basecalling and demultiplexing: N16S_Run1_Dorado_BSC_Demux.sh, N16S_Run2_Dorado_BSC_Demux.sh
        - Adapter Trimming: N16S_2_Trimming.sh

2. **Renaming** the fastq files and the barcodes
    - Script: N16S_2_1_ID_replacement.sh
    - File location: /gxfs_work/geomar/smomw681/NANOPORE_DATA/DEMULTIPLEXED/NANOPORE_16S_RENAMED
        - File name: $Sample_ID_renamed.fastq
        - ID: $Sample_ID_(count)
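
The renaming step can be sketched as follows. This is a minimal illustration, not the content of N16S_2_1_ID_replacement.sh: it assumes 4-line FASTQ records and reads `(count)` as a running read number within each sample. The function name `rename_fastq` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Rewrite every read header in a FASTQ file as @<sample_id>_<count>.
# Assumption: each record spans exactly 4 lines (no wrapped sequences).
rename_fastq() {
    local sample_id="$1" in_fastq="$2" out_fastq="$3"
    awk -v id="$sample_id" '
        NR % 4 == 1 { printf "@%s_%d\n", id, ++n; next }  # header line
        { print }                                          # sequence, "+", quality
    ' "$in_fastq" > "$out_fastq"
}

# Hypothetical usage:
# rename_fastq 0506_P1_FW_2_1 barcode01.fastq 0506_P1_FW_2_1_renamed.fastq
```

Selecting header lines by position (every fourth line) rather than by a leading `@` avoids rewriting quality lines that happen to start with `@`.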

3. Data **filtering**: 
    - chopper v0.12.0
    - Script: N16S_3_1_chopper.sh
    - Kept 2838529 reads out of 3451719 reads (82.2%)
    - Output: /gxfs_work/geomar/smomw681/NANOPORE_DATA/DEMULTIPLEXED/NANOPORE_16S_FILTERED
    - All trimmed and filtered files are **concatenated** into one file:
        - Location: /gxfs_work/geomar/smomw681/NANOPORE_DATA/CONCATENATED/N16S_Seq_concat_filt.fastq
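
The bookkeeping around the filtering step (read counts, retained fraction, concatenation) can be sketched as below. This is an illustration only and does not reproduce N16S_3_1_chopper.sh: the chopper call and its thresholds are left out, and the helper names `count_reads`, `percent_kept` and `concat_fastq` are hypothetical. 4-line FASTQ records are assumed.

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file (assumes 4 lines per record).
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Percentage of reads kept after filtering, to one decimal place.
percent_kept() {
    awk -v kept="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * kept / total }'
}

# Concatenate all FASTQ files of a directory into one file.
# The output file should live outside in_dir so it is not read as input.
concat_fastq() {
    local in_dir="$1" out_fastq="$2"
    cat "$in_dir"/*.fastq > "$out_fastq"
}

# With the read counts reported above:
# percent_kept 2838529 3451719
```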

4. Raw and filtered data **QC**: FastQC and MultiQC
    - FastQC 0.12.1
    - MultiQC 1.32
    - Script: N16S_3_2_QC.sh
    - Output location: 
        - Raw: /gxfs_work/geomar/smomw681/NANOPORE_DATA/QC_NANOPORE/QC_NANOPORE_16S_RAW
        - Filtered: /gxfs_work/geomar/smomw681/NANOPORE_DATA/QC_NANOPORE/QC_NANOPORE_16S_FILTERED


5. **16S rRNA identification**
    - NanoCLUST
        - Script:
    - EMU:
        - Script: N16S_ID/N16S_4_1_2_EMU.sh
        - Output: 
            - Location: /gxfs_work/geomar/smomw681/NANOPORE_DATA/EMU/EMU_N16S_ind
            - Files:
                1. {sampleID}_renamed_filt_read-assignment-distributions.tsv
                    - SeqID-TaxID matrix with read assignment probability
                2. **{sampleID}_renamed_filt_rel-abundance.tsv**
                    - Output of main interest
                    - Relative abundance at species level in concatenated file (>0.01)
                3. {sampleID}_renamed_filt_rel-abundance-threshold-0.0001.tsv
                    - Relative abundance at species level in concatenated file (>0.0001)
            - The per-sample files are concatenated into one file for downstream analysis and visualization
                - Script: N16S_4_2_4_EMU_conca.sh
                - The sample ID is added as the last column
                - File: N16S_EMU_rel_abundance_concat.tsv
        - Visualization with R
            - 
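
The concatenation of the per-sample EMU tables can be sketched as below. This is not the content of N16S_4_2_4_EMU_conca.sh: it assumes that every `{sampleID}_renamed_filt_rel-abundance.tsv` file is tab-separated with one header row, and the function name `concat_emu_tables` and the column name `sample_id` are hypothetical.

```shell
#!/usr/bin/env bash
# Merge per-sample relative-abundance tables into one TSV and
# append the sample ID (taken from the file name) as the last column.
concat_emu_tables() {
    local in_dir="$1" out_tsv="$2"
    local suffix="_renamed_filt_rel-abundance.tsv"
    local first=1 f sample
    : > "$out_tsv"                           # start with an empty output file
    for f in "$in_dir"/*"$suffix"; do
        [ -e "$f" ] || continue              # no matching files
        sample="$(basename "$f" "$suffix")"  # strip directory and suffix
        awk -v s="$sample" -v first="$first" 'BEGIN { FS = OFS = "\t" }
            FNR == 1 { if (first) print $0, "sample_id"; next }  # header only once
            { print $0, s }' "$f" >> "$out_tsv"
        first=0
    done
}

# Hypothetical usage:
# concat_emu_tables EMU_N16S_ind N16S_EMU_rel_abundance_concat.tsv
```

The glob only matches names ending in `rel-abundance.tsv`, so the `-threshold-0.0001` tables are not mixed in.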


## Sample specification and experiment information
- Sample specification: short version of the sample name (sampling date, type, location and spot)
    - The year (2025) is omitted and the location is abbreviated as P1 (SPZO1), P2 (SPZO2) or P3 (POST)
- Experiment information: barcode and batch 
    - Additional experiments with the DNA sample or corresponding RNA sample 
        - 16S: 16S rRNA V3/V4 sequencing on Illumina platform
        - metaWGS: metagenomic whole genome sequencing on DNB-Seq platform
        - metaRNA: metatranscriptomic sequencing on DNB-Seq platform 

- Format
    Barcode.    short sample name   Other experiments

### Sequencing Run 1
Stored in N16S_Run1_barcode_sample_table.txt
1. 0506_P1_FW_2_1   16S
2. 0526_P1_FW_2_1   16S
3. 0612_P1_FW_2_1   16S
4. 0626_P1_FW_2_1   16S
5. 0710_P1_FW_2_1   16S
6. 0724_P1_FW_2_1   16S
7. 0804_P1_FW_2_1   16S
8. 0820_P1_FW_2_1   16S

9. 0506_sp2_1   16S, metaWGS, metaRNA
10. 0526_sp2_2   16S, metaWGS, metaRNA
11. 0612_sp2_2   16S, metaWGS, metaRNA
12. 0626_sp2_1   16S, metaWGS, metaRNA
13. 0710_sp2_2   16S, metaWGS, metaRNA
14. 0724_sp2_1   16S, metaWGS, metaRNA
15. 0804_sp2_2_2   16S, metaWGS, metaRNA
16. 0820_sp2_2   16S, metaWGS, metaRNA

17. 0506_sp3_2   16S, metaWGS, metaRNA
18. 0526_sp3_2   16S, metaWGS, metaRNA
19. 0612_sp3_2   16S, metaWGS, metaRNA
20. 0626_sp3_2   16S, metaWGS, metaRNA
21. 0710_sp3_3_2   16S, metaWGS, metaRNA
22. 0724_sp3_3   16S, metaWGS, metaRNA
23. 0804_sp3_3   16S, metaWGS, metaRNA
24. 0820_sp3_1   16S, metaWGS, metaRNA
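
A listing in the format above can be turned into a tab-separated lookup table, for example to drive the renaming step. This is a sketch under assumptions: the actual layout of N16S_Run1_barcode_sample_table.txt may differ from the listing shown here, the zero-padded `barcodeNN` label is an assumed naming scheme, and the function name `parse_barcode_table` is hypothetical.

```shell
#!/usr/bin/env bash
# Convert lines such as "9. 0506_sp2_1   16S, metaWGS, metaRNA"
# into "barcode09<TAB>0506_sp2_1<TAB>16S, metaWGS, metaRNA".
parse_barcode_table() {
    awk '
        /^[0-9]+\.[ \t]/ {
            bc = $1; sub(/\.$/, "", bc)       # barcode number without the dot
            sample = $2                       # short sample name
            exps = ""
            for (i = 3; i <= NF; i++)         # remaining fields: other experiments
                exps = exps (i > 3 ? " " : "") $i
            printf "barcode%02d\t%s\t%s\n", bc, sample, exps
        }
    ' "$1"
}

# Hypothetical usage:
# parse_barcode_table run1_listing.txt > run1_barcode_lookup.tsv
```

Blank lines and any line that does not start with a barcode number are skipped.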

### Sequencing Run 2



## Samp