# **EPI2ME *wf-metagenomics* Workflow: A Training Explanation**

## **Introduction to Metagenomics Analysis**

In this section, we will explore the *wf-metagenomics* workflow, an EPI2ME Labs pipeline designed for analyzing metagenomic data generated by Oxford Nanopore sequencing. Metagenomics is the study of the collective genetic material from a community of microorganisms. This workflow is valuable for:

* **Taxonomic Classification:** Identifying and quantifying the various microorganisms present in a sample (e.g., bacteria, archaea, fungi, viruses).  
* **Understanding Microbial Community Composition:** Gaining insights into the diversity, abundance, and relative proportions of different organisms within a complex biological sample.  
* **Antimicrobial Resistance (AMR) Gene Detection:** Identifying potential AMR genes within the metagenomic data, which can be crucial for understanding public health implications and environmental impacts.
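
Once each read has been assigned a taxon, quantifying a community comes down to counting. Relative abundance is each taxon's share of all reads. The shell sketch below assumes a hypothetical simplified input with one taxon label per line, not the workflow's real output files. The function name `relative_abundance` is illustrative, and unclassified reads are counted in the total here.

```shell
#!/usr/bin/env bash
# Relative abundance from per-read taxon labels (illustrative sketch).
# Input (hypothetical simplified format): one taxon label per line on stdin.
# Output: taxon <TAB> read count <TAB> percentage of all reads, most abundant first.
set -eo pipefail

relative_abundance() {
  # Count identical labels; LC_ALL=C keeps the sort order deterministic across locales.
  LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 | awk '
    {
      count[NR] = $1               # read count produced by uniq -c
      sub(/^ *[0-9]+ /, "")        # strip the count, keep the full label
      name[NR] = $0
      total += count[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s\t%d\t%.1f%%\n", name[i], count[i], 100 * count[i] / total
    }'
}

# Example: four reads, two of them from the same organism.
printf '%s\n' "Escherichia coli" "Escherichia coli" "Staphylococcus aureus" "unclassified" |
  relative_abundance
# Prints (tab-separated):
# Escherichia coli       2   50.0%
# Staphylococcus aureus  1   25.0%
# unclassified           1   25.0%
```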

## **Workflow Overview**

The *wf-metagenomics* workflow provides a comprehensive analysis of metagenomic data. Here's an outline of the key stages:

1. **Input Data:**  
   * The workflow accepts sequencing data in FASTQ or BAM format, which are standard formats for storing raw sequencing reads.  
   * The input data represents the mixed genetic material from the microbial community under investigation.  
2. **Taxonomic Classification:**  
   * The core function of this workflow is to determine the taxonomic composition of the sample. It employs two main approaches:  
     * **Kraken 2:** This method assigns taxonomic labels to reads by breaking them into k-mers (short DNA subsequences of length k). It looks each k-mer up in a database that maps it to the lowest common ancestor (LCA) of the reference genomes containing it, then assigns the read to a taxon based on these hits. Kraken 2 is known for its speed and accuracy in identifying the likely source organism of a sequence.  
     * **Minimap2:** This method aligns the sequencing reads against a reference database of microbial genomes. By determining the best match for each read, Minimap2 can infer the organism it originated from.  
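   * Users can choose between these two methods based on their needs and data. Kraken 2 is typically faster. Alignment with Minimap2 is more computationally demanding but provides per-read alignments to reference genomes.  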
3. **Antimicrobial Resistance (AMR) Detection (Optional):**  
   * The workflow can also identify genes associated with antimicrobial resistance within the metagenomic data.  
   * This is typically done by comparing the sequencing reads against databases of known AMR genes.  
   * Detecting AMR genes provides insights into the potential presence of drug-resistant microbes in the sample.  
4. **Output and Reporting:**  
   * The workflow generates a comprehensive report summarizing the metagenomic analysis. This report typically includes:  
     * Taxonomic profiles: Interactive visualizations and tables showing the abundance and distribution of different taxa in the sample.  
     * AMR gene identification: If enabled, a list of detected AMR genes and their associated organisms.  
     * Quality control metrics: Statistics about the sequencing data, such as read lengths and quality scores.
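
The k-mer idea behind Kraken 2 can be shown with a deliberately simplified shell sketch. It slides a window of length k along a read and looks each k-mer up in a tiny, hypothetical k-mer-to-taxon table. The read goes to the taxon with the most hits. This majority vote is only a stand-in. Kraken 2 itself uses minimizers, lowest-common-ancestor taxa and the taxonomy tree, over very large databases. The table and the `classify_read` function are illustrative assumptions, and associative arrays require bash 4 or newer.

```shell
#!/usr/bin/env bash
# Toy k-mer classifier illustrating the idea behind Kraken 2 (NOT its real algorithm).
# Simplification: each read goes to the taxon with the most k-mer hits (majority vote).
# Requires bash >= 4 for associative arrays.
set -eo pipefail

K=5  # k-mer length (tiny, for illustration only)

# Hypothetical, hand-made database mapping k-mers to taxa.
declare -A KMER_DB=(
  [ACGTA]="Escherichia coli"
  [CGTAC]="Escherichia coli"
  [TTGCA]="Staphylococcus aureus"
)

classify_read() {
  local read_seq="$1" i kmer taxon count best="" best_count=0
  declare -A votes=()  # taxon -> number of matching k-mers (local to this call)

  # Slide a window of length K along the read and look up each k-mer.
  for (( i = 0; i + K <= ${#read_seq}; i++ )); do
    kmer="${read_seq:i:K}"
    taxon="${KMER_DB[$kmer]:-}"
    if [[ -n "$taxon" ]]; then
      votes[$taxon]=$(( ${votes[$taxon]:-0} + 1 ))
    fi
  done

  # Pick the taxon with the most hits; ties fall to bash's unspecified key order.
  for taxon in "${!votes[@]}"; do
    count=${votes[$taxon]}
    if (( count > best_count )); then
      best="$taxon"
      best_count=$count
    fi
  done

  echo "${best:-unclassified}"
}

classify_read "GGACGTACGG"   # prints: Escherichia coli
classify_read "ATTGCAT"      # prints: Staphylococcus aureus
classify_read "CCCCCCCC"     # prints: unclassified
```

Reads with no matching k-mers stay unclassified. This mirrors how k-mer classifiers report reads from organisms that are missing from the database.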


## **Key Concepts and Tools**

* **Metagenomics:** The study of the collective genetic material from a community of microorganisms.  
* **Taxonomic Classification:** The process of assigning sequences (in this workflow, individual reads) to taxonomic groups (e.g., phylum, class, order, family, genus, species), thereby identifying the organisms they most likely came from.  
* **Kraken 2:** A fast and accurate k-mer-based taxonomic sequence classification tool.  
* **Minimap2:** A versatile sequence alignment program used here for mapping reads to reference genomes for taxonomic classification.  
* **Antimicrobial Resistance (AMR):** The ability of microorganisms to resist the effects of antimicrobial drugs.  
* **FASTQ/BAM:** Standard file formats for storing sequencing data.  
* **Nextflow:** A workflow management system used to define and execute the *wf-metagenomics* pipeline.
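
The read-length statistics mentioned under quality control can be approximated directly from a FASTQ file. Each FASTQ record spans four lines: header, sequence, separator and qualities. The sketch below assumes uncompressed records with no wrapped sequence lines. `fastq_stats` is a hypothetical helper, not part of the workflow, which computes its own QC metrics.

```shell
#!/usr/bin/env bash
# Minimal read-level QC summary for FASTQ input: read count, total bases, mean and max length.
# Assumes uncompressed 4-line FASTQ records (no wrapped sequence lines).
# For gzipped files, decompress on the fly: gzip -dc reads.fastq.gz | fastq_stats
set -eo pipefail

fastq_stats() {
  # Line 2 of every 4-line record holds the sequence; reads the files given as args, else stdin.
  awk 'FNR % 4 == 2 {
         n++
         len = length($0)
         total += len
         if (len > max) max = len
       }
       END {
         if (n == 0) { print "reads=0"; exit }
         printf "reads=%d bases=%d mean_len=%.1f max_len=%d\n", n, total, total / n, max
       }' "$@"
}

# Tiny example file with two reads (lengths 8 and 4).
demo=$(mktemp)
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\nIIII\n' > "$demo"
fastq_stats "$demo"   # prints: reads=2 bases=12 mean_len=6.0 max_len=8
rm -f "$demo"
```

For BAM input, `samtools fastq` can convert the reads to FASTQ before piping them in.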

## **Workflow Benefits**

* **Comprehensive Metagenomic Analysis:** The workflow provides a complete solution for analyzing metagenomic data, from taxonomic classification to AMR gene detection.  
* **User-Friendly:** The workflow is designed to be relatively easy to use, with clear options and parameters.  
* **Reproducible Results:** Nextflow, together with the containerized software environments the workflow runs in, helps ensure that repeated runs with the same inputs and parameters produce consistent results.  
* **Flexible:** The workflow can handle various types of metagenomic data and allows users to choose between different taxonomic classification methods.  
* **Scalable:** The workflow can be run on different computing platforms, from individual workstations to high-performance computing clusters.

## **Running the Workflow**

The *wf-metagenomics* workflow is typically executed using the Nextflow command-line tool. EPI2ME provides detailed instructions and examples on how to install and run the workflow, including specifying input data, setting parameters, and managing the analysis. We will cover the specific commands and parameters in the practical session.

````
%%bash
cd ~
~/nextflow run epi2me-labs/wf-metagenomics --help
````
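
Before launching a full run, it can help to confirm that the Nextflow executable and the input directory are where the command expects them. The function below is a hypothetical pre-flight helper and is not part of EPI2ME or Nextflow. The paths in the usage comment follow this tutorial's layout, and the recognised FASTQ file extensions are an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before running the workflow (not part of EPI2ME).
set -eo pipefail

check_inputs() {
  local nextflow_bin="$1" fastq_dir="$2" n

  # The Nextflow launcher must exist and be executable.
  if [[ ! -x "$nextflow_bin" ]]; then
    echo "missing: Nextflow executable at $nextflow_bin" >&2
    return 1
  fi

  # The input directory must exist.
  if [[ ! -d "$fastq_dir" ]]; then
    echo "missing: input directory $fastq_dir" >&2
    return 1
  fi

  # Count FASTQ files (plain or gzipped) anywhere under the input directory.
  n=$(find "$fastq_dir" -type f \( -name '*.fastq' -o -name '*.fq' \
        -o -name '*.fastq.gz' -o -name '*.fq.gz' \) | wc -l)
  if (( n == 0 )); then
    echo "no FASTQ files found under $fastq_dir" >&2
    return 1
  fi

  echo "ok: $(( n )) FASTQ file(s) found"
}

# Usage (paths as used in this tutorial):
# check_inputs "${HOME}/nextflow" "${HOME}/dsc-epi2me-data/wf-metagenomics-demo/test_data"
```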

**Leveraging the JupyterLab Terminal:**

For users working within a JupyterLab environment (such as on Vertex AI), the terminal provides a convenient way to execute Nextflow commands. Here's how you can run the workflow:

1. **Open a Terminal:** In JupyterLab, navigate to "File" > "New" > "Terminal". This opens a new terminal tab within your JupyterLab interface.  
2. **Execute the Command:** Copy and paste the Nextflow command into the terminal. For example, to run the workflow on the demo data provided for this course, use:  

````
# Run wf-metagenomics on the demo data; results are written to ~/metagenomics-demo_output
~/nextflow run epi2me-labs/wf-metagenomics \
    --fastq "${HOME}/dsc-epi2me-data/wf-metagenomics-demo/test_data" \
    --out_dir "${HOME}/metagenomics-demo_output" \
    -profile standard
````

3. **Monitor Execution:** As the workflow runs, the terminal shows the progress of each step and any error messages. When it finishes, Nextflow reports whether the run completed successfully. The results themselves are written to the output directory.
4. **View the Results:**
    * Once the workflow has completed, the output files will be located in the `~/metagenomics-demo_output` directory.
    * Locate the report HTML file (`wf-metagenomics.html`) within this directory. This file contains a comprehensive summary of the workflow results.
    * Open the HTML file.  JupyterLab may prompt you to "Trust HTML" at the top left of the file.  If so, click "Trust HTML" to ensure that the report renders correctly and all elements are displayed.
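
From the terminal, the report can also be located with `find`. This sketch assumes the report sits at the top level of the output directory. It matches any `*.html` file rather than a fixed name, since report filenames may differ between workflow versions. Limiting the search depth skips HTML files kept in subdirectories. `find_reports` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List HTML report(s) at the top level of a workflow output directory (illustrative helper).
set -eo pipefail

find_reports() {
  local out_dir="$1" reports
  # -maxdepth 1: ignore HTML files in subdirectories of the output directory.
  reports=$(find "$out_dir" -maxdepth 1 -type f -name '*.html' | LC_ALL=C sort)
  if [[ -z "$reports" ]]; then
    echo "no HTML report found in $out_dir (has the run finished?)" >&2
    return 1
  fi
  printf '%s\n' "$reports"
}

# Usage (output directory from the command above):
# find_reports "${HOME}/metagenomics-demo_output"
```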

### **Learning Outcomes**

By the end of this section, you should be able to:

* Understand the purpose and applications of the *wf-metagenomics* workflow.  
* Describe the key steps involved in metagenomic data analysis, including taxonomic classification and AMR gene detection.  
* Identify the main tools and concepts used in the workflow.  
* Appreciate the importance of metagenomics in studying microbial communities and its applications in various fields.
