# Module 15: Workflows/pipelines GPS and GBS

## Overview


*GPS Unified Pipeline* is a Nextflow pipeline for processing raw *Streptococcus pneumoniae* sequencing reads (FASTQ files), developed by the GPS Project ([Global Pneumococcal Sequencing Project](https://www.pneumogen.net/gps/)). It is available via this link: https://github.com/HarryHung/gps-unified-pipeline.

*GBS Typer Pipeline* is a Nextflow pipeline for characterising serotype, MLST, AMR and surface proteins from *Streptococcus agalactiae* sequences (FASTQ files), developed by the [JUNO Project](https://www.gbsgen.net/). It is available [here](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf). The pipeline takes trimmed and quality-controlled paired-end *Streptococcus agalactiae* reads as input, processes them in parallel through the workflows described below, and combines the results into a main report (plus MLST and AMR gene allele FASTA files, if applicable).

![Intro](./images/gbs.png)

>  In this module we will work with the GBS Typer Pipeline.

## Details of each workflow:

1. MLST with SRST2

Downloads the MLST database for *Streptococcus agalactiae* and uses SRST2 to perform MLST.

2. Serotyping with SRST2

Downloads the GBS serotype [database](https://github.com/swainechen/GBS-SBG) and uses SRST2 to identify serotypes (in a similar way to MLST)

3. Surface protein typing with SRST2

Uses a custom-made surface gene [database](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf/tree/main/db/0.2.1/GBS_Surface_Gene-DB) and SRST2 to identify surface proteins (in a similar way to MLST).

4. Resistance typing with SRST2

Uses a custom-made GBS antimicrobial resistance gene [database](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf/tree/main/db/0.2.1/GBS_resTyper_Gene-DB) and ResFinder, and runs SRST2 to identify AMR genes.

5. Variant detection with freebayes

Uses [freebayes](https://github.com/freebayes/freebayes) to generate consensus MLST and antimicrobial resistance alleles (in FASTA format) from imperfect SRST2 alignments. These alleles are not part of the main report.
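
The "run in parallel, then combine" idea described in the overview can be illustrated with a toy Python sketch. This is not how the pipeline is implemented (Nextflow handles scheduling and the real tools are SRST2 and freebayes); the function names and placeholder return values below are hypothetical and only mimic the shape of the process.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for four of the workflows. Each one returns a small
# dictionary with a placeholder value instead of running SRST2.
def mlst(sample):
    return {"mlst": f"<MLST result for {sample}>"}

def serotype(sample):
    return {"serotype": f"<serotype result for {sample}>"}

def surface_proteins(sample):
    return {"surface_proteins": f"<surface protein result for {sample}>"}

def resistance(sample):
    return {"resistance": f"<AMR result for {sample}>"}

def run_all(sample, workflows):
    """Run every workflow for one sample concurrently and merge the results."""
    report = {"sample": sample}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(workflow, sample) for workflow in workflows]
        for future in futures:
            report.update(future.result())
    return report

if __name__ == "__main__":
    combined = run_all("ERR1795461", [mlst, serotype, surface_proteins, resistance])
    for key, value in combined.items():
        print(f"{key}: {value}")
```

Each workflow contributes its own columns to a single per-sample record, which is roughly what the main report represents.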

## Running the pipeline

>Running the pipeline requires an internet connection

>Currently it supports only paired-end reads

>This pipeline can only be executed on a local computer, as it requires Docker and Nextflow.

The following installation and usage steps are designed to be run on a local computer, because Google Colab cannot be configured for this purpose. **Please DO NOT EXECUTE** them here. They are only a guide for you to follow later on your own computer if you have a Linux operating system.

In the folder **Module 15**, you will find the results obtained from a previous execution on a computer. We will explore what the output files look like.

### Installing tools

#### 1. Install Docker  

Depending on your operating system (`Linux`, `Windows`, `Mac`), install Docker by following the detailed instructions available [here](https://docs.docker.com/desktop/).

#### 2. Install Nextflow

In [None]:
# Do not execute

curl -s https://get.nextflow.io | bash

In [None]:
# Do not execute
# Change the permissions of the nextflow file and make it executable

chmod +x nextflow

In [None]:
# Do not execute
# Move Nextflow to a directory that is in the PATH

mkdir -p $HOME/.local/bin/
mv nextflow $HOME/.local/bin/

In [None]:
# Do not execute
# Confirm that Nextflow is installed

nextflow info
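
On your own computer, a quick way to confirm that both tools are reachable before launching the pipeline is to look them up on the `PATH`. The sketch below uses only the Python standard library; the helper name `find_tools` is hypothetical.

```python
import shutil

def find_tools(names):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for tool, path in find_tools(["docker", "nextflow"]).items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If `nextflow` is reported as not found after the install steps, the most likely cause is that `$HOME/.local/bin` is not in your `PATH`.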

#### 3. Clone the Pipeline repository

In [None]:
# Do not execute

git clone https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf.git

### Usage

Go to the GBS-Typer-sanger-nf directory:

In [None]:
# Do not execute

cd GBS-Typer-sanger-nf

Run the pipeline on sample ERR1795461, which was used in Module 6: Genome Assembly. This generates reports in a new directory called "results".

In [None]:
# Do not execute

nextflow run main.nf --reads '/Module_15/GBS-Typer-sanger-nf/ERR1795461_{1,2}.fastq.gz' --results_dir results
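
The `{1,2}` in the `--reads` pattern matches the two mate files of each sample, so every sample needs both a `_1` and a `_2` file. A minimal sketch for checking this before a run is shown below. It assumes files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, as in the command above; the helper name `find_read_pairs` is hypothetical.

```python
from pathlib import Path

SUFFIX = ".fastq.gz"

def find_read_pairs(directory):
    """Group FASTQ files by sample ID.

    Returns (pairs, unpaired): pairs maps each sample ID to its
    (read 1, read 2) paths; unpaired lists sample IDs with only one mate.
    """
    mates = {}
    for fastq in sorted(Path(directory).glob("*_[12]" + SUFFIX)):
        # "ERR1795461_1.fastq.gz" -> sample "ERR1795461", mate "1"
        sample, mate = fastq.name[: -len(SUFFIX)].rsplit("_", 1)
        mates.setdefault(sample, {})[mate] = fastq
    pairs = {s: (m["1"], m["2"]) for s, m in mates.items() if len(m) == 2}
    unpaired = sorted(s for s, m in mates.items() if len(m) < 2)
    return pairs, unpaired

if __name__ == "__main__":
    pairs, unpaired = find_read_pairs(".")
    print("Complete pairs:", sorted(pairs))
    print("Samples missing a mate:", unpaired)
```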

### From here, you can run the commands!

### Download data

In [None]:
!wget https://zenodo.org/records/14231070/files/Module_15.tar.gz

### Extract the .tar.gz file 

In [None]:
!tar xvf Module_15.tar.gz
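
If you would like to see what the archive contains without extracting it (or extracting it again), Python's standard `tarfile` module can list its members. This is an optional sketch; the helper name `list_archive` is hypothetical.

```python
import tarfile

def list_archive(path):
    """Return the member names of a (possibly compressed) tar archive."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()

# Example usage in the notebook, after the download step:
# for name in list_archive("Module_15.tar.gz"):
#     print(name)
```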

Go to the results folder, where you can find the files generated after running the GBS-Typer pipeline.

In [None]:
%cd Module_15/GBS-Typer-sanger-nf/results

List the files in the folder. You will notice the following files:

- drug_cat_alleles_variants.txt
- ERR1795461_new_mlst_alleles.fasta
- ERR1795461_new_mlst_pileup.txt
- gbs_res_variants.txt
- gbs_typer_report.txt
- new_mlst_alleles.log
- resfinder_accessions.txt
- serotype_res_incidence.txt
- surface_protein_incidence.txt
- surface_protein_variants.txt
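
Instead of comparing the listing by eye, a short check can report which of the files above are missing. This is a sketch: the file names are copied from the list above, and the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# File names taken from the list above (for sample ERR1795461).
EXPECTED_OUTPUTS = [
    "drug_cat_alleles_variants.txt",
    "ERR1795461_new_mlst_alleles.fasta",
    "ERR1795461_new_mlst_pileup.txt",
    "gbs_res_variants.txt",
    "gbs_typer_report.txt",
    "new_mlst_alleles.log",
    "resfinder_accessions.txt",
    "serotype_res_incidence.txt",
    "surface_protein_incidence.txt",
    "surface_protein_variants.txt",
]

def missing_outputs(results_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in results_dir."""
    results_dir = Path(results_dir)
    return [name for name in expected if not (results_dir / name).is_file()]

if __name__ == "__main__":
    missing = missing_outputs(".")
    print("All expected files found" if not missing else f"Missing: {missing}")
```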

Download the file "gbs_typer_report.txt" to your computer and open it using Excel or another spreadsheet tool.

This file includes the serotype, MLST type, MLST allelic frequencies, resistance gene incidence, surface protein types, and GBS-specific resistance variants. You can find the description of each column in the report [here](https://docs.google.com/spreadsheets/d/1R5FFvACC3a6KCKkTiluhTj492-4cCe74HcCoklqX-X0/edit?gid=0#gid=0), in the rows where the category column is `in_silico_analysis`.
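
As an alternative to a spreadsheet, the report can be inspected directly in the notebook. The sketch below assumes the report is a tab-delimited text file with a header row. Check this against your own file, since the delimiter is an assumption here; the helper name `load_report` is hypothetical.

```python
import csv

def load_report(path, delimiter="\t"):
    """Read a delimited report into a list of dictionaries, one per sample."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))

# Example usage from inside the results folder:
# rows = load_report("gbs_typer_report.txt")
# for column, value in rows[0].items():
#     print(f"{column}: {value}")
```

Printing one column per line is easier to read than a single wide row, because the report has many columns.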

[Here](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf?tab=readme-ov-file#outputs) you will find how to interpret the other generated files.

### More information

- Running the PBP typing and allele detection workflow is described [here](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf)

- Advanced options on [changing parameters](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf)

- Examples of other Nextflow pipelines hosted by [nf-core](https://nf-co.re/)

- Resources for building your own Nextflow pipelines:
    - Tutorial: https://training.seqera.io/ 

    - Reference: https://www.nextflow.io/docs/latest/index.html

