# Module 13: Workflows/pipelines GPS and GBS

## Overview


*GPS Unified Pipeline* is a Nextflow pipeline for processing raw sequencing reads of *Streptococcus pneumoniae* (FASTQ files) by the GPS Project ([Global Pneumococcal Sequencing Project](https://www.pneumogen.net/gps/)) and is available via this link: https://github.com/HarryHung/gps-unified-pipeline.

*GBS Typer Pipeline* is a Nextflow pipeline for characterising serotype, MLST, AMR and surface proteins from *Streptococcus agalactiae* sequences (FASTQ files) by the [JUNO Project](https://www.gbsgen.net/) and is available [here](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf). The pipeline takes trimmed and QCed paired-end *Streptococcus agalactiae* reads as input, processes them in parallel through the workflows described below and combines the results into a main report (and MLST and AMR gene allele FASTA files, if applicable).

![Intro](./images/gbs.png)

>  In this module we will work with the GBS Typer Pipeline.

## Details of each workflow:

1. MLST with SRST2

Downloads the MLST database for *Streptococcus agalactiae* and uses SRST2 to perform MLST

2. Serotyping with SRST2

Downloads the GBS serotype [database](https://github.com/swainechen/GBS-SBG) and uses SRST2 to identify serotypes (in a similar way to MLST)

3. Surface protein typing with SRST2

Uses a custom-made surface gene [database](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf/tree/main/db/0.2.1/GBS_Surface_Gene-DB) and SRST2 to identify surface proteins (in a similar way to MLST)

4. Resistance typing with SRST2

Uses a custom-made GBS antimicrobial resistance gene [database](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf/tree/main/db/0.2.1/GBS_resTyper_Gene-DB) and ResFinder, together with SRST2, to identify AMR genes

5. Variant detection with freebayes

Uses [freebayes](https://github.com/freebayes/freebayes) to generate consensus MLST/antimicrobial resistance alleles (in FASTA format) based on imperfect alignments from SRST2. These alleles are not part of the main report.
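
To make the idea of "doing MLST" more concrete: once an allele number has been called for each locus, the sequence type is conceptually a table lookup on the combination of alleles. The sketch below illustrates only that lookup step. It is not SRST2's code, and the allele numbers and ST labels in `PROFILES` are made up for illustration; real profiles come from the downloaded MLST database.

```python
# The seven loci of the S. agalactiae MLST scheme.
LOCI = ("adhP", "pheS", "atr", "glnA", "sdhA", "glcK", "tkt")

# Hypothetical profile table: allele combination -> sequence type label.
# These values are illustrative only, not taken from the real database.
PROFILES = {
    (1, 1, 2, 1, 1, 2, 2): "ST-example-A",
    (2, 1, 1, 2, 1, 1, 1): "ST-example-B",
}


def assign_st(alleles, profiles=PROFILES, loci=LOCI):
    """Look up the sequence type for a dict of {locus: allele number}."""
    key = tuple(alleles[locus] for locus in loci)
    # An unknown combination means a novel ST or an unresolved allele call.
    return profiles.get(key, "novel or unresolved")


sample = dict(zip(LOCI, (2, 1, 1, 2, 1, 1, 1)))
print(assign_st(sample))
```

A combination that is absent from the table is reported as novel or unresolved, which is the kind of case where consensus alleles from the variant detection workflow become useful.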

## Running the pipeline

>Running the pipeline requires an internet connection

>Currently it supports only paired-end reads

### Installation

#### 1. Install condacolab

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
!conda config --add channels bioconda

#### 2. Install nextflow

In [None]:
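# condacolab supports only the base environment, and `!conda activate` has no
# effect because each `!` command runs in its own subshell, so install into base.
!conda install -y -c conda-forge -c bioconda nextflow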
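
Before moving on, it can help to confirm that the tools the next steps rely on are actually visible from the notebook. The following is a minimal sketch using only the Python standard library; `tool_on_path` is a helper name introduced here for illustration.

```python
import shutil


def tool_on_path(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


# nextflow runs the pipeline; git is used to download it.
for tool in ("nextflow", "git"):
    status = "found" if tool_on_path(tool) else "NOT found"
    print(f"{tool}: {status}")
```

If `nextflow` is reported as not found, re-run the installation cell and check its output for errors.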

#### 3. Download the pipeline into the "Data" directory

In [None]:
!git clone https://github.com/sanger-pathogens/GBS-Typer-sanger-nf.git

### Usage
Go into the GBS-Typer-sanger-nf directory:

In [None]:
%cd GBS-Typer-sanger-nf

Run the pipeline with the two samples 20280_5#40 and 20280_5#47 from "assignment_s.agalactiae" in "Section_three". This will generate reports in a new directory called "my_samples".

In [None]:
!nextflow run main.nf --reads '../Section_three/assignment_s.agalactiae/20280_5#4*_{1,2}.fastq.gz' --results_dir my_samples
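
Because the pipeline accepts only paired-end reads, it can be worth checking that every sample matched by the pattern has both a `_1` and a `_2` file. The sketch below is one way to do that in plain Python; `find_read_pairs` is a hypothetical helper written for this module, and it assumes the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming used in the command above.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches names such as 20280_5#40_1.fastq.gz (sample = 20280_5#40, mate = 1).
READ_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def find_read_pairs(directory, prefix=""):
    """Group FASTQ files by sample; return (complete, incomplete) sample lists."""
    mates = defaultdict(set)
    for path in Path(directory).iterdir():
        match = READ_RE.match(path.name)
        if match and match.group("sample").startswith(prefix):
            mates[match.group("sample")].add(match.group("mate"))
    complete = sorted(s for s, found in mates.items() if found == {"1", "2"})
    incomplete = sorted(s for s, found in mates.items() if found != {"1", "2"})
    return complete, incomplete


# Path as used in the run command, relative to the pipeline directory.
reads_dir = Path("../Section_three/assignment_s.agalactiae")
if reads_dir.is_dir():
    complete, incomplete = find_read_pairs(reads_dir, prefix="20280_5#4")
    print("complete pairs:", complete)
    print("missing a mate:", incomplete)
```

Any sample listed as missing a mate should be fixed or excluded before the run.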

This should take about 20 minutes, depending on your system. When it has completed successfully, you should see:


Open "gbs_typer_report.txt" in the my_samples directory (using Excel or other spreadsheet tool):

Each column is explained in the data dictionary, in the rows where the "category" column is "in_silico_analysis".
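
As an alternative to a spreadsheet, the report can also be inspected directly in the notebook. The sketch below assumes that "gbs_typer_report.txt" is tab-delimited with a header row, which should be confirmed against your own output; `load_report` is a helper name introduced here, and no column names are assumed.

```python
import csv
from pathlib import Path


def load_report(path):
    """Read a tab-delimited report into a list of {column: value} dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Location relative to the pipeline directory, as set by --results_dir.
report_path = Path("my_samples/gbs_typer_report.txt")
if report_path.is_file():
    rows = load_report(report_path)
    print(f"{len(rows)} sample(s) in report")
    if rows:
        print("columns:", ", ".join(rows[0].keys()))
```

Each row is one sample, so the printed column names can be looked up one by one in the data dictionary.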

### More information

- Running the PBP typing and allele detection workflow is described [here](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf)

- Advanced options on [changing parameters](https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf)

- Examples of other Nextflow pipelines hosted by [nf-core](https://nf-co.re/)

- Resources for building your own Nextflow pipelines:
    - Tutorial: https://training.seqera.io/ 

    - Reference: https://www.nextflow.io/docs/latest/index.html