# Pangenome Construction using Roary

## Introduction
Given a set of genomes, the pan genome is the collection of all genes found across the set. Roary, the pan genome pipeline, takes closely related annotated genomes in GFF3 file format and calculates the pan genome.
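
To make the definition concrete, here is a toy sketch in shell that treats each genome as a plain list of gene names. The three gene lists and the gene names are made up for illustration. Roary itself groups genes by sequence similarity rather than by name, so this only illustrates the set idea, not how Roary works internally.

```shell
# Toy gene lists for three hypothetical genomes (names are invented for this example).
# Assumption: each gene is listed at most once per genome.
tmp=$(mktemp -d)
printf '%s\n' geneA geneB geneC > "$tmp/genome1.txt"
printf '%s\n' geneA geneB geneD > "$tmp/genome2.txt"
printf '%s\n' geneA geneC geneE > "$tmp/genome3.txt"

# Pan genome: every gene seen in at least one genome (set union).
pan=$(cat "$tmp"/genome*.txt | sort -u)

# Genes shared by all three genomes: a gene whose count equals the number of genomes.
shared=$(cat "$tmp"/genome*.txt | sort | uniq -c | awk '$1 == 3 {print $2}')

echo "Pan genome:"
echo "$pan"
echo "Genes present in every genome:"
echo "$shared"

rm -r "$tmp"
```

Genes present in every genome are commonly called core genes; the remaining genes of the pan genome are found in only some of the genomes.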

For more in-depth information about Roary, please have a look at the Roary paper included on the VM:

> **Roary: Rapid large-scale prokaryote pan genome analysis**  
> Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill  
> _Bioinformatics, 2015;31(22):3691-3693 doi:[10.1093/bioinformatics/btv421](http://bioinformatics.oxfordjournals.org/content/31/22/3691)_

A copy of the paper can be found at 

`/home/course_data/microbial_analysis_II/roary_paper.pdf`

Alternatively, visit the [Roary manual at http://sanger-pathogens.github.io/Roary/](http://sanger-pathogens.github.io/Roary/).

## Learning outcomes
By the end of this tutorial you can expect to be able to:

* Describe what a pangenome is
* Prepare data for input to Roary
* Run Roary to create a pangenome 
* Understand the different output files produced by Roary
* Draw a basic tree from the core gene alignment produced by Roary
* Query the pangenome results produced by Roary
* Use Phandango to visualise the results produced by Roary
* Generate a genome assembly

## Tutorial sections
This tutorial comprises the following sections:   
 1. [What is a pan genome](pan_genome.ipynb)   
 2. [Preparing the input data](prepare_data.ipynb)   
 3. [Performing QC on your data](qc.ipynb)   
 4. [Running Roary](run_roary.ipynb)   
 5. [Exploring the results](results.ipynb)   
 6. [Visualising the results with Phandango](phandango.ipynb)
 7. [Creating genome assemblies](assembly.ipynb)   


## Authors
This tutorial was created by [Sara Sjunnebo](https://github.com/ssjunnebo) and Jacqui Keane.

## Running commands in this tutorial
You can follow this tutorial by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, open a new terminal on your computer and type the command below:

    cd ~/course_data/microbial_analysis_II/data

Now you can follow the instructions in the tutorial from here.

## Let’s get started!
This tutorial requires Prokka, Roary and FastTree, as well as SPAdes (`spades.py`) and `assembly-stats`. All of these have already been installed on the virtual machine you are using for this training course. To activate the environment and check that the software is installed correctly, run the following commands:

    conda activate microbial-analysis-II
    prokka --help
    roary --help
    fasttree -h
    spades.py -h
    assembly-stats

This should return the help messages for all the software tools you will use in this tutorial.
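
If one of the help messages does not appear, it can help to see exactly which command is missing. The sketch below is one possible way to check, assuming a POSIX-compatible shell. `check_tools` is a hypothetical helper name introduced here, not part of any of the tools, and the tool names are simply the commands used above.

```shell
# check_tools: hypothetical helper that reports whether each named command is on the PATH.
# Returns 0 if every command was found and 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the commands used in this tutorial and print a hint if any are missing.
check_tools prokka roary fasttree spades.py assembly-stats \
    || echo "Some tools were not found; is the conda environment activated?"
```

A tool reported as missing usually means the `microbial-analysis-II` environment has not been activated in the current terminal.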

For more information on these tools, please see:

* The [Roary GitHub page (https://github.com/sanger-pathogens/roary)](https://github.com/sanger-pathogens/roary)
* The [Prokka GitHub page (https://github.com/tseemann/prokka)](https://github.com/tseemann/prokka) 
* The [FastTree webpage (http://www.microbesonline.org/fasttree/)](http://www.microbesonline.org/fasttree/)
* The [SPAdes GitHub page (https://github.com/ablab/spades)](https://github.com/ablab/spades)

To get started with the tutorial, go to the first section: [What is a pangenome?](pan_genome.ipynb)  