Microbial Informatics 2014 Labs

Welcome to the Microbial Informatics 2014 labs. This page contains a number of tutorials on performing data analysis on whole genome sequencing data for the Microbial Informatics workshop hosted at the National Microbiology Laboratory in Winnipeg, Canada. These labs can be accessed online at


The data for these labs is a set of whole genome sequencing data from a number of V. cholerae strains from the outbreak of cholera in Haiti beginning in 2010, as well as a number of other V. cholerae strains included for comparison. This data was previously published in and and is available on NCBI's Sequence Read Archive. A table of the specific data used within this lab is given below.

| Strain | Location | Year | NCBI Accession |
|---|---|---|---|
| 2010EL-1786 | Haiti | 2010 | NC_016445.1, NC_016446.1 |
| 2010EL-1749 | Cameroon | 2010 | SRR773655 |
| 2010EL-1796 | Haiti | 2010 | SRR771582 |
| 2010EL-1798 | Haiti | 2010 | SRR074109 |
| 2011EL-2317 | Haiti | 2011 | SRR773175 |
| 2012V-1001 | United States | 2011 | SRR892331 |
| 3554-08 | Nepal | 2008 | SRR774919 |
| C6706 | Peru | 1991 | SRR774920 |
| VC-1 | Banke district, Nepalgunj municipality | 2010 | SRR308665 |
| VC-10 | Banke district, Nepalgunj municipality | 2010 | SRR308707 |
| VC-14 | Banke district, Nepalgunj municipality | 2010 | SRR308715 |
| VC-15 | Dang Deokhuri district, Narayanpur VDC | 2010 | SRR308716 |
| VC-18 | Banke district, Nepalgunj municipality | 2010 | SRR308721 |
| VC-19 | Kathmandu district, Kathmandu city | 2010 | SRR308722 |
| VC-25 | Rupandehi district, Butawal municipality | 2010 | SRR308726 |
| VC-26 | Rupandehi district, Butawal municipality | 2010 | SRR308727 |
| VC-6 | Banke district, Nepalgunj municipality | 2010 | SRR308703 |

These labs will go through data analysis on the above strains. We will not reproduce the exact types of figures from the publications, but the labs should help you get started working with microbial whole genome sequence data.
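The labs ship their own copy of the data, so fetching reads from the Sequence Read Archive is not required. For readers who want the raw runs anyway, the following is a minimal sketch, assuming the sra-tools package (which provides `prefetch` and `fasterq-dump`) is installed. The `fetch_run` helper, the `reads` output directory, and the `DRY_RUN` switch are illustrative names and are not part of the labs. By default the script only prints the commands it would run.

```shell
# SRA run accessions copied from the table of strains. The 2010EL-1786
# entry uses NC_ accessions, which are not SRA runs, so it is left out.
accessions="SRR773655 SRR771582 SRR074109 SRR773175 SRR892331 SRR774919
SRR774920 SRR308665 SRR308707 SRR308715 SRR308716 SRR308721 SRR308722
SRR308726 SRR308727 SRR308703"

# Assumption: DRY_RUN=1 (the default in this sketch) prints the commands
# instead of running them; set DRY_RUN=0 to actually download.
DRY_RUN="${DRY_RUN:-1}"

# fetch_run ACCESSION [OUTDIR]: hypothetical helper that downloads one run
# and converts it to FASTQ files in OUTDIR (default: reads).
fetch_run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "prefetch $1"
    echo "fasterq-dump --split-files --outdir ${2:-reads} $1"
  else
    prefetch "$1" && fasterq-dump --split-files --outdir "${2:-reads}" "$1"
  fi
}

for run in $accessions; do
  fetch_run "$run"
done
```

Running with `DRY_RUN=0` performs the downloads. The full set of runs is considerably larger than the prepared course data.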

These labs assume that you are familiar with working in a Linux environment from the command line.

Running the Labs

Virtual Machine

July 31, 2020: Note the virtual machines are no longer available.

All necessary software to run these labs is provided in the form of a customized Ubuntu virtual machine. You will need to install software such as Oracle VirtualBox in order to run the virtual machine. Please see the Workshop Software instructions for more details.


The data for these labs is provided separately in the file microbial-informatics-2014-data.tar.bz2 and can be downloaded from This is approximately 1.1 GB. Please download this file from within the Virtual Machine. Once downloaded, the data can be extracted to a directory, Course/, with the following command.

$ tar -xvvjf microbial-informatics-2014-data.tar.bz2

For the remainder of these labs, please replace any references to /Course with the directory that was just extracted. For example, if the files were extracted within the Downloads directory and a command is given to copy files from /Course, please copy the files from ~/Downloads/Course instead.
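One way to keep this substitution consistent is to store the extracted location in a variable and translate lab paths through a small helper. This is only a sketch: the `COURSE_DIR` variable and the `course_path` function are hypothetical names, not something the labs define, and the default assumes the archive was extracted in ~/Downloads.

```shell
# Assumption: the archive was extracted in ~/Downloads. Override COURSE_DIR
# if the Course/ directory lives somewhere else.
COURSE_DIR="${COURSE_DIR:-$HOME/Downloads/Course}"

# course_path PATH: hypothetical helper that rewrites a lab path starting
# with /Course so that it points into COURSE_DIR. Other paths are unchanged.
course_path() {
  case "$1" in
    /Course)   printf '%s\n' "$COURSE_DIR" ;;
    /Course/*) printf '%s/%s\n' "$COURSE_DIR" "${1#/Course/}" ;;
    *)         printf '%s\n' "$1" ;;
  esac
}

# Print where the labs' /Course directory is on this machine.
course_path /Course
```

A lab command that copies files from a location under /Course can then be written with `"$(course_path ...)"` wrapped around the path as it appears in the instructions.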


Once the virtual machine is running and the data is downloaded, the instructions for these labs can be obtained by running the following.

$ git clone

This will copy all the instructions and other needed files to a directory, microbial-informatics-2014/.


| Day 6: May 14, 2014 | Day 7: May 15, 2014 |
|---|---|
| 8:45-10:15 am: Ortholog detection with OrthoMCL | 12:30-2:00 pm: Whole Genome SNP Phylogenomics |
| 10:30-12:15 pm: Working with GView Server | 2:15-3:15 pm: Feature Frequency Profile Phylogenies |
| 3:00-4:45 pm: Minimum Spanning Trees with PHYLOViZ | |







