
SionBayliss/Bio-Courses


Bio-Courses

Introduction

This page compiles links to tutorials, written by numerous authors, covering many of the steps involved in whole genome sequence (WGS) analysis of prokaryotic organisms. Some of these steps involve concepts that apply generally to whole genome sequencing of other organisms (e.g. read QC), although in many cases the recommended software would be different. The first step for any aspiring bioinformatician, at any level, is to build up familiarity with the Linux command line, which provides access to powerful and flexible tools and applications.

Disclaimer

The links and tutorials listed below were not written, and are not owned, by the author of this page unless explicitly noted. We take no responsibility for their maintenance or accuracy.

Content

  1. Linux command line
  2. Programming
    1. Python
    2. Perl
    3. R
  3. Core Concepts in WGS
    1. Whole Genome Sequencing (WGS)
    2. Library Preparation
    3. Sequencing Technology
    4. Coverage
  4. Sequencing Reads
    1. Short Reads
    2. Long Reads
    3. Read QC
  5. Mapping and Variant Calling
  6. Assembly
  7. Assembly QC
  8. Annotation
  9. Phylogenomics
  10. Pangenomics
  11. K-mer and related
  12. Databases
    1. NCBI
    2. ENA
    3. BIGSdb
    4. Enterobase
  13. Servers
    1. EDGE

Command-line tutorials

Familiarity with the Linux command line is usually the first step for budding informaticians. Many tools are designed or distributed only for Linux-based systems. In addition, powerful command-line operations, such as iterating over batches of files, can dramatically reduce manual effort and simplify workflows.

Programming

Picking up a programming language allows an informatician to be more flexible in how they approach analysis workflows. Scripts can automate many complex tasks in a more bespoke way than loops on the command line. There are some excellent tutorials online for many languages. Python is widely considered the most popular language for bioinformatics, with Perl a (debatably) close second. R is often used to perform advanced statistical analyses and to produce publication-quality figures.
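As an illustration of a task that is easier to script than to repeat by hand, the minimal Python sketch below walks a directory of FASTQ files (plain or gzip-compressed) and reports the number of reads and bases in each. It assumes well-formed four-line FASTQ records and files named `*.fastq` or `*.fastq.gz`; the function names are hypothetical and chosen for illustration only.

```python
import gzip
import sys
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_stats(path):
    """Return (number of reads, total bases) for one FASTQ file.

    Assumes the standard four-line record layout:
    header, sequence, '+' separator, quality string.
    """
    n_reads = 0
    n_bases = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record holds the sequence
                n_reads += 1
                n_bases += len(line.strip())
    return n_reads, n_bases


def summarise_directory(directory, pattern="*.fastq*"):
    """Apply fastq_stats to every matching file in a directory."""
    return {path.name: fastq_stats(path)
            for path in sorted(Path(directory).glob(pattern))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one tab-separated line per file: name, reads, bases, mean length.
    for name, (reads, bases) in summarise_directory(sys.argv[1]).items():
        mean_len = bases / reads if reads else 0.0
        print(f"{name}\t{reads}\t{bases}\t{mean_len:.1f}")
```

Saved as, say, `fastq_summary.py` (a hypothetical name) and run as `python fastq_summary.py reads/`, it prints one summary line per file, the kind of per-sample overview that would otherwise need a loop of separate commands.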

Perl

Python

R

  • R for beginners – a basic introduction to R and statistical analysis.
  • ggplot2 tutorial – ggplot2 is a flexible and powerful R package (with a large family of extension packages) for creating figures using the grammar of graphics.

Core Concepts in WGS

Whole genome sequencing

Library preparation

Sequencing technology

Coverage

Sequence coverage, or depth of coverage, is the number of reads that cover a given base in the target genome. For example, 30x coverage means that, on average, each base in your sample is covered by 30 reads.
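Average depth can be estimated before sequencing with the relationship C = N × L / G, where N is the number of reads, L the read length and G the genome size (for paired-end data each mate counts as a read). The Python sketch below implements that estimate, plus a per-base depth calculation from read alignment coordinates. The function names and the 0-based, half-open coordinate convention are assumptions made for this example.

```python
import math


def expected_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage, C = (N * L) / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads whose expected coverage reaches the target."""
    return math.ceil(target_coverage * genome_size / read_length)


def per_base_depth(alignments, genome_size):
    """Depth at every base from (start, end) read alignments.

    Coordinates are 0-based and half-open, so (0, 4) covers bases 0-3.
    Uses a difference array: +1 where a read starts, -1 where it ends.
    """
    diff = [0] * (genome_size + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for i in range(genome_size):
        running += diff[i]
        depth.append(running)
    return depth
```

For instance, 1,000,000 reads of 150 bp on a 5 Mb genome give `expected_coverage(1_000_000, 150, 5_000_000) == 30.0`, i.e. 30x. In practice depth is uneven along the genome, which is why the per-base view is also useful: its mean matches the estimate, but individual positions can fall well below it.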

Sequencing Reads

Short Reads

Long Reads

Read QC

  • FastQC – an introduction to FastQC, a tool for assessing multiple read quality metrics.
  • Trimmomatic manual – a tool for trimming reads and removing adapter sequences.
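Both tools work from the per-base quality scores stored in FASTQ files. These are usually encoded as Phred+33 characters, where a score Q corresponds to an error probability of 10^(-Q/10). The Python sketch below decodes a quality string and applies a simplified sliding-window trim in the spirit of Trimmomatic's SLIDINGWINDOW step. It is not Trimmomatic's exact algorithm, and the window size and quality threshold defaults are illustrative values only.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def sliding_window_trim(seq, quality, window=4, min_mean_q=15, offset=33):
    """Simplified sliding-window trim (not Trimmomatic's exact algorithm).

    Scans from the 5' end and cuts the read at the start of the first
    window whose mean Phred score drops below min_mean_q. Reads shorter
    than the window are returned unchanged.
    """
    scores = phred_scores(quality, offset)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality
```

For example, `"I"` decodes to Q40 (a 1 in 10,000 chance of error) and `"#"` to Q2, so a read ending in a run of `#` characters is cut just before the window where those low scores drag the mean below the threshold.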

Mapping and Variant Calling

  • snippy – a tool for mapping reads (BWA) and calling variants (FreeBayes).
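At its simplest, variant calling compares the bases observed in reads aligned over each reference position with the reference base. The toy haploid caller below sketches that idea only. Real tools such as snippy rely on a dedicated caller with a statistical model, whereas this example just applies a minimum read depth and a minimum fraction of reads supporting the alternative base. The thresholds, the function names and the input format (a mapping from 0-based position to the string of observed bases) are all hypothetical.

```python
from collections import Counter


def call_site(ref_base, observed_bases, min_depth=10, min_fraction=0.9):
    """Toy haploid call at a single reference position.

    observed_bases: the bases seen at this position across aligned reads.
    Returns the alternative base if it passes both filters, otherwise None.
    """
    depth = len(observed_bases)
    if depth < min_depth:
        return None  # too few reads to make a call
    counts = Counter(base.upper() for base in observed_bases)
    top_base, top_count = counts.most_common(1)[0]
    if top_base != ref_base.upper() and top_count / depth >= min_fraction:
        return top_base
    return None


def call_variants(reference, pileup, **filters):
    """Return {position: (ref, alt)} for every position with a passing call.

    pileup maps 0-based reference positions to strings of observed bases.
    """
    calls = {}
    for pos, observed in sorted(pileup.items()):
        alt = call_site(reference[pos], observed, **filters)
        if alt is not None:
            calls[pos] = (reference[pos], alt)
    return calls
```

Positions with low depth or mixed evidence are left uncalled rather than guessed. The sketch ignores insertions, deletions and base qualities, all of which a production pipeline has to handle.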

Assembly

Assembly QC

Annotation

Phylogenomics

Pangenomics

K-mer and related

Databases

NCBI

ENA

BIGSdb

Enterobase

Servers

EDGE

About

Tutorials for bacterial WGS
