GenomicsAotearoa/hts_workshop_mpi

Overview

This is the opening page for the HTS workshop. Content is divided into teaching streams according to participant experience and proficiency.


Contents

  1. Introduction to NeSI

    1. Connecting to NeSI
    2. Navigation on the command line
    3. Working with files on the command line
  2. Quality filtering Illumina data

    1. Inspecting reads
    2. Trimming paired-end reads
  3. Quality filtering Nanopore data

    1. Inspecting reads
    2. Trimming long-read Nanopore data
  4. Annotating sequences with BLAST

    1. Submitting a BLAST job to NeSI
    2. Introduction to slurm
    3. Interpreting BLAST outputs
  5. Level 2 - Advanced

    1. Shell navigation (advanced)
    2. Loops and variables in the command line
    3. Redirection in the command line
    4. De novo assembly of sequencing data
    5. Polishing of genome assemblies
    6. Mapping reads to a reference
      1. Illumina mapping
      2. Nanopore mapping
      3. Filtering and sorting mapping files
      4. Summarising mapping statistics
    7. Annotation and classification refresher
      1. Gene prediction with prodigal
      2. Gene prediction with AUGUSTUS
      3. Annotation with diamond BLASTp
      4. Classification with kraken2

Getting started

The work covered in this training programme is run through the New Zealand eScience Infrastructure (NeSI) platform. Workshop participants are expected to have set up an account with the correct project access prior to attending this workshop. Please contact the workshop organisers to arrange access to the project accounts.

If you are new to this work, keep in mind that the glossary of terms and the slurm module guide will be helpful as we progress through the materials.


Background - Shell genomics

An introduction to the Unix shell for people working with genomics data. This material is adapted from the Data Carpentry Genomics Workshop. Please see http://www.datacarpentry.org/shell-genomics/ for the original version of this material.

A command line interface (the OS shell) and a graphical user interface (GUI) are different ways of interacting with a computer's operating system. The shell is a program that presents a command line interface, allowing you to control your computer with commands typed at a keyboard instead of operating a GUI with a mouse and keyboard.

There are quite a few reasons to start learning about the shell:

  • For most bioinformatics tools, you have to use the shell as there is no graphical interface.
  • The shell gives you power to do your work more efficiently and more quickly.
    • When you need to do things tens to hundreds of times, knowing how to use the shell is transformative.
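As an illustration of the repetition point above, here is a minimal sketch of a loop that counts the reads in every FASTQ file in a folder. It assumes uncompressed FASTQ files with exactly four lines per record; the function names `count_reads` and `summarise_dir` and the `reads/` folder are hypothetical and are not part of the workshop materials.

```bash
#!/usr/bin/env bash
# Sketch only: assumes uncompressed FASTQ files with four lines per read.

count_reads() {
    # A well-formed FASTQ record spans four lines, so reads = lines / 4.
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

summarise_dir() {
    # Print "file name <TAB> read count" for each .fastq file in a folder.
    local dir="$1"
    local fq
    for fq in "$dir"/*.fastq; do
        [ -e "$fq" ] || continue   # skip the loop body if nothing matched
        printf '%s\t%s\n' "$(basename "$fq")" "$(count_reads "$fq")"
    done
}

# Example call (hypothetical folder name):
# summarise_dir reads/
```

The same loop works unchanged whether the folder holds two files or two hundred, which is where the shell starts to save real time over a point-and-click workflow.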

Many of the exercises covered in this training programme are taken from, or inspired by, the Data Carpentry initiative, particularly their Genomics Workshop [1].


Background - Data used in training

This workshop provides a basic introduction to working with the slurm scheduling system, and begins working with Illumina MiSeq and Oxford Nanopore Technologies sequence data. The data used in this workshop are mostly simulated reads, produced using InSilicoSeq[^2] from the Mycoplasma bovis 8790 reference genome NZ_LAUS01000004.1. We also make use of publicly available sequencing data from the studies PRJNA813586, PRJEB38441, and PRJEB38523.

Additional teaching materials were sourced from:

  1. Genomics Aotearoa Metagenomics Summer School workshop [2].
  2. Long-Read, long reach Bioinformatics Tutorial [3].
  3. Galaxy Training! sequence analysis resources [4].

Citations

[^2] Hadrien Gourlé, Oskar Karlsson-Lindsjö, Juliette Hayer, Erik Bongcam-Rudloff (2019). Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics 35(3), 521-522.

Footnotes

  1. Erin Alison Becker, Anita Schürch, Tracy Teal, Sheldon John McKay, Jessica Elizabeth Mizzi, François Michonneau, et al. (2019, June). datacarpentry/shell-genomics: Data Carpentry: Introduction to the shell for genomics data, June 2019 (Version v2019.06.1). Zenodo. http://doi.org/10.5281/zenodo.3260560.

  2. Jian Sheng Boey, Dinindu Senanayake, Michael Hoggard et al. (2022). Metagenomics Summer School https://github.com/GenomicsAotearoa/metagenomics_summer_school.

  3. Tim Kahlke (2021). Long-Read Data Analysis https://timkahlke.github.io/LongRead_tutorials/.

  4. Joachim Wolff, Bérénice Batut, Helena Rasche (2023). Sequence Analysis (revision 96e01807afff10d6060ac0691d004f0469676534). https://training.galaxyproject.org/training-material/topics/sequence-analysis/.

About

Training materials for the 2023 HTS Plant Health and Environment Laboratory workshop
