
Repository for January 26, 2022 "Downloading and assembling microbial sequence data" Workshop by Aaron Petkau


MMID-coding-workshop/2022-01-26-Downloading-and-assembling-microbial-sequence-data


Downloading and assembling microbial sequence data

Binder

This repository contains the slides and additional materials for the Downloading and assembling microbial sequence data MMID Coding workshop for January 26, 2022.

The workshop lecture video (January 26, 2022) is now available on YouTube at https://www.youtube.com/watch?v=qa0kqE8BIVQ

1. Workshop material

2. Workshop tutorial

In addition to the slides, a tutorial is provided as an interactive Jupyter notebook.

2.1. Run commands in local terminal

The easiest way to use the Jupyter notebook above is to open it in GitHub (microbial-genome-assembly.ipynb) and then copy/paste the commands shown into a local terminal running Bash.

copy-paste-terminal.png

2.2. Run in a cloud-based environment

If you instead wish to launch the Jupyter notebook in a cloud-based environment to follow along, please click the Binder link.

2.3. Run on a local environment

To run the Jupyter notebook in a local terminal, first make sure you have conda/bioconda installed. Then you can install Jupyter (and the software necessary to run Bash in Jupyter) with:

conda create -c conda-forge -c bioconda -c defaults -n jupyterlab jupyterlab calysto_bash zip mamba
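Before running the command above, it may help to confirm that conda is actually available on your PATH. The following is a minimal sketch, not part of the workshop materials; the helper name `have` is made up for illustration.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have conda; then
    # Print the installed conda version as a quick sanity check.
    conda --version
else
    echo "conda not found; install conda/bioconda first" >&2
fi
```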

Now, to run this tutorial using Jupyter, please do:

conda activate jupyterlab

# Only need to run this once to download workshop materials
git clone https://github.com/MMID-coding-workshop/2022-01-26-Downloading-and-assembling-microbial-sequence-data.git

jupyter lab

The jupyter lab command should print instructions (including a URL) for opening the running Jupyter application in your web browser. Within the cloned repository, navigate to and open the file tutorial/microbial-genome-assembly.ipynb and you should now be able to run through it on your local machine.
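If the notebook does not show up in the Jupyter file browser, one possible check is whether the clone landed where you expect. The sketch below assumes you run it from the directory where you ran git clone, and that the clone used the default directory name; the function name `notebook_present` is hypothetical.

```shell
# Hypothetical helper: succeed if the tutorial notebook exists under the
# given clone directory.
notebook_present() {
    [ -f "$1/tutorial/microbial-genome-assembly.ipynb" ]
}

# Default directory name created by the git clone command above.
repo=2022-01-26-Downloading-and-assembling-microbial-sequence-data

if notebook_present "$repo"; then
    echo "Notebook found: $repo/tutorial/microbial-genome-assembly.ipynb"
else
    echo "Notebook not found; run this from the directory where you ran git clone" >&2
fi
```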

3. Software used

A list of the necessary software used for this tutorial is given below:

  • conda/bioconda: Software to install and manage software packages.
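  • sra-tools: Tools for working with data from the NCBI Sequence Read Archive (SRA). conda create -y -n sra-tools sra-tools
    • prefetch: Downloads sequence data (SRA files) from NCBI.
    • fasterq-dump: Converts downloaded SRA files to fastq format.
    • srapath: Prints the paths from which SRA files can be downloaded.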
  • gzip: Should come installed on any standard Linux/Unix computer (though you can install with conda install gzip)
  • fastp: Quality reports and filtering of sequence reads. conda create -y -n fastp fastp
  • skesa: De novo assembly of bacterial genomes sequenced using Illumina. conda create -y -n skesa skesa
  • Quast: Quality assessment of genome assemblies. conda create -y -n quast quast

Some other software may be demonstrated in the tutorial, but it is not necessary for performing a genome assembly.
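As a rough illustration of how the tools listed above might fit together, the sketch below chains them for a single paired-end Illumina run. It is not taken from the tutorial notebook. It assumes the conda environments were created with the names shown in the list, that the run is paired-end (so fasterq-dump writes `_1.fastq` and `_2.fastq` files), and that your skesa version accepts `--reads`. The function names `run` and `assemble`, the output file names, and the `DRY_RUN` variable are all made up for this example.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical sketch: download, clean, assemble, and assess one SRA run.
# The body runs in a subshell so that "set -e" stops at the first failure
# without affecting the calling shell.
assemble() (
    set -e
    acc="$1"   # SRA run accession, supplied by the caller

    # Download the run, then convert it to paired fastq files.
    run conda run -n sra-tools prefetch "$acc"
    run conda run -n sra-tools fasterq-dump --split-files "$acc"

    # Compress the reads to save disk space.
    run gzip "${acc}_1.fastq" "${acc}_2.fastq"

    # Quality filtering plus HTML/JSON reports.
    run conda run -n fastp fastp \
        --in1 "${acc}_1.fastq.gz" --in2 "${acc}_2.fastq.gz" \
        --out1 "${acc}_1.clean.fastq.gz" --out2 "${acc}_2.clean.fastq.gz" \
        --html "${acc}.fastp.html" --json "${acc}.fastp.json"

    # De novo assembly of the cleaned reads.
    run conda run -n skesa skesa \
        --reads "${acc}_1.clean.fastq.gz,${acc}_2.clean.fastq.gz" \
        --contigs_out "${acc}.contigs.fasta"

    # Assembly quality report.
    run conda run -n quast quast.py -o "${acc}_quast" "${acc}.contigs.fasta"
)
```

To preview the commands without running anything, set `DRY_RUN=1` in the shell and call `assemble` with the accession you are interested in. The exact commands and options used in the workshop are in the tutorial notebook, which should be treated as the reference.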
