This repository contains the slides and additional materials for the MMID Coding Workshop "Downloading and assembling microbial sequence data", held on January 26, 2022.
The workshop lecture video (January 26, 2022) is now available on YouTube at https://www.youtube.com/watch?v=qa0kqE8BIVQ
- MMID_CodingWorkshop-2022-01-26-AssemblingMicrobialSequence.pdf: The slides for the workshop.
- microbial-genome-assembly.ipynb: The Jupyter notebook containing the commands used to download and assemble a genome.
In addition to the slides, a tutorial is provided as an interactive Jupyter notebook.
The easiest way to use the Jupyter notebook above is to open it in GitHub (microbial-genome-assembly.ipynb) and copy/paste the commands it shows into a local terminal running Bash.
If you would instead like to follow along by launching the Jupyter notebook in a cloud-based environment, please click the link.
To run the Jupyter notebook locally, first make sure you have conda installed and configured with the bioconda channel. Then you can install Jupyter (and the software needed to run Bash in Jupyter) with:
conda create -c conda-forge -c bioconda -c defaults -n jupyterlab jupyterlab calysto_bash zip mamba
Then, to run this tutorial using Jupyter:
conda activate jupyterlab
# Only need to run this once to download workshop materials
git clone https://github.com/MMID-coding-workshop/2022-01-26-Downloading-and-assembling-microbial-sequence-data.git
jupyter lab
The jupyter lab command will print instructions for opening the running Jupyter application in your web browser. In the cloned repository, navigate to and open the file tutorial/microbial-genome-assembly.ipynb; you should now be able to run through it on your local machine.
The software needed for this tutorial is listed below:
- conda/bioconda: Used to install and manage software packages (bioconda is a conda channel for bioinformatics software).
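- sra-tools: Tools for working with data in the NCBI Sequence Read Archive (SRA). Install with:
  conda create -y -n sra-tools sra-tools
  - prefetch: Downloads sequencing runs (SRA files) from NCBI.
  - fasterq-dump: Converts SRA files to FASTQ format.
  - srapath: Prints the paths used to download SRA files.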
- gzip: Should already be installed on any standard Linux/Unix computer (though you can also install it with conda install gzip).
- fastp: Quality reports and filtering of sequence reads. Install with:
  conda create -y -n fastp fastp
- skesa: De novo assembly of bacterial genomes sequenced using Illumina. Install with:
  conda create -y -n skesa skesa
- Quast: Quality assessment of genome assemblies. Install with:
  conda create -y -n quast quast
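The tools above are typically chained into one workflow: download the reads, check and filter them, assemble them and then assess the assembly. Below is a minimal sketch of that chain as a Bash function. It assumes paired-end Illumina reads, the separate conda environments created by the commands above, and conda run (available in recent conda versions) to call each tool without activating its environment. It also assumes that fasterq-dump finds the run downloaded by prefetch in the current directory. The function name assemble_run, the file names and the choice of 4 cores are hypothetical; the workshop notebook may use different options.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: download -> QC -> assembly -> assessment for one SRA run.
# Assumes the conda environments sra-tools, fastp, skesa and quast created above
# and paired-end reads. Not the workshop notebook's exact commands.

assemble_run() {
    local acc="$1"              # SRA run accession
    local out="${2:-$1}"        # output directory (defaults to the accession)
    mkdir -p "$out"
    (
        # Run in a subshell so that 'set -eu' and 'cd' do not affect
        # the interactive terminal this is pasted into.
        set -eu
        cd "$out"

        # 1. Download the run, then convert it to FASTQ
        #    (<acc>_1.fastq and <acc>_2.fastq for paired-end data).
        conda run -n sra-tools prefetch "$acc"
        conda run -n sra-tools fasterq-dump "$acc"
        gzip -f "${acc}_1.fastq" "${acc}_2.fastq"

        # 2. Quality report and filtering of the reads with fastp.
        conda run -n fastp fastp \
            -i "${acc}_1.fastq.gz" -I "${acc}_2.fastq.gz" \
            -o "${acc}_1.trimmed.fastq.gz" -O "${acc}_2.trimmed.fastq.gz" \
            --html fastp.html --json fastp.json

        # 3. De novo assembly with SKESA; a comma-separated pair of files
        #    is treated as paired reads. Contigs go to contigs.fasta.
        conda run -n skesa skesa \
            --reads "${acc}_1.trimmed.fastq.gz,${acc}_2.trimmed.fastq.gz" \
            --cores 4 --contigs_out contigs.fasta

        # 4. Assess the assembly with QUAST (report written to ./quast).
        conda run -n quast quast.py contigs.fasta -o quast
    )
}
```

Calling assemble_run with a run accession (and optionally an output directory name) should leave the filtered reads, the fastp report, contigs.fasta and a quast/ report directory inside the output directory. Running everything in a subshell keeps the commands safe to paste into an interactive terminal, since a failing step stops the function without closing the terminal.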
Some other software may be demonstrated in the tutorial, but it is not necessary for performing a genome assembly.