Data, Code and Workflows Guideline

The purpose of each directory in the repository is as follows:

cache: Here, we store intermediate datasets and results that are generated during the workflow steps.
graphs: The graphs/figures produced during the analysis.
input: Here, we store the raw input data.
lib: The functions used within the workflow.
output: The final output results of the workflow.
workflow: Step by step pipeline.
README: This document
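
For a new project that follows the same layout, the directories can be created in one pass. This is only a sketch: `PROJECT_DIR` is a hypothetical variable, not something the repository defines, and it defaults to a temporary directory so the snippet is safe to run anywhere.

```shell
#!/usr/bin/env bash
set -eu

# PROJECT_DIR is an assumed name; point it at your own project root.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)}"

# Directory names are taken from the list above.
for d in cache graphs input lib output workflow; do
  mkdir -p "${PROJECT_DIR}/${d}"
done

# Show the resulting layout.
ls -1 "${PROJECT_DIR}"
```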

Overview of the workflow: Processing meta-amplicon sequence data

This is an overview of the major workflow steps when processing meta-amplicon reads.

Installation

Running environment:
- The workflow was developed on Linux (Ubuntu 18.04.5 LTS).
Required software and versions:
cutadapt (Martin, 2011; version 2.10)
ITSxpress (Rivers et al., 2018; version 1.0)
R (R Core Team, 2017; version 3.6.3)
tidyverse (Wickham et al., 2019; version 1.3.0)
DADA2 (Callahan et al., 2016; version 1.14.1)
decontam (Davis et al., 2018; version 1.6.0)
phyloseq (McMurdie & Holmes, 2013; version 1.30.0)
Biostrings (Pagès et al., 2021; version 2.54.0)
phangorn (Schliep, 2011; version 2.2.5)
msa (Bodenhofer et al., 2015; version 1.18.0)
ShortRead (Morgan et al., 2009; version 1.44.3)
corncob (B. D. Martin et al., 2020; version 0.2.0)
vegan (Oksanen et al., 2016; version 2.6.0)
patchwork (Pedersen, 2020; version 1.0.1)
RStudio 1.2.5033
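
Before running the workflow, it can help to confirm that the command-line tools are on the `PATH` and to print the versions actually installed. The sketch below assumes the executables are named `cutadapt`, `itsxpress`, and `Rscript`, and that the DADA2 R package is installed under the name `dada2`. The helper `check_tool` is a made-up name, and the script reports missing tools rather than failing.

```shell
#!/usr/bin/env bash
set -u

# Report whether a tool is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

for tool in cutadapt itsxpress Rscript; do
  check_tool "$tool"
done

# Print versions where the tool is available (the list above names cutadapt 2.10 and R 3.6.3).
if command -v cutadapt >/dev/null 2>&1; then
  cutadapt --version
fi
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'cat(R.version.string, "\n")'
  Rscript -e 'cat("dada2", as.character(packageVersion("dada2")), "\n")' \
    || echo "dada2 is not installed"
fi
```

The same `packageVersion()` call can be repeated for the other R packages in the list.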

Input Data

The example data are paired-end FASTQ files generated on the Illumina sequencing platform. They are derived from bacterial 16S amplicon reads taken from seagrass (Halophila ovalis). This workflow can be easily adapted to work with fungal ITS amplicon data as well; steps that differ are noted in the workflow image and in the manuscript. Input data files are compressed with gzip to save space. All of the tools in this workflow can process gzip-compressed FASTQ files.

Example R1 FASTQ file: input/2554_pass_1.fastq.gz
Example R2 FASTQ file: input/2554_pass_2.fastq.gz

Each entry in a FASTQ file consists of 4 lines:

A sequence identifier with information about the sequencing run and the cluster. The exact contents of this line vary based on the BCL to FASTQ conversion software used.
The sequence (the base calls; A, C, T, G and N).
A separator, which is simply a plus (+) sign.
The base call quality scores. These are Phred +33 encoded, using ASCII characters to represent the numerical quality scores.
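
The four-line structure described above lends itself to a quick sanity check. The following is a sketch, not part of the workflow scripts: the function name `check_fastq` is made up for illustration, and it assumes unwrapped records (exactly four lines each), which is what Illumina FASTQ files normally contain.

```shell
#!/usr/bin/env bash
set -eu

# Count records and verify the 4-line layout of a gzip-compressed FASTQ file.
# Prints the record count on success; returns non-zero at the first malformed record.
check_fastq() {
  gzip -dc < "$1" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad identifier at line " NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "bad separator at line " NR; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { print "length mismatch at line " NR; bad = 1; exit }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { print "truncated record"; exit 1 }
      print NR / 4
    }'
}

# Self-contained demo on a tiny two-record file.
demo="$(mktemp)"
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nFFFF\n' | gzip -c > "$demo"
check_fastq "$demo"
# → 2
```

On the example data this would be called as `check_fastq input/2554_pass_1.fastq.gz`. For paired-end data, R1 and R2 should report the same record count.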

The first entry of the input data:

@SRR11142554.8310.1 8310 length=301
TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGGTGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGTATAGCTAGAGTACGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGGCCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGAGTAGAAACCCCAGTAGTCCGGCTGACGGACGAACCGTTCCAGAAACA
+
CCCCCGGGGCFECGGDGFGDGCFCCFFDGGGGCDDGD@CEGGGDFGGFFDFGGDC88CCFEFGDDEDD@88CCFFFGFFGD>D=B>A8<=<DDFDC8D?D?D<=,@,=FEECFFFFFGCFGFEEGF@?BEGGGFEGGEFGGDFFGFGFEDBBFGEFDE@CDFCFFFEFGDGFGCG79CE>ACFFGFDGEDD?@BFGGGGGF4;3=56<:FBF<GFFFFBFF?:?:>963.5349?FF6346CE?9>?<:-+08/+,,<<:@@C>*22.9.244:)((283(34((-4(-,(0(*..54)(.
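
To make the Phred+33 encoding concrete: the ASCII code of each quality character minus 33 gives the score Q, and the estimated error probability for that base is 10^(-Q/10). Below is a small sketch using standard POSIX tools (`od`, `tr`, `awk`); the helper name `phred33_decode` is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Convert a Phred+33 quality string into numeric scores, one per line.
phred33_decode() {
  printf '%s' "$1" | od -An -v -tu1 | tr -s ' \n' '\n' | awk 'NF { print $1 - 33 }'
}

# Three characters that occur in the example record above:
# "C" is ASCII 67, "8" is ASCII 56, "(" is ASCII 40.
phred33_decode 'C8(' | paste -sd' ' -
# → 34 23 7
```

The record above starts at Q34 and ends with many characters such as `(` (Q7), which shows the usual decline in quality toward the 3' end of a read.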

Major steps

Step 1: pre-processing reads

workflow/00_Remove_Primers.R runs cutadapt to remove primers from 16S and ITS reads.
workflow/00_Trim_ITS_Region.R is used only when processing fungal ITS reads; it runs itsxpress to trim conserved regions from the ends of amplicons.

Rscript workflow/00_Remove_Primers.R
Rscript workflow/00_Trim_ITS_Region.R # do not run if using bacterial 16S amplicons
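
For readers who want to see what a primer-removal call looks like outside of R, here is a hedged sketch of a paired-end cutadapt command. It is not taken from `workflow/00_Remove_Primers.R`. The primer sequences (the widely used 515F/806R pair for the 16S V4 region), the `cache/` output paths, and the function name `build_cutadapt_cmd` are all assumptions for illustration, so substitute the primers from your own library preparation. The script only prints the command unless the hypothetical `RUN` variable is set to 1.

```shell
#!/usr/bin/env bash
set -eu

# Assumed primers (515F / 806R); replace with the primers used in your study.
FWD_PRIMER="${FWD_PRIMER:-GTGYCAGCMGCCGCGGTAA}"
REV_PRIMER="${REV_PRIMER:-GGACTACNVGGGTWTCTAAT}"

# Print a paired-end cutadapt command for one sample ($1 is the sample ID).
# -g / -G give the 5' primers expected on R1 / R2; output paths are hypothetical.
build_cutadapt_cmd() {
  echo cutadapt \
    -g "$FWD_PRIMER" -G "$REV_PRIMER" \
    --discard-untrimmed \
    -o "cache/${1}_trimmed_1.fastq.gz" \
    -p "cache/${1}_trimmed_2.fastq.gz" \
    "input/${1}_pass_1.fastq.gz" "input/${1}_pass_2.fastq.gz"
}

cmd="$(build_cutadapt_cmd 2554)"
echo "$cmd"

# Execute only when explicitly requested and cutadapt is installed.
if [ "${RUN:-0}" = "1" ] && command -v cutadapt >/dev/null 2>&1; then
  eval "$cmd"
fi
```

Printing the command first makes it easy to check the primers and file names before any files are written.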

Step 2: build an ASV table from input files

workflow/01_Process_Raw_16S_Reads.R carries out the DADA2 pipeline for error correction, variant calling, ASV table generation, and taxonomic assignment

Rscript workflow/01_Process_Raw_16S_Reads.R # can be run on ITS reads as well with minimal changes (noted in code)

Step 3: build a phylogeny

workflow/02_Build_and_add_Phylogeny.R estimates a phylogenetic tree of your bacterial 16S variants and adds it to your ASV table

Rscript workflow/02_Build_and_add_Phylogeny.R # do NOT run if using fungal ITS amplicons

Step 4: cleaning and exploring your data set

workflow/03_Clean_and_Explore_Data.R performs a quick cleanup to remove non-target sequences (such as chloroplast DNA).
workflow/04_Explore_and_Test_Hypotheses.R is an example of how to use the phyloseq package to explore your ASV table and perform common hypothesis tests (e.g., PermANOVA, differential abundance testing).

Rscript workflow/03_Clean_and_Explore_Data.R
Rscript workflow/04_Explore_and_Test_Hypotheses.R
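
The four steps can be chained in a small driver script. The sketch below encodes only what is stated above: `00_Trim_ITS_Region.R` is skipped for bacterial 16S data and `02_Build_and_add_Phylogeny.R` is skipped for fungal ITS data. The `AMPLICON` and `DRY_RUN` variables and the function name `workflow_steps` are hypothetical conveniences, not part of the repository. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -eu

# List the workflow scripts to run for a given amplicon type ("16S" or "ITS").
workflow_steps() {
  echo "workflow/00_Remove_Primers.R"
  if [ "$1" = "ITS" ]; then
    echo "workflow/00_Trim_ITS_Region.R"         # fungal ITS only
  fi
  echo "workflow/01_Process_Raw_16S_Reads.R"
  if [ "$1" = "16S" ]; then
    echo "workflow/02_Build_and_add_Phylogeny.R" # bacterial 16S only
  fi
  echo "workflow/03_Clean_and_Explore_Data.R"
  echo "workflow/04_Explore_and_Test_Hypotheses.R"
}

AMPLICON="${AMPLICON:-16S}"
DRY_RUN="${DRY_RUN:-1}"

for script in $(workflow_steps "$AMPLICON"); do
  if [ "$DRY_RUN" = "1" ]; then
    echo "Rscript $script"
  else
    Rscript "$script"   # set -e stops the run at the first failing step
  fi
done
```

Run from the repository root with, for example, `AMPLICON=16S DRY_RUN=0`. For ITS data, remember that `01_Process_Raw_16S_Reads.R` still needs the minimal changes noted in its code.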

Expected results

License

This is free and open-source software, licensed under (choose a license from the suggested list: GPLv3, MIT, or CC BY 4.0).

Bio-protocol/metaamplicon-recipe

Folders and files

Latest commit

History

Repository files navigation

Data, Code and Workflows Guideline

Overview of the workflow: Processing meta-amplicon sequence data

Installation

Input Data

Major steps

Step 1: pre-processing reads

Step 2: build an ASV table from input files

Step 3: build a phylogeny

Step 4: cleaning and exploring your data set

Expected results

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages