
CBW 2021 PICRUSt2 Tutorial Answers

Morgan Langille edited this page Aug 27, 2021 · 8 revisions

Answers for the CBW 2021 PICRUSt2 tutorial presented as part of the 2021 Canadian Bioinformatics Workshop on microbiome data analysis.

  1. There are 36 samples. You can either count the rows of the metadata table or run wc -l input_files/picrust2_lab_metadata.tsv (the result includes the header line, so subtract one). Alternatively, you can get the number of columns in the sequence abundance table with this command:

    awk '{print NF}' input_files/ASV_abun.tsv | head -n 1

  2. You can count the sequences in a FASTA file by counting its header lines (i.e. lines that begin with ">"):

    grep -c "^>" input_files/ASVs.fna
    
  3. The sequence 70334369bb15777be95200a4edaa90be has the highest NSTI value.

  4. The genome represented by sequence 3bc9d66614c8c98d398ace7483422449 is predicted to have 6 copies of the 16S rRNA gene.

  5. The normalized abundance is 3.67, which means that there must have been 3 predicted copies of the marker gene (11 / 3 ≈ 3.67).

  6. The column sums wouldn't typically be equal since there are different numbers of gene families for each predicted genome. Remember that each predicted genome is based on each input ASV, which will be at variable relative abundances across your samples!

  7. PWY-6317 is galactose degradation I (Leloir pathway).
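
The counting commands from answers 1 and 2 can be tried on a small example. The sketch below is only illustrative: it builds two hypothetical toy files (demo_metadata.tsv and demo_ASVs.fna, which are not part of the tutorial data) in a temporary directory, then counts the samples without the header line and counts the FASTA header lines.

```shell
# Toy files for illustration only; the names and contents are hypothetical
# and do not come from the tutorial's input_files directory.
tmpdir=$(mktemp -d)

# A metadata table with one header line and three samples.
printf 'sample\tgroup\nS1\tA\nS2\tB\nS3\tA\n' > "$tmpdir/demo_metadata.tsv"

# A FASTA file with two sequences.
printf '>asv1\nACGT\n>asv2\nGGCC\n' > "$tmpdir/demo_ASVs.fna"

# Count samples: skip the header line, then count the remaining rows.
# tr strips the padding that some wc implementations add.
n_samples=$(tail -n +2 "$tmpdir/demo_metadata.tsv" | wc -l | tr -d '[:space:]')

# Count sequences: the ^ anchor restricts matches to lines starting with ">".
n_seqs=$(grep -c '^>' "$tmpdir/demo_ASVs.fna")

echo "samples: $n_samples"
echo "sequences: $n_seqs"

# Clean up the temporary files.
rm -r "$tmpdir"
```

To apply the same commands to the tutorial data, replace the toy paths with input_files/picrust2_lab_metadata.tsv and input_files/ASVs.fna.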
