### Parkinson's Mouse Tutorial - Import & Demux

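This notebook was written for the `qiime2-2022.11` environment. The outputs shown below come from a 2024.5 development build (see the `qiime info` output), so version-specific details may differ.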

We'll be working through the [pd-mouse tutorial](https://docs.qiime2.org/2022.11/tutorials/pd-mice/).

*Note: did you run `jupyter serverextension enable --py qiime2 --sys-prefix` before getting here?*

Also, see the [Jupyter Markdown documentation](https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb).

In [1]:
from os import getcwd, listdir, chdir, mkdir
import qiime2 as q2

In [2]:
! qiime info

System versions
Python version: 3.8.18
QIIME 2 release: 2024.5
QIIME 2 version: 2024.5.0.dev0+1.g6306962
q2cli version: 2024.5.0.dev0

Installed plugins
alignment: 2024.5.0.dev0+1.g3a9c58b
composition: 2024.5.0.dev0
cutadapt: 2024.5.0.dev0
dada2: 2024.5.0.dev0
deblur: 2024.5.0.dev0
demux: 2024.5.0.dev0
diversity: 2024.5.0.dev0+1.g99a0cca
diversity-lib: 2024.5.0.dev0
emperor: 2024.5.0.dev0
feature-classifier: 2024.5.0.dev0
feature-table: 2024.5.0.dev0+2.g65222bd
fragment-insertion: 2024.5.0.dev0
longitudinal: 2024.5.0.dev0
metadata: 2024.5.0.dev0
phylogeny: 2024.5.0.dev0
quality-control: 2024.5.0.dev0
quality-filter: 2024.5.0.dev0
rescript: 2024.5.0.dev0+2.ga0df425
sample-classifier: 2024.5.0.dev0
taxa: 2024.5.0.dev0
types: 2024.5.0.dev0+4.g823b5a4
vsearch: 2024.5.0.dev0

Application config directory
/Users/fatimamubeenshaik/miniconda3/envs/qiime2-dev/var/q2cli

Getting help
To get help with QIIME 2, visit https://qiime2.org


In [3]:
getcwd()

'/Users/fatimamubeenshaik/IdeaProjects/ParkinsonMouseTrail/src/main'

In [4]:
listdir()

['02-Parkinson-Mouse-Tutorial-Taxonomy-Phylogeny (2).ipynb',
 '04-Parkinson-Mouse-Tutorial-Diff-Abund (2).ipynb',
 '.DS_Store',
 '01-Parkinson-Mouse-Tutorial-Import-Demux (2).ipynb',
 '.ipynb_checkpoints',
 '03-Parkinson-Mouse-Tutorial-Diversity.ipynb',
 'processed']

In [5]:
# exist_ok=True avoids a FileExistsError when the directory is already there
from os import makedirs
makedirs('./processed', exist_ok=True)

In [6]:
chdir('./processed')
getcwd()

'/Users/fatimamubeenshaik/IdeaProjects/ParkinsonMouseTrail/src/main/processed'

## Download and View Metadata

We'll use `wget` to download the metadata file, and then visualize it in one of three ways:
 - [QIIME 2 View Website](https://view.qiime2.org/)
 - [QIIME 2 CLI / Utilities](https://docs.qiime2.org/2022.11/tutorials/utilities/)
 - [QIIME 2 API](https://docs.qiime2.org/2022.11/interfaces/artifact-api/)
 
*Note: If you are running this notebook on the HPC, you may need to paste these commands into the "Grace Shell Access" terminal under the "Clusters" menu of the Grace HPC Portal page. Make sure you download the files into the appropriate directory. Alternatively, download the files to your computer and upload them with JupyterLab.*
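
Where `wget` is not available, the same download can be done from Python. The sketch below is a minimal, hypothetical helper (`fetch` is not part of QIIME 2) built only on the standard library. It assumes the URL is reachable without authentication, and it skips the download when the destination file already exists.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def fetch(url: str, dest: str, overwrite: bool = False) -> Path:
    """Download `url` to `dest`; do nothing if `dest` already exists."""
    target = Path(dest)
    if target.exists() and not overwrite:
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary ".part" file first so an interrupted download
    # is never mistaken for a complete one on the next run.
    partial = target.with_name(target.name + ".part")
    # urlopen follows HTTP redirects, which the data.qiime2.org links rely on.
    with urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

For example, `fetch("https://data.qiime2.org/2022.11/tutorials/pd-mice/sample_metadata.tsv", "metadata.tsv")` would mirror the `wget` call in the next cell.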

In [11]:
# Download Metadata
! wget \
    -O "metadata.tsv" \
    "https://data.qiime2.org/2022.11/tutorials/pd-mice/sample_metadata.tsv"

--2024-04-07 15:22:33--  https://data.qiime2.org/2022.11/tutorials/pd-mice/sample_metadata.tsv
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://docs.google.com/spreadsheets/d/e/2PACX-1vTH4PG7f-0EsIZdTm2l5d8dwS8TAVdMFSz7wArszulm-FKdaWmiSv7p3Si6ohMh9TsN8tp4F7V_4VL4/pub?gid=1509704122&single=true&output=tsv [following]
--2024-04-07 15:22:33--  https://docs.google.com/spreadsheets/d/e/2PACX-1vTH4PG7f-0EsIZdTm2l5d8dwS8TAVdMFSz7wArszulm-FKdaWmiSv7p3Si6ohMh9TsN8tp4F7V_4VL4/pub?gid=1509704122&single=true&output=tsv
Resolving docs.google.com (docs.google.com)... 142.251.46.206
Connecting to docs.google.com (docs.google.com)|142.251.46.206|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://doc-00-60-sheets.googleusercontent.com/pub/54bogvaave6cua4cdnls17ksc4/r1mbrkfupeb0a62avr6bc8moa8/17125285

In [7]:
# Peek at the metadata
! qiime tools inspect-metadata metadata.tsv

              COLUMN NAME  TYPE
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric
genotype_and_donor_status  categorical
                     IDS:  48
                 COLUMNS:  8

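The listing above labels each column as categorical or numeric. As a rough illustration of how such a label could be inferred, the sketch below treats a column as numeric only when every non-empty value parses as a number. This is a simplification and not QIIME 2's actual implementation: real metadata files can also declare types explicitly in a `#q2:types` row, which this sketch simply skips. The function name and the toy rows are invented for the example.

```python
import csv
import io


def infer_column_types(tsv_text: str) -> dict:
    """Label each non-ID column 'numeric' or 'categorical' from its values."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Skip comment and directive rows such as "#q2:types"
    body = [r for r in body if r and not r[0].startswith("#")]
    types = {}
    for idx, name in enumerate(header[1:], start=1):  # column 0 holds sample IDs
        values = [r[idx] for r in body if idx < len(r) and r[idx] != ""]
        try:
            for value in values:
                float(value)
            types[name] = "numeric" if values else "categorical"
        except ValueError:
            types[name] = "categorical"
    return types


# Toy rows, invented for illustration (not taken from metadata.tsv)
toy = (
    "sample-id\tgenotype\tdays_post_transplant\n"
    "s1\twild type\t7\n"
    "s2\tsusceptible\t14\n"
)
print(infer_column_types(toy))
# {'genotype': 'categorical', 'days_post_transplant': 'numeric'}
```
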
**Make metadata Visualization**

In [8]:
! qiime metadata tabulate \
  --m-input-file metadata.tsv \
  --o-visualization metadata.qzv

Saved Visualization to: metadata.qzv

In [14]:
! qiime tools peek metadata.qzv

UUID:        012dcf7b-249e-4d80-9962-ea2e83e06ba5
Type:        Visualization

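`.qza` and `.qzv` files are zip archives whose single top-level directory is named after the UUID and contains a `metadata.yaml` file. The sketch below reads that file with the standard library to recover the same fields that `qiime tools peek` reports. It assumes `metadata.yaml` holds flat `key: value` lines and parses them naively instead of using a YAML library; `peek` here is a hypothetical helper, not the QIIME 2 API.

```python
import zipfile


def peek(archive_path: str) -> dict:
    """Return the key/value pairs from the archive's top-level metadata.yaml."""
    with zipfile.ZipFile(archive_path) as zf:
        # Keep only "<uuid>/metadata.yaml", not deeper provenance copies
        candidates = [
            name for name in zf.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"{archive_path} does not look like a QIIME 2 archive")
        text = zf.read(candidates[0]).decode("utf-8")
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip().strip("'\"")
    return info
```

Calling `peek("metadata.qzv")` on the file created above should return a dictionary with `uuid`, `type`, and `format` keys.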

In [9]:
# Visualize via API
q2.Visualization.load('metadata.qzv')

## Import data into QIIME 2

We will import:
 - [Manifest File](https://docs.qiime2.org/2022.11/tutorials/importing/#fastq-manifest-formats)
 - Demultiplexed Sequences (in contrast to multiplexed sequences)
 
See the [Importing Data Tutorial](https://docs.qiime2.org/2022.11/tutorials/importing/#importing-data) for more information.

In [10]:
# get manifest file
!wget \
  -O "manifest.tsv" \
  "https://data.qiime2.org/2022.11/tutorials/pd-mice/manifest"

--2024-04-07 15:37:52--  https://data.qiime2.org/2022.11/tutorials/pd-mice/manifest
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2022.11/tutorials/pd-mice/manifest [following]
--2024-04-07 15:37:52--  https://s3-us-west-2.amazonaws.com/qiime2-data/2022.11/tutorials/pd-mice/manifest
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.92.176.224, 52.92.131.8, 52.92.128.224, ...
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.92.176.224|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4640 (4.5K) [binary/octet-stream]
Saving to: ‘manifest.tsv’


2024-04-07 15:37:53 (24.6 MB/s) - ‘manifest.tsv’ saved [4640/4640]



In [11]:
# get demultiplexed sequences
!wget \
  -O "demultiplexed_seqs.zip" \
  "https://data.qiime2.org/2022.11/tutorials/pd-mice/demultiplexed_seqs.zip"

--2024-04-07 15:37:58--  https://data.qiime2.org/2022.11/tutorials/pd-mice/demultiplexed_seqs.zip
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2022.11/tutorials/pd-mice/demultiplexed_seqs.zip [following]
--2024-04-07 15:37:58--  https://s3-us-west-2.amazonaws.com/qiime2-data/2022.11/tutorials/pd-mice/demultiplexed_seqs.zip
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.92.227.40, 52.92.131.8, 52.92.128.224, ...
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.92.227.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21508775 (21M) [application/zip]
Saving to: ‘demultiplexed_seqs.zip’


2024-04-07 15:37:59 (30.5 MB/s) - ‘demultiplexed_seqs.zip’ saved [21508775/21508775]



In [12]:
# unzip sequences
! unzip demultiplexed_seqs.zip

Archive:  demultiplexed_seqs.zip
   creating: demultiplexed_seqs/
  inflating: demultiplexed_seqs/10483.recip.539.ASO.PD4.D7_4_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.539.ASO.PD4.D14_5_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.413.WT.HC2.D7_12_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.220.WT.OB1.D7_30_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.458.ASO.HC3.D49_2_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.538.WT.PD4.D21_4_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.459.WT.HC3.D14_2_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.461.ASO.HC3.D7_20_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.465.ASO.PD3.D14_16_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.461.ASO.HC3.D21_11_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.540.ASO.HC4.D7_7_L001_R1_001.fastq.gz  
  i

In [None]:
! head manifest.tsv
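
The `SingleEndFastqManifestPhred33V2` format used in the import step is a tab-separated file with a `sample-id` column and an `absolute-filepath` column. When no manifest is provided, one can be generated from the sequence directory. The sketch below is a hypothetical helper (`build_manifest` is not a QIIME 2 function) and assumes file names of the form `<sample-id>_<barcode-id>_<lane>_R1_001.fastq.gz`, like those unzipped above.

```python
import csv
from pathlib import Path


def build_manifest(seq_dir: str, manifest_path: str) -> int:
    """Write a single-end V2 manifest for every *_R1_001.fastq.gz in seq_dir."""
    rows = []
    for fq in sorted(Path(seq_dir).glob("*_R1_001.fastq.gz")):
        # Drop the last four underscore-separated fields to keep <sample-id>
        sample_id = fq.name.rsplit("_", 4)[0]
        rows.append((sample_id, str(fq.resolve())))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(("sample-id", "absolute-filepath"))
        writer.writerows(rows)
    return len(rows)
```

Sample IDs derived this way keep any prefix present in the file name (such as `10483.`), so check them against the IDs in `metadata.tsv` before importing.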

**Import and Summarize Data**

In [None]:
! qiime tools import \
  --type "SampleData[SequencesWithQuality]" \
  --input-format SingleEndFastqManifestPhred33V2 \
  --input-path ./manifest.tsv \
  --output-path ./demux_seqs.qza

In [17]:
! qiime demux summarize \
  --i-data ./demux_seqs.qza \
  --o-visualization ./demux_seqs.qzv

Saved Visualization to: ./demux_seqs.qzv

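The `Phred33` in the import format name describes how FASTQ quality characters are encoded: each character's ASCII code minus 33 is the Phred score Q, and the corresponding error probability is 10^(-Q/10). The short sketch below decodes a quality string this way, which may help when reading the quality plots in `demux_seqs.qzv`. The function names and the example string are illustrative only.

```python
def phred33_scores(quality_line: str) -> list:
    """Convert a FASTQ quality string (Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q: int) -> float:
    """Probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)


scores = phred33_scores("II5!")
print(scores)  # [40, 40, 20, 0]
print([error_probability(q) for q in scores])
```
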
## Denoising Sequence Data

We will run three denoising variants:
 - DADA2, as outlined in the tutorial.
 - DADA2 with alternate trimming.
 - Deblur with default trimming.

### Default

In [18]:
getcwd()

'/Users/fatimamubeenshaik/IdeaProjects/ParkinsonMouseTrail/src/main/processed'

In [19]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs ./demux_seqs.qza \
    --p-trunc-len 150 \
    --p-n-threads 4 \
    --o-table ./dada2_table.qza \
    --o-representative-sequences ./dada2_rep_set.qza \
    --o-denoising-stats ./dada2_stats.qza \
    --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/qiime2/fatimamubeenshaik/data/dcdfe722-ba25-4fca-b79b-8dcf1ccc8bcd/data --output_path /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/tmpbvp3ysgl/output.tsv.biom --output_track /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/tmpbvp3ysgl/track.tsv --filtered_directory /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/tmpbvp3ysgl --truncation_length 150 --trim_left 0 --max_expected_errors 2.0 --truncation_quality_score 2 --max_length Inf --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 4 --learn_min_reads 1000000 --homopolymer_gap_penalty NULL --band_size 16

package ‘optparse’ was built under R version 4.
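
The logged command shows `--truncation_length 150` and `--max_expected_errors 2.0`. In DADA2's filtering step, a read's expected errors are the sum of its per-base error probabilities, and reads above the threshold are discarded. The sketch below illustrates that rule under simplifying assumptions: it ignores the `--truncation_quality_score` step and is not the code DADA2 runs. The function names are hypothetical.

```python
def expected_errors(quality_line: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality_line)


def passes_filter(quality_line: str, trunc_len: int = 150, max_ee: float = 2.0) -> bool:
    """Truncate to trunc_len, then keep the read if expected errors <= max_ee."""
    if len(quality_line) < trunc_len:
        return False  # reads shorter than the truncation length are dropped
    return expected_errors(quality_line[:trunc_len]) <= max_ee


print(passes_filter("I" * 150))  # Q40 at every base: 150 * 0.0001 expected errors
print(passes_filter("+" * 150))  # Q10 at every base: 150 * 0.1 expected errors
```

The first call returns `True` and the second `False`, since 15 expected errors is well above the 2.0 threshold.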

In [20]:
# summarize denoising stats
! qiime metadata tabulate \
    --m-input-file ./dada2_stats.qza  \
    --o-visualization ./dada2_stats.qzv

Saved Visualization to: ./dada2_stats.qzv

In [21]:
# summarize ESV table
! qiime feature-table summarize \
    --i-table ./dada2_table.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --o-visualization ./dada2_table.qzv

Saved Visualization to: ./dada2_table.qzv

In [22]:
! qiime tools export \
    --input-path ./dada2_table.qza \
    --output-path ./dada2_table_export

Exported ./dada2_table.qza as BIOMV210DirFmt to directory ./dada2_table_export

In [23]:
! qiime feature-table tabulate-seqs \
    --i-data ./dada2_rep_set.qza \
    --o-visualization ./dada2_rep_set.qzv

Saved Visualization to: ./dada2_rep_set.qzv

In [24]:
! biom convert --to-tsv \
    -i ./dada2_table_export/feature-table.biom \
    -o ./dada2_table_export/feature-table.txt
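
The TSV written by `biom convert --to-tsv` starts with a `# Constructed from biom file` comment, followed by a `#OTU ID` header line listing the sample IDs, and then one row per feature. The sketch below reads such a file into nested dictionaries using only the standard library. `read_biom_tsv` is a hypothetical helper, and the toy table is invented for illustration.

```python
import csv
import io


def read_biom_tsv(handle) -> dict:
    """Parse `biom convert --to-tsv` output into {feature_id: {sample_id: count}}."""
    table = {}
    samples = None
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # A comment line with several fields is the "#OTU ID" header
            if len(row) > 1:
                samples = row[1:]
            continue
        if samples is None:
            raise ValueError("no header line found before the data rows")
        table[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}
    return table


# Toy table, invented for illustration
toy_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\n"
    "feature1\t12.0\t0.0\n"
    "feature2\t3.0\t5.0\n"
)
print(read_biom_tsv(io.StringIO(toy_tsv)))
```

To read the exported table, pass an open file instead, for example `open("./dada2_table_export/feature-table.txt")`.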

In [25]:
! qiime feature-table transpose \
    --i-table ./dada2_table.qza \
    --o-transposed-feature-table ./dada2_table_transposed.qza

! qiime metadata tabulate \
    --m-input-file ./dada2_table_transposed.qza  \
    --o-visualization ./dada2_table_transposed_tab.qzv

! qiime tools export \
    --input-path ./dada2_table_transposed.qza \
    --output-path ./dada2_table_transposed_export

! biom convert --to-tsv \
    -i ./dada2_table_transposed_export/feature-table.biom \
    -o ./dada2_table_transposed_export/feature-table.txt

Saved FeatureTable[Frequency] to: ./dada2_table_transposed.qza
Saved Visualization to: ./dada2_table_transposed_tab.qzv
Exported ./dada2_table_transposed.qza as BIOMV210DirFmt to directory ./dada2_table_transposed_export

In [26]:
! qiime metadata tabulate \
    --m-input-file ./dada2_table.qza  \
    --o-visualization ./dada2_table_tab.qzv

Saved Visualization to: ./dada2_table_tab.qzv

### Alternate Trimming w/ DADA2

In [27]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs ./demux_seqs.qza \
    --p-trim-left 30 \
    --p-trunc-len 130 \
    --o-table ./dada2_table_alt.qza \
    --o-representative-sequences ./dada2_rep_set_alt.qza \
    --o-denoising-stats ./dada2_stats_alt.qza \
    --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/qiime2/fatimamubeenshaik/data/dcdfe722-ba25-4fca-b79b-8dcf1ccc8bcd/data --output_path /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/tmpdvyxz33c/output.tsv.biom --output_track /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/tmpdvyxz33c/track.tsv --filtered_directory /var/folders/2y/702nfmtx76sd29583bwt4gp00000gn/T/tmpdvyxz33c --truncation_length 130 --trim_left 30 --max_expected_errors 2.0 --truncation_quality_score 2 --max_length Inf --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 1 --learn_min_reads 1000000 --homopolymer_gap_penalty NULL --band_size 16

package ‘optparse’ was built under R version 4
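
This run differs from the default one by `--p-trim-left 30` and `--p-trunc-len 130`. The sketch below shows how the two parameters combine on a single read, assuming truncation happens at position `trunc_len` of the original read and `trim_left` bases are then removed from the 5' end, so retained reads are `trunc_len - trim_left` bases long. It is an illustration of the parameter semantics, not DADA2's own code, and `trim_read` is a hypothetical name.

```python
from typing import Optional


def trim_read(seq: str, trim_left: int = 0, trunc_len: int = 0) -> Optional[str]:
    """Return the trimmed read, or None if the read is too short to keep."""
    if trunc_len:  # 0 means no truncation and no length filtering
        if len(seq) < trunc_len:
            return None
        seq = seq[:trunc_len]
    return seq[trim_left:]


read = "A" * 151
print(len(trim_read(read, trunc_len=150)))                # 150
print(len(trim_read(read, trim_left=30, trunc_len=130)))  # 100
print(trim_read("A" * 120, trim_left=30, trunc_len=130))  # None
```

Under this assumption, the alternate settings yield 100-base reads, compared with 150-base reads from the default run.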

In [28]:
# summarize denoising stats
! qiime metadata tabulate \
    --m-input-file ./dada2_stats_alt.qza  \
    --o-visualization ./dada2_stats_alt.qzv

Saved Visualization to: ./dada2_stats_alt.qzv

In [29]:
q2.Visualization.load('dada2_stats_alt.qzv')

In [30]:
# summarize ESV table
! qiime feature-table summarize \
    --i-table ./dada2_table_alt.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --o-visualization ./dada2_table_alt.qzv

Saved Visualization to: ./dada2_table_alt.qzv

In [31]:
q2.Visualization.load('dada2_table_alt.qzv')

### deblur w/ default

In [32]:
! qiime quality-filter q-score \
    --i-demux ./demux_seqs.qza \
    --o-filtered-sequences demux-seqs-deblur.qza \
    --o-filter-stats demux-deblur-stats.qza

Saved SampleData[SequencesWithQuality] to: demux-seqs-deblur.qza
Saved QualityFilterStats to: demux-deblur-stats.qza

In [33]:
# `denoise-16S` filters against a Greengenes reference by default.
#    To use SILVA or another reference database, run
#    `qiime deblur denoise-other` instead.
#    SILVA files are listed at https://docs.qiime2.org/2022.11/data-resources/
! qiime deblur denoise-16S \
    --i-demultiplexed-seqs demux-seqs-deblur.qza \
    --p-trim-length 131 \
    --o-representative-sequences rep-seqs-deblur.qza \
    --o-table table-deblur.qza \
    --p-sample-stats \
    --o-stats deblur-stats.qza

Saved FeatureTable[Frequency] to: table-deblur.qza
Saved FeatureData[Sequence] to: rep-seqs-deblur.qza
Saved DeblurStats to: deblur-stats.qza

In [34]:
! qiime metadata tabulate \
    --m-input-file demux-deblur-stats.qza \
    --o-visualization demux-deblur-stats.qzv

! qiime deblur visualize-stats \
    --i-deblur-stats deblur-stats.qza \
    --o-visualization deblur-stats.qzv

Saved Visualization to: demux-deblur-stats.qzv
Saved Visualization to: deblur-stats.qzv

In [35]:
q2.Visualization.load('demux-deblur-stats.qzv')

In [36]:
q2.Visualization.load('deblur-stats.qzv')

In [37]:
! qiime feature-table summarize \
    --i-table table-deblur.qza \
    --o-visualization table-deblur.qzv \
    --m-sample-metadata-file metadata.tsv

! qiime feature-table tabulate-seqs \
    --i-data rep-seqs-deblur.qza \
    --o-visualization rep-seqs-deblur.qzv

Saved Visualization to: table-deblur.qzv
Saved Visualization to: rep-seqs-deblur.qzv

In [38]:
q2.Visualization.load('table-deblur.qzv')

In [39]:
q2.Visualization.load('rep-seqs-deblur.qzv')