Test different functions to get the core microbiota:

In [8]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline
data_dir = 'CE'

##### Download metadata

In [9]:
! wget -nv -O $data_dir/food-metadata.tsv 'https://polybox.ethz.ch/index.php/s/nEd4l5CWGWGEtae/download'

2022-12-15 19:52:49 URL:https://polybox.ethz.ch/index.php/s/nEd4l5CWGWGEtae/download [42810/42810] -> "CE/food-metadata.tsv" [1]


Identify "core" features, which are features observed in a user-defined
  fraction of the samples. Since the core features are a function of the
  fraction of samples that the feature must be observed in to be considered
  core, this is computed over a range of fractions defined by the
  `min_fraction`, `max_fraction`, and `steps` parameters.
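To make the definition above concrete, here is a minimal pandas sketch of the idea on a made-up table. The table, the feature names and the helper `core_features_at` are hypothetical, and the `>=` comparison is an assumption about how the threshold is applied, not a copy of the QIIME 2 implementation.

```python
import numpy as np
import pandas as pd

# Toy feature table (made-up data): rows are samples, columns are features,
# values are read counts.
toy_table = pd.DataFrame(
    {
        "feat_A": [10, 5, 3, 8],  # observed in 4 of 4 samples
        "feat_B": [0, 2, 7, 1],   # observed in 3 of 4 samples
        "feat_C": [0, 0, 4, 0],   # observed in 1 of 4 samples
    },
    index=["s1", "s2", "s3", "s4"],
)


def core_features_at(table, fraction):
    """Return IDs of features observed in at least `fraction` of the samples."""
    prevalence = (table > 0).mean(axis=0)  # fraction of samples with count > 0
    return sorted(prevalence[prevalence >= fraction].index)


# Evaluate a small range of fractions, analogous to min_fraction,
# max_fraction and steps.
core_by_fraction = {
    round(float(f), 2): core_features_at(toy_table, f)
    for f in np.linspace(0.5, 1.0, 3)
}
print(core_by_fraction)
```

The stricter the fraction, the shorter the list: only `feat_A` is present in every sample, so it is the only feature left at a fraction of 1.0.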

#### Workflow
1) Try different parameters to find core features
2) Find core features of all cheeses in our feature table
3) Find core features of Swiss cheeses (in categories rindtype = natural, washed or style = alpine)
4) Find core features of similar neighboring country cheeses.
5) Compare results of Swiss to neighboring country cheeses.
6) Find core features of different variety cheeses.

### 1) Try different parameters to find core features

I tried different values for the parameters:

#### Attempt 1

Run the function with the default values:

In [10]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.5 \
--o-visualization $data_dir/core_microbiota.qzv

Usage: qiime feature-table core-features [OPTIONS]

  Identify "core" features, which are features observed in a user-defined
  fraction of the samples. Since the core features are a function of the
  fraction of samples that the feature must be observed in to be considered
  core, this is computed over a range of fractions defined by the
  `min_fraction`, `max_fraction`, and `steps` parameters.

Inputs:
  --i-table ARTIFACT FeatureTable[Frequency]
                       The feature table to use in core features
                       calculations.                                [required]
Parameters:
  --p-min-fraction PROPORTION Range(0.0, 1.0, inclusive_start=False)
                       The minimum fraction of samples that a feature must be
                       observed in for that feature to be considered a core
                       feature.                                 [default: 0.5]

In [11]:
Visualization.load(f'{data_dir}/core_microbiota.qzv')

ValueError: CE/core_microbiota.qzv does not exist.
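The usage text printed by the command suggests that it did not complete, which would explain why the `.qzv` file is missing here. A small guard like the one below avoids the traceback and points back to the failed step. This is only a sketch: `load_if_exists` is a hypothetical helper, and the demonstration uses stand-in loaders instead of `Visualization.load` so that it runs without QIIME 2.

```python
import os
import tempfile


def load_if_exists(path, loader):
    """Call loader(path) only if the file exists; otherwise report and return None."""
    if not os.path.exists(path):
        print(f"{path} not found; check whether the previous command succeeded.")
        return None
    return loader(path)


# Demonstration with stand-in loaders (in the notebook: loader=Visualization.load).
with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the file was never written.
    missing_result = load_if_exists(os.path.join(tmp, "core_microbiota.qzv"), loader=len)

    # Case 2: the file exists, so the loader is called.
    existing_path = os.path.join(tmp, "exists.qzv")
    open(existing_path, "w").close()
    found_result = load_if_exists(existing_path, loader=os.path.basename)

print(missing_result, found_result)
```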

#### Attempt 2

Run the function with a higher `min-fraction`:

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.8 \
--o-visualization $data_dir/core_microbiota_2.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_2.qzv')

#### Attempt 3

Use a different `steps` value:

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.8 \
--p-steps 5 \
--o-visualization $data_dir/core_microbiota_3.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_3.qzv')

#### Attempt 4

Use a different `min-fraction` together with a different `steps` value:

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_4.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_4.qzv')
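To see which thresholds each of the four parameter combinations above evaluates, the fractions can be listed with `numpy`. This sketch assumes that the fractions are evenly spaced between `min_fraction` and `max_fraction` (both included) and that the defaults are `max_fraction = 1.0` and `steps = 11`. These defaults are assumptions that should be checked against `qiime feature-table core-features --help`; only the `min_fraction` default of 0.5 is visible in the output above.

```python
import numpy as np

# Assumed defaults (not shown in the truncated help output above).
ASSUMED_MAX_FRACTION = 1.0
ASSUMED_STEPS = 11


def evaluated_fractions(min_fraction, max_fraction=ASSUMED_MAX_FRACTION,
                        steps=ASSUMED_STEPS):
    """Evenly spaced fractions from min_fraction to max_fraction, inclusive."""
    return np.linspace(min_fraction, max_fraction, steps).round(3).tolist()


# The four parameter combinations tried in this section.
attempts = {
    "attempt 1": evaluated_fractions(0.5),
    "attempt 2": evaluated_fractions(0.8),
    "attempt 3": evaluated_fractions(0.8, steps=5),
    "attempt 4": evaluated_fractions(0.7, steps=10),
}
for name, fractions in attempts.items():
    print(name, fractions)
```

Under these assumptions, `steps = 10` with `min_fraction = 0.7` gives a spacing of about 0.033, so the thresholds are not round numbers, whereas `steps = 5` with `min_fraction = 0.8` gives a spacing of 0.05.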

### 2) Find core features of all cheeses in our feature table

#### Download tsv file of core features of all cheeses

The TSV file with the list of core features can be downloaded from the visualization output above. I downloaded the file and uploaded it to polybox. I set the threshold for the fraction of samples (the fraction of the total number of samples in which a feature must be observed to be considered "core") to 0.7.
Here we import this data from polybox:

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_0.7.tsv 'https://polybox.ethz.ch/index.php/s/WRm86jdxvkxPOVa/download'

These are the core features of all cheeses:

In [None]:
df_core_all = pd.read_csv(f'{data_dir}/core_microbiota_list_0.7.tsv', sep ='\t')
df_core_all.set_index('Feature ID', inplace = True)
df_core_all

Load the QIIME 2 taxonomy artifact as a pandas DataFrame, then add the Taxon column to the core feature table.

In [None]:
taxa = q2.Artifact.load(f'{data_dir}/taxonomy_v4.qza')
taxa = taxa.view(pd.DataFrame)

In [None]:
core_all_taxa = df_core_all.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_all_taxa
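The join above matches rows by their index (`Feature ID`). Because `DataFrame.join` performs a left join by default, a core feature without an entry in the taxonomy table is kept and gets `NaN` in the `Taxon` column. The toy tables below are made up to illustrate this; the IDs, the `2%` column and the taxon strings are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for df_core_all and taxa (real feature IDs are hashes).
toy_core = pd.DataFrame(
    {"2%": [1.0, 3.0]},
    index=pd.Index(["feat_A", "feat_X"], name="Feature ID"),
)
toy_taxa = pd.DataFrame(
    {
        "Taxon": ["k__Bacteria; g__Lactococcus", "k__Bacteria; g__Brevibacterium"],
        "Confidence": [0.99, 0.87],
    },
    index=pd.Index(["feat_A", "feat_B"], name="Feature ID"),
)

# Left join on the index: every row of toy_core is kept.
toy_joined = toy_core.join(toy_taxa["Taxon"])

# Features that have no taxonomy assignment show up as NaN.
missing_taxonomy = toy_joined[toy_joined["Taxon"].isna()].index.tolist()
print(toy_joined)
print("Features without taxonomy:", missing_taxonomy)
```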

### 3) Find core features of Swiss cheeses (in categories rindtype = natural, washed or style = alpine)

Do cheeses from Switzerland share this core microbiome with similar cheeses (e.g., same style/rind type) from neighboring countries?

##### Find core features of CH cheeses with natural rindtype:

Result: 33 core features

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [rindtype]='natural'" \
--o-filtered-table $data_dir/feature_table_CH_natural.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_natural.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_natural.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_CH_natural.qzv')
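For readers more familiar with pandas, the `--p-where` condition used in the filter step above corresponds roughly to a row selection on the metadata followed by a lookup in the feature table. The sketch below uses made-up samples. The column names `country`, `rindtype` and `style` come from this notebook, while the sample IDs, values and counts are hypothetical.

```python
import pandas as pd

# Made-up sample metadata with the columns used in this notebook.
toy_metadata = pd.DataFrame(
    {
        "country": ["Switzerland", "Switzerland", "France", "Italy"],
        "rindtype": ["natural", "washed", "natural", "natural"],
        "style": ["alpine", "alpine", "blue", "alpine"],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="sample-id"),
)

# Made-up feature table: rows are samples, columns are features.
toy_counts = pd.DataFrame(
    {"feat_A": [10, 5, 3, 8], "feat_B": [0, 2, 7, 1]},
    index=toy_metadata.index,
)

# Same condition as: --p-where "[country]='Switzerland' AND [rindtype]='natural'"
keep = toy_metadata.query("country == 'Switzerland' and rindtype == 'natural'").index
ch_natural = toy_counts.loc[keep]
print(ch_natural)
```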

##### Find core features of CH cheeses with washed rindtype:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [rindtype]='washed'" \
--o-filtered-table $data_dir/feature_table_CH_washed.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_washed.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_washed.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_CH_washed.qzv')

##### Find core features of CH cheeses with alpine style:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [style]='alpine'" \
--o-filtered-table $data_dir/feature_table_CH_alpine.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_alpine.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_CH_alpine.qzv')

### 4) Find core features of similar neighboring country cheeses.

Filter table to have only cheeses from neighboring countries (no cheeses from Germany or Austria in our dataset):

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='France' OR [country]='Italy'" \
--o-filtered-table $data_dir/feature_table_neighbor.qza

##### Find core features of neighboring cheeses with natural rindtype:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[rindtype]='natural'" \
--o-filtered-table $data_dir/feature_table_neighbor_natural.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_natural.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_natural.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_natural.qzv')

##### Find core features of neighboring cheeses with washed rindtype:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[rindtype]='washed'" \
--o-filtered-table $data_dir/feature_table_neighbor_washed.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_washed.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_washed.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_washed.qzv')

##### Find core features of neighboring cheeses with alpine style:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='alpine'" \
--o-filtered-table $data_dir/feature_table_neighbor_alpine.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_alpine.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_alpine.qzv')

### 5) Compare results of core features of CH cheeses with similar cheeses from neighboring countries

1) Add a column with the taxonomy to each table
2) Get a list with only the feature IDs
3) Use the Python set intersection function
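The three steps listed above could be wrapped in one small function and reused for every rind type or style. This is a sketch on made-up tables, not the code used for the results in this notebook. `shared_core_features` and all `toy_*` names are hypothetical.

```python
import pandas as pd


def shared_core_features(df_a, df_b, taxonomy):
    """Feature IDs found in the index of both tables, annotated with their taxon."""
    shared_ids = sorted(set(df_a.index) & set(df_b.index))
    shared = pd.DataFrame(index=pd.Index(shared_ids, name="Feature ID"))
    return shared.join(taxonomy["Taxon"])


# Made-up core feature tables for two groups of cheeses.
toy_ch = pd.DataFrame(
    {"2%": [1, 2, 3]}, index=pd.Index(["f1", "f2", "f3"], name="Feature ID")
)
toy_neighbor = pd.DataFrame(
    {"2%": [4, 5, 6]}, index=pd.Index(["f2", "f3", "f4"], name="Feature ID")
)
toy_taxonomy = pd.DataFrame(
    {"Taxon": ["taxon_1", "taxon_2", "taxon_3", "taxon_4"]},
    index=pd.Index(["f1", "f2", "f3", "f4"], name="Feature ID"),
)

toy_shared = shared_core_features(toy_ch, toy_neighbor, toy_taxonomy)
print(toy_shared)
```

Sorting the shared IDs keeps the row order reproducible, since a Python set has no defined order.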

##### Cheeses with natural rindtype

Download of tsv files with core features (fraction of samples = 0.7)

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_ch_natural.tsv 'https://polybox.ethz.ch/index.php/s/5ZVUmvDoy1VBTAx/download'

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_neighbor_natural.tsv 'https://polybox.ethz.ch/index.php/s/cAEL47rLr8ELoV5/download'

Read tsv files into pandas dataframe and add column with taxon:

In [None]:
#core features from CH cheeses with natural rindtype
df_core_ch_nat = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_natural.tsv', sep ='\t')
df_core_ch_nat.set_index('Feature ID', inplace = True)
core_ch_nat_taxa = df_core_ch_nat.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_ch_nat_taxa

In [None]:
#core features from neighboring countries with natural rindtype
df_core_nei_nat = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_natural.tsv', sep ='\t')
df_core_nei_nat.set_index('Feature ID', inplace = True)
core_nei_nat_taxa = df_core_nei_nat.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_nei_nat_taxa

Compare values between the two dataframes created above:

In [None]:
#get list of Feature IDs from core features of CH and neighboring cheeses with natural rindtype and convert list into set
index_list_ch_nat = list(df_core_ch_nat.index.values)
set_ch_nat = set(index_list_ch_nat)
index_list_nei_nat = list(df_core_nei_nat.index.values)
set_nei_nat = set(index_list_nei_nat)

In [None]:
#get set of Feature IDs which are the same in both sets
set_core_nat = set_ch_nat.intersection(set_nei_nat)

print(set_core_nat)

In [None]:
#build a dataframe indexed by the shared Feature IDs (a set is unordered, so sort it first)
core_nat = pd.DataFrame(index=pd.Index(sorted(set_core_nat), name='Feature ID'))
core_nat_taxa = core_nat.join(taxa['Taxon'])
core_nat_taxa

##### Cheeses with washed rindtype

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_ch_washed.tsv 'https://polybox.ethz.ch/index.php/s/M5WGsq8gReQGrQq/download'
! wget -nv -O $data_dir/core_microbiota_list_neighbor_washed.tsv 'https://polybox.ethz.ch/index.php/s/uO4l1YWYO91DkxH/download'

In [None]:
#core features from CH cheeses with washed rindtype
df_core_ch_was = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_washed.tsv', sep ='\t')
df_core_ch_was.set_index('Feature ID', inplace = True)
core_ch_was_taxa = df_core_ch_was.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_ch_was_taxa

In [None]:
#core features from neighboring countries with washed rindtype
df_core_nei_was = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_washed.tsv', sep ='\t')
df_core_nei_was.set_index('Feature ID', inplace = True)
core_nei_was_taxa = df_core_nei_was.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_nei_was_taxa

In [None]:
#get list of Feature IDs from core features of CH and neighboring cheeses with washed rindtype and convert list into set
index_list_ch_was = list(df_core_ch_was.index.values)
set_ch_was = set(index_list_ch_was)
index_list_nei_was = list(df_core_nei_was.index.values)
set_nei_was = set(index_list_nei_was)

In [None]:
#get set of Feature IDs which are the same in both sets
set_core_was = set_ch_was.intersection(set_nei_was)

print(set_core_was)

In [None]:
#build a dataframe indexed by the shared Feature IDs (a set is unordered, so sort it first)
core_was = pd.DataFrame(index=pd.Index(sorted(set_core_was), name='Feature ID'))
core_was_taxa = core_was.join(taxa['Taxon'])
core_was_taxa

##### Cheeses with alpine style

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_ch_alpine.tsv 'https://polybox.ethz.ch/index.php/s/f8vVurBBWM740hB/download'
! wget -nv -O $data_dir/core_microbiota_list_neighbor_alpine.tsv 'https://polybox.ethz.ch/index.php/s/k4Yy6aCgH2G2gkT/download'

In [None]:
#core features from CH cheeses in alpine style
df_core_ch_alp = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_alpine.tsv', sep ='\t')
df_core_ch_alp.set_index('Feature ID', inplace = True)
core_ch_alp_taxa = df_core_ch_alp.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_ch_alp_taxa

In [None]:
#core features from neighboring countries in alpine style
df_core_nei_alp = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_alpine.tsv', sep ='\t')
df_core_nei_alp.set_index('Feature ID', inplace = True)
core_nei_alp_taxa = df_core_nei_alp.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_nei_alp_taxa

In [None]:
#get list of Feature IDs from core features of CH and neighboring cheeses in alpine style and convert list into set
index_list_ch_alp = list(df_core_ch_alp.index.values)
set_ch_alp = set(index_list_ch_alp)
index_list_nei_alp = list(df_core_nei_alp.index.values)
set_nei_alp = set(index_list_nei_alp)

In [None]:
#get set of Feature IDs which are the same in both sets
set_core_alp = set_ch_alp.intersection(set_nei_alp)

print(set_core_alp)

In [None]:
#build a dataframe indexed by the shared Feature IDs (a set is unordered, so sort it first)
core_alp = pd.DataFrame(index=pd.Index(sorted(set_core_alp), name='Feature ID'))
core_alp_taxa = core_alp.join(taxa['Taxon'])
core_alp_taxa
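Besides the shared features, it can help to see how many core features are specific to each group. The sketch below counts shared and group-specific IDs with set operations. The IDs are made up for illustration, and `overlap_summary` is a hypothetical helper. In this notebook one would pass, for example, `set_ch_nat` and `set_nei_nat`.

```python
import pandas as pd


def overlap_summary(ids_ch, ids_neighbor):
    """Count shared and group-specific feature IDs for two collections of IDs."""
    ids_ch, ids_neighbor = set(ids_ch), set(ids_neighbor)
    return {
        "shared": len(ids_ch & ids_neighbor),
        "only_CH": len(ids_ch - ids_neighbor),
        "only_neighbor": len(ids_neighbor - ids_ch),
    }


# Made-up ID sets; each entry is (CH core features, neighbor core features).
toy_sets = {
    "natural": ({"f1", "f2", "f3"}, {"f2", "f3", "f4"}),
    "washed": ({"f1", "f5"}, {"f6"}),
}

# One row per category, one column per count.
overlap_table = pd.DataFrame(
    {category: overlap_summary(ch, nei) for category, (ch, nei) in toy_sets.items()}
).T
print(overlap_table)
```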

### 6) Find core features of different variety cheeses

##### Core features of alpine style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='alpine'" \
--o-filtered-table $data_dir/feature_table_alpine.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_alpine.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_alpine.qzv')

In [None]:
#import tsv file (downloaded from output above, uploaded on polybox and now downloaded to put into this notebook)
! wget -nv -O $data_dir/core_microbiota_list_alpine.tsv 'https://polybox.ethz.ch/index.php/s/uOeN2PpeHLYPtMs/download'

In [None]:
#read tsv file into dataframe 
#add taxa column
core_alpine = pd.read_csv(f'{data_dir}/core_microbiota_list_alpine.tsv', sep ='\t')
core_alpine.set_index('Feature ID', inplace = True)
core_alpine_taxa = core_alpine.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_alpine_taxa

##### Core features of blue style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='blue'" \
--o-filtered-table $data_dir/feature_table_blue.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_blue.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_blue.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_blue.qzv')

In [None]:
#import tsv file (downloaded from output above, uploaded on polybox and now downloaded to put into this notebook)
! wget -nv -O $data_dir/core_microbiota_list_blue.tsv 'https://polybox.ethz.ch/index.php/s/Slu48m7FhULUbxE/download'

In [None]:
#read tsv file into dataframe 
#add taxa column
core_blue = pd.read_csv(f'{data_dir}/core_microbiota_list_blue.tsv', sep ='\t')
core_blue.set_index('Feature ID', inplace = True)
core_blue_taxa = core_blue.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_blue_taxa

##### Core features of washed_bloomy style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='washed_bloomy'" \
--o-filtered-table $data_dir/feature_table_wb.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_wb.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_wb.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_wb.qzv')

In [None]:
#import tsv file (downloaded from output above, uploaded on polybox and now downloaded to put into this notebook)
! wget -nv -O $data_dir/core_microbiota_list_wb.tsv 'https://polybox.ethz.ch/index.php/s/WFKVTDYyNiQAE56/download'

In [None]:
#read tsv file into dataframe 
#add taxa column
core_wb = pd.read_csv(f'{data_dir}/core_microbiota_list_wb.tsv', sep ='\t')
core_wb.set_index('Feature ID', inplace = True)
core_wb_taxa = core_wb.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_wb_taxa

##### Core features of clothbound style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='clothbound'" \
--o-filtered-table $data_dir/feature_table_cloth.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_cloth.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_cloth.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_cloth.qzv')

In [None]:
#import tsv file (downloaded from output above, uploaded on polybox and now downloaded to put into this notebook)
! wget -nv -O $data_dir/core_microbiota_list_cloth.tsv 'https://polybox.ethz.ch/index.php/s/mCGw1VTuQKNmw5c/download'

In [None]:
#read tsv file into dataframe 
#add taxa column
core_cloth = pd.read_csv(f'{data_dir}/core_microbiota_list_cloth.tsv', sep ='\t')
core_cloth.set_index('Feature ID', inplace = True)
core_cloth_taxa = core_cloth.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_cloth_taxa
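The four style-specific cells in this section repeat the same read-and-join steps, so they could be collapsed into one helper that is called once per style. The sketch below runs on an in-memory TSV so that it does not depend on the downloaded files. `load_core_with_taxa`, the toy TSV content and the toy taxonomy are hypothetical.

```python
import io

import pandas as pd


def load_core_with_taxa(source, taxonomy):
    """Read a core-features TSV (path or buffer) and add the Taxon column."""
    core = pd.read_csv(source, sep="\t", index_col="Feature ID")
    return core.join(taxonomy["Taxon"])


# Made-up TSV content and taxonomy table for illustration.
toy_tsv = "Feature ID\t2%\t50%\nf1\t1.0\t4.0\nf2\t0.0\t2.0\n"
toy_taxonomy = pd.DataFrame(
    {"Taxon": ["taxon_1", "taxon_2", "taxon_3"]},
    index=pd.Index(["f1", "f2", "f3"], name="Feature ID"),
)

toy_core_taxa = load_core_with_taxa(io.StringIO(toy_tsv), toy_taxonomy)
print(toy_core_taxa)
```

In this notebook the same function could be called in a loop over the file suffixes `alpine`, `blue`, `wb` and `cloth`, with `f'{data_dir}/core_microbiota_list_{suffix}.tsv'` as `source` and `taxa` as `taxonomy`.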