Test different functions to get the core microbiota:

In [1]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline
data_dir = 'CE'

##### Download metadata

In [14]:
! wget -nv -O $data_dir/food-metadata.tsv 'https://polybox.ethz.ch/index.php/s/nEd4l5CWGWGEtae/download'

2022-12-12 08:54:46 URL:https://polybox.ethz.ch/index.php/s/nEd4l5CWGWGEtae/download [42810/42810] -> "CE/food-metadata.tsv" [1]


Identify "core" features, which are features observed in a user-defined
  fraction of the samples. Since the core features are a function of the
  fraction of samples that the feature must be observed in to be considered
  core, this is computed over a range of fractions defined by the
  `min_fraction`, `max_fraction`, and `steps` parameters.
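To make the definition above concrete, here is a minimal sketch of the idea on a hypothetical toy table. It is not the QIIME 2 implementation: the evenly spaced fractions, the "observed in at least this fraction" rule, and the `max_fraction=1.0` and `steps=11` defaults are assumptions for illustration (only the 0.5 default for `min_fraction` is confirmed by the help text).

```python
import numpy as np
import pandas as pd

# Hypothetical toy feature table: rows are samples, columns are features.
table = pd.DataFrame(
    {"feat_a": [10, 3, 7, 1], "feat_b": [0, 5, 2, 0], "feat_c": [0, 0, 4, 0]},
    index=["s1", "s2", "s3", "s4"],
)


def core_features_by_fraction(table, min_fraction=0.5, max_fraction=1.0, steps=11):
    """Map each fraction to the features observed in at least that fraction of samples."""
    # Fraction of samples in which each feature has a non-zero count.
    prevalence = (table > 0).mean(axis=0)
    # Assumption: the fractions are evenly spaced between the two bounds.
    fractions = np.linspace(min_fraction, max_fraction, steps)
    return {
        round(float(f), 3): sorted(prevalence[prevalence >= f].index)
        for f in fractions
    }


core = core_features_by_fraction(table, min_fraction=0.5, max_fraction=1.0, steps=3)
for fraction, features in core.items():
    print(fraction, features)
```

In this toy table `feat_a` occurs in all four samples, `feat_b` in two and `feat_c` in one, so the core set shrinks as the required fraction grows.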

#### Workflow
1) Try different parameters to find core features
2) Find core features of all cheeses in our feature table
3) Find core features of Swiss cheeses (in the categories rindtype = natural, rindtype = washed, or style = alpine)
4) Find core features of similar cheeses from neighboring countries.
5) Compare the results for Swiss cheeses with those for cheeses from neighboring countries.
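Steps 3 and 4 repeat the same two commands for every subset. The sketch below shows one way to generate those commands from a small dictionary instead of copying cells. The helper `build_subset_commands` and the dictionary `swiss_subsets` are hypothetical names; the flags and file naming follow the commands used in this notebook. Only the Swiss subsets are shown, and the commands are printed rather than executed.

```python
import shlex


def build_subset_commands(name, where, data_dir="CE", min_fraction=0.7, steps=10):
    """Return (filter_cmd, core_cmd) argument lists for one sample subset."""
    filtered_table = f"{data_dir}/feature_table_{name}.qza"
    filter_cmd = [
        "qiime", "feature-table", "filter-samples",
        "--i-table", f"{data_dir}/dada2_table_align_filtered.qza",
        "--m-metadata-file", f"{data_dir}/food-metadata.tsv",
        "--p-where", where,
        "--o-filtered-table", filtered_table,
    ]
    core_cmd = [
        "qiime", "feature-table", "core-features",
        "--i-table", filtered_table,
        "--p-min-fraction", str(min_fraction),
        "--p-steps", str(steps),
        "--o-visualization", f"{data_dir}/core_microbiota_{name}.qzv",
    ]
    return filter_cmd, core_cmd


# Subset name -> metadata filter, as used for the Swiss cheeses in step 3.
swiss_subsets = {
    "CH_natural": "[country]='Switzerland' AND [rindtype]='natural'",
    "CH_washed": "[country]='Switzerland' AND [rindtype]='washed'",
    "CH_alpine": "[country]='Switzerland' AND [style]='alpine'",
}

commands = {
    name: build_subset_commands(name, where) for name, where in swiss_subsets.items()
}

for filter_cmd, core_cmd in commands.values():
    # Printed only; pass each list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(filter_cmd))
    print(shlex.join(core_cmd))
```

Building argument lists avoids shell quoting problems with the `--p-where` expression and keeps the threshold and step count in one place.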

### 1)

I tried different values for the parameters:

#### 1. Try

Used the function with the default values:

In [9]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.5 \
--o-visualization $data_dir/core_microbiota.qzv

Usage: qiime feature-table core-features [OPTIONS]

  Identify "core" features, which are features observed in a user-defined
  fraction of the samples. Since the core features are a function of the
  fraction of samples that the feature must be observed in to be considered
  core, this is computed over a range of fractions defined by the
  `min_fraction`, `max_fraction`, and `steps` parameters.

Inputs:
  --i-table ARTIFACT FeatureTable[Frequency]
                       The feature table to use in core features
                       calculations.                                [required]
Parameters:
  --p-min-fraction PROPORTION Range(0.0, 1.0, inclusive_start=False)
                       The minimum fraction of samples that a feature must be
                       observed in for that feature to be considered a core
                       feature.                                 [default: 0.5]

In [2]:
Visualization.load(f'{data_dir}/core_microbiota.qzv')

#### 2. Try

Used the function with higher min-fraction:

In [14]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.8 \
--o-visualization $data_dir/core_microbiota_2.qzv

Saved Visualization to: CE/core_microbiota_2.qzv

In [3]:
Visualization.load(f'{data_dir}/core_microbiota_2.qzv')

#### 3. Try

Using different step value:

In [17]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.8 \
--p-steps 5 \
--o-visualization $data_dir/core_microbiota_3.qzv

Saved Visualization to: CE/core_microbiota_3.qzv

In [4]:
Visualization.load(f'{data_dir}/core_microbiota_3.qzv')

#### 4. Try

Use different min-fraction:

In [22]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_4.qzv

Saved Visualization to: CE/core_microbiota_4.qzv

In [5]:
Visualization.load(f'{data_dir}/core_microbiota_4.qzv')

### 2)

#### Download tsv file of core features of all cheeses

The feature list can be downloaded as a TSV file from the visualization above. I downloaded the file and uploaded it to polybox. I set the threshold for the fraction of samples (the fraction of the total number of samples in which a feature must be observed to be considered "core") to 0.7.
Here we import this data from polybox:

In [11]:
! wget -nv -O $data_dir/core_microbiota_list_0.7.tsv 'https://polybox.ethz.ch/index.php/s/WRm86jdxvkxPOVa/download'

2022-12-12 08:53:38 URL:https://polybox.ethz.ch/index.php/s/WRm86jdxvkxPOVa/download [490/490] -> "CE/core_microbiota_list_0.7.tsv" [1]
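As an alternative to downloading the TSV by hand from the visualization, the file can in principle be read straight from the `.qzv`, since QIIME 2 visualizations are zip archives. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the member names in the demo archive are made up, and the actual file names inside a real `core-features` visualization should be checked with `list_tsv_members` first.

```python
import os
import tempfile
import zipfile

import pandas as pd


def list_tsv_members(archive_path):
    """Return the paths of all .tsv files stored inside a .qzv/.qza archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".tsv"))


def read_tsv_member(archive_path, member):
    """Read one tab-separated member of the archive into a DataFrame."""
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as handle:
            return pd.read_csv(handle, sep="\t", index_col="Feature ID")


# Demo on a tiny stand-in archive; the member names below are hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.qzv")
with zipfile.ZipFile(demo_path, "w") as archive:
    archive.writestr("uuid/data/index.html", "<html></html>")
    archive.writestr(
        "uuid/data/core-features-demo.tsv", "Feature ID\t50%\nfeat_a\t12.0\n"
    )

members = list_tsv_members(demo_path)
demo_df = read_tsv_member(demo_path, members[0])
print(members)
print(demo_df)
```

For a real run, `demo_path` would be replaced by something like `f'{data_dir}/core_microbiota_4.qzv'`, which removes the manual download and upload step.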


These are the core features of all cheeses:

In [7]:
df_core_all = pd.read_csv(f'{data_dir}/core_microbiota_list_0.7.tsv', sep='\t')
df_core_all.set_index('Feature ID', inplace=True)
df_core_all

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% |
|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 4.0 | 12.0 | 184.5 | 1897.5 | 8307.25 | 51957.23 | 88576.02 |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 0.0 | 13.0 | 98.5 | 741.0 | 6608.0 | 21851.37 | 102426.32 |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 0.0 | 8.0 | 87.0 | 1019.25 | 6905.73 | 28623.58 |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 0.0 | 8.0 | 44.0 | 400.75 | 2877.72 | 12945.78 |
| 398e906d9ad1914eb268fda5c7453e09 | 0.0 | 3.0 | 6.0 | 32.0 | 1070.25 | 11938.72 | 47885.18 |


### 3) 

Do cheeses from Switzerland share this core microbiome with similar cheeses (e.g., same style/rind type) from neighboring countries?

##### Find core features of CH cheeses with natural rindtype:

Result: 33 core features

In [22]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [rindtype]='natural'" \
--o-filtered-table $data_dir/feature_table_CH_natural.qza

Saved FeatureTable[Frequency] to: CE/feature_table_CH_natural.qza

In [29]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_natural.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_natural.qzv

Saved Visualization to: CE/core_microbiota_CH_natural.qzv

In [19]:
Visualization.load(f'{data_dir}/core_microbiota_CH_natural.qzv')

##### Find core features of CH cheeses with washed rindtype:

In [31]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [rindtype]='washed'" \
--o-filtered-table $data_dir/feature_table_CH_washed.qza

Saved FeatureTable[Frequency] to: CE/feature_table_CH_washed.qza

In [32]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_washed.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_washed.qzv

Saved Visualization to: CE/core_microbiota_CH_washed.qzv

In [5]:
Visualization.load(f'{data_dir}/core_microbiota_CH_washed.qzv')

##### Find core features of CH cheeses with alpine style:

In [35]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [style]='alpine'" \
--o-filtered-table $data_dir/feature_table_CH_alpine.qza

Saved FeatureTable[Frequency] to: CE/feature_table_CH_alpine.qza

In [36]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_alpine.qzv

Saved Visualization to: CE/core_microbiota_CH_alpine.qzv

In [6]:
Visualization.load(f'{data_dir}/core_microbiota_CH_alpine.qzv')

### 4)

Filter the table to keep only cheeses from neighboring countries (our dataset contains no cheeses from Germany or Austria):

In [8]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='France' OR [country]='Italy'" \
--o-filtered-table $data_dir/feature_table_neighbor.qza

Saved FeatureTable[Frequency] to: CE/feature_table_neighbor.qza
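The `--p-where` filters select samples by metadata values, and with a 0.7 threshold the core set depends strongly on how many samples end up in each subset. The same selections can be checked in pandas before running QIIME 2. This is a sketch on hypothetical metadata rows; only the column names `country`, `rindtype`, and `style` come from the commands in this notebook. In practice the frame would be read from `food-metadata.tsv`.

```python
import pandas as pd

# Hypothetical metadata rows; only the column names come from the notebook.
metadata = pd.DataFrame(
    {
        "country": ["Switzerland", "Switzerland", "France", "Italy", "USA"],
        "rindtype": ["natural", "washed", "natural", "washed", "natural"],
        "style": ["alpine", "alpine", "blue", "alpine", "cheddar"],
    },
    index=pd.Index(["s1", "s2", "s3", "s4", "s5"], name="sample-id"),
)

# Equivalent of: [country]='France' OR [country]='Italy'
neighbors = metadata[metadata["country"].isin(["France", "Italy"])]

# Equivalent of: [country]='Switzerland' AND [rindtype]='natural'
ch_natural = metadata[
    (metadata["country"] == "Switzerland") & (metadata["rindtype"] == "natural")
]

subset_sizes = {
    "neighbor": len(neighbors),
    "neighbor_natural": int((neighbors["rindtype"] == "natural").sum()),
    "CH_natural": len(ch_natural),
}
print(subset_sizes)
```

A subset with only a handful of samples gives very coarse prevalence values, which is worth keeping in mind when comparing the number of core features between groups.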

##### Find core features of neighboring cheeses with natural rindtype:

In [9]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[rindtype]='natural'" \
--o-filtered-table $data_dir/feature_table_neighbor_natural.qza

Saved FeatureTable[Frequency] to: CE/feature_table_neighbor_natural.qza

In [10]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_natural.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_natural.qzv

Saved Visualization to: CE/core_microbiota_neighbor_natural.qzv

In [11]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_natural.qzv')

##### Find core features of neighboring cheeses with washed rindtype:

In [12]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[rindtype]='washed'" \
--o-filtered-table $data_dir/feature_table_neighbor_washed.qza

Saved FeatureTable[Frequency] to: CE/feature_table_neighbor_washed.qza

In [13]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_washed.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_washed.qzv

Saved Visualization to: CE/core_microbiota_neighbor_washed.qzv

In [14]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_washed.qzv')

##### Find core features of neighboring cheeses with alpine style:

In [15]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='alpine'" \
--o-filtered-table $data_dir/feature_table_neighbor_alpine.qza

Saved FeatureTable[Frequency] to: CE/feature_table_neighbor_alpine.qza

In [17]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_alpine.qzv

Saved Visualization to: CE/core_microbiota_neighbor_alpine.qzv

In [18]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_alpine.qzv')

### 5) Compare results of core features of CH cheeses with similar cheeses from neighboring countries

##### Cheeses with natural rindtype

Download the TSV files with the core features (fraction of samples = 0.7):

In [20]:
! wget -nv -O $data_dir/core_microbiota_list_ch_natural.tsv 'https://polybox.ethz.ch/index.php/s/5ZVUmvDoy1VBTAx/download'

2022-12-12 10:56:55 URL:https://polybox.ethz.ch/index.php/s/5ZVUmvDoy1VBTAx/download [2688/2688] -> "CE/core_microbiota_list_ch_natural.tsv" [1]


In [21]:
! wget -nv -O $data_dir/core_microbiota_list_neighbor_natural.tsv 'https://polybox.ethz.ch/index.php/s/cAEL47rLr8ELoV5/download'

2022-12-12 10:57:38 URL:https://polybox.ethz.ch/index.php/s/cAEL47rLr8ELoV5/download [1254/1254] -> "CE/core_microbiota_list_neighbor_natural.tsv" [1]


Read the TSV files into pandas DataFrames:

In [38]:
# Core features of CH cheeses with natural rindtype
df_core_ch_nat = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_natural.tsv', sep='\t')
df_core_ch_nat.set_index('Feature ID', inplace=True)
df_core_ch_nat

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% |
|---|---|---|---|---|---|---|---|
| 56e99d7158115760f6283fb65ab29bd0 | 8279.24 | 8402.58 | 8684.5 | 9125.0 | 10451.0 | 11299.64 | 11670.92 |
| f50c8ae2717bb99c926c4ab1f2a6135c | 2336.28 | 2589.26 | 3167.5 | 4071.0 | 5264.5 | 6028.34 | 6362.52 |
| da95b61897d9c6cd8b79f052d26a7985 | 1653.6 | 1799.2 | 2132.0 | 2652.0 | 2731.0 | 2781.56 | 2803.68 |
| 2984a873cf9373de5425dd5b5b96c232 | 360.48 | 432.16 | 596.0 | 852.0 | 1219.0 | 1453.88 | 1556.64 |
| 555d9af32f8865d0125fd551a12bebe8 | 358.72 | 382.24 | 436.0 | 520.0 | 535.5 | 545.42 | 549.76 |
| 398e906d9ad1914eb268fda5c7453e09 | 120.92 | 152.14 | 223.5 | 335.0 | 508.0 | 618.72 | 667.16 |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 189.6 | 202.2 | 231.0 | 276.0 | 467.0 | 589.24 | 642.72 |
| d8805a58ee0553d4947a5697b758f581 | 111.68 | 135.06 | 188.5 | 272.0 | 501.5 | 648.38 | 712.64 |
| 72497fd9ff5f5766cd19190f290898ce | 160.56 | 169.52 | 190.0 | 222.0 | 305.5 | 358.94 | 382.32 |
| f333019d07b48493533c61f9d5749496 | 76.48 | 95.66 | 139.5 | 208.0 | 286.0 | 335.92 | 357.76 |


In [37]:
df_core_nei_nat = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_natural.tsv', sep='\t')
df_core_nei_nat.set_index('Feature ID', inplace=True)
df_core_nei_nat

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% |
|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 6.64 | 826.36 | 2701.0 | 6468.0 | 42234.5 | 73295.76 | 87862.36 |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 10.32 | 51.3 | 309.5 | 1154.0 | 9987.5 | 28792.58 | 99149.2 |
| 2984a873cf9373de5425dd5b5b96c232 | 0.0 | 11.18 | 272.0 | 900.0 | 7441.5 | 15498.76 | 31267.08 |
| 0f47f1d604a3c0c66dd7a771668df459 | 0.0 | 0.0 | 94.5 | 411.0 | 5131.5 | 17080.08 | 28580.32 |
| 8bd6e175eeb3db8a8390a55b78c8d176 | 0.0 | 0.0 | 32.5 | 150.0 | 1071.0 | 3957.9 | 9283.16 |
| 398e906d9ad1914eb268fda5c7453e09 | 0.16 | 3.0 | 10.5 | 125.0 | 2231.0 | 13013.56 | 39871.4 |
| d8805a58ee0553d4947a5697b758f581 | 0.0 | 0.0 | 2.5 | 96.0 | 879.0 | 4468.8 | 13048.96 |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 4.86 | 18.0 | 87.0 | 396.0 | 780.26 | 1747.28 |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 0.0 | 7.5 | 72.0 | 736.0 | 2659.36 | 3855.76 |
| 9e9ac50434879829e4bce8eeb1bc4f9c | 0.0 | 0.0 | 9.0 | 68.0 | 298.0 | 978.32 | 2745.52 |


Compare values between the two dataframes created above:

In [39]:
# Both frames are indexed by 'Feature ID' and share the percentile column names, so suffixes keep the columns apart.
joined_frame = df_core_ch_nat.join(df_core_nei_nat, how="inner", lsuffix="_ch", rsuffix="_neighbor")
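Because both tables use the same percentile column names, a side-by-side comparison needs suffixes on the joined columns, and comparing the index sets shows which features are shared and which are specific to one group. The sketch below illustrates this on a three-row excerpt of the 50% column from each of the two tables above. The variable names are hypothetical, and the full DataFrames can be substituted directly.

```python
import pandas as pd

# Median (50%) frequencies copied from three rows of each table above.
ch = pd.DataFrame(
    {"50%": [9125.0, 4071.0, 335.0]},
    index=pd.Index(
        [
            "56e99d7158115760f6283fb65ab29bd0",
            "f50c8ae2717bb99c926c4ab1f2a6135c",
            "398e906d9ad1914eb268fda5c7453e09",
        ],
        name="Feature ID",
    ),
)
nei = pd.DataFrame(
    {"50%": [6468.0, 125.0, 411.0]},
    index=pd.Index(
        [
            "f50c8ae2717bb99c926c4ab1f2a6135c",
            "398e906d9ad1914eb268fda5c7453e09",
            "0f47f1d604a3c0c66dd7a771668df459",
        ],
        name="Feature ID",
    ),
)

# Features that are core in both groups, and features core in only one of them.
shared = ch.index.intersection(nei.index)
only_ch = ch.index.difference(nei.index)
only_nei = nei.index.difference(ch.index)

# Side-by-side medians for the shared features; suffixes avoid the column clash.
side_by_side = ch.join(nei, how="inner", lsuffix="_ch", rsuffix="_neighbor")

print("shared:", list(shared))
print("only CH:", list(only_ch))
print("only neighbors:", list(only_nei))
print(side_by_side)
```

The same pattern applies unchanged to the washed-rind and alpine-style tables once their TSV files are loaded.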

##### Cheeses with washed rindtype

##### Cheeses with alpine style