Test different parameters to identify the core microbiota:

In [5]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline
data_dir = 'CE'

##### Download metadata

In [9]:
#! wget -nv -O $data_dir/food-metadata.tsv 'https://polybox.ethz.ch/index.php/s/nEd4l5CWGWGEtae/download'

2022-12-15 19:52:49 URL:https://polybox.ethz.ch/index.php/s/nEd4l5CWGWGEtae/download [42810/42810] -> "CE/food-metadata.tsv" [1]


Identify "core" features, which are features observed in a user-defined fraction of the samples. Since the core features are a function of the fraction of samples that the feature must be observed in to be considered core, this is computed over a range of fractions defined by the `min_fraction`, `max_fraction`, and `steps` parameters.
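A minimal pandas sketch of this idea, assuming a feature counts as "observed" in a sample when its count is greater than zero and as core when its observed fraction is at least the threshold. The toy table and the `core_features` helper are hypothetical and are not the QIIME 2 implementation:

```python
import numpy as np
import pandas as pd

# Hypothetical toy feature table: rows are samples, columns are features (counts).
table = pd.DataFrame(
    {"featA": [10, 5, 0, 8], "featB": [0, 0, 3, 0], "featC": [1, 2, 3, 4]},
    index=["s1", "s2", "s3", "s4"],
)


def core_features(table: pd.DataFrame, fraction: float) -> list:
    """Return the features observed (count > 0) in at least `fraction` of the samples."""
    observed_fraction = (table > 0).sum(axis=0) / len(table)
    return sorted(observed_fraction[observed_fraction >= fraction].index)


# Evaluate a range of fractions between a minimum and a maximum fraction.
for fraction in np.linspace(0.5, 1.0, 3):  # 0.5, 0.75, 1.0
    print(f"{fraction:.2f}: {core_features(table, fraction)}")
```

Raising the fraction can only shrink the set of core features, which is why fewer features are reported as `min_fraction` increases.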

#### Workflow
1) Try different parameters to find core features
2) Find core features of all cheeses in our feature table
3) Find core features of Swiss cheeses (in categories rindtype = natural, washed or style = alpine)
4) Find core features of similar cheeses from neighboring countries.
5) Compare results of Swiss to neighboring country cheeses.
6) Find core features of different cheese styles.

### 1) Try different parameters to find core features

I tried different values for the parameters:

#### 1. Try

Use the function with the default values (`--p-min-fraction 0.5` is the default):

In [10]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.5 \
--o-visualization $data_dir/core_microbiota.qzv

Usage: [94mqiime feature-table core-features[0m [OPTIONS]

  Identify "core" features, which are features observed in a user-defined
  fraction of the samples. Since the core features are a function of the
  fraction of samples that the feature must be observed in to be considered
  core, this is computed over a range of fractions defined by the
  `min_fraction`, `max_fraction`, and `steps` parameters.

[1mInputs[0m:
  [94m[4m--i-table[0m ARTIFACT [32mFeatureTable[Frequency][0m
                       The feature table to use in core features
                       calculations.                                [35m[required][0m
[1mParameters[0m:
  [94m--p-min-fraction[0m PROPORTION [32mRange(0.0, 1.0, inclusive_start=False)[0m
                       The minimum fraction of samples that a feature must be
                       observed in for that feature to be considered a core
                       feature.                                 [35m[default: 0.5][0m
  [94

In [3]:
Visualization.load(f'{data_dir}/core_microbiota.qzv')

#### 2. Try

Use the function with a higher min-fraction:

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.8 \
--o-visualization $data_dir/core_microbiota_2.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_2.qzv')

#### 3. Try

Use a different steps value:

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.8 \
--p-steps 5 \
--o-visualization $data_dir/core_microbiota_3.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_3.qzv')
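The `--p-steps` parameter sets how many fractions between `min_fraction` and `max_fraction` are evaluated. A small sketch of the resulting grid, assuming the fractions are evenly spaced with both ends included (as with `np.linspace`) and that `max_fraction` is left at 1.0; the `fraction_grid` helper is hypothetical:

```python
import numpy as np


def fraction_grid(min_fraction: float, steps: int, max_fraction: float = 1.0) -> np.ndarray:
    # Assumption: evenly spaced fractions, both endpoints included.
    return np.linspace(min_fraction, max_fraction, steps)


print(fraction_grid(0.8, 5))   # fractions evaluated in Try 3
print(fraction_grid(0.7, 10))  # fractions evaluated in Try 4
```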

#### 4. Try

Use a different min-fraction:

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/dada2_table_align_filtered.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_4.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_4.qzv')

### 2) Find core features of all cheeses in our feature table

#### Download tsv file of core features of all cheeses

The TSV file with the feature list can be downloaded from the visualization above. I downloaded the file for a threshold fraction of samples of 0.7 (the fraction of the total number of samples that a feature must be observed in for that feature to be considered "core") and uploaded it to polybox.
Here we import this data from polybox:

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_0.7.tsv 'https://polybox.ethz.ch/index.php/s/WRm86jdxvkxPOVa/download'

These are the core features of all cheeses:

In [2]:
df_core_all = pd.read_csv(f'{data_dir}/core_microbiota_list_0.7.tsv', sep ='\t')
df_core_all.set_index('Feature ID', inplace = True)
df_core_all

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% |
|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 4.0 | 12.0 | 184.5 | 1897.5 | 8307.25 | 51957.23 | 88576.02 |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 0.0 | 13.0 | 98.5 | 741.0 | 6608.0 | 21851.37 | 102426.32 |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 0.0 | 8.0 | 87.0 | 1019.25 | 6905.73 | 28623.58 |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 0.0 | 8.0 | 44.0 | 400.75 | 2877.72 | 12945.78 |
| 398e906d9ad1914eb268fda5c7453e09 | 0.0 | 3.0 | 6.0 | 32.0 | 1070.25 | 11938.72 | 47885.18 |
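The percentile columns (2% to 98%) appear to summarize each core feature's counts across the samples. A minimal NumPy sketch of such a seven-number summary for one feature, using hypothetical counts and NumPy's default linear interpolation (whether QIIME 2 interpolates the same way is an assumption here):

```python
import numpy as np

# Hypothetical counts of one feature in nine samples.
counts = np.array([0, 3, 12, 40, 185, 1900, 8300, 52000, 88000])
percentiles = [2, 9, 25, 50, 75, 91, 98]

# One value per percentile, keyed like the TSV column headers.
summary = {f"{p}%": value for p, value in zip(percentiles, np.percentile(counts, percentiles))}
print(summary)
```

If this reading is right, a 0.0 in the 2% column means the lowest-ranked samples contain no reads of that feature, which is compatible with a 0.7 threshold that allows up to 30% of samples to lack it.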


Load the taxonomy artifact, view it as a pandas DataFrame, and then add its Taxon column to the core feature table.
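The same read-and-join pattern is repeated for every group in this notebook. A minimal sketch of a helper that wraps it, assuming the exported TSV has a `Feature ID` column and that `taxa` is the taxonomy DataFrame loaded in the next cell (the helper name and the demo data are hypothetical):

```python
import io

import pandas as pd


def load_core_with_taxa(path_or_buffer, taxa: pd.DataFrame) -> pd.DataFrame:
    """Read a core-features TSV indexed by 'Feature ID' and append the Taxon column."""
    core = pd.read_csv(path_or_buffer, sep="\t", index_col="Feature ID")
    # Left join: features missing from the taxonomy get NaN in Taxon.
    return core.join(taxa["Taxon"])


# Hypothetical stand-ins for an exported TSV and the taxonomy DataFrame.
tsv_text = "Feature ID\t50%\nfeatA\t10.0\nfeatB\t3.0\n"
taxa_demo = pd.DataFrame(
    {"Taxon": ["k__Bacteria; g__Brevibacterium", "k__Bacteria"]},
    index=pd.Index(["featA", "featB"], name="Feature ID"),
)
print(load_core_with_taxa(io.StringIO(tsv_text), taxa_demo))
```

In this notebook a call would look like `core_ch_was_taxa = load_core_with_taxa(f'{data_dir}/core_microbiota_list_ch_washed.tsv', taxa)`.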

In [3]:
taxa = q2.Artifact.load(f'{data_dir}/taxonomy_v4.qza')
taxa = taxa.view(pd.DataFrame)

In [4]:
core_all_taxa = df_core_all.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_all_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 4.0 | 12.0 | 184.5 | 1897.5 | 8307.25 | 51957.23 | 88576.02 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 0.0 | 13.0 | 98.5 | 741.0 | 6608.0 | 21851.37 | 102426.32 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 0.0 | 8.0 | 87.0 | 1019.25 | 6905.73 | 28623.58 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 0.0 | 8.0 | 44.0 | 400.75 | 2877.72 | 12945.78 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Lactococcus; s__ |
| 398e906d9ad1914eb268fda5c7453e09 | 0.0 | 3.0 | 6.0 | 32.0 | 1070.25 | 11938.72 | 47885.18 | k__Bacteria |
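The Taxon strings are semicolon-separated ranks with prefixes (`k__`, `p__`, ..., `s__`), and an empty prefix such as `s__` means that rank was not assigned. A small sketch for extracting the most specific assigned rank, which makes it easy to spot features such as `398e906d9ad1914eb268fda5c7453e09` that are classified only to kingdom level (the `deepest_rank` helper is hypothetical):

```python
def deepest_rank(taxon: str) -> str:
    """Return the most specific assigned rank of a semicolon-separated taxonomy string."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    # Keep ranks that have a name after the 'x__' prefix; 's__' alone is an empty placeholder.
    assigned = [rank for rank in ranks if rank.split("__", 1)[-1]]
    return assigned[-1] if assigned else "Unassigned"


print(deepest_rank("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
                   "f__Streptococcaceae; g__Lactococcus; s__"))
print(deepest_rank("k__Bacteria"))
```

Assuming every feature has a taxonomy string (no NaN after the join), it could be applied as `core_all_taxa['Taxon'].map(deepest_rank)`.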


### 3) Find core features of Swiss cheeses (in categories rindtype = natural, washed or style = alpine)

Do cheeses from Switzerland share this core microbiome with similar cheeses (e.g., same style/rind type) from neighboring countries?
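Each group below needs the same pair of commands: `filter-samples` followed by `core-features`. A sketch of a hypothetical helper that builds both command lines for a given `--p-where` clause, so the groups can be generated in a loop. The file-naming scheme mirrors the cells below but is otherwise an assumption, and the commands are only printed here, not executed:

```python
import shlex


def core_feature_commands(data_dir, label, where, min_fraction=0.7, steps=10):
    """Build the filter-samples and core-features commands for one sample group."""
    filtered = f"{data_dir}/feature_table_{label}.qza"
    filter_cmd = [
        "qiime", "feature-table", "filter-samples",
        "--i-table", f"{data_dir}/dada2_table_align_filtered.qza",
        "--m-metadata-file", f"{data_dir}/food-metadata.tsv",
        "--p-where", where,
        "--o-filtered-table", filtered,
    ]
    core_cmd = [
        "qiime", "feature-table", "core-features",
        "--i-table", filtered,
        "--p-min-fraction", str(min_fraction),
        "--p-steps", str(steps),
        "--o-visualization", f"{data_dir}/core_microbiota_{label}.qzv",
    ]
    return filter_cmd, core_cmd


groups = {
    "CH_natural": "[country]='Switzerland' AND [rindtype]='natural'",
    "CH_washed": "[country]='Switzerland' AND [rindtype]='washed'",
    "CH_alpine": "[country]='Switzerland' AND [style]='alpine'",
}
for label, where in groups.items():
    for cmd in core_feature_commands("CE", label, where):
        # shlex.join quotes the where clause for display;
        # subprocess.run(cmd, check=True) would execute the argument list directly.
        print(shlex.join(cmd))
```

Passing the argument list to `subprocess.run` avoids shell quoting and line-continuation issues around the `--p-where` clause.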

##### Find core features of CH cheeses with natural rindtype:

Result: 33 core features

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [rindtype]='natural'" \
--o-filtered-table $data_dir/feature_table_CH_natural.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_natural.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_natural.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_CH_natural.qzv')

##### Find core features of CH cheeses with washed rindtype:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [rindtype]='washed'" \
--o-filtered-table $data_dir/feature_table_CH_washed.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_washed.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_washed.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_CH_washed.qzv')

##### Find core features of CH cheeses with alpine style:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='Switzerland' AND [style]='alpine'" \
--o-filtered-table $data_dir/feature_table_CH_alpine.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_CH_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_CH_alpine.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_CH_alpine.qzv')

### 4) Find core features of similar neighboring country cheeses.

Filter the table to keep only cheeses from neighboring countries. Our dataset contains no cheeses from Germany or Austria, so only France and Italy are included:
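The `--p-where` clauses are SQLite WHERE expressions over the metadata columns. A pandas sketch of the equivalent sample selection, which can also be used to check which countries are present before filtering; the toy metadata is hypothetical, and the real sample IDs and values come from `food-metadata.tsv`:

```python
import pandas as pd

# Hypothetical metadata with the columns used in the --p-where clauses.
metadata = pd.DataFrame(
    {
        "country": ["Switzerland", "France", "Italy", "Switzerland", "USA"],
        "rindtype": ["natural", "washed", "natural", "washed", "natural"],
        "style": ["alpine", "washed_bloomy", "alpine", "alpine", "blue"],
    },
    index=pd.Index(["s1", "s2", "s3", "s4", "s5"], name="sample-id"),
)

# "[country]='France' OR [country]='Italy'"
neighbor_ids = metadata.index[metadata["country"].isin(["France", "Italy"])].tolist()

# "[country]='Switzerland' AND [rindtype]='natural'"
is_ch_natural = (metadata["country"] == "Switzerland") & (metadata["rindtype"] == "natural")
ch_natural_ids = metadata.index[is_ch_natural].tolist()

# Check whether any German or Austrian samples exist before choosing the neighbors.
has_de_at = metadata["country"].isin(["Germany", "Austria"]).any()
print(neighbor_ids, ch_natural_ids, has_de_at)
```

With the real file, `q2.Metadata.load(f'{data_dir}/food-metadata.tsv').to_dataframe()` should give a comparable DataFrame to run these checks on.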

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[country]='France' OR [country]='Italy'" \
--o-filtered-table $data_dir/feature_table_neighbor.qza

##### Find core features of neighboring cheeses with natural rindtype:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[rindtype]='natural'" \
--o-filtered-table $data_dir/feature_table_neighbor_natural.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_natural.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_natural.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_natural.qzv')

##### Find core features of neighboring cheeses with washed rindtype:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[rindtype]='washed'" \
--o-filtered-table $data_dir/feature_table_neighbor_washed.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_washed.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_washed.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_washed.qzv')

##### Find core features of neighboring cheeses with alpine style:

In [None]:
! qiime feature-table filter-samples \
--i-table $data_dir/feature_table_neighbor.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='alpine'" \
--o-filtered-table $data_dir/feature_table_neighbor_alpine.qza

In [None]:
! qiime feature-table core-features \
--i-table $data_dir/feature_table_neighbor_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_neighbor_alpine.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_neighbor_alpine.qzv')

### 5) Compare results of core features of CH cheeses with similar cheeses from neighboring countries

1) Add a column with the taxonomy to each core feature table.
2) Get a list of only the feature IDs.
3) Use Python's set intersection to find the feature IDs shared by both tables.
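These steps can be wrapped into one hypothetical helper. A sketch assuming both core tables and `taxa` are indexed by feature ID, as in the cells below (the demo data is made up):

```python
import pandas as pd


def shared_core_features(core_a: pd.DataFrame, core_b: pd.DataFrame, taxa: pd.DataFrame) -> pd.DataFrame:
    """Return the Taxon of every feature ID that is core in both tables."""
    shared_ids = core_a.index.intersection(core_b.index)
    # reindex keeps every shared ID and yields NaN for IDs missing from the taxonomy.
    return taxa["Taxon"].reindex(shared_ids).to_frame()


ids = ["featA", "featB", "featC"]
core_ch = pd.DataFrame({"50%": [1.0, 2.0, 3.0]}, index=pd.Index(ids, name="Feature ID"))
core_nei = pd.DataFrame({"50%": [4.0, 5.0]}, index=pd.Index(ids[1:], name="Feature ID"))
taxa_demo = pd.DataFrame(
    {"Taxon": ["k__Bacteria", "g__Staphylococcus", "g__Brevibacterium"]},
    index=pd.Index(ids, name="Feature ID"),
)
print(shared_core_features(core_ch, core_nei, taxa_demo))
```

For example, `shared_core_features(df_core_ch_nat, df_core_nei_nat, taxa)` would cover the set and DataFrame steps of the natural-rind comparison in one call, with the index already named `Feature ID`.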

##### Cheeses with natural rindtype

Download the TSV files with the core features (fraction of samples = 0.7):

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_ch_natural.tsv 'https://polybox.ethz.ch/index.php/s/5ZVUmvDoy1VBTAx/download'

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_neighbor_natural.tsv 'https://polybox.ethz.ch/index.php/s/cAEL47rLr8ELoV5/download'

Read the TSV files into pandas DataFrames and add the Taxon column:

In [7]:
#core features from CH cheeses with natural rindtype
df_core_ch_nat = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_natural.tsv', sep ='\t')
df_core_ch_nat.set_index('Feature ID', inplace = True)
core_ch_nat_taxa = df_core_ch_nat.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_ch_nat_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| 56e99d7158115760f6283fb65ab29bd0 | 8279.24 | 8402.58 | 8684.5 | 9125.0 | 10451.0 | 11299.64 | 11670.92 | k__Bacteria |
| f50c8ae2717bb99c926c4ab1f2a6135c | 2336.28 | 2589.26 | 3167.5 | 4071.0 | 5264.5 | 6028.34 | 6362.52 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| da95b61897d9c6cd8b79f052d26a7985 | 1653.6 | 1799.2 | 2132.0 | 2652.0 | 2731.0 | 2781.56 | 2803.68 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__variabile |
| 2984a873cf9373de5425dd5b5b96c232 | 360.48 | 432.16 | 596.0 | 852.0 | 1219.0 | 1453.88 | 1556.64 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium; s__ |
| 555d9af32f8865d0125fd551a12bebe8 | 358.72 | 382.24 | 436.0 | 520.0 | 535.5 | 545.42 | 549.76 | k__Bacteria |
| 398e906d9ad1914eb268fda5c7453e09 | 120.92 | 152.14 | 223.5 | 335.0 | 508.0 | 618.72 | 667.16 | k__Bacteria |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 189.6 | 202.2 | 231.0 | 276.0 | 467.0 | 589.24 | 642.72 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| d8805a58ee0553d4947a5697b758f581 | 111.68 | 135.06 | 188.5 | 272.0 | 501.5 | 648.38 | 712.64 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Oceanospirillales; f__Halomonadaceae; g__Halomonas |
| 72497fd9ff5f5766cd19190f290898ce | 160.56 | 169.52 | 190.0 | 222.0 | 305.5 | 358.94 | 382.32 | k__Bacteria; p__Bacteroidetes; c__Sphingobacteriia; o__Sphingobacteriales; f__Sphingobacteriaceae; g__Sphingobacterium; s__ |
| f333019d07b48493533c61f9d5749496 | 76.48 | 95.66 | 139.5 | 208.0 | 286.0 | 335.92 | 357.76 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Microbacterium; s__ |


In [8]:
#core features from neighboring countries with natural rindtype
df_core_nei_nat = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_natural.tsv', sep ='\t')
df_core_nei_nat.set_index('Feature ID', inplace = True)
core_nei_nat_taxa = df_core_nei_nat.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_nei_nat_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 6.64 | 826.36 | 2701.0 | 6468.0 | 42234.5 | 73295.76 | 87862.36 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 10.32 | 51.3 | 309.5 | 1154.0 | 9987.5 | 28792.58 | 99149.2 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| 2984a873cf9373de5425dd5b5b96c232 | 0.0 | 11.18 | 272.0 | 900.0 | 7441.5 | 15498.76 | 31267.08 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium; s__ |
| 0f47f1d604a3c0c66dd7a771668df459 | 0.0 | 0.0 | 94.5 | 411.0 | 5131.5 | 17080.08 | 28580.32 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium |
| 8bd6e175eeb3db8a8390a55b78c8d176 | 0.0 | 0.0 | 32.5 | 150.0 | 1071.0 | 3957.9 | 9283.16 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Arthrobacter; s__ |
| 398e906d9ad1914eb268fda5c7453e09 | 0.16 | 3.0 | 10.5 | 125.0 | 2231.0 | 13013.56 | 39871.4 | k__Bacteria |
| d8805a58ee0553d4947a5697b758f581 | 0.0 | 0.0 | 2.5 | 96.0 | 879.0 | 4468.8 | 13048.96 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Oceanospirillales; f__Halomonadaceae; g__Halomonas |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 4.86 | 18.0 | 87.0 | 396.0 | 780.26 | 1747.28 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Lactococcus; s__ |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 0.0 | 7.5 | 72.0 | 736.0 | 2659.36 | 3855.76 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 9e9ac50434879829e4bce8eeb1bc4f9c | 0.0 | 0.0 | 9.0 | 68.0 | 298.0 | 978.32 | 2745.52 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium; s__ |


Compare the feature IDs of the two DataFrames created above:

In [9]:
#get list of Feature IDs from core features of CH and neighboring cheeses with natural rindtype and convert list into set
index_list_ch_nat = list(df_core_ch_nat.index.values)
set_ch_nat = set(index_list_ch_nat)
index_list_nei_nat = list(df_core_nei_nat.index.values)
set_nei_nat = set(index_list_nei_nat)

In [10]:
#get set of Feature IDs which are the same in both sets
set_core_nat = set_ch_nat.intersection(set_nei_nat)

print(set_core_nat)

{'0f47f1d604a3c0c66dd7a771668df459', '5899b66b70d688d5cd95df5fc7a26e3a', '369232e1ac9f9983056d09b9fe866df5', '805c1b3ec3035abbb7b9f1f7f6157e12', 'f50c8ae2717bb99c926c4ab1f2a6135c', '398e906d9ad1914eb268fda5c7453e09', '2984a873cf9373de5425dd5b5b96c232', '0e0c3a6a9489f3439329d12d76275100', 'd8805a58ee0553d4947a5697b758f581', '56e99d7158115760f6283fb65ab29bd0'}


In [11]:
core_nat = pd.DataFrame(set_core_nat)
core_nat.set_index(0, inplace = True)
core_nat.index.name = 'Feature ID'  # name the index instead of the default 0
core_nat_taxa = core_nat.join(taxa['Taxon'])
core_nat_taxa

| Feature ID | Taxon |
|---|---|
| 0f47f1d604a3c0c66dd7a771668df459 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium |
| 5899b66b70d688d5cd95df5fc7a26e3a | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 369232e1ac9f9983056d09b9fe866df5 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Lactococcus; s__ |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| f50c8ae2717bb99c926c4ab1f2a6135c | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 398e906d9ad1914eb268fda5c7453e09 | k__Bacteria |
| 2984a873cf9373de5425dd5b5b96c232 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium; s__ |
| 0e0c3a6a9489f3439329d12d76275100 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Arthrobacter |
| d8805a58ee0553d4947a5697b758f581 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Oceanospirillales; f__Halomonadaceae; g__Halomonas |
| 56e99d7158115760f6283fb65ab29bd0 | k__Bacteria |
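Besides the intersection, set differences show which core features are specific to one group. A minimal sketch with hypothetical feature-ID sets standing in for `set_ch_nat` and `set_nei_nat`:

```python
# Hypothetical stand-ins for set_ch_nat and set_nei_nat.
set_ch = {"featA", "featB", "featC"}
set_nei = {"featB", "featC", "featD"}

shared = set_ch & set_nei      # same as set_ch.intersection(set_nei)
only_ch = set_ch - set_nei     # core only in the Swiss cheeses
only_nei = set_nei - set_ch    # core only in the neighboring-country cheeses
print(sorted(shared), sorted(only_ch), sorted(only_nei))
```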


##### Cheeses with washed rindtype

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_ch_washed.tsv 'https://polybox.ethz.ch/index.php/s/M5WGsq8gReQGrQq/download'
! wget -nv -O $data_dir/core_microbiota_list_neighbor_washed.tsv 'https://polybox.ethz.ch/index.php/s/uO4l1YWYO91DkxH/download'

In [12]:
#core features from CH cheeses with washed rindtype
df_core_ch_was = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_washed.tsv', sep ='\t')
df_core_ch_was.set_index('Feature ID', inplace = True)
core_ch_was_taxa = df_core_ch_was.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_ch_was_taxa

In [13]:
#core features from neighboring countries with washed rindtype
df_core_nei_was = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_washed.tsv', sep ='\t')
df_core_nei_was.set_index('Feature ID', inplace = True)
core_nei_was_taxa = df_core_nei_was.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_nei_was_taxa

In [14]:
#get list of Feature IDs from core features of CH and neighboring cheeses with washed rindtype and convert list into set
index_list_ch_was = list(df_core_ch_was.index.values)
set_ch_was = set(index_list_ch_was)
index_list_nei_was = list(df_core_nei_was.index.values)
set_nei_was = set(index_list_nei_was)

In [15]:
#get set of Feature IDs which are the same in both sets
set_core_was = set_ch_was.intersection(set_nei_was)

print(set_core_was)

{'5899b66b70d688d5cd95df5fc7a26e3a', '805c1b3ec3035abbb7b9f1f7f6157e12', 'f50c8ae2717bb99c926c4ab1f2a6135c', '398e906d9ad1914eb268fda5c7453e09', 'da95b61897d9c6cd8b79f052d26a7985', '4db7c06da0197e12d5dd8b3dc1418e50'}


In [16]:
core_was = pd.DataFrame(set_core_was)
core_was.set_index(0, inplace = True)
core_was.index.name = 'Feature ID'  # name the index instead of the default 0
core_was_taxa = core_was.join(taxa['Taxon'])
core_was_taxa

| Feature ID | Taxon |
|---|---|
| 5899b66b70d688d5cd95df5fc7a26e3a | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| f50c8ae2717bb99c926c4ab1f2a6135c | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 398e906d9ad1914eb268fda5c7453e09 | k__Bacteria |
| da95b61897d9c6cd8b79f052d26a7985 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__variabile |
| 4db7c06da0197e12d5dd8b3dc1418e50 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Oceanospirillales; f__Halomonadaceae; g__Halomonas; s__ |


##### Cheeses with alpine style

In [None]:
! wget -nv -O $data_dir/core_microbiota_list_ch_alpine.tsv 'https://polybox.ethz.ch/index.php/s/f8vVurBBWM740hB/download'
! wget -nv -O $data_dir/core_microbiota_list_neighbor_alpine.tsv 'https://polybox.ethz.ch/index.php/s/k4Yy6aCgH2G2gkT/download'

In [17]:
#core features from CH cheeses in alpine style
df_core_ch_alp = pd.read_csv(f'{data_dir}/core_microbiota_list_ch_alpine.tsv', sep ='\t')
df_core_ch_alp.set_index('Feature ID', inplace = True)
core_ch_alp_taxa = df_core_ch_alp.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_ch_alp_taxa

In [18]:
#core features from neighboring countries in alpine style
df_core_nei_alp = pd.read_csv(f'{data_dir}/core_microbiota_list_neighbor_alpine.tsv', sep ='\t')
df_core_nei_alp.set_index('Feature ID', inplace = True)
core_nei_alp_taxa = df_core_nei_alp.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
#core_nei_alp_taxa

In [19]:
#get list of Feature IDs from core features of CH and neighboring cheeses in alpine style and convert list into set
index_list_ch_alp = list(df_core_ch_alp.index.values)
set_ch_alp = set(index_list_ch_alp)
index_list_nei_alp = list(df_core_nei_alp.index.values)
set_nei_alp = set(index_list_nei_alp)

In [20]:
#get set of Feature IDs which are the same in both sets
set_core_alp = set_ch_alp.intersection(set_nei_alp)

print(set_core_alp)

{'fc51328a0e0452be580de099a5b5791a', '5899b66b70d688d5cd95df5fc7a26e3a', '016557b68d4a86357fc47eab6f903d3f', '805c1b3ec3035abbb7b9f1f7f6157e12', 'f50c8ae2717bb99c926c4ab1f2a6135c', '398e906d9ad1914eb268fda5c7453e09', '13abd204fa63efb19248b7c271448d5a', '9e9ac50434879829e4bce8eeb1bc4f9c', 'd847672aeae8e53a505ead86563586e4', 'da95b61897d9c6cd8b79f052d26a7985', 'c3e308088f68e1cabfd16c37f5a2307b', '4db7c06da0197e12d5dd8b3dc1418e50'}


In [21]:
core_alp = pd.DataFrame(set_core_alp)
core_alp.set_index(0, inplace = True)
core_alp.index.name = 'Feature ID'  # name the index instead of the default 0
core_alp_taxa = core_alp.join(taxa['Taxon'])
core_alp_taxa

| Feature ID | Taxon |
|---|---|
| fc51328a0e0452be580de099a5b5791a | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__stationis |
| 5899b66b70d688d5cd95df5fc7a26e3a | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 016557b68d4a86357fc47eab6f903d3f | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Candidatus Rhodoluna; s__ |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| f50c8ae2717bb99c926c4ab1f2a6135c | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 398e906d9ad1914eb268fda5c7453e09 | k__Bacteria |
| 13abd204fa63efb19248b7c271448d5a | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium; s__ |
| 9e9ac50434879829e4bce8eeb1bc4f9c | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium; s__ |
| d847672aeae8e53a505ead86563586e4 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Aerococcaceae; g__Facklamia; s__ |
| da95b61897d9c6cd8b79f052d26a7985 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__variabile |


### 6) Find core features of different cheese styles.

##### Core features of alpine style cheeses

In [22]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='alpine'" \
--o-filtered-table $data_dir/feature_table_alpine.qza

^C

Aborted!

In [23]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_alpine.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_alpine.qzv

^C

Aborted!

In [24]:
Visualization.load(f'{data_dir}/core_microbiota_alpine.qzv')

In [None]:
#download the TSV file exported from the visualization above (re-uploaded to polybox)
! wget -nv -O $data_dir/core_microbiota_list_alpine.tsv 'https://polybox.ethz.ch/index.php/s/uOeN2PpeHLYPtMs/download'

In [25]:
#read tsv file into dataframe 
#add taxa column
core_alpine = pd.read_csv(f'{data_dir}/core_microbiota_list_alpine.tsv', sep ='\t')
core_alpine.set_index('Feature ID', inplace = True)
core_alpine_taxa = core_alpine.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_alpine_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 189.6 | 281.96 | 901.0 | 2936.0 | 10435.0 | 65465.76 | 119785.0 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| da95b61897d9c6cd8b79f052d26a7985 | 0.0 | 0.0 | 53.0 | 1867.0 | 4147.0 | 8538.08 | 38298.96 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__variabile |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 48.08 | 179.32 | 385.0 | 1094.0 | 8049.0 | 31370.16 | 114534.68 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| fc51328a0e0452be580de099a5b5791a | 0.0 | 0.0 | 224.0 | 677.0 | 1270.0 | 4813.88 | 8807.68 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Corynebacteriaceae; g__Corynebacterium; s__stationis |
| 0f47f1d604a3c0c66dd7a771668df459 | 0.0 | 0.0 | 0.0 | 357.0 | 1277.0 | 2341.84 | 8568.76 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium |
| 9e9ac50434879829e4bce8eeb1bc4f9c | 0.0 | 5.24 | 88.0 | 338.0 | 1220.0 | 2137.84 | 4157.64 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium; s__ |
| 016557b68d4a86357fc47eab6f903d3f | 0.0 | 0.0 | 0.0 | 83.0 | 175.0 | 890.04 | 1938.44 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Candidatus Rhodoluna; s__ |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 0.0 | 5.0 | 45.0 | 171.0 | 4338.64 | 8787.72 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| d847672aeae8e53a505ead86563586e4 | 0.0 | 0.0 | 0.0 | 35.0 | 207.0 | 550.2 | 1314.72 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Aerococcaceae; g__Facklamia; s__ |
| 398e906d9ad1914eb268fda5c7453e09 | 0.0 | 3.32 | 5.0 | 12.0 | 37.0 | 571.4 | 60984.04 | k__Bacteria |


##### Core features of blue style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='blue'" \
--o-filtered-table $data_dir/feature_table_blue.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_blue.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_blue.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_blue.qzv')

In [None]:
#download the TSV file exported from the visualization above (re-uploaded to polybox)
! wget -nv -O $data_dir/core_microbiota_list_blue.tsv 'https://polybox.ethz.ch/index.php/s/Slu48m7FhULUbxE/download'

In [26]:
#read tsv file into dataframe 
#add taxa column
core_blue = pd.read_csv(f'{data_dir}/core_microbiota_list_blue.tsv', sep ='\t')
core_blue.set_index('Feature ID', inplace = True)
core_blue_taxa = core_blue.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_blue_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| f50c8ae2717bb99c926c4ab1f2a6135c | 560.44 | 945.45 | 2722.5 | 6643.5 | 10932.5 | 99994.22 | 117509.8 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 16.22 | 28.66 | 287.5 | 4320.0 | 14681.25 | 23831.13 | 28975.36 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| 0f47f1d604a3c0c66dd7a771668df459 | 0.0 | 0.0 | 24.75 | 279.5 | 4990.0 | 7531.68 | 9041.68 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium |
| 2984a873cf9373de5425dd5b5b96c232 | 0.0 | 2.1 | 72.75 | 169.5 | 338.75 | 556.8 | 828.76 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium; s__ |
| 5899b66b70d688d5cd95df5fc7a26e3a | 0.0 | 6.0 | 11.25 | 143.0 | 2366.5 | 7837.73 | 9139.06 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 398e906d9ad1914eb268fda5c7453e09 | 1.38 | 3.14 | 6.0 | 47.0 | 436.25 | 6350.33 | 25608.46 | k__Bacteria |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 4.07 | 8.0 | 21.5 | 235.25 | 4965.89 | 7806.46 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Lactococcus; s__ |


##### Core features of washed_bloomy style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='washed_bloomy'" \
--o-filtered-table $data_dir/feature_table_wb.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_wb.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_wb.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_wb.qzv')

In [None]:
#download the TSV file exported from the visualization above (re-uploaded to polybox)
! wget -nv -O $data_dir/core_microbiota_list_wb.tsv 'https://polybox.ethz.ch/index.php/s/WFKVTDYyNiQAE56/download'

In [27]:
#read tsv file into dataframe 
#add taxa column
core_wb = pd.read_csv(f'{data_dir}/core_microbiota_list_wb.tsv', sep ='\t')
core_wb.set_index('Feature ID', inplace = True)
core_wb_taxa = core_wb.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_wb_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| d2e2bf1c430079549a91d2d13f9a1907 | 1367.34 | 1596.03 | 1862.75 | 9560.0 | 19029.5 | 21395.64 | 24062.92 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Vibrionales; f__Pseudoalteromonadaceae; g__Pseudoalteromonas |
| 4db7c06da0197e12d5dd8b3dc1418e50 | 1.76 | 7.92 | 41.0 | 2661.5 | 6721.25 | 9205.47 | 9549.66 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Oceanospirillales; f__Halomonadaceae; g__Halomonas; s__ |
| 5899b66b70d688d5cd95df5fc7a26e3a | 326.38 | 387.21 | 483.0 | 2237.0 | 3402.25 | 5735.46 | 6078.88 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Psychrobacter |
| 0f9e0ccee567ce67f05c107a5299fc3c | 413.0 | 528.5 | 994.0 | 1709.0 | 4557.0 | 6698.22 | 7716.16 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Vibrionales; f__Vibrionaceae; g__Vibrio |
| 8bd6e175eeb3db8a8390a55b78c8d176 | 41.7 | 68.65 | 253.25 | 578.0 | 1139.25 | 2530.63 | 4119.14 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Arthrobacter; s__ |
| 683df7b9d7a82f75614613838b22142c | 0.0 | 0.0 | 73.5 | 238.5 | 921.75 | 2949.25 | 3199.5 | k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Vibrionales; f__Pseudoalteromonadaceae; g__Pseudoalteromonas |
| f50c8ae2717bb99c926c4ab1f2a6135c | 0.66 | 2.97 | 6.0 | 192.5 | 1098.25 | 2357.92 | 2736.76 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 398e906d9ad1914eb268fda5c7453e09 | 0.88 | 3.96 | 8.0 | 27.0 | 104.5 | 148.15 | 775.7 | k__Bacteria |
| 369232e1ac9f9983056d09b9fe866df5 | 0.0 | 0.0 | 7.75 | 16.0 | 239.75 | 631.49 | 669.22 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Lactococcus; s__ |


##### Core features of clothbound style cheeses

In [None]:
#filter feature table
! qiime feature-table filter-samples \
--i-table $data_dir/dada2_table_align_filtered.qza \
--m-metadata-file $data_dir/food-metadata.tsv \
--p-where "[style]='clothbound'" \
--o-filtered-table $data_dir/feature_table_cloth.qza

In [None]:
#find core features from feature table
! qiime feature-table core-features \
--i-table $data_dir/feature_table_cloth.qza \
--p-min-fraction 0.7 \
--p-steps 10 \
--o-visualization $data_dir/core_microbiota_cloth.qzv

In [None]:
Visualization.load(f'{data_dir}/core_microbiota_cloth.qzv')

In [None]:
#download the TSV file exported from the visualization above (re-uploaded to polybox)
! wget -nv -O $data_dir/core_microbiota_list_cloth.tsv 'https://polybox.ethz.ch/index.php/s/mCGw1VTuQKNmw5c/download'

In [28]:
#read tsv file into dataframe 
#add taxa column
core_cloth = pd.read_csv(f'{data_dir}/core_microbiota_list_cloth.tsv', sep ='\t')
core_cloth.set_index('Feature ID', inplace = True)
core_cloth_taxa = core_cloth.join(taxa['Taxon'])
pd.set_option('max_colwidth', 150)
core_cloth_taxa

| Feature ID | 2% | 9% | 25% | 50% | 75% | 91% | 98% | Taxon |
|---|---|---|---|---|---|---|---|---|
| 805c1b3ec3035abbb7b9f1f7f6157e12 | 674.2 | 758.9 | 1814.5 | 9852.0 | 19060.0 | 23366.9 | 35133.2 | k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus |
| f50c8ae2717bb99c926c4ab1f2a6135c | 448.6 | 590.7 | 1457.0 | 4700.0 | 6542.5 | 15809.8 | 17705.4 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 0f47f1d604a3c0c66dd7a771668df459 | 14.0 | 28.0 | 72.0 | 747.0 | 1736.5 | 3151.5 | 5003.0 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dermabacteraceae; g__Brachybacterium |
| 6b9b4e06d4dddd92233226e1b5fc0c38 | 0.0 | 0.0 | 72.5 | 126.0 | 160.0 | 487.1 | 1705.8 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Brevibacteriaceae; g__Brevibacterium |
| 398e906d9ad1914eb268fda5c7453e09 | 3.6 | 16.2 | 29.5 | 60.0 | 2495.0 | 7315.7 | 7586.6 | k__Bacteria |
| cfc4179935acb998e08c0843310c3b4a | 0.0 | 0.0 | 12.5 | 53.0 | 836.0 | 1717.6 | 1861.8 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Kocuria; s__koreensis |
| e0e8b5701c2ae9e940870ac421500f88 | 0.0 | 0.0 | 21.5 | 45.0 | 82.0 | 115.3 | 117.4 | k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Dietziaceae; g__Dietzia; s__timorensis |
| 928ffcfee48a63c0d207e4117e79494b | 0.0 | 0.0 | 5.0 | 13.0 | 245.0 | 3280.3 | 6320.4 | k__Bacteria |
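To see which features are core across all of the cheese styles above, the style-specific sets can be intersected and counted. A sketch with hypothetical feature IDs; with the real data, the sets would be built from the indexes of `core_alpine`, `core_blue`, `core_wb` and `core_cloth`:

```python
from collections import Counter

# Hypothetical core-feature ID sets per cheese style.
core_by_style = {
    "alpine": {"featA", "featB", "featC", "featD"},
    "blue": {"featA", "featB", "featC", "featE"},
    "washed_bloomy": {"featA", "featC", "featE", "featF"},
    "clothbound": {"featA", "featB", "featC", "featG"},
}

# Features that are core in every style.
core_in_all_styles = set.intersection(*core_by_style.values())

# Number of styles in which each feature is core.
style_counts = Counter(fid for ids in core_by_style.values() for fid in ids)
print(sorted(core_in_all_styles))
print(sorted(style_counts.items(), key=lambda item: (-item[1], item[0])))
```

With the notebook's DataFrames this would start from `core_by_style = {'alpine': set(core_alpine.index), 'blue': set(core_blue.index), 'washed_bloomy': set(core_wb.index), 'clothbound': set(core_cloth.index)}`.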
