## Functional prediction using FUNGuild

### Load Modules

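Install `biom-format`, which provides the `biom` command-line tool used in steps 7 and 8. The attempt recorded below failed with a dependency conflict in the solver.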

In [31]:
! conda install -c bioconda biom-format

Retrieving notices: ...working... done
Channels:
 - bioconda
 - defaults
 - conda-forge
 - https://packages.qiime2.org/qiime2/2024.5/amplicon/released
 - https://packages.qiime2.org/qiime2/2024.5/metagenome/staged
 - picrust
Platform: linux-64
Collecting package metadata (repodata.json): done
failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package spades-3.15.2-h95f258a_1 requires sysroot_linux-64 2.17.*, but none of the providers can be installed
  - package cutadapt-4.9-py39hff71179_1 requires python_abi 3.9.* *_cp39, but none of the providers can be installed
  - package sepp-4.4.0-py39_0 requires python_abi 3.9.* *_cp39, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ bowtie2 2.5.4** is requested and can be installed;
├─ cutadapt is installable with the potential options
│  ├─ cutadapt [3.3|3.4|...|4.9] would require
│  │  └─ python_abi 3.9.* *_c

In [2]:
import os
import sys
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Define the data directory
data_dir = './data'

#### 1. Download FUNGuild  
Clone the repository

In [6]:
! git clone https://github.com/UMNFuN/FUNGuild.git

fatal: destination path 'FUNGuild' already exists and is not an empty directory.


#### 2. Export taxonomy as a TSV file so we can use it for FUNGuild  
Export the taxonomy artifact with `qiime tools export`; this writes a `taxonomy.tsv` file to the output directory.

In [4]:
! qiime tools export \
    --input-path ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza \
    --output-path ./data/taxonomy_classification/exported-taxonomy

Exported ./data/taxonomy_classification/taxonomy_unite_dynamic_s_all.qza as TSVTaxonomyDirectoryFormat to directory ./data/taxonomy_classification/exported-taxonomy

#### 3. Format input data
FUNGuild expects the lineage column to be named `taxonomy`, whereas the QIIME 2 export names it `Taxon`, so rename the column in `taxonomy.tsv` before running FUNGuild.

In [12]:
# Load the file
df = pd.read_csv('./data/taxonomy_classification/exported-taxonomy/taxonomy.tsv', sep='\t')

# Rename the column
df.rename(columns={'Taxon': 'taxonomy'}, inplace=True)

# Save the updated file
df.to_csv('./data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.tsv', sep='\t', index=False)
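
A slightly more defensive variant of the rename step, as a minimal sketch: it raises an error if the expected `Taxon` column is missing instead of silently writing an unchanged table. The helper name `rename_taxon_column` is hypothetical, and the two-row table is made-up illustration data rather than the real export.

```python
import pandas as pd


def rename_taxon_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'Taxon' renamed to 'taxonomy' (hypothetical helper)."""
    if "Taxon" not in df.columns:
        raise KeyError(f"expected a 'Taxon' column, found {list(df.columns)}")
    return df.rename(columns={"Taxon": "taxonomy"})


# Toy stand-in for the exported taxonomy.tsv (made-up IDs and values).
toy = pd.DataFrame({
    "Feature ID": ["asv1", "asv2"],
    "Taxon": ["k__Fungi;p__Ascomycota", "k__Fungi;p__Basidiomycota"],
    "Confidence": [0.95, 0.80],
})

fixed = rename_taxon_column(toy)
print(list(fixed.columns))
```

Returning a copy rather than renaming in place keeps the original frame available for comparison.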

#### 4. Run FUNGuild

The FUNGuild script is run with the following options:  
`-otu`: path to the taxonomy file  
`-db`: database to query (`fungi` here)  
`-m`: also write a table with only the matched (function-assigned) OTUs  
`-u`: also write a table with only the unmatched OTUs

In [14]:
! python ./FUNGuild/Guilds_v1.1.py \
    -otu ./data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.tsv \
    -db fungi \
    -m \
    -u

FunGuild v1.1 Beta
Connecting with FUNGuild database ...

Reading in the OTU table: './data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.tsv'

Searching the FUNGuild database...
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%

Found 1091 matching taxonomy records in the database.
Dereplicating and sorting the result...
FunGuild tried to assign function to 1021 OTUs in './data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.tsv'.
FUNGuild made assignments on 693 OTUs.
Result saved to './data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.guilds.txt'

Additional output:
FUNGuild made assignments on 693 OTUs, these have been saved to ./data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.guilds_matched.txt.
328 OTUs were unassigned, these are saved to ./data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.guilds_unmatched.txt.

Total calculating time: 5.96 seconds.


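#### The output has several columns:  
taxonomy: Original taxonomic assignment (the renamed `Taxon` column of the QIIME 2 export).  
Confidence: Confidence of the taxonomic classification, carried over from the QIIME 2 taxonomy file.  
Taxon: Taxon that was matched in the FUNGuild database.  
Trophic Mode: Broad nutritional strategy (e.g., saprotroph, pathotroph, symbiotroph).  
Guild: Specific guild (e.g., plant pathogen, ectomycorrhizal).  
Confidence Ranking: FUNGuild's confidence in the guild assignment (e.g., Possible, Probable).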
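
Because a single Guild entry can list several guilds, it can help to split the field before counting. The following is a minimal sketch that assumes guilds are joined by `-`, that no guild name itself contains a hyphen, and that the `|` markers wrapping some names can simply be stripped. The helper name `split_guilds` is hypothetical.

```python
def split_guilds(guild_field: str) -> list[str]:
    """Split a FUNGuild 'Guild' string into individual guild names.

    Assumption: guilds are separated by '-' and no guild name contains
    a hyphen; '|' markers around a name are removed.
    """
    names = (part.strip("|").strip() for part in guild_field.split("-"))
    return [name for name in names if name]


example = "Animal Pathogen-Plant Pathogen-|Undefined Saprotroph|"
print(split_guilds(example))
# ['Animal Pathogen', 'Plant Pathogen', 'Undefined Saprotroph']
```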

#### 5. Look at the created data

In [25]:
df_matched_guilds = pd.read_csv('./data/taxonomy_classification/exported-taxonomy/taxonomy_fixed.guilds_matched.txt', sep='\t')
print(df_matched_guilds['Guild'].value_counts())

Guild
Undefined Saprotroph                                                                        230
|Undefined Saprotroph|                                                                       95
Animal Pathogen-Plant Pathogen-|Undefined Saprotroph|                                        60
Animal Pathogen-Endophyte-Endosymbiont-Epiphyte-|Undefined Saprotroph|                       56
Animal Parasite-Animal Pathogen-Animal Symbiotroph-Plant Pathogen-|Undefined Saprotroph|     42
                                                                                           ... 
Ectomycorrhizal-|Undefined Saprotroph|-Wood Saprotroph                                        1
Endophyte-Nematophagous-Plant Pathogen-|Wood Saprotroph|                                      1
Epiphyte-|Undefined Saprotroph|                                                               1
|Plant Saprotroph|-Undefined Saprotroph-Wood Saprotroph                                       1
|Dung Saprotroph|-Undefined Saprot

In [27]:
# Keep rows whose taxonomic classification confidence exceeds 0.7
df_matched_guilds[df_matched_guilds['Confidence'] > 0.7]

Unnamed: 0,Feature ID,taxonomy,Confidence,Taxon,Taxon Level,Trophic Mode,Guild,Growth Morphology,Trait,Confidence Ranking,Notes,Citation/Source
0,004c991a685b11054001b8c7d0a68543,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.954798,Saccharomycetales,7,Saprotroph,Undefined Saprotroph,Yeast,,Possible,,"b'Sterkenburg E, et al. 2015. New Phytologist ..."
1,0113430ae816d67262d3c86660ecd51e,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.802044,Saccharomycetales,7,Saprotroph,Undefined Saprotroph,Yeast,,Possible,,"b'Sterkenburg E, et al. 2015. New Phytologist ..."
2,01202eb8b50e4b6ac670afc696cf804c,k__Fungi;p__Ascomycota;c__Eurotiomycetes;o__Eu...,0.999728,Aspergillus,13,Pathotroph-Saprotroph-Symbiotroph,Animal Pathogen-Endophyte-Plant Saprotroph-|Un...,Microfungus,,Probable,As foliar_endophyte (Põlme et al. 2020); Decay...,"b'Seehann G, et al. 1975. List of Fungi in Sof..."
3,014a604d3dadcb95ddc73b1d6c642b45,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.900741,Saccharomycetales,7,Saprotroph,Undefined Saprotroph,Yeast,,Possible,,"b'Sterkenburg E, et al. 2015. New Phytologist ..."
4,01ab17a41e837c842a5a232385a99169,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.956831,Candida,13,Pathotroph-Saprotroph-Symbiotroph,Animal Pathogen-Endophyte-Endosymbiont-Epiphyt...,Dimorphic Yeast-Yeast,,Probable,As nectar-tap_saprotroph (Põlme et al. 2020); ...,"b'Manolakaki D, et al. 2010. Virulence. 1: 367..."
...,...,...,...,...,...,...,...,...,...,...,...,...
688,ff026a993319972d8bec281f38aeacc3,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.999925,Candida,13,Pathotroph-Saprotroph-Symbiotroph,Animal Pathogen-Endophyte-Endosymbiont-Epiphyt...,Dimorphic Yeast-Yeast,,Probable,As nectar-tap_saprotroph (Põlme et al. 2020); ...,"b'Manolakaki D, et al. 2010. Virulence. 1: 367..."
689,ff8c880f0650931cb829568034d00a78,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.992525,Meyerozyma,13,Saprotroph-Symbiotroph,Endophyte-|Epiphyte|-Undefined Saprotroph,Yeast,,Probable,As foliar_endophyte (Põlme et al. 2020); As ne...,"b'P\xc3\xb5lme S, et al. 2020. Fungal Diversit..."
690,ffafb7e58eb5ffd894cf6b5a4610dc89,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.720555,Geotrichum,13,Pathotroph-Saprotroph,Animal Pathogen-Plant Pathogen-|Undefined Sapr...,Microfungus,,Probable,As nectar-tap_saprotroph (Põlme et al. 2020); ...,"b'Thornton CR, et al. 2010. International Jour..."
691,ffd273afcb47f6101f2f9c7a03915a3c,k__Fungi;p__Ascomycota;c__Saccharomycetes;o__S...,0.964271,Candida,13,Pathotroph-Saprotroph-Symbiotroph,Animal Pathogen-Endophyte-Endosymbiont-Epiphyt...,Dimorphic Yeast-Yeast,,Probable,As nectar-tap_saprotroph (Põlme et al. 2020); ...,"b'Manolakaki D, et al. 2010. Virulence. 1: 367..."
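
The numeric `Confidence` column and the categorical `Confidence Ranking` column measure different things, so a filter may want to use both. A minimal sketch on made-up rows, assuming that only the rankings `Probable` and `Highly Probable` should be kept (a cutoff chosen here for illustration, not taken from the notebook):

```python
import pandas as pd

# Toy stand-in for taxonomy_fixed.guilds_matched.txt (made-up rows).
toy_guilds = pd.DataFrame({
    "Feature ID": ["asv1", "asv2", "asv3"],
    "Confidence": [0.95, 0.99, 0.60],
    "Guild": ["Undefined Saprotroph", "Animal Pathogen", "Wood Saprotroph"],
    "Confidence Ranking": ["Possible", "Probable", "Highly Probable"],
})

accepted_rankings = ["Probable", "Highly Probable"]  # assumed cutoff

# Require both a trusted guild ranking and a confident taxonomic classification.
keep = (
    toy_guilds["Confidence Ranking"].isin(accepted_rankings)
    & (toy_guilds["Confidence"] > 0.7)
)
filtered = toy_guilds[keep]
print(filtered["Feature ID"].tolist())
# ['asv2']
```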


#### 6. Prepare the data for beta diversity analysis  
To use this data in QIIME 2, reshape it into a feature table:  
Rows are indexed by Feature ID (one row per feature, not per sample).  
Columns represent guilds.  
Values are the summed `Confidence` scores for each Feature ID and guild pair, not abundances.

In [29]:
# Pivot the table to create a feature table
feature_table = df_matched_guilds.pivot_table(index='Feature ID', columns='Guild', values='Confidence', aggfunc='sum').fillna(0)

# Save the feature table
feature_table.to_csv('./data/beta_diversity_feature_table.tsv', sep='\t')
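
If the goal is a table of guild abundance per sample, one possible approach (not the one used in the cell above) is to sum the per-sample counts of all features that share a guild. The sketch below uses made-up counts and guild labels; the variable names `asv_counts` and `asv_guild` are hypothetical.

```python
import pandas as pd

feature_ids = pd.Index(["asv1", "asv2", "asv3"], name="Feature ID")

# Toy feature-by-sample count table (made-up counts).
asv_counts = pd.DataFrame(
    {"sampleA": [10, 0, 5], "sampleB": [2, 8, 0]},
    index=feature_ids,
)

# Toy guild assignment per feature (made-up labels).
asv_guild = pd.Series(
    ["Undefined Saprotroph", "Animal Pathogen", "Undefined Saprotroph"],
    index=feature_ids,
    name="Guild",
)

# Sum counts of features sharing a guild: rows become guilds, columns stay samples.
guild_counts = asv_counts.groupby(asv_guild).sum()
print(guild_counts)
```

With guilds as rows and samples as columns, the table has the observation-by-sample orientation that BIOM tables use.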

#### 7. Import data to QIIME 2

In [39]:
# Use biom to convert the table into something QIIME 2 can work with
! biom convert \
  -i ./data/beta_diversity_feature_table.tsv \
  -o ./data/beta_diversity_feature_table.biom \
  --table-type="OTU table" \
  --to-hdf5

In [34]:
! qiime tools import \
  --type 'FeatureTable[Frequency]' \
  --input-path ./data/beta_diversity_feature_table.biom \
  --output-path ./data/beta_diversity_feature_table.qza

Imported ./data/beta_diversity_feature_table.biom as BIOMV210DirFmt to ./data/beta_diversity_feature_table.qza

#### 8. Perform beta diversity analysis
##### 8.1. Generate distance matrix
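
For orientation, the Bray-Curtis dissimilarity requested in this step is the sum of absolute differences between two count vectors divided by the sum of all their counts. The function below is a plain-Python sketch of that formula on made-up vectors, not the implementation QIIME 2 uses.

```python
import math


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero vectors give 0/0; report that as NaN rather than guessing.
    return numerator / denominator if denominator else math.nan


sample_a = [10, 0, 5]  # made-up counts
sample_b = [2, 8, 0]
print(bray_curtis(sample_a, sample_b))
# 0.84
```

The value is 0 for identical vectors and 1 for vectors that share no features.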

In [35]:
! qiime diversity beta \
    --i-table ./data/beta_diversity_feature_table.qza \
    --p-metric braycurtis \
    --o-distance-matrix ./data/braycurtis_distance_matrix.qza

Saved DistanceMatrix to: ./data/braycurtis_distance_matrix.qza

##### 8.2. Perform PCoA
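
As background for this step, classical PCoA double-centers the squared distance matrix and takes its leading eigenvectors as coordinates. The NumPy sketch below illustrates that idea on a made-up 3x3 distance matrix; it is a simplified illustration and not the routine QIIME 2 runs. The function name `pcoa_sketch` is hypothetical.

```python
import numpy as np


def pcoa_sketch(dist, n_axes=2):
    """Classical PCoA: return (coordinates, eigenvalues sorted descending)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix derived from squared distances.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny or negative eigenvalues before taking the square root.
    scale = np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    return eigvecs[:, :n_axes] * scale, eigvals


# Made-up distances between three points lying on a line at 0, 3 and 4.
toy_dist = np.array([[0.0, 3.0, 4.0],
                     [3.0, 0.0, 1.0],
                     [4.0, 1.0, 0.0]])
coords, eigvals = pcoa_sketch(toy_dist)
print(coords.shape)
```

Because the toy points lie on a line, the first axis alone reproduces the input distances.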

In [36]:
! qiime diversity pcoa \
    --i-distance-matrix ./data/braycurtis_distance_matrix.qza \
    --o-pcoa ./data/braycurtis_pcoa.qza

Saved PCoAResults to: ./data/braycurtis_pcoa.qza

In [None]:
Visualization.load("./data/DA/ancombc_country_da_barplot.qzv")

##### 8.3. Visualize Beta Diversity

In [42]:
! qiime tools export \
    --input-path ./data/filtered-feature-table.qza \
    --output-path ./data/exported-feature-table
# The table is exported as .biom, so convert it to TSV before working on it

Exported ./data/filtered-feature-table.qza as BIOMV210DirFmt to directory ./data/exported-feature-table

In [45]:
! biom convert \
  -i ./data/exported-feature-table/feature-table.biom \
  -o ./data/exported-feature-table/feature-table.tsv \
  --to-tsv
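
The TSV written by `biom convert` starts with a comment line before the header row, which matters when loading it with pandas. A minimal sketch on a made-up two-sample table, assuming the layout of one comment line followed by a `#OTU ID` header:

```python
import io

import pandas as pd

# Made-up miniature of a biom-converted TSV.
toy_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\n"
    "asv1\t10.0\t2.0\n"
    "asv2\t0.0\t8.0\n"
)

# Skip the leading comment line; the next line then serves as the header.
table = pd.read_csv(io.StringIO(toy_tsv), sep="\t", skiprows=1, index_col=0)
table.index.name = "Feature ID"
print(table.shape)
# (2, 2)
```

For the real file, the path to `feature-table.tsv` would replace the `io.StringIO` object.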

In [46]:
# Look at the table to see the sample IDs and the feature IDs
! head ./data/exported-feature-table/feature-table.tsv

# Constructed from biom file
#OTU ID	ERR5327198	ERR5327199	ERR5327266	ERR5327282	ERR5327284	ERR5327285	ERR5327287	ERR5327288	ERR5327289	ERR5327300	ERR5327303	ERR5327305	ERR5327308	ERR5327311	ERR5327313	ERR5327314	ERR5327316	ERR5327317	ERR5327318	ERR5327322	ERR5327323	ERR5327325	ERR5327326	ERR5327327	ERR5327329	ERR5327332	ERR5327335	ERR5327338	ERR5327340	ERR5327343	ERR5327344	ERR5327346	ERR5327348	ERR5327349	ERR5327351	ERR5327352	ERR5327353	ERR5327354	ERR5327355	ERR5327356	ERR5327360	ERR5327362	ERR5327363	ERR5327364	ERR5327367	ERR5327379	ERR5327387	ERR5327388	ERR5327394	ERR5327395	ERR5327396	ERR5327401	ERR5327402	ERR5327404	ERR5327405	ERR5327406	ERR5327407	ERR5327408	ERR5327409	ERR5327410	ERR5327412	ERR5327414	ERR5327415	ERR5327416	ERR5327418	ERR5327419	ERR5327421	ERR5327426	ERR5327427	ERR5327428	ERR5327431	ERR5327432	ERR5327433	ERR5327434	ERR5327435	ERR5327439	ERR5327442	ERR5327443	ERR5327444	ERR5327445	ERR5327447	ERR5327450	ERR5327452	ERR5327456	ERR5327464	ERR5327465	ERR5327470	ERR532

##### 8.4. Prepare the Table for QIIME 2  
Convert the TSV back to BIOM format so that it can be imported into QIIME 2.

In [47]:
! biom convert \
  -i ./data/exported-feature-table/feature-table.tsv \
  -o ./data/exported-feature-table/feature-table.biom \
  --table-type="OTU table" \
  --to-hdf5

Import it into QIIME 2

In [48]:
! qiime tools import \
  --type 'FeatureTable[Frequency]' \
  --input-path ./data/exported-feature-table/feature-table.biom \
  --output-path ./data/feature-table.qza

Imported ./data/exported-feature-table/feature-table.biom as BIOMV210DirFmt to ./data/feature-table.qza

##### 8.5. Compute the distance matrix

In [49]:
! qiime diversity beta \
    --i-table ./data/feature-table.qza \
    --p-metric braycurtis \
    --o-distance-matrix ./data/braycurtis_distance_matrix.qza

Saved DistanceMatrix to: ./data/braycurtis_distance_matrix.qza

Compute the PCoA ordination:

In [50]:
! qiime diversity pcoa \
    --i-distance-matrix ./data/braycurtis_distance_matrix.qza \
    --o-pcoa ./data/braycurtis_pcoa.qza

Saved PCoAResults to: ./data/braycurtis_pcoa.qza

Visualize with Emperor:

In [51]:
! qiime emperor plot \
    --i-pcoa ./data/braycurtis_pcoa.qza \
    --m-metadata-file ./data/metadata/fungut_metadata_processed.tsv \
    --o-visualization ./data/braycurtis_emperor.qzv

Saved Visualization to: ./data/braycurtis_emperor.qzv

Export the visualization so it can be viewed locally:

In [53]:
! qiime tools export \
    --input-path ./data/braycurtis_emperor.qzv \
    --output-path ./data/exported-emperor

Exported ./data/braycurtis_emperor.qzv as Visualization to directory ./data/exported-emperor
