## Distribution of FDA in fungal orders

**Notebook summary**
This Jupyter Notebook:
- analyses and visualizes the distribution of FDA in Fungi at the order level
- is associated with Step 2 of the Pub approach: Defining the ‘working set of species’
- provides the code for Pub Figure 4B

**Context/Goal reminder**
We have determined our working set of 853 fungal species, and for each of these species we can determine its FDA presence/absence status. To better understand the distribution of FDA in the kingdom Fungi, we study it at the order level by analysing and visualizing the fraction of each order that possesses FDA (i.e. the number of species that possess FDA in an order divided by the total number of species in that order).
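
The order-level fraction described above can be written compactly. The symbols $n_{\mathrm{FDA}}(o)$ (number of species in order $o$ that possess FDA) and $n(o)$ (total number of species in order $o$) are notation introduced here for illustration only:

$$
\mathrm{FDA\_frac}(o) = \frac{n_{\mathrm{FDA}}(o)}{n(o)}, \qquad 0 \le \mathrm{FDA\_frac}(o) \le 1
$$

For instance, the Saccharomycetales row of the order table computed in this notebook has 8 FDA-positive species out of 18, giving $8/18 \approx 0.444$.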

**Notebook purpose**
In this notebook, we determine the FDA status of all species in the working set and then study the distribution of FDA at the fungal order level. Finally, we map this information onto an order-level phylogenetic tree.

---

### Setup path and environment

In [1]:
setwd('..')

#library(dplyr)
#library(tidyverse)


In [46]:
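library(dplyr)
library(ggplot2)     # ggplot2 scales (e.g. scale_fill_viridis_c) used for Figure 4B
library(ggtree)
library(ape)
library(ggtreeExtra)
library(ggnewscale)  # new_scale_fill(), used for Figure 4B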


Attaching package: ‘ape’

The following object is masked from ‘package:ggtree’:

    rotate



### Analysis

In [30]:
## Import data
    # We import data for the working set and data for the extended cluster

all_set=read.csv('results/step2/uniprot_all_wset_taxo.csv')[,-1]  # working set

FDA_set=read.csv('data/step1/FDA_clust_taxo_manually_corrected.csv')[,-1] # FDA extended set
FDA_set_f=subset(FDA_set,FDA_set$Kingdom=='Fungi')


In [35]:
## Adding FDA information status to the working set
    # Any species in the extended set possesses FDA; species in the working set but absent from the extended set do not

FDA_fungi_species=data.frame('Organism_uniprot'=c(unique(as.character(FDA_set_f$Organism))),
                             'FDA'='Yes') 



In [45]:
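# Cast every column to character (including n_prot) so the join key has the same type in both tables
all_set_ch<- all_set %>%
  mutate_all(~as.character(.))

FDA_fungi_species_ch<- FDA_fungi_species %>%
  mutate_all(~as.character(.))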

all_set_w_FDA=left_join(all_set_ch,FDA_fungi_species_ch,by='Organism_uniprot') 

all_set_w_FDA$FDA[is.na(all_set_w_FDA$FDA)]<-'No'

head(all_set_w_FDA,15)

Organism_uniprot,n_prot,species_name,kingdom,phylum,class,order,family,genus,FDA
[Candida] intermedia,10617,[Candida] intermedia,Fungi,Ascomycota,Saccharomycetes,Saccharomycetales,Metschnikowiaceae,Candida,No
[Torrubiella] hemipterigena,11065,[Torrubiella] hemipterigena,Fungi,Ascomycota,Sordariomycetes,Hypocreales,Clavicipitaceae,Torrubiella,Yes
Aaosphaeria arxii CBS 175.79,13815,Aaosphaeria arxii CBS 175.79,Fungi,Ascomycota,Dothideomycetes,Pleosporales,,Aaosphaeria,No
Absidia glauca (Pin mould),14217,Absidia glauca,Fungi,Mucoromycota,Mucoromycetes,Mucorales,Cunninghamellaceae,Absidia,Yes
Absidia repens,14353,Absidia repens,Fungi,Mucoromycota,Mucoromycetes,Mucorales,Cunninghamellaceae,Absidia,Yes
Acaromyces ingoldii,7585,Acaromyces ingoldii,Fungi,Basidiomycota,Exobasidiomycetes,Exobasidiales,Cryptobasidiaceae,Acaromyces,No
Acidomyces richmondensis BFW,10856,Acidomyces richmondensis BFW,Fungi,Ascomycota,Dothideomycetes,Mycosphaerellales,Teratosphaeriaceae,Acidomyces,No
Agaricus bisporus var. burnettii (strain JB137-S8 / ATCC MYA-4627 / FGSC 10392) (White button mushroom),10948,Agaricus bisporus var. burnettii,Fungi,Basidiomycota,Agaricomycetes,Agaricales,Agaricaceae,Agaricus,No
Ajellomyces capsulatus (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432) (Darling's disease fungus) (Histoplasma capsulatum),9199,Ajellomyces capsulatus,Fungi,Ascomycota,Eurotiomycetes,Onygenales,Ajellomycetaceae,Ajellomyces,Yes
Ajellomyces capsulatus (strain H143) (Darling's disease fungus) (Histoplasma capsulatum),9314,Ajellomyces capsulatus,Fungi,Ascomycota,Eurotiomycetes,Onygenales,Ajellomycetaceae,Ajellomyces,Yes
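
The labelling step above can be illustrated on a toy example. This is a minimal sketch, assuming made-up species and order names (`Species A`, `Order1`, and so on are hypothetical); only the column names `Organism_uniprot` and `FDA` follow the notebook.

```r
library(dplyr)

# Toy working set (hypothetical species and orders, for illustration only)
toy_all <- data.frame(Organism_uniprot = c("Species A", "Species B", "Species C"),
                      order = c("Order1", "Order1", "Order2"),
                      stringsAsFactors = FALSE)

# Toy list of FDA-positive species (one row per species)
toy_fda <- data.frame(Organism_uniprot = c("Species A", "Species C"),
                      FDA = "Yes",
                      stringsAsFactors = FALSE)

# left_join keeps every row of toy_all; species without a match get NA in FDA
toy_joined <- left_join(toy_all, toy_fda, by = "Organism_uniprot")

# Unmatched species are labelled as lacking FDA
toy_joined$FDA[is.na(toy_joined$FDA)] <- "No"

toy_joined
```

Two properties of this pattern are worth keeping in mind. The match is an exact string comparison on `Organism_uniprot`, so any difference in strain annotation between the two tables leads to a `No`. The FDA-positive list must also contain each species only once (hence the `unique()` call in the notebook), otherwise `left_join` would duplicate rows of the working set.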


In [47]:
## Calculate fraction of FDA at the order level
    # for each fungal order we calculate the order FDA fraction as the number of species with FDA in the order divided by the number of species in the order

Levels=na.omit(c(unique(all_set_w_FDA$order)))

df_order=data.frame()
for (i in seq_along(Levels)){
  dat_temp=subset(all_set_w_FDA, all_set_w_FDA$order==Levels[i])
  n=dim(dat_temp)[1]
  phy=unique(dat_temp[,'phylum'])
  FDA=sum(dat_temp$FDA=="Yes")
  
  df_temp=data.frame('Order'=Levels[i],
                     'Order_size'=n,
                     'Phylum'=phy,
                     'FDA_yes'=FDA,
                     'FDA_no'=(n-FDA),
                     'FDA_frac'=FDA/n)
  
  df_order=rbind(df_order, df_temp)
  
}

head(df_order,10)


Order,Order_size,Phylum,FDA_yes,FDA_no,FDA_frac
Saccharomycetales,18,Ascomycota,8,10,0.44444444
Hypocreales,105,Ascomycota,1,104,0.00952381
Pleosporales,46,Ascomycota,29,17,0.63043478
Mucorales,18,Mucoromycota,11,7,0.61111111
Exobasidiales,2,Basidiomycota,0,2,0.0
Mycosphaerellales,19,Ascomycota,14,5,0.73684211
Agaricales,35,Basidiomycota,26,9,0.74285714
Onygenales,41,Ascomycota,19,22,0.46341463
Blastocladiales,2,Blastocladiomycota,2,0,1.0
Helotiales,40,Ascomycota,20,20,0.5
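
The per-order table can also be computed without an explicit loop. The following is a sketch of an alternative using `dplyr` grouping, not the method used in this notebook; the toy data and its order and phylum names are hypothetical.

```r
library(dplyr)

# Toy annotated working set (hypothetical orders and phyla)
toy <- data.frame(order  = c("Order1", "Order1", "Order1", "Order2", NA),
                  phylum = c("PhylumX", "PhylumX", "PhylumX", "PhylumY", "PhylumY"),
                  FDA    = c("Yes", "No", "Yes", "No", "Yes"),
                  stringsAsFactors = FALSE)

toy_order <- toy %>%
  filter(!is.na(order)) %>%                      # drop species without an assigned order
  group_by(Order = order, Phylum = phylum) %>%   # one group per order (and its phylum)
  summarise(Order_size = n(),                    # number of species in the order
            FDA_yes = sum(FDA == "Yes"),         # number of species with FDA
            .groups = "drop") %>%
  mutate(FDA_no = Order_size - FDA_yes,
         FDA_frac = FDA_yes / Order_size)

toy_order
```

Compared with the loop, this version returns the rows sorted by order name rather than by first appearance, and it places `Phylum` before `Order_size`. Code that later refers to columns by position would need to be adjusted accordingly.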


In [54]:
## Save the annotated working set and the order information table
write.csv(all_set_w_FDA, 'results/step2/working_set_w_FDA.csv')
write.csv(df_order, 'results/step2/order_information_FDA_fraction.csv')
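
Because `write.csv` stores row names as an unnamed first column by default, files written this way need that column dropped when read back, which is what the `[,-1]` after `read.csv` does in the import cell. A small round trip on a temporary file illustrates this; the toy table and the temporary path are assumptions made for the example.

```r
# Toy table written with the default row names
toy <- data.frame(Order = c("Order1", "Order2"), FDA_frac = c(0.5, 1))
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp)

# Reading it back as-is yields an extra first column holding the row names
raw <- read.csv(tmp)
colnames(raw)     # "X" "Order" "FDA_frac"

# Dropping the first column recovers the original layout
clean <- read.csv(tmp)[, -1]
colnames(clean)   # "Order" "FDA_frac"
```

Writing with `row.names = FALSE` would avoid the extra column, but any code that reads the file with `[,-1]` would then have to be changed as well.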

### Visualization - Figure 4B

In [50]:
## Import newick file of fungal order tree (from Timetree)

tree_order=read.tree('data/step2/Fungi_order_timetree.nwk')

In [52]:
## Define the set of orders present in our dataset and trim the tree

order_uni=c(unique(all_set_w_FDA$order))  # orders present in our working set

inter_order=intersect(tree_order$tip.label,order_uni) #look at intersection with tree tips

tree_order_trimmed=keep.tip(tree_order,inter_order) # keep only the orders of interest from the tree
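
The trimming logic can be checked on a small tree. This sketch assumes a made-up four-tip Newick string and hypothetical order names; it also lists the orders that are in the data but absent from the tree, since those are silently left out of the figure.

```r
library(ape)

# Toy order-level tree (hypothetical tip names)
toy_tree <- read.tree(text = "((OrderA:1,OrderB:1):1,(OrderC:1,OrderD:1):1);")

# Toy set of orders present in a dataset; OrderZ is not a tip of the tree
toy_orders <- c("OrderA", "OrderC", "OrderD", "OrderZ")

# Orders found both in the tree and in the dataset
shared <- intersect(toy_tree$tip.label, toy_orders)

# Orders in the dataset with no matching tip (these cannot be plotted)
missing_from_tree <- setdiff(toy_orders, toy_tree$tip.label)

# Keep only the shared tips
toy_trimmed <- keep.tip(toy_tree, shared)

toy_trimmed$tip.label
missing_from_tree
```

Inspecting the equivalent of `missing_from_tree` for the real data is a quick way to see how many orders of the working set are not represented in the tree.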

In [53]:
## Reformat data to map onto the tree

tree_data=subset(df_order, df_order$Order%in%inter_order)

tree_data$Phylum=factor(tree_data$Phylum,levels=c('Ascomycota','Basidiomycota','Mucoromycota','Zoopagomycota',
                                                  'Blastocladiomycota','Chytridiomycota'))

colnames(tree_data)=c('tip.label',colnames(tree_data[,2:6]))

In [None]:
## Tree - Figure 4B

    # Defining color palette

accent_ordered <- c('#5088C5', '#F28360', '#F7B846', '#97CD78',
                    '#7A77AB', '#F898AE', '#3B9886', '#C85152',
                    '#73B5E3', '#BAB0A8', '#8A99AD', '#FFB984')

    # Base tree

pt=ggtree(tree_order_trimmed, branch.length="none")  %<+% tree_data  

    # Adding phylum information as colored tip
pt1=pt + geom_tippoint(aes(color=Phylum)) +
  scale_color_manual(values=c((accent_ordered))) 

    # Adding the heatmap for the FDA fraction

pt2 = pt1 + new_scale_fill() +
  geom_fruit(geom=geom_tile,
             mapping=aes(y=tip.label, fill=FDA_frac, width=2),
             color = "grey50", offset = 0.08) +
  scale_fill_viridis_c(direction=1, option='D')

    # Adding order size information colored by Phylum

pt3=pt2 + new_scale_fill()+ 
  geom_fruit(geom=geom_bar,
             mapping=aes(y=tip.label,x=Order_size, fill=Phylum),
             pwidth=0.38, 
             orientation="y", 
             stat="identity",
             offset=0.09)+
  scale_fill_manual(values=c((accent_ordered)))

pt3
