# Bioactive molecular networking notebook v1.1 (MZmine)

Website: https://github.com/DorresteinLaboratory/Bioactive_Molecular_Networks

The bioactive molecular networking workflow integrates MS/MS molecular networking with bioassay-guided fractionation. It relies on open bioinformatic tools, such as MZmine2 (http://mzmine.github.io/) or Optimus (using OpenMS; https://github.com/MolecularCartography/Optimus), a Jupyter notebook, and the GNPS web platform (http://gnps.ucsd.edu).

The code is released as a Jupyter notebook for ease of use and reproducibility. The notebook was prepared by Dr. Ricardo Silva (UCSD).

#### Citation: 
Bioactive molecular networking: Nothias, L.-F.; Nothias-Esposito, M.; da Silva, R.; Wang, M.; Protsyuk, I.; Zhang, Z.; Sarvepalli, A.; Leyssen, P.; Touboul, D.; Costa, J.; Paolini, J.; Alexandrov, T.; Litaudon, M.; Dorrestein, P. Bioactivity-Based Molecular Networking for the Discovery of Drug Leads in Natural Product Bioassay-Guided Fractionation. J. Nat. Prod. 2018.
#### Manuscript: 
(Open access) https://pubs.acs.org/doi/10.1021/acs.jnatprod.7b00737

### Instructions to run the notebook.

-> This notebook expects an MZmine feature table as input.

-> Make sure R is installed in your environment.

-> Upload your files to the Jupyter notebook folder.

-> If needed, update the filename (shown in red) where indicated in the cell comments.

-> Run all cells by clicking on: Cell / Run All Below.

-> Get the output file.

## Let's run the bioactive molecular networking notebook!

In [1]:
# Load and inspect the MZmine feature table with bioassay results
# Change the file name in the code below if needed (.CSV file in red)
# NB: Make sure the bioactivity values are in the second row of the file.
in_tab <- read.csv("Alnus_quant.csv", stringsAsFactors=FALSE, check.names=FALSE)
dim(in_tab)
in_tab[1:5,]

row ID,row m/z,row retention time,Af-T,Af-L,Af-F,Af-B,Ah-L,Ahv-T,Ah-T,Ah-F,Ahv-F,Ahv-T.1,Ahv-L,Aj-B,Aj-L,Aj-T,Aj-F
BioactivityGlucosidase,,,7.47,12.29,6.8,8.48,100.0,100.0,69.55,100.0,7.33,100.0,29.84,100.0,23.36,100.0,100.0
1,521.2015,5.8094,94356.47,1309.903,2473.208,165738.53,5294.4402,187542.992,48039.91,1133.1423,23826.384,62358.32,6208.37,55591.921,12169.4194,35885.2618,443.9682
2,313.1421,7.1964,56500.42,3480.599,3442.195,156224.91,9836.5262,28637.062,30208.92,962.3373,10211.38,55928.92,4905.908,22283.084,7707.6103,21243.2224,2466.4783
3,191.0525,0.8311,84356.67,89122.171,11819.555,74792.18,164211.6796,11677.336,34998.1,593.1454,148587.824,90628.91,128688.165,11425.459,116888.0003,92118.6479,173002.3475
4,485.3616,20.9064,87234.57,945.145,21064.144,32355.47,453.6797,2624.752,42.9,56.42,423.395,40.75,1862.751,2637.282,829.6499,736.3936,2549.0584
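
The cells below assume a specific layout: three metadata columns (`row ID`, `row m/z`, `row retention time`) followed by one numeric column per sample, with the bioactivity values in the first data row. The following is a minimal sketch of a sanity check for that layout, run on a small made-up table. `toy_tab` and `check_feature_table` are hypothetical names introduced here for illustration and are not part of the original notebook.

```r
# Toy table mimicking the expected layout (values are made up for illustration)
toy_tab <- data.frame(
  "row ID" = c("BioactivityGlucosidase", "1", "2"),
  "row m/z" = c(NA, 521.2015, 313.1421),
  "row retention time" = c(NA, 5.8094, 7.1964),
  "S1" = c(7.47, 94356.47, 56500.42),
  "S2" = c(100, 1309.90, 3480.60),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: stops with an error if the layout is not as expected
check_feature_table <- function(x) {
  expected <- c("row ID", "row m/z", "row retention time")
  stopifnot(
    identical(colnames(x)[1:3], expected),   # metadata columns come first
    ncol(x) > 3,                             # at least one sample column
    all(is.na(x[1, 2:3])),                   # bioactivity row has no m/z or RT
    all(vapply(x[, -(1:3), drop = FALSE], is.numeric, logical(1)))
  )
  invisible(TRUE)
}

check_feature_table(toy_tab)
```

Calling `check_feature_table(in_tab)` right after loading the real file would apply the same checks, assuming the file follows the layout shown in the output above.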


In [2]:
# Transpose the table and format the column and row labels for the workflow below
# If needed, replace 'Bioactivity' (in red below) with the name of the row
# that holds the bioassay results
tab <- t(in_tab[,-c(1:3)])
tab <- data.frame(Sample_name=sub("\\.mzXML Peak area", "", rownames(tab)), tab)
colnames(tab)[-1] <- c('Bioactivity', apply(in_tab[,2:3][-1,], 1, paste, collapse='_'))
rownames(tab) <- NULL

In [3]:
# Display the table 
tab[1:5,1:5]
dim(tab)

Sample_name,Bioactivity,521.2015_5.8094,313.1421_7.1964,191.0525_0.8311
Af-T,7.47,94356.466,56500.422,84356.67
Af-L,12.29,1309.903,3480.599,89122.17
Af-F,6.8,2473.208,3442.195,11819.55
Af-B,8.48,165738.529,156224.914,74792.18
Ah-L,100.0,5294.44,9836.526,164211.68


In [4]:
# Remove rows (samples) that have no bioactivity value
if(any(is.na(tab[,2]))) tab <- tab[!is.na(tab[,2]),]
dim(tab)

In [5]:
# Add 1 to every feature intensity to help scaling, then normalize each sample by its TIC (row sum)
tab2 <- tab
tab2[,-c(1:2)] <- t(apply(tab2[,-c(1:2)], 1, function(x) (x+1)/sum(x+1, na.rm = TRUE)))
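
To make the normalization step concrete, here is a small sketch on a made-up intensity matrix (`toy` is a hypothetical name, and the values are invented). It applies the same expression as the cell above and confirms that each sample's features sum to 1 afterwards.

```r
# Toy intensity matrix: 2 samples (rows) x 3 features (columns), made-up values
toy <- matrix(c(9, 0, 89,
                4, 4, 1), nrow = 2, byrow = TRUE)

# Same operation as the cell above: add 1, then divide each row by its total
toy_norm <- t(apply(toy, 1, function(x) (x + 1) / sum(x + 1, na.rm = TRUE)))

toy_norm           # row 1 is c(10, 1, 90) / 101, row 2 is c(5, 5, 2) / 12
rowSums(toy_norm)  # each sample sums to 1
```

The outer `t()` is needed because `apply(..., 1, ...)` returns its per-row results as columns.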

In [6]:
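# Calculate the Pearson correlation coefficient between a single feature and the bioactivity.
# scale() centers and standardizes each vector (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1534033/);
# note that the Pearson coefficient itself is unchanged by this linear rescaling.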
cor.test(scale(tab2[,2])[,1], scale(tab2[,3])[,1])[c("estimate", "p.value")]
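
Because the Pearson coefficient is invariant to centering and rescaling, the `scale()` calls in the cell above should not change the estimate. The sketch below checks this on simulated data; the vectors `x` and `y` are made up and only stand in for a bioactivity column and a feature column.

```r
set.seed(1)
x <- rnorm(15)            # stand-in for the bioactivity values
y <- 0.5 * x + rnorm(15)  # stand-in for one feature

r_raw    <- cor.test(x, y)$estimate
r_scaled <- cor.test(scale(x)[, 1], scale(y)[, 1])$estimate

# TRUE up to floating-point tolerance
all.equal(unname(r_raw), unname(r_scaled))
```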

In [7]:
# Repeat the correlation test for every feature (one row of 'ct' per feature)
ct <- t(sapply(3:ncol(tab2), function(x) unlist(cor.test(scale(tab2[,2])[,1], scale(tab2[,x])[,1])[c("estimate", "p.value")])))
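
The shape of `ct` matters for the later cells, which index its second column as the p-value. A minimal sketch on simulated data shows what the `sapply`/`unlist` pattern returns; `toy` and `toy_ct` are hypothetical names and the values are random.

```r
set.seed(42)
# Made-up table with the same column order as 'tab2': name, bioactivity, features
toy <- data.frame(Sample_name = paste0("S", 1:8),
                  Bioactivity = runif(8, 0, 100),
                  f1 = runif(8), f2 = runif(8), f3 = runif(8))

toy_ct <- t(sapply(3:ncol(toy), function(j) {
  unlist(cor.test(toy[, 2], toy[, j])[c("estimate", "p.value")])
}))

dim(toy_ct)       # one row per feature, two columns
colnames(toy_ct)  # "estimate.cor" "p.value"
```

Column 1 holds the correlation coefficient and column 2 the unadjusted p-value, in the same feature order as the columns of the input table.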

In [8]:
# Show the dimensions of the normalized feature table and of the correlation results
dim(tab2)
dim(ct)


In [9]:
# Create an output table with the correlation coefficient and p-value for every feature
# (the first two columns, Sample_name and Bioactivity, get placeholder values)

ct <- rbind(c("cor","p_value"), c(0,0), ct)

tab3 <- rbind(t(ct),  as.matrix(tab2))
rownames(tab3) <- NULL
tab3[1:5, 1:5]
write.csv(tab3, "features_quantification_matrix_edited_with_correlation.csv", row.names=FALSE)

Sample_name,Bioactivity,521.2015_5.8094,313.1421_7.1964,191.0525_0.8311
cor,0.0,-0.0990331837715462,-0.277778860100569,-0.227760719474164
p_value,0.0,0.725480406866948,0.316145713044248,0.41427118610819
Af-T,7.47,0.0201424761,0.0120613514,0.01800782
Af-L,12.29,0.000323332,0.0008587308,0.0219820852
Af-F,6.8,0.0012981543,0.0018065573,0.0062019455


In [10]:
# Transpose the table for molecular networking mapping in Cytoscape
new = t(tab3)
colnames(new) = new[1,]
new = new[-1,]
new = cbind(0:(nrow(new)-1), rownames(new), new)
rownames(new) <- NULL
colnames(new)[1:2] <- c("shared name", "IDs")
new[1,1] <- ""
new[1:5,1:5]
write.csv(new, "features_quantification_matrix_transposed_with_correlation.csv", row.names=FALSE)

shared name,IDs,cor,p_value,Af-T
,Bioactivity,0.0,0.0,7.47
1.0,521.2015_5.8094,-0.0990331837715462,0.725480406866948,0.0201424761
2.0,313.1421_7.1964,-0.277778860100569,0.316145713044248,0.0120613514
3.0,191.0525_0.8311,-0.227760719474164,0.41427118610819,0.01800782
4.0,485.3616_20.9064,-0.548205564693836,0.0343590235349789,0.01862216


In [11]:
# Get the indices of features with a significant correlation, positive or negative (p-value < 0.05)
which(as.numeric(ct[-c(1,2),2])<0.05)

In [12]:
# Show the IDs of the features with a significant correlation (p-value < 0.05)
nm <- colnames(tab)
nm[-c(1:2)][as.numeric(ct[-c(1,2),2])<0.05]

In [13]:
# Get the indices of features that remain significant after Bonferroni correction
which(p.adjust(as.numeric(ct[-c(1:2),2]), method = "bonferroni")<0.05)

In [14]:
# IDs of the features passing the Bonferroni correction
nm[-c(1:2)][which(p.adjust(as.numeric(ct[-c(1:2),2]), method = "bonferroni")<0.05)]
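
The Bonferroni method multiplies each p-value by the number of tests and caps the result at 1, which is why many corrected p-values end up equal to 1 when the feature table is large. A small worked sketch with made-up p-values (`p_raw` and `p_bonf` are hypothetical names):

```r
# Four made-up raw p-values, as if four features had been tested
p_raw  <- c(0.001, 0.02, 0.04, 0.5)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

p_bonf                                             # 0.004 0.080 0.160 1.000
all.equal(p_bonf, pmin(1, p_raw * length(p_raw)))  # same as the manual formula
which(p_bonf < 0.05)                               # only the first feature remains
```

Three of the four raw p-values are below 0.05, but only one survives the correction.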


In [15]:
# Prepare the new table: insert the Bonferroni-corrected p-values as column 6
new <- cbind(new[,1:5], c(0, p.adjust(as.numeric(ct[-c(1:2),2]), method = "bonferroni")), new[,-c(1:5)])
colnames(new)[6] <- "p_value_corrected"
new[,1:10]

shared name,IDs,cor,p_value,Af-T,p_value_corrected,Af-L,Af-F,Af-B,Ah-L
,Bioactivity,0,0,7.47,0,12.29,6.80,8.48,100.00
1,521.2015_5.8094,-0.0990331837715462,0.725480406866948,0.0201424761,1,0.0003233320,0.0012981543,0.0400581859,0.0013620572
2,313.1421_7.1964,-0.277778860100569,0.316145713044248,0.0120613514,1,0.0008587308,0.0018065573,0.0377588059,0.0025303418
3,191.0525_0.8311,-0.227760719474164,0.41427118610819,0.0180078200,1,0.0219820852,0.0062019455,0.0180770332,0.0422376715
4,485.3616_20.9064,-0.548205564693836,0.0343590235349789,1.862216e-02,1,2.333651e-04,1.105235e-02,7.820352e-03,1.169496e-04
5,377.0837_0.8187,-0.277674101561888,0.316335831932352,0.0120465426,1,0.0051022300,0.0033213269,0.0042066816,0.0050600738
6,183.0267_3.2085,-0.64698928161494,0.00913411859020061,0.0067597620,1,0.0166951802,0.0175223294,0.0130912935,0.0116131503
7,533.3835_19.8615,-0.622613496677409,0.0131752063125508,1.461880e-02,1,5.254344e-03,5.232765e-03,6.335313e-03,1.253914e-05
8,951.4056_5.8065,-0.169975615104143,0.544757022540926,8.195832e-03,1,2.466484e-07,5.246747e-07,2.552164e-02,7.260615e-06
9,475.1955_5.8095,-0.123795786444844,0.660254693327031,9.481888e-03,1,1.040453e-04,3.899866e-04,1.844200e-02,4.044024e-04


In [16]:
# Write the final table with corrected p_value
write.csv(new, "features_quantification_matrix_transposed_with_significant_correlation_pvalue_corrected.csv", row.names=FALSE)