# MicrobiomeAnalystR [optional]

MicrobiomeAnalystR is an R-based alternative to running Qiime2 and PICRUSt2 from the command line. It is an R wrapper for the GUI [MicrobiomeAnalyst](https://www.microbiomeanalyst.ca/). MicrobiomeAnalystR is still in the beta stage of development, so we present this option primarily as a means of seeing how the pipeline might work in an R environment rather than as a best practice for your own research data.

MicrobiomeAnalyst consists of four modules. The first is the Marker Data Profiling (MDP) module, designed for analysis of 16S rRNA marker gene survey data. The second is the Shotgun Data Profiling (SDP) module, which contains functions for analyzing metagenomics or metatranscriptomics data. The third, Taxon Set Enrichment Analysis (TSEA), tests whether a given list of taxa of interest shows biologically or ecologically meaningful patterns. Finally, the Projection with Public Data (PPD) module lets users visually compare their data with MicrobiomeAnalyst's collection of curated public datasets to look for novel patterns or biological insights.

<div class="alert alert-block alert-danger">
    <i class="fa fa-exclamation-circle" aria-hidden="true"></i>
    <b>Alert: </b> As the name suggests, MicrobiomeAnalystR is an R package! To run it, we need to <b>switch to the R kernel</b>, the same way we switched the kernel from Python 3 to qiime2-2022.2.
</div>

We will run the MDP, SDP, and PPD modules. First, we need to download our input data from this module's Google Cloud Storage bucket. R's `system()` function lets us run shell commands, such as `gsutil`, from within R. Then we can create our output directory.

We use different input data for this package because MicrobiomeAnalystR expects input files in a different format from Qiime2's. For more information, visit https://www.microbiomeanalyst.ca/MicrobiomeAnalyst/docs/DataFormat.xhtml
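
To see what "different format" means in practice, it helps to load a feature table and a metadata table and confirm that they describe the same samples. A sanity check of this kind has to pass before any analysis can run. The sketch below makes some assumptions that you should check against the DataFormat page above. It assumes a tab-delimited abundance table with feature IDs in the first column (headed `#NAME`) and one column per sample. It also assumes a CSV metadata table with one row per sample. `check_sample_ids()` is a hypothetical helper, not a MicrobiomeAnalystR function.

```r
# Made-up inputs. Assumed layout: abundance table = features in rows, samples in
# columns, first column = feature IDs; metadata = one row per sample, first column = sample IDs.
abund_txt <- paste("#NAME\tS1\tS2\tS3",
                   "OTU_1\t10\t0\t5",
                   "OTU_2\t3\t7\t0",
                   sep = "\n")
meta_txt <- paste("#NAME,Class",
                  "S1,CD",
                  "S2,Control",
                  "S3,CD",
                  sep = "\n")

# read.delim()/read.csv() default to comment.char = "", so "#NAME" is read as a header
abund <- read.delim(text = abund_txt, check.names = FALSE, row.names = 1)
meta  <- read.csv(text = meta_txt, check.names = FALSE, row.names = 1)

# Hypothetical helper: report sample IDs that appear in only one of the two tables
check_sample_ids <- function(abund, meta) {
  in_abund_only <- setdiff(colnames(abund), rownames(meta))
  in_meta_only  <- setdiff(rownames(meta), colnames(abund))
  list(ok = length(in_abund_only) == 0 && length(in_meta_only) == 0,
       in_abund_only = in_abund_only,
       in_meta_only = in_meta_only)
}

str(check_sample_ids(abund, meta))
```

A mismatch, such as a sample missing from the metadata, shows up in `in_abund_only` or `in_meta_only`. These lists tell you which IDs to reconcile before loading the files into MicrobiomeAnalystR.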

In [None]:
#download input files from the bucket into 4_BioMarker_Discovery/
system("gsutil -m cp -r gs://nigms-sandbox/nosi-usd-biofilms/MicrobiomeAR_Inputs 4_BioMarker_Discovery/", intern = TRUE)

For this analysis, the downloaded input folder will serve as our working directory, because every processing function below builds its paths relative to it. Inputs are read from `../MicrobiomeAR_Inputs/<module>/`, and images and figures are written to `./MicrobiomeAR_Outputs/<module>/`. After setting the working directory, we create the output directory with one subdirectory per module (MDP, SDP, and PPD).

In [None]:
#set working directory
setwd("/home/jupyter/MetagenomicsUSD/4_BioMarker_Discovery/MicrobiomeAR_Inputs/")

In [None]:
#Create the output directory plus one subdirectory per module for images and graphs
dir.create("./MicrobiomeAR_Outputs/mdp", recursive = TRUE, showWarnings = FALSE)
dir.create("./MicrobiomeAR_Outputs/sdp", recursive = TRUE, showWarnings = FALSE)
dir.create("./MicrobiomeAR_Outputs/ppd", recursive = TRUE, showWarnings = FALSE)

In [None]:
#load in the packages
library(MicrobiomeAnalystR)
library(dplyr)

#### Setting up MicrobiomeAnalyst R MDP module

In [None]:
processMdp<- function()
{
  mbSet<-Init.mbSetObj()
  mbSet<-SetModuleType(mbSet, "mdp")
  
  
  taxa_type<-"Greengenes" 
  format<- "text"
  
  #file location
  file_location<-"../MicrobiomeAR_Inputs/mdp/"
  save_location<-"./MicrobiomeAR_Outputs/mdp/"
  
  #Read16SAbundData(mbSetObj, dataName, format, taxa_type, ismetafile)
  #input data
  mbSet<-Read16SAbundData(mbSet,paste0(file_location,"ibd_asv_table.txt"), format, taxa_type,is.normalized = "T");
  mbSet<-ReadSampleTable(mbSet,paste0(file_location,"ibd_meta.csv"));
  mbSet<-Read16STaxaTable(mbSet,paste0(file_location,"ibd_taxa.txt") );
  
  #processing
  mbSet<-SanityCheckData(mbSet, "text");
  mbSet<-PlotLibSizeView(mbSet, paste0(save_location,"norm_libsizes_0"),"png");
  mbSet<-CreatePhyloseqObj(mbSet, taxalabel = "text",taxa_type = taxa_type,isNormInput = "F")
  mbSet<-ApplyAbundanceFilter(mbSet, "prevalence", 4, 0.2);
  mbSet<-ApplyVarianceFilter(mbSet, "iqr", 0.1);
  mbSet<-PerformNormalization(mbSet, "none", "colsum", "none")
  
  
  #####Visual exploration
  ##Stacked bar/area plot
  print("stacked bar/area plot started")
  mbSet<-PlotTaxaAundanceBar(mbSet, paste0(save_location,"taxa_alpha_0"),"Genus","Class", "null", "barraw",10, "set3","sum",10, "bottom", "F", "png")
  print("stacked bar/area plot output created")
  
  # Interactive pie chart
  mbSet<-PlotOverallPieGraph(mbSet, "Phylum", 10,"sum", 10, "bottom")
  GetSeriesColors()
  mbSet<-SavePiechartImg(mbSet, "Phylum",paste0(save_location,"primary_piechart_0"),"png")
  print("Interactive pie chart output created")
  
  #Rarefaction Curve
  mbSet<-PlotRarefactionCurve(mbSet, "filt","Class","Class","Class","5",paste0(save_location,"rarefaction_curve_0"),"png");
  print("Rarefaction Curve output created")
  
  ###Community profiling
  #Alpha-diversity analysis
  mbSet<-PlotAlphaData(mbSet, "filt",  paste0(save_location,"alpha_diver_0"),"Chao1","Class","Genus", "default", "png");
  mbSet<-PlotAlphaBoxData(mbSet, paste0(save_location,"alpha_diverbox_0"),"Chao1","Class","default", "png");
  mbSet<-PerformAlphaDiversityComp(mbSet, "tt","Class");
  print("Alpha-diversity analysis output created")
  
  #Beta-diversity analysis
  mbSet<-PlotBetaDiversity(mbSet, paste0(save_location,"beta_diver_1"), ordmeth="PCoA", distName="bray", colopt="expfac", metadata="Class",
                           showlabel="none", taxrank="Genus", taxa="null", alphaopt="Chao1", ellopt="yes", format="png",
                           dpi=72, custom_col="viridis");
  
  mbSet<-PCoA3D.Anal(mbSet, "PCoA","bray","Genus","expfac","Class","g__Bacteroides","Chao1",paste0(save_location,"beta_diver3d_1.json")) 
  mbSet<-PerformCategoryComp(mbSet, "Genus", "adonis","bray","Class");
  print("Beta-diversity analysis output created")
  
  #Core microbiome analysis 
  mbSet<-CoreMicrobeAnalysis(mbSet, paste0(save_location,"core_micro_0"),0.2,0.01,"OTU","bwm","overview", "all_samples", "Class", "CD", "png");
  print("Core microbiome analysis output created")
  
  ###Clustering & correlation
  #Heat-map clustering
  mbSet<-PlotHeatmap(mbSet, paste0(save_location,'heatmap_0'),"euclidean","ward.D","bwm","Class","Genus","overview","F", "png","T","T","8.0","8.0","F");
  print("Heat-map clustering output created")
  
  #Dendrogram analysis
  mbSet<-PlotTreeGraph(mbSet,  paste0(save_location,'plot_tree_0'),"bray","ward.D","Class","Genus", "default", "png");
  print("Dendrogram analysis output created")
  
  #Pattern search
  mbSet<-FeatureCorrelation(mbSet, "pearson", "Genus", "g__Bacteroides")
  mbSet<-PlotCorr(mbSet, paste0(save_location,'ptn_0'),"png", width=NA)
  print("Pattern search output created")
  
  ###Comparison & classification
  
  #Univariate analysis and metagenomeSeq analysis (zero-inflated Gaussian fit)
  mbSet<-PerformUnivarTest(mbSet, "Class",0.05,"NA","Genus","tt")
  mbSet<-PerformMetagenomeSeqAnal(mbSet, "Class",0.05,"NA","Genus","zigfit")
  print("Univariate and metagenomeSeq analysis output created")
  
  #Random Forest
  
  mbSet<-RF.Anal(mbSet, 500,7,1,"Class","Genus")
  mbSet<-PlotRF.Classify(mbSet, 15, paste0(save_location,'rf_cls_0'),"png", width=NA)
  mbSet<-PlotRF.VIP(mbSet, 15, paste0(save_location,'rf_imp_0'),"png", width=NA)
  print("Random Forest output created")
  

  print("MDP Completed")
  
 
}
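
The processing steps in `processMdp()` filter and normalize the feature table before any plotting. Our reading of the arguments is an assumption, not documented behavior. We take `ApplyAbundanceFilter(mbSet, "prevalence", 4, 0.2)` to keep features with at least 4 counts in at least 20% of samples. We take `ApplyVarianceFilter(mbSet, "iqr", 0.1)` to drop the 10% of features with the lowest interquartile range. We take `"colsum"` in `PerformNormalization()` to scale each sample by its total count. The base-R sketch below mimics those three steps on a made-up matrix. The helper names are hypothetical, and this is not MicrobiomeAnalystR's own code.

```r
# Made-up count matrix: features (rows) x samples (columns).
counts <- matrix(c(10,  0,  5,  8, 12,
                    0,  0,  1,  0,  0,
                    4,  6,  9,  2,  7,
                   20, 25, 22, 21, 19),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU_", 1:4), paste0("S", 1:5)))

# Low-count filter (hypothetical helper): keep features that reach at least
# `min_count` reads in at least a fraction `min_prev` of the samples.
prevalence_filter <- function(x, min_count = 4, min_prev = 0.2) {
  keep <- rowMeans(x >= min_count) >= min_prev
  x[keep, , drop = FALSE]
}

# Low-variance filter (hypothetical helper): drop the fraction `frac` of features
# with the smallest interquartile range; rounding down with floor() is an assumption.
iqr_filter <- function(x, frac = 0.1) {
  iqrs <- apply(x, 1, IQR)
  n_drop <- floor(nrow(x) * frac)
  if (n_drop == 0) return(x)
  x[-order(iqrs)[seq_len(n_drop)], , drop = FALSE]
}

# Total-sum scaling: divide each sample (column) by its total count.
tss <- function(x) sweep(x, 2, colSums(x), "/")

filtered   <- iqr_filter(prevalence_filter(counts))
normalized <- tss(filtered)
round(normalized, 3)
```

Only three features remain after the prevalence filter, so `floor(3 * 0.1)` is zero and the IQR step removes nothing here. Note that this filter targets features whose abundance barely varies, not rare ones: with `frac = 0.5` it would drop `OTU_4`, the most abundant but most stable feature.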

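Chao1, the richness index passed to `PlotAlphaData()` and `PlotAlphaBoxData()` in `processMdp()`, estimates how many taxa went unseen from the number of taxa seen only once or twice. Below is a minimal base-R sketch of the bias-corrected form of the estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), applied to a made-up count vector. `chao1()` is a hypothetical helper. MicrobiomeAnalystR may compute Chao1 through a different implementation or formula variant.

```r
# Bias-corrected Chao1 richness for one sample (hypothetical helper):
#   S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
# S_obs = observed taxa, F1 = singletons, F2 = doubletons.
chao1 <- function(counts) {
  counts <- counts[counts > 0]   # only observed taxa count toward S_obs
  s_obs <- length(counts)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Made-up counts for one sample: 8 observed taxa, 3 singletons, 2 doubletons.
sample_counts <- c(12, 1, 1, 1, 2, 5, 0, 2, 30)
chao1(sample_counts)   # 8 + 3 * 2 / (2 * 3) = 9
```

Applied column-wise to a count matrix, for example `apply(counts, 2, chao1)`, the helper returns one value per sample. Because the estimator depends on exact singleton and doubleton counts, it only makes sense on raw integer counts, not on normalized proportions.
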
#### Setting up MicrobiomeAnalyst R SDP module

In [None]:
processSDP<- function()
{
  #file locations
  file_location<-"../MicrobiomeAR_Inputs/sdp/"
  save_location<-"./MicrobiomeAR_Outputs/sdp/"
  
  mbSet<-Init.mbSetObj()
  mbSet<-SetModuleType(mbSet, "sdp")
  mbSet<-ReadShotgunTabData(mbSet,paste0(file_location,"ko_mouse_sdp.csv"),"ko");
  mbSet<-ReadSampleTable(mbSet, paste0(file_location,"mouse_metadata_sdp.csv") );
  mbSet<-SanityCheckData(mbSet, "text");
  mbSet<-PlotLibSizeView(mbSet, paste0(save_location,"norm_libsizes_0"),"png");
  mbSet<-CreatePhyloseqObj(mbSet, "text","na","F")
  mbSet<-ApplyAbundanceFilter(mbSet, "prevalence", 4, 0.2);
  mbSet<-ApplyVarianceFilter(mbSet, "iqr", 0.1);
  mbSet<-PerformNormalization(mbSet, "none", "CSS", "none");
  
  #Clustering analysis
  #Heatmap clustering
  mbSet<-PlotHeatmap(mbSet, paste0(save_location,"heatmap_0"),"euclidean","ward.D","bwm","Age","OTU","overview","F", "png","T","T","8.0","8.0","F");
  
  #Dendrogram analysis
  mbSet<-PlotTreeGraph(mbSet, paste0(save_location,"plot_tree_0"),"bray","ward.D","Age","OTU", "default", "png");
  
  #PCA (2D plot plus 3D coordinates saved as JSON)
  mbSet<-PreparePCA4Shotgun(mbSet, paste0(save_location,"pca3d_0.json"),
                            paste0(save_location,"pca_2D_0"), "json", 1,2,3,"Age","none", "png")
  
  #Pattern search
  mbSet<-Match.Pattern(mbSet, "pearson", "1-2-3", "OTU", "Age")
  mbSet<-PlotCorr(mbSet,  paste0(save_location,"ptn_0"), "png", width=NA)

  #Differential analysis
  #Biomarker analysis (LEfSe)
  mbSet<-PerformLefseAnal(mbSet, 0.1, "fdr", 2.0, "Age","F","ko","OTU");
  mbSet<-PlotLEfSeSummary(mbSet, 15, "dot", paste0(save_location,"bar_graph_sdp_0"),"png");

  #Random Forest classification
  mbSet<-RF.Anal(mbSet, 500,7,1,"Age","OTU")
  mbSet<-PlotRF.Classify(mbSet, 15, paste0(save_location,"rf_cls_sdp_0"),"png", width=NA)
  mbSet<-PlotRF.VIP(mbSet, 15, paste0(save_location,"rf_imp_sdp_0"),"png", width=NA)
  
  
  print("SDP Completed")
  
}
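
`Match.Pattern(mbSet, "pearson", "1-2-3", "OTU", "Age")` searches for features whose abundance follows a predefined trend across the `Age` groups. We assume that the pattern string "1-2-3" assigns the template values 1, 2, and 3 to the three groups in level order, and that each feature is ranked by its Pearson correlation with that template. The sketch below reproduces that idea in base R on made-up data. `pattern_search()` and the group labels are hypothetical.

```r
# Made-up data: 3 features x 6 samples, plus a three-level grouping factor
# standing in for the "Age" metadata column.
age <- factor(c("young", "young", "middle", "middle", "old", "old"),
              levels = c("young", "middle", "old"))
toy <- rbind(rising  = c(1, 2, 4, 5, 8, 9),
             falling = c(9, 8, 5, 4, 2, 1),
             noisy   = c(5, 1, 6, 2, 5, 3))

# Hypothetical helper: turn a pattern string such as "1-2-3" into one template
# value per sample (via the group's level order), correlate each feature (row)
# with that template, and rank features from most positive to most negative.
pattern_search <- function(x, groups, pattern = "1-2-3", method = "pearson") {
  template_values <- as.numeric(strsplit(pattern, "-", fixed = TRUE)[[1]])
  stopifnot(length(template_values) == nlevels(groups))
  template <- template_values[as.integer(groups)]
  r <- apply(x, 1, function(feature) cor(feature, template, method = method))
  sort(r, decreasing = TRUE)
}

round(pattern_search(toy, age), 2)
```

Features that rise with age land at the top and falling features land at the bottom with strongly negative correlations. Passing a different pattern string, such as "3-2-1" or "1-3-1", would probe other trend shapes under the same assumption.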

#### Setting up MicrobiomeAnalyst R PPD module

In [None]:
processPPD<- function()
{
  #file locations
  file_location<-"../MicrobiomeAR_Inputs/ppd/"
  save_location<-"./MicrobiomeAR_Outputs/ppd/"
  
  
  mbSet<-Init.mbSetObj()
  mbSet<-SetModuleType(mbSet, "ppd")
  mbSet<-Read16SAbundData(mbSet, paste0(file_location,"atherosclerosis.txt"),"text","GreengenesID","T");
  mbSet<-ReadSampleTable(mbSet, paste0(file_location,"athero_sample.txt") );
  mbSet<-Read16STaxaTable(mbSet, paste0(file_location,"atherotax.txt"));
  mbSet<-SanityCheckData(mbSet, "text");
  
  mbSet<-PlotLibSizeView(mbSet, paste0(save_location,"norm_libsizes_0"),"png");
  mbSet<-CreatePhyloseqObj(mbSet, "text","GreengenesID","F")
  mbSet<-PerformRefDataMapping(mbSet, "costello_gut","GreengenesID","CLASS","Human Gut");
  mbSet<-PrepareMergedData(mbSet, "CLASS","perc_feat");
  mbSet<-PCoA3DAnal.16SRef(mbSet, paste0(save_location,"ppd_pcoa_0") ,"PCoA","bray","Genus", "CLASS")
  
  print("PPD Completed")
}
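
The PPD module places the user's samples alongside reference samples (here the `costello_gut` dataset) and ordinates them together. It uses PCoA on Bray-Curtis dissimilarities (`"PCoA","bray"`), the same pairing used for beta diversity in `processMdp()`. The base-R sketch below shows the idea on made-up genus-level tables. It keeps the genera shared by both tables, converts counts to relative abundances, computes Bray-Curtis, and runs classical multidimensional scaling with `cmdscale()`. The merging rule, the data, and the `bray_curtis()` helper are assumptions for illustration. The sketch does not reproduce how MicrobiomeAnalystR maps user data onto its reference collection.

```r
# Bray-Curtis dissimilarity between the columns (samples) of a non-negative matrix:
#   BC(i, j) = sum_k |x[k, i] - x[k, j]| / sum_k (x[k, i] + x[k, j])
bray_curtis <- function(x) {
  n <- ncol(x)
  d <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[, i] - x[, j])) / sum(x[, i] + x[, j])
    }
  }
  as.dist(d)
}

# Made-up genus-level counts for two "user" samples and three "reference" samples.
user <- matrix(c(30, 10, 5,
                 25, 15, 0),
               nrow = 3,
               dimnames = list(c("Bacteroides", "Prevotella", "Roseburia"),
                               c("U1", "U2")))
reference <- matrix(c( 5, 40, 2, 8,
                       2, 35, 4, 1,
                      28, 12, 6, 0),
                    nrow = 4,
                    dimnames = list(c("Bacteroides", "Prevotella", "Roseburia",
                                      "Faecalibacterium"),
                                    c("R1", "R2", "R3")))

# Keep only genera present in both tables, then place the samples side by side.
shared <- intersect(rownames(user), rownames(reference))
merged <- cbind(user[shared, , drop = FALSE], reference[shared, , drop = FALSE])

# Relative abundances per sample, Bray-Curtis, then classical MDS (PCoA).
rel  <- sweep(merged, 2, colSums(merged), "/")
pcoa <- cmdscale(bray_curtis(rel), k = 2, eig = TRUE)
round(pcoa$points, 3)
```

In this toy example, the user samples fall on the same side of the first axis as the Bacteroides-dominated reference sample `R3`. This is the kind of visual comparison PPD is meant to support. In practice, `vegan::vegdist(t(rel), method = "bray")` computes the same dissimilarity; vegan expects samples in rows, hence the transpose.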

Now that we have set up all three functions, we can run them.

In [None]:
processMdp()
processSDP()
processPPD()
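
After the three functions finish, you can confirm that each module actually wrote its figures by counting the files in each output folder. The base-R sketch below does this. `summarize_outputs()` is a hypothetical helper, and its default root assumes the `save_location` paths used in the functions above, resolved against the current working directory.

```r
# Hypothetical helper: count the files each module wrote under its output folder.
# The default root mirrors the `save_location` paths used in the functions above.
summarize_outputs <- function(root = "./MicrobiomeAR_Outputs",
                              modules = c("mdp", "sdp", "ppd")) {
  vapply(modules,
         function(m) length(list.files(file.path(root, m), recursive = TRUE)),
         integer(1))
}

summarize_outputs()
```

A count of zero for a module usually means that its function stopped early or wrote its files somewhere else. `list.files()` returns nothing for a folder that does not exist, so the helper reports zero instead of failing.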

Building on the microbiome diversity work from the previous submodule and related sequence data, you learned how to run a downstream analysis with MicrobiomeAnalystR. You used the SDP (Shotgun Data Profiling) module to identify gene/protein markers that differ significantly in abundance between two or more experimental conditions.