OMnalysis: An integrated web application to visualize and analyze differential quantitative data

OMnalysis is developed using R Shiny, flexdashboard and Bioconductor packages. It targets researchers who are new to RNA-seq and proteomics studies and who often depend on commercial vendors or core facilities to sequence and analyze their data. Exploring the resulting lists of identified genes and proteins is also tedious and challenging. Open-source R packages such as DESeq2 and edgeR are widely used to identify differentially expressed genes (DEGs) from count data, but filtering DEGs by FDR or fold change is still debated in the scientific community, which makes it difficult for young researchers to draw actionable insights. For transcriptomics, OMnalysis uses gene lists produced by edgeR's glmTreat with counts-per-million normalization, in which each over- or under-expressed gene in a treatment has a significance value (P-value) and a fold change. For label-free relative quantitative proteomics, the data must be an Excel file containing the columns UniProt ID, FDR-adjusted P-value and Fold Change, with one sheet per experimental condition, named Treatment1, Treatment2, Treatment3 and Treatment4.

The app is hosted on shinyapps.io: https://omnalysis.shinyapps.io/OMnalysis/

Instructions

You can run this app on your desktop after installing R version 4.0.3 and RStudio version 1.4.1106.
Once the environment is ready, install the packages required by OMnalysis.
Install the R Shiny supporting packages from CRAN:
install.packages(c("flexdashboard", "dplyr", "shiny", "shinydashboard", "DT", "tidyverse", "shinythemes", "tidyr", "gplots", "tibble", "gridExtra", "RColorBrewer", "slickR", "devtools", "ggbiplot", "factoextra", "ggplot2", "data.table", "VennDiagram", "fields", "wordcloud", "europepmc", "shinyjs", "futile.logger", "rio", "plyr"))
Install BiocManager, then use it to install the Bioconductor 3.12 packages (Bioconductor packages are installed with BiocManager::install(), not install.packages()):
install.packages("BiocManager")
BiocManager::install(version = "3.12")
BiocManager::install(c("AnnotationDbi", "Biobase", "BiocFileCache", "BiocGenerics", "BiocParallel", "BiocVersion", "biomaRt", "Biostrings", "clusterProfiler", "DO.db", "DOSE", "EnhancedVolcano", "enrichplot", "fgsea", "GO.db", "GOSemSim", "graph", "graphite", "IRanges", "KEGGgraph", "KEGGREST", "org.Bt.eg.db", "org.Gg.eg.db", "org.Hs.eg.db", "org.Ss.eg.db", "pathview", "qvalue", "reactome.db", "ReactomePA", "Rgraphviz", "S4Vectors", "XVector", "zlibbioc", "STRINGdb", "SPIA", "SBGNview"))
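After installation, it can help to confirm that every package loads before launching OMnalysis. The sketch below uses base R's requireNamespace(); missing_packages() is a hypothetical helper for illustration, not part of OMnalysis.

```r
# Sketch: report which of the given packages cannot be loaded.
# requireNamespace() returns FALSE (quietly) when a package is unavailable.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage with a subset of the packages listed above:
# missing_packages(c("shiny", "flexdashboard", "clusterProfiler", "STRINGdb"))
```

An empty result means all listed packages are available; any names returned can be passed back to install.packages() or BiocManager::install() as appropriate.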

DATA FORMAT TRANSCRIPTOMICS

Transcriptomics data must be uploaded as a comma-separated values (.csv) file.
The uploaded file must include a header row.
The first column must contain the biomarkers (for transcriptomics, Ensembl gene IDs).
The column names must be ENSEMBLEGENE, logFC, logCPM and Pvalue for every uploaded treatment of the transcriptomics study.
The data must be formatted as shown in the data screenshot below.
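The format rules above can be checked before uploading. This is a minimal base R sketch that assumes the column names exactly as listed; check_transcriptomics() is a hypothetical helper, and its checks (including the 0 to 1 range for Pvalue) are assumptions rather than the validation OMnalysis itself performs.

```r
# Sketch: check a transcriptomics table against the required format.
required_t <- c("ENSEMBLEGENE", "logFC", "logCPM", "Pvalue")

check_transcriptomics <- function(df) {
  problems <- character(0)
  missing <- setdiff(required_t, names(df))
  if (length(missing) > 0)
    problems <- c(problems, paste("missing column(s):", paste(missing, collapse = ", ")))
  if (length(names(df)) == 0 || names(df)[1] != "ENSEMBLEGENE")
    problems <- c(problems, "first column must be ENSEMBLEGENE")
  for (col in intersect(c("logFC", "logCPM", "Pvalue"), names(df))) {
    if (!is.numeric(df[[col]])) problems <- c(problems, paste(col, "must be numeric"))
  }
  if ("Pvalue" %in% names(df) && is.numeric(df$Pvalue) &&
      any(df$Pvalue < 0 | df$Pvalue > 1, na.rm = TRUE))
    problems <- c(problems, "Pvalue must lie between 0 and 1")
  problems  # an empty vector means the table looks valid
}

# header = TRUE corresponds to the header-row requirement above
example <- read.csv(text = "ENSEMBLEGENE,logFC,logCPM,Pvalue
ENSG00000141510,2.1,5.3,0.0004
ENSG00000012048,-1.7,4.8,0.02", header = TRUE)
check_transcriptomics(example)  # character(0)
```

For a real file, replace the inline text with read.csv("your_file.csv", header = TRUE).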

DATA FORMAT PROTEOMICS

Label-free quantitative proteomics data must be uploaded as an .xlsx file.
The uploaded file must include a header row.
The first column must contain the biomarkers (for proteomics, UniProt IDs).
The column names must be UniProt ID, FDR-adjusted P-value and Fold Change for every uploaded treatment of the proteomics study.
The data must be formatted as shown in the data screenshot below.
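A similar check can be run on the proteomics workbook once its sheets have been read into a named list of data frames. Reading the file (for example with rio::import_list(), since rio is among the dependencies) is an assumption and is not shown; check_proteomics() is a hypothetical helper.

```r
# Sketch: check proteomics sheets held in a named list of data frames.
required_p <- c("UniProt ID", "FDR-adjusted P-value", "Fold Change")
allowed_sheets <- paste0("Treatment", 1:4)

check_proteomics <- function(sheets) {
  problems <- character(0)
  bad_names <- setdiff(names(sheets), allowed_sheets)
  if (length(bad_names) > 0)
    problems <- c(problems, paste("unexpected sheet name(s):", paste(bad_names, collapse = ", ")))
  for (nm in names(sheets)) {
    missing <- setdiff(required_p, names(sheets[[nm]]))
    if (length(missing) > 0)
      problems <- c(problems, paste0(nm, ": missing ", paste(missing, collapse = ", ")))
  }
  problems  # an empty vector means the sheets look valid
}

# check.names = FALSE keeps the spaces and hyphens in the column names
t1 <- data.frame(`UniProt ID` = "P02754", `FDR-adjusted P-value` = 0.01,
                 `Fold Change` = 2.5, check.names = FALSE)
check_proteomics(list(Treatment1 = t1))  # character(0)
```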

UPLOAD DATA

You can explore the app's features with the Differentially expressed example data tab (transcriptomics analysis) and the Proteomics abundance example data tab (proteomics analysis).
In the side panel, the preloaded RNA-seq example data come from human brain microvascular endothelial cells challenged with Borrelia burgdorferi (Treatment1), Neisseria meningitidis (Treatment2), Streptococcus pneumoniae (Treatment3) and West Nile virus (Treatment4).
In the side panel, the preloaded label-free proteomics example data come from milk whey samples collected at four time points (36, 42, 57 and 81 hours) from cows with intramammary infection induced by Streptococcus uberis (Mudaliar et al., 2016).
Use the Upload Expression Data Browse tab to upload your own transcriptomics data in CSV format for up to four experimental conditions, each containing an expression value (log fold change), a measure of significance (P-value) and read abundance (log counts per million).
Use the second Upload Expression Data Browse tab to upload your own proteomics data as an xlsx file for up to four experimental conditions, with UniProt ID, FDR-adjusted P-value and Fold Change.
After uploading the data (ONLY ONE OMICS TYPE at a time), choose the species (Human, Chicken, Pig or Cow) from the Select a species drop-down to enable further analysis.
After selecting the species, use the ID conversion tab to convert the less informative Ensembl gene IDs or UniProt IDs into more informative identifiers.
Press the Submit button to convert the IDs and append the converted-ID column to the uploaded data.
Press the Download button to download the converted ID table in CSV format.
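Appending a converted-ID column amounts to a lookup that preserves the row order of the uploaded table. In Bioconductor such lookups are commonly done with AnnotationDbi::mapIds() and an organism package such as org.Hs.eg.db (both in the dependency list), but whether OMnalysis does it that way is an assumption. The sketch below replaces the annotation query with a small illustrative lookup table; add_converted_ids() and the unmapped placeholder ID are hypothetical.

```r
# Sketch: append a converted-ID column using a lookup table.
# The three-row table stands in for a real annotation query.
lookup <- data.frame(
  ENSEMBL = c("ENSG00000141510", "ENSG00000012048", "ENSG00000146648"),
  SYMBOL  = c("TP53", "BRCA1", "EGFR")
)

add_converted_ids <- function(df, lookup, id_col = "ENSEMBLEGENE") {
  # match() keeps the row order of df; IDs without a mapping become NA
  df$SYMBOL <- lookup$SYMBOL[match(df[[id_col]], lookup$ENSEMBL)]
  df
}

uploaded <- data.frame(ENSEMBLEGENE = c("ENSG00000146648", "UNMAPPED_ID"),
                       logFC = c(1.5, -0.8))
res <- add_converted_ids(uploaded, lookup)
res$SYMBOL  # "EGFR" for the first row, NA for the unmapped placeholder
```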

PCA

Principal component analysis (PCA) is an unsupervised dimension-reduction method that helps reveal the relationships among the attributes of expression data. PCA applies a linear transformation to the expression data to obtain orthogonal principal components: the first principal component (PC1) captures the largest share of the variance, the second (PC2) the largest share of the remaining variance, and so on.
Two types of PCA plots are provided. The Variable PCA plot shows the direction of and relationships among the variables (treatments); the Biplot PCA shows both the variables and the observations (genes or proteins) of each treatment in a single plot.
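To show where the variance shares and treatment arrows come from, this sketch runs base R's prcomp() on a random placeholder matrix with genes as rows (observations) and treatments as columns (variables). It is not the app's code; OMnalysis lists ggbiplot and factoextra among its dependencies, which presumably draw its plots.

```r
# Sketch: PCA on a gene-by-treatment matrix of log fold changes (random data).
set.seed(1)
logfc <- matrix(rnorm(400), ncol = 4,
                dimnames = list(paste0("gene", 1:100), paste0("Treatment", 1:4)))

pca <- prcomp(logfc, center = TRUE, scale. = TRUE)

# Share of variance per component: non-increasing and summing to 1
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Treatment loadings on PC1 and PC2: the arrows of a variable PCA plot
pca$rotation[, 1:2]

# Gene scores and treatment arrows together form a biplot;
# choices = c(2, 3) would compare PC2 vs PC3 instead.
# biplot(pca, choices = c(1, 2))
```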
Press Variable PCA or Biplot PCA to generate the transcriptomics variable PCA plot or biplot in the Plot visualization window.
Press Variable PCA or Biplot PCA to generate the proteomics variable PCA plot or biplot in the Plot visualization Proteomic window.
PC1 and PC2 usually explain most of the variance; to explore and compare other components, select a pair from the Compare PCA for Biplot only drop-down (PC1 vs PC2, PC2 vs PC3 and PC3 vs PC4).
Use the Format drop-down to choose the image format (jpeg, png, pdf or tiff).
Set the width, height and resolution of the output PCA plot using the numeric inputs. The defaults are 20 cm, 20 cm and 300 dpi, respectively.
Press the Download button to download the PCA plot.
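The download size is set in centimetres plus a resolution value. Assuming the resolution is in dots per inch (the usual meaning for raster images, and an assumption about how the app uses it), the pixel dimensions follow from 1 inch = 2.54 cm. save_plot_cm() is a hypothetical helper built on base R's png(), which accepts units = "cm" and res.

```r
# Sketch: convert a size in cm at a given dpi into pixels.
cm_to_px <- function(cm, dpi) round(cm / 2.54 * dpi)

cm_to_px(20, 300)  # 2362 pixels per side for 20 cm x 20 cm at 300 dpi

# Saving a base R plot with the same settings via png()
save_plot_cm <- function(file, plot_fun, width = 20, height = 20, res = 300) {
  png(file, width = width, height = height, units = "cm", res = res)
  on.exit(dev.off())
  plot_fun()
}
# save_plot_cm("pca.png", function() plot(1:10))
```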

Plots

The scatter plot shows log fold change against log counts per million to display the up- and down-regulated genes in your expression data. Each dot is a gene: up-regulated genes are green, non-significant genes black and down-regulated genes red.
Select a treatment from the Uploaded treatments checkboxes (for example, Treatment-1), then enter the Pvalue cutoff (used by both the scatter and volcano plots) and the FC-cutoff (used by the volcano plot). For transcriptomics, the default Pvalue cutoff is 0.001 and the default FC-cutoff is 1.2; both can be changed.
Enter a plot name in the Plot title field (for example, “Scatter plot of hbmec induced with WNV”).
Tick one of the Plots checkboxes to generate that plot type, or Table to view each treatment's data with converted IDs, in the Scatter plot, Volcano plot or Table output window.
Select the image format from the Format drop-down.
Set the width, height and resolution of the output plot using the numeric inputs. The defaults are 20 cm, 20 cm and 300 dpi, respectively.
Press the Download button to download the plot.
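The colour coding depends on how each gene is classified against the cutoffs. This sketch assumes logFC is a log2 fold change and that FC-cutoff is a linear fold change, so the threshold on |logFC| is log2(FC-cutoff); with fc_cutoff = 1 the rule reduces to the P-value alone, as described for the scatter plot. classify_genes() and its defaults mirror the text but are hypothetical, not the app's code.

```r
# Sketch: label genes as up, down or non-significant ("ns").
classify_genes <- function(logFC, pvalue, p_cutoff = 0.001, fc_cutoff = 1.2) {
  sig <- pvalue < p_cutoff & abs(logFC) >= log2(fc_cutoff)
  ifelse(sig & logFC > 0, "up", ifelse(sig & logFC < 0, "down", "ns"))
}

status_colours <- c(up = "green", ns = "black", down = "red")

genes <- data.frame(logFC  = c(2.0, -3.0, 0.1, 2.0),
                    logCPM = c(5.1, 3.2, 6.0, 1.4),
                    Pvalue = c(1e-5, 1e-4, 1e-6, 0.5))
genes$status <- classify_genes(genes$logFC, genes$Pvalue)
genes$status  # "up" "down" "ns" "ns"

# Scatter plot (logFC vs logCPM) and volcano plot (-log10 P vs logFC):
# plot(genes$logCPM, genes$logFC, col = status_colours[genes$status], pch = 19)
# plot(genes$logFC, -log10(genes$Pvalue), col = status_colours[genes$status], pch = 19)
```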

Statistical filtering

Use the Treatments uploaded checkboxes to select one treatment (Treatment-1, Treatment-2, Treatment-3 or Treatment-4); its data and plots are shown in the Filter data, Venn Diagram and Histogram windows.
Select the omics type (Transcriptomics or Proteomics) from the Omics Type drop-down.
Enter threshold values in the Statistical filtering input boxes to filter out genes that do not pass them. The default is 0 for every threshold (LogFC, LogCPM and Pvalue).
Once the filtered data appear in the Filter data window, use the Venn Diagram checkbox to view the significant DEGs or differentially abundant proteins shared among the uploaded treatments. Use the Split into Up and Down-regulated checkboxes to find common and distinct up- and down-regulated genes between two groups of uploaded treatments.
Set the width, height and resolution of the output Venn diagram and histogram using the numeric inputs. The defaults are 20 cm, 20 cm and 300 dpi, respectively.
Use the two Download buttons: the first downloads the Venn diagram and the second the histogram.
Tick the Histogram's Treatment checkbox together with one treatment checkbox to plot, for that treatment, the number of differentially expressed genes or proteins in each log fold change range. Tick All Treatment to plot the total number of up- and down-regulated genes or proteins in every treatment in a single histogram.
Use the Title field to give the diagram or histogram a title (for example, “Venn diagram or Histogram of Treatment-1”).
Select the image format of the Venn diagram and histogram (jpeg, png, pdf or tiff) from the Format drop-down.
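The filtering, Venn overlaps and per-range histogram counts can be sketched with base R set operations. The filtering rule here (keep genes with |logFC| at or above a minimum, logCPM at or above a minimum and Pvalue at or below a maximum) is an assumption, as are the helper name filter_genes(), the toy data and the histogram bin edges.

```r
# Sketch: threshold filtering, Venn regions and logFC-range counts.
filter_genes <- function(df, min_abs_logfc = 0, min_logcpm = -Inf, max_p = 1) {
  keep <- abs(df$logFC) >= min_abs_logfc & df$logCPM >= min_logcpm & df$Pvalue <= max_p
  df[keep, , drop = FALSE]
}

t1 <- data.frame(ENSEMBLEGENE = c("g1", "g2", "g3", "g4"),
                 logFC = c(2.0, -1.5, 0.2, 1.1), logCPM = c(5, 4, 6, 2),
                 Pvalue = c(0.001, 0.004, 0.2, 0.01))
t2 <- data.frame(ENSEMBLEGENE = c("g1", "g2", "g5"),
                 logFC = c(1.2, 1.8, -2.2), logCPM = c(5, 4, 3),
                 Pvalue = c(0.002, 0.03, 0.001))

f1 <- filter_genes(t1, min_abs_logfc = 1, max_p = 0.05)
f2 <- filter_genes(t2, min_abs_logfc = 1, max_p = 0.05)

# Venn regions for two treatments
shared  <- intersect(f1$ENSEMBLEGENE, f2$ENSEMBLEGENE)  # "g1" "g2"
only_t1 <- setdiff(f1$ENSEMBLEGENE, f2$ENSEMBLEGENE)    # "g4"

# Splitting into up-regulated genes first changes the overlap:
# g2 is down in Treatment-1 but up in Treatment-2
up_shared <- intersect(f1$ENSEMBLEGENE[f1$logFC > 0],
                       f2$ENSEMBLEGENE[f2$logFC > 0])   # "g1"

# Counts per log fold change range, as in a per-treatment histogram
hist_counts <- table(cut(f1$logFC, breaks = c(-Inf, -2, -1, 0, 1, 2, Inf)))
```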

Gene ontology (GO) enrichment analysis

Select the Omics Type that matches the uploaded data.
Use the Gene ontology classes checkboxes to choose the category for the analysis: GO Biological Process (processes carried out by multiple molecular activities), GO Molecular Function (molecular-level activities of gene products) or GO Cellular Component (the location, relative to cellular structures, where a gene product acts).
After selecting a GO class, enter the Pvalue cutoff and the q-value cutoff for ORA. Both default to 0 and can be adjusted.
Select an adjustment method from the pAdjust Method drop-down to control false positive results: Holm, Hochberg, Hommel, Bonferroni, Benjamini and Hochberg (BH), Benjamini and Yekutieli (BY) or FDR. The first four methods control the family-wise error rate (the probability of making one or more false discoveries); the remaining methods control the false discovery rate (FDR), the expected proportion of false discoveries among all rejected hypotheses. We suggest an FDR-controlling method for more reliable results.
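The difference between the family-wise and FDR methods is easy to see with base R's p.adjust(), which implements all of the listed methods under the same names (clusterProfiler's enrichment functions also take these names through their pAdjustMethod argument). The P-values below are made up for illustration.

```r
# Sketch: compare the listed P-value adjustment methods on made-up P-values.
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205)

methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr")
adjusted <- sapply(methods, function(m) p.adjust(p, method = m))
round(adjusted, 3)

# Number of hypotheses still significant at 0.05 under each method;
# FWER methods are typically the most conservative.
colSums(adjusted < 0.05)

# In p.adjust(), "fdr" is an alias for "BH", so these columns are identical.
```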
Select an Enrichment analysis method: GO ORA (over-representation analysis, based on a hypergeometric test of genes mapped to the annotated GO vocabulary) or GO GSEA (gene set enrichment analysis, based on a Kolmogorov-Smirnov-like statistic computed over genes ranked by their log fold change).
After entering the inputs, click the Go! button to launch the enrichment analysis. Note that the same input values are applied to all treatments to keep the enrichment analyses comparable.
The results of the enrichment analysis appear in the subtabs Ontology result-1, Ontology result-2, Ontology result-3 and Ontology result-4.
In each ontology result table, select exactly one row, choosing the same enriched GO term in every treatment. This selection is used to generate the heatmaps in the next section, GO heatmaps.
To download GO results in CSV format, tick the treatment checkbox and then click the Download button.
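The GO ORA option rests on a one-sided hypergeometric test: given how many background genes carry a GO term, how surprising is the number of DE genes carrying it? This base R sketch uses made-up counts and phyper(); it illustrates the test, not OMnalysis's exact call.

```r
# Sketch: one-sided hypergeometric test for a single GO term (made-up counts).
N <- 12000  # annotated background genes
M <- 150    # background genes annotated to the GO term
n <- 400    # differentially expressed (DE) genes submitted
k <- 15     # DE genes annotated to the GO term

# P(X >= k) when drawing n genes without replacement from the background
p_ora <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

expected <- n * M / N             # 5 term genes expected by chance
fold_enrichment <- k / expected   # 3
```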

GO heatmaps

You must first select the same enriched GO term in the GO enrichment result tables of at least two treatments.
Select GO ORA or GO GSEA under Heatmap visualization.
Tick at least two of the Treatment 1 to 4 checkboxes to generate the heatmap (for example, if you select Treatment-1 and Treatment-3, Treatment-2 is automatically included in the heatmap).
Adjust the col text size, row text size and col key size inputs. The defaults are 1.2, 1.2 and 0.04, respectively.
Enter the main heading for the heatmap in the Title for heatmap field.
Choose the image format for the heatmap and word cloud from the Format drop-down.
Set the dimensions and resolution of the output heatmap and word cloud with the Plot width, Plot height and Plot resolution inputs. The defaults are 20 cm, 20 cm and 300 dpi, respectively.
After entering these settings, press the Download button to download the heatmap.
Tick the Word cloud checkbox to generate a word cloud for the selected treatment.
Use the max words input to increase or decrease the number of enriched GO terms shown in the word cloud. The default is 100.
Press the second Download button to download the word cloud.

Pathway enrichment analysis

Select the Omics Type according to the uploaded data types.
Select a method under Select Pathway analysis Method: over-representation analysis (ORA), gene set enrichment analysis (GSEA), Network Topology Analysis (Human), ReactomePA (Human) or STRING.
Enter a numeric Pvalue cutoff and select a pAdjusted Method to reduce the chance of false positive results. Note that stringent cutoffs may yield fewer enriched pathways.
Network Topology Analysis (Human) is supported by four databases: biocarta (protein sets participating in pathways), panther (a curated, comprehensive database that classifies proteins and their genes by evolutionary relationship), the NCI-Nature Pathway Interaction Database (signaling pathways composed of human biomolecular interactions and cellular processes) and pharmgkb (a comprehensive resource on how human genetic variation affects responses to medications).
Once these inputs are set, click Go! to launch the pathway enrichment analysis. Click the subtabs Pathway result-2, Pathway result-3 and Pathway result-4 to perform the pathway enrichment analysis for those treatments.
In each pathway result table, select one row, choosing the same enriched pathway in all treatments. This selection is used to visualize the expressed genes or proteins in the next section, Enriched pathway visualization.
To download results in CSV format, tick the treatment under Pathway enrichment result Treatments and then click the Download button.
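The GSEA option asks whether a pathway's genes sit unusually high or low in a list ranked by log fold change. As a simplified illustration only, this sketch applies base R's two-sample Kolmogorov-Smirnov test to a simulated ranked list with a hypothetical pathway. It is NOT the weighted running-sum statistic or permutation scheme used by GSEA implementations such as fgsea.

```r
# Sketch: a simplified, GSEA-like comparison on simulated data.
set.seed(42)
all_genes <- paste0("g", 1:500)
logFC <- setNames(rnorm(500), all_genes)

# Hypothetical pathway: its 50 genes are shifted upward by 2 log units
pathway <- all_genes[1:50]
logFC[pathway] <- logFC[pathway] + 2

# Rank genes by log fold change, as GSEA does before walking down the list
ranked <- sort(logFC, decreasing = TRUE)
in_set <- names(ranked) %in% pathway

# Pathway genes concentrate near the top of the ranking
mean(which(in_set))

# Do pathway logFCs follow a different distribution from the other genes?
ks <- ks.test(ranked[in_set], ranked[!in_set])
ks$p.value
```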

Enriched pathway visualization

After selecting the same single pathway in every treatment's pathway enrichment result table, you can visualize it with the pathway visualization options that match the enrichment method used in the previous section.
Under Pathway visualization, tick one of Pathway ORA, Pathway GSEA, ReactomePA or STRING PPI. The output appears in the corresponding subtab on the right: ORA pathway output, GSEA pathway output, Reactome pathway output or STRING network.
The ORA pathway output subtab can only display a pathway if the previous enrichment analysis used ORA; otherwise it shows an error.
Use the three colour drop-downs to set the colours for highly expressed (Up), non-induced or absent (No sign) and suppressed (Down) genes or proteins.
The defaults are green for Up, grey for No sign and red for Down. Other combinations can be chosen from none, red, green, yellow, blue and grey to display expression values on the enriched pathway.
Select a treatment under Treatments uploaded (Treatment-1, Treatment-2, Treatment-3 or Treatment-4) to visualize its enriched pathway.
The Reactome pathway analysis output can be interpreted using the notation image below.
Keep in mind that pathway visualization depends on the enrichment method used and on selecting the same pathway in all treatments.
Press the Download button to download the ORA, GSEA, ReactomePA or STRING output as a PNG image.

Literature info

This section retrieves literature from Europe PMC using the terms entered in the Literature search field, such as biomarker names, species, disease, or cell or tissue type.
The Literature retrieval limit input sets how many articles are fetched. The default is 100.
After entering the search terms and retrieval limit, press the Submit button to retrieve the scientific literature.
The results appear as a table in the literature info subtab.
Select one row at a time in the literature info table to view its abstract and other details in the Abstract info subtab.
If the keywords entered in Literature search are incorrect, the search may return an error or an empty literature info table.
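Search terms have to be combined into a single query before they are sent to Europe PMC. The sketch below quotes each non-empty term as a phrase and joins the terms with AND; build_epmc_query() is a hypothetical helper, and this is one reasonable query style, not necessarily what OMnalysis sends. The commented call uses europepmc::epmc_search() from the dependency list, whose limit argument also defaults to 100.

```r
# Sketch: combine search terms into one Europe PMC query string.
build_epmc_query <- function(terms) {
  terms <- trimws(terms)
  terms <- terms[nzchar(terms)]  # drop empty inputs
  paste0('"', terms, '"', collapse = " AND ")
}

q <- build_epmc_query(c("TP53", "Homo sapiens", "meningitis", ""))
# q is: "TP53" AND "Homo sapiens" AND "meningitis"

# Requires the europepmc package and network access:
# hits <- europepmc::epmc_search(query = q, limit = 100)
```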

Punit201016/OMnalysis
