Using MetaNetwork

avcarr2 edited this page Apr 12, 2021 · 5 revisions

WGCNA Workflow

MetaNetwork requires three input files: a data file, an experimental groups file, and a UniProt database containing protein accessions, gene names, and protein names.

1. Data file format.

The first column of the data file must contain UniProt accessions. The columns that follow may contain any identifying information, notes, etc. In the vignette data folder (https://github.com/smith-chem-wisc/MetaNetwork/tree/master/Vignette%20Data), the ProstateCancerDataUpload.csv file contains the imputed, log2-transformed, normalized data to be analyzed. In the DATA FOR METANETWORK.csv file, the only non-data column is the first column, which contains the UniProt accession values.

All subsequent columns should contain protein intensity values from each experiment. Each column should have a unique column name.
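As a quick pre-upload check, the layout described above can be sketched in Python. This is only an illustration, not part of MetaNetwork: the function name, the `Accession`, `Sample_A`, and `Sample_B` column names, and the intensity values are made up for the example.

```python
import csv
import io

def check_data_header(csv_text):
    """Return the header row after checking that every column name is unique."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    duplicates = sorted({name for name in header if header.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate column names: {duplicates}")
    return header

# Hypothetical two-sample file: accession column first, then intensity columns.
example = "Accession,Sample_A,Sample_B\nP04637,21.3,20.8\nP38398,18.1,18.9\n"
header = check_data_header(example)
print(header)
```

A file that repeats a column name raises a `ValueError`, so the problem can be fixed before the upload.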

2. Experimental file format

The experimental file has two columns. The first column must be labeled "SampleID", and the second column must be labeled "Experiment". Each row of the SampleID column should contain the unique name of one intensity column from the Data file, and the matching row of the Experiment column describes that sample. In the example EXPERIMENTAL GROUPS FOR METANETWORK.csv file from the vignette folder, the rows of the SampleID column correspond to the column names in the ProstateCancerDataUpload.csv file, and the rows of the Experiment column give the corresponding tissue type.
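The correspondence between the two files can be checked before uploading. The sketch below is an assumption-laden illustration rather than MetaNetwork code: the function name, the sample names, and the "Tumor" and "Normal" labels are hypothetical; only the "SampleID" and "Experiment" headers come from the description above.

```python
import csv
import io

def unmatched_samples(data_header, n_non_data, groups_csv_text):
    """Compare SampleID values with the data file's intensity column names."""
    # Everything after the non-data columns is treated as an intensity column.
    intensity_columns = set(data_header[n_non_data:])
    rows = csv.DictReader(io.StringIO(groups_csv_text))
    sample_ids = {row["SampleID"] for row in rows}
    return {
        "missing_from_groups": sorted(intensity_columns - sample_ids),
        "missing_from_data": sorted(sample_ids - intensity_columns),
    }

# Hypothetical header and groups file, for illustration only.
data_header = ["Accession", "Sample_A", "Sample_B"]
groups = "SampleID,Experiment\nSample_A,Tumor\nSample_B,Normal\n"
report = unmatched_samples(data_header, 1, groups)
print(report)
```

Both lists in the report are empty when every intensity column has exactly one matching SampleID row.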

3. UniProt Database

There are two databases in the MetaNetwork repository: a reviewed M. musculus database and a reviewed H. sapiens database. If you are working with a different organism, you can upload your own database. Only the accession, gene name, and protein name columns of the UniProt database are required.

4. WGCNA parameters selection

All parameters and the WGCNA method are explained in depth in the paper describing MetaNetwork. For each parameter, we provide a brief summary, along with a description of how the parameter influences the final WGCNA workflow.

WGCNA Parameters

Scale-free cutoff. This parameter sets the minimum R^2 of the scale-free topology fit at which the network is considered scale-free. Higher Scale-free cutoff values require higher values of β to achieve scale-free topology. Increasing β reduces the overall number of connections between proteins. If there is no underlying scale-free topology in a data set, no tested value of β will reach the Scale-free cutoff, which indicates that some effect, either experimental or batch, is driving the data's correlation structure.
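To see why a larger soft-thresholding power (β) thins out the network, consider the standard WGCNA adjacency for an unsigned network, where the connection strength between two proteins is the absolute correlation raised to that power. This page does not state which network type MetaNetwork builds, so the unsigned form and the correlation values below are assumptions chosen for illustration.

```python
def adjacency(correlation, beta):
    """Unsigned soft-threshold adjacency: |correlation| raised to the power beta."""
    return abs(correlation) ** beta

# Hypothetical pairwise correlations: strong, moderate, weak.
correlations = [0.9, 0.5, 0.2]
for beta in (1, 6, 12):
    row = [round(adjacency(r, beta), 4) for r in correlations]
    print(beta, row)
```

Weak and moderate correlations shrink toward zero much faster than strong ones as the power grows, which is what reduces the number of effective connections.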

Module Merge Cut Height. To reduce the number of modules, MetaNetwork merges those that are closely correlated. To merge, MetaNetwork clusters the module eigenproteins based on their correlation. The height at which two modules branch corresponds to one minus their correlation coefficient. Setting the Module Merge Cut Height to 0.25 therefore means that modules correlated at 0.75 or higher are merged together. Decreasing the Module Merge Cut Height increases the number of modules. Modifying this parameter influences both the size and the number of modules.
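The relationship between cut height and correlation can be written as a one-line rule. The sketch below is a simplification that compares a single pair of modules; the actual merge is performed on a clustering tree of all module eigenproteins, and the function name is hypothetical.

```python
def should_merge(eigenprotein_correlation, cut_height):
    """Merge when the dissimilarity (1 - correlation) is at or below the cut height."""
    return (1 - eigenprotein_correlation) <= cut_height

# With a cut height of 0.25, modules correlated at 0.75 or higher merge.
print(should_merge(0.80, 0.25))
print(should_merge(0.70, 0.25))
# A lower cut height is stricter, so the same pair stays separate.
print(should_merge(0.80, 0.10))
```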

There are trade-offs inherent in the Module Merge Cut Height parameter, and the final value will depend heavily on the results you want. If you want a high-level idea of the broad changes in biological pathways between experimental conditions, then merging modules at higher values of Module Merge Cut Height will reduce the number of modules and module eigenproteins that you need to examine. If you are looking for specific pathways, then using lower values of Module Merge Cut Height will yield more modules, and therefore more chances that the specific pathway you are looking for will be easily identified in a module during GO enrichment. Higher Module Merge Cut Height values give up sensitivity in exchange for less time spent analyzing GO results.

Upper Power. This parameter is the maximum power, β, that will be tested by MetaNetwork when determining whether the networks are scale-free.

Minimum Module Size. Sets the minimum number of proteins a module may contain during clustering. Defaults to 20 proteins.

Non-data columns. Users should set this parameter so that the value matches the number of non-data columns in the uploaded Data file.

Check for automatic power selection. If selected, MetaNetwork will use the first power that achieves scale-free topology. If left unselected, the power used will be 12, which is sufficient for around 20 samples.
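The selection rule can be sketched as follows. This is an illustration of the behavior described above, not MetaNetwork's implementation: the function name and the R^2 values are hypothetical, and returning `None` when no tested power reaches the cutoff is an assumption, since this page does not say what MetaNetwork does in that case.

```python
def choose_power(r2_by_power, cutoff, upper_power, automatic=True, default_power=12):
    """Pick the first tested power whose scale-free fit R^2 reaches the cutoff."""
    if not automatic:
        # Automatic selection unchecked: fall back to the fixed default power.
        return default_power
    for power in sorted(r2_by_power):
        if power > upper_power:
            break
        if r2_by_power[power] >= cutoff:
            return power
    # Assumption: signal that no tested power achieved scale-free topology.
    return None

# Hypothetical scale-free fit results (power -> R^2), for illustration only.
fits = {1: 0.12, 2: 0.35, 4: 0.61, 6: 0.78, 8: 0.86, 10: 0.91}
print(choose_power(fits, cutoff=0.85, upper_power=10))
print(choose_power(fits, cutoff=0.95, upper_power=10))
print(choose_power(fits, cutoff=0.85, upper_power=10, automatic=False))
```

Raising the Scale-free cutoff pushes the chosen power upward, and a cutoff that no power up to Upper Power can reach leaves no valid choice.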

Advanced options

Enter how many threads to use. Allows users to control the number of threads MetaNetwork uses.

Enter custom power. Setting this parameter allows users to specify what β MetaNetwork should use.

5. GO Enrichment Workflow

After the WGCNA workflow is complete, the Results folder will contain an Excel workbook named ResultsWGCNA.xlsx. The GO enrichment analysis only requires uploading this file, selecting the organism, and clicking submit job. After completion of the workflow, another Excel workbook, AllezEnrichment.xlsx, will be written to the Results folder. This completes the GO Enrichment workflow.
