Class comparison. Worked examples and exercises
#### INPUT

#### STEPS

1. [Select your data](tutorial-expression.-class-comparison#select-your-data)
2. [Select the class to analyse](tutorial-expression.-class-comparison#select-the-class-to-analyse)
3. [Select test](tutorial-expression.-class-comparison#select-test)
4. [Choose multiple-test correction](tutorial-expression.-class-comparison#choose-multiple-test-correction)
5. [Define a threshold for adjusted p-value](tutorial-expression.-class-comparison#define-a-threshold-for-adjusted-p-value)
6. [Fill information job](tutorial-expression.-class-comparison#fill-information-job)
7. [Press *Launch job* button](tutorial-expression.-class-comparison#press-launch-job-button)

#### OUTPUT

- [Input parameters](tutorial-expression.-class-comparison#input-parameters)
- [Output files](tutorial-expression.-class-comparison#output-files)
- [Significant results](tutorial-expression.-class-comparison#significant-results)
- [Continue processing](tutorial-expression.-class-comparison#continue-processing)

**[Worked examples and exercises](tutorial-expression.-class-comparison#worked-examples-and-exercises)**
### INPUT

##### Input data

Input data should be a raw counts matrix uploaded as the data type *Data matrix expression*. See data types [here](Data Types).

##### Online example

Here you can load a small dataset from our server and use it to run this example and see how the tool works. Click on the link to load the data: correlation.txt.
### STEPS

##### Select your data

The first step is to select the data you want to analyze.

##### Select the class to analyse

- This variable depends on the experimental design.
- You can select all or only some values of the class variable. If you do not want to use a value of the class variable, click *none* for that value.
##### Select test

Select the test that you want to perform:

- One-class: limma.
- Two-classes: t-test, limma, fold-change.
- Multi-classes: ANOVA, limma.

See the [Differential Expression](Differential Expression for arrays) section for detailed information about the methods.
##### Choose multiple-test correction

- Several methods are implemented to adjust p-values for multiple statistical tests. This adjustment of significance is needed when many genes are tested in the same analysis.
- You can choose between FDR (False Discovery Rate), Holm, Hochberg, Bonferroni and BY (Benjamini and Yekutieli).
- See the [Differential Expression](Differential Expression for arrays) section for detailed information about the methods.
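
To make the idea of an adjusted p-value concrete, here is a minimal pure-Python sketch of two of the corrections listed above (Bonferroni and the Benjamini-Hochberg FDR). It is an illustration only, not the code Babelomics runs; the gene names, raw p-values and the 0.05 threshold are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039, "geneD": 0.041, "geneE": 0.6}
names = list(raw)
fdr = dict(zip(names, benjamini_hochberg([raw[g] for g in names])))

threshold = 0.05  # hypothetical adjusted p-value threshold
significant = [g for g in names if fdr[g] <= threshold]
print(significant)  # ['geneA', 'geneB']
```

Note that geneC has a raw p-value below 0.05 but an adjusted p-value above it, which is why the threshold in the next step is applied to adjusted values.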
##### Define a threshold for adjusted p-value

Choose a threshold for the adjusted p-value between 0 and 1.
##### Fill information job

- Select the output folder.
- Choose a job name.
- Specify a description for the job if desired.

##### Press Launch job button

Press the *Launch job* button and wait until the analysis is finished. A normal job takes approximately a few minutes, but the time may vary depending on the size of the data. Check the state of your job by clicking the *jobs* button at the top right of the panel menu. A box will appear on the right of the web browser with all your jobs. When the analysis is finished, you will see the label "Ready". Click on it and you will be redirected to the results page.

### OUTPUT

#### Input parameters

In this section you will find a reminder of the parameters or settings you used to run the analysis.
#### Output files

- In the output file link you will find a text file with the results of the analysis for all genes. In general this file has a first column of gene identifiers followed by columns with the estimated statistics, their p-values, both raw and corrected (see the multiple testing section), and some other results. Since each statistical method reports different parameters, the exact layout of the results file depends on the method that you applied to your data.
- Genes in the results file are ordered in a way intended to be statistically meaningful for the method used in the analysis, and as useful as possible for the biological interpretation of the results.
#### Significant results

- List of genes and heatmap including only significant results.
- For any analysis you run, we provide a grid image representing your data. Each gene is represented in a row and each condition or array is represented in a column. High-intensity measurements of gene expression are shown in red, while lower measurements are shown in blue.
- Genes are sorted according to their expression patterns, in the same order as in the output file. Experimental conditions or arrays are ordered depending on their labels.
- When studying differential expression under two or more conditions, arrays are sorted by class. The first class on the left is the one that appears first in the selected values of the class variable. The second class from the left is the second one to appear in the selected values of the class variable.
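
The column ordering described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the Babelomics implementation; the function name `order_arrays`, the sample names and the class labels are hypothetical, and dropping arrays whose class was not selected is an assumption.

```python
def order_arrays(labels, selected_values):
    """Return array names sorted by the position of their class in selected_values."""
    position = {value: i for i, value in enumerate(selected_values)}
    # Assumption: arrays whose class was not selected are left out of the heatmap.
    kept = [name for name, cls in labels.items() if cls in position]
    # sorted() is stable, so arrays keep their original order within each class.
    return sorted(kept, key=lambda name: position[labels[name]])


# Hypothetical class labels for five arrays.
labels = {"s1": "luminal", "s2": "basal", "s3": "apocrine", "s4": "basal", "s5": "luminal"}
columns = order_arrays(labels, ["basal", "luminal"])
print(columns)  # ['s2', 's4', 's1', 's5']
```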
#### Continue processing

From this section you can send your results directly to other tools in Babelomics.
### Worked examples and exercises
#### Worked examples

##### Example 1. Rheumatoid Arthritis and Osteoarthritis Study
Download the Rheumatoid Arthritis dataset (arthritis_rma.txt). Open the file with a text editor and see what it looks like. These data correspond to 15 samples from two conditions: disease and control.

The Affymetrix platform (GeneChip Human Genome U95A Array) was used for the hybridization. The data presented here have been normalized using the RMA methodology implemented in the Babelomics normalization tools.

The original data, including .CEL files and information about the samples, can be downloaded from GEO. They correspond to the series http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1919.

We want to analyze differential gene expression between disease and control. For this kind of analysis you can use the two-class section of the Babelomics Differential Gene Expression module.
Two steps:

1. Upload data. You can upload data at http://babelomics.bioinfo.cipf.es/upload.html. The data type is "Data matrix expression".
2. Go to the two-classes section of the class comparison tool (http://babelomics.bioinfo.cipf.es/expression.html#classcomparison) and choose:
   - select your data: arthritis_rma.txt.
   - select the class to analyze: arthritis (this variable differentiates two groups: 0 is control, 1 is disease).
   - select test: ttest.
   - select multiple test correction: FDR.
   - select adj. p-value: 0.005.
   - Name your job and run it!
You will get a graph like this:
[Figure: heatmap of the t-test results (arthritis_heat_ttest.png)]
This option performs, for each gene, a t-test for the difference in mean expression between the two groups of arrays. T-statistics and p-values are reported.

In the output file, as well as in the image, genes are ranked according to the t-statistic. Genes at the top of the results list are those more expressed in group 0 (control). Genes at the bottom of the list are those more expressed in group 1 (disease).
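
The link between the sign of the t-statistic and the ranking can be illustrated with a small Python sketch. It assumes the statistic is computed as mean of group 0 minus mean of group 1, in Welch's unequal-variance form, and that genes are sorted from the highest to the lowest value; the variant Babelomics actually uses may differ. The gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group0, group1):
    """Welch t-statistic for mean(group0) - mean(group1)."""
    se = sqrt(variance(group0) / len(group0) + variance(group1) / len(group1))
    return (mean(group0) - mean(group1)) / se


# Hypothetical log-expression values: (group 0 = control, group 1 = disease).
expression = {
    "gene_high_in_disease": ([4.0, 4.1, 3.9], [6.5, 6.4, 6.6]),
    "gene_unchanged": ([5.0, 5.2, 4.9], [5.1, 5.0, 5.05]),
    "gene_high_in_control": ([8.0, 8.2, 7.9], [6.0, 6.1, 5.8]),
}

# Positive t: more expressed in group 0; negative t: more expressed in group 1.
t_stats = {gene: welch_t(g0, g1) for gene, (g0, g1) in expression.items()}
ranked = sorted(t_stats, key=t_stats.get, reverse=True)
print(ranked)
# ['gene_high_in_control', 'gene_unchanged', 'gene_high_in_disease']
```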
##### Example 2. Molecular Apocrine Breast Tumor

Download the Molecular Apocrine Breast Tumor dataset (apocrine_rma.txt). Open the file with a text editor and see what it looks like. These data correspond to 49 tumors of breast cancer patients. The tumors are classified into **3 classes**: apocrine, basal and luminal.

The Affymetrix platform (GeneChip Human Genome U133 Array Set HG-U133A) was used for the hybridization. The data presented here have been normalized using the RMA methodology implemented in the Babelomics normalization tools.

The original data, including .CEL files and information about the samples, can be downloaded from GEO. They correspond to the series http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1561.

Imagine that we are now interested in finding the genes whose expression pattern is most heterogeneous across the three tumor classes. Such genes could be involved in the processes that differentiate the behavior of the tumors, and therefore be of clinical interest.

For this kind of analysis you can use the multi-class section of the Babelomics Differential Gene Expression module.

If you upload the data to Babelomics and use the multi-class section of the class comparison tool (http://babelomics.bioinfo.cipf.es/expression.html#classcomparison), choosing the anova method, you will get a graph like this:
[Figure: heatmap of the ANOVA results (workedexample2_heatmap_anova.png)]
As in the two-class analysis, rows of the grid represent genes and columns represent arrays. The columns on the right of the table show the estimates of the F-statistic and the adjusted p-values (FDR).

Genes are ranked according to the significance of the differential expression among groups. Genes at the top of the table are those with the most differentiated pattern across tumors. Genes at the bottom of the table are those showing the least difference between tumors. If you open the results file you will see that genes are ranked in the same way as in the graph. In this case the Babelomics Differential Gene Expression module arranges the genes from the most differentially expressed among the classes to those not differentially expressed among them.
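
As a rough illustration of what the F-statistic measures, the following Python sketch computes a one-way ANOVA F-statistic per gene and ranks genes by it. It is not the Babelomics implementation: the gene names and values are hypothetical, and ranking by F matches ranking by significance only because every gene here has the same group sizes.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of values."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression values per class: apocrine, basal, luminal.
genes = {
    "gene_flat": [[5.0, 5.2, 4.8], [5.1, 4.9, 5.0], [5.0, 5.1, 4.9]],
    "gene_varied": [[2.0, 2.1, 1.9], [5.0, 5.1, 4.9], [8.0, 8.1, 7.9]],
}

# Most differentiated pattern across classes first.
ranked = sorted(genes, key=lambda g: anova_f(genes[g]), reverse=True)
print(ranked)  # ['gene_varied', 'gene_flat']
```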
#### Exercises

##### Exercise 1. Aged Muscle Dataset

- Download the data file trained_sedentary_rma.txt. Open the file with a text editor and see what it looks like.
- These data correspond to 12 samples of human tissue. They were taken from 6 healthy men, aged from 56 to 76, before and after 3 months of physical training.
- The Affymetrix platform (GeneChip Human Genome U133 Array Set HG-U133A) was used for the hybridization. The data presented here have been normalized using RMA as implemented in the Babelomics Affymetrix Normalization Tools.
- The original data, including .CEL files and information about the samples, can be downloaded from the NCBI repository Gene Expression Omnibus. They correspond to the series GSE1786.
Questions

1. Use the data of this example to compare the samples of sedentary men with the trained ones. Use the t-test method and look at the results file.
   - Can you find any gene differentially expressed between the two groups?
   - What is the implication of the sign of the estimate of the t-statistic?
   - What is the relationship among the estimate of the statistic, the p-values and the adjusted p-values?
2. Use the limma method to perform the same analysis. (Input parameters. Adj. p-value: 0.05 and multiple-test correction: FDR)
   - Compare the rankings of the genes given by the different methods. Are they very different?
   - How many significant genes do you have?
   - Change the adj. p-value to 0.1, using "Other actions/Open input form" from the form. Now, how many significant genes do you have?
   - Redirect some of the results to FatiGO or FatiScan (the only parameters you need to select are the "Homo sapiens" label in the Organism box and a biological database, for example "GO Biological Process"). Don't worry! We will see these interesting functional tools in the next session.
3. Suppose that we are interested in finding genes whose expression is higher or lower in elderly men than in young men.
   - What tool do you have to use in Babelomics to evaluate the relationship between the expression of the genes and the age of the men?
   - Use different tests. How are the genes arranged in the heatmap?
##### Exercise 2. Molecular Apocrine Breast Tumor Dataset

- This is the dataset that we used in worked example 2.

Questions

1. Previously we used the anova test to evaluate gene expression between experimental conditions. Repeat the analysis using the limma test. (Input parameters. Adj. p-value: 0.01, multiple test correction: FDR)
   - How are the genes arranged in the heatmap?
   - How many significant genes do you have?
   - Can you indicate the ten most differentially expressed genes? What does this mean in our experimental context?
2. Now we are interested in comparing these conditions in pairs. (Input parameters: adj. p-value: 0.01, multiple test correction: FDR)
   - Compare the basal tumors with the luminal ones. How many genes do you have?
   - Can you indicate the ten genes most under-expressed in luminal? And the gene most over-expressed in luminal?
   - Compare the basal tumors with the apocrine ones. How many genes do you have?
- Compare the basal tumors with the apocrine ones. How many genes do you have?
Find the Babelomics suite at http://babelomics.org