Detailed example of analysis of expression data in Babelomics: from raw data to expression differential and functional profiling
Introduction
We have read an article about rheumatoid arthritis. It indicates that the etiology of this inflammatory disease is unknown, and that characterizing it at the molecular level could reveal key mechanisms to improve its prevention and treatment.
To do this, the authors designed an experiment using expression arrays, distinguishing three groups:
- Patients with rheumatoid arthritis.
- Patients with osteoarthritis.
- Healthy people.
We will use the same expression array data as the authors and reproduce the analysis of the microarray data.
This is the article (no need to download it): Molecular signatures and new candidates to target the pathogenesis of rheumatoid arthritis. Physiol Genomics 2010 November 29, 42A(4): 267-82.
The raw data for the study are in the Gene Expression Omnibus (GEO) repository, under the code GSE1919. Can you find and download these data? This is a good way to get familiar with GEO! In any case, the raw data can also be downloaded from this link. You need to download a compressed archive called GSE1919_RAW.tar (it takes a while, because the file is 45.9 MB).
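Once the archive is downloaded, it can help to check its contents before uploading it. The following is a minimal sketch using Python's standard tarfile module; the helper name list_cel_members is hypothetical, and the assumption that the array files end in .CEL or .CEL.gz is ours, not something stated by GEO or Babelomics.

```python
import tarfile

def list_cel_members(tar_path):
    """Return the sorted names of CEL files found inside a tar archive.

    Assumption: array files end in .CEL or .CEL.gz (any letter case).
    """
    with tarfile.open(tar_path) as archive:
        names = archive.getnames()
    suffixes = (".cel", ".cel.gz")
    return sorted(name for name in names if name.lower().endswith(suffixes))

# Example (assumes GSE1919_RAW.tar is in the working directory):
# for name in list_cel_members("GSE1919_RAW.tar"):
#     print(name)
```

If everything went well, the list should contain one file per array in the study.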
Goals
Perform a bioinformatic analysis of expression data with Babelomics to find which genes are differentially expressed between the groups of subjects listed above. We are also interested in the functional interpretation of the results.
Data
We have data for 15 subjects, 5 for each of the defined groups: normal, osteoarthritis and rheumatoid arthritis. The following table shows the group for each sample (microarray). The array platform was Affymetrix:
| filename | id | samplename | CLASS | disease |
|---|---|---|---|---|
| GSM34379.CEL | GSM34379 | ND_1 | ND | normal_donor |
| GSM34383.CEL | GSM34383 | ND_2 | ND | normal_donor |
| GSM34385.CEL | GSM34385 | ND_3 | ND | normal_donor |
| GSM34388.CEL | GSM34388 | ND_4 | ND | normal_donor |
| GSM34391.CEL | GSM34391 | ND_5 | ND | normal_donor |
| GSM34393.CEL | GSM34393 | OA_A | OA | osteoarthritis |
| GSM34394.CEL | GSM34394 | OA_B | OA | osteoarthritis |
| GSM34395.CEL | GSM34395 | OA_x | OA | osteoarthritis |
| GSM34396.CEL | GSM34396 | OA_y | OA | osteoarthritis |
| GSM34397.CEL | GSM34397 | OA_z | OA | osteoarthritis |
| GSM34398.CEL | GSM34398 | RA_A | RA | rheumatoid_arthritis |
| GSM34399.CEL | GSM34399 | RA_B | RA | rheumatoid_arthritis |
| GSM34400.CEL | GSM34400 | RA_x | RA | rheumatoid_arthritis |
| GSM34401.CEL | GSM34401 | RA_y | RA | rheumatoid_arthritis |
| GSM34402.CEL | GSM34402 | RA_z | RA | rheumatoid_arthritis |
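The same sample annotation can be kept in a small script, which makes it easy to check that the design is balanced before defining the groups. This is only a sketch: the GSM identifiers and class labels come from the table above, while the function names and the tab-separated layout are our own choices, not a format required by Babelomics.

```python
from collections import Counter

# Class code -> disease label, as in the table above.
CLASSES = {"ND": "normal_donor", "OA": "osteoarthritis", "RA": "rheumatoid_arthritis"}

# (GEO sample id, class code) for the 15 arrays of GSE1919.
SAMPLES = [
    ("GSM34379", "ND"), ("GSM34383", "ND"), ("GSM34385", "ND"),
    ("GSM34388", "ND"), ("GSM34391", "ND"),
    ("GSM34393", "OA"), ("GSM34394", "OA"), ("GSM34395", "OA"),
    ("GSM34396", "OA"), ("GSM34397", "OA"),
    ("GSM34398", "RA"), ("GSM34399", "RA"), ("GSM34400", "RA"),
    ("GSM34401", "RA"), ("GSM34402", "RA"),
]

def phenotype_rows(samples):
    """Build one annotation record per array."""
    return [
        {"filename": f"{gsm}.CEL", "id": gsm, "CLASS": cls, "disease": CLASSES[cls]}
        for gsm, cls in samples
    ]

def to_tsv(rows):
    """Render the records as tab-separated text with a header line."""
    columns = ["filename", "id", "CLASS", "disease"]
    lines = ["\t".join(columns)]
    lines += ["\t".join(row[c] for c in columns) for row in rows]
    return "\n".join(lines)

rows = phenotype_rows(SAMPLES)
class_counts = Counter(row["CLASS"] for row in rows)
print(to_tsv(rows))
print(dict(class_counts))
```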
What's the work plan in Babelomics?
- We create a user in Babelomics (to save all jobs).
- Three steps, each with its own processes:
  - Processing
  - Differential Expression Analysis
  - Functional Profiling
1. Processing
- We upload the raw data to Babelomics. We have a compressed folder containing all the samples, downloaded from the GEO repository, and we can load it in Babelomics from Upload data. Note that the data are from Affymetrix. The tutorial has some hints on how to upload data.
- To remove systematic variation arising from the measurement process and to obtain a common scale for all chips, we normalize the data from Processing / Normalization / Microarray One Channel / Affymetrix. We keep the default parameters. In our experiment, would it make sense to normalize the data in several separate groups, or is it better to normalize all arrays at once in a single normalization process?
- So far the tool does not know which group each array belongs to. We have to specify this from Processing / Edit. There, we select the normalized data and create a new variable called class (for example) with three categories: osteo, reu and normal. More information is available on how to edit variables.
- Before beginning the differential expression analysis, we would like to explore the data again now that they are normalized. A priori, we expect the patients in each of the three groups to show a consistent expression pattern within their group. To verify this, we run a clustering analysis:
- Clustering. From Expression / Clustering, we select our normalized data, keep the default clustering parameters and select the SOTA method. More detailed information on clustering methods is available in this link. Are the samples grouped as we expected before clustering? Interpret these results.
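To think about the normalization question above (separate groups or all arrays at once), it helps to see what a typical normalization does. The sketch below implements plain quantile normalization, a step commonly used for one-channel arrays; whether the Babelomics default for Affymetrix does exactly this is an assumption on our part, and the toy matrix is invented for illustration.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as rows of genes by columns of arrays.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Simplification: ties within an array are broken by
    position instead of being averaged.
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_arrays)]
    sorted_columns = [sorted(col) for col in columns]
    # Mean of the i-th smallest value over all arrays.
    rank_means = [
        sum(col[i] for col in sorted_columns) / n_arrays for i in range(n_genes)
    ]
    normalized = [[0.0] * n_arrays for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_index in enumerate(order):
            normalized[gene_index][j] = rank_means[rank]
    return normalized

# Toy data: 3 genes (rows) measured on 3 arrays (columns).
toy = [[5, 4, 3],
       [2, 1, 4],
       [3, 6, 8]]
toy_normalized = quantile_normalize(toy)
print(toy_normalized)
```

After this step every array has the same distribution of values. Because the reference distribution is built from all the arrays passed in, normalizing the groups separately would give each group a different reference.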
Some more questions:
- Imagine this case: samples belonging to the same group share a common expression pattern (they are close in the clustering tree), and the pattern differs between groups (the groups are distant in the clustering tree). Do you think you would get a large number of differentially expressed genes?
- If we detect a sample whose pattern differs from the other samples in its group, what strategy do you propose?
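A quick way to reason about these questions outside Babelomics is to look at the correlation between sample profiles. The sketch below is not SOTA; it swaps in a much simpler check, finding for each sample the closest other sample by correlation distance (1 minus the Pearson correlation). The profiles are invented toy values and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def nearest_neighbours(profiles):
    """Map each sample to the other sample with the smallest correlation distance."""
    nearest = {}
    for sample, values in profiles.items():
        distances = [
            (1 - pearson(values, other_values), other)
            for other, other_values in profiles.items()
            if other != sample
        ]
        nearest[sample] = min(distances)[1]
    return nearest

# Toy expression profiles (4 genes per sample); values are invented.
toy_profiles = {
    "ND_1": [1.0, 2.0, 3.0, 4.0],
    "ND_2": [1.1, 2.1, 2.9, 4.2],
    "RA_A": [4.0, 3.0, 2.0, 1.0],
    "RA_B": [4.2, 2.9, 2.1, 1.0],
}
neighbours = nearest_neighbours(toy_profiles)
print(neighbours)
```

A sample whose nearest neighbour belongs to a different group is a candidate for closer inspection.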
2. Differential expression analysis
The data are already normalized and ready for differential expression analysis. We would like to find the genes that are differentially expressed in the following comparisons:
- Osteoarthritis vs. Normal
- Rheumatoid arthritis vs. Normal
- Osteoarthritis vs. Rheumatoid arthritis
Which method we use in Babelomics depends on the experimental design. For this example we compare groups, so we choose the Expression / Differential Expression / Microarray / Class comparison option. It is important to indicate which variable defines the groups (Select the class to analyse). Babelomics automatically detects the number of categories and offers the analysis methods suited to this variable. For example, with two categories there are several options: t-test, limma or fold-change. With three categories: anova or limma.
As we work with pairwise comparisons, Babelomics displays the 3 categories (normal, osteo and reu), and we have to set one of them to none using its dropdown.
Important: for the comparison Osteoarthritis vs. Normal, the first category shown in the dialog box should be Osteoarthritis and the second one Normal. We choose limma as the type of test.
For each of these three comparisons, launch a job to obtain the corresponding differential expression results. Leave the parameters at their default values.
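To make the two-category options above more concrete, here is a minimal sketch of what a fold-change and a t statistic measure for a single gene. It assumes the normalized values are on a log2 scale and uses the Welch (unequal variance) form of the t statistic; both are our assumptions, and the exact variant Babelomics applies may differ. limma goes further by moderating the per-gene variance using information from all genes, which this sketch does not attempt.

```python
from math import sqrt
from statistics import mean, variance

def log2_fold_change(case, control):
    """Difference of group means; assumes values are already on a log2 scale."""
    return mean(case) - mean(control)

def welch_t(case, control):
    """Welch t statistic: mean difference divided by its standard error."""
    standard_error = sqrt(
        variance(case) / len(case) + variance(control) / len(control)
    )
    return (mean(case) - mean(control)) / standard_error

# Toy log2 expression values for one gene (invented numbers).
osteo = [8.0, 9.0, 10.0]
normal = [5.0, 6.0, 7.0]
print(log2_fold_change(osteo, normal), welch_t(osteo, normal))
```

Note that the sign of both quantities depends on which group is listed first, which is why the order of the categories in the form matters.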
Questions
A. In the comparison Osteoarthritis vs. Normal using the limma test:
- How many differentially expressed genes are detected?
- How are the genes sorted in the heatmap?
- What is the relationship between the statistic, the p-value and the adj.p-value?
- Change the adjusted p-value threshold and now work with adj.p-value < 0.1. How many differentially expressed genes are detected?
- Analyze the difference in expression between the two groups again, this time using the t-test method. How many genes are detected as differentially expressed now?
- Open the text file with the significant results (Significant results / Statistic table values). Do you know what each of the statistical indicators means?
- Are the differentially expressed genes the same with both methods? There are several ways to quantify this intersection; all of them use the txt files with the significant genes from both jobs (found in the Significant results section):
  - Download both text files and read them into R. The function intersect returns the common genes.
  - Another option for intersecting two sets of identifiers (genes, miRNAs, ...) is Venny, a very useful and easy-to-use tool for assessing intersections. Open each of the two text files with a spreadsheet, select the gene ids from the first file and paste them into Venny, do the same for the second file, and determine the number of genes that are differentially expressed in both scenarios.
- Why do the two methods give different results?
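For the question about the p-value and the adj.p-value, the sketch below shows one widely used adjustment, the Benjamini-Hochberg false discovery rate procedure. We assume this is the kind of correction behind adj.p-value; the source does not say which method Babelomics applies, so treat it as an illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

The adjusted value is never smaller than the raw p-value, so fewer genes pass a given threshold after adjustment, and relaxing the threshold from 0.05 to 0.1 can only add genes to the list.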
B. Can you reproduce the above process for the other comparisons?
C. Using the limma method, which genes are differentially expressed in both comparisons: Osteoarthritis vs. Normal and Rheumatoid arthritis vs. Normal?
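The intersections asked for above (limma vs. t-test, or two different comparisons) can also be computed with a few lines of Python instead of R or Venny. This sketch assumes the result files are tab-separated, with the gene identifier in the first column and any header or comment lines starting with #; that layout and the function names are our assumptions, so check them against your downloaded files.

```python
import csv

def read_gene_ids(path, id_column=0):
    """Read gene identifiers from a tab-separated file, skipping '#' lines."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return {
            row[id_column]
            for row in reader
            if row and not row[0].startswith("#")
        }

def compare_gene_sets(first, second):
    """Split two sets of identifiers into shared and method-specific genes."""
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }

# Example (hypothetical file names):
# limma_genes = read_gene_ids("limma_significant.txt")
# ttest_genes = read_gene_ids("ttest_significant.txt")
# print({k: len(v) for k, v in compare_gene_sets(limma_genes, ttest_genes).items()})
```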
3. Functional Profiling
Differential Expression Analysis gives us the significant genes for each comparison of interest described above. This information is very useful, but we would also like to know which biological functions lie behind these genes, and whether there are groups of genes with a common expression pattern that share these functions.
From the results of the differential expression analysis for the comparison Osteoarthritis vs. Normal:
- Perform a Single Enrichment analysis on the genes that are over-expressed (UP) in osteoarthritis, using these biological databases: GO biological process, GO molecular function and GO cellular component. Note: from the differential expression results, you can forward the output directly as input to the enrichment analysis (significant results vs. genome top list, Send to Single Enrichment tool). Once in the Single Enrichment form, complete the fields for the organism and the databases of interest. Finally, assign a name to the job and launch it!
- Are there functions that are over-represented in this group of genes over-expressed (UP) in osteoarthritis?
- Repeat the enrichment analysis, changing the adjusted p-value threshold to 0.1. Do the results change significantly?
- Interpret the graphical representation of the GO biological process results.
- Perform another enrichment analysis, this time on the genes that are under-expressed (DOWN) in the comparison Osteoarthritis vs. Normal, using the same databases selected in the previous job.
- Now prepare another enrichment analysis with a different strategy: we want to know which functions are over-represented when we compare the list of UP genes against the list of DOWN genes in Osteoarthritis vs. Normal. Give each of these jobs an informative name: by the end we will have many jobs, and clear names make it easy to find what we are looking for.
- Finally, run a Gene Set Enrichment analysis on the differential expression results for Osteoarthritis vs. Normal. Interpret the results.
- Repeat the above analyses for the other two comparisons of interest:
  - Rheumatoid arthritis vs. Normal.
  - Osteoarthritis vs. Rheumatoid arthritis.
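To interpret what "over-represented" means in the enrichment steps above, it can help to compute one case by hand. The sketch below uses the one-sided hypergeometric tail, which is equivalent to a one-sided Fisher exact test on the 2x2 table of list membership vs. annotation. We assume this is close to what Single Enrichment does for each term; the source does not describe the test, and the counts below are invented.

```python
from math import comb

def enrichment_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """Probability of seeing at least n_overlap annotated genes in the list.

    n_universe:  genes in the background (for example, the rest of the genome
                 plus the list itself)
    n_annotated: background genes annotated with the term
    n_selected:  genes in our list (for example, the UP genes)
    n_overlap:   genes in our list annotated with the term
    """
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 20 genes in total, 5 annotated with a term,
# 5 genes in our list, 3 of them annotated.
p = enrichment_pvalue(20, 5, 5, 3)
print(p)
```

In a real analysis one such p-value is computed per term, so the values have to be adjusted for multiple testing before applying a threshold.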
Find the Babelomics suite at http://babelomics.org