Tutorial Genepix Two Colors Microarray Normalization
Input data should be a matrix uploaded with the Microarray Two channels Genepix data type. See data types here.
In this section of the form you can select the dataset you want to normalize. There are two options to upload your data:
- from this form: Select your data / File browser / Upload
- previously, using the Upload menu in Babelomics and tagging the dataset with the Microarray Two channels Genepix data type. Datasets that are not tagged with the Microarray Two channels Genepix data type cannot be normalized using the GenePix Normalization Tool.
Here you can load an example dataset from our server and use it to see how the tool works. Click on the link to load the data: genepix two-channel normalization.
Select your data
The first step is to select the data you want to analyze.
Normalization of GenePix two color microarrays is a three-stage process:
- Background correction: for each channel or color measured in the array, this step corrects background effects so all features are measured over the same starting intensity in that channel scale.
- Within array normalization: the aim of this step is to correct for artifacts produced by differences in the measurements of the two signal channels. In this step the two signals are also summarized into a single value by a log ratio transformation.
- Between array scaling: in this step measurements from all microarrays are rescaled to a common final distribution so that they can be compared with each other.
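The three stages above can be illustrated with a deliberately simplified sketch. It is not the Babelomics implementation: plain background subtraction, median centering and median-absolute-deviation scaling are stand-ins chosen for brevity, and the function name and the argument layout (matrices with features in rows and arrays in columns) are assumptions made for this example only.

```python
import numpy as np

def normalize_two_color(red_fg, red_bg, green_fg, green_bg):
    """Toy three-stage normalization; inputs are shaped (features, arrays)."""
    # 1. Background correction: subtract the background of each channel,
    #    flooring at 1 so that the logarithm below is always defined.
    red = np.maximum(red_fg - red_bg, 1.0)
    green = np.maximum(green_fg - green_bg, 1.0)

    # 2. Within array normalization: summarize both channels as a log ratio,
    #    then remove the per-array offset (simple median centering).
    m = np.log2(red) - np.log2(green)
    m = m - np.median(m, axis=0)

    # 3. Between array scaling: give every array the same spread, measured
    #    here as the median absolute deviation (assumed to be non-zero).
    mad = np.median(np.abs(m), axis=0)
    target = np.exp(np.mean(np.log(mad)))  # geometric mean of the spreads
    return m * (target / mad)
```

After this toy procedure every array has a median of zero and the same spread, which is the property the between array scaling step aims for.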
GenePix Microarray Image Analysis Software has some options for flagging abnormal features. If you created these quality flags when scanning your arrays, you may choose to exclude the flagged features when fitting the normalization parameters, so that these bad features do not influence your data. The first check box in the Analysis section lets you set this option.
Even if you did not use flagged features to fit the normalization parameters, you may still want to get a normalized value for them. The second check box in the Analysis section lets you choose whether to get normalized values for the flagged features or to set them as missing values.
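The effect of the two check boxes can be sketched as follows. This is only an illustration: median centering stands in for the real normalization fit, the function and parameter names are hypothetical, and the convention that a negative flag value marks an abnormal feature is an assumption about the flag column of your files.

```python
import numpy as np

def center_with_flags(m_values, flags, keep_flagged=True):
    """Median-center M-values, fitting the offset on unflagged spots only."""
    m_values = np.asarray(m_values, dtype=float)
    good = np.asarray(flags) >= 0          # assumption: negative flag = bad spot
    offset = np.median(m_values[good])     # parameter fitted without bad spots
    normalized = m_values - offset         # but applied to every spot
    if not keep_flagged:
        normalized[~good] = np.nan         # report flagged spots as missing
    return normalized
```

With `keep_flagged=True` the flagged spots still receive a normalized value even though they did not contribute to the fit; with `keep_flagged=False` they are returned as missing values.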
See microarray normalization methods section for details on the algorithms.
Fill in the job information
- Select the output folder
- Choose a job name
- Specify a description for the job if desired.
Press Launch job button
Press the Launch button and wait until the job has finished. A typical job takes a few minutes, but the time varies with the size of the data. To check the state of your job, click the jobs button at the top right of the panel menu. A box listing all your jobs will appear at the right of the browser window. When the analysis has finished, the job is labelled "Ready". Click on it to be redirected to the results page.
In this section you will find a reminder of the parameters or settings you have used to run the analysis.
In this section of the results page you will find normalized data files.
The GenePix two color normalization procedure creates three result files. They are tab-delimited text files.
Normalized Data File: the M-values file.
Contains the processed intensity measurements for all arrays in the dataset and for all biological features in the arrays. In two color arrays, with some additional transformation, this is the log ratio of the two channel intensities: the M-values.
Babelomics tries to identify non-biological features and to exclude them from all steps of the analysis. If it succeeds in identifying the non-biological features of your arrays, they will not appear in the normalized data. If it cannot distinguish the non-biological spots, all the features will be used in the normalization and reported in the normalized data file.
In this file:
- Arrays or samples are arranged in columns.
- Genes, spots or features are set in rows.
- Some header lines may be included at the beginning of the file. They all start with #.
- One of those header lines, starting with #NAMES, contains the names of the arrays in your dataset.
- The first column contains feature identifiers. For Agilent one color arrays, Babelomics tries to figure out which is the best feature id among those provided within the raw data files. Some other feature ids will be reported in the Feature Data File.
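A file with the layout described above can be read with a few lines of Python. The sketch below is a minimal reader under stated assumptions: the function name is hypothetical, the array names are assumed to follow #NAMES on the same line separated by tabs, and missing values are assumed to be written as an empty field, NA or NaN. Check these against your own result files.

```python
import io

def read_normalized(handle):
    """Parse a tab-delimited matrix preceded by '#' header lines."""
    names, ids, rows = [], [], []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#NAMES"):
                # Assumed layout: '#NAMES' then tab-separated array names.
                names = line.split("\t")[1:]
            continue  # every other header line is skipped
        fields = line.split("\t")
        ids.append(fields[0])  # first column: feature identifier
        rows.append([
            float("nan") if value in ("", "NA", "NaN") else float(value)
            for value in fields[1:]
        ])
    return names, ids, rows

# Small in-memory example instead of a real result file.
example = "#NAMES\tarray1\tarray2\nspot1\t0.5\t-0.25\nspot2\tNA\t1.0\n"
names, ids, rows = read_normalized(io.StringIO(example))
```

Here `names` holds the array names, `ids` the feature identifiers, and `rows` one list of M-values per feature.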
Feature Data File
The rows in this file match the rows in the normalized data file. The columns contain additional array design information about the features in the data you have normalized.
Generally reported feature characteristics are:
- Any feature ID we find in the raw data files.
- Any gene or transcript ID we find in the raw data files.
- Row and Column position of the features within the array layout.
- Chromosomal position of the target genes or transcripts of each feature in the array (when this information is available within the raw data)
For each spot in the two channel arrays, its A-value is defined as the mean of the two color intensities in the log2 scale.
These values are reported by Babelomics in a file whose layout and organization are the same as those of the M-values file.
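As a small worked example of the two definitions, the sketch below computes the M-value and the A-value of a single spot. The function name is hypothetical, and the direction of the ratio (red over green) is an assumption for illustration; which channel goes in the numerator depends on your experimental design.

```python
import math

def m_and_a(red, green):
    """Return (M, A) for one spot from its two channel intensities."""
    log_r, log_g = math.log2(red), math.log2(green)
    m = log_r - log_g        # log ratio of the two channels
    a = (log_r + log_g) / 2  # mean intensity in the log2 scale
    return m, a

m, a = m_and_a(1024, 256)
print(m, a)  # 2.0 9.0
```

A spot with intensities 1024 and 256 is four times brighter in the first channel, hence M = 2, while its average log2 intensity is A = 9.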
In this section you will find some plots representing the normalized data. You can use them to assess how well the normalization process performed on your dataset. You can use these plots to compare different normalization options with each other. You may also want to compare the plots of the normalized data to those of the raw data and check whether noise levels are lower after normalization.
- Box-Plots representing the normalized intensity distribution for each of the samples (arrays). In general you want the overall distributions across arrays to be very similar after normalization. As normalized measurements are scaled log ratios, you should expect your data to be centered around zero.
- MA-Plots representing the normalized intensity distribution of each sample against a consensus mean sample. A LOESS line fitting the trend between M and A values is drawn in red. After normalization you expect no trend in the LOESS line, that is, you expect it to be as close as possible to the horizontal 0 axis.
- Pseudo Image Plots: represent the normalized intensity of each spot within the array coordinates, creating a pseudo photo of the normalized array. High intensities are shown in red and low intensities in blue. Ideally you will see an evenly colored image, meaning that, after normalization, there is no spatial effect in the array measurements. M-values, i.e. log ratios of the two channel intensities, are represented in these plots.
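If you prefer a numerical check alongside the box plots, the quantities they display can be computed directly from the M-values matrix. The following is a minimal sketch, assuming the matrix is loaded as a NumPy array with features in rows and arrays in columns; the function name is hypothetical.

```python
import numpy as np

def boxplot_summary(m_values):
    """Per-array median and interquartile range, ignoring missing values."""
    q1, median, q3 = np.nanpercentile(m_values, [25, 50, 75], axis=0)
    return {"median": median, "iqr": q3 - q1}
```

After a successful normalization the medians should be close to zero and the interquartile ranges similar across arrays.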