Tutorial Data matrix preprocessing
#### INPUT
#### STEPS
1. [Select your data](tutorial-data-matrix-preprocessing#select-your-data)
2. [Select logarithmic transformation](tutorial-data-matrix-preprocessing#select-logarithmic-transformation)
3. [Choose exponential function](tutorial-data-matrix-preprocessing#choose-exponential-function)
4. [Merge replicates](tutorial-data-matrix-preprocessing#merge-replicates)
5. [Filter missing values](tutorial-data-matrix-preprocessing#filter-missing-values)
6. [Impute missing values](tutorial-data-matrix-preprocessing#impute-missing-values)
7. [Extract IDs from dataset and save into a file](tutorial-data-matrix-preprocessing#extract-ids-from-dataset-and-save-into-a-file)
8. [Filter genes by names](tutorial-data-matrix-preprocessing#filter-genes-by-names)
9. [Fill information job](tutorial-data-matrix-preprocessing#fill-information-job)
10. [Press *Launch job* button](tutorial-data-matrix-preprocessing#press-launch-job-button)

#### OUTPUT
- [Input parameters](tutorial-data-matrix-preprocessing#input-parameters)
- [Output files](tutorial-data-matrix-preprocessing#output-files)

### INPUT
##### Input data
The input should be a matrix uploaded with the data type *Data matrix expression*. See the available data types [here](Data Types).
##### Online example
Here you can load small datasets from our server and use them to run this example and see how the tool works. Click on the links to load the data: preprocessing a two-class matrix or preprocessing a correlation matrix.
### STEPS
##### Select your data
The first step is to select the data you want to analyze.
##### Select logarithmic transformation
This function calculates the logarithm of the expression values. You can select the base you prefer.
##### Choose exponential function
This option applies an exponential function to the expression values. You can select the base you prefer.
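
The two transformations above can be sketched in a few lines of Python. This is only an illustration, not the Babelomics implementation: the function names are hypothetical, the matrix is a plain list of rows with `None` for missing values, and returning `None` for non-positive inputs to the logarithm is an assumption.

```python
import math

def log_transform(matrix, base=2.0):
    """Return a copy of the matrix with every value replaced by its logarithm.

    Missing values (None) and non-positive values become None, because
    their logarithm is undefined (an assumption of this sketch).
    """
    return [
        [math.log(v, base) if v is not None and v > 0 else None for v in row]
        for row in matrix
    ]

def exp_transform(matrix, base=2.0):
    """Return a copy of the matrix with every value v replaced by base ** v."""
    return [[base ** v if v is not None else None for v in row] for row in matrix]

raw = [[8.0, 16.0, None], [1.0, 4.0, 2.0]]
logged = log_transform(raw, base=2)         # log2 of every present value
restored = exp_transform(logged, base=2)    # undoes the log2 transformation
```

Applying the exponential function with the same base reverses the logarithmic transformation, up to floating-point rounding.
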
##### Merge replicates
This function looks for replicated clones (ids, genes...) and merges their patterns. You can choose between averaging the original patterns or taking their median.
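
As a rough sketch of what merging replicates means, the Python below groups rows that share an identifier and combines them column by column. The function name and the handling of missing values (ignored unless every replicate is missing) are assumptions made for illustration, not a description of the tool's internals.

```python
from statistics import mean, median

def merge_replicates(ids, matrix, method="mean"):
    """Merge rows that share an identifier, column by column.

    Missing values (None) are ignored; a merged cell stays None only
    when every replicate is missing in that column.
    """
    combine = mean if method == "mean" else median
    groups = {}  # id -> list of rows, kept in first-seen order
    for row_id, row in zip(ids, matrix):
        groups.setdefault(row_id, []).append(row)

    merged_ids, merged = [], []
    for row_id, rows in groups.items():
        merged_row = []
        for column in zip(*rows):
            present = [v for v in column if v is not None]
            merged_row.append(combine(present) if present else None)
        merged_ids.append(row_id)
        merged.append(merged_row)
    return merged_ids, merged

ids = ["geneA", "geneB", "geneA", "geneA"]
matrix = [[1.0, 2.0], [5.0, 5.0], [3.0, None], [8.0, 4.0]]
mean_ids, mean_rows = merge_replicates(ids, matrix, "mean")
median_ids, median_rows = merge_replicates(ids, matrix, "median")
```

The median is less sensitive than the mean to a single outlying replicate, which is the usual reason to prefer it when replicates disagree.
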
##### Filter missing values
- This option removes the patterns with too many missing values. You can choose the "Minimum percentage of existing values" you want to impose.
- For example, if you have a dataset with 10 conditions and you set the minimum percentage of existing values to 70%, all the patterns with fewer than 7 existing values will be removed, i.e., all the patterns with more than 3 missing values.
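
The "Minimum percentage of existing values" filter can be illustrated with the following Python sketch, which reproduces the 10-condition, 70% example. The function name is hypothetical and missing values are represented as `None`.

```python
def filter_missing(ids, matrix, min_percent_existing=70.0):
    """Keep only rows whose share of non-missing values reaches the threshold."""
    kept_ids, kept_rows = [], []
    for row_id, row in zip(ids, matrix):
        existing = sum(v is not None for v in row)
        # Empty rows are dropped; otherwise compare the percentage present.
        if row and 100.0 * existing / len(row) >= min_percent_existing:
            kept_ids.append(row_id)
            kept_rows.append(row)
    return kept_ids, kept_rows

ids = ["g1", "g2", "g3"]
matrix = [
    [1.0] * 10,               # 10 of 10 values present: kept
    [1.0] * 7 + [None] * 3,   # 7 of 10 present (exactly 70%): kept
    [1.0] * 6 + [None] * 4,   # 6 of 10 present: removed
]
kept_ids, kept_rows = filter_missing(ids, matrix, 70.0)
```
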
##### Impute missing values
This function fills in missing values. Several algorithms are available:
- Impute with zeros: replaces missing values with zeros. This is the simplest option, and we do not recommend using it unless you really know what you are doing.
- Row mean imputation: replaces missing values with the row average. This option is better than the first one, but again we do not recommend using it unless you really know what you are doing.
- Row median imputation: replaces missing values with the row median. This option is better than the first one, but again we do not recommend using it unless you really know what you are doing.
- KNN imputation: replaces missing values with the average value of the K nearest patterns. You need at least 5 non-missing values to impute the rest of the pattern. Good values for K are around 15.

See Troyanskaya et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17 (6), pp. 520-525.
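
To make the difference between these options concrete, here is a simplified Python sketch. It is not the algorithm used by Babelomics or the exact method of Troyanskaya et al.: the function names are hypothetical, the distance between two patterns is assumed to be the root mean squared difference over the columns present in both, and neighbours are averaged without weighting.

```python
import math
from statistics import mean, median

def impute_zeros(matrix):
    """Replace every missing value (None) with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]

def impute_row_stat(matrix, stat=mean):
    """Replace missing values with the row mean (or median, via stat=median)."""
    result = []
    for row in matrix:
        present = [v for v in row if v is not None]
        fill = stat(present) if present else None
        result.append([fill if v is None else v for v in row])
    return result

def _distance(a, b):
    """Root mean squared difference over the columns present in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def impute_knn(matrix, k=15, min_present=5):
    """Fill each missing cell with the average of the k nearest rows' values."""
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        if sum(v is not None for v in row) < min_present:
            continue  # too few values to find reliable neighbours
        # Rank all comparable rows by their distance to the current row.
        ranked = []
        for j, other in enumerate(matrix):
            if j != i:
                d = _distance(row, other)
                if d < math.inf:
                    ranked.append((d, j))
        ranked.sort()
        for col, value in enumerate(row):
            if value is None:
                # Nearest rows that actually have a value in this column.
                donors = [matrix[j][col] for _, j in ranked
                          if matrix[j][col] is not None][:k]
                if donors:
                    result[i][col] = mean(donors)
    return result

matrix = [
    [1.0, 2.0, 3.0, 4.0, 5.0, None],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.1, 3.1, 4.1, 5.1, 8.0],
    [9.0, 9.0, 9.0, 9.0, 9.0, 100.0],
]
knn = impute_knn(matrix, k=2, min_present=5)
row_mean = impute_row_stat(matrix, stat=mean)
```

In this toy matrix the KNN estimate for the missing cell comes only from the two patterns that resemble the first row, whereas zero or row-based imputation ignores what similar patterns look like in that column.
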
##### Extract IDs from dataset and save into a file
Sometimes it is useful to have all the ids in a file for use with other analysis tools.
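
Outside the web interface, the same result can be approximated with a short script. The sketch below assumes a tab-delimited matrix whose first column holds the ids and whose header lines start with `#`; the function name, the example content and the output file name `ids.txt` are arbitrary choices for illustration.

```python
import os
import tempfile

def extract_ids(lines):
    """Return the first-column identifier of every data row.

    Lines starting with '#' are treated as header lines and skipped.
    """
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            ids.append(line.split("\t")[0])
    return ids

example = [
    "#NAMES\tsample1\tsample2\n",
    "geneA\t1.5\t2.5\n",
    "geneB\t0.3\t0.9\n",
]
ids = extract_ids(example)

# Save one identifier per line in the system temporary directory.
path = os.path.join(tempfile.gettempdir(), "ids.txt")
with open(path, "w") as handle:
    handle.write("\n".join(ids) + "\n")
```
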
##### Filter genes by names
This option removes all the genes that are present in the extra list you upload.
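
The effect of this filter can be pictured with the minimal Python sketch below. The function name is hypothetical, and exact, case-sensitive matching of names is an assumption.

```python
def remove_listed_genes(ids, matrix, names_to_remove):
    """Drop every row whose identifier appears in names_to_remove."""
    excluded = set(names_to_remove)  # set lookup keeps the filter fast
    kept = [(row_id, row) for row_id, row in zip(ids, matrix)
            if row_id not in excluded]
    return [row_id for row_id, _ in kept], [row for _, row in kept]

ids = ["geneA", "geneB", "geneC"]
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
filtered_ids, filtered_rows = remove_listed_genes(ids, matrix, ["geneB", "geneX"])
```

Names in the list that do not occur in the matrix, such as `geneX` here, are simply ignored.
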
##### Fill information job
- Select the output folder
- Choose a job name
- Specify a description for the job if desired.
##### Press *Launch job* button
Press the *Launch job* button and wait until the job is finished. A normal job usually takes a few minutes, but the time may vary depending on the size of the data. You can check the state of your job by clicking the jobs button at the top right of the panel menu. A box will appear at the right of the web browser with all your jobs. When the analysis is finished, you will see the label "Ready". Click on it and you will be redirected to the results page.

### OUTPUT
#### Input parameters
In this section you will find a reminder of the parameters or settings you used to run the analysis.
#### Output files
Preprocessing a data matrix yields another data matrix with transformed measurements.
The processed matrix is stored in a new tab-delimited text file.
In this file:
- Arrays or samples are arranged in columns.
- Genes, spots or features are set in rows.
- Some header lines may be included at the beginning of the file. They all start with #.
- One of those header lines, starting with #NAMES, contains the names of the arrays in your dataset.
- The first column contains feature identifiers. For Agilent one color arrays, Babelomics tries to figure out which is the best feature id among those provided within the raw data files. Some other feature ids will be reported in the Feature Data File.
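
If you want to read the processed matrix in your own scripts, the layout described above can be parsed roughly as follows. This is a sketch under assumptions: the array names are taken to follow `#NAMES` on the same line, separated by tabs; empty or non-numeric cells are treated as missing values; and the example content, including the extra header line, is made up.

```python
import math

def to_number(field):
    """Convert a cell to float; empty, non-numeric or NaN cells become None."""
    try:
        value = float(field)
    except ValueError:
        return None
    return None if math.isnan(value) else value

def parse_matrix(text):
    """Split a processed matrix into array names, feature ids and values."""
    sample_names, ids, values = [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Header line: only #NAMES is interpreted, the rest are skipped.
            if fields[0] == "#NAMES":
                sample_names = fields[1:]
            continue
        ids.append(fields[0])                       # first column: feature id
        values.append([to_number(f) for f in fields[1:]])
    return sample_names, ids, values

example = (
    "#hypothetical header line\n"
    "#NAMES\ts1\ts2\ts3\n"
    "geneA\t1.0\t\t3.5\n"
    "geneB\t2.0\tNaN\t4.0\n"
)
names, ids, values = parse_matrix(example)
```
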
Find the Babelomics suite at http://babelomics.org