Tutorial SNP stratification
A. Methods
The purpose of this tool is to find out whether the population is stratified into groups. Any evidence of population substructure (from this or any other analysis) can be incorporated into subsequent association tests by specifying clusters.
Hierarchical agglomerative clustering. We use complete-linkage agglomerative clustering based on pairwise identity-by-state (IBS) distance. Two constraints modify the clustering process. The first is a pairwise population concordance test of whether two individuals belong to the same population: clusters that contain significantly different individuals are not merged. The second is a restriction on cluster size: with a maximum cluster size of 2, for example, the subsequent association test would implicitly match every case with its nearest control, as long as the case and control do not show evidence of belonging to different populations.
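The clustering described above can be illustrated with a minimal Python sketch. It is not the code Babelomics runs: the function names (`ibs_distance`, `complete_linkage`), the genotype coding as minor-allele counts (0, 1 or 2) and the toy data are assumptions made for illustration, and the pairwise population concordance test is left out. Only the IBS distance, the complete-linkage rule and the cluster size restriction are shown.

```python
from itertools import combinations


def ibs_distance(g1, g2):
    """Return 1 minus the proportion of alleles shared identical-by-state.

    Genotypes are assumed to be coded as minor-allele counts (0, 1 or 2);
    None marks a missing call and that SNP is skipped.
    """
    shared = 0
    n_snps = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # alleles shared IBS at this SNP: 0, 1 or 2
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNP genotyped in both individuals")
    return 1.0 - shared / (2.0 * n_snps)


def complete_linkage(genotypes, max_cluster_size=0):
    """Greedy complete-linkage clustering of individuals.

    genotypes maps an individual ID to its list of genotype codes.
    max_cluster_size=0 means no size restriction.
    Returns a list of sets of individual IDs.
    """
    ids = list(genotypes)
    dist = {
        frozenset((a, b)): ibs_distance(genotypes[a], genotypes[b])
        for a, b in combinations(ids, 2)
    }
    clusters = [{i} for i in ids]
    while True:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Cluster size restriction: skip merges that would grow too large.
            if max_cluster_size and len(clusters[x]) + len(clusters[y]) > max_cluster_size:
                continue
            # Complete linkage: cluster distance is the largest pairwise distance.
            d = max(dist[frozenset((a, b))] for a in clusters[x] for b in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        if best is None:  # no merge is allowed any more
            return clusters
        _, x, y = best
        clusters[x] |= clusters[y]
        del clusters[y]


# Toy data: four individuals genotyped at four SNPs (hypothetical values).
genotypes = {
    "A": [0, 0, 1, 2],
    "B": [0, 0, 1, 2],
    "C": [2, 2, 0, 0],
    "D": [2, 1, 0, 0],
}
clusters = complete_linkage(genotypes, max_cluster_size=2)
print([sorted(c) for c in clusters])  # [['A', 'B'], ['C', 'D']]
```

With a maximum cluster size of 2 the individuals end up in nearest-neighbour pairs, which is the matching behaviour mentioned in the text. Without a size limit (and without the concordance test) this sketch keeps merging until a single cluster remains.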
B. Input form
Data
The input data should be a compressed archive containing two files (a ped and a map). Babelomics can read .zip, .gz and .tar.gz files, but the files must sit at the top level of the archive, with no folder structure, so that they can be uncompressed directly.
There are two ways to choose data: select it from the server browser, where it must have been uploaded previously, or upload it directly by clicking on the Upload [genotype] label and then selecting the file from your computer.
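Before uploading, the archive layout described above can be checked locally. The following Python sketch handles only .zip archives and uses the standard `zipfile` module; the function name `check_genotype_zip` and the file names `study.ped` and `study.map` are hypothetical, and the check reflects our reading of the requirements rather than the validation Babelomics itself performs.

```python
import io
import zipfile


def check_genotype_zip(archive):
    """Return a list of problems found in a PED/MAP zip archive (empty if none).

    archive may be a path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    # Entries with a "/" in their name are folders or files inside folders.
    for name in names:
        if "/" in name:
            problems.append("entry inside a folder: " + name)
    # Exactly one .ped and one .map file are expected at the top level.
    for ext in (".ped", ".map"):
        count = sum(1 for n in names if "/" not in n and n.lower().endswith(ext))
        if count != 1:
            problems.append(
                "expected exactly one top-level %s file, found %d" % (ext, count)
            )
    return problems


# Build a small archive in memory to show a layout that passes the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("study.ped", "FAM1 1 0 0 1 2 A A G G\n")
    zf.writestr("study.map", "1 rs1 0 1000\n1 rs2 0 2000\n")
buf.seek(0)
print(check_genotype_zip(buf))  # []
```

An archive created by zipping a folder (for example one whose entries look like `data/study.ped`) would be reported as a problem by this sketch, because the files are not at the top level.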
Options
(Optional) Mark the first option to test for group differences in pairwise IBS distance with respect to a binary phenotype.
Set the values of the following options:
- Pairwise population concordance (PPC): a simple significance test for whether two individuals belong to the same random-mating population. Clusters are merged only if they do not contain individuals that differ at the given p-value threshold (0.0001 by default).
- Maximum cluster size: the maximum number of individuals allowed in a cluster. A small value forces the population to be stratified into many groups (enter 0 to apply no limit).
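The output file name (plink.cluster2) suggests that the tool relies on PLINK. As a rough guide, the two options above correspond to PLINK's `--ppc` and `--mc` flags for its `--cluster` analysis. The Python sketch below only assembles such a command line; the helper name `plink_cluster_args` and the fileset name `mydata` are hypothetical, and whether Babelomics invokes PLINK in exactly this way is an assumption.

```python
def plink_cluster_args(fileset, ppc=0.0001, max_cluster_size=0):
    """Build a PLINK command line for IBS clustering.

    fileset is the common prefix of the .ped and .map files.
    max_cluster_size=0 means that no size limit is passed to PLINK.
    """
    args = ["plink", "--file", fileset, "--cluster", "--ppc", str(ppc)]
    if max_cluster_size > 0:
        args += ["--mc", str(max_cluster_size)]
    return args


print(" ".join(plink_cluster_args("mydata", ppc=0.0001, max_cluster_size=2)))
```

The resulting list can be passed to `subprocess.run` on a machine where PLINK is installed.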
Job
- Job name. Give a short name to your analysis job.
- Job description. You can use this section to document further the characteristics of this analysis.
These fields help you identify the analysis you are running and distinguish it from your other analyses. The name is mandatory, but you can leave the description empty.
Run
Once all options are set, you can run the job. If some parameters are not set properly you may get an error message; in that case, check the options you have chosen.
C. Output form
Here you will find the results generated by the stratification tool as plain-text, space-delimited data files.
Files
- result file (plink.cluster2): This file contains the result of the stratification (clustering) process, with one line per individual. The three columns of the file are: family ID, individual ID and assigned cluster.
A 1 0
B 1 1
C 1 1
D 1 2
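To work with the result file outside Babelomics, the three columns can be read into a mapping from cluster to individuals. The Python sketch below parses the example lines shown above; the function name `read_clusters` is hypothetical, and the parser assumes exactly three whitespace-separated columns per line.

```python
from collections import defaultdict


def read_clusters(lines):
    """Group (family ID, individual ID) pairs by assigned cluster.

    Each non-empty line must hold three whitespace-separated fields:
    family ID, individual ID and cluster number.
    """
    clusters = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        family_id, individual_id, cluster = fields
        clusters[int(cluster)].append((family_id, individual_id))
    return dict(clusters)


example = """A 1 0
B 1 1
C 1 1
D 1 2
"""
groups = read_clusters(example.splitlines())
for cluster, members in sorted(groups.items()):
    print(cluster, members)
```

For a real result file, pass an open file object instead of `example.splitlines()`, for instance `read_clusters(open("plink.cluster2"))`.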
Other actions
- Input data form: reopens the input form of the stratification tool with the same parameters that were used for this job. This lets you re-run the same analysis after changing some of its parameters or the methodology.
Find the Babelomics suite at http://babelomics.org