Association Analysis doc

Francisco García edited this page Jan 30, 2015 · 4 revisions

This tool supports the analysis of SNPs genotyped with microarray technology, both in GWAS (Genome-Wide Association Studies) and in family-based designs analysed with the TDT (transmission disequilibrium test).

The basic association test is for a disease trait and is based on comparing allele frequencies between cases and controls (empirical p-values are available). Also implemented are the chi-square case/control association test, Fisher's exact test, linear regression, logistic regression and the TDT (for family-based analysis only). Some additional options are implemented as well, such as SNP filtering on the basis of minor allele frequency.
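As an illustration of the minor allele frequency (MAF) filter mentioned above, here is a minimal sketch. It assumes genotypes are coded as the number of copies of one allele (0, 1 or 2) with `None` for missing calls. The function names, the SNP identifiers and the 0.05 threshold are hypothetical choices for the example, not the tool's actual interface or defaults.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP.

    genotypes: per-individual allele counts (0, 1 or 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)              # the minor allele is the rarer one


def filter_by_maf(snps, threshold=0.05):
    """Keep only SNPs whose MAF is at least `threshold` (hypothetical default)."""
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) >= threshold}


# Hypothetical example data: rs_b is monomorphic and gets filtered out.
snps = {"rs_a": [0, 1, 2, 1, None], "rs_b": [0, 0, 0, 0, 0]}
kept = filter_by_maf(snps)
print(sorted(kept))
```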

In general, the idea of population association studies is to identify patterns of polymorphisms that vary systematically between individuals with different disease states and could therefore represent the effects of risk-enhancing or protective alleles.

How strongly genotype and phenotype are associated can be assessed statistically with the different tests described in this section; the choice of test depends mainly on the type of input data.

There are several implemented tests for Association Analysis:

Chi-square case/control

The chi-square test determines whether there is an association between a particular SNP variant and the phenotype (case or control). In other words, it checks whether the allele proportions differ between cases and controls. For every SNP, the test builds a 2x2 contingency table by counting how many times each allele of the SNP appears in case samples and in control samples. If the p-value is below 0.05, the null hypothesis of no association between the two variables is unlikely to be true and we reject it.
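The allelic 2x2 test can be sketched in a few lines. This is an illustrative implementation of the standard Pearson chi-square statistic (1 degree of freedom, no continuity correction), not the tool's own code. The function name and the allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls, columns are the two alleles of the SNP.
    Returns (chi2, p_value).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A seen 60 times in cases and 40 times in controls.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2, p)  # chi2 is 8.0, p is about 0.0047
```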

HINT: When the counts in the table are small, the chi-square test may not be appropriate. Specifically, it has been recommended that this test not be used if any cell in the table has an expected count of less than one, or if more than 20 percent of the cells have an expected count of less than five. In that situation, Fisher's exact test is recommended instead.
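The usual form of this rule of thumb (no expected count below one, and no more than 20 percent of cells with an expected count below five) can be checked before choosing a test. The sketch below is an assumption about how one might automate that choice; the function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts of a contingency table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_is_appropriate(table):
    """Apply the rule of thumb on expected counts.

    False means an exact test (such as Fisher's) is the safer choice.
    """
    cells = [e for row in expected_counts(table) for e in row]
    if any(e < 1 for e in cells):
        return False
    small = sum(1 for e in cells if e < 5)
    return small / len(cells) <= 0.20


print(chi_square_is_appropriate([[60, 40], [40, 60]]))  # True: all expected counts are 50
print(chi_square_is_appropriate([[3, 1], [1, 3]]))      # False: all expected counts are 2
```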

Fisher's exact

This test serves the same purpose as the chi-square test, but when the sample size is small it is better to use Fisher's exact test than the chi-square test.
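For a 2x2 table, Fisher's exact test can be computed directly from the hypergeometric distribution. The sketch below uses one common two-sided convention (summing the probabilities of all tables that are no more likely than the observed one). Whether the tool uses this convention is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical small table where a chi-square test would be unreliable.
print(fisher_exact_two_sided(3, 1, 1, 3))
```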

Linear

This test allows for multiple covariates when testing for quantitative trait SNP association, and for interactions with those covariates.
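To make the model concrete, the sketch below fits `phenotype ~ intercept + genotype + covariates` by ordinary least squares, with the genotype coded additively as 0, 1 or 2 copies of an allele. The additive coding, the function name and the example data are assumptions for illustration. An interaction would be one more column holding genotype times covariate, and the p-value for the genotype coefficient (which needs a t distribution) is left out.

```python
def fit_linear(genotypes, covariates, phenotype):
    """Least-squares coefficients [intercept, genotype effect, covariate effects...].

    genotypes:  allele counts per individual (0, 1 or 2)
    covariates: one list of covariate values per individual
    phenotype:  quantitative trait value per individual
    """
    rows = [[1.0, float(g)] + [float(c) for c in cov]
            for g, cov in zip(genotypes, covariates)]
    k = len(rows[0])

    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, phenotype))]
           for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]

    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical data built as trait = 1 + 2 * genotype + 0.5 * age, without noise.
genotypes = [0, 1, 2, 0, 1, 2]
ages = [[30], [40], [50], [60], [35], [45]]
trait = [16.0, 23.0, 30.0, 31.0, 20.5, 27.5]
print(fit_linear(genotypes, ages, trait))
```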

Logistic

The logistic regression test is similar to the linear one, but it tests SNP association with a disease trait instead of a quantitative trait.
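A minimal sketch of the underlying model, assuming a single genotype predictor and no covariates: the log-odds of being a case are modelled as `b0 + b1 * genotype` and fitted by Newton-Raphson. The function name, the iteration settings and the data are hypothetical. The sketch does not handle perfectly separated data, where the estimates diverge.

```python
import math


def fit_logistic(genotypes, status, iterations=25, tolerance=1e-10):
    """Fit logit P(case) = b0 + b1 * genotype and return (b0, b1).

    genotypes: numeric genotype code per individual
    status:    1 for case, 0 for control
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for g, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * g
            h00 += w               # information matrix entries
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: solve the 2x2 system (information) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tolerance:
            break
    return b0, b1


# Hypothetical data: carriers (1) are cases more often than non-carriers (0).
genotypes = [0] * 8 + [1] * 8
status = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]
b0, b1 = fit_logistic(genotypes, status)
print(b0, b1, math.exp(b1))  # exp(b1) is the estimated odds ratio
```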

TDT

This test is used only for family-based association testing (e.g. trios) of disease traits.
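The TDT statistic of Spielman et al. compares how often heterozygous parents transmit a given allele to affected offspring with how often they do not. It is a McNemar-type chi-square with 1 degree of freedom. The sketch below assumes the transmission counts have already been extracted from the trios; the function name and the counts are hypothetical.

```python
import math


def tdt(transmitted, not_transmitted):
    """TDT chi-square for one allele.

    transmitted:     times heterozygous parents passed the allele to an affected child
    not_transmitted: times heterozygous parents passed the other allele instead
    Returns (chi2, p_value).
    """
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no heterozygous parents, so the TDT is undefined")
    chi2 = (transmitted - not_transmitted) ** 2 / informative
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 transmissions against 10 non-transmissions.
chi2, p = tdt(30, 10)
print(chi2, p)  # chi2 is 10.0, p is about 0.0016
```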

References

Spielman R.S., McGinnis R.E., Ewens W.J. (1993) "Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM)". Am J Hum Genet. 52(3): 506–516.
