
Tutorial SNP Association Analysis

Ralonso edited this page Apr 9, 2015 · 22 revisions

#### INPUT


#### STEPS

[1. Select your data](tutorial-snp-association-analysis#select-your-data)
[2. Select the association test](tutorial-snp-association-analysis#select-the-association-test)
[3. Choose the MAF (Minor Allele Frequency)](tutorial-snp-association-analysis#choose-the-maf)
[4. Fill information job](tutorial-snp-association-analysis#fill-information-job)
[5. Press *Launch job* button](tutorial-snp-association-analysis#press-launch-job-button)

#### OUTPUT

- [Input parameters](tutorial-snp-association-analysis#input-parameters)
- [Output files](tutorial-snp-association-analysis#output-files)




### INPUT

#####Input data

  • Input data should be a zip file containing the PED and MAP files (plain text). See data types [here](Data Types).

  • PED files contain genotype information (one person per row) and MAP files contain information on the name and position of the markers in the PED file. Babelomics can read .zip, .gz and .tar.gz files, but it must be able to uncompress the data without finding any folder structure inside the archive.
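The "no folder structure" requirement can be met by writing each file at the root of the archive. The sketch below is only an illustration using Python's standard `zipfile` module; the helper names `make_flat_zip` and `is_flat` are hypothetical and not part of Babelomics.

```python
import os
import zipfile


def make_flat_zip(zip_path, file_paths):
    """Write the given files at the root of a zip archive (no folders)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # arcname drops the directory part, so the entry sits at the root
            zf.write(path, arcname=os.path.basename(path))


def is_flat(zip_path):
    """Return True if no entry of the archive sits inside a folder."""
    with zipfile.ZipFile(zip_path) as zf:
        return all("/" not in name for name in zf.namelist())
```

For example, `make_flat_zip("example.zip", ["data/example.ped", "data/example.map"])` would store `example.ped` and `example.map` directly at the top level, and `is_flat("example.zip")` can be used to check an existing archive before uploading it.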

#####Online example

  • Association analysis: chi-square test. The zip file contains the PED and MAP files. With this analysis we can study whether there is an association between SNPs and case/control samples using the chi-square test.

### STEPS

#####Select your data

The first step is to select the data you want to analyze.

#####Select the association test

We need to select one of the following tests:

  • Chi-square case/control: tests whether there is an association between the two classification variables (phenotype and genotype), that is, whether to reject the null hypothesis of no association between them.

  • Fisher's exact: this test is similar to the chi-square test, but it is the better choice when the sample size is small.

  • Linear: this test allows for multiple covariates when testing for quantitative trait SNP association, and for interactions with those covariates.

  • Logistic: the logistic regression test is similar to the linear test, but it tests for disease trait SNP association instead of quantitative trait SNP association.

  • TDT: use this test only for family-based association (e.g. trios) when testing for disease traits.
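To make the difference between the first two tests concrete, here is a minimal sketch of both on a single 2x2 table of counts (for instance, the two alleles of one SNP counted in cases and controls). It uses only the Python standard library and is an illustration under those assumptions, not necessarily the exact computation that Babelomics runs; the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more likely than the observed one
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
```

With small counts such as `[[3, 1], [1, 3]]` the exact test is the safer choice, because the chi-square approximation relies on reasonably large expected counts in every cell.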

#####Choose the MAF

  • The minor allele frequency (MAF) value is used to filter SNPs: only SNPs with MAF >= "MAF value" are included. The default value is 0.02.

  • This quantity is based only on founders (i.e. individuals for whom the paternal and maternal individual codes are both 0).
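The filter described above can be sketched as follows. The example assumes the usual PED column layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per SNP), biallelic SNPs, and "0" as the missing-allele code; the function names are hypothetical and the code is not the implementation used by Babelomics.

```python
from collections import Counter


def founder_maf(ped_lines, n_snps):
    """Return the minor allele frequency of each SNP, counting founders only."""
    counts = [Counter() for _ in range(n_snps)]
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue
        paternal_id, maternal_id = fields[2], fields[3]
        if paternal_id != "0" or maternal_id != "0":
            continue  # not a founder, so it does not contribute to the MAF
        genotypes = fields[6:]
        for i in range(n_snps):
            for allele in genotypes[2 * i: 2 * i + 2]:
                if allele != "0":  # "0" is assumed to mark a missing allele
                    counts[i][allele] += 1
    mafs = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            mafs.append(None)  # no called alleles among founders
        elif len(c) < 2:
            mafs.append(0.0)  # only one allele observed
        else:
            mafs.append(min(c.values()) / total)
    return mafs


def passing_snps(mafs, threshold=0.02):
    """Indices of the SNPs whose MAF is at least the threshold."""
    return [i for i, maf in enumerate(mafs) if maf is not None and maf >= threshold]
```

A SNP in which the founders carry only one allele gets a MAF of 0 and is dropped by any positive threshold, including the default of 0.02.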

#####Fill information job

  • Select the output folder
  • Choose a job name
  • Specify a description for the job if desired.

#####Press Launch job button

Press the *Launch job* button and wait until the job has finished. A normal job takes approximately a few minutes, but the time may vary depending on the size of the data. Check the state of your job by clicking the jobs button at the top right of the panel menu. A box will appear on the right of the web browser with all your jobs. When the analysis is finished, you will see the label "Ready". Click on it and you will be redirected to the results page.


### OUTPUT

#### Input parameters

In this section you will find a reminder of the parameters or settings you used to run the analysis.

#### Output files

Here you will find the results generated by the association tool as plain-text, space-delimited data files. After using one of the five proposed methods (Chi-square, Fisher, Linear, Logistic or TDT) you will get 3 links to different files; the first link holds the main results. Most files have the same number of fields per line and carry the field names in the first line, which makes it easy to view and process the results in a spreadsheet.

  • Association result file (the output file can be plink.assoc, plink.fisher, plink.assoc.linear, plink.assoc.logistic or plink.tdt): This file contains the statistics obtained for all SNPs with the selected test (remember that one test is run per SNP). With the linear or logistic test we obtain a model for each SNP. Then, we see:

    • The Manhattan plot represents the association for each SNP. Values between 3 and 5 (-log10 p-value) represent strong association.
    • top hits.txt shows significant SNPs. This table includes detailed information about each position: p-value, odds ratio, ...
    • When selecting a SNP, you can visualize its position in Genome Maps.
  • List of heterozygous haploid genotypes (plink.hh): This file contains a list of heterozygous haploid genotypes, as the name itself indicates.

  • Log file from PLINK (plink.log): This file contains the history of the steps that the association process has carried out. It is very useful when a process is not working, to see at which step it stopped.
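Because the result files are space-delimited with the field names in the first line, they are also easy to process outside a spreadsheet. The sketch below parses such a table and keeps the SNPs whose -log10 p-value reaches a chosen threshold. The column names (`SNP`, `P`, ...) and the sample rows are assumptions made up for illustration, as are the function names; check the header of your own file before relying on them.

```python
import math


def read_plink_table(text):
    """Parse a whitespace-delimited table whose first line holds the field names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def top_hits(rows, min_neg_log10_p=3.0, p_field="P"):
    """Return (row, -log10 p) pairs at or above the threshold, strongest first."""
    hits = []
    for row in rows:
        try:
            p = float(row[p_field])
        except (KeyError, ValueError):
            continue  # missing or non-numeric p-value such as "NA"
        if not 0 <= p <= 1:
            continue  # also skips NaN
        score = math.inf if p == 0 else -math.log10(p)
        if score >= min_neg_log10_p:
            hits.append((row, score))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up sample in the shape of an association result file
SAMPLE = """\
 CHR   SNP    BP  A1   F_A   F_U  A2  CHISQ       P     OR
   1   rs1  1000   A  0.30  0.10   G  12.50  0.0004  3.857
   1   rs2  2000   C  0.20  0.19   T   0.05  0.8200  1.066
   1   rs3  3000   T    NA    NA   G     NA      NA     NA
"""

if __name__ == "__main__":
    for row, score in top_hits(read_plink_table(SAMPLE)):
        print(row["SNP"], round(score, 2))
```

Splitting on any run of whitespace, rather than on a single space, keeps the parser working when the columns are padded for alignment.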



Go back to the Genomics page
Go back to the Home page