
SHEsisPlus

##Introduction

SHEsisPlus is an open-source software package for the analysis of genetic association, Hardy-Weinberg equilibrium, linkage disequilibrium and haplotype construction at multiallelic polymorphism loci, compatible with both diploid and polyploid species. The web-based version can be accessed via the SHEsisPlus web version.

##What's new

Compared with the previous version of SHEsis, SHEsisPlus is compatible with haploid, diploid and polyploid species. It can analyze not only case/control data but also quantitative trait data. It provides several methods of p-value adjustment, including Holm step-down, Sidak single-step, Sidak step-down, FDR and permutation tests. All of these can be performed via the web UI.

##Compile

To build SHEsisPlus from source code, please first install the Boost C++ Libraries.

###Linux

Modify the makefile to specify the locations of the Boost include files and libs. Then type "make" in the source code directory.

###Windows

Create a project in Microsoft Visual Studio. Add all the source files and header files EXCEPT the unit test source files (*_test.cpp) to the project. Modify the project properties to specify the paths of the Boost include files and libs. Then build it.

Note: SHEsisPlus is developed and tested under Linux. Its behaviour under Windows is not guaranteed. If you want to compile it under Windows, we recommend building it within Cygwin.

##Input format

###Case/control data

####Sample data for diploid species

id1  G A  C C  1 1  A1 A2
id2  A A  T C  1 1  A2 A2
id3  A A  T T  2 2  A3 A4
id4  0 0  T T  3 3  A5 A3
id5  G G  A A  2 3  A1 A2
id6  A A  C A  0 0  A6 A7

####Sample data for triploid species

id1  A G A  T C C  1 1 1  AA T  TT
id2  A A A  C T C  2 1 1  A  T  AA
id3  G A A  C T T  3 2 2  TT A  T
id4  0 0 0  A T T  3 3 3  AA T  AT
id5  G G G  T A A  1 2 3  TA TT T
id6  G A A  C C A  0 0 0  AA A  A

The first column is the sample id. The following columns are genotypes. Columns should be delimited by spaces, commas or tabs. Adjacent delimiters are compressed and treated as a single delimiter. Genotypes can be coded as any string (e.g. 1,2,3,4, or A,T,G,C, or A1,A2,A3,A4, or anything else) except 0, which is the code for a missing genotype.

The samples shown above are for diploid and triploid species. For diploid species, the columns correspond to: sample id, site1-allele1, site1-allele2, site2-allele1, site2-allele2, .... For triploid species, the columns should be: sample id, site1-allele1, site1-allele2, site1-allele3, site2-allele1, site2-allele2, site2-allele3, ...
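The column layout described above can be illustrated with a small parser. This is only a sketch and not the parser used by SHEsisPlus: the function name `parse_genotype_line` is hypothetical, and it assumes that any run of spaces, commas or tabs counts as one delimiter and that each allele coded `0` is reported as `None`.

```python
import re

MISSING = "0"  # code for a missing allele, per the format description


def parse_genotype_line(line, ploidy):
    """Split one case/control input line into (sample_id, sites).

    Each site is a tuple of `ploidy` alleles; a missing allele becomes None.
    Illustrative only; this is not SHEsisPlus code.
    """
    # Runs of spaces, commas or tabs are treated as a single delimiter.
    tokens = [t for t in re.split(r"[ ,\t]+", line.strip()) if t]
    if not tokens:
        raise ValueError("empty line")
    sample_id, alleles = tokens[0], tokens[1:]
    if len(alleles) % ploidy != 0:
        raise ValueError("allele count is not a multiple of the ploidy")
    sites = []
    for start in range(0, len(alleles), ploidy):
        group = alleles[start:start + ploidy]
        sites.append(tuple(None if a == MISSING else a for a in group))
    return sample_id, sites


if __name__ == "__main__":
    print(parse_genotype_line("id4  0 0  T T  3 3  A5 A3", ploidy=2))
```

Passing `ploidy=3` groups the same kind of line into triplets, which matches the triploid column order given above.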

###Quantitative trait data

####Sample data for diploid species

id1  20.6  G A  C C  1 1  A1 A2
id2  25.4  A A  T C  1 1  A2 A2
id3  23.1  A A  T T  2 2  A3 A4
id4  42.4  0 0  T T  3 3  A5 A3
id5  11.0  G G  A A  2 3  A1 A2
id6  5.5   A A  C A  0 0  A6 A7

####Sample data for triploid species

id1  1.1  A G A  T C C  1 1 1  AA T  TT
id2  3.2  A A A  C T C  2 1 1  A  T  AA
id3  14   G A A  C T T  3 2 2  TT A  T
id4  4.3  0 0 0  A T T  3 3 3  AA T  AT
id5  24   G G G  T A A  1 2 3  TA TT T
id6  4.49 G A A  C C A  0 0 0  AA A  A

The format for quantitative trait data is similar to that for case/control data except that the second column is the quantitative trait. The quantitative trait should be numeric.
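As a rough illustration of the quantitative trait layout, the sketch below reads the second column as a number and groups the remaining columns by ploidy. The function name `parse_qtl_line` is hypothetical, and the handling of delimiters and of the missing code `0` is an assumption based on the format description, not SHEsisPlus code.

```python
import re


def parse_qtl_line(line, ploidy):
    """Split one quantitative-trait line into (sample_id, trait, sites).

    The second column must be numeric; a missing allele ("0") becomes None.
    Illustrative only; this is not SHEsisPlus code.
    """
    tokens = [t for t in re.split(r"[ ,\t]+", line.strip()) if t]
    if len(tokens) < 2:
        raise ValueError("expected a sample id and a trait value")
    sample_id = tokens[0]
    trait = float(tokens[1])  # raises ValueError if the trait is not numeric
    alleles = tokens[2:]
    if len(alleles) % ploidy != 0:
        raise ValueError("allele count is not a multiple of the ploidy")
    sites = [
        tuple(None if a == "0" else a for a in alleles[i:i + ploidy])
        for i in range(0, len(alleles), ploidy)
    ]
    return sample_id, trait, sites


if __name__ == "__main__":
    print(parse_qtl_line("id3  14   G A A  C T T  3 2 2  TT A  T", ploidy=3))
```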

##Arguments

###Allowed options:

  --help                produce help message
  --input arg           path for the input file containing both cases and 
                        controls, can be specified for multiple times
  --input-case arg      path for the input file containing cases, can be 
                        specified for multiple times
  --input-ctrl arg      path for the input file containing controls, can be 
                        specified for multiple times
  --snpname-file arg    path for file that contains names of snps
  --snpname-line arg    snp names are as arguments
  --output arg          prefix of output files
  --report-txt          report results in plain-text format. By default, 
                        results will be reported in html.
  --ploidy arg          number of ploidy
  --hwe                 perform Hardy-Weinberg disequilibrium test
  --assoc               perform association test, case/control analysis by 
                        default. To perform quantitative trait loci analysis, 
                        please specified together with --qtl.
  --qtl                 input phenotype is quantitative traits. input file 
                        should be specified with --input, the second column of 
                        the input file is the quantitative trait
  --permutation arg     times for permutation
  --haplo-EM            perform haplotype analysis using expectation 
                        maximization algorithm
  --haplo-SAT           perform haplotype analysis using SAT-based algorithm
  --mask arg            mask of snps for haplotype analysis, comma delimited. 
                        eg. mask=1,0,1 to use 1st and 3rd SNPs when there are 3
                        SNPs in all.
  --lft arg             lowest frequency threshold for haplotype analysis
  --ld-in-case          perform Linkage disequilibrium test in cases
  --ld-in-ctrl          perform Linkage disequilibrium test in controls
  --ld                  perform Linkage disequilibrium test in both cases and 
                        controls
  --adjust              adjust p-value for multiple testing
  --webserver           Internal use for webserver
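The `--mask` option selects which SNPs enter the haplotype analysis. The sketch below shows the selection rule as described in the help text (a comma-delimited list of 1s and 0s, one per SNP). The function name `apply_mask` is hypothetical and the length check is an assumption; SHEsisPlus may validate the mask differently.

```python
def apply_mask(mask, snp_names):
    """Return the SNP names whose mask entry is 1.

    `mask` is a comma-delimited string such as "1,0,1".
    Illustrative only; this is not SHEsisPlus code.
    """
    flags = [f.strip() for f in mask.split(",")]
    if len(flags) != len(snp_names):
        raise ValueError("mask length must equal the number of SNPs")
    if any(f not in ("0", "1") for f in flags):
        raise ValueError("mask entries must be 0 or 1")
    return [name for flag, name in zip(flags, snp_names) if flag == "1"]


if __name__ == "__main__":
    # Uses the 1st and 3rd SNPs when there are 3 SNPs in all.
    print(apply_mask("1,0,1", ["rs1", "rs2", "rs3"]))
```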

###example

./SHEsisPlus --input-case case.txt --input-ctrl ctrl.txt --snpname-line "rs1,rs2,rs3" --output out --ploidy 2 --hwe --assoc --permutation 1000 --haplo-EM --mask "1,1,0" --ld-in-case --adjust

./SHEsisPlus --input qtl.txt --qtl --ploidy 3 --hwe --assoc --ld --haplo-SAT --lft 0.01 --permutation 10000

##Interpret output

###Binary phenotype

####Association test

This is an example of the output for case/control association analysis. The fields are:

| Header | Explanation |
| --- | --- |
| SNP | Names of SNPs. You can specify them in the Marker names text field. If you don't provide them, or your input is invalid, SNP names will default to site1, site2, site3 ... |
| Call rate | The percentage of individuals with non-missing genotypes |
| Chi2 | χ² statistic from Pearson's chi-square test |
| Pearson's p | p value calculated from Pearson's chi-square test |
| Fisher's p | p value calculated from Fisher's exact test |
| Permutation p | p value acquired from the permutation test |
| OR [95% CI] | Odds ratio [95% confidence interval]. Please note that this value is only presented when a site has two allele types. |
| Holm | Holm (1979) step-down adjusted p-values for strong control of the family-wise Type I error rate (FWER) |
| SidakSS | Sidak single-step adjusted p-values for strong control of the family-wise Type I error rate (FWER) |
| SidakSD | Sidak step-down adjusted p-values for strong control of the family-wise Type I error rate (FWER) |
| FDR_BH | Adjusted p-values for the Benjamini & Hochberg (1995) step-up FDR controlling procedure |
| FDR_BY | Adjusted p-values for the Benjamini & Yekutieli (2001) step-up FDR controlling procedure |
| Detail | The counts and frequencies of a specific genotype or allele |
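To make the adjusted p-value columns concrete, the following sketch implements the textbook forms of the Holm, Sidak single-step, Sidak step-down, Benjamini-Hochberg and Benjamini-Yekutieli adjustments. The function names are hypothetical and the code follows the standard definitions of these procedures; it is not taken from the SHEsisPlus source, which may differ in details such as tie handling.

```python
def _step_down(pvals, adjust):
    """Generic step-down scheme: adjust the k-th smallest p-value with the
    number of remaining hypotheses, then enforce monotonicity."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    out = [0.0] * m
    running = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # hypotheses not yet stepped past
        running = max(running, min(1.0, adjust(pvals[idx], remaining)))
        out[idx] = running
    return out


def holm(pvals):
    return _step_down(pvals, lambda p, k: k * p)


def sidak_ss(pvals):
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def sidak_sd(pvals):
    return _step_down(pvals, lambda p, k: 1.0 - (1.0 - p) ** k)


def fdr(pvals, yekutieli=False):
    """Step-up FDR adjustment: Benjamini-Hochberg by default,
    Benjamini-Yekutieli when yekutieli=True."""
    m = len(pvals)
    c = sum(1.0 / k for k in range(1, m + 1)) if yekutieli else 1.0
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    out = [0.0] * m
    running = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running = min(running, pvals[idx] * m * c / rank)
        out[idx] = running
    return out


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03]
    print(holm(p), sidak_ss(p), sidak_sd(p), fdr(p), fdr(p, yekutieli=True))
```

The results are returned in the same order as the input p-values, so each adjusted value can be placed next to the SNP it belongs to.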

####Hardy-Weinberg equilibrium test

The output is straightforward. Hardy-Weinberg equilibrium is tested in cases, in controls, and in cases and controls combined. Both Pearson's chi-square test and Fisher's exact test are performed.
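For orientation, here is a minimal sketch of Pearson's chi-square test for Hardy-Weinberg equilibrium in the simplest setting, a biallelic site in a diploid species. The function name `hwe_chi_square` is hypothetical and this textbook version is an assumption about the general approach; SHEsisPlus also handles multiallelic sites and polyploid species, which this sketch does not cover.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square HWE test for a biallelic diploid site.

    Takes the observed genotype counts and returns (chi2, p_value)
    with 1 degree of freedom. Illustrative only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("site is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(30, 40, 30))
```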

####Haplotype analysis

In this example, haplotypes with a frequency below 0.03 are discarded. 0.03 is the default threshold. You can change it with the --lft option. The fields in the table are also easy to understand. Apart from the association test for every single haplotype, a global result is also given. This result shows whether the haplotype distribution differs between cases and controls.
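The effect of the lowest frequency threshold can be sketched as a simple filter. The function name `filter_haplotypes` and the dictionary layout are hypothetical, and the sketch assumes a single frequency per haplotype; how SHEsisPlus combines case and control frequencies when applying the threshold is not specified here.

```python
def filter_haplotypes(freqs, lft=0.03):
    """Keep haplotypes whose frequency is at least `lft`.

    `freqs` maps a haplotype string to its estimated frequency.
    Illustrative only; this is not SHEsisPlus code.
    """
    return {hap: f for hap, f in freqs.items() if f >= lft}


if __name__ == "__main__":
    estimated = {"AT": 0.50, "GC": 0.48, "AC": 0.02}
    print(filter_haplotypes(estimated))            # default threshold 0.03
    print(filter_haplotypes(estimated, lft=0.01))  # as with --lft 0.01
```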

####Linkage disequilibrium analysis

For linkage disequilibrium analysis, pair-wise D' and r² are calculated. The stronger the linkage disequilibrium between two loci, the darker the color will be.
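As a reminder of what the two measures mean, the sketch below computes D' and r² for a pair of biallelic loci from one haplotype frequency and the two allele frequencies. The function name `ld_measures` is hypothetical, and the formulas are the standard two-allele definitions; the multiallelic case handled by SHEsisPlus is not covered.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Illustrative only; this is not SHEsisPlus code.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


if __name__ == "__main__":
    print(ld_measures(0.4, 0.6, 0.5))
```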

###Quantitative trait

####Association test

This is an example of the output for quantitative trait association analysis. The fields are:

| Header | Explanation |
| --- | --- |
| SNP | Names of SNPs. You can specify them in the Marker names text field. If you don't provide them, or your input is invalid, SNP names will default to site1, site2, site3 ... |
| Effect allele | Contributing allele. For biallelic sites, the effect allele is the minor allele. For multiallelic sites, the effect allele is the allele that gives the lowest p value. |
| Nonmissing | Number of non-missing individuals included in the analysis |
| Beta | Regression coefficient |
| SE | Standard error |
| R2 | Regression r-squared |
| T | Wald test statistic (based on the t-distribution) |
| p | Wald test asymptotic p-value |
| permutation p | p value acquired from the permutation test |

The remaining fields are the same as those described for the binary phenotype above.
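The Beta, SE, R2 and T fields correspond to the usual quantities of a simple linear regression. The sketch below regresses a trait on the number of copies of the effect allele using ordinary least squares. The function name `simple_regression` is hypothetical and the dosage coding is an assumption; this is not the regression code of SHEsisPlus.

```python
import math


def simple_regression(dosage, trait):
    """Ordinary least squares of trait on allele dosage.

    Returns (beta, se, r2, t). Illustrative only; requires at least
    three samples, variation in dosage and a non-perfect fit.
    """
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(dosage) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    syy = sum((y - my) ** 2 for y in trait)
    if sxx == 0:
        raise ValueError("dosage has no variation")
    beta = sxy / sxx
    rss = syy - beta * sxy            # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; the t statistic is undefined")
    r2 = 1.0 - rss / syy
    t = beta / se
    return beta, se, r2, t


if __name__ == "__main__":
    copies = [0, 1, 2, 1, 0, 2]       # copies of the effect allele
    values = [1.0, 2.1, 2.9, 1.9, 1.2, 3.1]
    print(simple_regression(copies, values))
```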

####Hardy-Weinberg equilibrium test

For quantitative traits, the Hardy-Weinberg equilibrium test is carried out in all samples.

####Haplotype analysis

For quantitative traits, linkage disequilibrium is calculated in all samples. The results are similar to those for case/control data.
