SHEsisPlus

##Introduction

SHEsisPlus is an open-source software package for the analysis of genetic association, Hardy-Weinberg equilibrium, linkage disequilibrium and haplotype construction at multiallelic polymorphism loci. It is compatible with both diploid and polyploid species. The web-based version can be accessed via the SHEsisPlus web version.

##What's new

Compared with the previous version of SHEsis, SHEsisPlus is compatible with haploid, diploid and polyploid species. It can analyze not only case/control data but also quantitative trait data. It provides several methods of P value adjustment, including Holm step-down, Sidak single-step, Sidak step-down, FDR and permutation tests. All of these can be performed via the web UI.

##Compile

To build SHEsisPlus from source code, please first install the Boost C++ Libraries.

###Linux

Modify the makefile to specify the locations of the Boost include files and libs. Then type "make" in the source code directory.

###Windows

Create a project in Microsoft Visual Studio. Add all the source files and header files EXCEPT the unit test source files (*_test.cpp) to the project. Modify the project properties to specify the paths of the Boost include files and libs. Then build it.

Note: SHEsisPlus is developed and tested under Linux. Its behaviour under Windows is not guaranteed. If you want to compile it under Windows, we recommend building it within Cygwin.

##Input format

###Case/control data

####Sample data for diploid species

id1  G A  C C  1 1  A1 A2
id2  A A  T C  1 1  A2 A2
id3  A A  T T  2 2  A3 A4
id4  0 0  T T  3 3  A5 A3
id5  G G  A A  2 3  A1 A2
id6  A A  C A  0 0  A6 A7

####Sample data for triploid species

id1  A G A  T C C  1 1 1  AA T  TT
id2  A A A  C T C  2 1 1  A  T  AA
id3  G A A  C T T  3 2 2  TT A  T
id4  0 0 0  A T T  3 3 3  AA T  AT
id5  G G G  T A A  1 2 3  TA TT T
id6  G A A  C C A  0 0 0  AA A  A

The first column is the sample id. The following columns are genotypes. Columns should be delimited by spaces, commas or tabs. Adjacent delimiters are compressed and treated as a single delimiter. Alleles can be any string (e.g. 1,2,3,4, or A,T,G,C, or A1,A2,A3,A4, or anything else) except 0, which is the code for a missing genotype.

The samples above show data for diploid and triploid species. For diploid species, the columns correspond to: sample id, site1-allele1, site1-allele2, site2-allele1, site2-allele2, ... For triploid species, the columns should be: sample id, site1-allele1, site1-allele2, site1-allele3, site2-allele1, site2-allele2, site2-allele3, ...
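
The column layout described above can be illustrated with a minimal Python sketch. This is not SHEsisPlus code: the function name `parse_genotype_line` is hypothetical, and treating a site as missing when any of its alleles is 0 is an assumption made for this example.

```python
import re

def parse_genotype_line(line, ploidy):
    """Split one case/control line into (sample_id, sites).

    Each site is a tuple of `ploidy` alleles, or None when the site is missing.
    """
    # Spaces, commas and tabs are delimiters; runs of them count as one.
    tokens = [t for t in re.split(r"[ ,\t]+", line.strip()) if t]
    if not tokens:
        raise ValueError("empty line")
    sample_id, alleles = tokens[0], tokens[1:]
    if len(alleles) % ploidy != 0:
        raise ValueError("allele count is not a multiple of the ploidy")
    sites = []
    for i in range(0, len(alleles), ploidy):
        site = tuple(alleles[i:i + ploidy])
        # Assumption: a site is missing if any of its alleles is coded 0.
        sites.append(None if "0" in site else site)
    return sample_id, sites

# Diploid example taken from the sample data above.
print(parse_genotype_line("id4  0 0  T T  3 3  A5 A3", ploidy=2))
```

Changing `ploidy` to 3 groups the same kind of line into triplets, which matches the triploid layout.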

###Quantitative trait data

####Sample data for diploid species

id1  20.6  G A  C C  1 1  A1 A2
id2  25.4  A A  T C  1 1  A2 A2
id3  23.1  A A  T T  2 2  A3 A4
id4  42.4  0 0  T T  3 3  A5 A3
id5  11.0  G G  A A  2 3  A1 A2
id6  5.5   A A  C A  0 0  A6 A7

####Sample data for triploid species

id1  1.1  A G A  T C C  1 1 1  AA T  TT
id2  3.2  A A A  C T C  2 1 1  A  T  AA
id3  14   G A A  C T T  3 2 2  TT A  T
id4  4.3  0 0 0  A T T  3 3 3  AA T  AT
id5  24   G G G  T A A  1 2 3  TA TT T
id6  4.49 G A A  C C A  0 0 0  AA A  A

The format for quantitative trait data is similar to that for case/control data except that the second column is the quantitative trait. The quantitative trait should be numeric.
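
As a rough illustration of the extra trait column, the sketch below reads one quantitative trait line and rejects a non-numeric trait. The function name `parse_qtl_line` is hypothetical and the error handling is an assumption, not the behaviour of SHEsisPlus itself.

```python
import re

def parse_qtl_line(line, ploidy):
    """Split one quantitative trait line into (sample_id, trait, sites)."""
    tokens = [t for t in re.split(r"[ ,\t]+", line.strip()) if t]
    if len(tokens) < 2:
        raise ValueError("expected a sample id followed by a trait value")
    sample_id = tokens[0]
    # The second column must be numeric; float() raises ValueError otherwise.
    trait = float(tokens[1])
    alleles = tokens[2:]
    if len(alleles) % ploidy != 0:
        raise ValueError("allele count is not a multiple of the ploidy")
    sites = [tuple(alleles[i:i + ploidy]) for i in range(0, len(alleles), ploidy)]
    return sample_id, trait, sites

# Triploid example taken from the sample data above.
print(parse_qtl_line("id3  14   G A A  C T T  3 2 2  TT A  T", ploidy=3))
```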

##Arguments

###Allowed options:

  --help                produce help message
  --input arg           path for the input file containing both cases and 
                        controls, can be specified for multiple times
  --input-case arg      path for the input file containing cases, can be 
                        specified for multiple times
  --input-ctrl arg      path for the input file containing controls, can be 
                        specified for multiple times
  --snpname-file arg    path for file that contains names of snps
  --snpname-line arg    snp names are as arguments
  --output arg          prefix of output files
  --report-txt          report results in plain-text format. By default, 
                        results will be reported in html.
  --ploidy arg          number of ploidy
  --hwe                 perform Hardy-Weinberg disequilibrium test
  --assoc               perform association test, case/control analysis by 
                        default. To perform quantitative trait loci analysis, 
                        please specified together with --qtl.
  --qtl                 input phenotype is quantitative traits. input file 
                        should be specified with --input, the second column of 
                        the input file is the quantitative trait
  --permutation arg     times for permutation
  --haplo-EM            perform haplotype analysis using expectation 
                        maximization algorithm
  --haplo-SAT           perform haplotype analysis using SAT-based algorithm
  --mask arg            mask of snps for haplotype analysis, comma delimited. 
                        eg. mask=1,0,1 to use 1st and 3rd SNPs when there are 3
                        SNPs in all.
  --lft arg             lowest frequency threshold for haplotype analysis
  --ld-in-case          perform Linkage disequilibrium test in cases
  --ld-in-ctrl          perform Linkage disequilibrium test in controls
  --ld                  perform Linkage disequilibrium test in both cases and 
                        controls
  --adjust              adjust p-value for multiple testing
  --webserver           Internal use for webserver
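
To make the `--mask` description concrete, here is a small Python sketch of how a comma-delimited mask selects SNPs. It only illustrates the rule stated in the help text; the helper name `apply_mask` is hypothetical and the length check is an assumption.

```python
def apply_mask(mask, snp_names):
    """Return the SNP names whose mask flag is 1."""
    flags = [flag.strip() for flag in mask.split(",")]
    # Assumption: the mask must give exactly one flag per SNP.
    if len(flags) != len(snp_names):
        raise ValueError("mask length does not match the number of SNPs")
    return [name for flag, name in zip(flags, snp_names) if flag == "1"]

# With three SNPs, the mask 1,0,1 keeps the 1st and 3rd.
print(apply_mask("1,0,1", ["rs1", "rs2", "rs3"]))
```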

###Example

./SHEsisPlus --input-case case.txt --input-ctrl ctrl.txt --snpname-line "rs1,rs2,rs3" --output out --ploidy 2 --hwe --assoc --permutation 1000 --haplo-EM --mask "1,1,0" --ld-in-case --adjust

./SHEsisPlus --input qtl.txt --qtl --ploidy 3 --hwe --assoc --ld --haplo-SAT --lft 0.01 --permutation 10000

##Interpret output

###Binary phenotype

####Association test

The output of the case/control association analysis contains the following fields:

| Header | Explanation |
| --- | --- |
| SNP | Names of SNPs. You can specify them in the Marker names text field. If you don't provide them, or your input is invalid, SNP names default to site1, site2, site3 ... |
| Call rate | The percentage of individuals with non-missing genotypes |
| Chi² | χ² in Pearson's chi-square test |
| Pearson's p | p value calculated from Pearson's chi-square test |
| Fisher's p | p value calculated from Fisher's exact test |
| Permutation p | p value acquired from the permutation test |
| OR [95% CI] | Odds ratio [95% confidence interval]. Please note that this value is only presented when a site has two allele types. |
| Holm | Holm (1979) step-down adjusted p-values for strong control of the family-wise Type I error rate (FWER) |
| SidakSS | Sidak single-step adjusted p-values for strong control of the family-wise Type I error rate (FWER) |
| SidakSD | Sidak step-down adjusted p-values for strong control of the family-wise Type I error rate (FWER) |
| FDR_BH | Adjusted p-values for the Benjamini & Hochberg (1995) step-up FDR controlling procedure |
| FDR_BY | Adjusted p-values for the Benjamini & Yekutieli (2001) step-up FDR controlling procedure |
| Detail | The counts and frequencies of specific genotypes or alleles |
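
For readers unfamiliar with the adjusted p-value columns, the sketch below shows the textbook formulas behind three of them (Holm, SidakSS and FDR_BH). It is a generic illustration with hypothetical function names, not the code SHEsisPlus uses, and its results may differ from the program's output in edge cases such as ties.

```python
def holm(pvals):
    """Holm step-down: scale the k-th smallest p by (m - k + 1), keep monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank is 0-based
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted

def sidak_single_step(pvals):
    """Sidak single-step: 1 - (1 - p)^m for every p."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

def fdr_bh(pvals):
    """Benjamini & Hochberg step-up: scale the k-th smallest p by m / k."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03]
print(holm(raw), sidak_single_step(raw), fdr_bh(raw))
```

The step-down Sidak and Benjamini & Yekutieli variants follow the same pattern with a different per-rank scaling.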

####Hardy-Weinberg equilibrium test

The output is straightforward. Hardy-Weinberg equilibrium is tested in cases, in controls, and in cases and controls combined. Both Pearson's chi-square test and Fisher's exact test are performed.
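
As a reminder of what the Pearson variant of this test computes, here is a minimal sketch for the simplest case only: one diploid site with two alleles. SHEsisPlus also handles multiallelic and polyploid data, which this example does not cover. The function name `hwe_chi_square` is hypothetical and Fisher's exact test is not shown.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    # Skip classes with zero expectation (monomorphic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

print(hwe_chi_square(30, 40, 30))
```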

####Haplotype analysis

By default, haplotypes with a frequency below 0.03 are discarded. You can change this threshold with the --lft option. The fields in the table are self-explanatory. In addition to the association test for each individual haplotype, a global result is given, which shows whether the haplotype distribution differs between cases and controls.
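
The effect of the lowest frequency threshold can be pictured with a short sketch. The function name `filter_haplotypes` and the example frequencies are made up for illustration; only the 0.03 default comes from the text above.

```python
def filter_haplotypes(frequencies, lft=0.03):
    """Keep haplotypes whose estimated frequency is at least `lft`."""
    return {hap: freq for hap, freq in frequencies.items() if freq >= lft}

# Hypothetical haplotype frequencies, for illustration only.
estimated = {"ACT": 0.52, "GCT": 0.31, "GTT": 0.15, "ATA": 0.02}
print(filter_haplotypes(estimated))
```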

####Linkage disequilibrium analysis

For linkage disequilibrium analysis, pair-wise D' and R² are calculated. The stronger the linkage disequilibrium between two loci, the darker the color.
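
For two biallelic loci, the standard definitions of D' and r² can be sketched as follows. This covers only the biallelic case with known haplotype frequencies; the function name `ld_measures` is hypothetical, and the multiallelic measures used by SHEsisPlus are not reproduced here.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r²) from the AB haplotype frequency and allele frequencies of A and B."""
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d == 0:
        return 0.0, 0.0
    # The largest |D| possible given the allele frequencies depends on the sign of D.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, d * d / denom

print(ld_measures(0.5, 0.5, 0.5))   # complete linkage disequilibrium
print(ld_measures(0.25, 0.5, 0.5))  # linkage equilibrium
```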

###Quantitative trait

####Association test

The output of the quantitative trait association analysis contains the following fields:

| Header | Explanation |
| --- | --- |
| SNP | Names of SNPs. You can specify them in the Marker names text field. If you don't provide them, or your input is invalid, SNP names default to site1, site2, site3 ... |
| Effect allele | Contributing allele. For biallelic sites, the effect allele is the minor allele. For multiallelic sites, the effect allele is the allele that gives the lowest p value. |
| Nonmissing | Number of non-missing individuals included in the analysis |
| Beta | Regression coefficient |
| SE | Standard error |
| R² | Regression r-squared |
| T | Wald test (based on t-distribution) |
| p | Wald test asymptotic p-value |
| permutation p | p value acquired from the permutation test |

The remaining fields are the same as those described above for the binary phenotype.
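
The Beta, SE, R² and T columns correspond to the usual quantities of a simple linear regression. The sketch below computes them under the assumption that each individual's genotype is coded as the number of copies of the effect allele; that coding, the function name `marker_regression` and the toy data are assumptions for illustration, not a description of SHEsisPlus internals.

```python
import math

def marker_regression(allele_counts, traits):
    """Least-squares regression of the trait on the effect-allele count."""
    n = len(allele_counts)
    if n < 3:
        raise ValueError("need at least three individuals")
    mean_x = sum(allele_counts) / n
    mean_y = sum(traits) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("allele counts must vary between individuals")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, traits))
    sst = sum((y - mean_y) ** 2 for y in traits)
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(allele_counts, traits))
    se = math.sqrt(rss / (n - 2) / sxx)
    r2 = 1.0 - rss / sst if sst > 0 else 0.0
    t = beta / se if se > 0 else math.inf  # Wald statistic
    return {"beta": beta, "se": se, "r2": r2, "t": t}

# Toy data: three individuals carrying 0, 1 and 2 copies of the effect allele.
print(marker_regression([0, 1, 2], [1.0, 2.0, 4.0]))
```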

####Hardy-Weinberg equilibrium test

For quantitative traits, the Hardy-Weinberg equilibrium test is carried out on all samples.

####Haplotype analysis

For quantitative traits, linkage disequilibrium is calculated in all samples. The results are similar to those for case/control data.

##References:

[1] Neigenfind J, Gyetvai G, Basekow R, Diehl S, Achenbach U, Gebhardt C, Selbig J, Kersten B. Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT. BMC Genomics 2008 Jul 30;9:356. doi: 10.1186/1471-2164-9-356.

[2] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007 Sep;81(3):559-75. Epub 2007 Jul 25.

[3] Hedrick PW. Gametic disequilibrium measures: proceed with caution. Genetics 1987 Oct;117(2):331-41.
