A C and CUDA implementation of tabulating linear regression for an exhaustive pairwise interaction search on a CUDA enabled GPU (Kam-Thong et al., Human Heredity 2012) http://goo.gl/XE54ir

GLIDE

Summary

This program is a C and CUDA implementation of tabulating linear regression for an exhaustive pairwise interaction search on a CUDA enabled GPU.

There is also a Python implementation in this repository, which is also forked here.

For a detailed description of the setting and the search algorithm, please refer to the associated paper:

  • T. Kam-Thong, C.-A. Azencott, L. Cayton, B. Puetz, A. Altmann, N. Karbalai, P. G. Saemann, B. Schoelkopf, B. Mueller-Myhsok, and K. M. Borgwardt. (2012) GLIDE: GPU-Based Linear Regression for Detection of Epistasis. Human Heredity, 73 (4), 220-236.
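
As a rough illustration of what is computed for each SNP pair, the sketch below fits the four-coefficient model suggested by the output columns (intercept, SNP1, SNP2, interaction) by ordinary least squares and returns the t-scores. It is a plain-Python reference for small inputs only and does not reproduce the tabulation scheme or the GPU kernel described in the paper. The names `invert_matrix` and `pair_t_scores` and the toy data are hypothetical.

```python
import math

def invert_matrix(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (e.g. a constant SNP)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def pair_t_scores(snp1, snp2, pheno):
    """Fit pheno = b0 + b1*snp1 + b2*snp2 + b12*snp1*snp2 + error.

    Returns (betas, t_scores), both ordered as intercept, SNP1, SNP2, interaction.
    """
    k = 4
    n = len(pheno)
    if n <= k:
        raise ValueError("need more subjects than coefficients")
    rows = [[1.0, a, b, a * b] for a, b in zip(snp1, snp2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, pheno)) for i in range(k)]
    inv = invert_matrix(xtx)
    betas = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual variance with n - 4 degrees of freedom.
    rss = sum((y - sum(b * x for b, x in zip(betas, r))) ** 2
              for r, y in zip(rows, pheno))
    sigma2 = rss / (n - k)
    t_scores = [betas[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return betas, t_scores

# Toy data (hypothetical): genotypes coded 0/1/2, 18 subjects.
snp1 = [0, 0, 0, 1, 1, 1, 2, 2, 2] * 2
snp2 = [0, 1, 2] * 6
noise = [0.01 if i % 2 == 0 else -0.01 for i in range(18)]
pheno = [1.0 + 0.5 * a - 0.25 * b + 2.0 * a * b + e
         for a, b, e in zip(snp1, snp2, noise)]
betas, t_scores = pair_t_scores(snp1, snp2, pheno)
print(betas, t_scores)
```

An exhaustive search repeats this fit for every pair formed from one SNP in the first genotype file and one in the second, which is why the work is offloaded to the GPU.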

Installation

In a terminal:

make all

Note: If running on an older GPU of compute capability 1.3, please replace -sm_20 with -sm_13 in the Makefile and replace BlockSize 16 with BlockSize 10 in test.h.

Usage

In a terminal:

./GLIDE -f1 genoFile -f2 genoFile2 -fp phenoFile -n NSubj -m NSNPS -m2 NSNPS2 -p NSNPS_GPU -t t_thres -o results -g deviceOrdinal

where:

  • genoFile = text file containing the first genotype matrix in row-major order (i.e. each SNP is a row and each subject is a column)
  • genoFile2 = text file containing the second genotype matrix in row-major order (i.e. each SNP is a row and each subject is a column)
  • phenoFile = text file containing the phenotype
  • NSubj = number of subjects
  • NSNPS = number of SNPs in genoFile
  • NSNPS2 = number of SNPs in genoFile2
  • NSNPS_GPU = number of SNPs per partition; must be an integer multiple of BlockSize
  • t_thres = t-score threshold
  • results = output file, stored in text format
  • deviceOrdinal = ID of the GPU to use for the run
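
The constraint on the partition size can be checked before launching a run. The sketch below rounds a requested partition size down to a multiple of BlockSize and counts the partitions needed along each genotype file. The default of 16 comes from the installation note above. The helper names are hypothetical, and the assumption that both files are tiled in chunks of the partition size is inferred from the P1 and P2 output columns rather than from the source code.

```python
import math

def round_partition_size(requested, block_size=16):
    """Round a requested partition size down to a multiple of block_size."""
    if requested < block_size:
        raise ValueError("partition size must be at least one block")
    return (requested // block_size) * block_size

def partition_grid(n_snps, n_snps2, partition_size):
    """Partitions along each genotype file (assumed tiling, see lead-in)."""
    return (math.ceil(n_snps / partition_size),
            math.ceil(n_snps2 / partition_size))

print(round_partition_size(2500))         # 2496
print(partition_grid(10000, 5000, 2000))  # (5, 3)
```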

Note:

  1. Selection of the t-threshold (-t) is based on the constraint imposed by the available storage space on the host machine.
  2. Selection of the partition size (-p) is based on the constraint imposed by the available memory on the GPU.

Example:
Retaining 1 million pairs resulted in a file size of 56 MB. With 1 GB of device memory, a partition size of 2000 SNPs has worked reliably without causing a segmentation fault.
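
To get a feel for how the threshold affects storage, the sketch below estimates the number of retained pairs and the resulting file size. It uses the figure above of roughly 56 bytes per retained pair. The rest is assumption: that a pair is kept when the absolute t-score exceeds the threshold, that every test is null, and that the statistic is approximately standard normal. Real data with true signals or correlated SNPs will retain more pairs, so treat the result as an optimistic estimate. The function names are hypothetical.

```python
import math

# From the example above: 1 million retained pairs took about 56 MB.
BYTES_PER_PAIR = 56e6 / 1e6

def expected_retained_pairs(n_snps, n_snps2, t_thres):
    """Expected number of pairs with |t| > t_thres if every test were null.

    Assumption: the t-score is approximately standard normal under the null.
    """
    tail = math.erfc(t_thres / math.sqrt(2.0))  # two-sided tail P(|Z| > t_thres)
    return n_snps * n_snps2 * tail

def estimated_output_mb(n_pairs):
    """Approximate size of the results file in MB for n_pairs retained pairs."""
    return n_pairs * BYTES_PER_PAIR / 1e6

pairs = expected_retained_pairs(500000, 500000, 6.0)
print(pairs, estimated_output_mb(pairs))
```

Lowering the threshold increases the retained fraction quickly, so it is worth running this kind of estimate before a large search.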

Post-processing

In a terminal:

R --vanilla --args "Resultsname" "Set1_snpname" "Set2_snpname" "chunksize1" "chunksize2" "nsubjects" "NewResultsname" "BlockSize"  < search_name_onestudy.R

Example

In a terminal, the following:

make all  
./GLIDE -f1 Test1kind_first1ksnp.txt -f2 Test1kind_second1ksnp.txt -fp Test1kind_pheno.txt -n 1000 -m 1000 -m2 1000 -p 1000 -t 4 -o Results_1k.txt -g 0  
R --vanilla --args "Results_1k.txt" "first1k_snpnames.txt" "second1k_snpnames.txt" "1000" "1000" "1000" "Results_snpnames.txt" "16" < search_name_onestudy.R

will produce:

P1 P2 bidx bidy tidx tidy Tint TSnp1 TSnp2 TSnp1n2 Snp1 Snp2 Pint PSnp1 PSnp2 PSnp1n2
0 0 2 47 11 15 1.07445 -1.70963 -4.05737 4.19323 Set1rs44 Set2rs768 0.282881265084217 0.0876457911241603 5.35104201455988e-05 2.99492307327786e-05
0 0 3 31 11 14 1.56592 -3.40932 -3.44837 4.50097 Set1rs60 Set2rs511 0.117684988548390 0.000677358484171921 0.00058758464263497 7.56251433068282e-06
0 0 5 12 3 2 0.86708 -2.55246 -2.88948 4.01919 Set1rs84 Set2rs195 0.386107019174172 0.0108447652338024 0.00394246680550002 6.28008695821487e-05
...

where:

  • P1 = Partition 1
  • P2 = Partition 2
  • bidx = block ID x
  • bidy = block ID y
  • tidx = thread ID x
  • tidy = thread ID y
  • Tint = t-score of estimated Intercept
  • TSnp1 = t-score of estimated SNP1 coefficient
  • TSnp2 = t-score of estimated SNP2 coefficient
  • TSnp1n2 = t-score of estimated Interaction coefficient
  • Snp1 = SNP1 name
  • Snp2 = SNP2 name
  • Pint = p-value of estimated Intercept
  • PSnp1 = p-value of estimated SNP1 coefficient
  • PSnp2 = p-value of estimated SNP2 coefficient
  • PSnp1n2 = p-value of estimated Interaction coefficient
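
The post-processed table can be loaded for further filtering with a few lines of Python. The sketch below assumes the file is whitespace-separated with the header row shown above. The function names `read_glide_results` and `top_interactions` are hypothetical and not part of the repository.

```python
def read_glide_results(path):
    """Read the whitespace-separated post-processed table into a list of dicts."""
    text_columns = {"Snp1", "Snp2"}
    int_columns = {"P1", "P2", "bidx", "bidy", "tidx", "tidy"}
    rows = []
    with open(path) as handle:
        header = handle.readline().split()
        for line in handle:
            fields = line.split()
            if len(fields) != len(header):
                continue  # skip blank or truncated lines
            row = {}
            for name, value in zip(header, fields):
                if name in text_columns:
                    row[name] = value
                elif name in int_columns:
                    row[name] = int(value)
                else:
                    row[name] = float(value)  # t-scores and p-values
            rows.append(row)
    return rows

def top_interactions(rows, n=10):
    """Return the n pairs with the smallest interaction p-value (PSnp1n2)."""
    return sorted(rows, key=lambda r: r["PSnp1n2"])[:n]
```

For the example run, `top_interactions(read_glide_results("Results_snpnames.txt"), n=5)` would list the five pairs with the strongest interaction signal.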

Files

  • Makefile : Makefile
  • GLIDE_main.cu : GLIDE main code
  • GLIDE_kernel_op.cu : GLIDE kernel code
  • CUDAbook.h : CUDA header file
  • CUDAcheck.h : CUDA header file
  • GLIDE.h : GLIDE header file
  • search_name_onestudy.R : R file for post-processing of GLIDE output
  • plink2glide.py : Python file for transforming binary PLINK data into a genotype file readable by GLIDE
  • Test1kind_first1ksnp.txt : Test file with 1000 SNPs, 1000 individuals
  • Test1kind_second1ksnp.txt : Test file with 1000 SNPs, 1000 individuals
  • Test1kind_pheno.txt : Test file with 1000 phenotypes
  • first1k_snpnames.txt : Names of the SNPs in Test1kind_first1ksnp.txt
  • second1k_snpnames.txt : Names of the SNPs in Test1kind_second1ksnp.txt

License

GLIDE v0.1 Tony Kam-Thong tony@mpipsykl.mpg.de

(C) Copyright 2011, Tony Kam-Thong [tony@mpipsykl.mpg.de]

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3.0 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/

Contact

For any questions, please contact Tony Kam-Thong at: tkamth@gmail.com