Glide companion scripts & tutorial
These Python scripts handle basic tasks around using the GLIDE package to detect epistasis in GWAS data.
T. Kam-Thong, C.-A. Azencott, L. Cayton, B. Puetz, A. Altmann, N. Karbalai, P. G. Saemann, B. Schoelkopf, B. Mueller-Myhsok, and K. M. Borgwardt (2012). GLIDE: GPU-Based Linear Regression for Detection of Epistasis. Human Heredity, 73(4), 220-236. doi:10.1159/000341885
They were mostly developed at the MLCB (Machine Learning for Computational Biology) research group of the Max Planck Institutes in Tuebingen (Germany). Thanks to Damian Roqueiro for stress-testing them and providing useful feedback!
The documentation, which contains additional CLI instructions and short scripts, is available under
Compile the documentation with
pdflatex -shell-escape documentation/HowToGLIDE.tex
qqplot.py: Draw a Q-Q plot of single-SNP p-values from PLINK output.
naive_impute.py: Impute missing values in a GLIDE input file, replacing each missing entry with the most frequent value for that phenotype.
plink2h5.py: Convert a binary PLINK file into a
transpose.py: Transpose a PLINK .raw file into a
split_glide.py: Write the batch file that will run GLIDE sequentially on the "tiles" created beforehand by splitting the input file.
compute_all_pvalues.py: Compute all p-values corresponding to the t-test scores output by GLIDE: run compute_pvalues.py on all outputs and match indices back to SNP names.
compute_pvalues.py: Compute the p-values corresponding to the t-test scores output by GLIDE.
compute_block_to_index.py: Compute the dictionary (stored in CONST.py) used to convert between tile IDs and SNP indices.
CONST.py: Provide the dictionary needed to convert between tile IDs and SNP indices.
rerun_confounder_h5.py: Run linear regression again on a small number of SNP pairs, including confounders. Read genotypes from a .h5 file (created with
rerun_confounder.py: Run linear regression again on a small number of SNP pairs, including confounders.
rerun_h5.py: Run linear regression again on a small number of SNP pairs. Read genotypes from a .h5 file (created with
rerun.py: Run linear regression again on a small number of SNP pairs.
utils.py: Utilities that can be reused by several other scripts.
meta_analysis.py: An example of a meta-analysis of the SNP pairs selected by running GLIDE on four different cohorts.
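To illustrate what qqplot.py plots, here is a minimal sketch that computes Q-Q plot coordinates from a list of single-SNP p-values. It assumes the p-values have already been read from the PLINK output, and it uses the common (i - 0.5)/n convention for the expected uniform quantiles; the function name qq_points is hypothetical and not taken from the script.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a Q-Q plot.

    Expected p-values under the null are the uniform quantiles (i - 0.5) / n,
    one common convention. Invalid p-values (<= 0 or > 1) are dropped.
    """
    observed = sorted(p for p in pvalues if 0.0 < p <= 1.0)
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = (i - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


if __name__ == "__main__":
    for expected, observed in qq_points([0.5, 0.01, 0.2, 1e-6]):
        print(f"{expected:.3f}\t{observed:.3f}")
```

Plotting these pairs (expected on the x-axis, observed on the y-axis) together with the line y = x gives the usual Q-Q plot; points well above the line are p-values smaller than expected under the null.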
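The idea behind naive_impute.py, filling each missing entry with the most frequent observed value, can be sketched on a single column as follows. The column-by-column treatment, the "NA" missing-value marker and the function name impute_most_frequent are assumptions for illustration, not the script's actual interface.

```python
from collections import Counter


def impute_most_frequent(values, missing="NA"):
    """Replace missing entries in one column with its most frequent value.

    `missing` is an assumed marker for missing data. If every entry is
    missing, the column is returned unchanged.
    """
    observed = [v for v in values if v != missing]
    if not observed:
        return list(values)  # nothing to impute from
    mode, _ = Counter(observed).most_common(1)[0]
    return [mode if v == missing else v for v in values]


if __name__ == "__main__":
    column = ["0", "1", "NA", "1", "2", "NA"]
    print(impute_most_frequent(column))  # ['0', '1', '1', '1', '2', '1']
```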
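split_glide.py, compute_block_to_index.py and CONST.py all revolve around "tiles": the SNPs are split into blocks, and each run of GLIDE handles one pair of blocks. The sketch below shows one way such a tile-ID-to-index dictionary could be built, assuming tiles are the block pairs (a, b) with a <= b numbered row by row. This numbering scheme and all function names are hypothetical; the dictionary actually stored in CONST.py may be laid out differently.

```python
def block_ranges(n_snps, block_size):
    """Split SNP indices 0..n_snps-1 into consecutive half-open blocks."""
    return [(start, min(start + block_size, n_snps))
            for start in range(0, n_snps, block_size)]


def tile_to_index(n_snps, block_size):
    """Map each tile ID to the two SNP index ranges it covers.

    Hypothetical scheme: a tile is a pair of blocks (a, b) with a <= b,
    numbered row by row over the upper triangle (diagonal included).
    """
    blocks = block_ranges(n_snps, block_size)
    mapping = {}
    tile_id = 0
    for a in range(len(blocks)):
        for b in range(a, len(blocks)):
            mapping[tile_id] = (blocks[a], blocks[b])
            tile_id += 1
    return mapping


def local_to_global(tiles, tile_id, i_local, j_local):
    """Convert a SNP pair indexed within a tile back to global SNP indices."""
    (row_start, _), (col_start, _) = tiles[tile_id]
    return row_start + i_local, col_start + j_local


if __name__ == "__main__":
    tiles = tile_to_index(n_snps=10, block_size=4)
    for tid, (rows, cols) in tiles.items():
        print(tid, rows, cols)
    print(local_to_global(tiles, 4, 1, 1))  # (5, 9)
```

A batch file in the spirit of split_glide.py would then contain one GLIDE invocation per tile ID, run one after the other; the exact GLIDE command line is not reproduced here.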
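The core of compute_pvalues.py and compute_all_pvalues.py, turning GLIDE t-scores into p-values and mapping indices back to SNP names, can be sketched as below. Two assumptions are made here: the test is two-sided, and the t distribution is approximated by the standard normal, which is reasonable for the large sample sizes of GWAS but not for small cohorts. The function names and the (i, j, t_score) input layout are hypothetical.

```python
import math


def t_to_pvalue(t_score):
    """Two-sided p-value for a t-score, using the normal approximation.

    With the large sample sizes typical of GWAS, the t distribution is very
    close to the standard normal. For small samples, use the exact t
    distribution instead, e.g. 2 * scipy.stats.t.sf(abs(t_score), df).
    """
    return math.erfc(abs(t_score) / math.sqrt(2.0))


def annotate(results, snp_names):
    """Turn (i, j, t_score) triples into (name_i, name_j, t_score, p_value)."""
    return [(snp_names[i], snp_names[j], t, t_to_pvalue(t))
            for i, j, t in results]


if __name__ == "__main__":
    names = ["rs1", "rs2", "rs3"]  # hypothetical SNP identifiers
    hits = [(0, 2, 1.96), (1, 2, -5.0)]
    for row in annotate(hits, names):
        print(row)
```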
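The combination method used by meta_analysis.py is not described here. As one possible approach, the sketch below combines the p-values obtained for the same SNP pair in several independent cohorts with Fisher's method: X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its survival function has a closed form. Fisher's method is named here as an illustrative choice; weighted approaches such as Stouffer's Z or inverse-variance weighting of the interaction coefficients are common alternatives.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (all > 0) with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-square with 2k degrees of freedom. For even
    degrees of freedom, P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # p-values for one SNP pair in four cohorts (made-up numbers)
    print(fisher_combined_pvalue([0.01, 0.2, 0.03, 0.5]))
```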
Any questions can be directed to Chloe-Agathe Azencott:
chloe-agathe [dot] azencott [at] mines-paristech [dot] fr