
Large-scale genomic prediction with marker-by-environment interaction


arnedc/Needles


Needles

A distributed AI-REML Best Linear Unbiased Prediction (BLUP) framework for genomic prediction including marker-by-environment interaction. This software has been described and validated in the manuscript Needles: towards large-scale genomic prediction with marker-by-environment interaction (De Coninck et al., 2015, submitted to GENETICS).

This software was developed by Arne De Coninck and can only be used for research purposes.

Genomic datasets used for genomic prediction are constantly growing due to the decreasing costs of genotyping and the increasing interest in improving the agronomic performance of animals and plants. To deal with those large-scale datasets, a distributed-memory framework was developed based on the Message Passing Interface (MPI), the ScaLAPACK library, and the PARDISO library, which efficiently handles the sparse information introduced by the marker-by-environment interaction effects. The complexity of the algorithm is determined by the number of genetic markers and environments included in the genomic prediction setting; the number of individuals only has a linear effect on the read-in time. To enhance performance, it is advised to compile and execute Needles on an MPI-optimized machine.

Installation

Dependencies

Needles relies heavily on the following software packages, which have to be installed prior to installation of Needles. These software packages are all open source, except for the vendor-optimized implementations and PARDISO, but an academic license of PARDISO is free of charge.

  1. MPI (OpenMPI, MPICH, IntelMPI)
  2. ScaLAPACK and all its dependencies: BLAS, BLACS, LAPACK, PBLAS (it is recommended to install a vendor-optimized implementation)
  3. PARDISO (http://www.pardiso-project.org/)
  4. CMake (http://www.cmake.org/)

Currently, compilation only works out of the box with the Intel MKL libraries installed. When the MKL libraries are not available, the MKL libraries referenced in the CMakeLists.txt file must be replaced by the ones that are installed.

Step-by-step

  1. Unpack zip-file or clone git-repository
  2. go into the directory Needles
  3. make a new directory build
  4. go into the directory build
  5. type cmake ..
  6. type make
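The build steps above (steps 2 to 6) can be sketched as a small shell function. This is a minimal sketch, not part of Needles: the function name `build_needles` is hypothetical, and it assumes the sources have already been unpacked or cloned (step 1) and that the dependencies listed above are installed.

```shell
# Sketch of the step-by-step build, wrapped in a function so it can be reused.
# Assumption: the argument is the path to an unpacked or cloned Needles source tree.
build_needles() {
    src_dir="${1:-Needles}"

    # Refuse to continue if the directory does not look like a CMake source tree.
    if [ ! -f "$src_dir/CMakeLists.txt" ]; then
        echo "No CMakeLists.txt found in '$src_dir'" >&2
        return 1
    fi

    # Steps 3 and 4: create the build directory and enter it (in a subshell,
    # so the caller's working directory is left unchanged).
    mkdir -p "$src_dir/build" || return 1
    (
        cd "$src_dir/build" || exit 1
        # Steps 5 and 6: configure, then compile.
        cmake .. && make
    )
}
```

It could then be called as `build_needles Needles` from the directory that contains the source tree.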

Usage

Needles only needs an input file to start. A default input file, defaultinput.txt, is provided; more information on the arguments in the input file can be found on the wiki.

Example

To test Needles with a default example, enter the following command in the example directory:

    mpirun -np 4 ../build/Needles GxE_20penv_QTL_input.txt

At least 2 MPI processes should be initialised, because all sparse operations are performed by a single MPI process, while the other MPI processes handle the dense operations.

This test case is one of the many test cases described in the research article Needles: towards large-scale genomic prediction with marker-by-environment interaction. It analyses 800 observations, genotyped with 1575 QTL markers and evaluated in 10 different environments. The simulated QTL effects are in the file QTL_summary_20penv_10env.txt and the different contributions to the final phenotypic values are in the file Observations_summary_lowvar_20penv_10env.txt. When Needles is working correctly, the output should be exactly the same as in the files starting with correct_. An example of the output produced by Needles is in the file Needles_out_4procs.txt.
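Running the example and comparing the results against the reference files can be scripted. The following is a minimal sketch under stated assumptions: the function names `run_needles` and `check_outputs` are hypothetical, the binary is assumed to be at ../build/Needles relative to the current directory, and each reference file correct_<name> is assumed to correspond to an output file called <name> in the same directory.

```shell
# Run Needles on an input file with a given number of MPI processes.
# Assumption: called from the example directory, with the binary at ../build/Needles.
run_needles() {
    np="$1"
    input="$2"

    # One process handles the sparse part, so at least one more is needed
    # for the dense part.
    if [ "$np" -lt 2 ]; then
        echo "Needles needs at least 2 MPI processes, got $np" >&2
        return 1
    fi
    mpirun -np "$np" ../build/Needles "$input"
}

# Compare every reference file correct_<name> with the produced file <name>.
# Assumption: reference and output differ only by the 'correct_' prefix.
check_outputs() {
    dir="${1:-.}"
    status=0
    for ref in "$dir"/correct_*; do
        [ -e "$ref" ] || continue      # glob did not match any reference file
        name="${ref##*/}"              # strip the directory part
        out="$dir/${name#correct_}"    # strip the 'correct_' prefix
        if cmp -s "$ref" "$out"; then
            echo "OK    ${out##*/}"
        else
            echo "DIFF  ${out##*/}"    # also reported when the output is missing
            status=1
        fi
    done
    return "$status"
}
```

From the example directory, `run_needles 4 GxE_20penv_QTL_input.txt && check_outputs .` would run the test case and report any file that deviates from its reference. A byte-wise comparison with `cmp` matches the statement that the output should be exactly the same.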

Output

Needles creates three output files with the estimates/predictors for the different effects.

  • estimates_fixed_effects.txt: Lists the estimates for the fixed effects. Usually these are the fixed environmental effects, but users are free to choose the included fixed effects.
  • estimates_random_genetic_effects.txt: Lists the predictions for the random genetic effects. These are the predictions for the global genetic effects, independent of the environment.
  • estimates_random_sparse_effects.txt: Lists the predictions for the random marker-by-environment interaction effects.

Both random effects can be chosen by the user to model something other than genetic effects and their environmental interaction, but for now one of the random effects should result in a sparse part of the coefficient matrix and the other should result in a dense part. Also, the two random effects can have different variances, but the variance is homoscedastic within each random effect, meaning that the variance of each random effect is modeled as a constant diagonal matrix.
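The variance structure described above can be written compactly in standard linear mixed-model notation. This is only a sketch: the symbols are assumptions chosen for illustration and are not taken from the Needles documentation. Here $\mathbf{u}$ stands for the random effect giving the dense part of the coefficient matrix, $\mathbf{v}$ for the random effect giving the sparse part, and the normality and residual terms are the usual assumptions of such a model rather than statements from this README.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{W}\mathbf{v} + \mathbf{e}, \\
\mathbf{u} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_u^2 \mathbf{I}\right), \qquad
\mathbf{v} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_v^2 \mathbf{I}\right), \qquad
\mathbf{e} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 \mathbf{I}\right).
\end{aligned}
```

Each random effect has its own variance component ($\sigma_u^2 \neq \sigma_v^2$ is allowed), but within an effect every element shares the same variance, which is what the constant diagonal matrix expresses.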

Next to the result files, two files are given as output that monitor memory usage in the root node, which performs all the operations on the sparse part of the system (root_output.txt), and in the other nodes, performing all operations on the dense part of the system (cluster_output.txt).

Version history

  • Version 0.1 (09/2015):
    1. First public release of Needles

Contact

Please feel free to contact arne.deconinck[at]ugent.be for any questions or suggestions.
