Needles

A distributed AI-REML Best Linear Unbiased Prediction framework for genomic prediction including marker-by-environment interaction. This software has been described and validated in the manuscript "Needles: towards large-scale genomic prediction with marker-by-environment interaction" (De Coninck et al., 2015, submitted to GENETICS).

This software was developed by Arne De Coninck and can only be used for research purposes.

Genomic datasets used for genomic prediction are constantly growing due to the decreasing costs of genotyping and increasing interest in improving the agronomic performance of animals and plants. To deal with those large-scale datasets, a distributed-memory framework was developed based on a message passing interface (MPI), the ScaLAPACK library, and the PARDISO library, which efficiently handles the sparse information introduced by the marker-by-environment interaction effects. The complexity of the algorithm is defined by the number of genetic markers and environments included in the genomic prediction setting; the number of individuals only has a linear effect on the read-in time. To enhance performance, it is advised to compile and execute Needles on an MPI-optimized machine.

#Installation

Dependencies

Needles relies heavily on the following software packages, which have to be installed prior to installation of Needles. These software packages are all open source, except for the vendor-optimized implementations and PARDISO, but an academic license of PARDISO is free of charge.

MPI (OpenMPI, MPICH, IntelMPI)
ScaLAPACK and all its dependencies: BLAS, BLACS, LAPACK, PBLAS (it is recommended to install a vendor-optimized implementation)
[PARDISO](http://www.pardiso-project.org/)
[CMake](http://www.cmake.org/)

Currently, compilation only works as provided when the Intel MKL libraries are installed. When the MKL libraries are not available, the MKL libraries referenced in the CMakeLists.txt file must be replaced by the ones that are installed.

Step-by-step

1. Unpack the zip file or clone the git repository
2. Go into the directory Needles
3. Make a new directory build
4. Go into the directory build
5. Type `cmake ..`
6. Type `make`
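The steps above can be sketched as a small shell helper. This is only an illustration: `build_needles` is a hypothetical function name that does not exist in the repository, and it assumes the sources have already been unpacked or cloned into a directory (named Needles by default) and that `cmake` and `make` are on the PATH.

```shell
#!/bin/sh
# Sketch of the out-of-source build described in the steps above.
# build_needles is a hypothetical helper, not part of Needles itself.
build_needles() {
    src_dir="${1:-Needles}"               # directory holding the unpacked or cloned sources
    mkdir -p "$src_dir/build" || return 1 # step 3: make a new directory build
    (
        cd "$src_dir/build" || exit 1     # step 4: go into the directory build
        cmake .. && make                  # steps 5 and 6: configure, then compile
    )
}

# Usage, from the directory that contains Needles:
#   build_needles Needles
```

The subshell keeps the caller's working directory unchanged, and the function returns a non-zero status if either `cmake` or `make` fails.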

Usage

Needles only needs an input file to start. A default input file, defaultinput.txt, is provided; more information on the arguments in the input file can be found on the wiki.

Example

To test Needles with a default example, enter the following command in the example directory: `mpirun -np 4 ../build/Needles GxE_20penv_QTL_input.txt`. At least 2 MPI processes should be initialised, because all sparse operations are performed by a single MPI process, while the other MPI processes handle the dense operations.

This test case is one of the many test cases described in the research article "Needles: towards large-scale genomic prediction with marker-by-environment interaction". It analyses 800 observations, genotyped with 1575 QTL markers and evaluated in 10 different environments. The simulated QTL effects are in the file QTL_summary_20penv_10env.txt and the different contributions to the final phenotypic values are in the file Observations_summary_lowvar_20penv_10env.txt. When Needles is working correctly, the output should be exactly the same as in the files starting with correct_. An example of the output produced by Needles is in the file Needles_out_4procs.txt.
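The comparison against the reference files can be automated with a short shell helper. The sketch below rests on an assumption that is not stated in this README: that each reference file named `correct_<name>` corresponds to a produced file named `<name>` in the same directory. `check_needles_output` is a hypothetical function name, not part of Needles.

```shell
#!/bin/sh
# Sketch: compare each correct_<name> reference file with the produced <name>.
# ASSUMPTION: outputs carry the reference file name without the correct_ prefix.
check_needles_output() {
    dir="${1:-.}"
    status=0
    for ref in "$dir"/correct_*; do
        # If the glob matched nothing, $ref is the literal pattern and does not exist.
        [ -e "$ref" ] || { echo "no reference files in $dir" >&2; return 2; }
        name=$(basename "$ref" | sed 's/^correct_//')
        if cmp -s "$ref" "$dir/$name"; then
            echo "OK    $name"
        else
            echo "DIFF  $name"   # also reported when the output file is missing
            status=1
        fi
    done
    return "$status"
}

# Usage, from the example directory after the run has finished:
#   mpirun -np 4 ../build/Needles GxE_20penv_QTL_input.txt
#   check_needles_output .
```

`cmp -s` performs a byte-for-byte comparison, which matches the requirement that the output be exactly the same as the reference.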

Output

Needles creates 3 output-files with the estimates/predictors for the different effects.

estimates_fixed_effects.txt: Lists the estimates for the fixed effects. Usually these are the fixed environmental effects, but users are free to choose the included fixed effects.
estimates_random_genetic_effects.txt: Lists the predictions for the random genetic effects. These are the predictions for the global genetic effects, independent of the environment.
estimates_random_sparse_effects.txt: Lists the predictions for the random marker-by-environment interaction effects.

Both random effects can be chosen by the user to model something other than genetic effects and their environmental interaction, but for now one of the random effects must result in a sparse part of the coefficient matrix and the other must result in a dense part. The two random effects can have different variances, but each random effect is homoscedastic, meaning that the covariance matrix of each random effect is modeled as a constant diagonal matrix.
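As an illustration of this variance structure, the model can be written in common linear mixed model notation. The symbols are assumptions made for this sketch and are not taken from the Needles source or manuscript: y holds the observations, b the fixed effects with design matrix X, u the random effects with dense design matrix Z, v the random effects with sparse design matrix W, and e the residual, whose distribution is left unspecified here.

```latex
\begin{aligned}
\mathbf{y} &= X\mathbf{b} + Z\mathbf{u} + W\mathbf{v} + \mathbf{e},\\
\operatorname{Var}(\mathbf{u}) &= \sigma_u^2 I, \qquad
\operatorname{Var}(\mathbf{v}) = \sigma_v^2 I .
\end{aligned}
```

Each random effect has its own variance component, but within one random effect every element shares the same variance and the covariances are zero.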

In addition to the result files, two files are written that monitor memory usage: root_output.txt for the root node, which performs all operations on the sparse part of the system, and cluster_output.txt for the other nodes, which perform all operations on the dense part of the system.

Version history

Version 0.1 (09/2015):
1. First public release of Needles

Contact

Please feel free to contact arne.deconinck[at]ugent.be for any questions or suggestions.
