Selecting CONected Explanatory SNPs
C++ Matlab
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
code 2013 version of the code Oct 18, 2016
data 2013 version of the code Oct 18, 2016
linear 2013 version of the code Oct 18, 2016
maxflow 2013 version of the code Oct 18, 2016
utils 2013 version of the code Oct 18, 2016
README.md Related repositories Oct 18, 2016

README.md

SConES

(Selecting Connected Explanatory SNPs)

Summary

MATLAB code implementing the method described in:

C.-A. Azencott, D. Grimm, M. Sugiyama, Y. Kawahara and K. Borgwardt (2013) Efficient network-guided multi-locus association mapping with graph cuts, Bioinformatics 29 (13), i171-i179 doi:10.1093/bioinformatics/btt238

Related work and repositories

  • EasyGWAS: SConES has been integrated to EasyGWAS, a framework for the analysis and meta-analysis of GWAS data. In particular, this offers a Python interface.

  • sfan: Regarding the feature selection part (i.e. after the GWAS data has been processed and the SNP scored), sfan uses a different (faster) maxflow solver, is written in Python, and also incorporates the multi-task version proposed in

M. Sugiyama, C.-A. Azencott, D. Grimm, Y. Kawahara and K. Borgwardt (2014) Multi-task feature selection on multiple networks via maximum flows, SIAM ICDM, 199-207 doi:10.1137/1.9781611973440.23

  • MultiSConeS: For the original version of this multi-task version, see Multi-SConES.

Demo:

In the code folder, there is a MATLAB script demo.m.

To run the demo start MATLAB and type in:

demo()

Data:

  • X = Genotypematrix of size n x s, where n is the number of samples and s is the number of SNPs
  • Y = Phenotypevector of size n x 1, where n is the number of samples
  • W = sparse network with size s x s

Demo files are provided in the data folder.

Running SConES for given values of the lambda and eta parameters

[indicators, objectives] = scones(data, option)

Input:

To run SConES two parameters are needed. The first one is a data cell array:

  • data.X is the genotype data
  • data.Y is the phenotype
  • data.W is the sparse network
  • data.selected_PCs is the number of principal components that should be used for population structure correction
  • data.lambda_values is a vector of size 1 x k with k values for lambda
  • data.eta_values is a vector of size 1 x h with h values for eta

The second parameter is a options cell array (optional - default values are specified):

  • options.automatic : if this parameter is true data.lambda_values and data.eta_values are determined automatically (default: true)
  • options.number_parameters : this parameters specifices the number of eta and lambda values in the case options.automatic is set to true (default: 10)
  • options.stdout : if this parameter is true output is printed into the terminal window (default: true)

Output:

  • indicators = indicator matrix of size n x k x h, where n is the length of vector c, k the length of vector lambda_values and h the length of vector eta_values
  • objectives = matrix with all objective values with size k x h for the grid of lambda x eta values

Using an inner cross-validation to determine the best values of the lambda and eta parameters

[indicators, objectives] = scones_crossvalidation(data, option)

Input:

The first parameter (data) is the same as described above.

The second parameter (options) can additionally take the following values:

  • options.nfold : if scones_crossvalidation is called this parameter specifices the number of folds (default: 10)
  • options.seed : if scones_crossvalidation is called this parameter specifies a seed for splitting the data (default: 0)

Contact

Any questions can be directed to Chloe-Agathe Azencott: chloe-agathe.azencott [at] mines-paristech.fr