BorgwardtLab/SiNIMin

SiNIMin

This repository contains the code for the Significant Network Interval Mining approach (SiNIMin for short) and its permutation-testing-based counterpart, SiNIMin-WY. The methods are described in "Network-guided detection of candidate intervals that exhibit genetic heterogeneity" (under review).

Data formatting

We assume a data set of n samples with d binary features. Examples of all files can be found in the folder examples.

The method requires the following input:

  1. data file with d rows, corresponding to the features, and n columns, corresponding to the samples. Important: the features are assumed to follow a natural ordering, such as genetic variants ordered by their position on the DNA. All values must be binary.
  2. label file with n rows, containing the binary phenotype of the n samples. Samples are assumed to be in the same order as the columns of the data file.
  3. feature file with d rows, containing the names of the d features. Features are assumed to be in the same order as the rows of the data file.
  4. edge file, where each row contains the names of the two nodes joined by an edge, separated by a tab.
  5. mapping file, linking the features to the nodes in the network. Each row contains the name of a node followed by a white-space separated list of feature names.
  6. target FWER, the target family-wise error rate (default: 0.05).
  7. covariate file (optional) with n rows. Each row contains the index of the class of the corresponding sample. Samples are assumed to be in the same order as the columns of the data file.
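As a rough illustration of the formats listed above, the following sketch writes a toy input set with d = 4 features and n = 3 samples and checks that the dimensions agree. The file names, the node and feature names, the space delimiter in the data file, and the 0-based class indices are assumptions made for this example; the files in the examples folder remain the authoritative reference.

```shell
#!/bin/sh
# Toy input set: d = 4 features (rows) and n = 3 samples (columns).
# File names, delimiters and values are illustrative assumptions.
set -eu

dir="$(mktemp -d)"

# data file: d rows x n columns, binary values
printf '%s\n' '0 1 0' '1 1 0' '0 0 1' '1 0 1' > "${dir}/data.txt"
# label file: n rows, binary phenotype, same order as the data columns
printf '%s\n' 1 0 1 > "${dir}/labels.txt"
# feature file: d rows, same order as the data rows
printf '%s\n' snp1 snp2 snp3 snp4 > "${dir}/features.txt"
# edge file: the two node names of an edge, tab-separated
printf 'geneA\tgeneB\n' > "${dir}/edges.txt"
# mapping file: node name followed by its feature names
printf '%s\n' 'geneA snp1 snp2' 'geneB snp3 snp4' > "${dir}/mapping.txt"
# covariate file: n rows, one class index per sample
printf '%s\n' 0 1 0 > "${dir}/covariates.txt"

# Consistency checks on the dimensions and on the binary constraint
d="$(wc -l < "${dir}/data.txt" | tr -d ' ')"
n="$(awk 'NR == 1 { print NF }' "${dir}/data.txt")"
n_labels="$(wc -l < "${dir}/labels.txt" | tr -d ' ')"
d_features="$(wc -l < "${dir}/features.txt" | tr -d ' ')"
n_covariates="$(wc -l < "${dir}/covariates.txt" | tr -d ' ')"
non_binary="$(awk '{ for (i = 1; i <= NF; i++) if ($i != 0 && $i != 1) c++ }
                   END { print c + 0 }' "${dir}/data.txt")"

echo "d=${d} n=${n} labels=${n_labels} features=${d_features}" \
     "covariates=${n_covariates} non_binary=${non_binary}"
```

The same checks can be pointed at real input files before a run; a mismatch between n and the number of label rows, or between d and the number of feature names, usually indicates a transposed data file.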

Usage information

Compilation (manual)

The package relies on the Eigen library, which must be made available to the compiler whenever the method is (re-)compiled. OpenMP is used to parallelize permutation testing.

We provide a Makefile that may have to be adjusted for the compilation to work. You can compile the program using the following steps:

$ cd SiNIMin/C
$ make

If the compile step does not work, please try adjusting the compiler settings in the Makefile or use another compilation method.

Compilation (CMake)

Alternatively, the package can be compiled using CMake. For Mac OS X, we recommend installing the following packages using Homebrew:

$ brew install cmake gcc eigen

After cloning this repository, the following steps are required to compile the package:

$ cd SiNIMin/C
$ mkdir build
$ cd build
$ cmake -DCMAKE_CXX_COMPILER=g++-9 ../
$ make

The compiler version passed to cmake can be changed if a more recent GCC is installed. Compiling the package with the Apple version of the clang compiler (which, confusingly, is sometimes also installed under the name g++) currently does not work.
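To see whether a given g++ is in fact clang, its version output can be inspected before calling cmake. The following is a minimal sketch; the helper name is_clang and the use of the CXX variable are choices made for this example and are not part of the repository.

```shell
#!/bin/sh
# Returns 0 (true) if the given version text comes from clang, 1 otherwise.
is_clang() {
  case "$1" in
    *clang*) return 0 ;;
    *) return 1 ;;
  esac
}

cxx="${CXX:-g++}"   # compiler to inspect; override through the CXX variable

if command -v "${cxx}" >/dev/null 2>&1; then
  # Capture stdout and stderr, since some versions print part of the text to stderr
  version_text="$("${cxx}" --version 2>&1 || true)"
  if is_clang "${version_text}"; then
    echo "${cxx} is clang; pass a GCC binary to cmake via -DCMAKE_CXX_COMPILER"
  else
    echo "${cxx} does not identify itself as clang"
  fi
else
  echo "${cxx} not found"
fi
```

For example, running the script with CXX=g++-9 checks the Homebrew compiler used in the cmake call above.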

Once compiled, the package can optionally be installed by running

$ make install

from the build directory created above.

Installation using Homebrew (Mac OS X)

For Mac OS X, we recommend installing the package using the Homebrew package manager:

$ brew install --cc=gcc BorgwardtLab/mlcb/sinimin

Afterwards, the package can be used directly from the command line.

Example usage

Examples of how to run SiNIMin and SiNIMin-WY can be found in examples/runs, with the corresponding data in examples/data. The executable for both methods is called sinimin and can be found in SiNIMin/compiled.

./sinimin \
  -i "${data_file}" \
  -l "${labels_file}" \
  -c "${covariate_file}" \
  -m "${mapping_file}" \
  -e "${edge_file}" \
  -s "${feature_file}" \
  -f 0.05 \
  -o "${output_prefix}"

There exist additional flags that can be set, namely:

  -d ${maxlen} \
  -n ${number_threads} \
  -p ${number_permutations} 

The -d flag sets the maximum length of the intervals to be tested. For example, if -d is set to 1, only interactions between single features are tested. The -p flag sets the number of permutations. If this flag is given, SiNIMin-WY is executed, i.e. Westfall-Young permutations are used to estimate family-wise error rates. The -n flag sets the number of threads. It only yields a speed-up for permutation testing, which sinimin parallelizes with OpenMP.
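The flags described above can be combined into one call. The following sketch assembles the argument list and appends each optional flag only when a value has been set; it prints the resulting command instead of executing it. All file names and values are placeholders chosen for this example, not files shipped with the repository.

```shell
#!/bin/sh
# Placeholder inputs (assumptions for illustration only)
data_file="data.txt"
labels_file="labels.txt"
covariate_file="covariates.txt"
mapping_file="mapping.txt"
edge_file="edges.txt"
feature_file="features.txt"
output_prefix="results/run1"

maxlen=""                    # -d: maximum interval length; empty = flag omitted
number_threads="4"           # -n: only speeds up permutation testing
number_permutations="1000"   # -p: when set, SiNIMin-WY is executed

# Required arguments, in the same order as in the example call
set -- ./sinimin \
  -i "${data_file}" -l "${labels_file}" -c "${covariate_file}" \
  -m "${mapping_file}" -e "${edge_file}" -s "${feature_file}" \
  -f 0.05 -o "${output_prefix}"

# Append the optional flags only if a value was provided
if [ -n "${maxlen}" ]; then set -- "$@" -d "${maxlen}"; fi
if [ -n "${number_threads}" ]; then set -- "$@" -n "${number_threads}"; fi
if [ -n "${number_permutations}" ]; then set -- "$@" -p "${number_permutations}"; fi

# Dry run: show the assembled command; replace the echo with "$@" to execute it
cmd_line="$*"
echo "${cmd_line}"
```

Leaving number_permutations empty drops the -p flag, so the call runs SiNIMin rather than SiNIMin-WY.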

Help

If you have questions concerning SiNIMin, or you encounter problems when building the tool on your own system, please open an issue in the issue tracker. Please describe the issue in enough detail for us to be able to help you.

Contact

anja.gumpinger@bsse.ethz.ch
