This repository contains the code for the Significant Network Interval Mining approach (SiNIMin) and its permutation-testing-based counterpart SiNIMin-WY. The methods are described in the paper "Network-guided detection of candidate intervals that exhibit genetic heterogeneity" (under review).
Assume we are given a data set of n samples with d binary features. Examples of all input files can be found in the folder examples.
The method requires the following input:
- data file with d rows corresponding to features (important: the features are assumed to follow a natural ordering, such as genetic variants ordered by their position on the DNA) and n columns corresponding to the n samples. The values must be binary.
- label file with n rows containing the binary phenotype of the n samples. Samples are assumed to be in the same order as in the data file.
- feature file with d rows containing the names of the d features. Features are assumed to be in the same order as in the data file.
- edge file, where each row contains the names of the two nodes connected by an edge, separated by a tab.
- mapping file, linking the features to the nodes in the network. Each row contains the name of a node followed by a whitespace-separated list of feature names.
- target FWER, the target family-wise error rate (default: 0.05).
- covariate file (optional) with n rows. Each row contains the index of the covariate class of the corresponding sample. Samples are assumed to be in the same order as in the data file.
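The edge and mapping formats above can be read with a few lines of C++. The following is a minimal sketch, not code from sinimin: the function names readMapping and readEdges are hypothetical, only the first two tab-separated fields of an edge row are used, and skipping empty or malformed lines is an assumed policy.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses the mapping file, where each line holds a node
// name followed by a whitespace-separated list of feature names.
std::map<std::string, std::vector<std::string>> readMapping(std::istream& in) {
    std::map<std::string, std::vector<std::string>> nodeToFeatures;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string node;
        if (!(fields >> node)) continue;  // assumed policy: skip empty lines
        std::vector<std::string>& features = nodeToFeatures[node];
        std::string feature;
        while (fields >> feature) features.push_back(feature);
    }
    return nodeToFeatures;
}

// Hypothetical helper: parses the edge file, where each line holds the names
// of the two nodes connected by an edge, separated by a tab.
std::vector<std::pair<std::string, std::string>> readEdges(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> edges;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF line endings
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos) continue;  // assumed policy: skip malformed lines
        // The second node name ends at the next tab, or at the end of the line.
        const std::size_t next = line.find('\t', tab + 1);
        const std::size_t len = (next == std::string::npos) ? std::string::npos : next - tab - 1;
        edges.emplace_back(line.substr(0, tab), line.substr(tab + 1, len));
    }
    return edges;
}
```

Both functions accept any std::istream, so a std::ifstream opened on the edge or mapping file can be passed directly.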
Note that the package relies on the Eigen library, which has to be available when (re-)compiling the method. OpenMP is used to parallelize permutation testing.
We provide a Makefile that may have to be adjusted for the compilation to work. You can compile the program using the following steps:
$ cd SiNIMin/C
$ make
If the compilation step does not work, please try adjusting the compiler settings in the Makefile or use another compilation method.
Alternatively, the package can be compiled using cmake.
For Mac OS X, we recommend installing the following packages using Homebrew:
$ brew install cmake gcc eigen
After cloning this repository, the following steps are required to compile the package:
$ cd SiNIMin/C
$ mkdir build
$ cd build
$ cmake -DCMAKE_CXX_COMPILER=g++-9 ../
$ make
Optionally, the compiler version passed to cmake can be changed if a more recent compiler is present. Compiling the package with the Apple version of the clang compiler (which is sometimes confusingly also present as g++ on the system) currently does not work.
After compilation, the package can optionally be installed by issuing
$ make install
from the build directory created above.
For Mac OS X, we recommend installing the package using the Homebrew package manager:
$ brew install --cc=gcc BorgwardtLab/mlcb/sinimin
Afterwards, the package can be used directly from the command line.
Examples on how to execute the methods SiNIMin and SiNIMin-WY can be found in examples/runs with corresponding data in examples/data.
The executable for both methods is called sinimin and can be found in SiNIMin/compiled.
./sinimin \
-i "${data_file}" \
-l "${labels_file}" \
-c "${covariate_file}" \
-m "${mapping_file}" \
-e "${edge_file}" \
-s "${feature_file}" \
-f 0.05 \
-o "${output_prefix}"
Additional flags can be set, namely:
-d ${maxlen} \
-n ${number_threads} \
-p ${number_permutations}
The -d flag sets the maximum length of the intervals to be tested. For example, if -d is set to 1, only interactions between single features are tested.
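To illustrate what the maximum length bounds, the sketch below enumerates all contiguous intervals of at most maxlen features in an ordered list of m features (for instance, the features mapped to one network node; this scoping is an assumption). Their number is sum_{l=1}^{min(maxlen, m)} (m - l + 1), so for m = 5, maxlen = 1 gives 5 intervals and maxlen = 2 gives 9. The helper intervalOr additionally assumes that an interval is encoded per sample as the logical OR of its binary features, a common encoding for genetic heterogeneity. Both enumerateIntervals and intervalOr are hypothetical names, not part of sinimin.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lists every contiguous interval [start, end] (0-based,
// inclusive) of at most maxlen features among m naturally ordered features.
std::vector<std::pair<std::size_t, std::size_t>> enumerateIntervals(std::size_t m,
                                                                     std::size_t maxlen) {
    std::vector<std::pair<std::size_t, std::size_t>> intervals;
    for (std::size_t start = 0; start < m; ++start)
        for (std::size_t len = 1; len <= maxlen && start + len <= m; ++len)
            intervals.emplace_back(start, start + len - 1);
    return intervals;
}

// Assumption: an interval is encoded per sample as the logical OR of its binary
// features. data holds one row per feature and one column per sample.
std::vector<int> intervalOr(const std::vector<std::vector<int>>& data,
                            std::size_t start, std::size_t end) {
    std::vector<int> meta(data.at(start).size(), 0);
    for (std::size_t f = start; f <= end; ++f)
        for (std::size_t s = 0; s < meta.size(); ++s)
            meta[s] |= data.at(f).at(s);
    return meta;
}
```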
The -p flag sets the number of permutations. If this flag is set, SiNIMin-WY is executed, i.e., Westfall-Young permutations are used to estimate the family-wise error rate.
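As a rough illustration of the permutation idea, the sketch below implements the generic single-step Westfall-Young min-P procedure: for each permutation of the labels it records the smallest p-value over all hypotheses, then picks a threshold so that at most a fraction alpha of the permutations have a minimum p-value below it. This is not SiNIMin-WY's implementation, which may use further optimizations; the PValueFn interface and westfallYoungThreshold are hypothetical, and the use of std::mt19937 with std::shuffle is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical interface: returns one p-value per tested hypothesis for the given labels.
using PValueFn = std::function<std::vector<double>(const std::vector<int>&)>;

// Generic single-step Westfall-Young (min-P) procedure. Returns a threshold t such that
// hypotheses with p < t are declared significant, with estimated FWER at most alpha.
double westfallYoungThreshold(const PValueFn& pvalues, std::vector<int> labels,
                              std::size_t numPermutations, double alpha, unsigned seed) {
    if (numPermutations == 0 || alpha <= 0.0 || alpha >= 1.0)
        throw std::invalid_argument("need numPermutations > 0 and 0 < alpha < 1");
    std::mt19937 rng(seed);
    std::vector<double> minP(numPermutations);
    for (std::size_t j = 0; j < numPermutations; ++j) {
        // Permuting the labels breaks any association with the features (null distribution).
        std::shuffle(labels.begin(), labels.end(), rng);
        const std::vector<double> p = pvalues(labels);
        if (p.empty()) throw std::invalid_argument("no hypotheses to test");
        minP[j] = *std::min_element(p.begin(), p.end());
    }
    std::sort(minP.begin(), minP.end());
    // At most k = floor(alpha * P) permutations have a minimum p-value strictly below
    // minP[k], so the estimated FWER of this threshold is at most k / P <= alpha.
    const auto k = static_cast<std::size_t>(alpha * static_cast<double>(numPermutations));
    return minP[std::min(k, numPermutations - 1)];
}
```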
The -n flag sets the number of threads; sinimin uses OpenMP to parallelize. This parameter only yields a speed-up for permutation testing (SiNIMin-WY).
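One way a permutation loop can be parallelized with OpenMP is sketched below. It is a hypothetical helper, not sinimin's code; the design choice it illustrates is seeding each permutation from its index (via std::seed_seq), so the drawn permutations are identical for any thread count and results stay reproducible when the number of threads changes.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws numPermutations permutations of labels, distributing
// the work over numThreads OpenMP threads. Each permutation j uses its own engine
// seeded from (baseSeed, j), so the result does not depend on the thread count.
// Without OpenMP support (e.g. -fopenmp), the pragma is ignored and the loop runs serially.
std::vector<std::vector<int>> permuteLabelsParallel(const std::vector<int>& labels,
                                                    int numPermutations, int numThreads,
                                                    unsigned baseSeed) {
    if (numPermutations < 0) numPermutations = 0;
    if (numThreads < 1) numThreads = 1;
    static_cast<void>(numThreads);  // avoids an unused warning when built without OpenMP
    std::vector<std::vector<int>> perms(static_cast<std::size_t>(numPermutations), labels);
#pragma omp parallel for num_threads(numThreads) schedule(static)
    for (int j = 0; j < numPermutations; ++j) {
        std::seed_seq seq{baseSeed, static_cast<unsigned>(j)};
        std::mt19937 rng(seq);
        std::vector<int>& perm = perms[static_cast<std::size_t>(j)];
        std::shuffle(perm.begin(), perm.end(), rng);
    }
    return perms;
}
```

With GCC, compile with -fopenmp to enable the parallel loop.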
If you have questions concerning SiNIMin or encounter problems when trying to build the tool on your own system, please open an issue in the issue tracker. Please describe the issue in enough detail for us to be able to help you.