This code provides a method to fit the Cascade Nonlinear Additive Noise Models as well as to discover the causal direction on the data that contain the intermediate latent variables. It builds upon the pytorch and R environment, and we also publish a docker image for running this code.
Please cite "Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang, Zhifeng Hao. Causal Discovery with Cascade Nonlinear Additive Noise Models. IJCAI 2019."
install.packages("devtools")
devtools::install_github("DMIRLAB-Group/CANM")
This code also needs the python3 and pytorch environment with packages torch
, scipy
, numpy
installed. You may need to select the specify python interpreter in R by using:
library(reticulate)
use_python("path/to/python")
py_config() # see the configuration
Due to the complexity of this developing environment, we also provide a Dockerfile (inst/Dockerfile
) for building this environment to run this code. For your convenience, we have published the image to the cloud https://hub.docker.com/r/qiaojie/canm. Note that, the nvidia-docker is required for using GPU.
First, fetch the docker image:
docker pull qiaojie/canm:latest
Or you may want to build it by yourself:
cd inst
docker build . -t qiaojie/canm:latest
After all, you could run the demo by using:
$ nvidia-docker run -it qiaojie/canm
$ R
> library(CANM)
> ?CANM
> set.seed(0)
> data=CANM_data(depth=2,sample_size=5000)
> lxy=CANM(data[,1],data[,2])
> lyx=CANM(data[,2],data[,1])
> if(max(lxy$train_score)>max(lyx$train_score)){
+ print("X->Y")
+ }else{
+ print("Y->X")
+ }
Or just give up the GPU accelerate and run the image using: docker run -it qiaojie/canm
This package contains the data synthetic process for CANM that present in the paper (see CANM_data()
). Here are some examples to make a quick start:
library(CANM)
set.seed(0)
data=CANM_data(depth=2,sample_size=5000)
lxy=CANM(data[,1],data[,2])
lyx=CANM(data[,2],data[,1])
if(max(lxy$train_score)>max(lyx$train_score)){
print("X->Y")
}else{
print("Y->X")
}
# IGCI method
set.seed(0)
x=rnorm(100)
y=exp(x)
f=IGCI(x,y,refMeasure=2,estimator=1)
if(f<0){
print("X->Y")
}else{
print("Y->X")
}
# ANM using XGBoost regression
set.seed(0)
x=rnorm(1000)
y=exp(x)+rnorm(1000)
result=ANM_XGB(x,y)
if(result$HSIC_xy<result$HSIC_yx){
print("X->Y")
}else{
print("Y->X")
}
If you are not an R user, you may use the inst/python/CANM.py
directly, however, it does not include the algorithm for searching the number of latent variables. So you have to specify the number of latent variables N
.