# Using rpy2 Wrapper with T1D Data

This notebook attempts a causal analysis of the T1D data, using only the patient type as the environment label:
1. T1D
2. FDR
3. SDR
4. Ctl
5. Other

## Setup

### Importing libraries

In [1]:
import numpy as np
import joblib
import pandas as pd

### Importing R wrapper library

In [2]:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri

r = robjects.r
numpy2ri.activate()

### Calling R to import the InvariantCausalPrediction package

In [3]:
ICP = importr('InvariantCausalPrediction')

## T1D Data

### Defining a function to filter out rows with missing (NaN) labels

In [4]:
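def filter_data(X, y):
    # Build the mask from y itself rather than from the global df
    mask = y.notna().to_numpy()
    # Apply the same mask to both so rows of X stay aligned with y
    return X[mask], y[mask]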
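
The filter only works if the design matrix and the labels stay row-aligned after masking. A minimal sketch on toy values (not the T1D data), assuming missing labels appear as NaN and valid labels are strings; it uses a plain NumPy mask instead of pandas `notna()`, and `X_toy` and `y_toy` are hypothetical names:

```python
import numpy as np

# Toy stand-ins for the design matrix and the 'Diabetes' column (not real data).
X_toy = np.arange(8).reshape(4, 2)
y_toy = np.array(['T1D', np.nan, 'Ctl', 'FDR'], dtype=object)

# True where a label is present; NaN is a float, so it is filtered out here.
mask = np.array([isinstance(v, str) for v in y_toy])

# The same mask indexes both arrays, so row i of X_kept matches y_kept[i].
X_kept, y_kept = X_toy[mask], y_toy[mask]
print(X_kept.shape, list(y_kept))
```

Applying one mask to both arrays is what keeps each remaining row of the matrix paired with its own label.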

### Defining a function to set the target labels

In [5]:
def define_labels(y):
    # Binary target: 1 for T1D patients, 0 for every other patient type
    labels = np.zeros_like(y, dtype='int')
    labels[y == 'T1D'] = 1
    return labels

### Defining a function to set the environment labels

In [6]:
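def define_indexes(y):
    # One integer code per patient type; comparisons are case-sensitive.
    # Any value not matched below keeps the initial code 0 (same as T1D).
    indexes = np.zeros_like(y, dtype='int')
    indexes[y == 'T1D'] = 0
    indexes[y == 'FDR'] = 1
    indexes[y == 'SDR'] = 2
    indexes[y == 'OTHER'] = 3
    indexes[y == 'Ctl'] = 4
    return indexes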
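
Because `define_indexes` starts from an array of zeros, any patient type that matches none of the comparisons (for example a different spelling or capitalisation) ends up in environment 0 together with T1D. A minimal sketch of a stricter variant, assuming the same five codes; `ENV_CODES` and `define_indexes_strict` are hypothetical names, not part of the notebook:

```python
import numpy as np

# Hypothetical mapping, copied from the codes used in define_indexes.
ENV_CODES = {'T1D': 0, 'FDR': 1, 'SDR': 2, 'OTHER': 3, 'Ctl': 4}

def define_indexes_strict(y):
    """Map patient-type strings to environment codes; fail on unknown labels."""
    unknown = sorted(set(y) - set(ENV_CODES))
    if unknown:
        raise ValueError(f"unmapped patient types: {unknown}")
    return np.array([ENV_CODES[v] for v in y], dtype=int)

print(define_indexes_strict(np.array(['Ctl', 'T1D', 'SDR'], dtype=object)))
```

Raising on unknown values makes a mismatch between the label file and the mapping visible before the data reach ICP.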

### (Load tiny dataset to check the configuration)

### Load real T1D data

In [None]:
X = pd.read_csv("design_matrix.csv").to_numpy()

df = pd.read_csv("labels.csv")
dy = df['Diabetes']

X,dy = filter_data(X,dy)
ny = dy.to_numpy()

y = define_labels(ny)
idx = define_indexes(ny)

print('Loaded data')

### Converting into R objects and calling ICP

In [10]:
rY = robjects.vectors.FloatVector(y)
rIdx = robjects.vectors.FloatVector(idx)
model = ICP.ICP(X,rY,rIdx)


 out put of 'table(ExpInd)':

| ExpInd | 0 | 1 | 2 |
|---|---|---|---|
| observations | 5 | 4 | 1 |

RRuntimeError: Error in (function (X, Y, ExpInd, alpha = 0.01, test = "normal", selection = c("lasso",  : 
  one environment has just one or two observations (as supplied by 'ExpInd'); there need to be at least 3 (and ideally dozens) of observations in each environment; the output of 'table(ExpInd)' is given below to show the number of observations in each unique environment as supplied by 'ExpInd'
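
The error says that every environment needs at least 3 observations. A minimal sketch of a pre-check that could run in Python before the call to `ICP.ICP`, assuming the environment indicator is an integer array; `small_environments` and `min_obs` are hypothetical names, and the threshold of 3 is taken from the error message:

```python
import numpy as np

def small_environments(exp_ind, min_obs=3):
    """Return {environment: count} for environments with fewer than min_obs rows."""
    envs, counts = np.unique(exp_ind, return_counts=True)
    return {int(e): int(c) for e, c in zip(envs, counts) if c < min_obs}

# Toy indicator with the same per-environment counts as reported: 5, 4 and 1.
exp_ind = np.array([0] * 5 + [1] * 4 + [2] * 1)
print(small_environments(exp_ind))
```

A non-empty result names the environments that would have to be dropped or merged with another group before ICP accepts the input.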


### Printing the result

In [11]:
ICP.print_InvariantCausalPrediction(model)


Invariant Linear Causal Regression at level 0.01 (including multiplicity correction for the number of variables)

Variables: Variable_1, Variable_2 show a significant causal effect

| | LOWER BOUND | UPPER BOUND | MAXIMIN EFFECT | P-VALUE | |
|---|---|---|---|---|---|
| Variable_1 | 0.85 | 1.04 | 0.85 | <1e-09 | *** |
| Variable_2 | 0.85 | 1.04 | 0.85 | <1e-09 | *** |
| Variable_3 | -0.04 | 0.01 | 0.00 | 1 | |
| Variable_4 | -0.05 | 0.08 | 0.00 | 1 | |
| Variable_5 | 0.00 | 0.30 | 0.00 | 1 | |

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


rpy2.rinterface.NULL

### Plotting the result

In [12]:
ICP.plot_InvariantCausalPrediction(model)





rpy2.rinterface.NULL