## Examples of WAVE Ensemble Model Fitting  

This notebook walks through examples of fitting ensemble models with WAVE.

### 1. Import WAVE and other modules

The **WAVE class** is implemented in the **wave.py** file. To import it, wave.py must be saved in the same directory as this notebook. The WAVE class is built on two other modules, **numpy** and **sklearn**, which are imported at the top of wave.py. Because the cell below uses `from wave import *`, the names that wave.py imports, such as `np`, are available in this notebook as well. We also need to import **pandas** for data processing.

In [2]:
# import wave
from wave import *

# import pandas
import pandas as pd
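
Python's standard library also ships a module named `wave` (for WAV audio files), so `from wave import *` can pick up the wrong module if the repo's wave.py is not found first on the import path. The sketch below is not part of the original notebook; it shows one way to check which file a module name resolves to. `locate_module` is a hypothetical helper name.

```python
import importlib.util


def locate_module(name):
    """Return the file path a module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# When run next to this notebook, the path should point at the repo's wave.py,
# not at the standard library's copy.
print(locate_module("wave"))
```

If the printed path lies inside the Python installation rather than the notebook's directory, `WAVE` will not be defined after the star import.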

A short description of the WAVE class is shown by running the following cell:

In [3]:
?WAVE()

### 2. Example: Fit a Weight-Adjusted CERP ensemble

Weight-Adjusted CERP is an ensemble method designed for high-dimensional data; it applies WAVE to a CERP base ensemble. This example uses a high-dimensional data set called the imprinting data set, which is included in the repo as imp.txt.

Load up the imprinting data as a data frame:

In [4]:
imp = pd.read_csv("imp.txt", sep=" ")

Check the first 5 instances of the imprinting data:

In [5]:
imp.head()

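| Unnamed: 0 | SIMREP.UPSS5 | SIMREP.UPSC5 | SIMREP.DNSC5 | SIMREP.DNSS5 | SIMREP.DNES5 | SIMREP.DNEC5 | SIMREP.BDYS10 | SIMREP.BDYC10 | SIMREP.UPSS10 | SIMREP.UPSC10 | ... | s14 | s15 | s16 | s17 | s18 | s19 | s20 | m1 | m2 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 2 | 66 | 34 | 1 | 66 | 2 | 0 | 0 | ... | 0 | 3 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 629 | 4 | 66 | 2 | ... | 0 | 0 | 1 | 1 | 2 | 0 | 0 | 3 | 1 | 1 |
| 2 | 0 | 0 | 1 | 41 | 275 | 2 | 325 | 5 | 283 | 2 | ... | 2 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 40 | 1 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 7 | 1 |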


The imprinting data has 131 instances, each with 1254 features. The column "Y" is the target variable, which takes the value 0 or 1, so the data frame has 1255 columns in total.

In [6]:
imp.shape

(131, 1255)

Split the data set into features and target values as train_X and train_y, respectively:

In [9]:
xcols = [col for col in imp.columns if col not in ["Y"]]
# train_X is a 2-d array, train_y is a 1-d array
train_X = imp[xcols].values  # shape: (131, 1254)
train_y = imp["Y"].values  # shape: (131,)
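
The same split can be tried on a small frame to see the resulting shapes. This is a sketch on made-up values, not on the imprinting data; the names `toy`, `toy_X`, `toy_y` and `toy_X_alt` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the imprinting data (values are made up for illustration).
toy = pd.DataFrame({
    "f1": [0, 2, 66, 0],
    "f2": [34, 1, 0, 5],
    "f3": [1, 0, 0, 3],
    "Y": [1, 1, 0, 0],
})

# Pattern used above: keep every column except the target.
feature_cols = [col for col in toy.columns if col not in ["Y"]]
toy_X = toy[feature_cols].values      # 2-d array, shape (4, 3)
toy_y = toy["Y"].values               # 1-d array, shape (4,)

# Equivalent alternative: drop the target column by name.
toy_X_alt = toy.drop(columns=["Y"]).values

print(toy_X.shape, toy_y.shape, np.array_equal(toy_X, toy_X_alt))
```

Both forms give the same feature matrix; the list comprehension is convenient when several columns have to be excluded at once.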

Initialize the Weight-Adjusted CERP model:

In [10]:
# set the ensemble size to be 10
wacerp = WAVE(base_ensemble="cerp", ensemble_size=10)

In [11]:
wacerp.fit(train_X, train_y)

In [102]:
ls

examples.ipynb  README.md  wave.py


In [5]:
?pd.read_csv()

In [3]:
est = pd.read_csv("estrogen.txt", sep=" ")

In [4]:
est.shape

(232, 253)

In [5]:
est.head()

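| Unnamed: 0 | AUTO.ID | NCTRlogRBA | AI | nvx | nedges | nrings | ncircuits | nclass | nelem | ntpaths | ... | SsCl | nketone | namide | nester | ncarboxylicacid | Sketone | Samide | Sester | Scarboxylicacid | Hcarboxylicacid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | -0.36 | 1 | 20 | 22 | 3 | 4 | 18 | 3 | 858 | ... | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 3 | -0.82 | 1 | 18 | 20 | 3 | 4 | 16 | 3 | 692 | ... | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 4 | -1.65 | 1 | 19 | 21 | 3 | 4 | 17 | 3 | 767 | ... | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 5 | -2.98 | 1 | 20 | 22 | 3 | 4 | 18 | 3 | 857 | ... | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 6 | -2.31 | 1 | 15 | 15 | 1 | 1 | 13 | 3 | 173 | ... | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |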


In [6]:
xcols = [col for col in est.columns if col not in ["AUTO.ID", "NCTRlogRBA", "AI"]]

In [7]:
train_X = est[xcols].values
train_y = est["AI"].values

In [39]:
train_X.shape

(131, 1254)

In [40]:
train_y.shape

(131,)

In [41]:
rf = WAVE(ensemble_size=500)

In [42]:
rf.fit(train_X, train_y)

In [43]:
rf

<wave.WAVE at 0x7fc73b661630>

In [44]:
rf.get_weights()

array([[ 0.00194146],
       [ 0.00203078],
       [ 0.00197631],
       [ 0.00197123],
       [ 0.00186121],
       [ 0.00187683],
       [ 0.00200609],
       [ 0.00202824],
       [ 0.00213608],
       [ 0.00175192],
       [ 0.00191277],
       [ 0.00194327],
       [ 0.00203659],
       [ 0.00189825],
       [ 0.00219272],
       [ 0.00198321],
       [ 0.00200899],
       [ 0.00198212],
       [ 0.00214733],
       [ 0.00211611],
       [ 0.0019164 ],
       [ 0.00203332],
       [ 0.00191277],
       [ 0.00198938],
       [ 0.0018968 ],
       [ 0.00195925],
       [ 0.0020315 ],
       [ 0.00190406],
       [ 0.00200173],
       [ 0.00194   ],
       [ 0.00199883],
       [ 0.00201698],
       [ 0.00207834],
       [ 0.00179149],
       [ 0.00204784],
       [ 0.00196034],
       [ 0.00222576],
       [ 0.00189716],
       [ 0.00209904],
       [ 0.00201662],
       [ 0.0021339 ],
       [ 0.00205402],
       [ 0.00214915],
       [ 0.00218328],
       [ 0.00214878],
       [ 0

In [260]:
a = rf.base_classifiers[1].predict(train_X) == train_y

In [261]:
a = rf.base_classifiers[0].predict(train_X) == train_y
a = a.astype(int)

In [262]:
# Append one 0/1 "predicted correctly" column per base classifier (indices 1..49).
for i in range(1, 50):
    b = rf.base_classifiers[i].predict(train_X) == train_y
    b = b.astype(int)
    a = np.column_stack((a, b))
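
The loop above grows the matrix one column at a time. A more compact sketch builds all columns in a single call. `StubClassifier` and `performance_matrix` are hypothetical names introduced here; the stub only stands in for fitted base classifiers that expose a `predict` method, and the data are made up.

```python
import numpy as np


class StubClassifier:
    """Stand-in for a fitted base classifier: always returns fixed labels."""

    def __init__(self, labels):
        self.labels = np.asarray(labels)

    def predict(self, X):
        return self.labels[: len(X)]


def performance_matrix(classifiers, X, y):
    """Column j is 1 where classifier j predicts y correctly, else 0."""
    return np.column_stack(
        [(clf.predict(X) == y).astype(int) for clf in classifiers]
    )


X_demo = np.zeros((4, 2))          # features are ignored by the stub
y_demo = np.array([1, 0, 1, 1])
clfs = [StubClassifier([1, 0, 1, 1]), StubClassifier([1, 1, 1, 0])]

P = performance_matrix(clfs, X_demo, y_demo)
print(P)
```

Each row of `P` corresponds to one instance and each column to one classifier, which matches the layout of `a` after the loop.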

In [13]:
np.identity((3))

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [51]:
cerp = WAVE(ensemble_size=10, base_ensemble="cerp", min_samples_split_cerp=5)


In [52]:
cerp.fit(train_X, train_y)

[[  69    0    0 ...,    3    0    2]
 [  39    0    0 ...,    2    0    1]
 [   0    0    0 ...,    3    0    1]
 ..., 
 [ 365    0    1 ...,    7    0    2]
 [ 633   44    0 ...,    4    0    1]
 [ 303   38    5 ...,    1 2841    2]]
[[   0 9048   20 ...,    0    0    0]
 [   0  831    9 ...,    0    0    1]
 [   0  575   11 ...,    0    0    0]
 ..., 
 [   0  952    0 ...,    0    0    5]
 [   0  848    4 ...,    0    0    8]
 [   1 1773    5 ...,    0    0    5]]
[[    0     3  2133 ...,     0   502     0]
 [    0     6     0 ...,    32   867     0]
 [    0     0     0 ...,   544   288     0]
 ..., 
 [    1    17  3604 ...,     0 19471     0]
 [    0     4  1376 ...,     0 39032     2]
 [    0     8  3943 ...,     0 54877     0]]
[[   0    0    0 ...,    0    0    3]
 [   3    0    0 ...,    0    0    0]
 [   1    0    0 ...,    0    0    1]
 ..., 
 [   0    0   32 ...,    1    0    1]
 [   0 3762    0 ...,    7    0    1]
 [   0 6860    0 ...,    3    0    0]]
[[    0     0   896 

In [53]:
cerp.get_weights()

array([[ 0.11366285],
       [ 0.09113586],
       [ 0.09113586],
       [ 0.07928683],
       [ 0.11366285],
       [ 0.06758648],
       [ 0.11366285],
       [ 0.11366285],
       [ 0.1025407 ],
       [ 0.11366285]])

In [172]:
cerp.subfeatures_list

[array([156, 229, 182, 218,  50, 107,  44,   9, 161,   7,   5,  77]),
 array([ 15,  29, 116, 166, 117, 248,  61,  62, 200,  17, 139, 188, 115]),
 array([ 30,  85,   1, 244, 184, 150, 201,  19, 152, 162, 227,  53]),
 array([ 89, 121,  74,  65, 233, 169,  70,  67, 134, 137,  13, 206]),
 array([185, 175,  83, 178,  20, 189,  25, 210, 104,  90,  59,  82, 216]),
 array([ 51, 149, 136, 238, 109, 168, 110,   0, 195, 176, 221,  34, 101,  57]),
 array([ 63, 213,  71,  69, 145, 173,  21,  97,  31,   3,  60, 141]),
 array([111,  42, 181, 163,  73, 240,  41, 172,  18,  39, 113, 190]),
 array([ 84,   8,  38, 100, 192, 106, 234, 108, 220, 170, 103, 237,  86]),
 array([155,  66, 231,  79,  23,  95, 154, 133, 153, 232, 126, 114]),
 array([146, 148,  72,  40,  52, 167,  91, 224,  75, 186, 105, 135, 120]),
 array([177, 180,  81,  27, 198,  93,  10,  45,  14, 128,  78,  58]),
 array([159,  11,  37,  36, 204,   6, 143, 223,  46, 205, 242,  64,  54]),
 array([ 88, 222, 236, 207,  76, 247, 215,  16, 246,  5

In [173]:
idxes = cerp.subfeatures_list[0]
a = cerp.base_classifiers[0].predict(train_X[:,idxes]) == train_y

In [174]:
a = a.astype(int)

In [175]:
a.shape

(232,)

In [176]:
for i in range(1, 10):
    b = cerp.base_classifiers[i].predict(train_X[:, cerp.subfeatures_list[i]]) == train_y
    b = b.astype(int)
    a = np.column_stack((a, b))

In [177]:
a[:, 3]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1])

In [178]:
a[:,1] == a[:, 3]

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,

In [179]:
np.ones((2,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [180]:
a.T.shape

(10, 232)

In [181]:
diag_k = np.zeros((k, k))
np.fill_diagonal(diag_k, 1)

In [203]:
np.identity(k)

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])

In [264]:
n = 232
k = 50

In [265]:
X = a
J_nk = np.ones((n, k))
J_kk = np.ones((k, k))

In [266]:
diag_k = np.zeros((k, k))
np.fill_diagonal(diag_k, 1)

In [267]:
T = np.matmul(np.matmul(X.T, (J_nk-X)), (J_kk-diag_k))

In [268]:
T1 = np.dot(np.dot(X.T, (J_nk - X)), (J_kk-diag_k))

In [282]:
T.shape

(50, 50)

In [285]:
?np.savetxt()

In [286]:
np.savetxt("haha.txt",T)

In [287]:
ls

[0m[01;32mestrogen.txt[0m*   haha.txt  [01;34m__pycache__[0m/  wave.py
examples.ipynb  [01;32mimp.txt[0m*  README.md


In [270]:
X[:,1]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1])

In [293]:
w, v = np.linalg.eig(T)  # decompose once instead of calling eig twice
eig_values, eig_vectors = w.real, v.real

In [294]:
eig_values

array([  4.47609428e+04,   4.91442929e+01,   4.44375254e+01,
         4.29418298e+01,   4.01963545e+01,   3.86315986e+01,
         3.71164043e+01,   3.55697325e+01,   3.38156976e+01,
         3.14950668e+01,   2.90861487e+01,   2.83535797e+01,
         2.83535797e+01,   2.59982611e+01,   2.43116688e+01,
         2.32485758e+01,   2.21479022e+01,   2.12290333e+01,
         3.16449069e+00,   3.83943998e+00,   3.93178404e+00,
         2.02426655e+01,   1.98839129e+01,   4.65864195e+00,
         5.27834549e+00,   5.37237857e+00,   5.72434163e+00,
         5.87552590e+00,   6.32020758e+00,   6.72926011e+00,
         1.86015926e+01,   1.86015926e+01,   7.77471350e+00,
         1.76755761e+01,   8.69583321e+00,   9.19729633e+00,
         9.24736899e+00,   9.87272584e+00,   1.02896739e+01,
         1.08182952e+01,   1.69882602e+01,   1.62695915e+01,
         1.59754141e+01,   1.52119839e+01,   1.47768109e+01,
         1.39721185e+01,   1.18382838e+01,   1.19704189e+01,
         1.31757553e+01,

In [291]:
eig_vectors.real.shape

(50, 50)

In [274]:
max_eig_value = eig_values.max()

In [275]:
max_eig_value

(44760.942777783821+0j)

In [276]:
r = 0
idxes = []

for i in range(len(eig_values)):
    if eig_values[i] == max_eig_value:
        r += 1
        idxes.append(i)

In [277]:
r, idxes

(1, [0])

In [278]:
sigma = np.zeros((k, k))

In [279]:
for i in range(r):
    u = eig_vectors[:, idxes[i]]
    u = u.reshape((k, 1))
    u = u.astype(np.float64)
    sigma += u.dot(u.T)



In [280]:
k_1 = np.ones((k, 1))
ans = (sigma.dot(k_1)) / k_1.T.dot(sigma).dot(k_1)
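
The cells above can be collected into one function. The sketch below is a condensed restatement of those steps under a few assumptions: `wave_weights` is a hypothetical name and not a function from wave.py, ties for the largest eigenvalue are detected with `np.isclose` rather than exact equality, and the demo matrix is made up.

```python
import numpy as np


def wave_weights(X):
    """Compute classifier weights from an (n, k) 0/1 performance matrix.

    X[i, j] is 1 when classifier j predicts instance i correctly.
    Returns a (k, 1) array of weights that sums to 1.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    J_nk = np.ones((n, k))
    J_kk = np.ones((k, k))

    # Same product as in the notebook: X^T (J_nk - X) (J_kk - I).
    T = X.T @ (J_nk - X) @ (J_kk - np.identity(k))

    eig_values, eig_vectors = np.linalg.eig(T)
    eig_values, eig_vectors = eig_values.real, eig_vectors.real

    # Keep every eigenvector whose eigenvalue ties with the largest one.
    top = np.isclose(eig_values, eig_values.max())
    U = eig_vectors[:, top]

    # Sum of outer products u u^T over the selected eigenvectors.
    sigma = U @ U.T
    ones = np.ones((k, 1))
    return (sigma @ ones) / (ones.T @ sigma @ ones)


# Made-up example: 5 instances, 3 classifiers; the third is the least accurate.
X_demo = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [1, 1, 0],
])
weights = wave_weights(X_demo)
print(weights.ravel(), weights.sum())
```

When the largest eigenvalue is unique, the result reduces to the dominant eigenvector divided by the sum of its entries, so the sign returned by `np.linalg.eig` does not matter.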

In [281]:
ans

array([[ 0.01968353],
       [ 0.02045047],
       [ 0.01999045],
       [ 0.01948654],
       [ 0.02001243],
       [ 0.01983693],
       [ 0.02007817],
       [ 0.02031879],
       [ 0.01979332],
       [ 0.01931137],
       [ 0.02001241],
       [ 0.02034057],
       [ 0.02003413],
       [ 0.01983715],
       [ 0.02091021],
       [ 0.02049421],
       [ 0.02001238],
       [ 0.01990282],
       [ 0.02031913],
       [ 0.01878526],
       [ 0.02018742],
       [ 0.02102015],
       [ 0.02018755],
       [ 0.0208448 ],
       [ 0.02007767],
       [ 0.01907033],
       [ 0.02167714],
       [ 0.01955224],
       [ 0.02012188],
       [ 0.02130487],
       [ 0.01953036],
       [ 0.01946452],
       [ 0.02163336],
       [ 0.02007794],
       [ 0.02058205],
       [ 0.01926717],
       [ 0.02036253],
       [ 0.01974909],
       [ 0.02025347],
       [ 0.01854487],
       [ 0.02051653],
       [ 0.019859  ],
       [ 0.02132662],
       [ 0.0195959 ],
       [ 0.01935496],
       [ 0

In [115]:
u = eig_vectors[:, 0]
u.shape

(10,)

In [118]:
u = u.reshape((k, 1))
u.shape

(10, 1)

In [120]:
sigma += u.dot(u.T)

In [121]:
sigma

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [83]:
eig_values.shape

(10,)

In [84]:
eig_vectors.shape

(10, 10)

In [101]:
eig_vectors[0]

array([ 1.        , -0.66436384, -0.66436384, -0.66436384, -0.66436384,
       -0.66436384, -0.66436384, -0.66436384, -0.66436384, -0.66436384])

In [90]:
eig_values

array([ 909.,    0.,    0.,    0.,    0.,    0.,    0.,    0.,    0.,    0.])

In [95]:
eig_vectors[:,0]

array([ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [105]:
?np.argmax()

In [None]:
T = np.dot

In [59]:
np.diagflat([[1,2], [3,4]])

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

In [65]:
np.diagflat(J_kk)

array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  1.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]])

In [64]:
J_kk.shape

(10, 10)
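
`np.diagflat` flattens its input before placing it on the diagonal, so passing the k-by-k matrix `J_kk` yields a k²-by-k² matrix rather than a k-by-k identity. A small check with k = 3 (the names `k_demo`, `J_demo`, `big` and `eye` are illustrative only):

```python
import numpy as np

k_demo = 3
J_demo = np.ones((k_demo, k_demo))

big = np.diagflat(J_demo)        # flattens 9 ones onto a 9x9 diagonal
eye = np.identity(k_demo)        # the 3x3 identity needed for J_kk - I

print(big.shape, eye.shape)
```

For the diagonal matrix subtracted from `J_kk`, `np.identity(k)` gives the intended shape directly.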

In [68]:
type(rf.base_classifiers[0].predict(train_X[1].reshape((1, -1)))[0])

numpy.int64

In [73]:
new_X = train_X[1]

In [74]:
new_X.shape

(1254,)

In [75]:
len(new_X.shape)

1

In [78]:
train_X[1].shape

(1254,)

In [80]:
?DecisionTreeClassifier()

In [76]:
train_X[1].reshape((1, -1)).shape

(1, 1254)
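
scikit-learn style `predict` methods expect a 2-d array of shape (n_samples, n_features), which is why the single row is reshaped above. A minimal sketch on a made-up array shows two ways to get a batch of one sample:

```python
import numpy as np

X_demo = np.arange(12).reshape(3, 4)   # 3 samples, 4 features (made-up values)

row = X_demo[1]                        # 1-d, shape (4,)
as_batch = row.reshape((1, -1))        # 2-d, shape (1, 4): a batch of one sample
same = X_demo[1:2]                     # slicing keeps the 2-d shape directly

print(row.shape, as_batch.shape, same.shape)
```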

In [72]:
set(train_y)

{0, 1}

In [22]:
cerp = WAVE(ensemble_size=10, base_ensemble="cerp")

In [23]:
cerp.fit_base_classifiers(train_X, train_y)

In [26]:
cerp.base_classifiers[1].predict(train_X)

ValueError: Number of features of the model must match the input. Model n_features is 25 and input n_features is 250 
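
The error most likely arises because each CERP base classifier is fitted on a subset of the columns, so prediction has to use the same columns, as in `train_X[:, cerp.subfeatures_list[i]]`. The sketch below uses made-up data. Splitting a shuffled index range with `np.array_split` is an assumption about how such subsets could be produced, not a description of what wave.py does.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_parts = 6, 10, 5
X_demo = rng.integers(0, 5, size=(n_samples, n_features))

# Assumed scheme: shuffle the column indices and cut them into disjoint parts.
subsets = np.array_split(rng.permutation(n_features), n_parts)

# Each part selects its own columns; a classifier fitted on X_demo[:, idx]
# must also be given X_demo[:, idx] at prediction time.
views = [X_demo[:, idx] for idx in subsets]
print([v.shape for v in views])
```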

In [34]:
a = np.array((1,2,3))
b = np.array((2,3,4))
c = np.column_stack((a,b))
c

array([[1, 2],
       [2, 3],
       [3, 4]])

In [35]:
d = np.column_stack((c, a))
d

array([[1, 2, 1],
       [2, 3, 2],
       [3, 4, 3]])

In [295]:
?DecisionTreeClassifier()