# Test over Synthetic Data

We will generate 1000 data points. Each data point contains 150 binary predictors ($x_1, x_2, \dots, x_{150}$) and the response of the system ($y_n$). We generate the predictors by sampling each component of $X = \langle x_1, x_2, \dots, x_{150} \rangle$ independently from a Bernoulli distribution with $p=0.5$, so the number of ones in each data point follows a binomial distribution with $n=150$ and $p=0.5$.

Let's first import the necessary modules.

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

To generate $X$, we will use a simple `numpy` trick: draw uniform random numbers in $[0, 1)$ and round them to the nearest integer.

In [2]:
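X = np.random.rand(1000, 150)  # uniform draws in [0, 1); 150 predictors, as stated in the text
X = np.round(X)                # round to 0. or 1., giving Bernoulli(0.5) entries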
X

array([[1., 1., 1., ..., 1., 0., 0.],
       [1., 0., 0., ..., 0., 0., 1.],
       [0., 0., 1., ..., 1., 0., 1.],
       ...,
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 1., 0., 0.]])
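
The rounding trick can be checked against a direct Bernoulli draw. The following is a minimal sketch, not part of the original notebook: the names `rng`, `X_round`, and `X_direct` and the seed are illustrative assumptions, and the 150 columns follow the predictor count given in the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make this sketch reproducible

n_samples, n_predictors = 1000, 150

# Rounding uniform draws on [0, 1): values below 0.5 become 0, values above 0.5 become 1.
X_round = np.round(rng.random((n_samples, n_predictors)))

# Drawing directly from a Bernoulli(0.5), i.e. a binomial with a single trial.
X_direct = rng.binomial(1, 0.5, size=(n_samples, n_predictors))

# Both matrices contain only zeros and ones, with roughly half of the entries set.
print(X_round.mean(), X_direct.mean())
```

Both approaches should give a fraction of ones close to 0.5. The direct draw has the small advantage of stating the distribution explicitly.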

Now that we have $X$, it's time to generate our synthetic system's response $Y$. To do that, we first need to fix the noisy function of the actual predictors that produces the response. Since our statistical method targets sparse models, we let this underlying function depend on only the following 4 of the 150 predictors:
$$\langle x_{13}, x_{31}, x_{38}, x_{55} \rangle$$

We'll represent this function by $F: \{0,1\}^4 \rightarrow \mathbb{Z}_{3}$, where $\mathbb{Z}_{3} = \{0, 1, 2\}$.
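
One way to read "noisy" here, offered as an assumption rather than the author's stated definition, is that for each input $x$ over the four listed predictors the response is drawn from a categorical distribution over $\mathbb{Z}_3$:

$$P\big(F(x) = k\big) = \theta_{x,k}, \qquad k \in \mathbb{Z}_3, \qquad \sum_{k \in \mathbb{Z}_3} \theta_{x,k} = 1.$$

With four active predictors there are $2^4 = 16$ possible inputs, so under this reading the function is fully described by a $16 \times 3$ table of probabilities. The symbol $\theta$ is introduced here for illustration only.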

In [3]:
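# Enumerate all 2^4 input combinations, most significant bit first.
l = []
for i in range(2 ** 4):
    l.append([(i & (2 ** j)) != 0 for j in range(3, -1, -1)])
truth_table = np.array(l)  # separate name, so the sample matrix X is not overwritten

# One Dirichlet draw per input row: probabilities of the responses 0, 1, 2.
Y = np.random.dirichlet([0.05] * 3, 2 ** 4)

# Column names: x55, x38, x31, x13 followed by y0, y1, y2.
xi = np.char.add('x', np.char.mod('%d', np.array([55, 38, 31, 13])))
yi = np.char.add('y', np.char.mod('%d', np.arange(Y.shape[1])))
cols = np.append(xi, yi)

# Stack inputs and probabilities side by side (booleans become 0.0 / 1.0).
table = np.hstack((truth_table, Y))
df = pd.DataFrame(table, columns=cols)
df.to_csv("data/bool_func.csv")     # assumes the data/ directory already exists
df.to_excel("data/bool_func.xlsx")  # requires an Excel writer such as openpyxl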

df

| | x55 | x38 | x31 | x13 | y0 | y1 | y2 |
|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.7715613 | 4.817829e-10 | 0.2284387 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0002581223 | 0.9997419 | 1.23861e-21 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 | 0.9999999 | 7.076523e-08 | 2.697647e-08 |
| 3 | 0.0 | 0.0 | 1.0 | 1.0 | 0.9032021 | 1.337596e-14 | 0.09679786 |
| 4 | 0.0 | 1.0 | 0.0 | 0.0 | 0.9925948 | 0.00467069 | 0.002734479 |
| 5 | 0.0 | 1.0 | 0.0 | 1.0 | 0.878221 | 6.807604e-42 | 0.121779 |
| 6 | 0.0 | 1.0 | 1.0 | 0.0 | 1.298939e-11 | 1.0 | 7.649856e-39 |
| 7 | 0.0 | 1.0 | 1.0 | 1.0 | 0.9986412 | 5.024311e-07 | 0.001358252 |
| 8 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0003150446 | 0.999685 | 1.792178e-25 |
| 9 | 1.0 | 0.0 | 0.0 | 1.0 | 2.755406e-26 | 1.0 | 1.799499e-17 |
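
Given such a table, one plausible way to produce the response for each sampled data point is to look up the row matching its four active predictors and draw a class from that row's probabilities. The sketch below is an assumption about how the table is meant to be used, not the notebook's own `fun`. The names `X_data`, `theta`, `active`, `weights`, and `row_idx` are hypothetical, `theta` is a freshly drawn stand-in for the `y0`, `y1`, `y2` columns, and predictor $x_k$ is assumed to be stored in column $k-1$.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make this sketch reproducible

n_samples, n_predictors = 1000, 150
active = [55, 38, 31, 13]  # active predictors, in the column order of the truth table

# Stand-ins for the notebook's data: binary predictors and a 16 x 3 probability table.
X_data = rng.integers(0, 2, size=(n_samples, n_predictors))
theta = rng.dirichlet([0.05] * 3, size=2 ** len(active))

# Assumption: predictor x_k is stored in column k - 1 (1-based names, 0-based columns).
bits = X_data[:, [k - 1 for k in active]]

# Read the four bits as a binary number, most significant bit first: [8, 4, 2, 1].
weights = 2 ** np.arange(len(active) - 1, -1, -1)
row_idx = bits @ weights  # truth-table row (0..15) for every data point

# Draw one response in {0, 1, 2} per data point from its row's probabilities.
y = np.array([rng.choice(3, p=theta[i]) for i in row_idx])

print(y[:10])
```

With this bit ordering, a data point with $x_{55}=1, x_{38}=0, x_{31}=0, x_{13}=1$ maps to row 9, which agrees with the row index in the table above.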


In [4]:
def fun(xi):
    """Takes a row of truth table and returns the response 
    according to the function x0"""