# Simulation Background

1. Simulate the true event-time vectors for the source and target populations, denoted $\mathbf{Y}_s$ and $\mathbf{Y}_t$:

*The source model is*

$$\log y_s = \mathbf{x}_s^T\boldsymbol{\omega}_s+\sigma_s\epsilon_s,\mathbf{x}_s\sim\mathcal{N}(\mathbf{0},I),\epsilon_s\sim\mathcal{N}(0, 1)$$ 

*and the target model:*

$$\log y_t = \mathbf{x}_t^T\boldsymbol{\beta}+\sigma_t\epsilon_t,\mathbf{x}_t\sim\mathcal{N}(\mathbf{0},I),\epsilon_t\sim\mathcal{N}(0, 1).$$

Generate 10,000 $y_s$ and 1,000 $y_t$ using $\boldsymbol{\beta},\boldsymbol{\omega}\in\mathbb{R}^{500}$ and $\mathbf{x}_t,\mathbf{x}_s\in\mathbb{R}^{500}$. For each pair $(\omega_j,\beta_j)$, $j=1,\ldots,500$, we have

$$(\omega_j,\beta_j)\overset{\text{i.i.d.}}{\sim}\mathcal{N}\left(\mathbf{0},\frac{1}{p}\begin{pmatrix}\alpha_s^{2}&\rho\alpha_s\alpha_t\\\rho\alpha_s\alpha_t&\alpha_t^{2}\end{pmatrix}\right),$$

where $p=500$ is the number of covariates. Repeat this draw 10 times, yielding $\boldsymbol{\omega}_1,\ldots,\boldsymbol{\omega}_{10}$, then set each $\beta_j$ to the average of its 10 draws.
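The averaging step has a consequence that is easy to check numerically. Averaging 10 independent draws shrinks the variance of each $\beta_j$ to $\alpha_t^2/(10p)$, and the correlation between any single draw of $\omega_j$ and the averaged $\beta_j$ becomes $\rho/\sqrt{10}$ rather than $\rho$. The sketch below is a minimal check, assuming the parameter values used in the simulation code ($\alpha_s=1.5$, $\alpha_t=2$, $\rho=0.5$, $p=500$). The variable names and the use of `np.random.default_rng` are illustrative choices, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_reps, p = 10, 500
alpha_s, alpha_t, rho = 1.5, 2.0, 0.5

# Covariance of each (omega_j, beta_j) pair, scaled by 1/p
cov = (1.0 / p) * np.array([[alpha_s**2, rho * alpha_s * alpha_t],
                            [rho * alpha_s * alpha_t, alpha_t**2]])

# draws has shape (n_reps, p, 2): one (omega_j, beta_j) pair per repetition and coordinate
draws = rng.multivariate_normal(np.zeros(2), cov, size=(n_reps, p))
omegas = draws[..., 0]                 # shape (n_reps, p), one source vector per repetition
beta_bar = draws[..., 1].mean(axis=0)  # shape (p,), averaged target vector

# Theoretical values implied by averaging n_reps independent draws
expected_var = alpha_t**2 / (n_reps * p)
expected_corr = rho / np.sqrt(n_reps)

# Empirical counterparts, averaging the correlation over the source vectors
empirical_var = beta_bar.var()
empirical_corr = np.mean([np.corrcoef(w, beta_bar)[0, 1] for w in omegas])

print("variance of averaged beta_j: expected", expected_var, "empirical", empirical_var)
print("corr(omega, averaged beta):  expected", expected_corr, "empirical", empirical_corr)
```

If the intent is for each source vector to keep correlation $\rho$ with the target vector, the averaging step would need to be revisited; the check above only describes what the stated procedure implies.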

*We consider right censoring:*
- Assume 20% of the source population and 40% of the target population are censored.
- We observe $(Y_{s},\delta_{s})$ and $(Y_{t},\delta_{t})$, where $\delta_i$, $i\in\{s,t\}$, is the binary censoring indicator, with 1 denoting an event and 0 denoting censoring.
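A common way to realize a target censoring proportion is to draw a censoring time $C$ for each subject and record $\min(Y, C)$ together with $\delta = \mathbb{1}\{Y \le C\}$. The sketch below is one possible construction, not necessarily the one intended here. It assumes log-normal censoring times whose location is calibrated with an empirical quantile so that the requested fraction of subjects is censored. The helper name `apply_right_censoring` and its `noise_sd` parameter are hypothetical.

```python
import numpy as np

def apply_right_censoring(y, censor_prop, rng, noise_sd=1.0):
    """Return observed times and event indicators for right-censored data.

    Censoring times are C = exp(shift + z) with z ~ N(0, noise_sd^2).
    A subject is censored when log(y) - z > shift, so choosing shift as the
    (1 - censor_prop) empirical quantile of log(y) - z censors roughly
    censor_prop of the sample.
    """
    z = rng.normal(0.0, noise_sd, size=y.shape)
    shift = np.quantile(np.log(y) - z, 1.0 - censor_prop)
    c = np.exp(shift + z)
    delta = (y <= c).astype(int)   # 1 = event observed, 0 = censored
    t_obs = np.minimum(y, c)       # observed time is the earlier of event and censoring
    return t_obs, delta

# Example with synthetic log-normal event times (illustrative values only)
rng = np.random.default_rng(1)
y = np.exp(rng.normal(0.0, 2.0, size=1000))
t_obs, delta = apply_right_censoring(y, censor_prop=0.4, rng=rng)
print("censored fraction:", 1 - delta.mean())
```

With this construction the observed time of a censored subject is strictly shorter than its true event time, which is what distinguishes right censoring from simply attaching a random indicator to the uncensored times.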

2. Split the target data into training and testing parts in a 9:1 ratio, denoted $X_{target\ training}$ and $X_{target\ testing}$.
3. Apply the chosen method to obtain $\widehat{\boldsymbol{\omega}}_1,\cdots,\widehat{\boldsymbol{\omega}}_{10}$, use them to estimate $\widehat{\beta}$, and then compute $X_{target\ training}\widehat{\beta}$. We use CoxKL in this setting; Tian Gu's Angle TL, RF, and Commute could also be considered, but they handle binary outcomes only.
4. We then obtain $X_{target\ training}\boldsymbol{\omega}_1,\cdots,X_{target\ training}\boldsymbol{\omega}_{10}$
5. Regress $Y_{target\ testing}$ on $X_{target\ training}\boldsymbol{\omega}_1,\cdots,X_{target\ training}\boldsymbol{\omega}_{10},X_{target\ training}\widehat{\beta}$ with an $L_2$ penalty, minimizing the penalized negative log-likelihood:

$$\widehat{\boldsymbol{\gamma}}=\arg\min_{\boldsymbol{\gamma}}\left\{-\sum_{i=1}^n\log f(y_i)+\lambda\lVert\boldsymbol{\gamma}\rVert_2^2\right\}$$

where $f(y_i)$ is the log-normal density and

$$\log Y_{target\ testing}=X_{target\ training}\boldsymbol{\omega}_1\gamma_1+\cdots+X_{target\ training}\boldsymbol{\omega}_{10}\gamma_{10}+ X_{target\ training}\widehat{\beta}\gamma_{11}+\boldsymbol\epsilon$$
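For uncensored observations and a fixed noise scale $\sigma$, the penalized log-normal fit in step 5 reduces to ridge regression of $\log y$ on the 11 linear predictors, with the penalty rescaled by $2\sigma^2$. The sketch below illustrates that special case only. It ignores censoring, assumes the response and the linear predictors come from the same rows, and uses synthetic inputs in place of the CoxKL estimates. The function name `ridge_lognormal_stack` is hypothetical.

```python
import numpy as np

def ridge_lognormal_stack(linear_predictors, y, lam):
    """Ridge fit of log(y) on stacked linear predictors (closed form).

    linear_predictors: list of length-n arrays, e.g. X @ omega_k and X @ beta_hat
    y: positive event times, treated as uncensored
    lam: L2 penalty weight on gamma
    """
    Z = np.column_stack(linear_predictors)   # n x K design matrix of predictors
    k = Z.shape[1]
    # Solve (Z'Z + lam * I) gamma = Z' log(y)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ np.log(y))

# Synthetic illustration: 10 source vectors plus one target estimate
rng = np.random.default_rng(0)
n, p, n_sources = 900, 50, 10
X = rng.normal(size=(n, p))
omegas = [rng.normal(scale=p**-0.5, size=p) for _ in range(n_sources)]
beta_hat = rng.normal(scale=p**-0.5, size=p)

# Event times generated from the target predictor only, so the true gamma is (0, ..., 0, 1)
y = np.exp(X @ beta_hat + 0.5 * rng.normal(size=n))

preds = [X @ w for w in omegas] + [X @ beta_hat]
gamma = ridge_lognormal_stack(preds, y, lam=1.0)
print("gamma:", np.round(gamma, 3))
```

Handling the censored observations would require replacing the squared-error term with the censored log-normal likelihood, which has no closed-form minimizer and is not covered by this sketch.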

# Simulation Codes

In [2]:
import numpy as np

np.random.seed(123)  # Set a random seed for reproducibility

# Set the dimensions and number of repetitions
n_reps = 10
n_variables = 500
n_source = 10000
n_target = 1000

# Set the parameters
alpha_s = 1.5
alpha_t = 2.0
rho = 0.5
p = n_variables

# Set the censoring proportions
censor_prop_s = 0.2
censor_prop_t = 0.4

# Initialize the beta_j values
beta_j_values = np.zeros(n_variables)

# Loop for repetitions
omega = {}
beta = {}
X_s ={}
X_t = {}
for rep in range(n_reps):
    # Generate the omega_j and beta_j values
    params = np.random.multivariate_normal(
        mean=np.zeros(2), cov=(1/p)*np.array([[alpha_s**2, rho*alpha_s*alpha_t],
                                              [rho*alpha_s*alpha_t, alpha_t**2]]), size=n_variables)
    beta[rep] = params[:,1]
    omega[rep] = params[:,0]
    # Generate X_s and X_t
    X_s[rep] = np.random.normal(0, 1, size=(n_source,n_variables))
    X_t[rep] = np.random.normal(0, 1, size=(n_target,n_variables ))

beta = sum(beta.values())/len(beta)

Y_s = {}
Y_t = {}
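# Generate true event times for source and target
# Note: alpha_s and alpha_t double as the noise scales sigma_s and sigma_t here
for rep in range(n_reps):
    # Source times use the repetition-specific omega, as in the source model
    Y_s[rep] = np.exp(np.dot(X_s[rep], omega[rep]) + alpha_s * np.random.normal(0, 1, size=(n_source)))
    # Target times use the averaged beta
    Y_t[rep] = np.exp(np.dot(X_t[rep], beta) + alpha_t * np.random.normal(0, 1, size=(n_target)))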

# Generate censoring indicators
delta_s = {}
delta_t = {}
for rep in range(n_reps):
    delta_s[rep] = np.random.binomial(1, 1 - censor_prop_s, size=(n_source))
    delta_t[rep] = np.random.binomial(1, 1 - censor_prop_t, size=(n_target))



In [3]:
# Print the dimensions of the generated data
print("Shape of Y_s:", Y_s[0].shape)
print("Shape of Y_t:", Y_t[0].shape)
print("Shape of delta_s:", delta_s[0].shape)
print("Shape of delta_t:", delta_t[0].shape)


Shape of Y_s: (10000,)
Shape of Y_t: (1000,)
Shape of delta_s: (10000,)
Shape of delta_t: (1000,)


In [4]:
from sklearn.model_selection import train_test_split

# Example target data (first repetition)
target_data = X_t[0]
# Split covariates, event times, and censoring indicators together so rows stay aligned
(train_data, test_data,
 Y_train, Y_test,
 delta_train, delta_test) = train_test_split(
    target_data, Y_t[0], delta_t[0], test_size=0.1, random_state=42)

# Print the sizes of training and testing data
print("Training data size:", len(train_data))
print("Testing data size:", len(test_data))


Training data size: 900
Testing data size: 100


In [6]:
X_t[0]

array([[ 0.21625692, -0.48105502,  0.71760162, ...,  0.53071139,
        -0.28713517,  0.06196288],
       [-0.19992898, -1.50499575, -0.31832847, ..., -0.56347882,
        -2.05242369, -0.23544354],
       [-0.10844845,  0.19106596,  0.69127869, ..., -0.74100437,
        -0.26464787,  1.46974139],
       ...,
       [-0.25055417, -0.70811843, -0.90249597, ..., -0.12292629,
         0.38296632,  0.97308045],
       [-0.72488855, -1.95825164, -0.24491906, ...,  0.80668976,
         0.33001366,  1.74317542],
       [ 0.58276882, -0.79983819,  0.85262011, ..., -0.20649288,
         0.26881924,  0.30352573]])