1. Simulate the true event time vectors for both source and target, denoted $\mathbf{Y}_s$ and $\mathbf{Y}_t$:

*The source model is:*

$$y_s = \left(\frac{-\log U}{\lambda\exp(\mathbf{x}_s^T\boldsymbol{\omega}_s)}\right)^{1/\nu},\quad U\sim \mathcal{U}(0,1),$$

where $\lambda > 0$ and $\nu > 0$ are the scale and shape parameters of the Weibull baseline distribution.
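The formula for $y_s$ is an instance of inverse-transform sampling. As a short derivation, assume the Weibull proportional-hazards parameterization with baseline cumulative hazard $H_0(t)=\lambda t^{\nu}$ (this parameterization is inferred from the formula, not stated in the text). The survival function is then

$$S(t\mid\mathbf{x}_s)=\exp\left\{-\lambda t^{\nu}\exp(\mathbf{x}_s^T\boldsymbol{\omega}_s)\right\}.$$

Because $S(Y_s\mid\mathbf{x}_s)\sim\mathcal{U}(0,1)$, setting $S(y_s\mid\mathbf{x}_s)=U$ and solving for $y_s$ gives

$$\lambda y_s^{\nu}\exp(\mathbf{x}_s^T\boldsymbol{\omega}_s)=-\log U\quad\Longrightarrow\quad y_s=\left(\frac{-\log U}{\lambda\exp(\mathbf{x}_s^T\boldsymbol{\omega}_s)}\right)^{1/\nu}.$$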

*The target model is:*

$$y_t = \left(\frac{-\log U}{\lambda\exp(\mathbf{x}_t^T\boldsymbol{\omega}_t)}\right)^{1/\nu},\quad U\sim \mathcal{U}(0,1).$$
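As a sanity check on the two sampling formulas, the sketch below draws event times for a fixed covariate vector and compares the empirical survival probability at one time point with the value implied by a Weibull proportional-hazards model. The function name `simulate_weibull_ph`, the covariate and coefficient values, and the survival form $S(t\mid\mathbf{x})=\exp\{-\lambda t^{\nu}\exp(\mathbf{x}^T\boldsymbol{\omega})\}$ are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def simulate_weibull_ph(X, coef, lam=1.0, nu=1.0, rng=None):
    """Draw one event time per row of X by inverse-transform sampling."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.uniform(size=X.shape[0])
    return (-np.log(U) / (lam * np.exp(X @ coef))) ** (1.0 / nu)

rng = np.random.default_rng(0)
n = 200_000
x = np.array([0.5, -1.0])      # hypothetical covariate vector
coef = np.array([0.3, 0.2])    # hypothetical coefficient vector
lam, nu = 1.5, 2.0

X = np.tile(x, (n, 1))         # every subject shares the same covariates
y = simulate_weibull_ph(X, coef, lam, nu, rng)

t0 = 0.7
empirical = np.mean(y > t0)                                # empirical S(t0 | x)
theoretical = np.exp(-lam * t0**nu * np.exp(x @ coef))     # assumed model S(t0 | x)
print(f"empirical={empirical:.4f}, theoretical={theoretical:.4f}")
```

The two numbers should agree up to Monte Carlo error, which supports reading $\lambda$ and $\nu$ as the parameters of the baseline cumulative hazard $\lambda t^{\nu}$.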

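Generate 100 source event times $y_s$ and 40 target event times $y_t$, using coefficient vectors $\boldsymbol{\omega}_s=\boldsymbol{\omega}\in\mathbb{R}^{5}$ and $\boldsymbol{\omega}_t=\boldsymbol{\beta}\in\mathbb{R}^{5}$ and covariate vectors $\mathbf{x}_s,\mathbf{x}_t\in\mathbb{R}^{5}$. Each pair $(\omega_j,\beta_j)$, $j=1,\dots,5$, is drawn as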

$$(\omega_j,\beta_j)\overset{\text{i.i.d.}}{\sim}\mathcal{N}\left(\begin{pmatrix}1\\1\end{pmatrix},\begin{pmatrix}\alpha_s^{2}&\rho\alpha_s\alpha_t\\\rho\alpha_s\alpha_t&\alpha_t^{2}\end{pmatrix}\right).$$
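To illustrate how $\rho$, $\alpha_s$, and $\alpha_t$ control the similarity between the two coefficient vectors, here is a minimal sketch that draws many pairs from the bivariate normal (with mean 1 for both components) and checks the empirical correlation and standard deviations. The helper name `draw_coefficients` and the parameter values are assumptions for illustration. As in the notebook code, the first component is treated as the source coefficient and the second as the target coefficient.

```python
import numpy as np

def draw_coefficients(dim, alpha_s, alpha_t, rho, rng):
    """Draw `dim` i.i.d. (omega_j, beta_j) pairs from the bivariate normal."""
    cov = np.array([[alpha_s**2, rho * alpha_s * alpha_t],
                    [rho * alpha_s * alpha_t, alpha_t**2]])
    pairs = rng.multivariate_normal(mean=[1.0, 1.0], cov=cov, size=dim)
    return pairs[:, 0], pairs[:, 1]   # source coefficients, target coefficients

rng = np.random.default_rng(0)
# A large `dim` is used here only so that the empirical moments are stable;
# the simulation itself uses dim = 5.
omega, beta = draw_coefficients(50_000, alpha_s=1.0, alpha_t=2.0, rho=0.5, rng=rng)

emp_rho = np.corrcoef(omega, beta)[0, 1]
print(f"corr={emp_rho:.3f}, sd(omega)={omega.std():.3f}, sd(beta)={beta.std():.3f}")
```

Larger $\rho$ makes the source coefficients more informative about the target coefficients, which is the setting in which borrowing from the source should help most.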

*We consider right censoring:*

- Assume that 20% of the source population and 40% of the target population are censored. The censoring times are
$$C_s,C_t\overset{\text{i.i.d.}}{\sim}\text{Weibull}(\lambda_c,\nu_c).$$

- We observe $(Y_{s},\delta_{s})$ and $(Y_{t},\delta_{t})$, where $\delta_i$, $i\in\{s,t\}$, is the binary censoring indicator, with 1 denoting an event and 0 denoting censoring.

- <font color="red">Among the five covariates, three are continuous, written as $\mathcal{N}(\text{mean},\text{variance})$: $Z_1\sim\mathcal{N}(1.05,0.0225)$, $Z_2\sim\mathcal{N}(30,25)$, $Z_3\sim\mathcal{N}(90,25)$. The other two are binary: $Z_4,Z_5 \sim \text{Ber}(0.5)$.</font>

- Creatinine is a waste product produced by the muscles and filtered from the blood by the kidneys. It is commonly used as a marker of kidney function: its level in the bloodstream indicates how well the kidneys are working. The normal range for creatinine in the blood varies by age, sex, and muscle mass. General reference ranges for serum creatinine in adults:
    * Men: 0.74 to 1.35 mg/dL

- The urine albumin-to-creatinine ratio (UACR) measures the amount of albumin in the urine relative to creatinine. It is commonly used to detect early kidney damage, especially in people with diabetes or hypertension. UACR values are categorized as:
    * Less than 30 mg/g: Normal
    * 30-299 mg/g: Moderately increased (sometimes termed "microalbuminuria")
    * 300 mg/g and above: Severely increased (sometimes termed "macroalbuminuria")

- The estimated glomerular filtration rate (eGFR) assesses how well the kidneys are functioning. It is computed from a formula that includes serum creatinine level, age, sex, and sometimes other factors. The normal eGFR range varies by age, as kidney function can decrease naturally with age. In adults, the general breakdown of eGFR values is:

    * eGFR ≥90 mL/min/1.73 m²: Normal or high function
    * eGFR 60-89 mL/min/1.73 m²: Slightly decreased function; may be considered normal for some patients, especially the elderly.
    * eGFR 45-59 mL/min/1.73 m²: Mildly decreased function (stage 3a chronic kidney disease, CKD)
    * eGFR 30-44 mL/min/1.73 m²: Moderately decreased function (stage 3b CKD)
    * eGFR 15-29 mL/min/1.73 m²: Severely decreased function (stage 4 CKD)
    * eGFR <15 mL/min/1.73 m²: Kidney failure (stage 5 CKD or end-stage renal disease)
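For the censoring scheme in step 1, the Weibull censoring parameters have to be tuned so that the realized censoring rates land near 20% (source) and 40% (target). One possible approach, sketched below, is bisection on the censoring scale with the random draws held fixed. This is an illustrative technique, not necessarily what the authors did. The function name `calibrate_censoring_scale`, the exponential stand-ins for the event times, and the use of a separate scale per population are all assumptions.

```python
import numpy as np

def calibrate_censoring_scale(y, target_rate, nu_c=1.0, rng=None, n_iter=100):
    """Find a Weibull censoring scale whose censoring rate on y is close to target_rate."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.weibull(nu_c, size=len(y))   # fixed unit-scale Weibull draws
    lo, hi = 1e-6, 1e6                   # bracket: tiny scale censors almost all, huge almost none
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)           # bisect on the log scale
        if np.mean(y > mid * w) > target_rate:
            lo = mid                     # too much censoring: increase the scale
        else:
            hi = mid
    return hi, hi * w                    # calibrated scale and censoring times

rng = np.random.default_rng(0)
y_s = rng.exponential(size=100)          # stand-in for simulated source event times
y_t = rng.exponential(size=40)           # stand-in for simulated target event times

scale_s, C_s = calibrate_censoring_scale(y_s, 0.20, rng=rng)
scale_t, C_t = calibrate_censoring_scale(y_t, 0.40, rng=rng)

delta_s = (y_s <= C_s).astype(int)       # 1 = event observed, 0 = censored
delta_t = (y_t <= C_t).astype(int)
rate_s, rate_t = 1 - delta_s.mean(), 1 - delta_t.mean()
print(f"source censoring rate={rate_s:.3f}, target censoring rate={rate_t:.3f}")
```

Holding the unit-scale draws `w` fixed makes the censoring rate a monotone step function of the scale, so the bisection settles within one observation of the target rate.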

2. Split the target data into training and testing parts in a 1:1 ratio, named $Z_{target\ training}$ and $Z_{target\ testing}$.
3. Apply the method (CoxKL in this setting): use the source-based estimate $\widehat{\boldsymbol{\omega}}$ to estimate the target coefficients $\widehat{\boldsymbol{\beta}}$ on $Z_{target\ training}$, and then compute the risk scores $Z_{target\ testing}\widehat{\boldsymbol{\beta}}$.
4. Compute the C-index and other performance measures.
5. Repeat steps 3 and 4 for different values of the tuning parameter $\eta$.
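The C-index in step 4 can be computed directly from the observed times, event indicators, and risk scores. Below is a minimal sketch of Harrell's C-index. The function name `harrell_c_index` and the toy data are assumptions for illustration. A pair $(i,j)$ is treated as comparable when subject $i$ has an observed event and a strictly shorter observed time than subject $j$, and tied risk scores count as half concordant.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's C: fraction of comparable pairs ranked correctly by the risk score."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                      # a censored subject cannot anchor a pair
        for j in range(len(time)):
            if time[i] < time[j]:         # subject i failed before subject j
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0     # higher risk failed earlier
                elif risk[i] == risk[j]:
                    concordant += 0.5     # tie in risk score
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]
risk = np.array([3.0, 2.0, 1.0, 0.5])     # higher score = higher hazard

print(harrell_c_index(time, event, risk))    # 1.0 (perfect ranking)
print(harrell_c_index(time, event, -risk))   # 0.0 (reversed ranking)
```

In the simulation, `risk` would be the linear predictor $Z_{target\ testing}\widehat{\boldsymbol{\beta}}$, recomputed for each value of $\eta$ in step 5. The double loop is quadratic in the sample size, which is fine for a test set of 20 observations.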


# Simulation Code

In [23]:
import numpy as np
from scipy.stats import multivariate_normal, weibull_min, bernoulli
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set random seed for reproducibility
np.random.seed(0)

# Parameters
N_s, N_t = 100, 40
dim = 5
alpha_s, alpha_t, rho = 1, 1, 0.5  # adjust these as needed
cov_matrix = [[alpha_s**2, rho*alpha_s*alpha_t], 
              [rho*alpha_s*alpha_t, alpha_t**2]]

lambda_val, nu = 1, 1  # adjust these as needed

# Generate omega, beta from multivariate normal
omega_beta = multivariate_normal.rvs([1, 1], cov_matrix, size=dim)
omega_s, omega_t = omega_beta[:, 0], omega_beta[:, 1]

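# Generate covariates
# np.random.normal takes the standard deviation, so pass sqrt(variance)
Z1 = np.random.normal(1.05, 0.15, (N_s + N_t, 1))  # variance 0.0225
Z2 = np.random.normal(30, 5, (N_s + N_t, 1))       # variance 25
Z3 = np.random.normal(90, 5, (N_s + N_t, 1))       # variance 25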
Z4 = bernoulli.rvs(0.5, size=(N_s + N_t, 1))
Z5 = bernoulli.rvs(0.5, size=(N_s + N_t, 1))

# Scaling only the continuous covariates
scaler = StandardScaler()
Z_continuous = np.hstack([Z1, Z2, Z3])
Z_continuous = scaler.fit_transform(Z_continuous)

X = np.hstack([Z_continuous, Z4, Z5])

X_s, X_t = X[:N_s], X[N_s:]

# Simulate event times
U = np.random.uniform(0, 1, N_s)
y_s = ((-np.log(U) / (lambda_val * np.exp(X_s.dot(omega_s))))**(1/nu))

U = np.random.uniform(0, 1, N_t)
y_t = ((-np.log(U) / (lambda_val * np.exp(X_t.dot(omega_t))))**(1/nu))

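# Censoring
# scipy's weibull_min has survival function exp(-(t/scale)**c); lambda_c and nu_c
# must be tuned to reach the 20% (source) and 40% (target) censoring rates
lambda_c, nu_c = 2.5, 1  # adjust these as needed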
C_s = weibull_min.rvs(c=nu_c, scale=lambda_c, size=N_s)
C_t = weibull_min.rvs(c=nu_c, scale=lambda_c, size=N_t)

y_s_obs = np.minimum(y_s, C_s)
y_t_obs = np.minimum(y_t, C_t)

delta_s = (y_s <= C_s).astype(int)  # delta = 1 indicates an observed event, 0 censoring
delta_t = (y_t <= C_t).astype(int)

# Split target data
X_t_train, X_t_test, y_t_train_obs, y_t_test_obs, delta_t_train, delta_t_test = train_test_split(
    X_t, y_t_obs, delta_t, test_size=0.5, random_state=42
)



In [34]:
import pandas as pd

# Uses X_s, X_t, y_s, and y_t generated in the previous cell

# Convert to DataFrames
df_X_s = pd.DataFrame(X_s)
df_X_t = pd.DataFrame(X_t)
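# Keep the true event times and add the observed (censored) times and event indicators,
# which a survival model needs downstream
df_y_s = pd.DataFrame({'y_s': y_s, 'y_s_obs': y_s_obs, 'delta_s': delta_s})
df_y_t = pd.DataFrame({'y_t': y_t, 'y_t_obs': y_t_obs, 'delta_t': delta_t})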

# Save to CSV files
df_X_s.to_csv('X_s.csv', index=False)
df_X_t.to_csv('X_t.csv', index=False)
df_y_s.to_csv('y_s.csv', index=False)
df_y_t.to_csv('y_t.csv', index=False)
