# Parameter tuning

### Proxy strength (how informative the proxies are about U): sigma_z, sigma_w

| Corr(Z, U) | σ (sigma) |
|------------|-----------|
| 0.9        | 0.73      |
| 0.8        | 1.125     |
| 0.7        | 1.53      |
| 0.6        | 2.0       |
| 0.5        | 2.6       |
| 0.4        | 3.44      |
| 0.3        | 4.77      |
| 0.2        | 7.35      |
| 0.1        | 14.93     |

* We fix $a_z = a_w = 1.5$ in all cases.
* The same table applies to sigma_w, with Corr(W, U) in place of Corr(Z, U).
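The σ values in the table are consistent with a proxy of the form $Z = a_z U + \sigma_z \varepsilon$, where $U$ and $\varepsilon$ are independent standard normals, so that $\mathrm{Corr}(Z, U) = a_z / \sqrt{a_z^2 + \sigma_z^2}$. Assuming that form (it is inferred from the table, not read from the `nc_csf` code), the noise scale for a target correlation $\rho$ is $\sigma = a\sqrt{1/\rho^2 - 1}$. The helper names below are hypothetical.

```python
import math


def sigma_for_corr(rho: float, a: float = 1.5) -> float:
    """Noise scale giving Corr(Z, U) = rho when Z = a*U + sigma*eps.

    Assumes U and eps are independent N(0, 1), so that
    Corr(Z, U) = a / sqrt(a**2 + sigma**2).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return a * math.sqrt(1.0 / rho**2 - 1.0)


def corr_for_sigma(sigma: float, a: float = 1.5) -> float:
    """Inverse map: correlation with U implied by a given noise scale."""
    return a / math.sqrt(a**2 + sigma**2)


# Reproduce the table for a = 1.5
for rho in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
    print(f"{rho:.1f}  {sigma_for_corr(rho):.3f}")
```

With $a = 1.5$ the loop recovers the table up to rounding (e.g. 0.726 for 0.9 and 2.598 for 0.5).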

### Strength of confounding (how strongly U affects treatment assignment A and the outcome T): gamma_u_in_a, beta_u_in_t

| Level    | gamma_u_in_a | beta_u_in_t |
|----------|--------------|-------------|
| none     | 0.0     | 0.0    |
| weak     | 0.2     | 0.2    |
| moderate | 0.5     | 0.5    |
| strong   | 1.0     | 1.0    |
| extreme  | 2.0     | 2.0    |
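To see what these levels mean for treatment assignment, the sketch below assumes a logistic model $\mathrm{logit}\,P(A=1 \mid U) = \mathrm{logit}(p) + \gamma U$ with $U \sim N(0,1)$, where $\gamma$ is the treatment-side coefficient from the table. This model and the helper `u_imbalance` are illustrative assumptions, not the `nc_csf` data-generating code. It computes how far apart the mean of $U$ is between treated and control units, using a deterministic Riemann sum instead of simulation.

```python
import math


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def u_imbalance(gamma: float, prevalence: float = 0.5, n_grid: int = 4001) -> float:
    """E[U | A=1] - E[U | A=0] under a hypothetical logistic assignment model.

    Assumes U ~ N(0, 1) and logit P(A=1 | U) = logit(prevalence) + gamma * U.
    The marginal treated fraction equals `prevalence` exactly only when it is 0.5.
    Integrals are approximated by a Riemann sum over [-8, 8].
    """
    c = math.log(prevalence / (1.0 - prevalence))
    lo, hi = -8.0, 8.0
    h = (hi - lo) / (n_grid - 1)
    m1 = p1 = m0 = p0 = 0.0
    for i in range(n_grid):
        u = lo + i * h
        w = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) * h  # phi(u) du
        e = _sigmoid(c + gamma * u)  # P(A=1 | U=u)
        p1 += w * e
        m1 += w * e * u
        p0 += w * (1.0 - e)
        m0 += w * (1.0 - e) * u
    return m1 / p1 - m0 / p0


for level, gamma in [("none", 0.0), ("weak", 0.2), ("moderate", 0.5),
                     ("strong", 1.0), ("extreme", 2.0)]:
    print(f"{level:<9} {u_imbalance(gamma):.3f}")
```

The imbalance is zero for "none" and grows with $\gamma$; this is the confounding that the proxies are meant to correct.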

### Strength of treatment effect: tau_log_hr

| tau_log_hr value | Interpretation                          |
|------------------|------------------------------------------|
| 0                | No treatment effect                      |
| < 0              | Beneficial (lower hazard)      |
| > 0              | Harmful (higher hazard)        |
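Because tau_log_hr acts on the log-hazard scale, its sign translates directly into shorter or longer survival. As an illustration, assume a Weibull proportional-hazards model with cumulative hazard $H(t \mid a) = \lambda t^{k} e^{\tau a}$. This parameterization, and reading `k_t` and `lam_t` as $k$ and $\lambda$, is an assumption rather than something taken from the `nc_csf` source. The median survival time is then $(\ln 2 / (\lambda e^{\tau a}))^{1/k}$, so the treated-to-control median ratio is $e^{-\tau/k}$.

```python
import math


def weibull_ph_median(k: float, lam: float, log_hr: float, treated: bool) -> float:
    """Median event time under an assumed Weibull PH model.

    Cumulative hazard H(t) = lam * t**k * exp(log_hr * treated); solving
    S(t) = exp(-H(t)) = 0.5 gives t = (ln 2 / (lam * exp(log_hr * treated)))**(1/k).
    """
    rate = lam * math.exp(log_hr * (1.0 if treated else 0.0))
    return (math.log(2.0) / rate) ** (1.0 / k)


def median_ratio(k: float, lam: float, log_hr: float) -> float:
    """Treated / control median survival; equals exp(-log_hr / k)."""
    return (weibull_ph_median(k, lam, log_hr, True)
            / weibull_ph_median(k, lam, log_hr, False))


for tau in (-0.7, 0.0, 0.7):
    print(f"tau_log_hr={tau:+.1f}  median ratio={median_ratio(1.5, 0.4, tau):.3f}")
```

A negative tau_log_hr gives a ratio above 1 (treated units survive longer), matching "Beneficial (lower hazard)" in the table.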

**Plausible ranges**

| Effect size category        | tau_log_hr range | Hazard ratio range (exp(tau)) | Interpretation |
|----------------------------|------------------|-------------------------------|----------------|
| Null / negligible          | [-0.1, 0.1]      | [0.90, 1.11]                  | Little to no effect |
| Small to moderate (typical)| [-0.3, 0.3]      | [0.74, 1.35]                  | Plausible clinical effects |
| Moderate to large          | [-0.7, 0.7]      | [0.50, 2.01]                  | Strong but still realistic |
| Extreme (use with caution) | < -1 or > 1      | < 0.37 or > 2.7               | Often unrealistic / unstable |
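The hazard-ratio column is simply $\exp(\tau)$. The hypothetical helper below converts a log hazard ratio and labels it; the cut-offs at $|\tau| = 0.1, 0.3, 0.7, 1$ come from the tables above, while the label for $0.7 < |\tau| \le 1$, which the table leaves uncovered, is an assumption.

```python
import math


def describe_log_hr(tau: float) -> tuple[float, str, str]:
    """Return (hazard ratio, direction, size label) for a log hazard ratio."""
    hr = math.exp(tau)
    if tau < 0:
        direction = "beneficial (lower hazard)"
    elif tau > 0:
        direction = "harmful (higher hazard)"
    else:
        direction = "no effect"
    size = abs(tau)
    if size <= 0.1:
        label = "null / negligible"
    elif size <= 0.3:
        label = "small to moderate"
    elif size <= 0.7:
        label = "moderate to large"
    elif size <= 1.0:
        label = "large (between table categories)"  # assumption: not in the table
    else:
        label = "extreme"
    return hr, direction, label


for tau in (-0.7, 0.12, 0.25, 0.5):
    hr, direction, label = describe_log_hr(tau)
    print(f"tau={tau:+.2f}  HR={hr:.2f}  {direction}, {label}")
```

For instance, `describe_log_hr(-0.7)` gives a hazard ratio of about 0.50, labelled beneficial and moderate to large.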

# Testing

In [5]:
```python
# Add the parent directory to the Python path so the nc_csf package can be imported
import sys
from pathlib import Path

# Parent of the notebook's working directory (where the nc_csf folder lives)
parent_dir = Path.cwd().parent
if str(parent_dir) not in sys.path:
    sys.path.insert(0, str(parent_dir))

print(f"Added to path: {parent_dir}")
```

```text
Added to path: c:\Users\17414\OneDrive\桌面\USC\negavitve-control-proxies-causual-survival-forest
```


In [6]:
```python
from nc_csf.evaluate_performance import run_experiment
```

## Linear in treatment and outcome

### Informative proxies & strong confounding & large beneficial treatment effect

In [7]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.9,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.5, lam_t=0.4, tau_log_hr=-0.7, beta_u_in_t=1.1,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=1.125, aW=1.5, sigma_w=1.53,
    # functional form of treatment assignment and outcome
    linear_treatment=True, linear_outcome=True,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE     Slope
Baseline 0.494894 0.337712 -0.337428  -0.175691   -0.091064  1.411933 -0.024055
  NC-CSF 0.381664 0.207825 -0.148114   0.042018    0.119091  1.088888  0.005728
  Oracle 0.333073 0.209913  0.002901   0.338023    0.439230  0.950259  0.069997
```


### Informative proxies & weak confounding & small harmful treatment effect

In [8]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.2,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.2, lam_t=0.8, tau_log_hr=0.25, beta_u_in_t=0.2,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=1.125, aW=1.5, sigma_w=1.53,
    # functional form of treatment assignment and outcome
    linear_treatment=True, linear_outcome=True,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.155247 0.113517 -0.093727   0.755087    0.756274  0.982336 0.894132
  NC-CSF 0.125633 0.089334 -0.043291   0.715720    0.689610  0.794953 0.700960
  Oracle 0.120716 0.086308 -0.025591   0.694556    0.626544  0.763840 0.621106
```


### Weakly informative proxies & strong confounding & moderate harmful treatment effect

In [9]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.9,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.3, lam_t=0.2, tau_log_hr=0.5, beta_u_in_t=1.1,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=7.35, aW=1.5, sigma_w=4.77,
    # functional form of treatment assignment and outcome
    linear_treatment=True, linear_outcome=True,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.198337 0.172323 -0.149418   0.398740    0.512618  1.481067 0.289998
  NC-CSF 0.129102 0.087032 -0.032379   0.372565    0.478644  0.964058 0.175710
  Oracle 0.125630 0.069520  0.007463   0.351096    0.450981  0.938128 0.117828
```


### Weakly informative proxies & weak confounding & near-null treatment effect

In [10]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.25,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.5, lam_t=0.3, tau_log_hr=0.12, beta_u_in_t=0.2,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=7.35, aW=1.5, sigma_w=4.77,
    # functional form of treatment assignment and outcome
    linear_treatment=True, linear_outcome=True,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.030162 0.023517 -0.021932   0.673624    0.571145  1.864613 1.158585
  NC-CSF 0.019233 0.014390 -0.002777   0.500920    0.327622  1.188953 0.650542
  Oracle 0.017834 0.013211  0.002177   0.479570    0.277701  1.102521 0.543886
```


## Nonlinear in treatment and outcome

### Informative proxies & strong confounding & large beneficial treatment effect

In [11]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.9,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.5, lam_t=0.4, tau_log_hr=-0.7, beta_u_in_t=1.1,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=1.125, aW=1.5, sigma_w=1.53,
    # functional form of treatment assignment and outcome
    linear_treatment=False, linear_outcome=False,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.360168 0.192136 -0.116954   0.387867    0.423109  0.980482 0.110926
  NC-CSF 0.343759 0.193208 -0.056823   0.385099    0.428474  0.935810 0.152482
  Oracle 0.339858 0.191647 -0.036272   0.417826    0.471603  0.925192 0.114383
```


### Informative proxies & weak confounding & small harmful treatment effect

In [13]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.2,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.2, lam_t=0.8, tau_log_hr=0.25, beta_u_in_t=0.2,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=1.125, aW=1.5, sigma_w=1.53,
    # functional form of treatment assignment and outcome
    linear_treatment=False, linear_outcome=False,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.104613 0.075762 -0.034569   0.768777    0.739920  0.681048 0.641254
  NC-CSF 0.109013 0.075930 -0.026375   0.726103    0.716842  0.709696 0.554875
  Oracle 0.102872 0.070289 -0.019114   0.753341    0.742354  0.669719 0.585753
```


### Weakly informative proxies & strong confounding & moderate harmful treatment effect

In [14]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.9,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.3, lam_t=0.2, tau_log_hr=0.5, beta_u_in_t=1.1,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=7.35, aW=1.5, sigma_w=4.77,
    # functional form of treatment assignment and outcome
    linear_treatment=False, linear_outcome=False,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.138068 0.082133 -0.014376   0.290498    0.400982  0.964417 0.065150
  NC-CSF 0.142719 0.073427  0.017381   0.147254    0.250705  0.996902 0.025751
  Oracle 0.141095 0.071067  0.021433   0.230219    0.298009  0.985561 0.042949
```


### Weakly informative proxies & weak confounding & near-null treatment effect

In [15]:
```python
results_df, cate_te, pred_baseline, pred_nccsf, pred_oracle = run_experiment(
    # sample size, number of covariates, random seed
    n=5000, p_x=10, seed=123,
    # treatment assignment: prevalence and effect of U on treatment
    a_prevalence=0.5, gamma_u_in_a=0.25,
    # event-time model: baseline parameters, treatment log-HR, effect of U on T
    k_t=1.5, lam_t=0.3, tau_log_hr=0.12, beta_u_in_t=0.2,
    # censoring model and calibration settings
    k_c=1.2, lam_c=1e6, beta_u_in_c=0.3,
    target_censor_rate=0, max_censor_calib_iter=60,
    censor_lam_lo=1e-8, censor_lam_hi=1e6, admin_censor_time=None,
    # proxies: coefficients on U (aZ, aW) and noise scales (sigma_z, sigma_w)
    aZ=1.5, sigma_z=7.35, aW=1.5, sigma_w=4.77,
    # functional form of treatment assignment and outcome
    linear_treatment=False, linear_outcome=False,
)

print("\nEvaluation Results:")
print(results_df.to_string(index=False))
```

```text
Evaluation Results:
   Model     RMSE      MAE      Bias  Pearson r  Spearman r  Rel RMSE    Slope
Baseline 0.020635 0.016273 -0.013679   0.696666    0.632786  1.321120 0.960382
  NC-CSF 0.019607 0.015113 -0.011054   0.644689    0.589667  1.255309 0.867767
  Oracle 0.019001 0.014633 -0.011388   0.698473    0.628014  1.216497 0.949803
```


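Nonlinear treatment-assignment and outcome mechanisms do not by themselves violate the negative control identification assumptions; however, they increase the approximation error of the forests and can shrink the finite-sample gains from confounding correction. In the single-seed runs above, NC-CSF lowers RMSE relative to the baseline by roughly 19 to 36% in the linear settings, but in the nonlinear settings its RMSE stays within about 5% of the baseline's (slightly worse in two scenarios), and even the oracle improves little on the baseline.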
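To make this comparison explicit, the sketch below copies the RMSE values printed above by hand, keyed by functional form and tau_log_hr (which is unique within each form), and computes NC-CSF's relative RMSE reduction over the baseline. The dictionary and helper names are illustrative; in the notebook the same numbers could be read from each `results_df`. Every run uses `seed=123`, so these are single realizations with no variance estimate.

```python
# RMSE values copied from the printed outputs above: (baseline, NC-CSF, oracle).
RMSE = {
    ("linear", -0.70): (0.494894, 0.381664, 0.333073),
    ("linear", 0.25): (0.155247, 0.125633, 0.120716),
    ("linear", 0.50): (0.198337, 0.129102, 0.125630),
    ("linear", 0.12): (0.030162, 0.019233, 0.017834),
    ("nonlinear", -0.70): (0.360168, 0.343759, 0.339858),
    ("nonlinear", 0.25): (0.104613, 0.109013, 0.102872),
    ("nonlinear", 0.50): (0.138068, 0.142719, 0.141095),
    ("nonlinear", 0.12): (0.020635, 0.019607, 0.019001),
}


def rmse_reduction(baseline: float, nccsf: float) -> float:
    """Relative RMSE reduction of NC-CSF over the baseline (negative = worse)."""
    return 1.0 - nccsf / baseline


def summarize(rmse: dict) -> dict:
    """Mean relative RMSE reduction per functional form."""
    by_form: dict[str, list[float]] = {}
    for (form, _tau), (base, nc, _oracle) in rmse.items():
        by_form.setdefault(form, []).append(rmse_reduction(base, nc))
    return {form: sum(v) / len(v) for form, v in by_form.items()}


for (form, tau), (base, nc, _oracle) in RMSE.items():
    print(f"{form:<9} tau={tau:+.2f}  reduction={rmse_reduction(base, nc):+.1%}")
print(summarize(RMSE))
```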