## Step 1: Import Libraries

In [1]:
import sys
import warnings

warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*")
sys.path.insert(1, '../src/')

from fdr_control import fdr_control
from real import train_and_explain
from visualize_results import visualize_results


## Step 2: Train and Explain the Model

To train and explain your chosen model using the provided code snippet, follow these steps:

1. **Set Dataset:**
   Set the `dataset` argument to the dataset you want to use. The available datasets are:
   - `enhancer`: Drosophila enhancers
   - `mortality`: Mortality
   - `diabetes`: Diabetes
   - `cal_housing`: California Housing

2. **Select Model Type:**  
   Set the `model_type` argument to the model you prefer. The available models are:
   - `nn`: Neural Network
   - `lightgbm`: LightGBM
   - `xgboost`: XGBoost
   - `fm`: Factorization Machine

3. **Choose Explanation Method (for Neural Networks):**  
   If using neural networks, set the `explainer` argument to one of the following:
   - `eg`: Expected Gradients / Hessian
   - `ig`: Integrated Gradients / Hessian
   - `topo`: Network Topology-based Explanation

4. **Select Knockoff Generation Method:**  
   Set the `knockoff` argument to the preferred method. The supported methods include:
   - `knockoffgan`: KnockoffGAN
   - `deepknockoffs`: DeepKnockoffs
   - `vaeknockoff`: VAEKnockoff
   - `knockoffsdiag`: KnockoffsDiagnostics

After running the code, the explanations are saved in the `output/real/{dataset}/{model_type}_{knockoff}` directory, or in `output/real/{dataset}/{model_type}_{knockoff}_{explainer}` when an explanation method is set (neural networks).
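The options and directory layout above can be sketched in a few lines. The helper `explanation_dir` below is hypothetical and not part of the repository; it checks the arguments against the lists above and builds the output path, assuming the `_{explainer}` suffix is added only when `model_type` is `nn`.

```python
from pathlib import PurePosixPath

# Option values copied from the lists above.
DATASETS = {"enhancer", "mortality", "diabetes", "cal_housing"}
MODEL_TYPES = {"nn", "lightgbm", "xgboost", "fm"}
EXPLAINERS = {"eg", "ig", "topo"}
KNOCKOFFS = {"knockoffgan", "deepknockoffs", "vaeknockoff", "knockoffsdiag"}


def explanation_dir(dataset, model_type, knockoff, explainer=None, root="output/real"):
    """Hypothetical helper: validate the options and build the output directory."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    if knockoff not in KNOCKOFFS:
        raise ValueError(f"unknown knockoff: {knockoff!r}")

    name = f"{model_type}_{knockoff}"
    # Assumption: the explainer suffix applies only to neural networks.
    if model_type == "nn":
        if explainer not in EXPLAINERS:
            raise ValueError(f"unknown explainer: {explainer!r}")
        name += f"_{explainer}"
    return str(PurePosixPath(root) / dataset / name)


print(explanation_dir("diabetes", "xgboost", "knockoffsdiag"))
print(explanation_dir("diabetes", "nn", "knockoffgan", explainer="eg"))
```

Checking the options up front surfaces a misspelled argument before a long training run starts.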

In [2]:
# Repeat training and explanation for seeds 1 through 20.
for seed in range(1, 21):
    # save_local is True only for the first seed.
    save_local = seed == 1
    train_and_explain(seed, dataset='diabetes', model_type='xgboost', knockoff='knockoffsdiag', save_local=save_local)

100%|██████████| 10/10 [00:00<00:00, 908.02it/s]
10it [00:00, 18016.77it/s]


Device: cuda


[The same output repeats for each of the remaining 19 seeds.]

## Step 3: Perform FDR Control

To perform calibration and FDR control, follow these steps using the provided code snippet:

1. **Set the Data Directory:**  
   Set the `data_dir` argument to the directory containing the explanations generated in the previous step.

2. **Set the Target FDR Level:**  
   Set the `target_fdr` argument to the desired false discovery rate (FDR) level, for example `0.2`.

After running the code, the FDR control results will be saved in the `{data_dir}/fdr_control` directory.
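For intuition about what knockoff-based FDR control does, the sketch below applies the standard knockoff+ selection rule to a vector of feature statistics `w`. It is a generic illustration, not the calibration procedure inside `fdr_control`; the function name `knockoff_plus_threshold` and the statistics are made up for the example.

```python
import numpy as np


def knockoff_plus_threshold(w, target_fdr):
    """Return the knockoff+ threshold for statistics w, or inf if none qualifies."""
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes, smallest first.
    for t in np.sort(np.abs(w[w != 0])):
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= target_fdr:
            return float(t)
    return float("inf")


# Made-up statistics: large positive values favor the real feature over its knockoff.
w = np.array([5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 1.5, 1.0, 0.8])
tau = knockoff_plus_threshold(w, target_fdr=0.2)
selected = np.flatnonzero(w >= tau)
print(tau, selected)
```

When no threshold satisfies the bound, the function returns infinity and nothing is selected, which is the expected behavior when there are too few strong signals for the requested FDR level.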


In [3]:
fdr_control(data_dir='../output/real/diabetes/xgboost_knockoffsdiag', target_fdr=0.2)

Running FDR control


100%|██████████| 20/20 [00:04<00:00,  4.48it/s]


## Step 4: Visualize the Results

To visualize the results, run the provided code snippet.
It generates plots of the FDR control results and saves them in the `{data_dir}/fdr_control/figures` directory.
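To confirm that the plots were written, the figures directory can be listed once the visualization has run. This is a minimal sketch; `list_figures` is a hypothetical helper, and it makes no assumption about the file names or formats the repository produces.

```python
from pathlib import Path


def list_figures(data_dir):
    """Hypothetical helper: return the files saved under {data_dir}/fdr_control/figures."""
    figures_dir = Path(data_dir) / "fdr_control" / "figures"
    # An absent directory means no figures have been generated yet.
    if not figures_dir.is_dir():
        return []
    return sorted(p.name for p in figures_dir.iterdir() if p.is_file())


print(list_figures("../output/real/diabetes/xgboost_knockoffsdiag"))
```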

In [4]:
visualize_results(data_dir='../output/real/diabetes/xgboost_knockoffsdiag')

Loaded existing data
Top 10 diabetes calibrated_q_values
bmi - s5 0.55
bmi - s4 1.0
age - bp 1.0
age - s1 1.0
age - s2 1.0
age - s3 1.0
age - s4 1.0
age - s5 1.0
age - s6 1.0
sex - bmi 1.0
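
One way to read this output: a feature pair counts as a discovery only if its q-value does not exceed the target FDR. The sketch below assumes the calibrated q-values are thresholded directly at `target_fdr`; the values are copied from the output above, and `discoveries` is an illustrative name.

```python
target_fdr = 0.2

# Calibrated q-values copied from the printed top-10 list.
q_values = {
    "bmi - s5": 0.55,
    "bmi - s4": 1.0,
    "age - bp": 1.0,
    "age - s1": 1.0,
    "age - s2": 1.0,
    "age - s3": 1.0,
    "age - s4": 1.0,
    "age - s5": 1.0,
    "age - s6": 1.0,
    "sex - bmi": 1.0,
}

# Keep the pairs whose q-value is at or below the target FDR.
discoveries = [pair for pair, q in q_values.items() if q <= target_fdr]
print(discoveries)
```

Under that assumption, none of the ten listed pairs passes at `target_fdr=0.2`, because the smallest q-value shown is 0.55.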
