# DeEsco Trial Treatment Guidelines Calculator

⚠️ **This code uses lymph version 1.0.0.clin-trial specifically!**

## Overview
This notebook generates treatment guidelines for the DeEsco trial by:
1. Setting up a lymphatic spread model with 6 lymph node levels (LNLs) per side
2. Using Bayesian risk assessment with 216 MCMC samples for uncertainty quantification  
3. Applying a 10% risk threshold with 95% confidence intervals for treatment decisions
4. Generating comprehensive treatment tables for all diagnostic combinations

**Install required version:** `pip install git+https://github.com/rmnldwg/lymph.git@1.0.0.clin-trial`

## 1. Model Setup and Configuration

**Built on lymph 1.0.0.clin-trial**

### Lymphatic Network Model
- Models lymphatic spread between 6 lymph node levels: I, II, III, IV, V, VII
- Supports both ipsilateral and contralateral spread patterns
- Handles central tumors with bilateral spread capabilities
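- Uses the dataset's `max_llh` consensus diagnosis (the most likely involvement given all available diagnostic modalities) as the observed data for model fitting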

### Key Parameters:
- **Graph structure**: Defines anatomical connections between LNLs
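- **Time evolution**: at most 10 time steps (`max_time=10`), with separate time priors for early and late T-stage
- **Early T-stage time prior**: binomial distribution over time steps 0 to 10 with p = 0.3; the late T-stage prior is also binomial, with its parameter `late_p` inferred from data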
- **Diagnostic modalities**: 
  - `max_llh`: Perfect diagnostic (100% sensitivity/specificity) for model fitting
  - `treatment_diagnose`: Clinical diagnostic (81% sensitivity, 100% specificity)
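The two modalities differ only in sensitivity and specificity. The sketch below uses plain NumPy, not lymph's API, to show a generic 2×2 observation matrix for a single LNL. It also shows why a negative `treatment_diagnose` finding still leaves residual risk: with 81% sensitivity, Bayes' rule lowers the probability of involvement but does not eliminate it. The helpers `observation_matrix` and `posterior_involvement` and the 20% prior are illustrative assumptions.

```python
import numpy as np

def observation_matrix(sens: float, spec: float) -> np.ndarray:
    """Rows: true state (healthy, involved); columns: observation (negative, positive)."""
    return np.array([
        [spec, 1.0 - spec],   # healthy LNL: correctly negative with probability spec
        [1.0 - sens, sens],   # involved LNL: correctly positive with probability sens
    ])

def posterior_involvement(prior: float, observed_positive: bool, sens: float, spec: float) -> float:
    """P(involved | observation) for a single LNL via Bayes' rule (illustrative only)."""
    obs = observation_matrix(sens, spec)
    col = 1 if observed_positive else 0
    p_involved = prior * obs[1, col]
    p_healthy = (1.0 - prior) * obs[0, col]
    return p_involved / (p_involved + p_healthy)

# Hypothetical prior risk of 20% for one LNL, clinical modality (sens 0.81, spec 1)
prior = 0.20
print(posterior_involvement(prior, False, sens=0.81, spec=1.0))  # negative finding
print(posterior_involvement(prior, True, sens=0.81, spec=1.0))   # positive finding
```

With these numbers a negative finding lowers the risk from 20% to about 4.5%. Because specificity is 1, a positive finding implies involvement with certainty, which matches the risk of 1.0 reported for the diagnosed ipsilateral level II in the example of Section 3.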

In [83]:
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats                  # ensures sp.stats is loaded, also on older SciPy versions
from scipy.special import factorial
import lymph

# Directed graph of lymphatic spread: the primary tumor spreads directly to every LNL,
# and LNLs I-V form a chain (I -> II -> III -> IV -> V); VII has no outgoing edges.
graph = {
    ('tumor', 'primary'): ['I', 'II', 'III', 'IV', 'V', 'VII'],
    ('lnl', 'I'):   ['II'],
    ('lnl', 'II'):  ['III'],
    ('lnl', 'III'): ['IV'],
    ('lnl', 'IV'):  ['V'],
    ('lnl', 'V'):   [],
    ('lnl', 'VII'): [],
}

# Bilateral midline model with an additional central-tumor component (lymph 1.0.0.clin-trial)
model = lymph.models.Midline(
    graph_dict=graph,
    tumor_state=1,
    unilateral_kwargs={'allowed_states': [0, 1], 'max_time': 10},
    use_central=True,
    use_midext_evo=False,
    marginalize_unknown=False,
)
# Perfect diagnostic modality, matching the max_llh consensus used for fitting
model.set_modality('max_llh', spec=1, sens=1)

def binom_pmf(k: np.ndarray, n: int, p: float):
    """Binomial PMF."""
    if p > 1. or p < 0.:
        raise ValueError("Binomial prob must be between 0 and 1")
    q = 1. - p
    binom_coeff = factorial(n) / (factorial(k) * factorial(n - k))
    return binom_coeff * p**k * q**(n - k)

def late_binomial(support: np.ndarray, p: float = 0.5) -> np.ndarray:
    """Parametrized binomial time prior for late T-stage (its parameter is `late_p`)."""
    return binom_pmf(support, n=support[-1], p=p)

max_t = 10
# Fixed time prior for early T-stage: Binomial(n=10, p=0.3) over time steps 0..10
model.set_distribution('early', sp.stats.binom.pmf(np.arange(max_t + 1), max_t, 0.3))
# Parametrized time prior for late T-stage
model.set_distribution('late', late_binomial)
# Clinical diagnostic modality used for the treatment decisions
model.set_modality('treatment_diagnose', spec=1, sens=0.81)
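To see what the two time priors imply, the sketch below evaluates them with `scipy.stats.binom` on time steps 0 to 10. The early prior uses p = 0.3 as in the cell above. For the late prior it plugs in an illustrative value of 0.37, close to the posterior mean of `late_p` printed in Section 2. Both priors are binomial on the same support, so the expected time step is simply n·p.

```python
import numpy as np
from scipy.stats import binom

max_t = 10
support = np.arange(max_t + 1)  # time steps 0..10

early_prior = binom.pmf(support, max_t, 0.3)
late_p = 0.37  # illustrative value, roughly the posterior mean of `late_p`
late_prior = binom.pmf(support, max_t, late_p)

# Expected time step of diagnosis under each prior (equals n * p)
expected_early = np.sum(support * early_prior)
expected_late = np.sum(support * late_prior)
print(f"early: {expected_early:.2f} steps, late: {expected_late:.2f} steps")
```

The late prior shifts probability mass towards later time steps, so late T-stage patients have had more time for lymphatic spread.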

## 2. Loading MCMC Parameter Samples

### Sample Source and Purpose
Loading pre-computed MCMC samples from `data/samples_trial.h5`:
- **2160 total samples** from Bayesian parameter inference
- **18 parameters** per sample (1 mixing parameter, 16 spread probabilities, and `late_p`)
- Samples represent uncertainty in lymphatic spread parameters

### Parameters Structure:
- `mixing`: Mixing parameter that determines contralateral spread for tumors extending over the midline
- `ipsi_primarytoX_spread`: Primary tumor to ipsilateral LNL X spread probability
- `contra_primarytoX_spread`: Primary tumor to contralateral LNL X spread probability
- `XtoY_spread`: LNL-to-LNL spread probabilities (I→II, II→III, III→IV, IV→V)
- `late_p`: Binomial parameter of the late T-stage time prior
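Besides the mean used below, each parameter's posterior can be summarized by percentiles of its samples. A minimal sketch, assuming a flattened sample array of shape `(n_samples, n_params)` like `samples1`; synthetic Beta-distributed draws stand in for the real chain, and `summarize_samples` is a hypothetical helper.

```python
import numpy as np

def summarize_samples(samples: np.ndarray, names: list[str], level: float = 0.95) -> dict:
    """Posterior mean and equal-tailed credible interval for each parameter column."""
    if samples.shape[1] != len(names):
        raise ValueError("need one name per parameter column")
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(samples, [tail, 100 - tail], axis=0)
    means = samples.mean(axis=0)
    return {name: (m, lo, hi) for name, m, lo, hi in zip(names, means, lower, upper)}

# Synthetic stand-in for two of the 18 parameter columns
rng = np.random.default_rng(0)
fake = np.column_stack([rng.beta(2, 8, size=2160), rng.beta(4, 6, size=2160)])
summary = summarize_samples(fake, ["mixing", "late_p"])
for name, (mean, lo, hi) in summary.items():
    print(f"{name}: mean {mean:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```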

In [89]:
import h5py
with h5py.File("data/samples_trial.h5", "r") as f:
    samples1 = f["chain"][...]
samples1 = samples1.reshape(-1, samples1.shape[-1])
print(samples1.shape)

(2160, 18)


### Setting Model Parameters from MCMC Samples

The **mean** of all MCMC samples is used as a point estimate to set the model for the example calculations below. The order of `param_names` must match the column order of the stored samples.

In [91]:
sampled_mean = samples1.mean(axis=0)
# Parameter names in the column order of the stored samples
param_names = [
    'mixing',
    'ipsi_primarytoI_spread', 'ipsi_primarytoII_spread', 'ipsi_primarytoIII_spread',
    'ipsi_primarytoIV_spread', 'ipsi_primarytoV_spread', 'ipsi_primarytoVII_spread',
    'contra_primarytoI_spread', 'contra_primarytoII_spread', 'contra_primarytoIII_spread',
    'contra_primarytoIV_spread', 'contra_primarytoV_spread', 'contra_primarytoVII_spread',
    'ItoII_spread', 'IItoIII_spread', 'IIItoIV_spread', 'IVtoV_spread',
    'late_p',
]
assert len(param_names) == samples1.shape[1]
params = dict(zip(param_names, sampled_mean))
model.set_params(**params)
model.get_params()

{'midext_prob': 0.0,
 'ipsi_primarytoI_spread': 0.026599934089507535,
 'ipsi_primarytoII_spread': 0.3754362312489512,
 'ipsi_primarytoIII_spread': 0.07350634235991671,
 'ipsi_primarytoIV_spread': 0.009868764752471882,
 'ipsi_primarytoV_spread': 0.01608922143844808,
 'ipsi_primarytoVII_spread': 0.021790771223072873,
 'contra_primarytoI_spread': 0.0032833634932873815,
 'contra_primarytoII_spread': 0.025330185925201906,
 'contra_primarytoIII_spread': 0.0023198951662066233,
 'contra_primarytoIV_spread': 0.0028514226257283703,
 'contra_primarytoV_spread': 0.000656088933696782,
 'contra_primarytoVII_spread': 0.006324800116350196,
 'mixing': 0.22533811234978024,
 'ItoII_spread': 0.7470325433932157,
 'IItoIII_spread': 0.1444848004577465,
 'IIItoIV_spread': 0.16715051321273394,
 'IVtoV_spread': 0.17189301394039708,
 'late_p': 0.36914158720690937}

### Loading Clinical Dataset

Loading the USZ clinical dataset to analyze the diagnostic combinations of real patients:
- `cleanedUSZ.csv`: cleaned USZ patient data (three-level column header), used to apply the algorithm to real patient presentations
- The columns used are the `info` group (T-stage and midline extension) and the `max_llh` consensus involvement of each LNL on both sides
- Only the 6 modeled LNLs are kept: I, II, III, IV, V, VII (sublevels Ia, Ib, IIa, IIb and levels VI, VIII, IX, X are dropped)

In [93]:
dataset_USZ = pd.read_csv("data/cleanedUSZ.csv", header=[0, 1, 2])  # three-level column header

modeled_lnls = ['I', 'II', 'III', 'IV', 'V', 'VII']
unused_levels = ['IIa', 'IIb', 'VIII', 'Ib', 'IX', 'VI', 'X', 'Ia']

maxllh = dataset_USZ['max_llh']
t_stage = dataset_USZ['info']
# Keep only the modeled LNLs, in a fixed order, for each side
ipsi = maxllh.loc[:, 'ipsi'].drop(unused_levels, axis=1)[modeled_lnls]
contra = maxllh.loc[:, 'contra'].drop(unused_levels, axis=1)[modeled_lnls]
ipsi.columns = pd.MultiIndex.from_product([['ipsi'], modeled_lnls], names=['', ''])
contra.columns = pd.MultiIndex.from_product([['contra'], modeled_lnls], names=['', ''])

# Column order: info (T-stage, midline extension), ipsi I..VII, contra I..VII
dataset_analyze = pd.concat([t_stage, ipsi, contra], axis=1)

## 3. Example Analysis - Individual Cases

Before running the full combination analysis, we demonstrate the treatment decision algorithm on a specific diagnostic scenario. This illustrates:
- How the `levels_to_spare` function works
- How the diagnosis enters the risk calculation
- How risks and confidence intervals are calculated and used

### 3.1 Sample Preparation for Analysis

**Critical Parameter**: We use exactly **216 samples** with **step_size = 10**:
- From the 2160 total samples, every 10th sample is selected (2160 / 10 = 216)
- This cuts the computational cost of every risk calculation by a factor of 10
- The selection is deterministic and evenly spaced, so the results are reproducible
- These 216 samples are used for all risk calculations and confidence intervals
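The source of `sparing_scripts.sample_from_flattened` is not shown here. The sketch below is a hypothetical re-implementation of the behavior described above (evenly spaced selection with `step_size = 10`), not the actual helper; `thin_samples` is an illustrative name.

```python
import numpy as np

def thin_samples(samples: np.ndarray, num_samples: int, step_size: int) -> np.ndarray:
    """Take every `step_size`-th row and keep at most `num_samples` rows (hypothetical)."""
    thinned = samples[::step_size][:num_samples]
    if len(thinned) < num_samples:
        raise ValueError("not enough samples for the requested spacing")
    return thinned

# Synthetic stand-in with the same shape as `samples1`
samples = np.arange(2160 * 18, dtype=float).reshape(2160, 18)
reduced = thin_samples(samples, num_samples=216, step_size=10)
print(reduced.shape)  # (216, 18)
```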

In [94]:
from sparing_scripts import sample_from_flattened

samples_reduced = sample_from_flattened(samples1, num_samples = 216, spaced = True, step_size = 10)

### 3.2 Example Case: Early T-stage with Ipsilateral Level II Involvement

**Diagnostic scenario**:
- T-stage: Early
- Midline extension: False
- Ipsilateral involvement: Level II only (diagnosed positive)
- Contralateral involvement: None

**Analysis approach**:
1. Compute 216 risk estimates, one per MCMC sample
2. Calculate the mean risk matrix (64×64: 2^6 ipsilateral × 2^6 contralateral involvement patterns)
3. Apply the `levels_to_spare` algorithm with a 10% threshold
4. Use `ci = False`: the mean risk (not the CI upper bound) is compared against the threshold
5. Show the treatment recommendation and the confidence interval

In [96]:
from sparing_scripts import risk_sampled, levels_to_spare, ci_single

# Observed diagnosis per side under the clinical modality: 1 = positive, 0 = negative
diagnose = {
    "ipsi": {'treatment_diagnose': {"I": 0, "II": 1, "III": 0, "IV": 0, "V": 0, "VII": 0}},
    "contra": {'treatment_diagnose': {"I": 0, "II": 0, "III": 0, "IV": 0, "V": 0, "VII": 0}},
}
# Per-sample risk matrices and their mean
sampled_risks, risk = risk_sampled(
    samples=samples_reduced, model=model, t_stage='early',
    given_diagnoses=diagnose, central=None, midline_extension=False,
)
# Decide which LNLs to spare at a 10% threshold, comparing the mean risk (ci=False)
(spared_lnls, total_risk, ranked_combined, treated_lnls, treated_array,
 treated_ipsi, treated_contra, sampled_total_risks) = levels_to_spare(
    0.10, model, risk, sampled_risks, ci=False
)
print(treated_lnls)       # treated LNLs with their risk
print(total_risk * 100)   # total risk (%) of the spared LNLs
print(spared_lnls)        # spared LNLs with their risk, in ascending order
ci_single(sampled_total_risks) * 100   # confidence interval of the total risk (%)

[('ipsi III', 0.08275283447364559), ('ipsi II', 1.0000000000000002)]
6.629766388350969
[('contra V', 0.00042763173316175606), ('contra I', 0.0009891023055616907), ('contra IV', 0.0014467123203597062), ('contra III', 0.0015009689751277794), ('contra VII', 0.0036182001687979333), ('ipsi IV', 0.007896204657586742), ('ipsi V', 0.009609714097105536), ('ipsi VII', 0.012807244016318392), ('contra II', 0.013067004381272224), ('ipsi I', 0.018191576282428446)]

array([5.75407088, 7.54938061])
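A plausible reading of the output above is a greedy procedure: LNLs are ranked by marginal risk and spared from the lowest upward, as long as the probability that at least one spared LNL is involved stays below the threshold. The reported total risk (about 6.6%) is below the sum of the individual spared risks (about 7.0%), and sparing ipsilateral level III (8.3%) as well would push it past 10%. The sketch below implements that idea for a small, hypothetical joint distribution. `spare_greedily` is an assumption, not the `levels_to_spare` implementation (whose source is not shown), and it compares the mean risk as with `ci = False`.

```python
import itertools
import numpy as np

def spare_greedily(joint: np.ndarray, lnl_names: list[str], threshold: float = 0.10):
    """Greedy LNL sparing (hypothetical sketch, not `levels_to_spare`).

    joint: probabilities of all 2**n involvement patterns, ordered like
    itertools.product([0, 1], repeat=n) (first LNL = most significant bit).
    Returns the spared LNL names and the probability that any of them is involved.
    """
    states = np.array(list(itertools.product([0, 1], repeat=len(lnl_names))))
    marginal_risk = states.T @ joint            # P(LNL involved) for each LNL
    spared, total_risk = [], 0.0
    for idx in np.argsort(marginal_risk):       # lowest marginal risk first
        candidate = spared + [idx]
        # probability that at least one candidate LNL is involved
        risk = joint[states[:, candidate].any(axis=1)].sum()
        if risk > threshold:                    # sparing this LNL would exceed the threshold
            break
        spared, total_risk = candidate, risk
    return [lnl_names[i] for i in spared], total_risk

# Toy example: three independent LNLs with marginal risks 2%, 50% and 5%
p = np.array([0.02, 0.50, 0.05])
states = np.array(list(itertools.product([0, 1], repeat=3)))
joint = np.prod(np.where(states == 1, p, 1 - p), axis=1)
print(spare_greedily(joint, ["A", "B", "C"]))
```

For independent LNLs the total risk of a spared set is 1 − Π(1 − p_i), here 1 − 0.98 · 0.95 = 0.069, slightly below the sum of the marginals (0.07). Sparing B as well would raise it far above 10%, so A and C are spared and B is treated.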

## 4. Comprehensive Combination Analysis

### Purpose
Generate treatment recommendations for **all possible diagnostic combinations** to create complete clinical guidelines.

### Scope
- **Non-central tumors**: 2^14 = 16,384 combinations
- **Central tumors**: 2^13 = 8,192 combinations  
- **Each combination includes**: T-stage, LNL involvement on both sides, and (for non-central tumors only) midline extension

### Process Overview
1. **USZ Dataset Analysis**: Apply the algorithm to the diagnostic combinations observed in real patients
2. **Full Combination Generation**: Create all theoretical diagnostic scenarios
3. **Risk Calculation**: Apply 216-sample uncertainty quantification to each combination
4. **Treatment Decision**: Use 95% CI upper bound with 10% threshold
5. **Table Export**: Generate comprehensive treatment lookup tables

### 4.1 Processing USZ Clinical Dataset

**Goal**: Extract unique diagnostic combinations from real patient data for validation.

**Process**:
1. **Group patients** by identical diagnostic patterns (T-stage + LNL involvement)
2. **Count frequencies** of each unique combination  
3. **Extract involvement patterns** for ipsilateral and contralateral sides
4. **Prepare for analysis** with the treatment algorithm

**Output**: 
- Unique diagnostic combinations from clinical practice
- Patient frequencies for each combination
- Structured format for risk analysis
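The grouping step can also be expressed with pandas. A minimal sketch on a synthetic frame with hypothetical column names: `DataFrame.value_counts` counts identical rows, which corresponds to the per-combination counts (`USZ_counts`) that the notebook builds with a `defaultdict`. Note that `value_counts` drops rows containing missing values unless `dropna=False` is passed.

```python
import pandas as pd

# Synthetic stand-in: T-stage, midline extension and two LNLs per side
patients = pd.DataFrame({
    "t_stage": ["early", "early", "late", "early"],
    "midext": [False, False, True, False],
    "ipsi_II": [True, True, True, True],
    "ipsi_III": [False, False, True, False],
    "contra_II": [False, False, False, True],
})

# Count patients per unique diagnostic combination (most frequent first)
counts = patients.value_counts()
fractions = patients.value_counts(normalize=True)  # share of all patients
print(counts)
```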

In [97]:
from collections import defaultdict

# Group patients with identical diagnostic patterns; each row is
# (T-stage, midline extension, ipsi I..VII, contra I..VII)
data = np.array(dataset_analyze)

entry_combinations_with_indexes = defaultdict(list)
for index, row in enumerate(data):
    entry_combinations_with_indexes[tuple(row)].append(index)

USZ_counts = []        # number of patients per unique combination
USZ_combinations = []  # the unique combinations
USZ_indexes = []       # row indexes of the patients in each combination
for combination, indexes in entry_combinations_with_indexes.items():
    USZ_indexes.append(indexes)
    USZ_counts.append(len(indexes))
    USZ_combinations.append(combination)

# Translate each combination into readable lists of involved LNLs
lnls = ['I', 'II', 'III', 'IV', 'V', 'VII']
t_stage = []
midline_extension = []
invovlvement_ipsi_USZ = []
invovlvement_contra_USZ = []
for diagnose_type in USZ_combinations:
    t_stage.append(diagnose_type[0])
    midline_extension.append(diagnose_type[1])
    # Ipsilateral LNLs sit at positions 2-7, contralateral LNLs at positions 8-13;
    # the explicit `== True` also treats missing values (NaN) as not involved
    involved_ipsi = [lnl for i, lnl in enumerate(lnls) if diagnose_type[i + 2] == True]
    involved_contra = [lnl for i, lnl in enumerate(lnls) if diagnose_type[i + 8] == True]
    invovlvement_ipsi_USZ.append(involved_ipsi)
    invovlvement_contra_USZ.append(involved_contra)

### 4.2 USZ Dataset Analysis Results

**Analysis of real clinical combinations**:
- Uses the **corrected algorithm** (`analysis_treated_lnls_combinations`)  
- **216 samples** for uncertainty quantification
- **10% risk threshold** with **95% confidence intervals**
- **CI upper bound** used for conservative treatment decisions

**Output metrics**:
- Treatment recommendations for each clinical combination
- Risk estimates with confidence intervals  
- Top 3 spared LNLs (lowest risk levels that can be omitted)
- Frequency distribution of unique treatment strategies
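The decision rule compares an interval bound, rather than the mean, with the threshold. A hypothetical sketch, assuming the 95% interval is taken as the 2.5th and 97.5th percentiles of the per-sample total risks; the actual `ci_single`/`ci_multiple` code is not shown, and `percentile_ci` and `is_acceptable` are illustrative names.

```python
import numpy as np

def percentile_ci(sampled_risks: np.ndarray, level: float = 0.95) -> np.ndarray:
    """Equal-tailed interval from per-sample risks (assumed, not the original `ci_single`)."""
    tail = 100 * (1 - level) / 2
    return np.percentile(sampled_risks, [tail, 100 - tail])

def is_acceptable(sampled_risks: np.ndarray, threshold: float = 0.10, use_ci: bool = True) -> bool:
    """Accept a sparing decision if the CI upper bound (or the mean) stays within the threshold."""
    value = percentile_ci(sampled_risks)[1] if use_ci else sampled_risks.mean()
    return bool(value <= threshold)

# 216 synthetic per-sample risks spread evenly between 5% and 11%
risks = np.linspace(0.05, 0.11, 216)
print(percentile_ci(risks))
print(is_acceptable(risks, use_ci=False), is_acceptable(risks, use_ci=True))
```

Here the mean risk of 8% would pass the 10% threshold, but the upper bound of about 10.9% does not. The CI-based rule therefore spares fewer LNLs when the parameter uncertainty is large.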

In [98]:
from sparing_scripts import count_number_treatments, analysis_treated_lnls_combinations
usz_treated_lnls_no_risk, usz_treated_lnls_all, usz_treatment_array, usz_top3_spared, usz_total_risks, usz_treated_ipsi, usz_treated_contra, usz_sampled_risks_array, usz_lnls_ranked, cis = analysis_treated_lnls_combinations(combinations = USZ_combinations, model = model, samples = samples_reduced, threshold = 0.10)
usz_set_counts = count_number_treatments(usz_treated_lnls_no_risk)
len(usz_set_counts)

41

In [99]:
from sparing_scripts import ci_multiple
ci = ci_multiple(usz_sampled_risks_array)
data_export_usz = pd.DataFrame({'Percentage of patients': np.array(USZ_counts) / sum(USZ_counts),  # fraction of all patients
                                'T-stage': t_stage,
                                'Midline Extension': midline_extension,
                                'Involvement Ipsi' : invovlvement_ipsi_USZ,
                                'Involvement Contra': invovlvement_contra_USZ,
                                'Treated Ipsi':  usz_treated_ipsi,
                                'Treated Contra': usz_treated_contra,
                                'risk': usz_total_risks,
                                'lower bound': ci.T[0],
                                'upper bound': ci.T[1],
                                'top 3 spared lnls risk': usz_top3_spared

})
# data_export_usz.to_csv('analyzed_usz_data_new_dataset.csv', sep = ';', index = False)
# data_export_usz.sort_values(by = 'Percentage of patients', ascending = False, inplace = True)
data_export_usz

| | Percentage of patients | T-stage | Midline Extension | Involvement Ipsi | Involvement Contra | Treated Ipsi | Treated Contra | risk | lower bound | upper bound | top 3 spared lnls risk |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.048780 | late | True | [II] | [] | [III, II] | [II] | 0.078032 | 0.066303 | 0.091473 | [(ipsi I, 0.02018212044831703), (ipsi VII, 0.0... |
| 1 | 0.010453 | early | False | [II] | [II] | [III, II] | [III, II] | 0.076328 | 0.062081 | 0.097589 | [(ipsi I, 0.020745164441888973), (contra I, 0.... |
| 2 | 0.003484 | late | True | [I, II, III, IV, VII] | [I, II, III, IV] | [V, I, II, III, IV, VII] | [I, II, III, IV] | 0.061390 | 0.034468 | 0.095027 | [(contra V, 0.049917866263852986), (contra VII... |
| 3 | 0.003484 | late | True | [II, III, IV, VII] | [] | [V, VII, II, III, IV] | [II] | 0.066456 | 0.055051 | 0.077370 | [(ipsi I, 0.028686386918113108), (contra III, ... |
| 4 | 0.010453 | early | False | [II, VII] | [] | [III, II, VII] | [] | 0.063806 | 0.054496 | 0.073761 | [(ipsi I, 0.021186904157757556), (contra II, 0... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 72 | 0.003484 | early | False | [II, IV] | [] | [V, III, II, IV] | [] | 0.062058 | 0.052172 | 0.072464 | [(ipsi I, 0.022471174486049225), (ipsi VII, 0.... |
| 73 | 0.006969 | late | False | [II, III, V] | [] | [IV, II, III, V] | [] | 0.075202 | 0.062002 | 0.087899 | [(ipsi I, 0.02735384799828265), (ipsi VII, 0.0... |
| 74 | 0.003484 | late | True | [II, III] | [II, III, IV] | [I, IV, II, III] | [V, II, III, IV] | 0.066285 | 0.054837 | 0.080623 | [(ipsi VII, 0.02303066023513181), (ipsi V, 0.0... |
| 75 | 0.003484 | late | False | [II, V] | [] | [IV, III, II, V] | [] | 0.066396 | 0.054162 | 0.077747 | [(ipsi I, 0.024086145154078375), (ipsi VII, 0.... |


### 4.3 Generating All Possible Diagnostic Combinations

**⚠️ The analysis of all combinations (Section 4.4) takes a few hours**

**Complete enumeration approach**:
- Generate **2^14 = 16,384 combinations** for non-central tumors
- Each combination represents a unique diagnostic scenario
- **14 binary variables**: T-stage + midline extension + 12 LNLs (6 ipsi + 6 contra)

**Combination structure**:
1. **T-stage**: Early (0) or Late (1)  
2. **Midline extension**: False (0) or True (1)
3. **LNL involvement**: 12 binary values (0 = negative, 1 = positive)
   - Positions 2-7 (0-indexed): Ipsilateral I, II, III, IV, V, VII
   - Positions 8-13 (0-indexed): Contralateral I, II, III, IV, V, VII

**Purpose**: Create lookup tables covering every possible clinical scenario
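Assuming `change_base` writes the most significant digit first (its source is not shown), combination `i` is the 14-bit binary representation of `i` in the layout above. The hypothetical helpers `decode` and `encode` below translate between a scenario index and a readable scenario; `itertools.product([0, 1], repeat=14)` enumerates the bit patterns in the same order, which serves as a cross-check.

```python
import itertools

LNLS = ["I", "II", "III", "IV", "V", "VII"]

def decode(index: int):
    """Map a scenario index to (T-stage, midline extension, ipsi LNLs, contra LNLs)."""
    bits = [int(b) for b in format(index, "014b")]   # most significant bit first
    t_stage = "late" if bits[0] else "early"
    ipsi = [lnl for lnl, b in zip(LNLS, bits[2:8]) if b]
    contra = [lnl for lnl, b in zip(LNLS, bits[8:14]) if b]
    return t_stage, bool(bits[1]), ipsi, contra

def encode(t_stage: str, midext: bool, ipsi: list[str], contra: list[str]) -> int:
    """Inverse of `decode`."""
    bits = [int(t_stage == "late"), int(midext)]
    bits += [int(lnl in ipsi) for lnl in LNLS] + [int(lnl in contra) for lnl in LNLS]
    return int("".join(map(str, bits)), 2)

# Cross-check: itertools.product yields the bit patterns in index order
all_bits = list(itertools.product([0, 1], repeat=14))
assert all(int("".join(map(str, bits)), 2) == i for i, bits in enumerate(all_bits))

print(decode(1024))                       # ('early', False, ['II'], [])
print(encode("late", True, ["II"], []))   # 13312
```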

In [40]:
from sparing_scripts import change_base

def produce_combinations_list(array):
    """Convert rows of 0/1 values into scenario tuples.

    The first entry becomes the T-stage ('early' for 0, 'late' for 1); all
    remaining entries (midline extension, if present, and LNL involvement)
    become Python booleans.
    """
    combinations_list = []
    for entry in array:
        combination = ['early' if entry[0] == 0 else 'late']
        combination += [bool(cell) for cell in entry[1:]]
        combinations_list.append(tuple(combination))
    return combinations_list

# One row per scenario: the base-2 digits of i, padded to 14 digits
combination_array = np.zeros((2**14, 14))
for i in range(2**14):
    combination_array[i] = [int(digit) for digit in change_base(i, 2, length=14)]

all_combinations = produce_combinations_list(combination_array)

### 4.4 Parallelized Analysis of All Combinations

**Computational strategy**:
- **16,384 combinations** require significant computation time
- **Multiprocessing**: Divide work across CPU cores for efficiency
- **New algorithm**: Uses corrected `levels_to_spare` with CI-based decisions

**Analysis parameters**:
- **216 MCMC samples** per combination for uncertainty quantification
- **10% risk threshold** for treatment decisions  
- **95% CI upper bound** for conservative threshold comparison
- **Parallel processing**: Utilizes multiple CPU cores to reduce computation time

**Expected output**:
- Complete treatment recommendations for all diagnostic scenarios
- Risk estimates with confidence intervals
- Comprehensive clinical lookup tables
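Because `pool.map` returns results in the order of its inputs, contiguous chunks can be flattened back into the original combination order. A minimal sketch of the chunking step, with a hypothetical `split_into_chunks` helper and a stand-in list instead of `all_combinations`:

```python
def split_into_chunks(items: list, num_chunks: int) -> list[list]:
    """Split `items` into at most `num_chunks` contiguous, nearly equal chunks."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be positive")
    chunk_size = max(1, -(-len(items) // num_chunks))  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

combos = list(range(16384))          # stand-in for `all_combinations`
chunks = split_into_chunks(combos, 7)
flat = [item for chunk in chunks for item in chunk]
print(len(chunks), [len(c) for c in chunks])
```

With floor division (`16384 // 7 = 2340`), the same input yields 8 chunks, the last holding only 4 combinations. This is harmless, but ceiling division keeps the number of tasks equal to the number of workers.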

In [None]:
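import multiprocessing as mp

# Process one chunk of combinations. Keyword arguments (as in the USZ analysis)
# avoid depending on the positional order of the parameters.
def process_combinations(chunk):
    return analysis_treated_lnls_combinations(
        combinations=chunk, model=model, samples=samples_reduced, threshold=0.10
    )

# Divide the combinations into at most `num_cores` contiguous chunks
num_cores = max(1, mp.cpu_count() - 1)
chunk_size = -(-len(all_combinations) // num_cores)  # ceiling division
chunks = [all_combinations[i:i + chunk_size] for i in range(0, len(all_combinations), chunk_size)]

# Process the chunks in parallel; pool.map returns the results in chunk order.
# Functions defined in a notebook reach the workers only with the 'fork' start method;
# with 'spawn' (default on macOS and Windows) the function and the objects it uses
# must be importable from a module.
with mp.Pool(num_cores) as pool:
    results = pool.map(process_combinations, chunks)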

# Combine the results from all chunks
treated_lnls_no_risk, treated_lnls_all, treatment_array, top3_spared, total_risks, treated_ipsi, treated_contra, sampled_risks_array, lnls_ranked, cis = zip(*results)

# Flatten the results
treated_lnls_no_risk = [item for sublist in treated_lnls_no_risk for item in sublist]
treated_lnls_all = [item for sublist in treated_lnls_all for item in sublist]
treatment_array = np.vstack(treatment_array)
top3_spared = [item for sublist in top3_spared for item in sublist]
total_risks = np.concatenate(total_risks)
treated_ipsi = [item for sublist in treated_ipsi for item in sublist]
treated_contra = [item for sublist in treated_contra for item in sublist]
sampled_risks_array = np.vstack(sampled_risks_array)
lnls_ranked = [item for sublist in lnls_ranked for item in sublist]
cis_lower = []
cis_upper = []
for item in cis:
    cis_lower.append(item[0])
    cis_upper.append(item[1])
flat_lower = [item for sublist in cis_lower for item in sublist]
flat_upper = [item for sublist in cis_upper for item in sublist]

### 4.5 Baseline Risk Calculations

**Computing population-level risks** to obtain the "prevalence" of each diagnosis. Calling `risk_sampled` with `given_diagnoses = None` conditions on no observation, so the resulting risk matrix describes how likely each involvement pattern is for a given T-stage and midline extension.

In [38]:
# Baseline risks without a diagnosis, for each T-stage / midline-extension scenario
sampled_risks_early_no_ext, mean_risk_early_no_ext = risk_sampled(samples_reduced, model, 'early', midline_extension=False, given_diagnoses=None)
sampled_risks_early_ext, mean_risk_early_ext = risk_sampled(samples_reduced, model, 'early', midline_extension=True, given_diagnoses=None)
sampled_risks_late_no_ext, mean_risk_late_no_ext = risk_sampled(samples_reduced, model, 'late', midline_extension=False, given_diagnoses=None)
sampled_risks_late_ext, mean_risk_late_ext = risk_sampled(samples_reduced, model, 'late', midline_extension=True, given_diagnoses=None)

In [39]:
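# All 14-bit states, one row per scenario, sorted with the first column most significant
# (not used further below)
state_list = np.array(np.meshgrid(*[[0, 1]] * 14)).T.reshape(-1, 14)
state_list = state_list[np.lexsort(np.fliplr(state_list).T)]
# Flatten each 64x64 risk matrix (ipsi pattern x contra pattern) into 4096 entries
mean_risk_early_noext_flat = mean_risk_early_no_ext.reshape(-1)
mean_risk_early_ext_flat = mean_risk_early_ext.reshape(-1)
mean_risk_late_noext_flat = mean_risk_late_no_ext.reshape(-1)
mean_risk_late_ext_flat = mean_risk_late_ext.reshape(-1)
# Concatenate in the order early/no ext., early/ext., late/no ext., late/ext. and weight
# the four scenarios equally; the export below assumes this matches `all_combinations`
full_risks = np.hstack([mean_risk_early_noext_flat, mean_risk_early_ext_flat,
                        mean_risk_late_noext_flat, mean_risk_late_ext_flat]) / 4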
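The export places `full_risks` next to `all_combinations`, which only lines up if both use the same ordering. The small sketch below uses hypothetical 4×4 matrices (2 LNLs per side instead of 6) to check the index arithmetic. With the T-stage as the most significant bit, then midline extension, then the ipsilateral and contralateral patterns, the flattened and concatenated matrices put scenario `(t, m, ipsi, contra)` at `t·2N + m·N + ipsi·k + contra`, where `k` is the number of patterns per side and `N = k²`. This relies on the row-major flattening of NumPy's `reshape` and on the assumed digit order of `change_base`; `scenario_index` is an illustrative helper.

```python
import numpy as np

k = 4          # patterns per side for 2 LNLs (6 LNLs give 64)
N = k * k      # entries per flattened matrix (6 LNLs give 4096)
rng = np.random.default_rng(1)

# Hypothetical risk matrices indexed [ipsi pattern, contra pattern], each summing to 1
matrices = {}
for t in (0, 1):           # 0 = early, 1 = late
    for m in (0, 1):       # 0 = no midline extension, 1 = extension
        mat = rng.random((k, k))
        matrices[(t, m)] = mat / mat.sum()

# Same order as in the notebook: early/no ext., early/ext., late/no ext., late/ext.
full = np.hstack([matrices[(t, m)].reshape(-1) for t in (0, 1) for m in (0, 1)]) / 4

def scenario_index(t: int, m: int, ipsi: int, contra: int) -> int:
    """Position of a scenario in `full` (bits: t, m, ipsi pattern, contra pattern)."""
    return t * 2 * N + m * N + ipsi * k + contra

print(full.sum())  # should be 1 up to rounding, since each scenario has weight 1/4
```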

In [40]:
lnls = ['I','II', 'III', 'IV','V', 'VII']
t_stage = []
midline_extension = []
invovlvement_ipsi = []
invovlvement_contra = []
for diagnose_type in all_combinations:
    involved_ipsi = []
    involved_contra = []
    t_stage.append(diagnose_type[0])
    midline_extension.append(diagnose_type[1])
    for lnl_looper, involved_level in enumerate(lnls):
        if diagnose_type[lnl_looper +2] == True:
            involved_ipsi.append(involved_level) 
        if diagnose_type[lnl_looper +8] == True:
            involved_contra.append(involved_level)
    invovlvement_ipsi.append(involved_ipsi)
    invovlvement_contra.append(involved_contra)

In [None]:
data_export = pd.DataFrame({'Percentage of patients': full_risks,
                                'T-stage': t_stage,
                                'Midline Extension': midline_extension,
                                'Involvement Ipsi' : invovlvement_ipsi,
                                'Involvement Contra': invovlvement_contra,
                                'Treated Ipsi':  treated_ipsi,
                                'Treated Contra': treated_contra,
                                'risk': total_risks,
                                'lower bound': flat_lower,
                                'upper bound': flat_upper,
                                'top 3 spared lnls risk': top3_spared,
                                'lnls ranked': lnls_ranked
})
data_export.to_csv('lymph_1_midline_full_table_new_code.csv', sep = ';', index = True)

## 5. Central Tumor Analysis

This section repeats the comprehensive analysis for **central tumors**, i.e. tumors located on the midline. The methodology mirrors Section 4, but `central = True` is passed to the risk and combination functions so that the model's central-tumor component is used.

### Key Differences from the Non-Central Analysis
- Midline extension is not a separate variable for central tumors
- The analysis therefore considers 2^13 = 8,192 combinations (T-stage + 6 LNLs per side)
- Both early and late T-stage are analyzed, as in Section 4

This parallel analysis ensures that the treatment guidelines cover both lateralized (non-central) and central tumor presentations in the DeEsco trial.

Note: For the prevalence calculation, all scenarios are assumed to be equally likely: early and late T-stage here (weight 1/2 each), and early/late T-stage with and without midline extension in Section 4.5 (weight 1/4 each). This is generally not true.

### 5.1 Combination Generation for Central Tumors

For central tumors, we generate all possible diagnostic combinations of **13 binary variables**:
- **T-stage** (position 0): early or late
- **6 ipsilateral LNLs** (positions 1-6): I, II, III, IV, V, VII
- **6 contralateral LNLs** (positions 7-12): I, II, III, IV, V, VII

This creates **2^13 = 8,192 possible diagnostic combinations**, each representing a different clinical presentation for which a treatment recommendation is derived.

In [None]:
combination_array_central = np.zeros((2**13,13))
for i in range(2**13):
    combination_array_central[i] = [
        int(digit) for digit in change_base(i, 2, length=13)
    ]

all_combinations_central = produce_combinations_list(combination_array_central)

In [55]:
import multiprocessing as mp

# Process one chunk of combinations with the central-tumor model
def process_combinations(chunk):
    return analysis_treated_lnls_combinations(
        combinations=chunk, model=model, samples=samples_reduced, threshold=0.10, central=True
    )

# Divide the combinations into at most `num_cores` contiguous chunks
num_cores = max(1, mp.cpu_count() - 2)
chunk_size = -(-len(all_combinations_central) // num_cores)  # ceiling division
chunks = [all_combinations_central[i:i + chunk_size] for i in range(0, len(all_combinations_central), chunk_size)]

# Process the chunks in parallel; pool.map returns the results in chunk order.
# A function defined in a notebook reaches the workers only with the 'fork' start method.
with mp.Pool(num_cores) as pool:
    results = pool.map(process_combinations, chunks)

# Combine the results from all chunks
treated_lnls_no_risk, treated_lnls_all, treatment_array, top3_spared, total_risks, treated_ipsi, treated_contra, sampled_risks_array, lnls_ranked, cis = zip(*results)

# Flatten the results
treated_lnls_no_risk = [item for sublist in treated_lnls_no_risk for item in sublist]
treated_lnls_all = [item for sublist in treated_lnls_all for item in sublist]
treatment_array = np.vstack(treatment_array)
top3_spared = [item for sublist in top3_spared for item in sublist]
total_risks = np.concatenate(total_risks)
treated_ipsi = [item for sublist in treated_ipsi for item in sublist]
treated_contra = [item for sublist in treated_contra for item in sublist]
sampled_risks_array = np.vstack(sampled_risks_array)
lnls_ranked = [item for sublist in lnls_ranked for item in sublist]
cis_lower = []
cis_upper = []
for item in cis:
    cis_lower.append(item[0])
    cis_upper.append(item[1])
flat_lower = [item for sublist in cis_lower for item in sublist]
flat_upper = [item for sublist in cis_upper for item in sublist]

In [56]:
sampled_risks_early, mean_risk_early = risk_sampled(samples_reduced, model, 'early', central = True, given_diagnoses = None) 
sampled_risks_late, mean_risk_late = risk_sampled(samples_reduced, model, 'late', central = True, given_diagnoses = None)

In [58]:
#generate state list
state_list = np.array(np.meshgrid(*[[0, 1]] * 13)).T.reshape(-1, 13)
state_list = state_list[np.lexsort(np.fliplr(state_list).T)]
# Reshape the risk arrays into 1x4096 arrays
mean_risk_early = mean_risk_early.reshape(-1)
mean_risk_late = mean_risk_late.reshape(-1)
#combine them
full_risks = np.hstack([mean_risk_early, mean_risk_late])/2

In [66]:
lnls = ['I','II', 'III', 'IV','V', 'VII']
t_stage = []
invovlvement_ipsi = []
invovlvement_contra = []
for diagnose_type in all_combinations_central:
    involved_ipsi = []
    involved_contra = []
    t_stage.append(diagnose_type[0])
    for lnl_looper, involved_level in enumerate(lnls):
        if diagnose_type[lnl_looper +1] == True:
            involved_ipsi.append(involved_level) 
        if diagnose_type[lnl_looper +7] == True:
            involved_contra.append(involved_level)
    invovlvement_ipsi.append(involved_ipsi)
    invovlvement_contra.append(involved_contra)

In [67]:
len(t_stage)

8192

In [79]:
data_export = pd.DataFrame({'Percentage of patients': full_risks,
                                'T-stage': t_stage,
                                'Involvement Ipsi' : invovlvement_ipsi,
                                'Involvement Contra': invovlvement_contra,
                                'Treated Ipsi':  treated_ipsi,
                                'Treated Contra': treated_contra,
                                'risk': total_risks,
                                'lower bound': flat_lower,
                                'upper bound': flat_upper,
                                'top 3 spared lnls risk': top3_spared,
                                'lnls ranked': lnls_ranked
})
data_export.to_csv('lymph_1_midline_full_table_central_new_code.csv', sep = ';', index = True)

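## 6. Results Export

### Final Treatment Table Export

The completed analysis generates the final clinical decision tables:
- `lymph_1_midline_full_table_new_code.csv` for non-central tumors (Section 4)
- `lymph_1_midline_full_table_central_new_code.csv` for central tumors (Section 5)

Each semicolon-separated CSV file contains:
- **All diagnostic scenarios**: Every possible combination of T-stage, midline extension (non-central tumors only) and LNL involvement
- **Treatment recommendations**: Which LNLs to treat on each side, and hence which can be spared
- **Risk assessments**: The total risk of the spared LNLs with its 95% CI; the treatment decision compares the CI upper bound with the 10% threshold
- **Prevalence**: The model-based weight of each scenario (column `Percentage of patients`)
- **Clinical applicability**: Lookup tables for treatment decisions in the DeEsco trial

These tables serve as the primary clinical reference for treatment planning in the DeEsco trial, covering both non-central and central tumors while maximizing LNL sparing opportunities.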
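Once exported, a table can be queried for a single diagnostic scenario. A minimal sketch with pandas: instead of reading the real file, it writes a small table with synthetic values in the same semicolon-separated layout to memory. `lookup_treatment` is a hypothetical helper. List-valued columns such as `Involvement Ipsi` are stored in the CSV as their string representation (e.g. `['II']`), so they are parsed back with `ast.literal_eval`.

```python
import ast
import io
from typing import Optional

import pandas as pd

def lookup_treatment(table: pd.DataFrame, t_stage: str, ipsi: list, contra: list,
                     midline_extension: Optional[bool] = None) -> pd.DataFrame:
    """Return the rows of an exported table that match one diagnostic scenario."""
    mask = table["T-stage"] == t_stage
    if midline_extension is not None:   # the central-tumor table has no such column
        mask &= table["Midline Extension"] == midline_extension
    mask &= table["Involvement Ipsi"].apply(lambda s: set(ast.literal_eval(s)) == set(ipsi))
    mask &= table["Involvement Contra"].apply(lambda s: set(ast.literal_eval(s)) == set(contra))
    return table[mask]

# Synthetic table written and read back in the export format (sep=';', index=True)
demo = pd.DataFrame({
    "T-stage": ["early", "early", "late"],
    "Midline Extension": [False, False, True],
    "Involvement Ipsi": [["II"], [], ["II"]],
    "Involvement Contra": [[], [], []],
    "Treated Ipsi": [["III", "II"], ["II"], ["III", "II"]],
    "Treated Contra": [[], [], ["II"]],
    "risk": [0.066, 0.050, 0.078],
})
buffer = io.StringIO()
demo.to_csv(buffer, sep=";", index=True)
buffer.seek(0)
table = pd.read_csv(buffer, sep=";", index_col=0)

row = lookup_treatment(table, "early", ["II"], [], midline_extension=False)
print(ast.literal_eval(row["Treated Ipsi"].iloc[0]))  # ['III', 'II']
```

For the real tables, replace the in-memory buffer with the exported file name and keep `sep=";"` and `index_col=0`.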