# Federated CoxPH with Privacy-Enhancing Techniques (PETs)
This notebook explains how to fit a federated Cox proportional hazards (CoxPH) model in Vantage6. This version also computes the cumulative baseline hazard and provides options to apply PETs. The first step is to install the required libraries, if necessary.

In [None]:
# ! pip install -r requirements.txt

## Credentials and login
Provide the correct credentials below to log in to a Federated Learning message broker (server). The following variables need to be configured:
- `server_url`: the URL where the service is running (without port or subfolder specification)
- `server_port`: the port number where the server can be reached
- `server_api`: the subfolder (path) under which the API is served; the default is '/api'
- `username`: username of the researcher on the message broker server
- `password`: password of the researcher on the message broker server
- `organization_key`: if encryption is enabled, the organization key needs to be provided here

In [None]:
from vantage6.client import Client
import json, time

# Connection settings: replace these example values, or load them from a file or the environment
config = {
    'server_url': 'http://host.docker.internal',
    'server_port': 5000,
    'server_api': '/api',
    'username': 'alpha-user',
    'password': 'alpha-password',
    'organization_key': None
}

client = Client(config['server_url'], config['server_port'], config['server_api'], log_level='info')
client.authenticate(username=config['username'], password=config['password'])
client.setup_encryption(config['organization_key'])
client.log_level = 'warn'
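
The configuration above is hard-coded for illustration. A minimal sketch of reading the same keys from environment variables is shown below. The variable names (`V6_SERVER_URL`, `V6_SERVER_PORT`, and so on) and the `load_config` helper are assumptions made for this example and are not defined by Vantage6.

```python
import os


def load_config(env=os.environ):
    """Build the connection settings from environment variables.

    All V6_* variable names are hypothetical; pick names that suit your setup.
    """
    return {
        'server_url': env.get('V6_SERVER_URL', 'http://host.docker.internal'),
        'server_port': int(env.get('V6_SERVER_PORT', '5000')),
        'server_api': env.get('V6_SERVER_API', '/api'),
        'username': env.get('V6_USERNAME'),
        'password': env.get('V6_PASSWORD'),
        # An unset or empty key means encryption is not used
        'organization_key': env.get('V6_ORGANIZATION_KEY') or None,
    }


config = load_config()
```

Keeping the password out of the notebook this way avoids committing credentials together with the analysis.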

## Define the task to execute
Now that we are logged in to the message broker, we can post a request to execute a specific algorithm. In our case, this is fitting a Cox proportional hazards model.

To make this happen, we need to specify some information regarding the algorithm to execute. These are algorithm-specific variables:
- `expl_vars`: The list of column names that represent the input features (predictors) for the model
- `outcome_col`: The column name that represents the event indicator (the outcome to be modelled)
- `time_col`: The column name that represents the time to event (e.g., survival time)
- `organization_ids`: The organizations involved in running the experiment. These are the numeric identifiers of the organizations
- `baseline_hf`: Set to True to include the cumulative baseline hazard function in the results
- `binning`: Set to True to enable binning of event times for added privacy
- `bin_type`: The type of binning to be used. Options are 'Fixed' or 'Quantile'
- `differential_privacy`: Set to True to enable differential privacy
- `privacy_target`: Set the target of differential privacy. Options are "predictors" or "aggregates"
- `sensitivity`: Set the sensitivity of the Cox model coefficients for differential privacy
- `epsilon`: Set the epsilon value for differential privacy. The lower the value, the more noise is added
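
To build intuition for how `sensitivity` and `epsilon` interact, the sketch below applies the textbook Laplace mechanism, in which the noise scale equals `sensitivity / epsilon`. This is an assumption for illustration only: the algorithm image may use a different mechanism, and the helper names `laplace_scale` and `add_laplace_noise` as well as the example values are hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism: a smaller epsilon gives more noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def add_laplace_noise(values, sensitivity, epsilon, seed=None):
    """Return a copy of `values` with independent Laplace noise added to each entry."""
    scale = laplace_scale(sensitivity, epsilon)
    rng = random.Random(seed)
    noisy = []
    for value in values:
        # The difference of two i.i.d. exponential draws is Laplace(0, scale)
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(value + noise)
    return noisy


# Illustrative values only, not output of the federated algorithm
example_values = [0.5, -0.2]
print(laplace_scale(sensitivity=1, epsilon=0.3))
print(add_laplace_noise(example_values, sensitivity=1, epsilon=0.3, seed=42))
```

With `sensitivity=1` and `epsilon=0.3`, as used in this notebook, the noise scale is roughly 3.3, which is large relative to typical Cox coefficients. It is worth comparing results across a few epsilon values before drawing conclusions.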

In [None]:
# Determine the first collaboration identifier
collaboration_id = client.collaboration.list()['data'][0]['id']

# Determine the organizations involved in this collaboration
organization_ids = [
    org['id']
    for org in client.organization.list(collaboration=collaboration_id)["data"]
]

# Define algorithm input
input_ = {
    "method": "central",
    "kwargs": {
        "time_col": "Survival.time",  # string value
        "outcome_col": "deadstatus.event",  # string value
        # List of columns to be used as predictors; names must match the dataset exactly
        "expl_vars": ['clinical.T.Stage', 'Clinical.N.Stage'],
        "baseline_hf": True,  # Boolean value
        "binning": True,  # Boolean value
        "bin_type": "Fixed",  # "Fixed" or "Quantile"
        "differential_privacy": True,  # Boolean value
        "privacy_target": "aggregates",  # "predictors" or "aggregates"
        "sensitivity": 1,  # Numeric value
        "epsilon": 0.3,  # Numeric value
        "organization_ids": organization_ids,  # list of organizations
    }
}
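
The difference between the `"Fixed"` and `"Quantile"` values of `bin_type` can be sketched as follows. This is a generic illustration of equal-width versus equal-frequency binning, not the implementation used inside the algorithm image. The function names, the `n_bins` parameter, and the example times are all hypothetical.

```python
import bisect
import statistics


def fixed_bin_edges(times, n_bins):
    """Equal-width bins spanning the observed range of event times."""
    lo, hi = min(times), max(times)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def quantile_bin_edges(times, n_bins):
    """Bins holding roughly the same number of observations each."""
    cuts = statistics.quantiles(times, n=n_bins, method="inclusive")
    return [min(times)] + cuts + [max(times)]


def bin_times(times, edges):
    """Replace every event time by the upper edge of the bin it falls into."""
    binned = []
    for t in times:
        idx = bisect.bisect_left(edges, t)
        idx = min(max(idx, 1), len(edges) - 1)  # clamp to a valid upper edge
        binned.append(edges[idx])
    return binned


# Made-up event times, for illustration only
times = [1, 2, 3, 4, 10]
print(bin_times(times, fixed_bin_edges(times, n_bins=3)))
print(bin_times(times, quantile_bin_edges(times, n_bins=2)))
```

Fixed-width bins are simple, but a skewed distribution can leave some bins nearly empty. Quantile bins adapt to the data, so each reported time is shared by a similar number of subjects.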

### Execute the task
Now we can execute the task itself. Note the `image` parameter: it refers to the Docker image that is pulled (=downloaded) and executed at every data station (=node). The previously defined input is passed in the `input_` parameter.

In [None]:
task = client.task.create(
    collaboration=collaboration_id, 
    organizations=[client.organization.get()['id']],  # List your organization IDs
    name='CoxPH - with PETs',  # Give your task a specific name
    image='ghcr.io/maastrichtu-cds/v6-coxph:2.0.0',  # Specify the desired algorithm Docker image version
    description='CoxPH with PETs and baseline hazard calculation',  # Describe the task
    databases=[{'label': 'default'}],  # Use your database label
    input_=input_
)

## Download and interpret results
In the following steps, we will download the results, and interpret them both in numerical and visual form. The following cell will download the results.

In [None]:
# Block until the task has finished
result = client.wait_for_results(task['id'])

# Retrieve the result records for this task and print each one
result_info = client.result.from_task(task_id=task['id'])
print(f'Results: {result_info}')
for data in result_info['data']:
    print(data['result'])
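
Depending on the server and client versions, each `result` field may arrive as a JSON string rather than as a decoded Python object. A small helper that handles both cases is sketched below. The name `decode_result` is hypothetical, and the sketch makes no assumption about the keys the algorithm returns.

```python
import json


def decode_result(raw):
    """Return a Python object from a payload that may be JSON text or already decoded."""
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8")
    if isinstance(raw, str):
        return json.loads(raw)
    return raw  # already a dict, list, or None


# Usage with the records retrieved above (left commented out, as it needs a finished task):
# decoded = [decode_result(data['result']) for data in result_info['data']]
```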