# Federated Logistic Regression training
This notebook explains how to run a federated logistic regression. The first step is to install the required libraries, if they are not installed yet.

In [None]:
! pip install vantage6-client==4.4.1

## Credentials and login
Provide the correct credentials below to log in to the federated learning message broker (server). The following variables need to be configured:
- `server_url`: the url where the service is running (without port or subfolder specification)
- `server_port`: the port number where the server can be reached
- `server_api`: the subfolder URL specification needed, default is '/api'
- `username`: username of the researcher on the message broker server
- `password`: password of the researcher on the message broker server
- `organization_key`: if encryption is enabled, the organization key needs to be provided here
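
As an alternative to hard-coding these values in the notebook, they could be read from environment variables. The sketch below is only an illustration: the variable names (`V6_SERVER_URL`, `V6_USERNAME`, and so on) and the `load_config` helper are hypothetical and not defined by vantage6. The defaults mirror the example values used in this notebook.

```python
import os


def load_config(env=os.environ):
    """Build the configuration dictionary from environment variables.

    All variable names here are assumptions made for this sketch.
    """
    return {
        # Fall back to the example values of this notebook when unset
        "server_url": env.get("V6_SERVER_URL", "http://host.docker.internal"),
        "server_port": int(env.get("V6_SERVER_PORT", "5000")),
        "server_api": env.get("V6_SERVER_API", "/api"),
        # Credentials have no default: a missing variable raises KeyError
        "username": env["V6_USERNAME"],
        "password": env["V6_PASSWORD"],
        # An unset or empty key is mapped to None
        "organization_key": env.get("V6_ORGANIZATION_KEY") or None,
    }
```

Calling `load_config()` returns a dictionary with the same keys as the `config` dictionary used in the login cell, so it could be used in its place.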

In [None]:
from vantage6.client import Client
import json
import time

# Configuration settings (hard-coded here; in practice, load them from a file or the environment)
config = {
    'server_url': 'http://host.docker.internal',
    'server_port': 5000,
    'server_api': '/api',
    'username': 'alpha-user',
    'password': 'alpha-password',
    'organization_key': None
}

client = Client(config['server_url'], config['server_port'], config['server_api'], log_level='info')
client.authenticate(username=config['username'], password=config['password'])
client.setup_encryption(config['organization_key'])
client.log_level = 'warn'

## Define the task to execute
Now that we are logged into the message broker, we can post a request to execute a specific algorithm. In our case, this is a federated learning run that fits a logistic regression.

To make this happen, we need to specify some information regarding the algorithm to execute. These are algorithm-specific variables:
- `predictors`: the list of column names which represent the input features for the logistic regression
- `outcome`: the column name which represents the label to be predicted
- `classes`: options for the output labels to be predicted
- `max_iter`: the maximum number of iterations for the algorithm to execute
- `delta`: the threshold on the loss difference between iterations, used as a second stopping criterion
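
To illustrate how `max_iter` and `delta` interact as stopping criteria, here is a minimal, self-contained sketch of an iterative federated fit in plain Python. It is not the implementation inside the Docker image: the update rule (one local gradient step per round, averaged by sample count), the learning rate, and all function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, rows, lr=0.5):
    """One gradient step on a node's private rows.

    Returns the updated weights, the mean log-loss measured at the
    incoming weights, and the number of rows.
    """
    grad = [0.0] * len(weights)
    loss = 0.0
    for x, y in rows:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    n = len(rows)
    new_weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return new_weights, loss / n, n


def federated_fit(nodes, n_features, max_iter=15, delta=0.01):
    """Average node updates until the loss change drops below delta
    or max_iter rounds have been performed."""
    weights = [0.0] * n_features
    prev_loss = None
    for iteration in range(1, max_iter + 1):
        updates = [local_update(weights, rows) for rows in nodes]
        total = sum(n for _, _, n in updates)
        # Sample-weighted average of the node weights and losses
        weights = [sum(w[j] * n for w, _, n in updates) / total
                   for j in range(n_features)]
        loss = sum(l * n for _, l, n in updates) / total
        if prev_loss is not None and abs(prev_loss - loss) < delta:
            break  # converged: loss difference below threshold
        prev_loss = loss
    return weights, loss, iteration


# Two toy "nodes"; each row is ([bias, feature], label)
nodes = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 1)],
    [([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 1.0], 1)],
]
weights, loss, iterations = federated_fit(nodes, n_features=2)
print(iterations, loss, weights)
```

A smaller `delta` makes the convergence check stricter, so the loop tends to run until `max_iter` is reached; a larger `delta` stops it earlier.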

In [None]:
# Determine the first collaboration identifier
collaboration_id = client.collaboration.list()['data'][0]['id']

# Determine the organizations involved in this collaboration
organization_ids = [
    org['id']
    for org in client.organization.list(collaboration=collaboration_id)['data']
]

# Define algorithm input
input_ = {
    'method': 'master',
    'master': True,
    'kwargs': {
        'org_ids': organization_ids,   # organizations that run the algorithm
        'predictors': ['clinical.T.Stage', 'Clinical.N.Stage'],  # columns to be used as predictors
        'outcome': 'deadstatus.event', # column to be used as outcome
        'classes': [0, 1],             # classes to be predicted
        'max_iter': 15,                # maximum number of iterations to perform
        'delta': 0.01,                 # threshold loss difference for convergence
    }
}

### Execute the task
Now we can execute the task itself. Note the `image` parameter: it refers to a Docker image that is pulled (downloaded) and executed at every data station (node) involved. The previously defined input is passed in the `input_` parameter.

In [None]:
# Send the task to the central server
task = client.task.create(
    collaboration=collaboration_id,
    organizations=[client.organization.get()['id']],
    name='v6-logistic-regression-py',
    image='ghcr.io/maastrichtu-cds/v6-logistic-regression-py:latest',
    description='run logistic regression',
    databases=[{'label': 'default'}],  # Use your database label
    input_=input_,
)

## Download and interpret results
In the following steps, we will download the results and present the parameters of the logistic regression.

In [None]:
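# Poll the task status every 5 seconds until the task has completed.
# Note: this loop does not end by itself if the task never reaches 'completed'.
task_info = client.task.get(task['id'], include_results=True)
while task_info['status'] != 'completed':
    print("No result (yet) to be retrieved, waiting")
    time.sleep(5)
    task_info = client.task.get(task['id'], include_results=True)

# Take the first result of the task and decode its JSON payload
result = json.loads(client.result.from_task(task['id'])["data"][0]["result"])
print(json.dumps(result, indent=2))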
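
The polling loop above waits indefinitely when a task never reaches the `completed` status. A more defensive variant could add a timeout and a set of failure statuses. The helper below is a generic sketch, not part of the vantage6 client: `wait_until_finished` and its parameters are hypothetical names, and the caller must supply the status strings to treat as failures.

```python
import time


def wait_until_finished(fetch_status, finished=("completed",), failed=(),
                        timeout=600.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until it returns a finished status.

    Raises RuntimeError on a status listed in `failed` and TimeoutError
    when `timeout` seconds have passed without reaching a finished status.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in finished:
            return status
        if status in failed:
            raise RuntimeError(f"task ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(interval)
```

In this notebook, `fetch_status` could be something like `lambda: client.task.get(task['id'])['status']`, with `failed` set to whichever failure status strings your server version reports.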