## AMLD Workshop Notebook 3

In this notebook, you will combine what you learned in notebooks 1 and 2 to train a machine learning model in a federated setting.

### Setup

In [None]:
import pandas as pd
import uuid

from tuneinsight import Diapason, models
from tuneinsight.computations import HybridFL
import tuneinsight.utils.time_tools as time

from ti_models.factories.ti_trainer_factory import get_premade_ti_trainer

#### Create clients

In [None]:
from amld_setup import *

%env TI_USERNAME=
%env TI_PASSWORD=

In [None]:
client = Diapason.from_env()

In [None]:
client.healthcheck()

#### Create and share the project

Projects are the main unit of collaboration in Tune Insight. In a project, you define the computation to run in a federated setting and set the datasource used by your instance. The other participants likewise choose the data used by their own instances. Once everything is set up, the federated analysis can run on data from all instances without centralizing it.

In [None]:
PROJECT_NAME = f"project-3-{uuid.uuid4()}"

project = client.new_project(name=PROJECT_NAME, clear_if_exists=True)
project.share()

## Load the dataset

In [None]:
data_path = "data/data_0.csv"

In [None]:
df = pd.read_csv(data_path)
df

In [None]:
# Feel free to play around with the data if you want.

### Split the dataset into training and validation sets

In [None]:
df["split"] = "train"
df.loc[df.sample(frac=0.2, random_state=42).index, "split"] = "val"
df
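The split above marks every row as `train` and then relabels a random 20% sample as `val`. The pattern can be checked on a small synthetic frame, as in the minimal sketch below. The `toy` DataFrame and its columns are made up for illustration and are not the workshop data.

```python
import pandas as pd

# Synthetic stand-in for the workshop data (hypothetical columns).
toy = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Same pattern as above: default every row to "train",
# then relabel a reproducible 20% sample as "val".
toy["split"] = "train"
val_index = toy.sample(frac=0.2, random_state=42).index
toy.loc[val_index, "split"] = "val"

# Count the rows in each split to confirm the proportions.
split_counts = toy["split"].value_counts().to_dict()
print(split_counts)
```

Fixing `random_state` makes the split reproducible across reruns of the notebook, so the validation rows stay the same each time the datasource is re-uploaded.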

Upload the data to the instance and set it on the project.

In [None]:
datasource = client.new_datasource(dataframe=df, name=f"patient_data_{uuid.uuid4()}", clear_if_exists=True)

In [None]:
project.set_datasource(datasource)

### Task Definition

In this notebook, we will define a machine learning task with the `ti-models` library, similar to what you did in notebook 2.

In [None]:
# This is a preset trainer, but any PyTorch model can be converted to the same format.
trainer = get_premade_ti_trainer("logreg", input_dim=4, n_classes=2)

In [None]:
print(trainer)
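To give an intuition for what `input_dim=4` and `n_classes=2` mean for a logistic regression, here is a minimal pure-Python sketch of the forward pass: one linear score per class, followed by a softmax. This is an illustration only, not the `ti-models` implementation, and the function name `logreg_forward` and the sample input are hypothetical.

```python
import math

INPUT_DIM = 4   # number of input features per row
N_CLASSES = 2   # number of output classes

def logreg_forward(features, weights, biases):
    """Return class probabilities: softmax over one linear score per class."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    # Subtract the maximum score for numerical stability before exponentiating.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# An untrained model: all parameters start at zero.
weights = [[0.0] * INPUT_DIM for _ in range(N_CLASSES)]
biases = [0.0] * N_CLASSES
n_parameters = INPUT_DIM * N_CLASSES + N_CLASSES

probabilities = logreg_forward([0.5, -1.2, 3.0, 0.1], weights, biases)
print(n_parameters, probabilities)
```

With all parameters at zero, both classes receive the same probability. Training moves the weights so that the probabilities separate the classes.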

### Set up the parameters of the model

In [None]:
params = models.HybridFLGenericParams(
    fl_rounds=2,
    num_workers=2,
    strategy=models.aggregation_strategy.AggregationStrategy.CONSTANT,
)

ml_params = models.HybridFLMachineLearningParams(
    local_epochs=1,
    batch_size=64,
    learning_rate=0.02,
    momentum=0.9
)

Define the computation (Hybrid Federated Learning) on the project.

In [None]:
hybrid_fl = HybridFL(
    project=project,
    task_id="logreg",
    trainer=trainer,
    params=params,
    spec_params=ml_params,
)
hybrid_fl.max_timeout = 300 * 60 * time.SECOND

Request authorization for the project.

In [None]:
project.request_authorization()

Here you can get a quick summary of the project:

In [None]:
project.display_overview()

In [None]:
project.display_datasources()

## Run the training

This runs federated learning across a network of four instances (three contributing nodes and one coordinating root node).
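As a rough mental model of what happens during such a run, the sketch below simulates federated averaging in pure Python: each participant trains locally on its own data, and only the resulting weights are averaged. It is a simplified illustration, not Tune Insight's implementation. The unweighted average, the toy data, and all function names are assumptions, and momentum and batching are left out.

```python
import math
import random

def local_update(weights, data, learning_rate=0.02, local_epochs=1):
    """One participant's local training: plain SGD on the logistic loss."""
    w = list(weights)
    for _ in range(local_epochs):
        for features, label in data:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            w = [wi - learning_rate * (pred - label) * xi
                 for wi, xi in zip(w, features)]
    return w

def federated_average(local_weights):
    """Unweighted mean of the participants' weights (an assumed strategy)."""
    n = len(local_weights)
    return [sum(ws) / n for ws in zip(*local_weights)]

def make_dataset(rng, n_rows):
    """Toy data: one feature plus a constant bias term; label is 1 if the feature is positive."""
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(-1.0, 1.0)
        rows.append(([x, 1.0], 1.0 if x > 0 else 0.0))
    return rows

rng = random.Random(0)
participants = [make_dataset(rng, 50) for _ in range(3)]  # three contributing nodes

global_weights = [0.0, 0.0]
for fl_round in range(2):  # mirrors fl_rounds=2
    # Each participant starts from the current global weights and trains locally.
    local_weights = [local_update(global_weights, data) for data in participants]
    # The coordinator only sees weights, never the raw rows.
    global_weights = federated_average(local_weights)

print(global_weights)
```

The raw rows never leave the `participants` list entries in this sketch. Only the weight vectors are exchanged, which is the property that lets the analysis run without centralizing the data.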

Note: you might encounter the following error:

> `InternalError: error happened internally: unexpected error: please contact the instance's administrator`

If so, there is an issue with the instance and the test stops here. Thank you for participating!

In [None]:
hybrid_fl.run()

### Retrieve the result history

In [None]:
results = project.fetch_results()[-1]

In [None]:
import json
json.loads(results[1].history.metrics[-1])