## AMLD Workshop Notebook 1

In this notebook, you will learn how to connect to a Tune Insight instance from Chorus, create a project, load data, and run a basic analysis.

### Setup

In [None]:
import pandas as pd
import uuid

from tuneinsight import Diapason  # The Tune Insight client

#### Connecting to the instance

In [None]:
from amld_setup import *  # This defines variables to connect to the instance

# Enter your credentials here:
# Do NOT put quotes ("") around your username and password.
%env TI_USERNAME=
%env TI_PASSWORD=
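Before creating the client, it can help to confirm that both variables were actually set and did not pick up stray quotes. The snippet below is a minimal sketch in plain Python. The helper `check_credentials` is hypothetical (it is not part of the Tune Insight client) and only inspects the `TI_USERNAME` and `TI_PASSWORD` environment variables used above.

```python
import os


def check_credentials(names=("TI_USERNAME", "TI_PASSWORD")):
    """Return a list of human-readable problems found in the given environment variables.

    Hypothetical helper for this workshop; not part of the Tune Insight client.
    """
    problems = []
    for name in names:
        value = os.environ.get(name, "")
        if not value:
            # Unset and empty variables are treated the same way.
            problems.append(f"{name} is not set")
        elif value[0] in "\"'" or value[-1] in "\"'":
            # A leading or trailing quote usually means the value was typed with quotes.
            problems.append(f"{name} appears to be wrapped in quotes")
    return problems


for problem in check_credentials():
    print("Warning:", problem)
```

If nothing is printed, both variables are set and look plausible. The check says nothing about whether the credentials are valid for the instance.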

In [None]:
client = Diapason.from_env()

In [None]:
client.healthcheck()

#### Create and share the project

Projects are the main unit of collaboration in Tune Insight. In a project, you define the computation to run in a federated setting and set the datasource used by your instance. Each other participant likewise chooses the data used by their own instance. Once everything is set up, the federated analysis can be run using data from all instances, without centralizing the data.

In [None]:
PROJECT_NAME = f"project-1-{uuid.uuid4()}"

project = client.new_project(name=PROJECT_NAME, clear_if_exists=True)
project.share()

## Load the dataset

The data that we will use in this workshop was provided to you as `data_0.csv`. We'll go into the data in more detail in notebook 2. In this notebook, we will simply load the data and use it for a very simple analysis to make sure everything works.

In [None]:
data_path = "data/data_0.csv"

In [None]:
# You can explore the data here.

df = pd.read_csv(data_path)

df
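A few pandas calls are usually enough for a first look at a table like this one. The sketch below uses a small made-up table in place of `data_0.csv` (the column names `age`, `bmi`, and `site` are assumptions for illustration only); the same calls can be applied to `df`. The local column means also give a rough reference point for the federated averages computed later, although those cover every participant's data and are not expected to match exactly.

```python
import io

import pandas as pd

# Made-up stand-in for the workshop data; the column names are illustrative only.
csv_text = "age,bmi,site\n30,20.0,a\n40,25.0,b\n50,30.0,a\n"
sample = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns, and the inferred type of each column.
print(sample.shape)
print(sample.dtypes)

# Local mean of every numeric column; non-numeric columns are skipped.
local_means = sample.mean(numeric_only=True)
print(local_means)
```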

Upload the data to the instance and set it on the project.

In [None]:
datasource = client.new_csv_datasource(csv=data_path, name=f"patient_data_{uuid.uuid4()}", clear_if_exists=True)

In [None]:
project.set_datasource(datasource)

### Task definition

In this notebook, we will run a very simple computation: computing the average value of each column.

In [None]:
from tuneinsight.computations import Aggregation

In [None]:
task = Aggregation(
    project=project,
    columns=list(df.columns),
    average=True,
)
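To build intuition for why an average can be computed without pooling the rows, note that a mean only needs each participant's sum and row count. The sketch below is plain arithmetic on two made-up sites. It is an illustration of the idea only and does not reflect how the Tune Insight platform implements the computation or protects intermediate values. The names `local_summary` and `combine` are hypothetical.

```python
def local_summary(values):
    """Computed at each site: only the sum and the row count leave the site."""
    return sum(values), len(values)


def combine(summaries):
    """Combine per-site (sum, count) pairs into a global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


# Made-up values held by two separate participants.
site_a = [1.0, 2.0, 3.0]
site_b = [10.0, 20.0]

federated_mean = combine([local_summary(site_a), local_summary(site_b)])

# For comparison only: the mean one would get by centralizing the rows.
pooled = site_a + site_b
pooled_mean = sum(pooled) / len(pooled)

print(federated_mean, pooled_mean)
```

Both values agree, even though the combining step never sees the individual rows.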

### Authorization and summary

Because we are working in a federated setting, each participant needs to authorize the project before it can be run. In this workshop, we have configured the instances to automatically approve aggregation and machine learning projects. In the real world, the project would instead go through manual review at each participating instance.

In [None]:
project.request_authorization()

We can now show a summary of the shared project's state.

In [None]:
project.display_overview()

In [None]:
project.display_datasources()

## Run the computation

The project is properly configured and each participant has loaded their data and authorized the project. We can now run a federated aggregation.

In [None]:
results = task.run()

In [None]:
results