## AMLD Workshop Notebook 2

In this notebook, you will explore your dataset and build a simple predictive model. You will work entirely locally, trying to build the best possible model from the small amount of data you have access to.

## 🎯 OBJECTIVE

Explore a clinical dataset and train a baseline predictive model using only locally available data, establishing an initial performance reference.

<div style="background-color: rgba(182, 255, 18, 0.15); border-left: 5px solid #B6FF12; padding: 15px; margin: 10px 0;">
<h3> Real-world context</h3>
<p>Clinical AI development typically begins with local experimentation. Researchers work with limited datasets from their own institution to understand data characteristics, identify quality issues, and test modeling approaches. This phase is often exploratory and iterative, focused as much on learning the data as on achieving performance.

While this approach enables rapid prototyping, it also has clear limits: small sample sizes, institutional bias, and reduced generalizability. Still, local training remains a critical first step, because it defines the modeling strategy that may later be scaled to broader, multi-institution collaboration.

CHORUS supports this workflow by allowing researchers to experiment freely within a secure environment, ensuring that all data access, computation, and outputs remain governed and traceable, without resorting to unsecured local machines or ad-hoc setups.
</p>
</div>

## ⚙️ IMPLEMENTATION

In [None]:
import pandas as pd
import torch
from torch.utils.data import Dataset
from ti_models.factories.ti_trainer_factory import get_premade_ti_trainer

## Explore the dataset

The provided data consists of records of patients with prostate cancer. (A longer description will be given at the workshop.)


Feel free to play around with the data!

In [None]:
df = pd.read_csv("data/data_0.csv")

In [None]:
df

In [None]:
df = df.dropna()

In [None]:
# This is up to you!
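
As one possible starting point for exploration, the sketch below counts missing values, checks how many rows `dropna()` keeps, and looks at the class balance of the target. It runs on a small hypothetical toy frame: the column names `target`, `feat_a`, and `feat_b` are placeholders, not the real columns of `data/data_0.csv`. The only assumption carried over from this notebook is that the first column holds the target.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the workshop data; the real column
# names are not shown in this notebook, so these are placeholders.
toy = pd.DataFrame({
    "target": [0, 1, 0, 1, 1, 0],
    "feat_a": [1.2, 0.7, np.nan, 2.1, 1.9, 0.4],
    "feat_b": [60.0, 72.0, 65.0, np.nan, 70.0, 58.0],
})

# Missing values per column, to see what dropna() will cost.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows kept after dropping every row that has at least one missing value.
clean = toy.dropna()
print(f"kept {len(clean)} of {len(toy)} rows")

# Class balance of the target (assumed to be the first column) after cleaning.
class_balance = clean.iloc[:, 0].value_counts(normalize=True)
print(class_balance)

# Summary statistics of the feature columns.
print(clean.iloc[:, 1:].describe())
```

On a small dataset, it is worth comparing the row count before and after `dropna()`: dropping every incomplete row can remove a large share of the data and shift the class balance.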

## Train a logistic regression

Here, we will train a logistic regression using the `ti-models` library. This library is designed for writing models that can be trained both locally and in a federated setting.
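
For reference, the textbook form of multinomial logistic regression is sketched below. This is the standard formulation, not a description of how `ti-models` implements it internally: details such as regularization or the optimizer are not shown in this notebook. Here $d$ corresponds to `input_dim`, $K$ to `n_classes`, and $N$ is the number of training rows.

$$
p(y = k \mid x) = \frac{\exp(w_k^\top x + b_k)}{\sum_{j=1}^{K} \exp(w_j^\top x + b_j)}, \qquad x \in \mathbb{R}^{d}, \quad k = 1, \dots, K
$$

$$
\mathcal{L}(W, b) = -\frac{1}{N} \sum_{i=1}^{N} \log p(y_i \mid x_i)
$$

Training minimizes the cross-entropy loss $\mathcal{L}$ over the weights $w_k$ and biases $b_k$.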

In [None]:
trainer = get_premade_ti_trainer("logreg", input_dim=4, n_classes=2)

In [None]:
class PandasDataset(Dataset):
    """Dataset from dataframe."""

    def __init__(self, df, input_dim):
        """
        Args:
            df: input dataframe. The first column should be the targets
            input_dim: number of dimensions considered as input features
        """
        self.df = df
        self.input_dim = input_dim

    def __len__(self):
        return len(self.df)

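    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # Column 0 holds the target; columns 1..input_dim hold the features.
        features = row.iloc[1:self.input_dim + 1].to_numpy(dtype="float32")
        target = row.iloc[0]
        return torch.from_numpy(features), torch.tensor(target, dtype=torch.long)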

trainer.train(PandasDataset(df, input_dim=4))

In [None]:
trainer.metrics.get_results()

Feel free to write your own model.

In [None]:
# This is up to you!
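
If you would like a starting point for your own model, here is a minimal sketch of a logistic regression written in plain PyTorch, evaluated on a held-out split. It does not use `ti-models`, so it will not run in a federated setting as written. The data is synthetic and purely illustrative (`X`, `y`, and `true_w` are hypothetical stand-ins); in practice you would build the tensors from your own dataframe.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 200 samples, 4 features, binary labels.
# (Hypothetical; replace with tensors built from your own dataframe.)
X = torch.randn(200, 4)
true_w = torch.tensor([1.5, -2.0, 0.5, 0.0])
y = (X @ true_w > 0).long()

# Hold out the last 20% of samples to measure performance on unseen data.
n_train = 160
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Logistic regression is a single linear layer; the softmax is folded
# into CrossEntropyLoss, which expects raw logits.
model = nn.Linear(in_features=4, out_features=2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

# Full-batch gradient descent on the training split.
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

# Evaluate on the held-out split without tracking gradients.
with torch.no_grad():
    predictions = model(X_test).argmax(dim=1)
    test_accuracy = (predictions == y_test).float().mean().item()

print(f"held-out accuracy: {test_accuracy:.2f}")
```

Holding out part of the data matters most when the dataset is small: a score measured on the training rows alone tends to overstate how well the model will do on new patients.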