### Training the Cell Type Detector

The `train_cell_type_detector` function trains a machine learning model that distinguishes PD-L1-positive cells, PD-L1-negative cells, and other cells in patches of Whole Slide Images (WSIs), as part of the APEDIA project.

#### Parameters
- **data_df_path** (*str*): Path to the DataFrame file describing the dataset. The DataFrame must provide columns with the paths to the patch images, mask images, and instance segmentation images, plus a `cv_split` column that assigns each row an integer fold index (e.g., 0 to 4). Note that the example dictionary below passes this path under the key `df_path`, whereas the CLI example uses `--data_df_path`.
- **output_dir** (*str*): Directory path where training outputs, such as trained model weights and logs, will be stored.
- **cv_split** (*int*, default: 0): Index of the cross-validation split used to separate the training data from the validation data.
- **mask_col** (*str*, default: 'path_seg_one_match'): Column name in the DataFrame containing paths to mask images.
- **patch_col** (*str*, default: 'path_patch_png'): Column name in the DataFrame containing paths to patch images.
- **instance_seg_col** (*str*, default: 'path_exact_one_match'): Column name for paths to instance segmentation images.
- **bs** (*int*, default: 8): Batch size for training.
- **epochs** (*int*, default: 25): Number of training epochs.
- **lr** (*float*, default: 0.001): Learning rate for the optimizer during training.
- **aug_mult** (*float*, default: 1.0): Multiplier to adjust the intensity and diversity of data augmentation applied during training.
- **encoder_name** (*str*, default: 'timm-efficientnet-b5'): Name of the encoder used in the U-Net architecture.
- **num_classes** (*int*, default: 4): Number of distinct cell types (classes) the model should learn to identify.
- **device** (*str*, default: 'cuda'): The computing device ('cuda' or 'cpu') on which training will be performed.
- **do_elastic** (*bool*, default: False): Whether to include elastic transformations as part of data augmentation.
- **do_not_analyze** (*bool*, default: False): If set to True, skips the analysis phase after training.
- **label_smoothing** (*float*, default: 0.0): Degree of label smoothing applied, aiding in regularization.
- **weight_decay** (*float*, default: 0.01): Weight decay factor for the optimizer to counteract overfitting.
- **disable_tqdm** (*bool*, default: False): If True, disables progress bars during training.
- **class_weights** (*list of float*): Per-class weights applied in the loss calculation to counter class imbalance (the example below uses `[0.0001, 1, 1, 1]`).
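
One plausible reading of `cv_split` is that rows whose `cv_split` value equals the chosen index form the validation set, while all remaining rows form the training set. The source does not confirm this convention, so the sketch below is an assumption; `split_by_fold` is a hypothetical helper, and plain dictionaries stand in for DataFrame rows.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_fold(rows: List[Row], cv_split: int) -> Tuple[List[Row], List[Row]]:
    """Assumed convention: rows tagged with `cv_split` are held out for validation."""
    train = [r for r in rows if r["cv_split"] != cv_split]
    val = [r for r in rows if r["cv_split"] == cv_split]
    return train, val


# Ten toy rows spread over five folds (0 to 4), mirroring the `cv_split` column.
rows = [{"path_patch_png": f"patch_{i}.png", "cv_split": i % 5} for i in range(10)]
train_rows, val_rows = split_by_fold(rows, cv_split=0)
print(len(train_rows), len(val_rows))
```

With pandas, the same selection is a boolean mask such as `df[df["cv_split"] == 0]` for the held-out rows.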

#### Usage
To train the cell type detector, define the parameters in the `train_cell_type_detection_params` dictionary as shown below, and make sure the mandatory entries for the DataFrame path and `output_dir` point to valid paths before starting the training. The example dictionary also sets `num_input_channels` and `num_workers`, which are not described in the list above.

#### CLI
Training of the cell type detector can also be started from the command line, for example:
```
apedia train_cell_type_detector \
    --data_df_path "/path/to/your/dataframe.feather" \
    --output_dir "/path/to/your/output/directory" \
    --epochs 25 \
    --cv_split 0 \
    --mask_col "path_seg_one_match" \
    --patch_col "path_patch_png" \
    --instance_seg_col "path_exact_one_match"
```
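
When the same settings are kept in Python, the command line can be assembled programmatically. The following is a minimal sketch, assuming only the `--name value` flags that appear in the example above; `build_train_command` is a hypothetical helper, and since the source does not show how boolean options are passed on the command line, they are left out here.

```python
import shlex
from typing import Dict, List


def build_train_command(options: Dict[str, object]) -> List[str]:
    """Turn option names and values into `--name value` argument pairs."""
    argv = ["apedia", "train_cell_type_detector"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_train_command({
    "data_df_path": "/path/to/your/dataframe.feather",
    "output_dir": "/path/to/your/output/directory",
    "epochs": 25,
    "cv_split": 0,
})
# shlex.join quotes each argument where needed so the line can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list can also be handed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting altogether.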

## Import the `train_cell_type_detector` function

In [None]:
from apedia.train_cell_type_detector import train_cell_type_detector

## Set the parameters

In [None]:
# Train the cell type detector
# Set the parameters

train_cell_type_detection_params = {
    "df_path": "/path/to/your/dataset/example_patch_df.feather",
    "output_dir": "/path/to/your/output/directory",
    # The other parameters can be adjusted as needed
    "cv_split": 0,
    "mask_col": 'path_seg_one_match',
    "patch_col": 'path_patch_png',
    "instance_seg_col": 'path_exact_one_match',
    "bs": 8,
    "epochs": 25,
    "lr": 0.001,
    "aug_mult": 1.0,
    "encoder_name": 'timm-efficientnet-b5',
    "num_input_channels": 3,
    "num_classes": 4,
    "num_workers": 4,
    "device": 'cuda',
    "do_elastic": False,
    "do_not_analyze": False,
    "label_smoothing": 0.0,
    "weight_decay": 0.01,
    "disable_tqdm": False,
    "class_weights": [0.0001, 1, 1, 1]
}
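
Before starting a long training run, it can help to sanity-check the dictionary. The sketch below is illustrative only: `check_params` is a hypothetical helper that is not part of APEDIA, the key `df_path` follows the dictionary above, and the rule that `class_weights` needs one entry per class is an assumption based on the example values.

```python
from typing import Any, Dict, List


def check_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    # Entries that must be present and non-empty.
    for key in ("df_path", "output_dir"):
        if not params.get(key):
            problems.append(f"missing required entry: {key}")
    # Assumed rule: one class weight per class.
    weights = params.get("class_weights")
    num_classes = params.get("num_classes")
    if weights is not None and num_classes is not None and len(weights) != num_classes:
        problems.append(
            f"class_weights has {len(weights)} entries but num_classes is {num_classes}"
        )
    return problems


example = {
    "df_path": "/path/to/your/dataset/example_patch_df.feather",
    "output_dir": "/path/to/your/output/directory",
    "num_classes": 4,
    "class_weights": [0.0001, 1, 1, 1],
}
print(check_params(example))                        # consistent settings
print(check_params({**example, "num_classes": 3}))  # mismatch is reported
```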

## Run the training

In [None]:
# Optional: uncomment to override the paths and shorten training for a quick test run
# train_cell_type_detection_params['df_path'] = "/home/fabian/projects/phd/APEDIA/data/example_df_segmentation.feather"
# train_cell_type_detection_params['output_dir'] = '/home/fabian/projects/phd/APEDIA/data/outputs'
# train_cell_type_detection_params['epochs'] = 2

In [None]:
train_cell_type_detector(train_cell_type_detection_params)