# Development 

## Image to Value(s)

### Primary Focus: Explainable Object Counting in Microscopy Images
### Application: Explainable Virus Capsid Quantification
#### Challenge: Deep Learning as Black Box
#### Required Labels: Location Labels


TL;DR 🧬✨ We developed a regression model to quantify the maturation states ("naked", "budding", "enveloped") of human cytomegalovirus (HCMV) capsids during secondary envelopment, the final envelopment process. Researchers can adapt the provided notebook to analyze their own EM data. 

![Teaser](./images/Teaser.png)




# Setup and Imports

*By executing the cell below, we import external libraries, which simplify the implementation of the notebook.*

In [None]:
# auto reload imports
%load_ext autoreload
%autoreload 2

# imports from the template 
from deepEM.Utils import create_text_widget, print_info, find_file
from deepEM.Logger import Logger
from deepEM.ModelTuner import ModelTuner

# custom implementation
from src.ModelTrainer import ModelTrainer


# import all required libraries
from pathlib import Path 
import os



# 1. Data

## 1.1. Data Acquisition
As an example, we use existing data from [1]. However, we strongly encourage researchers to provide their own datasets tailored to their laboratory's specific needs. This enables training models optimized for the lab's sample preparation techniques and microscope attributes, such as detector configurations. A different dataset can also target a different application, such as counting other types of objects in EM images.

----
*[1] Shaga Devan, Kavitha, et al. "Improved automatic detection of herpesvirus secondary envelopment stages in electron microscopy by augmenting training data with synthetic labelled images generated by a generative adversarial network." Cellular Microbiology 23.2 (2021): e13280.*

## 1.2. Data Annotation

This notebook requires annotations of object locations along with their corresponding classes.

The example dataset includes annotations for the locations of virus capsids within an image, categorized by their envelopment stages, $C = [\text{naked}, \text{budding}, \text{enveloped}]$.

To adapt the application of this notebook, EM researchers can provide their own dataset. 
In the following, we outline an exemplary workflow for generating annotation labels for the specific task of predicting the number of virus capsids and their corresponding envelopment stages.

For data annotation, we recommend <a href="https://www.cvat.ai/">CVAT</a> (Computer Vision Annotation Tool). For further instructions, we refer to our <a href="">Getting Started</a> page.



### 1.2.1. Create a New Task

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-1.png" alt="Create New Task1" width="500">
  <br>
</p>
When starting CVAT, you first need to create a new task. You can give it a name, add annotation types and upload your data.

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-2.png" alt="Create New Task2" width="500">
  <br>
</p>

Next, click the `Add label` button and name the label after the class you want to annotate, in our case one of *"naked", "budding", "enveloped"*. Choose `Points` as the annotation type. You should also pick a color, as this simplifies the annotation process. To add another class, click `Continue`. Once you have added all necessary classes, click `Cancel`. 


<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-3.png" alt="Create New Task3" width="500">
  <br>
</p>

Now you can upload the data you wish to annotate. Finally, click `Submit & Open` to continue with the annotation of the uploaded data. 

### 1.2.2. Annotation

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-4.png" alt="Annotate1" width="500">
  <br>
</p>

This opens the following view. Click on the job (in this view, `Job #1150022`) to start annotating. 

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-5.png" alt="Annotate2" width="500">
  <br>
</p>

To annotate your data, select the `Draw new points` tool and choose the label you wish to annotate from the dropdown menu. Then click `Shape` to annotate individual virus capsids with that label. (`Track` places annotations across multiple frames, which is helpful when annotating videos, tomograms, or similar data.) Use the arrows at the top center to navigate through your data and see your annotation progress. 

### 1.2.3. Save Annotation

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-6.png" alt="Save1" width="500">
  <br>
</p>

Once you are done annotating data, click on the `Menu` and select `Export job dataset`. 

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-7.png" alt="Save2" width="500">
  <br>
</p>

During export select the `CVAT for Images 1.1` format and give the folder a name. It will prepare the dataset for download. If you have the annotated images stored locally, there is no need to enable `Save Images`. 

<p align="center">
  <img src="https://viscom-ulm.github.io/DeepEM/static/images/explainable-virus-quantification/CVAT-8.png" alt="Save3" width="500">
  <br>
</p>

In the horizontal menu bar at the top, go to `Requests`. It shows an `Export Annotations` request. Click the three dots to the right of this request to download your exported annotations. This downloads a .zip file containing the annotation file in .xml format. The file should be named `annotations.xml`.
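
To sanity-check an export before training, it can help to count the annotated points per image and class. The sketch below is not part of the notebook; it assumes the `CVAT for Images 1.1` layout in which each `<image name="...">` element holds `<points label="..." points="x,y;x,y">` children. The helper name `count_points` and the toy XML are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CLASSES = ("naked", "budding", "enveloped")

def count_points(root):
    """Return {image_name: Counter({label: n_points})} for a CVAT <annotations> root."""
    counts = {}
    for image in root.iter("image"):
        per_class = Counter()
        for pts in image.iter("points"):
            # Assumption: one <points> element may hold several "x,y" pairs separated by ';'.
            coords = [p for p in pts.get("points", "").split(";") if p.strip()]
            per_class[pts.get("label")] += len(coords)
        counts[image.get("name")] = per_class
    return counts

# Hand-written toy example (not taken from the example dataset).
example_xml = """<annotations>
  <image id="0" name="image_001.tif" width="512" height="512">
    <points label="naked" points="10.5,20.0;30.0,40.0"/>
    <points label="enveloped" points="100.0,120.0"/>
  </image>
  <image id="1" name="image_002.tif" width="512" height="512"/>
</annotations>"""

counts = count_points(ET.fromstring(example_xml))
for name, per_class in counts.items():
    print(name, {c: per_class[c] for c in CLASSES})
```

For a real export, `count_points(ET.parse("annotations.xml").getroot())` would process the downloaded file in the same way.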



## 1.3. Data Preprocessing

The provided notebook requires all images to be single 2D images in `.tif` format. If your data is in a different format, you can convert it with [ImageJ](https://imagej.net/ij/). 
ImageJ is an open-source, Java-based image processing software that runs on multiple platforms and offers a wide range of features, including automation with macros, extensive community support, and a large library of tools and plugins.

The following example shows how to import data and export it in the required format. 



<p align="center">
  <img src="./images/ImageJ-1.png" alt="ImageJ1" width="500">
  <br>
</p>

ImageJ can `import` a large number of file formats commonly used in EM. 

<p align="center">
  <img src="./images/ImageJ-2.png" alt="ImageJ2" width="500">
  <br>
</p>

The `Save As..` functionality saves the imported files either as an `Image Sequence` in .tif format or as single `.tif` files.

## 1.4. Data Structuring

The provided notebook requires that all training, validation and testing data is placed within a single folder. Splitting the data into train, test and validation will be done during runtime. 

Additionally, the generated `annotations.xml` should be put in the same folder as the .tif images.

You can check the exemplary data provided at `data/tem-herpes/` for clarification.

An example with five images and the corresponding annotation is shown below: 

```
/data/tem-herpes/
├── image_001.tif
├── image_002.tif
├── image_003.tif
├── image_004.tif
├── image_005.tif
└── annotations.xml

```
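
Before entering the data path, the folder layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the notebook; the helper names `check_data_folder` and `report` are hypothetical, and the demo uses a temporary folder with empty placeholder files instead of real images.

```python
import tempfile
from pathlib import Path

def check_data_folder(folder):
    """Return (sorted .tif file names, whether annotations.xml exists)."""
    folder = Path(folder)
    # Note: this pattern only matches the lowercase ".tif" extension.
    tif_names = sorted(p.name for p in folder.glob("*.tif"))
    has_annotations = (folder / "annotations.xml").is_file()
    return tif_names, has_annotations

def report(folder):
    """Print a short summary and return True if the layout looks complete."""
    tif_names, has_annotations = check_data_folder(folder)
    print(f"{len(tif_names)} .tif image(s) found in {folder}")
    print("annotations.xml:", "found" if has_annotations else "MISSING")
    return bool(tif_names) and has_annotations

# Demo on a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("image_001.tif", "image_002.tif", "annotations.xml"):
        (Path(tmp) / name).touch()
    demo_ok = report(tmp)
```

On your own data, `report("data/tem-herpes")` would run the same check.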

> *Execute the cell below to show a text form. Within this text form you need to define the path to your training data (e.g. `data/tem-herpes/`).*


In [None]:
data_widget = create_text_widget("Data Path:","./data/tem-herpes","Enter the path to your data folder.")
display(*data_widget)

> *Execute the cell below to set and check the provided Data Path from the text form above.*

In [None]:
data_path = data_widget[0].value
print(f"[INFO]::Data path was set to: {data_path}")

# 2. Model Training

## 2.1. Setup Logging

By executing the cell below, we set up the logging directory for the hyperparameter search, model training and evaluation. 
The logger creates a folder at `./logs/<datafoldername>-<currentdatetime>/`. 

For each training run there will be one subfolder within the log directory. Training runs of hyperparameter sweeps are called `Sweep_<idx>`, while the subfolder of the final training run is called `TrainingRun`. During evaluation there will be one more subfolder created called `Evaluate`. 

Each subfolder contains the following logs: 

- the used hyperparameters, (`<log-path>/<subfolder>/hyperparameters.json`)
- the best performing model checkpoint based on the validation loss (`<log-path>/<subfolder>/checkpoints/best_model.pth`)
- the last model checkpoint (`<log-path>/<subfolder>/checkpoints/latest_model.pth`)
- visualizations of training and validation curves (`<log-path>/<subfolder>/plots/training_curves.png`)
- qualitative visualization of sampled validation images (`<log-path>/<subfolder>/samples/`)
- results on test metrics (`<log-path>/<subfolder>/test_results.txt`)
- qualitative visualization of sampled test images (`<log-path>/<subfolder>/samples/`)


Sample visualizations for this use case include the model input, validation labels, predictions, and a GradCAM overlay. GradCAM can be interpreted as a heat map that gives an intuition about "where the model looks" to make its prediction.
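
To give an intuition for how such a heat map is formed, the sketch below shows the core GradCAM combination step on tiny hand-written arrays: each feature map is weighted by the spatial mean of its gradient, the weighted maps are summed, and negative values are clipped. It is a plain-Python illustration under these assumptions, not the implementation used by the notebook; in practice, the activations and gradients come from a convolutional layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Combine feature maps and their gradients into a normalized heat map.

    Both inputs are nested lists of shape [channels][height][width].
    """
    n_channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of that channel's gradient.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for k in range(n_channels):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weights[k] * activations[k][i][j]
    # ReLU: keep only regions that support the prediction.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Scale to [0, 1] for display (skipped if the map is all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy input: channel 0 supports the prediction, channel 1 opposes it.
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heat_map = grad_cam(activations, gradients)
print(heat_map)
```

Only the top-left pixel stays "hot" here, because the second channel has a negative weight and its contribution is removed by the clipping step.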

> *Execute the cell below to set up the logger.*


In [None]:
logger = Logger(data_path)

## 2.2. Hyperparameter Tuning

Hyperparameters in deep learning are configurable settings that determine how a model is trained. Unlike model parameters, which are learned from data, hyperparameters are set before training. During hyperparameter tuning, a predefined range of hyperparameters is explored by training the model multiple times with different configurations. The performance of each configuration is evaluated on a validation set, and the best-performing hyperparameters are selected for further training.

Since hyperparameter search requires multiple training runs, it can be highly time- and resource-intensive. To mitigate this, training runs are often limited to fewer epochs or a smaller subset of the training data.

Our playground provides an automated hyperparameter search based on grid search: the model is trained once for every possible combination of the selected hyperparameter values. The search space (the set of hyperparameters to explore) is initially defined by deep learning (DL) experts. DL experts also provide explanations for each parameter, so that curious electron microscopy (EM) specialists can adjust the search space to their specific needs. Our logging also reports estimates of the remaining time for a single sweep run as well as for the full sweep. However, these estimates can be inaccurate, especially at the beginning of the sweep.
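
The idea behind grid search can be sketched in a few lines. The example below is an illustration only and does not reproduce the `ModelTuner` implementation; the search space, the parameter names, and the `toy_validation_loss` function (a stand-in for a full training run) are hypothetical.

```python
from itertools import product

def grid(search_space):
    """Yield one config dict per combination of hyperparameter values."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))

def grid_search(search_space, evaluate):
    """Evaluate every combination and return the one with the lowest loss."""
    best_config, best_loss = None, float("inf")
    for config in grid(search_space):
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

# Hypothetical search space: 3 x 2 = 6 training runs.
search_space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [8, 16]}

def toy_validation_loss(config):
    # Stand-in for "train the model and return its validation loss".
    return abs(config["learning_rate"] - 0.01) + abs(config["batch_size"] - 16) / 100

best_config, best_loss = grid_search(search_space, toy_validation_loss)
n_runs = sum(1 for _ in grid(search_space))
print(f"{n_runs} runs, best config: {best_config}")
```

The number of runs is the product of the number of values per hyperparameter, which is why adding values to the search space quickly increases the sweep time.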

A hyperparameter search is not strictly required, but the choice of hyperparameters can significantly impact a deep neural network's performance. To ensure optimal results, we strongly recommend performing a hyperparameter search, especially when training with your own annotated data. Note that, if you interrupt the hyperparameter search, it will not have finished the search over the full search space, leading to suboptimal results.

To adjust the search space, adapt the form below. Separate the values of each sweep hyperparameter with `,` and write floating-point values with a decimal point, like `0.1`. 
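
As an illustration of this input format, the sketch below turns a comma-separated string into a list of typed values. How the form actually parses its input is not shown in this notebook, so the helper `parse_values` is a hypothetical example of the convention, not the library's parser.

```python
def parse_values(text):
    """Split a comma-separated string into ints, floats, or plain strings."""
    values = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        try:
            values.append(int(token))
        except ValueError:
            try:
                values.append(float(token))
            except ValueError:
                values.append(token)  # keep non-numeric options as text
    return values

print(parse_values("0.1, 0.01, 0.001"))
print(parse_values("8,16,"))
```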

> *Execute the cell below to show the form of the hyperparameter search space.*

In [None]:
# hyperparameter search
model_trainer = ModelTrainer(data_path, logger)

hyperparameter_tuner = ModelTuner(model_trainer, data_path, logger)
form = hyperparameter_tuner.create_hyperparameter_widgets()
display(form)


> **[Optional]** *If you wish to run a hyperparameter sweep based on the parameters above, please execute the cell below.*

In [None]:
best_config = None
hyperparameter_tuner.update_config(form)
print("Sweep config:")
for k in hyperparameter_tuner.config['hyperparameter'].keys():
    print(f"\t{k}: {hyperparameter_tuner.config['hyperparameter'][k]['values']} (default: {hyperparameter_tuner.config['hyperparameter'][k]['default']})")
best_config = hyperparameter_tuner.tune()

Our automatic hyperparameter tuning is able to find the best performing set of hyperparameters based on the setting shown above. 

However, some scenarios require additional flexibility. Therefore, you can change these hyperparameters manually in the following cells. 

**WARNING** This setting is for advanced users only. Please only change parameters here if you know what you are doing. 

> *Execute the cell below to show and possibly adapt the currently chosen hyperparameters.*

In [None]:
form = hyperparameter_tuner.edit_hyperparameters()
display(form)

> *Execute the cell below to set the hyperparameters for your training run based on the form above. Note that you should only change the values in the form above, if you know what you are doing.*

In [None]:
best_config = hyperparameter_tuner.update_hyperparameters(form)
print_info(f"Will use following hyperparameters for future training: {best_config}")

## 2.3. Training and Validation

In this section we train and validate the model based on the provided data and hyperparameters resulting from the previous sweep.

Training in deep learning is the process where a model learns patterns from labeled data (the one provided at the top of this notebook) by optimizing its parameters through backpropagation. 
Validation involves using a separate dataset to evaluate the model's performance during training, ensuring it generalizes well to unseen data.
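
The notebook splits the data folder into training, validation, and test sets at runtime. The exact procedure and ratios are not shown here, so the sketch below only illustrates the principle with assumed fractions (15% validation, 15% test) and a fixed seed for reproducibility; `split_files` is a hypothetical helper.

```python
import random

def split_files(names, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle file names reproducibly and split them into train/val/test."""
    names = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_test = round(len(names) * test_fraction)
    n_val = round(len(names) * val_fraction)
    test = names[:n_test]
    val = names[n_test:n_test + n_val]
    train = names[n_test + n_val:]
    return train, val, test

names = [f"image_{i:03d}.tif" for i in range(1, 21)]
train, val, test = split_files(names)
print(len(train), "train /", len(val), "validation /", len(test), "test")
```

Keeping the three sets disjoint is what makes the validation and test results meaningful: no image used for training is used to measure performance.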

Training and validating a model can take a long time (from minutes to hours, days, or even weeks), depending on the model, the training procedure, and the dataset. Our logging module provides approximate training times, which you can see below the executed training cell or in the `log.txt` file within the current log directory (i.e. `<log-dir>/TrainingRun/`). However, these times can be inaccurate, especially at the beginning of training. 
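
Why early estimates tend to be unreliable can be seen with a simple extrapolation. The sketch below assumes the remaining time is estimated as the average time per finished epoch multiplied by the number of remaining epochs; this is an assumption for illustration, not necessarily how the logging module computes it, and the epoch durations are made up.

```python
def estimate_remaining(elapsed_seconds, steps_done, steps_total):
    """Extrapolate the remaining time from the average time per finished step."""
    if steps_done <= 0:
        return None  # nothing to extrapolate from yet
    seconds_per_step = elapsed_seconds / steps_done
    return seconds_per_step * (steps_total - steps_done)

# Toy durations: the first epoch is slow (e.g. due to one-time setup work).
epoch_durations = [30.0] + [10.0] * 9
elapsed = 0.0
estimates = []
for done, duration in enumerate(epoch_durations, start=1):
    elapsed += duration
    estimates.append(estimate_remaining(elapsed, done, len(epoch_durations)))

print("estimate after epoch 1:", estimates[0], "s (true remaining: 90.0 s)")
print("estimate after epoch 5:", estimates[4], "s (true remaining: 50.0 s)")
```

With only one slow epoch observed, the estimate is three times too high; it approaches the true value as more epochs are averaged.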

If no sweep was conducted (not recommended!), the default parameters defined by the DL expert will be used. 

If a training run was cancelled, it can be resumed from a previous checkpoint. To do so, provide a model checkpoint in the text form below. You can find these checkpoints inside the run's logging directory (`<log-dir>/TrainingRun/checkpoints/latest_model.pth`). If you do not wish to resume training, leave the text form below empty.

> *Execute the cell below to show a text form. If you wish to resume training, you need to provide a model path from a previous checkpoint.*

In [None]:
resume_widget = create_text_widget("Resume Training:","","If you wish to resume an earlier training, enter the path to the latest_model.pth file here.")
display(*resume_widget)

> *Execute the cell below to prepare the model and data for training.*

In [None]:
resume_training = resume_widget[0].value
if(resume_training):
    resume_training = Path(resume_training)
    if(resume_training.is_dir()):
        resume_training = Path(os.path.join(resume_training,"latest_model.pth"))
    if(not resume_training.is_file()):
        logger.log_error(f"Could not find resume path at {resume_training}. Will start training from scratch.")
        resume_training = None
else:
    resume_training = None
logger.init("TrainingRun")
model_trainer.resume_from_checkpoint = resume_training
model_trainer.prepare(best_config)


> **[Optional]** *If you wish to train a model, execute the cell below. Note that this can take a while.* 

In [None]:
model_trainer.fit()

# 3. Model Evaluation
Evaluation in deep learning means testing a trained model on a separate, unseen dataset to measure its final performance. It provides an unbiased assessment of the model's ability to generalize to new data.

## 3.1. Choose Model 

In this section we choose the model for testing. 
If you leave the `Model Path` empty in the text form below, it will use the last model trained.
Otherwise, you can provide the path to the model's best weights (`<log-path>/TrainingRun/checkpoints/best_model.pth`) or the path to a directory that contains `best_model.pth` (like `<log-path>/TrainingRun/`). This also allows you to test shared or previously trained models.

> *Execute the cell below to show the text form for selecting a model for testing.*

In [None]:
model_widget = create_text_widget("Model Path:","","If you wish to test a specific model, you can define the path to its checkpoint here. (For example: logs/tem-herpes_2025-02-03_11-42-43/TrainingRun/checkpoints)")
display(*model_widget)

## 3.2. Evaluate
Finally, we evaluate the provided model on the test set. We report the following metrics: 

#### **False Positive (FP)**
- **Definition**: A **false positive** occurs when the model **detects an object that is not actually there**.  
- **Example**: Imagine an electron microscopy image where the model highlights a structure as a capsid, but in reality, it is just noise. This would be a **false positive** because the detection is incorrect.  
- **Explanation**: The **FP value** represents the **average number of incorrectly detected objects** per image patch—cases where the model mistakenly finds objects that do not exist.  

---

#### **False Negative (FN)**
- **Definition**: A **false negative** occurs when the model **fails to detect an actual object** that is present in the image.  
- **Example**: Suppose there is a capsid in the image, but the model **does not recognize it**. This is a **false negative** because an actual object was missed.  
- **Explanation**: The **FN value** represents the **average number of real objects that the model failed to detect** per image patch.  

---

#### **True Positive (TP)**
- **Definition**: A **true positive** occurs when the model correctly detects an object that is actually present.  
- **Example**: If there is a capsid in the image and the model correctly detects it, this is a **true positive**.  
- **Explanation**: The **TP value** represents the **average number of correctly detected objects** per image patch—cases where the model successfully identifies real objects.  

---

#### **Mean Absolute Error (MAE)**
- **Definition**: The **Mean Absolute Error (MAE)** is a metric that measures the **average absolute difference** between the predicted and actual number of objects per image patch.  
- **Explanation**:  
  - MAE **measures how far off** the model’s predictions are from the true values on average.  
  - A **lower MAE** indicates better performance, as it means the predicted object count is closer to the actual count.  
- **Example**:  
  - If an image patch contains **5 objects** but the model predicts **7**, the absolute error is **|5 - 7| = 2**.  
  - The MAE across all patches provides an overall error measure.  

---

#### **Summary**
| Metric  | Meaning | Interpretation |
|---------|---------|---------|
| **FP**  | Wrongly detected objects (model detects something that isn't there). | Lower is better |
| **FN**  | Missed objects (model fails to detect an actual object). | Lower is better | 
| **TP**  | Correctly detected objects (model successfully identifies real objects). | Higher is better |
| **MAE** | Average absolute difference between predicted and actual object counts. | Lower is better |

Each metric is averaged over all input patches. (For example, TP=2 means that the model correctly predicts 2 virus capsids per input image patch on average.) We report each metric summed over all classes as well as for each class individually.
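
The metrics above can be illustrated on per-patch object counts. The sketch below makes a simplifying assumption that is not confirmed by this notebook: for each class, the overlap of predicted and true count is treated as TP, any surplus as FP, and any shortfall as FN. The "all" entry is taken as the sum over classes. The function name `count_metrics` and the toy counts are hypothetical.

```python
CLASSES = ("naked", "budding", "enveloped")

def count_metrics(true_patches, pred_patches, classes=CLASSES):
    """Average TP, FP, FN and MAE per patch, per class and summed over classes."""
    n = len(true_patches)
    result = {c: {"TP": 0.0, "FP": 0.0, "FN": 0.0, "MAE": 0.0} for c in classes}
    for true, pred in zip(true_patches, pred_patches):
        for c in classes:
            t, p = true[c], pred[c]
            result[c]["TP"] += min(t, p) / n       # counted and really present
            result[c]["FP"] += max(p - t, 0) / n   # counted but not present
            result[c]["FN"] += max(t - p, 0) / n   # present but not counted
            result[c]["MAE"] += abs(t - p) / n     # absolute count error
    result["all"] = {m: sum(result[c][m] for c in classes)
                     for m in ("TP", "FP", "FN", "MAE")}
    return result

# Two toy patches; the first one repeats the "5 objects, 7 predicted" example.
true_patches = [{"naked": 5, "budding": 1, "enveloped": 0},
                {"naked": 2, "budding": 0, "enveloped": 3}]
pred_patches = [{"naked": 7, "budding": 1, "enveloped": 0},
                {"naked": 1, "budding": 0, "enveloped": 3}]

metrics = count_metrics(true_patches, pred_patches)
for name, values in metrics.items():
    print(name, values)
```

Under this assumption, FP and FN always add up to the absolute count error, so the MAE summarizes both kinds of mistakes in one number.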

We further visualize the input image, the model prediction and use GradCAM to highlight areas where the model was "looking" to make its predictions. 
These visualizations are saved to `<log-path>/TrainingRun/samples/test_*`. 

> **[Optional]** *If you wish to evaluate a model (recommended), execute the cell below.*

In [None]:
from pathlib import Path 
start_evaluation = False
eval_model = model_widget[0].value
if(eval_model):
    eval_model = Path(eval_model)
    if(eval_model.is_dir()):
        eval_model = Path(find_file(eval_model, "best_model.pth")) 
    if(not eval_model.is_file()):
        logger.log_error(f"Could not find model at {eval_model}. Make sure to train a model before evaluation.")
        eval_model = None
    else: 
        start_evaluation = True
else:
    recent_logs = logger.get_most_recent_logs()
    eval_model = ""
    for dataname, log_path in recent_logs.items():
        if(dataname == Path(data_path).stem):
            eval_model = Path(log_path+"/TrainingRun/checkpoints/best_model.pth")
            if(not eval_model.is_file()):
                logger.log_error(f"Could not find a trained model at {eval_model}. Make sure you fully train a model first before evaluating.")
            else:
                logger.log_info(f"Found most recent log at {eval_model}")
                start_evaluation = True
        else: 
            continue
    if(not start_evaluation):
        logger.log_error("Could not find a trained model. Make sure you train a model first before evaluating.")
      
if(start_evaluation):
    model_trainer.load_checkpoint(eval_model)
    model_trainer.test()      

 



Once you have trained and evaluated your model on the labeled data, you can apply it to unseen, unlabeled data to support your EM data analysis. To do so, open `2_Inference.ipynb` and follow the steps provided.

Additionally, you can share your training code and model weights with colleagues. An easy way to do this is described on our website under ["Getting Started - 5. Collaboration"](https://viscom-ulm.github.io/DeepEM/getting-started.html).