# Getting Started with PhenoBench 

In this short tutorial, we will show you the main parts of the dataset. We assume that you have downloaded the PhenoBench dataset from the [dataset website](https://www.phenobench.org/dataset.html) and placed it in your home directory at `~/data/PhenoBench`; this folder contains the train, val, and test splits of the dataset.

## Installation

To install the devkit, run:

```shell
pip install phenobench
```

This command installs the required dependencies and provides a simple dataset class, which you can also use to implement your own PyTorch, TensorFlow, or other custom dataloader, or as the basis for a fully custom processing pipeline.

## Loading the data

In the PhenoBench devkit, we have the dataset class called `PhenoBench`, which provides the functionality to read the data from a given `root` directory. Here, `root` corresponds to the aforementioned dataset directory, i.e., `~/data/PhenoBench`, which contains the following directories:

<pre>
PhenoBench
├── test
│   └── images
├── train
│   ├── images
│   ├── leaf_instances
│   ├── leaf_visibility
│   ├── plant_instances
│   ├── plant_visibility
│   └── semantics
└── val
    ├── images
    ├── leaf_instances
    ├── leaf_visibility
    ├── plant_instances
    ├── plant_visibility
    └── semantics
</pre>
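If you are unsure whether the download was extracted correctly, a quick check against the tree above can save some debugging. The following is a minimal sketch using only the Python standard library; the helper name `missing_directories` is our own and not part of the devkit.

```python
from pathlib import Path

# Expected sub-directories per split, taken from the directory tree above.
EXPECTED_LAYOUT = {
    "train": ["images", "leaf_instances", "leaf_visibility",
              "plant_instances", "plant_visibility", "semantics"],
    "val": ["images", "leaf_instances", "leaf_visibility",
            "plant_instances", "plant_visibility", "semantics"],
    "test": ["images"],
}


def missing_directories(root):
    """Return the expected sub-directories (as 'split/folder') that do not exist under root."""
    root = Path(root).expanduser()  # allow paths such as "~/data/PhenoBench"
    missing = []
    for split, folders in EXPECTED_LAYOUT.items():
        for folder in folders:
            if not (root / split / folder).is_dir():
                missing.append(f"{split}/{folder}")
    return missing


if __name__ == "__main__":
    problems = missing_directories("~/data/PhenoBench")
    print("Layout OK" if not problems else f"Missing: {problems}")
```

An empty list means that every directory shown in the tree is present; it does not check the files inside them.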

Note that only the train and val splits contain annotations for the following `target_types`:

- **semantics**: pixel-wise semantic class ids as integers, where `1` corresponds to crop and `2` to weed, with additional partial labels: class id `3` for partial crops and `4` for partial weeds. Since we annotate complete plants, we can determine whether less than 50% of a plant's area lies within a cropped image and mark such regions as partially visible.
- **plant_instances**: pixel-wise instance ids as integers. With `make_unique_ids = True`, the dataloader remaps all instances to the range `[1, N + M]`, where `N` is the number of crop plants and `M` is the number of weed instances. Without `make_unique_ids`, the ids correspond to the arbitrary instance ids of the global image.
- **leaf_instances**: pixel-wise leaf instance ids as integers. With `make_unique_ids = True`, the dataloader remaps all leaf instances to the range `[1, L]`, where `L` is the number of leaf instances in the image. Without `make_unique_ids`, the ids correspond to the arbitrary instance ids of the global image.
- **plant_visibility**: pixel-wise visibility mask with values in the range `[0, 1]`, encoding the fraction of each plant's pixels that is visible in the image.
- **leaf_visibility**: pixel-wise visibility mask with values in the range `[0, 1]`, encoding the fraction of each leaf's pixels that is visible in the image.
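To make the label conventions above concrete, here is a minimal NumPy sketch of two common preprocessing steps: folding the partial classes (`3`, `4`) back into crop and weed, and remapping arbitrary instance ids to consecutive ids. Both helpers are hypothetical and only illustrate the idea; `relabel_consecutive` is not the devkit's implementation of `make_unique_ids`. We also assume that id `0` denotes background (soil in the semantics, "no instance" in the instance maps).

```python
import numpy as np

# Class ids as documented above; id 0 is ASSUMED to be background (soil).
CROP, WEED, PARTIAL_CROP, PARTIAL_WEED = 1, 2, 3, 4


def merge_partial_labels(semantics):
    """Fold partial crops/weeds into their full classes (3 -> 1, 4 -> 2) without modifying the input."""
    merged = np.array(semantics, copy=True)
    merged[merged == PARTIAL_CROP] = CROP
    merged[merged == PARTIAL_WEED] = WEED
    return merged


def relabel_consecutive(instances):
    """Map arbitrary non-zero instance ids to 1..K in ascending order of the original id.

    Id 0 is ASSUMED to mean "no instance" and is kept as 0.
    """
    instances = np.asarray(instances)
    ids = np.unique(instances)  # sorted unique ids
    ids = ids[ids != 0]
    out = np.zeros_like(instances)
    for new_id, old_id in enumerate(ids, start=1):
        out[instances == old_id] = new_id
    return out
```

For example, an instance map containing the ids `{0, 5, 17, 42}` becomes one containing `{0, 1, 2, 3}`.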

Additionally, there are "meta-targets" that are generated from the pixel-wise instance annotations:

- **plant_bboxes**: plant bounding boxes, represented as a list of entries, each containing `{"label", "center", "width", "height", "corner"}`, where `center` is the center of the bounding box, `width` and `height` are its width and height, and `corner` is the coordinate of its upper-left corner.
- **leaf_bboxes**: leaf bounding boxes in the same format as `plant_bboxes`.
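Conceptually, these boxes are derived from the instance masks. The sketch below shows one way to compute boxes in this format with NumPy; the helper `bboxes_from_instances`, the `(x, y)` = (column, row) convention, the pixel-inclusive width and height, and the choice of the most frequent semantic class as `label` are all our assumptions and may differ from what the devkit produces.

```python
import numpy as np


def bboxes_from_instances(instances, semantics):
    """Derive one box dict per non-zero instance id (hypothetical helper).

    Assumed conventions: points are (x, y) with x = column and y = row,
    width/height count pixels inclusively, id 0 is background.
    """
    instances = np.asarray(instances)
    semantics = np.asarray(semantics)
    boxes = []
    for inst_id in np.unique(instances):
        if inst_id == 0:  # skip background
            continue
        rows, cols = np.nonzero(instances == inst_id)
        x_min, x_max = cols.min(), cols.max()
        y_min, y_max = rows.min(), rows.max()
        width = int(x_max - x_min + 1)
        height = int(y_max - y_min + 1)
        # Label: most frequent semantic class among the instance's pixels.
        label = int(np.bincount(semantics[rows, cols]).argmax())
        boxes.append({
            "label": label,
            "center": (float(x_min + width / 2), float(y_min + height / 2)),
            "width": width,
            "height": height,
            "corner": (int(x_min), int(y_min)),  # upper-left corner
        })
    return boxes
```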

Enough theory. Let's load some example images and annotations from the dataset.

```python
import os
from pprint import pprint

from phenobench import PhenoBench

# Expand "~" explicitly so the path works regardless of how the loader handles it.
root = os.path.expanduser("~/data/PhenoBench")

# Load the data with all target types discussed above.
train_data = PhenoBench(
    root,
    target_types=["semantics", "plant_instances", "leaf_instances", "plant_bboxes", "leaf_bboxes"],
)

print(
    f"PhenoBench ({train_data.split} split) contains {len(train_data)} images. "
    f"We loaded the following targets: {train_data.target_types}."
)
print("The first entry contains the following fields:")
pprint([f"{k} -> {type(v)}" for k, v in train_data[0].items()])
```
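Since each entry is a dictionary, the dataset object is easy to wrap for batched training. As a framework-agnostic sketch (the generator `iterate_batches` is hypothetical, not part of the devkit), the following stacks selected fields of consecutive samples into NumPy arrays; it assumes that all images and targets within a batch share the same shape.

```python
import numpy as np


def iterate_batches(dataset, keys, batch_size=4):
    """Yield dicts mapping each key to a stacked array of shape (batch, ...).

    `dataset` only needs __len__ and __getitem__ returning a dict; the last
    batch may be smaller than batch_size.
    """
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        samples = [dataset[i] for i in range(start, stop)]
        # np.asarray also converts PIL images to arrays.
        yield {key: np.stack([np.asarray(s[key]) for s in samples]) for key in keys}
```

For example, `iterate_batches(train_data, ["image", "semantics"], batch_size=8)` yields one dictionary per batch. The bounding-box fields are lists of variable length and cannot be stacked this way, so leave them out of `keys`.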


## Visualizing the Data

Besides the generic dataset loader, we also provide visualization functions that work in combination with matplotlib. These drawing functions can be used to visualize the dataset, but also to render predictions.

All drawing functions are available in the module `phenobench.visualization`; below, we show some examples generated from the dataset itself.

```python
import os

import matplotlib.pyplot as plt
import numpy as np

from phenobench.visualization import draw_bboxes, draw_instances, draw_semantics

n_samples = 4  # number of random images (columns)
n_rows = 4  # semantics, plant instances, leaf instances, plant bounding boxes
fig, axes = plt.subplots(ncols=n_samples, nrows=n_rows, figsize=(3 * n_samples, 3 * n_rows))

# Sample distinct indices (replace=False) so no image is shown twice.
indexes = np.random.choice(len(train_data), n_samples, replace=False)

for ax in axes.flat:
    ax.set_axis_off()

for col, idx in enumerate(indexes):
    sample = train_data[idx]  # fetch each sample once instead of once per field
    axes[0, col].set_title(os.path.splitext(sample["image_name"])[0])

    draw_semantics(axes[0, col], sample["image"], sample["semantics"], alpha=0.5)
    draw_instances(axes[1, col], sample["image"], sample["plant_instances"], alpha=0.5)
    draw_instances(axes[2, col], sample["image"], sample["leaf_instances"], alpha=0.5)
    draw_bboxes(axes[3, col], sample["image"], sample["plant_bboxes"])

plt.tight_layout()
plt.show()
```
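If you want to render predictions without matplotlib axes, for example to write overlays to disk, a plain NumPy blend is often enough. The sketch below is a hypothetical alternative to `draw_semantics`, not its implementation; the palette colors are our own arbitrary choice, and we assume an RGB image with values in `[0, 255]` and class ids `0` to `4` as described above, with `0` as background.

```python
import numpy as np

# Hypothetical RGB palette indexed by class id; the devkit's colors may differ.
PALETTE = np.array([
    [0, 0, 0],      # 0: background (assumed)
    [0, 200, 0],    # 1: crop
    [200, 0, 0],    # 2: weed
    [0, 100, 0],    # 3: partial crop
    [100, 0, 0],    # 4: partial weed
], dtype=np.uint8)


def overlay_semantics(image, semantics, alpha=0.5, palette=PALETTE):
    """Blend class colors over an RGB image; background pixels (id 0) stay unchanged."""
    image = np.asarray(image, dtype=np.float32)
    semantics = np.asarray(semantics)
    colors = palette[semantics].astype(np.float32)  # (H, W, 3) color per pixel
    blended = (1 - alpha) * image + alpha * colors
    mask = (semantics > 0)[..., None]  # tint only labeled pixels
    out = np.where(mask, blended, image)
    return np.clip(out, 0, 255).astype(np.uint8)
```

A predicted label map (e.g., the per-pixel argmax of your network's output) can be passed in place of the ground-truth `semantics` field, and the result displayed with `plt.imshow(...)` or saved as an image.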


## Conclusion

If you run into issues with the data or want to provide feedback on the functionality, feel free to open an issue or write us an email.

Good luck training your models; we are looking forward to the amazing things you will do with our data.