# Tobac

## Outline

- I. Introduction


- II. Principles of tobac
    - II.1. Feature detection
    - II.2. Segmentation
    - II.3. Trajectory linking


- III. Data input
    - III.1. Type of data
    - III.2. Data structure


- IV. Data output


## I. Introduction

Tracking with tobac's object-based method requires data from geostationary satellite retrievals.
Here, the data are based on outgoing longwave radiation (OLR).

With tobac, convective cells can be **detected** and **tracked** in three steps:
- Feature detection
- Segmentation
- Trajectory linking

With feature detection and tracking in place, convective cells can be analysed in their spatial and temporal evolution.

## II. Principles of tobac
### II.1. Feature detection

Feature detection works on any two-dimensional field from the input data. Here, the field is top-of-atmosphere (TOA) **outgoing longwave radiation (OLR)**, given as a flux in W m-2.

The simplest way to identify features is to determine and label regions above a specific threshold.

Smoothing the input data first, e.g. with a Gaussian filter, makes the process more reliable.

A single threshold can produce interconnected regions that combine several features. To identify these features separately, tobac uses *erosion* techniques based on the *morphology* module in the *skimage* package (skimage.morphology.binary_erosion).
Erosion shrinks the regions from the edges and thus removes the ridges that connect neighbouring features (Senf et al. 2018).
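The smoothing, thresholding and erosion steps can be sketched on a synthetic field. This is a minimal illustration, not tobac's implementation: it assumes `scipy.ndimage` as a stand-in for the skimage routine named above, and the field values, `sigma` and `threshold` are made up for the example.

```python
import numpy as np
from scipy import ndimage

# Synthetic 2-D field (arbitrary units): two blobs joined by a narrow ridge.
field = np.zeros((20, 40))
field[5:15, 5:15] = 10.0     # first blob
field[5:15, 25:35] = 10.0    # second blob
field[9:11, 15:25] = 10.0    # thin ridge connecting the blobs

threshold = 5.0              # hypothetical threshold for this example

# Smooth the field, then mark everything above the threshold.
smoothed = ndimage.gaussian_filter(field, sigma=0.5)
mask = smoothed > threshold

# Without erosion the ridge merges both blobs into one labelled region.
_, n_before = ndimage.label(mask)

# Erosion shrinks regions from their edges and removes the thin ridge.
eroded = ndimage.binary_erosion(mask)
labels, n_after = ndimage.label(eroded)

print(n_before, n_after)
```

In this example the number of labelled regions rises from one to two once the connecting ridge has been eroded away.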

Describing a feature at one specific point in time:
- The geometric centre of a feature is strongly affected by the shape of its boundary, which depends heavily on the selected threshold value.
- Better: a weighted mean that weights every pixel by the difference between its value and the threshold value has proven to be the most robust choice for feature detection.
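The difference between the two centre definitions can be shown with a few lines of NumPy. The function name `feature_centres` and the sample values are hypothetical and serve only to illustrate the weighting described above.

```python
import numpy as np

def feature_centres(field, mask, threshold):
    """Return (geometric, weighted) centres as (row, col) for the pixels in mask."""
    rows, cols = np.nonzero(mask)
    geometric = (rows.mean(), cols.mean())
    # Weight each pixel by how far its value lies from the threshold.
    weights = np.abs(field[rows, cols] - threshold)
    weighted = (np.average(rows, weights=weights),
                np.average(cols, weights=weights))
    return geometric, weighted

field = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 6.0, 6.0, 9.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0, 0.0]])
threshold = 5.0
mask = field > threshold

geometric, weighted = feature_centres(field, mask, threshold)
print(geometric, weighted)
```

The geometric centre sits in the middle of the three pixels (column 2), while the weighted centre is pulled towards the strongest pixel (column 2.5).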

Using a single threshold can lead to problematic results:
1. Restrictive threshold: Clouds in their initial stage (convective initiation, CI) or decaying stage will not be captured.
2. Less restrictive threshold: Can lead to large unconfined regions around an area of deep convection or to the merging of several features into one.

<img src='./pics/tobac_feature_detection.png' style = 'width: 300px'/>

A **step-wise approach** with a range of threshold values can resolve these conflicting requirements.

With this approach, feature identification proceeds from the least restrictive threshold to the most restrictive one. For each threshold value, features are identified in the same way. Regions are first labelled for the least restrictive threshold; features found at a more restrictive threshold then replace the features that were found in the surrounding region at a less restrictive one.

This allows tobac to detect cloud initiation and decay as well as local features with cold cloud tops inside areas that only meet a weaker threshold.

For detection based on OLR, the threshold values 250, 225, 200, 175 and 150 W m-2 are used. Cold cloud tops correspond to low OLR, so here a lower value is the more restrictive threshold.
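The replacement logic of the step-wise approach can be sketched as follows. This is a simplified illustration under stated assumptions, not tobac's actual code: features are taken to be regions of low OLR (values below the threshold), `detect_multithreshold` is a hypothetical helper, and `scipy.ndimage.label` is used for the labelling.

```python
import numpy as np
from scipy import ndimage

def detect_multithreshold(field, thresholds):
    """Label regions below each threshold, from least to most restrictive.

    A region found at a stricter threshold replaces every earlier region
    that overlaps it. Returns one boolean mask per surviving feature.
    """
    features = []
    for threshold in sorted(thresholds, reverse=True):  # highest OLR first
        labels, n_regions = ndimage.label(field < threshold)
        for i in range(1, n_regions + 1):
            region = labels == i
            # Drop regions from weaker thresholds that overlap this one.
            features = [f for f in features if not np.any(f & region)]
            features.append(region)
    return features

# Synthetic OLR field in W m-2 (values invented for the example).
olr = np.full((12, 24), 280.0)
olr[2:10, 2:14] = 240.0      # broad cloud shield
olr[4:6, 4:6] = 160.0        # first cold core inside the shield
olr[4:6, 10:12] = 160.0      # second cold core inside the shield
olr[4:7, 18:21] = 240.0      # isolated weak cloud

features = detect_multithreshold(olr, [250, 225, 200, 175, 150])
print(len(features))
```

With a single threshold of 250 W m-2 the shield and its two cores would form one feature. The step-wise version returns three features: the two cold cores and the isolated weak cloud.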

### II.2. Segmentation

After features and feature centres are identified, segmentation associates an area with each feature. The implemented segmentation uses ***watershedding*** based on the *morphology* module (skimage.morphology.watershed; in recent scikit-image releases the function is located at skimage.segmentation.watershed). It uses a fixed threshold value (250 W m-2). Watershedding treats the input field as a topographic map and separates it into regions analogous to individual watersheds or catchment basins in a geological context (e.g. Heiblum et al., 2016a; Fiolleau and Roca, 2013; Senf et al., 2018).

The segmentation process starts by placing a marker in an array at the position of each feature identified in the detection step. The rest of the array is filled with zeros. Starting from these markers, the algorithm then fills the (2-D) area of the input field until it reaches the threshold value.

If two or more cloud objects are directly connected, the border between them runs along the watershed line.
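The marker-based filling can be illustrated with a small priority-flood routine. This is a simplified stand-in written for this example, not the skimage function that tobac calls: `watershed_fill` is a hypothetical name, pixels are visited in order of increasing OLR, and growth stops at the threshold.

```python
import heapq
import numpy as np

def watershed_fill(field, markers, threshold):
    """Grow labelled markers over pixels with field < threshold, lowest first."""
    labels = markers.copy()
    allowed = field < threshold
    heap = []
    for r, c in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (field[r, c], int(r), int(c)))
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < field.shape[0] and 0 <= cc < field.shape[1]
            # Claim unlabelled neighbours that are still below the threshold.
            if inside and allowed[rr, cc] and labels[rr, cc] == 0:
                labels[rr, cc] = labels[r, c]
                heapq.heappush(heap, (field[rr, cc], rr, cc))
    return labels

# One row of synthetic OLR values (W m-2) with two minima and a ridge between.
field = np.array([[260.0, 200.0, 160.0, 200.0, 240.0, 220.0, 160.0, 200.0, 260.0]])
markers = np.zeros(field.shape, dtype=int)   # zeros everywhere except markers
markers[0, 2] = 1                            # marker of feature 1
markers[0, 6] = 2                            # marker of feature 2

labels = watershed_fill(field, markers, threshold=250.0)
print(labels)
```

Pixels at or above 250 W m-2 keep the value 0, and the two features meet at the ridge between the minima.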

### II.3. Trajectory linking

Analysing the temporal evolution of deep convection properties requires information about the trajectory of each cell.

Individual features and their associated areas in each time step have to be linked into cloud trajectories. tobac uses *trackpy* as its linking method, which makes convective cell tracking possible.

The principle is simple: the linking determines which of the detected features is identical to an existing feature from the previous time step. The movement between two time steps is predicted from the velocities in a number of previous time steps, which yields a predicted position in the field. Because the predicted position is uncertain, the search is restricted to a circular region centred on the predicted position.

<img src='./pics/tobac_trajectory_linking.png' style = 'width: 500px'/>
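The prediction and search steps can be sketched for a single track. This is a simplified greedy illustration and not trackpy's linking algorithm, which also has to resolve cases where several tracks compete for the same feature. The function names and coordinates are hypothetical.

```python
import numpy as np

def predict_position(history, dt=1.0):
    """Extrapolate linearly using the mean velocity over the stored positions."""
    history = np.asarray(history, dtype=float)
    velocity = np.diff(history, axis=0).mean(axis=0) / dt
    return history[-1] + velocity * dt

def link_feature(predicted, candidates, search_radius):
    """Return the index of the nearest candidate inside the radius, else None."""
    candidates = np.asarray(candidates, dtype=float)
    distances = np.linalg.norm(candidates - predicted, axis=1)
    nearest = int(np.argmin(distances))
    return nearest if distances[nearest] <= search_radius else None

# Positions of one cell in the three previous time steps (grid units).
track = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
predicted = predict_position(track)

# Features detected in the new time step.
new_features = [(10.0, 10.0), (3.2, 6.1), (0.0, 5.0)]
match = link_feature(predicted, new_features, search_radius=1.0)
print(predicted, match)
```

The cell moved by (1, 2) per step, so the predicted position is (3, 6), and only the second new feature lies inside the search radius.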

Problem case, newly initialised trajectories:

If a trajectory is newly initialised, no velocity information from previous time steps is available. In this case, the algorithm uses the average velocity of the nearest tracked objects.

A velocity threshold restricts how far the future position may deviate from the linear extrapolation of the trajectory. The radius of the circular search region therefore depends on the time step of the input data.

Variations in the shape of the regions can shift the feature position by a few grid cells, so the new feature position may fall outside the search range around the predicted position. To prevent this, a minimum radius (2 km) is used as a lower limit for the search region.

Both parameters (upper and lower limit of the search region) are physical quantities and independent of the temporal and spatial resolution of the input data.
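How the two limits combine into a search radius can be written down in one line. The helper below is a hypothetical sketch: the 2 km minimum comes from the text above, while the maximum velocity of 10 m s-1 and the time steps are assumed values chosen only for illustration.

```python
def search_radius(v_max, dt, min_radius):
    """Search radius in metres: v_max * dt, but never below min_radius."""
    return max(v_max * dt, min_radius)

V_MAX = 10.0         # m s-1, assumed velocity threshold (not from the source)
MIN_RADIUS = 2000.0  # m, lower limit of the search region

radius_5min = search_radius(V_MAX, dt=300.0, min_radius=MIN_RADIUS)
radius_1min = search_radius(V_MAX, dt=60.0, min_radius=MIN_RADIUS)
print(radius_5min, radius_1min)
```

With a 5-minute time step the velocity threshold sets the radius (3000 m). With a 1-minute time step the product would be only 600 m, so the minimum radius of 2000 m applies.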

Clouds that have been tracked only for a very short time can be excluded by applying a threshold for the minimum lifetime.
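A minimum-lifetime filter could look like the sketch below. It assumes the tracking result is available as simple `(cell, time)` pairs with time in seconds. Both the data layout and the function name are hypothetical and do not reflect tobac's own interface.

```python
def filter_short_lived(rows, min_lifetime):
    """Keep rows of cells whose lifetime (last minus first time) >= min_lifetime."""
    first, last = {}, {}
    for cell, t in rows:
        first[cell] = min(t, first.get(cell, t))
        last[cell] = max(t, last.get(cell, t))
    return [(cell, t) for cell, t in rows
            if last[cell] - first[cell] >= min_lifetime]

# (cell id, time in seconds): cell 1 lives 600 s, cell 2 0 s, cell 3 300 s.
rows = [(1, 0), (1, 300), (1, 600), (2, 300), (3, 0), (3, 300)]
kept = filter_short_lived(rows, min_lifetime=600)
print(kept)
```

Only cell 1 reaches the required lifetime of 600 s, so the rows of cells 2 and 3 are dropped.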

## III. Data input
### III.1. Type of data

The input data are provided as *Iris* cubes or *xarray* data arrays. These data include metadata such as units and coordinates, which allows tobac to control the setup independently of the temporal/spatial resolution or dimensionality of the input data.

Top-of-atmosphere outgoing longwave radiation (OLR) is used to track individual convective cells in satellite retrievals. OLR retrievals have the benefit that they do not depend on aspects of a complicated radiative transfer model.
To calculate OLR, radiances L from two different channels are used: a water vapour (WV) channel and a channel in the infrared window (WIN).

| Satellite      | Channel (wavelength range) |
| :----------- | :----------- |
| GOES-13      | WV, 5.8-7.3 µm       |
|    | WIN, 10.2-11.2 µm        |
| MSG-5   | WV, 5.35-7.15 µm        |
|    | WIN, 9.8-11.8 µm        |


$$
OLR = 11.44L_{WIN} + 9.04L_{WV} + \frac{9.11L_{WV}}{L_{WIN}} - \frac{86.36}{L_{WIN}} - 0.14L^2_{WV} + 111.12
$$
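The formula translates directly into a small function. The coefficients are copied from the equation as written above. The function name is hypothetical, and the input values in the usage line are arbitrary numbers chosen to check the arithmetic, not realistic radiances.

```python
def olr_from_radiances(l_win, l_wv):
    """Evaluate the OLR formula above for window and water vapour radiances.

    Works for plain floats and, element-wise, for NumPy arrays.
    """
    return (11.44 * l_win
            + 9.04 * l_wv
            + 9.11 * l_wv / l_win
            - 86.36 / l_win
            - 0.14 * l_wv ** 2
            + 111.12)

# Arbitrary test inputs: L_WIN = 10, L_WV = 1.
example = olr_from_radiances(10.0, 1.0)
print(round(example, 3))
```

For these inputs the individual terms are 114.4, 9.04, 0.911, -8.636, -0.14 and 111.12, which sum to 226.695.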

### III.2. Data structure

The input data need to be in netCDF format, structured as one variable with three dimensions: the OLR variable is a three-dimensional array with the dimensions time, latitude and longitude.

<img src='./pics/tobac_data_input_structure.png' style = 'width: 300px'/>

The *time* dimension is given as numeric UNIX time and carries the attributes *axis*, *units*, *standard_name* and *calendar*. Longitude and latitude are given in geographic coordinates referenced to a datum such as NAD83 or WGS84 and carry the attributes *axis*, *units*, *standard_name*, *datum* and *spacing*.
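A data array with this layout can be built in memory with *xarray*. The snippet is a sketch under assumptions: the names `olr`, `lat` and `lon`, the grid extent and the exact attribute values (including the format of *datum* and *spacing*) are invented for illustration and may differ from the actual input files.

```python
import numpy as np
import xarray as xr

time = np.array([0, 300, 600], dtype="int64")   # seconds since the UNIX epoch
lat = np.linspace(30.0, 31.0, 5)                # 0.25 degree spacing
lon = np.linspace(-100.0, -99.0, 6)             # 0.2 degree spacing

olr = xr.DataArray(
    np.full((time.size, lat.size, lon.size), 250.0),
    dims=("time", "lat", "lon"),
    coords={
        "time": ("time", time, {
            "axis": "T",
            "units": "seconds since 1970-01-01 00:00:00",
            "standard_name": "time",
            "calendar": "standard",
        }),
        "lat": ("lat", lat, {
            "axis": "Y",
            "units": "degrees_north",
            "standard_name": "latitude",
            "datum": "WGS84",             # assumed attribute format
            "spacing": "0.25 degree",     # assumed attribute format
        }),
        "lon": ("lon", lon, {
            "axis": "X",
            "units": "degrees_east",
            "standard_name": "longitude",
            "datum": "WGS84",
            "spacing": "0.2 degree",
        }),
    },
    name="olr",
    attrs={"units": "W m-2"},
)
print(olr.dims, olr.shape)
```

Such an array could then be written to disk with `olr.to_netcdf(...)`, provided a netCDF backend is installed.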

## IV. Data output

Each step (detection, segmentation, tracking) has one output, which is given as a *pandas* data frame. The output table of the *tracking* step contains the tracked cell centres and trajectories. The mask of cloud areas is given as an *Iris* cube or *xarray* data array.
Input and output files share the same metadata; the output additionally contains information from the tracking process, e.g. a time coordinate relative to the initiation of an individual convective cell. This allows for further analyses.
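A time coordinate relative to cell initiation can be derived from a tracking table with a grouped operation in *pandas*. The table below is a made-up miniature, and the column names `cell`, `time` and `time_cell` are assumptions for this sketch rather than a specification of tobac's output.

```python
import pandas as pd

# Miniature tracking table: two cells observed every five minutes.
tracks = pd.DataFrame({
    "cell": [1, 1, 1, 2, 2],
    "time": pd.to_datetime([
        "2020-06-01 12:00", "2020-06-01 12:05", "2020-06-01 12:10",
        "2020-06-01 12:05", "2020-06-01 12:10",
    ]),
})

# Time since the first detection of each cell.
first_seen = tracks.groupby("cell")["time"].transform("min")
tracks["time_cell"] = tracks["time"] - first_seen

print(tracks)
```

Aligning all cells on their initiation time in this way makes it possible to compare or composite their life cycles.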