Evaluation Benchmarks

📚 This guide explains the details of the evaluation benchmarks on the AOT Dataset 🚀. UPDATED 24 August 2021.

Reference: https://www.aicrowd.com/challenges/airborne-object-tracking-challenge#benchmarks

Airborne Detection and Tracking benchmark

Airborne detection and tracking is essentially an online multi-object tracking task with private detections (i.e., detections generated by the algorithm rather than provided as external input). There is a wide range of evaluation metrics for multi-object tracking; however, the unique nature of the problem imposes certain requirements that help us define specific metrics for the Airborne Detection and Tracking Benchmark. Those requirements and metrics are outlined below.

To ensure safe autonomous flight, the drone should be able to detect a possible collision with an approaching airborne object and maneuver to prevent it. However, unless a possible collision is detected, the best way to ensure a safe flight is to follow the originally planned route. Deviating from the planned route increases the chances of encounters with other airborne objects and static obstacles not previously captured by the drone camera. As such, false alarms that might trigger unnecessary maneuvers should be avoided, which imposes a very low budget of false alarms (high-precision detection).

Another consideration is that while early detection is generally desired, information from the early stages of an encounter alone might not be indicative of the future motion of the detected airborne object. Therefore, an effective alert must be based on detection (or tracking) that is not too early, to allow accurate prediction of future motion, and yet early enough to allow time to maneuver. Typically, such a temporal window will depend on the closing velocity between the drone and the other airborne object. However, for simplicity, we will refer to the distance between the drone and the encountered airborne object to establish when the detections must occur. Finally, to capture sufficient information for future motion prediction, the object should be tracked for several seconds.

To summarize, the requirements for desired solutions are:

Very low number of false alarms
Detections of the airborne object within the distance that allows maneuver (i.e., not too close) and is informative for future motion prediction (i.e., not too far away)
Tracking the airborne object for sufficient time to allow future motion prediction

Next, we define airborne metrics that evaluate whether the above requirements are met.

The airborne metrics measure:

Encounter-Level Detection Rate (EDR) - the number of successfully detected encounters divided by the total number of encounters that should be detected, where an encounter is defined as a temporal sequence (a subset of frames) in which the same planned aircraft (airborne object) is visible (i.e., is manually labeled) and is located within the pre-defined range of distances. An encounter is successfully detected if:
- Its respective airborne object is tracked for at least 3 seconds within the encounter duration.
- The detection and 3-second tracking occur before the airborne object is within 300 m of the drone, or within the first 3 seconds of the encounter.
False Alarm Rate per hour (HFAR) - the number of unique reported track ids that correspond to at least one false positive airborne report, divided by the total number of hours in the dataset.

I expect an HFAR budget of 5 false alarms per 10 hours of flight, i.e., 0.5 false alarms per hour.
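
The HFAR definition above can be illustrated with a minimal sketch. The function name, the `(track_id, is_false_positive)` input format, and the example data are assumptions made for illustration; they are not part of the official evaluation code.

```python
from typing import Iterable, Tuple

# Budget stated in the text: 5 false alarms per 10 hours of flight.
HFAR_BUDGET = 5 / 10


def hourly_false_alarm_rate(
    reports: Iterable[Tuple[int, bool]], total_hours: float
) -> float:
    """Compute HFAR from (track_id, is_false_positive) pairs.

    Each track id counts at most once, no matter how many false
    positive reports it produced (hypothetical input format).
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    false_alarm_tracks = {track_id for track_id, is_fp in reports if is_fp}
    return len(false_alarm_tracks) / total_hours


# Illustrative data: track 2 has two false positives but counts once.
reports = [(1, False), (2, True), (2, True), (3, True), (4, False)]
hfar = hourly_false_alarm_rate(reports, total_hours=10.0)
within_budget = hfar <= HFAR_BUDGET
print(hfar, within_budget)
```

Using a set of track ids is what makes repeated false positives from the same track count as a single false alarm.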

Frame-level Airborne Object Detection Benchmark

While the first benchmark of this project involves tracking, I also submit results for the frame-level airborne object detection benchmark. The frame-level metrics measure:

Average frame-level detection rate (AFDR) - the ratio between the number of detected airborne objects and the number of airborne objects that should be detected. For the purpose of this calculation, all planned aircraft within a distance of 700 m are considered.
False positives per image (FPPI) - the ratio between the number of false positive airborne reports and the number of images in the dataset.
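
As a rough illustration of the two frame-level metrics, the sketch below computes AFDR and FPPI from per-frame annotations. The `GtObject` and `Frame` data structures and the example values are hypothetical; only the 700 m range and the two ratios come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_RANGE_M = 700.0  # planned aircraft within this distance are considered


@dataclass
class GtObject:
    is_planned: bool    # True for a planned (GPS-equipped) aircraft
    distance_m: float   # distance from the camera, in meters
    detected: bool      # True if matched by at least one airborne report


@dataclass
class Frame:
    objects: List[GtObject]
    num_false_positives: int  # false positive airborne reports in this frame


def frame_level_metrics(frames: List[Frame]) -> Tuple[float, float]:
    """Return (AFDR, FPPI); NaN when the denominator is empty."""
    should_detect = [
        obj
        for frame in frames
        for obj in frame.objects
        if obj.is_planned and obj.distance_m <= MAX_RANGE_M
    ]
    detected = sum(obj.detected for obj in should_detect)
    afdr = detected / len(should_detect) if should_detect else float("nan")
    total_fp = sum(frame.num_false_positives for frame in frames)
    fppi = total_fp / len(frames) if frames else float("nan")
    return afdr, fppi


frames = [
    Frame([GtObject(True, 500.0, True)], 0),
    Frame([GtObject(True, 650.0, False), GtObject(False, 200.0, True)], 1),
    Frame([GtObject(True, 900.0, False)], 1),  # beyond 700 m: ignored
    Frame([], 0),
]
afdr, fppi = frame_level_metrics(frames)
print(afdr, fppi)
```

Non-planned objects and planned objects beyond 700 m are simply left out of the AFDR denominator in this sketch.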

Additional details on detection evaluation and false alarms calculation

I elaborate here on the definition of the encounters that form the set of encounters for the detection and tracking benchmark. Recall that a planned aircraft is equipped with GPS during data collection and therefore provides GPS measurements of its physical location. I further define a valid airborne encounter as an encounter with a planned aircraft during which the maximum distance to the aircraft is at most UPPER_BOUND_MAX_DIST. This upper bound on the maximum distance ensures that detection is benchmarked with respect to airborne objects that are not too far away from the camera. In addition, an upper bound on the minimum distance in the encounter is defined as UPPER_BOUND_MIN_DIST (to disregard encounters that do not get sufficiently close to the camera). Note that the dataset videos and the provided ground truth labels might contain other airborne objects that are not planned, or planned airborne objects that do not belong to valid encounters. The airborne metrics do not consider those objects for the detection rate calculation and treat them as ‘don’t care’ (i.e., detections of those objects are not counted towards false alarms). Frame-level metrics consider non-planned objects and planned objects at range > 700 m as ‘don’t care’.
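
The two distance bounds can be expressed as a small predicate. This is a sketch only: the function name and the list-of-distances input are assumptions, and the constant values are taken from Table 4.

```python
from typing import Sequence

UPPER_BOUND_MIN_DIST = 330.0  # meters, from Table 4
UPPER_BOUND_MAX_DIST = 700.0  # meters, from Table 4


def is_valid_encounter(distances_m: Sequence[float]) -> bool:
    """Check the distance bounds for one encounter.

    distances_m holds the distance to the planned aircraft in each
    frame of the encounter (hypothetical input format).
    """
    if not distances_m:
        return False
    gets_close_enough = min(distances_m) <= UPPER_BOUND_MIN_DIST
    never_too_far = max(distances_m) <= UPPER_BOUND_MAX_DIST
    return gets_close_enough and never_too_far


print(is_valid_encounter([650.0, 500.0, 320.0]))  # approaches within 330 m
print(is_valid_encounter([650.0, 500.0, 400.0]))  # never gets close enough
print(is_valid_encounter([800.0, 500.0, 300.0]))  # starts too far away
```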

Any airborne report (as defined in Table 3) that does not match an airborne object is considered a false positive and is counted as a false alarm once per track id. The reason is that a false alarm might trigger a potential maneuver, and hence later false positives that correspond to the same object have a lower overall impact in real scenarios.

The definitions of successful detection and false positive depend on the matching criteria between the bounding box produced by the detector and the ground truth bounding box. A common matching measure for object detection is Intersection over Union (IoU). However, IoU is sensitive to small bounding boxes, and since our dataset contains very small objects, I propose to use extended IoU, defined as:

$eIoU = IoU(gt\_bbox, det\_bbox), \quad area_{gt} \geq MIN\_OBJECT\_AREA$

$eIoU = IoU(dilate_{min\_area}(gt\_bbox), dilate_{min\_area}(det\_bbox)), \quad area_{gt} < MIN\_OBJECT\_AREA$

In words:

If $area_{gt} \geq MIN\_OBJECT\_AREA$, then $eIoU = IoU$.
If $area_{gt} < MIN\_OBJECT\_AREA$, the ground truth bounding box is dilated to have an area of at least MIN_OBJECT_AREA, and all the detections matched against this ground truth are dilated to have an area of at least MIN_OBJECT_AREA. The dilation operation maintains the aspect ratio of the bounding boxes.
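
One possible reading of the extended IoU is sketched below. The `(x_min, y_min, x_max, y_max)` box format and the choice to dilate each box about its own center are assumptions; the text only states that dilation reaches the minimum area and preserves the aspect ratio.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed
MIN_OBJECT_AREA = 100.0  # pixels, from Table 4


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def dilate(box: Box, min_area: float = MIN_OBJECT_AREA) -> Box:
    """Scale the box about its center until its area is at least min_area.

    Width and height share one scale factor, so the aspect ratio is kept.
    Degenerate (zero-area) boxes are returned unchanged.
    """
    box_area = area(box)
    if box_area <= 0 or box_area >= min_area:
        return box
    scale = math.sqrt(min_area / box_area)
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * scale / 2
    half_h = (box[3] - box[1]) * scale / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def extended_iou(gt: Box, det: Box, min_area: float = MIN_OBJECT_AREA) -> float:
    if area(gt) >= min_area:
        return iou(gt, det)
    return iou(dilate(gt, min_area), dilate(det, min_area))


# Two 5x5 boxes touching at a corner: plain IoU is 0, eIoU is positive.
small_gt, small_det = (0.0, 0.0, 5.0, 5.0), (5.0, 5.0, 10.0, 10.0)
print(iou(small_gt, small_det), extended_iou(small_gt, small_det))
```

The example shows why dilation matters for tiny objects: a detection that is off by a few pixels gets zero plain IoU but a non-zero eIoU.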

The reported bounding box is considered a match if the eIoU between the reported bounding box and the ground truth bounding box is greater than IS_MATCH_MIN_IOU_THRESH.

If the eIoU between the reported bounding box and every ground truth bounding box is less than IS_NO_MATCH_MAX_IOU_THRESH, the reported bounding box is considered a false positive.

Any other case that falls in between the two thresholds is considered neutral (‘don’t care’), due to possible inaccuracies in ground truth labeling.
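
The three outcomes can be summarized in a short sketch. The function name, the string labels, and the handling of frames with no ground truth (treated here as a false positive) are assumptions; the thresholds are the values listed in Table 4. The sketch also ignores the separate ‘don’t care’ handling of non-planned objects.

```python
from typing import Iterable

IS_MATCH_MIN_IOU_THRESH = 0.2      # from Table 4
IS_NO_MATCH_MAX_IOU_THRESH = 0.02  # from Table 4


def classify_report(eious: Iterable[float]) -> str:
    """Classify one airborne report.

    eious holds the eIoU between the report and every ground truth
    bounding box in the same frame (hypothetical input format).
    """
    best = max(eious, default=0.0)
    if best > IS_MATCH_MIN_IOU_THRESH:
        return "match"
    if best < IS_NO_MATCH_MAX_IOU_THRESH:
        return "false_positive"
    return "dont_care"  # between the two thresholds


print(classify_report([0.5, 0.0]))
print(classify_report([0.01, 0.0]))
print(classify_report([0.1]))
```

Taking the maximum eIoU reflects the glossary definition: a report is a false positive only if it falls below the lower threshold for every airborne object.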

Please refer to Tables 3-4 for further clarifications on the terms mentioned in this section.

Term	Definition
Planned airborne object	An airborne object with GPS (in the currently available datasets - Helicopter1, Airplane1) and a manually labeled ground truth bounding box in the image.
Encounter	1) An interval of time of at least MIN_SECS with a planned airborne object 2) The segment can have gaps of length <= 0.1 * MIN_SECS, during which the ground truth might be missing or the object is at a farther range / not visible in the image 3) A single encounter can include one airborne object only
Valid encounter (should be detected)	An encounter with an airborne object such that: 1) minimum distance to the object <= UPPER_BOUND_MIN_DIST 2) maximum distance to the object <= UPPER_BOUND_MAX_DIST
Airborne report	Predicted bounding box, frame id, detection confidence score. Optional: track id. If not provided, the detection id will be used.
False positive airborne report	An airborne report that cannot be matched to ANY airborne object (i.e. eIoU with any airborne object is below IS_NO_MATCH_MAX_IOU_THRESH)
Detected Airborne Object	An airborne object that can be matched with an airborne report
Frame level detection rate per encounter	The ratio of the number of frames in which a specific airborne object is detected to the number of frames in which this object should be detected, within the considered temporal window of frames.

Table 3: Glossary

Constant	Units	Value	Comments
UPPER_BOUND_MIN_DIST	m	330
UPPER_BOUND_MAX_DIST	m	700
MIN_OBJECT_AREA	pixels	100	At the ground truth resolution
IS_MATCH_MIN_IOU_THRESH	N/A	0.2
IS_NO_MATCH_MAX_IOU_THRESH	N/A	0.02	0 < IS_NO_MATCH_MAX_IOU_THRESH <= IS_MATCH_MIN_IOU_THRESH
MIN_SECS	s	3.00

Table 4: Constants
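
To make the encounter definition in Table 3 more concrete, the sketch below groups the timestamps at which one planned airborne object is labeled into candidate encounters, using MIN_SECS and the gap tolerance of 0.1 * MIN_SECS. This is only one possible interpretation: the function name, the timestamp input, and treating the difference between consecutive labeled timestamps as the gap are assumptions, and range checks are omitted.

```python
from typing import Iterable, List, Tuple

MIN_SECS = 3.0                  # from Table 4
MAX_GAP_SECS = 0.1 * MIN_SECS   # allowed gap, from the encounter definition


def split_into_encounters(timestamps: Iterable[float]) -> List[Tuple[float, float]]:
    """Group labeled timestamps (seconds) of ONE object into encounters.

    Consecutive timestamps stay in the same segment while the gap between
    them is at most MAX_GAP_SECS; segments shorter than MIN_SECS are dropped.
    """
    segments: List[Tuple[float, float]] = []
    start = prev = None
    for t in sorted(timestamps):
        if start is None:
            start = prev = t
        elif t - prev <= MAX_GAP_SECS:
            prev = t
        else:
            segments.append((start, prev))
            start = prev = t
    if start is not None:
        segments.append((start, prev))
    return [(s, e) for s, e in segments if e - s >= MIN_SECS]


# Labels every 0.25 s from 0 s to 4 s, then a short burst from 6 s to 7 s.
ts = [0.25 * i for i in range(17)] + [6.0 + 0.25 * i for i in range(5)]
encounters = split_into_encounters(ts)
print(encounters)
```

In this example only the first run of labels is long enough to count as an encounter; the one-second burst after the 2-second gap is discarded.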
