/
NonMaximumSuppression.proto
187 lines (173 loc) · 7.42 KB
/
NonMaximumSuppression.proto
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
// Copyright (c) 2018, Apple Inc. All rights reserved.
//
// Use of this source code is governed by a BSD-3-clause license that can be
// found in LICENSE.txt or at https://opensource.org/licenses/BSD-3-Clause
syntax = "proto3";
option optimize_for = LITE_RUNTIME;
import public "DataStructures.proto";
package CoreML.Specification;
/*
* Non-maximum suppression of axis-aligned bounding boxes.
*
* This is used primarily for object detectors that tend to produce multiple
* boxes around a single object. This is a byproduct of the detector's
* robustness to spatial translation. If there are two or more bounding boxes
* that are very similar to one another, the algorithm should return only a
* single representative.
*
* Similarity between two bounding boxes is measured by intersection-over-union
* (IOU), the fraction between the area of intersection and area of the union.
* Here is an example where the areas can be calculated by hand by counting glyphs::
*
* +-------+ +-------+
* | | | |
* | +------+ +--+ | +---+
* | | | | | | | |
* +-------+ | +--+ +----+ |
* | | | |
* +------+ +------+
* Intersection Union
* IOU: 0.16 = 12 / 73
*
* All IOU scores are fractions between 0.0 (fully disjoint) and 1.0 (perfect
* overlap). The standard algorithm (PickTop) is defined as follows:
*
* 1. Sort boxes by descending order of confidence
* 2. Take the top one and mark it as keep
* 3. Suppress (mark it as discard) all boxes within a fixed IOU radius of the
* keep box
* 4. Go to 2 and repeat on the subset of boxes not already kept or discarded
* 5. When all boxes are processed, output only the ones marked as keep
*
* Before the algorithm, boxes that fall below the confidence threshold are
* discarded.
*/
message NonMaximumSuppression {
// Suppression methods:
/*
* Pick the bounding box of the top confidence, suppress all within a radius.
*/
message PickTop {
/*
* Suppression is only done among predictions with the same label
* (argmax of the confidence).
*/
bool perClass = 1;
}
/*
* Choose which underlying suppression method to use
*/
oneof SuppressionMethod {
PickTop pickTop = 1;
}
/*
* Optional class label mapping.
*/
oneof ClassLabels {
StringVector stringClassLabels = 100;
Int64Vector int64ClassLabels = 101;
}
/*
* This defines the radius of suppression. A box is considered to be within
* the radius of another box if their IOU score is less than this value.
*/
double iouThreshold = 110;
/*
* Remove bounding boxes below this threshold. The algorithm run-time is
* proportional to the square of the number of incoming bounding boxes
* (O(N^2)). This threshold is a way to reduce N to make the algorithm
* faster. The confidence threshold can be any non-negative value. Negative
* confidences are not allowed, since if the output shape is specified to be
* larger than boxes after suppression, the unused boxes are filled with
* zero confidence. If the prediction is handled by Core Vision, it is also
* important that confidences are defined with the following semantics:
*
* 1. Confidences should be between 0 and 1
* 2. The sum of the confidences for a prediction should not exceed 1, but is
* allowed to be less than 1
* 3. The sum of the confidences will be interpreted as the confidence of
* any object (e.g. if the confidences for two classes are 0.2 and 0.4,
it means there is a 60% (0.2 + 0.4) confidence that an object is
present)
*/
double confidenceThreshold = 111;
/*
* Set the name of the confidence input.
*
* The input should be a multi-array of type double and shape N x C. N is
* the number of boxes and C the number of classes. Each row describes the
* confidences of each object category being present at that particular
* location. Confidences should be nonnegative, where 0.0 means the highest
* certainty the object is not present.
*
* Specifying shape is optional.
*/
string confidenceInputFeatureName = 200;
/*
* Set the name of the coordinates input.
*
* The input should be a multi-array of type double and shape N x 4. The
* rows correspond to the rows of the confidence matrix. The four values
* describe (in order):
*
* - x (center location of the box along the horizontal axis)
* - y (center location of the box along the vertical axis)
* - width (size of box along the horizontal axis)
* - height (size of box on along the vertical axis)
*
* Specifying shape is optional.
*/
string coordinatesInputFeatureName = 201;
/*
* The iouThreshold can be optionally overridden by specifying this string
* and providing a corresponding input of type double. This allows changing
* the value of the parameter during run-time.
*
* The input should be a scalar double between 0.0 and 1.0. Setting it to 1.0
* means there will be no suppression based on IOU.
*/
string iouThresholdInputFeatureName = 202;
/*
* The confidenceThreshold can be optionally overridden by specifying this
* string and providing a corresponding input. This allows changing the
* value of the parameter during run-time, which can aid setting it just
* right for a particular use case.
*
* The input should be a scalar double with nonnegative value.
*/
string confidenceThresholdInputFeatureName = 203;
/*
* Set the name of the confidence output. The output will be the same type
* and shape as the corresponding input. The only difference is that the
* number of rows may have been reduced.
*
* Specifying shape is optional. One reason to specify shape is to limit
* the number of output boxes. This can be done is several ways:
*
* Fixed shape:
* The output can be pinned to a fixed set of boxes. If this number is larger
* than the number of boxes that would have been returned, the output is padded
* with zeros for both confidence and coordinates. Specifying a fixed shape
* can be done by setting either shape (deprecated) or allowedShapes set to
* fixedsize.
*
* Min/max:
* It is also possible to set both a minimum and a maximum. The same zero-padding
* as for fixed shape is applied when necessary. Setting min/max is done by defining
* two allowedShapes, where the first dimension uses a rangeofsizes defining lowerbound
* and upperbound.
*/
string confidenceOutputFeatureName = 210;
/*
* Set the name of the coordinates output. The output will be the same type
* and shape as the corresponding input. The only difference is that the
* number of rows may have been reduced.
*
* Specifying shape is optional. See confidence output for a more detailed
* description. Note that to achieve either fixed shape output or a
* constraint range of boxes, only one of confidence or coordinates need to
* set a shape. Both shapes are allowed to be defined, but in such case they
* have to be consistent along dimension 0.
*/
string coordinatesOutputFeatureName = 211;
}