How to reduce the time of post-processing? #11
Although the network forward pass takes only about 5 ms, the post-processing time on my laptop is up to 150 ms. The main cost is lines 98 to 102 of detect.py, where generating the prior boxes takes most of the time. What can we do to reduce it? Thanks.

Comments
It looks like all the prior boxes are created, even the ones associated with anchors that have no match; then all the locations and all the prior boxes are sent to decode, and only later are the uninteresting predictions discarded (lines 114 through 124). I think it's possible to do the opposite: first discard the uninteresting predictions, then generate only the prior boxes corresponding to the interesting proposals, and then decode those. This might provide the speedup you seek.
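For illustration, here is a minimal C++ sketch of that reordering, assuming the three-level RetinaFace grid used in the next comment (strides 8, 16, 32, two square anchors per cell) and a flat score array laid out in the same traversal order; priorForIndex and keepAboveThreshold are hypothetical names, not functions from this repo:

#include <array>
#include <cstddef>
#include <vector>

// Rebuild the i-th prior straight from the grid layout (strides 8/16/32,
// two square anchors per cell), so the full prior list is never materialized.
std::array<float, 3> priorForIndex(std::size_t i, int width, int height)
{
    static const int strides[3] = {8, 16, 32};
    static const int min_sizes[3][2] = {{16, 32}, {64, 128}, {256, 512}};
    for (int e = 0; e < 3; ++e)
    {
        const int fh = (height + strides[e] - 1) / strides[e]; // ceil(height / stride)
        const int fw = (width + strides[e] - 1) / strides[e];  // ceil(width / stride)
        const std::size_t level_count = (std::size_t)fh * fw * 2;
        if (i < level_count)
        {
            const int anchor = (int)(i % 2);
            const std::size_t cell = i / 2;
            const int h = (int)(cell / fw);
            const int w = (int)(cell % fw);
            return {(w + 0.5f) * strides[e], (h + 0.5f) * strides[e],
                    (float)min_sizes[e][anchor]};
        }
        i -= level_count;
    }
    return {0.0f, 0.0f, 0.0f}; // index out of range
}

// First pass: keep only the indices whose confidence clears the threshold.
std::vector<std::size_t> keepAboveThreshold(const std::vector<float> &scores,
                                            float threshold)
{
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < scores.size(); ++i)
        if (scores[i] > threshold)
            kept.push_back(i);
    return kept;
}

At a 640x640 input this grid holds 16,800 priors, so building and decoding boxes only for the handful of confident indices removes almost all of the per-frame work.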
Personally, I run the code in C++ and have my own code that generates the prior boxes once and caches the result. I also only decode boxes that meet my threshold requirements, which brings decoding to under a millisecond rather than 150 ms. It's the same code I used for FaceBoxes, with some variations.

#include <array>
#include <cmath>
#include <tuple>
#include <vector>
#include <opencv2/core.hpp>

// This generates the 'd' input array used in decodeBox
std::vector<std::array<float, 3>> generateDefaultBoxRetina(int width, int height)
{
    std::vector<std::array<float, 3>> boxes;
    // Strides of the three detection levels and the anchor sizes at each level
    const static std::vector<int> feature_map_sizes = {8, 16, 32};
    const static std::vector<std::vector<int>> min_sizes = {{16, 32}, {64, 128}, {256, 512}};
    for (size_t e = 0; e < feature_map_sizes.size(); ++e)
    {
        const float fmap = (float)feature_map_sizes.at(e);
        const int maxH = (int)std::ceil((float)height / fmap);
        const int maxW = (int)std::ceil((float)width / fmap);
        for (int h = 0; h < maxH; ++h)
        {
            for (int w = 0; w < maxW; ++w)
            {
                for (const auto &min_size : min_sizes.at(e))
                {
                    // Anchor center in pixels, plus its (square) size
                    const float cx = (w + 0.5f) * fmap;
                    const float cy = (h + 0.5f) * fmap;
                    boxes.push_back({cx, cy, (float)min_size});
                }
            }
        }
    }
    return boxes;
}

std::tuple<cv::Rect2f, float, std::vector<cv::Point2f>>
decodeBox(const std::vector<float> &p, const std::vector<float> &l, const std::array<float, 3> &d, int width, int height, const float &c)
{
    // Hardcoded variance values
    const static float vxy = 0.1f;
    const static float vwh = 0.2f;
    // Box center: prior center plus the predicted offset
    const auto cx = p[0] * vxy * d[2] + d[0];
    const auto cy = p[1] * vxy * d[2] + d[1];
    // Box size: prior size scaled by the predicted log-space factor
    const auto sx = std::exp(p[2] * vwh) * d[2];
    const auto sy = std::exp(p[3] * vwh) * d[2];
    // Optional facial landmarks: five (x, y) pairs, decoded like the center
    std::vector<cv::Point2f> landmarks;
    if (!l.empty())
    {
        landmarks.reserve(5);
        for (int i = 0; i < 10; i += 2)
        {
            const auto lx = d[0] + l.at(i) * vxy * d[2];
            const auto ly = d[1] + l.at(i + 1) * vxy * d[2];
            landmarks.push_back({lx / (float)width, ly / (float)height});
        }
    }
    // Everything is returned normalized to [0, 1] image coordinates
    return {{(cx - (sx / 2.0f)) / (float)width, (cy - (sy / 2.0f)) / (float)height, sx / (float)width, sy / (float)height}, c, landmarks};
}
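As a usage sketch (not part of the original post), a hypothetical driver could combine the two functions in the filter-first style described above; conf, loc, and landms stand in for the raw network outputs, one entry per prior in generator order, and the 0.5 threshold is illustrative:

// Hypothetical driver: cache the priors once, then decode only the
// priors whose confidence clears the threshold.
std::vector<std::tuple<cv::Rect2f, float, std::vector<cv::Point2f>>>
decodeDetections(const std::vector<float> &conf,                // one score per prior
                 const std::vector<std::vector<float>> &loc,    // 4 box offsets per prior
                 const std::vector<std::vector<float>> &landms, // 10 landmark offsets per prior
                 int width, int height, float threshold = 0.5f)
{
    // Generated once and reused; assumes a fixed input resolution.
    static const auto priors = generateDefaultBoxRetina(width, height);
    std::vector<std::tuple<cv::Rect2f, float, std::vector<cv::Point2f>>> detections;
    for (std::size_t i = 0; i < priors.size(); ++i)
    {
        if (conf[i] < threshold)
            continue; // skip the expensive decode for low-confidence priors
        detections.push_back(decodeBox(loc[i], landms[i], priors[i], width, height, conf[i]));
    }
    return detections;
}

NMS would still run on the returned detections, but with this arrangement the decode cost scales with the number of confident priors rather than with the total anchor count.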