
job->get() takes a long time (~200 ms) with default settings on a small image #143

Open
danielj-genesis opened this issue Dec 13, 2022 · 1 comment

Comments


danielj-genesis commented Dec 13, 2022

Describe the problem
I wrote a C++ program that reads video frames (854x480) from an input video and enqueues a SiftJob for each frame. The code is below:

// main.cpp
#include <opencv2/opencv.hpp>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <popsift/popsift.h>
#include <popsift/features.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <ctime>
#include <string>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    cudaDeviceReset();
    std::clock_t start;
    popsift::Config config;
    PopSift popSift(
        config,
        popsift::Config::ExtractingMode,
        PopSift::ByteImages    // 8-bit (unsigned char) input images
    );

    string filename = "input.mp4";
    VideoCapture cap(filename);

    Mat frame, gray;
    if (!cap.isOpened())
    {
        std::cerr << "Couldn't open capture." << std::endl;
        return -1;
    }
    for (;;)
    {
        cap >> frame;
        if (frame.empty()) break;

        // PopSift expects a single-channel grayscale buffer.
        cvtColor(frame, gray, COLOR_BGR2GRAY);

        SiftJob* job = popSift.enqueue(gray.cols, gray.rows, gray.data);
        start = std::clock();
        popsift::Features* feature_list = job->get();
        std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

        delete feature_list;   // the caller owns the returned features
        delete job;

        char c = cv::waitKey(10);
        if (c == 27) break;
    }
    cap.release();
    return 0;
}

Here's my printed device information

Device information:
    Name: NVIDIA GeForce GTX 1660
    Compute Capability:    7.5
    Total device mem:      6441992192 B 6291008 kB 6143 MB
    Per-block shared mem:  49152
    Warp size:             32
    Max threads per block: 1024
    Max threads per SM(X): 1024
    Max block sizes:       {1024,1024,64}
    Max grid sizes:        {2147483647,65535,65535}
    Number of SM(x)s:      22
    Concurrent kernels:    yes
    Mapping host memory:   yes
    Unified addressing:    yes

On my GTX 1660, job->get() takes ~200 ms for each 854x480 px frame.

If I add:

    config.setDownsampling(0);
    config.setFilterMaxExtrema(false);

I can bring the time down to ~65 ms, which is still slower than the Python version of this code using

sift = cv2.SIFT_create()
sift.detect(frame,None)

which only takes ~50 ms.

Shouldn't PopSift be running much faster according to the paper? Is there something I'm missing?
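
(A side note on the measurement: std::clock reports CPU time rather than elapsed wall-clock time, so it can misreport time the thread spends blocked on the GPU. A minimal sketch of the same measurement with std::chrono::steady_clock is below; getWithTiming is just an illustrative helper, not part of the PopSift API.)

// timing.h -- hypothetical helper, not part of PopSift
#include <popsift/popsift.h>
#include <popsift/features.h>
#include <chrono>
#include <iostream>

// Wait on a SiftJob and report the wall-clock time spent in get().
inline popsift::Features* getWithTiming(SiftJob* job)
{
    auto t0 = std::chrono::steady_clock::now();
    popsift::Features* features = job->get();
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "Time: "
              << std::chrono::duration<double, std::milli>(t1 - t0).count()
              << " ms" << std::endl;
    return features;
}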

Azhng (Contributor) commented Feb 21, 2023

Is this the latency of a single frame, or an average latency?

My guess is that CUDA lazily allocates and initializes a lot of internal objects, and transferring data from RAM to VRAM also has high latency.

You would have to "warm up" the PopSift queue before it can achieve its maximum throughput.

This is how the neighbouring VulkanSIFT project measured its performance as well.
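
For example, something along these lines (a rough sketch only; averageGetLatencyMs is a hypothetical helper and the warm-up frame count is arbitrary) would report the steady-state average instead of the first-frame latency:

// Hypothetical benchmark sketch: skip the first frames while CUDA warms up, then average.
#include <opencv2/opencv.hpp>
#include <popsift/popsift.h>
#include <popsift/features.h>
#include <chrono>

double averageGetLatencyMs(PopSift& popSift, cv::VideoCapture& cap, int warmup = 10)
{
    cv::Mat frame, gray;
    double total_ms = 0.0;
    int frame_index = 0, measured = 0;

    for (;;)
    {
        cap >> frame;
        if (frame.empty()) break;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

        SiftJob* job = popSift.enqueue(gray.cols, gray.rows, gray.data);

        auto t0 = std::chrono::steady_clock::now();
        popsift::Features* features = job->get();
        auto t1 = std::chrono::steady_clock::now();

        if (frame_index++ >= warmup)   // ignore the warm-up frames
        {
            total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
            ++measured;
        }

        delete features;
        delete job;
    }
    return measured > 0 ? total_ms / measured : 0.0;
}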
