Support asynchronous SIFT extraction #22

griwodz · 2017-03-18T19:59:11Z

This is a big change. It changes how the popsift library should be called, but backward compatibility is mostly in place.

However, the performance and functionality gain a lot.

Run asynchronously, i.e. PopSift is now multithreaded: the caller create a SiftJob and can continue immediately. It does not block until it tries to access the features of the SiftJob.
The "pipe" approach has been removed. Instead, you can pass arbitrary files. If they are bigger, memory will be reallocated (leading to a delay). If they are smaller, only cudaArrays and textures are reallocated, with much smaller delay. Identical sizes require no reallocation.
The default operation resembles VLFeat. However, you can choose:
3.1. direct-downscaling to all octaves from the input image (with correct handling of initial blur).
3.2. downscaling according to VLFeat, OpenCV or Celebrandil (9-wide and 15-wide static filters).
3.3. descriptor extraction modes: "loop" horizontal striding like VLFeat, "iloop" interpolated horizontal striding, "grid" extract a fixed grid directly in the extrema's orientation, "igrid" the same with hardware interpolation, "notile" is similar to grid with fewer reads.
CMakeLists.txt lists all c and h files explicitly.
bugfix in VLFeat-style smoothing of the orientation histogram
conceptual improvement, descriptors are extracted from the refined level instead of level where the extremum was found
some support for CUDA NVTX CPU profiling, as a CMake OPTION
use explicit memory pinning and aligned memory for results to avoid copying
demo application requires libDevIL to test multi-file processing

Tested on the complete "big_set" and on several subsets. Tested with three 1920x1080 images from flickr. Tested with highres CCTag test content that shows only tags (this is horribly slow).

- added alternative grid mode descriptor extraction - added interpolated grid to descriptor extraction methods - blur layers now in Array structure using surfaces - add 0.5 shift to all tex2DLayered calls - static list of files for CMake - don't process feature point by level - use explicit .0f in ec init - level correction implemented - __any needs :: namespace on mac - BUGFIX: very old orientation bug discovered and fixed - get gradiant from rotated coordinates - processing directories works - no real pipelining yet - reallocating memory for several frames - merged extrema and descriptor arrays for all octaves - merged extrema and descriptor arrays for all octaves - extrema/desc single alloc in pyramid class replaces many in octave class - remove PL variants of descriptor extraction: minimal effect - closing in on single extremum/ori array - add option for NVTX profiling - made extrema counters and pointers global - Prefix sum: make total available in first warp - save descriptors for all octaves at once - print octave in output-descriptor file - major bug in normf usage - desc array simplified - extrema array simplified - dynamic reallocation of extrema array - maintain separate feature array for download - downloading directly into temporarily pinned host mem - memalign replacement function for mac - avoid useless host copy for Feature buffer - renamed device constant orientations to max_orientations - avoid useless host copy for Extrema buffer - avoid useless host copy for feat-to-ext mapping - avoid useless host copy for first-stage extrema - option to suppress descriptor writing to disk - re-introduce a separate download stream - remove dynamic parallelism - move descriptor normalization to download stream - slightly more stream synchronization - new event structure for dog after blurring - multithreaded processing of image sequences - mac needs additional boost include line

simogasp · 2017-03-18T20:04:24Z

@griwodz is now boost a required dependency of the library? Because in the last PR i moved find boost only for the samples...
You may need to add boost thread to the apt-get of travis in .travis.yaml

simogasp · 2017-03-18T20:30:25Z

CMakeLists.txt

-endif()
+# set(CUDA_NVCC_FLAGS_DEBUG   "${CUDA_NVCC_FLAGS_DEBUG};-G;-g")
+if(APPLE)
+elseif(NOT WIN32)


? question about NOT WIN32 or about commenting -G -g?
The first is an artifact of merging 2 ifs.
The second is because -g is added automatically in Debug mode, so only -G needs to added explicitly.

yes the question was about the elseif. Maybe just change the if() - elseif() into if(NOT WIN32 AND NOT APPLE), unless there is something missing for APPLE to be add?

Actually, we should always include debug symbols. The only difference between Release and Debug builds should be whether optimization is enabled *at some level) or completely disabled. Disk is cheap these days, no need to save space by not generating debug info.

simogasp · 2017-03-18T20:32:14Z

CMakeLists.txt

+set(CUDA_NVCC_FLAGS         "${CUDA_NVCC_FLAGS};--keep")
+set(CUDA_NVCC_FLAGS         "${CUDA_NVCC_FLAGS};--source-in-ptx")
+endif(APPLE)
+set(CUDA_NVCC_FLAGS_RELEASE "${CUDA_NVCC_FLAGS_RELEASE};-O3")


this two cannot go together because of the bug in FindCUDA.cmake as explained in the removed comment

Interestingly, the previous fix (not the new one) stopped the build on my own machine.

Strange, I just restored the original version :-)
I know that this fix may cause problems for windows and Visual studio, but i hope the windoze experts @zvrba @yann-lty may come up with a proper solution :-)
Maybe just

if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR MSVC)

this way it is set only in debug mode or always for visual studio? I know it is not elegant and all but...

CUDA VS projects generated by CMake are next to useless: they're good if you're going to build the project once, but are useless for development. The project is not "derived" from the CUDA project template, meaning that you can't adjust CUDACC options in the VS GUI. For some reason it also tends to recompile the whole project even if you change just one file. It also tends to not compile things that need to be recompiled, leading to spurious link errors.

For development on PopSift I'd manually create a VS project based on the CUDA template. To just use it as a library, I'd compile it once, install it and forget it.

simogasp · 2017-03-18T22:18:04Z

for the failing in travis, I have no idea...
it says /usr/include/boost/thread/sync_bounded_queue.hpp(247): error: too many arguments in function call and actually it's true that in the header there is no function full() taking parameters (http://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp)

  template <typename ValueType>
  bool sync_bounded_queue<ValueType>::full() const
  {
    lock_guard<mutex> lk(mtx_);
    return full(lk);  // <------- THIS LINE (247)
  }

Maybe cuda 8 compiles because somehow nvcc is more loose on that kind of error?
Shall we try a different version of boost? for me it compiles fine with 1.58

simogasp · 2017-03-18T22:34:13Z

@griwodz Ok using boost1.55 seems to fix that problem for every version.
On the other hand, Cuda 7.0 fails with a bunch of errors

/home/travis/build/alicevision/popsift/src/popsift/sift_pyramid.cu(243) (col. 17): error: calling a __host__ function("std::pow<double, int> ") from a __global__ function("popsift::prep_features") is not allowed
/home/travis/build/alicevision/popsift/src/popsift/sift_pyramid.cu(244) (col. 17): error: calling a __host__ function("std::pow<double, int> ") from a __global__ function("popsift::prep_features") is not allowed
/home/travis/build/alicevision/popsift/src/popsift/sift_pyramid.cu(245) (col. 17): error: calling a __host__ function("std::pow<double, int> ") from a __global__ function("popsift::prep_features") is not allowed

https://travis-ci.org/alicevision/popsift/jobs/212559011#L838

griwodz · 2017-03-19T06:47:23Z

Boost is now required because the library part is multi-threaded to make it asynchronous.

The Boost sync queue could be replaced with explicit mutex/cond, but with boost sync_queue, I avoid my own bugs. However, the sync_queue interface appears unstable between boost versions.

The CUDA 7 problem was unexpected. I'll take a look at prep_features (which is new code).

griwodz · 2017-03-19T07:05:35Z

powf instead of pow fixed CUDA 7 problem for travis

Add backward compatibility and update README

griwodz · 2017-03-19T08:11:55Z

I updated the README.md and added backwards compatibility with the old blocking API.

zvrba · 2017-03-19T11:23:57Z

The boost documentation explicitly says that the sync queue interface is experimental and subject to change.

simogasp · 2017-03-19T13:01:48Z

CMakeLists.txt

+
+find_package(Boost 1.53.0 REQUIRED COMPONENTS system thread)
+if(WIN32)
+ADD_DEFINITIONS("-DBOOST_ALL_NO_LIB")


simogasp · 2017-03-19T13:02:00Z

CMakeLists.txt

+# set(CUDA_NVCC_FLAGS_DEBUG   "${CUDA_NVCC_FLAGS_DEBUG};-G;-g")
+if(APPLE)
+elseif(NOT WIN32)
+set(CUDA_NVCC_FLAGS         "${CUDA_NVCC_FLAGS};-Xcompiler;-rdynamic;-lineinfo")


simogasp · 2017-03-19T13:04:06Z

README.md

@@ -11,7 +11,7 @@ Build

 PopSift has been developed and tested on Linux machines, mostly a variant of Ubuntu, but compiles on MacOSX as well. It comes as a CMake project and requires at least CUDA 7.0. It is known to compile and work with NVidia cards of compute capability 3.0 (including the GT 650M), but the code is developed with the compute capability 5.2 card GTX 980 Ti in mind.


... CMake project and requires at least CUDA 7.0 and Boost >= 1.55.

simogasp · 2017-03-19T13:08:12Z

src/CMakeLists.txt


 # BUILD_INTERFACE allows to include the directory with source only when target is
 # built in the building tree (ie, not from an install location)
 target_include_directories(popsift 
+            PRIVATE ${Boost_INCLUDE_DIRS}


PUBLIC, as popsift.h, which is part of the files of the API, uses it in the #includes.
It is important because the include paths of boost must be propagated when using popsift as third library. If the hosting application does not use boost (hence it has no findpackage(boost) in its cmake) it will only compile in the lucky case of boost being in system directories.

griwodz · 2017-03-19T13:44:26Z

Does anyone have a setup to check quickly if this breaks PopSift in OpenMVG ?

griwodz · 2017-03-19T13:47:03Z

I know that sync_queue and sync_bounded_queue are experimental in Boost. sync_queue API seems stable, sync_bounded_queue does not. I could build my own versions, but it seems wasteful. I need extremely basic producer-consumer functionality.

fabiencastan added the in progress label Mar 18, 2017

simogasp requested changes Mar 18, 2017

View reviewed changes

simogasp added 2 commits March 18, 2017 21:49

[cmake] fix to avoid nvcc complaining about ""

285e9fa

[ci] added required libboost-thread-dev

d12dad3

simogasp approved these changes Mar 18, 2017

View reviewed changes

[ci] test with boost1.55

76b4f92

[cmake] fix optional make for example

282df12

griwodz force-pushed the develop-vartest branch from b3a64c5 to 282df12 Compare March 19, 2017 06:42

[cuda] powf instead of pow for cuda 7

62b5073

Carsten Griwodz added 4 commits March 19, 2017 08:32

improve backware compatibility

fd7785a

updated README

03115ad

add deprecated synchronous api

ad5a748

Merge pull request #23 from alicevision/develop-vartest-bc

56e336e

Add backward compatibility and update README

griwodz self-assigned this Mar 19, 2017

simogasp requested changes Mar 19, 2017

View reviewed changes

simogasp added this to the 2017_03 milestone Mar 19, 2017

simogasp added lead:SRL type:enhancement labels Mar 19, 2017

improved cmakes and readme

02f3f13

simogasp approved these changes Mar 19, 2017

View reviewed changes

griwodz merged commit 36c6f85 into develop Mar 20, 2017

fabiencastan removed the in progress label Mar 20, 2017

simogasp deleted the develop-vartest branch March 20, 2017 11:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support asynchronous SIFT extraction #22

Support asynchronous SIFT extraction #22

griwodz commented Mar 18, 2017 •

edited by simogasp

Loading

simogasp commented Mar 18, 2017

simogasp Mar 18, 2017

griwodz Mar 19, 2017

simogasp Mar 19, 2017

zvrba Mar 19, 2017

simogasp Mar 18, 2017

griwodz Mar 19, 2017

simogasp Mar 19, 2017

zvrba Mar 19, 2017

simogasp commented Mar 18, 2017

simogasp commented Mar 18, 2017

griwodz commented Mar 19, 2017 •

edited

Loading

griwodz commented Mar 19, 2017

griwodz commented Mar 19, 2017

zvrba commented Mar 19, 2017

simogasp Mar 19, 2017

simogasp Mar 19, 2017

simogasp Mar 19, 2017

simogasp Mar 19, 2017

griwodz commented Mar 19, 2017

griwodz commented Mar 19, 2017

		@@ -11,7 +11,7 @@ Build

		PopSift has been developed and tested on Linux machines, mostly a variant of Ubuntu, but compiles on MacOSX as well. It comes as a CMake project and requires at least CUDA 7.0. It is known to compile and work with NVidia cards of compute capability 3.0 (including the GT 650M), but the code is developed with the compute capability 5.2 card GTX 980 Ti in mind.

Support asynchronous SIFT extraction #22

Support asynchronous SIFT extraction #22

Conversation

griwodz commented Mar 18, 2017 • edited by simogasp Loading

simogasp commented Mar 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simogasp commented Mar 18, 2017

simogasp commented Mar 18, 2017

griwodz commented Mar 19, 2017 • edited Loading

griwodz commented Mar 19, 2017

griwodz commented Mar 19, 2017

zvrba commented Mar 19, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

griwodz commented Mar 19, 2017

griwodz commented Mar 19, 2017

griwodz commented Mar 18, 2017 •

edited by simogasp

Loading

griwodz commented Mar 19, 2017 •

edited

Loading