Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support asynchronous SIFT extraction #22

Merged
merged 11 commits into from
Mar 20, 2017
Merged

Support asynchronous SIFT extraction #22

merged 11 commits into from
Mar 20, 2017

Conversation

griwodz
Copy link
Member

@griwodz griwodz commented Mar 18, 2017

This is a big change. It changes how the popsift library should be called, but backward compatibility is mostly in place.

However, the performance and functionality gain a lot.

  1. Run asynchronously, i.e. PopSift is now multithreaded: the caller create a SiftJob and can continue immediately. It does not block until it tries to access the features of the SiftJob.
  2. The "pipe" approach has been removed. Instead, you can pass arbitrary files. If they are bigger, memory will be reallocated (leading to a delay). If they are smaller, only cudaArrays and textures are reallocated, with much smaller delay. Identical sizes require no reallocation.
  3. The default operation resembles VLFeat. However, you can choose:
    3.1. direct-downscaling to all octaves from the input image (with correct handling of initial blur).
    3.2. downscaling according to VLFeat, OpenCV or Celebrandil (9-wide and 15-wide static filters).
    3.3. descriptor extraction modes: "loop" horizontal striding like VLFeat, "iloop" interpolated horizontal striding, "grid" extract a fixed grid directly in the extrema's orientation, "igrid" the same with hardware interpolation, "notile" is similar to grid with fewer reads.
  4. CMakeLists.txt lists all c and h files explicitly.
  5. bugfix in VLFeat-style smoothing of the orientation histogram
  6. conceptual improvement, descriptors are extracted from the refined level instead of level where the extremum was found
  7. some support for CUDA NVTX CPU profiling, as a CMake OPTION
  8. use explicit memory pinning and aligned memory for results to avoid copying
  9. demo application requires libDevIL to test multi-file processing

Tested on the complete "big_set" and on several subsets. Tested with three 1920x1080 images from flickr. Tested with highres CCTag test content that shows only tags (this is horribly slow).

- added alternative grid mode descriptor extraction
- added interpolated grid to descriptor extraction methods
- blur layers now in Array structure using surfaces
- add 0.5 shift to all tex2DLayered calls
- static list of files for CMake
- don't process feature point by level
- use explicit .0f in ec init
- level correction implemented
- __any needs :: namespace on mac
- BUGFIX: very old orientation bug discovered and fixed
- get gradiant from rotated coordinates
- processing directories works - no real pipelining yet
- reallocating memory for several frames
- merged extrema and descriptor arrays for all octaves
- merged extrema and descriptor arrays for all octaves
- extrema/desc single alloc in pyramid class replaces many in octave class
- remove PL variants of descriptor extraction: minimal effect
- closing in on single extremum/ori array
- add option for NVTX profiling
- made extrema counters and pointers global
- Prefix sum: make total available in first warp
- save descriptors for all octaves at once
- print octave in output-descriptor file
- major bug in normf usage
- desc array simplified
- extrema array simplified
- dynamic reallocation of extrema array
- maintain separate feature array for download
- downloading directly into temporarily pinned host mem
- memalign replacement function for mac
- avoid useless host copy for Feature buffer
- renamed device constant orientations to max_orientations
- avoid useless host copy for Extrema buffer
- avoid useless host copy for feat-to-ext mapping
- avoid useless host copy for first-stage extrema
- option to suppress descriptor writing to disk
- re-introduce a separate download stream
- remove dynamic parallelism
- move descriptor normalization to download stream
- slightly more stream synchronization
- new event structure for dog after blurring
- multithreaded processing of image sequences
- mac needs additional boost include line
@simogasp
Copy link
Member

@griwodz is now boost a required dependency of the library? Because in the last PR i moved find boost only for the samples...
You may need to add boost thread to the apt-get of travis in .travis.yaml

CMakeLists.txt Outdated
endif()
# set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG};-G;-g")
if(APPLE)
elseif(NOT WIN32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? question about NOT WIN32 or about commenting -G -g?
The first is an artifact of merging 2 ifs.
The second is because -g is added automatically in Debug mode, so only -G needs to added explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the question was about the elseif. Maybe just change the if() - elseif() into if(NOT WIN32 AND NOT APPLE), unless there is something missing for APPLE to be add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should always include debug symbols. The only difference between Release and Debug builds should be whether optimization is enabled *at some level) or completely disabled. Disk is cheap these days, no need to save space by not generating debug info.

CMakeLists.txt Outdated
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};--keep")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};--source-in-ptx")
endif(APPLE)
set(CUDA_NVCC_FLAGS_RELEASE "${CUDA_NVCC_FLAGS_RELEASE};-O3")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this two cannot go together because of the bug in FindCUDA.cmake as explained in the removed comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, the previous fix (not the new one) stopped the build on my own machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, I just restored the original version :-)
I know that this fix may cause problems for windows and Visual studio, but i hope the windoze experts @zvrba @yann-lty may come up with a proper solution :-)
Maybe just

if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR MSVC)

this way it is set only in debug mode or always for visual studio? I know it is not elegant and all but...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA VS projects generated by CMake are next to useless: they're good if you're going to build the project once, but are useless for development. The project is not "derived" from the CUDA project template, meaning that you can't adjust CUDACC options in the VS GUI. For some reason it also tends to recompile the whole project even if you change just one file. It also tends to not compile things that need to be recompiled, leading to spurious link errors.

For development on PopSift I'd manually create a VS project based on the CUDA template. To just use it as a library, I'd compile it once, install it and forget it.

@simogasp
Copy link
Member

for the failing in travis, I have no idea...
it says /usr/include/boost/thread/sync_bounded_queue.hpp(247): error: too many arguments in function call and actually it's true that in the header there is no function full() taking parameters (http://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp)

  template <typename ValueType>
  bool sync_bounded_queue<ValueType>::full() const
  {
    lock_guard<mutex> lk(mtx_);
    return full(lk);  // <------- THIS LINE (247)
  }

Maybe cuda 8 compiles because somehow nvcc is more loose on that kind of error?
Shall we try a different version of boost? for me it compiles fine with 1.58

@simogasp
Copy link
Member

@griwodz Ok using boost1.55 seems to fix that problem for every version.
On the other hand, Cuda 7.0 fails with a bunch of errors

/home/travis/build/alicevision/popsift/src/popsift/sift_pyramid.cu(243) (col. 17): error: calling a __host__ function("std::pow<double, int> ") from a __global__ function("popsift::prep_features") is not allowed
/home/travis/build/alicevision/popsift/src/popsift/sift_pyramid.cu(244) (col. 17): error: calling a __host__ function("std::pow<double, int> ") from a __global__ function("popsift::prep_features") is not allowed
/home/travis/build/alicevision/popsift/src/popsift/sift_pyramid.cu(245) (col. 17): error: calling a __host__ function("std::pow<double, int> ") from a __global__ function("popsift::prep_features") is not allowed

https://travis-ci.org/alicevision/popsift/jobs/212559011#L838

@griwodz
Copy link
Member Author

griwodz commented Mar 19, 2017

Boost is now required because the library part is multi-threaded to make it asynchronous.

The Boost sync queue could be replaced with explicit mutex/cond, but with boost sync_queue, I avoid my own bugs. However, the sync_queue interface appears unstable between boost versions.

The CUDA 7 problem was unexpected. I'll take a look at prep_features (which is new code).

@griwodz
Copy link
Member Author

griwodz commented Mar 19, 2017

powf instead of pow fixed CUDA 7 problem for travis

@griwodz
Copy link
Member Author

griwodz commented Mar 19, 2017

I updated the README.md and added backwards compatibility with the old blocking API.

@griwodz griwodz self-assigned this Mar 19, 2017
@zvrba
Copy link
Member

zvrba commented Mar 19, 2017

The boost documentation explicitly says that the sync queue interface is experimental and subject to change.

CMakeLists.txt Outdated

find_package(Boost 1.53.0 REQUIRED COMPONENTS system thread)
if(WIN32)
ADD_DEFINITIONS("-DBOOST_ALL_NO_LIB")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

CMakeLists.txt Outdated
# set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG};-G;-g")
if(APPLE)
elseif(NOT WIN32)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};-Xcompiler;-rdynamic;-lineinfo")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

README.md Outdated
@@ -11,7 +11,7 @@ Build

PopSift has been developed and tested on Linux machines, mostly a variant of Ubuntu, but compiles on MacOSX as well. It comes as a CMake project and requires at least CUDA 7.0. It is known to compile and work with NVidia cards of compute capability 3.0 (including the GT 650M), but the code is developed with the compute capability 5.2 card GTX 980 Ti in mind.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... CMake project and requires at least CUDA 7.0 and Boost >= 1.55.


# BUILD_INTERFACE allows to include the directory with source only when target is
# built in the building tree (ie, not from an install location)
target_include_directories(popsift
PRIVATE ${Boost_INCLUDE_DIRS}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PUBLIC, as popsift.h, which is part of the files of the API, uses it in the #includes.
It is important because the include paths of boost must be propagated when using popsift as third library. If the hosting application does not use boost (hence it has no findpackage(boost) in its cmake) it will only compile in the lucky case of boost being in system directories.

@griwodz
Copy link
Member Author

griwodz commented Mar 19, 2017

Does anyone have a setup to check quickly if this breaks PopSift in OpenMVG ?

@griwodz
Copy link
Member Author

griwodz commented Mar 19, 2017

I know that sync_queue and sync_bounded_queue are experimental in Boost. sync_queue API seems stable, sync_bounded_queue does not. I could build my own versions, but it seems wasteful. I need extremely basic producer-consumer functionality.

@griwodz griwodz merged commit 36c6f85 into develop Mar 20, 2017
@simogasp simogasp deleted the develop-vartest branch March 20, 2017 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants