-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support asynchronous SIFT extraction #22
Conversation
- added alternative grid mode descriptor extraction - added interpolated grid to descriptor extraction methods - blur layers now in Array structure using surfaces - add 0.5 shift to all tex2DLayered calls - static list of files for CMake - don't process feature point by level - use explicit .0f in ec init - level correction implemented - __any needs :: namespace on mac - BUGFIX: very old orientation bug discovered and fixed - get gradiant from rotated coordinates - processing directories works - no real pipelining yet - reallocating memory for several frames - merged extrema and descriptor arrays for all octaves - merged extrema and descriptor arrays for all octaves - extrema/desc single alloc in pyramid class replaces many in octave class - remove PL variants of descriptor extraction: minimal effect - closing in on single extremum/ori array - add option for NVTX profiling - made extrema counters and pointers global - Prefix sum: make total available in first warp - save descriptors for all octaves at once - print octave in output-descriptor file - major bug in normf usage - desc array simplified - extrema array simplified - dynamic reallocation of extrema array - maintain separate feature array for download - downloading directly into temporarily pinned host mem - memalign replacement function for mac - avoid useless host copy for Feature buffer - renamed device constant orientations to max_orientations - avoid useless host copy for Extrema buffer - avoid useless host copy for feat-to-ext mapping - avoid useless host copy for first-stage extrema - option to suppress descriptor writing to disk - re-introduce a separate download stream - remove dynamic parallelism - move descriptor normalization to download stream - slightly more stream synchronization - new event structure for dog after blurring - multithreaded processing of image sequences - mac needs additional boost include line
@griwodz is now boost a required dependency of the library? Because in the last PR i moved find boost only for the samples... |
CMakeLists.txt
Outdated
endif() | ||
# set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG};-G;-g") | ||
if(APPLE) | ||
elseif(NOT WIN32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? question about NOT WIN32 or about commenting -G -g?
The first is an artifact of merging 2 ifs.
The second is because -g is added automatically in Debug mode, so only -G needs to added explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the question was about the elseif
. Maybe just change the if() - elseif() into if(NOT WIN32 AND NOT APPLE)
, unless there is something missing for APPLE to be add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we should always include debug symbols. The only difference between Release and Debug builds should be whether optimization is enabled *at some level) or completely disabled. Disk is cheap these days, no need to save space by not generating debug info.
CMakeLists.txt
Outdated
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};--keep") | ||
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};--source-in-ptx") | ||
endif(APPLE) | ||
set(CUDA_NVCC_FLAGS_RELEASE "${CUDA_NVCC_FLAGS_RELEASE};-O3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this two cannot go together because of the bug in FindCUDA.cmake as explained in the removed comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, the previous fix (not the new one) stopped the build on my own machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange, I just restored the original version :-)
I know that this fix may cause problems for windows and Visual studio, but i hope the windoze experts @zvrba @yann-lty may come up with a proper solution :-)
Maybe just
if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR MSVC)
this way it is set only in debug mode or always for visual studio? I know it is not elegant and all but...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUDA VS projects generated by CMake are next to useless: they're good if you're going to build the project once, but are useless for development. The project is not "derived" from the CUDA project template, meaning that you can't adjust CUDACC options in the VS GUI. For some reason it also tends to recompile the whole project even if you change just one file. It also tends to not compile things that need to be recompiled, leading to spurious link errors.
For development on PopSift I'd manually create a VS project based on the CUDA template. To just use it as a library, I'd compile it once, install it and forget it.
for the failing in travis, I have no idea... template <typename ValueType>
bool sync_bounded_queue<ValueType>::full() const
{
lock_guard<mutex> lk(mtx_);
return full(lk); // <------- THIS LINE (247)
} Maybe cuda 8 compiles because somehow nvcc is more loose on that kind of error? |
@griwodz Ok using boost1.55 seems to fix that problem for every version.
https://travis-ci.org/alicevision/popsift/jobs/212559011#L838 |
b3a64c5
to
282df12
Compare
Boost is now required because the library part is multi-threaded to make it asynchronous. The Boost sync queue could be replaced with explicit mutex/cond, but with boost sync_queue, I avoid my own bugs. However, the sync_queue interface appears unstable between boost versions. The CUDA 7 problem was unexpected. I'll take a look at prep_features (which is new code). |
powf instead of pow fixed CUDA 7 problem for travis |
Add backward compatibility and update README
I updated the README.md and added backwards compatibility with the old blocking API. |
The boost documentation explicitly says that the sync queue interface is experimental and subject to change. |
CMakeLists.txt
Outdated
|
||
find_package(Boost 1.53.0 REQUIRED COMPONENTS system thread) | ||
if(WIN32) | ||
ADD_DEFINITIONS("-DBOOST_ALL_NO_LIB") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
CMakeLists.txt
Outdated
# set(CUDA_NVCC_FLAGS_DEBUG "${CUDA_NVCC_FLAGS_DEBUG};-G;-g") | ||
if(APPLE) | ||
elseif(NOT WIN32) | ||
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};-Xcompiler;-rdynamic;-lineinfo") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
README.md
Outdated
@@ -11,7 +11,7 @@ Build | |||
|
|||
PopSift has been developed and tested on Linux machines, mostly a variant of Ubuntu, but compiles on MacOSX as well. It comes as a CMake project and requires at least CUDA 7.0. It is known to compile and work with NVidia cards of compute capability 3.0 (including the GT 650M), but the code is developed with the compute capability 5.2 card GTX 980 Ti in mind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... CMake project and requires at least CUDA 7.0
and Boost >= 1.55.
src/CMakeLists.txt
Outdated
|
||
# BUILD_INTERFACE allows to include the directory with source only when target is | ||
# built in the building tree (ie, not from an install location) | ||
target_include_directories(popsift | ||
PRIVATE ${Boost_INCLUDE_DIRS} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PUBLIC
, as popsift.h
, which is part of the files of the API, uses it in the #includes
.
It is important because the include paths of boost must be propagated when using popsift as third library. If the hosting application does not use boost (hence it has no findpackage(boost) in its cmake) it will only compile in the lucky case of boost being in system directories.
Does anyone have a setup to check quickly if this breaks PopSift in OpenMVG ? |
I know that sync_queue and sync_bounded_queue are experimental in Boost. sync_queue API seems stable, sync_bounded_queue does not. I could build my own versions, but it seems wasteful. I need extremely basic producer-consumer functionality. |
This is a big change. It changes how the popsift library should be called, but backward compatibility is mostly in place.
However, the performance and functionality gain a lot.
3.1. direct-downscaling to all octaves from the input image (with correct handling of initial blur).
3.2. downscaling according to VLFeat, OpenCV or Celebrandil (9-wide and 15-wide static filters).
3.3. descriptor extraction modes: "loop" horizontal striding like VLFeat, "iloop" interpolated horizontal striding, "grid" extract a fixed grid directly in the extrema's orientation, "igrid" the same with hardware interpolation, "notile" is similar to grid with fewer reads.
Tested on the complete "big_set" and on several subsets. Tested with three 1920x1080 images from flickr. Tested with highres CCTag test content that shows only tags (this is horribly slow).