
add support for device selection and multiple GPUs #121

Merged: 2 commits into alicevision:develop on Jan 5, 2021

Conversation

@mitjap (Contributor) commented Nov 20, 2020

Description

This pull request enables the user to select which GPU the algorithm runs on, and makes it possible to run on multiple GPUs.

Feature list

  • Select CUDA device
  • Ability to run on multiple CUDA devices

Implementation remarks

To run on multiple GPUs, this implementation requires multiple PopSift instances. The main issue was that the algorithm uses global state held in extern variables. I made those thread_local, which lets each thread keep its own value for its specific device (see the sketch below).
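
To make the pattern concrete, here is a minimal sketch of the thread_local change; DeviceState, g_state, and initForDevice are hypothetical stand-ins, not the actual PopSift code:

#include <cuda_runtime.h>

// Hypothetical stand-in for globals such as hct/hbuf discussed below.
struct DeviceState {
    int   device  = -1;
    void* scratch = nullptr;
};

// header:   extern thread_local DeviceState g_state;
// one file: thread_local DeviceState g_state;
thread_local DeviceState g_state;   // one copy per host thread

void initForDevice( int device )
{
    cudaSetDevice( device );                  // bind this host thread to its GPU
    g_state.device = device;                  // per-thread value, not process-wide
    cudaMalloc( &g_state.scratch, 1 << 20 );  // allocated on the bound device
}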

I have not tested matching (ProcessingMode == MatchingMode).

@mitjap (Contributor, Author) commented Nov 20, 2020

In case you decide not to accept this PR, you should at least fix this very minor memory leak:

cudaFree( _d_extrema_num_blocks );
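
For context, a sketch of the pairing this fix restores; the allocation site and num_blocks are hypothetical, only the _d_extrema_num_blocks name comes from the code:

// allocation path: every cudaMalloc needs a matching cudaFree
cudaMalloc( &_d_extrema_num_blocks, num_blocks * sizeof(int) );

// teardown path: the free that was missing
cudaFree( _d_extrema_num_blocks );
_d_extrema_num_blocks = nullptr;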

@griwodz (Member) left a comment:

I am not certain whether the thread_local can fail for anybody. It probably cannot fail for hct, hbuf and so on, because those are only used in a thread spawned by PopSift.

I understand that the thread_local forces you to init the filter and the configuration in the extraction/match threads. That leads to more frequent configuration calls; is that problematic?

@@ -313,14 +339,21 @@ void PopSift::extractDownloadLoop( )

job->setFeatures( features );
}

private_unit();
@griwodz (Member):

Do I understand correctly that you want to delete the Pyramid every time you have downloaded the features? That would crash if several images had been queued for feature extraction, wouldn't it?

@mitjap (Contributor, Author) Nov 20, 2020:

The Pyramid is deleted only after the pipeline is stopped via the PopSift::uninit function. Note that private_unit (this is actually a typo and should be private_uninit) is called outside the while loop.

@griwodz (Member):

Shouldn't you also add private_uninit to the matchPrepareLoop?

@mitjap (Contributor, Author):

Yes, you are right.

{
cudaSetDevice(_device);
@griwodz (Member):

This looks like a good thing to do. Perhaps with an error check for when a user chooses a non-existent device?

@mitjap (Contributor, Author):

I agree. Do you think using device_prop_t::set(int, bool) would be good? Another solution would be a manual check with

POP_CUDA_FATAL_TEST( cudaSetDevice( currentDevice ), "Cannot set device" );

or maybe just POP_CHK.
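
For illustration, a minimal sketch of such a manual check using only plain CUDA runtime calls; the fprintf/exit error handling merely stands in for PopSift's POP_* macros, whose exact behavior is not shown in this thread:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

static void setDeviceChecked( int device )
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount( &count );
    if( err != cudaSuccess || device < 0 || device >= count ) {
        fprintf( stderr, "Cannot set device %d (found %d devices)\n", device, count );
        exit( EXIT_FAILURE );
    }
    err = cudaSetDevice( device );
    if( err != cudaSuccess ) {
        fprintf( stderr, "cudaSetDevice(%d) failed: %s\n", device, cudaGetErrorString( err ) );
        exit( EXIT_FAILURE );
    }
}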

@mitjap (Contributor, Author) commented Nov 20, 2020

> I understand that the thread_local forces you to init the filter and the configuration in the extraction/match threads. That leads to more frequent configuration calls; is that problematic?

The filter function checks whether the configuration differs, so I don't expect it to call any CUDA functions.

The second applyConfiguration(), which is inside the while loop, is there to support the following usage:

PopSift sift(PopSift::ByteImages, device); // initializes with default configuration in spawned thread
sift.configure(config); // this configuration is applied when first image is enqueued.
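
Extending that usage, a sketch of how two instances would target two GPUs under this PR; the constructor form follows the snippet above, everything else is assumed:

PopSift sift0( PopSift::ByteImages, /*device=*/0 );
PopSift sift1( PopSift::ByteImages, /*device=*/1 );

sift0.configure( config );  // applied when the first image is enqueued
sift1.configure( config );

// Each instance spawns its own pipeline thread; the thread_local
// globals give that thread its own state, bound to its own device.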

@mitjap (Contributor, Author) commented Nov 20, 2020

> I am not certain whether the thread_local can fail for anybody. It probably cannot fail for hct, hbuf and so on, because those are only used in a thread spawned by PopSift.

I'm sorry, I don't quite understand what you are trying to say here.

@mitjap (Contributor, Author) commented Nov 20, 2020

For a more robust interface, maybe there should be another call to cudaSetDevice(_device) in the PopSift::uninit function, so that we make sure device images are properly deleted.
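
A hypothetical sketch of that suggestion (the real body and signature of uninit are not shown in this thread):

void PopSift::uninit( )
{
    cudaSetDevice( _device );  // re-bind so frees target the GPU that owns the memory
    // ... existing teardown that releases device images ...
}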

@mitjap requested a review from griwodz, November 24, 2020 10:42
@griwodz (Member) left a comment:

I'm sorry that it took me so long before I could review the code. I hope that your improvements have been working well for you.

I have two remaining change requests before approving:
(1) I think that private_unit() should also be called at the end of matchPrepareLoop()
(2) private_uninit would be a better name than private_unit

@simogasp (Member):

@mitjap, if you can, please also update CHANGES.md under v1.0.0 and add one line describing the content of this PR.

@griwodz (Member) commented Jan 4, 2021

@simogasp Should I merge this PR and follow up with the additional fixes? I can't push to the original branch.

@mitjap (Contributor, Author) commented Jan 4, 2021

If by "pushing to the original branch" you mean my branch, I think I have enabled pushing for PopSift maintainers. I can make the code changes as requested, but at the moment I don't have the time to test them properly.

@griwodz (Member) commented Jan 4, 2021

@mitjap Thanks, I'll give it another try tomorrow. I had cloned your repo and tried to push the uninit fix for the match loop, but it didn't work; I may have made some other mistake.

@mitjap (Contributor, Author) commented Jan 4, 2021

I made the requested changes to the code.

@griwodz (Member) left a comment:

Thank you for the fixes!

@griwodz merged commit 5bbd332 into alicevision:develop, Jan 5, 2021
@mitjap deleted the multi_gpu branch, January 5, 2021 13:49
@simogasp modified the milestones: v1.0.0, v0.9.1, Mar 25, 2023
Labels: cuda (issues related to cuda versions), type:enhancement