Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework DeviceGuard to restore original context upon the exit #882

Merged
merged 1 commit into from May 29, 2019
Merged

Rework DeviceGuard to restore original context upon the exit #882

merged 1 commit into from May 29, 2019

Conversation

JanuszL
Copy link
Contributor

@JanuszL JanuszL commented May 14, 2019

  • some libraries are not using PrimiarContext while DALI does. When
    cudaSetDevice is called PrimaryContext is created and is set as
    the current one, the old one is lost while other apps may still need it.
    Adds saving of current context and restores it when DeviceGuard
    is destroyed
  • reduced usage of CUContext as it is not needed in most cases

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [739710]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [739710]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [741836]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [741836]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [742169]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [742169]: BUILD PASSED

@@ -22,7 +22,7 @@ namespace dali {
class CUContext {
public:
CUContext();
explicit CUContext(CUdevice device, unsigned int flags = 0);
explicit CUContext(int device_id_, unsigned int flags = 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it. You're mixing runtime and driver APIs at CUContext API level. It's very likely to cause trouble unless CUdevice and device ordinal from CUDA runtime are the same (I don't think they are).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing CUContext as it is no longer needed.

@JanuszL
Copy link
Contributor Author

JanuszL commented May 24, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [759101]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [759101]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [759125]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [759125]: BUILD PASSED

}
CUDA_CALL(cudaEventDestroy(master_event_));
NVJPEG_CALL(nvjpegDestroy(handle_));
} catch (...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything we throw inherits from std::exception, so we can leverage it and print some meaningful diagnostic before aborting.

Suggested change
} catch (...) {
} catch (const std::exception &e) {
std::cerr << "Fatal error: exception in ~nvJPEGDecoder():\n" << e.what() << std::endl;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
#pragma GCC diagnostic pop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1886,9 +1886,9 @@ extern tcuMemcpy *cuMemcpy;
extern tcuMemcpyPeer *cuMemcpyPeer;
#endif

extern tcuCtxCreate *cuCtxCreate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move it to where other Ctx functions are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@JanuszL
Copy link
Contributor Author

JanuszL commented May 28, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [761644]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [761644]: BUILD PASSED

int count = 1;

cuInitChecked();
cudaGetDeviceCount(&count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be beneficial to error check all the calls as well?
CUDA_CALL(cuda*) and CUDA_CALL(cu*)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

CUDA_CALL(cudaGetDevice(&original_device_));
}
DeviceGuard();

/// @brief Saves current device id, sets a new one and switches back
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should write here that it's supposed to be no-op for -1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@klecki klecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update docs/comments, otherwise looks ok.

@JanuszL JanuszL requested a review from klecki May 28, 2019 16:04
@JanuszL
Copy link
Contributor Author

JanuszL commented May 29, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [762911]: BUILD STARTED

- some libraries are not using PrimiarContext while DALI does. When
  cudaSetDevice is called PrimaryContext is created and is set as
  the current one, the old one is lost while other apps may still need it.
  Adds saving of current context and restores it when DeviceGuard
  is destroyed
- removed CUContext as it is not needed and can be replaced by the DeviceGuard

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Copy link
Contributor Author

JanuszL commented May 29, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [762930]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [762911]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [762930]: BUILD PASSED

@JanuszL JanuszL merged commit f7eb22b into NVIDIA:master May 29, 2019
@JanuszL JanuszL deleted the rework_guard_next branch May 29, 2019 09:26
haoxintong pushed a commit to haoxintong/DALI that referenced this pull request Jul 16, 2019
)

- some libraries are not using PrimiarContext while DALI does. When
  cudaSetDevice is called PrimaryContext is created and is set as
  the current one, the old one is lost while other apps may still need it.
  Adds saving of current context and restores it when DeviceGuard
  is destroyed
- removed CUContext as it is not needed and can be replaced by the DeviceGuard

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants