Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix numpy reader crash #4466

Merged
merged 5 commits into from
Nov 29, 2022
Merged

Fix numpy reader crash #4466

merged 5 commits into from
Nov 29, 2022

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Nov 24, 2022

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

The new libcufile doesn't exit cleanly (for some reasons) and has problems with library use count tracking.
This PR does the following:

  • it improves the stub generator to support functions that are redefined with C macros
  • it adds support for cuFileDriverClose_v2
  • it uses library reference counting when the new library is detected - it happens in GDSMem and in NumpyReaderGPU
  • it deletes the pipelines in the test so that the pipeline doesn't linger when the temporary directory with the test data goes out of scope and is deleted

Additional information:

Affected modules and functionalities:

NumpyReaderGPU
dynlink stubs

Key points relevant for the review:

Tests:

GDSMemTest (GTest)
NumpyReader tests (reader/test_numpy.py)

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@@ -16,6 +16,7 @@
#define DALI_IMGCODEC_DECODERS_MEMORY_POOL_H_

#include <atomic>
#include <array>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed to compile with libc++ 12

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
… test.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6591353]: BUILD STARTED

@@ -43,6 +44,12 @@ NumpyReaderGPU::NumpyReaderGPU(const OpSpec& spec)
kmgr_transpose_.Resize<TransposeKernel>(1);
}

NumpyReaderGPU::~NumpyReaderGPU() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved as-is from the header.


void Close() {
if ((fd != -1) && (fdd != -1))
if (cufh) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the real value we're interested in instead of relying on side effects.

@@ -147,6 +147,11 @@ MmapedFileStream::MmapedFileStream(const std::string& path, bool read_ahead) :
DALI_ENFORCE(p_ != nullptr, "Could not open file " + path + ": " + std::strerror(errno));
}

MmapedFileStream::~MmapedFileStream() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from the header.

// set the path to current path
path_ = path;
}

StdCUFileStream::~StdCUFileStream() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from the header.

@@ -29,6 +29,11 @@ StdFileStream::StdFileStream(const std::string& path) : FileStream(path) {
DALI_ENFORCE(fp_ != nullptr, "Could not open file " + path + ": " + std::strerror(errno));
}

StdFileStream::~StdFileStream() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from the header.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6591353]: BUILD FAILED

Comment on lines -100 to -103
with open(cursor.location.file.name, 'r', encoding='latin-1') as file:
start = cursor.extent.start.offset
end = cursor.extent.end.offset
declaration = file.read()[start:end]
Copy link
Contributor Author

@mzient mzient Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copied the declaration from the original file - it didn't work correctly, because there might have been some macros - and they could be used in different order, yielding different results:

#define cuFileDriverClose cuFileDriverClose_v2
CUFileError_t cuFileDriverClose();

or

CUFileError_t cuFileDriverClose_v2();
#define cuFileDriverClose cuFileDriverClose_v2

These would produce the same preprocessed output, but the declaration would be different.

Comment on lines +27 to +30
def function_header(return_type, name, args):
arg_str = ", ".join([f"{arg_type} {arg_name}" for arg_type, arg_name in args])
ret = f"{return_type} {name}({arg_str})"
return ret
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recreate the function header with the actual function name (punching through macros).

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6591414]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6591873]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6591414]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6591873]: BUILD PASSED

std::swap(fd, other.fd);
std::swap(fdd, other.fdd);
std::swap(cufh, other.cufh);
other.Close();
Copy link
Collaborator

@banasraf banasraf Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to explicitly call Close? Wouldn't it just be called when destructing other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read somewhere that implementing move assignment as a pure swap isn't a good practice, because it prolongs the life of the object being overwritten in a surprising way. This is especially important in case of resources like files, which have global side effects. Consider:

File a = File::open("a.dat", "rw");
File b = File::open("b.dat", "rw");

a = std::move(b);  // the user would expect that "a.dat" is closed, but this assignment actually did a swap
File c = File::open("a.dat", "rw");  // file busy, because the moved-out object `b` is keeping the file open.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. That's interesting. I didn't know how the move works, then. I thought b would be destroyed just after evaluation of this assignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We learn something new every day ;) The object still "lives", even though it shouldn't be used for anything other than destruction. I'm not sure if the compiler is allowed to destroy a moved-out local variable before the scope end, but there are certainly cases where you can move-out objects with more permanent storage - so they cannot be destroyed (calling a destructor multiple times is an error). For example, you can loop over an array and move every element from it to some other container and then destroy the array. The destructors will be called when the array is destroyed, not when each object in the array is moved out.

@mzient mzient merged commit c8ecda0 into NVIDIA:main Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants