-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mitigate the OS file-max limit in the VideoReader #1659
Conversation
!build |
CI MESSAGE: [1072512]: BUILD STARTED |
CI MESSAGE: [1072512]: BUILD PASSED |
@@ -295,6 +297,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> { | |||
|
|||
volatile bool stop_; | |||
std::vector<file_meta> file_info_; | |||
|
|||
constexpr static size_t MAX_OPEN_FILE = 200; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an educated guess limit, but maybe it should be just 1/5 of /proc/sys/fs/file-max or have some global counter. What if user opens files in the same process from python? Or spawns more than one VideoReader?
Any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can have it as an argument to the reader so that the user has control of it. Also, I'd help with testing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe recently_opened_
and open_files_
should be global to all VideoLoaders - so the limit is global as well. If we have it per operator then I see someone creating multiple VideoReader with the limits that sum up above the OS limit and that leads to a crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand we need to have some locking and this can influence the overall perf.
@@ -295,6 +297,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> { | |||
|
|||
volatile bool stop_; | |||
std::vector<file_meta> file_info_; | |||
|
|||
constexpr static size_t MAX_OPEN_FILE = 200; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can have it as an argument to the reader so that the user has control of it. Also, I'd help with testing this
@@ -117,3 +121,13 @@ def test_multi_gpu_video_pipeline(): | |||
for p in pipes: | |||
p.build() | |||
p.run() | |||
|
|||
# checks if the VideoReader can handle more than OS max open file limit of opened files at once | |||
def test_plenty_video_files(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_plenty_video_files(): | |
def test_plenty_of_video_files(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
# checks if the VideoReader can handle more than OS max open file limit of opened files at once | ||
def test_plenty_video_files(): | ||
# make sure that there is one sequence per video file | ||
pipe = VideoPipe(batch_size=BATCH_SIZE, data=PLENTY_VIDEO_FILES, step=1000000, sequence_length=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you incorporate my suggestion of having a configurable max number of open video files then you could test this easily by setting a small number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. This step
is set to make sure we just read one frame per each video as we just want to traverse through all videos. Having a configurable max number won't help either as the real limit is the OS. All we need to test if the limit set inside DALI is smaller than the OS one it works (and it doesn't matter if it is OS limit - 1 or just 1).
@@ -295,6 +297,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> { | |||
|
|||
volatile bool stop_; | |||
std::vector<file_meta> file_info_; | |||
|
|||
constexpr static size_t MAX_OPEN_FILE = 200; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr static size_t MAX_OPEN_FILE = 200; | |
constexpr static size_t kMaxOpenFiles = 200; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
recently_opened_.push_back(filename); | ||
if (recently_opened_.size() > MAX_OPEN_FILE) { | ||
// close the oldest file | ||
auto& file_to_close = open_files_[recently_opened_.front()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you remove the entry from open_files_
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It is enough to file_to_close.open = false
.
@@ -24,7 +24,8 @@ | |||
#include <limits> | |||
#include <sstream> | |||
|
|||
inline int gcd(int a, int b) { | |||
|
|||
static inline int gcd(int a, int b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? An inline function has internal linkage, no need for static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- introduces LRU to close opened VideoFiles just to prevent hitting the OS file-max limit - when the number of the opened files crosses the threshold the oldest one is closed Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build |
CI MESSAGE: [1078926]: BUILD STARTED |
CI MESSAGE: [1078926]: BUILD FAILED |
if (ret == 0) { | ||
return ftell(file_data->file_stream); | ||
} else { | ||
return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ambiguous - the standard doesn't say that the value returned on error is negative. Perhaps we should throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the standard. It is what FFmpeg expects, and expects some error code which is < 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that fseek
does not guarantee a negative value on error - just nonzero. You probably should return -1 or -abs(err)
to make sure the value is <0.
long int file_position; | ||
std::string filename; | ||
}; | ||
VideoFileData file_desc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: why not VideoFileDesc
or file_data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -295,6 +304,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> { | |||
|
|||
volatile bool stop_; | |||
std::vector<file_meta> file_info_; | |||
|
|||
constexpr static size_t kMaxOpenFiles = 200; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
qa/TL0_videoreader_test/test.sh
Outdated
#create dummy files for max_opened_file_test | ||
for i in {0..1400}; | ||
do | ||
ln -s ../video_files/${split[0]}_0.${split[1]} many_video_files/${split[0]}_$i.${split[1]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create these in /tmp perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
!build |
CI MESSAGE: [1079892]: BUILD STARTED |
CI MESSAGE: [1079892]: BUILD PASSED |
LOG_LINE << "Opening file " << filename << std::endl; | ||
|
||
AVFormatContext* raw_fmt_ctx = nullptr; | ||
int ret = avformat_open_input(&raw_fmt_ctx, filename.c_str(), NULL, NULL); | ||
AVFormatContext* raw_fmt_ctx = avformat_alloc_context(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd create a temporary unique pointer with make_unique_av
here to avoid leaking resources on failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
size_t avio_ctx_buffer_size = 4096; | ||
uint8_t *avio_ctx_buffer = static_cast<uint8_t *>(av_malloc(avio_ctx_buffer_size)); | ||
if (!avio_ctx_buffer) { | ||
DALI_FAIL("Failed to allocate avio_ctx_buffer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raw_fmt_ctx is leaked here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
DALI_FAIL("Failed to allocate avio_alloc_context"); | ||
} | ||
raw_fmt_ctx->pb = avio_ctx; | ||
int ret = avformat_open_input(&raw_fmt_ctx, NULL, NULL, NULL); | ||
if (ret < 0) { | ||
DALI_FAIL(std::string("Could not open file ") + filename + " because of " + av_err2str(ret)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and also here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, avformat_open_input frees avio_ctx_buffer in case of failure.
@@ -73,7 +74,7 @@ std::vector<dali::file_meta> get_file_label_pair(const std::string& path, | |||
} // namespace filesystem | |||
|
|||
struct OpenFile { | |||
bool open = false; | |||
bool loaded = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this new name. The file is not open - sure, but much less is it loaded. Perhaps we could use one of the other fields to determine if the descriptor is empty?
bool empty() const noexcept { return file_desc.filename.empty(); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -73,7 +74,7 @@ std::vector<dali::file_meta> get_file_label_pair(const std::string& path, | |||
} // namespace filesystem | |||
|
|||
struct OpenFile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename this to VideoFile
or similar, since it no longer represents an open file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -95,6 +96,13 @@ struct OpenFile { | |||
AVCodecContext* codec; | |||
#endif | |||
av_unique_ptr<AVFormatContext> fmt_ctx_; | |||
|
|||
struct VideoFileDesc { | |||
FILE *file_stream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding a destructor which would close those streams - and a close
function. It may be either here (in which case I'd suggest making this struct standalone) or in the enclosing class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// open file if needed | ||
if (!file_data->file_stream) { | ||
file_data->file_stream = fopen(file_data->filename.c_str(), "r"); | ||
if (!file_data->file_stream) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a DALI_ENFORCE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
size_t avio_ctx_buffer_size = 4096; | ||
uint8_t *avio_ctx_buffer = static_cast<uint8_t *>(av_malloc(avio_ctx_buffer_size)); | ||
if (!avio_ctx_buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a DALI_ENFORCE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
AVFormatContext* raw_fmt_ctx = nullptr; | ||
int ret = avformat_open_input(&raw_fmt_ctx, filename.c_str(), NULL, NULL); | ||
AVFormatContext* raw_fmt_ctx = avformat_alloc_context(); | ||
if (!raw_fmt_ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a DALI_ENFORCE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
DALI_FAIL("Failed to allocate avio_alloc_context"); | ||
} | ||
raw_fmt_ctx->pb = avio_ctx; | ||
int ret = avformat_open_input(&raw_fmt_ctx, NULL, NULL, NULL); | ||
if (ret < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
!build |
CI MESSAGE: [1087535]: BUILD STARTED |
CI MESSAGE: [1087535]: BUILD PASSED |
struct OpenFile { | ||
bool open = false; | ||
struct VideoFileDesc { | ||
FILE *file_stream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FILE *file_stream; | |
FILE *file_stream = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
!build |
CI MESSAGE: [1087720]: BUILD STARTED |
CI MESSAGE: [1087720]: BUILD PASSED |
// open file if needed | ||
if (!file_data->file_stream) { | ||
file_data->file_stream = fopen(file_data->filename.c_str(), "r"); | ||
DALI_ENFORCE(file_data->file_stream, std::string("Could not open file ") + file_data->filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: you could use make_string("Could not open the file: ", file_data->filename)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
DALI_ENFORCE(raw_fmt_ctx, "Failed to allocate avformat_alloc_context"); | ||
|
||
size_t avio_ctx_buffer_size = 4096; | ||
uint8_t *avio_ctx_buffer = static_cast<uint8_t *>(av_malloc(avio_ctx_buffer_size)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to wrap this one with a unique_ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- uses the custom read-write-seek libav API that allows providing custom accessing function. This function opens the required file on demand. The previously accessed file is closed automatically. - it is faster than calling avformat_open_input after each file reopen - adds test to check if DALI can handle more than 1024 video files at once - moves generated video test files into /tmp/ dir Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build |
CI MESSAGE: [1093065]: BUILD STARTED |
CI MESSAGE: [1093065]: BUILD PASSED |
Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
Uses the custom read-write-seek libav API that allows providing custom accessing function. This function opens the required file on demand. The previously accessed file is closed automatically.
It is faster than calling avformat_open_input after each file reopen
VideoReader
get_or_open_file function
Test is added
NA
Related to #1651
JIRA TASK: []