Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate the OS file-max limit in the VideoReader #1659

Merged
merged 2 commits into from
Jan 27, 2020
Merged

Mitigate the OS file-max limit in the VideoReader #1659

merged 2 commits into from
Jan 27, 2020

Conversation

JanuszL
Copy link
Contributor

@JanuszL JanuszL commented Jan 14, 2020

  • uses the custom read-write-seek libav API that allows providing custom accessing function. This function opens the required file on demand. The previously accessed file is closed automatically.
  • it is faster than calling avformat_open_input after each file reopen
  • add test to check if DALI can handle more than 1024 video files at once

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why we need this PR?

Pick one, remove the rest

  • It adds new feature needed because of VideoReader can easily hit the OS file-max limit of opened files (usually 1024)

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    Uses the custom read-write-seek libav API that allows providing custom accessing function. This function opens the required file on demand. The previously accessed file is closed automatically.
    It is faster than calling avformat_open_input after each file reopen
  • Affected modules and functionalities:
    VideoReader
  • Key points relevant for the review:
    get_or_open_file function
  • Validation and testing:
    Test is added
  • Documentation (including examples):
    NA

Related to #1651

JIRA TASK: []

@JanuszL
Copy link
Contributor Author

JanuszL commented Jan 14, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1072512]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1072512]: BUILD PASSED

@@ -295,6 +297,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> {

volatile bool stop_;
std::vector<file_meta> file_info_;

constexpr static size_t MAX_OPEN_FILE = 200;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an educated guess limit, but maybe it should be just 1/5 of /proc/sys/fs/file-max or have some global counter. What if user opens files in the same process from python? Or spawns more than one VideoReader?
Any suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can have it as an argument to the reader so that the user has control of it. Also, I'd help with testing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe recently_opened_ and open_files_ should be global to all VideoLoaders - so the limit is global as well. If we have it per operator then I see someone creating multiple VideoReader with the limits that sum up above the OS limit and that leads to a crash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand we need to have some locking and this can influence the overall perf.

@@ -295,6 +297,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> {

volatile bool stop_;
std::vector<file_meta> file_info_;

constexpr static size_t MAX_OPEN_FILE = 200;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can have it as an argument to the reader so that the user has control of it. Also, I'd help with testing this

@@ -117,3 +121,13 @@ def test_multi_gpu_video_pipeline():
for p in pipes:
p.build()
p.run()

# checks if the VideoReader can handle more than OS max open file limit of opened files at once
def test_plenty_video_files():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_plenty_video_files():
def test_plenty_of_video_files():

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# checks if the VideoReader can handle more than OS max open file limit of opened files at once
def test_plenty_video_files():
# make sure that there is one sequence per video file
pipe = VideoPipe(batch_size=BATCH_SIZE, data=PLENTY_VIDEO_FILES, step=1000000, sequence_length=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you incorporate my suggestion of having a configurable max number of open video files then you could test this easily by setting a small number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. This step is set to make sure we just read one frame per each video as we just want to traverse through all videos. Having a configurable max number won't help either as the real limit is the OS. All we need to test if the limit set inside DALI is smaller than the OS one it works (and it doesn't matter if it is OS limit - 1 or just 1).

@@ -295,6 +297,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> {

volatile bool stop_;
std::vector<file_meta> file_info_;

constexpr static size_t MAX_OPEN_FILE = 200;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr static size_t MAX_OPEN_FILE = 200;
constexpr static size_t kMaxOpenFiles = 200;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

recently_opened_.push_back(filename);
if (recently_opened_.size() > MAX_OPEN_FILE) {
// close the oldest file
auto& file_to_close = open_files_[recently_opened_.front()];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you remove the entry from open_files_ as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It is enough to file_to_close.open = false.

@@ -24,7 +24,8 @@
#include <limits>
#include <sstream>

inline int gcd(int a, int b) {

static inline int gcd(int a, int b) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? An inline function has internal linkage, no need for static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- introduces LRU to close opened VideoFiles just to prevent
  hitting the OS file-max limit
- when the number of the opened files crosses the threshold the
  oldest one is closed

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Copy link
Contributor Author

JanuszL commented Jan 17, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1078926]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1078926]: BUILD FAILED

if (ret == 0) {
return ftell(file_data->file_stream);
} else {
return ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ambiguous - the standard doesn't say that the value returned on error is negative. Perhaps we should throw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the standard. It is what FFmpeg expects, and expects some error code which is < 0.

Copy link
Contributor

@mzient mzient Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that fseek does not guarantee a negative value on error - just nonzero. You probably should return -1 or -abs(err) to make sure the value is <0.

long int file_position;
std::string filename;
};
VideoFileData file_desc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: why not VideoFileDesc or file_data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -295,6 +304,8 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> {

volatile bool stop_;
std::vector<file_meta> file_info_;

constexpr static size_t kMaxOpenFiles = 200;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#create dummy files for max_opened_file_test
for i in {0..1400};
do
ln -s ../video_files/${split[0]}_0.${split[1]} many_video_files/${split[0]}_$i.${split[1]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create these in /tmp perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@JanuszL
Copy link
Contributor Author

JanuszL commented Jan 18, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1079892]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1079892]: BUILD PASSED

LOG_LINE << "Opening file " << filename << std::endl;

AVFormatContext* raw_fmt_ctx = nullptr;
int ret = avformat_open_input(&raw_fmt_ctx, filename.c_str(), NULL, NULL);
AVFormatContext* raw_fmt_ctx = avformat_alloc_context();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd create a temporary unique pointer with make_unique_av here to avoid leaking resources on failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

size_t avio_ctx_buffer_size = 4096;
uint8_t *avio_ctx_buffer = static_cast<uint8_t *>(av_malloc(avio_ctx_buffer_size));
if (!avio_ctx_buffer) {
DALI_FAIL("Failed to allocate avio_ctx_buffer");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_fmt_ctx is leaked here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

DALI_FAIL("Failed to allocate avio_alloc_context");
}
raw_fmt_ctx->pb = avio_ctx;
int ret = avformat_open_input(&raw_fmt_ctx, NULL, NULL, NULL);
if (ret < 0) {
DALI_FAIL(std::string("Could not open file ") + filename + " because of " + av_err2str(ret));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and also here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, avformat_open_input frees avio_ctx_buffer in case of failure.

@@ -73,7 +74,7 @@ std::vector<dali::file_meta> get_file_label_pair(const std::string& path,
} // namespace filesystem

struct OpenFile {
bool open = false;
bool loaded = false;
Copy link
Contributor

@mzient mzient Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this new name. The file is not open - sure, but much less is it loaded. Perhaps we could use one of the other fields to determine if the descriptor is empty?

bool empty() const noexcept { return file_desc.filename.empty(); }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -73,7 +74,7 @@ std::vector<dali::file_meta> get_file_label_pair(const std::string& path,
} // namespace filesystem

struct OpenFile {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this to VideoFile or similar, since it no longer represents an open file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -95,6 +96,13 @@ struct OpenFile {
AVCodecContext* codec;
#endif
av_unique_ptr<AVFormatContext> fmt_ctx_;

struct VideoFileDesc {
FILE *file_stream;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a destructor which would close those streams - and a close function. It may be either here (in which case I'd suggest making this struct standalone) or in the enclosing class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

dali/operators/reader/loader/video_loader.cc Outdated Show resolved Hide resolved
// open file if needed
if (!file_data->file_stream) {
file_data->file_stream = fopen(file_data->filename.c_str(), "r");
if (!file_data->file_stream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a DALI_ENFORCE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
size_t avio_ctx_buffer_size = 4096;
uint8_t *avio_ctx_buffer = static_cast<uint8_t *>(av_malloc(avio_ctx_buffer_size));
if (!avio_ctx_buffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a DALI_ENFORCE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

AVFormatContext* raw_fmt_ctx = nullptr;
int ret = avformat_open_input(&raw_fmt_ctx, filename.c_str(), NULL, NULL);
AVFormatContext* raw_fmt_ctx = avformat_alloc_context();
if (!raw_fmt_ctx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a DALI_ENFORCE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

DALI_FAIL("Failed to allocate avio_alloc_context");
}
raw_fmt_ctx->pb = avio_ctx;
int ret = avformat_open_input(&raw_fmt_ctx, NULL, NULL, NULL);
if (ret < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jantonguirao jantonguirao self-requested a review January 23, 2020 10:01
@JanuszL
Copy link
Contributor Author

JanuszL commented Jan 23, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1087535]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1087535]: BUILD PASSED

struct OpenFile {
bool open = false;
struct VideoFileDesc {
FILE *file_stream;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FILE *file_stream;
FILE *file_stream = nullptr;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@JanuszL
Copy link
Contributor Author

JanuszL commented Jan 23, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1087720]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1087720]: BUILD PASSED

// open file if needed
if (!file_data->file_stream) {
file_data->file_stream = fopen(file_data->filename.c_str(), "r");
DALI_ENFORCE(file_data->file_stream, std::string("Could not open file ") + file_data->filename);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: you could use make_string("Could not open the file: ", file_data->filename)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

DALI_ENFORCE(raw_fmt_ctx, "Failed to allocate avformat_alloc_context");

size_t avio_ctx_buffer_size = 4096;
uint8_t *avio_ctx_buffer = static_cast<uint8_t *>(av_malloc(avio_ctx_buffer_size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to wrap this one with a unique_ptr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- uses the custom read-write-seek libav API that allows providing
  custom accessing function. This function opens the required file on
  demand. The previously accessed file is closed automatically.
- it is faster than calling avformat_open_input after each file reopen
- adds test to check if DALI can handle more than 1024 video files at once
- moves generated video test files into /tmp/ dir

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Copy link
Contributor Author

JanuszL commented Jan 27, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1093065]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1093065]: BUILD PASSED

@JanuszL JanuszL merged commit 2b06c0f into NVIDIA:master Jan 27, 2020
@JanuszL JanuszL deleted the mitigate_vid_file_limit branch January 27, 2020 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants