av.read() exit gracefully on corrupt files? #485

Closed
bjuncek opened this issue Feb 11, 2019 · 9 comments

Comments

@bjuncek

bjuncek commented Feb 11, 2019

I'm trying to load some videos using av.read, and I am aware that the dataset contains corrupt files. Most corrupt files can be filtered out with a simple try/except:

```python
# minimal reproducible example (input_files is a list of video file paths)
import av

for d in input_files[7161:7163]:
    try:
        vr = av.open(d)
    except:
        print("Skipping file {} - might be corrupted".format(d))
```

However, on several files the program crashes with:

```
Invalid NAL unit size (54396 > 13392).
Error splitting the input into NAL units.
Could not find codec parameters for stream 0 (Video: h264 (avc1 / 0x31637661), none, 640x360, 750 kb/s): unspecified pixel format
```

which is not caught by the except clause and instead breaks the program entirely.
Is there a nicer way to exit gracefully, or to skip files that trigger this error?

I could do a binary search to find the videos in question, but the dataset is quite large (2M examples) and it would take hours to do it by hand.

NOTE: trying to catch it with

except av.AVError:

fails as well.
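One way to screen a large dataset without a manual binary search is to probe every file once and record failures instead of stopping at the first one. The sketch below is hypothetical: `probe_with_pyav` and `partition_files` are illustrative names, decoding a single frame is an assumed definition of "readable", and it catches `Exception` broadly because the names of PyAV's exception classes have changed between releases. Files that FFmpeg only logs warnings about, without PyAV raising, would still pass this check.

```python
def probe_with_pyav(path):
    """Open `path` and decode one video frame; raises if either step fails.

    Assumes PyAV is installed and that "readable" means at least one
    decodable frame in the first video stream (a hypothetical criterion).
    """
    import av  # imported lazily so the module loads even when another probe is used

    with av.open(path) as container:
        # next() raises StopIteration if the stream yields no frames
        next(container.decode(video=0))


def partition_files(paths, probe=probe_with_pyav):
    """Split `paths` into (readable, failed) lists using `probe`.

    `failed` holds (path, exception) pairs so problems can be inspected later.
    """
    readable, failed = [], []
    for path in paths:
        try:
            probe(path)
        except Exception as exc:  # broad on purpose; PyAV's error classes vary by version
            failed.append((path, exc))
        else:
            readable.append(path)
    return readable, failed
```

With PyAV installed, `readable, failed = partition_files(input_files)` would return the paths that decode and a list of `(path, exception)` pairs for the rest; passing a different `probe` callable lets the helper be exercised without real video files.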

@mikeboers
Member

Could you clarify what "crash" and "breaks the code" mean? Does the Python interpreter exit, or does Python continue as if there were no error?

@bjuncek
Author

bjuncek commented Feb 14, 2019

There are two scenarios:

  1. If I catch it with a bare except: (no exception type), the code runs (although it still prints the NAL unit error above), but the output is nonsense; when used within a larger framework such as PyTorch, the error is:
     File "av/utils.pyx", line 27, in av.utils.AVError.__init__
     TypeError: __init__() takes at least 3 positional arguments (2 given)
  2. If I try to catch it with av.AVError, it fails immediately with:
     TypeError: __init__() takes at least 3 positional arguments (2 given)

@mikeboers
Member

Interesting. It looks like the AVError construction is wrong. Can you post the full traceback?

@fmassa

fmassa commented Feb 22, 2019

@mikeboers the constructor raises this error because av.AVError expects at least 2 arguments (https://github.com/mikeboers/PyAV/blob/db30136b75e63d97cb8cb1a2e9ea4ef7ff777e2d/av/utils.pyx#L27), while in @bjuncek's case he was using the PyTorch DataLoader, which wraps the exception and re-raises it with a single argument, a string containing the full stack trace (see here and here for reference).
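The failure mode described here can be reproduced without PyAV or PyTorch. In this sketch `StrictError` is a hypothetical stand-in for an exception whose constructor needs two arguments, and `reraise_from_string` only imitates the wrapper pattern (rebuild the exception type from one traceback string); it is not PyTorch's actual code. Because the constructor call itself fails, the exception that escapes is a `TypeError`, which an `except StrictError:` clause cannot catch, while a bare `except:` does.

```python
import traceback


class StrictError(Exception):
    """Hypothetical exception whose constructor requires two arguments."""

    def __init__(self, code, message):
        super().__init__(code, message)
        self.code = code
        self.message = message


def reraise_from_string(exc):
    """Rebuild `exc` from its formatted traceback, passing only one argument.

    Simplified imitation of the wrapper pattern, not PyTorch's actual code.
    """
    msg = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    raise type(exc)(msg)  # for StrictError this constructor call raises TypeError


def reraise_and_classify(exc):
    """Report which except clause ends up handling the re-raised error."""
    try:
        reraise_from_string(exc)
    except StrictError:
        return "caught as StrictError"
    except TypeError as err:
        return "TypeError: {}".format(err)


print(reraise_and_classify(StrictError(1, "Invalid NAL unit size")))
```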

If the AVError constructor accepted a single argument (the message, as the first argument), the errors @bjuncek mentioned would be avoided. However, this might be a backward-compatibility-breaking (BC-breaking) change.
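A minimal sketch of a constructor that accepts both forms, assuming a two-argument `(code, message)` signature like the one linked above. `TolerantError` is hypothetical and is not PyAV's actual class. Existing two-argument callers keep working, while a wrapper that passes only a message no longer triggers a `TypeError`.

```python
class TolerantError(Exception):
    """Hypothetical error accepting either (message) or (code, message)."""

    def __init__(self, code_or_message, message=None):
        if message is None:
            # One-argument form, e.g. a wrapper re-raising with a string.
            self.code = None
            self.message = str(code_or_message)
            super().__init__(self.message)
        else:
            # Original two-argument form keeps its old behaviour.
            self.code = code_or_message
            self.message = message
            super().__init__(code_or_message, message)

    def __str__(self):
        if self.code is None:
            return self.message
        return "[Errno {}] {}".format(self.code, self.message)
```

Because `args` always matches what was passed in, the exception can still be rebuilt as `TolerantError(*args)`, which is what Python's default exception pickling relies on.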

@jlaine
Member

jlaine commented May 5, 2019

I'm not sure I understand how PyTorch factors into this: is PyAV incorrectly constructing the AVError or is PyTorch?

Setting PyTorch aside, could we have a sample file that fails without properly raising an AVError?

@fmassa

fmassa commented May 5, 2019

The problem is in PyTorch: it captures the exception message and re-raises the exception later, passing only a single argument. I don't believe this is really a problem with PyAV, but rather a limitation of how PyTorch assumes that exceptions take their error message as the first argument.

@fmassa

fmassa commented May 5, 2019

Note that my last link above no longer points to the right place; here is the correct line now: https://github.com/pytorch/pytorch/blob/863818e05a80b970e58a9dbce09c114ecbc6879e/torch/utils/data/dataloader.py#L608

@jlaine
Member

jlaine commented May 5, 2019

I'm sorry if I'm a bit slow here, but I don't understand the relation between PyTorch and PyAV: where is the code that uses DataLoader with PyAV?

@mikeboers
Member

I'm not following either, and it has been a long time. Re-open if you can show us what to change in our exceptions.


4 participants