Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mythtranscode: Fix #12602 by detecting write errors #163

Closed
wants to merge 8 commits into from

Conversation

twitham1
Copy link
Contributor

Fix for https://code.mythtv.org/trac/ticket/12602

Explanation of the patch:

  • multiplex.h:
    the mx multiplex structure gets a new "int error", the error count.
    write_out_packs and finish_mpg now return int, the error count (so 0 = success)
    currently unused: callers test the error count instead
  • multiplex.c:
    if write() fails, increment error count
    log only first 10 failures so we don't flood syslog and cause it to drop messages
    if close() fails, increment error count
  • mpeg2fix.cpp:
    copy error count to static variable and pass it to pthread_exit
    capture error count in pthread_join and log/fail if non-zero

This all causes mythtv to keep the original recording if disk fills during transcode, instead of truncating the recording. Of course user will need to go free space larger than this recording for a successful transcode.

In mpeg2fix.cpp there is also a commented section where I tried to exit on first write failure. This just deadlocked the other thread. It might be possible to cancel the other thread if you wish to abort at first write error but I was unsure how to do this. As-is, we just continue to fail and return the failure count when the reader is done. I monitored with top and the size doesn't grow so I assume the data is just being dropped rather than filling memory, which is a good thing.

@twitham1 twitham1 changed the title Fix #12602 by detecting write errors mythtranscode: Fix #12602 by detecting write errors Mar 28, 2018
@twitham1
Copy link
Contributor Author

See also #164 for equivalent pull request against master.

@@ -295,7 +295,11 @@ static void writeout_video(multiplex_t *mx)
//estimate next pts based on bitrate of this stream and data written
viu->dts = uptsdiff(viu->dts + ((nlength*viu->ptsrate)>>8), 0);

write(mx->fd_out, outbuf, written);
if (write(mx->fd_out, outbuf, written) != written) {
if (mx->error++ < 10) /* log only first few failures */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere you are increasing error count, you should do so ensuring there is no chance of wraparound in the 31 bits of the positive 32-bit integer return value. A very large file is not out of the question here, and we could get a very large number of errors should the disk be full, for instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have seen large error count in my log while testing, I think this was over 12GB of failed writes:

E MPEG2Replex mpeg2fix.cpp:504 (WaitBuffers) thread finished with 3576138 write errors

But the error count doesn't matter since in the end we bail out on any amount of errors. The odds of it rolling over to exactly zero are very small - only zero error count will keep the truncated file. If it rolls past zero, then we just log 10 more errors and don't log the actual much larger error count. Maybe simpler is just boolean mx->error = 1: we either have errors or we don't.

@@ -637,9 +641,16 @@ void finish_mpg(multiplex_t *mx)
if (mx->otype == REPLEX_MPEG2)
write(mx->fd_out, mpeg_end,4);

if (close(mx->fd_out) < 0) {
mx->error++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another spot to protect from overflow

@twitham1
Copy link
Contributor Author

twitham1 commented Mar 30, 2018

Ok, these last 2 tweaks skip 0 so error count is always true, even if it rolled over. And we succeed only on 0 and fail on any other error count.

My preference would have been to just exit on first write failure so we don't waste time with more failed writes. But when I tried this, I just deadlocked the reader thread. Pthread expert might be able to abort both threads properly. But until then, this resolves bug #12602.

#164 is available for same changes against master. After adjusting the relocated replex to external/replex, the same changes applied with no conflicts.

Thanks for your consideration!

@twitham1
Copy link
Contributor Author

twitham1 commented Jul 8, 2020

I don't use 0.27 anymore, so we can drop this and just go with #164 on master for future versions instead.

@twitham1 twitham1 closed this Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants