-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mythtranscode: Fix #12602 by detecting write errors #163
Conversation
See also #164 for equivalent pull request against master. |
@@ -295,7 +295,11 @@ static void writeout_video(multiplex_t *mx) | |||
//estimate next pts based on bitrate of this stream and data written | |||
viu->dts = uptsdiff(viu->dts + ((nlength*viu->ptsrate)>>8), 0); | |||
|
|||
write(mx->fd_out, outbuf, written); | |||
if (write(mx->fd_out, outbuf, written) != written) { | |||
if (mx->error++ < 10) /* log only first few failures */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everywhere you are increasing error count, you should do so ensuring there is no chance of wraparound in the 31 bits of the positive 32-bit integer return value. A very large file is not out of the question here, and we could get a very large number of errors should the disk be full, for instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have seen large error count in my log while testing, I think this was over 12GB of failed writes:
E MPEG2Replex mpeg2fix.cpp:504 (WaitBuffers) thread finished with 3576138 write errors
But the error count doesn't matter since in the end we bail out on any amount of errors. The odds of it rolling over to exactly zero are very small - only zero error count will keep the truncated file. If it rolls past zero, then we just log 10 more errors and don't log the actual much larger error count. Maybe simpler is just boolean mx->error = 1: we either have errors or we don't.
@@ -637,9 +641,16 @@ void finish_mpg(multiplex_t *mx) | |||
if (mx->otype == REPLEX_MPEG2) | |||
write(mx->fd_out, mpeg_end,4); | |||
|
|||
if (close(mx->fd_out) < 0) { | |||
mx->error++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another spot to protect from overflow
Ok, these last 2 tweaks skip 0 so error count is always true, even if it rolled over. And we succeed only on 0 and fail on any other error count. My preference would have been to just exit on first write failure so we don't waste time with more failed writes. But when I tried this, I just deadlocked the reader thread. Pthread expert might be able to abort both threads properly. But until then, this resolves bug #12602. #164 is available for same changes against master. After adjusting the relocated replex to external/replex, the same changes applied with no conflicts. Thanks for your consideration! |
I don't use 0.27 anymore, so we can drop this and just go with #164 on master for future versions instead. |
Fix for https://code.mythtv.org/trac/ticket/12602
Explanation of the patch:
the mx multiplex structure gets a new "int error", the error count.
write_out_packs and finish_mpg now return int, the error count (so 0 = success)
currently unused: callers test the error count instead
if write() fails, increment error count
log only first 10 failures so we don't flood syslog and cause it to drop messages
if close() fails, increment error count
copy error count to static variable and pass it to pthread_exit
capture error count in pthread_join and log/fail if non-zero
This all causes mythtv to keep the original recording if disk fills during transcode, instead of truncating the recording. Of course user will need to go free space larger than this recording for a successful transcode.
In mpeg2fix.cpp there is also a commented section where I tried to exit on first write failure. This just deadlocked the other thread. It might be possible to cancel the other thread if you wish to abort at first write error but I was unsure how to do this. As-is, we just continue to fail and return the failure count when the reader is done. I monitored with top and the size doesn't grow so I assume the data is just being dropped rather than filling memory, which is a good thing.