New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addressed Seeking Quirks in FLAC Reader #1162
Conversation
I think it was added because m_clientData is only meant to contain the data that need to be shared between the reader instance and the FLAC callbacks, not persistent reader attributes. So strictly speaking, it was not redundancy but rather extra safety / clarity ;) But well, it's ok like this too, of course.
It's more than a "potential overflow", it is an overflow everytime this code gets called. Because if we get there, we've already checked that leftovers.size() was greater than maxCount (ie. the size of the destination buffer).
What happened exactly when the reader was asked to seek at EOF, before your fix? |
*data->buffer++ = sample; | ||
// If there's no output buffer, it means that we are seeking, so just skip the output write | ||
if (data->buffer) | ||
*data->buffer++ = sample; | ||
data->remaining--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the effect of decrementing data->remaining without adding data to the buffer? Well, is it useful at all to execute this statement if there's no buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the whole function returned early when data->buffer
was null. Since I removed that check, I figured I'd put a smaller one around the point where the buffer is actually dereferenced. I know that in practice, data->remaining
will always be 0 when data->buffer
is null (since that's the seek condition), and the data->leftovers.push_back(sample)
path should always be taken then. But I'm always wary of dereferencing a pointer that hasn't been checked locally. So I just covered the single dereferencing line of code with the new check. I didn't touch the decrement because that preserves the countdown to where it will start loading the samples into the always-valid leftover buffer. But that's a bit of moot point since that should never happen anyway.
Maybe I should change the surrounding if condition to if (data->buffer && data->remaining > 0)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should change the surrounding if condition to if (data->buffer && data->remaining > 0).
That's the way to go in my opinion. I'd also change the comments because now the else
block is also for the seeking case.
{ | ||
// The "write" callback will populate the leftovers buffer with the first batch of samples from the | ||
// seek destination, and since we want that data in this typical case, we don't re-clear it afterward | ||
FLAC__stream_decoder_seek_absolute(m_decoder, sampleOffset / m_clientData.info.channelCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight improvement: I'd compute and store (and comment) sampleOffset / m_clientData.info.channelCount
before the test, instead of duplicating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this purely for readability? The result of that division op is only used once in the if case and once in the mutually-exclusive else case. It's never repeated in any single call to seek()
, so I can't see it optimizing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course it is for maintainability reasons, not for optimizing anything. If the expression ever has to change in the future, there will only be one place to modify, not two. Duplicated code, as small as it is, is never a good thing. Plus, it gives a name to the expression, making more obvious what it represents.
But now that I think about it, the else
bock should use m_clientData.info.sampleCount
, because the requested sample offset may be greater than that (and in this case, subtracting one sample still leads to EOF and the case that we're trying to avoid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. I think m_clientData.info.sampleCount
was my original intention, but that somehow got left out while I was cleaning up my implementation. I'll fix that.
The effect was described by @eXpl0it3r:
Under the hood, when the FLAC decoder is asked to seek to (or beyond) the EOF, it returns a failure code, and resets its internal state, effectively seeking to 0 of its own accord. Because of this, on its next read, (What happens now is that |
Ok I see. I feel like there's a cleaner way to avoid this issue, without writing specific code for this special case, but I can't find one so I'm fine with your fix. |
If you're talking about the special case of seeking to the EOF, there is one other thing I have in mind. Instead of rigging up a real "seek" to EOF, the reader could just seek to zero and set a flag that will cause the next But if you're talking about the error case(s) in general, I think the best permanent solution would be to change the call/return sequence from |
Yes, we definitely need more robust error handling there. |
cc25032
to
c3429de
Compare
I moved the buffer check, and tweaked the EOF seek to use the sample count. I didn't pursue the "EOF flag" idea that I brought up, as nobody weighed in on it. |
@@ -133,7 +133,6 @@ class SoundFileReaderFlac : public SoundFileReader | |||
//////////////////////////////////////////////////////////// | |||
FLAC__StreamDecoder* m_decoder; ///< FLAC decoder | |||
ClientData m_clientData; ///< Structure passed to the decoder callbacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that m_channelCount is gone, you can remove two spaces before the comments XD
c3429de
to
697df7c
Compare
@Cobaltergeist Can you rebase this onto master? |
697df7c
to
5eaa748
Compare
Rebased. |
Good for me, if this has been tested thoroughly. |
|
Does this depend on your previous PR? We weren't able to include that in 2.4.1, because it added (or removed) functions, thus changing the ABI. If it doesn't depend on your previous changes, could you rebase the PR onto 2.4.x instead of master? |
Wait, I thought my previous change was in 2.4.1. I guess it was changed since I last looked at it because it affected the ABI. Technically this is a non-conflicting change, but it still makes the most sense alongside #1118 even if it's not a strict prerequisite. So forget my last comment, we can keep this branch as-is. That said, why is #966 in 2.4.2 if the PRs resolving it are in 2.5.0? |
Because it got added without knowing what it depends on. 😉 |
Here's another fix for a bug I've identified while working on #1118. That PR seemed to fix the behavior of WAV and OGG files, but FLAC was still behaving oddly. I did some digging and figured out why:
The problem was that the underlying FLAC decoder has 2 seeking quirks that
SoundFileReaderFlac::seek()
andstreamWrite()
overlook.First, every seek operation triggers a
streamWrite()
callback. Previously, this case would simply cause the function to return, but it turns out that it actually needs to do some work. When seeking, you'll land mid-block more often than not. And when that happens, the decoder feeds all the samples from the seek destination to the end of its block intostreamWrite()
. Ignoring those bytes is incorrect, as the next read call actually starts at the beginning of the next block. By skipping this partial block, we leave the stream at a later point than higher-level trackers (namelyInputSoundFile
) expect it to be. This causes the "early" short-read. To fix this I changedstreamWrite()
to act on these seek-decodes, and dump the output samples into theleftovers
vector, where they will be picked up by the nextSoundFileReaderFlac::read()
.Second, the FLAC decoder does not like seeking straight to EOF. The workaround is easy. I just seek to one sample before the EOF, call
FLAC__stream_decoder_skip_single_frame()
, and re-clear theleftovers
vector. In the process of fixing this one, I noticed that them_channelCount
variable added to address part of #966 is actually redundant. I removed it in favor of the already-availablem_clientData.info.channelCount
.TL;DR, FLACs play fine from the start, but if you start jumping around, samples get dropped. And if you try to overrun the file, the underlying decoder resets and you start inadvertently playing sound from the beginning of the stream when you think you're at the end.
Overall, this is a smaller change, so hopefully it won't be too tough to review. I think I also found a small potential buffer overrun in an obscure condition within
SoundFileReaderFlac::read
. That code seemingly has the potential to overflowsamples
in the unlikely case where theleftovers
buffer is larger thanmaxCount
. It's a tiny tweak to fix, so I went ahead and did that.