Made WAV file reader no longer assume that data chunk goes till end of file to prevent reading trailing metadata as samples. #1018

Closed · wants to merge 1 commit into base: master

Conversation

4 participants

FRex (Contributor) commented Dec 9, 2015

Some programs (e.g. FL Studio) produce WAV files with trailing metadata (program name, etc.) at the end of the file. Because SFML assumed the samples ran until the end of the file, it read that metadata as samples, which caused stuttering and OpenAL errors.
See http://en.sfml-dev.org/forums/index.php?topic=19381.0
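
For context, here is a minimal sketch of the idea behind the fix. It is illustrative only, using std::istream instead of sf::InputStream, with hypothetical helpers readLE32 and findDataChunk; it is not SFML's actual code, and it assumes the stream is already positioned at the first sub-chunk, just past the RIFF/WAVE header:

    #include <cstdint>
    #include <istream>

    // Read a 4-byte little-endian unsigned integer (RIFF sizes are little-endian).
    static std::uint32_t readLE32(std::istream& in)
    {
        unsigned char b[4] = {0, 0, 0, 0};
        in.read(reinterpret_cast<char*>(b), 4);
        return std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
               (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
    }

    // Scan sub-chunks until "data" is found and report its byte range. The
    // declared chunk size (not the file size) bounds the sample data, so any
    // trailing chunks, such as FL Studio's metadata, are never decoded as samples.
    bool findDataChunk(std::istream& in, std::int64_t& dataStart, std::int64_t& dataEnd)
    {
        char id[4];
        while (in.read(id, 4))
        {
            const std::uint32_t size = readLE32(in);
            if (id[0] == 'd' && id[1] == 'a' && id[2] == 't' && id[3] == 'a')
            {
                dataStart = in.tellg();
                dataEnd   = dataStart + size;
                return true;
            }
            in.seekg(size + (size & 1), std::ios::cur); // RIFF chunks are word-aligned
        }
        return false;
    }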

LaurentGomila (Member) commented Dec 9, 2015

Since m_dataStart + m_sampleCount * m_bytesPerSample is a constant, I would rather compute and store it (as m_dataEnd) instead of storing m_sampleCount.

And to be perfectly safe, we should check that the current position in the stream allows reading one more full sample, not just one more byte.

while (... && (m_stream->tell() + m_bytesPerSample <= m_dataEnd))
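
Put together, the suggestion would look roughly like this (a simplified fragment, not the actual SoundFileReaderWav code; it assumes members m_stream, m_bytesPerSample and the proposed m_dataEnd):

    // Simplified fragment. m_dataEnd would be computed once, when the data
    // chunk header is parsed:
    //     m_dataEnd = m_dataStart + m_sampleCount * m_bytesPerSample;
    sf::Uint64 read(sf::Int16* samples, sf::Uint64 maxCount)
    {
        sf::Uint64 count = 0;
        while ((count < maxCount) &&
               (m_stream->tell() + m_bytesPerSample <= m_dataEnd)) // a full sample fits
        {
            // ... read m_bytesPerSample bytes and decode them into samples[count] ...
            ++count;
        }
        return count;
    }
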
FRex (Contributor) commented Dec 9, 2015

This check is slightly wrong.
It would only make a difference when the stream is positioned in the middle of a multi-byte sample. If that happens, everything we read is already corrupt, so that is a huge bug in itself, not something to account for here. In the correct situation it is equivalent to my check.
Also, it is harmless to enter that loop body too many times (e.g. a corrupt file whose reported data section extends past the end of the file): there are checks that the bytes were actually read, because that is how it worked before. Read data in bytesPerSample-sized pieces until a read fails or the buffer is full.

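The pre-existing behaviour FRex refers to, reading one sample-sized piece at a time until a read falls short or the buffer is full, looks roughly like this (a simplified sketch, not the exact SFML code; decodeSample is a hypothetical helper):

    // Simplified sketch of the loop being discussed.
    while ((count < maxCount) && (m_stream->tell() < m_dataEnd))
    {
        unsigned char bytes[4]; // large enough for samples of up to 4 bytes
        if (m_stream->read(bytes, m_bytesPerSample) != m_bytesPerSample)
            break; // short read: treated as end of data, as before the patch
        samples[count++] = decodeSample(bytes, m_bytesPerSample); // hypothetical helper
    }
    // Entering the body "too many times" is harmless: once the underlying data
    // runs out, the read itself falls short and the loop stops.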

LaurentGomila (Member) commented Dec 9, 2015

It makes a difference if the chunk's size is, for some reason, not an exact multiple of m_bytesPerSample. And the decode function won't report any error if another chunk follows instead of EOF, which is exactly the situation your patch addresses.

Sure, this is not supposed to happen if the file is well-formed, but I'd rather be on the safe side, and I think it's clearer to write code that says "if I can read the next sample, read the next sample" rather than "if I can read at least one byte, read the next sample". That's what I always do whenever I write a reading loop, without having to think about what might or might not happen.

FRex (Contributor) commented Dec 9, 2015

It does not, because we get the sample count by integer division of the sub-chunk size by bytes per sample, so the result is floored. Then we multiply that by bytes per sample to get the real data size, so the data end is always a whole number of samples away from the data start (dataEnd = dataStart + bytesPerSample * floor(subChunkSize / bytesPerSample)). The stream position (since the stream reads* in full bytesPerSample-sized pieces) is therefore also always some multiple of bytesPerSample away from both the start and the end of the data. Our data end is always less than (weird sub-chunk size) or equal to (good file) dataStart + subChunkSize.

*About that... it doesn't, actually. If a read returns fewer bytes than requested, the code doesn't seek the stream back or save those bytes for a later try; they are thrown away. The code in the decode function behaves as if a read either returns as much as was requested or has hit the end of the file (is that wishful thinking, or a silent property required of all input streams?). It would be a problem for a network stream, a non-blocking disk stream, or anything similar that can read less than requested in a 'normal' situation. Then your check would apply, but all subsequent samples would be corrupt because of the offset.

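A concrete worked example of that argument (illustrative numbers, not code from the PR):

    // Why the per-byte and per-sample checks coincide once dataEnd is derived
    // from floored integer division.
    const std::int64_t dataStart      = 44; // e.g. just past a canonical WAV header
    const std::int64_t bytesPerSample = 2;
    const std::int64_t subChunkSize   = 11; // corrupt: not a multiple of 2
    const std::int64_t sampleCount    = subChunkSize / bytesPerSample;            // 5 (floored)
    const std::int64_t dataEnd        = dataStart + sampleCount * bytesPerSample; // 54
    // Whole-sample reads only ever leave the stream at dataStart + 2k, so
    //     tell() < dataEnd          (per-byte check)   passes at 44, 46, 48, 50, 52
    //     tell() + 2 <= dataEnd     (per-sample check) passes at 44, 46, 48, 50, 52
    // Both admit exactly the same positions; the stray 11th byte at offset 54
    // is never read either way.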

LaurentGomila (Member) commented Dec 9, 2015

It does not because [...]

That's true. Sorry, when I wrote my answer I had in mind using subChunkSize directly rather than computing m_dataEnd from the sample count; I forgot to say that in my first reply.

If a read returns fewer bytes than requested, the code doesn't seek the stream back or save those bytes for a later try; they are thrown away.

True. So far, SFML streams are very simple; they do not deal with this kind of situation. Robustly handling all kinds of stream sources, with all the problems they can bring, would involve much more work. It would also mean dealing with non-seekable streams, etc.
Since these classes will most likely be replaced with standard streams in SFML 3, I don't think it is worth investigating these potential problems further.

Feel free to do whatever you prefer in this PR; this is not a blocking point anyway, it's more of a nitpick ;)

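If the streams ever did have to tolerate short reads, the usual remedy is a retry loop. A hypothetical helper, not part of SFML, just to illustrate the point:

    #include <SFML/System/InputStream.hpp>

    // Hypothetical helper (not part of SFML): keep reading until `size` bytes
    // arrive, or the stream reports an error or the end of its data. This is
    // the usual way to cope with sources such as sockets or non-blocking files
    // that can legitimately return fewer bytes than requested.
    static sf::Int64 readFully(sf::InputStream& stream, void* data, sf::Int64 size)
    {
        char*     out   = static_cast<char*>(data);
        sf::Int64 total = 0;
        while (total < size)
        {
            const sf::Int64 got = stream.read(out + total, size - total);
            if (got <= 0) // error or end of stream
                break;
            total += got;
        }
        return total;
    }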

FRex (Contributor) commented Dec 9, 2015

Since the sub-chunk is supposed to be a multiple of bytes per sample, and we hand subChunkSize / bytesPerSample to the info struct as the sample count, a file where it wasn't (a corrupt file) would cause problems or require that weird loop condition with the added bytesPerSample anyway, so I'd say we keep the data end calculation as it is.

But I like the idea of storing begin and end instead of begin plus a count the end is computed from. I'll apply that change when I'm home.

FRex (Contributor) commented Dec 9, 2015

Done. Feel free to merge it unless you still see something wrong with it. I could have made a mistake, you know. 😝

LaurentGomila left review comments (now outdated) on src/SFML/Audio/SoundFileReaderWav.cpp and src/SFML/Audio/SoundFileReaderWav.hpp.

Commit pushed: Made WAV file reader no longer assume that data chunk goes till end of file to prevent reading trailing metadata as samples.

FRex (Contributor) commented Dec 9, 2015

Done.

LaurentGomila (Member) commented Dec 9, 2015

Looks good 👍

Thanks.

LaurentGomila added this to the 2.4 milestone Dec 9, 2015

eXpl0it3r self-assigned this Dec 9, 2015

mantognini (Member) commented Jan 18, 2016

This seems okay for merging, right? (Just asking because it's not on the 2.4 "Stuff ready to be merged" list.)

eXpl0it3r (Member) commented Feb 18, 2016

This PR has been added to my merge list, meaning it will be merged soon, unless someone raises any concerns.

eXpl0it3r (Member) commented Feb 22, 2016

Merged in 698bbcc

eXpl0it3r closed this Feb 22, 2016

FRex (Contributor) commented Feb 22, 2016

Yay. 👍
