Ambiguous Decodes #3

Open

DavidBuchanan314 opened this issue Dec 15, 2021 · 13 comments

Comments

@DavidBuchanan314 (Owner)

I have identified a potential (security?) flaw with this approach.

I think it is possible to craft a file, where:

decompress(a + b) != decompress(a) + decompress(b)

This could happen if a ends midway through a non-compressed block.

It is therefore possible for an image to have two possible interpretations, depending on whether a parallel or non-parallel decoder decodes it.

This can be mitigated on the decoder side, by checking that no unprocessed data remains in each piece of the zlib stream. My implementation does not currently do this!

I realise this sounds a bit implausible, so I will try to come up with a proof-of-concept.
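For readers unfamiliar with the DEFLATE framing involved, here is a minimal sketch (an editor's illustration, not part of the original report) of how a non-compressed ("stored") block is laid out, and why a stream that ends midway through one leaves the decompressor silently waiting for more input rather than erroring:

```python
import zlib

# A DEFLATE "stored" block is framed as: 1 header byte (BFINAL bit + BTYPE=00),
# then LEN and NLEN (= LEN ^ 0xFFFF) as little-endian u16, then LEN raw bytes.
def stored_block(payload: bytes, final: bool) -> bytes:
    header = b"\x01" if final else b"\x00"
    header += len(payload).to_bytes(2, "little")
    header += (len(payload) ^ 0xFFFF).to_bytes(2, "little")
    return header + payload

block = stored_block(b"hello", final=True)

# A complete stored block round-trips fine:
d = zlib.decompressobj(wbits=-15)
assert d.decompress(block) == b"hello"
assert d.eof

# But if the stream is cut off mid-block (header only, payload missing),
# the decompressor produces no error -- it just waits for more input.
# This is the state the crafted "piece a" deliberately ends in.
d2 = zlib.decompressobj(wbits=-15)
assert d2.decompress(block[:5]) == b""
assert not d2.eof
```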

@DavidBuchanan314 (Owner Author) commented Dec 15, 2021

Proof-of-concept:

import zlib

def decompress_headerless(data):
	d = zlib.decompressobj(wbits=-15)
	result = d.decompress(data)
	result += d.flush()

	# do all the checks we can?
	assert(len(d.unconsumed_tail) == 0)
	assert(len(d.unused_data) == 0)

	return result

c = zlib.compressobj(level=9, wbits=-15)

# there is an off-by-6 bug somewhere in this rat's nest, and I can't be bothered to figure it out...
TARGET_SIZE = 64 + 6

a_body = b"Hello, world!"
a_body += b" " * (TARGET_SIZE - 6 - len(a_body))

a = b""
a += c.compress(a_body)
a += c.flush(zlib.Z_FULL_FLUSH)

# declare start of a non-compressed block
blen = TARGET_SIZE // 2 + 5
a += b"\x00" # not last block
a += blen.to_bytes(2, "little")
a += (blen^0xFFFF).to_bytes(2, "little")

c2 = zlib.compressobj(level=9, wbits=-15)
b = b""
b += c2.compress(b" SECRET MESSAGE ")
b += c2.flush(zlib.Z_FULL_FLUSH)

endlen = TARGET_SIZE - len(b)  # - 5
b += b"\x01" # last block
b += endlen.to_bytes(2, "little")
b += (endlen^0xFFFF).to_bytes(2, "little")
b += b"X" * (endlen - TARGET_SIZE // 2)

b += c2.compress(b" ALTERNATE MSG ")
b += c2.flush(zlib.Z_FULL_FLUSH)

endlen = TARGET_SIZE - len(b)  # - 5
b += b"\x01" # last block
b += endlen.to_bytes(2, "little")
b += (endlen^0xFFFF).to_bytes(2, "little")
b += b"Y" * endlen

print("zlib part a:", a.hex())
print("zlib part b:", b.hex())

piece_a = decompress_headerless(a)
piece_b = decompress_headerless(b)

assert(len(piece_a) == 64)
assert(len(piece_b) == 64)

interpretation_1 = piece_a + piece_b
interpretation_2 = decompress_headerless(a + b)

assert(len(interpretation_1) == 128)
assert(len(interpretation_2) == 128)

print(interpretation_1)
print(interpretation_2)

# this is not true!!!
assert(interpretation_1 == interpretation_2)

"""
OUTPUT:

zlib part a: f248cdc9c9d75128cf2fca495154201d00000000ffff002800d7ff
zlib part b: 520876750e720d51f0750d0e7674775500000000ffff013000cfff585858585858585858585858585270f409710df2730c7155f00d765700000000ffff010900f6ff595959595959595959
b'Hello, world!                                                    SECRET MESSAGE XXXXXXXXXXXXXRp\xf4\tq\r\xf2s\x0cqU\xf0\rvW\x00\x00\x00\x00\xff\xff\x01\t\x00\xf6\xffYYYYYYYYY'
b'Hello, world!                                                   R\x08vu\x0er\rQ\xf0u\r\x0evtwU\x00\x00\x00\x00\xff\xff\x010\x00\xcf\xffXXXXXXXXXXXXX ALTERNATE MSG YYYYYYYYY'
Traceback (most recent call last):
  File "./zlib_trick.py", line 68, in <module>
    assert(interpretation_1 == interpretation_2)
AssertionError
"""

@DavidBuchanan314 (Owner Author)

I did all the checks I could, but I'm not quite sure how to make zlib complain about incomplete data when decompressing piece a.

@richgel999
OK - I get what you're saying. Parallel PNG assumes that each IDAT can be independently decoded, starting at the first byte of each IDAT.

> This can be mitigated by the decoder, by checking that there is no unprocessed data in each piece of the zlib stream. My implementation does not currently do this!

This is what my decoder can (and will) do. So I don't see any security risk. You just need to make sure that each IDAT block's bytes are fully consumed by the decompressor. If not, you stop.

Also, any fuzzed, compliant Deflate decompressor should be able to consume any input (valid or not) without crashing, overwriting memory, etc. I don't see a security risk there. Worst case, you can always fall back to serial decompression if the parallel decompression fails (that's what I'm considering doing). If that also fails, the PNG is invalid.

I think the Parallel PNG proposal should clearly document exactly what each IDAT block should contain. It should be a requirement that each IDAT block end on a block (not bit) boundary, and that there are no extra bytes remaining in the IDAT block.
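The per-piece check described above might be sketched like this (an assumed shape for a validating decoder, not richgel999's actual code; `validate_idat_piece` is a hypothetical helper):

```python
import zlib

def validate_idat_piece(piece: bytes) -> bytes:
    """Hypothetical helper: decompress one IDAT piece as raw DEFLATE and
    reject it if any of its bytes were not consumed by the decompressor."""
    d = zlib.decompressobj(wbits=-15)  # raw stream, no zlib header/adler32
    out = d.decompress(piece) + d.flush()
    if d.unconsumed_tail or d.unused_data:
        raise ValueError("IDAT piece has unconsumed trailing bytes")
    return out

# A well-formed piece ending at a Z_FULL_FLUSH boundary passes the check:
c = zlib.compressobj(level=9, wbits=-15)
piece = c.compress(b"some row data") + c.flush(zlib.Z_FULL_FLUSH)
assert validate_idat_piece(piece) == b"some row data"
```

Note that, as the rest of this thread shows, leftover-byte checks alone do not catch a piece that ends midway through a block; a conforming validator also has to verify that each piece ends on a block boundary.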

@DavidBuchanan314 (Owner Author)

I realised that Apple made this same mistake in their own parallel-decodable PNG implementation: https://www.da.vidbuchanan.co.uk/widgets/pngdiff/

@DavidBuchanan314 (Owner Author)

And yes, this is a 100% solvable problem - but clearly even Apple didn't get it right, so it's something that needs to be approached with care.

@richgel999
Interesting - I wasn't aware of this. It's a solid existence proof that it's doable.

@elyscape
> I did all the checks I could, I'm not quite sure how to make zlib complain about incomplete data when decompressing piece a

Check the value of d.eof:

$ cat zlib_trick.py
import base64
import zlib

def decompress_headerless(data, info=''):
    d = zlib.decompressobj(wbits=-15)
    result = d.decompress(data)
    result += d.flush()

    assert(len(d.unconsumed_tail) == 0)
    assert(len(d.unused_data) == 0)

    print(f'{info} properly formed: {d.eof}')
    return result

a = base64.b16decode('f248cdc9c9d75128cf2fca495154201d00000000ffff002800d7ff'.upper())
b = base64.b16decode('520876750e720d51f0750d0e7674775500000000ffff013000cfff585858585858585858585858585270f409710df2730c7155f00d765700000000ffff010900f6ff595959595959595959'.upper())

print(f'zlib part a: {a.hex()}')
print(f'zlib part b: {b.hex()}')

piece_a = decompress_headerless(a, 'Separately, piece a')
piece_b = decompress_headerless(b, 'Separately, piece b')

interpretation_1 = piece_a + piece_b
interpretation_2 = decompress_headerless(a + b, 'Concatenated a + b')

print(interpretation_1)
print(interpretation_2)
$ python3 zlib_trick.py
zlib part a: f248cdc9c9d75128cf2fca495154201d00000000ffff002800d7ff
zlib part b: 520876750e720d51f0750d0e7674775500000000ffff013000cfff585858585858585858585858585270f409710df2730c7155f00d765700000000ffff010900f6ff595959595959595959
Separately, piece a properly formed: False
Separately, piece b properly formed: True
Concatenated a + b properly formed: True
b'Hello, world!                                                    SECRET MESSAGE XXXXXXXXXXXXXRp\xf4\tq\r\xf2s\x0cqU\xf0\rvW\x00\x00\x00\x00\xff\xff\x01\t\x00\xf6\xffYYYYYYYYY'
b'Hello, world!                                                   R\x08vu\x0er\rQ\xf0u\r\x0evtwU\x00\x00\x00\x00\xff\xff\x010\x00\xcf\xffXXXXXXXXXXXXX ALTERNATE MSG YYYYYYYYY'

@DavidBuchanan314 (Owner Author)

Aha, thanks, that's what I was looking for!

@randy408
I think the big problem with this will be errors that are specific to parallel decoding but don't count as errors when decoding linearly. Therefore, if you fail at parallel decoding at any point and don't fall back to linear decoding, it's technically non-conformant. Fun.

It's not like APNG, where you fall back to the default image; here you have to fall back to the standard decoding method and potentially re-decode some rows.

@seungjunProgramming
Hi! A newbie to PNG files here. What does “a ends midway through a non-compressed block” mean?

@DavidBuchanan314 (Owner Author)

@S3Studio
> I did all the checks I could, I'm not quite sure how to make zlib complain about incomplete data when decompressing piece a
>
> Check the value of d.eof:

I would like to join the conversation; if there is any mistake in my view, please point it out.
In short, my point is that d.eof is not helpful for checking incomplete data.
d.eof tells you whether the last complete block was a final block; it is not related to the block currently being processed, even if that block only contains a header.
Here is an example:

~/Desktop/ cat test2.py    
import base64
import zlib

def compress_headerless(data, final, info=''):
    e = zlib.compressobj(level=9, wbits=-15)
    result = e.compress(data)
    result += e.flush(zlib.Z_FINISH if final else zlib.Z_FULL_FLUSH)

    print(f'{info} compressed as: {result.hex()}')
    return result

def decompress_headerless(data, info=''):
    d = zlib.decompressobj(wbits=-15)
    result = d.decompress(data)
    result += d.flush()

    print(f'{info} properly formed: {d.eof}')
    assert(len(d.unconsumed_tail) == 0)
    assert(len(d.unused_data) == 0)

    return result

a_body = b"Hello, world!"

a = compress_headerless(a_body, False, 'NotFinal block')
decompress_headerless(a, 'NotFinal block')
a += base64.b16decode('00'.upper())
decompress_headerless(a, 'NotFinal block with extra')

a2 = compress_headerless(a_body, True, 'Final block')
decompress_headerless(a2, 'Final block')
a2 += base64.b16decode('00'.upper())
decompress_headerless(a2, 'Final block with extra')

~/Desktop/ python3 test2.py
NotFinal block compressed as: f248cdc9c9d75128cf2fca495104000000ffff
NotFinal block properly formed: False
NotFinal block with extra properly formed: False
Final block compressed as: f348cdc9c9d75128cf2fca49510400
Final block properly formed: True
Final block with extra properly formed: True
Traceback (most recent call last):
  File "/Users/smallay/Desktop/test2.py", line 33, in <module>
    decompress_headerless(a2, 'Final block with extra')
  File "/Users/smallay/Desktop/test2.py", line 19, in decompress_headerless
    assert(len(d.unused_data) == 0)
AssertionError

So I think checking for incomplete data is a bit harder than it looks, because the IDAT chunks of a PNG always contain non-final blocks, except for the last one.
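One possible way to validate the non-final pieces despite d.eof being False for them (an editor's sketch, not something proposed in the thread; it assumes every well-formed middle piece ends on a byte-aligned block boundary, as Z_FULL_FLUSH output does): append an empty final DEFLATE block, b"\x03\x00", and then require end-of-stream. A piece that was cut off mid-block, like the crafted piece a, consumes those two bytes as block data, so eof stays False:

```python
import zlib

def middle_piece_is_complete(piece: bytes) -> bool:
    """Sketch: check a *middle* piece by appending an empty final DEFLATE
    block (0x03 0x00) and requiring the decompressor to reach end-of-stream.
    Assumes well-formed middle pieces end on a byte-aligned block boundary."""
    d = zlib.decompressobj(wbits=-15)
    d.decompress(piece + b"\x03\x00")
    return d.eof and not d.unused_data

c = zlib.compressobj(level=9, wbits=-15)
good = c.compress(b"Hello") + c.flush(zlib.Z_FULL_FLUSH)
# Truncate mid-way through a stored-block header, like the crafted piece a:
bad = good + b"\x00\x28\x00\xd7\xff"

assert middle_piece_is_complete(good)
assert not middle_piece_is_complete(bad)
```

The b"\x03\x00" suffix is a fixed-Huffman block with BFINAL=1 that contains only the end-of-block symbol, i.e. an empty terminating block.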

@randy408
> the IDAT chunks of PNG always contain non-final blocks except for the last one.

In practice it doesn't even have to end; decoders just stop once they have enough data and skip the rest.
