script to collide PDFs
angea committed Dec 19, 2018
1 parent b73cbf6 commit 3832f62
Showing 6 changed files with 156 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitattributes
@@ -1,2 +1,3 @@
*.pdf binary
*.pd binary
*.bin binary
19 changes: 18 additions & 1 deletion collisions/README.md
@@ -622,6 +622,7 @@ digraph {

This way, we can safely collide any pair of PDFs, no matter the page numbers, dimensions, images...


**comments**

PDF can store foreign data in two ways:
Expand Down Expand Up @@ -678,6 +679,7 @@ A true cryptographic artistic creation :)

(Note I screwed up with Adobe compatibility, but that's my fault, not UniColl's)


**colliding document structure**

Whether you use UniColl as inline comment or Chosen Prefix in a dummy stream object, the strategy is similar:
@@ -689,10 +691,24 @@ MuTool doesn't discard bogus key/values - unless asked, and keep them in the sam
so using fake dictionary entries such as `/MD5_is /REALLY_dead_now__` is perfect to align things predictably without needing another kind of comments.
However, it won't keep comments in dictionaries (so no inline-comment trick).
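
As a rough illustration of the alignment idea, here is a minimal sketch that sizes a dummy key/value pair so that the next real key starts at a chosen offset. The helper name `padding_entry`, the `/Pad` key and the offsets are hypothetical and only serve the example; they are not part of the script.

```python
def padding_entry(offset: int, target: int, key: str = "/Pad") -> str:
    """Build a dummy '<key> /xxxx' pair that is exactly target - offset bytes long.

    `offset` is where the pair starts in the file and `target` is where the
    next real key should start. Both are assumptions made for this sketch.
    """
    room = target - offset
    # the pair is made of the key, a space, a slash and a filler name
    filler_len = room - len(key) - len(" /")
    if filler_len < 1:
        raise ValueError("not enough room for a key and a one-character name")
    return key + " /" + "x" * filler_len


# fill the gap between offset 26 and the next 64-byte boundary
entry = padding_entry(offset=26, target=64)
print(entry, len(entry))
```

Since the pair only contains ASCII characters, its length in characters is also its length in bytes.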

An easy way to do the object-shuffling operation without hassle is to merge both PDF files
via `mutool merge`, then split the `/Pages` object in two.

To make room for this object, just merge a dummy PDF in front of the two documents.

Optionally, create a fake reference to the dangling array
to prevent garbage collection from deleting the second set of pages.
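
The splitting step can be sketched as follows, assuming the merged file has a flat page tree (one `/Kids` entry per page) and that the dummy document contributes a single page. The function names `split_kids` and `pages_object` are made up for this example.

```python
def split_kids(kids: str, count1: int, dummies: int = 1):
    """Split the body of a merged /Kids array into two lists of references.

    `kids` is the text between '[' and ']', `count1` is the page count of
    the first document, `dummies` is the number of leading dummy pages.
    """
    tokens = kids.split()
    # every indirect reference is three tokens: object number, generation, 'R'
    if len(tokens) % 3 != 0 or any(t != "R" for t in tokens[2::3]):
        raise ValueError("expected a flat list of 'N G R' references")
    refs = [" ".join(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]
    refs = refs[dummies:]  # drop the dummy page(s) merged in front
    return refs[:count1], refs[count1:]


def pages_object(refs):
    """Render a /Pages dictionary for one set of page references."""
    return "<</Type/Pages/Count %d/Kids[%s]>>" % (len(refs), " ".join(refs))


first, second = split_kids("4 0 R 5 0 R 7 0 R 9 0 R", count1=2)
print(pages_object(first))
print(pages_object(second))
```

Each page of the set that moves to the new `/Pages` object also needs its `/Parent` reference updated, which this sketch leaves out.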

With this [script](scripts/pdf.py),
it takes less than a second to collide two public PDF papers such as Spectre and Meltdown:

Examples: [spectre.pdf](examples/collision1.pdf), [meltdown.pdf](examples/collision2.pdf)

<img alt='identical prefix PDF collisions' src=pics/specdown.png width=500/>

Possible extension: chain UniColl blocks to also keep pairs of the various [non-critical objects](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=81)
that can be referenced in the Root object - such as `Outlines`, `Names`, `AcroForm` and Additional Actions (`AA`) - in the original source files.

**in PDFLaTeX**

The previous techniques work with just a pair of PDF files,
@@ -754,6 +770,7 @@ You can define objects directly - including dummy key and values for alignments
Don't forget to normalize PDFLaTeX output - with `mutool` for example - if needed:
it is hard to get reproducible builds with PDFLaTeX across distributions - you may even want to hook the time on execution to get the exact hash if required.


## Uncommon strategies

Collisions are usually about 2 valid files of the same type.
@@ -858,7 +875,7 @@ then you can't deny that you don't have the other file (showing incriminating co

Software typically focuses on (quick) parsing, not on detailed file analysis.

<img alt='an image showing different previews under different tabs of EnCase Forensic' src=pics/encase.png width=400/>
<img alt='different previews under different tabs of EnCase Forensic' src=pics/encase.png width=400/>

*an image showing different previews under different tabs of EnCase Forensic*

Binary file added collisions/scripts/dummy.pdf
Binary file not shown.
125 changes: 125 additions & 0 deletions collisions/scripts/pdf.py
@@ -0,0 +1,125 @@
# script to craft MD5 collisions of 2 PDFs via mutool and UniColl

# Ange Albertini 2018

import os
import sys
import hashlib

def EnclosedString(d, starts, ends):
    off = d.find(starts) + len(starts)
    return d[off:d.find(ends, off)]

def getCount(d):
    s = EnclosedString(d, "/Count ", "/")
    count = int(s)
    return count

def procreate(l): # :p
    return " 0 R ".join(l) + " 0 R"


# both input files are required
if len(sys.argv) != 3:
    print("PDF MD5 collider")
    print("Usage: pdf.py <file1.pdf> <file2.pdf>")
    sys.exit()

os.system('mutool merge -o first.pdf %s' % sys.argv[1])
os.system('mutool merge -o second.pdf %s' % sys.argv[2])
os.system('mutool merge -o merged.pdf dummy.pdf %s %s' % (sys.argv[1], sys.argv[2]))

with open("first.pdf", "rb") as f:
    d1 = f.read()

with open("second.pdf", "rb") as f:
    d2 = f.read()

with open("merged.pdf", "rb") as f:
    dm = f.read()


COUNT1 = getCount(d1)
COUNT2 = getCount(d2)


kids = EnclosedString(dm, "/Kids[", "]")

# we skip the first dummy, and the last " 0 R" string
pages = kids[:-4].split(" 0 R ")[1:]

template = """%%PDF-1.4
1 0 obj
<<
/Type /Catalog
%% for alignements (comments will be removed by merging or cleaning)
/MD5_is__ /REALLY_dead_now__
/Pages 2 0 R
%% to make sure we don't get rid of the other pages when garbage collecting
/Fakes 3 0 R
%% placeholder for UniColl collision blocks
/0123456789ABCDEF0123456789ABCDEF012
/0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0
>>
endobj
2 0 obj
<</Type/Pages/Count %(COUNT2)i/Kids[%(KIDS2)s]>>
endobj
3 0 obj
<</Type/Pages/Count %(COUNT1)i/Kids[%(KIDS1)s]>>
endobj
4 0 obj %% overwritten - was a fake page to fool merging
<< >>
endobj
"""

KIDS1 = procreate(pages[:getCount(d1)])

KIDS2 = procreate(pages[getCount(d1):])


with open("hacked.pdf", "wb") as f:
    f.write(template % locals())
    # adjust parents for the first set of pages
    f.write(dm[dm.find("5 0 obj"):].replace("/Parent 2 0 R", "/Parent 3 0 R", COUNT1))

# let's adjust offsets - -g to get rid of object 4 by garbage collecting
# (yes, errors will appear)
os.system('mutool clean -gggg hacked.pdf cleaned.pdf')

with open("cleaned.pdf", "rb") as f:
    cleaned = f.read()

# some mutool versions do different stuff :(
cleaned = cleaned.replace(
    " 65536 f \n0000000016 00000 n \n",
    " 65536 f \n0000000018 00000 n \n",
    1)

with open("pdf1.bin", "rb") as f:
    prefix1 = f.read()

with open("pdf2.bin", "rb") as f:
    prefix2 = f.read()

file1 = prefix1 + "\n" + cleaned[192:]
file2 = prefix2 + "\n" + cleaned[192:]

with open("collision1.pdf", "wb") as f:
    f.write(file1)

with open("collision2.pdf", "wb") as f:
    f.write(file2)

assert hashlib.md5(file1).digest() == hashlib.md5(file2).digest()

os.remove('first.pdf')
os.remove('second.pdf')
os.remove('merged.pdf')
os.remove('hacked.pdf')
os.remove('cleaned.pdf')
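
The script above mixes `str` literals with data read in binary mode, which works on Python 2 only. As a hedged sketch, the final splice-and-check step could look like this on Python 3, where everything has to be `bytes`. The helper names `splice` and `same_md5` are hypothetical, and the 192-byte cut is simply the constant the script uses.

```python
import hashlib


def splice(prefix: bytes, cleaned: bytes, cut: int = 192) -> bytes:
    """Replace the first `cut` bytes of the cleaned PDF with a precomputed prefix."""
    return prefix + b"\n" + cleaned[cut:]


def same_md5(a: bytes, b: bytes) -> bool:
    """Tell whether two byte strings share the same MD5 digest."""
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()


def collide(prefix1: bytes, prefix2: bytes, cleaned: bytes):
    """Build both files from the two prefixes and one shared suffix."""
    # the two prefixes must have the same length for the suffix to line up
    if len(prefix1) != len(prefix2):
        raise ValueError("prefixes must have the same length")
    return splice(prefix1, cleaned), splice(prefix2, cleaned)
```

With the real `pdf1.bin` and `pdf2.bin` prefixes, `same_md5` is expected to hold for the two results, since both files end with the same suffix after the colliding blocks; with arbitrary prefixes it will not.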
6 changes: 6 additions & 0 deletions collisions/scripts/pdf1.bin
@@ -0,0 +1,6 @@
%PDF-1.3
%%����!

1 0 obj
<</Type/Catalog/MD5_is/REALLY_dead_now__/Pages 2 0 R
%���e���X�_~����X��e%�8�����{�b���V��8�"��[��#�_p�ΖFZ�v�p�%�% ��j�dZ�c�aU���[���yN�5�+��y�ᩰ��(z�
6 changes: 6 additions & 0 deletions collisions/scripts/pdf2.bin
@@ -0,0 +1,6 @@
%PDF-1.3
%%����!

1 0 obj
<</Type/Catalog/MD5_is/REALLY_dead_now__/Pages 3 0 R
%���e���X�_~����X��e%�8�����{�b���V��8�"��[��#�_p�ΕFZ�v�p�%�% ��j�dZ�c�aU���[���yN�5�+��y�ᩰ��(z�
