script to collide PDFs

corkami · Dec 19, 2018 · 3832f62
1 parent b73cbf6
commit 3832f62
Showing 6 changed files with 156 additions and 1 deletion.
diff --git a/.gitattributes b/.gitattributes
@@ -1,2 +1,3 @@
 *.pdf binary
 *.pd binary
+*.bin binary
diff --git a/collisions/README.md b/collisions/README.md
@@ -622,6 +622,7 @@ digraph {
 
 This way, we can safely collide any pair of PDFs, no matter the page numbers, dimensions, images...
 
+
 **comments**
 
 PDF can store foreign data in two ways: 
@@ -678,6 +679,7 @@ A true cryptographic artistic creation :)
 
 (Note I screwed up with Adobe compatibility, but that's my fault, not UniColl's)
 
+
 **colliding document structure**
 
 Whether you use UniColl as inline comment or Chosen Prefix in a dummy stream object, the strategy is similar:
@@ -689,10 +691,24 @@ MuTool doesn't discard bogus key/values - unless asked, and keep them in the sam
 so using fake dictionary entries such as `/MD5_is /REALLY_dead_now__` is perfect to align things predictably without needing another kind of comments.
 However it won't keep comments in dictionaries (so no inline-comment trick)
 
+An easy way to do the object shuffling is to merge both PDF files
+via `mutool merge`, then split the `/Pages` object in two.
+
+To make room for this object, merge a dummy PDF in front of the two documents.
+
+Optionally, create a fake reference to the dangling array
+to prevent garbage collection from deleting the second set of pages.
+
+With this [script](scripts/pdf.py),
+it takes less than a second to collide two public PDF papers such as Spectre and Meltdown:
+
 Examples: [spectre.pdf](examples/collision1.pdf) ⟷ [meltdown.pdf](examples/collision2.pdf)
 
 <img alt='identical prefix PDF collisions' src=pics/specdown.png width=500/>
 
+Possible extension: chain UniColl blocks to also keep pairs of the various [non-critical objects](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=81)
+that can be referenced in the Root object - such as `Outlines`, `Names`, `AcroForm` and Additional Actions (`AA`) - in the original source files.
+
 **in PDFLaTeX**
 
 The previous techniques work with just a pair of PDF files,
@@ -754,6 +770,7 @@ You can define objects directly - including dummy key and values for alignments
 Don't forget to normalize PDFLaTeX output - with `mutool` for example - if needed:
 PDFLaTeX builds are hard to reproduce across distributions - you may even want to hook the time on execution to get the exact hash if required.
 
+
 ## Uncommon strategies
 
 Collisions are usually about 2 valid files of the same type.
@@ -858,7 +875,7 @@ then you can't deny that you don't have the other file (showing incriminating co
 
 Software typically focuses on (quick) parsing, not on detailed file analysis.
 
-<img alt='an image showing different previews under different tabs of EnCase Forensic' src=pics/encase.png width=400/>
+<img alt='different previews under different tabs of EnCase Forensic' src=pics/encase.png width=400/>
 
 *an image showing different previews under different tabs of EnCase Forensic*
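
The merge-then-split step described in the README hunk above can be sketched in isolation. This is a minimal Python 3 illustration working on `bytes`, not the code from this commit: the helper names `enclosed`, `split_kids` and `pages_object` are hypothetical, and it assumes (as `pdf.py` does) that the merged file writes the page list as `/Kids[...]` with generation-0 references and that the first entry is the dummy page.

```python
import re


def enclosed(data: bytes, start: bytes, end: bytes) -> bytes:
    """Return the bytes between the first `start` and the next `end`."""
    off = data.find(start)
    if off < 0:
        raise ValueError("start marker not found")
    off += len(start)
    stop = data.find(end, off)
    if stop < 0:
        raise ValueError("end marker not found")
    return data[off:stop]


def split_kids(merged: bytes, count1: int):
    """Split the merged /Kids list into the page refs of each document.

    Assumes the first reference belongs to the dummy PDF and is dropped.
    """
    kids = enclosed(merged, b"/Kids[", b"]")
    refs = re.findall(rb"(\d+) 0 R", kids)
    pages = refs[1:]  # skip the dummy page
    return pages[:count1], pages[count1:]


def pages_object(num: int, refs) -> bytes:
    """Build one /Pages object holding the given page references."""
    kids = b" ".join(r + b" 0 R" for r in refs)
    return b"%d 0 obj\n<</Type/Pages/Count %d/Kids[%s]>>\nendobj\n" % (
        num, len(refs), kids)


# Toy merged page tree: one dummy page (4), two pages of the first
# document (5, 6), one page of the second document (7).
merged = b"2 0 obj\n<</Type/Pages/Count 4/Kids[4 0 R 5 0 R 6 0 R 7 0 R]>>\nendobj\n"
first, second = split_kids(merged, 2)

# Same layout as the template in pdf.py: object 2 holds the second
# document's pages, object 3 holds the first document's pages.
obj2 = pages_object(2, second)
obj3 = pages_object(3, first)
```

The two colliding prefixes then only need to differ in whether the Catalog's `/Pages` entry points at object 2 or object 3.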
 

diff --git a/collisions/scripts/dummy.pdf b/collisions/scripts/dummy.pdf
diff --git a/collisions/scripts/pdf.py b/collisions/scripts/pdf.py
@@ -0,0 +1,125 @@
+# script to craft MD5 collisions of 2 PDFs via mutool and UniColl (Python 2: file contents are handled as str)
+
+# Ange Albertini 2018
+
+import os
+import sys
+import hashlib
+
+def EnclosedString(d, starts, ends):
+  off = d.find(starts) + len(starts)
+  return d[off:d.find(ends, off)]
+
+def getCount(d):
+  s = EnclosedString(d, "/Count ", "/")
+  count = int(s)
+  return count
+
+def procreate(l): # :p
+  return " 0 R ".join(l) + " 0 R"
+
+
+if len(sys.argv) < 3:
+  print("PDF MD5 collider")
+  print("Usage: pdf.py <file1.pdf> <file2.pdf>")
+  sys.exit()
+
+os.system('mutool merge -o first.pdf %s' % sys.argv[1])
+os.system('mutool merge -o second.pdf %s' % sys.argv[2])
+os.system('mutool merge -o merged.pdf dummy.pdf %s %s' % (sys.argv[1], sys.argv[2]))
+
+with open("first.pdf", "rb") as f:
+  d1 = f.read()
+
+with open("second.pdf", "rb") as f:
+  d2 = f.read()
+
+with open("merged.pdf", "rb") as f:
+  dm = f.read()
+
+
+COUNT1 = getCount(d1)
+COUNT2 = getCount(d2)
+
+
+kids = EnclosedString(dm, "/Kids[", "]")
+
+# we skip the first dummy, and the last " 0 R" string
+pages = kids[:-4].split(" 0 R ")[1:]
+
+template = """%%PDF-1.4
+
+1 0 obj
+<<
+  /Type /Catalog
+
+  %% for alignments (comments will be removed by merging or cleaning)
+  /MD5_is__ /REALLY_dead_now__
+  /Pages 2 0 R
+  %% to make sure we don't get rid of the other pages when garbage collecting
+  /Fakes 3 0 R
+  %% placeholder for UniColl collision blocks
+  /0123456789ABCDEF0123456789ABCDEF012
+  /0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0
+>>
+endobj
+
+2 0 obj
+<</Type/Pages/Count %(COUNT2)i/Kids[%(KIDS2)s]>>
+endobj 
+
+3 0 obj
+<</Type/Pages/Count %(COUNT1)i/Kids[%(KIDS1)s]>>
+endobj
+
+4 0 obj %% overwritten - was a fake page to fool merging
+<< >>
+endobj
+
+"""
+
+KIDS1 = procreate(pages[:getCount(d1)])
+
+KIDS2 = procreate(pages[getCount(d1):])
+
+
+with open("hacked.pdf", "wb") as f:
+  f.write(template % locals())
+  # adjust parents for the first set of pages
+  f.write(dm[dm.find("5 0 obj"):].replace("/Parent 2 0 R", "/Parent 3 0 R", COUNT1))
+
+# let's adjust offsets - -g to get rid of object 4 by garbage collecting
+# (yes, errors will appear)
+os.system('mutool clean -gggg hacked.pdf cleaned.pdf')
+
+with open("cleaned.pdf", "rb") as f:
+  cleaned = f.read()
+
+# some mutool versions do different stuff :(
+cleaned = cleaned.replace(
+  " 65536 f \n0000000016 00000 n \n",
+  " 65536 f \n0000000018 00000 n \n",
+  1)
+
+with open("pdf1.bin", "rb") as f:
+  prefix1 = f.read()
+
+with open("pdf2.bin", "rb") as f:
+  prefix2 = f.read()
+
+file1 = prefix1 + "\n" + cleaned[192:]
+file2 = prefix2 + "\n" + cleaned[192:]
+
+with open("collision1.pdf", "wb") as f:
+  f.write(file1)
+
+with open("collision2.pdf", "wb") as f:
+  f.write(file2)
+
+assert hashlib.md5(file1).digest() == hashlib.md5(file2).digest()
+
+os.remove('first.pdf')
+os.remove('second.pdf')
+os.remove('merged.pdf')
+os.remove('hacked.pdf')
+os.remove('cleaned.pdf')
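
The final prefix-swap step of the script can be sketched on its own in Python 3. This is an illustration under stated assumptions, not the commit's code: `swap_prefix` and `same_md5_different_bytes` are hypothetical helpers, and the check that prefix plus newline equals the 192-byte cut (three 64-byte MD5 blocks) is inferred from `cleaned[192:]` rather than stated in the commit. The placeholder prefixes below are not a real collision pair, so the hashes are not expected to match here.

```python
import hashlib

BLOCK = 64  # MD5 processes its input in 64-byte blocks


def swap_prefix(body: bytes, prefix: bytes, cut: int = 192) -> bytes:
    """Replace the first `cut` bytes of `body` with `prefix` plus a newline."""
    head = prefix + b"\n"
    if cut % BLOCK:
        raise ValueError("cut must be a multiple of the MD5 block size")
    if len(head) != cut:
        raise ValueError("prefix + newline must be exactly %d bytes" % cut)
    return head + body[cut:]


def same_md5_different_bytes(a: bytes, b: bytes) -> bool:
    """True only for a real collision: different contents, same MD5."""
    return a != b and hashlib.md5(a).digest() == hashlib.md5(b).digest()


# Placeholder data standing in for cleaned.pdf, pdf1.bin and pdf2.bin.
body = bytes(192) + b"tail"
file_a = swap_prefix(body, b"A" * 191)
file_b = swap_prefix(body, b"B" * 191)
collided = same_md5_different_bytes(file_a, file_b)
```

Comparing the bytes as well as the digests keeps the check from passing trivially if both outputs end up identical.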
diff --git a/collisions/scripts/pdf1.bin b/collisions/scripts/pdf1.bin
@@ -0,0 +1,6 @@
+%PDF-1.3
+%%����!
+
+1 0 obj
+<</Type/Catalog/MD5_is/REALLY_dead_now__/Pages 2 0 R
+%���e���X�_~����X��e%�8�����{�b���V��8�"��[��#�_p�ΖFZ�v�p�%�% ��j�dZ�c�aU���[���yN�5�+��y�ᩰ��(z�
diff --git a/collisions/scripts/pdf2.bin b/collisions/scripts/pdf2.bin
@@ -0,0 +1,6 @@
+%PDF-1.3
+%%����!
+
+1 0 obj
+<</Type/Catalog/MD5_is/REALLY_dead_now__/Pages 3 0 R
+%���e���X�_~����X��e%�8�����{�b���V��8�"��[��#�_p�ΕFZ�v�p�%�% ��j�dZ�c�aU���[���yN�5�+��y�ᩰ��(z�