Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working Collision? #1

Open
dxoigmn opened this issue Aug 18, 2021 · 118 comments
Open

Working Collision? #1

dxoigmn opened this issue Aug 18, 2021 · 118 comments

Comments

@dxoigmn
Copy link

@dxoigmn dxoigmn commented Aug 18, 2021

Can you verify that these two images collide?
beagle360
collision

Here's what I see from following your directions:

$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat beagle360.png
59a34eabe31910abfb06f308
$ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat collision.png
59a34eabe31910abfb06f308
@AsuharietYgvar
Copy link
Owner

@AsuharietYgvar AsuharietYgvar commented Aug 18, 2021

Yes! I can confirm that both images generate the exact same hashes on my iPhone. And they are identical to what you generated here.

Loading

@prusnak
Copy link

@prusnak prusnak commented Aug 18, 2021

@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)?

Loading

@fuomag9
Copy link

@fuomag9 fuomag9 commented Aug 18, 2021

@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)?

If I'm not mistaken a preimage attack was done here
Edit: this should be the working script (I haven't tested it)

Loading

@tmechen
Copy link

@tmechen tmechen commented Aug 18, 2021

If I'm not mistaken a preimage attack was done here

hmmm
"This is so fake it's not even funny. These are just images generated by the model from https://thisartworkdoesnotexist.com . It's hilarious to see so many people falling for it here" (in the comments)

Loading

@lericson
Copy link

@lericson lericson commented Aug 18, 2021

@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs.

Loading

@lericson
Copy link

@lericson lericson commented Aug 18, 2021

"This is so fake it's not even funny. These are just images generated by the model from https://thisartworkdoesnotexist.com . It's hilarious to see so many people falling for it here"

Or they used images generated by that site as the starting point. As I noted in my previous comment, it's impossible to know without having the NeuralHash NCMEC database.

Loading

@erlenmayr
Copy link

@erlenmayr erlenmayr commented Aug 18, 2021

This is not only a collision, it is a pre-image, which breaks the algorithm even more.

Collison:
Find two random images with the same hash.

Pre-image:
Find an image with the same hash as a known, given image.

Loading

@fuomag9
Copy link

@fuomag9 fuomag9 commented Aug 18, 2021

@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs.

If I'm not mistaken the DB with the hashes is stored locally and you can extract it from the iOS 15 beta

Edit: the hashes are stored locally but not in a way that makes them recoverable to the end user, see below

Loading

@nicolaskopp
Copy link

@nicolaskopp nicolaskopp commented Aug 18, 2021

Holy Shit.

Loading

@CRTified
Copy link

@CRTified CRTified commented Aug 18, 2021

@erlenmayr No, this is likely a second preimage attack. In a preimage attack you're just given the hash, not the image.

Loading

@gillescoolen
Copy link

@gillescoolen gillescoolen commented Aug 18, 2021

@fuomag9 so the list of known CP hashes is shipped on every device? Isn't this a huge security issue?
This was untrue. See this comment.

Loading

@fuomag9
Copy link

@fuomag9 fuomag9 commented Aug 18, 2021

@fuomag9 so the list of known CP hashes is shipped on every device? Isn't this a huge security issue?

This is from apple's pdf on the technical details of their implementation. Feel free to correct me but from my understanding the blinded hash is the CSAM hash DB

image

Edit: I was wrong and we cannot extract them

Loading

@gillescoolen
Copy link

@gillescoolen gillescoolen commented Aug 18, 2021

Wow. Couldn't they have hashed the image locally, send it to the server and compare it there?

image
My bad. The hashes on the device have gone through a blinding process, so my train of thought was not correct. Does anyone know why they chose to compare it on the client instead of on the server, seeing as it is only ran on images uploaded to iCloud?

Loading

@fuomag9
Copy link

@fuomag9 fuomag9 commented Aug 18, 2021

https://twitter.com/angelwolf71885/status/1427922279778881538?s=20 this seems iteresting tho, a CSAM sample hash seems to exist?

Loading

@dxoigmn
Copy link
Author

@dxoigmn dxoigmn commented Aug 18, 2021

@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)?

Not sure. My sense is that seed1 seems to be mixing bits from the output of model. But we already know from the literature it is very likely. At least, I am reasonably confident one could generate a noisy gray image that outputs some desired hash value.

@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs.

No one has obtained the CSAM hashes, AFAIK. This repo is just a neural network that takes an input and produces a hash. It's as if someone released a new hash algorithm. (What you do with those hashes is a different story and the comparison of the hash with CSAM hashes is a whole other system.) This just shows that an image of a dog has the same hash value as a noisy gray image. That is, ask the model for the hash of the dog image, then ask the model how to change a gray image to make it output the same hash as the dog image. So it's a second-preimage, which we know has to be possible (by the pigeonhole principle); but it's really just a matter of feasibility. The interesting thing about neural network models (when compared to cryptographic hashes) is that they give you a gradient which tells you how you can change stuff to optimize some objective. This is the basis of deep dream, adversarial examples, and even just plain old training of neural network models.

Loading

@nicolaskopp
Copy link

@nicolaskopp nicolaskopp commented Aug 18, 2021

@tmechen

If I'm not mistaken a preimage attack was done here

hmmm
"This is so fake it's not even funny. These are just images generated by the model from https://thisartworkdoesnotexist.com . It's hilarious to see so many people falling for it here" (in the comments)

From what I understood, the story the HN comment is referring to is AI generated porn, which can match as "real" porn. Aka the "send dunes" story: https://petapixel.com/2017/12/20/uk-police-porn-spotting-ai-gets-confused-desert-photos/

This issue here, however, is a hash collission. This is huge, if confirmed. Really ugly stuff.

Loading

@jeifour
Copy link

@jeifour jeifour commented Aug 18, 2021

Does anyone know why they chose to compare it on the client instead of on the server, seeing as it is only ran on images uploaded to iCloud?

If they compare on the server then Apple is able to know the outcome of the comparison even if there is only a single match. The idea is that Apple can only know that a certain user has matching images in their iCloud account as soon as a certain threshold number of matches is reached. That's my understanding as a non-expert in this field.

Loading

@Nicceboy
Copy link

@Nicceboy Nicceboy commented Aug 18, 2021

Does anyone know why they chose to compare it on the client instead of on the server, seeing as it is only ran on images uploaded to iCloud?

If they compare on the server then Apple is able to know the outcome of the comparison even if there is only a single match. The idea is that Apple can only know that a certain user has matching images in their iCloud account as soon as a certain threshold number of matches is reached. That's my understanding as a non-expert in this field.

I think they also inject false positives for results on device side, so they don't know if there is only a single match. They know only when threshold is reached. It was in the PSI paper.

Loading

@Nicceboy
Copy link

@Nicceboy Nicceboy commented Aug 18, 2021

This issue here, however, is a hash collission. This is huge, if confirmed. Really ugly stuff.

Every practical hashing algorithm is one-to-many. It is only a matter of time when collisions come. This repo contains the model from older release and we don't really know how they have tweaked parameters on Apple side or even how the huge data set is improving the accuracy. NN has many layers so training is rather important.
The goal of the algorithm is to seek matches from specific kind of material (CSAM), and that is used for training. Can we expect that it calculates hashes with similar accuracy for all kind of images? (e.g is it equally hard to make collision for CSAM material than the picture of the dog?)

Loading

@nicolaskopp
Copy link

@nicolaskopp nicolaskopp commented Aug 18, 2021

This issue here, however, is a hash collission. This is huge, if confirmed. Really ugly stuff.

Every practical hashing algorithm is one-to-many. It is only a matter of time when collisions come. This repo contains the model from older release and we don't really know how they have tweaked parameters on Apple side or even how the huge data set is improving the accuracy. NN has many layers so training is rather important.

Let me quote from this article of someone who can explain this better than me:

https://www.hackerfactor.com/blog/index.php?/archives/929-One-Bad-Apple.html

In the six years that I've been using these hashes at FotoForensics, I've only matched 5 of these 3 million MD5 hashes. (They really are not that useful.) In addition, one of them was definitely a false-positive. (The false-positive was a fully clothed man holding a monkey -- I think it's a rhesus macaque. No children, no nudity.)

and:

According to NCMEC, I submitted 608 reports to NCMEC in 2019, and 523 reports in 2020. In those same years, Apple submitted 205 and 265 reports (respectively). It isn't that Apple doesn't receive more picture than my service, or that they don't have more CP than I receive. Rather, it's that they don't seem to notice and therefore, don't report.

Let that sink in.

Now back to our issue:

While hash detection is a really bad idea in general for such purposes, the specific implementation from apple does not matter, because it is closed source. You literally don't know what apple is doing, with what data, and what result comes out of it.

Even if apples specific algorithm is the best in the world and does not have these drawbacks: You would not know. You would have to live in constant fear of your phone "thinking" you may are doing something wrong. That's scary.

Loading

@Nicceboy
Copy link

@Nicceboy Nicceboy commented Aug 18, 2021

Even if apples specific algorithm is the best in the world and does not have this drawbacks: You would not know. You would have to live in constant fear of your phone "thinking" you may are doing something wrong. That's scary.

That is very true. We can either give total trust or nothing at all on closed systems.

Loading

@weskerfoot
Copy link

@weskerfoot weskerfoot commented Aug 18, 2021

@fuomag9 so the list of known CP hashes is shipped on every device? Isn't this a huge security issue?

It's not. It's encrypted using elliptic-curve cryptography first. They keep the actual CSAM neuralhash db on their server.

Why do they do that? A) To prevent reversing the hashes, as mentioned. B) So they can derive the encryption keys for the image locally, which is what the blinded hash table is also used for. I'm leaving out a lot of details that are mentioned in the whitepaper they released, so read that.

Loading

@gillescoolen
Copy link

@gillescoolen gillescoolen commented Aug 18, 2021

@weskerfoot Yeah I edited my other comment with an explanation of the blinded hash.

Loading

@KuromeSan
Copy link

@KuromeSan KuromeSan commented Aug 18, 2021

all we need now is a hash of what is considered 'csam' to iphone, and start making images collide with it,
if we could somehow make collisions with arbitrary content, (eg. like a funny meme that someone would be likely to save to there phone) that would be great-

I'm not sure how to get a hash without having such a file to hash it though ..

Oh just curious, since the hashing is done on the client side, would it be possible to tell iCloud that the hash matched every time? and if so, what's stopping you from just flooding it with random shit?

Loading

@weskerfoot
Copy link

@weskerfoot weskerfoot commented Aug 18, 2021

Oh just curious, since the hashing is done on the client side, would it be possible to tell iCloud that the hash matched every time? and if so, what's stopping you from just flooding it with random shit?

In fact it already does that by design, in order to obscure how many matching images there are before the threshold is crossed where it can decrypt all matching images
image

So if you can figure out a way to generate synthetic matches on demand, you could make it think there are lots of matches, but it would soon discover they're "fake" once the threshold is crossed, since it wouldn't be able to decrypt them all. Not sure what it would do if you did that repeatedly, maybe it would cause issues.

Edit: to be clear, the inner encryption key is associated with the NeuralHash. So if you have a false positive NeuralHash output, it would trigger manual review, but you would need to actually have that, which is why they keep them a secret.

Loading

@indiv0
Copy link

@indiv0 indiv0 commented Aug 18, 2021

56C19CA9-E56E-4742-A403-A8D478ECE688

Loading

@WriteCodeEveryday
Copy link

@WriteCodeEveryday WriteCodeEveryday commented Aug 18, 2021

I wonder if this code can be ported so you can generate the collision an Android device...

Loading

@KuromeSan
Copy link

@KuromeSan KuromeSan commented Aug 18, 2021

can we take any given image and then make its hash totally different, despite the image looking basically identical?
just curious if this even works for the one thing its suppost to do.. haha

Loading

@fuomag9
Copy link

@fuomag9 fuomag9 commented Aug 18, 2021

taking a given image and then make its hash totally different, despite the image looking basically identical. can we do that?
just curious if this even works for the one thing its suppost to do..

cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that

Loading

@KuromeSan
Copy link

@KuromeSan KuromeSan commented Aug 18, 2021

cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that

ah yes, because pedos are being so sophisticated to use end to end encryption these days that we need to backdoor everyone's phones, because think of the children but there NOT sophisticated enough to ..checks notes.. crop images,

Loading

@GiverofMemory
Copy link

@GiverofMemory GiverofMemory commented Aug 20, 2021

I'm convinced you can take almost any image and smoothly transition it to very close to any other image while retaining its original perceptual hash value. The properties of what a perceptual hash needs to do practically guarantee it.

If you think of the space of all possible images of a given height / width, the perceptual hash should divide that space up into clustered continuous chunks where locality is determined by tiny edits. An obvious way to define the space is as one dimension for each color of each pixel with values from 0 to 255. The issue is that as we increase the number of dimensions, these clusters of identical hashes are more forced to stretch to fill that space.

To get some intuition on this: consider as you increase dimensions, the "volume" of an n-dimensional sphere divided by the "volume" of the corresponding n-dimensional cube approaches 0. This is concerning because the sphere is the best possible packing you can get, in other words the ability of well packed objects to fill space gets worse as dimensions go up. Consider that even a tiny 360 by 360 image would represent a huge 388,800 dimensional space. Instead of getting nice clusters, we're getting a kind of interwoven series of sponges for each hash value which completely fills the space, which is admittedly hard to visualize.

Not being able to tightly pack these hash values together defeats the main goals of the perceptual hash (or any perceptual hash?) to have similar images give similar values. I'm about a third of the way done with turning a picture of samwise into a bowl of frootloops with the identical perceptual hash.

So one attack vector for a Perp of Mass Deception (PMD) would be to create actual csam then create an identical hashing image that looks like something totally different and spread that second image widely to create an infinite stream of false positives. Basically if this is done for each csam image produced it renders this detection method totally moot.

And... looks like it was done #1 (comment) , #1 (comment) ,#1 (comment)

Actually according to that second linked post above, a typical computer could produce dozens of random matching images per hour. So you could easily create a dozen false positive images for every csam image you produce, making this detection method by apple totally useless and will cost apple tons of money for hiring people looking for matches.

Another attack vector is the PMD modulates thier CSAM images multiple times so each csam image has multiple hashes. Another way to throw a wrench in this detection method making everyones jobs much harder. Each image could have infinite perturbations making the database balloon to an impossibly large size from just perturbations of the same CSAM image. Also the hashes stored on the apple device are much larger than the actual neural hashes and so would take up infinite space on each device.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 20, 2021

This is not only a collision, it is a pre-image, which breaks the algorithm even more.
Collison:
Find two random images with the same hash.
Pre-image:
Find an image with the same hash as a known, given image.

I don't see how that differs at all. Basically to match two random image hashes you have to have both as known given images before hashing them.

Because this 'hash' function is entirely differentiable you can easily modify existing images to match an arbitrary hash without ever having seen the other image that the arbitrary hash matches yourself. There may not even be another image that exists.

Collisions modify both images (I gave an example with the lena/girl pair above), so they're fundamentally easier to compute. Not only do you have to have access to both images you have to modify them. If only collisions were possible than I couldn't make false matches against their CSAM database except by distributing images that get included with the database. Since preimages are possible, I can construct matches if someone gives me a list of hashes which are likely to be in it.

So your disruptive attacker that distributes images all over the place doesn't need to create or distribute abuse images, though the attack you describe would also work.

Perhaps a better version is this: Imagine you are an authoritarian regime that dislikes certain ideological factions or ethniticities. You identify images which may be possessed by participants in those groups. You then obtain child porn images (very easy as an authoritarian regime, you even command the creation of new images if it'll work better) and then modify the child porn images to have hashes matching the political images that you're targeting. You place these hashes in your child abuse image database and also forward on the abuse images to other governments and NGOs and they will enter them into their databases with plausible deniability (or not even knowing at all), mooting any requirement that multiple reporting groups must have the hashes in question.

This attack can still work even when there is a second non-public perceptual hash, assuming that it's also of a flawed design like Apple's published hash function: as a state actor you have access to this second hash function (as you needed to generate the hashes, as parties are not sending child porn to apple). So you can generate images which match with both functions.

Loading

@GiverofMemory
Copy link

@GiverofMemory GiverofMemory commented Aug 21, 2021

This is not only a collision, it is a pre-image, which breaks the algorithm even more.
Collison:
Find two random images with the same hash.
Pre-image:
Find an image with the same hash as a known, given image.

I don't see how that differs at all. Basically to match two random image hashes you have to have both as known given images before hashing them.

Because this 'hash' function is entirely differentiable you can easily modify existing images to match an arbitrary hash without ever having seen the other image that the arbitrary hash matches yourself. There may not even be another image that exists.

Collisions modify both images (I gave an example with the lena/girl pair above), so they're fundamentally easier to compute. Not only do you have to have access to both images you have to modify them. If only collisions were possible than I couldn't make false matches against their CSAM database except by distributing images that get included with the database. Since preimages are possible, I can construct matches if someone gives me a list of hashes which are likely to be in it.

So your disruptive attacker that distributes images all over the place doesn't need to create or distribute abuse images, though the attack you describe would also work.

Perhaps a better version is this: Imagine you are an authoritarian regime that dislikes certain ideological factions or ethniticities. You identify images which may be possessed by participants in those groups. You then obtain child porn images (very easy as an authoritarian regime, you even command the creation of new images if it'll work better) and then modify the child porn images to have hashes matching the political images that you're targeting. You place these hashes in your child abuse image database and also forward on the abuse images to other governments and NGOs and they will enter them into their databases with plausible deniability (or not even knowing at all), mooting any requirement that multiple reporting groups must have the hashes in question.

This attack can still work even when there is a second non-public perceptual hash, assuming that it's also of a flawed design like Apple's published hash function: as a state actor you have access to this second hash function (as you needed to generate the hashes, as parties are not sending child porn to apple). So you can generate images which match with both functions.

That doesn't change the fact that it is just as easy to "find" two images that match hashes as to "find" an image whose hash matches the hash of a known image. As people have shown above it takes just minutes to do either.

Perhaps a better version is this: Imagine you are an authoritarian regime that dislikes certain ideological factions or ethniticities. You identify images which may be possessed by participants in those groups. You then obtain child porn images (very easy as an authoritarian regime, you even command the creation of new images if it'll work better) and then modify the child porn images to have hashes matching the political images that you're targeting.

That is a very strong attack of a government on the people. If nothing other than to get people on a government database like america's version of a "suspected terrorist" list.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 21, 2021

That doesn't change the fact that it is just as easy to "find" two images that match hashes as to "find" an image whose hash matches the hash of a known image. As people have shown above it takes just minutes to do either.

It's easier to find two images that match than it is to find an image that hashes to a known hash, even in this case. But this is always the case: even for a black box arbitrary hash function finding a collision needs sqrt() the amount of work to find a preimage, even with a totally generic attack, due to the birthday paradox.

Both attacks (preimage and collision) are easy with this hash function because it has a fundamentally weak design. For stronger hash functions like, say md5, one can now compute collisions easily but a second preimage is still unachievable. The poster you were responding to was emphasizing the point that the attack was a preimage attack rather than a collision because we all expect collision attacks to be much easier, but they're less useful for the attacker because the attacker has to influence both inputs, rather than just one.

Loading

@GiverofMemory
Copy link

@GiverofMemory GiverofMemory commented Aug 21, 2021

It's easier to find two images that match than it is to find an image that hashes to a known hash, even in this case.

I just don't see it because one hash always has to be held constant when comparing it to another. You can vary both, or you can vary one twice as many times.

It depends on the type of attack, and when it is brute force - just varying inputs and comparing outputs, you can vary one or both for the same cost.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 21, 2021

I just don't see it because one hash always has to be held constant when comparing it to another. You can vary both, or you can vary one twice as many times.

It's a surprising fact. The reason that in a generic attack a collision is fundamentally easier is because the number of things you match against increases.

Imagine that you want to find some x such that H(x) matches either H(y) or H(z) . Clearly that's easier than finding something that matches H(y) alone because you get two chances instead of one: it takes half as many comparisons on average.

So when you want to find x,y such that H(x)==H(y) you can alternate between varying x and varying y, and each time you compare against all the previously computed values, not just the most recent. Each operation the chance of success increases because you're comparing with more and more alternatives. This takes a sqrt() of the amount of operations as only changing one input. This is due to the same mathematical fact that makes it so a room with a surprisingly few number of people is likely to have at least two people sharing a birthday: When there are 23 people the odds of two sharing the same birthday is >50% even though there are 365 days in a year.

There are fancy algorithms that eliminate the storage and search costs for comparing all the prior hashes, by using a little more computation. A good place to read about memory efficient collision algorithms is the wikipedia article on cycle detection-- because if you choose the next value to try based on the hash of the prior value, finding a collision is equivalent to finding a cycle in the graph created by feeding the output of the hash back into itself.

That applies to generic attacks that work even without any knowledge of the internal structure. With the apple hash function the hash is differentiable, so we're not limited to generic attacks. In this case, it's easier to find a collision because we're actually not limited to updating one at a time. Instead, we can construct a function that computes the difference between two images hashes and use autodiff to find the partial derivatives of the difference output with respect to all of the pixels at once and modify both images at the same step. The extra dimensions gives the attack a lot more degrees of freedom.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 21, 2021

Here are some more good looking examples: (prior examples here)

Altering Lena to match barbara:

image
image

$ .nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 130310372-d6b2c633-c5d3-4b3e-bc12-d3b2f3104ec6.png
a426dae78cc63799d01adc32
$ .nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 130310383-9fcc80ba-117e-4383-9177-0df24bc99f9a.png
a426dae78cc63799d01adc32

Loading

@tomhicks
Copy link

@tomhicks tomhicks commented Aug 21, 2021

Hey @gmaxwell

What happens if you resize the image down to say 1/4 then upsample back to the original size, then run the hash again? Do they still collide?

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 21, 2021

What happens if you resize the image down to say 1/4 then upsample back to the original size, then run the hash again? Do they still collide?

They're still fairly close, e.g. using the lena/barbara above:

$ convert -scale 90x90 130310372-d6b2c633-c5d3-4b3e-bc12-d3b2f3104ec6.png x.png ; convert -scale 360x360 x.png 1.png
$ convert -scale 90x90 130310383-9fcc80ba-117e-4383-9177-0df24bc99f9a.png x.png ; convert -scale 360x360 x.png 2.png
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 1.png
a426dae78cc23399501acc32
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 2.png
a42692e78ce22b99d11ace32

I suspect the property could be met by making the search work that way. E.g. stick an upsampler on the computation graph for the model so that it works with a 90x90 image, and do the search there. Then upsample it and use it as a starting point for the full res search (or jointly test the full res and down/up path).

Is there some particular reason that this property is important? I could imagine that doing the search in a multiscale way might result in better looking images... but is there something about the apple setup that works this way?

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 21, 2021

I won't be posting any more preimages for the moment. I've come to learn that Apple has begun responding to this issue by telling journalists that they will deploy a different version of the hash function.

Given Apple's consistent dishonest conduct on the subject I'm concerned that they'll simply add the examples here to their training set to make sure they fix those, without resolving the fundamental weaknesses of the approach, or that they'll use improvements in the hashing function to obscure the gross recklessness of their whole proposal. I don't want to be complicit in improving a system with such a potential for human rights abuses.

I'd like to encourage people to read some of my posts on the Apple proposal to scan user's data which were made prior to the hash function being available. I'm doubtful they'll meaningfully fix the hash function-- this entire approach is flawed-- but even if they do, it hardly improves the ethics of the system at all. In my view the gross vulnerability of the hash function is mostly relevant because it speaks to a pattern of incompetence and a failure to adequately consider attacks and their consequences.

And this post written after:

Loading

@dxoigmn
Copy link
Author

@dxoigmn dxoigmn commented Aug 21, 2021

Given Apple's consistent dishonest conduct on the subject I'm concerned that they'll simply add the examples here to their training set to make sure they fix those, without resolving the fundamental weaknesses of the approach, or that they'll use improvements in the hashing function to obscure the gross recklessness of their whole proposal. I don't want to be complicit in improving a system with such a potential for human rights abuses.

FWIW, the best defense we have against these kind of attacks is adversarial training. Adversarial training is exactly what you describe: construct evasions/collisions, train on them, and repeat. My initial interest in all this was to see whether Apple employed such a training method. It's clear they did not, but I see no reason they couldn't do this for their next version of this model.

Loading

@SkyVelleity
Copy link

@SkyVelleity SkyVelleity commented Aug 21, 2021

Assuming Apple doesn't renege before the feature is released, I suspect the best way to compel them to do so is to blow it so widely open as soon after launch as possible that they have no choice but to retract the system.
Until then, perhaps waiting for their final implementation whilst rallying against it in other ways is the best option.

Still, I haven't seen an answer to what I think may be a very significant question:
If the whole point of on-device scanning is to enable E2E iCloud encryption, if Apple's announced they're gonna run a second hash on their own servers, doesn't that entirely invalidate any claimed legitimacy on-device scanning ever had?
If Apple has already backpedaled to server-side scanning (originally stating "The system is significantly more privacy-preserving than cloud-based scanning"), why not junk on-device scanning entirely?
If my understanding of their change of heart is correct, this could be a very strong argument against the system.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 21, 2021

@dxoigmn

FWIW, the best defense we have against these kind of attacks is adversarial training

I'm not sure I agree. One can build a system with fairly strong guarantees against false positives at the expense of false negatives: normalize then downsample and quantize the pixels to the lowest resolution that still leaves images reliably recognizable. Use sha256 on the result. This will give a guaranteed protection against adversarial false positives (assuming the security of sha256). It also won't catch as many fuzzed/recompressed images, though it will catch some. No perceptual hash will strongly protect against evasion by parties willing to modify their images, in any case.

The correct false positive vs false negative tradeoff that is best depends on how much you value users' privacy vs how many images you're willing to miss. I don't think apple should be deploying this system at all, so obviously I also don't think exposing users to false positives to avoid a few false negatives is a good tradeoff. :) I view the use of a complex perceptual hash as a product of decisions being made by people engaging in a one-sided risk assessment that underweight privacy harms and overweight missed images.

The simple cryptographic hash based approach I mention above also has the advantage of being extremely simple to build (or disadvantage if you consider justifying headcount an advantage). If someone really must justify headcount, the simple approach could be improved further e.g. a neural network post processing "error correction" filter that helps quantize pixels consistently, without removing a strong cryptographic protection against adversarial false positive. (e.g. a guarantee that any positive match will have all pixels +/-1 from the true image in a normalized downsample, unless sha256 is broken).

but I see no reason they couldn't do this for their next version of this model.

I expect that extensive adversarial training would make it harder, but still leave construction of preimages possible. In the domain of cryptography it's long been recognized that systems which appear to be secure because their creators attacked them and failed are often not secure at all. Competently constructed cryptosystems are built in ways where security is provable outright or tracable to a small number of well understood assumptions.

Above I describe a scheme for downsampling with "error-corrected" rounding that could give security against adversarial false positives traceable to the security of sha256 against second preimages. It would be less fuzzy, but if you are not valuing avoiding false-negatives over all other considerations, less fuzzy should be an option. :)

Even better could be done if they didn't insist on hiding the database.

@SkyVelleity

if Apple's announced they're gonna run a second hash on their own servers, doesn't that entirely invalidate any claimed legitimacy on-device scanning ever had

The idea is that this second 'secret' hash will be only applied to images that match the first. So arguably there is still some value in having the first to keep user's data hidden from apple absent false positives on the first function. I find their counter unconvincing because: Making it secret protects it much more against review than it does against attackers-- attackers may steal the function (via insider threats) or just be handed it as is the case for the state actors who produce these databases (and many of the concerns here are that state actors may be the attackers-- even if you trust your state no one is safe from a potentially unfriendly regime change in the long run). If the second function has similar structure, it will likely have similar vulnerabilities in any case.

Loading

@memiux
Copy link

@memiux memiux commented Aug 22, 2021

Lotte no Omocha!
@AsuharietYgvar Your profile pic led me to the most disturbed plot ever 👀🙈

Loading

@tomhicks
Copy link

@tomhicks tomhicks commented Aug 22, 2021

Is there some particular reason that this property is important? I could imagine that doing the search in a multiscale way might result in better looking images... but is there something about the apple setup that works this way?

I was wondering whether true positives would still match, yet false positives would not match.

As in, if there's some micro property of the false image that causes the hash to match that can be destroyed by down/upsampling. The theory being that true positives have an image structure that matches much more closely at varying resolutions.

I don't know much about how the adversarial images are arrived at, but I just wondered if this was a property that wouldn't hold for false matches but would for true ones.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 22, 2021

I was wondering whether true positives would still match, yet false positives would not match.

Because of the way mine are constructed I wouldn't be shocked if they survived up and down sampling better that naturally constructed images: I penalize heavily towards making the changes in lower frequency components rather than higher-- probably much more than I should for the best image quality but it was pretty much the first thing I thought of. :)

I'm pretty unfamiliar with the machine learning tools I'm sure someone who knows what they're doing can do a lot better than me if they give it a good try. :)

Loading

@anishathalye
Copy link

@anishathalye anishathalye commented Aug 22, 2021

@tomhicks There is a general technique you can use if you intend for an adversarial perturbation to be robust to a particular set of transformations (e.g. down/upsampling). For many types of transformations, the adversarial examples by default won't be robust to that type of transformation (e.g. rotation), but if you know what transformation you want the example to be robust to a priori, you can make it so. At a high level, this can be done by incorporating the transformation into the training process. Suppose you have some set of transformations T you want the example to be robust to, then the optimization problem becomes that you want to find an example x such that hash(t(x)) = target for t ∈ T. I haven't tried this for NeuralHash yet, but here is a demonstration of adversarial examples (against an image classifier) that are robust to rotation.

Loading

@tomhicks
Copy link

@tomhicks tomhicks commented Aug 22, 2021

@anishathalye interesting.

That raises two more questions, if you don't mind!

  1. Is there any difference between generating adversarial examples for misclassification as opposed to hash collisions?

  2. Is it generally the case that true matches will be more robust to a wider range of transformations than adversarial matches?

Loading

@anishathalye
Copy link

@anishathalye anishathalye commented Aug 22, 2021

  1. Is there any difference between generating adversarial examples for misclassification as opposed to hash collisions?

It is the same concept at a high level: find an input to a neural network that produces a desired output (and perhaps satisfies some extra properties, such as "looks similar to a particular image" or "doesn't look like glitch art"). There has been a lot of research in the area of adversarial examples (thousands of research papers), and researchers have demonstrated adversarial examples on all sorts of neural networks, like image classification, object detection, semantic segmentation, speech-to-text, and many others (including weird things like style transfer that may not make sense to "attack" from a security perspective).

Informally, I did find it harder to attack the neural hash algorithm compared to standard image classifiers like those trained on ImageNet. It took some amount of parameter tuning and such to find collisions for NeuralHash fairly reliably, and even then, I don't think the results look fantastic. Whereas for standard image classifiers, you can basically do whatever attack (like a single step of gradient descent) and they fall over right away; and it doesn't even take much work to get adversarial examples with imperceptibly small perturbations.

Here's one informal / intuitive explanation for why this may be the case: for NeuralHash collisions, we're trying to "encode" 96 bits of information into an image, whereas for attacking an ImageNet classifier, we're only trying to encode log2(1000) ≈ 10 bits of information. (Though of course, this is not a formal proof, nor is it even a fully satisfying intuitive explanation. The semantic segmentation or style transfer adversarial examples, for example, encode more than 96 bits of information into the image.)

Apple might have also done adversarial training (a standard technique to make networks more robust to adversarial examples); I don't remember off the top of my head if there is research on identifying whether a given trained model has been trained using such techniques (and afaik Apple hasn't published details on how NeuralHash was trained).

  1. Is it generally the case that true matches will be more robust to a wider range of transformations than adversarial matches?

In my experience, yes, this is generally true. Unless the adversarial examples have been made robust to the particular kind of transformation being considered. With the example above, of the adversarial example robust to rotation, I wouldn't be surprised if the adversarial example is more robust to rotation than the original image.

Loading

@tomhicks
Copy link

@tomhicks tomhicks commented Aug 22, 2021

Thanks for that.

As far as I know, Apple's only details on how NeuralHash was trained are broadly: shown similar images and told to match, shown dissimilar images and told to not match. You would hope there's some adversarial training in there.

Although wouldn't that just push the problem to a different type of adversarial images?

With the example above, of the adversarial example robust to rotation, I wouldn't be surprised if the adversarial example is more robust to rotation than the original image.

Interesting. That would be interesting to see.

Loading

@KuromeSan
Copy link

@KuromeSan KuromeSan commented Aug 23, 2021

dont forget the UK Website Blocklist which was created to "stop child porn" and now is exclusively used to block sites like ThePirateBay 1337x and such
https://en.wikipedia.org/wiki/List_of_websites_blocked_in_the_United_Kingdom
https://en.wikipedia.org/wiki/Web_blocking_in_the_United_Kingdom#History

Why isnt DNSSec standard already?

Loading

@dxoigmn
Copy link
Author

@dxoigmn dxoigmn commented Aug 23, 2021

@gmaxwell

I expect that extensive adversarial training would make it harder, but still leave construction of preimages possible. In the domain of cryptography it's long been recognized that systems which appear to be secure because their creators attacked them and failed are often not secure at all. Competently constructed cryptosystems are built in ways where security is provable outright or tracable to a small number of well understood assumptions.

The point of collision resistance is to make it computationally harder, not impossible. I wish we had a proof of one-way functions, but we don't. The collision resistance of SHA256 (and friends) rely upon a similar methodology as this model: just a small set of researchers trying to understand the underlying algorithm and find more efficient methods of finding collisions.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 23, 2021

By possible I meant practically possible, though thanks for clarifying for those for whom it wasn't clear.

For cryptographic hashes we have decades of experience, powerful tools, mountains of research. Moreover, for a neutral network to be trainable it must be significantly piecewise differentiable and locally linearly approximatable-- and avoiding that kind of behavior is an explicit requirement in the design of transformations in symmetric crypto. Much of the analysis and review in symmetric cryptography goes into proving how free the function is from those behaviors. :) So I don't think it's at all reasonable to draw a parallel here: Using a neural network as a hash is more like the state of symmetric crypto 40 years ago, but with an added problem that it must have properties that we know make security hard.

Loading

@jankais3r
Copy link

@jankais3r jankais3r commented Aug 24, 2021

Apple claims that they have a second hashing algorithm running server-side to prevent false positives. That second algorithm could very well be PhotoDNA. Who's up for the challenge to find an image pair where both NeuralHash and PhotoDNA collide?

You can grab PhotoDNA here: https://github.com/jankais3r/jPhotoDNA

Loading

@tomhicks
Copy link

@tomhicks tomhicks commented Aug 24, 2021

Using a neural network as a hash is more like the state of symmetric crypto 40 years ago, but with an added problem that it must have properties that we know make security hard.

No-one is using these hashes to secure communications between two parties, though which I think is an important distinction. This doesn't make the system "insecure" in any way.

By generating a collision, all you enable is for Apple to look at the photo that you supplied.

This makes the system marginally less useful to Apple for avoiding server work.

Loading

@jankais3r
Copy link

@jankais3r jankais3r commented Aug 24, 2021

By generating a collision, all you enable is for Apple to look at the photo that you supplied.

This makes the system marginally less useful to Apple for avoiding server work.

Except that instead of Apple employees looking at the flagged photos, totalitarian governments could easily force themselves into this manual review process. Then all they need to do is to disseminate pro-democracy images that collide with actual CSAM images in Apple's database and the system will flag dissidents for them.

Loading

@tomhicks
Copy link

@tomhicks tomhicks commented Aug 24, 2021

Why not just target the server? You're gonna need to do that anyway so you might as well just do that.

Loading

@gmaxwell
Copy link

@gmaxwell gmaxwell commented Aug 24, 2021

@tomhicks The system that apple proposes leaks the cryptographic keys to decrypt photos conditional on the user being in possession of a threshold of neuralhash matching images.

In the attack @jankais3r describes an authoritarian regime compromises apple through hacks, leaks, moles, bribes, or a National Security letter. They then either distribute to the public pro-freedom images modified to match the neuralhashes of known child porn OR they take child porn and modify it to match circulated pro-freedom messages and submit the child porn to the agencies that build the lookup databases. Your computing device processes your private files and leaks to apple the decryption keys, demonstrating your possession of the targeted political speech and allowing the attacker to begin more detailed tracking --- or just send out a group to execute you.

An analogous attack could also be performed without attacking the neuralhash: just put the ideologically identifying images in the database directly. Because the comparison is performed using a private set intersection, no one can directly tell what the database contains. However, Apple has made unfalsifiable claims of various self-enforced adhoc protection mechanisms which should make this attack harder: E.g. they claim their database is an intersection of databases from multiple governments, so absent a neuralhash attack the governments would need to collude to include non-child-abuse images, similar to how 'five eyes' countries spy on each others citizens to evade the hard legal prohibitions against spying on their own. Targeted false neuralhash matches undermine those protections.

Moreover, the potential for adversarial collisions means that if the plain-text of the database is leaked or otherwise made public, or just shared with third party reviewers, any inappropriately included material is potentially deniable due to the potential of making false matches.

This is particularly unfortunate, because there are simple and obvious schemes which, assuming the database has no false entries in it, make false positives computationally intractable with extremely high confidence and would eliminate the potential for the above attacks in a way which is independently verifiable. They have the disadvantage of having more false negatives.

Since detection here requires the target running software that betrays their self-interest and especially since it's extremely easy to change images so their neuralhashes don't match any such system is already only going to be effective against people who are not trying to evade it. Yet at no point has Apple attempted to justify the use of a function which has a meaningful amount of false positives over ones that a FP free for practical purposes. But why would they without people showing that false positives are a potential issue through attacks on the hash function like the ones people have presented in this issue?

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet