The kappa value is used in a way that's effectively correct, but is a theoretical atrocity #5

Closed
BenH11235 opened this Issue Dec 24, 2018 · 1 comment

BenH11235 commented Dec 24, 2018

(See reddit comment by /u/hiles)

We want the decryption to approximate the assumed distribution of the plaintext. What we actually do when choosing the key length is hunt for the largest kappa value, which means that implicitly we are aiming for the decryption to approximate a degenerate distribution (100% chance of some outcome X) instead.
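To make "hunt for the largest kappa value" concrete, here is a minimal sketch. It assumes that kappa means the index of coincidence (the chance that two positions drawn from a sample hold the same byte, which reaches 1 only for a degenerate distribution) and that the ciphertext was produced with a repeating key. The function names are made up for illustration and are not taken from this project's code.

```python
from collections import Counter


def kappa(sample: bytes) -> float:
    """Estimate the chance that two distinct positions in `sample` hold the same byte."""
    n = len(sample)
    if n < 2:
        return 0.0  # no pair of positions to compare
    counts = Counter(sample)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def kappa_for_key_length(ciphertext: bytes, key_length: int) -> float:
    """Average kappa over the columns formed by taking every key_length-th byte."""
    columns = [ciphertext[i::key_length] for i in range(key_length)]
    return sum(kappa(column) for column in columns) / key_length


def guess_key_length(ciphertext: bytes, max_length: int) -> int:
    """Return the candidate length with the largest average kappa (smallest wins on ties)."""
    candidates = range(1, max_length + 1)
    return max(candidates, key=lambda k: kappa_for_key_length(ciphertext, k))
```

Nothing in `guess_key_length` refers to a plaintext distribution: the only thing it rewards is columns that look as close to "one byte, 100% of the time" as possible, which is exactly the objection raised above.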

The result is the same: while a wrong key length can produce something closer to a degenerate distribution than to the plaintext distribution, the probability of this actually happening with typical problem parameters is less than the probability that I have an aneurysm and die before I manage to hit "submit". But admittedly, that's no excuse.

@BenH11235 BenH11235 added the wontfix label Jan 7, 2019


BenH11235 commented Jan 7, 2019

So, with a spring in my step and a song in my heart, I went to fix this and make the key-guessing process theoretically beautiful. Just to make sure that this would work, I took some sample 95% confidence intervals for the kappa value when guessing the correct key length:

Sample text from full break test: ~0.06 to ~0.065
Bible example: ~0.0665 to ~0.0669

Which is just as the theory predicts. So all we have to do is compare this against the actual kappa value of the plaintext distribution we're using (the Shakespeare distribution), which is... about 0.081.

=\

Yeah. It turns out that aiming for the exact kappa value of the plaintext distribution is an excellent theoretical idea, but in the real world, you don't HAVE that value, or the distribution. You have a vague guess. We could swap the Shakespeare distribution for some other, "more real-world representing" distribution with a kappa value of ~0.06, but what if we then run across some ciphertext derived from a subtly different plaintext distribution, with a kappa value of, let's say, 0.04? If we aim for the highest value, we succeed, because partitioning with a wrong key length (one that is not a multiple of the true length) lowers the expected kappa value. If we aim for the theoretically correct value, we fail, because we don't have the correct value, and we never had a chance to obtain it in the first place. So there are strong theoretical reasons to do the theoretically incorrect thing.
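The mismatch can be checked directly with the numbers quoted in this comment. The sketch below assumes a hypothetical acceptance rule, "accept a key length only if its kappa confidence interval contains the target kappa". That rule is one reading of the theoretically correct approach, not code from this project, and the interval bounds are the approximate values given above.

```python
def target_in_interval(interval: tuple[float, float], target: float) -> bool:
    """Hypothetical rule: accept only if the target kappa lies inside the interval."""
    low, high = interval
    return low <= target <= high


# Approximate 95% confidence intervals at the correct key length, as quoted above.
observed_intervals = {
    "full break test sample": (0.060, 0.065),
    "Bible example": (0.0665, 0.0669),
}

# Approximate kappa of the Shakespeare distribution, as quoted above.
SHAKESPEARE_KAPPA = 0.081

verdicts = {
    name: target_in_interval(interval, SHAKESPEARE_KAPPA)
    for name, interval in observed_intervals.items()
}

for name, accepted in verdicts.items():
    print(f"{name}: correct key length accepted = {accepted}")
```

Under this rule both samples reject the correct key length, because 0.081 sits above both intervals. Picking the largest kappa never consults the target, so a bad guess at the plaintext distribution cannot hurt it.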

In conclusion, aaaaaarrrgggggh

@BenH11235 BenH11235 closed this Jan 7, 2019
