The kappa value is used in a way that's effectively correct, but is a theoretical atrocity #5
BenH11235 added the high-level label Dec 24, 2018
BenH11235 added the wontfix label Jan 7, 2019
So, with a stride in my step and a song in my heart, I went to fix this and make the key-guessing process theoretically beautiful. Just to make sure that this would work, I took some sample 95% confidence intervals for the kappa value when guessing the correct key length:

Sample text from full break test: ~0.06 to ~0.065

That is just what the theory predicts. So all we have to do is compare this against the actual kappa value of the plaintext distribution we're using (the Shakespeare distribution), which is... about 0.081. =\

Yeah. It turns out that aiming for the exact kappa value of the plaintext distribution is an excellent theoretical idea, but in the real world, you don't HAVE that value, or the distribution. You have a vague guess. We could swap the Shakespeare distribution for some other, more representative real-world distribution with a kappa value of ~0.06, but what if we then run across some ciphertext derived from a subtly different plaintext distribution with a kappa value of, let's say, 0.04?

If we aim for the highest value, we succeed, because partitioning with the wrong key length strictly decreases the kappa value. If we aim for the theoretically correct value, we fail, because we don't have the correct value, and we never had a chance to obtain it in the first place. So there are strong theoretical reasons to do the theoretically incorrect thing.

In conclusion, aaaaaarrrgggggh
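For concreteness, here is a minimal sketch of the two key-length selection rules being compared ("take the highest kappa" vs. "take the kappa closest to the assumed plaintext kappa"). The thread itself contains no code, so this is an illustration, not the project's implementation: it assumes a classic Vigenère cipher over uppercase A-Z, computes kappa as the index of coincidence of each column, and every function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift each letter of an uppercase A-Z plaintext by the matching key letter."""
    return "".join(
        chr((ord(p) - ord("A") + ord(key[i % len(key)]) - ord("A")) % 26 + ord("A"))
        for i, p in enumerate(plaintext)
    )


def kappa(text: str) -> float:
    """Index of coincidence: chance that two letters drawn without replacement match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def mean_column_kappa(ciphertext: str, key_length: int) -> float:
    """Split into key_length columns (every key_length-th letter) and average their kappa."""
    columns = [ciphertext[i::key_length] for i in range(key_length)]
    return sum(kappa(col) for col in columns) / key_length


def guess_by_max(ciphertext: str, candidates: Iterable[int]) -> int:
    """Pick the key length with the largest mean column kappa (ties go to the first candidate)."""
    return max(candidates, key=lambda k: mean_column_kappa(ciphertext, k))


def guess_by_target(ciphertext: str, candidates: Iterable[int], target: float) -> int:
    """Pick the key length whose mean column kappa is closest to an assumed plaintext kappa."""
    return min(candidates, key=lambda k: abs(mean_column_kappa(ciphertext, k) - target))
```

On a deliberately tiny, artificial example (a plaintext of 30 A's encrypted with the key "ABC", candidate lengths 1 to 6), `guess_by_max` returns 3, and so does `guess_by_target` when the target is the true plaintext kappa of 1.0. With a mistaken target of 0.3, `guess_by_target` returns 1 instead, which is the failure mode described above in miniature.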
BenH11235 commented Dec 24, 2018 (edited)
(See reddit comment by /u/hiles)
We want the decryption to approximate the assumed distribution of the plaintext. What we actually do when choosing the key length is hunt for the largest kappa value, which means that, implicitly, we are aiming for the decryption to approximate a degenerate distribution (a 100% chance of some outcome X) instead.
The result is the same: while a wrong key length can produce something closer to a degenerate distribution than the plaintext distribution is, the probability of this actually happening with typical problem parameters is less than the probability that I have an aneurysm and die before I manage to hit "submit". But admittedly, that's no excuse.
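To make the "degenerate distribution" point concrete: the kappa value of a letter distribution is the probability that two independent draws coincide, i.e. the sum of the squared letter probabilities. Over a 26-letter alphabet it ranges from 1/26 (uniform) to exactly 1 (degenerate), so "take the largest kappa" amounts to "get as close as possible to a degenerate distribution". A small illustrative sketch; the function name and the example distributions are hypothetical.

```python
def distribution_kappa(probs: list[float]) -> float:
    """Kappa of a distribution: chance two independent draws are equal, i.e. sum of p_i^2."""
    return sum(p * p for p in probs)


ALPHABET_SIZE = 26
# Every letter equally likely: the smallest possible kappa, 1/26 (about 0.0385).
uniform = [1 / ALPHABET_SIZE] * ALPHABET_SIZE
# One letter with probability 1: the largest possible kappa, exactly 1.
degenerate = [1.0] + [0.0] * (ALPHABET_SIZE - 1)
# Something in between, skewed toward a few letters.
skewed = [0.5, 0.25, 0.25] + [0.0] * (ALPHABET_SIZE - 3)

for name, dist in [("uniform", uniform), ("skewed", skewed), ("degenerate", degenerate)]:
    print(f"{name:>10}: kappa = {distribution_kappa(dist):.4f}")
```

Any realistic plaintext distribution lands strictly between the two extremes, which is why maximizing kappa still tends to pick the right key length even though the degenerate distribution is not what we actually assume the plaintext looks like.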