Bram's steganographic framework
Latest commit de17a2e Apr 23, 2014 @bramcohen Merge pull request #6 from ldo/ldo


Bram's steganographic framework

DissidentX is a censorship resistance tool.

It has the capability of steganographically encoding messages in files. Special features include:

  • Messages cannot be decoded without the key

  • A single decoder for all file types and encoding techniques, including all future ones

  • Format-specific encoders can be easily written without having to worry about information theoretic encoding or cryptography

  • Support for multiple messages to multiple keys in a single file

Primary Use

The primary use case for DissidentX is encoding messages in files on the web.

There should be a utility which scans all objects the user's web browser downloads (html files, images, css files, etc.) for messages using all of the keys the user has entered. Someone sending messages to that person provides a web service where users who have widely viewed web sites can upload their files and get back slightly modified version with messages steganographically added.

The web users should not be able to read what the messages are, and it should be possible for the service doing the encoding to not have to keep messages in plaintext.

Because encoding rates are so low, a number of the parameters to the encoding and decoding libraries have been lowered to not be appropriate for all use cases. They should be evaluated in the context of this one. The same technology should alse be used for easter egg hunts, because that's fun and provides cover traffic.

Usage guide:

Uses Python3, PyCrypto, and sha3

As examples, the command line tools line_ending_encoder and universal_decoder are included. line_ending_encoder is based on adding trailing spaces to the end of lines in a text file.

Use line_ending_encoder like this:

python3 myfile.txt key1 payload1 key2 payload2

That will modify myfile.txt, hiding payload1 to the key key1 and payload2 to the key key2.

Any number of key/payload pairs are allowed, although any given file can only support a certain total length of payloads.

The keys are assumed to be in unicode, which is correct. The payloads are also assumed to be in unicode, which is a hack to make the output pretty, and not completely general.

After you encode data with line_ending_encoder you can get it back out like this:

python3 myfile.txt key1

which will print out payload1. Likewise for payload2 and key2.

Note that line_ending_encoder only gets one bit per line, with overhead of seven bytes, and that encoding the same section of text repeatedly in a text file doesn't get extra bits.

Encoder writing guide:

The prepare_message() function takes a key and plaintext, both byte strings, and returns another key and ciphertext to be used later. This is done as a separate step to enable the use case where messages to be encoded are stored on a server already encrypted.

The pack_and_encode_messages() function takes an array of results from prepare_message() and a processed file for the messages to be stored in. The processed file is an array consisting alternately of fixed binary strings and arrays of length two giving alternate possible values for that position.

Alternates can be anything semantically valid for the file format being used. For example in human readable text files eliminating unnecessary commas in text, or alternate spellings for words, or alternative word orders can all be used. Multiple methods of generating alternates can be used in the same file.

Simple implementations are in and, both designed to work on common computer language files.

More detail on the math involved is in Explanation.txt


Q. Can someone modify the message stored in a file?

A. No. Changing even a single byte of the file will completely obliterate any message which was stored.

Q. Why did you use Python3 as a reference language?

A. Because not having distinct binary and unicode string types is barbaric.

Q. Can I get a copy of this for another language?

A. If somebody writes it. This code is being released as a reference in the hopes that other people will pick it up and run with it.

Q. Why are you doing row reduction manually in Python instead of using numpy?

A. Because I don't know how. Feel free to implement improvements.

Q. Can someone detect that a file has messages encoded in it?

A. That depends on the encoding used and the properties of the file the data is being encoded in. There's a whole field of academic literature on steganography, none of which is invalidated by this code. What this code does is vastly simplify the implementation of new steganographic techniques, and allow a universal decoder and encoding of multiple messages to different keys in the same file.

Q. How much data can be encoded in a file?

A. That's entirely dependant on the file type and specific encoding, but if you insist on a made up number, let's say a ratio of around 500:1, and the encoded message has overhead of about 7 bytes.

Q. Why can't it be given more than two alternates for one position to encode more information?

A. Because of math. See Explanation.txt for a bit more detail.

Q. Your code is horribly inefficient and can be optimized in all kinds of ways.

A. That's why it's called 'reference' code.

Q. It would be possible to pack in data more densely if alternates are required to always be the same length, or variable bytes are allowed to be set to arbitrary values.

A. Yes, but those put severe restrictions on what can be done in an encoder, and hence are less likely to be useful in practice.

Q. Why don't you use public key encryption?

A. Because bits are precious enough for that to be unwieldy, and it would disallow use of arbitrary human readable strings as keys. The symmetry is best viewed as a feature: because the value of a key is severely diminished if it's widely known, there's a reason to hoard them, which is the desired behavior.