Skip to content
AndersLindman edited this page Nov 24, 2014 · 8 revisions

Table of Contents

R30 Hash function

R30 is a hash function that uses the Rule 30 cellular automaton to generate 256-bit hash values. A hash function converts a message into a fixed size code, also called digest or hash value.

Cryptographic properties

Rule 30 is a one-dimensional binary cellular automaton rule introduced by Stephen Wolfram in 1983 that has aperiodic, chaotic behavior. This property is used in R30 to produce hash values with a pseudorandom distribution. The hash values are calculated in a way that makes them hard to invert. Stephen Wolfram writes:

"So can such rules be used for cryptography? I strongly suspect that they can, and that in fact they allow one to construct systems that are at least as secure to cryptanalysis as any that are known. ... The picture below shows evidence for this. The cells marked by dots have colors that are taken as given, and then the colors of other cells are filled in according to the average that is obtained by starting from all possible initial conditions." -- Stephen Wolfram, A New Kind of Science, pp. 603, 605.

Disclaimer: R30 is a new hash function. The cryptographic claims here are assumptions that may turn out to be false.

R30 is designed to be a close approximation to a random oracle and resistant to at least the following kinds of attacks:

  * Preimage attacks
  * Second-preimage attacks
  * Collision attacks
  * Chosen-prefix collision attacks
  * Length extension attacks
  * Brute-force attacks
  * Distinguishing attacks
  * Cryptanalysis

Random oracle

A random oracle is an oracle (a theoretical black box) that responds to every unique message with a (truly) random response chosen uniformly from its output domain. If a query is repeated it responds the same way every time that message is submitted. The chaotic properties of the Rule 30 cellular automaton make the hash values in R30 distributed in a highly random way regardless of message.

Preimage attacks

If an attacker knows only the hash value and tries to find a message that produces that hash value it's called a preimage attack. For R30 it's assumed that for essentially all hash values for which the message is unknown, it is computationally infeasible to find any message which hashes to that value.

Second-preimage attacks

In a second-preimage attack attempts are made to find another message that produces the same hash value as the original known message. With R30 being a close approximation to a random oracle it is computationally infeasible to find any second message which has the same hash value as any specified message.

Collision attacks

Collision attacks are attempts to find two different messages m1 and m2 such that hash(m1) = hash(m2). The pseudorandom distribution of the R30 hash values makes it computationally infeasible to find two or more messages that generate the same hash value.

Chosen-prefix collision attacks

A chosen-prefix collision attack is when attempts are made given two different prefixes p1, p2 to find two appendages m1 and m2 such that hash(p1 ∥ m1) = hash(p2 ∥ m2) (where ∥ is the concatenation operation). R30 is resistant to any modifications of the message.

Length extension attacks

Length extension attacks are a type of attack on certain types of hashes which allow inclusion of extra information. For R30 adding extra information to the message will produce vastly different hash values for all but those collisions that exist according to the pigeonhole principle.

Brute-force attacks

A brute-force attack, or exhaustive key search, is a cryptanalytic attack that can, in theory, be used against any encrypted data. The 256-bit hash values generated by R30 make the search space for brute-force attacks huge.

Distinguishing attacks

A distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data. R30 generates highly random hash values that unlikely will have any resemblance to the messages.

Cryptanalysis

Cryptanalysis is the study of analyzing information systems in order to study the hidden aspects of the systems. Cryptanalysis can be used to breach cryptographic hash functions. Stephen Wolfram has pointed out that taking fewer cells in the center column of the Rule 30 cellular automaton will likely make cryptanalysis difficult. In R30 the bit value of every other cell in the center column is used for the hash value.

Algorithm

In R30, the message is used as an initial condition for the Rule 30 cellular automaton. The cellular automaton is then iterated for a number of generations proportional to the length of the message. This step is necessary in order to achieve high randomness for the hash values. The step after that is to extract a 256-bit hash value from the center column by further expanding the cellular automaton.

Rule 30 cellular automaton

Rule 30 is an an infinite one-dimensional array of cellular automaton cells with only two states is considered, with each cell in some initial state. For each iteration every cell changes state based on its current state and the state of its two neighbors.

current pattern 111 110 101 100 011 010 001 000
new state for center cell 0 0 0 1 1 1 1 0

Pseudorandom generation

Rule 30 generates seeming randomness even when the input is non-random. Stephen Wolfram proposed using its center column as a pseudorandom number generator and it passes many standard tests for randomness. Wolfram uses this rule in the Mathematica product for creating random integers.

Generating hash value

The standard Rule 30 version starts with a single cell set to 1. In R30 the initial condition is the bits for the message, i.e. the initial condition can contain an arbitrary number of bits instead of starting with a single cell.

To produce random values for the hash, the bit value of every other cell in the center column is used. If the hash values would be extracted directly from the start of the cellular automaton, then the hash values would be far from random for similar messages. Therefore several rows are calculated before the bits for the hash value are extracted.

The image below shows how rows are calculated before the 256-bit hash value is taken from further expansion of the cellular automaton. The rightmost bit of the initial condition of length N needs to influence the center column. And for the worst case scenario with a message 01010101... it requires 7N rows to reach the center column. The rightmost bit needs to be 1 in order to guarantee pseudorandomness in the hash value. In R30 this bit is added as a byte 0x01 to the initial condition.



All bytes are big endian.

Performance

Rule 30 is considered to be computationally irreducible. This means that there are no shortcuts possible and the cell values have to be calculated generation by generation. The core algorithm has a limiting behavior of O(n2) and this makes it computationally intensive for large messages. Therefore a Merkle–Damgård construction is used for dividing the message into 256-bit blocks and the hash value is calculated via the Davies–Meyer single-block-length compression function.

Messages such as containing all zeros would by themselves produce the same initial condition for different message lengths. To prevent this, the last block contains the unsigned 64 bit integer value of the total message length in number of bytes stored in the last bytes of the block like this:

bits 56-63 bits 48-55 bits 40-47 bits 32-39 bits 24-31 bits 16-23 bits 8-15 bits 0-7

Message limit

The largest message supported by R30 is 264-1 bytes.

Licensing

R30 is copyright and license free.

Examples and demo

Examples of R30 hashes:

Message R30 hash value
hello world e2c78dcfb6c51ca3e4288211f3aa352d4cc3547dbf08812df8f264df1f4918c0
hello, world b91956bc1e5b1937b48c364b199ca70879c32cc22da67e3ff8411aea894c3d9f

Live JavaScript demo of R30

Faster JavaScript implementation plus UTF-8 support