Skip to content

Techrocket9/excel-hash-lambdas

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Warning: While human guided and spot-checked for correctness, most of this repo was generated by Opus 4.7

Use with caution!

excel-hash-lambdas

Pure-formula cryptographic hash functions for Excel, implemented in LAMBDA. No VBA, no add-ins, no macros, no Office Scripts. Currently ships MD5 (one self-contained LAMBDA), SHA-256 (five), and SHA3-256 (seven). SHA-256 and SHA3-256 also depend on a shared ASCII_ input-validation helper. All installed the same way: open Name Manager, paste each formula, give it a name.

=MD5_("abc")     →  900150983cd24fb0d6963f7d28e17f72
=SHA256_("abc")  →  ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
=SHA3_("abc")    →  3a985da74fe225b2045c172d6bd390bd855f086e3e9d525b46bfe24511431532

Works in any Excel that supports LAMBDA / LET / REDUCE / HSTACK (Excel 365, Excel for the web, Excel 2024+). Does not work with Google Sheets.

1. At a glance

Algorithm Public name LAMBDAs Output
MD5 =MD5_(text) 1 32 hex chars
SHA-256 =SHA256_(text) 5 64 hex chars
SHA3-256 =SHA3_(text) 7 64 hex chars

None of these is appropriate for security-sensitive use. MD5 is broken; SHA-256 and SHA3-256 are fine cryptographically but a formula in a spreadsheet is the wrong place to put a security primitive. Use these for fingerprinting, deduplication, and change detection.

Input handling

All hash LAMBDAs accept ASCII text only (codepoints 0x00 through 0x7F). Inputs containing characters outside this range — accented letters, CJK characters, emoji, etc. — return Error: non-ASCII input detected instead of a hash. This is a deliberate restriction: Excel's CODE function returns platform-codepage bytes for non-ASCII characters, which would produce different (and incorrect) hashes on Mac vs. Windows. Rejecting non-ASCII at the boundary is safer than silently producing a wrong digest. UTF-8 support is feasible but would require a larger rewrite that exceeds the 2084-char defined-name cap; not currently planned.

2. Algorithms

  • md5/ — one LAMBDA, self-contained (fits the Name Manager 2084-char cap with the ASCII guard inlined, so no shared helper is required). Single-line formula, formatted version, test vectors. See the MD5 internals section below for the deep walkthrough and the original Excel quirks list.
  • sha256/ — five decomposed LAMBDAs (SHA256K_ / SHA256I_ / SHA256H_ / SHA256A_ helpers + SHA256_ main). Single-line formula for each, formatted version of the main, test vectors including the FIPS 180-2 two-block vector and the non-ASCII rejection cases.
  • sha3_256/ — seven decomposed LAMBDAs implementing Keccak-f[1600] (one per FIPS 202 sub-permutation, plus the round loop, the squeeze, and the public entry point). Test vectors covering the rate-block edge, the 0x86 collapsed-padding case, multi-block absorption, and the FIPS 202 example. Per-directory README covers algorithm parameters and the state representation.

3. Install — general procedure

Every LAMBDA in this repo installs the same way. Repeat for each (file, defined name) pair listed in the per-algorithm sections below.

  1. Open your workbook in Excel.
  2. Formulas → Name Manager → New (Ctrl+F3 on Windows, Cmd+F3 on Mac).
  3. Set Name to the defined name from the table (e.g. MD5_, ASCII_). Note the trailing underscore — see below.
  4. Set Scope to Workbook.
  5. Open the corresponding .lambda.txt file and copy the entire single line — most are long, make sure you grab all of it.
  6. Paste into the Refers to box. The leading = must be there.
  7. OK, then Close.

*.lambda.formatted.txt files (where present) contain the same formula with line breaks and indentation if you want to read before you paste.

3a. Install — shared utility (required for SHA-256 and SHA3-256)

The SHA-256 and SHA3-256 entries depend on a shared ASCII-detection helper. Install it once before either of those algorithms. MD5 does not need this — its ASCII guard is inlined directly into MD5_, so MD5 stands on its own.

File Defined name
ascii.lambda.txt ASCII_

3b. Install (MD5)

MD5 is fully self-contained: a single defined name, no shared helper required. The ASCII guard is inlined into the MD5_ body (which sits at 2067 characters, just under Excel's 2084-character defined-name cap).

File Defined name
md5/md5.lambda.txt MD5_

Why the trailing underscore? Excel's Name Manager rejects any defined name that looks like a cell address. MD5 is parsed as "column MD, row 5" and refused; SHA256 is parsed as "column SHA, row 256" and refused. Same [A-Z]{1,3}\d+ rule documented in the quirks section — it applies to workbook-level defined names, not just LET variables. Trailing-underscore is the conventional escape hatch (cell addresses can't contain underscores). Call sites become =MD5_("abc"), =SHA256_("abc"), and =SHA3_("abc").

3c. Install (SHA-256)

SHA-256 ships as four LAMBDAs because Excel caps each defined name's "Refers to" field at 2084 characters and the full SHA-256 formula is ~3200. The split is mechanical — three small helpers plus the main entry point — and the call site is unchanged: =SHA256_("abc") works the same as before.

Requires ASCII_. SHA-256 calls the shared ASCII_ helper to reject non-ASCII input. If you haven't installed it yet, do that first (§3a). Without it =SHA256_(...) returns #NAME?.

Install order matters: define the helpers first so the main LAMBDA can resolve them.

File Defined name
ascii.lambda.txt ASCII_ shared helper — skip if already installed
sha256/sha256k.lambda.txt SHA256K_
sha256/sha256i.lambda.txt SHA256I_
sha256/sha256h.lambda.txt SHA256H_
sha256/sha256a.lambda.txt SHA256A_
sha256/sha256.lambda.txt SHA256_ main entry point

Once all five SHA-256 names are defined (plus the shared ASCII_), =SHA256_(...) works anywhere in the workbook. All names end in _ for the same cell-address-collision reason as MD5_. SHA256A_ is the per-block final-add helper extracted to leave room for the ASCII guard wrap.

3d. Install (SHA3-256)

SHA3-256 ships as seven LAMBDAs because Keccak-f[1600] is significantly more complex than SHA-2's compression function and the full implementation is far too large for a single defined name. The split is mechanical — five permutation/helper components plus an output formatter and the main entry point — and the call site is unchanged: =SHA3_("abc").

Requires ASCII_. SHA3-256 calls the shared ASCII_ helper to reject non-ASCII input. If you haven't installed it yet, do that first (§3a). Without it =SHA3_(...) returns #NAME?.

Install order matters: define the dependencies first.

File Defined name
ascii.lambda.txt ASCII_ shared helper — skip if already installed
sha3_256/sha3k.lambda.txt SHA3K_
sha3_256/sha3t.lambda.txt SHA3T_
sha3_256/sha3rp.lambda.txt SHA3RP_
sha3_256/sha3ci.lambda.txt SHA3CI_
sha3_256/sha3f.lambda.txt SHA3F_
sha3_256/sha3h.lambda.txt SHA3H_
sha3_256/sha3.lambda.txt SHA3_ main entry point

See sha3_256/README.md for what each LAMBDA does and how the state is represented.

4. What works the same across all functions

  • Same Excel version requirements (LAMBDA / LET / REDUCE; SHA-256 also needs HSTACK; SHA3-256 also needs MAP / MAKEARRAY / DROP).
  • Same Name Manager install mechanism (paste formula text into Refers to). MD5 is one self-contained name. SHA-256 (five names) and SHA3-256 (seven names) additionally require the shared ASCII_ helper. See install sections above.
  • Same input handling: characters are read via CODE(MID(...)), which gives Excel's per-character codepoint (typically UTF-16 code units). UTF-8 multi-byte input is not correctly handled — any character outside the ASCII range will produce a hash that does not match md5sum / sha256sum / sha3-256sum on the UTF-8 encoding of the same string. Known limitation of all three formulas; encode upstream if you need byte-level interop.
  • Same performance shape: fine on individual cells, slow when filled down thousands of rows because Excel re-evaluates the whole LAMBDA per cell.

5. What's different about SHA-256

Same trick scaled up, plus one new pattern. Notable differences from the MD5 implementation:

  • Big-endian byte order throughout. Word assembly, length encoding in padding, and final hex output all run high-byte-first. MD5 is little-endian for the same operations.
  • 8-word state instead of 4. The array carried through REDUCE is 1×8 (a through h), built and updated with CHOOSE({1,2,3,4,5,6,7,8}, ...).
  • 64-word message schedule built with REDUCE + HSTACK. Each iteration appends one new word to a growing 1×N array. This is the key trick that makes pure-LAMBDA SHA-256 viable — without grown arrays you can't index back into prior schedule words from inside the loop.
  • K and H constants are embedded literally. They're derived from cube/square roots of small primes, not from a closed form like MD5's floor(2^32 * abs(sin(i))), so the formula carries them as {...} array literals. They're factored out into the SHA256K_ and SHA256I_ helper LAMBDAs (see install section) to keep the main body under Excel's 2084-character defined-name limit.
  • Slower than MD5. Longer schedule (64 vs 16 words after derivation), more state words to carry, and 64 rounds operating on more data per round.

5b. What's different about SHA3-256

A different algorithm family from the Merkle-Damgård SHA-2 line. The implementation introduces a few new patterns:

  • Sponge construction, not Merkle-Damgård. No length encoding in padding; absorption is XOR-into-state followed by a permutation, not a compression function. Padding is the multi-rate pad10*1 rule with the SHA-3 domain-separation suffix 0x06 — and a special case for when the suffix and the final 0x80 marker fall on the same byte (0x86).
  • 64-bit lanes simulated as pairs of 32-bit halves. Excel's BIT* family is 32-bit only. Every Keccak lane is carried as (lo, hi) and the 64-bit left-rotate is implemented by branching on rotation amount (< 32 vs >= 32) and assembling two 32-bit halves. See rotL inside SHA3RP_.
  • 1×50 state array. 25 lanes × 2 halves, threaded through the seven LAMBDAs in a fixed shape so each sub-permutation is a state → state function and the round loop is a clean REDUCE.
  • MAKEARRAY for full state construction. SHA-256's state grows by CHOOSE({1..8}, ...); SHA-3's 50-element state is built with MAKEARRAY(1, 50, LAMBDA(_, j, ...)) because the per-position computation depends on the index.
  • Round constants in their own helper. The 24 64-bit round constants are stored as 48 32-bit values in SHA3K_() — same constant-function pattern as SHA256K_ / SHA256I_.

6. Excel quirk addendum

The original MD5 quirks list (cell-reference-shaped names rejected in both LET and Name Manager, BITLSHIFT overflow, array state through REDUCE, INDEX row addressing) applies to all three algorithms. The newer ones below were discovered while writing SHA-256 and SHA-3.

From SHA-256

  • The cell-reference name trap claims new victims with two-letter prefixes. tt1 and tt2 look harmless — but TT is column 540, valid through row 1048576, so Excel rejects both. Anything that ends in digits is suspect, regardless of the letter prefix length. SHA-256 uses tone / ttwo instead of t1 / t2.
  • HSTACK inside REDUCE works for growing arrays. This wasn't needed for MD5 but is essential here. Each iteration HSTACKs one new word onto the accumulator, producing a final 1×64 array indexed by the round loop. The shape stays 1×N throughout, which keeps INDEX(arr, 1, n) access patterns consistent with the rest of the formula.
  • Big-endian length encoding matters. MD5 packs the message length as little-endian in the trailing 8 bytes; SHA-256 packs it big-endian. The padding lambda differs by exactly one expression: 256^(idx-plen+8) (MD5) vs 256^(plen-1-idx) (SHA-256). Easy thing to copy wrong, and the failure mode is silent — short inputs hash correctly, longer ones diverge.

From SHA3-256

  • The cell-reference name trap, again. hi2, lo2, k1, k2, b1, b2, b3 are all valid cell addresses (HI2, LO2, K1, K2, B1, B2, B3) and silently rejected by the parser. SHA-3's published bodies use hiP / loP / kAlpha / kBeta / bA / bB / bC / bD. Don't "simplify" them back.
  • MAKEARRAY is the right tool for fixed-size state where each cell is a different expression. SHA-256's CHOOSE({1..8}, ...) doesn't scale to 50 positions cleanly. MAKEARRAY(rows, cols, LAMBDA(r, c, ...)) does, and the underscore-named row argument (LAMBDA(_, j, ...)) is the conventional "ignored" placeholder when you only care about the column index.
  • 64-bit ops as 32-bit pairs work but the 32-boundary rotation is the trap. A 64-bit left-rotate by n collapses into three cases: n = 0 (identity), n = 32 (swap halves), and otherwise (mask + shift + OR across both halves). Forget the n = 32 case and you get garbage hashes for exactly 5 of the 25 Keccak lanes — the failure mode is non-obvious because the other 20 lanes still look right.

Name Manager's "Refers to" field is capped at 2084 characters

Excel limits each defined name's formula to 2084 characters. MD5 is ~1850 and squeaks under; SHA-256 is ~3200 and does not. Two ways around this:

  1. Decomposition (preferred). Split the LAMBDA into helpers that each fit. Constant-array helpers (lookup tables, init values) are particularly cheap to extract: a zero-argument LAMBDA returning a literal array works as a "constant function" callable from the main body — SHA256K_, SHA256I_, and SHA3K_ all use this idiom. The pattern scales with algorithm complexity: SHA-256 splits cleanly into 4 LAMBDAs (helpers + main); SHA3-256's 24-round Keccak-f[1600] permutation splits into 7 (one per FIPS 202 sub-permutation, plus the round loop, the squeeze, and the public entry point). See the SHA-256 and SHA3-256 install sections.
  2. Cell indirection (fallback). Paste the LAMBDA into a worksheet cell, then point a defined name at that cell (=Sheet!$A$1). Calls resolve through the cell. Works for any length, but pollutes the workbook with a host cell that displays #CALC!. Only use this if you can't or don't want to decompose.

CODE / MID returns platform-codepage bytes, not Unicode

Excel's CODE function on a non-ASCII character returns a byte from the active platform codepage — Mac Roman on macOS, Windows-1252 on Windows. The same character produces different bytes on the two platforms. For hash functions this is unacceptable (digests must be deterministic across platforms), so this repo rejects non-ASCII input rather than producing a platform-dependent (and therefore wrong) hash. SHA256_ and SHA3_ wrap their bodies with LET(asc, ASCII_(txt), IF(asc = "", <body>, asc)) so the error string short-circuits to the cell. MD5_ does the same but with the ASCII_ body inlined directly into the wrap — small enough to fit under the 2084-char cap, which keeps MD5 a one-name install.

7. Roadmap

  • SHA-1 — straightforward; same shape as SHA-256 with a smaller schedule and different round functions.
  • SHA-512 — needs 64-bit arithmetic, which Excel's BIT* family does not natively support. Doable by simulating 64-bit ops as pairs of 32-bit halves, but painful and slow. Probably not worth it unless someone asks.

PRs that shorten an existing formula without breaking its test vectors, or document additional Excel parser quirks, are also welcome.

8. License

MIT.


MD5 internals

Walkthrough of the MD5 LAMBDA, kept here so the per-algorithm directory stays small. For the algorithm itself see RFC 1321.

  • modBig / maskAll / addM / notM / rotL — 32-bit unsigned arithmetic primitives built on top of Excel's BITAND / BITOR / BITXOR / BITLSHIFT / BITRSHIFT. addM does mod-2³² addition, notM does bitwise NOT against a 32-bit mask, and rotL is left-rotate. rotL masks its input down to (32 - n) bits before shifting so the shifted result cannot overflow past 2³² (see Quirk 3 below).
  • mlen / plen / nblk — message length, padded length (rounded up to a multiple of 64 with room for an 8-byte length suffix), and number of 512-bit blocks.
  • padByte(idx) — returns the byte at index idx of the padded message: original byte if in range, then 0x80, then zeros, then the original message length in bits as a little-endian 64-bit integer in the final 8 bytes.
  • getWord(blkIdx, wordIdx) — assembles four bytes into a little-endian 32-bit word.
  • karr — the K constant table, computed as floor(2^32 * abs(sin(i))) for i = 1..64. Generated with SEQUENCE(64), so the whole table is one cell expression rather than 64 hard-coded constants.
  • shiftV(idx) — the per-round per-step rotation amount lookup.
  • hInit — the four IV words A, B, C, D, expressed as a 1×4 row array via CHOOSE({1,2,3,4}, ...).
  • finalH — the meat. An outer REDUCE walks each 512-bit block, accumulating a 1×4 state array. Inside it, a second REDUCE walks all 64 round operations, picking the right F/G/H/I function and message-word index per round, then permuting the state.
  • hexByte / wordHex — convert each 32-bit word to 8 hex characters in little-endian byte order, concatenate all four words, lowercase.

MD5 Excel quirks (the original five)

Quirk 1: Cell-reference-pattern names are rejected — both inside LET and in Name Manager

Excel rejects any name that matches [A-Z]{1,3}\d+ where the letter portion is a valid column ≤ XFD (16384) and the digit portion is a valid row ≤ 1048576. This rule applies to both LET / LAMBDA parameter names and to workbook-level defined names registered through Name Manager. Inside a formula the rejection is silent (generic "formula is invalid" error); in Name Manager the dialog gives a slightly more specific complaint about syntax but does not name the rule.

Rejected Why
M32 column M, row 32
ADD32 "ADD" is column 784, row 32
NOT32 "NOT" is column 9874, row 32
MOD32 "MOD" is column 9182, row 32
b1, b2, b3 columns B, rows 1–3
MAX32 "MAX" is column 8838, row 32
MD5 "MD" is column 342, row 5 — bites you when registering the LAMBDA
SHA256 "SHA" is column 12029, row 256 — same

Safe: 4+ letters before digits (MASK32, modBig), no digits at all (karr, padByte), or underscores to break the pattern (b_1, MD5_, SHA256_, SHA3_). Anyone publishing a hash, cipher, or codec LAMBDA — SHA1, RC4, AES1, B64, CRC32 — will hit this when they try to install it. Pick the trailing-underscore convention up front.

Quirk 2: Office.js names.add() cannot register LAMBDAs containing REDUCE

Registering programmatically via context.workbook.names.add("MD5", formulaText) appears to succeed, but every call returns #REF!. The same formula works when pasted into a cell as =LAMBDA(...)("abc") or entered through Name Manager. If you build tooling that auto-installs LAMBDAs, anything containing REDUCE over an array accumulator may silently break via the API.

Quirk 3: BITLSHIFT overflows past 2³²

BITLSHIFT(x, n) will produce values larger than 2³², which corrupts subsequent bitwise ops. Mask before shifting:

rotL(x, n) = ((x AND (2^(32-n) - 1)) << n) OR (x >> (32-n))   mod 2^32

Quirk 4: Carrying array state through REDUCE

REDUCE supports array accumulators — pass a 1×4 row array as the seed and return a 1×4 row array each iteration. You can't construct it with {a,b,c,d} literal syntax when the elements are formula expressions; use CHOOSE({1,2,3,4}, a, b, c, d).

Quirk 5: INDEX(arr, 1, n) for column extraction from a 1×N row

INDEX(state, 1, 1), INDEX(state, 1, 2), etc. The single-argument form INDEX(state, n) does not reliably work on a 1×N row array.

About

Cryptographic Hash Functions Implemented in Excel

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors