Veeam Log Anonymizer — Rust Edition

High-performance anonymization tool for Veeam Backup & Replication logs, rewritten in Rust for speed and portability.

Coverage aligned with Veeam KB2462 — Sensitive data types in Veeam Backup & Replication and Veeam Backup for Microsoft 365 log files.

⚠ Disclaimer

This is a community project. It is NOT an official Veeam product and comes with NO official Veeam support.

Use at your own risk.
Always review anonymized output before sharing it with third parties — no detection system is perfect, and false negatives (sensitive data that slipped through) are possible.
The --paranoid flag re-scans output for known entities as a safety net, but it does not guarantee zero leakage.
The dictionary file (-D) contains the full reverse mapping, in cleartext unless --encrypt-dict is used. Never include it in a support bundle. Use --dict-output to write it to a separate directory.
The author and Veeam Software accept no responsibility for any data leakage, regulatory issue, or operational impact arising from use of this tool.

Author

Bertrand Castagnet — EMEA TAM at Veeam France

Reference work

This tool's detection scope follows the categories listed in KB2462. The current coverage map is summarized in the table below.

KB2462 coverage matrix (VBR)

KB2462 sensitive data type	v2.6 status
User names	✅ DOMAIN\user, .\user, --aggressive naked-user, --user-list
Object names (hosts, datastores, VMs, clusters)	✅ via `--object-list`
VM file names and paths	✅ backup files (.vbk/.vib/.vbm/.vrb) + file/directory names anonymized (v2.5); other objects via lists
FQDN / Hostname / NetBIOS names	✅ FQDN via `--aggressive`, short hostnames via `--hostname-list`
IPv4 addresses	✅
IPv6 addresses	✅
Customer-specific paths to backup files	✅ file & directory names anonymized in output paths (v2.5)
Names of backup files	✅
SharePoint / Exchange / SQL / Oracle / PostgreSQL / MongoDB / SAP HANA	🟡 DB names via `--db-list`
Query execution results	❌ out of scope (would corrupt logs)
SSH host fingerprints	✅ SHA256, MD5, ssh-rsa/ed25519/ecdsa public keys
SSH connection type	❌ not sensitive
SSH scripts/commands output	❌ not delimitable reliably
PEM certificates / private keys / JWT	✅
MAC addresses	✅ (bonus — not in KB2462 but recommended)

Features

Fast: Aho-Corasick literal replacement engine, parallel file processing with rayon, lock-free entity aggregation
Portable: Single static binary, no runtime dependencies
Smart: Strict validation limits false positives, so only validated entities are anonymized
Consistent: Same entity always gets the same replacement across all files
Reversible: Export a dictionary, then reverse anonymization when needed
Comprehensive: Detects all KB2462 categories where automatic detection is reliable; explicit lists for the rest
Flexible: Exclude specific entity types with --exclude, opt-in aggressive detection with --aggressive
Safe: Paranoid re-scan mode + collision detection on generated values
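
To illustrate the Consistent and Safe points above, here is a minimal Python sketch of a consistent, collision-checked mapping. It is not the tool's Rust implementation: the class name `ConsistentMapper`, the `host-` prefix with six hex characters, and the use of `secrets` are assumptions made for the example.

```python
import secrets


class ConsistentMapper:
    """Hypothetical sketch: one stable replacement per original value."""

    def __init__(self, prefix="host-", length=6):
        self.prefix = prefix
        self.length = length
        self.forward = {}  # original -> replacement
        self.reverse = {}  # replacement -> original (the reversible dictionary)

    def replacement_for(self, original):
        # The same entity always yields the same replacement.
        if original in self.forward:
            return self.forward[original]
        while True:
            candidate = self.prefix + secrets.token_hex(self.length // 2)
            # Collision check: never reuse a generated value, and never
            # generate a value that is itself a known original.
            if candidate not in self.reverse and candidate not in self.forward:
                break
        self.forward[original] = candidate
        self.reverse[candidate] = original
        return candidate


mapper = ConsistentMapper()
first = mapper.replacement_for("backup-srv01")
again = mapper.replacement_for("backup-srv01")
other = mapper.replacement_for("vsa1")
print(first == again, first != other)
```

Keeping both directions of the mapping is what makes the output reversible, and it is also why the exported dictionary has to be protected.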

What's new in v2.6

Three backlog features, all shipped together.

`--validate-only` — dry-run audit with a JSON report

Scan a bundle without writing anything and emit a machine-readable JSON report of what would be anonymized — counts by entity kind and by file, never the original values. Built for pipelines / agent orchestration.

Output to stdout (pure JSON — banner and progress go to stderr) or to --report-output FILE.
Deterministic exit code: 0 if no entities detected, 2 if entities were detected, 1 on error.
Reuses the exact same detection engine as anonymization (no logic drift).

veeam-log-anonymizer -d ./logs --validate-only | jq .summary
veeam-log-anonymizer -d bundle.zip --validate-only --report-output audit.json
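
For pipeline use, the documented exit codes can be turned into a gate. The Python sketch below is only an illustration: the function names and labels are hypothetical, and the only report field it relies on is the top-level `summary` key implied by the `jq .summary` example.

```python
import json
import subprocess

# Exit codes as documented for --validate-only.
EXIT_MEANING = {0: "clean", 2: "entities-detected", 1: "error"}


def classify_exit(code):
    """Map a --validate-only exit code to a label (labels are hypothetical)."""
    return EXIT_MEANING.get(code, "unexpected")


def audit(path, binary="veeam-log-anonymizer"):
    """Run a dry-run audit and return (label, parsed report or None)."""
    proc = subprocess.run(
        [binary, "-d", path, "--validate-only"],
        capture_output=True,
        text=True,
    )
    label = classify_exit(proc.returncode)
    report = None
    # stdout is pure JSON; the banner and progress go to stderr.
    if label in ("clean", "entities-detected") and proc.stdout:
        report = json.loads(proc.stdout)
    return label, report
```

A caller could, for example, block an upload step whenever `audit("./logs")[0]` is not `"clean"`.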

Direct `.zip` bundle input

Point -d at a support .zip directly (auto-detected by extension / PK magic bytes) — no manual decompression.

--output-zip FILE repacks an anonymized .zip (what you send back to support), preserving the internal tree and entry timestamps. Otherwise the bundle is extracted and anonymized into -o DIR.
.log entries get their content anonymized; other entries are copied byte-for-byte; every entry name is anonymized (path-safe entities). The bundle is processed entry by entry, so memory use stays bounded.
The dictionary is never written inside the zip.
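
The entry-by-entry behaviour described above can be sketched with Python's standard `zipfile` module. This is an illustration of the idea, not the tool's Rust code: `repack`, `fake_content` and `fake_name` are hypothetical names, and the two stand-in functions replace the real detection engine.

```python
import io
import zipfile


def repack(src, dst, anonymize_content, anonymize_name):
    """Copy a zip entry by entry, rewriting .log content and all names."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info)  # only one entry is held in memory
            if info.filename.lower().endswith(".log"):
                data = anonymize_content(data)
            # A fresh ZipInfo carries the new name and the original timestamp.
            out = zipfile.ZipInfo(anonymize_name(info.filename), date_time=info.date_time)
            out.compress_type = zipfile.ZIP_DEFLATED
            out.external_attr = info.external_attr
            zout.writestr(out, data)


# Hypothetical stand-ins for the real detection engine.
def fake_content(data):
    return data.replace(b"backup-srv01", b"host-3f9a2c")


def fake_name(name):
    return name.replace("backup-srv01", "host-3f9a2c")


stamp = (2026, 5, 16, 10, 30, 0)
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as bundle:
    bundle.writestr(zipfile.ZipInfo("Task.backup-srv01.log", date_time=stamp), b"connect backup-srv01\n")
    bundle.writestr(zipfile.ZipInfo("backup-srv01/meta.bin", date_time=stamp), b"backup-srv01")

source.seek(0)
target = io.BytesIO()
repack(source, target, fake_content, fake_name)

with zipfile.ZipFile(target) as result:
    names = result.namelist()
    log_body = result.read("Task.host-3f9a2c.log")
    bin_body = result.read("host-3f9a2c/meta.bin")
    stamps = [entry.date_time for entry in result.infolist()]
```

Note that the non-.log entry keeps its original bytes, sensitive or not, while its name is still rewritten. That matches the behaviour described above and is a reason to review what else a bundle contains.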

veeam-log-anonymizer -d 2026-05-16_VeeamBackupLogs.zip --output-zip anonymized.zip -f -D --dict-output ./keep-safe

Optional dictionary encryption (`--encrypt-dict`)

Opt-in encryption of the reversible dictionary (a credential) with a passphrase, using the age format. Output gets a .age suffix.

Passphrase from VLAR_DICT_PASSPHRASE (automation) or an interactive hidden prompt — never a CLI argument.
--reverse transparently decrypts a .age dictionary (prompts / reads the env var).
Losing the passphrase means the anonymization can never be reversed.

veeam-log-anonymizer -d ./logs -o ./out -f -D --dict-output ./keep-safe --encrypt-dict
veeam-log-anonymizer --reverse ./keep-safe/veeam-anonymizer-*.json.age -d ./out -o ./restored -f
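
A wrapper script can follow the same rule of never passing the passphrase on the command line. The Python sketch below shows the lookup order described above (environment variable first, hidden prompt second). The function name and prompt text are hypothetical, and the tool's own handling may differ.

```python
import getpass
import os


def resolve_passphrase(env=os.environ, prompt=getpass.getpass):
    """Return the passphrase from VLAR_DICT_PASSPHRASE, else ask without echo."""
    value = env.get("VLAR_DICT_PASSPHRASE")
    if value:
        return value
    # getpass hides the typed characters; nothing ends up in shell history.
    return prompt("Dictionary passphrase: ")
```

The `env` and `prompt` parameters exist only so the lookup can be exercised without a terminal.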

What's new in v2.5

File & directory name anonymization

Resolves issue #1: sensitive entities in file and directory names (e.g. Task.HOSTNAME-vm....log, or a folder named after a VM/job) were copied verbatim into the output. They are now anonymized too.

On by default: path components are anonymized using the same consistent, reversible mappings as the file content. Recognizable prefixes (Task., Agent., Svc.) and the .log extension are preserved — only the sensitive token is replaced.
Path names are also scanned: an email / FQDN / IP / backup-file name present only in a path (never in content) is now auto-detected and anonymized.
Reversible: --reverse restores the original file and directory names along with content.
--paranoid also re-scans output path names and flags any leaked entity still present.
Opt-out: --keep-path-names keeps original names (content is still anonymized).
Limitation (by design): IPv4/IPv6/MAC/DOMAIN\user are not altered in path names — their masked forms contain characters (*, :, \) invalid in filenames. Short bare hostnames in names still require --hostname-list / --object-list (not reliably auto-detectable), per the tool's "miss rather than corrupt" philosophy.
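
As a rough Python illustration of the prefix and extension handling described above, the sketch below replaces known entities inside each path component while leaving `Task.`, `Agent.`, `Svc.` and `.log` untouched. The function names, the plain-dictionary mapping and the sample replacement values are assumptions; the tool's actual tokenization is not shown in this README.

```python
PRESERVED_PREFIXES = ("Task.", "Agent.", "Svc.")


def anonymize_component(name, mapping):
    """Replace known entities in one path component, keeping prefix and .log."""
    prefix = next((p for p in PRESERVED_PREFIXES if name.startswith(p)), "")
    core = name[len(prefix):]
    suffix = ".log" if core.endswith(".log") else ""
    if suffix:
        core = core[: -len(suffix)]
    # Longest originals first, so a short entity never clobbers a longer one.
    for original in sorted(mapping, key=len, reverse=True):
        core = core.replace(original, mapping[original])
    return prefix + core + suffix


def anonymize_path(path, mapping):
    """Apply the component rule to every part of a /-separated path."""
    return "/".join(anonymize_component(part, mapping) for part in path.split("/"))


mapping = {"backup-srv01": "host-3f9a2c", "vm-prod-crm": "obj-1a2b3c4d"}
print(anonymize_path("logs/vm-prod-crm/Task.backup-srv01-vm.log", mapping))
```

Because the same mapping is used for content and names, reversing it restores both.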

Paranoid false-positive fix

Resolves issue #2: backup-file paths such as disk.vib\next or chain.vbk\n1024 were wrongly captured as DOMAIN\user (the "domain" segment being a file extension), then re-flagged by --paranoid as leaks. The DOMAIN\user detector now rejects matches whose domain segment is a known file extension.
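
The fix can be pictured with a small Python sketch: match `DOMAIN\user` candidates, then drop any whose domain segment ends in a known file extension. The regular expression and the extension set here are simplified assumptions, not the tool's actual patterns.

```python
import re

# Assumed extension set for the example; the real list may be longer.
KNOWN_EXTENSIONS = {"vbk", "vib", "vbm", "vrb", "log"}

# Simplified candidate pattern: domain-like run, backslash, user-like run.
DOMAIN_USER = re.compile(r"([A-Za-z0-9][A-Za-z0-9.-]*)\\([A-Za-z0-9._-]+)")


def find_domain_users(line):
    """Return DOMAIN\\user candidates, skipping file-extension 'domains'."""
    hits = []
    for match in DOMAIN_USER.finditer(line):
        domain = match.group(1)
        last_label = domain.rsplit(".", 1)[-1].lower()
        if last_label in KNOWN_EXTENSIONS:
            continue  # e.g. disk.vib\next is a path fragment, not a login
        hits.append(match.group(0))
    return hits


print(find_domain_users(r"Logon by CORP\john.doe succeeded"))
print(find_domain_users(r"Opening disk.vib\next and chain.vbk\n1024"))
```

The first call reports one candidate and the second reports none, which is the behaviour the fix aims for.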

What's new in v2.4

Major coverage upgrade aligned with Veeam KB2462:

IPv6 addresses detected and anonymized (preserves loopback, link-local, multicast)
MAC addresses in both colon (XX:XX:XX:XX:XX:XX) and compact (XXXXXXXXXXXX) formats
SSH host fingerprints: SHA256, MD5, and full ssh-rsa/ed25519/ecdsa public keys
Backup file names (.vbk/.vib/.vbm/.vrb): stem replaced, extension preserved
PEM inline (JSON-escaped \n between BEGIN/END): now properly redacted (was missed in v2.3)
--hostname-list FILE: explicit list of short hostnames to anonymize
--object-list FILE: explicit list of customer object names (VMs, datastores, hosts, clusters)
--db-list FILE: explicit list of database names (SQL/Oracle/PostgreSQL/MongoDB/HANA)
All new types individually toggleable via --exclude ipv6,mac,ssh-fp,backup-file,hostname,object,db
Banner now references KB2462 as scope reference
Dictionary JSON format extended (backward-compatible via #[serde(default)])
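
The IPv6 preservation rule (loopback, link-local and multicast are left alone) can be expressed with Python's standard `ipaddress` module. This is a sketch of the rule only, with a hypothetical function name; how the tool locates IPv6 candidates in log text is not covered.

```python
import ipaddress


def should_anonymize_ipv6(text):
    """True for IPv6 addresses outside the preserved classes."""
    try:
        addr = ipaddress.IPv6Address(text)
    except ValueError:
        return False  # not an IPv6 address at all
    preserved = addr.is_loopback or addr.is_link_local or addr.is_multicast
    return not preserved


for sample in ("::1", "fe80::1", "ff02::1", "2a01:cb05::aa77"):
    print(sample, should_anonymize_ipv6(sample))
```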

Previous releases (recap)

v2.3: Aho-Corasick engine (5-10× faster), --aggressive for FQDN/naked-user, PEM/JWT redaction, .\user local-machine detection
v2.2: Single-pass replacement engine, lock-free parallel scanning, UTF-16 BOM handling, collision-safe generation, --dict-output, --paranoid, internal-TLD handling

Installation

From source

# Install Rust if needed (1.80+ required for LazyLock)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
cd veeam-log-anonymizer
cargo build --release

# Binary: target/release/veeam-log-anonymizer

Pre-built binaries

Download from the Releases page. Builds are available for Linux (x86_64, ARM64), macOS (Intel, Apple Silicon), and Windows.

Usage

Default mode (safe)

# Single file
veeam-log-anonymizer -i backup.log -o ./output -f

# Directory (recursive)
veeam-log-anonymizer -d /var/log/veeam -o ./anonymized -f -v

# Recommended workflow with separated dictionary and paranoid check
veeam-log-anonymizer -d ./logs -o ./anonymized -f -v -D \
    --dict-output ./keep-safe -s --paranoid

Maximum KB2462 coverage

# Prepare explicit lists (one entry per line, # for comments)
mkdir -p ~/.vla
cat > ~/.vla/users.txt <<EOF
veeamadmin
backup-svc
EOF

cat > ~/.vla/hosts.txt <<EOF
vsa1
backup-srv01
EOF

cat > ~/.vla/objects.txt <<EOF
vm-prod-crm
vm-prod-db
Datastore-Tier1
EOF

cat > ~/.vla/dbs.txt <<EOF
VeeamBackup
ProductionCRM
EOF

# Full anonymization run
veeam-log-anonymizer \
    -d ./logs -o ./anonymized -f -v -D \
    --dict-output ~/.vla/dicts \
    --aggressive --paranoid -s \
    --user-list ~/.vla/users.txt \
    --hostname-list ~/.vla/hosts.txt \
    --object-list ~/.vla/objects.txt \
    --db-list ~/.vla/dbs.txt
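
The list format used above (one entry per line, `#` for comments) is simple enough to validate before a run. The Python sketch below assumes that only whole-line comments are supported and that duplicates can be dropped; both are assumptions, and `parse_list` is a hypothetical helper.

```python
def parse_list(text):
    """Return unique entries from list-file text, skipping blanks and # lines."""
    entries = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            entries.append(line)
    return entries


sample = "# short hostnames\nvsa1\n\nbackup-srv01\nvsa1\n"
print(parse_list(sample))
```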

Reverse anonymization

veeam-log-anonymizer --reverse ~/.vla/dicts/veeam-anonymizer-*.json \
    -d ./anonymized -o ./restored -f

Selective exclusion

# Keep IPs visible (e.g. local-only deployment)
veeam-log-anonymizer -d ./logs -o ./output -f -e ip,ipv6

# Disable PEM redaction (rare — need to inspect certificate chain)
veeam-log-anonymizer -d ./logs -o ./output -f -e pem

Options

Flag	Long	Description
`-i`	`--input FILE`	Input log file
`-d`	`--directory DIR`	Input directory (recursive) or a `.zip` bundle
`-o`	`--output DIR`	Output directory (required, except with `--validate-only` / `--output-zip`)
	`--output-zip FILE`	Repack the anonymized result into a new `.zip` (zip input)
`-f`	`--force`	Force overwrite / create directories
`-v`	`--verbose`	Show filenames in progress bar
`-m`	`--mapping`	Print mapping table to console
`-D`	`--dictionary`	Export mapping to JSON file
	`--dict-output DIR`	Write dictionary to a separate directory (recommended)
`-s`	`--stats`	Show detailed statistics
`-e`	`--exclude TYPES`	Skip entity types (see below)
	`--dry-run`	Preview without writing files (human-readable console listing)
	`--validate-only`	Scan only; emit JSON report (exit 0/2); writes nothing
	`--report-output FILE`	Write the `--validate-only` JSON report to a file
	`--reverse FILE`	De-anonymize using dictionary JSON (decrypts `.age` transparently)
	`--paranoid`	Re-scan output files to detect any leaked entities
	`--aggressive`	Enable detection of standalone FQDNs and naked usernames
	`--user-list FILE`	Explicit list of usernames
	`--hostname-list FILE`	Explicit list of short hostnames
	`--object-list FILE`	Explicit list of customer object names (VMs, datastores, hosts)
	`--db-list FILE`	Explicit list of database names
	`--keep-path-names`	Keep original file/directory names (path anonymization is on by default)
	`--encrypt-dict`	Encrypt the exported dictionary (`-D`) with a passphrase (age)

`--exclude` accepted types

email, user, domain, ip, ipv6, mac, ssh-fp, backup-file, naked-user, fqdn, hostname, object, db, pem, private-key, jwt

What gets anonymized

Default (always on, except via `--exclude`)

Entity	Example	Replacement
Email addresses	`admin@company.com`	`k8mN2xpQ@rT4wL9mK3nPq.com`
Domain\User	`CORP\john.doe`	`aBcDeFgH\iJkLmNoPqR`
Local user	`.\veeamadmin`	(anonymized via naked-user channel)
Domains (from emails)	`company.com`	`rT4wL9mK3nPq.com`
Internal FQDNs	`mail.corp.local`	`rT4wL9mK3nPq.com`
IPv4	`192.168.1.100`	`***.***.1.100`
IPv4-mapped IPv6	`[::ffff:172.16.5.5]`	`[::ffff:***.***.5.5]`
IPv6	`2a01:cb05:...:aa77`	`****:****:****:****:****:****:****:aa77`
MAC (colon)	`00:50:56:96:AA:77`	`**:**:**:**:**:77`
MAC (compact)	`005056962A77`	`**********77`
SSH SHA256	`SHA256:abc...xyz=`	`SHA256:[REDACTED]`
SSH MD5	`MD5:ab:cd:...`	`MD5:[REDACTED]`
SSH pubkey	`ssh-rsa AAAA...`	`ssh-rsa [REDACTED]`
Backup files	`Job-CRM-2026-05-17.vbk`	`xR4t9pZmK9Lq.vbk`
PEM certificates	full block	`BEGIN/END preserved, body redacted`
PEM private keys	full block	`[REDACTED RSA PRIVATE KEY]`
JWT tokens	`eyJ...`	`[REDACTED JWT]`

Aggressive mode (`--aggressive`)

Entity	Example	Replacement
Naked usernames	`User: veeamadmin`	`User: xRyZ8vMqWp`
Naked usernames	`Account: jdoe`	`Account: aB3kLm9PqR`
Standalone FQDNs	`k10-route.apps.cluster.home`	`xR4t9pZ.anon.home`

Explicit lists (no auto-detection — provide your own)

Source	Replacement format
`--hostname-list`	`host-XXXXXX`
`--object-list`	`obj-XXXXXXXX`
`--db-list`	`db-XXXXXXXX`
`--user-list`	naked-user channel

Always preserved

VMware vSphere versions (7.x.x.x, 8.x.x.x)
VBR/Kasten product versions (e.g. 12.1.0.2131)
Loopback (127.0.0.1, ::1)
Link-local (169.254.x.x, fe80::/10)
Broadcast and multicast (IPv4 224.x.x.x to 239.x.x.x, IPv6 ff00::/8)
All timestamps, log levels, and non-sensitive text
System accounts (SYSTEM, Administrator, LocalService, etc.)
Technical terms and Veeam service names
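
The IPv4 rules can be combined into one short Python sketch: mask the first two octets with asterisks and keep the last two, unless the address is loopback, link-local, multicast or broadcast. The masked form is inferred from the IPv4 row of the table above and from the note that masked forms contain `*`. The function names and the regular expression are assumptions. The sketch does not implement the version-number preservation listed above, so a string such as a four-part vSphere version would still be masked here.

```python
import ipaddress
import re

# Four dotted groups of 1-3 digits, not glued to other digits or dots.
IPV4 = re.compile(r"(?<![\d.])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?![\d.])")
BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_preserved(addr):
    """Loopback, link-local, multicast and broadcast stay readable."""
    return addr.is_loopback or addr.is_link_local or addr.is_multicast or addr == BROADCAST


def mask_ipv4(text):
    """Mask the first two octets of every non-preserved IPv4 address."""
    def repl(match):
        try:
            addr = ipaddress.IPv4Address(match.group(0))
        except ValueError:
            return match.group(0)  # e.g. 999.1.1.1 is not an address
        if is_preserved(addr):
            return match.group(0)
        return "***.***.{}.{}".format(match.group(3), match.group(4))

    return IPV4.sub(repl, text)


print(mask_ipv4("Connecting to 192.168.1.100 from 127.0.0.1"))
```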

Recommended support workflow

# 1. Anonymize with maximum coverage; dictionary in a SEPARATE private dir
veeam-log-anonymizer \
    -d ./logs -o ./anonymized -f -D \
    --dict-output ~/private/veeam-dicts \
    --aggressive --paranoid \
    --user-list ~/.vla/users.txt \
    --hostname-list ~/.vla/hosts.txt \
    --object-list ~/.vla/objects.txt \
    --db-list ~/.vla/dbs.txt

# 2. Verify --paranoid reports zero leaks. If it reports any, add the leaked
#    entries to the appropriate list and re-run.

# 3. Bundle and send ONLY the ./anonymized directory to support.
#    Do NOT include the dictionary file.

# 4. When support pinpoints an issue, reverse to see real values locally
veeam-log-anonymizer --reverse ~/private/veeam-dicts/veeam-anonymizer-*.json \
    -d ./anonymized -o ./restored -f

Known limitations

Auto-detection is regex-based: sophisticated obfuscation, custom log formats, or unexpected encodings may cause false negatives. For sensitive cases, combine explicit lists for known-sensitive items, --paranoid, and manual review.
Query execution results (KB2462) are not anonymized: they are arbitrary text and any regex would either miss them or corrupt valid log content. Manual review or pre-processing required.
Database content beyond names (SQL, Oracle, PostgreSQL, MongoDB, SAP HANA): same caveat.
Generated replacements are random strings drawn from rand::thread_rng (ChaCha12 in rand 0.8). They are meant for anonymization, and the tool makes no cryptographic privacy guarantees.
The dictionary file is unencrypted unless --encrypt-dict is used. Treat it like a credential either way.
Very large files (>1 GB) are read into memory. Consider splitting beforehand.
FQDN auto-detection relies on a whitelist of recognized TLDs; names under unknown internal TLDs must be supplied via --hostname-list.
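
For the large-file limitation, one way to split a log beforehand is on line boundaries, so no line is cut in half. The Python sketch below is an assumption-laden example: `split_log` and the `.partNNNN.log` naming are hypothetical, and it assumes a UTF-8 or ASCII log (splitting UTF-16 files on raw newline bytes would break them).

```python
import os
import tempfile


def split_log(src, out_dir, max_bytes):
    """Split src into numbered parts of roughly max_bytes, on line boundaries."""
    os.makedirs(out_dir, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(src))
    parts = []
    out = None
    written = 0
    with open(src, "rb") as handle:
        for line in handle:
            # Start a new part when the current one would exceed the limit.
            if out is None or (written and written + len(line) > max_bytes):
                if out:
                    out.close()
                name = "{}.part{:04d}{}".format(stem, len(parts) + 1, ext)
                parts.append(os.path.join(out_dir, name))
                out = open(parts[-1], "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out:
        out.close()
    return parts


with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "backup.log")
    with open(source_path, "wb") as f:
        f.write(b"line one\nline two\nline three\n")
    parts = split_log(source_path, os.path.join(tmp, "parts"), max_bytes=18)
    names = [os.path.basename(p) for p in parts]
    rejoined = b""
    for p in parts:
        with open(p, "rb") as f:
            rejoined += f.read()
```

Processing all parts in a single run keeps replacements consistent across them, since the same entity gets the same replacement across all files.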

Development

make check          # Format + lint + test (CI equivalent)
make release        # Optimized build
make demo           # Quick visual test
make build-all      # Cross-compile for all platforms
make install        # Install to ~/.cargo/bin

License

MIT License. No warranty, express or implied. See LICENSE.

This tool is informed by — but not endorsed by — Veeam Software. The list of sensitive data types this tool aims to detect is based on the public Veeam Knowledge Base article KB2462.
