jsonguard

Input inspection and output sanitization for JSON/JSONL, CSV, and TSV. Guards against formula injection, bidi-override, control-character, and encoding attacks at both ends of the data pipeline — with a secure-by-default API that makes the safe path the only path.

The Problem

Untrusted data is dangerous at both ends of a pipeline:

At ingestion — you need to know what you're accepting before you store it or route it:

Pattern	Risk
`=HYPERLINK(...)` arriving in a field	Stored formula executes when exported to CSV
U+202E in a username	Display name reversed in UI — phishing vector
`\xFF\xFE` bytes from an external source	Unvalidated bytes corrupt downstream JSON encoding

At emission — structured formats have strict encoding rules that raw strings violate:

Attack	Format	Example
Formula injection	CSV/TSV	`=HYPERLINK("https://evil.example","click me")`
JSON string injection	JSONL	`hello"\n,"injected_key":"injected_value`
Bidi override	All	U+202E reverses displayed text
Control characters	All	TAB/CR/LF breaks field boundaries
CJKV encoding hazard	JSONL	Big5 `許` has `0x5C` as second byte — triggers JSON `\` escape
Unbalanced quotes	CSV	Raw `"` in a field breaks RFC 4180 parsers

Most crates address one end or one attack class. jsonguard covers both ends and every attack class listed above, by default.

Secure by Default

Every function in jsonguard accepts &str or &[u8]. Byte input is decoded through a safe UTF-8 lossy path before inspection or sanitization. There is no "call decode() first" footgun — the decode step is invisible and mandatory.

use jsonguard::csv_field;

// Both compile. Both are safe. The &[u8] path decodes first.
let from_str   = csv_field("O'Brien, \"quoted\"");
let from_bytes = csv_field(b"\xB3\x5C raw bytes from Big5 source");
//                          ^^ 許 in Big5; 0x5C second byte is handled correctly

assert!(!from_str.lossy);   // &str input: always valid UTF-8
assert!(from_bytes.lossy);  // byte input: \xB3 was undecodable, replaced
println!("{}", from_bytes); // still a valid, safe CSV field

Features

Feature	Default	Description
`alloc`	✓	`Cow<'_, str>` / `String` return types
`std`		`std::error::Error` impl

Install

[dependencies]
jsonguard = "0.2"

For no_std targets, disable default features and keep alloc:

[dependencies]
jsonguard = { version = "0.2", default-features = false, features = ["alloc"] }

Usage

CSV

use jsonguard::csv_field;

// A leading formula trigger (= + - @) gets a ' prefix so spreadsheets treat the cell as text
let field = csv_field("=SUM(A1:A10)");  // formula injection blocked
assert_eq!(field.to_string(), r#"'=SUM(A1:A10)"#);

// Fields containing quotes are wrapped in quotes, with internal quotes doubled per RFC 4180
let field = csv_field("name with \"quotes\"");
assert_eq!(field.to_string(), r#""name with ""quotes"""#);
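
To make the two rules above concrete, here is a minimal std-only sketch of a formula-prefix escape followed by RFC 4180 quoting. `csv_escape_sketch` is a hypothetical name used for illustration. It is not jsonguard's implementation, and the real crate may order or combine these steps differently.

```rust
/// Hypothetical helper (an assumption for illustration, not jsonguard's API).
fn csv_escape_sketch(input: &str) -> String {
    let mut s = String::with_capacity(input.len() + 3);

    // Neutralize a leading formula trigger with a single-quote prefix.
    if matches!(input.chars().next(), Some('=' | '+' | '-' | '@')) {
        s.push('\'');
    }
    s.push_str(input);

    // RFC 4180: a field containing a comma, quote, CR, or LF must be
    // enclosed in double quotes, with embedded quotes doubled.
    if s.chars().any(|c| matches!(c, ',' | '"' | '\r' | '\n')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s
    }
}
```

Under these assumptions, `csv_escape_sketch("=SUM(A1:A10)")` yields `'=SUM(A1:A10)` and a field with embedded quotes comes back wrapped and doubled, matching the outputs shown in the assertions above.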

TSV

use jsonguard::tsv_safe;

// Tabs, CR, LF replaced; bidi stripped; formula-prefixed fields escaped
let cell = tsv_safe("value\twith\ttabs");
assert_eq!(cell.to_string(), "value with tabs");
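
The delimiter replacement can be sketched with the standard library alone. `tsv_replace_sketch` is a hypothetical name, and this sketch covers only the TAB/CR/LF step, not the bidi stripping or formula escaping that the comment above also mentions.

```rust
/// Hypothetical helper (illustration only): replaces the characters that
/// would shift TSV columns or rows with a single space each.
fn tsv_replace_sketch(input: &str) -> String {
    input
        .chars()
        .map(|c| if matches!(c, '\t' | '\r' | '\n') { ' ' } else { c })
        .collect()
}
```

Each offending character is replaced one-for-one, so a CRLF pair becomes two spaces in this sketch. Whether jsonguard collapses such runs is not stated in this README.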

JSON Lines

use jsonguard::jsonl_safe;

// Produces a valid JSON string value (with surrounding quotes)
let val = jsonl_safe("line1\nline2");
assert_eq!(val.to_string(), r#""line1\nline2""#);
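
For reference, RFC 8259 requires that the quotation mark, the backslash, and the control characters U+0000 through U+001F be escaped inside a JSON string. The following is a minimal sketch of that rule using only the standard library. `json_string_sketch` is a hypothetical name and is not claimed to match jsonguard's internals.

```rust
/// Hypothetical helper (illustration only): encodes `input` as a JSON
/// string literal, including the surrounding quotes.
fn json_string_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 2);
    out.push('"');
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining C0 controls use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

With this escaping, the injection payload from the table in The Problem stays inside a single string value, because its embedded quotes and newline are escaped rather than emitted raw.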

Display-safe strings

use jsonguard::{display_safe, cap_display};

// Strip control chars and bidi overrides for any display context
let s = display_safe("normal \x00 text \u{202E} reversed");
assert_eq!(s.to_string(), "normal  text  reversed");

// Truncate with a safe sentinel, never mid-character
let s = cap_display("hello world", 5);
assert_eq!(s.to_string(), "hello…");
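
A rough sketch of both operations, assuming a fixed list of bidi control code points and truncation counted in `char`s. The names `is_bidi_control_sketch`, `display_strip_sketch`, and `cap_sketch` are hypothetical, and the exact set of characters jsonguard strips may differ.

```rust
/// Bidi marks, embeddings, overrides, and isolates (assumed list).
fn is_bidi_control_sketch(c: char) -> bool {
    matches!(
        c,
        '\u{061C}' | '\u{200E}' | '\u{200F}'
            | '\u{202A}'..='\u{202E}'
            | '\u{2066}'..='\u{2069}'
    )
}

/// Drops C0/C1 controls (via `char::is_control`) and bidi controls.
fn display_strip_sketch(input: &str) -> String {
    input
        .chars()
        .filter(|&c| !c.is_control() && !is_bidi_control_sketch(c))
        .collect()
}

/// Keeps at most `max_chars` characters, cutting on a char boundary
/// and appending an ellipsis only when something was removed.
fn cap_sketch(input: &str, max_chars: usize) -> String {
    match input.char_indices().nth(max_chars) {
        Some((idx, _)) => format!("{}…", &input[..idx]),
        None => input.to_string(),
    }
}
```

Cutting at the byte index returned by `char_indices` guarantees the slice never splits a multi-byte character. It does not account for grapheme clusters, which would need a segmentation library.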

Raw bytes

use jsonguard::bytes_to_utf8_lossy_safe;

let decoded = bytes_to_utf8_lossy_safe(b"\xFF\xFE valid utf-8 \xC0\x80 not");
assert!(decoded.lossy);  // replacement chars inserted
println!("{}", decoded); // safe for display
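
One way to derive a lossy flag with the standard library is to check which `Cow` variant `String::from_utf8_lossy` returns: it borrows when the input was already valid UTF-8 and allocates only when it had to insert U+FFFD. The sketch below assumes that approach. `decode_lossy_sketch` is a hypothetical name, and the crate's own decoder may work differently.

```rust
use std::borrow::Cow;

/// Hypothetical helper (illustration only): returns the decoded text and
/// whether any byte sequence had to be replaced with U+FFFD.
fn decode_lossy_sketch(bytes: &[u8]) -> (String, bool) {
    match String::from_utf8_lossy(bytes) {
        // Input was valid UTF-8; nothing was replaced.
        Cow::Borrowed(s) => (s.to_string(), false),
        // At least one invalid sequence was replaced.
        Cow::Owned(s) => (s, true),
    }
}
```

Applied to the Big5 bytes `\xB3\x5C`, this yields U+FFFD followed by a literal backslash, which a later JSON or CSV escaping step then treats as an ordinary character.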

The `Guarded` type

Every sanitizer returns Guarded:

pub struct Guarded {
    pub value: String,
    pub lossy: bool,   // true when input had undecodable bytes
}

Guarded implements Display (emits value) so it drops straight into format strings. Check lossy when you care about data fidelity — e.g., log a warning or reject the record.
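
As an illustration of that pattern, the sketch below mirrors the struct with a stand-in type and rejects lossy records before formatting. `GuardedSketch` and `render_row` are hypothetical names introduced here, not part of jsonguard.

```rust
use std::fmt;

/// Stand-in with the same two public fields as `Guarded` (assumption).
pub struct GuardedSketch {
    pub value: String,
    pub lossy: bool,
}

impl fmt::Display for GuardedSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display emits only the sanitized value.
        f.write_str(&self.value)
    }
}

/// Rejects records whose name lost data during decoding; otherwise
/// formats the value directly into a row.
fn render_row(id: u32, name: &GuardedSketch) -> Result<String, String> {
    if name.lossy {
        return Err(format!("row {}: name had undecodable bytes", id));
    }
    Ok(format!("{},{}", id, name))
}
```

Logging a warning instead of returning `Err` is the gentler alternative when partial data is acceptable.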

Input inspection

inspect() passively scans input and returns a Findings report without modifying anything.

use jsonguard::{csv_field, inspect};

// Reject at an API boundary
let f = inspect(user_input);
if !f.is_csv_safe() {
    return Err("input contains characters unsafe for CSV");
}

// Audit log with per-violation detail
let f = inspect(raw_bytes);
for v in &f.violations {
    eprintln!("violation {:?} at byte {}", v.kind, v.byte_offset);
}

// Branch: sanitize or reject
let f = inspect(user_input);
if f.is_clean() {
    Ok(csv_field(user_input))       // no changes needed
} else if f.is_csv_safe() {
    Ok(csv_field(user_input))       // sanitizer handles it
} else {
    Err("rejected: unsafe characters for CSV")
}

Findings exposes format-specific safety queries (is_csv_safe, is_tsv_safe, is_jsonl_safe, is_display_safe) and generic violation queries (has_formula, has_bidi, has_controls, has_invalid_utf8). Each Violation carries its kind, the byte_offset, and the offending char.
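
To show what a passive scan with per-violation offsets might look like, here is a minimal std-only sketch. The types `KindSketch` and `ViolationSketch` and the function `inspect_sketch` are hypothetical, and the classification rules are assumptions. Because it takes `&str`, the sketch cannot observe invalid UTF-8.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KindSketch {
    Formula,
    Bidi,
    Control,
}

#[derive(Debug, PartialEq, Eq)]
struct ViolationSketch {
    kind: KindSketch,
    byte_offset: usize,
    ch: char,
}

/// Scans without modifying the input and records every finding.
fn inspect_sketch(input: &str) -> Vec<ViolationSketch> {
    let mut out = Vec::new();
    for (byte_offset, ch) in input.char_indices() {
        let kind = if byte_offset == 0 && matches!(ch, '=' | '+' | '-' | '@') {
            // A formula trigger only matters at the start of the field.
            Some(KindSketch::Formula)
        } else if matches!(
            ch,
            '\u{200E}' | '\u{200F}' | '\u{202A}'..='\u{202E}' | '\u{2066}'..='\u{2069}'
        ) {
            Some(KindSketch::Bidi)
        } else if ch.is_control() {
            Some(KindSketch::Control)
        } else {
            None
        };
        if let Some(kind) = kind {
            out.push(ViolationSketch { kind, byte_offset, ch });
        }
    }
    out
}
```

Offsets come from `char_indices`, so they are byte positions in the UTF-8 input and can be used directly to slice or highlight the offending region.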

Attack Coverage

Attack class	Handled
CSV formula injection (`= + - @` prefix)	✓ `'`-prefix escape
RFC 4180 quote handling	✓ double-quote escape
TSV column shift (embedded TAB)	✓ replace with space
JSON string escape (backslash, quote, control chars)	✓
CJKV Big5/GBK/EUC-KR byte collision with `\` and `"`	✓ decode-first
Bidi control characters (U+202E, U+200F, U+2069, etc.)	✓ strip
C0/C1 control characters	✓ strip
Null bytes	✓ strip
UTF-8 overlong sequences and surrogates	✓ lossy decode

Validated against real-world attack corpora — 50 Unicode Consortium bidi sequences (UCD 17.0.0), 7 OWASP formula injection payloads, Markus Kuhn's UTF-8 stress test (22 KB), and CJKV encoding hazards including Big5 \xB3\x5C and surrogate/overlong sequences. See docs/validation.md for sources and reproduction steps.

Acknowledgements

OWASP CSV Injection — formula injection documentation
RFC 4180 — CSV format specification
RFC 8259 — JSON format specification
Unicode Bidirectional Algorithm — bidi control character reference
