Deencode: Reverse engineer encoding errors

My first name is Clément. Throughout my life, I've seen more than my fair share of misprints of my name caused by careless encoding handling: the text is encoded (turned from an internal representation into a sequence of bytes) with one scheme, then decoded (turned from a sequence of bytes back into an internal representation) with a different one. This often leads to non-ASCII characters being mangled, replaced, or dropped outright.

For example:

The string "Clément"
└╴encoded as UTF-8 is 43 6C C3 A9 6D 65 6E 74
  └╴decoded as Latin-1 / Codepage 1252 is "ClÃ©ment"
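
The round trip above can be reproduced with nothing but the Rust standard library. This is a minimal sketch and does not use this crate's engines: the function names are hypothetical, and the decoder is strict Latin-1 (ISO-8859-1), which agrees with Codepage 1252 for the bytes C3 and A9 but not across the whole 0x80 to 0x9F range.

```rust
/// Hypothetical helper: encode `text` as UTF-8, then decode those bytes
/// as Latin-1 (ISO-8859-1). Every Latin-1 byte maps to the Unicode code
/// point with the same value, so `u8 as char` performs the decoding.
fn utf8_then_latin1(text: &str) -> String {
    text.as_bytes().iter().map(|&b| b as char).collect()
}

/// Hypothetical helper: undo the mangling by encoding `text` as Latin-1,
/// then decoding the bytes as UTF-8. Returns `None` if a character does
/// not fit in Latin-1 or if the resulting bytes are not valid UTF-8.
fn latin1_then_utf8(text: &str) -> Option<String> {
    let bytes: Option<Vec<u8>> = text
        .chars()
        .map(|c| {
            let code = c as u32;
            // Only code points up to U+00FF exist in Latin-1.
            if code <= 0xFF { Some(code as u8) } else { None }
        })
        .collect();
    String::from_utf8(bytes?).ok()
}
```

Calling `utf8_then_latin1("Clément")` yields the mangled form shown in the tree, and `latin1_then_utf8` applied to that result recovers the original. Guessing which pair of schemes to undo is the part that gets tedious by hand.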

Producing this sort of visualisation is why I created this crate. You take a number of engines, pass them to deencode::deencode() to get back a tree of possible sequences of encodings and decodings, and then work on that tree.

This crate is published on crates.io, with documentation at docs.rs.

Example usage

// List the engines to use.
let engines: Vec<&dyn Engine> = vec![&UTF8, &LATIN1, &MIXED816BE, &MIXED816LE, &UTF7];
// Explore the tree of possible encodings and decodings.
let mut tree = deencode("Clément", &engines, 1);
// Remove duplicate entries from the tree.
let _ = tree.deduplicate();
// Export the tree with box drawings.
println!("{}", tree);
// Export the tree as JSON.
println!("{}", serde_json::to_string(&tree).unwrap());
