utf8-ranges

DEPRECATED: This crate has been folded into the regex-syntax crate and is now deprecated.

This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte ranges. This is useful when constructing byte-based automata from Unicode. Stated differently, this lets one embed UTF-8 decoding directly in one's automaton.

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/utf8-ranges

Example

This shows how to convert a scalar value range (e.g., the basic multilingual plane) to a sequence of byte-based character classes.

extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
    for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
        println!("{:?}", range);
    }
}

The output:

[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
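
To see why the output splits where it does, it can help to encode the boundary scalar values directly. The following is a minimal sketch using only the standard library (it does not use this crate, and the `encode_hex` helper is a hypothetical name introduced for illustration). It prints the UTF-8 bytes of the first and last scalar value on each side of a length change, and on each side of the surrogate gap (U+D800 through U+DFFF).

```rust
// Hypothetical helper: format the UTF-8 encoding of `c` as hex bytes.
fn encode_hex(c: char) -> String {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Boundaries of the 1-, 2- and 3-byte encodings, plus the scalar
    // values immediately before and after the surrogate gap.
    let boundaries = [
        '\u{0}', '\u{7F}', '\u{80}', '\u{7FF}',
        '\u{800}', '\u{D7FF}', '\u{E000}', '\u{FFFF}',
    ];
    for &c in &boundaries {
        println!("U+{:04X} -> {}", c as u32, encode_hex(c));
    }
}
```

The first and last bytes of each printed encoding line up with the endpoints of the byte ranges shown above. For example, U+D7FF encodes as `ED 9F BF`, which is why the `[ED]` sequence stops at `9F` in its second byte.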

These ranges can then be used to build an automaton. Namely:

Any sequence of bytes matches either exactly one of the sequences of ranges or none of them.
Every matching sequence of bytes is guaranteed to be valid UTF-8. (Erroneous encodings of surrogate codepoints in UTF-8 cannot match any of the byte ranges above.)
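
As a rough illustration of these two properties, the sketch below hard-codes the six sequences printed above and checks a byte string against them. It is self-contained and does not use this crate. The names `ByteRange`, `BMP_SEQUENCES` and `matching_sequence` are assumptions made for the example, and a real automaton would compile the ranges into states rather than scan them linearly.

```rust
/// An inclusive byte range, e.g. (0x80, 0xBF) for [80-BF].
type ByteRange = (u8, u8);

/// The sequences printed above for U+0000 through U+FFFF, copied by hand.
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// Returns the index of the sequence that `bytes` matches in full, if any.
fn matching_sequence(bytes: &[u8]) -> Option<usize> {
    BMP_SEQUENCES.iter().position(|seq| {
        seq.len() == bytes.len()
            && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
    })
}

fn main() {
    // "é" is U+00E9, encoded as C3 A9: it matches the two-byte sequence.
    println!("{:?}", matching_sequence("é".as_bytes()));
    // An encoded surrogate (U+D800 as ED A0 80) is not valid UTF-8.
    println!("{:?}", matching_sequence(&[0xED, 0xA0, 0x80]));
    // An overlong encoding of U+0000 (C0 80) is rejected as well.
    println!("{:?}", matching_sequence(&[0xC0, 0x80]));
}
```

Because the sequences do not overlap, returning the first match is enough here: no input can match two of them.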
