Unicode normalization combinators #698
@adithyaov can you rebase this, port it to the new unicode-data package, and add the benchmarks from the unicode-transforms package?
0a6debe to fa8f961
- Let's add the normalization routines to the Unicode.Char module. We will use this module for operations on the Char type, including stream transformations from Char to Char. The Unicode.Stream module should go away: we will move its encode/decode routines to Unicode.Utf8 and Unicode.Latin1.
- Let's also add some basic benchmarks so that we can start working on performance optimizations of these modules. Just take an input file and normalize it. The existing Unicode.Stream benchmarks already take their input files via environment variables.
Once the normalization tests pass and we have benchmarks in place, we can commit this and do the performance optimizations later.
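As a rough illustration of the benchmark shape described above (read the input file named by an environment variable, then normalize it), here is a minimal sketch using only base. The name `benchFromEnv`, the variable name passed to it, and the `String -> String` stand-in for the normalizer are all assumptions made for illustration; this is not the existing Unicode.Stream benchmark code, and a real benchmark would run on streams rather than lists.

```haskell
import Control.Exception (evaluate)
import Data.Char (ord)
import Data.List (foldl')
import System.CPUTime (getCPUTime)
import System.Environment (lookupEnv)

-- Hypothetical helper: run a Char-level transformation over the file named
-- by an environment variable and return the CPU time taken in picoseconds,
-- or Nothing if the variable is unset. The variable name and the
-- transformation are supplied by the caller.
benchFromEnv :: String -> (String -> String) -> IO (Maybe Integer)
benchFromEnv var normalize = do
    mpath <- lookupEnv var
    case mpath of
        Nothing -> return Nothing
        Just path -> do
            -- readFile decodes using the locale encoding.
            input <- readFile path
            -- Force the whole input first so lazy file reading is not timed.
            _ <- evaluate (length input)
            start <- getCPUTime
            -- Summing the code points forces every output Char.
            _ <- evaluate (foldl' (\acc c -> acc + ord c) 0 (normalize input))
            end <- getCPUTime
            return (Just (end - start))
```

A caller would pass whichever normalization function is under test in place of the stand-in, for example `benchFromEnv "INPUT_FILE" id` to measure the baseline cost of traversing the input (the variable name here is hypothetical).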
-- decomposed.
{-# INLINE_EARLY partialComposeD #-}
partialComposeD :: Monad m => Stream m Char -> Stream m Char
partialComposeD (Stream step state) = Stream step' (ComposeNone state)
Call this composeD.
This only composes partially: it does not compose a semi-decomposed stream, so it shouldn't be used directly.
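To make the distinction concrete, here is a minimal sketch of a composer that is only correct on fully decomposed input. It is not the code in this PR: it works on plain lists instead of Stream, it handles only Hangul jamo (using the arithmetic constants from the Unicode Standard), and the names `composeJamo` and `ComposeState` are hypothetical, with the state type only loosely analogous to the ComposeNone state above.

```haskell
import Data.Char (chr, ord)

-- Hangul composition constants from the Unicode Standard.
lBase, vBase, tBase, sBase, lCount, vCount, tCount :: Int
lBase = 0x1100
vBase = 0x1161
tBase = 0x11A7
sBase = 0xAC00
lCount = 19
vCount = 21
tCount = 28

-- Hypothetical composer state (an assumption, not the PR's type).
data ComposeState
    = NoPending      -- nothing buffered
    | PendingL Int   -- saw a leading consonant; holds its index
    | PendingLV Int  -- saw L then V; holds the LV syllable code point

-- Compose fully decomposed Hangul jamo; every other Char passes through.
composeJamo :: String -> String
composeJamo = go NoPending
  where
    go st [] = flush st
    go st (c:cs) = case st of
        NoPending
            | isL c -> go (PendingL (ord c - lBase)) cs
            | otherwise -> c : go NoPending cs
        PendingL li
            | isV c ->
                let lv = sBase + (li * vCount + (ord c - vBase)) * tCount
                in go (PendingLV lv) cs
            -- Not a vowel: emit the buffered L and reprocess c.
            | otherwise -> chr (lBase + li) : go NoPending (c:cs)
        PendingLV lv
            | isT c -> chr (lv + (ord c - tBase)) : go NoPending cs
            -- Not a trailing consonant: emit LV and reprocess c.
            | otherwise -> chr lv : go NoPending (c:cs)

    -- Emit whatever is still buffered at end of input.
    flush NoPending = []
    flush (PendingL li) = [chr (lBase + li)]
    flush (PendingLV lv) = [chr lv]

    isL = inRangeFrom lBase lCount
    isV = inRangeFrom vBase vCount
    -- Trailing consonant indices run from 1 to tCount - 1.
    isT c = let i = ord c - tBase in i > 0 && i < tCount
    inRangeFrom base count c = let i = ord c - base in i >= 0 && i < count
```

Under these assumptions, the fully decomposed input `"\x1100\x1161\x11A8"` composes to `"\xAC01"`, but the semi-decomposed input `"\xAC00\x11A8"` (a precomposed LV syllable followed by a trailing jamo) passes through unchanged, whereas full NFC would produce `"\xAC01"`. That is the sense in which a routine like this is safe only after a full decomposition pass.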
7424d87 to 131eff1
Looks good at a high level. I did not check the details.
131eff1 to 5061dd1
5061dd1 to 2a638b0
Todo: