Could you please expose internal module? #4

Closed
winterland1989 opened this issue Apr 19, 2021 · 8 comments

Comments

@winterland1989

winterland1989 commented Apr 19, 2021

As the author of Z-Data, I'd like to add support for unicode-collation, but our Text type uses UTF-8 encoding, which differs from the encoding text uses. Could you please add an API for creating sort keys from [Int] directly? I assume this could be done by exposing some internal modules.

@jgm
Owner

jgm commented Apr 19, 2021

Do you already have a mechanism for normalizing your UTF-8 encoded texts into NFD form? That is a required step of the algorithm. If so, I can expose something like

normalizedCodePointsToSortKey :: Collation -> VariableWeighting -> [Int] -> SortKey
normalizedCodePointsToSortKey collation weighting =
    mkSortKey opts
  . handleVariable weighting
  . getCollationElements collation

But if you don't have the normalization piece yet, this won't do any good. unicode-transforms is limited to Text.

Note: I've requested that unicode-transforms expose an interface for incremental normalization:
composewell/unicode-transforms#60
If, on top of that, they also exposed an interface that factored out the stream step (Text -> Stream Char), then you'd have something you could use directly. You might request that.

@winterland1989
Author

Yes, it’s already there

@jgm
Owner

jgm commented Apr 22, 2021

Would the signature noted above be the right kind of thing for you to use?

jgm closed this as completed in fc03023 Apr 22, 2021
@jgm
Owner

jgm commented Apr 22, 2021

Let me know if this commit gives you what you need.
I'll wait for your comment before releasing anything.

@winterland1989
Author

Thanks, it definitely works.

@jgm
Owner

jgm commented Apr 26, 2021

My latest version gives you something a bit different, but hopefully still workable:

    -- | Compare two strings of any type that can be unpacked
    -- lazily into a list of 'Char's.
  , collateWithUnpacker   :: forall a. Eq a => (a -> [Char]) -> a -> a -> Ordering

@jgm
Owner

jgm commented Apr 26, 2021

Also note that in the latest version (0.1.3), collateWithUnpacker does not assume that the string is normalized. It will do the normalization for you, unless you explicitly turn it off with setNormalization.
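
For readers with a UTF-8 text type of their own, here is a minimal sketch of the kind of unpacker that could be passed to collateWithUnpacker. The Utf8Text newtype and the unpackUtf8 function are hypothetical names made up for illustration; they are not Z-Data's actual Text type or part of unicode-collation. The decoder uses only base, produces Chars lazily, and replaces malformed sequences with U+FFFD; it does not reject overlong encodings, so treat it as an illustration rather than a production decoder.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a UTF-8 encoded text type.
-- The Eq instance matters because collateWithUnpacker requires 'Eq a'.
newtype Utf8Text = Utf8Text [Word8]
  deriving (Eq, Show)

-- Lazily decode UTF-8 bytes into a list of Chars.
-- Malformed or truncated sequences are replaced with U+FFFD.
-- Overlong encodings are NOT rejected (simplification for this sketch).
unpackUtf8 :: Utf8Text -> [Char]
unpackUtf8 (Utf8Text bytes) = go bytes
  where
    go :: [Word8] -> [Char]
    go [] = []
    go (b : rest)
      | b < 0x80           = chr (fromIntegral b) : go rest
      | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral (b .&. 0x1F)) rest
      | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral (b .&. 0x0F)) rest
      | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral (b .&. 0x07)) rest
      | otherwise          = '\xFFFD' : go rest

    -- Consume n continuation bytes, accumulating 6 bits from each.
    multi :: Int -> Int -> [Word8] -> [Char]
    multi 0 acc rest = safeChr acc : go rest
    multi n acc (c : rest)
      | c .&. 0xC0 == 0x80 =
          multi (n - 1) ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) rest
    multi _ _ rest = '\xFFFD' : go rest

    -- Guard against surrogates and values beyond the Unicode range.
    safeChr :: Int -> Char
    safeChr n
      | n > 0x10FFFF || (n >= 0xD800 && n <= 0xDFFF) = '\xFFFD'
      | otherwise = chr n
```

Assuming collateWithUnpacker is a field of the collator record, as the excerpt above suggests, a call would look roughly like `collateWithUnpacker c unpackUtf8 x y` for some collator `c`. Because normalization is applied by default, the unpacker does not need to produce NFD itself.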

@winterland1989
Author

Yes, I see, that's a better API indeed.
