Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the behaviour of Utf8 to replicate that of strict Text #1386

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

adithyaov
Copy link
Member

  • Few undefined APIs.
  • Few APIs need to be improved, need to be made more performant.

This PR adds the general structure and some basic implementation.

For reviewer:

  • The only thing that needs to be reviewed are the XXX comments and the export list.
  • We might not need a few APIs. The docs are copied with minor modification, they don't really need to be reviewed.
  • I wanted to add individual small commits but at one point it became too late.

@adithyaov adithyaov force-pushed the utf8-text branch 7 times, most recently from 9c92cb8 to ee78234 Compare December 30, 2021 19:11
@adithyaov adithyaov force-pushed the utf8-text branch 2 times, most recently from f9418ae to 6f008eb Compare December 31, 2021 05:49
Copy link
Member

@harendra-kumar harendra-kumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed the Type module. Will review others later.

  1. Can you arrange the code in define before use order? For example the pack operation is used before it is defined. Similarly, toArray.
  2. Operations that apply to generic array modules can be moved to the array module.

src/Streamly/Internal/Data/Array/Foreign/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Data/Array/Foreign/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8.hs Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
src/Streamly/Internal/Unicode/Utf8/Type.hs Outdated Show resolved Hide resolved
@adithyaov adithyaov force-pushed the utf8-text branch 3 times, most recently from 14efcf7 to 7a23ccd Compare January 7, 2022 09:41
@adithyaov adithyaov force-pushed the utf8-text branch 2 times, most recently from ce3fa66 to 6a5e2d2 Compare January 18, 2022 17:23
* Few undefined APIs.
* Few APIs need to be improved, need to be made more performant.

This commit adds the general structure and some basic implementation.
Some documentation & a few examples are taken from the text package
Copy link
Member

@harendra-kumar harendra-kumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Remove all unsafePerformIO. If you retain any there should be a good reason for that.
  2. I think we should use a SerialT m a wherever we are using a [a]. That will provide a more efficient way of composing the APIs. I do not think we need a drop in replacement for text. We can add the missing APIs in a compat module if we feel that is needed.

{-# INLINE zip #-}
zip :: Utf8 -> Utf8 -> [(Char,Char)]
zip a b =
unsafePerformIO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use identity monad instead of unsafePerformIO.

case Prelude.filter (not . null) ts of
[] -> empty
[t] -> t
xs -> Prelude.foldl1 append xs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use something like this:

concat m = fromStreamD $ D.unfoldMany A.read (toStreamD m)

-- /Time complexity:/ O(n)
{-# INLINE concatMap #-}
concatMap :: (Char -> Utf8) -> Utf8 -> Utf8
concatMap f = concat . foldr ((:) . f) []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use stream based implementation instead of using list.

-- /Time complexity:/ O(n)
{-# INLINE unlines #-}
unlines :: [Utf8] -> Utf8
unlines = concat . List.map (`snoc` '\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using snoc may not be most efficient, we should implement it in a streaming way, see unlines in Unicode.Stream module.

-- XXX This would require rewriting Array.Foreign.write-ish functions. We can't
-- use the underlying Array.Foreign.Mut.write-ish
{-# INLINE fromStream #-}
fromStream :: SerialT IO Char -> Utf8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be fromStream :: SerialT m Char -> m Utf8. You cannot have the result pure when the stream is not pure.

-- /Time complexity:/ O(n)
{-# INLINE_NORMAL unpack #-}
unpack :: Utf8 -> String
unpack = unsafePerformIO . Stream.toList . toStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use identity monad, do not use unsafePerformIO anywhere.

@harendra-kumar harendra-kumar marked this pull request as draft July 5, 2023 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants