Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to use Text instead of String #14

Open
sergv opened this issue Mar 2, 2016 · 5 comments
Open

Option to use Text instead of String #14

sergv opened this issue Mar 2, 2016 · 5 comments

Comments

@sergv
Copy link

sergv commented Mar 2, 2016

I think it could be useful to move the library to, or at least give users an option of using, Text akin to https://github.com/ivan-m/wl-pprint-text. The reason is that String may get pretty inefficient memorywise for printing large Docs.

Maybe parameterizing Doc with underlying string type would help?

@rpglover64
Copy link
Collaborator

Has anyone (you, for example) run into performance issues using this library? Parameterizing on the string type would, at best complicate the library, and at worst be a breaking change for little gain.

@sergv
Copy link
Author

sergv commented Mar 2, 2016

Well, I haven't used this library yet. I'm considering to use it over wl-pprint-text, but wouldn't like to have to switch to something more performant, but lacking in features, once problems show up.

Than being said, I have witnessed performance problems with generating C code caused by usage of the pretty package. It's somewhat apples to oranges comparison since, although it uses Strings, it renders documents strictly, which forces resulting string to be allocated in full in memory. For the sake of argument, simple switch from pretty to wl-pprint-text gave 50% speedup on particularly large code generation test cases, which is no little gain.

Again, even though they're the standard as of now, Strings that use 20 to 40 bytes per character, depending on your platform, do not look very appealing in general. I would very much prefer Text or ByteStrings over linked lists of characters.

Finally, parameterizing the library can be done in fully backwards-compatible fashion: add new modules that provide Text-based or some other kind of interface and leave current module a specialization of those to Strings.

@nwf
Copy link
Collaborator

nwf commented Mar 2, 2016

Rather than parametrize the library, which seems pretty gross to me, how about we just eliminate most of its internal use of String and try to preserve the underlying efficient representations as long as possible, such as the below patch?

I think this approach is viable because the only thing we do with the Strings is print them out and take their length (once, caching the value for reuse in layout computations); if we can take the length of the efficient form without converting to String, then we'll delay conversion to printout.

diff --git a/src/Text/PrettyPrint/Free/Internal.hs b/src/Text/PrettyPrint/Free/Internal.hs
index 658b29d..24f2f9f 100644
--- a/src/Text/PrettyPrint/Free/Internal.hs
+++ b/src/Text/PrettyPrint/Free/Internal.hs
@@ -137,8 +137,11 @@ import Data.Functor.Bind
 import Data.Functor.Plus
 import Data.Int
 import Data.Word
-import qualified Data.ByteString.UTF8 as B
-import qualified Data.ByteString.Lazy.UTF8 as BL
+import qualified Data.ByteString as B
+import qualified Data.ByteString.UTF8 as B8
+import qualified Data.ByteString.Lazy as BL
+import qualified Data.ByteString.Lazy.UTF8 as BL8
+import qualified Data.List as L
 import qualified Data.Text as T
 import qualified Data.Text.Encoding as T
 import qualified Data.Text.Lazy as TL
@@ -582,17 +585,34 @@ instance Pretty a => Pretty [a] where
 instance Pretty (Doc a' e') where
   pretty = docLeafyRec (\_ -> Empty) (\_ -> id)

+-- Doc-ify a "character sequence"-like thing, while trying to keep the
+-- underlying representation whenever possible by exploiting laziness.
+prettycs :: (t -> Maybe (Char, t))           -- ^ uncons
+         -> ((Char -> Bool) -> t -> (t, t))  -- ^ span
+         -> (t -> Bool)                      -- ^ null test
+         -> (t -> Int)                       -- ^ length
+         -> (t -> String)                    -- ^ to string
+         -> t -> Doc a a1
+prettycs unc spn nulp len tos = go
+   where
+    go x = maybe empty ne $ unc x
+     where
+      ne ('\n', t) = line <> go t
+      ne (_, _) = let (h,t) = spn (/='\n') x in chunk h <> go t
+      chunk c = if nulp c then Empty else Text (len c) (tos c)
+{-# INLINE prettycs #-}
+
 instance Pretty B.ByteString where
-  pretty = pretty . B.toString
+  pretty = prettycs B8.uncons B8.span B.null B8.length B8.toString

 instance Pretty BL.ByteString where
-  pretty = pretty . BL.toString
+  pretty = prettycs BL8.uncons BL8.span BL.null BL8.length BL8.toString

 instance Pretty T.Text where
-  pretty = pretty . T.encodeUtf8
+  pretty = prettycs T.uncons T.span T.null T.length (B8.toString . T.encodeUtf8)

 instance Pretty TL.Text where
-  pretty = pretty . TL.encodeUtf8
+  pretty = prettycs TL.uncons TL.span TL.null (fromIntegral . TL.length) (BL8.toString . TL.encodeUtf8)

 instance Pretty () where
   pretty () = text "()"
@@ -602,10 +622,7 @@ instance Pretty Bool where

 instance Pretty Char where
   pretty = char
-  prettyList "" = empty
-  prettyList ('\n':s) = line <> prettyList s
-  prettyList s = case span (/='\n') s of
-             (xs,ys) -> text xs <> prettyList ys
+  prettyList = prettycs L.uncons L.span L.null L.length id

 instance Pretty a => Pretty (Seq a) where
   pretty = prettyList . toList

@nwf
Copy link
Collaborator

nwf commented Mar 2, 2016

@sergv if you have particular benchmarks you care about, I'd be curious to see how they behaved before and after the above patch. :)

@rpglover64
Copy link
Collaborator

@nwf sounds reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants