Option to use Text instead of String #14

sergv · 2016-03-02T13:16:51Z

I think it could be useful to move the library to, or at least give users an option of using, Text akin to https://github.com/ivan-m/wl-pprint-text. The reason is that String may get pretty inefficient memorywise for printing large Docs.

Maybe parameterizing Doc with underlying string type would help?

rpglover64 · 2016-03-02T14:59:29Z

Has anyone (you, for example) run into performance issues using this library? Parameterizing on the string type would, at best complicate the library, and at worst be a breaking change for little gain.

sergv · 2016-03-02T19:41:51Z

Well, I haven't used this library yet. I'm considering to use it over wl-pprint-text, but wouldn't like to have to switch to something more performant, but lacking in features, once problems show up.

Than being said, I have witnessed performance problems with generating C code caused by usage of the pretty package. It's somewhat apples to oranges comparison since, although it uses Strings, it renders documents strictly, which forces resulting string to be allocated in full in memory. For the sake of argument, simple switch from pretty to wl-pprint-text gave 50% speedup on particularly large code generation test cases, which is no little gain.

Again, even though they're the standard as of now, Strings that use 20 to 40 bytes per character, depending on your platform, do not look very appealing in general. I would very much prefer Text or ByteStrings over linked lists of characters.

Finally, parameterizing the library can be done in fully backwards-compatible fashion: add new modules that provide Text-based or some other kind of interface and leave current module a specialization of those to Strings.

nwf · 2016-03-02T20:55:47Z

Rather than parametrize the library, which seems pretty gross to me, how about we just eliminate most of its internal use of String and try to preserve the underlying efficient representations as long as possible, such as the below patch?

I think this approach is viable because the only thing we do with the Strings is print them out and take their length (once, caching the value for reuse in layout computations); if we can take the length of the efficient form without converting to String, then we'll delay conversion to printout.

diff --git a/src/Text/PrettyPrint/Free/Internal.hs b/src/Text/PrettyPrint/Free/Internal.hs
index 658b29d..24f2f9f 100644
--- a/src/Text/PrettyPrint/Free/Internal.hs
+++ b/src/Text/PrettyPrint/Free/Internal.hs
@@ -137,8 +137,11 @@ import Data.Functor.Bind
 import Data.Functor.Plus
 import Data.Int
 import Data.Word
-import qualified Data.ByteString.UTF8 as B
-import qualified Data.ByteString.Lazy.UTF8 as BL
+import qualified Data.ByteString as B
+import qualified Data.ByteString.UTF8 as B8
+import qualified Data.ByteString.Lazy as BL
+import qualified Data.ByteString.Lazy.UTF8 as BL8
+import qualified Data.List as L
 import qualified Data.Text as T
 import qualified Data.Text.Encoding as T
 import qualified Data.Text.Lazy as TL
@@ -582,17 +585,34 @@ instance Pretty a => Pretty [a] where
 instance Pretty (Doc a' e') where
   pretty = docLeafyRec (\_ -> Empty) (\_ -> id)

+-- Doc-ify a "character sequence"-like thing, while trying to keep the
+-- underlying representation whenever possible by exploiting laziness.
+prettycs :: (t -> Maybe (Char, t))           -- ^ uncons
+         -> ((Char -> Bool) -> t -> (t, t))  -- ^ span
+         -> (t -> Bool)                      -- ^ null test
+         -> (t -> Int)                       -- ^ length
+         -> (t -> String)                    -- ^ to string
+         -> t -> Doc a a1
+prettycs unc spn nulp len tos = go
+   where
+    go x = maybe empty ne $ unc x
+     where
+      ne ('\n', t) = line <> go t
+      ne (_, _) = let (h,t) = spn (/='\n') x in chunk h <> go t
+      chunk c = if nulp c then Empty else Text (len c) (tos c)
+{-# INLINE prettycs #-}
+
 instance Pretty B.ByteString where
-  pretty = pretty . B.toString
+  pretty = prettycs B8.uncons B8.span B.null B8.length B8.toString

 instance Pretty BL.ByteString where
-  pretty = pretty . BL.toString
+  pretty = prettycs BL8.uncons BL8.span BL.null BL8.length BL8.toString

 instance Pretty T.Text where
-  pretty = pretty . T.encodeUtf8
+  pretty = prettycs T.uncons T.span T.null T.length (B8.toString . T.encodeUtf8)

 instance Pretty TL.Text where
-  pretty = pretty . TL.encodeUtf8
+  pretty = prettycs TL.uncons TL.span TL.null (fromIntegral . TL.length) (BL8.toString . TL.encodeUtf8)

 instance Pretty () where
   pretty () = text "()"
@@ -602,10 +622,7 @@ instance Pretty Bool where

 instance Pretty Char where
   pretty = char
-  prettyList "" = empty
-  prettyList ('\n':s) = line <> prettyList s
-  prettyList s = case span (/='\n') s of
-             (xs,ys) -> text xs <> prettyList ys
+  prettyList = prettycs L.uncons L.span L.null L.length id

 instance Pretty a => Pretty (Seq a) where
   pretty = prettyList . toList

nwf · 2016-03-02T20:58:19Z

@sergv if you have particular benchmarks you care about, I'd be curious to see how they behaved before and after the above patch. :)

rpglover64 · 2016-03-02T23:58:20Z

@nwf sounds reasonable.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option to use Text instead of String #14

Option to use Text instead of String #14

sergv commented Mar 2, 2016

rpglover64 commented Mar 2, 2016

sergv commented Mar 2, 2016

nwf commented Mar 2, 2016

nwf commented Mar 2, 2016

rpglover64 commented Mar 2, 2016

Option to use Text instead of String #14

Option to use Text instead of String #14

Comments

sergv commented Mar 2, 2016

rpglover64 commented Mar 2, 2016

sergv commented Mar 2, 2016

nwf commented Mar 2, 2016

nwf commented Mar 2, 2016

rpglover64 commented Mar 2, 2016