analyze : evaluate streaming for RFrame #35

Closed
ocramz opened this issue Jan 13, 2019 · 9 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), R&D: library (Research and (re-)design a library component)

Comments

@ocramz
Member

ocramz commented Jan 13, 2019

The RFrame type currently stores the frame entries as a Vector of Vectors (each inner Vector being a data row). It would be nice to compare the performance of this representation with that of a streaming library (e.g. Stream (Of (Vector v)) m ()).
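For concreteness, a minimal sketch of the two candidate representations (the type and field names here are illustrative only, not the library's actual records):

import           Data.Vector (Vector)
import           Streaming (Stream, Of)

-- Current approach: all rows are materialised in memory at once.
data RFrameVec k v = RFrameVec
  { rfKeys :: Vector k          -- column keys
  , rfRows :: Vector (Vector v) -- one inner Vector per data row
  }

-- Streaming alternative: rows are produced on demand in some monad m.
data RFrameStream k v m = RFrameStream
  { rfsKeys :: Vector k
  , rfsRows :: Stream (Of (Vector v)) m ()
  }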

ocramz added the enhancement, help wanted, and R&D: library labels on Jan 13, 2019
@Magalame
Contributor

Magalame commented Apr 23, 2019

Streamly seems to have good performance compared to the other streaming libraries.

@adamConnerSax

I like Streamly! And it's easier to work with (IMO) than Streaming. I've used both and compared them (a little) for my map-reduce-folds library. It often seems faster than Vector as well.
But Vectors have advantages for storage capacity, right? At least in Frames, though that uses Storable vectors for columns, which is different from what you are talking about here.
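(For context, a minimal illustration of the column-oriented Storable layout being referred to; these types are made up for the example and are not Frames' actual API:)

import qualified Data.Vector.Storable as VS

-- Column-oriented layout: one unboxed, Storable vector per column,
-- as opposed to one boxed Vector per row as in RFrame.
data ColFrame = ColFrame
  { colAge   :: VS.Vector Double
  , colScore :: VS.Vector Double
  }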

@Magalame
Contributor

I think some operations that would make good benchmarks are: ==, update, and takeRows. That could be a starting point.

@Magalame
Contributor

Magalame commented May 6, 2019

I wrote the following benchmark. So far two things are tested:

  • simply comparing two frames, where Streamly seems to have a marginal advantage
  • takeRows, where Streamly seems to show some weakness
{-# LANGUAGE GADTs #-}


module Main where

import           Streamly
import qualified Streamly.Prelude as S

import Control.Monad.Catch (MonadThrow (..))
import Data.Function ((&))
import Data.Traversable (sequence)
import Control.Monad

import Analyze.Common (Data (..), DuplicateKeyError (..), RowSizeMismatch (..))
import qualified Analyze as A
import qualified Analyze.Common as AC


import qualified Data.Vector         as V
import           Data.Vector         (Vector)

import qualified Data.Text           as T
import           Data.Text           (Text)

import           Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM
import qualified Data.HashSet        as HS

import System.Random
import qualified System.Random.MWC as M
import qualified Criterion.Main as C

import Control.DeepSeq (deepseq)


-- Streaming counterparts of RFrameUpdate/RFrame: row data is stored as a
-- SerialT stream of Vectors instead of a Vector of Vectors
data RFrameUpdateS k v m where

    RFrameUpdateS :: MonadThrow m => { 
    _rframeUpdateKeys :: !(Vector k),
    _rframeUpdateData :: !(SerialT m (Vector v))
    } -> RFrameUpdateS k v m 

data RFrameS k v m where
    
    RFrameS :: MonadThrow m => { 
    _rframeKeys :: !(Vector k),
    _rframeLookup :: !(HashMap k Int),
    _rframeData   :: !(SerialT m (Vector v))
    } -> RFrameS k v m


fromUpdate :: (Data k, MonadThrow m) => RFrameUpdateS k v m -> m (RFrameS k v m)
fromUpdate (RFrameUpdateS ks vs) = AC.checkForDupes ks >> pure (RFrameS ks (AC.makeLookup ks) vs)

update :: (Data k, MonadThrow m) => RFrameUpdateS k v m -> RFrameS k v m -> m (RFrameS k v m)
update (RFrameUpdateS uks uvs) (RFrameS fks _ fvs) = do
  fSize <- S.length fvs
  uSize <- S.length uvs
  if fSize /= uSize
    then throwM (RowSizeMismatch fSize uSize)
    else do
      AC.checkForDupes uks
      let kis = AC.mergeKeys fks uks
          ks' = (\(k, _, _) -> k) <$> kis
          look' = AC.makeLookup ks'
          vs' = S.zipWith (AC.runIndexedLookup kis) fvs uvs
      return (RFrameS ks' look' vs')


-- only concerned with generating data
n :: Int
n = 1000

testKeys :: IO (Vector Text)
testKeys = V.replicateM n $ liftM (T.pack . take 10 . randomRs ('a','z')) newStdGen

testData :: IO (Vector (Vector Double))
testData = do 
    gen <- M.create
    V.replicateM n $ M.uniformVector gen n

testDataS :: IO (SerialT IO (Vector Double))
testDataS = do 
    vec <- testData
    return $ S.fromFoldable vec

-- the actual benchmarks
cmprVec :: IO Bool
cmprVec = do
    keys <- testKeys
    dat <- testData

    let 
        update = A.RFrameUpdate keys dat
    
    frame1 <- A.fromUpdate update
    frame2 <- A.fromUpdate update

    return $ frame1 == frame2

cmprStream :: IO Bool
cmprStream = do
    keys <- testKeys
    dat <- testDataS

    let 
        update = RFrameUpdateS keys dat
    
    frame1 <- fromUpdate update
    frame2 <- fromUpdate update

    let 
        sameKeys = _rframeKeys frame1 == _rframeKeys frame2
        sameLookup = _rframeLookup frame1 == _rframeLookup frame2
        dat1 = _rframeData frame1 
        dat2 = _rframeData frame2
    
    sameDat <- (S.zipWith (==) dat1 dat2 & S.notElem False)

    return $ sameKeys && sameLookup && sameDat

-- Compare two streaming frames: keys and lookup tables are compared directly,
-- row data by zipping the two streams and checking that every pair of rows matches
cmpr f1 f2 = do
    let 
        sameKeys = _rframeKeys f1 == _rframeKeys f2
        sameLookup = _rframeLookup f1 == _rframeLookup f2
        dat1 = _rframeData f1 
        dat2 = _rframeData f2
    
    sameDat <- (S.zipWith (==) dat1 dat2 & S.notElem False)

    return $ sameKeys && sameLookup && sameDat 

takeRowsVec :: IO [Bool]
takeRowsVec = do
    keys <- testKeys
    dat <- testData

    let 
        update = A.RFrameUpdate keys dat

    frame <- A.fromUpdate update

    let 
        postTake = zipWith A.takeRows ((div n) <$> [5,4,3,2,1])  (repeat frame)

    return $ zipWith (==) (repeat frame) postTake  

-- | Takes first 'n' rows of an 'RFrame'.
takeRows n (RFrameS ks look srm) = RFrameS ks look (S.take n srm)

takeRowsS :: IO [Bool]
takeRowsS = do
    keys <- testKeys
    dat <- testDataS

    let 
        update = RFrameUpdateS keys dat

    frame <- fromUpdate update

    let 
        postTake = zipWith takeRows ((div n) <$> [5,4,3,2,1])  (repeat frame)

    sequence $ zipWith cmpr (repeat frame) postTake 

main :: IO ()
main = C.defaultMain [ C.bgroup "Tests" [ C.bench "Vec"   $ C.whnfIO cmprVec    
                                        , C.bench "Stream" $ C.whnfIO cmprStream
                                        , C.bench "takeRowsVec" $ C.whnfIO takeRowsVec
                                        , C.bench "takeRowsStream" $ C.whnfIO takeRowsS]]

I get the following (at most a 5% improvement, and in the worst case a 20% decrease in performance):

benchmarking Tests/Vec
time                 50.78 ms   (48.75 ms .. 52.79 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 51.72 ms   (50.33 ms .. 53.85 ms)
std dev              3.417 ms   (1.287 ms .. 5.131 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking Tests/Stream
time                 49.05 ms   (47.58 ms .. 50.81 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 49.30 ms   (48.12 ms .. 50.91 ms)
std dev              2.704 ms   (1.691 ms .. 4.099 ms)
variance introduced by outliers: 14% (moderately inflated)

benchmarking Tests/takeRowsVec
time                 41.39 ms   (39.96 ms .. 42.73 ms)
                     0.995 R²   (0.986 R² .. 0.999 R²)
mean                 44.10 ms   (42.83 ms .. 46.00 ms)
std dev              3.021 ms   (1.645 ms .. 4.291 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking Tests/takeRowsStream
time                 51.69 ms   (50.14 ms .. 53.24 ms)
                     0.997 R²   (0.993 R² .. 0.999 R²)
mean                 54.46 ms   (53.17 ms .. 56.48 ms)
std dev              3.000 ms   (2.079 ms .. 4.349 ms)
variance introduced by outliers: 14% (moderately inflated)

It is very likely that I'm doing something wrong, though; I'm just not sure where.
It also seems that Streamly makes the interface inherently more complicated. For example, when comparing two RFrames, (==) becomes useless and we can't just derive Eq our way out of it.

@ocramz
Member Author

ocramz commented May 7, 2019

Thank you @Magalame! Well, a 10% average speed improvement is already pretty cool, but I think the real selling point of streaming would be if the resulting program used "constant" memory (with respect to the stream length). Could you try adding a weigh benchmark as well?
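A minimal sketch of what such a weigh benchmark could look like, reusing cmprVec and cmprStream from the snippet above (weighMain and the entry names are just placeholders):

import Weigh (mainWith, io)

-- Reports bytes allocated and GC count per case; the Bool results are
-- forced via their NFData instance.
weighMain :: IO ()
weighMain = mainWith $ do
  io "Vec"    (const cmprVec)    ()
  io "Stream" (const cmprStream) ()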

@Magalame
Contributor

Magalame commented May 7, 2019

You're very welcome @ocramz!
Eventually I rethought my benchmarking process, and unsurprisingly it was flawed. I modified it so that the test data is created only once at runtime (using unsafePerformIO), and I removed the cost of creating that data from the benchmark. We actually end up with some nice surprises:
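(For reference, a minimal sketch of that change, reusing the testKeys/testData helpers from my earlier snippet; this is just an illustration of the idea, not the exact code:)

import System.IO.Unsafe (unsafePerformIO)

-- Build the test data once as top-level CAFs so criterion no longer measures
-- its creation inside every benchmark iteration.
{-# NOINLINE sharedKeys #-}
sharedKeys :: Vector Text
sharedKeys = unsafePerformIO testKeys

{-# NOINLINE sharedData #-}
sharedData :: Vector (Vector Double)
sharedData = unsafePerformIO testData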

benchmarking Tests/Vec -- compares two identical frames
time                 8.207 ms   (8.065 ms .. 8.324 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 8.297 ms   (8.170 ms .. 8.524 ms)
std dev              471.1 μs   (178.4 μs .. 761.0 μs)
variance introduced by outliers: 29% (moderately inflated)

benchmarking Tests/Stream -- same here, but with Streamly as backend
time                 5.637 ms   (5.598 ms .. 5.692 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 5.643 ms   (5.619 ms .. 5.672 ms)
std dev              84.95 μs   (63.52 μs .. 115.0 μs)

benchmarking Tests/takeRowsVec 1000 -- performance when taking 1000 rows out of a frame with 1000 rows
time                 7.628 ms   (7.567 ms .. 7.683 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 7.752 ms   (7.698 ms .. 7.840 ms)
std dev              196.6 μs   (117.4 μs .. 287.3 μs)

benchmarking Tests/takeRowsVec 500
time                 4.044 ms   (4.020 ms .. 4.068 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.066 ms   (4.047 ms .. 4.089 ms)
std dev              65.60 μs   (48.28 μs .. 96.86 μs)

benchmarking Tests/takeRowsVec 200
time                 1.981 ms   (1.851 ms .. 2.081 ms)
                     0.989 R²   (0.986 R² .. 0.998 R²)
mean                 1.886 ms   (1.861 ms .. 1.915 ms)
std dev              96.98 μs   (73.04 μs .. 131.7 μs)
variance introduced by outliers: 36% (moderately inflated)

benchmarking Tests/takeRowsVec 100
time                 1.337 ms   (1.328 ms .. 1.349 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.351 ms   (1.342 ms .. 1.362 ms)
std dev              34.67 μs   (27.39 μs .. 47.55 μs)
variance introduced by outliers: 15% (moderately inflated)

benchmarking Tests/takeRowsStream 1000 -- same as above but with Streamly
time                 5.912 ms   (5.885 ms .. 5.933 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.005 ms   (5.974 ms .. 6.056 ms)
std dev              121.0 μs   (89.25 μs .. 185.5 μs)

benchmarking Tests/takeRowsStream 500
time                 3.208 ms   (3.192 ms .. 3.221 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.213 ms   (3.201 ms .. 3.226 ms)
std dev              39.64 μs   (28.16 μs .. 55.11 μs)

benchmarking Tests/takeRowsStream 200
time                 1.525 ms   (1.510 ms .. 1.539 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 1.651 ms   (1.618 ms .. 1.694 ms)
std dev              128.2 μs   (104.1 μs .. 168.8 μs)
variance introduced by outliers: 58% (severely inflated)

benchmarking Tests/takeRowsStream 100
time                 996.8 μs   (990.1 μs .. 1.005 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.005 ms   (998.2 μs .. 1.013 ms)
std dev              24.47 μs   (20.43 μs .. 31.05 μs)
variance introduced by outliers: 13% (moderately inflated)

So Streamly seems to be on average 20% quicker. However, it also seems to consume more memory (the Allocated figures below are bytes allocated).
With a 500-row frame:

Case                Allocated  GCs
Vec                   900,920    0
Stream                971,784    0
takeRowsVec 500       501,968    0
takeRowsVec 200       504,712    0
takeRowsVec 100       506,240    0
takeRowsStream 500    605,368    0
takeRowsStream 200    546,632    0
takeRowsStream 100    519,648    0

With a 1000-row frame:

Case                Allocated  GCs
Vec                 1,873,304    1
Stream              2,004,552    1
takeRowsVec 500     1,040,648    1
takeRowsVec 200     1,046,944    1
takeRowsVec 100     1,046,832    1
takeRowsStream 500  1,151,240    1
takeRowsStream 200  1,083,864    1
takeRowsStream 100  1,062,608    1

With a 1500-row frame:

Case                Allocated  GCs
Vec                 2,864,264    2
Stream              3,046,216    2
takeRowsVec 500     1,582,736    1
takeRowsVec 200     1,587,816    1
takeRowsVec 100     1,589,776    1
takeRowsStream 500  1,684,936    1
takeRowsStream 200  1,631,704    1
takeRowsStream 100  1,608,144    1

@Magalame
Contributor

Magalame commented May 7, 2019

I took a look at the original repo and stumbled upon ejconlon/analyze#3, in case it's of any relevance here.

@ocramz
Member Author

ocramz commented May 8, 2019

Streamly seems to be on average 20% quicker. However it seems to consume more memory

This is where it gets tricky, and more eyes on the problem would probably be beneficial. @Magalame Could you send in your benchmark code as a PR? Thanks!

@ocramz
Member Author

ocramz commented May 9, 2019

Closed by #54. Thanks to @Magalame!

ocramz closed this as completed May 9, 2019