Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation performance #17

Open
etorreborre opened this issue Jun 1, 2018 · 20 comments
Open

Compilation performance #17

etorreborre opened this issue Jun 1, 2018 · 20 comments

Comments

@etorreborre
Copy link

I have been trying to compute the union of 2 sets of 9 elements each and ... well the compiler is still running :-). Have you or anyone else investigating ways to speed up those type level computations?

@dorchard
Copy link
Owner

dorchard commented Jun 1, 2018

Could you post the example? I haven't had any problems with this before. It shouldn't take that long!

@etorreborre
Copy link
Author

etorreborre commented Jun 3, 2018

I am using the following Cmp instance to compare types deriving Generic

type family TypeName a :: Symbol where
  TypeName Double = "Double"
  TypeName Int = "Int"
  TypeName String = "String"
  TypeName (M1 D ('MetaData name _ _ _) f ()) = name
  TypeName a = TypeName (Rep a ())

type family Cmp (a :: k) (b :: k) :: Ordering where
  Cmp a b = CmpSymbol (TypeName a) (TypeName b)

And here is some code which takes a very long time to compile:

import           GHC.Generics
import           Prelude
import           Data.Type.Set

newtype X0  = X0  Int deriving (Generic)
newtype X1  = X1  Int deriving (Generic)
newtype X2  = X2  Int deriving (Generic)
newtype X3  = X3  Int deriving (Generic)
newtype X4  = X4  Int deriving (Generic)
newtype X5  = X5  Int deriving (Generic)
newtype X6  = X6  Int deriving (Generic)
newtype X7  = X7  Int deriving (Generic)
newtype X8  = X8  Int deriving (Generic)
newtype X9  = X9  Int deriving (Generic)
newtype X10 = X10 Int deriving (Generic)
newtype X11 = X11 Int deriving (Generic)
newtype X12 = X12 Int deriving (Generic)
newtype X13 = X13 Int deriving (Generic)
newtype X14 = X14 Int deriving (Generic)
newtype X15 = X15 Int deriving (Generic)
newtype X16 = X16 Int deriving (Generic)
newtype X17 = X17 Int deriving (Generic)
newtype X18 = X18 Int deriving (Generic)
newtype X19 = X19 Int deriving (Generic)

type R0_to_9 = '[
    X0
  , X1
  , X2
  , X3
  , X4
  , X5
  , X6
  , X7
  , X8
  , X9
  ]

type R10_to_19 = '[
    X10
  , X11
  , X12
  , X13
  , X14
  , X15
  , X16
  , X17
  , X18
  , X19
  ]

type R0_to_19 = '[
    X0
  , X1
  , X2
  , X3
  , X4
  , X5
  , X6
  , X7
  , X8
  , X9
  , X10
  , X11
  , X12
  , X13
  , X14
  , X15
  , X16
  , X17
  , X18
  , X19
  ]

rX_0_9 :: Set R0_to_9
rX_0_9 =
  s (X0 1) `append`
  s (X1 1) `append`
  s (X2 1) `append`
  s (X3 1) `append`
  s (X4 1) `append`
  s (X5 1) `append`
  s (X6 1) `append`
  s (X7 1) `append`
  s (X8 1) `append`
  s (X9 1)

rX_10_19 :: Set R10_to_19
rX_10_19 =
  s (X10 1) `append`
  s (X11 1) `append`
  s (X12 1) `append`
  s (X13 1) `append`
  s (X14 1) `append`
  s (X15 1) `append`
  s (X16 1) `append`
  s (X17 1) `append`
  s (X18 1) `append`
  s (X19 1)

s :: a -> Set '[a]
s a = Ext a Empty

rx_0_18 :: Set R0_to_19
rx_0_18 = rX_0_9 `union` rX_10_19

Maybe you have a better idea for the Cmp instance which would make things faster to compile but I suspect that this is only the typelevel computations which are creating too big types. For the record I have more than an intellectual interest in the subject, I am actually planning to use this type-level union of types for a dependency-injection library :-).

@dorchard
Copy link
Owner

Ouch this IS slow. I will look into it when I can. I tried this version but I used an instance of Cmp rather than redefining the whole family, i.e.

type instance Cmp a b =
  CmpSymbol (TypeName a) (TypeName b)

Is that ok?

@etorreborre
Copy link
Author

Is there a "canonical" way to compare types based on their Typeable instance? This is what I would like to have eventually.

@etorreborre
Copy link
Author

And thanks for having a look. My impression is that short of having compiler support for type-level sets we will always be quite slow but I hope I'm wrong (I have to put up to 100 elements in such a structure).

@rahulmutt
Copy link

rahulmutt commented Apr 3, 2019

As a quick way to see how the type family is being evaluated, you can do:

ghc-options: -ddump-tc-trace -ddump-to-file -dsupress-all

Search for *.ddump-tc-trace inside your .stack-work or dist folders.

EDIT: The file gets very big very fast. You may want to terminate the build and just inspect it. I suggest you use a terminal editor like vim/nano since they can handle crazy big files well.

@etorreborre
Copy link
Author

Thanks for the tip, what kind of insights should I be after when reading such a file? Do you think I'd be able to spot some alternative way of doing typelevel sets? My impression is that this is all slow because we are doing a typelevel quicksort after all!

@rahulmutt
Copy link

rahulmutt commented Apr 4, 2019

Main thing is to grep for a particular type function to see what arguments it's being applied to.

It turns out the implementation of Filter in this library takes exponential time and not linear because If isn't lazy since it's a type-level function and type functions are strict. Hence it's evaluating both branches of the If before selecting on the condition.

I did fix this in a local branch (adding a lot of boilerplate) and it ran much faster but I ended up getting a type error with your example program for some reason. Need to take a closer look.

When you say you need type-level sets, do you just need sets at the type level only or you need the term level to reflect the type level (like this library does). I think if you clearly specify the requirements for your particular use-case, it'll be easier to help you.

@etorreborre
Copy link
Author

Thanks a lot @rahulmutt for having a look at this, you must be otherwise terribly busy!

When you say you need type-level sets, do you just need sets at the type level only or you need the term level to reflect the type level (like this library does). I think if you clearly specify the requirements for your particular use-case, it'll be easier to help you.

I just need sets at the type level in my registry library to track the inputs and outputs of functions stored in the Registry. When I use registry for assembling Hedgehog generators I have a case where I have approximately 250 types to track but only 140 once deduplicated. And checking that I can get an element out of the registry takes around 15 to 20 seconds. If I could reduce that time I would do a compile-check in my code instead of discovering that a generator is missing at runtime.

@rahulmutt
Copy link

@etorreborre The following code compiles in about 2 seconds on my machine:

https://gist.github.com/rahulmutt/98a12cb3eec9fe800081a627a8314526

It uses insertion sort which should be OK for small lists - you probably won't be dealing with type-level lists large than, say 1000. Let me know if you need it to run even faster.

@etorreborre
Copy link
Author

Thanks @rahulmutt I wanted to try your solution at my real life solution but I'm bumping into another issue which I also did not know how to solve. This is a closed type family

type family TypeName a :: Symbol where
  TypeName Int = "Int"
  TypeName Text = "Text"
  TypeName (M1 D ('MetaData name _ _ _) f ()) = name
  TypeName a = TypeName (Rep a ())

but I don't know in advance all the types I am going to stick in my registry. Also I'd rather avoid imposing that people derive Generic or anything else on the components they put in the registry. I remember searching for a solution for this but it didn't seem that GHC gives us any handle here. Which is quite weird because it should have a handle of the symbol name corresponding to a type, right?

@etorreborre
Copy link
Author

It really doesn't seem to be possible, otherwise we wouldn't have https://hackage.haskell.org/package/compare-type-0.1.1/docs/Type-Compare.html :-(.

@rahulmutt
Copy link

rahulmutt commented Apr 5, 2019

TypeName can be open - the algorithm does not depend on its closed-ness. The user will have to specify an instance for TypeName for every input/output type they put into a registry OR make it derive Generic.

As of now, it's not possible to get a Symbol from an arbitrary type without having that type having deriving Generic in the first place. The only way you can get a canonical ordering of all types is if GHC supported reifying the TypeRep fingerprint to the type-level as a Nat or if it supported a way to get type-level metadata about a type without deriving Generic using a built-in type family whose instances are supplied by the compiler itself using its deep knowledge of types.

@etorreborre
Copy link
Author

The user will have to specify an instance for TypeName for every input/output type they put into a registry OR make it derive Generic

Unfortunately that wouldn't be very practical because some types come from libraries and my intention is to reduce boilerplate as much as possible. Maybe this means a trade-off between ease-of-use and compilation times in some cases.

if it supported a way to get type-level metadata about a type without deriving Generic using a built-in type family whose instances are supplied by the compiler itself using its deep knowledge of types

I doubt that the GHC team is willing to support such an obscure use case. Maybe if I can find other people needing sorting of arbitrary type lists we could make a collective case for it?

@etorreborre
Copy link
Author

I just realized that I would have other issues to solve even if I could solve that one: error messages! I would to a nicer representation of the type level sets otherwise errors would look like:

No instance for (IsSubset
                         (Nub
                            (SortHelper
                               (Nub
                                  (SortHelper
                                     ('[Int]
                                      :++ Nub
                                            (SortHelper2
                                               (NotEqual
                                                  (GHC.TypeLits.CmpSymbol
                                                     (TypeName (Rep Text1 ())) "Int")
                                                  'GT)
                                               Text1
                                               '[]
                                               Int
                                               '[Text]))
                                     '[]))
                               '[]))
                         (Nub
                            (SortHelper
                               ('[Int]
                                :++ Nub
                                      (SortHelper
                                         ('[Text]
                                          :++ Nub
                                                (SortHelper2
                                                   (NotEqual
                                                      (GHC.TypeLits.CmpSymbol
                                                         (TypeName (Rep Text2 ()))
                                                         (TypeName (Rep Text1 ())))
                                                      'GT)
                                                   Text2
                                                   '[]
                                                   Text1
                                                   '[]))
                                         '[]))
                               '[])))

Not great :-)

@etorreborre
Copy link
Author

Actually @rahulmutt I tried something really simple which seems to work ok and here for a small test.

On my big project I observe that the cost of de-duplicating the type lists is amortized after the first to extract something from the registry, so that's a win! (plus I can get nicer lists of types).

This is actually so simple that I'm wondering what I doing wrong :-).

@rahulmutt
Copy link

That's a great, simple solution. Is it still OK that two lists contain the same elements but are not considered the same?

Following along the lines of your solution that avoids type-level comparisons, I've updated the Gist (see Main2.hs) which implements efficient tail-recursive type families that compute whether two lists of types are equal & union of two de-deduplicated lists only assuming we have the ability to check for type equality.

@etorreborre
Copy link
Author

Is it still OK that two lists contain the same elements but are not considered the same?

In my context yes.

I've updated the Gist (see Main2.hs)

Nice! I also did some improvements for my specific problem by using TemplateHaskell) because after all with Template Haskell I can do a bit of type-checking and then emit a simplified expression without duplicate types. And now my compilation times are even faster.

@jmorag
Copy link
Contributor

jmorag commented Jan 21, 2022

Is there a way to get linear performance out of Filter? The examples in https://blog.josephmorag.com/posts/databass3/ take a really really long time to compile as well. I haven't worked out the interaction with the FilterV class, but would changing the Filter type family to not use If recover laziness and run in O(n)?

type family Filter (f :: Flag) (p :: k) (xs :: [k]) :: [k] where
            Filter f p '[]       = '[]
            Filter f p (x ': xs) = Filter' f p x xs (Cmp x p)

type family Filter' (f :: Flag) (p :: k) (x :: k) (xs :: [k]) cmp where
            Filter' FMin p x xs LT = x ': Filter FMin p xs
            Filter' FMin p x xs eqOrGt = Filter FMin p xs
            Filter' FMax p x xs LT = Filter FMax p xs
            Filter' FMax p x xs eqOrGt = x ': Filter FMax p xs

@jmorag
Copy link
Contributor

jmorag commented Jan 21, 2022

Hm, apparently there's no way to reason about type family time complexity. I will investigate further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants