Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression #3435

Closed
nad opened this issue Dec 6, 2018 · 13 comments
Closed

Performance regression #3435

nad opened this issue Dec 6, 2018 · 13 comments
Assignees
Labels
instance Instance resolution performance Slow type checking, interaction, compilation or execution of Agda programs regression on master Unreleased regression in development version (Change to "regression in ..." should it be released!)
Milestone

Comments

@nad
Copy link
Contributor

nad commented Dec 6, 2018

I have some code that type-checks about ten times slower using Agda 2.6.0-4a5e1dc, compared to Agda 2.5.4.2:

git clone http://www.cse.chalmers.se/~nad/repos/equality/
(cd equality && git checkout c8b6831ab6a447780b21fc8c0e1fad7c181cbb98)
git clone http://www.cse.chalmers.se/~nad/repos/delay-monad/
(cd delay-monad && git checkout 04df7090e6b69a9e0d6733fb30e284f7a31785aa)
git clone http://www.cse.chalmers.se/~nad/repos/up-to/
(cd up-to && git checkout 7e52093f8eb11c37745e646805f6666a0cafb600)
echo $PWD/equality/equality.agda-lib > libraries
echo $PWD/delay-monad/delay-monad.agda-lib >> libraries
echo $PWD/up-to/up-to.agda-lib >> libraries
cd up-to
cat > src/Dependencies.agda <<EOF
module Dependencies where

import Bisimilarity
import Bisimilarity.CCS
import Bisimilarity.Equational-reasoning-instances
import Bisimilarity.Weak
import Bisimilarity.Weak.Equational-reasoning-instances
import Equality.Propositional
import Equational-reasoning
import Expansion
import Expansion.CCS
import Expansion.Equational-reasoning-instances
import Function-universe
import Labelled-transition-system.CCS
import Labelled-transition-system.Equational-reasoning-instances
import Prelude
EOF
agda --ignore-interfaces --no-default-libraries --library-file=../libraries src/Dependencies.agda +RTS -s
agda --no-default-libraries --library-file=../libraries src/Bisimilarity/Weak/CCS.agda +RTS -s

The first call to agda above takes about three times longer (with my current setup), and the second call about ten times longer. However, memory usage ("total memory in use") is lower.

@nad nad added performance Slow type checking, interaction, compilation or execution of Agda programs regression on master Unreleased regression in development version (Change to "regression in ..." should it be released!) labels Dec 6, 2018
@nad nad added this to the 2.6.0 milestone Dec 6, 2018
@UlfNorell
Copy link
Member

What does -v profile:7 say?

@UlfNorell
Copy link
Member

Do you have a bisect running?

@nad
Copy link
Contributor Author

nad commented Dec 7, 2018

What does -v profile:7 say?

For the dependencies, with 2.5.4.2:

Total                        290,161ms            
Miscellaneous                    260ms            
Typing                        11,131ms (214,561ms)
Typing.CheckRHS              111,622ms            
Typing.OccursCheck            51,179ms            
Typing.CheckLHS               18,593ms  (25,862ms)
Typing.CheckLHS.UnifyIndices   7,268ms            
Typing.TypeSig                11,013ms            
Typing.With                    3,753ms            
Termination                    7,972ms  (21,160ms)
Termination.Compare           11,612ms            
Termination.Graph              1,208ms            
Termination.RecCheck             367ms            
Serialization                 12,094ms  (20,034ms)
Serialization.BinaryEncode     4,831ms            
Serialization.BuildInterface   2,131ms            
Serialization.Compress           569ms            
Serialization.Sort               407ms            
Parsing                          842ms  (15,194ms)
Parsing.OperatorsExpr         12,694ms            
Parsing.OperatorsPattern       1,657ms            
Scoping                        4,644ms   (8,275ms)
Scoping.InverseScopeLookup     3,631ms            
Positivity                     4,359ms            
DeadCode                       2,631ms            
Highlighting                   1,528ms            
Coverage                       1,435ms            
Import                           559ms            
Injectivity                       92ms            
ProjectionLikeness                65ms            

With 2.6.0-4a5e1dc:

Total                        902,925ms            
Miscellaneous                    409ms            
Typing                        11,633ms (805,741ms)
Typing.CheckRHS              640,493ms            
Typing.OccursCheck           117,079ms            
Typing.CheckLHS               17,079ms  (23,183ms)
Typing.CheckLHS.UnifyIndices   6,104ms            
Typing.TypeSig                10,452ms            
Typing.With                    2,899ms            
Termination                   16,926ms  (31,690ms)
Termination.Compare           11,089ms            
Termination.Graph              3,024ms            
Termination.RecCheck             650ms            
Serialization                 12,766ms  (22,340ms)
Serialization.BinaryEncode     6,216ms            
Serialization.BuildInterface   2,302ms            
Serialization.Compress           609ms            
Serialization.Sort               446ms            
Scoping                        4,804ms  (16,262ms)
Scoping.InverseScopeLookup    11,458ms            
Parsing                          819ms  (14,452ms)
Parsing.OperatorsExpr         12,041ms            
Parsing.OperatorsPattern       1,591ms            
DeadCode                       3,906ms            
Positivity                     3,427ms            
Coverage                       2,732ms            
Highlighting                   1,574ms            
Import                           239ms            
Injectivity                       88ms            
ProjectionLikeness                57ms            

Typing.CheckRHS is about six times slower, Typing.OccursCheck is about twice as slow, termination checking is about 50% slower, and Scoping.InverseScopeLookup is about three times as slow.

For the final module, with 2.5.4.2:

Total                        28,960ms           
Miscellaneous                    72ms           
Typing                        3,048ms (20,472ms)
Typing.CheckRHS              13,682ms           
Typing.OccursCheck            3,122ms           
Typing.CheckLHS                 202ms    (398ms)
Typing.CheckLHS.UnifyIndices    195ms           
Typing.TypeSig                  219ms           
Deserialization               6,459ms           
Scoping                          90ms    (469ms)
Scoping.InverseScopeLookup      378ms           
Serialization                   275ms    (442ms)
Serialization.BinaryEncode       98ms           
Serialization.BuildInterface     36ms           
Serialization.Compress           18ms           
Serialization.Sort               13ms           
Parsing                          19ms    (368ms)
Parsing.OperatorsExpr           299ms           
Parsing.OperatorsPattern         49ms           
Termination                     162ms    (351ms)
Termination.Compare             102ms           
Termination.Graph                77ms           
DeadCode                        104ms           
Positivity                       68ms           
Import                           50ms           
Highlighting                     47ms           
Coverage                         40ms           
ModuleName                       23ms           

With 2.6.0-4a5e1dc:

Total                        282,133ms            
Miscellaneous                    135ms            
Typing                         6,318ms (269,017ms)
Typing.CheckRHS              239,324ms            
Typing.OccursCheck            22,641ms            
Typing.CheckLHS                  182ms     (523ms)
Typing.CheckLHS.UnifyIndices     340ms            
Typing.TypeSig                   208ms            
Deserialization                6,033ms            
Scoping                           78ms   (4,629ms)
Scoping.InverseScopeLookup     4,551ms            
Serialization                    964ms   (1,151ms)
Serialization.BinaryEncode       110ms            
Serialization.BuildInterface      43ms            
Serialization.Compress            19ms            
Serialization.Sort                12ms            
Termination                      226ms     (382ms)
Termination.Graph                 73ms            
Termination.Compare               71ms            
Termination.RecCheck              10ms            
Parsing                           24ms     (368ms)
Parsing.OperatorsExpr            296ms            
Parsing.OperatorsPattern          47ms            
DeadCode                         126ms            
Import                           110ms            
Positivity                        68ms            
Highlighting                      56ms            
Coverage                          53ms            

Typing.CheckRHS is about seventeen times slower, Typing.OccursCheck is about seven times slower, Typing is about twice as slow, Scoping.InverseScopeLookup is more than ten times as slow, and serialisation is about three times as slow.

@nad
Copy link
Contributor Author

nad commented Dec 7, 2018

Do you have a bisect running?

Not yet, but I plan to.

@UlfNorell
Copy link
Member

I've done some investigation and I believe the problem is @jespercockx's lifting of the restriction on unconstrained metas in instance goals. In the example there are a lot of instances in scope and I suspect they are now all attempted instead of waiting for the goal type to be solved. I'm working on a stand-alone test case.

@UlfNorell
Copy link
Member

Self-contained test case. Checks more or less instantly on 2.5.4.2 and (at least) quadratic in number of eq reasoning steps on master.

infix  -1 finally₂
infixr -2 step-∼

postulate
  Transitive : {A B : Set} (P : A  A  Set) (Q : A  B  Set)  Set

  step-∼ :  {A B} {P : A  A  Set} {Q : A  B  Set}
             ⦃ t : Transitive P Q ⦄ x {y z} 
             Q y z  P x y  Q x z

  Convertible : {A B : Set} (P Q : A  B  Set)  Set

  finally₂ :  {A B} {P Q : A  B  Set}
               ⦃ c : Convertible P Q ⦄ x y 
               P x y  Q x y

syntax step-∼  x Qyz Pxy = x ∼⟨ Pxy ⟩ Qyz
syntax finally₂  x y x∼y = x ∼⟨ x∼y ⟩■ y

infix 4 _≈_
postulate
  Proc : Set
  _≈_ : Proc  Proc  Set

instance
  postulate
    trans   : Transitive  _≈_ _≈_
    convert : Convertible _≈_ _≈_

module DummyInstances where

  postulate
    R₁ R₂ R₃ R₄ R₅ : Proc  Proc  Set

  postulate
    instance
      transitive₁₁ : Transitive R₁ R₁
      transitive₁₂ : Transitive R₁ R₂
      transitive₁₃ : Transitive R₁ R₃
      transitive₁₄ : Transitive R₁ R₄
      transitive₁₅ : Transitive R₁ R₅
      transitive₂₁ : Transitive R₂ R₁
      transitive₂₂ : Transitive R₂ R₂
      transitive₂₃ : Transitive R₂ R₃
      transitive₂₄ : Transitive R₂ R₄
      transitive₂₅ : Transitive R₂ R₅
      transitive₃₁ : Transitive R₃ R₁
      transitive₃₂ : Transitive R₃ R₂
      transitive₃₃ : Transitive R₃ R₃
      transitive₃₄ : Transitive R₃ R₄
      transitive₃₅ : Transitive R₃ R₅
      transitive₄₁ : Transitive R₄ R₁
      transitive₄₂ : Transitive R₄ R₂
      transitive₄₃ : Transitive R₄ R₃
      transitive₄₄ : Transitive R₄ R₄
      transitive₄₅ : Transitive R₄ R₅
      transitive₅₁ : Transitive R₅ R₁
      transitive₅₂ : Transitive R₅ R₂
      transitive₅₃ : Transitive R₅ R₃
      transitive₅₄ : Transitive R₅ R₄
      transitive₅₅ : Transitive R₅ R₅

postulate
  ⊕-comm :  {P Q}  P ≈ Q
  P Q    : Proc

⊕-cong : P ≈ P
⊕-cong =
  P ∼⟨ ⊕-comm ⟩
  Q ∼⟨ ⊕-comm ⟩
  P ∼⟨ ⊕-comm ⟩
  Q ∼⟨ ⊕-comm ⟩
  P ∼⟨ ⊕-comm ⟩
  Q ∼⟨ ⊕-comm ⟩
  P ∼⟨ ⊕-comm ⟩
  Q ∼⟨ ⊕-comm ⟩
  P ∼⟨ ⊕-comm ⟩
  Q ∼⟨ ⊕-comm ⟩■
  P

@nad
Copy link
Contributor Author

nad commented Dec 7, 2018

OK, then I won't run a bisection.

@nad
Copy link
Contributor Author

nad commented Dec 7, 2018

In case anyone else wants to do this the following code might be a suitable starting point (the code assumes that the setup described in the OP was performed in regression, a subdirectory of the top-level of an Agda source tree):

$ cat script.hs 
#!/usr/bin/env runhaskell

{-# LANGUAGE RecordWildCards #-}

import Control.Monad
import Data.List
import System.Directory
import System.Exit
import System.Timeout

import RunAgda

main :: IO ()
main = do
  setCurrentDirectory "regression/up-to"

  runAgda [ "--no-default-libraries"
          , "--library-file=../libraries"
          ] $ \(AgdaCommands { .. }) -> do

    echoUntilPrompt

    loadAndEcho "src/Dependencies.agda"

    let mainFile = "src/Bisimilarity/Weak/CCS.agda"

    let interface = mainFile ++ "i"
    ex <- doesFileExist interface
    when ex $ removeFile interface

    r <- timeout (70*10^6) $ loadAndEcho mainFile

    case r of
      Nothing -> do
        putStrLn "Timeout!"
        exitFailure
      Just ss ->
        if any ("(agda2-status-action \"Checked\")" `isInfixOf`) ss then
          return ()
         else
          exitWith (ExitFailure 127)
$ agda-bisect --cache --good v2.5.4.2 --bad 4a5e1dc --log log --compiler <ghc-7.10.3> --script ./script.hs

Note that this code uses some features of agda-bisect that I pushed just now.

@nad nad added the instance Instance resolution label Dec 7, 2018
@nad
Copy link
Contributor Author

nad commented Dec 7, 2018

The code also relies on test/interaction/RunAgda.hs being symlinked to the top-level directory. However, I noticed now that the bisection process chooses a commit from before the introduction of RunAgda.hs, so the script above does not work.

In general I think bisection would be easier if releases were part of the main "trunk" of the commit tree (so that the current HEAD was a descendant of the releases).

@UlfNorell
Copy link
Member

I think you might want to use the stand-alone test case rather than the one from the OP. But in this case I think a manual bisect would be faster. I'm quite confident that the unconstrained metas in instance search change is the culprit.

@nad
Copy link
Contributor Author

nad commented Dec 9, 2018

In general I think bisection would be easier if releases were part of the main "trunk" of the commit tree (so that the current HEAD was a descendant of the releases).

On closer thought the problem is perhaps rather that the "commit tree" is not a tree but a DAG. I imagine that things would be easier if long-lived branches were rebased before they were incorporated into the main branch.

I think you might want to use the stand-alone test case rather than the one from the OP.

I dumped my partial work above in order to perhaps save time for future investigations of performance regressions.

@UlfNorell
Copy link
Member

PR #3455 fixes the problem, but changed order of constraint solving means a few more sizes need to be given explicitly in the original test case.

@nad
Copy link
Contributor Author

nad commented Dec 17, 2018

I tried this PR. A single experiment gave a slowdown of about 15% for the dependencies. For the final module the slowdown was about 28%. (Note that the code linked above does not quite work using Agda 2.6.0-1b51287. I patched the code and used the same code for both tests.)

I noticed that I have compiled 2.5.4.3 using GHC 8.4.4, and Agda 2.6.0-1b51287 using GHC 8.6.3, so changes to GHC could affect the result, even though I suspect that changes to Agda are to blame.

Profiling output for the dependencies, with 2.5.4.2:

Total                        337,938ms            
Miscellaneous                    473ms            
Typing                        14,143ms (251,642ms)
Typing.CheckRHS              129,994ms            
Typing.OccursCheck            59,494ms            
Typing.CheckLHS               21,818ms  (30,284ms)
Typing.CheckLHS.UnifyIndices   8,466ms            
Typing.TypeSig                13,325ms            
Typing.With                    4,399ms            
Termination                    9,246ms  (25,082ms)
Termination.Compare           13,185ms            
Termination.Graph              1,338ms            
Termination.RecCheck           1,311ms            
Serialization                 12,280ms  (21,374ms)
Serialization.BinaryEncode     5,130ms            
Serialization.BuildInterface   2,818ms            
Serialization.Compress           651ms            
Serialization.Sort               492ms            
Parsing                          978ms  (17,302ms)
Parsing.OperatorsExpr         14,382ms            
Parsing.OperatorsPattern       1,940ms            
Scoping                        5,189ms   (8,970ms)
Scoping.InverseScopeLookup     3,781ms            
DeadCode                       4,546ms            
Positivity                     4,510ms            
Highlighting                   1,925ms            
Coverage                       1,703ms            
Import                           198ms            
Injectivity                      112ms            
ProjectionLikeness                73ms            
ModuleName                        21ms            

With 2.6.0-1b51287:

Total                        388,910ms            
Miscellaneous                  1,044ms            
Typing                        16,148ms (274,763ms)
Typing.CheckRHS              144,724ms            
Typing.OccursCheck            55,884ms            
Typing.CheckLHS               24,073ms  (39,201ms)
Typing.CheckLHS.UnifyIndices  15,128ms            
Typing.TypeSig                14,877ms            
Typing.With                    3,927ms            
Termination                   24,943ms  (42,248ms)
Termination.Compare           12,949ms            
Termination.Graph              3,594ms            
Termination.RecCheck             760ms            
Serialization                 18,259ms  (27,737ms)
Serialization.BinaryEncode     6,111ms            
Serialization.BuildInterface   2,086ms            
Serialization.Compress           739ms            
Serialization.Sort               539ms            
Parsing                        1,486ms  (18,159ms)
Parsing.OperatorsExpr         14,666ms            
Parsing.OperatorsPattern       2,006ms            
Scoping                        5,223ms   (9,828ms)
Scoping.InverseScopeLookup     4,605ms            
DeadCode                       5,147ms            
Positivity                     4,588ms            
Coverage                       2,935ms            
Highlighting                   1,983ms            
Import                           291ms            
Injectivity                      107ms            
ProjectionLikeness                73ms            

Profiling output for the final module, with 2.5.4.2:

Total                        36,008ms           
Miscellaneous                    90ms           
Typing                        3,857ms (26,130ms)
Typing.CheckRHS              17,705ms           
Typing.OccursCheck            3,817ms           
Typing.CheckLHS                 240ms    (468ms)
Typing.CheckLHS.UnifyIndices    228ms           
Typing.TypeSig                  280ms           
Deserialization               7,261ms           
Serialization                   394ms    (617ms)
Serialization.BinaryEncode      130ms           
Serialization.BuildInterface     52ms           
Serialization.Compress           23ms           
Serialization.Sort               16ms           
Scoping                         109ms    (564ms)
Scoping.InverseScopeLookup      455ms           
Parsing                          25ms    (473ms)
Parsing.OperatorsExpr           385ms           
Parsing.OperatorsPattern         61ms           
Termination                     194ms    (431ms)
Termination.Compare             128ms           
Termination.Graph               100ms           
DeadCode                        148ms           
Positivity                       96ms           
Highlighting                     60ms           
Import                           59ms           
Coverage                         54ms           
ModuleName                       26ms           

With 2.6.0-1b51287:

Total                        46,246ms           
Miscellaneous                   174ms           
Typing                        7,588ms (35,309ms)
Typing.CheckRHS              24,025ms           
Typing.OccursCheck            2,800ms           
Typing.CheckLHS                 338ms    (635ms)
Typing.CheckLHS.UnifyIndices    297ms           
Typing.TypeSig                  259ms           
Deserialization               7,709ms           
Scoping                          83ms    (983ms)
Scoping.InverseScopeLookup      899ms           
Serialization                   345ms    (573ms)
Serialization.BinaryEncode      137ms           
Serialization.BuildInterface     50ms           
Serialization.Compress           25ms           
Serialization.Sort               14ms           
Termination                     329ms    (523ms)
Termination.Compare              93ms           
Termination.Graph                88ms           
Termination.RecCheck             11ms           
Parsing                          52ms    (467ms)
Parsing.OperatorsExpr           354ms           
Parsing.OperatorsPattern         59ms           
DeadCode                        153ms           
Import                          125ms           
Positivity                       86ms           
Coverage                         72ms           
Highlighting                     66ms           

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
instance Instance resolution performance Slow type checking, interaction, compilation or execution of Agda programs regression on master Unreleased regression in development version (Change to "regression in ..." should it be released!)
Projects
None yet
Development

No branches or pull requests

2 participants