Code generation for large Jacobians in nonlinear system initialization scales badly #11302

casella · 2023-10-02T21:19:48Z

Description

I am experiencing really bad performance in the "simcode: create initialization part" phase on a power plant model of about 100,000 equations: this part takes a whopping 4500 s to generate the code and over 10,000 s to compile it.

I think I managed to capture the issue in a much simpler MWE, that we can use to identify the issue and solve it.

Steps to Reproduce

Consider the following MWE:

package TestLargeJacobian
  model M
    parameter Integer N = 4;
    Real x[N];
    Real u;
    parameter Real u_start(fixed = false);
  initial equation
    der(x) = zeros(N);
    x[N] = 0;
  equation
    u = u_start;
    der(x[1]) = x[1] - u;
    for i in 2:N - 1 loop
      der(x[i]) = -x[i] + x[i - 1]*(1 + 1e-6*sin(x[i]));
    end for;
    der(x[N]) = -x[N] - x[1] - x[N - 1]*(1 + 1e-6*sin(x[N]));
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M;
  
  model M1000
    extends M(N = 1000);
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M1000;
  
  model M2000
    extends M(N = 2000);
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M2000;
  
  model M4000
    extends M(N = 4000);
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M4000;
end TestLargeJacobian;

Model M replicates the crucial feature of the original model, namely the need to solve a large sparse nonlinear system of equations for initialization. In this case, the initialization problem has about N initial equations, with about 2N non-zero elements in the Jacobian.
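The "about 2N non-zero elements" estimate can be checked with a small sketch. This Python snippet is not part of the original report: it assumes that imposing der(x) = 0 turns each dynamic equation of model M into an algebraic residual, and it ignores any alias elimination or tearing the compiler applies afterwards. The function names are hypothetical.

```python
def initial_system_pattern(n):
    """Return a list of sets; entry k holds the unknowns appearing in residual k.

    Unknowns are named "x1".."xn" and "u". This mirrors model M by hand and is
    an assumption about the structure, not output from the compiler.
    """
    rows = [{"x1", "u"}]                       # 0 = x[1] - u
    for i in range(2, n):                      # 0 = -x[i] + x[i-1]*(1 + 1e-6*sin(x[i]))
        rows.append({f"x{i}", f"x{i - 1}"})
    rows.append({f"x{n}", "x1", f"x{n - 1}"})  # 0 = -x[N] - x[1] - x[N-1]*(1 + 1e-6*sin(x[N]))
    rows.append({f"x{n}"})                     # x[N] = 0
    return rows


def count_nonzeros(n):
    """Count structural nonzeros, i.e. (equation, unknown) incidences."""
    return sum(len(row) for row in initial_system_pattern(n))


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, len(initial_system_pattern(n)), count_nonzeros(n))
```

Under these assumptions the count is 2N + 2, so it grows linearly with N.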

On my Windows PC, simcode: create initialization part takes 0.79 s for N = 1000, 2.42 s for N = 2000, 12.2 s for N = 4000. Clearly, this does not scale well, given that the number of nonzero elements in the Jacobian is proportional to N.
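To quantify "does not scale well", one can estimate the apparent growth exponent between consecutive measurements. The sketch below uses only the three timings reported above; the helper name is hypothetical, and three data points give a rough indication at best.

```python
import math

# Timings in seconds for "simcode: create initialization part", as reported above.
timings = {1000: 0.79, 2000: 2.42, 4000: 12.2}


def growth_exponents(samples):
    """For consecutive sizes a < b, return p such that t(b)/t(a) = (b/a)**p."""
    sizes = sorted(samples)
    return [
        math.log(samples[b] / samples[a]) / math.log(b / a)
        for a, b in zip(sizes, sizes[1:])
    ]


if __name__ == "__main__":
    # Linear scaling would give exponents close to 1.
    print(growth_exponents(timings))
```

The exponents come out at roughly 1.6 and 2.3, so the growth is clearly superlinear and gets steeper as N increases.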

I checked the size of the generated code: all files scale linearly, except 06.inz. This is a bit weird: for N = 1000 there is one such file of 600 kB, for N = 2000 there are four files totalling 2.8 MB, but for N = 4000 they are actually a bit smaller, 2.4 MB. Not sure why this happens.

@mahge if you want to try profiling this test case, the model code is trivial.

@phannebohm any idea what could be the root cause?

Expected Behavior

Simcode time should scale linearly with N.


casella · 2023-10-02T21:20:11Z

Keeping @matteodepascali in the loop.

mahge · 2023-10-03T15:44:23Z

More than 30% of the total translation time is spent in the function BackendDAEUtil.markNonlinearIterationVariable. This function is used only by markNonlinearIterationVariablesStrongComponent. The relevant parts of the code are these:

OpenModelica/OMCompiler/Compiler/BackEnd/BackendDAEUtil.mo

Lines 10288 to 10319 in 865a1cc

    
protected function markNonlinearIterationVariablesStrongComponent
  input BackendDAE.StrongComponent comp;
  input output BackendDAE.Variables vars;
protected
  list<BackendDAE.Var> nonlinear_iteration_vars;
  UnorderedSet<DAE.ComponentRef> set = UnorderedSet.new(ComponentReference.hashComponentRef, ComponentReference.crefEqual);
algorithm
  nonlinear_iteration_vars := match comp
    local
      BackendDAE.Jacobian jac;
    case BackendDAE.TORNSYSTEM(strictTearingSet=BackendDAE.TEARINGSET(jac=jac), linear=false) then SymbolicJacobian.getNonLinearVariables(jac);
    case BackendDAE.EQUATIONSYSTEM(jac=jac, jacType=BackendDAE.JAC_GENERIC()) then SymbolicJacobian.getNonLinearVariables(jac);
    else {};
  end match;
  for var in nonlinear_iteration_vars loop
    UnorderedSet.add(var.varName, set);
  end for;
  (vars, _) := BackendVariable.traverseBackendDAEVarsWithUpdate(vars, markNonlinearIterationVariable, set);
end markNonlinearIterationVariablesStrongComponent;

protected function markNonlinearIterationVariable
  input output BackendDAE.Var var;
  input output UnorderedSet<DAE.ComponentRef> set;
algorithm
  if UnorderedSet.contains(var.varName, set) then
    var := BackendVariable.setVarInitNonlinear(var, true);
  end if;
end markNonlinearIterationVariable;

annotation(__OpenModelica_Interface="backend");
end BackendDAEUtil;

I am not sure I understood the whole thing, but basically the function markNonlinearIterationVariablesStrongComponent gets a list of nonlinear iteration variables from a given strong component. It then adds these variables to an UnorderedSet (I am guessing for quick lookup). Then the function markNonlinearIterationVariable (via traverseBackendDAEVarsWithUpdate) goes through each variable in the whole system and marks the ones that exist in the UnorderedSet.

The check for existence in the UnorderedSet is done with the function UnorderedSet.contains, which needs to hash the component reference in order to check whether it exists. This is where almost all of the computation time is actually spent, but hashing is of course at the core of how a set works, so there is not much to be improved there. Maybe we can improve the hashing function for component references, but that is not the immediate issue here, since the hashing is used all over the compiler and works fine.

@kabdelhak implemented the relevant code fairly recently in #10397 to fix issues introduced by #9263 . Maybe he or @phannebohm can give you a more complete analysis.

casella · 2023-10-03T21:28:58Z

Thanks @mahge for the analysis!

Alas, I'm not enough into the details of how the backend works to be able to give any technical suggestion. What I understand, also given the reference to #9263, is that the point of this part of the code is to figure out which variables appear nonlinearly in a given strong component. These strong components are normally sparse, so there are N equations with max M << N variables in each of them. The following is a naive pseudo code to build the list of nonlinear variables:

for eq in <set of equations>
  for var in <set of variables in the equation>
    if var shows up non linearly in eq then 
      look it up in the (lexicographically) ordered list of nonlinear variables
      add it if it's not there
    end if
  end for
end for

The complexity is O(N*M*P), where O(P) is the complexity of looking up one element in an ordered list, which should be O(log(N)).

So I would expect the complexity of this algorithm to be O(N*log(N)*M), not O(N^2) or worse. Am I missing something?

Maybe the problem is using an unordered set instead of an ordered list?

Fixes OpenModelica#11302. For each SCC of the system, all variables were traversed, so the time complexity was O(N*S), where N is the number of variables and S is the number of SCCs. By first collecting all nonlinear iteration vars in the same set and then marking them once, it should now be O(N).

phannebohm · 2023-10-04T10:21:27Z

Thanks @mahge, without your measurement I would never have found the issue so quickly 🚀

The problem was that for each of the S strong components the list of all N variables in the system was traversed, so complexity was something like O(N*S). I fixed that in #11312 to only traverse once so it should be O(N) now.
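The difference between the two strategies can be illustrated with a small Python sketch. It is not the MetaModelica code from #11312; the function names and the visit counter are hypothetical, and variables are represented as plain integers.

```python
def mark_per_component(all_vars, components):
    """Old strategy: traverse all variables once per strong component."""
    marked = set()
    visits = 0
    for comp in components:
        nonlinear = set(comp)      # one small set per component
        for v in all_vars:         # full traversal, repeated S times
            visits += 1
            if v in nonlinear:
                marked.add(v)
    return marked, visits


def mark_once(all_vars, components):
    """New strategy: collect everything into one set, then traverse once."""
    nonlinear = set()
    for comp in components:
        nonlinear.update(comp)
    marked = set()
    visits = 0
    for v in all_vars:             # single traversal
        visits += 1
        if v in nonlinear:
            marked.add(v)
    return marked, visits


if __name__ == "__main__":
    all_vars = list(range(1000))
    components = [[i] for i in range(0, 1000, 2)]   # 500 single-variable components
    print(mark_per_component(all_vars, components)[1])   # N*S visits
    print(mark_once(all_vars, components)[1])            # N visits
```

Both functions mark the same variables; only the number of visited variables differs, N*S in the first case and N in the second.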

> Maybe the problem is using an unordered set instead of an ordered list?

Unordered sets have approximately constant lookup times because they use hashes for indexing, that's the whole idea of using them over lists or trees 😄
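As a rough illustration of the two lookup strategies discussed here, the following Python sketch compares membership tests on a hash-based set (average constant time) with binary search on a lexicographically sorted list (O(log N)). It uses Python's built-in set and the standard bisect module as stand-ins; it says nothing about how the compiler's UnorderedSet is implemented, and the variable names are made up.

```python
import bisect


def contains_sorted(sorted_names, name):
    """Binary search in a sorted list: O(log N) comparisons per lookup."""
    i = bisect.bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name


# Hypothetical variable names, standing in for component references.
names = [f"x[{i}]" for i in range(1, 1001)]
hashed = set(names)        # hash-based: lookup cost does not grow with N on average
ordered = sorted(names)    # ordered list: lookup via binary search

if __name__ == "__main__":
    for candidate in ("x[500]", "u"):
        print(candidate, candidate in hashed, contains_sorted(ordered, candidate))
```

Either structure keeps a single lookup cheap, so the choice between them cannot explain quadratic growth; that comes from how many lookups are performed.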

casella · 2023-10-04T10:40:01Z

@phannebohm this looks great if you have many relatively small systems (which we also have in our power plant model at lambda = 0, thanks to smart simplifications). But what if you have only one very big strong component? That is the case of the MWE I posted in this ticket. Did you check how long it takes to carry out simcode: create initialization part after your fix?

phannebohm · 2023-10-04T10:56:38Z

You're right and I thought about that too. But as far as I can see #11312 only saved computation time, no real trade-off except that the unordered set gets larger because it is one big set compared to many smaller sets. But that should still be no real issue since lookup is practically constant.

I tested your MWE. Times went down significantly:

N	old	fixed
1000	1.21	0.2025
2000	4.947	0.482
4000	(whatever)	0.8655

phannebohm · 2023-10-04T11:09:33Z

BTW, the NB (new backend) has a completely different structure, and things like this should not happen there because we have direct pointers, so there is no traversing, only direct access. I think...

casella · 2023-10-05T18:32:28Z

Judging from the regression report, this commit had very beneficial effects on simcode performance when dealing with large models, including those of the ClaRa library 😃

casella assigned mahge and phannebohm Oct 2, 2023

phannebohm mentioned this issue Oct 4, 2023

Fix slow marking of nonlinear iteration vars #11312

Merged

phannebohm closed this as completed in #11312 Oct 4, 2023

This was referenced Oct 5, 2023

Causalization in component-based ScalableTestSuite models scales as O(N^3) #10131

Closed

Partitioning in the NB scales badly #10122

Closed

Template time growing as O(N^2) when using the NB on heat exchanger model #10252

Open
