Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generation for large Jacobians in nonlinear system initialization scales badly #11302

Closed
casella opened this issue Oct 2, 2023 · 8 comments · Fixed by #11312
Closed
Assignees

Comments

@casella
Copy link
Contributor

casella commented Oct 2, 2023

Description

I am experiencing really bad performance of the simcode: created initialization part on a power plant model of about 100,000 equations, where this part takes a whooping 4500 s to generate the code, and over 10,000 s to compile it.

I think I managed to capture the issue in a much simpler MWE, that we can use to identify the issue and solve it.

Steps to Reproduce

Consider the following MWE:

package TestLargeJacobian
  model M
    parameter Integer N = 4;
    Real x[N];
    Real u;
    parameter Real u_start(fixed = false);
  initial equation
    der(x) = zeros(N);
    x[N] = 0;
  equation
    u = u_start;
    der(x[1]) = x[1] - u;
    for i in 2:N - 1 loop
      der(x[i]) = -x[i] + x[i - 1]*(1 + 1e-6*sin(x[i]));
    end for;
    der(x[N]) = -x[N] - x[1] - x[N - 1]*(1 + 1e-6*sin(x[N]));
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M;
  
  model M1000
    extends M(N = 1000);
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M1000;
  
  model M2000
    extends M(N = 2000);
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M2000;
  
  model M4000
    extends M(N = 4000);
    annotation(__OpenModelica_commandLineOptions = "--tearingMethod=minimalTearing");
  end M4000;
end TestLargeJacobian;

Model M replicates the crucial feature of the original model, namely the need to solve a large sparse nonlinear system of equations for initialization. In this case, the initialization problem has about N initial equations, with about 2N non-zero elements in the Jacobian.

On my Windows PC, simcode: create initialization part takes 0.79 s for N = 1000, 2.42 s for N = 2000, 12.2 s for N = 4000. Clearly, this does not scale well, given that the number of nonzero elements in the Jacobian is proportional to N.

I checked the size of generated code: all files scale linearly, except 06.inz. This is a bit weird: for N = 1000 there is one such file of 600 kB, for N = 2000 there are for files totalling 2.8 MB, but for N = 4000 they are actually a bit smaller, 2.4 MB. Not sure why this happens.

@mahge if you want to try profiling this test case, the model code is trivial.

@phannebohm any idea what could be the root cause?

Expected Behavior

Simcode time should scale linearly with N.

@casella
Copy link
Contributor Author

casella commented Oct 2, 2023

Keeping @matteodepascali in the loop.

@mahge
Copy link
Contributor

mahge commented Oct 3, 2023

More than 30% of the total translation time is spent in the function BackendDAEUtil.markNonlinearIterationVariable. This function is called (only by) markNonlinearIterationVariablesStrongComponent. The relvant parts of the code are these:

protected function markNonlinearIterationVariablesStrongComponent
input BackendDAE.StrongComponent comp;
input output BackendDAE.Variables vars;
protected
list<BackendDAE.Var> nonlinear_iteration_vars;
UnorderedSet<DAE.ComponentRef> set = UnorderedSet.new(ComponentReference.hashComponentRef, ComponentReference.crefEqual);
algorithm
nonlinear_iteration_vars := match comp
local
BackendDAE.Jacobian jac;
case BackendDAE.TORNSYSTEM(strictTearingSet=BackendDAE.TEARINGSET(jac=jac), linear=false) then SymbolicJacobian.getNonLinearVariables(jac);
case BackendDAE.EQUATIONSYSTEM(jac=jac, jacType=BackendDAE.JAC_GENERIC()) then SymbolicJacobian.getNonLinearVariables(jac);
else {};
end match;
for var in nonlinear_iteration_vars loop
UnorderedSet.add(var.varName, set);
end for;
(vars, _) := BackendVariable.traverseBackendDAEVarsWithUpdate(vars, markNonlinearIterationVariable, set);
end markNonlinearIterationVariablesStrongComponent;
protected function markNonlinearIterationVariable
input output BackendDAE.Var var;
input output UnorderedSet<DAE.ComponentRef> set;
algorithm
if UnorderedSet.contains(var.varName, set) then
var := BackendVariable.setVarInitNonlinear(var, true);
end if;
end markNonlinearIterationVariable;
annotation(__OpenModelica_Interface="backend");
end BackendDAEUtil;

I am not sure if I understood the whole thing but basically the function markNonlinearIterationVariable tries to get a list of non-linear iteration vars from a given strong component. Then it adds these variables to an UnorderedSet (I am guessing for quick check). Then the function markNonlinearIterationVariable (using traverseBackendDAEVarsWithUpdate) goes through each variable in the whole system and marks the ones that exist in the UnorderedSet.

The check for existence in the UnorderedSet is done using the function UnorderedSet.contains which needs to hash the component reference in order to check if it exists. This is where the whole thing actually spends almost all the computation but hashing is of course at the core of how a set would work so there is nothing much to be improved there. Maybe we can improve the hashing function for Component References but it is not the immediate issue here since the hashing is used all over the compiler and works fine.

@kabdelhak implemented the relevant code fairly recently in #10397 to fix issues introduced by #9263 . Maybe he or @phannebohm can give you a more complete analysis.

@casella
Copy link
Contributor Author

casella commented Oct 3, 2023

Thanks @mahge for the analysis!

Alas, I'm not enough into the details of how the backend works to be able to give any technical suggestion. What I understand, also given the reference to #9263, is that the point of this part of the code is to figure out which variables appear nonlinearly in a given strong component. These strong components are normally sparse, so there are N equations with max M << N variables in each of them. The following is a naive pseudo code to build the list of nonlinear variables:

for eq in <set of equations>
  for var in <set of variables in the equation
    if var shows up non linearly in eq then 
      look it up in the (lexicographically) ordered list of nonlinear variables
      add it if it's not there
    end if
  end for
end for

The complexity is O(N*M*P) where O(P) is the complexity of looking up one one element in an ordered list, which should be O(log(N))

Then, I would expect the complexity of this algorithm to be O(N*log(N)*M), not O(N^2) or worse. Do I miss something?

Maybe the problem is using an unordered set instead of an ordered list?

phannebohm added a commit to phannebohm/OpenModelica that referenced this issue Oct 4, 2023
Fixes OpenModelica#11302

For each SCC of the system all variables were traversed, so the time
complexity was O(N*S) where N is the number of variables and S is
the number of SCCs.
By first collecting all nonlinear iteration vars in the same set and
then marking them once it should now be O(N).
@phannebohm
Copy link
Contributor

Thanks @mahge, without your measurement I would never have found the issue so quickly 🚀

The problem was that for each of the S strong components the list of all N variables in the system was traversed, so complexity was something like O(N*S). I fixed that in #11312 to only traverse once so it should be O(N) now.

Maybe the problem is using an unordered set instead of an ordered list?

Unordered sets have approximately constant lookup times because they use hashes for indexing, that's the whole idea of using them over lists or trees 😄

@casella
Copy link
Contributor Author

casella commented Oct 4, 2023

@phannebohm this looks great if you have many relatively small systems (which we also have in our power plant model at lambda = 0, thanks to smart simplifications). But what if you have only one very big strong component? That is the case of the MWE I posted in this ticket. Did you check how long it takes to carry out simcode: create initialization part after your fix?

@phannebohm
Copy link
Contributor

You're right and I thought about that too. But as far as I can see #11312 only saved computation time, no real trade-off except that the unordered set gets larger because it is one big set compared to many smaller sets. But that should still be no real issue since lookup is practically constant.

I tested your MWE. Times went down significantly:

N old fixed
1000 1.21 0.2025
2000 4.947 0.482
4000 (whatever) 0.8655

phannebohm added a commit to phannebohm/OpenModelica that referenced this issue Oct 4, 2023
Fixes OpenModelica#11302

For each SCC of the system all variables were traversed, so the time
complexity was O(N*S) where N is the number of variables and S is
the number of SCCs.
By first collecting all nonlinear iteration vars in the same set and
then marking them once it should now be O(N).
@phannebohm
Copy link
Contributor

BTW, the NB has a completely different structure and things like this should not happen there because we have direct pointers so there is no traversing, only direct access. I think...

phannebohm added a commit that referenced this issue Oct 4, 2023
Fixes #11302

For each SCC of the system all variables were traversed, so the time
complexity was O(N*S) where N is the number of variables and S is
the number of SCCs.
By first collecting all nonlinear iteration vars in the same set and
then marking them once it should now be O(N).
@casella
Copy link
Contributor Author

casella commented Oct 5, 2023

Judging from the regression report, this commit had very beneficial effects on simcode performance when dealing with large models, including those of the ClaRa library 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants