
document performance of model building functions #480

Closed
lovasoa opened this issue Feb 28, 2021 · 15 comments

@lovasoa
Contributor

lovasoa commented Feb 28, 2021

Hello!
Looking at the code, it looks like the model is stored internally column-wise. Is there a performance penalty in building a model row by row using Highs_addRows? When adding rows, does HiGHS need to move large quantities of data in its internal sparse matrix representation? If so, could this be documented?

@lovasoa lovasoa changed the title document performance of various functions document performance of model building functions Feb 28, 2021
@jajhall
Member

jajhall commented Feb 28, 2021

If you build a model by calling Highs_addRows repeatedly - in the limit, once for each row in the model - there will be a performance penalty. A pass through the column-wise matrix is necessary to insert any number of new rows since, as you identify, data in the column-wise representation must be moved to accommodate it. However, relative to the cost of solving the subsequent LP, the overhead of building the model row-wise is negligible, so it's not worth documenting.
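To make the data movement concrete, here is a small Python sketch (illustrative only, not HiGHS code) of why inserting one row into a column-wise (CSC) sparse matrix requires a pass over every column: each column's slice of the index/value arrays has to shift to make room for the new entry.

```python
# Illustration: inserting one row into a CSC matrix costs O(nnz + ncol),
# because every stored entry may need to be copied to a new position.

def csc_add_row(start, index, value, ncol, row_entries, new_row):
    """Insert `row_entries` ({col: val}) as row `new_row` into a CSC
    matrix given by (start, index, value). Returns the new arrays."""
    new_start, new_index, new_value = [0], [], []
    for col in range(ncol):
        # copy the column's existing entries
        for k in range(start[col], start[col + 1]):
            new_index.append(index[k])
            new_value.append(value[k])
        # append the new row's entry for this column, if any
        if col in row_entries:
            new_index.append(new_row)
            new_value.append(row_entries[col])
        new_start.append(len(new_index))
    return new_start, new_index, new_value

# 2x2 identity matrix stored column-wise
start, index, value = [0, 1, 2], [0, 1], [1.0, 1.0]
# add a third row with an entry in each column
start, index, value = csc_add_row(start, index, value, 2,
                                  {0: 2.0, 1: 3.0}, 2)
print(start, index, value)  # [0, 2, 4] [0, 2, 1, 2] [1.0, 2.0, 1.0, 3.0]
```

Every call rewrites the whole matrix, which is what makes one-row-at-a-time insertion expensive.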

@jajhall jajhall closed this as completed Feb 28, 2021
@lovasoa
Contributor Author

lovasoa commented Feb 28, 2021

In the worst case, the complexity of building a problem row by row will be O(n²) (with n the number of constraints), right? Is this always negligible compared to solving a large problem?

Other solvers not only warn about inefficient model building; lpsolve, for instance, even has a way to configure how the matrix is stored internally, and states:

set_add_rowmode
Specifies which add routine performs best. [...]
The speed improvement is spectacular, especially for bigger models, so it is advisable to call this routine to set the mode.
http://lpsolve.sourceforge.net/5.5/set_add_rowmode.htm
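A back-of-envelope count (Python, for illustration) shows where the quadratic term comes from: if each insertion into column-wise storage shifts every entry already stored, the total work is proportional to 1 + 2 + ... + n.

```python
# Count the entries moved when building an n-row model one addRow at a
# time, assuming a fixed number of nonzeros per row and that each
# insertion copies all previously stored entries.

def total_moves(n, nnz_per_row=3):
    moves = 0
    stored = 0
    for _ in range(n):
        moves += stored          # shift every existing entry
        stored += nnz_per_row    # then append the new row's entries
    return moves

print(total_moves(10))   # 135
print(total_moves(100))  # 14850 -- 10x the rows, 110x the work
```

The growth is quadratic in the number of rows, so for large models built this way the build time can indeed stop being negligible.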

@odow
Collaborator

odow commented Feb 28, 2021

If possible, create the LP at once:

int Highs_passLp(void* highs, const int numcol, const int numrow,
                 const int numnz, const double* colcost,
                 const double* collower, const double* colupper,
                 const double* rowlower, const double* rowupper,
                 const int* astart, const int* aindex,
                 const double* avalue) {

If not possible, create row-by-row.

Discussion in the Julia wrapper: jump-dev/HiGHS.jl#41
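For reference, Highs_passLp expects the constraint matrix in column-wise (CSC) form: astart gives each column's offset into aindex/avalue. A small Python sketch (illustrative, not HiGHS code) of assembling those arrays from a dense matrix:

```python
# Convert a dense row-major matrix (list of rows) to the CSC arrays
# (astart, aindex, avalue) that a column-wise LP load expects.

def dense_to_csc(A):
    numrow, numcol = len(A), len(A[0])
    astart, aindex, avalue = [0], [], []
    for j in range(numcol):
        for i in range(numrow):
            if A[i][j] != 0.0:
                aindex.append(i)       # row index of the entry
                avalue.append(A[i][j])
        astart.append(len(aindex))     # column j+1 starts here
    return astart, aindex, avalue

# constraints: x1 + 2*x2 <= 3  and  4*x2 <= 5
astart, aindex, avalue = dense_to_csc([[1.0, 2.0],
                                       [0.0, 4.0]])
print(astart, aindex, avalue)  # [0, 1, 3] [0, 0, 1] [1.0, 2.0, 4.0]
```

Building these arrays once and passing the whole LP avoids the repeated data movement of incremental row insertion.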

@lovasoa
Contributor Author

lovasoa commented Feb 28, 2021

If not possible, create row-by-row.

I think you mean column-by-column, right ?

I think this should be explicitly documented. The quadratic behavior of addRows is not obvious, and I had to read the code (and draw on experience with other solver libraries) to understand the drastic performance difference between addRows and addColumns. Adding it to the documentation of the function is a small change and could save users hours of performance debugging.

@odow
Collaborator

odow commented Feb 28, 2021

It depends on the modeling environment. There are two likely scenarios:

  • The user supplies the constraint matrix, variable bounds, and objective coefficients all at once from some matrix generator. If so, call Highs_passLp.
  • The user builds the model incrementally, defining variables, the objective function, and constraints, then potentially modifying the data they have already supplied. In this case, they want to call addRow.

The only time a user would want to call addColumn is if they are doing column generation.

@lovasoa
Contributor Author

lovasoa commented Feb 28, 2021

This is what I am suggesting to document. Your comment above should be in the documentation for the addRow function.
And I think you are confusing addRow and addColumn. The efficient one is addColumn, isn't it ? passLp takes a column-wise matrix.

@odow
Collaborator

odow commented Feb 28, 2021

And I think you are confusing addRow and addColumn.

No, I really mean addRow. Modellers think in terms of constraints as rows. They will not give you columns containing the non-zero constraint coefficients, because at the point they are defining their variables, they have not defined any constraints yet.

Here's a typical JuMP model (although every other modeling language has something similar):

model = Model(HiGHS.Optimizer)
@variable(model, x[1:5] >= 0)
@constraint(model, sum(x) <= 1)
@objective(model, Max, sum(x))
optimize!(model)

If we decide to start building the model incrementally after the first @variable, we can't supply the complete column because we haven't defined the constraints yet. Therefore, when we get to @constraint, we need to call addRow.
Alternatively, we can store a cache of the problem and call passLp at the start of optimize!.

At no point do we call addColumn with a list of constraint coefficients. (There is one specific use-case for addColumn, and that is column generation https://en.wikipedia.org/wiki/Column_generation.)

This is not really about the difference between addRow and addColumn. It's about the difference between should I pass everything at once, or should I build the model incrementally. If you're using an algebraic modeling language, the AML should take care of this for you (as JuMP does). If you're using HiGHS directly and you don't need the modeling convenience of addRow, you may as well just build the full A matrix and call passLp instead of calling addColumns.

@lovasoa
Contributor Author

lovasoa commented Feb 28, 2021

This is not really about the difference between addRow and addColumn

These two functions, despite having similar names, have very different performance characteristics. All that I am saying is that this should be documented.

You are saying that when building the model incrementally, it should be done in the way that leads to a quadratic runtime. Users should probably at least be aware of that.

Some modelers are internally smarter than JuMP and would be able to leverage addRow if it were performant.

@lovasoa
Contributor Author

lovasoa commented Feb 28, 2021

Actually, looking at the issue you link, it looks like JuMP does the exact same thing as other modelers, and uses addRow, which makes it slow with HiGHS.

@jajhall
Member

jajhall commented Mar 1, 2021

Interesting discussion. The cost of adding rows one-by-one with addRow is, indeed O(n^2) for n constraints. I'd assumed that most models would be built outside HiGHS and passed in as a HighsLp, with addRow being used by - say - a MIP solver adding cuts. It's not hard to have the empty constraint matrix defined as being "neutral", and row-wise storage used so long as addRow is called, switching to column-wise storage when run() is called, or if addCol is called. An LP defined as a HighsLp would immediately be stored column-wise.

Happy to document this
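The dual-orientation idea described above can be sketched in Python (hypothetical names, not the HiGHS implementation): keep the matrix row-wise while rows are being added, and do a single counting-sort transpose to column-wise storage only when the solver needs it.

```python
# Sketch of lazy orientation: addRow is O(row nnz), and the one-off
# conversion to CSC is O(nnz + ncol) via a counting-sort transpose.

class LazyMatrix:
    def __init__(self, ncol):
        self.ncol = ncol
        self.rows = []                 # row-wise: list of {col: val}

    def add_row(self, entries):        # cheap: just append the row
        self.rows.append(dict(entries))

    def to_csc(self):
        """One-off counting-sort transpose to (start, index, value)."""
        nnz_per_col = [0] * self.ncol
        for row in self.rows:
            for j in row:
                nnz_per_col[j] += 1
        start = [0]
        for c in nnz_per_col:
            start.append(start[-1] + c)
        index = [0] * start[-1]
        value = [0.0] * start[-1]
        pos = list(start[:-1])         # next free slot in each column
        for i, row in enumerate(self.rows):
            for j, v in row.items():
                index[pos[j]] = i
                value[pos[j]] = v
                pos[j] += 1
        return start, index, value

m = LazyMatrix(2)
m.add_row({0: 1.0, 1: 2.0})
m.add_row({1: 4.0})
print(m.to_csc())  # ([0, 1, 3], [0, 0, 1], [1.0, 2.0, 4.0])
```

With this strategy, building n rows incrementally costs O(nnz) overall instead of O(n²).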

@jajhall jajhall reopened this Mar 1, 2021
@odow
Collaborator

odow commented Mar 1, 2021

it looks like JuMP does the exact same thing as other modelers, and uses addRow, which makes it slow with HiGHS.

JuMP has the option, but it's currently not implemented in HiGHS.jl. This issue is to enable the method that allows JuMP to use the cache.

@jajhall
Member

jajhall commented Mar 9, 2021

My branch of HiGHS now switches to row-wise storage of the LP matrix if a row is added to a model with only columns. Thus adding rows one-by-one does not incur the O(n^2) cost. I'll update this issue when the branch is merged into master.

@feldmeier
Collaborator

The matrix orientation code seems to be in master now.
The first call to addRow still has O(nnz) complexity if the matrix had column-wise orientation previously, right? Subsequent calls to addRow will have linear complexity.

@jajhall
Member

jajhall commented Oct 21, 2021

The matrix orientation code seems to be in master now. The first call to addRow still has O(nnz) complexity if the matrix had column-wise orientation previously, right? subsequent calls to addRow will have linear complexity.

Indeed, and more so. HiGHS compares the number of entries being added with the number in the current matrix and transposes it if that is advantageous. And it's all in master now.
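A toy version (hypothetical, not HiGHS's actual rule) of that kind of heuristic: switch to row-wise storage only when the incoming batch of rows is small relative to the entries already stored, so the one-off transpose cost pays for itself.

```python
# Decide whether to transpose a column-wise matrix to row-wise before
# inserting a batch of rows. Transposing costs ~current_nnz once;
# inserting into column-wise storage costs ~current_nnz per batch, so
# switching wins when many small batches are expected.

def choose_orientation(current_nnz, incoming_nnz, currently_columnwise=True):
    """Return True to switch to row-wise storage before inserting rows."""
    if not currently_columnwise:
        return False               # already row-wise: appends are cheap
    return incoming_nnz < current_nnz

print(choose_orientation(10_000, 5))    # True: tiny row into a big matrix
print(choose_orientation(5, 10_000))    # False: bulk load dominates anyway
```

The exact threshold HiGHS uses is an implementation detail; the point is that the decision is based on comparing the two sizes.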

@jajhall
Member

jajhall commented Dec 30, 2021

This was fixed long ago!

@jajhall jajhall closed this as completed Dec 30, 2021
4 participants