Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameter to cbc_solve to specify initial solutions #54

Merged
merged 11 commits into from
Oct 19, 2021
Merged

Add parameter to cbc_solve to specify initial solutions #54

merged 11 commits into from
Oct 19, 2021

Conversation

jeffreyhanson
Copy link
Collaborator

@jeffreyhanson jeffreyhanson commented Aug 17, 2021

Hi,

Thanks again for creating this R package @dirkschumacher! I've been using it to solve a bunch of optimization problems, and I'm now applying it to some multi-objective optimization problems. Specifically, the multi-objective optimization process involves (i) formulating a problem using an objective, (ii) solving it to obtain a solution, (ii) updating the problem formulation with a second objective (replacing the first objective), (iv) adding a constraint to the problem formulation based on first objective and first solution (e.g. ensure performance based on first objective is within 10% of optimality based on first optimization run), and (v) solving it to generate a second solution. This works really well, and I was thinking it might be possible to improve performance by using first solution as the starting/initial solution when generating the second solution.

So, this PR adds functionality so that users can supply a starting solution for the CBC solver. To acheive this, a new parameter (i.e. initial_solution) is added to the cbc_solve function where users can supply a numeric vector containing the initial solution values. Missing (NA) values can be used when initial values do not need to be supplied for certain variables. We can see that the CBC does indeed use the supplied starting solution values because it prints MIPStart solution provided values for [....] when using the initial_solution to supply starting values (e.g. see example below). Additionally, to maintain test coverage, this PR also includes a test for the initial solutions in tests/testthat/test-cbc-solver.R. Also, although CBC can take longer to solve problems when initial solution are supplied (even when the initial solution is an optimal solution), it seems that this only the case for small/simple problems.

Let me know what you think?


Example using intial solution:

# load pacakge
library(rcbc)

# run optimization
A <- as(matrix(c(1, 2, 3, 4), ncol = 2, nrow = 2), "dgTMatrix")
result <- cbc_solve(
          obj = c(1, 2),
          is_integer = c(TRUE, TRUE),
          mat = A,
          row_lb = c(0, 0),
          row_ub = c(1, 2),
          max = TRUE,
          cbc_args = list("logLevel" = 3),
          initial_solution = c(0, NA_real_))

@dirkschumacher
Copy link
Owner

Thanks for the PR. Exciting! Sorry I totally missed that. Will try to review it until end of month.

@jeffreyhanson
Copy link
Collaborator Author

Awesome - thanks - no worries! Let me know if there's anything I can do to help.

Copy link
Owner

@dirkschumacher dirkschumacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I am wondering if we can manage to not depend on {withr}.

I think the interface is fine with NAs for missing initial values.

assert_that(length(initial_solution) == length(obj))
assert_that(
all(
initial_solution[is_integer] == round(initial_solution[is_integer]),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with CBC if you pass fractional initial solutions? (just wondering)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a new branch with this argument validation check removed to explore what happens if fractional values are passed as integer values for initial solutions. Specifically, I've added a test that attempts to do this and it appears to result in a segfault (on my computer at least, Kubuntu 21.04) (see here). I did notice that this wasn't an issue for smaller and simpler problems though. The output on my computer when running this test is as follows:

Welcome to the CBC MILP Solver 
Version: 2.10.5 
Build Date: Oct 15 2020 

command line - problem -logLevel 3 -solve -quit (default strategy 1)
logLevel was changed from 1 to 3
Continuous objective value is 0.465866 - 0.00 seconds
Cgl0004I processed model has 20 rows, 50 columns (30 integer (30 of which binary)) and 600 elements
Cbc0045I MIPStart provided solution with cost 5.72608
BFeasible (0) - obj 3.09387 3.09387
Cbc0012I Integer solution of 3.0938721 found by Reduced search after 0 iterations and 0 nodes (0.00 seconds)
Cbc0045I Heuristic DiveCoefficient took 2.5e-05 seconds (good)
BFeasible (4.44089e-16) - obj 0.722119 0.722119
Cbc0012I Integer solution of 0.72211943 found by DiveCoefficient after 0 iterations and 0 nodes (0.00 seconds)
Cgl0009I 8 elements changed
Cgl0004I processed model has 17 rows, 24 columns (4 integer (4 of which binary)) and 253 elements
Cbc0038I Full problem 20 rows 50 columns, reduced to 17 rows 24 columns
Cbc0028I Starting sub-tree for CbcHeuristicRINS - maximum nodes 200
Cbc0045I Heuristic rounding took 0 seconds (no good)
Cbc0019I Exiting on maximum solutions
Cbc0005I Partial search - best objective 1e+50 (best possible 0.46586644), took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cbc0029I Ending sub-tree for CbcHeuristicRINS
Cbc0045I Heuristic RINS took 0.000663 seconds (no good)
Cbc0046I Root node pass 1, 20 rows, 0 total tight cuts  -  objective 0.46586644
Cbc0046I Root node pass 2, 24 rows, 4 total tight cuts  -  objective 0.4791873
Cbc0046I Root node pass 3, 24 rows, 4 total tight cuts  -  objective 0.48074106
Cbc0046I Root node pass 4, 23 rows, 3 total tight cuts  -  objective 0.48084226
BFeasible (0) - obj 0.480842 0.480842
Cbc0012I Integer solution of 0.48084226 found by DiveCoefficient after 10 iterations and 0 nodes (0.00 seconds)
Cbc0031I 3 added rows had average density of 27.333333
Cbc0013I At root node, 3 cuts changed objective from 0.46586644 to 0.48084226 in 4 passes
Cbc0014I Cut generator 0 (Probing) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 1 (Gomory) - 7 row cuts average 31.7 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is 1
Cbc0014I Cut generator 2 (Knapsack) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 6 row cuts average 26.2 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is 1
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 14 row cuts average 30.1 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is 1
Cbc0001I Search completed - best objective 0.4808422567117744, took 10 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 16 variables fixed on reduced cost
BFeasible (0) - obj 0.480842 0.480842
Cuts at root node changed objective from 0.465866 to 0.480842
Probing was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 4 times and created 7 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 4 times and created 6 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 4 times and created 14 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
ZeroHalf was tried 1 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: .Call(`_rcbc_cpp_cbc_solve`, obj, isMaximization, rowIndices,     colIndices, elements, integerIndices, colLower, colUpper,     rowLower, rowUpper, arguments, initialIndex, initialSolution,     initialNames, useInitialSolution)
 2: cpp_cbc_solve(obj = obj, isMaximization = max, rowIndices = mat@i,     colIndices = mat@j, elements = mat@x, integerIndices = which(is_integer) -         1, colLower = col_lb, colUpper = col_ub, rowLower = row_lb,     rowUpper = row_ub, arguments = cbc_args, initialIndex = initial_index -         1, initialSolution = initial_solution, initialNames = initial_names,     useInitialSolution = use_initial_solution)
 3: cbc_solve(obj = obj, is_integer = is_integer, mat = A, row_lb = row_lb,     row_ub = row_ub, col_lb = rep(0, n_cols), col_ub = rep(1,         n_cols), max = FALSE, initial_solution = runif(n_cols,         0.01, 0.99), cbc_args = list(logLevel = 3))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 
Selection: 

Given the potential chance of segfault, I suggest we keep these checks for integer-ish numbers. How does that sound? Do you have any better ideas for checking if a numeric vector is integer-ish?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thanks. I agree. I think it is fine to compare to rounded values. It would probably better to also convert them to integers and then compare for equality.

R/cbc_solve.R Outdated
initial_index <- which(is.finite(initial_solution[is_integer]))
## create names for starting solution variables
## note: this is because the C++ CBC interface need them for starting values
withr::with_options(list(scipen = 9999), {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Is this the only way to do it? Without withr, you could set the option and then use on.exit to restore it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can you give me an example where this fails?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can replace it with the on.exit approach

Yeah, under the default options, R can sometimes convert the different numbers to the same character representation the number. For example:

as.character(c(1e15 + 1, 1e15 + 2))
#> [1] "1e+15" "1e+15"

This causes issues when we want to create a series of unique/distinct identifiers. To avoid this, we can set the scipen option to be really high:

withr::with_options(list(scipen = 9999), as.character(c(1e15 + 1, 1e15 + 2)))
#> [1] "1000000000000001" "1000000000000002"

Of course, this approach isn't infallible. We still run into issues with exceptionally large numbers, and would need something like GMP or MPRF to handle those correctly.

withr::with_options(list(scipen = 9999), as.character(c(1e30 + 1, 1e30+ 2)))
#> [1] "1000000000000000019884624838656" "1000000000000000019884624838656"

For our purposes though, perhaps we might safely assume that people won't try solving problem with so many variables given today's level of technology :)

Copy link
Collaborator Author

@jeffreyhanson jeffreyhanson Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, I suppose we don't even need the on.exit() approach unless the user has >1e15 variables, since 1e14 works fine:

as.character(c(1e14 + 1, 1e14+2))
#> [1] "100000000000001" "100000000000002"

How about I just remove this code and add a check to ensure that the character-based numbers are unique?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've that in 529b5a3, but let me know if you want the on.exit() approach instead?

R/cbc_solve.R Show resolved Hide resolved
src/cpp_cbc_solve.cpp Outdated Show resolved Hide resolved
R/cbc_solve.R Show resolved Hide resolved
na.rm = TRUE),
msg = "argument to initial solution does not obey is_integer"
)
assert_that(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, shall we not rather let CBC decide what a feasible initial solution is and how to handle that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point! If CBC can automically and reliably determine if initial solutions are feasible, then it doesn't make sense to try and duplicate this in rcbc. I added another test to the branch discussed above to see what would happen in infeasible initial solutions were specified (see here). It segfault on my computer, with the following output:

Welcome to the CBC MILP Solver 
Version: 2.10.5 
Build Date: Oct 15 2020 

command line - problem -logLevel 3 -solve -quit (default strategy 1)
logLevel was changed from 1 to 3
Continuous objective value is 0.465866 - 0.00 seconds
Cgl0004I processed model has 20 rows, 50 columns (30 integer (30 of which binary)) and 600 elements
Cbc0045I MIPStart provided solution with cost 39.9926
BFeasible (0) - obj 8.4163 8.4163
Cbc0012I Integer solution of 8.4162963 found by Reduced search after 0 iterations and 0 nodes (0.00 seconds)
Cbc0045I Heuristic DiveCoefficient took 2.5e-05 seconds (good)
BFeasible (4.44089e-16) - obj 0.722119 0.722119
Cbc0012I Integer solution of 0.72211943 found by DiveCoefficient after 0 iterations and 0 nodes (0.00 seconds)
Cgl0009I 8 elements changed
Cgl0004I processed model has 17 rows, 24 columns (4 integer (4 of which binary)) and 253 elements
Cbc0038I Full problem 20 rows 50 columns, reduced to 17 rows 24 columns
Cbc0028I Starting sub-tree for CbcHeuristicRINS - maximum nodes 200
Cbc0045I Heuristic rounding took 0 seconds (no good)
Cbc0019I Exiting on maximum solutions
Cbc0005I Partial search - best objective 1e+50 (best possible 0.46586644), took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cbc0029I Ending sub-tree for CbcHeuristicRINS
Cbc0045I Heuristic RINS took 0.000699 seconds (no good)
Cbc0046I Root node pass 1, 20 rows, 0 total tight cuts  -  objective 0.46586644
Cbc0046I Root node pass 2, 24 rows, 4 total tight cuts  -  objective 0.4791873
Cbc0046I Root node pass 3, 24 rows, 4 total tight cuts  -  objective 0.48074106
Cbc0046I Root node pass 4, 23 rows, 3 total tight cuts  -  objective 0.48084226
BFeasible (0) - obj 0.480842 0.480842
Cbc0012I Integer solution of 0.48084226 found by DiveCoefficient after 10 iterations and 0 nodes (0.00 seconds)
Cbc0031I 3 added rows had average density of 27.333333
Cbc0013I At root node, 3 cuts changed objective from 0.46586644 to 0.48084226 in 4 passes
Cbc0014I Cut generator 0 (Probing) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 1 (Gomory) - 7 row cuts average 31.7 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is 1
Cbc0014I Cut generator 2 (Knapsack) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 6 row cuts average 26.2 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is 1
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 14 row cuts average 30.1 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is 1
Cbc0001I Search completed - best objective 0.4808422567117744, took 10 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 16 variables fixed on reduced cost
BFeasible (0) - obj 0.480842 0.480842
Cuts at root node changed objective from 0.465866 to 0.480842
Probing was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 4 times and created 7 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 4 times and created 6 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 4 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 4 times and created 14 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
ZeroHalf was tried 1 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: .Call(`_rcbc_cpp_cbc_solve`, obj, isMaximization, rowIndices,     colIndices, elements, integerIndices, colLower, colUpper,     rowLower, rowUpper, arguments, initialIndex, initialSolution,     initialNames, useInitialSolution)
 2: cpp_cbc_solve(obj = obj, isMaximization = max, rowIndices = mat@i,     colIndices = mat@j, elements = mat@x, integerIndices = which(is_integer) -         1, colLower = col_lb, colUpper = col_ub, rowLower = row_lb,     rowUpper = row_ub, arguments = cbc_args, initialIndex = initial_index -         1, initialSolution = initial_solution, initialNames = initial_names,     useInitialSolution = use_initial_solution)
 3: cbc_solve(obj = obj, is_integer = is_integer, mat = A, row_lb = row_lb,     row_ub = row_ub, col_lb = rep(0, n_cols), col_ub = rep(1,         n_cols), max = FALSE, initial_solution = rep(3, n_cols),     cbc_args = list(logLevel = 3))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

Do you get this behavior on your comptuer? If so, I think it might be good to keep these checks in rcbc? What do you think?

@jeffreyhanson
Copy link
Collaborator Author

Thank you very much for taking a look at this! I'll try and respond to your comments/questions and update the PR by the end of the week.

@jeffreyhanson
Copy link
Collaborator Author

@dirkschumacher, I think I've responded to and addressed all your comments - but please let me know if I missed anything or if anything I've written isn't clear?

@dirkschumacher
Copy link
Owner

Thanks a lot. Initial values are extremely useful. Often you would write a custom heuristic that finds a good first feasible solution and then start the search process. Because without a feasible solution you are missing a bound that can be used to prune the search tree.

DESCRIPTION Outdated
@@ -30,10 +30,10 @@ Imports:
Rcpp,
methods,
Matrix,
assertthat
assertthat,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the comma here? I tried to do it myself but I don't have permission to push to your branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I'll do that now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c7753bb

@dirkschumacher
Copy link
Owner

PS: I cannot test it as CBC fails to install on my new m1 Mac 🙈 But I trust the CI

@jeffreyhanson
Copy link
Collaborator Author

Thanks a lot. Initial values are extremely useful. Often you would write a custom heuristic that finds a good first feasible solution and then start the search process. Because without a feasible solution you are missing a bound that can be used to prune the search tree.

No worries! Yeah, that's basically the use case that motivated this PR.

@jeffreyhanson
Copy link
Collaborator Author

jeffreyhanson commented Oct 19, 2021

PS: I cannot test it as CBC fails to install on my new m1 Mac see_no_evil But I trust the CI

Ah - that must be a bit annoying. Let me know if there's any tests I can run on my computer that would be helpful? E.g., I could post the logs from running checks on my computer (Ubuntu 21.4)?

@dirkschumacher
Copy link
Owner

I seems the checks have not been triggered. Can you make another commit to see if they work now?

@jeffreyhanson
Copy link
Collaborator Author

Sure

@jeffreyhanson
Copy link
Collaborator Author

Just pushed a small doc tweak - that seems to have done it.

@jeffreyhanson
Copy link
Collaborator Author

I'll add you to my fork so you can make any further changes if needed.

@jeffreyhanson
Copy link
Collaborator Author

Looks like the Ubuntu builds failed because curl needs to be installed:

2021-10-19T08:56:57.7176177Z * installing *source* package ‘curl’ ...
2021-10-19T08:56:57.7268232Z ** package ‘curl’ successfully unpacked and MD5 sums checked
2021-10-19T08:56:57.7269637Z ** using staged installation
2021-10-19T08:56:57.7323088Z Package libcurl was not found in the pkg-config search path.
2021-10-19T08:56:57.7325022Z Perhaps you should add the directory containing `libcurl.pc'
2021-10-19T08:56:57.7326234Z to the PKG_CONFIG_PATH environment variable
2021-10-19T08:56:57.7327383Z No package 'libcurl' found
2021-10-19T08:56:57.7340117Z Package libcurl was not found in the pkg-config search path.
2021-10-19T08:56:57.7341545Z Perhaps you should add the directory containing `libcurl.pc'
2021-10-19T08:56:57.7342617Z to the PKG_CONFIG_PATH environment variable
2021-10-19T08:56:57.7343605Z No package 'libcurl' found
2021-10-19T08:56:57.8366606Z Using PKG_CFLAGS=
2021-10-19T08:56:57.8368303Z Using PKG_LIBS=-lcurl

@jeffreyhanson
Copy link
Collaborator Author

I'll update the GitHub actions file to manually install it

@dirkschumacher dirkschumacher merged commit fe51b41 into dirkschumacher:master Oct 19, 2021
@jeffreyhanson
Copy link
Collaborator Author

Awesome - thanks for merging this!

@dirkschumacher dirkschumacher mentioned this pull request Jan 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants