Read in CPLEX Results #190

Merged
merged 6 commits into from
Aug 23, 2023

Conversation

Member

@trevorb1 trevorb1 commented Aug 21, 2023

Description

In this PR I implement logic to read in raw CPLEX solution files. The ReadCplex class has been updated to directly read in the solution file via the pandas method read_xml() (as CPLEX solution files are written out in XML format). Tests have not been added yet as I am unsure if this is the logic we want. If we do decide to implement this logic, I will first need to add tests.

Issue Ticket Number

Closes #2 and
Closes #20 and
Closes #29

Questions

The logic has been changed so the transformation and sorting of the CPLEX solution files are no longer needed. Instead, the pandas.read_xml() method is used to directly read in only the variables from the solution file (via the xpath argument).

My question is: is there a reason we are not currently using read_xml() to read in CPLEX solution files? I see that read_xml() was released in pandas 1.3.0 (July 2021), after these issues were created, so it may simply not have been available at the time. Or are there performance issues associated with read_xml() that I am not aware of, or some other reason?

A notable advantage of using read_xml() is that the ReadCplex class can be significantly simplified to be very similar to the ReadGurobi class (as implemented in this PR). Moreover, this solution reuses the logic in the ReadWideResults class to convert data. Finally, as this solution works with the etree parser, we do not need the optional lxml dependency.
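For readers who want to try the idea outside otoole, here is a minimal sketch of reading only the variables from a CPLEX solution with pandas.read_xml(). The XML below is a hand-written, abbreviated stand-in for a real solution file: the element layout, the names and the numbers are assumptions for illustration and are not taken from the Simplicity run. It is not the code in this PR, only the same call pattern.

```python
import io

import pandas as pd

# Hand-written, abbreviated stand-in for a CPLEX solution file.
# Layout and values are illustrative assumptions, not real solver output.
SAMPLE_SOL = """<CPLEXSolution version="1.2">
 <header problemName="model.lp" objectiveValue="100.5"/>
 <linearConstraints>
  <constraint name="EBa11(SIMPLICITY,ID,GAS,2014)" index="0" slack="0" dual="1.5"/>
 </linearConstraints>
 <variables>
  <variable name="NewCapacity(SIMPLICITY,GAS_EXTRACTION,2014)" index="0" value="2.5" reducedCost="0"/>
  <variable name="NewCapacity(SIMPLICITY,GAS_EXTRACTION,2015)" index="1" value="0" reducedCost="3.1"/>
 </variables>
</CPLEXSolution>"""

# xpath keeps only the <variable> elements; parser="etree" uses the
# standard-library XML parser, so lxml does not need to be installed.
variables = pd.read_xml(io.StringIO(SAMPLE_SOL), xpath=".//variable", parser="etree")

# Each XML attribute becomes a column; keep the two needed for results processing.
variables = variables[["name", "value"]]
print(variables)
```

With a real file, the io.StringIO(...) argument would be replaced by the path to the solution file, for example "cplex.sol".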

Documentation

To be done if this suggestion is implemented. Using the following commands on the Simplicity repository worked fine with this change:

glpsol -m osemosys_fast.txt -d data.txt --wlp model.lp --check
cplex -c "read model.lp" "optimize" "write cplex.sol"
otoole results cplex csv cplex.sol results csv data config.yaml

Example

For clarity, below is the data structure returned by the ReadCplex._convert_to_dataframe("cplex.sol") method based on the simplicity example

image

@trevorb1 trevorb1 requested a review from willu47 August 21, 2023 19:21
@willu47
Member

willu47 commented Aug 22, 2023

This seems like a much simpler solution to parsing the CPLEX solution file. I suspect that we haven't used read_xml because i) we didn't think to look for it, and ii) the existing CPLEX script worked well enough.

So, I'm fully supportive of this approach. The code is considerably simpler, and I am sure that the pandas implementation will be faster than what we had before anyway.

My only remaining question is how easily can we extract dual values from the CPLEX solution file using this approach?

@trevorb1
Member Author

Great, thanks @willu47!

To summarize the changes in this PR:

  • ReadCplex now inherits from the ReadWideResults class and calls pandas.read_xml(...) in ReadCplex._convert_to_dataframe( .. )
  • TestReadCplex has been updated to reflect reading in raw CPLEX solution files
  • The CLI Examples documentation page has been updated to show how to process CPLEX results
  • The CLI Examples documentation page has been restructured so convert and results examples are now separate
  • Tests have been added for the preprocess.longify_data.check_datatypes() function. I didn't change anything in this function, just noticed that it didn't have tests.

If you have any remaining comments on this implementation, please just let me know. Else, I will go ahead and merge this branch tomorrow. Thanks!!

@trevorb1
Member Author

Hi @willu47! Regarding extracting dual values from CPLEX solution files: I'm not sure what process people currently use to extract dual values, so I am not sure if this process will be any better. But if we wanted to use similar logic to what is implemented in this PR, I think we could use what is shown below! (Note that this snippet removes all dual values that are zero.)

Raw CPLEX Solution
image

Read in Dual Values
image

A notable advantage of this approach is that the data structure returned here is very similar to what is returned from the ReadWideResults._convert_to_dataframe( .. ) method. Therefore, otoole already has logic to process dataframes like this :) A notable disadvantage is that we read in the solution file a second time, separately from reading in the variables. But given that we filter for only the constraints when reading, this may not be a huge performance hit?

@trevorb1
Member Author

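import pandas as pd

# Read only the <constraint> elements from the CPLEX solution file
df = pd.read_xml("sol.sol", xpath=".//constraint", parser="etree")
# Split names such as "Constraint(a,b,c)" into the constraint name and its index
df[["Constraint", "Index"]] = df["name"].str.split("(", n=1, expand=True)
df["Index"] = df["Index"].str.replace(")", "", regex=False)
# Drop zero duals and keep only the columns of interest
df = df[df["dual"] != 0].rename(columns={"dual": "Dual"}).reset_index(drop=True)
df = df[["Constraint", "Index", "Dual"]]
df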

Member

@willu47 willu47 left a comment

Hi @trevorb1 - happy for this to be merged as is.

Can you add the dual value reading as an issue? We should think about how to include this in the results config e.g. signifying that you want to read in the dual values of particular constraints, and then provide an implementation that works for all the solution files supported. It would make sense to read in this information at the same time as the variables, rather than using a second pass through the file.

This could take us more towards creating a wrapper around solvers rather like that provided by linopy, or other Python libraries such as pyomo or PuLP.

Other information which could be useful includes:

  • reducedcost of variables
  • slack of constraints
  • solution quality
  • objective value
  • solving statistics
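As a rough illustration of the single-pass idea, the sketch below parses a solution once with the standard-library ElementTree and pulls variables, constraint duals and the objective value from the same tree. The function name read_solution is hypothetical, the sample XML is hand-written, and the attribute names reducedCost, slack and objectiveValue are assumptions about the CPLEX solution format that should be checked against a real file. It is a starting point for the issue, not a proposed otoole implementation.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for a CPLEX solution file.
# Layout, attribute names and values are illustrative assumptions.
SAMPLE_SOL = """<CPLEXSolution version="1.2">
 <header problemName="model.lp" objectiveValue="100.5"/>
 <linearConstraints>
  <constraint name="EBa11(SIMPLICITY,ID,GAS,2014)" index="0" slack="0" dual="1.5"/>
 </linearConstraints>
 <variables>
  <variable name="NewCapacity(SIMPLICITY,GAS_EXTRACTION,2014)" index="0" value="2.5" reducedCost="0"/>
  <variable name="NewCapacity(SIMPLICITY,GAS_EXTRACTION,2015)" index="1" value="0" reducedCost="3.1"/>
 </variables>
</CPLEXSolution>"""


def read_solution(xml_text):
    """Parse the XML once and return (variables, constraints, objective).

    Hypothetical helper: variables and constraints are lists of dicts,
    objective is a float taken from the <header> element.
    """
    root = ET.fromstring(xml_text)

    # Primal values and reduced costs, one dict per <variable> element
    variables = [
        {
            "name": v.get("name"),
            "value": float(v.get("value")),
            "reducedCost": float(v.get("reducedCost")),
        }
        for v in root.iter("variable")
    ]

    # Dual values and slacks, one dict per <constraint> element
    constraints = [
        {
            "name": c.get("name"),
            "dual": float(c.get("dual")),
            "slack": float(c.get("slack")),
        }
        for c in root.iter("constraint")
    ]

    # Objective value stored as an attribute of the header element
    objective = float(root.find("header").get("objectiveValue"))
    return variables, constraints, objective


variables, constraints, objective = read_solution(SAMPLE_SOL)
print(len(variables), len(constraints), objective)
```

The returned lists of dicts can be passed straight to pandas.DataFrame(...) if a dataframe like the one produced for the variables is needed downstream.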
