
Problem serialization #532

Open
echu opened this issue Jun 24, 2018 · 8 comments

@echu
Contributor

echu commented Jun 24, 2018

@SteveDiamond, any idea if there's a good way to serialize a CVXPY problem?

I know that get_problem_data exists, but that returns the problem data for specific solvers. I basically want to construct some sort of JSON (protobuf, etc.) that contains all the data for expressions to allow me to recreate a CVXPY Problem. This is different from get_problem_data in that I don't want all the data necessary to recreate, say, the input to ECOS or SCS--I just want all the data to recreate the original CVXPY problem.

I essentially want to pickle a Problem object, but would like to avoid all the security issues around python pickles (arbitrary code execution!).
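For readers unfamiliar with the pickle concern, here is a minimal sketch of why loading untrusted pickle data is risky. The `Payload` class is hypothetical, and the harmless built-in `int` stands in for whatever callable an attacker would actually choose.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it at load time.
        # An attacker would pick something far less harmless than int().
        return (int, ("42",))


blob = pickle.dumps(Payload())

# Loading the bytes calls int("42") instead of rebuilding a Payload.
result = pickle.loads(blob)
```

The loaded value is whatever the stored callable returns, so the bytes alone decide which function runs during `pickle.loads`.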

@rileyjmurray
Collaborator

Not that I speak for Steven, but I'd be surprised if there were such a way (at least, for now).

One obstacle is that Expression objects are highly recursive data structures. Even if a user has a small number of Constraint objects, the Expressions involved in those objects might take a massive amount of space if written in a format like JSON.

At a higher level, you say that you "just" want the data to recreate the original CVXPY problem, but, that is necessarily more data than the input to ECOS or SCS!
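The point about recursive Expression objects can be illustrated with a toy example. This is only a sketch using plain dicts as hypothetical stand-ins for expression nodes, not CVXPY's actual classes: a sub-expression that is shared in memory gets written out again at every reference when dumped as nested JSON.

```python
import json


def build_shared(depth):
    """Build nested sums in which both operands are the same object."""
    node = {"op": "var", "name": "x"}
    for _ in range(depth):
        node = {"op": "add", "args": [node, node]}  # child is shared, not copied
    return node


def count_unique_nodes(node, seen=None):
    """Count distinct node objects in memory (the size of the DAG)."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_unique_nodes(a, seen) for a in node.get("args", []))


expr = build_shared(10)
unique_nodes = count_unique_nodes(expr)       # 10 adds plus 1 variable
json_adds = json.dumps(expr).count('"add"')   # one per add node in the JSON text
```

With a depth of 10 there are 11 objects in memory but 1023 `add` nodes in the JSON text, so a tree-shaped format would likely need some node-reference scheme to stay compact.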

@rileyjmurray
Collaborator

rileyjmurray commented Mar 10, 2019

@SteveDiamond actually something recently occurred to me about this issue. Is it possible to serialize and de-serialize cvxpy Problems using pickle? Security issues aside, model serialization is a very useful tool. I think it would be far too much work to create a "secure" serialization protocol, but it might be reasonable to have support at least for pickle.

If cvxpy doesn't currently support pickle (and this can accidentally happen for dozens of different reasons) then it may be worth keeping this issue open.

@SteveDiamond
Collaborator

@rileyjmurray good point! I added a test for pickling and unpickling a problem, and it seems to work.

@bstellato
Contributor

Pickle seems to be an issue again when s.EXTRA_STATS is returned. In particular, the problem becomes unpickleable (is that a word?) when the solver returns objects such as a Gurobi model. The problematic attributes are:

  • problem._solution.attr[s.EXTRA_STATS]
  • problem._solver_stats.extra_stats

If you are running it in a joblib loop you will get lots of errors.
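The failure mode can be reproduced without any solver installed. In this sketch a `threading.Lock` is a stand-in (an assumption, chosen because it cannot be pickled) for a native solver handle, and the `stats` dict is a hypothetical stand-in for the stored solver statistics.

```python
import pickle
import threading

# Plain data pickles fine.
stats = {"solve_time": 0.01}
picklable_before = pickle.loads(pickle.dumps(stats)) == stats

# Stand-in for a native solver handle: a lock cannot be pickled.
stats["extra_stats"] = threading.Lock()

try:
    pickle.dumps(stats)
    failed = False
except (TypeError, pickle.PicklingError):
    failed = True
```

One unpicklable value anywhere in the object graph is enough to make the whole container fail, which is consistent with the errors seen once a solved problem is shipped to worker processes.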

@SteveDiamond
Collaborator

That's unfortunate. What do you think we should do?

@bstellato
Contributor

In my code I just did

    import cvxpy.settings as s

    # Drop solver objects that cannot be pickled (e.g. a Gurobi model).
    if s.EXTRA_STATS in problem._solution.attr:
        problem._solution.attr.pop(s.EXTRA_STATS)
    if hasattr(problem._solver_stats, "extra_stats"):
        del problem._solver_stats.extra_stats

before calling the parallel loop in joblib. However, that defeats the purpose of storing s.EXTRA_STATS to recover information from the low-level solver model object. One alternative would be to pass an option, e.g., serializable, when creating the problem, to avoid storing objects that are not pickleable, but I am not sure how maintainable it would be.

@rileyjmurray
Collaborator

My understanding is that we should only run into pickling issues when dealing with objects from third-party libraries like solver interfaces. If that's the case then we can take advantage of qualitative "serialization" features that are sure to exist for those interfaces. For mosek we have a save_file keyword argument that controls where a mosek Task object is dumped to disk. I'm sure gurobi has an option to export to (and read from) a dozen different file formats; if we don't want to write the model to disk, then we might be able to set the target "file" as a buffer in python memory (or otherwise store that converted low-level data into a serializable python object).
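A rough sketch of the "export, then keep the bytes in memory" idea follows. `dump_to_bytes` and `fake_solver_write` are hypothetical names; the fake writer stands in for whatever file-export call a given solver interface provides, and it is an assumption that the export goes through a file path.

```python
import os
import pickle
import tempfile


def dump_to_bytes(write_to_path, suffix):
    """Call write_to_path(path) on a temporary file and return its contents."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the writer reopens the file by name
    try:
        write_to_path(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)  # nothing is left on disk afterwards


def fake_solver_write(path):
    # Hypothetical stand-in for a solver's own "write model to file" call.
    with open(path, "w") as f:
        f.write("minimize x\nsubject to x >= 1\n")


model_bytes = dump_to_bytes(fake_solver_write, suffix=".lp")

# Unlike the live solver object, the exported bytes survive a pickle round trip.
restored = pickle.loads(pickle.dumps(model_bytes))
```

Storing the exported bytes in place of the live model object would keep the low-level information recoverable while leaving the problem picklable, at the cost of one export per solve.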

@rileyjmurray rileyjmurray reopened this Feb 24, 2021
@bstellato
Contributor

I agree it is not a major issue when dealing with external libraries. I was not referring to serializing to store the problem on disk. I was mostly considering the case when you are solving cvxpy problems in parallel in a loop and you already happen to have an unserializable object inside problem. For example, this is the case when you have solved the problem once before starting the parallel loop.

A simple solution would be to include the code above into a small problem method called, for example, problem.ensure_serializable(). However, I understand the usage might be limited.
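A possible shape for such a helper is sketched below as a free function rather than a method. It is not part of CVXPY, and the `SimpleNamespace` objects are stand-ins for a solved problem. The key string used in the demo is a placeholder; real callers would pass `s.EXTRA_STATS`.

```python
from types import SimpleNamespace


def ensure_serializable(problem, extra_stats_key):
    """Drop solver-specific objects from a solved problem, in place."""
    solution = getattr(problem, "_solution", None)
    attr = getattr(solution, "attr", None)
    if attr is not None:
        attr.pop(extra_stats_key, None)  # no error if the key is absent
    stats = getattr(problem, "_solver_stats", None)
    if stats is not None and hasattr(stats, "extra_stats"):
        # A choice of this sketch: reset to None instead of deleting,
        # so later reads of the attribute still work.
        stats.extra_stats = None
    return problem


# Stand-in for a solved problem holding unpicklable solver objects.
fake = SimpleNamespace(
    _solution=SimpleNamespace(attr={"extra_key": object(), "solve_time": 0.1}),
    _solver_stats=SimpleNamespace(extra_stats=object()),
)
ensure_serializable(fake, "extra_key")
```

The `getattr` guards mean the helper also tolerates a problem that has not been solved yet, which the inline snippet earlier in the thread does not.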
