Pickled solutions are very large #1497

Closed
pgkirsch opened this issue Jul 1, 2020 · 11 comments

Comments

@pgkirsch
Contributor

pgkirsch commented Jul 1, 2020

SolutionArray.save() is a wonderful feature, but it can produce some seriously big files. I recently solved a model with 42,000 free variables and the pickle file was 110 MB! Even the models I solve on a more frequent basis yield 10-20 MB pickle files. In case anyone is curious, the simple text file representation of the solution (the output of SolutionArray.savetxt()) is ~700 kB.

I previously spoke to @bqpd about this and he mentioned that it should be possible to make these files smaller. If it is a trade-off between how much original model data/functionality is preserved and file size, it would be great if there was an argument to specify tiers of data preservation.

@bqpd
Contributor

bqpd commented Jul 2, 2020

Without looking into changing just what is pickled, using cPickle and bz2 takes it from 12 to 3.6 MB:

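try:
    import cPickle  # Python 2
except ImportError:
    import pickle as cPickle  # Python 3: pickle already uses the C implementation
import bz2

class SolutionArray:  # rest of the class elided in this comment

    def save_compressed(self, title="solution", **cpickleargs):
        "Pickle this solution and write it, bz2-compressed, to `title`.pbz2."
        with bz2.BZ2File(title + ".pbz2", "wb") as f:
            cPickle.dump(self, f, **cpickleargs)

    @staticmethod
    def decompress_file(file):
        "Load any bz2-compressed pickle file."
        # Use a context manager so the file handle is closed after loading.
        with bz2.BZ2File(file, "rb") as f:
            return cPickle.load(f)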
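
As a rough, self-contained illustration of the same idea outside of gpkit, the sketch below round-trips an ordinary dictionary through pickle and bz2 and compares file sizes. The `fake_solution` dictionary and the free functions `save_compressed` / `load_compressed` are stand-ins invented for this example, not part of the library, and the size ratio on a real solution will differ.

```python
import bz2
import os
import pickle
import tempfile


def save_compressed(obj, path):
    """Pickle obj and write it through a bz2 stream."""
    with bz2.BZ2File(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Read a bz2-compressed pickle back into memory."""
    with bz2.BZ2File(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a solution: many near-identical entries.
fake_solution = {
    "cost": 1.5,
    "sensitivities": {"x%d" % i: 0.0 for i in range(10000)},
}

tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "solution.pkl")
packed_path = os.path.join(tmpdir, "solution.pbz2")

# Plain pickle for comparison.
with open(plain_path, "wb") as f:
    pickle.dump(fake_solution, f, protocol=pickle.HIGHEST_PROTOCOL)

save_compressed(fake_solution, packed_path)
restored = load_compressed(packed_path)

plain_size = os.path.getsize(plain_path)
packed_size = os.path.getsize(packed_path)
print("plain pickle: %d bytes, bz2 pickle: %d bytes" % (plain_size, packed_size))
```

Highly repetitive data like this compresses far better than a typical solution would, so the printed ratio should be read as an upper bound on what to expect.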

@bqpd
Contributor

bqpd commented Jul 2, 2020

@pgkirsch check out #1498; what does that bring the 110 MB file down to?

There's obviously a lot more that can be done by changing what is pickled as well...

@pgkirsch
Contributor Author

pgkirsch commented Jul 3, 2020

27 MB!

@pgkirsch
Contributor Author

pgkirsch commented Jul 3, 2020

Definitely a performance trade-off, as I'm sure you're aware. In case you're curious: compressing takes ~20 seconds and decompressing takes ~32 seconds (coincidentally, the original solve takes 52 seconds!). With regular pickle it takes 7 seconds to save and 12 seconds to load.
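
One way to explore this size-versus-time trade-off on a given machine is to time the pickling and compression steps separately and at more than one bz2 compression level. The sketch below does that for a made-up `payload` dictionary; the helper `time_call` and the data are assumptions for illustration, and the timings it prints say nothing about the numbers quoted above.

```python
import bz2
import pickle
import time


def time_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical payload standing in for a pickled solution.
payload = {"x%d" % i: i * 1e-9 for i in range(50000)}

raw, t_pickle = time_call(pickle.dumps, payload, protocol=pickle.HIGHEST_PROTOCOL)
rows = [("pickle only", len(raw), t_pickle)]

for level in (1, 9):
    packed, t_pack = time_call(bz2.compress, raw, compresslevel=level)
    # Total save time is pickling plus compression.
    rows.append(("bz2 level %d" % level, len(packed), t_pickle + t_pack))
    # Check that the compressed bytes still decode to the same object.
    assert pickle.loads(bz2.decompress(packed)) == payload

for label, size, seconds in rows:
    print("%-12s %10d bytes  %.4f s" % (label, size, seconds))
```

Lower compression levels generally trade a somewhat larger file for less time, which may be a reasonable middle ground when load time matters.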

What kinds of things could be selectively left out of the pickle?

@bqpd
Contributor

bqpd commented Jul 3, 2020

I think there's something about the way it's including all the constraints that's taking more space than it should. I haven't found any debugging tools which show what's taking up space in a pickle file, so it'll be trial and error figuring out what that might be...
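
A simple, if crude, way to narrow this down is to pickle each top-level entry on its own and compare the byte counts. The sketch below assumes the solution behaves like a nested dictionary; `fake_solution`, `pickled_size`, and `size_report` are hypothetical names for this example. Objects shared between entries are counted once per entry, so the per-key sizes can add up to more than the size of the whole pickle.

```python
import pickle


def pickled_size(obj):
    """Number of bytes obj occupies when pickled on its own."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def size_report(mapping):
    """Return (key, bytes) pairs for a dict's values, largest first."""
    sizes = {key: pickled_size(value) for key, value in mapping.items()}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stand-in for a solution dictionary.
fake_solution = {
    "cost": 1.5,
    "variables": {"x%d" % i: float(i) for i in range(100)},
    "sensitivities": {"c%d" % i: [0.0] * 20 for i in range(1000)},
}

report = size_report(fake_solution)
total = pickled_size(fake_solution)
for key, size in report:
    print("%-15s %8d bytes (%.1f%% of whole pickle)" % (key, size, 100.0 * size / total))
```

Applying `size_report` again to the largest entry gives a breakdown one level deeper.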

@bqpd
Contributor

bqpd commented Jul 10, 2020

(note: down to 1.9 MB, with much faster loads, in the latest commit on #1498)

@bqpd
Contributor

bqpd commented Jul 13, 2020

Looking at where the bulk is coming from, 92% is in sol["sensitivities"]["constraints"], and another 2% in sol["sensitivities"]["variables"], both of which are mostly near-zero values. Only storing constraints with |sensitivity| >= 0.01 results in a file one-quarter the original size.

Further work should be done to determine just why these constraint objects are so large.
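
The thresholding idea can be sketched on a plain dictionary as follows. This is only an illustration under simplifying assumptions: the keys here are strings and the values are scalars, whereas the real sol["sensitivities"]["constraints"] is keyed by constraint objects, and `drop_insensitive` is a hypothetical helper rather than a gpkit function.

```python
import pickle


def drop_insensitive(sensitivities, threshold=0.01):
    """Keep only entries whose absolute sensitivity is at least threshold."""
    return {
        key: value
        for key, value in sensitivities.items()
        if abs(value) >= threshold
    }


# Hypothetical data: one constraint in ten is meaningfully sensitive.
constraint_senss = {
    "c%d" % i: (0.5 if i % 10 == 0 else 1e-9) for i in range(1000)
}

trimmed = drop_insensitive(constraint_senss)
full_size = len(pickle.dumps(constraint_senss))
trimmed_size = len(pickle.dumps(trimmed))
print("kept %d of %d entries; %d -> %d bytes"
      % (len(trimmed), len(constraint_senss), full_size, trimmed_size))
```

Filtering like this discards information, which is why applying it by default deserves some caution.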

@bqpd
Contributor

bqpd commented Jul 14, 2020

@pgkirsch between the branches merged above, pickle size should be down to about a twelfth of what it was before. It can probably be reduced by another factor of four by cutting insensitive constraints, but I'm a little more hesitant to do that by default.

@bqpd closed this as completed on Jul 14, 2020

@pgkirsch
Contributor Author

This is great, thanks so much @bqpd!

@pgkirsch
Contributor Author

@bqpd 107 MB --> 4 MB!

@bqpd
Contributor

bqpd commented Jul 15, 2020

nice!!
