[IDEA] Generation of problem files with mixed-representation index sets #602

whart222 · 2018-07-04T18:34:42Z

Fixes #567 (partial).

Summary/Motivation:

We recently made a small change that makes the default representation of Sets in Pyomo rely on insertion order. This PR completes that activity by resolving the simple test failures outlined in #567. Specifically, these changes allow mixed-representation sets to be used as index sets. The result is that problem files can be generated with determinism=0, which does no sorting of index values.

However, these changes only work with CPython (3.6, 3.7) and PyPy. Starting with Python 3.6, the CPython and PyPy implementations preserve insertion order in their dictionaries, and as of Python 3.7 this property is part of the Python language specification. This PR exploits that property to provide deterministic file generation without sorting index values.
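As a rough illustration of the property this relies on (a minimal sketch, not code from this PR; `ordered_unique` is a hypothetical helper), a plain dict can deduplicate mixed-type index values while keeping first-seen order, with no comparisons between the values:

```python
def ordered_unique(items):
    """Return the distinct items in first-seen order, without comparing them."""
    # dict keys preserve insertion order (a language guarantee from Python 3.7;
    # an implementation detail of CPython 3.6 and PyPy).
    return list(dict.fromkeys(items))


index_values = ["b", 2, "a", 1, 2, "b"]
print(ordered_unique(index_values))
```

The order depends only on the order in which values were inserted, so it is reproducible from run to run on those interpreters.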

NOTE: This is a partial fix of #567, since it only applies to more recent versions of Python. There is no clear motivation for extending this fix to older versions of Python. The motivation for this PR is that sorting is not done during file generation, and hence files are generated more quickly (especially for models with constraints that have large index sets).

NOTE: Sorted ordering of mixed-representation sets often works for Python 2.7 (using determinism=1), since that version allows for comparison of more data types.

NOTE: This PR does not simply use the sorted_robust() function to sort when generating problem files. The sorted_robust() function is significantly slower than sorted() with mixed-representation data. Thus using sorted_robust() with Python 3.x would make it appear that Python 3 is slower than Python 2, when in fact there are faster alternatives.
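For context, here is a small sketch of why mixed-representation data needs special handling in Python 3. The helper `sort_with_fallback` is hypothetical and is not Pyomo's sorted_robust(); it only shows one common fallback strategy, assuming values of the same type are comparable with each other:

```python
mixed = [3, "a", 1, "b"]

# In Python 3, int and str are not mutually comparable, so sorted() raises.
try:
    sorted(mixed)
    comparable = True
except TypeError:
    comparable = False
print(comparable)


def sort_with_fallback(values):
    """Sort values; if they are not mutually comparable, group by type name first."""
    try:
        return sorted(values)
    except TypeError:
        # Key on (type name, value) so only same-typed values are compared.
        return sorted(values, key=lambda v: (type(v).__name__, v))


print(sort_with_fallback(mixed))
```

A fallback like this pays for a failed first attempt plus a Python-level key function call per element, which is one plausible reason a robust sort is slower than plain sorted() on mixed data.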

Changes proposed in this PR:

Changing iteration in Set objects to use the insertion order by default.
Adding tests that confirm that models with mixed-representation sets can be solved.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

I agree my contributions are submitted under the BSD license.
I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

whart222 · 2018-07-04T18:38:23Z

@jsiirola @ghackebeil I'm marking this as WIP because there are known test failures in PySP and GDP. The tests pass for pyomo.repn and pyomo.solvers, so I'm cautiously optimistic that we can safely disable the sorting logic in the writers. However, I suspect that PySP and GDP are generating models nondeterministically (e.g. through reformulations), and hence we have baseline test failures.

NOTE: I've currently disabled the sorting logic. However, an alternative is to leave that logic in place, but make not sorting the default writer behavior. This would allow testing of components where models are generated nondeterministically. However, I think that we do ultimately want determinism throughout Pyomo.

whart222 · 2018-07-04T19:16:27Z

@jsiirola @ghackebeil Hold off on reviewing this and resolving issues. I did some more careful testing, and there's an issue that I haven't accounted for yet. While we can easily make sets ordered, it's harder to make sparse indexing of components ordered the same way without using OrderedDict (which has performance implications, I think).
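To make the sparse-indexing concern concrete, here is a minimal sketch using plain Python containers as stand-ins (the names `index_set` and `data` are hypothetical, and this is not necessarily how the PR resolves it). The component's data dict is filled in a different order than the index set, so one way to recover the set's order is to walk the set and skip missing keys:

```python
# Insertion-ordered index values (a stand-in for a Pyomo Set).
index_set = ["x", 3, "y", 1]

# Sparse component data, populated in a different order than the index set.
data = {}
data[1] = "c1"
data["x"] = "cx"

# Iterate in index-set order, skipping indices that were never populated.
in_set_order = [(i, data[i]) for i in index_set if i in data]
print(in_set_order)
```

The trade-off is that iteration costs time proportional to the size of the index set rather than to the number of populated entries.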

jsiirola · 2018-07-04T19:38:08Z

FWIW, I would rather work this issue on the set-rewrite branch instead of continuing to backpatch the current implementation, because in my experience working there, we need some rather significant changes to the design of sets to cleanly support this.

whart222 · 2018-07-04T20:07:36Z

@jsiirola @ghackebeil OK. I think I resolved the issue.

whart222 · 2018-07-04T20:12:45Z

@jsiirola I don't quite see that this is a backpatch. I think these changes here are self-sufficient. I understand that we're working on the same code, but I'm expecting that the changes in that branch will simply replace the logic in sets.py. The changes to other files in this branch are orthogonal to that change. Right?

whart222 · 2018-07-04T20:13:52Z

BTW, my last change didn't require the use of OrderedDict. It was merely a change to the iterator for Set objects.

whart222 · 2018-07-05T06:16:04Z

@jsiirola This last change may have resolved the GDP test failures. I'll try to confirm tomorrow.

whart222 · 2018-07-05T15:19:51Z

@jsiirola It looks like the GDP test failures are gone in Python 3.6, but there are some test failures in Python 3.5. This is likely due to nondeterminism in the use of dict objects. I'll look into this.

whart222 · 2018-07-05T15:20:57Z

@ghackebeil On Python 3.6, there remain some test failures for PySP under tests/convert. I'm having trouble replicating these on old-sisu. Let me know if you have any thoughts about what might be going on here.

ghackebeil · 2018-07-05T17:53:12Z

They failed locally for me, and the baseline updates I applied made sense to me. If they were not failing on another 3.6 build, that might indicate that there is another source of non-determinism. Would you mind testing on that machine again with the new baselines and let me know if it fails?

whart222 · 2018-07-06T04:17:25Z

@ghackebeil I just reran the Travis tests, and there are 4 failures for PySP in Python 3.7/3.6, and 13 failures for Python 3.5/2.7. I can't replicate these tests on old-sisu. Do you know if I need to have a specific package installed to do that?

I could imagine that there are additional sources of nondeterminism in PySP, though pyomo.core/pyomo.repn tests seem stable. That's where the components are defined and the problem writers are defined.

whart222 · 2018-07-07T05:01:42Z

@ghackebeil I changed the logic to default to determinism=1 (sorting of indices). Hence, I reverted your baseline changes.

ghackebeil · 2018-07-07T16:00:29Z

So are the changes in this PR now solely related to Set pprint baselines?

whart222 · 2018-07-08T18:32:54Z

@ghackebeil I do have tests that confirm that writers work in Python 3.6, 3.7 with determinism=0.

whart222 · 2018-07-08T18:39:25Z

@jsiirola @ghackebeil I think my latest commits will resolve the outstanding test failures in this PR. In recent changes, I've made determinism=1 the default for writers again. Thus, the current code will fail if a user uses an incomparable index set in Python 3.x and tries to write a problem file.

As John has noted, we could use the sorted_robust() function in the code (in the Block class?) to resolve this issue. However, I did some simple testing that shows that sorted_robust() is 5-10x slower for sets that contain string and integer values:

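```python
from pyomo.core.base.misc import sorted_robust
import time
import random

random.seed(12938470)

# 100,000 random floats: homogeneous, directly comparable data.
core = []
for i in range(100000):
    core.append(random.random())

# Case 1: built-in sorted() on floats only.
start = time.time()
ans = list(sorted(core))
print("Baseline %f" % (time.time()-start))

# Case 2: built-in sorted() on floats plus one int (still comparable).
new1 = core + [1]
start = time.time()
ans = list(sorted(new1))
print("Baseline %f" % (time.time()-start))

# Case 3: sorted_robust() on the same comparable data.
new1 = core + [1]
start = time.time()
ans = list(sorted_robust(new1))
print("Baseline %f" % (time.time()-start))

# Case 4: sorted_robust() on floats plus one str (not comparable in Python 3).
new2 = core + ['1']
start = time.time()
ans = list(sorted_robust(new2))
print("Baseline %f" % (time.time()-start))
```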

On my Mac, I get statistics like:

```
Baseline 0.031980
Baseline 0.032028
Baseline 0.034850
Baseline 0.289933
```

So, this shouldn't impact our test performance, and I would expect it wouldn't impact most users. But using sorted_robust() would create a situation where a user with an incomparable set sees a nontrivial slowdown when moving from Python 2.7 to 3.x. And that is despite the fact that sorting isn't necessary to generate a reproducible problem representation (which I think is really the main point of the determinism option).

So, I'm reluctant to make this change. Thoughts?

codecov-io · 2018-07-08T19:53:31Z

Codecov Report

Merging #602 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #602      +/-   ##
=========================================
- Coverage   67.21%   67.2%   -0.01%     
=========================================
  Files         392     392              
  Lines       62208   62215       +7     
=========================================
+ Hits        41812   41814       +2     
- Misses      20396   20401       +5

| Impacted Files | Coverage Δ |
|---|---|
| pyomo/core/base/sets.py | `85.82% <100%> (-0.12%)` ⬇️ |
| pyomo/repn/plugins/ampl/ampl_.py | `87.71% <0%> (-0.1%)` ⬇️ |
| pyomo/pysp/ph.py | `61.14% <0%> (-0.1%)` ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8be17fd...8ab6e17.

whart222 · 2018-07-15T16:15:26Z

@jsiirola @ghackebeil Can we finalize this PR?

jsiirola · 2018-07-16T18:32:38Z

I assumed you were still working on it (it is marked [WIP]).

I also do not understand the point of this PR. You are adding a bunch of tests, but they seem to only apply to one version of Python (IMHO, not good). You are also changing the behavior of Set's pprint() (apparently it is only sorted for sorted sets and otherwise uses insertion order, but you have two ways of handling insertion order?). Is there a reason you are not using the sorted_robust utility instead of sorted in that change?

whart222 · 2018-09-01T05:17:58Z

@jsiirola @carldlaird I reworked the PR description to clarify the scope/intent of this change. Note that this PR does not change the default behavior in Pyomo, as was originally intended. That proved problematic (see #677).

jsiirola · 2018-09-05T19:18:22Z

I am not in favor of this PR, as my understanding is that it will potentially lead to Pyomo generating different solver input files on different versions of Python (before/after Python 3.6). Please correct me if I am mistaken.

FWIW, the design I am experimenting with in the set-rewrite branch could address sorting through a different approach: Sets would support two methods, .sorted() and .ordered(). The difference is that the latter is guaranteed to be ordered (i.e., deterministic) but not necessarily sorted. This allows the Set to implement efficient deterministic iteration without requiring that iteration to be strictly sorted. This will allow things like the writers to avoid sorting in the bulk of cases. That, in turn, lessens the sort penalty, which I already don't think is very significant, as I have not seen it show up on profiles...
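A toy sketch of the distinction, under stated assumptions: `SketchSet` is a hypothetical class, not the set-rewrite implementation, and it leans on dict insertion order, so it is only deterministic on Python 3.7+ (or CPython 3.6 and PyPy).

```python
class SketchSet:
    """Hypothetical set that remembers insertion order (not a Pyomo class)."""

    def __init__(self, items=()):
        # dict keys give uniqueness plus insertion order.
        self._data = dict.fromkeys(items)

    def add(self, item):
        self._data[item] = None

    def ordered(self):
        """Deterministic iteration order; no comparisons between members."""
        return list(self._data)

    def sorted(self):
        """Sorted order; requires members to be mutually comparable."""
        return sorted(self._data)


s = SketchSet(["c", "a", "b"])
print(s.ordered())
print(s.sorted())

# Mixed-representation members can still be iterated deterministically.
m = SketchSet([1, "a", 2])
print(m.ordered())
```

A writer that only needs reproducible output could call the ordered variant and never pay for, or fail on, a sort.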

whart222 · 2020-03-18T18:41:59Z

I updated #567 to document that we are closing this PR because of inactivity.

whart222 added 9 commits July 4, 2018 10:57

Adding tests that exercise the set repn

91c2678

These tests verify that implicitly ordered sets work as expected, including when the set members are not comparable.

Change to pprint.

ae9eea9

Only sort data when the sorted order is set.

Disabling the sorter in the NL writer

d2c0540

Disabling sorting logic

281ecd8

Disabling sorting of output

4115000

Adding comparison tests

a70a91f

Also updating a baseline test where ordering changed with the default set order

Changed semantics of test w/o ordering

fe073ec

Instead of randomizing, this simply inserts components in order

Various updates

4c3d4d4

1. Updating files for small16 tests 2. Disabling GAMS/Baron tests by changing their testname 3. Updating small16 GAMS/Baron baselines

Merge remote-tracking branch 'origin/master' into issue_567

771533c

whart222 added 3 commits July 4, 2018 13:56

Set iterator logic change

ab2bee0

The iterator now depends on the set data iterator, which reflects the fixed ordering of the underlying set.

Baseline updates

7561148

Updated the small16 problem to ensure that it could generate differences when nondeterminism exists in the model.

Adding small17 example

622057d

Has an indexed constraint that is sparsely populated.

Updating pysp baselines

3ffa0b2

updating pysp baselines

3760178

whart222 added 2 commits July 5, 2018 21:20

Skipping tests in py35 and earlier

ac2c107

Fixing tests

30c56b5

Cannot test small16 and small17 with file_determinism=3

blnicho changed the title ~~(WIP) Deterministic generation of problem files~~ [WIP] Deterministic generation of problem files Jul 6, 2018

whart222 added 2 commits July 6, 2018 21:19

Change to make determinism=1 the default

48bce2d

This should obviate testing baseline changes outside of pyomo.repn. The small16 and small17 tests are tested with determinism=0, since that is the only mode suitable for models with mixed representations.

Reverting baselines from master

8c883df

whart222 added 3 commits July 6, 2018 22:35

Reverting changes

224bcba

Reverting changes

e7a250c

Misc updates

99ea4d3

whart222 added 3 commits July 8, 2018 11:20

Disable small 16/17 tests

c9be5a0

For python version < 3.6, don't run these tests.

Updating documentation

86ba3c2

Merge branch 'issue_567' of https://github.com/Pyomo/pyomo into issue…

564e23a

…_567

whart222 added 5 commits August 31, 2018 21:57

Merge remote-tracking branch 'origin/master' into issue_567

99c3331

Reverting changes to Set writer

77ce864

Using sorted_robust() to sort all sets. This method is slow for mixed representations, but that doesn't matter when writing diagnostic output.

Updating baseline

c2cd7ef

Commenting out redundant code.

335aa4d

Removing temporary results file.

8ab6e17

This should never have been committed.

whart222 changed the title ~~[WIP] Deterministic generation of problem files~~ Generation of problem files with mixed-representation index sets Sep 1, 2018

whart222 requested review from jsiirola and carldlaird September 1, 2018 05:11

pyomo-autotest added the AT: STALE label Oct 9, 2018

blnicho assigned carldlaird Feb 19, 2019

jsiirola changed the title ~~Generation of problem files with mixed-representation index sets~~ [IDEA] Generation of problem files with mixed-representation index sets May 7, 2019

whart222 mentioned this pull request Mar 18, 2020

Should Pyomo allow indexing sets whose members are not comparable? #567

Closed

whart222 closed this Mar 18, 2020

whart222 deleted the issue_567 branch March 18, 2020 18:42
