In [1]:
import json

When I run all regression tests with the develop branch before my updates, I get the following: 

"Failure during regression testing @socrates for test(s): 1700, 3184, 5063, 9923."

When I run the first 200 tests after my updates (including your hotfix), only 36 of them fail, i.e. about 18 percent.

(After the updates, the ambiguity section of the test dictionary is simply ignored.)

In [2]:
# load the regression_vault:
with open('../../respy/tests/resources/regression_vault.respy.json') as j:
    vault = json.load(j)
    
# list of tests that failed after the update:
failed = [1, 4, 11, 19, 34, 37, 46, 55, 62, 63, 67, 70, 81, 85, 88, 127,
          131, 135, 137, 139, 145, 147, 149, 151, 155, 164, 165, 167, 175,
          180, 182, 183, 184, 185, 194, 199.]

Many of the non-failures are not surprising, since some models don't use ambiguity, have myopic agents, or have only one period. Based on those criteria, I construct a list of tests that should fail:

In [3]:
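should_fail = []
# each vault entry is a pair; only the test dictionary is needed here
for i, (test, _) in enumerate(vault[:200]):
    expected_to_pass = False
    # no ambiguity: the ambiguity coefficient is zero
    # I'm not completely sure what to expect if the ambiguity coeff is 0 but not fixed
    # In practice most of those cases run through.
    if test['AMBIGUITY']['coeffs'][0] == 0.0: # and test['AMBIGUITY']['fixed'][0] is True:
        expected_to_pass = True
    # only one period
    if test['BASICS']['periods'] == 1:
        expected_to_pass = True
    # myopic agents: the first BASICS coefficient is zero
    if test['BASICS']['coeffs'][0] == 0.0:
        expected_to_pass = True

    # none of the criteria apply, so ignoring ambiguity should change the result
    if not expected_to_pass:
        should_fail.append(i)

print(should_fail)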

[1, 4, 11, 16, 19, 24, 34, 37, 44, 46, 55, 62, 67, 70, 81, 85, 92, 98, 102, 115, 117, 127, 131, 135, 137, 139, 145, 147, 149, 151, 155, 164, 165, 167, 168, 175, 180, 183, 184, 185, 194]


Next we construct a list of tests that should fail according to the criteria but don't:

In [4]:
surprisingly_passing = []
for test in should_fail:
    if test not in failed:
        surprisingly_passing.append(test)
surprisingly_passing

[16, 24, 44, 92, 98, 102, 115, 117, 168]

I did not find a strong pattern among the surprisingly passing tests, but many of them have a very low delta, a very low ambiguity level, or both.
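
One way to check this impression is to list both parameters side by side for the tests in question. The following is only a minimal sketch: it assumes, as in the criteria above, that `test['BASICS']['coeffs'][0]` holds delta and `test['AMBIGUITY']['coeffs'][0]` holds the ambiguity level. The helper name `summarize_params` and the toy vault are hypothetical and only there to make the example runnable on its own.

```python
def summarize_params(vault, indices):
    """Return (index, delta, ambiguity level) for the selected tests.

    Assumes each vault entry is a (test_dict, expected_value) pair.
    """
    rows = []
    for i in indices:
        test, _ = vault[i]
        delta = test['BASICS']['coeffs'][0]      # assumed to be delta
        level = test['AMBIGUITY']['coeffs'][0]   # assumed to be the ambiguity level
        rows.append((i, delta, level))
    return rows


# Toy stand-in for the real vault (made-up values, not taken from the vault).
toy_vault = [
    ({'BASICS': {'coeffs': [0.95]}, 'AMBIGUITY': {'coeffs': [0.02]}}, None),
    ({'BASICS': {'coeffs': [0.01]}, 'AMBIGUITY': {'coeffs': [0.0001]}}, None),
]

for i, delta, level in summarize_params(toy_vault, [0, 1]):
    print(f"test {i}: delta={delta:.4f}, ambiguity level={level:.4f}")
```

In the notebook itself the call would be `summarize_params(vault, surprisingly_passing)`.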

Next we construct a list of tests that should not fail but do:

In [5]:
surprisingly_failing = []
for test in failed:
    if test not in should_fail:
        surprisingly_failing.append(test)
surprisingly_failing

[63, 88, 182, 199.0]
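
Both comparisons are set differences, so they can also be computed in one step. The sketch below uses the two lists exactly as they appear above; the function name `compare` is hypothetical. It converts the entries of `failed` to integers first, so a stray float such as `199.` is reported as `199`.

```python
def compare(failed, should_fail):
    """Return (surprisingly_passing, surprisingly_failing) as sorted lists."""
    failed_set = {int(t) for t in failed}   # normalize floats such as 199.
    expected = set(should_fail)
    surprisingly_passing = sorted(expected - failed_set)
    surprisingly_failing = sorted(failed_set - expected)
    return surprisingly_passing, surprisingly_failing


failed = [1, 4, 11, 19, 34, 37, 46, 55, 62, 63, 67, 70, 81, 85, 88, 127,
          131, 135, 137, 139, 145, 147, 149, 151, 155, 164, 165, 167, 175,
          180, 182, 183, 184, 185, 194, 199.]
should_fail = [1, 4, 11, 16, 19, 24, 34, 37, 44, 46, 55, 62, 67, 70, 81, 85,
               92, 98, 102, 115, 117, 127, 131, 135, 137, 139, 145, 147, 149,
               151, 155, 164, 165, 167, 168, 175, 180, 183, 184, 185, 194]

surprisingly_passing, surprisingly_failing = compare(failed, should_fail)
print(surprisingly_passing)   # [16, 24, 44, 92, 98, 102, 115, 117, 168]
print(surprisingly_failing)   # [63, 88, 182, 199]
```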