
[FLOC-4452] Delay invariant checks until all diff operations have been applied #2839

Merged
merged 26 commits into from
Jul 11, 2016


wallrj
Contributor

@wallrj wallrj commented Jun 21, 2016

…r so as not to trigger the invariant check until all attributes have been set.
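The failure mode this PR addresses can be sketched without pyrsistent. As the stack traces below show, pyrsistent checks a class's global invariants every time a new instance is built, so applying a diff one attribute at a time can build an intermediate record that breaks an invariant, even when the final state is valid. The minimal model below is an assumption, not flocker code: `NodeStateLike`, `apply_one_at_a_time` and `apply_batched` are hypothetical names, and the all-or-none invariant mirrors the "Either all or none of ... must be set" error in the logs.

```python
from dataclasses import dataclass, replace
from typing import Optional


class InvariantError(Exception):
    """Raised when a record is constructed in a state its invariant forbids."""


@dataclass(frozen=True)
class NodeStateLike:
    # Hypothetical stand-in for a pyrsistent class with a global invariant.
    hostname: str
    paths: Optional[dict] = None
    manifestations: Optional[dict] = None
    devices: Optional[dict] = None

    def __post_init__(self):
        # Checked on every construction, including each call to replace().
        values = (self.paths, self.manifestations, self.devices)
        if any(v is None for v in values) and any(v is not None for v in values):
            raise InvariantError(
                "Either all or none of paths, manifestations, devices must be set."
            )


def apply_one_at_a_time(state, changes):
    # Each step builds a new immutable record, so each step is checked.
    for key, value in changes.items():
        state = replace(state, **{key: value})
    return state


def apply_batched(state, changes):
    # Collect every change and build one new record, so the invariant
    # only ever sees the final state (the idea behind using an evolver).
    return replace(state, **changes)
```

With an empty node state, `apply_batched` sets all three attributes at once and succeeds, while `apply_one_at_a_time` fails on its first step because only `paths` is set.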
@wallrj
Contributor Author

wallrj commented Jun 22, 2016

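Well, I think I've fixed one version of this error. Here are the counts of the "Either all or none of" invariant error in the acceptance-test logs for this branch, followed by the counts for the cancel-timer-FLOC-4450 branch for comparison: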

[root@ip-172-31-0-243 ~]# find /data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/ -type f -name 'remote_logs.log' | xargs grep --count "Either all or none of"
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/1/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/1/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/1/archive/remote_logs.log:6
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/1/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/__main_multijob/workspace/run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/__main_multijob/workspace/run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/1/archive/remote_logs.log:4
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/1/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/2/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/1/archive/remote_logs.log:0
[root@ip-172-31-0-243 ~]# find /data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/ -type f -name 'remote_logs.log' | xargs grep --count "Either all or none of"
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/1/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/1/archive/remote_logs.log:16
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/2/archive/remote_logs.log:8
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/1/archive/remote_logs.log:14
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/1/archive/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/__main_multijob/workspace/run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/__main_multijob/workspace/run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/remote_logs.log:0
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/5/archive/remote_logs.log:12
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/1/archive/remote_logs.log:16
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/1/archive/remote_logs.log:8
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/2/archive/remote_logs.log:8
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/1/archive/remote_logs.log:22
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/2/archive/remote_logs.log:16
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/1/archive/remote_logs.log:8
/data/jenkins-jobs/ClusterHQ-flocker/jobs/cancel-timer-FLOC-4450/jobs/_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/2/archive/remote_logs.log:8

http://ci-live.clusterhq.com:8080/job/ClusterHQ-flocker/job/invariant-error-FLOC-4452/view/cron/job/_run_acceptance_on_AWS_CentOS_7_with_EBS/ still has the "Either all or none of" invariant error, but the stack trace suggests that those instances were not running the updated code, which is worrying and, for now, inexplicable:

cb133303-63a6-4542-acb5-4c85592e808e -> /1
2016-06-21 16:30:20.238646Z
  message_type: 'twisted:log'
  _HOSTNAME: 'ip-172-31-0-162.us-west-1.compute.internal'
  _PROCESS_NAME: 'flocker-container-agent.service'
  error: True
  message: 'Unhandled Error
         |  Traceback (most recent call last):
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1021, in _commandReceived
         |      deferred = self.dispatchCommand(box)
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1079, in dispatchCommand
         |      return maybeDeferred(responder, box)
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
         |      result = f(*args, **kw)
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1166, in doit
         |      return maybeDeferred(aCallable, **kw).addCallback(
         |  --- <exception caught here> ---
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
         |      result = f(*args, **kw)
         |    File "/opt/flocker/lib/python2.7/site-packages/flocker/control/_protocol.py", line 1152, in cluster_updated_diff
         |      self._current_state
         |    File "/opt/flocker/lib/python2.7/site-packages/flocker/control/_diffing.py", line 115, in apply
         |      obj = c.apply(obj)
         |    File "/opt/flocker/lib/python2.7/site-packages/flocker/control/_diffing.py", line 79, in apply
         |      return obj.transform(self.path, self.value)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_pclass.py", line 136, in transform
         |      return transform(self, transformations)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 65, in transform
         |      r = _do_to_path(r, path, command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
         |      return _update_structure(structure, kvs, path[1:], command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 110, in _update_structure
         |      result = _do_to_path(v, path, command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
         |      return _update_structure(structure, kvs, path[1:], command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 110, in _update_structure
         |      result = _do_to_path(v, path, command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
         |      return _update_structure(structure, kvs, path[1:], command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 113, in _update_structure
         |      return e.persistent()
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_precord.py", line 149, in persistent
         |      check_global_invariants(result, cls._precord_invariants)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_field_common.py", line 22, in check_global_invariants
         |      raise InvariantException(error_codes, (), \'Global invariant failed\')
         |  pyrsistent._checked_types.InvariantException: Global invariant failed, invariant_errors=[Either all or none of set([\'paths\', \'manifestations\', \'devices\']) must be set.], missing_fields=[]
         |  '

(Note that there's no sign of the new _EvolverProxy.commit in that stack trace.)
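For comparison, the later stack trace from the GCE Ubuntu build does show the new code path: the diff's apply method in flocker/control/_diffing.py passes a proxy through each change (proxy = c.apply(proxy)), and a transform method there calls self.commit(). The sketch below is a simplified, hypothetical model of that idea on plain dicts, not flocker's implementation; EvolverProxy, apply_diff and check_invariant are illustrative names. Consecutive set operations on the same record are batched, and the invariant is checked once per batch when it is committed.

```python
class InvariantError(Exception):
    """Raised when a record would be built in a forbidden state."""


def check_invariant(record):
    # Hypothetical all-or-none invariant, modelled on the error in the logs.
    keys = ("paths", "manifestations", "devices")
    present = [record.get(key) is not None for key in keys]
    if any(present) and not all(present):
        raise InvariantError("Either all or none of %s must be set." % (keys,))


class EvolverProxy:
    """Batch consecutive sets on one record; validate once per batch."""

    def __init__(self, state):
        self._state = state   # mapping: node name -> record (a dict)
        self._target = None   # node whose record has pending changes
        self._pending = {}    # attribute -> new value, not yet validated

    def set(self, node, key, value):
        if node != self._target:
            # Switching to another record: flush the previous batch first.
            self.commit()
            self._target = node
        self._pending[key] = value
        return self

    def commit(self):
        if self._target is not None:
            record = {**self._state[self._target], **self._pending}
            check_invariant(record)  # the only invariant check for the batch
            self._state = {**self._state, self._target: record}
        self._target, self._pending = None, {}
        return self._state


def apply_diff(state, operations):
    """Apply (node, key, value) set operations, committing at the end."""
    proxy = EvolverProxy(state)
    for node, key, value in operations:
        proxy.set(node, key, value)
    return proxy.commit()
```

The original state is never mutated, and a diff that leaves a record half-set is still rejected, because the final commit validates the batch.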

@wallrj
Contributor Author

wallrj commented Jun 22, 2016

There's another example of the invariant error, this time in a different PClass:

grep 'Global invariant' /data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452/jobs/_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/1/archive/remote_logs.log
...
Unhandled Error
Traceback (most recent call last):
  File "/opt/flocker/local/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1021, in _commandReceived
    deferred = self.dispatchCommand(box)
  File "/opt/flocker/local/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1079, in dispatchCommand
    return maybeDeferred(responder, box)
  File "/opt/flocker/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/opt/flocker/local/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1166, in doit
    return maybeDeferred(aCallable, **kw).addCallback(
--- <exception caught here> ---
  File "/opt/flocker/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/opt/flocker/local/lib/python2.7/site-packages/flocker/control/_protocol.py", line 1157, in cluster_updated_diff
    self._current_configuration
  File "/opt/flocker/local/lib/python2.7/site-packages/flocker/control/_diffing.py", line 179, in apply
    proxy = c.apply(proxy)
  File "/opt/flocker/local/lib/python2.7/site-packages/flocker/control/_diffing.py", line 81, in apply
    self.path[:-1], lambda o: o.set(self.path[-1], self.value)
  File "/opt/flocker/local/lib/python2.7/site-packages/flocker/control/_diffing.py", line 130, in transform
    self.commit()
  File "/opt/flocker/local/lib/python2.7/site-packages/flocker/control/_diffing.py", line 155, in commit
    target,
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_pclass.py", line 136, in transform
    return transform(self, transformations)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 65, in transform
    r = _do_to_path(r, path, command)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
    return _update_structure(structure, kvs, path[1:], command)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 110, in _update_structure
    result = _do_to_path(v, path, command)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
    return _update_structure(structure, kvs, path[1:], command)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 110, in _update_structure
    result = _do_to_path(v, path, command)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
    return _update_structure(structure, kvs, path[1:], command)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 113, in _update_structure
    return e.persistent()
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_pclass.py", line 248, in persistent
    return self._pclass_evolver_original.__class__(**self._pclass_evolver_data)
  File "/opt/flocker/local/lib/python2.7/site-packages/flocker/control/_model.py", line 491, in __new__
    return PClass.__new__(cls, **kwargs)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_pclass.py", line 69, in __new__
    check_global_invariants(result, cls._pclass_invariants)
  File "/opt/flocker/local/lib/python2.7/site-packages/pyrsistent/_field_common.py", line 22, in check_global_invariants
    raise InvariantException(error_codes, (), 'Global invariant failed')
pyrsistent._checked_types.InvariantException: Global invariant failed, invariant_errors=[Application(memory_limit=None, name=u'flocker_acceptance_obsolete_test_containers_ContainerAPITests_test_move_container_with_dataset-975570', links=LinkPSet([]), environment=pmap({}), command_line=UnicodePVector([u'python', u'-c', u'"""\nHTTP server that writes data to a specified file on POST, or reads and\nreturns data from a specified file on GET.\n"""\n\nfrom sys import argv\n\ntry:\n    from urlparse import parse_qs\nexcept ImportError:\n    from cgi import parse_qs\n\nfrom BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer\n\n\nclass Handler(BaseHTTPRequestHandler):\n    def do_POST(s):\n        length = int(s.headers[\'content-length\'])\n        postvars = parse_qs(\n            s.rfile.read(length),\n            keep_blank_values=1\n        )\n        try:\n            with open(argv[1] + \'/test\', "w") as f:\n                f.write(postvars["data"][0])\n        except Exception as e:\n            s.wfile.write(str(e.__class__) + ": " + str(e))\n        s.send_response(200)\n        s.end_headers()\n        s.wfile.write(b"ok")\n        s.wfile.close()\n\n    def do_GET(s):\n        s.send_response(200)\n        s.end_headers()\n        if len(argv) > 1:\n            try:\n                with open(argv[1] + \'/test\', "r") as f:\n                    data = f.read()\n            except Exception as e:\n                s.wfile.write(str(e.__class__) + ": " + str(e))\n            else:\n                s.wfile.write(data)\n        else:\n                s.wfile.write(b"ok")\n        s.wfile.close()\n\nhttpd = HTTPServer((b"0.0.0.0", 8080), Handler)\nhttpd.serve_forever()\n', u'/data']), image=DockerImage(tag=u'2.7-slim', repository=u'python'), restart_policy=RestartNever(), volume=AttachedVolume(mountpoint=FilePath('/data'), manifestation=Manifestation(primary=True, dataset=Dataset(deleted=False, dataset_id=u'4bda5cd0-9377-45e5-ad53-441f12db38ba', 
maximum_size=1073741824, metadata=pmap({})))), running=True, ports=PortPSet([Port(internal_port=8080, external_port=8080)]), cpu_shares=None) manifestation is not on node], missing_fields=[]

I need to look into that one.
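This second error comes from a cross-record invariant: the message says an Application's manifestation "is not on node". Assuming an invariant along those lines (every application's volume manifestation must appear among the node's manifestations), the hypothetical sketch below shows why checking after every diff operation makes the outcome depend on the order of the operations, whereas checking once at the end does not. The dict layout and function names are illustrative, not flocker's model.

```python
class InvariantError(Exception):
    """Raised when a node would be built in a forbidden state."""


def check_node(node):
    # Hypothetical invariant modelled on the error message above: every
    # application that uses a manifestation must find it on the same node.
    for app, manifestation in node["applications"].items():
        if manifestation is not None and manifestation not in node["manifestations"]:
            raise InvariantError("%s manifestation is not on node" % (app,))


def add_application(node, app, manifestation):
    return {**node, "applications": {**node["applications"], app: manifestation}}


def add_manifestation(node, manifestation):
    return {**node, "manifestations": node["manifestations"] | {manifestation}}


def apply_ops(node, ops, check_each_step):
    for op in ops:
        node = op(node)
        if check_each_step:
            check_node(node)
    check_node(node)  # the final state must always satisfy the invariant
    return node
```

Adding the application before its manifestation only fails when each intermediate step is checked; with a single check at the end, either order produces the same valid node.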

@wallrj
Contributor Author

wallrj commented Jun 24, 2016

Well, I don't fully understand it, but there still seem to be invariant failures in some of our acceptance-test environments:

[centos@ip-172-31-0-243 ~]$ find /data/jenkins-jobs/ClusterHQ-flocker/jobs/invariant-error-FLOC-4452 -type f -name 'remote_logs.log' | xargs grep --count "invariant" | sort
...
__main_multijob/workspace/run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/remote_logs.log:0
__main_multijob/workspace/run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/1/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/2/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/3/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/4/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/5/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/6/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_CentOS_7_for_flocker.acceptance/builds/7/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/1/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/2/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/3/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/4/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/5/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/6/archive/remote_logs.log:0
run_acceptance_loopback_on_AWS_Ubuntu_Trusty_for_flocker.acceptance/builds/7/archive/remote_logs.log:0
_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/1/archive/remote_logs.log:12
_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/2/archive/remote_logs.log:12
_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/3/archive/remote_logs.log:12
_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/4/archive/remote_logs.log:4
_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/5/archive/remote_logs.log:4
_run_acceptance_on_AWS_CentOS_7_with_EBS/builds/6/archive/remote_logs.log:4
_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/1/archive/remote_logs.log:8
_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/2/archive/remote_logs.log:8
_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/4/archive/remote_logs.log:0
_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/5/archive/remote_logs.log:0
_run_acceptance_on_AWS_Ubuntu_Trusty_with_EBS/builds/6/archive/remote_logs.log:0
_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/1/archive/remote_logs.log:0
_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/2/archive/remote_logs.log:0
_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/3/archive/remote_logs.log:0
_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/4/archive/remote_logs.log:0
_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/6/archive/remote_logs.log:0
_run_acceptance_on_GCE_CentOS_7_with_GCE/builds/7/archive/remote_logs.log:0
_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/1/archive/remote_logs.log:8
_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/2/archive/remote_logs.log:8
_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/4/archive/remote_logs.log:0
_run_acceptance_on_GCE_Ubuntu_Trusty_with_GCE/builds/5/archive/remote_logs.log:0
_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/1/archive/remote_logs.log:14
_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/2/archive/remote_logs.log:12
_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/3/archive/remote_logs.log:12
_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/4/archive/remote_logs.log:4
_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/5/archive/remote_logs.log:8
_run_acceptance_on_Rackspace_CentOS_7_with_Cinder/builds/6/archive/remote_logs.log:4
_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/2/archive/remote_logs.log:8
_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/3/archive/remote_logs.log:8
_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/4/archive/remote_logs.log:8
_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/5/archive/remote_logs.log:0
_run_acceptance_on_Rackspace_Ubuntu_Trusty_with_Cinder/builds/6/archive/remote_logs.log:0

And, judging by the stack trace, it seems that the acceptance test runner is installing the wrong package version:

1b7c5314-8a82-4de8-b240-a4f5ff3d497a -> /1
2016-06-24 15:13:52.062164Z
  message_type: 'twisted:log'
  _HOSTNAME: 'ip-172-31-1-149.us-west-1.compute.internal'
  _PROCESS_NAME: 'flocker-container-agent.service'
  error: True
  message: 'Unhandled Error
         |  Traceback (most recent call last):
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1021, in _commandReceived
         |      deferred = self.dispatchCommand(box)
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1079, in dispatchCommand
         |      return maybeDeferred(responder, box)
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
         |      result = f(*args, **kw)
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/protocols/amp.py", line 1166, in doit
         |      return maybeDeferred(aCallable, **kw).addCallback(
         |  --- <exception caught here> ---
         |    File "/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
         |      result = f(*args, **kw)
         |    File "/opt/flocker/lib/python2.7/site-packages/flocker/control/_protocol.py", line 1152, in cluster_updated_diff
         |      self._current_state
         |    File "/opt/flocker/lib/python2.7/site-packages/flocker/control/_diffing.py", line 115, in apply
         |      obj = c.apply(obj)
         |    File "/opt/flocker/lib/python2.7/site-packages/flocker/control/_diffing.py", line 79, in apply
         |      return obj.transform(self.path, self.value)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_pclass.py", line 136, in transform
         |      return transform(self, transformations)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 65, in transform
         |      r = _do_to_path(r, path, command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
         |      return _update_structure(structure, kvs, path[1:], command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 110, in _update_structure
         |      result = _do_to_path(v, path, command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
         |      return _update_structure(structure, kvs, path[1:], command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 110, in _update_structure
         |      result = _do_to_path(v, path, command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 74, in _do_to_path
         |      return _update_structure(structure, kvs, path[1:], command)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_transformations.py", line 113, in _update_structure
         |      return e.persistent()
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_precord.py", line 149, in persistent
         |      check_global_invariants(result, cls._precord_invariants)
         |    File "/opt/flocker/lib/python2.7/site-packages/pyrsistent/_field_common.py", line 22, in check_global_invariants
         |      raise InvariantException(error_codes, (), \'Global invariant failed\')
         |  pyrsistent._checked_types.InvariantException: Global invariant failed, invariant_errors=[Either all or none of set([\'paths\', \'manifestations\', \'devices\']) must be set.], missing_fields=[]
         |  '

NB: there's no sign of the new _EvolverProxy.commit function, which is a bit worrying!

@wallrj
Contributor Author

wallrj commented Jun 24, 2016

Ah! I think I know why: this is our test_upgrade test striking again.

The errors occur during the short period while the cluster is running the downgraded version of Flocker (1.13.0).

@wallrj wallrj changed the title [FLOC-4452] Collect and apply adjacent set operations to an evolver so as not to trigger the invariant checks until all attributes have been set [FLOC-4452] Delay invariant checks until all diff operations have been applied Jun 24, 2016
@@ -341,7 +371,7 @@ def deployment_strategy(
         draw(
             lease_strategy(
                 dataset_id=st.just(dataset_id),
-                node_uuid=st.just(node_uuid)
+                node_id=st.just(node_uuid)
Contributor

Was this simply broken before? It looks like it was, but it went unnoticed because, without "stateful" applications, there were never any leases.
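A wrong keyword argument like this can hide because Python only checks keyword names when the call actually runs. The sketch below is a hypothetical, Hypothesis-free illustration: plain functions stand in for the real strategies, and it assumes the lease call is only reached when a stateful application exists. The names and signatures are illustrative, not flocker's test helpers.

```python
def lease_strategy(dataset_id, node_id):
    # Stand-in for the real lease strategy; note the keyword is node_id.
    return {"dataset_id": dataset_id, "node_id": node_id}


def buggy_deployment(dataset_id, node_uuid, stateful_application=False):
    leases = []
    if stateful_application:
        # Wrong keyword: raises TypeError, but only when this branch runs.
        leases.append(lease_strategy(dataset_id=dataset_id, node_uuid=node_uuid))
    return {"leases": leases}


def fixed_deployment(dataset_id, node_uuid, stateful_application=False):
    leases = []
    if stateful_application:
        leases.append(lease_strategy(dataset_id=dataset_id, node_id=node_uuid))
    return {"leases": leases}
```

Called without a stateful application, the buggy version happily returns no leases; the TypeError only appears once a test asks for one.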

Contributor Author

Yep, it was broken. node_strategy is also broken in the way it supplies UUID(...) rather than unicode(UUID(...)) to Dataset.dataset_id; I need to investigate that.
I just added the new stateful_application parameter to avoid triggering the bug in the existing tests.
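The UUID(...) versus unicode(UUID(...)) mismatch is the kind of bug a type-checked field only catches at construction time. A minimal Python 3 sketch, where str plays the role of Python 2's unicode; DatasetLike and dataset_for are hypothetical stand-ins, not the flocker Dataset model, and the sketch assumes dataset_id is declared as a text field.

```python
from uuid import UUID, uuid4


class DatasetLike:
    """Hypothetical stand-in for a record whose dataset_id must be text."""

    def __init__(self, dataset_id):
        if not isinstance(dataset_id, str):
            raise TypeError(
                "dataset_id must be text, not %s" % type(dataset_id).__name__
            )
        self.dataset_id = dataset_id


def dataset_for(uuid_value):
    # Convert at the boundary so every caller gets a consistent text ID.
    return DatasetLike(dataset_id=str(uuid_value))
```

Converting once, where the strategy generates the value, keeps the UUID round-trippable (UUID(dataset.dataset_id) gives back the original) while satisfying the field's type.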

@exarkun
Contributor

exarkun commented Jun 28, 2016

Thanks, Richard. I've left my comments inline. It looks good for the most part, though I wasn't able to follow all of the implementation (I left comments where I was lost). The test suite looks good, so I'm happy for you to address the noted points to your satisfaction and then merge. That leaves the issue with the upgrade test: what's the story there? Does that test just need to be made more robust as a separate unit of work? Thanks again.

@wallrj
Contributor Author

wallrj commented Jul 11, 2016

Ignoring:

@wallrj wallrj merged commit 9f9b2f7 into master Jul 11, 2016
@wallrj wallrj deleted the invariant-error-FLOC-4452 branch July 11, 2016 09:51

class _IRecordType(Interface):
    """
    The operations that can be performed when transforming a ``PSet`` object.
Contributor

s/PSet/PMap/ probably

@avg-I
Contributor

avg-I commented Jul 11, 2016

@wallrj Richard, sorry it took me this long to review this change and to produce just a few minor comments. It took me quite a while to understand all of the changed and related code.

I didn't realize that you had merged the PR some hours ago, so this is a belated LGTM from me.


3 participants