activity_annotation hook makes graph serialization crash #3408

@btraven00

Description

Describe the bug

When using the hook for adding an arbitrary activity annotation, a subsequent export of the graph crashes.

To Reproduce
Steps to reproduce the behavior:

  1. Create a pluggy implementation of the activity_annotations hook, following the documentation.
  2. Execute a renku run.
  3. Export the resulting graph, which contains the extra annotation.
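For reference, step 1 can be sketched with a minimal pluggy setup. This is an illustrative stand-in, not renku's actual hookspec: the spec class, plugin class, and annotation dict below are assumptions that only mirror the shape of the real plugin interface.

```python
# Minimal pluggy hookspec/hookimpl pair mirroring the shape of the
# activity_annotations hook. Everything except pluggy's own API is
# an illustrative assumption, not renku's real internals.
import pluggy

hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")

class Spec:
    @hookspec
    def activity_annotations(self, activity):
        """Return a list of annotations for the given activity."""

class MyPlugin:
    @hookimpl
    def activity_annotations(self, activity):
        # A stand-in annotation payload.
        return [{"source": "my-plugin", "body": {"value": 42}}]

pm = pluggy.PluginManager("demo")
pm.add_hookspecs(Spec)
pm.register(MyPlugin())

# pluggy collects one result list per registered implementation.
results = pm.hook.activity_annotations(activity="run-1")
print(results)
```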

Expected behavior
The crash (a KeyError) should be handled gracefully and execution halted with a clear message. As it stands, this kind of bug requires a hard revert of the git repository, since any further attempt to export the data will choke on the previously stored annotation.

Run environment (please complete the following information):

In this particular run, renku==1.11.2 and calamus==0.4.2, but after looking at the codebase I suspect the bug persists at the current development head of both projects.

Additional context

I am not too familiar with the code, but I'm inclined to believe this is due to an underlying bug in calamus. More precisely, the schema lookup in fields._serialize_single_obj fails. The original backtrace is:

  File "/opt/conda/lib/python3.9/site-packages/renku/command/graph.py", line 79, in export_graph
    graph = get_graph_for_all_objects()
  File "/opt/conda/lib/python3.9/site-packages/inject/__init__.py", line 342, in injection_wrapper
    return sync_func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/renku/command/graph.py", line 195, in get_graph_for_all_objects
    return _convert_entities_to_graph(objects, project)
  File "/opt/conda/lib/python3.9/site-packages/renku/command/graph.py", line 239, in _convert_entities_to_graph
    graph.extend(schema(flattened=True).dump(entity))
  File "/opt/conda/lib/python3.9/site-packages/marshmallow/schema.py", line 557, in dump
    result = self._serialize(processed_obj, many=many)
  File "/opt/conda/lib/python3.9/site-packages/calamus/schema.py", line 187, in _serialize
    value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
  File "/opt/conda/lib/python3.9/site-packages/marshmallow/fields.py", line 344, in serialize
    return self._serialize(value, attr, obj, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/calamus/fields.py", line 550, in _serialize
    result.append(self._serialize_single_obj(obj, **kwargs))
  File "/opt/conda/lib/python3.9/site-packages/calamus/fields.py", line 522, in _serialize_single_obj
    schema = self.schema["to"][type(obj)]
KeyError: <class 'renku.domain_model.provenance.annotation.Annotation'>

In that function there is a safeguard, but a typo lets execution continue past it: the ValueError on line 508 is constructed but never raised (note the missing raise):

  if type(obj) not in self.schema["to"]:
      ValueError("Type {} not found in field {}.{}".format(type(obj), type(self.parent), self.name))

Fixing that is useful to get a clearer error, but will not fix the underlying problem.
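As a stand-in illustration (not calamus's actual code), this is what the guard behaves like once the raise is restored: an unregistered type fails fast at the lookup site instead of falling through to a bare KeyError. The registry and function names here are invented for the sketch.

```python
# Stand-in sketch of the corrected guard: the ValueError is raised,
# so an unregistered type fails fast with a descriptive message.
registry = {int: "IntegerSchema"}

def lookup_schema(obj, registry):
    if type(obj) not in registry:
        raise ValueError(f"Type {type(obj)} not found in registry.")
    return registry[type(obj)]

print(lookup_schema(3, registry))  # IntegerSchema

try:
    lookup_schema("not registered", registry)
except ValueError as e:
    print(e)
```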

Diving into why the class is not present there turns out to be puzzling: the class is indeed registered, but the dictionary lookup does not retrieve the schema, because the equality comparison between the two class objects fails:

ipdb> type(obj)
<class 'renku.domain_model.provenance.annotation.Annotation'>
ipdb> tt = type(obj)
ipdb> ts = tuple(self.schema["to"].keys())[0]
ipdb> ts
<class 'renku.domain_model.provenance.annotation.Annotation'>
ipdb> tt
<class 'renku.domain_model.provenance.annotation.Annotation'>
ipdb> ts == tt
False
ipdb> ts.__dict__
mappingproxy({'__module__': 'renku.domain_model.provenance.annotation', '__doc__': 'Represents a custom annotation for a research object.', '__init__': <function Annotation.__init__ at 0x7fc791986550>, 'copy': <function Annotation.copy at 0x7fc7919865e0>, 'generate_id': <staticmethod object at 0x7fc7919b49a0>, '__dict__': <attribute '__dict__' of 'Annotation' objects>, '__weakref__': <attribute '__weakref__' of 'Annotation' objects>})
ipdb> tt.__dict__
mappingproxy({'__module__': 'renku.domain_model.provenance.annotation', '__doc__': 'Represents a custom annotation for a research object.', '__init__': <function Annotation.__init__ at 0x7fc78ca8e280>, 'copy': <function Annotation.copy at 0x7fc78ca8e310>, 'generate_id': <staticmethod object at 0x7fc78ca81640>, '__dict__': <attribute '__dict__' of 'Annotation' objects>, '__weakref__': <attribute '__weakref__' of 'Annotation' objects>})

I'm not sure why the mappingproxy shows the same class with its functions at different memory addresses, but I suspect the internals of pluggy interfere with the calamus schema lookup, which relies on equality of class objects.
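The symptom above can be reproduced in isolation: if the same class definition is executed twice under separate module objects (as can happen when a plugin loader re-imports a module), the two classes print identically but are distinct objects, so a dict keyed by class misses on lookup. This is a self-contained demonstration of that suspected failure mode, not renku's actual import path.

```python
# Two copies of "the same" class from independent module objects:
# they print identically but compare unequal, so a class-keyed dict
# registered with one copy misses when queried with the other.
import types

SOURCE = "class Annotation:\n    pass\n"

def load_fresh(module_name):
    mod = types.ModuleType(module_name)
    exec(SOURCE, mod.__dict__)
    return mod

m1 = load_fresh("annotation")
m2 = load_fresh("annotation")

schemas = {m1.Annotation: "AnnotationSchema"}  # registered with one identity

print(str(m1.Annotation) == str(m2.Annotation))  # True: same printed name
print(m1.Annotation is m2.Annotation)            # False: distinct objects
print(schemas.get(m2.Annotation))                # None: lookup by the twin fails
```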

A quick-and-dirty (admittedly inelegant) workaround is to replace the class-based index with a string comparison, which seems to capture the same semantics:

schema = None
for klass, candidate in self.schema["to"].items():
    if str(klass) == str(type(obj)):
        schema = candidate
        break

A slightly better approach could be to index the registry by the fully qualified name, i.e. "{__module__}.{__name__}".
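That suggestion can be sketched as follows. The helper and registry names are illustrative, not calamus's API; the point is only that a string key survives a class being loaded twice, where an object-identity key does not.

```python
# Sketch of the suggested string index: key the registry by the fully
# qualified class name so two copies of the same class resolve to the
# same schema. Names here are illustrative, not calamus's actual API.
def type_key(klass):
    return f"{klass.__module__}.{klass.__name__}"

class Annotation:
    pass

schemas = {type_key(Annotation): "AnnotationSchema"}

obj = Annotation()
print(schemas[type_key(type(obj))])  # AnnotationSchema
```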

I can try to work on a better fix; how do you suggest this should be handled?
