FieldProperty fields validate twice, which is slow #67

jamadden · 2018-07-13T22:46:37Z

The common case of using nti.schema.fieldproperty:createDirectFieldProperties (or simply zope.schema.fieldproperty.createFieldProperties) cause validation to happen twice during internalization.

The first time is when we explicit when we call validate_named_field_value as we're copying key/value pairs from the external dictionary. This pass has the extra steps that attempt to deal with adapting objects and sequences if there's a validation issue.

The second time is when we actually set the object. We call field.set(object, value). That usually translates into a simple setattr(object, field.__name__, value). But when the object's type has a FieldProperty for that field, that winds up being FieldProperty.__set__. This method first binds a field to the instance (allocates an object), then calls field.validate(value) before finally putting the value in the instance's dictionary and emitting an event.

This process turns out to be expensive, and it's mainly in the double validation. In the benchmark that has ten objects with a textline being stored in a List of that type, taking out the double validation makes us something like 25% faster. (Removing the field property altogether makes us 29% faster.)

It's not entirely clear what if anything we can or should do about this. Part of the problem is probably the way that validate_field_value returns a callable instead of actually doing the setting itself---the only time it's used in this package we immediately call the returned callable. If it wasn't split into two parts like that it could more easily just set the attribute and catch an exception to do its adaptation.

The text was updated successfully, but these errors were encountered:

jamadden · 2018-07-17T13:43:30Z

This diff makes us 10% faster (while still passing all the tests) by only validating once:

------------------------------------------------------------------------------------
modified: src/nti/externalization/internalization/fields.py
------------------------------------------------------------------------------------
@@ -218,7 +218,7 @@ def _handle_WrongContainedType(field_name, field, value):
         raise reraise(*exc_info)

     return value
-
+from zope.schema.fieldproperty import FieldProperty
 def validate_field_value(self, field_name, field, value):
     """
     Given a :class:`zope.schema.interfaces.IField` object from a schema
@@ -241,6 +241,11 @@ def validate_field_value(self, field_name, field, value):
     try:
         if isinstance(value, text_type) and IFromUnicode_providedBy(field):
             value = field.fromUnicode(value)  # implies validation
+        elif isinstance(getattr(type(self), field_name, None), FieldProperty):
+            # implies validation. field.set() calls setattr() calls
+            # FieldProperty.__set__ calls field.bind().validate()
+            field.set(self, value)
+            return noop
         else:
             field.validate(value)
     except SchemaNotProvided:

(Simply calling field.set without the check for FieldProperty is not correct as Field.set doesn't guarantee any validation.)

It does change the behaviour, though. The obvious one is that it doesn't just validate, it also sets immediately. That's not a problem in this library because we never use the deferred setter, it's always immediately called. But it might be an issue in other libraries.

More subtle is that if field.set has side effects, they may now happen twice. For example, Object.set notifies a BeforeObjectAssignedEvent:

    def set(self, object, value):
        # Announce that we're going to assign the value to the object.
        # Motivation: Widgets typically like to take care of policy-specific
        # actions, like establishing location.
        event = BeforeObjectAssignedEvent(value, self.__name__, object)
        notify(event)
        # The event subscribers are allowed to replace the object, thus we need
        # to replace our previous value.
        value = event.object
        super(Object, self).set(object, value)

That's a useful event. If the value is already valid, then nothing is different. But if validation fails, that event will be sent once with the soon-to-be-declared-invalid value, and then (assuming we can fix it) sent again with the valid value...assuming that subscribers to the event don't react badly to the initial invalid value and raise an exception we're not prepared for.

Because update_from_external_object recurses through the external data structure before calling IInternalObjectUpdate.updateFromExternalObject, which itself is supposed to leave objects in a valid state, I would expect that all field values for complex objects are generally valid, except possibly at the leaves. But that's valid by their own schema, not necessarily the target field's schema, so I don't think we can make any assumptions about the prevalence of valid/invalid data.

Those same possible side effects from the field are why we can't attempt to use the instance's __dict__ directly when we detect a FieldProperty.

Fixes #67 Add tests that this doesn't break anything. In our simple tests, this makes us at least 10% faster. 2.7: | Benchmark | 27_list_fieldproperty3 | 27_list_fieldproperty_patched3 | |------------------------------|------------------------|--------------------------------| | __main__: fromExternalObject | 1.43 ms | 1.27 ms: 1.12x faster (-11%) | Not significant (1): __main__: toExternalObject 3.6 (taken during a backup, results for toExternalObject had a *huge* stddev of 40%): | Benchmark | 36_list_fieldproperty | 36_list_fieldproperty_patched2 | |------------------------------|-----------------------|--------------------------------| | __main__: toExternalObject | 428 us | 577 us: 1.35x slower (+35%) | | __main__: fromExternalObject | 1.43 ms | 1.19 ms: 1.20x faster (-17%) |

jamadden mentioned this issue Jul 17, 2018

Monkey-patch FieldProperty so we can avoid double-validation. #69

Merged

jamadden closed this as completed in #69 Jul 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FieldProperty fields validate twice, which is slow #67

FieldProperty fields validate twice, which is slow #67

jamadden commented Jul 13, 2018

jamadden commented Jul 17, 2018

FieldProperty fields validate twice, which is slow #67

FieldProperty fields validate twice, which is slow #67

Comments

jamadden commented Jul 13, 2018

jamadden commented Jul 17, 2018