Feature/transform improvements #437

tombooth · 2015-07-01T17:44:55Z

A couple of improvements to transforms, based on some work done with @zilnhoj as the Narwhal yesterday. Details about the individual changes are in the commit messages.

jcbashdown · 2015-07-02T15:59:39Z

backdrop/transformers/dispatch.py

+            'max_age_expected': input_dataset['max_age_expected'],
+        }
+
+        if 'capped_size' in input_dataset and input_dataset['capped_size']:


All looks good! Comment this up and I shall merge.

Is it definitely not possible for input_dataset to be missing a key/value pair for bearer_token/realtime/published/max_age_expected?

In relation to the second comment, the code here https://github.com/alphagov/stagecraft/blob/master/stagecraft/apps/datasets/models/data_set.py#L214 shows that they will exist in a dict

But couldn't max_age_expected be None (and so null in the JSON) and cause the same problem with the schema as capped_size? It looks like it could here:

https://github.com/alphagov/stagecraft/blob/master/stagecraft/apps/datasets/models/data_set.py#L168

It's the only one of these that can be null apart from capped_size. A user would have to override the default, though, which I'm not sure is possible. If Django detects it's None, will it just put it back to the default?

I'm going to merge but putting this here as a bit of history.

A lot of the properties that are received when querying for the input data set are either not wanted or will break the schema validation when POSTed, as they are not meant to be part of the body. We would not want a derivative data set to share the same auto_ids, as it will have a different structure and the transforms should be generating their own specific `_id` fields.

We only really want the following fields to be the same:

- token: as someone might expect to be able to fiddle with derivative data sets
- realtime: if the incoming data is realtime, the transformed data will be. This also relates to `capped_size`, which should be configured to tell mongo how much of the realtime data to keep
- max_age_expected: the transformed data should be as up to date as the input data set, so if there are rules about how old it should be then those should be mirrored in the derivative
- published: if the input is published we would want the derivative to be too

`capped_size` is a little different from the others, as Stagecraft will reject a data set with this set to None: if the key appears in the POSTed JSON it needs to be a string. We are only interested if it is set to a value, so it gets ignored otherwise.
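The field selection described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `build_output_dataset_config` is hypothetical, and the `bearer_token` key is assumed from the review comment above (the commit message calls it `token`). Only the `max_age_expected` line and the `capped_size` check appear in the quoted diff.

```python
def build_output_dataset_config(input_dataset):
    """Copy only the fields a derivative data set should share with its input.

    Hypothetical helper: the name and exact shape are assumptions made for
    illustration, not taken from backdrop/transformers/dispatch.py.
    """
    config = {
        # Key name assumed from the review comment; the commit says "token".
        'bearer_token': input_dataset['bearer_token'],
        'realtime': input_dataset['realtime'],
        'published': input_dataset['published'],
        'max_age_expected': input_dataset['max_age_expected'],
    }

    # Only forward capped_size when it holds a real value. A missing key,
    # None, or any other falsy value is left out of the POSTed body so that
    # schema validation never sees a null.
    if 'capped_size' in input_dataset and input_dataset['capped_size']:
        config['capped_size'] = input_dataset['capped_size']

    return config
```

Anything else on the input data set (for example auto_ids) is simply never copied, so the derivative data set gets its own values for those properties.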

The collectors allow you to merge static data into every record sent through to Backdrop, which is very useful for flagging up different sources of data in a single data set. We gain similar value from applying tags in transforms, as we want a single data set tracking the completion rate of two different sources (old and new forms). We would like to be able to compare them and leave them side by side in a single data set. This has been written so that it is applied after the transformation of the data, regardless of what type of transform was executed. This push for commonality in utility options is something that we have learnt is important from the drift in config and impl in the collectors.
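As a rough illustration of tagging applied after a transform, the sketch below merges a fixed dict of tags into every transformed record. The function name `apply_tags`, the shape of the tags, and the choice to let a tag overwrite a record field of the same name are all assumptions; the source does not show how the PR implements or configures this.

```python
def apply_tags(records, tags):
    """Return copies of the transformed records with static tags merged in.

    Hypothetical helper for illustration only. On a key collision the tag
    value wins, which is an assumption rather than documented behaviour.
    """
    tagged = []
    for record in records:
        merged = dict(record)  # copy so the caller's records are untouched
        merged.update(tags)
        tagged.append(merged)
    return tagged


# Usage: the same step runs whatever transform produced the records.
transformed = [{'_id': 'week-1', 'rate': 0.42}]
print(apply_tags(transformed, {'source': 'new-form'}))
```

Keeping this as a separate step after the transform means every transform type gets the option without implementing it itself.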

jcbashdown reviewed Jul 2, 2015
tombooth added 2 commits July 2, 2015 17:16

tombooth force-pushed the feature/transform-improvements branch from dec848c to 401dd56 Compare July 2, 2015 16:24

jcbashdown added a commit that referenced this pull request Jul 3, 2015

Merge pull request #437 from alphagov/feature/transform-improvements

a0153d7

Feature/transform improvements

jcbashdown merged commit a0153d7 into master Jul 3, 2015
