Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update entities in batch when field encryption settings change #26

Merged
merged 11 commits into from
Mar 13, 2016

Conversation

svendecabooter
Copy link
Contributor

This PR fixes the following:

This has been tested with the following scenario's:

  • Non-encrypted field gets encryption --> plain text data gets encrypted
  • Encrypted field (with method X) settings get changed to use method Y --> encrypted data gets re-encrypted with the new encryption method
  • Encrypted field made unencrypted (checkbox unchecked) --> encrypted data gets stored as plain text

…ncrypt field storage if it wasn't encrypted before.
…updating the field storage settings:

- field without encryption becomes encrypted
- field with encryption gets encrypted with another encryption profile
- field with encryption becomes unencrypted
@svendecabooter
Copy link
Contributor Author

After the update is performed, old values still seem to be in the cache.
Can't quite figure out how to remove them from the cache... normally the node:[NID] cache invalidation gets called on node save, so they should be removed...

@nerdstein
Copy link
Contributor

@svendecabooter i am really curious if the Batch API is the way to go. Doesn't this signify a lapse of time between the encryption settings of a field (updated) and the potential data updates needed to process the change? Maybe this is resolved by revisions or something, but I need to understand this a lot more from your point of view

@svendecabooter
Copy link
Contributor Author

I'm not sure I understand the concern exactly.
You mean there would be a problem if entities get created in the meantime?

Upon submitting the form to change the field settings, we query the database to get all entities with this field, and store the original (pre-update) encryption settings.

The batch is then run on each of the entities, (optionally) decrypting with the old settings, and (re)encrypting with the newly saved settings.

While the batch is running, the new settings are already saved, so new entities would be encrypted with the correct settings.

@svendecabooter
Copy link
Contributor Author

I guess there would be a problem if entities get updated manually, while the batch is running...

I think realistically we only have 2 choices:

  • Work like core: once there is data in the field, the storage settings can no longer be updated. This is the easiest to preserve data integratity, but not very user friendly.
  • Allow the option to change the settings (and perform encryption / decryption even when data exists). Though this should probably be marked with a big warning about data integrity.
    It could be very helpful if you want to encrypt data that was already stored in plaintext earlier, but doing it in a live and changing environment can indeed cause problems, so a decent backup and a controlled environment to perform the updates seem like necessities.

@svendecabooter
Copy link
Contributor Author

Either way this feature is a pretty big game of whack-a-mole, with all the possible variations (multiple languages, multiple revisions, multiple bundles) and encryption / decryption actions that get performed.

Every time I fix a use case, another use case breaks again, so this will need thorough testing, if we can get it into a state where every use case seems to work.

@nerdstein
Copy link
Contributor

@svendecabooter understood and I'm comfortable making incremental progress. Let's hop on a hangout and chat.

@svendecabooter
Copy link
Contributor Author

The latest commits make the code work for 3 scenario's mentioned above (start encrypting existing field, change encryption method on existing field, stop encrypting existing field), tested with the following use cases:

  • regular entity
  • entity with revisions
  • entity with multiple translations

I can start writing some tests to confirm this, if we agree to add this feature.

@nerdstein
Copy link
Contributor

@svendecabooter I gave this a lot more thought and yes the Batch API is probably the most elegant way to get this done. I was confusing myself a bit thinking through all of this, but nothing that wasn't cured with a good night's sleep.

I do think, if possible, we could "lock" records when we process them inside of the batch API. Do you know of any features?

I am not talking about "code" level locking, as demonstrated here: https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Lock%21LockBackendInterface.php/group/lock/8

I believe each entity storage has a "locked" attribute (as seen here: https://api.drupal.org/api/drupal/core!modules!field!src!Entity!FieldStorageConfig.php/class/FieldStorageConfig/8).

Maybe it's possible for us to set locked=true for the entire field before processing the batch API records. Or, maybe this is overkill altogether. I would love to hear your thoughts.

Regardless, I want to stress we're on the right track with this. I'm going to review the code in depth and provide any inline comments right now

@@ -34,6 +33,11 @@ function field_encrypt_form_alter(&$form, FormStateInterface $form_state, $form_
'#open' => TRUE,
);

// Display a warning about changing field data.
if ($form_id == "field_storage_config_edit_form" && $field->hasData()) {
$form['field_encrypt']['#prefix'] = '<div class="messages messages--warning">' .t('Warning: changing field encryption settings may cause data corruption!<br />When changing these settings, existing fields will be (re)encrypted in batch according to the new settings. <br />Make sure you have a proper backup, and do not perform this action in an environment where the data will be changing during the batch operation, to avoid data loss.') . '</div>';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@nerdstein
Copy link
Contributor

Code reviewed and it looks really good. Minor comments.

One other potential black hole I thought of:

Are we assuming field config changes happen only on production? These settings are in CMI …. would we possibly need the batch API to run if these config changes happen on another environment? E.g. someone updates a YML file w/ field encrypt settings that are then pushed up to another environment

@nerdstein
Copy link
Contributor

Black hole comments:

https://pantheon.io/blog/hard-things-are-possible-configuration-management-drupal-8

Config API has some mediation events through event subscribers (it seems). An example: https://api.drupal.org/api/drupal/core!lib!Drupal!Core!Config!ConfigEvents.php/constant/ConfigEvents%3A%3ASAVE/8

Full set of events listed here: https://api.drupal.org/api/drupal/core!lib!Drupal!Core!Config!ConfigEvents.php/class/ConfigEvents/8

In short, it seems we need an EventSubscriber that monitors config changes to FieldSettings. If we see encrypted == true and a change between existing field encryption settings, we invoke our same batch API code (in theory and maybe not in practice).

This was referenced Mar 12, 2016
@svendecabooter
Copy link
Contributor Author

I think this is probably good to be merged.

Followup issues for the remaining outstanding tasks:

Let me know if I missed something

@nerdstein
Copy link
Contributor

Merging

nerdstein added a commit that referenced this pull request Mar 13, 2016
Update entities in batch when field encryption settings change
@nerdstein nerdstein merged commit 0943a38 into d8-contrib-modules:master Mar 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants