NIFI-3594 Encrypted provenance repository implementation #1686
This is a big PR, and there is some helpful information before delving into the code.
What is it?
How does it work?
The code will provide more details, and I plan to write extensive documentation for the Admin Guide and User Guide (NIFI-3721), but this will suffice for an overview.
The fully qualified class
The simplest configuration is below:
Each line defines a key ID followed by the Base64-encoded cipher text of a 16-byte IV and a wrapped AES-128, AES-192, or AES-256 key, depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the master key defined by
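As a rough illustration of the key-wrapping idea described above, here is a minimal sketch, not the code from this PR. The class name `KeyWrapSketch`, the `IV || ciphertext` byte layout, and the 128-bit GCM tag length are all assumptions made for the example; only the 16-byte IV, AES/GCM, and the Base64 encoding come from the description.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of NiFi: wraps and unwraps a raw AES key under a master key.
final class KeyWrapSketch {
    private static final int IV_LENGTH = 16; // 16-byte IV, as in the description
    private static final int TAG_BITS = 128; // assumption: full-length GCM tag

    /** Returns Base64(IV || ciphertext-with-tag). The layout is an assumption. */
    static String wrap(byte[] masterKey, byte[] rawKey) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(rawKey);
            byte[] out = new byte[iv.length + cipherText.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key wrapping failed", e);
        }
    }

    /** Reverses wrap(): splits off the IV, then decrypts and authenticates the rest. */
    static byte[] unwrap(byte[] masterKey, String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            byte[] iv = Arrays.copyOfRange(in, 0, IV_LENGTH);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key unwrapping failed", e);
        }
    }
}
```

Because GCM is authenticated, `unwrap` fails with an exception if the wrapped value was modified or if the wrong master key is supplied, rather than silently returning garbage key material.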
Once the repository is initialized, all provenance event record write operations are serialized according to the configured schema writer (
On record read, the process is reversed. The encryption metadata is parsed and used to decrypt the serialized bytes, which are then deserialized into a
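The write and read paths described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the class name `RecordEncryptionSketch`, the metadata layout (key ID, IV, cipher text length), and the use of AES/GCM for the records themselves are all hypothetical choices for the example.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;

// Hypothetical sketch: encrypts already-serialized record bytes and prepends the metadata
// needed to decrypt them later (key ID and IV). The layout is an assumption.
final class RecordEncryptionSketch {

    static byte[] encrypt(Map<String, byte[]> keys, String keyId, byte[] serializedRecord) {
        try {
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            byte[] cipherText = cipher(Cipher.ENCRYPT_MODE, lookup(keys, keyId), iv)
                    .doFinal(serializedRecord);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(keyId);             // metadata: which key was used
            out.writeInt(iv.length);         // metadata: IV length and IV
            out.write(iv);
            out.writeInt(cipherText.length); // payload length and payload
            out.write(cipherText);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Record encryption failed", e);
        }
    }

    static byte[] decrypt(Map<String, byte[]> keys, byte[] stored) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored));
            String keyId = in.readUTF();     // parse the metadata first
            byte[] iv = new byte[in.readInt()];
            in.readFully(iv);
            byte[] cipherText = new byte[in.readInt()];
            in.readFully(cipherText);
            // The caller would then deserialize the returned plain bytes.
            return cipher(Cipher.DECRYPT_MODE, lookup(keys, keyId), iv).doFinal(cipherText);
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Record decryption failed", e);
        }
    }

    private static byte[] lookup(Map<String, byte[]> keys, String keyId) {
        byte[] key = keys.get(keyId);
        if (key == null) {
            throw new IllegalStateException("No key available for key ID " + keyId);
        }
        return key;
    }

    private static Cipher cipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
        return cipher;
    }
}
```

Storing the key ID alongside each record is what would let records written under an older key remain readable after a newer key becomes active, provided the older key is still available to the reader.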
Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted provenance repository. The Provenance Query operations work as expected with no change to the process.
While there is an obvious performance cost to cryptographic operations, I tried to minimize the impact and to provide an estimate of the metrics of this implementation in comparison to existing behavior.
In general, with low flow event volume, the performance impact is not noticeable -- it is perfectly in line with
With a much higher volume of events, the impact is felt in two ways. First, the throughput of the flow is slower, as more resources are dedicated to encrypting and serializing the events (note the total events processed/events per second). In addition, the provenance queries are slightly slower than the original implementation (1% - 17%), and significantly slower than the new
@alopresto one thought regarding the StaticKeyProvider... rather than having a .key= property, perhaps it would make sense to use a .key.<key_id>= property scheme instead. This would allow multiple key IDs to be used, which would resolve the 'Potential Issue' listed above of not allowing the key to rotate, with rather minimal changes to the code?
I had to make some substantial changes to enable multiple keys for
This should generate a key map of
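For illustration only, a `.key.<key_id>=` property scheme like the one suggested in the review comment could be read into a key map along these lines. The class name `KeyMapSketch` and the property prefix passed in are hypothetical; this is not the parsing code from the PR, and the actual property names are not shown here.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: collects every "<prefix>.<key_id>=<value>" property into a map of
// key ID to (still wrapped) key value. The prefix is supplied by the caller.
final class KeyMapSketch {

    static Map<String, String> readKeys(Properties properties, String prefix) {
        Map<String, String> keys = new TreeMap<>(); // sorted for predictable iteration order
        String dottedPrefix = prefix + ".";
        for (String name : properties.stringPropertyNames()) {
            if (name.startsWith(dottedPrefix)) {
                String keyId = name.substring(dottedPrefix.length());
                if (!keyId.isEmpty()) {
                    keys.put(keyId, properties.getProperty(name));
                }
            }
        }
        return keys;
    }
}
```

With properties such as `example.key.Key1=...` and `example.key.Key2=...` (made-up names), calling `readKeys(properties, "example.key")` would yield a two-entry map keyed by `Key1` and `Key2`, which is the shape needed to look up the right key when reading older records.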
@alopresto @YolandaMDavis just a quick update on my review... I think the code looks good and am a +1 on that front. I had no problem getting this up & running and things seem to work well. I just wanted to verify some corner cases like running out of disk space, kill -9 against NiFi, etc. and ensure that NiFi is still able to restart, as this is an issue that some have hit in the past. Otherwise all looks good to me. Thanks!
@alopresto Ran a test where I had an existing unencrypted provenance repo and switched to the encrypted provider. The issues you've documented did occur (switching between configurations led to errors in the log and when querying). However, I'm wondering two things:
If clearing the provenance repo when changing configurations would resolve the issue, can that step be included in the documentation (if needed, I would recommend including that step in the issues section)?
@YolandaMDavis Thanks Yolanda.
During most of my testing, I did clear the repositories between application starts when switching the implementation (
I will re-evaluate those scenarios as soon as I finish reviewing PR 1712 for Bryan.
I do think suppressing the stacktrace and providing a more descriptive error is a good idea and will tackle that as well. If it is determined there is a simple reason your queries were not working, great. If not, adding documentation instructing repository removal may be necessary.
I can reproduce your described issue. I believe the difference is that when I tested and did not clear the existing provenance repository, I was switching between
added a commit
this pull request
May 1, 2017
pushed a commit
this pull request
May 1, 2017
@alopresto ran tests with the following corner cases:
Also validated that switching between EWAPR and PPR (and vice versa), along with removal of the content, flowfile and provenance repos, prevents the ClassCastException seen previously.
will merge shortly
I rebased and resolved the conflict. There were test errors in