NIFI-3594 Encrypted provenance repository implementation #1686
This is a big PR, and there is some helpful information before delving into the code.
What is it?
How does it work?
The code provides more details, and I plan to write extensive documentation for the Admin Guide and User Guide (NIFI-3721), but this should suffice as an overview.
The fully qualified class
The simplest configuration is below:
Each line defines a key ID and then the Base64-encoded cipher text of a 16 byte IV and wrapped AES-128, AES-192, or AES-256 key depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the master key defined by
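The key file format described above can be illustrated with a small round trip. This is a minimal sketch, not NiFi's FileBasedKeyProvider: the `keyId=Base64(IV || wrapped key)` line layout, the 128-bit GCM tag length, and the class name `KeyFileSketch` are assumptions made for illustration. It wraps a raw AES key with a master key using AES/GCM and a 16-byte IV, then parses the line back into a key map.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; the "keyId=Base64(IV || wrapped key)" line layout and
// the 128-bit GCM tag length are assumptions made for illustration.
final class KeyFileSketch {
    private static final int IV_LENGTH = 16;     // 16-byte IV, per the description above
    private static final int GCM_TAG_BITS = 128; // assumed authentication tag length

    private KeyFileSketch() {}

    /** Produces one key file line by wrapping {@code key} with {@code masterKey}. */
    static String wrapLine(String keyId, SecretKey key, SecretKey masterKey) {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        byte[] wrapped = crypt(Cipher.ENCRYPT_MODE, masterKey, iv, key.getEncoded());
        byte[] payload = new byte[IV_LENGTH + wrapped.length];
        System.arraycopy(iv, 0, payload, 0, IV_LENGTH);
        System.arraycopy(wrapped, 0, payload, IV_LENGTH, wrapped.length);
        return keyId + "=" + Base64.getEncoder().encodeToString(payload);
    }

    /** Parses key file lines and unwraps each key with {@code masterKey}. */
    static Map<String, SecretKey> readKeys(List<String> lines, SecretKey masterKey) {
        Map<String, SecretKey> keys = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // The first '=' ends the key ID; any later '=' is Base64 padding
            int separator = trimmed.indexOf('=');
            if (separator <= 0) {
                throw new IllegalArgumentException("Malformed key line: " + trimmed);
            }
            byte[] payload = Base64.getDecoder().decode(trimmed.substring(separator + 1));
            if (payload.length <= IV_LENGTH) {
                throw new IllegalArgumentException("Key line too short: " + trimmed.substring(0, separator));
            }
            byte[] iv = Arrays.copyOfRange(payload, 0, IV_LENGTH);
            byte[] wrapped = Arrays.copyOfRange(payload, IV_LENGTH, payload.length);
            byte[] raw = crypt(Cipher.DECRYPT_MODE, masterKey, iv, wrapped);
            keys.put(trimmed.substring(0, separator), new SecretKeySpec(raw, "AES"));
        }
        return keys;
    }

    // A fresh Cipher per call, so an IV is never reused on the same instance
    private static byte[] crypt(int mode, SecretKey masterKey, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, masterKey, new GCMParameterSpec(GCM_TAG_BITS, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/GCM key wrapping failed", e);
        }
    }
}
```

Because GCM authenticates the wrapped key, unwrapping with the wrong master key fails outright instead of silently yielding a garbage key.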
Once the repository is initialized, all provenance event record write operations are serialized according to the configured schema writer (
On record read, the process is reversed. The encryption metadata is parsed and used to decrypt the serialized bytes, which are then deserialized into a
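The write/read cycle described above (serialize, encrypt, prepend metadata; then parse metadata, decrypt, deserialize) can be sketched for the byte-level step. This is an illustration only: the header layout `[version][keyIdLength][keyId][IV][ciphertext+tag]` and the class name `RecordCipherSketch` are hypothetical, not the encryption metadata format used in this PR. The serialized record is treated as an opaque byte array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical header layout (an assumption, not NiFi's actual format):
//   [version:1][keyIdLength:1][keyId bytes][iv:16][ciphertext + GCM tag]
final class RecordCipherSketch {
    static final byte VERSION = 1;
    private static final int IV_LENGTH = 16;
    private static final int GCM_TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private RecordCipherSketch() {}

    /** Encrypts already-serialized record bytes and prepends the metadata header. */
    static byte[] encrypt(byte[] serializedRecord, String keyId, SecretKey key) {
        byte[] keyIdBytes = keyId.getBytes(StandardCharsets.UTF_8);
        if (keyIdBytes.length > 255) {
            throw new IllegalArgumentException("Key ID too long");
        }
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv); // fresh IV for every record
        byte[] cipherText = crypt(Cipher.ENCRYPT_MODE, key, iv, serializedRecord);
        return ByteBuffer.allocate(2 + keyIdBytes.length + IV_LENGTH + cipherText.length)
                .put(VERSION)
                .put((byte) keyIdBytes.length)
                .put(keyIdBytes)
                .put(iv)
                .put(cipherText)
                .array();
    }

    /** Parses the header, looks up the key by ID, and returns the serialized bytes. */
    static byte[] decrypt(byte[] encrypted, Map<String, SecretKey> keys) {
        ByteBuffer buffer = ByteBuffer.wrap(encrypted);
        byte version = buffer.get();
        if (version != VERSION) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
        byte[] keyIdBytes = new byte[buffer.get() & 0xFF];
        buffer.get(keyIdBytes);
        String keyId = new String(keyIdBytes, StandardCharsets.UTF_8);
        SecretKey key = keys.get(keyId);
        if (key == null) {
            throw new IllegalArgumentException("Unknown key ID: " + keyId);
        }
        byte[] iv = new byte[IV_LENGTH];
        buffer.get(iv);
        byte[] cipherText = new byte[buffer.remaining()];
        buffer.get(cipherText);
        return crypt(Cipher.DECRYPT_MODE, key, iv, cipherText);
    }

    private static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Record encryption/decryption failed", e);
        }
    }
}
```

Storing the key ID with each record is what lets a reader pick the right key when several keys are configured, so older records stay readable after the active key changes.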
Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted provenance repository. The Provenance Query operations work as expected with no change to the process.
While there is an obvious performance cost to cryptographic operations, I tried to minimize the impact and to estimate how this implementation performs compared to the existing behavior.
In general, with a low volume of flow events, the performance impact is not noticeable; it is perfectly in line with
With a much higher volume of events, the impact is felt in two ways. First, flow throughput is lower, because more resources are dedicated to encrypting and serializing the events (note the total events processed and the events per second). Second, provenance queries are slightly slower than the original implementation (1% to 17%), and significantly slower than the new
…der with 2 implementations, Encryptor skeleton, and exceptions/utilities). Moved tests to the proper path.
… Pausing to re-evaluate because work may need to be done at a lower level (EventWriter/EventReader: byte/Object serialization).
Added validity checks for algorithm and version in AESProvenanceEventEncryptor. Added unit tests.
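A validity check of this kind can be sketched as a guard that runs before any cryptographic work. This is a hypothetical illustration, not the code in AESProvenanceEventEncryptor: the class name `EncryptionParameterValidator` and the accepted values `AES/GCM/NoPadding` and `v1` are assumptions. It requires Java 9+ for `Set.of`.

```java
import java.util.Set;

// Hypothetical sketch: reject unsupported algorithm/version values up front.
// The accepted values below are assumptions for illustration only.
final class EncryptionParameterValidator {
    private static final Set<String> SUPPORTED_ALGORITHMS = Set.of("AES/GCM/NoPadding");
    private static final Set<String> SUPPORTED_VERSIONS = Set.of("v1");

    private EncryptionParameterValidator() {}

    static boolean isValidAlgorithm(String algorithm) {
        // Set.of(...).contains(null) throws NullPointerException, so check null first
        return algorithm != null && SUPPORTED_ALGORITHMS.contains(algorithm);
    }

    static boolean isValidVersion(String version) {
        return version != null && SUPPORTED_VERSIONS.contains(version);
    }

    /** Throws with a descriptive message instead of failing later inside the cipher. */
    static void validate(String algorithm, String version) {
        if (!isValidAlgorithm(algorithm)) {
            throw new IllegalArgumentException("Unsupported encryption algorithm: " + algorithm);
        }
        if (!isValidVersion(version)) {
            throw new IllegalArgumentException("Unsupported encryption version: " + version);
        }
    }
}
```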
…cryption. Added nifi.provenance.repository.encryption.key to default sensitive keys and updated unit tests and test resources. Added method to correctly calculate the protected percentage of sensitive keys (unpopulated keys are no longer counted against the protection percentage).
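The corrected percentage calculation can be sketched as follows: only sensitive keys that actually have a value count toward the denominator. The class name `SensitivePropertyStats`, the `protectedKeys` parameter, and the choice to report 0 when nothing is populated are assumptions for illustration, not the NiFiProperties implementation. It requires Java 8+.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: percentage of *populated* sensitive properties that are protected.
final class SensitivePropertyStats {
    private SensitivePropertyStats() {}

    static int percentProtected(Map<String, String> properties,
                                List<String> sensitiveKeys,
                                Set<String> protectedKeys) {
        // Unpopulated (missing or blank) sensitive keys are excluded from the denominator
        List<String> populated = sensitiveKeys.stream()
                .filter(key -> {
                    String value = properties.get(key);
                    return value != null && !value.trim().isEmpty();
                })
                .collect(Collectors.toList());
        if (populated.isEmpty()) {
            return 0; // nothing to protect; reporting 0 here is an assumption
        }
        long protectedCount = populated.stream().filter(protectedKeys::contains).count();
        return (int) Math.round(100.0 * protectedCount / populated.size());
    }
}
```

For example, with four sensitive keys of which two are populated and one of those is protected, this reports 50% rather than the 25% obtained by dividing by all four keys.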
Moved getBestEventIdentifier() from StandardProvenanceEventRecord to ProvenanceEventRecord interface and added delegate in all implementations to avoid ClassCastException from multiple classloaders. Initialized IV before cipher to suppress unnecessary warnings. Added utility method to read encrypted provenance keys from key provider file. Suppressed logging of event record details in LuceneEventIndex. Added logic to create EncryptedSchemaRecordReader (if supported) in RecordReaders. Cleaned up EncryptedSchemaRecordReader and EncryptedSchemaRecordWriter. Added keyProvider, recordReaderFactory, and recordWriterFactory initialization to EncryptedWriteAheadProvenanceRepository to provide complete interceptor implementation. Added logic to RepositoryConfiguration to load encryption-related properties if necessary. Refactored WriteAheadProvenanceRepository to allow subclass implementation. Registered EncryptedWAPR in ProvenanceRepository implementations. Added unit tests for EWAPR. Added new nifi.properties keys for encrypted provenance repository.
@alopresto one thought regarding the StaticKeyProvider... rather than having a .key= property, perhaps it would make sense to use a .key.<key_id>= property scheme instead. This would allow multiple key IDs to be used, which would resolve the 'Potential Issue' listed above of not allowing the key to rotate, with rather minimal changes to the code?
Refactored StaticKeyProvider and FileBasedKeyProvider to reduce duplicate code. Added helper methods in NiFiProperties to read multiple key definitions for StaticKeyProvider. Fixed undetected NPE in tests (storing null value into properties). Added unit tests.
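Reading multiple key definitions under the suggested `.key.<key_id>=` scheme amounts to a prefix scan over the properties. This is a minimal sketch with a hypothetical class name `MultiKeyProperties`; it is not the helper added to NiFiProperties, and the property values in the usage are placeholders. A real scheme would also need to exclude any unrelated property names that happen to share the prefix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the ".key.<key_id>=" scheme suggested in the review comment.
final class MultiKeyProperties {
    private MultiKeyProperties() {}

    /** Returns keyId -> key value for every non-blank "<baseName>.<keyId>" property. */
    static Map<String, String> readKeys(Properties props, String baseName) {
        String prefix = baseName + ".";
        Map<String, String> keys = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            // The bare base property ("<baseName>=") does not match the "<baseName>." prefix
            if (!name.startsWith(prefix) || name.length() == prefix.length()) {
                continue;
            }
            String value = props.getProperty(name);
            if (value != null && !value.trim().isEmpty()) {
                keys.put(name.substring(prefix.length()), value.trim());
            }
        }
        return keys;
    }
}
```

Usage: with `nifi.provenance.repository.encryption.key.K1` and `...key.K2` defined, `readKeys(props, "nifi.provenance.repository.encryption.key")` yields a map with entries for `K1` and `K2`, ignoring blank values.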
I had to make some substantial changes to enable multiple keys for
This should generate a key map of
@alopresto @YolandaMDavis just a quick update on my review... I think the code looks good and am a +1 on that front. I had no problem getting this up and running, and things seem to work well. I just wanted to verify some corner cases, such as running out of disk space, kill -9 against NiFi, etc., and ensure that NiFi is still able to restart, as this is an issue that some have hit in the past. Otherwise all looks good to me. Thanks!
@alopresto Ran a test where I had an existing unencrypted provenance repo and switched to the encrypted provider. The issues you've documented did occur (switching between configurations led to errors in the log and when querying). However, I'm wondering two things:
If clearing the provenance repo when changing configurations resolves the issue, can that step be included in the documentation? (If needed, I would recommend including that step in the issues section.)
@YolandaMDavis Thanks Yolanda.
During most of my testing, I did clear the repositories between application starts when switching the implementation (
I will re-evaluate those scenarios as soon as I finish reviewing PR 1712 for Bryan.
I do think suppressing the stack trace and providing a more descriptive error is a good idea, and I will tackle that as well. If there turns out to be a simple reason your queries were not working, great. If not, adding documentation that instructs users to remove the repository may be necessary.
I can reproduce the issue you described. I believe the difference is that when I tested and did not clear the existing provenance repository, I was switching between
…o Admin Guide and User Guide. Added a screenshot of encrypted provenance repository contents on disk. Added a note about clearing the existing provenance repository when switching to the encrypted implementation (see the comment on PR #1686). This closes #1713. Signed-off-by: Andy LoPresto <email@example.com>
@alopresto ran tests with the following corner cases:
Also validated that switching between EWAPR and PPR (and vice versa), along with removing the content, flowfile, and provenance repositories, prevents the ClassCastException seen previously.
Will merge shortly.
I rebased and resolved the conflict. There were test errors in