Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add segment encryption on Controller based on table config #5617

Merged

Conversation

sajjad-moradi
Copy link
Contributor

@sajjad-moradi sajjad-moradi commented Jun 25, 2020

Description

Currently if one wants to have their segments encrypted on Pinot cluster, they need to encrypt the segments before upload and then on the upload http request, CRYPTER header with crypter class name as the value needs to be set. In this PR, encryption is added on Controller side by introducing an optional config crypterClassName int table config. In UploadSegment path of Controller's REST API, this config is retrieved and if its value is not null, it triggers segment encryption using the specified crypter.
Note that to have less pressure on ZK for retrieving table config on each segment upload, an existing cache for table configs, TableCache in HelixResourceManager is being used.

Testing Done

Ran Controller, Server, and Broker locally with the new changes and verified the desired behavior in different scenarios for uploading segments:

  1. CRYPTER header set to NoOpPinotCrypter
  2. crypterClassName config set to NoOpPinotCrypter
  3. Both 1 and 2
  4. Neither

Also in each scenario, checked the value of crypter property of the uploaded segment ZK metadata.

@Jackie-Jiang
Copy link
Contributor

I would recommend putting this under the SegmentsValidationAndRetentionConfig instead of creating another top level config.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@npawar should the new crypter class name go into ingestion config (because this is related to ingestion)? Or, as @Jackie-Jiang recommends, in the segment validation and retention?

return _tableConfigChangeListener._tableConfigMap.get(canonicalize(tableName));
}

private String canonicalize(String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Are you planning to add additional canonicalization logic? If not, we are substituting one method for another, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If latter, probably not worth creating another layer of method call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using canonicalize explicitly states the reason why we're making the those strings lower case which IMO makes the code more maintainable, but since you both think otherwise, I'm reverting it.

decryptFile(crypterClassNameInHeader, tempEncryptedFile, tempDecryptedFile);
}

// TODO: Change when metadata upload added
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create an issue for this and reference the issue number? It is not clear what needs to be done here. I am guessing that the change is that if metadata is added, then we don't need to extract metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why TODO is added there. I looked at it git history and it's added on this PR: 3155. If you (or other ppl) can confirm the intention there, i'll open up an issue for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through some history. I believe it is trying to refer to the case when the client (segment pusher) does the following:
(1) Upload segment to deepstore
(2) upload metadata to controller so that it can update ZK.

We don't support this pattern. When a URI is uploaded, we actually download the segment and extract metadata instead.

A lot of code needs to change when we support metata-only upload, so let us just get rid of this TODO. thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, we will need the support for metadata push pretty soon. But that is somewhat unrelated here, so just an FYI at this point in time.


boolean segmentNeedsEncryption = !Strings.isNullOrEmpty(crypterClassName);
if (segmentNeedsEncryption && uploadedSegmentIsEncrypted) {
throw new ControllerApplicationException(LOGGER, String.format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this an exception? Should this just not be a case where encryption is not needed? Why are we explicitly preventing the client from encrypting the segment over the wire?
I would suggest the following:
(1) Check the crypter class name. If they are the same in the header and tableconfig, then accept the segment.
(2) If they are not same, we have a few choices: Accept the segment after bumping a metric (and a warn log), or reject the segment with an exception. The problem with rejecting the segment is if the crypter classes are some extension of one another, and actually work fine, then we may be unnecessarily rejecting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Refactored per your suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with rejecting the segment option. If we encounter users having same crypter with different names, it'll be a small refactor to allow that change.


// get crypter class name from table config
String crypterClassName = _pinotHelixResourceManager.getCrypterClassNameFromTableConfig(offlineTableName);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about returning right here if segment does not need encryption? The logic is easier to follow then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, its better to always go through the same path and just make the crypter no-op

LOGGER.info("Using crypter class {}", pinotCrypter.getClass().getName());
private void decryptFile(String crypterClassName, File tempEncryptedFile, File tempDecryptedFile) {
PinotCrypter pinotCrypter = PinotCrypterFactory.create(crypterClassName);
LOGGER.info("Using crypter class {} for decryption.", pinotCrypter.getClass().getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the file names to this log, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@npawar
Copy link
Contributor

npawar commented Jun 25, 2020

@npawar should the new crypter class name go into ingestion config (because this is related to ingestion)? Or, as @Jackie-Jiang recommends, in the segment validation and retention?

I think SegmentSConfig is a better place than ingestionConfig. This is more like a segment property than a "how to ingest" property.
It definitely shouldn't be a top level property

return _tableConfigChangeListener._tableConfigMap.get(canonicalize(tableName));
}

private String canonicalize(String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If latter, probably not worth creating another layer of method call.

@@ -303,46 +335,37 @@ private String getZkDownloadURIForSegmentUpload(String rawTableName, String segm
}
}

private SegmentMetadata getMetadataForURI(String crypterClassHeader, String currentSegmentLocationURI,
File tempEncryptedFile, File tempDecryptedFile, File tempSegmentDir, String metadataProviderClass)
private void getSegmentFileFromURI(String currentSegmentLocationURI, File destFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A get method returning void seems odd to me. Probably s/get/download?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍 Refactored.

@@ -251,6 +247,20 @@ private SuccessResponse uploadSegment(@Nullable String tableName, FormDataMultiP
_controllerMetrics, _leadControllerManager.isLeaderForTable(offlineTableName))
.validateOfflineSegment(offlineTableName, segmentMetadata, tempSegmentDir);

Pair<Boolean, String> encryptionInfo =
encryptSegmentIfNeeded(offlineTableName, tempDecryptedFile, tempEncryptedFile, uploadedSegmentIsEncrypted);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crypterClassName should be an input to this method, not a return value? This method should just return the boolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@kishoreg
Copy link
Member

I would recommend putting this under the SegmentsValidationAndRetentionConfig instead of creating another top-level config.

can we rename 'SegmentsValidationAndRetentionConfig' to segmentConfig?

@sajjad-moradi sajjad-moradi force-pushed the feature/encryption.in.controller branch from 2ec6fab to 190f3f3 Compare June 30, 2020 00:56
@sajjad-moradi
Copy link
Contributor Author

I would recommend putting this under the SegmentsValidationAndRetentionConfig instead of creating another top-level config.

can we rename 'SegmentsValidationAndRetentionConfig' to segmentConfig?

I tried to do that, but it added a lot of noise to the PR, so I decided to leave the renaming to another PR. I'll create a separate PR for it after this one goes in.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in uploadSegment is quite complicated, so needs a unit test i believe, with all the different combinations tested for correctness.

It will be awesome if you can re-factor the logic into a method that can then be tested independently for each combo (inbound segment encrypted, with same/different class, table config has/has not the encryption, whatever).

And add a unit test to test all these combinations.


// TODO: Change when metadata upload added
String metadataProviderClass = DefaultMetadataExtractor.class.getName();
boolean uploadedSegmentIsEncrypted = !Strings.isNullOrEmpty(crypterClassNameInHeader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
boolean uploadedSegmentIsEncrypted = !Strings.isNullOrEmpty(crypterClassNameInHeader);
boolean isUploadedSegmentEncrypted = !Strings.isNullOrEmpty(crypterClassNameInHeader);

decryptFile(crypterClassNameInHeader, tempEncryptedFile, tempDecryptedFile);
}

// TODO: Change when metadata upload added
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through some history. I believe it is trying to refer to the case when the client (segment pusher) does the following:
(1) Upload segment to deepstore
(2) upload metadata to controller so that it can update ZK.

We don't support this pattern. When a URI is uploaded, we actually download the segment and extract metadata instead.

A lot of code needs to change when we support metata-only upload, so let us just get rid of this TODO. thanks


if (uploadedSegmentIsEncrypted) {
if (!crypterClassNameInTableConfig.equalsIgnoreCase(crypterUsedInUploadedSegment)) {
throw new ControllerApplicationException(LOGGER, String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add segment name and table name to the exception string

}

if (uploadedSegmentIsEncrypted) {
if (!crypterClassNameInTableConfig.equalsIgnoreCase(crypterUsedInUploadedSegment)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!crypterClassNameInTableConfig.equalsIgnoreCase(crypterUsedInUploadedSegment)) {
if (!crypterClassNameInTableConfig.equals(crypterUsedInUploadedSegment)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -290,6 +297,34 @@ private String extractHttpHeader(HttpHeaders headers, String name) {
return value;
}

private Boolean encryptSegmentIfNeeded(String offlineTableName, File tempDecryptedFile, File tempEncryptedFile,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private Boolean encryptSegmentIfNeeded(String offlineTableName, File tempDecryptedFile, File tempEncryptedFile,
private boolean encryptSegmentIfNeeded(String offlineTableName, File tempDecryptedFile, File tempEncryptedFile,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored.

@sajjad-moradi sajjad-moradi force-pushed the feature/encryption.in.controller branch from 190f3f3 to 055e47a Compare June 30, 2020 22:11

// encrypt segment
PinotCrypter pinotCrypter = PinotCrypterFactory.create(crypterClassNameInTableConfig);
LOGGER.info("Using crypter class {} for encrypting {} to {}.", crypterClassNameInTableConfig, tempDecryptedFile,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add table name and segment name here. In production, we will see multiple parallel pushes going on, and it is useful to determine which segment/table we are talking about.

segmentMetadata = getSegmentMetadata(crypterClassHeader, tempEncryptedFile, tempDecryptedFile, tempSegmentDir,
metadataProviderClass);
return segmentMetadata;
LOGGER.info("Downloading segment from {} to {}", currentSegmentLocationURI, destFile.getAbsolutePath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that the URI tells us segment and table name here, otherwise, it is useful to add table/segment name i this log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same way, but I added the table name to the logs anyways. For the segment, it is not available here. Segment needs to be downloaded first and then its name can be retrieved from the segment metadata.

@@ -98,7 +102,7 @@ public synchronized void refresh() {
try {
TableConfig tableConfig = TableConfigUtils.fromZNRecord(znRecord);
String tableNameWithType = tableConfig.getTableName();
_tableConfigMap.put(tableNameWithType, tableConfig);
_tableConfigMap.put(tableNameWithType.toLowerCase(), tableConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we lower case here. Can you elaborate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted. I was following the pattern with column and table name in this class.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the log message

Co-authored-by: Subbu Subramaniam <mcvsubbu@users.noreply.github.com>
@mcvsubbu mcvsubbu merged commit 40a3152 into apache:master Jul 1, 2020
@kishoreg
Copy link
Member

kishoreg commented Jul 2, 2020

is Crypter part of the pinot-spi? Should we treat this similar to other plugins?

@@ -177,13 +179,13 @@ public Response downloadSegment(
private SuccessResponse uploadSegment(@Nullable String tableName, FormDataMultiPart multiPart,
boolean enableParallelPushProtection, HttpHeaders headers, Request request, boolean moveSegmentToFinalLocation) {
String uploadTypeStr = null;
String crypterClassName = null;
String crypterClassNameInHeader = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to pass the crypterName instead of the classname right? Its ok to have the ability to override it with a classname but it weird to have users pass classname in the request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're talking about the difference between fully qualified name vs just the name of the crypter, then how can we differentiate two crypters which have the same name but different packages?

segmentMetadata =
getMetadataForURI(crypterClassName, downloadUri, tempEncryptedFile, tempDecryptedFile, tempSegmentDir,
metadataProviderClass);
downloadSegmentFileFromURI(downloadUri, dstFile, tableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we download the segment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downloading logic was there before. I just renamed the functions.
Basically if the download uri, instead of the segment data, is provided, controllers download the segment from the given uri.

break;
default:
throw new UnsupportedOperationException("Unsupported upload type: " + uploadType);
}

if (uploadedSegmentIsEncrypted) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to always go through the crypter with noopcrypter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, the reader needs to know the assumption that there's always a default noopcrypter which gets used when there's no encryption involved. IMO using the proposed approach, it's more explicit and reads better - if the segment is encrypted, then decryption needs to happen.

@@ -251,6 +246,17 @@ private SuccessResponse uploadSegment(@Nullable String tableName, FormDataMultiP
_controllerMetrics, _leadControllerManager.isLeaderForTable(offlineTableName))
.validateOfflineSegment(offlineTableName, segmentMetadata, tempSegmentDir);

// Encrypt segment
String crypterClassNameInTableConfig =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this making a ZK call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it uses an existing TableCache which gets updated upon changes.


// get crypter class name from table config
String crypterClassName = _pinotHelixResourceManager.getCrypterClassNameFromTableConfig(offlineTableName);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, its better to always go through the same path and just make the crypter no-op

@kishoreg
Copy link
Member

kishoreg commented Jul 2, 2020

I would recommend putting this under the SegmentsValidationAndRetentionConfig instead of creating another top level config.

Why is this the right place for this config. IMO, SegmentsValidationAndRetentionConfig should be broken into multiple sections.

@Jackie-Jiang
Copy link
Contributor

I would recommend putting this under the SegmentsValidationAndRetentionConfig instead of creating another top level config.

Why is this the right place for this config. IMO, SegmentsValidationAndRetentionConfig should be broken into multiple sections.

That's why I added a TODO in this config: TODO: Consider break this config into multiple configs
At least we can rename it to SegmentsConfig.
But I don't think this is in the scope of this PR.

@sajjad-moradi
Copy link
Contributor Author

is Crypter part of the pinot-spi?

It is part of pinot-spi.

Should we treat this similar to other plugins?

I'm not sure. I'll let more experienced ppl to chime in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants