Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SystemAccessControl.getTableColumnMasks SPI function and OPA implementation #21997

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mosiac1
Copy link
Contributor

@mosiac1 mosiac1 commented May 16, 2024

Description

Addresses issues raised in #21359.

Adds the getTableColumnMasks function to the SPI

Map<ColumnSchema, ViewExpression> getTableColumnMasks(ConnectorSecurityContext context, SchemaTableName tableName, List<ColumnSchema> columns);

(which defaults to calling getColumnMask repeatedly) that SystemAcessControl/ConnectorAccessControl plugins can implement to process column mask requests in batches.

Updates the StatementAnalyzer and AccessControl to use this batched method of getting column masks whenever possible (which is all cases).

The issue linked above goes into the performance implications.

This PR updates the OPA plugin to implement the new SPI function with 2 modes of operation:

  • Sending requests to fetch each column mask in parallel; This is a drop-in improvement that doesn't require any config/Rego changes.
  • Using a new batchColumnMasksUri OPA Server endpoint that delegates collection of column masks to the Rego policy code; This requires changes to the Rego code running on the OPA server and the Trino configs.

For the latter mode, the following Rego rule can be added (assuming the columnMask rule is already implemented) to quickly add support for batched column masks. This approach isn't particularly efficient (the with operator in Rego can be costly) and I haven't had the time to benchmark.

batchColumnMasks contains {
   "index": i,
   "viewExpression": cm
} if {
  some i
  cm := columnMask with input.action.resource.column as input.action.filterResources[i].
}

Later edit: did some testing on using the above "trick" - it is very slow (column masks for 1k columns took on avg. 16 seconds to compute). So DO NOT USE THIS IN PRODUCTION. The parallel mode (which this PR enables by default) is much faster.

Additional context and related issues

#21359

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Comment on lines 956 to 972
return columns.stream()
.map(c -> Map.entry(c, getColumnMask(context, tableName, c.getName(), c.getType())))
.filter(entry -> entry.getValue().isPresent())
.collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().get()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should have a default that calls getColumnMask, looking at other function in this interface none of them do this. Should this logic be left to another class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the default should be to call the old method
the engine should call the new method only
the old method should be deprecated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! do we want to remove getColumnMask from the AccessControl interface? that would be the best way to ensure the engine is not using it anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, remove the supposed-to-be-unused method from the interface defined in trino-main

deprecation is needed only for trino-spi's interfaces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the function from trino-main's interfaces, let me know if missed some.

Do we want to remove getColumnMask's implementation from the FileBasedAccessControl and OpaAccessControl implementation? (or others if you have any in mind)

Comment on lines 956 to 972
return columns.stream()
.map(c -> Map.entry(c, getColumnMask(context, tableName, c.getName(), c.getType())))
.filter(entry -> entry.getValue().isPresent())
.collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().get()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the default should be to call the old method
the engine should call the new method only
the old method should be deprecated

Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Let's cleanup the commits as follows:

  • Squash Deprecate getColumnMask from SPI interfaces into Add access control SPI function for table column masks. The deprecation should be mentioned in the commit message body.
  • Fixup Remove getColumnMask from the AccessControl interface into Add use table column masks SPI function for access control in core and change title to say something like change access control to fetch column masks in bulk
  • Fixup Add FBAC tests for getTableColumnMasks into the commit that changed file based access control (no need to mention you added tests in the commit message/body)
  • For the other two commits change the commit message to say something like "Implement column mask bulk fetch api" or "convert to ...", so it is clear what changed

Comment on lines 2374 to 2375
field.getName().ifPresent(
fieldName -> columnSchemaBuilder.add(ColumnSchema.builder().setName(fieldName).setType(field.getType()).setHidden(field.isHidden()).build()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reformat this so the columnschema builder is split across multiple lines, like this:

Suggested change
field.getName().ifPresent(
fieldName -> columnSchemaBuilder.add(ColumnSchema.builder().setName(fieldName).setType(field.getType()).setHidden(field.isHidden()).build()));
field.getName().ifPresent(fieldName -> columnSchemaBuilder.add(ColumnSchema.builder()
.setName(fieldName)
.setType(field.getType())
.setHidden(field.isHidden())
.build()));

analyzeColumnMask(session.getIdentity().getUser(), table, name, field, accessControlScope, mask.get());
}
Map<ColumnSchema, ViewExpression> masks = accessControl.getTableColumnMasks(session.toSecurityContext(), name, columnSchemas);
for (Map.Entry<ColumnSchema, ViewExpression> entry : masks.entrySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would iterate over columnSchemas instead of the returned map incase connector returns extra junk columns... just a bit safer

@@ -739,9 +741,17 @@ public List<ViewExpression> getRowFilters(SystemSecurityContext context, Catalog
@Override
public Optional<ViewExpression> getColumnMask(SystemSecurityContext context, CatalogSchemaTableName tableName, String columnName, Type type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to drop this method

@@ -125,4 +126,18 @@ public OpaConfig setOpaColumnMaskingUri(URI opaColumnMaskingUri)
this.opaColumnMaskingUri = Optional.ofNullable(opaColumnMaskingUri);
return this;
}

@NotNull
public Optional<URI> getOpaBatchColumnMaskingUri()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on supporting non-batched column masks in the long run. With this PR Trino will always ask for a batch of masks, do you want to support doing a call per column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't be supporting single column masks. I do think this should be Optional so we don't break current configurations; the plugin will query opaColumnMaskingUri in parallel to get each mask. Admins can set this once they implement batching in Rego.

Comment on lines 171 to 173
return result.result().stream()
.map(resultItem -> Map.entry(columns.get(resultItem.index()), resultItem.viewExpression()))
.collect(toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit redundant:

Suggested change
return result.result().stream()
.map(resultItem -> Map.entry(columns.get(resultItem.index()), resultItem.viewExpression()))
.collect(toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));
return result.result().stream()
.collect(toImmutableMap(item -> columns.get(item.index(), item-> item.viewExpression()));

Comment on lines 29 to 30
public record OpaBatchColumnMaskQueryResultItem(int index, OpaViewExpression viewExpression)
{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT the {} can be on the same line as the record declaration

@github-actions github-actions bot added the hive Hive connector label Jun 7, 2024
@mosiac1 mosiac1 force-pushed the trinodb/column-masks branch 2 times, most recently from 711ee6a to 9aba72b Compare June 7, 2024 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

None yet

3 participants