Skip to content

[Spark] [Authz] New Authz Plan Serde Layer#3904

Closed
yaooqinn wants to merge 13 commits intoapache:masterfrom
yaooqinn:na
Closed

[Spark] [Authz] New Authz Plan Serde Layer#3904
yaooqinn wants to merge 13 commits intoapache:masterfrom
yaooqinn:na

Conversation

@yaooqinn
Copy link
Copy Markdown
Member

@yaooqinn yaooqinn commented Dec 5, 2022

Why are the changes needed?

This PR redesigned the authorization part of the spark authz module with a New Authz Plan Serde Layer.

Motivation

  • add a general layer to describe a command, so that we can add a new command or users can add a third-party command easily according to the specification.
  • get rid of the spark version check. The built-in spark commands frequently vary from version to version, which makes us hard to maintain at compile& runtime phase, and the third-party commands are hard to check by spark versions.

Data structure

image

Overall, we introduce 2 general basic data structures:

  • CommandSpec: used to describe a command
    • classname as key for the read-side to get the spec by a particular command
    • pre-defined operation type
    • descriptors
  • Descriptor: used to describe an object, such as table, db, query,
    • fieldName: the object to get
    • fieldExtractor: the method to get the object; use SPI to load
    • sub-descriptors: such as columns in a table
    • etc.

SPI

  • Extractor: implementations for fieldExtractor
    • key: the name of the extractor for the read-side to get itself
    • func: converting the field value to specific and general objects

Code Path

  • Write code path
    • automatically generated default json configuration files
    • custom json configuration files for thrid-party commands
  • Read code path
    • Load json as maps
    • RuleAuthorization -> PrivilegeBuilder.build -> get command desc from maps -> build privileges with the retrieved desc.

TODOs

  • Add back the ArcticCommand
  • Add delta command
  • Add ways for loading custom json configuration files
  • Add hudi commands
  • etc

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

"fieldExtractor" : "LogicalPlanQueryExtractor"
} ]
}, {
"classname" : "org.apache.spark.sql.catalyst.plans.logical.MergeIntoIcebergTable",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Iceberg's command put in table_command_spec.json used by PrivilegesBuilder, how to extend this list to separate Iceberg support in a single plugin or cover more commands ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not cover this case completely, table_command_spec_custom.json maybe used later to support customize third-party commands

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Is it posible to put table_command_spec.json in META-INF , as a sample to for followup third-party commands plugin to expose the command spec and extractor in the same way as well ?

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Dec 6, 2022

Codecov Report

Merging #3904 (379e933) into master (730fd57) will decrease coverage by 0.12%.
The diff coverage is 88.28%.

❗ Current head 379e933 differs from pull request most recent head efafcba. Consider uploading reports for the commit efafcba to get more accurate results

@@             Coverage Diff              @@
##             master    #3904      +/-   ##
============================================
- Coverage     51.91%   51.78%   -0.13%     
  Complexity       13       13              
============================================
  Files           508      521      +13     
  Lines         28996    28763     -233     
  Branches       3982     3849     -133     
============================================
- Hits          15053    14896     -157     
+ Misses        12518    12501      -17     
+ Partials       1425     1366      -59     
Impacted Files Coverage Δ
...ache/kyuubi/plugin/spark/authz/OperationType.scala 100.00% <ø> (+34.56%) ⬆️
.../plugin/spark/authz/ranger/RuleAuthorization.scala 80.95% <0.00%> (-0.45%) ⬇️
...che/kyuubi/plugin/spark/authz/serde/Function.scala 0.00% <0.00%> (ø)
...gin/spark/authz/serde/functionTypeExtractors.scala 73.07% <73.07%> (ø)
...apache/kyuubi/plugin/spark/authz/serde/Table.scala 75.00% <75.00%> (ø)
.../kyuubi/plugin/spark/authz/serde/CommandSpec.scala 80.00% <80.00%> (ø)
...e/kyuubi/plugin/spark/authz/serde/Descriptor.scala 82.43% <82.43%> (ø)
...ache/kyuubi/plugin/spark/authz/serde/package.scala 85.00% <85.00%> (ø)
.../plugin/spark/authz/serde/functionExtractors.scala 85.71% <85.71%> (ø)
.../plugin/spark/authz/serde/databaseExtractors.scala 90.00% <90.00%> (ø)
... and 24 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@yaooqinn yaooqinn changed the title [WIP][Extension][Spark] New Authz Plan Serde Layer [Spark] New Authz Plan Serde Layer Dec 6, 2022
@yaooqinn
Copy link
Copy Markdown
Member Author

yaooqinn commented Dec 6, 2022

cc @bowenliang123 @ulysses-you @pan3793 PTAL when you have time

val functionExtractor = functionExtractors(fieldExtractor)
var function = functionExtractor(functionVal)
if (function.database.isEmpty) {
function = function.copy(database = databaseDesc.map(_.getValue(v)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why TableDesc. getValue does not need fill database ?

Copy link
Copy Markdown
Contributor

@bowenliang123 bowenliang123 Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think TableDesc.getValue is filling database of Table in TableExtractor, and the database is part of the identifier for resolved table. For functions, the database is missing in some case and we are trying to get database from databaseDesc.

@bowenliang123
Copy link
Copy Markdown
Contributor

bowenliang123 commented Dec 7, 2022

+1, LGTM. And thanks for the effort. Impressively passed the all the existed auth test in CI with in several commits.

It's a great work for redesigning and modernizing Authz core implementation. It significantly improves the pattern and clearification in rule authentication by components in layers of specs, descriptors, and extractors.

The idea and implemented components in resource privilege checking can be also applied to row filtering, column masking, object filtering, and even to other fields like lineage.

@bowenliang123
Copy link
Copy Markdown
Contributor

fyi @jeanlyn

@yaooqinn yaooqinn closed this in 2540f44 Dec 7, 2022
@yaooqinn yaooqinn mentioned this pull request Dec 7, 2022
5 tasks
@ulysses-you
Copy link
Copy Markdown
Contributor

late lgtm

@yaooqinn yaooqinn deleted the na branch December 7, 2022 10:53
@bowenliang123 bowenliang123 changed the title [Spark] New Authz Plan Serde Layer [Spark] [Authz] New Authz Plan Serde Layer Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants