[DOC] Add Documentation for Spark AuthZ Extension
yaooqinn committed Apr 16, 2022
1 parent 852e7fd commit 109440b
Showing 4 changed files with 42 additions and 13 deletions.
1 change: 0 additions & 1 deletion docs/security/authorization/spark/index.rst
@@ -26,4 +26,3 @@ Kyuubi Spark AuthZ Plugin
Overview <overview>
Building <build>
Installing <install>
Authorization Plugin for Spark SQL <authorization>
27 changes: 22 additions & 5 deletions docs/security/authorization/spark/install.md
@@ -22,19 +22,32 @@

## Pre-install

- [Apache Ranger](https://ranger.apache.org/)

This plugin works as a Ranger REST client that talks to the Apache Ranger admin server to perform privilege checks.
Thus, a Ranger server needs to be installed beforehand and be available for use.

- Building(optional)

If your Ranger admin or Spark distribution is not compatible with the official pre-built [artifact](https://mvnrepository.com/artifact/org.apache.kyuubi/kyuubi-spark-authz) in Maven Central,
you need to [build](build.md) the plugin yourself, targeting the Spark/Ranger versions you are using.

## Install

Make the `kyuubi-spark-authz_*.jar` and its transitive dependencies available on the Spark runtime classpath, for example by
- Copying them to `$SPARK_HOME/jars`, or
- Specifying them in the `spark.jars` configuration
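
As a minimal sketch of the second option, the fragment below sets `spark.jars` in `spark-defaults.conf`. The path is a hypothetical placeholder, not a location taken from the Kyuubi documentation, and the version part of the jar name is left as a glob.

```properties
# Hypothetical location of the plugin jar; replace with the real path.
# spark.jars takes a comma-separated list, so transitive dependency jars
# can be appended after the plugin jar.
spark.jars=/path/to/kyuubi-spark-authz_*.jar
```

Copying the jars to `$SPARK_HOME/jars` instead makes them available to every application started from that Spark installation, whereas `spark.jars` can be set per application.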

## Configure

### Settings for Connecting Ranger Admin

Create `ranger-spark-security.xml` in `$SPARK_HOME/conf` and add the following configurations
#### ranger-spark-security.xml
- Create `ranger-spark-security.xml` in `$SPARK_HOME/conf` and add the following configurations
for pointing to the right Ranger admin server.


```xml
<configuration>

<property>
<name>ranger.plugin.spark.policy.rest.url</name>
<value>ranger admin address like http://ranger-admin.org:6080</value>
@@ -63,6 +76,8 @@ for pointing to the right Ranger admin server.
</configuration>
```

#### ranger-spark-audit.xml

Create `ranger-spark-audit.xml` in `$SPARK_HOME/conf` and add the following configurations
to enable/disable auditing.

@@ -104,6 +119,8 @@ to enable/disable auditing.

### Settings for Spark Session Extensions

Add `org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension` to the spark configuration `spark.sql.extensions`.



```properties
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
```
25 changes: 19 additions & 6 deletions docs/security/authorization/spark/overview.md
@@ -25,20 +25,33 @@ storage-based authorization is enabled by default, which only provides file-leve
When row/column-level fine-grained access control is required,
we can enhance the data access model with the Kyuubi Spark AuthZ plugin.

## Authorization in Kyuubi

## Storage-Based Authorization
### Storage-based Authorization

Enabling storage-based authorization in the `Hive Metastore Server` uses HDFS permissions as the main source of verification and allows for a consistent data and metadata authorization policy.
This allows control over metadata access by verifying whether the user has permission to access the corresponding directories on HDFS.
Similar to `HiveServer2`, files and directories are mapped to Hive metadata objects, such as databases, tables, and partitions, and are protected from end users' queries through Kyuubi.
As Kyuubi supports multi-tenancy, a tenant can only access authorized resources,
including computing resources, data, etc.
Most file systems, such as HDFS, support ACL management based on files and directories.

Storage-based authorization offers users database-, table-, and partition-level coarse-grained access control.
A so-called storage-based authorization mode is supported by Kyuubi by default.
In this model, all objects in the metadata layer, such as databases, tables, and partitions, are mapped to folders or files in the storage layer,
along with their permissions.

## SQL-Standard Based Authorization with Ranger
Storage-based authorization offers users database-, table-, and partition-level coarse-grained access control.

### SQL-standard authorization with Ranger

SQL-standard authorization usually offers row/column-level fine-grained access control to meet real-world data security needs.

[Apache Ranger](https://ranger.apache.org/) is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
This plugin gives Kyuubi data and metadata access control for Spark SQL engines, including:

- Column-level fine-grained authorization
- Row-level fine-grained authorization, a.k.a. Row-level filtering
- Data masking

## The Plugin Itself

The Kyuubi Spark AuthZ plugin itself provides general-purpose ACL management for data and metadata when using Spark SQL.
It does not have to be deployed with the Kyuubi server and engine, and can be used as an extension for any Spark SQL job.
However, authorization always requires a robust authentication layer and multi-tenancy support, so Kyuubi is a perfect match.
@@ -26,7 +26,7 @@ import org.apache.kyuubi.plugin.spark.authz.util.RuleEliminateMarker
* <ul>
* <li>Table/Column level authorization(yes)</li>
* <li>Row level filtering(yes)</li>
* <li>Data masking(no)</li>
* <li>Data masking(yes)</li>
* </ul>
*
* To work with Spark SQL, we need to enable it via spark extensions