@dongjoon-hyun dongjoon-hyun commented Aug 2, 2024

What changes were proposed in this pull request?

This PR adds JWSFilter, a servlet filter that requires a JWS (a cryptographically signed JSON Web Token) in the request header. It is enabled via the spark.ui.filters configuration.

  • spark.ui.filters=org.apache.spark.ui.JWSFilter
  • spark.org.apache.spark.ui.JWSFilter.param.key=YOUR-BASE64URL-ENCODED-KEY

Simply put, JWSFilter checks the following for every request.

  • The HTTP request should have Authorization: Bearer <jws> header.
    • <jws> is a string with three fields, <header>.<payload>.<signature>.
    • <header> is supposed to be a base64url-encoded string of {"alg":"HS256","typ":"JWT"}.
    • <payload> is a base64url-encoded string of fully-user-defined content.
    • <signature> is a signature based on <header>.<payload> and a user-provided key parameter.

For example, the value of <header> is always eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9, and the value of <payload> can be e30 if the payload is the empty object {}. JWS uses base64url without padding, so the trailing = that getUrlEncoder() emits for {} is dropped. The <signature> part depends on the value of spark.org.apache.spark.ui.JWSFilter.param.key, which is shared between the server and the client.

jshell> java.util.Base64.getUrlEncoder().encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes())
$2 ==> "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"

jshell> java.util.Base64.getUrlEncoder().encodeToString("{}".getBytes())
$3 ==> "e30="
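
The three fields described above can be assembled with the JDK alone. The following is a minimal sketch, not code from this PR: `JwsSignSketch` and `sign` are hypothetical names, and it assumes the configured key is the base64url encoding of the raw HMAC key bytes. It uses `withoutPadding()` because JWS drops the trailing `=`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper (not part of Spark): builds <header>.<payload>.<signature> for HS256.
final class JwsSignSketch {
    // JWS uses base64url WITHOUT padding.
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    static String sign(String base64UrlKey, String headerJson, String payloadJson) {
        String header = B64URL.encodeToString(headerJson.getBytes(StandardCharsets.UTF_8));
        String payload = B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        try {
            // Assumption: the key parameter decodes (base64url) to the raw HMAC key bytes.
            byte[] keyBytes = Base64.getUrlDecoder().decode(base64UrlKey);
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(keyBytes, "HmacSHA256"));
            byte[] signature = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64URL.encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable or the key is invalid", e);
        }
    }

    public static void main(String[] args) {
        // Example key taken from the SETTING section of this description.
        String key = "VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=";
        System.out.println(sign(key, "{\"alg\":\"HS256\",\"typ\":\"JWT\"}", "{}"));
    }
}
```

The printed token has the same `<header>` and `<payload>` as the one used in the curl examples; whether the signature also matches depends on the key assumption stated above.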

Why are the changes needed?

To provide somewhat better security for the Web UI in a consistent way, including on Spark Standalone clusters.

For example,

SETTING

$ jshell
|  Welcome to JShell -- Version 17.0.12
|  For an introduction type: /help intro

jshell> java.util.Base64.getUrlEncoder().encodeToString("Visit https://spark.apache.org to download Apache Spark.".getBytes())
$1 ==> "VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4="

$ cat conf/spark-defaults.conf
spark.ui.filters org.apache.spark.ui.JWSFilter
spark.org.apache.spark.ui.JWSFilter.param.key VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=
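
The key above is an encoded sentence, which is convenient for a demo. For a real deployment, a random key is the more usual choice. The sketch below is an illustration only (`JwsKeySketch` and `newBase64UrlKey` are hypothetical names): it draws 32 random bytes, since HS256 calls for a key of at least 256 bits, and prints a line in the `spark-defaults.conf` form shown above.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper (not part of Spark): generates a random base64url-encoded key.
final class JwsKeySketch {
    static String newBase64UrlKey(int numBytes) {
        byte[] key = new byte[numBytes];
        new SecureRandom().nextBytes(key);  // cryptographically strong random bytes
        return Base64.getUrlEncoder().encodeToString(key);
    }

    public static void main(String[] args) {
        // 32 bytes = 256 bits, the minimum key size for HS256.
        System.out.println("spark.org.apache.spark.ui.JWSFilter.param.key " + newBase64UrlKey(32));
    }
}
```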

SPARK-SHELL

$ build/sbt package
$ cp jjwt-impl-0.12.6.jar assembly/target/scala-2.13/jars
$ cp jjwt-jackson-0.12.6.jar assembly/target/scala-2.13/jars
$ bin/spark-shell

Without JWS (ErrorCode: 403 Forbidden)

$ curl -v http://localhost:4040/
* Host localhost:4040 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:4040...
* connect to ::1 port 4040 from ::1 port 61313 failed: Connection refused
*   Trying 127.0.0.1:4040...
* Connected to localhost (127.0.0.1) port 4040
> GET / HTTP/1.1
> Host: localhost:4040
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< Date: Fri, 02 Aug 2024 01:27:23 GMT
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 472
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Authorization header is missing.</title>
</head>
<body><h2>HTTP ERROR 403 Authorization header is missing.</h2>
<table>
<tr><th>URI:</th><td>/</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Authorization header is missing.</td></tr>
<tr><th>SERVLET:</th><td>org.apache.spark.ui.JettyUtils$$anon$2-3b39bee2</td></tr>
</table>

</body>
</html>
* Connection #0 to host localhost left intact

With JWS (302 Found, redirect to /jobs/)

$ curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw" http://localhost:4040/
* Host localhost:4040 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:4040...
* connect to ::1 port 4040 from ::1 port 61311 failed: Connection refused
*   Trying 127.0.0.1:4040...
* Connected to localhost (127.0.0.1) port 4040
> GET / HTTP/1.1
> Host: localhost:4040
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw
>
* Request completely sent off
< HTTP/1.1 302 Found
< Date: Fri, 02 Aug 2024 01:27:01 GMT
< Cache-Control: no-cache, no-store, must-revalidate
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: http://localhost:4040/jobs/
< Content-Length: 0
<
* Connection #0 to host localhost left intact
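
The request above is accepted because the signature verifies under the configured key. The PR itself delegates this to the jjwt parser; the JDK-only sketch below is just a conceptual equivalent under the assumption that the key parameter decodes to the raw HMAC key bytes. `JwsVerifySketch`, `hmac`, and `verify` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper (not the filter's implementation): checks an HS256 JWS signature.
final class JwsVerifySketch {
    // Computes HMAC-SHA256 over the ASCII bytes of "<header>.<payload>".
    static byte[] hmac(String base64UrlKey, String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(Base64.getUrlDecoder().decode(base64UrlKey), "HmacSHA256"));
            return mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable or the key is invalid", e);
        }
    }

    static boolean verify(String base64UrlKey, String jws) {
        String[] parts = jws.split("\\.", -1);
        if (parts.length != 3) {
            return false;  // not <header>.<payload>.<signature>
        }
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false;  // signature is not valid base64url
        }
        byte[] expected = hmac(base64UrlKey, parts[0] + "." + parts[1]);
        // Constant-time comparison avoids leaking how many leading bytes matched.
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        String key = "VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=";
        String jws = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw";
        System.out.println(verify(key, jws));
    }
}
```

This sketch only checks the signature. A real parser also validates the header (for example, the `alg` value) and any claims in the payload.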

SPARK MASTER

Without JWS (ErrorCode: 403 Forbidden)

$ curl -v http://localhost:8080/json/
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* connect to ::1 port 8080 from ::1 port 61331 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> GET /json/ HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< Date: Fri, 02 Aug 2024 01:34:03 GMT
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 477
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Authorization header is missing.</title>
</head>
<body><h2>HTTP ERROR 403 Authorization header is missing.</h2>
<table>
<tr><th>URI:</th><td>/json/</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Authorization header is missing.</td></tr>
<tr><th>SERVLET:</th><td>org.apache.spark.ui.JettyUtils$$anon$1-6c52101f</td></tr>
</table>

</body>
</html>
* Connection #0 to host localhost left intact

With JWS

$ curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw" http://localhost:8080/json/

* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* connect to ::1 port 8080 from ::1 port 61329 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> GET /json/ HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Fri, 02 Aug 2024 01:33:10 GMT
< Cache-Control: no-cache, no-store, must-revalidate
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Type: text/json;charset=utf-8
< Vary: Accept-Encoding
< Content-Length: 320
<
{
  "url" : "spark://M3-Max.local:7077",
  "workers" : [ ],
  "aliveworkers" : 0,
  "cores" : 0,
  "coresused" : 0,
  "memory" : 0,
  "memoryused" : 0,
  "resources" : [ ],
  "resourcesused" : [ ],
  "activeapps" : [ ],
  "completedapps" : [ ],
  "activedrivers" : [ ],
  "completeddrivers" : [ ],
  "status" : "ALIVE"
* Connection #0 to host localhost left intact
}%
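
The same call can be made from a JVM client instead of curl. This is a sketch using the JDK's `java.net.http` client (Java 11+); `JwsClientSketch`, `authorizedGet`, and the `--send` switch are hypothetical names for this illustration, and the URL and token are the ones from the example above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client (not part of Spark): sends a GET with an "Authorization: Bearer <jws>" header.
final class JwsClientSketch {
    static HttpRequest authorizedGet(String url, String jws) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Authorization", "Bearer " + jws)
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        String jws = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw";
        HttpRequest request = authorizedGet("http://localhost:8080/json/", jws);
        if (args.length > 0 && args[0].equals("--send")) {
            // Only contacts the server when explicitly asked to (hypothetical switch of this sketch).
            HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        } else {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```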

Does this PR introduce any user-facing change?

No, this is a new filter.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

<groupId>io.jsonwebtoken</groupId>
<artifactId>jjwt-jackson</artifactId>
<version>0.12.6</version>
<scope>test</scope>
Member Author

I added this as a test dependency for now because the user may want to use GSON instead of this.

@dongjoon-hyun
Member Author

Could you review this PR about Spark UI (including Spark Cluster), @viirya ?

val claims = Jwts.parser().verifyWith(key).build().parseSignedClaims(token)
chain.doFilter(req, res)
case _ =>
hres.sendError(HttpServletResponse.SC_FORBIDDEN, s"Malformed ${AUTHORIZATION} header.")
Member

Suggested change
- hres.sendError(HttpServletResponse.SC_FORBIDDEN, s"Malformed ${AUTHORIZATION} header.")
+ hres.sendError(HttpServletResponse.SC_FORBIDDEN, s"Malformed JWT ${AUTHORIZATION} header.")

Member Author

Thank you, but the previous one is actually better because Bearer is just one of the possible types.

The Authorization: <type> <credentials> pattern comes from the W3C HTTP 1.0 spec; it is not specific to JWT.

Member

Hmm, but the current one also doesn't have Bearer.

Member Author

Yes, and a missing Bearer is a problem with the Authorization header, not with the JWT. At this point, the JWT itself doesn't exist yet.
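
The distinction discussed in this thread (missing header, malformed header, and only then a token to verify) can be sketched as follows. This is an illustration, not the filter's code: `AuthorizationHeaderSketch`, `bearerToken`, and `check` are hypothetical names, and matching the scheme case-insensitively is a choice of this sketch rather than something the PR states.

```java
import java.util.Optional;

// Hypothetical helper (not the filter's code): splits "Authorization: <type> <credentials>".
final class AuthorizationHeaderSketch {
    // Returns the credentials only when the type is Bearer and a token follows it.
    static Optional<String> bearerToken(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        String[] parts = headerValue.trim().split("\\s+", 2);
        if (parts.length == 2 && parts[0].equalsIgnoreCase("Bearer")) {
            return Optional.of(parts[1]);
        }
        return Optional.empty();
    }

    // Mirrors the two header-level error messages that appear in this PR.
    static String check(String headerValue) {
        if (headerValue == null) {
            return "Authorization header is missing.";
        }
        if (bearerToken(headerValue).isEmpty()) {
            // The header exists but is not "Bearer <token>", so no JWT exists yet.
            return "Malformed Authorization header.";
        }
        return "Token present; signature verification comes next.";
    }

    public static void main(String[] args) {
        System.out.println(check(null));
        System.out.println(check("Basic dXNlcjpwYXNz"));
        System.out.println(check("Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.signature"));
    }
}
```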

Comment on lines +121 to +125
<dependency>
<groupId>io.jsonwebtoken</groupId>
<artifactId>jjwt-api</artifactId>
<version>0.12.6</version>
</dependency>
Member

Do we still need to include this new dependency if users don't use the JWSFilter feature?

Member Author

Yes, for now. Of course, we can make this a profile.

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@dongjoon-hyun
Member Author

Thank you, @viirya !

jettison/1.5.4//jettison-1.5.4.jar
jetty-util-ajax/11.0.21//jetty-util-ajax-11.0.21.jar
jetty-util/11.0.21//jetty-util-11.0.21.jar
jjwt-api/0.12.6//jjwt-api-0.12.6.jar
Member

Do we need to update our NOTICE-binary?

Member Author

Thanks. Sure, @yaooqinn ! It's under the Apache License. Let me add this item.

Member

Thank you for the update! @dongjoon-hyun

@dongjoon-hyun
Member Author

Thank you, @yaooqinn !

@dongjoon-hyun
Member Author

Merged to master for Apache Spark 4.0.0-preview2.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-49090 branch August 2, 2024 14:11
dongjoon-hyun added a commit that referenced this pull request Aug 4, 2024
### What changes were proposed in this pull request?

This PR adds a `spark.master.rest.filters` configuration, analogous to the existing `spark.ui.filters` configuration.

Apache Spark recently started to support `JWSFilter`. We can take advantage of `JWSFilter` to protect the Spark Master REST API.
- #47575

### Why are the changes needed?

As with `Spark UI`, we should provide the same capability for the Apache Spark Master REST API.

For example, we can protect the `Spark Master REST API` with `JWSFilter` as follows.

**MASTER REST API WITH JWSFilter**
```
$ build/sbt package
$ cp jjwt-impl-0.12.6.jar assembly/target/scala-2.13/jars
$ cp jjwt-jackson-0.12.6.jar assembly/target/scala-2.13/jars
$ SPARK_NO_DAEMONIZE=1 \
SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true -Dspark.master.rest.filters=org.apache.spark.ui.JWSFilter -Dspark.org.apache.spark.ui.JWSFilter.param.key=VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=" \
sbin/start-master.sh
```

**AUTHORIZATION FAILURE**
```
$ curl -v -XPOST http://localhost:6066/v1/submissions/clear
* Host localhost:6066 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:6066...
* connect to ::1 port 6066 from ::1 port 51705 failed: Connection refused
*   Trying 127.0.0.1:6066...
* Connected to localhost (127.0.0.1) port 6066
> POST /v1/submissions/clear HTTP/1.1
> Host: localhost:6066
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< Date: Sat, 03 Aug 2024 22:18:03 GMT
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 590
< Server: Jetty(11.0.21)
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Authorization header is missing.</title>
</head>
<body><h2>HTTP ERROR 403 Authorization header is missing.</h2>
<table>
<tr><th>URI:</th><td>/v1/submissions/clear</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Authorization header is missing.</td></tr>
<tr><th>SERVLET:</th><td>org.apache.spark.deploy.rest.StandaloneClearRequestServlet-7f171159</td></tr>
</table>
<hr/><a href="https://eclipse.org/jetty">Powered by Jetty:// 11.0.21</a><hr/>

</body>
</html>
* Connection #0 to host localhost left intact
```

**SUCCESS**
```
$ curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw" -XPOST http://localhost:6066/v1/submissions/clear
* Host localhost:6066 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:6066...
* connect to ::1 port 6066 from ::1 port 51697 failed: Connection refused
*   Trying 127.0.0.1:6066...
* Connected to localhost (127.0.0.1) port 6066
> POST /v1/submissions/clear HTTP/1.1
> Host: localhost:6066
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Sat, 03 Aug 2024 22:16:51 GMT
< Content-Type: application/json;charset=utf-8
< Content-Length: 113
< Server: Jetty(11.0.21)
<
{
  "action" : "ClearResponse",
  "message" : "",
  "serverSparkVersion" : "4.0.0-SNAPSHOT",
  "success" : true
* Connection #0 to host localhost left intact
}%
```

### Does this PR introduce _any_ user-facing change?

No, this is a new feature which is not loaded by default.

### How was this patch tested?

Pass the CIs with newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47595 from dongjoon-hyun/SPARK-49103.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
HyukjinKwon pushed a commit that referenced this pull request Aug 4, 2024
…REST API and rename parameter to `secretKey`

### What changes were proposed in this pull request?

This PR aims to do the following.
- Document `JWSFilter` and its usage in `Spark UI` and `REST API`
    - `Spark UI` section of `Configuration` page
    - `Spark Security` page
    - `Spark Standalone` page
- Rename the parameter `key` to `secretKey` so that it is redacted in the Spark Driver UI and Spark Master UI.
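
Why the rename leads to redaction can be illustrated with a name-based pattern. The sketch below is an assumption-laden illustration: the pattern is written in the spirit of Spark's `spark.redaction.regex` setting, but the actual default value may differ, and `RedactionSketch` and `isRedacted` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical illustration (not Spark code): name-based redaction of config entries.
final class RedactionSketch {
    // Assumption: a case-insensitive pattern similar to what `spark.redaction.regex` might hold.
    static final Pattern SENSITIVE = Pattern.compile("(?i)secret|password|token");

    static boolean isRedacted(String configName) {
        // find() matches anywhere inside the name, not only the whole string.
        return SENSITIVE.matcher(configName).find();
    }

    public static void main(String[] args) {
        // The old name contains none of the sensitive words.
        System.out.println(isRedacted("spark.org.apache.spark.ui.JWSFilter.param.key"));
        // The new name contains "secret" (matched case-insensitively).
        System.out.println(isRedacted("spark.org.apache.spark.ui.JWSFilter.param.secretKey"));
    }
}
```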

### Why are the changes needed?

To apply the recently added security features:
- #47575
- #47595

### Does this PR introduce _any_ user-facing change?

No, because this is a new feature of Apache Spark 4.0.0.

### How was this patch tested?

Pass the CIs and manual review.

- `spark-standalone.html`
![Screenshot 2024-08-03 at 22 40 53](https://github.com/user-attachments/assets/f1b95a01-c14b-4f14-96b6-3181afaf6f9f)

- `security.html`
![Screenshot 2024-08-03 at 22 39 00](https://github.com/user-attachments/assets/8413f6a3-47df-4d71-87ee-25ab32171c6c)
![Screenshot 2024-08-03 at 22 39 51](https://github.com/user-attachments/assets/01546724-d5b5-40d5-a980-236f9d13ae81)

- `configuration.html`
![Screenshot 2024-08-03 at 22 38 07](https://github.com/user-attachments/assets/c0845a7f-6ae1-4194-b98a-68d7442c9785)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47596 from dongjoon-hyun/SPARK-49104.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
fusheng9399 pushed a commit to fusheng9399/spark that referenced this pull request Aug 6, 2024
Closes apache#47575 from dongjoon-hyun/SPARK-49090.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
fusheng9399 pushed a commit to fusheng9399/spark that referenced this pull request Aug 6, 2024
Closes apache#47595 from dongjoon-hyun/SPARK-49103.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
fusheng9399 pushed a commit to fusheng9399/spark that referenced this pull request Aug 6, 2024
…REST API and rename parameter to `secretKey`

Closes apache#47596 from dongjoon-hyun/SPARK-49104.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
This PR aims to support `JWSFilter`  which is a servlet filter that requires `JWS`, a cryptographically signed JSON Web Token, in the header via `spark.ui.filters` configuration.

- spark.ui.filters=org.apache.spark.ui.JWSFilter
- spark.org.apache.spark.ui.JWSFilter.param.key=YOUR-BASE64URL-ENCODED-KEY

To simply put, `JWSFilter` will check the following for all requests.
- The HTTP request should have `Authorization: Bearer <jws>` header.
  - `<jws>` is a string with three fields, `<header>.<payload>.<signature>`.
  - `<header>` is supposed to be a base64url-encoded string of `{"alg":"HS256","typ":"JWT"}`.
  - `<payload>` is a base64url-encoded string of fully-user-defined content.
  - `<signature>` is a signature based on `<header>.<payload>` and a user-provided key parameter.

For example, the value of `<header>` will be `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9` always and the value of `payload` can be `e30` if the payload is empty, `{}`. The `<signature>` part is changed by the shared value of `spark.org.apache.spark.ui.JWSFilter.param.key` between the server and client.
```
jshell> java.util.Base64.getUrlEncoder().encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes())
$2 ==> "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"

jshell> java.util.Base64.getUrlEncoder().encodeToString("{}".getBytes())
$3 ==> "e30="
```

To provide a little better security on WebUI consistently including Spark Standalone Clusters.

For example,

**SETTING**
```
$ jshell
|  Welcome to JShell -- Version 17.0.12
|  For an introduction type: /help intro

jshell> java.util.Base64.getUrlEncoder().encodeToString("Visit https://spark.apache.org to download Apache Spark.".getBytes())
$1 ==> "VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4="
```

```
$ cat conf/spark-defaults.conf
spark.ui.filters org.apache.spark.ui.JWSFilter
spark.org.apache.spark.ui.JWSFilter.param.key VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=
```

**SPARK-SHELL**
```
$ build/sbt package
$ cp jjwt-impl-0.12.6.jar assembly/target/scala-2.13/jars
$ cp jjwt-jackson-0.12.6.jar assembly/target/scala-2.13/jars
$ bin/spark-shell
```

Without JWS (ErrorCode: 403 Forbidden)
```
$ curl -v http://localhost:4040/
* Host localhost:4040 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:4040...
* connect to ::1 port 4040 from ::1 port 61313 failed: Connection refused
*   Trying 127.0.0.1:4040...
* Connected to localhost (127.0.0.1) port 4040
> GET / HTTP/1.1
> Host: localhost:4040
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< Date: Fri, 02 Aug 2024 01:27:23 GMT
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 472
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Authorization header is missing.</title>
</head>
<body><h2>HTTP ERROR 403 Authorization header is missing.</h2>
<table>
<tr><th>URI:</th><td>/</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Authorization header is missing.</td></tr>
<tr><th>SERVLET:</th><td>org.apache.spark.ui.JettyUtils$$anon$2-3b39bee2</td></tr>
</table>

</body>
</html>
* Connection #0 to host localhost left intact
```

With JWS,
```
$ curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw" http://localhost:4040/
* Host localhost:4040 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:4040...
* connect to ::1 port 4040 from ::1 port 61311 failed: Connection refused
*   Trying 127.0.0.1:4040...
* Connected to localhost (127.0.0.1) port 4040
> GET / HTTP/1.1
> Host: localhost:4040
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw
>
* Request completely sent off
< HTTP/1.1 302 Found
< Date: Fri, 02 Aug 2024 01:27:01 GMT
< Cache-Control: no-cache, no-store, must-revalidate
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: http://localhost:4040/jobs/
< Content-Length: 0
<
* Connection #0 to host localhost left intact
```
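
The bearer token in the request above has the standard compact JWS shape. As a hedged sketch, a client could assemble such a token with only the JDK; the class `JwsTokenSketch` and its method names are invented for this example, and whether the output reproduces the exact signature shown above was not verified here.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Hypothetical client-side helper, not part of Spark.
class JwsTokenSketch {
    // base64url without '=' padding, as used inside compact JWS strings.
    static String b64url(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Builds "<header>.<payload>.<signature>" signed with HMAC-SHA256.
    static String compactJws(String base64UrlKey, String headerJson, String payloadJson) {
        byte[] key = Base64.getUrlDecoder().decode(base64UrlKey);
        String signingInput = b64url(headerJson.getBytes(StandardCharsets.UTF_8))
            + "." + b64url(payloadJson.getBytes(StandardCharsets.UTF_8));
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] signature = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + b64url(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }

    public static void main(String[] args) {
        String token = compactJws(
            "VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=",
            "{\"alg\":\"HS256\",\"typ\":\"JWT\"}",
            "{}");
        // Ready to paste into: curl -H "<this line>" http://localhost:4040/
        System.out.println("Authorization: Bearer " + token);
    }
}
```

The first two fields come out as `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9` and `e30`; only the third field depends on the shared key.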

**SPARK MASTER**

Without JWS (ErrorCode: 403 Forbidden)
```
$ curl -v http://localhost:8080/json/
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* connect to ::1 port 8080 from ::1 port 61331 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> GET /json/ HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< Date: Fri, 02 Aug 2024 01:34:03 GMT
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 477
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Authorization header is missing.</title>
</head>
<body><h2>HTTP ERROR 403 Authorization header is missing.</h2>
<table>
<tr><th>URI:</th><td>/json/</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Authorization header is missing.</td></tr>
<tr><th>SERVLET:</th><td>org.apache.spark.ui.JettyUtils$$anon$1-6c52101f</td></tr>
</table>

</body>
</html>
* Connection #0 to host localhost left intact
```

With JWS
```
$ curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw" http://localhost:8080/json/

* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* connect to ::1 port 8080 from ::1 port 61329 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> GET /json/ HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Fri, 02 Aug 2024 01:33:10 GMT
< Cache-Control: no-cache, no-store, must-revalidate
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Type: text/json;charset=utf-8
< Vary: Accept-Encoding
< Content-Length: 320
<
{
  "url" : "spark://M3-Max.local:7077",
  "workers" : [ ],
  "aliveworkers" : 0,
  "cores" : 0,
  "coresused" : 0,
  "memory" : 0,
  "memoryused" : 0,
  "resources" : [ ],
  "resourcesused" : [ ],
  "activeapps" : [ ],
  "completedapps" : [ ],
  "activedrivers" : [ ],
  "completeddrivers" : [ ],
  "status" : "ALIVE"
* Connection #0 to host localhost left intact
}%
```
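
The same request can be issued programmatically. The sketch below uses the JDK's `java.net.http` client (Java 11+) to attach the `Authorization: Bearer <jws>` header; the class name `BearerRequestSketch` is hypothetical, the token argument is a placeholder to be replaced with a real JWS, and `main` assumes a Spark Master is listening on `localhost:8080` as in the transcript above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client helper, not part of Spark.
class BearerRequestSketch {
    // Builds a GET request carrying the JWS as a bearer token.
    static HttpRequest get(String url, String jws) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Authorization", "Bearer " + jws)
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder token: substitute a JWS signed with the configured key.
        String jws = args.length > 0 ? args[0] : "<header>.<payload>.<signature>";
        HttpRequest request = get("http://localhost:8080/json/", jws);
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // A valid token should yield 200 here, an invalid one 403.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```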

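### Does this PR introduce _any_ user-facing change?

No, this is a new filter.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.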

Closes apache#47575 from dongjoon-hyun/SPARK-49090.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
### What changes were proposed in this pull request?

This PR aims to support the `spark.master.rest.filters` configuration, like the existing `spark.ui.filters` configuration.

Apache Spark recently started to support `JWSFilter`. We can take advantage of `JWSFilter` to protect the Spark Master REST API.
- apache#47575

### Why are the changes needed?

As with the `Spark UI`, we should provide the same capability for the Apache Spark Master REST API.

For example, we can protect the `Spark Master REST API` with `JWSFilter` as follows.

**MASTER REST API WITH JWSFilter**
```
$ build/sbt package
$ cp jjwt-impl-0.12.6.jar assembly/target/scala-2.13/jars
$ cp jjwt-jackson-0.12.6.jar assembly/target/scala-2.13/jars
$ SPARK_NO_DAEMONIZE=1 \
SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true -Dspark.master.rest.filters=org.apache.spark.ui.JWSFilter -Dspark.org.apache.spark.ui.JWSFilter.param.key=VmlzaXQgaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnIHRvIGRvd25sb2FkIEFwYWNoZSBTcGFyay4=" \
sbin/start-master.sh
```

**AUTHORIZATION FAILURE**
```
$ curl -v -XPOST http://localhost:6066/v1/submissions/clear
* Host localhost:6066 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:6066...
* connect to ::1 port 6066 from ::1 port 51705 failed: Connection refused
*   Trying 127.0.0.1:6066...
* Connected to localhost (127.0.0.1) port 6066
> POST /v1/submissions/clear HTTP/1.1
> Host: localhost:6066
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 403 Forbidden
< Date: Sat, 03 Aug 2024 22:18:03 GMT
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 590
< Server: Jetty(11.0.21)
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Authorization header is missing.</title>
</head>
<body><h2>HTTP ERROR 403 Authorization header is missing.</h2>
<table>
<tr><th>URI:</th><td>/v1/submissions/clear</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Authorization header is missing.</td></tr>
<tr><th>SERVLET:</th><td>org.apache.spark.deploy.rest.StandaloneClearRequestServlet-7f171159</td></tr>
</table>
<hr/><a href="https://eclipse.org/jetty">Powered by Jetty:// 11.0.21</a><hr/>

</body>
</html>
* Connection #0 to host localhost left intact
```

**SUCCESS**
```
$ curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw" -XPOST http://localhost:6066/v1/submissions/clear
* Host localhost:6066 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:6066...
* connect to ::1 port 6066 from ::1 port 51697 failed: Connection refused
*   Trying 127.0.0.1:6066...
* Connected to localhost (127.0.0.1) port 6066
> POST /v1/submissions/clear HTTP/1.1
> Host: localhost:6066
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.4EKWlOkobpaAPR0J4BE0cPQ-ZD1tRQKLZp1vtE7upPw
>
* Request completely sent off
< HTTP/1.1 200 OK
< Date: Sat, 03 Aug 2024 22:16:51 GMT
< Content-Type: application/json;charset=utf-8
< Content-Length: 113
< Server: Jetty(11.0.21)
<
{
  "action" : "ClearResponse",
  "message" : "",
  "serverSparkVersion" : "4.0.0-SNAPSHOT",
  "success" : true
* Connection #0 to host localhost left intact
}%
```

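### Does this PR introduce _any_ user-facing change?

No, this is a new feature which is not loaded by default.

### How was this patch tested?

Pass the CIs with the newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.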

Closes apache#47595 from dongjoon-hyun/SPARK-49103.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
…REST API and rename parameter to `secretKey`

### What changes were proposed in this pull request?

This PR aims to do the following.
- Document `JWSFilter` and its usage in `Spark UI` and `REST API`
    - `Spark UI` section of `Configuration` page
    - `Spark Security` page
    - `Spark Standalone` page
- Rename the parameter `key` to `secretKey` so that it is redacted in the Spark Driver UI and Spark Master UI.
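
Why the rename helps can be shown with a small Java sketch. Spark hides configuration values whose names match a redaction pattern (`spark.redaction.regex`); the pattern below is modeled on that setting's documented default and should be treated as an assumption, and the class `RedactionSketch` is hypothetical rather than Spark code.

```java
import java.util.regex.Pattern;

// Illustration only: mimics name-based redaction, not Spark's own code.
class RedactionSketch {
    // Assumption: modeled on the default value of spark.redaction.regex.
    static final Pattern REDACTION =
        Pattern.compile("(?i)secret|password|token|access[.]?key");

    // A config value is hidden when its NAME contains a match for the pattern.
    static boolean isRedacted(String configName) {
        return REDACTION.matcher(configName).find();
    }

    public static void main(String[] args) {
        String prefix = "spark.org.apache.spark.ui.JWSFilter.param.";
        // The old name has no sensitive keyword, the new one contains "secret".
        System.out.println(prefix + "key -> " + isRedacted(prefix + "key"));
        System.out.println(prefix + "secretKey -> " + isRedacted(prefix + "secretKey"));
    }
}
```

Under that assumed pattern, `...param.key` is not matched while `...param.secretKey` is, which is consistent with the motivation given for the rename.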

### Why are the changes needed?

To apply the recently added security features:
- apache#47575
- apache#47595

### Does this PR introduce _any_ user-facing change?

No, because this is a new feature of Apache Spark 4.0.0.

### How was this patch tested?

Pass the CIs and manual review.

- `spark-standalone.html`
![Screenshot 2024-08-03 at 22 40 53](https://github.com/user-attachments/assets/f1b95a01-c14b-4f14-96b6-3181afaf6f9f)

- `security.html`
![Screenshot 2024-08-03 at 22 39 00](https://github.com/user-attachments/assets/8413f6a3-47df-4d71-87ee-25ab32171c6c)
![Screenshot 2024-08-03 at 22 39 51](https://github.com/user-attachments/assets/01546724-d5b5-40d5-a980-236f9d13ae81)

- `configuration.html`
![Screenshot 2024-08-03 at 22 38 07](https://github.com/user-attachments/assets/c0845a7f-6ae1-4194-b98a-68d7442c9785)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47596 from dongjoon-hyun/SPARK-49104.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…REST API and rename parameter to `secretKey`

himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…REST API and rename parameter to `secretKey`
