fix(embedded): Hide sensitive payload data from guest users #25878

jfrag1 · 2023-11-06T17:30:07Z

SUMMARY

Currently, guest users can get access to more data than they need to when viewing an embedded dashboard. The api/v1/dashboard/{id} and api/v1/dashboard/{id}/datasets endpoints would return owners' first and last names. The latter would return information about the underlying database(s) the datasets belong to, which could be useful for an attacker. None of this data is necessary for the embedded dashboard to render.

I recognize this PR does break typical API patterns by stripping out certain data depending on the type of user requesting, which is pretty ugly. However, this approach avoids making breaking changes for regular API use, while making the embedded experience more secure.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue: Missing Configuration Options to Improve Security for Embedded Dashboards #22960
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

john-bodley · 2023-11-07T17:53:04Z

superset/dashboards/api.py

@@ -342,6 +342,11 @@ def get(
              $ref: '#/components/responses/404'
        """
        result = self.dashboard_get_response_schema.dump(dash)
+        if security_manager.is_guest_user():


@jfrag1 we should adhere to the "shift left" mentality and filter out inaccessible fields at the DAO layer as opposed to the API. This ensures we adhere to the DRY principle avoiding the need to repeat (and/or miss) said logic.

@john-bodley Thanks for reviewing, but I'm not sure how we're supposed to properly filter out certain fields of a record in the DAO layer. As far as I'm aware the security logic in the DAO layer is only "row-level", protecting entire objects, there's no column-level protections. Are you suggesting in the appropriate DAO method do something like

dashboard = db.session.query(... if is_guest: dashboard.owners = [] dashboard.changed_by = None

I don't like this since it introduces the possibility of these changes accidentally being committed, but please let me know if you had something else in mind or if this pattern does exist currently but I'm just not aware of it.

@betodealmeida @dpgaspar @michael-s-molina @villebro what are your thoughts on where sanitization of the data response should occur? Note this doesn't seem to be API specific and thus I'm not sure we should explicitly be defining said logic there.

Implementing this at the API layer is problematic because there's no single point of verification and also because you will query data that you don't have access to and filter it out later, wasting resources (imagine a table with many columns and you only have access to one column). This will also be true for the suggested code below:

dashboard = db.session.query(... if is_guest: dashboard.owners = [] dashboard.changed_by = None

Considering our current architecture, where we offer a RLS (Row Level Security) solution instead of depending on the database layer for these types of controls, the correct way of implementing this is to augment our security model to also support Column Level Security (CLS) or even Object Level Security (OLS).

I don't like this since it introduces the possibility of these changes accidentally being committed

Ideally, you won't query for what you don't have access to, eliminating this concern because the values won't be in the session.

Pinging @villebro for awareness given his current work on an improved security model.

I do appreciate the discussion here, but this really feels like premature optimization to me. The changes in this PR only affect two endpoints, and I don't predict that this kind of logic will needed in a widespread manner.

If I'm wrong and there is a larger need/desire for this kind of security measure, I definitely agree we should re-assess, but I personally don't see a need to come up with a more "perfect" solution at this point.

I do appreciate the discussion here, but this really feels like premature optimization to me. The changes in this PR only affect two endpoints, and I don't predict that this kind of logic will needed in a widespread manner.

I thought that by referencing the concept of Object Level Security (OLS) it would be more clear why this is not a premature optimization and why we will have the same need in a widespread manner.

I'd argue the API layer is where that kind of security check belongs

If you research a little bit about CLS you'll see that many of these security checks are implemented at the database layer, even further from the API layer.

I agree it's risky to implement this fix in a single interface, as it gives no guarantees that this information won't be exposed via some other interface, either pre-existing, or one that's added later. Mind you, this is not the only security concern regarding guest tokens - the issue of exposing granular information about the underlying query has been raised multiple times in the community, and is also something I feel we need to tackle soon.

Could we setup a meeting to discuss viable long-term solutions to these issues? I propose meeting to agree on the the following:

A long term vision for fixing this and other similar security concerns. I agree with @michael-s-molina that baking this into the security work I've started sounds like a good idea. I can share what I have so far and my thoughts of how this can be taken forward.

Agreeing on a quick fix to unblock heavy users of guest tokens to bridge the gap between now and when we have a proper security mechanism in place for handling this and other similar security issues.

I thought that by referencing the concept of Object Level Security (OLS) it would be more clear why this is not a premature optimization and why we will have the same need in a widespread manner.

This is referring to a security feature similar to row-level security where Superset could apply security filters to queries against connected databases, while the security issue at hand in this PR has to do with data from the Superset metadata DB.

If you research a little bit about CLS you'll see that many of these security checks are implemented at the database layer, even further from the API layer.

My understanding is that CLS is a security feature commonly implemented in data warehouses/BI tools, not a guide for how a server should implement security checks for data it owns

I think it's a great idea to meet on long term solutions. If we were to move the logic for this endpoint into the schema (I left one more suggestion that would remove this from the api altogether) should we move forward with this for now as @villebro suggested?

I think it's a great idea to meet on long term solutions. If we were to move the logic for this endpoint into the schema (I left one more suggestion that would remove this from the api altogether) should we move forward with this for now as @villebro suggested?

@eschutho I also think moving this into the Marshmallow schema is probably the best solution for now.

eschutho

One more suggestion.

eschutho · 2023-11-15T04:18:14Z

superset/dashboards/api.py

+        schema = (
+            self.dashboard_get_response_schema_guest
+            if security_manager.is_guest_user()
+            else self.dashboard_get_response_schema


@jfrag1 I think we can even move the schema logic into Marshmallow dynamically with either a type function or a @marshmallow.post_dump decorator. That way the logic isn't in the api layer and you won't need two schemas.

villebro

@jfrag1 I would prefer it if we remove the owners property all together if guest tokens aren't supposed to know who the owners are. Or is there some other reason we want to keep the property there? Also, I feel we really should add tests for both cases here (hence "Request changes").

villebro · 2023-11-16T05:34:51Z

superset/dashboards/schemas.py

+    @post_dump()
+    def post_dump(self, serialized: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
+        if security_manager.is_guest_user():
+            serialized["owners"] = []


I would prefer to fully remove the property owners, as replacing it with an empty list is IMO conveying incorrect info

@villebro I agree. The only caveat would be if the owners field is required.

@john-bodley ideally I think we should relax that requirement if we want to make this data conditional on the permissions of the logged in user. Based on the proposed changes, I think it should be as follows:

regular user: all relevant owner info is conveyed (current behavior)

guest token: no owner property is present in the response

@villebro I initially opted to keep the owners property in the payload because the dashboard frontend expects it to always be there, and fails to render if it's not.

I found a way to exclude it from the response payload and have the client tack the property on after the dashboard is requested, which allows the embedded dashboard to render successfully.

villebro · 2023-11-16T05:34:59Z

superset/dashboards/schemas.py

+    @post_dump()
+    def post_dump(self, serialized: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
+        if security_manager.is_guest_user():
+            serialized["owners"] = []


jfrag1 · 2023-12-04T18:02:49Z

@villebro Are you able to take another look at this?

villebro

LGTM, thanks for all the iterations @jfrag1 !

(cherry picked from commit 386d4e0)

…5878) (cherry picked from commit 386d4e0)

…5878)

Hide sensitive payload data from guest users

106f9f2

pull-request-size bot added the size/S label Nov 6, 2023

john-bodley reviewed Nov 7, 2023

View reviewed changes

Use marshmallow schemas instead

0061036

pull-request-size bot added size/M and removed size/S labels Nov 13, 2023

Fix lint

ebcd352

eschutho reviewed Nov 15, 2023

View reviewed changes

Refactor to do check in schema

0c79e14

pull-request-size bot added size/S and removed size/M labels Nov 16, 2023

Cleanup

0bf740f

jfrag1 closed this Nov 16, 2023

jfrag1 reopened this Nov 16, 2023

villebro requested changes Nov 16, 2023

View reviewed changes

jfrag1 added 2 commits November 16, 2023 22:37

Remove owners property from payload

bd1572f

Add tests

56f92b6

pull-request-size bot added size/M and removed size/S labels Nov 17, 2023

jfrag1 requested a review from villebro November 17, 2023 07:24

villebro approved these changes Dec 4, 2023

View reviewed changes

jfrag1 merged commit 386d4e0 into apache:master Dec 4, 2023
29 checks passed

jfrag1 deleted the jack/embedded-security-strip-out-unneeded-data branch December 4, 2023 22:53

michael-s-molina added v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch v3.1 Label added by the release manager to track PRs to be included in the 3.1 branch labels Dec 6, 2023

michael-s-molina pushed a commit that referenced this pull request Dec 8, 2023

fix(embedded): Hide sensitive payload data from guest users (#25878)

b9e8cc9

(cherry picked from commit 386d4e0)

michael-s-molina pushed a commit that referenced this pull request Dec 8, 2023

fix(embedded): Hide sensitive payload data from guest users (#25878)

0925d75

(cherry picked from commit 386d4e0)

jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Dec 8, 2023

fix(embedded): Hide sensitive payload data from guest users (apache#2…

e674631

…5878) (cherry picked from commit 386d4e0)

jinghua-qa added the preset:2023.49 label Dec 8, 2023

sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Dec 11, 2023

fix(embedded): Hide sensitive payload data from guest users (apache#2…

37b1c4a

…5878) (cherry picked from commit 386d4e0)

josedev-union pushed a commit to Ortege-xyz/studio that referenced this pull request Jan 22, 2024

fix(embedded): Hide sensitive payload data from guest users (apache#2…

1989ecf

…5878) (cherry picked from commit 386d4e0)

mistercrunch added 🍒 3.0.3 🍒 3.0.4 🍒 3.1.0 🍒 3.1.1 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Mar 8, 2024

sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024

fix(embedded): Hide sensitive payload data from guest users (apache#2…

bf76d05

…5878)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(embedded): Hide sensitive payload data from guest users #25878

fix(embedded): Hide sensitive payload data from guest users #25878

jfrag1 commented Nov 6, 2023

john-bodley Nov 7, 2023

jfrag1 Nov 7, 2023

john-bodley Nov 8, 2023

michael-s-molina Nov 9, 2023

jfrag1 Nov 10, 2023

michael-s-molina Nov 10, 2023 •

edited

Loading

villebro Nov 10, 2023

jfrag1 Nov 13, 2023 •

edited

Loading

eschutho Nov 15, 2023

villebro Nov 15, 2023

eschutho left a comment

eschutho Nov 15, 2023

villebro left a comment

villebro Nov 16, 2023

john-bodley Nov 16, 2023

villebro Nov 16, 2023

jfrag1 Nov 17, 2023

villebro Nov 16, 2023

jfrag1 commented Dec 4, 2023

villebro left a comment

fix(embedded): Hide sensitive payload data from guest users #25878

fix(embedded): Hide sensitive payload data from guest users #25878

Conversation

jfrag1 commented Nov 6, 2023

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

michael-s-molina Nov 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jfrag1 Nov 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eschutho left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

villebro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jfrag1 commented Dec 4, 2023

villebro left a comment

Choose a reason for hiding this comment

michael-s-molina Nov 10, 2023 •

edited

Loading

jfrag1 Nov 13, 2023 •

edited

Loading