Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Ability to share saved queries #24301

Closed
wants to merge 12 commits into from

Conversation

jfrag1
Copy link
Member

@jfrag1 jfrag1 commented Jun 6, 2023

SUMMARY

Currently, when a user tries to access a shared query link created by another user, they cannot access the query and only see a "This query couldn't be loaded" toast. This was caused by this PR, which updated the endpoint used to fetch the query. The new endpoint currently only allows users to fetch their own saved queries, while the old one allowed any.

My solution to this is to also allow looking up a saved query by its UUID for non-creators so that we can more securely enable sharing saved queries in the v1 API.

I also formally marked the old APIs as deprecated for removal in 4.0 and replaced its final usage with the v1 API

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.get",
log_to_statsd=False,
)
def get(self, id: str, **kwargs: Any) -> Response:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endpoint was previously FAB-generated

@@ -127,9 +132,32 @@ class SavedQueryViewApi(SavedQueryView): # pylint: disable=too-many-ancestors
show_columns = add_columns + ["id"]

@has_access_api
@expose("show/<pk>")
def show(self, pk: int) -> FlaskResponse:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this endpoint didn't actually exist, not included in include_route_methods

@jfrag1 jfrag1 closed this Jun 6, 2023
@jfrag1 jfrag1 reopened this Jun 6, 2023
@codecov
Copy link

codecov bot commented Jun 6, 2023

Codecov Report

Merging #24301 (44bc7bf) into master (ede6acd) will increase coverage by 2.02%.
The diff coverage is 92.59%.

❗ Current head 44bc7bf differs from pull request most recent head e4731dd. Consider uploading reports for the commit e4731dd to get more accurate results

@@            Coverage Diff             @@
##           master   #24301      +/-   ##
==========================================
+ Coverage   66.53%   68.55%   +2.02%     
==========================================
  Files        1957     1944      -13     
  Lines       75623    75403     -220     
  Branches     8224     8217       -7     
==========================================
+ Hits        50312    51695    +1383     
+ Misses      23202    21597    -1605     
- Partials     2109     2111       +2     
Flag Coverage Δ
hive 53.58% <70.73%> (?)
javascript 54.91% <58.33%> (+0.18%) ⬆️
mysql 79.14% <86.99%> (+0.21%) ⬆️
postgres 79.21% <86.99%> (+0.21%) ⬆️
presto 53.50% <70.73%> (?)
python 83.09% <95.93%> (+3.96%) ⬆️
sqlite 77.75% <86.99%> (+0.21%) ⬆️
unit 53.93% <78.04%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...nd/plugins/legacy-preset-chart-nvd3/src/NVD3Vis.js 0.00% <0.00%> (ø)
...end/plugins/legacy-preset-chart-nvd3/src/preset.js 0.00% <ø> (ø)
...ins/legacy-preset-chart-nvd3/src/transformProps.js 7.40% <ø> (+0.74%) ⬆️
...s/plugin-chart-pivot-table/src/PivotTableChart.tsx 0.00% <0.00%> (ø)
superset-frontend/src/SqlLab/actions/sqlLab.js 66.66% <ø> (ø)
...d/src/SqlLab/components/ShareSqlLabQuery/index.tsx 81.48% <0.00%> (ø)
superset-frontend/src/SqlLab/fixtures.ts 100.00% <ø> (ø)
superset-frontend/src/SqlLab/types.ts 57.14% <ø> (ø)
...rset-frontend/src/components/ReportModal/index.tsx 79.03% <ø> (ø)
superset-frontend/src/dashboard/actions/hydrate.js 2.04% <ø> (ø)
... and 27 more

... and 93 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -49,6 +49,7 @@ export interface QueryEditor {
selectedText?: string;
queryLimit?: number;
description?: string;
uuid: string | null;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use an optional string like the ones above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that uuid: string | null means the key will always be there, but may be null, and uuid?: string means that the key may or may not be there, but if it is it's a string. This way seemed more accurate to me, I copied from the type def for the remoteId here (which is the saved query numerical id), since it should be the same for UUID

@@ -216,10 +216,10 @@ function SavedQueryList({
};

const copyQueryLink = useCallback(
(id: number) => {
(uuid: string) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does TypeScript support a UUID type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that I'm aware of, didn't find anything on a quick google search either

savedQueryToastContent = `${
window.location.origin + window.location.pathname
}?savedQueryId=${remoteId}`;
}?savedQueryId=${uuid}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a concern that by switching to UUID existing URLs will break?

Copy link
Member Author

@jfrag1 jfrag1 Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing URLs will still work just fine as the get saved query endpoint still accepts numerical id's

action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.get",
log_to_statsd=False,
)
def get(self, _id: str, **kwargs: Any) -> Response:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get(self, _id: str, **kwargs: Any) -> Response:
def get(self, id_: str, **kwargs: Any) -> Response:

The convention is to add (rather than prepend) an underscore to variable names which conflict with the global namespace.

Additionally can't the ID be an integer (rather than a string) or UUID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the string type captures both integer id's and UUIDs. The dao will then cast the type appropriately depending on which is passed

@@ -44,3 +47,20 @@ def bulk_delete(models: Optional[list[SavedQuery]], commit: bool = True) -> None
except SQLAlchemyError as ex:
db.session.rollback()
raise DAODeleteFailedError() from ex

@classmethod
def get_by_id(cls, _id: str) -> Optional[SavedQuery]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is the right approach, i.e., per this code block the UUID option is bypass the security check (by way of the filter on line #63). The correct approach would be to generate a permalink similar to charts and dashboards.

cc @michael-s-molina and @villebro as we've discussed this previously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bypassing the security check if a UUID is passed is intentional, it allows users to access saved queries via the API only if the UUID link has been shared with them. There is precedence for this approach in the codebase, see the dashboard DAO: https://github.com/apache/superset/blob/master/superset/dashboards/dao.py#L47

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the permalink logic for this purpose and probably migrate the dashboard DAO too. Keeping multiple strategies for sharing resources only increases code complexity and opens the door for inconsistent security checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-s-molina or @john-bodley given the discussion in the all hands meetup, did we come to a decision on whether we can move towards UUIDs here and save the permalink feature for a moment in time state that you want to share?

Copy link
Member

@john-bodley john-bodley Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eschutho I’m not sure that the UUID seems correct both from a security and consistency perspective.

It feels like an internal representation that we’re now exposing to circumvent security, i.e., it feels akin to saying you enter a bar and have to present your ID (an integer or slug) for verification, however if you merely wave a UUID you bypass the bouncer. Granted it’s harder to forge a UUID as opposed to guess a valid ID number, but that still doesn’t seem secure.

In my opinion the precedence which was set previously (the example @jfrag1 linked to—which is the same one I was referring to in the town hall) should be thought of as the exception rather than the rule, i.e., that logic would also need to be updated in a future PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @michael-s-molina's comment regarding making this a SIP. I understand that this feels somewhat like a blocker to what feels like a somewhat benign fix, but I think it's important that we flush out (and detangle) the key concepts that Michael and myself raised to set the right precedence, as opposed to a bandaid solution.

Historically UUIDs were obfuscated and solely used to make charts, dashboards, etc. globally unique across shared Superset instances, however their use within the API is more of an organic phenomenon and thus I think there's merit in pausing, taking a step back, and evaluating how we think about UUIDs, shareable links, etc. in a more holistic manner.

I agree that finding a matching UUID is near impossible, but using UUIDs for security (via cryptography) isn't valid. Note this is in no way a criticism of your work, but more with the state of the Superset code base. At Airbnb we support thousands of weekly active users and are constantly fixing issues related to problematic features and/or inconsistent/non-coherent/complex functionality (you can see the vast majority of my commits to this repo are actually fixes and/or mechanism to improve quality).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed with using permalink as long term solution for copy link or any other share with link feature as opposed to directly sharing with raw uuid. Even though uuid is impossible to guess, it's still security through obscurity(link can be leaked unintentionally).
And we should be consistent for authorization policy on a resource regardless of the format of their IDs.
But like @jfrag1 said this PR is to fix a breaking changes of the other PR. We will need to either revert that breaking PR or allow exception of this PR for now until formal SIP is made around how we represent/export resource application wide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that a SIP on this functionality is a good idea. We were planning to write one up, I think @john-bodley is also interested in writing it. If we think this PR is too premature at this point, what do you all think about us reverting this one line back to the old api so that we can fix the sharing capabilities that are currently broken, and then we can talk about a longer-term solution in an upcoming SIP? cc @hughhhh who is also helping on this.

Copy link
Member

@john-bodley john-bodley Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zephyring, @jfrag1, @eschutho, @michael-s-molina, et al. I'm supportive of "reverting" the original PR to get us back to square one and then with said SIP we can proceed. I suspect the old API endpoint is a legacy FAB REST API which was slated for deletion in 2.3.X (we're now using FAB 4.3.2+).

If the one line revert to the old API isn't via I think the alternative is something akin to what we do at Airbnb, where our PR description states:

Though this logic,

class SavedQueryFilter(BaseFilter):  # pylint: disable=too-few-public-methods
   def apply(self, query: BaseQuery, value: Any) -> BaseQuery:
       """
       Filter saved queries to only those created by current user.

       :returns: flask-sqlalchemy query
       """
       
       return query.filter(
           SavedQuery.created_by == g.user  # pylint: disable=comparison-with-callable
       )

which, restricts only the creator of the saved query to access it, has been around for years we suspect it only came into force when the /api/v1/saved_query/{pk} API was enabled in Q4 of 2022 which prevented people from sharing saved queries.

This PR simply relaxes the filter to allow anyone to access a saved query.

Thus I think an alternative to changing the API endpoint would be to simply replace these lines with return query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #24434 to revert the original change which broke this functionality

@michael-s-molina
Copy link
Member

Closing this as we reverted the original PR. @jfrag1 feel free to reopen if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants