Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: safer insert RLS #20323

Merged
merged 2 commits into from
Nov 9, 2023
Merged

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Jun 9, 2022

SUMMARY

We currently have a feature flag RLS_IN_SQLLAB that, when enabled, applies any relevant RLS rules to the SQL executed in SQL Lab. The current approach parses the SQL and extracts all tables being queried. If any of those tables have RLS rules associated with them the parse tree is modified inplace, appending the RLS predicate to the associated WHERE clause or creating a new one if needed.

@mistercrunch suggested an approach that is more bulletproof: replacing the table reference (eg, my_table) with a subquery that contains the RLS (SELECT * FROM my_table WHERE rls). This can be done conditionally on databases that support subqueries. I implemented that solution in this PR.

Note that this method is probably still not bulletproof. We're still using sqlparse to find table references, and this has been error-prone in the past. For example, we had problems where the identifier "RAW" was not detected properly because sqlparse considers it a reserved keyword, since it doesn't have separate lists of keywords for each dialect.

This PR and the original implementation are both conservative in cases like these, trying to figure out if a keyword is actually and identifier, and the unit tests cover such cases. But we should consider this feature experimental, and warn users that there are risks associated with using it.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Jun 9, 2022

Codecov Report

Merging #20323 (556196c) into master (1edfd7e) will increase coverage by 0.01%.
The diff coverage is 93.93%.

❗ Current head 556196c differs from pull request most recent head 7f5c8a2. Consider uploading reports for the commit 7f5c8a2 to get more accurate results

@@            Coverage Diff             @@
##           master   #20323      +/-   ##
==========================================
+ Coverage   66.87%   66.88%   +0.01%     
==========================================
  Files        1847     1847              
  Lines       70561    70584      +23     
  Branches     7739     7739              
==========================================
+ Hits        47186    47209      +23     
  Misses      21374    21374              
  Partials     2001     2001              
Flag Coverage Δ
hive 52.49% <18.18%> (-0.04%) ⬇️
mysql 77.90% <39.39%> (-0.06%) ⬇️
postgres 77.97% <39.39%> (-0.06%) ⬇️
presto 52.39% <18.18%> (-0.04%) ⬇️
python 81.24% <93.93%> (+0.01%) ⬆️
sqlite 76.43% <39.39%> (-0.05%) ⬇️
unit 50.94% <90.90%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/models/helpers.py 38.19% <33.33%> (ø)
superset/connectors/sqla/utils.py 89.32% <100.00%> (ø)
superset/sql_lab.py 82.19% <100.00%> (+0.06%) ⬆️
superset/sql_parse.py 96.70% <100.00%> (+0.21%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great improvement, as practically all modern databases nowadays should be able to push down predicates into subqueries without any performance issues. I remember old versions of MSSQL or Oracle being horrible at these, but I believe even recent versions of these have improved this functionality.

I didn't try this out yet, but I really just have one concern regarding potential problems with aliases. If aliases are handled ok then I think this is good to go (I'm happy to do a proper test pass after we add a few test cases for queries with aliases).

tests/unit_tests/sql_parse_tests.py Show resolved Hide resolved
@betodealmeida
Copy link
Member Author

Friendly ping @dpgaspar @villebro on this. :)

@betodealmeida
Copy link
Member Author

Ping @dpgaspar @villebro.

@betodealmeida betodealmeida force-pushed the safer_insert_rls branch 2 times, most recently from 12af22c to d792a77 Compare September 26, 2023 22:23
@betodealmeida betodealmeida added the merge-if-green If approved and tests are green, please go ahead and merge it for me label Nov 8, 2023
@betodealmeida betodealmeida merged commit 2bd6119 into apache:master Nov 9, 2023
29 checks passed
josedev-union pushed a commit to Ortege-xyz/studio that referenced this pull request Jan 22, 2024
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
@mistercrunch mistercrunch added the 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels label Mar 13, 2024
sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels merge-if-green If approved and tests are green, please go ahead and merge it for me preset-io size/L 🚢 3.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants