Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: search_path in RDS #24739

Merged
merged 5 commits into from
Jul 20, 2023
Merged

fix: search_path in RDS #24739

merged 5 commits into from
Jul 20, 2023

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Jul 19, 2023

SUMMARY

(See #24684 for more context.)

For security reasons, we need to be able to run SQL Lab queries in user-specified schemas. For example, in SQL Lab, when a user selects the schema foo from the dropdown and runs the following query:

SELECT * FROM some_table;

Since some_table is not fully qualified (it has no schema), we need to:

  1. Ensure the query runs in the foo schema, so that the right table is queried.
  2. Ensure that the user has permission to query foo.some_table (eg, by having access to the foo schema).

[Note that if (1) is not true, we can't do (2), since we don't know the schema of some_table.]

For Postgres, the way we currently do (1) is by setting options=-csearch_path=foo in the connection arguments every time the user runs a query. The problem is that the Postgres DB engine spec and the Postgres SQLAlchemy dialect are also used by RDS, which does not support options to be passed this way. In 3.0 RC1 it's not possible to query RDS because the search path is set for every query.

Fortunately there's a way that works for both Postgres and RDS (and hopefully, other databases using the Postgres client): by running the query set search_path=foo when the connection is first created.

This PR introduces the concept of "pre queries", ie, queries that are run as soon as a raw connection is created. This ensures ensure that the schema (and in the future, catalog) can be set per query correctly for Postgres and potentially other databases (the implementation is per DB-engine spec).

Note that if DML is enabled it's still possible for users to run set search_path=bar in SQL Lab. This could lead to security issues where:

  1. User selects foo as the schema in SQL Lab.
  2. User runs the query set search_path=bar; SELECT * FROM some_table.
  3. Superset's security manager will check if the user has access to some_table. Since the table name is not fully qualified and the dropdown is set to foo, Superset will check for access to foo.some_table, even though the table is actually bar.some_table.

This would allow a malicious user that has access to a given schema (say, foo) to query tables in any schema by simply choosing foo from the schema dropdown in SQL Lab and specifying the search path in the query.

To fix this problem (at least for Postgres), I extended the get_default_schema_for_query to check for set search_path and raise an exception if it's there:

Screenshot 2023-07-19 at 19-01-35 Superset

Ideally, instead of raising an exception we would parse the query to figure out the schema, but this is not trivial. Consider, eg, the following query:

set search_path = foo;
SELECT * FROM some_table;
set search_path = bar;
SELECT * FROM some_other_table;

Because of the complexity of figuring out the correct schema for table in a query with multiple statements, I opted to raise an exception for now, hoping that in the future we can improve this.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added new unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Merging #24739 (78d889b) into master (9c6d535) will decrease coverage by 0.08%.
The diff coverage is 74.50%.

❗ Current head 78d889b differs from pull request most recent head d0ba6db. Consider uploading reports for the commit d0ba6db to get more accurate results

@@            Coverage Diff             @@
##           master   #24739      +/-   ##
==========================================
- Coverage   68.97%   68.89%   -0.08%     
==========================================
  Files        1901     1901              
  Lines       74008    73935      -73     
  Branches     8183     8183              
==========================================
- Hits        51047    50941     -106     
- Misses      20840    20873      +33     
  Partials     2121     2121              
Flag Coverage Δ
hive 54.17% <46.07%> (+0.05%) ⬆️
mysql 79.19% <67.64%> (-0.17%) ⬇️
postgres 79.29% <72.54%> (-0.16%) ⬇️
presto 54.07% <46.07%> (+0.05%) ⬆️
python 83.32% <74.50%> (-0.13%) ⬇️
sqlite 77.86% <67.64%> (-0.17%) ⬇️
unit 54.91% <51.96%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rontend/src/filters/components/Range/buildQuery.ts 0.00% <ø> (ø)
superset/examples/birth_names.py 69.69% <ø> (ø)
superset/viz.py 64.51% <ø> (-2.39%) ⬇️
superset/common/query_context_processor.py 85.13% <7.40%> (-4.83%) ⬇️
superset/charts/commands/warm_up_cache.py 98.14% <96.00%> (+0.64%) ⬆️
superset/charts/commands/export.py 94.11% <100.00%> (ø)
superset/db_engine_specs/base.py 91.02% <100.00%> (+0.03%) ⬆️
superset/db_engine_specs/postgres.py 97.22% <100.00%> (+0.05%) ⬆️
superset/models/core.py 90.22% <100.00%> (+0.08%) ⬆️
superset/row_level_security/schemas.py 100.00% <100.00%> (ø)
... and 1 more

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mdeshmu
Copy link
Contributor

mdeshmu commented Jul 19, 2023

type hint in get_prequeries throws an error with Python 3.9

File "/app/superset/db_engine_specs/postgres.py", line 257, in PostgresEngineSpec
catalog: str | None = None,
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

So I changed it to Optional[str] = None in my local code. Then I get the below errors:

File "/app/superset/models/core.py", line 525, in get_raw_connection
conn.execute(prequery)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 1086, in __getattr__
return getattr(self.dbapi_connection, key)
AttributeError: 'psycopg2.extensions.connection' object has no attribute 'execute'
File "/app/superset/models/core.py", line 524, in get_raw_connection
for prequery in self.db_engine_spec.get_prequeries(schema=schema):
TypeError: get_prequeries() missing 1 required positional argument: 'catalog'

Happy to test further once these are addressed :)

@john-bodley john-bodley added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Jul 19, 2023
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 19, 2023
@betodealmeida
Copy link
Member Author

@mdeshmu sorry, the initial commit was just a proof-of-concept to see how much refactoring would be necessary.

I've updated this PR and tested it with Postgres and RDS. When running queries in SQL Lab the schema was set correctly in both based on the dropdown.

Can you give it another try now? I'm going to work on adding some tests in the meantime.

@betodealmeida betodealmeida changed the title [WIP] fix: search_path in RDS fix: search_path in RDS Jul 20, 2023
@mdeshmu
Copy link
Contributor

mdeshmu commented Jul 20, 2023

I have tested this and It does fix #24684
But just FYI, I was able to use set search_path in SQL Lab to a schema other than the one chosen in drop down and could successfully query data from the PostgreSQL table using its non-fully qualified table name.
image
image

@betodealmeida
Copy link
Member Author

@mdeshmu: But just FYI, I was able to use set search_path in SQL Lab to a schema other than the one chosen in drop down and could successfully query data from the PostgreSQL table using its non-fully qualified table name.

Right, I should've been more clear about this.

This is probably because by default you have access to the whole database; in that case, setting the path is fine. The check only happens when the user doesn't have full database access, when we're checking if the user has access to the schema or the table.

@mdeshmu
Copy link
Contributor

mdeshmu commented Jul 20, 2023

This is probably because by default you have access to the whole database; in that case, setting the path is fine. The check only happens when the user doesn't have full database access, when we're checking if the user has access to the schema or the table.

True

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
@mistercrunch mistercrunch deleted the fix_rds branch March 26, 2024 18:03
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🚢 3.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants