-
Notifications
You must be signed in to change notification settings - Fork 24
Add authlib and custom webserver config to allow using google auth in flowetl #7159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds Google OAuth support via a new Airflow webserver config module, exposes that config to the Docker image, adds Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User as User (Browser)
participant Web as Airflow Webserver
participant OAuthLib as Authlib (OAuth client)
participant Google as Google OAuth
rect rgba(243,248,255,0.9)
note left of Web: Startup — import `flowetl/webserver_config.py`\n(env-driven AUTH_TYPE and OAUTH_PROVIDERS)
Web->>Web: Read env vars and configure OAuth provider(s)
end
User->>Web: GET /login
alt AIRFLOW__WEBSERVER__AUTH_TYPE == "google"
Web->>OAuthLib: Initiate redirect to Google (auth request)
OAuthLib->>Google: Authorization request
Google-->>User: Consent page
User-->>Google: Approve
Google-->>OAuthLib: Authorization code
OAuthLib->>Google: Exchange code for token & userinfo
Google-->>OAuthLib: Access token & userinfo
OAuthLib-->>Web: Provide userinfo
Web->>Web: Create/register user if enabled
Web-->>User: Authenticated session
else Fallback auth
Web-->>User: Default login flow
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
flowetl/flowetl/setup.py (1)
38-38: Consider specifying a minimum version for authlib.The authlib library should have a minimum version constraint to ensure security fixes are included. Version 1.3.1 (released June 2024) contains important security fixes for JOSE/JWK handling, specifically preventing OctKey from incorrectly importing PEM/SSH formatted asymmetric keys as symmetric keys.
Apply this diff to add a minimum version constraint:
- "authlib" + "authlib>=1.3.1"Based on learnings
flowetl.Dockerfile (1)
51-51: Copy webserver_config.py into a fixed location and update ENV
Replace the current ENV with:COPY --chown=airflow ./flowetl/webserver_config.py /opt/airflow/webserver_config.py ENV AIRFLOW__WEBSERVER__CONFIG_FILE=/opt/airflow/webserver_config.pyThe Docker ignore already un-ignores
flowetl/webserver_config.py(line 20), so it’ll be included in the build context.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CHANGELOG.md(1 hunks)flowetl.Dockerfile(1 hunks)flowetl.Dockerfile.dockerignore(1 hunks)flowetl/dev-requirements.txt(2 hunks)flowetl/flowetl/setup.py(1 hunks)flowetl/requirements.txt(2 hunks)flowetl/webserver_config.py(1 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[grammar] ~10-~10: The singular proper name ‘Google’ must be used with a third-person or a past tense verb.
Context: ... [Unreleased] ### Added - Added Google auth to flowetl. ### Changed ### Fixed ##...
(HE_VERB_AGR)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: CodeQL-Build
- GitHub Check: Summary
- GitHub Check: run_build_pipeline
🔇 Additional comments (4)
flowetl/dev-requirements.txt (1)
307-313: LGTM!The authlib dependency is correctly added with version 1.3.1 and integrity hashes. This version includes the security fixes for JOSE/JWK handling mentioned in the learnings.
flowetl.Dockerfile.dockerignore (1)
20-20: Verify that flowetl/webserver_config.py contains no hardcoded secrets
The!flowetl/webserver_config.pyentry whitelists this file in the image—ensure it only references environment variables (no literal OAuth credentials or keys). To scan for hardcoded values, run:#!/bin/bash # PCRE2-enabled search for hardcoded secrets rg --pcre2 -n "client_secret\s*=\s*['\"](?!os\.environ|os\.getenv)" flowetl/webserver_config.py rg --pcre2 -n "client_id\s*=\s*['\"][A-Za-z0-9-]{20,}" flowetl/webserver_config.py rg --pcre2 -n "(token|key|secret|password)\s*=\s*['\"][A-Za-z0-9+/=]{16,}" flowetl/webserver_config.pyIf
--pcre2isn’t available, use:rg -n "client_secret\s*=\s*['\"]" flowetl/webserver_config.py | rg -v "os\.environ|os\.getenv" rg -n "client_id\s*=\s*['\"][A-Za-z0-9-]{20,}" flowetl/webserver_config.py rg -n "(token|key|secret|password)\s*=\s*['\"][A-Za-z0-9+/=]{16,}" flowetl/webserver_config.pyNo matches should be found.
flowetl/webserver_config.py (1)
5-6: Boolean parsing is fine; document the toggleENABLE_PROXY_FIX parsing is correct. Add a brief note in deployment docs to set AIRFLOW__WEBSERVER__AUTH_TYPE=google and, if behind a proxy, AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX=true.
flowetl/requirements.txt (1)
284-289: Authlib and cryptography pins are consistentauthlib==1.3.1 and cryptography==42.0.8 appear only in requirements and dev-requirements with no conflicting versions; no ecosystem conflicts detected.
| ## [Unreleased] | ||
|
|
||
| ### Added | ||
| - Added Google auth to flowetl. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny wording polish
Recommend “Added Google authentication to FlowETL.” for clarity and casing.
-- Added Google auth to flowetl.
+- Added Google authentication to FlowETL.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - Added Google auth to flowetl. | |
| Added Google authentication to FlowETL. |
🧰 Tools
🪛 LanguageTool
[grammar] ~10-~10: The singular proper name ‘Google’ must be used with a third-person or a past tense verb.
Context: ... [Unreleased] ### Added - Added Google auth to flowetl. ### Changed ### Fixed ##...
(HE_VERB_AGR)
🤖 Prompt for AI Agents
In CHANGELOG.md around line 10, the entry "Added Google auth to flowetl." uses
informal wording and inconsistent casing; update it to read "Added Google
authentication to FlowETL." to use the full word "authentication" and capitalize
the project name consistently.
flowetl/webserver_config.py
Outdated
| if use_google_auth: | ||
| AUTH_TYPE = AUTH_OAUTH | ||
| AUTH_USER_REGISTRATION = True | ||
| AUTH_USER_REGISTRATION_ROLE = os.getenv("AIRFLOW__WEBSERVER__AUTH_USER_REGISTRATION_ROLE", "Viewer") | ||
| AUTH_ROLES_SYNC_AT_LOGIN = False | ||
|
|
||
|
|
||
| OAUTH_PROVIDERS = [ | ||
| { | ||
| "name": "google", | ||
| "icon": "fa-google", | ||
| "token_key": "access_token", | ||
| "remote_app": { | ||
| "server_metadata_url": "https://accounts.google.com/.well-known/openid-configuration", | ||
| "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"], | ||
| "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"], | ||
| "api_base_url": "https://www.googleapis.com/oauth2/v2/", | ||
| "authorize_url": "https://accounts.google.com/o/oauth2/v2/auth", | ||
| "access_token_url": "https://oauth2.googleapis.com/token", | ||
| "client_kwargs": { | ||
| "scope": "openid email profile", | ||
| "prompt": "consent", | ||
| "access_type": "offline", | ||
| }, | ||
| }, | ||
| }, | ||
| ] No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Harden Google OAuth: fail fast with clear errors and restrict who can sign in
- Accessing client_id/secret via os.environ[...] raises a KeyError at import with a vague traceback. Raise a clear error instead.
- Consider restricting login to your Google Workspace domain (e.g., check the ‘hd’ claim in the ID token or equivalent) to prevent any Google account from gaining access, even with a low-privilege default role.
Suggested improvement:
- "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"],
- "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"],
+ # Fail fast with a clear message if env vars are missing
+ "client_id": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"): v or (_ for _ in ()).throw(
+ RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID")))(),
+ "client_secret": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"): v or (_ for _ in ()).throw(
+ RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET")))(),Domain restriction (pick one approach that fits your policy/tooling):
- Add an auth check that validates the ‘hd’ claim in the ID token equals your domain.
- Or enforce allowed domains via FAB/OAuth configuration if available in your stack.
Please confirm which restriction you prefer and I can draft the exact snippet.
🏁 Script executed:
#!/bin/bash
rg -n "use_google_auth" -C3 flowetl/webserver_config.pyLength of output: 461
Harden Google OAuth: fail fast on missing creds and restrict sign-in to your Workspace domain
- Replace os.environ[…] with os.getenv plus explicit RuntimeError to surface missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID/SECRET at startup:
- "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"],
- "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"],
+ "client_id": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"): v or (_ for _ in ()).throw(
+ RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID")))(),
+ "client_secret": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"): v or (_ for _ in ()).throw(
+ RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET")))(),- Enforce your Google Workspace domain by validating the ID token’s “hd” claim equals your-domain.com (or configure FAB’s allowed_domains if supported).
Please confirm your domain or preferred enforcement approach.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if use_google_auth: | |
| AUTH_TYPE = AUTH_OAUTH | |
| AUTH_USER_REGISTRATION = True | |
| AUTH_USER_REGISTRATION_ROLE = os.getenv("AIRFLOW__WEBSERVER__AUTH_USER_REGISTRATION_ROLE", "Viewer") | |
| AUTH_ROLES_SYNC_AT_LOGIN = False | |
| OAUTH_PROVIDERS = [ | |
| { | |
| "name": "google", | |
| "icon": "fa-google", | |
| "token_key": "access_token", | |
| "remote_app": { | |
| "server_metadata_url": "https://accounts.google.com/.well-known/openid-configuration", | |
| "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"], | |
| "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"], | |
| "api_base_url": "https://www.googleapis.com/oauth2/v2/", | |
| "authorize_url": "https://accounts.google.com/o/oauth2/v2/auth", | |
| "access_token_url": "https://oauth2.googleapis.com/token", | |
| "client_kwargs": { | |
| "scope": "openid email profile", | |
| "prompt": "consent", | |
| "access_type": "offline", | |
| }, | |
| }, | |
| }, | |
| ] | |
| if use_google_auth: | |
| AUTH_TYPE = AUTH_OAUTH | |
| AUTH_USER_REGISTRATION = True | |
| AUTH_USER_REGISTRATION_ROLE = os.getenv("AIRFLOW__WEBSERVER__AUTH_USER_REGISTRATION_ROLE", "Viewer") | |
| AUTH_ROLES_SYNC_AT_LOGIN = False | |
| OAUTH_PROVIDERS = [ | |
| { | |
| "name": "google", | |
| "icon": "fa-google", | |
| "token_key": "access_token", | |
| "remote_app": { | |
| "server_metadata_url": "https://accounts.google.com/.well-known/openid-configuration", | |
| "client_id": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"): v or (_ for _ in ()).throw( | |
| RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID")))(), | |
| "client_secret": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"): v or (_ for _ in ()).throw( | |
| RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET")))(), | |
| "api_base_url": "https://www.googleapis.com/oauth2/v2/", | |
| "authorize_url": "https://accounts.google.com/o/oauth2/v2/auth", | |
| "access_token_url": "https://oauth2.googleapis.com/token", | |
| "client_kwargs": { | |
| "scope": "openid email profile", | |
| "prompt": "consent", | |
| "access_type": "offline", | |
| }, | |
| }, | |
| }, | |
| ] |
🤖 Prompt for AI Agents
In flowetl/webserver_config.py around lines 8 to 34, replace the direct
os.environ[...] calls for AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID and _SECRET
with os.getenv and fail fast by raising a clear RuntimeError if either is
missing (e.g., read from os.getenv and if None raise RuntimeError with a message
naming the missing var); additionally enforce Google Workspace domain
restriction by validating the OpenID Connect ID token’s "hd" claim equals
your-domain (or alternatively set FAB's allowed_domains if preferred) — read the
allowed domain from an env var like
AIRFLOW__WEBSERVER__OAUTH_GOOGLE_ALLOWED_DOMAIN and implement the hd check after
token verification (or wire the allowed_domains config) so only users from that
domain can sign in.
FlowAuth
|
||||||||||||||||||||||||||||
| Project |
FlowAuth
|
| Branch Review |
google-auth
|
| Run status |
|
| Run duration | 00m 48s |
| Commit |
|
| Committer | Jonathan Gray |
| View all properties for this run ↗︎ | |
| Test results | |
|---|---|
|
|
0
|
|
|
0
|
|
|
0
|
|
|
0
|
|
|
4
|
| View all changes introduced in this branch ↗︎ | |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7159 +/- ##
=======================================
Coverage 92.08% 92.08%
=======================================
Files 277 277
Lines 10778 10778
Branches 697 697
=======================================
Hits 9925 9925
Misses 700 700
Partials 153 153 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary by CodeRabbit
New Features
Documentation
Chores