
@greenape
Contributor

@greenape greenape commented Oct 14, 2025

Summary by CodeRabbit

  • New Features

    • Added Google sign‑in (OAuth) for the web interface, with optional proxy handling and user self‑registration using a default role.
  • Documentation

    • Changelog updated to document the new Google authentication.
  • Chores

    • Webserver now reads auth configuration from environment variables.
    • Added an OAuth dependency to runtime and dev requirements.
    • Updated ignore rules for the new config file and bumped CI cache keys.

@greenape greenape changed the title from "Add authlib and custom webserver config to allow using google auth in flowers" to "Add authlib and custom webserver config to allow using google auth in flowetl" Oct 14, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 14, 2025

Walkthrough

Adds Google OAuth support via a new Airflow webserver config module, exposes that config to the Docker image, adds authlib to runtime and dev dependencies, updates .dockerignore and the changelog, and bumps several CircleCI cache key versions.

Changes

| Cohort | File(s) | Summary of changes |
| --- | --- | --- |
| Changelog | CHANGELOG.md | Added an Unreleased → Added entry: “Added Google auth to flowetl.” |
| Webserver OAuth config | flowetl/webserver_config.py | New module that reads environment variables to enable Google OAuth: sets AUTH_TYPE, AUTH_USER_REGISTRATION, AUTH_USER_REGISTRATION_ROLE, AUTH_ROLES_SYNC_AT_LOGIN, ENABLE_PROXY_FIX and OAUTH_PROVIDERS (Google endpoints, with the client ID and secret read from the environment). |
| Docker integration | flowetl.Dockerfile, flowetl.Dockerfile.dockerignore | flowetl.Dockerfile: added ENV AIRFLOW__WEBSERVER__CONFIG_FILE=/${SOURCE_TREE}/flowetl/webserver_config.py. flowetl.Dockerfile.dockerignore: added a !flowetl/webserver_config.py exception so the file is included in the build context. |
| Dependencies (runtime + dev) | flowetl/flowetl/setup.py, flowetl/requirements.txt, flowetl/dev-requirements.txt | Added authlib to install_requires and pinned authlib==1.3.1 (with hashes) in requirements.txt and dev-requirements.txt. |
| CI cache keys | .circleci/config.yml | Bumped several CircleCI cache key versions in the restore/save cache steps (e.g. flowmachine-deps-11 → flowmachine-deps-12, integration-test-deps-8 → integration-test-deps-9, flowkit-docs-deps-9 → flowkit-docs-deps-10). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as User (Browser)
  participant Web as Airflow Webserver
  participant OAuthLib as Authlib (OAuth client)
  participant Google as Google OAuth

  rect rgba(243,248,255,0.9)
    note left of Web: Startup — import `flowetl/webserver_config.py`\n(env-driven AUTH_TYPE and OAUTH_PROVIDERS)
    Web->>Web: Read env vars and configure OAuth provider(s)
  end

  User->>Web: GET /login
  alt AIRFLOW__WEBSERVER__AUTH_TYPE == "google"
    Web->>OAuthLib: Initiate redirect to Google (auth request)
    OAuthLib->>Google: Authorization request
    Google-->>User: Consent page
    User-->>Google: Approve
    Google-->>OAuthLib: Authorization code
    OAuthLib->>Google: Exchange code for token & userinfo
    Google-->>OAuthLib: Access token & userinfo
    OAuthLib-->>Web: Provide userinfo
    Web->>Web: Create/register user if enabled
    Web-->>User: Authenticated session
  else Fallback auth
    Web-->>User: Default login flow
  end
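The authorization request in the diagram is a browser redirect to Google's authorization endpoint. As a sketch only, built with the Python standard library rather than Authlib, the redirect carries roughly these parameters. The authorize URL and client_kwargs are copied from flowetl/webserver_config.py; the helper name, the placeholder client ID and the redirect URI are hypothetical.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse

# Copied from the Google entry of OAUTH_PROVIDERS in flowetl/webserver_config.py.
AUTHORIZE_URL = "https://accounts.google.com/o/oauth2/v2/auth"
CLIENT_KWARGS = {"scope": "openid email profile", "prompt": "consent", "access_type": "offline"}


def build_authorization_url(
    client_id: str, redirect_uri: str, state: Optional[str] = None
) -> Tuple[str, str]:
    """Build the redirect URL for an OAuth 2.0 authorization-code request.

    Hypothetical helper for illustration; in the webserver, Authlib does this.
    Returns (url, state); the caller must keep `state` and compare it on callback.
    """
    # An unguessable state value ties the callback to this browser session (CSRF protection).
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask Google for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        **CLIENT_KWARGS,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


if __name__ == "__main__":
    # Placeholder values, not real credentials or hosts.
    url, state = build_authorization_url("example-client-id", "https://flowetl.example.org/callback")
    print(url)
    print(parse_qs(urlparse(url).query))
```

In the real flow, Authlib also checks the returned `state` on the callback and then exchanges the authorization code at the configured access_token_url; this sketch stops at the redirect.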

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement, dependencies

Poem

A rabbit hops in with a key and a grin,
Google smiles and lets the login begin.
Secrets snug in Docker's den,
Users flow through the gate again.
Hooray — FlowETL hops forward, nibbling code and zen.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately reflects the primary change in the pull request by specifying the addition of authlib and a custom webserver configuration to enable Google authentication in FlowETL, matching the core diff content across dependency, configuration and module changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a2f89d and 491e902.

📒 Files selected for processing (1)
  • .circleci/config.yml (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
flowetl/flowetl/setup.py (1)

38-38: Consider specifying a minimum version for authlib.

The authlib library should have a minimum version constraint so that security fixes are included. Version 1.3.1 (released June 2024) contains important security fixes for JOSE/JWK handling, specifically preventing OctKey from incorrectly importing PEM- or SSH-formatted asymmetric keys as symmetric keys.

Apply this diff to add a minimum version constraint:

-        "authlib"
+        "authlib>=1.3.1"

Based on learnings

flowetl.Dockerfile (1)

51-51: Copy webserver_config.py into a fixed location and update ENV

Replace the current ENV with:

COPY --chown=airflow ./flowetl/webserver_config.py /opt/airflow/webserver_config.py
ENV AIRFLOW__WEBSERVER__CONFIG_FILE=/opt/airflow/webserver_config.py

The Docker ignore already un-ignores flowetl/webserver_config.py (line 20), so it’ll be included in the build context.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f45f69d and 7a96caa.

📒 Files selected for processing (7)
  • CHANGELOG.md (1 hunks)
  • flowetl.Dockerfile (1 hunks)
  • flowetl.Dockerfile.dockerignore (1 hunks)
  • flowetl/dev-requirements.txt (2 hunks)
  • flowetl/flowetl/setup.py (1 hunks)
  • flowetl/requirements.txt (2 hunks)
  • flowetl/webserver_config.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[grammar] ~10-~10: The singular proper name ‘Google’ must be used with a third-person or a past tense verb.
Context: ... [Unreleased] ### Added - Added Google auth to flowetl. ### Changed ### Fixed ##...

(HE_VERB_AGR)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: CodeQL-Build
  • GitHub Check: Summary
  • GitHub Check: run_build_pipeline
🔇 Additional comments (4)
flowetl/dev-requirements.txt (1)

307-313: LGTM!

The authlib dependency is correctly added with version 1.3.1 and integrity hashes. This version includes the security fixes for JOSE/JWK handling mentioned in the learnings.

flowetl.Dockerfile.dockerignore (1)

20-20: Verify that flowetl/webserver_config.py contains no hardcoded secrets

The !flowetl/webserver_config.py entry whitelists this file in the image, so ensure it only references environment variables (no literal OAuth credentials or keys). The module uses dict-style entries ("client_secret": ...), so the patterns below accept either = or : after an optionally quoted name. To scan for hardcoded values, run:

#!/bin/bash
# Flag credential-like names followed by = or : and then a string literal
rg -n "client_secret['\"]?\s*[:=]\s*['\"]" flowetl/webserver_config.py
rg -n "client_id['\"]?\s*[:=]\s*['\"][A-Za-z0-9-]{20,}" flowetl/webserver_config.py
rg -n "(token|key|secret|password)['\"]?\s*[:=]\s*['\"][A-Za-z0-9+/=]{16,}" flowetl/webserver_config.py

No matches should be found.
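Regular expressions depend on the exact spelling of an assignment (`client_secret = "..."` versus the dict-style `"client_secret": "..."` used in this module). A sketch of a sturdier alternative, assuming it is enough to flag credential-like names bound directly to string literals: parse the file with Python's ast module and inspect dict keys and assignment targets. The function name, the script name and the list of sensitive names are hypothetical.

```python
import ast
import sys
from typing import List, Tuple

# Hypothetical list of names that must never be bound to a string literal.
SENSITIVE_KEYS = {"client_id", "client_secret", "password", "secret", "token"}


def find_hardcoded_credentials(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line, name) pairs where a sensitive name gets a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # Dict literals such as {"client_secret": "..."}; keys are None for **spreads.
            for key, value in zip(node.keys, node.values):
                if (
                    isinstance(key, ast.Constant)
                    and key.value in SENSITIVE_KEYS
                    and isinstance(value, ast.Constant)
                    and isinstance(value.value, str)
                ):
                    findings.append((value.lineno, key.value))
        elif isinstance(node, ast.Assign):
            # Plain assignments such as client_secret = "...".
            if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name) and target.id.lower() in SENSITIVE_KEYS:
                        findings.append((node.lineno, target.id))
    return sorted(findings)


if __name__ == "__main__":
    # Usage: python check_literals.py flowetl/webserver_config.py [more files...]
    failed = False
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for line, name in find_hardcoded_credentials(fh.read()):
                print(f"{path}:{line}: string literal assigned to {name!r}")
                failed = True
    if failed:
        sys.exit(1)
```

Values read through os.environ or os.getenv are subscripts or calls rather than string constants, so they are not reported; the script exits non-zero only when a literal is found.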

flowetl/webserver_config.py (1)

5-6: Boolean parsing is fine; document the toggle

ENABLE_PROXY_FIX parsing is correct. Add a brief note in deployment docs to set AIRFLOW__WEBSERVER__AUTH_TYPE=google and, if behind a proxy, AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX=true.
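Lines 5 and 6 of the module are not quoted in this thread, so their exact parsing is not shown here. Purely as an illustration of the toggles this comment mentions, a hypothetical `env_flag` helper might look like this; the set of accepted truthy strings is an assumption.

```python
import os

# Assumed set of accepted "true" spellings; anything else counts as false.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean toggle from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Toggles named in this thread; the exact parsing in webserver_config.py may differ.
use_google_auth = os.getenv("AIRFLOW__WEBSERVER__AUTH_TYPE", "").strip().lower() == "google"
ENABLE_PROXY_FIX = env_flag("AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX")
```

Accepting only an explicit set of strings means a value such as "false" or "0" disables the feature, instead of counting as true merely because it is a non-empty string.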

flowetl/requirements.txt (1)

284-289: Authlib and cryptography pins are consistent

authlib==1.3.1 and cryptography==42.0.8 appear only in requirements and dev-requirements with no conflicting versions; no ecosystem conflicts detected.
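The consistency claim can be re-checked mechanically. A rough sketch, assuming pip-compile style files where each pin is a `name==version` line and hashes sit on `--hash=` continuation lines; it skips extras and include files, and the function names are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Tuple

# name==version at the start of a line; "--hash=" and "#" lines do not match.
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;\\]+)")


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Map normalised package name -> pinned version."""
    pins = {}
    for line in lines:
        match = _PIN.match(line)
        if match:
            # PEP 503 name normalisation, so "Authlib" and "authlib" compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def conflicting_pins(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Packages pinned in both files but to different versions."""
    return {name: (a[name], b[name]) for name in a.keys() & b.keys() if a[name] != b[name]}


def read_pins(path: str) -> Dict[str, str]:
    with open(path, encoding="utf-8") as fh:
        return parse_pins(fh)


if __name__ == "__main__":
    # Usage: python check_pins.py flowetl/requirements.txt flowetl/dev-requirements.txt
    if len(sys.argv) == 3:
        first, second = (read_pins(p) for p in sys.argv[1:])
        for name, (v1, v2) in sorted(conflicting_pins(first, second).items()):
            print(f"{name}: {v1} != {v2}")
```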

## [Unreleased]

### Added
- Added Google auth to flowetl.
Contributor


⚠️ Potential issue | 🟡 Minor

Tiny wording polish

Recommend “Added Google authentication to FlowETL.” for clarity and casing.

-- Added Google auth to flowetl.
+- Added Google authentication to FlowETL.
🧰 Tools
🪛 LanguageTool

[grammar] ~10-~10: The singular proper name ‘Google’ must be used with a third-person or a past tense verb.
Context: ... [Unreleased] ### Added - Added Google auth to flowetl. ### Changed ### Fixed ##...

(HE_VERB_AGR)

🤖 Prompt for AI Agents
In CHANGELOG.md around line 10, the entry "Added Google auth to flowetl." uses
informal wording and inconsistent casing; update it to read "Added Google
authentication to FlowETL." to use the full word "authentication" and capitalize
the project name consistently.

Comment on lines 8 to 34
if use_google_auth:
    AUTH_TYPE = AUTH_OAUTH
    AUTH_USER_REGISTRATION = True
    AUTH_USER_REGISTRATION_ROLE = os.getenv("AIRFLOW__WEBSERVER__AUTH_USER_REGISTRATION_ROLE", "Viewer")
    AUTH_ROLES_SYNC_AT_LOGIN = False


    OAUTH_PROVIDERS = [
        {
            "name": "google",
            "icon": "fa-google",
            "token_key": "access_token",
            "remote_app": {
                "server_metadata_url": "https://accounts.google.com/.well-known/openid-configuration",
                "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"],
                "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"],
                "api_base_url": "https://www.googleapis.com/oauth2/v2/",
                "authorize_url": "https://accounts.google.com/o/oauth2/v2/auth",
                "access_token_url": "https://oauth2.googleapis.com/token",
                "client_kwargs": {
                    "scope": "openid email profile",
                    "prompt": "consent",
                    "access_type": "offline",
                },
            },
        },
    ]
No newline at end of file
Contributor


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Harden Google OAuth: fail fast with clear errors and restrict who can sign in

  • Accessing client_id/secret via os.environ[...] raises a KeyError at import with a vague traceback. Raise a clear error instead.
  • Consider restricting login to your Google Workspace domain (e.g., check the ‘hd’ claim in the ID token or equivalent) to prevent any Google account from gaining access, even with a low-privilege default role.

Suggested improvement:

-                "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"],
-                "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"],
+                # Fail fast with a clear message if env vars are missing
+                "client_id": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"): v or (_ for _ in ()).throw(
+                    RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID")))(),
+                "client_secret": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"): v or (_ for _ in ()).throw(
+                    RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET")))(),

Domain restriction (pick one approach that fits your policy/tooling):

  • Add an auth check that validates the ‘hd’ claim in the ID token equals your domain.
  • Or enforce allowed domains via FAB/OAuth configuration if available in your stack.
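For the first option, the check itself is small. A minimal sketch, assuming the decoded ID-token claims or the userinfo response are available as a dict and that the allowed domain comes from a hypothetical AIRFLOW__WEBSERVER__OAUTH_GOOGLE_ALLOWED_DOMAIN variable; hooking it into the webserver's login flow is not shown and would need to follow the security manager's documented extension points.

```python
import os
from typing import Any, Mapping, Optional


def is_allowed_google_user(claims: Mapping[str, Any], allowed_domain: Optional[str] = None) -> bool:
    """Return True only for accounts in the allowed Google Workspace domain.

    Hypothetical helper; `claims` is assumed to be the decoded ID-token claims
    or the userinfo response.
    """
    domain = (allowed_domain or os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_ALLOWED_DOMAIN") or "").strip().lower()
    if not domain:
        # Fail closed: with no domain configured, nobody is admitted.
        return False
    # Google sets "hd" only for Workspace accounts; personal accounts lack it.
    hosted_domain = str(claims.get("hd") or "").lower()
    email_domain = str(claims.get("email") or "").rpartition("@")[2].lower()
    return hosted_domain == domain and email_domain == domain
```

The `hd` request parameter only pre-selects accounts in Google's account chooser and is not a security control, so the claim has to be validated server-side as above.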

Please confirm which restriction you prefer and I can draft the exact snippet.


🏁 Script executed:

#!/bin/bash
rg -n "use_google_auth" -C3 flowetl/webserver_config.py

Length of output: 461


Harden Google OAuth: fail fast on missing creds and restrict sign-in to your Workspace domain

  • Replace os.environ[…] with os.getenv plus explicit RuntimeError to surface missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID/SECRET at startup:
-                "client_id": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"],
-                "client_secret": os.environ["AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"],
+                "client_id": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"): v or (_ for _ in ()).throw(
+                    RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID")))(),
+                "client_secret": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"): v or (_ for _ in ()).throw(
+                    RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET")))(),
  • Enforce your Google Workspace domain by validating the ID token’s “hd” claim equals your-domain.com (or configure FAB’s allowed_domains if supported).

Please confirm your domain or preferred enforcement approach.

Suggested change
if use_google_auth:
    AUTH_TYPE = AUTH_OAUTH
    AUTH_USER_REGISTRATION = True
    AUTH_USER_REGISTRATION_ROLE = os.getenv("AIRFLOW__WEBSERVER__AUTH_USER_REGISTRATION_ROLE", "Viewer")
    AUTH_ROLES_SYNC_AT_LOGIN = False
    OAUTH_PROVIDERS = [
        {
            "name": "google",
            "icon": "fa-google",
            "token_key": "access_token",
            "remote_app": {
                "server_metadata_url": "https://accounts.google.com/.well-known/openid-configuration",
                "client_id": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID"): v or (_ for _ in ()).throw(
                    RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID")))(),
                "client_secret": (lambda v=os.getenv("AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET"): v or (_ for _ in ()).throw(
                    RuntimeError("Missing AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_SECRET")))(),
                "api_base_url": "https://www.googleapis.com/oauth2/v2/",
                "authorize_url": "https://accounts.google.com/o/oauth2/v2/auth",
                "access_token_url": "https://oauth2.googleapis.com/token",
                "client_kwargs": {
                    "scope": "openid email profile",
                    "prompt": "consent",
                    "access_type": "offline",
                },
            },
        },
    ]
🤖 Prompt for AI Agents
In flowetl/webserver_config.py around lines 8 to 34, replace the direct
os.environ[...] calls for AIRFLOW__WEBSERVER__OAUTH_GOOGLE_CLIENT_ID and _SECRET
with os.getenv and fail fast by raising a clear RuntimeError if either is
missing (e.g., read from os.getenv and if None raise RuntimeError with a message
naming the missing var); additionally enforce Google Workspace domain
restriction by validating the OpenID Connect ID token’s "hd" claim equals
your-domain (or alternatively set FAB's allowed_domains if preferred) — read the
allowed domain from an env var like
AIRFLOW__WEBSERVER__OAUTH_GOOGLE_ALLOWED_DOMAIN and implement the hd check after
token verification (or wire the allowed_domains config) so only users from that
domain can sign in.

@cypress

cypress bot commented Oct 14, 2025

FlowAuth    Run #25327

Run properties:
  • Project: FlowAuth
  • Branch Review: google-auth
  • Run status: Passed #25327
  • Run duration: 00m 48s
  • Commit: 491e902651: And bust another cache
  • Committer: Jonathan Gray

Test results
  • Failures: 0
  • Flaky: 0
  • Pending (did not run because a developer annotated the test with .skip): 0
  • Skipped (did not run because of a failure in a mocha hook): 0
  • Passing: 4

@codecov

codecov bot commented Oct 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.08%. Comparing base (f45f69d) to head (491e902).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7159   +/-   ##
=======================================
  Coverage   92.08%   92.08%           
=======================================
  Files         277      277           
  Lines       10778    10778           
  Branches      697      697           
=======================================
  Hits         9925     9925           
  Misses        700      700           
  Partials      153      153           
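The headline percentage can be reproduced from the table. Codecov counts partial lines as not covered, so coverage = hits / (hits + misses + partials); assuming the project keeps Codecov's default of two-decimal precision with rounding down, 9925 / 10778 gives the reported figure:

```python
# Figures from the Codecov table for this PR.
hits, misses, partials = 9925, 700, 153
lines = hits + misses + partials  # 10778, the "Lines" row

# Coverage in hundredths of a percent, rounded down, using integer arithmetic
# so that floating-point error cannot affect the last digit.
hundredths = hits * 10_000 // lines
print(f"{hundredths // 100}.{hundredths % 100:02d}%")  # 92.08%
```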


@greenape greenape merged commit 84d4f65 into master Oct 15, 2025
39 checks passed
@greenape greenape deleted the google-auth branch October 15, 2025 09:42

Labels

None yet

Projects

None yet


2 participants