Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize the conn_id to disallow potential script execution #32867

Merged
merged 33 commits into from Jan 16, 2024
Merged
Changes from 3 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
8b52ca7
sanitize the conn_id to disallow potential script execution
andylamp Jul 26, 2023
8da482d
Merge branch 'main' into main
andylamp Jul 26, 2023
6a4a01d
update docs and naming of the regex to better indicate its purpose
andylamp Jul 26, 2023
c0d2143
use re2 instead of re, as per static checks
andylamp Jul 26, 2023
2912fec
Update airflow/models/connection.py
andylamp Jul 27, 2023
a78c149
Merge branch 'main' into main
andylamp Jul 27, 2023
735b34b
refactor to return the first matched group, which is out desired outcome
andylamp Jul 27, 2023
44a7211
tweak the comments
andylamp Jul 27, 2023
54e7cd4
add a test suite for the regex sanitization
andylamp Jul 27, 2023
bfdcf86
fix the docs linter, change to en-US instead of en-GB
andylamp Jul 28, 2023
546afff
remove docstrings from test
andylamp Jul 28, 2023
96d72c1
Merge branch 'main' into main
andylamp Jul 31, 2023
a524a5e
update to fix integration tests
andylamp Aug 6, 2023
7882372
Merge branch 'main' into main
andylamp Aug 6, 2023
fa14397
update to remove logging that messes up tests
andylamp Aug 6, 2023
304836b
Merge remote-tracking branch 'origin/main'
andylamp Aug 6, 2023
9cf14cb
Merge branch 'main' into main
andylamp Aug 6, 2023
f9cd0b6
Merge branch 'main' into main
andylamp Aug 7, 2023
997e648
Update airflow/models/connection.py
andylamp Aug 7, 2023
b1ee2e1
remove commented code
andylamp Aug 8, 2023
c04caef
push regex update to handle aws ARN format
andylamp Aug 13, 2023
088f0de
make ruff happy
andylamp Aug 13, 2023
5491fe6
Merge branch 'main' into main
andylamp Aug 13, 2023
5408199
Merge remote-tracking branch 'origin/main'
andylamp Aug 13, 2023
6aa9612
add validator to form
andylamp Jan 12, 2024
0dc4134
Merge remote-tracking branch 'origin/main'
andylamp Jan 12, 2024
f8aa6d2
feat: add sanitization, proper messaging to FAB, fix errors/tests
andylamp Jan 15, 2024
68d9bdc
bugfix: typo in function docs
andylamp Jan 15, 2024
5ad35ca
bugfix: typo in function docs, add example for the regex
andylamp Jan 15, 2024
156f3fc
Merge branch 'main' into main
andylamp Jan 15, 2024
e1c126c
Merge branch 'main' into main
andylamp Jan 15, 2024
3feef94
Merge branch 'main' into main
andylamp Jan 15, 2024
90b3220
Merge branch 'main' into main
andylamp Jan 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
25 changes: 24 additions & 1 deletion airflow/models/connection.py
Expand Up @@ -19,6 +19,7 @@

import json
import logging
import re
import warnings
from json import JSONDecodeError
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit
Expand All @@ -35,6 +36,9 @@
from airflow.utils.module_loading import import_string

log = logging.getLogger(__name__)
# sanitize the `conn_id` pattern by allowing alphanumeric characters plus
# the symbols @,#,$,%,&,!,-,_, and () from 1 matches up to 200.
_RE_SANITIZE_CONN_ID = re.compile("^[\w\@\#\$\%\&\!\(\)\*\-]{1,200}$")


def parse_netloc_to_hostname(*args, **kwargs):
Expand All @@ -43,6 +47,25 @@ def parse_netloc_to_hostname(*args, **kwargs):
return _parse_netloc_to_hostname(*args, **kwargs)


def sanitize_conn_id(conn_id: str | None) -> str | None:
andylamp marked this conversation as resolved.
Show resolved Hide resolved
andylamp marked this conversation as resolved.
Show resolved Hide resolved
"""
Sanitises the connection id and allows only specific characters to be within. Namely,
it allows alphanumeric characters plus the symbols @,#,$,%,&,!,-,_, and () from 1
and up to 200 consecutive matches.

The character selection is such that it prevents the injection of javascript or
executable bits in order to avoid any awkward behavior in the front-end.

:param conn_id: The connection id to sanitize.
:return: the sanitized string, `None` otherwise.
andylamp marked this conversation as resolved.
Show resolved Hide resolved
andylamp marked this conversation as resolved.
Show resolved Hide resolved
"""
res = None
if conn_id is None or (res := re.match(_RE_SANITIZE_CONN_ID, conn_id)) is None:
log.warning("We failed to match `conn_id` to the allowed pattern or it was None")

return res if res is None else conn_id
andylamp marked this conversation as resolved.
Show resolved Hide resolved


# Python automatically converts all letters to lowercase in hostname
# See: https://issues.apache.org/jira/browse/AIRFLOW-3615
def _parse_netloc_to_hostname(uri_parts):
Expand Down Expand Up @@ -112,7 +135,7 @@ def __init__(
uri: str | None = None,
):
super().__init__()
self.conn_id = conn_id
self.conn_id = sanitize_conn_id(conn_id)
self.description = description
if extra and not isinstance(extra, str):
extra = json.dumps(extra)
Expand Down