Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize the conn_id to disallow potential script execution #32867

Merged
merged 33 commits into from Jan 16, 2024

Conversation

andylamp
Copy link
Contributor

@andylamp andylamp commented Jul 26, 2023


This PR attempts to address an issue raised by @potiuk in #32770, which aims to sanitize the conn_id value. I have used a regex to check its validity during creation and have allowed any alphanumeric characters plus some special characters albeit the ones that could potentially be used to introduce scripts.

Further, I have also set the limit of characters for a valid connection to be up to 200, which I think is also a reasonable number but happy to adjust based on feedback.

closes: #32770

@boring-cyborg
Copy link

boring-cyborg bot commented Jul 26, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@vincbeck
Copy link
Contributor

Good job on your first PR 👏

@andylamp
Copy link
Contributor Author

thanks a lot @vincbeck! hopefully one of many :)

@andylamp
Copy link
Contributor Author

hmm @uranusjr, looking at this a bit further - I omitted a detail when reviewing your code. match returns a matched object which is a group of matches and not a str as is the desired outcome at this stage. I have tweaked it in order to still be valid and return the desired value.

airflow/models/connection.py Outdated Show resolved Hide resolved
airflow/models/connection.py Show resolved Hide resolved
airflow/models/connection.py Outdated Show resolved Hide resolved
Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit. Looks awesome otherwise +1
Good work!

tests/models/test_connection.py Outdated Show resolved Hide resolved
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending CI passing.

@andylamp
Copy link
Contributor Author

@uranusjr thr failed CI tasks are not related to this PR, if am not mistaken... right?

@potiuk
Copy link
Member

potiuk commented Jul 31, 2023

@uranusjr thr failed CI tasks are not related to this PR, if am not mistaken... right?

@andylamp -> you are 52 commits behind. It's hard to say. Certainly the main builds are not failing with the same error. But you will actually find out when you rebase to latest . Which is always the first thing to do if you see errors that might be unrelated.

@andylamp
Copy link
Contributor Author

andylamp commented Jan 15, 2024

hi all, sorry for the delay - got a bit of time to get this fixed. @potiuk it turns out that the error FAB front-end was failing was due to the string message having an illegal character.

To further reduce any potential formatting (and security?) impact I have reduced the available characters that can comprise a valid conn_id. Now the allowed characters are allows alphanumeric plus the symbols #,!,-,_,.,:,,/ and () requiring at least one match. To allow a bit of freedom, I have put a parameter to configure the max length allowed and by default it is set to the same length keys should have, namely 250.

All tests that I tried pass and the connection error pops-up successfully without any errors in the backend of Airflow.

image

Happy for any comments regarding this but I think it is finally complete :-).

@andylamp
Copy link
Contributor Author

andylamp commented Jan 15, 2024

addressing the pre-commit error and pushing a fix again with some typo fixes as well, hopefully that's the end of it :)

@potiuk potiuk dismissed their stale review January 15, 2024 20:31

Modified multiple times.

@potiuk
Copy link
Member

potiuk commented Jan 15, 2024

Hmm... Just thinking about that... What will happen in f someone already has conn_id with forbidden character ... I think - even if it's not the cleanest, maybe a better solution (if possible) we can simply escape what's there when passing it to form control?

@andylamp
Copy link
Contributor Author

andylamp commented Jan 15, 2024

The issue was with the description text shown in the error message, not the regex itself - so in theory can I can use the old one but then perhaps the error message gets rather long due to not being able to show the concise message using special characters... What do you think @potiuk?

@potiuk
Copy link
Member

potiuk commented Jan 15, 2024

The issue was with the description text shown in the error message, not the regex itself - so in theory can I can use the old one but then perhaps the error message gets rather long due to not being able to show the concise message using special characters... What do you think @potiuk?

Well ... in this case... we can also just escape AND shorten the connection id. It's not THAT important to show connection_id fully - we know what connection id we were just updating so showing all of it in the error message is probably good enough.

I have a feeling (now after looking at it and reallising that it's just error message) that sanitizing the id at entry in this case is far too invasive.

@andylamp
Copy link
Contributor Author

andylamp commented Jan 15, 2024

Sorry, perhaps that was misunderstood - the error pops up when you try to edit a connection id and the new value is an updated one that includes an error. I do not think you can have a situation where an existing connection id was invalid...

As far as the error message goes; I think It is good practice to show the error but also what the rules of a valid connection id are...

Finally, just to be on the same page here - what do you mean by escape? Because, in terms of raw strings as far as Python goes it is escaped and parsed as a raw string but markdown safe parsing does not allow these characters from FAB itself.

edit: the sanitization happens to ensure validity and that the entry is valid - the concerns you had are relevant, it's just that the error message showing in my PR had an issue. That issue I rectified by tweaking the message shown :)

edit2: BTW, the connection ID is not shown in the error message. The error you encountered stemmed by the fact I showed the regex rules for an acceptable connection id; and that was causing an issue with the rendering. I understood based on what you said that perhaps this was not clear...

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah .. Stupid me.. It's been too long from the original discussion and I simply made some bad assumptions - I have forgotten that we already validate the connection_id in the first place and that all connection_ids in DB are already following it.

Sorry for the confusion.

It looks good now! LGTM

@andylamp
Copy link
Contributor Author

I think these tests fail in the main branch as well and are unrelated to my changes :)

@potiuk
Copy link
Member

potiuk commented Jan 16, 2024

The compat one - yes - I am looking at it now.

@potiuk potiuk merged commit 71f422c into apache:main Jan 16, 2024
54 of 55 checks passed
Copy link

boring-cyborg bot commented Jan 16, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@potiuk
Copy link
Member

potiuk commented Jan 16, 2024

Congrats on your firs PR merged 🎉

@potiuk
Copy link
Member

potiuk commented Jan 16, 2024

(BTW the main bug is also fixed now).

@andylamp
Copy link
Contributor Author

Awesome!! Thanks for the visibility!! 💪

@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Feb 9, 2024
jedcunningham pushed a commit that referenced this pull request Feb 9, 2024
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
abhishekbhakat pushed a commit to abhishekbhakat/my_airflow that referenced this pull request Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connection ID is not sanitized and you can self - javascript yourself
8 participants