Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace urlparse with urlsplit #27389

Merged
merged 11 commits into from
Nov 14, 2022
Merged

Replace urlparse with urlsplit #27389

merged 11 commits into from
Nov 14, 2022

Conversation

westonkl
Copy link
Contributor

Replaces urlparse with urlsplit as urlsplit is faster and lighterweight.

urlsplit is similar to urlparse(), but does not split the params from the URL. The params functionality allows you to put variables in the url. This should generally be used instead of urlparse().

urlsplit: (addressing scheme, network location, path, query, fragment identifier)
urlparse: (addressing scheme, network location, path, params, query, fragment identifier)


@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Oct 30, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Oct 30, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@BobDu
Copy link
Contributor

BobDu commented Oct 31, 2022

why are you say urlsplit is faster and lighterweight?
python official docs not record it, even, I didn't find any other information recommending use urlsplit.

@westonkl
Copy link
Contributor Author

why are you say urlsplit is faster and lighterweight? python official docs not record it, even, I didn't find any other information recommending use urlsplit.

Its not much faster and maybe its a bit overselling it to be calling it lighterweight, but it is faster as it doesn't try to parse the url for params. The second sentence here of python docs on it says "This should generally be used instead of urlparse()".

@westonkl
Copy link
Contributor Author

Btw there were other files that could have this change applied to them, but I just wanted to see if it was a wanted change before I updated them too.

@BobDu
Copy link
Contributor

BobDu commented Oct 31, 2022

why are you say urlsplit is faster and lighterweight? python official docs not record it, even, I didn't find any other information recommending use urlsplit.

Its not much faster and maybe its a bit overselling it to be calling it lighterweight, but it is faster as it doesn't try to parse the url for params. The second sentence here of python docs on it says "This should generally be used instead of urlparse()".

Emm, if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see [RFC 2396](https://datatracker.ietf.org/doc/html/rfc2396.html)) is wanted.

But, It seems we don't need support RFC 2396 update syntax?

@uranusjr
Copy link
Member

I don’t think there’s any perceptible performance difference between urlsplit and urlparse, but the former is indeed better; literally nobody uses RFC 2396 now.

@eladkal
Copy link
Contributor

eladkal commented Oct 31, 2022

there are more usages for this in the project. Should we adress them as well?

@westonkl
Copy link
Contributor Author

there are more usages for this in the project. Should we adress them as well?

Yup, I can add them into this pr

@westonkl westonkl changed the title Replace urlparse with urlsplit in s3 files Replace urlparse with urlsplit Oct 31, 2022
@BobDu
Copy link
Contributor

BobDu commented Oct 31, 2022

some small ideas, If we replace all urlparse to urlsplit, can we add some lint rule in pre-commit to prevent new code use urlparse again in future.

Copy link
Contributor

@dstandish dstandish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like something went wrong with your git operations. you have a lot of unrelated changes.

@westonkl
Copy link
Contributor Author

Yeah I just saw, should I create a temp branch and cherry pick my commits?

@dstandish
Copy link
Contributor

Yeah I just saw, should I create a temp branch and cherry pick my commits?

any strategy that works is fine :)

and yeah that sounds like a working strategy.

@westonkl westonkl marked this pull request as draft November 2, 2022 17:04
@westonkl
Copy link
Contributor Author

looks like something went wrong with your git operations. you have a lot of unrelated changes.

This was fixed, could you review again?

@eladkal
Copy link
Contributor

eladkal commented Nov 11, 2022

Needs rebase and resolving conflicts

@eladkal eladkal merged commit 00af5c0 into apache:main Nov 14, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Nov 14, 2022

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants