Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow PythonVenvOperator using other index url #33017

Conversation

jscheffl
Copy link
Contributor

@jscheffl jscheffl commented Aug 1, 2023

This PR adds the option to specify extra index URLs to PythonVirtualEnvOperator (+corresponding decorator) in order to be able to install virtualenvs with (private) additional Python package repositories.

Besides the option to modify the worker config or adding ENVs to specify additional PIP install configs, this allows to use different back-ends per DAG/Task.

As the definition of (private) extra index URLs might require credentials to be passed, using plain requirements or pip install options would expose passwords as secrets in logs. Therefore the Airflow core provided connection types were extended to contain a "Package Index (Python)" connection type that can be used to store the credentials in Airflow in a (more) secret way and preventing to expose.

Summary of changes, separated by commit:

  • Added a Package Index (Python) hook
  • Extended PythonVirtualEnvOperator for extra index URL and caching

Note: The PR was based on a discussion in Slack in https://apache-airflow.slack.com/archives/CCPRP7943/p1684608779378759
Note (2): Caching of venvs was separated in a follow-up PR boschglobal#2

How and what to test:

  • Pipeline green obviously
  • Open the Web UI connections form and see that a new connection type "Package Index (Python)" is available (with proper form)
  • Take a look to docs and check with a DAG, e.g. use example_python_operator and modify some parameters and try the new options
  • Review the added RST docs

@AutomationDev85 @clellmann @wolfdn

@boring-cyborg boring-cyborg bot added area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:webserver Webserver related Issues kind:documentation labels Aug 1, 2023
@jscheffl jscheffl changed the title Allow PythonVenvOperator using other index url and cacheing Allow PythonVenvOperator using other index url and caching Aug 1, 2023
@uranusjr
Copy link
Member

uranusjr commented Aug 2, 2023

I would prefer the changes are split into separate pull requests. The repository uses squash-merge and the separate commits would be lost in Git history.

@jscheffl jscheffl force-pushed the feature/allow-venv-operator-using-other-index-url branch from 1c20698 to 851a23c Compare August 2, 2023 20:42
@jscheffl jscheffl changed the title Allow PythonVenvOperator using other index url and caching [WIP] Allow PythonVenvOperator using other index url and caching Aug 2, 2023
@jscheffl
Copy link
Contributor Author

jscheffl commented Aug 2, 2023

I would prefer the changes are split into separate pull requests. The repository uses squash-merge and the separate commits would be lost in Git history.

Good idea @uranusjr , did not think about the squash. Splitted the ground work into #33051 - This PR branch is based on the other PR's branch (now). Change are only the two latter commits, the first both will go away if the other is merged.

@jscheffl
Copy link
Contributor Author

jscheffl commented Aug 6, 2023

I would prefer the changes are split into separate pull requests. The repository uses squash-merge and the separate commits would be lost in Git history.

@uranusjr Now the spin-off from the base rework is merged, this PR is reduced to the core feature of having a "Package Index" core hook/connection and the option to use this in VirtualEnv operators

@jscheffl jscheffl changed the title [WIP] Allow PythonVenvOperator using other index url and caching Allow PythonVenvOperator using other index url and caching Aug 6, 2023
@jscheffl jscheffl added the type:new-feature Changelog: New Features label Aug 6, 2023
@jscheffl jscheffl force-pushed the feature/allow-venv-operator-using-other-index-url branch from 5a1de9d to 80015e3 Compare August 8, 2023 22:00
@jscheffl jscheffl changed the title Allow PythonVenvOperator using other index url and caching Allow PythonVenvOperator using other index url Aug 8, 2023
@jscheffl jscheffl removed area:webserver Webserver related Issues area:CLI labels Aug 8, 2023
@uranusjr
Copy link
Member

uranusjr commented Aug 9, 2023

One minor comment, overall looks good to me.

airflow/operators/python.py Outdated Show resolved Hide resolved
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really goood. But I have a question about separating out BranchPythonVirtualenvOperator to a separate PR.

@potiuk potiuk merged commit 7d87d71 into apache:main Aug 10, 2023
42 checks passed
ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 17, 2023
* Extended PythonVirtualEnvOperator for extra index URL

* Separate-out BranchPythonVirtualenvOperator from this PR
@jscheffl jscheffl deleted the feature/allow-venv-operator-using-other-index-url branch October 28, 2023 08:22
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow kind:documentation type:new-feature Changelog: New Features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants