
Using requirements file in VirtualEnvPythonOperation appears to be broken #36069

@timc

Description


Apache Airflow version

2.7.3

What happened

When creating a virtualenv task and passing in a requirements file like this:

@task.virtualenv(use_dill=True, system_site_packages=False, requirements='requirements.txt')

The result is that the requirements file used to populate the venv contains the literal line

requirements.txt

instead of the contents of the file. This is wrong, and the task fails with:

[2023-12-05, 12:33:06 UTC] {{process_utils.py:181}} INFO - Executing cmd: python3 /usr/local//.local/lib/python3.10/site-packages/virtualenv /tmp/venv3cdlqjlq
[2023-12-05, 12:33:06 UTC] {{process_utils.py:185}} INFO - Output:
[2023-12-05, 12:33:07 UTC] {{process_utils.py:189}} INFO - created virtual environment CPython3.10.9.final.0-64 in 397ms
[2023-12-05, 12:33:07 UTC] {{process_utils.py:189}} INFO - creator CPython3Posix(dest=/tmp/venv3cdlqjlq, clear=False, no_vcs_ignore=False, global=False)
[2023-12-05, 12:33:07 UTC] {{process_utils.py:189}} INFO - seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/usr/local//.local/share/virtualenv)
[2023-12-05, 12:33:07 UTC] {{process_utils.py:189}} INFO - added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.42.0
[2023-12-05, 12:33:07 UTC] {{process_utils.py:189}} INFO - activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
[2023-12-05, 12:33:07 UTC] {{process_utils.py:181}} INFO - Executing cmd: /tmp/venv3cdlqjlq/bin/pip install -r /tmp/venv3cdlqjlq/requirements.txt
[2023-12-05, 12:33:07 UTC] {{process_utils.py:185}} INFO - Output:
[2023-12-05, 12:33:09 UTC] {{process_utils.py:189}} INFO - ERROR: Could not find a version that satisfies the requirement requirements.txt (from versions: none)
[2023-12-05, 12:33:09 UTC] {{process_utils.py:189}} INFO - HINT: You are attempting to install a package literally named "requirements.txt" (which cannot exist). Consider using the '-r' flag to install the packages listed in requirements.txt
[2023-12-05, 12:33:09 UTC] {{process_utils.py:189}} INFO - ERROR: No matching distribution found for requirements.txt
[2023-12-05, 12:33:09 UTC] {{taskinstance.py:1824}} ERROR - Task failed with exception

The cause appears to be that the operator wraps the requirements string in a list when it is constructed, so the .txt file is never loaded through templating and the literal file name is handed to pip instead of the file's contents.
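To make this hypothesis concrete, here is a simplified, hypothetical model. It is not Airflow's implementation: render_field, normalise and build_requirements_file are made-up stand-ins, and the model assumes that only a bare string field ending in .txt gets replaced by the file's contents.

```python
"""Simplified, hypothetical model of the behaviour described in this issue.

NOT Airflow's code: render_field, normalise and build_requirements_file are
illustrative stand-ins for template rendering and venv requirements writing.
"""
import tempfile
from pathlib import Path

TEMPLATE_EXT = (".txt",)


def render_field(value, search_dir: Path):
    # Assumption: only a bare string ending in a template extension is treated
    # as a file path and replaced by the file's contents; lists pass through.
    if isinstance(value, str) and value.endswith(TEMPLATE_EXT):
        return (search_dir / value).read_text()
    return value


def normalise(requirements):
    # Mirrors the reporter's hypothesis: a string is wrapped in a list at construction.
    return [requirements] if isinstance(requirements, str) else list(requirements)


def build_requirements_file(requirements) -> str:
    # The venv's own requirements.txt is written one entry per line.
    if isinstance(requirements, str):
        return requirements
    return "\n".join(requirements)


with tempfile.TemporaryDirectory() as d:
    dag_dir = Path(d)
    (dag_dir / "requirements.txt").write_text("pandas\n")
    # Wrapped first, then rendered: the list is left alone, so the path survives.
    wrapped_result = build_requirements_file(render_field(normalise("requirements.txt"), dag_dir))
    # Rendered as a plain string: the file's contents are used.
    kept_result = build_requirements_file(render_field("requirements.txt", dag_dir))

print(repr(wrapped_result))
print(repr(kept_result))
```

In this model, wrapping the string in a list means the literal path, not the file's contents, ends up in the generated requirements file. That matches the pip install -r failure in the log.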

What you think should happen instead

The packages listed in the provided requirements file should be installed by the pip command that sets up the venv.
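Until the file-based form works, one possible workaround is to read the file in the DAG code and pass a list, because requirements also accepts a list of requirement strings. This is an untested sketch and not an official fix. read_requirements is a hypothetical helper, and it assumes requirements.txt sits next to the DAG file.

```python
"""Hypothetical workaround helper: read a requirements file in DAG code and
pass the resulting list to requirements= instead of the file name."""
from __future__ import annotations

from pathlib import Path


def read_requirements(path: Path) -> list[str]:
    """Return the non-empty, non-comment lines of a pip requirements file."""
    entries = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


# Usage in the DAG file (assumes requirements.txt is next to the DAG file):
# REQUIREMENTS = read_requirements(Path(__file__).parent / "requirements.txt")
# @task.virtualenv(use_dill=True, system_site_packages=False, requirements=REQUIREMENTS)
```

Because the file is read when the DAG is parsed, any edit to requirements.txt takes effect on the next parse. The operator then writes the list into its own requirements file as usual.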

How to reproduce

Create a dag:

from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False, tags=['example'])
def virtualenv_task():
    @task.virtualenv(
        use_dill=True,
        system_site_packages=False,
        # Passed as a file name; expected to be read as a requirements file.
        requirements='requirements.txt',
    )
    def extract():
        import pandas  # must be installed into the venv from requirements.txt

        x = pandas.DataFrame()

    extract()


dag = virtualenv_task()

And a requirements.txt file containing:

pandas

Run Airflow and trigger the DAG.

Operating System

Ubuntu 23.04

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.2.0
apache-airflow-providers-celery==3.2.1
apache-airflow-providers-common-sql==1.5.2
apache-airflow-providers-ftp==3.4.2
apache-airflow-providers-http==4.4.2
apache-airflow-providers-imap==3.2.2
apache-airflow-providers-postgres==5.5.1
apache-airflow-providers-sqlite==3.4.2

Deployment

Docker-Compose

Deployment details

No response

Anything else

Every time.

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

Metadata


Assignees

No one assigned

    Labels

    area:core, kind:bug, needs-triage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
