[airflow] module not found #2305

Closed
danielc103 opened this issue Apr 13, 2020 · 12 comments

@danielc103

Which chart:
[airflow] version 4.4.3

Describe the bug
A DAG that requires the cx_Oracle module fails because the module cannot be found. I had to pip install cx_Oracle manually.

ERROR

  from acme.hooks.oracle_hook import OracleHook
  File "/opt/bitnami/airflow/dags/acme/hooks/oracle_hook.py", line 20, in <module>
    import cx_Oracle
ModuleNotFoundError: No module named 'cx_Oracle'

To Reproduce

  • deploy helm chart
  • add dags that require oracle hooks
  • run dag with dependency

Expected behavior
The cx_Oracle module is found.

Additional context
This may well be a gap in my understanding of Airflow. This Python module is needed by Airflow's own Oracle hooks as well as by add-on modules like the one I'm using.

https://airflow.apache.org/docs/stable/_modules/airflow/hooks/oracle_hook.html

@marcosbc
Contributor

Hi @danielc103, note that the official Airflow image does not include this module either. In their official documentation they also explain why: https://airflow.apache.org/docs/stable/installation.html

We believe this could be solved by allowing installation of extra Airflow packages, instead of adding all of these (with the consequent image size increase). Find the related open issue here: https://github.com/bitnami/bitnami-docker-airflow/issues/32.

@danielc103
Author

I've rebuilt the Bitnami image with the Oracle client installed. I'm still getting module-not-found errors, in this case for nested packages.

/opt/bitnami/airflow/dags/git/

git
|    mydag.py
|    mydag2.py
|___acme
|     |    __init__.py
|     |    hook.py

Broken DAG: [/opt/bitnami/airflow/dags/git/mydag.py] No module named 'acme' or 'cx_Oracle'

If I move the acme folder to the DAG root, /opt/bitnami/airflow/dags/acme/git/, the errors go away on the command line when I list DAGs with the CLI (airflow list_dags), but they still show up in the UI, even after the 300s DAG update interval.

@danielc103
Author

It's odd: even after deleting the DAGs from the git folder and moving them to the root, /opt/bitnami/airflow/dags, I still get the same error:

Broken DAG: [/opt/bitnami/airflow/dags/git/mydag.py] No module named 'acme'

There are no DAGs left in the /git folder.

@marcosbc
Contributor

marcosbc commented Apr 16, 2020

@danielc103 Could you let us know how you are installing the cx_oracle module? Note that the Bitnami Airflow container image includes a Python virtual env, so the installation method differs a bit from doing a simple "pip install cx_oracle". Instead you would need to do something like:

. /opt/bitnami/airflow/venv/bin/activate
pip install cx_oracle
deactivate

Have you taken a look at this PR? https://github.com/bitnami/bitnami-docker-airflow/pull/33/files
We are looking into merging those changes in the next few days.

You could try it out: if you create or mount the file /bitnami/python/requirements.txt with the list of modules to install, they will be installed automatically when running Airflow. It should work for your scenario.
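
To confirm that a module really landed in the virtual env mentioned above, a small diagnostic can help. The following is a minimal sketch, not part of the Bitnami image: the helper name `module_report` is hypothetical, and it assumes the script is run with the Python interpreter inside /opt/bitnami/airflow/venv rather than a system Python.

```python
import importlib.util
import sys


def module_report(name):
    """Describe whether top-level module `name` is importable by this interpreter."""
    # find_spec() locates a top-level module without executing it.
    spec = importlib.util.find_spec(name)
    # venv sets sys.base_prefix; older virtualenv releases set sys.real_prefix.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "interpreter": sys.executable,
        "in_virtualenv": in_virtualenv,
    }


if __name__ == "__main__":
    # cx_Oracle is the module from this issue; swap in any other name.
    print(module_report("cx_Oracle"))
```

If "found" is False while "interpreter" points outside the venv, the package was most likely installed for a different Python than the one Airflow uses.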

@danielc103
Author

Yep, found that out after digging around. I install it in the Dockerfile:

COPY requirements.txt requirements.txt
RUN . /opt/bitnami/airflow/venv/bin/activate && pip install -r requirements.txt

I did not run the deactivate command afterwards. I am rebuilding the image right now to test. Would that also affect the nested acme module?

@danielc103
Author

Even after adding deactivate after the requirements install, I still get the same errors.

$ airflow list_dags

[2020-04-16 17:56:06,449] {settings.py:253} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=163
[2020-04-16 17:56:07,247] {__init__.py:51} INFO - Using executor CeleryExecutor
[2020-04-16 17:56:07,248] {dagbag.py:403} INFO - Filling up the DagBag from /opt/bitnami/airflow/dags
[2020-04-16 17:56:07,273] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/bulk_dump.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/bulk_dump.py", line 18, in <module>
    from acme.operators.dwh_operators import PostgresOperatorWithTemplatedParams, PostgresBulkDumpOperator, \
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,274] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/bulk_load.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/bulk_load.py", line 18, in <module>
    from acme.operators.dwh_operators import PostgresOperatorWithTemplatedParams, PostgresBulkDumpOperator, \
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,275] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/customer_clear.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/customer_clear.py", line 18, in <module>
    from acme.operators.dwh_operators import PostgresOperatorWithTemplatedParams
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,276] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/customer_load.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/customer_load.py", line 18, in <module>
    from acme.operators.dwh_operators import PostgresToPostgresOperator
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,277] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/customer_staging.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/customer_staging.py", line 18, in <module>
    from acme.operators.dwh_operators import PostgresToPostgresOperator
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,279] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/labVantage.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/labVantage.py", line 19, in <module>
    from acme.hooks.oracle_hook import OracleHook
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,280] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/load2mssql.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/load2mssql.py", line 18, in <module>
    from acme.operators.mssql_operator import MsSqlLoadOperator
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,280] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/oracle2mssql.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/oracle2mssql.py", line 18, in <module>
    from acme.operators.dwh_operators import OracleToMsSqlOperator
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,394] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/postgres2mssql.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/postgres2mssql.py", line 18, in <module>
    from acme.operators.dwh_operators import PostgresToMsSqlOperator
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,399] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/rl6.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/rl6.py", line 22, in <module>
    from acme.operators.dwh_operators import OracleToOracleOperator
ModuleNotFoundError: No module named 'acme'
[2020-04-16 17:56:07,401] {dagbag.py:246} ERROR - Failed to import: /opt/bitnami/airflow/dags/git/rl62.py
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/dagbag.py", line 243, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/bitnami/airflow/venv/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/bitnami/airflow/dags/git/rl62.py", line 22, in <module>
    from acme.operators.dwh_operators import OracleToOracleOperator
ModuleNotFoundError: No module named 'acme'


-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
example_twitter_dag
init_docker_example
oracle_test1
tutorial
tutorialtest

@marcosbc
Contributor

@danielc103 Could you share the requirements.txt file? Also, it would be great if you could provide us with the DAG you're using, or a similar one where similar issues happen. That way we can try to reproduce your issues locally.

P.S. Just checking, I assume the "acme" module is listed in "requirements.txt", right?

@danielc103
Author

requirements.txt is just cx_Oracle at the moment.

acme module is a nested module in the dag directory. So I have /opt/bitnami/airflow/dags/git/acme/

Similar to this folder structure https://github.com/gtoonstra/etl-with-airflow/tree/master/examples/etl-example/dags.

This dag is at /opt/bitnami/airflow/dags/git/customer_clear.py

from __future__ import print_function
import airflow
from datetime import datetime, timedelta
from acme.operators.dwh_operators import PostgresOperatorWithTemplatedParams
from acme.operators.dwh_operators import AuditOperator
from airflow.models import Variable


args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(7),
    'provide_context': True
}

tmpl_search_path = Variable.get("sql_path")

dag = airflow.DAG(
    'customer_clear',
    schedule_interval="@once",
    dagrun_timeout=timedelta(minutes=60),
    template_searchpath=tmpl_search_path,
    default_args=args,
    max_active_runs=1)

get_auditid = AuditOperator(
    task_id='get_audit_id',
    postgres_conn_id='postgres_dwh',
    audit_key="customer",
    cycle_dtm="{{ ts }}",
    dag=dag,
    pool='postgres_dwh')

clear_customer = PostgresOperatorWithTemplatedParams(
    sql='TRUNCATE staging.customer CASCADE',
    postgres_conn_id='postgres_dwh',
    task_id='clear_customer',
    dag=dag,
    pool='postgres_dwh')

get_auditid >> clear_customer


if __name__ == "__main__":
    dag.cli()

@danielc103
Author

danielc103 commented Apr 17, 2020

I've deployed on minishift locally and imported the following dags through git

https://github.com/gtoonstra/etl-with-airflow/tree/master/examples/etl-example/dags

myvalues.yaml

airflow:
  cloneDagFilesFromGit:
    enabled: true
    repository: https://github.com/gtoonstra/etl-with-airflow
    branch: master
    path: examples/etl-example/dags/

securityContext:
    enabled: false

postgresql:
    volumePermissions:
        enabled: false
    shmVolume:
        chmod:
            enabled: false
    securityContext:
        enabled: false

redis:
    securityContext:
        enabled: false

helm install airflow -f myvalues.yaml bitnami/airflow

and I get the same acme-not-found errors, e.g. Broken DAG: [/opt/bitnami/airflow/dags/git/orders_staging.py] No module named 'acme'. I checked, and the folder and all its contents were imported successfully.

@marcosbc
Contributor

@danielc103 It looks to be working fine for us.

However, if we create a new "git" directory and move all files/folders inside there, we get a similar error:

airflow_1            | ModuleNotFoundError: No module named 'acme'

In order to fix that we had to do a couple of things:

  • Create an empty "__init__.py" file inside the "git" folder so it is recognized as a valid Python package. The "dags" directory should also contain this file.

  • Change "acme" imports to "git.acme". For instance:

    -from acme.operators.dwh_operators import AuditOperator
    +from git.acme.operators.dwh_operators import AuditOperator
    -import acme
    +import git.acme as acme

Hope it works!
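
The reason "git.acme" resolves while "acme" does not can be sketched with the standard library alone. This is an illustration under one assumption: that the dags directory (not dags/git) is the one on Python's import path, as the fix above implies. The helper names `make_layout` and `found_in` are hypothetical, and the directory tree is created in a temporary folder rather than in a real Airflow deployment.

```python
import os
import tempfile
from importlib.machinery import PathFinder


def make_layout(root):
    """Create dags/git/acme/ with __init__.py files, mirroring the fix above."""
    dags = os.path.join(root, "dags")
    git_dir = os.path.join(dags, "git")
    acme_dir = os.path.join(git_dir, "acme")
    os.makedirs(acme_dir)
    for package_dir in (dags, git_dir, acme_dir):
        # An empty __init__.py marks the directory as a regular package.
        open(os.path.join(package_dir, "__init__.py"), "w").close()
    return dags


def found_in(name, search_dir):
    """True if top-level module `name` sits directly inside search_dir."""
    return PathFinder.find_spec(name, [search_dir]) is not None


with tempfile.TemporaryDirectory() as tmp:
    dags = make_layout(tmp)
    results = {
        "acme from dags": found_in("acme", dags),
        "git from dags": found_in("git", dags),
        "acme from dags/git": found_in("acme", os.path.join(dags, "git")),
    }

for label, found in results.items():
    print(label, "->", found)
```

Only "git" is visible as a top-level name from dags, so "acme" has to be reached through it as "git.acme". A bare "import acme" would only work if dags/git itself were on the import path.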

@danielc103
Author

yup, that was it. thank you very much for holding my hand through that. lol.

Can I pr for documentation on explaining the nested module?
Also can we expose in the chart the ability to put at root of dags folder not in git or external?

@marcosbc
Contributor

marcosbc commented Apr 21, 2020

Can I pr for documentation on explaining the nested module?

Please do! It looks like it is already explained how to clone DAGs from a Git repo, so we should probably mention that any cloned directory should contain the __init__.py file when loading nested modules.
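
For documentation purposes, the __init__.py requirement could be illustrated with a small check like the one below. It is a sketch only: the function name `dirs_missing_init` is hypothetical, and it assumes that every directory from the DAG root down to a folder holding Python files should carry an __init__.py, as suggested earlier in this thread.

```python
import os


def dirs_missing_init(dags_root):
    """List directories (relative to dags_root) that lead to .py files
    but have no __init__.py of their own."""
    dags_root = os.path.abspath(dags_root)
    missing = set()
    for dirpath, dirnames, filenames in os.walk(dags_root):
        # Skip hidden folders (e.g. .git) and bytecode caches.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d != "__pycache__"
        ]
        if not any(name.endswith(".py") for name in filenames):
            continue
        # Walk from this directory up to the DAG root, checking each level.
        current = dirpath
        while True:
            if not os.path.isfile(os.path.join(current, "__init__.py")):
                missing.add(os.path.relpath(current, dags_root))
            if current == dags_root:
                break
            current = os.path.dirname(current)
    return sorted(missing)


if __name__ == "__main__":
    # Path taken from this thread; adjust for other deployments.
    for path in dirs_missing_init("/opt/bitnami/airflow/dags"):
        print("missing __init__.py in:", path)
```

An entry of "." in the result refers to the DAG root itself.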

Also can we expose in the chart the ability to put at root of dags folder not in git or external?

I guess it makes sense, we just need to make sure it is compatible with mounting DAGs via config maps.
