Clarify the BaseOperator template_ext behavior and where it is applied #1017

Closed
ghost opened this issue Feb 17, 2016 · 5 comments

ghost commented Feb 17, 2016

Hi,

It seems that when BashOperator is called with a shell script (".sh") instead of a simple command like "date" or a python script like "job.py", airflow is giving a jinja template "not found" error.

It seemed like a very weird bug, so I reinstalled airflow and tried it with the tutorial project, and I can reproduce the bug there as well.

Even stranger, adding a space as the last character in the shell script makes it work with airflow and run correctly without the error!

To reproduce, point the tutorial project's BashOperator t2 at a shell script that does something simple, like "echo hello world". Remember to make it executable.

t2 = BashOperator(
    task_id='sleep',
    bash_command="/home/batcher/test.sh",  # This fails with a template error
    # bash_command="/home/batcher/test.sh ",  # This works (note the trailing space)
    dag=dag)

Can someone else reproduce?

@ghost ghost changed the title BashOperator gives jinja2 template error when called with a script BashOperator gives jinja2 template error when called with a shell script Feb 17, 2016
@mistercrunch
Member

This happens because the operator looks for values ending in .sh and tries to resolve them as template files relative to your pipeline (.py) file. You cannot use full paths when using this feature (it is handled by jinja). When you add a space, the string no longer ends in .sh, so it is treated as the content of a bash command (as opposed to a reference to a file) and is simply run.

@mistercrunch
Member

Someone should clarify this in the docs.

ghost commented Feb 18, 2016

Thanks for the explanation. This doesn't seem very intuitive for the end user: python scripts use the full path, but shell scripts should use paths relative to the dag file, as I understand it?

@mistercrunch
Member

Yes, we need to clarify this behavior in the docs. Renaming this issue.

@mistercrunch mistercrunch changed the title BashOperator gives jinja2 template error when called with a shell script Clarify the BaseOperator template_ext behavior and where it is applied Feb 19, 2016
@mistercrunch
Member

A quick explanation:
BaseOperator defines a template_ext class attribute that goes along with template_fields.

template_fields is a tuple of instance attributes (that match the constructor kwargs by convention) that are "templated" and for which we should apply the jinja2 magic.

template_ext is a tuple of file extensions, as in ('.hql', '.sql'). If the value of a templated field ends with one of them (an endswith check), it is resolved as a file and replaced with that file's content. For example, in the HiveOperator the hql attribute is templated, and .hql files get resolved by Jinja. You have the choice to pass an HQL string or a reference to a file.
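
To make the interplay of template_fields and template_ext concrete, here is a rough sketch in plain Python. It is a simplified model, not Airflow's actual code: FakeHiveOperator and resolve_template_files are hypothetical names, and the real framework goes through Jinja's loader (raising TemplateNotFound) rather than calling open() directly.

```python
import os
import tempfile


class FakeHiveOperator:
    """Hypothetical stand-in for an operator; not the real HiveOperator."""
    template_fields = ('hql',)        # attributes that get templated
    template_ext = ('.hql', '.sql')   # suffixes treated as file references

    def __init__(self, hql):
        self.hql = hql


def resolve_template_files(op, base_dir):
    """Replace file references in templated fields with the file's content."""
    for field in op.template_fields:
        value = getattr(op, field)
        # str.endswith accepts a tuple of suffixes.
        if isinstance(value, str) and value.endswith(op.template_ext):
            with open(os.path.join(base_dir, value)) as f:
                setattr(op, field, f.read())


# Demo: one operator gets a file reference, the other an inline string.
base_dir = tempfile.mkdtemp()
with open(os.path.join(base_dir, 'query.hql'), 'w') as f:
    f.write('SELECT 1')

from_file = FakeHiveOperator(hql='query.hql')
inline = FakeHiveOperator(hql='SELECT 2')
resolve_template_files(from_file, base_dir)
resolve_template_files(inline, base_dir)
```

After this runs, from_file.hql holds the content of query.hql, while inline.hql is left as the string that was passed in.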

For BashOperator it's more confusing, since things like bash myscript.sh will result in a TemplateNotFound error. Adding a space at the end bypasses BaseOperator trying to resolve the file.
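
The trailing-space effect comes down to a suffix check, which can be illustrated as below. This is only a sketch: is_file_reference is a hypothetical helper, and the tuple ('.sh', '.bash') is an assumption about BashOperator.template_ext that may differ in your Airflow version.

```python
# Assumed value of BashOperator.template_ext; check your Airflow version.
BASH_TEMPLATE_EXT = ('.sh', '.bash')


def is_file_reference(bash_command, template_ext=BASH_TEMPLATE_EXT):
    """True if the string would be treated as a template file name."""
    return bash_command.endswith(template_ext)


commands = [
    "/home/batcher/test.sh",   # ends with .sh: looked up as a template file
    "/home/batcher/test.sh ",  # trailing space: run as a plain command
    "bash myscript.sh",        # still ends with .sh
    "date",                    # no matching suffix
]
results = {cmd: is_file_reference(cmd) for cmd in commands}
```

Only the strings that end exactly in one of the extensions are treated as file references, which is why the whole string "bash myscript.sh" ends up being looked up as a template name.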

Note that the files are resolved relative to where the pipeline file lives. You can also add other folders to the template_searchpath as you create the DAG object.
http://pythonhosted.org/airflow/code.html#airflow.models.DAG
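
The lookup order can be pictured roughly like this. Again a plain-Python sketch under assumptions: find_template is a hypothetical helper, the "pipeline folder first, then template_searchpath" order is my reading of the behavior described here, and Airflow itself delegates the lookup to Jinja and surfaces TemplateNotFound rather than FileNotFoundError.

```python
import os
import tempfile


def find_template(name, dag_folder, template_searchpath=()):
    """Return the first existing path for `name`, pipeline folder first."""
    for folder in [dag_folder, *template_searchpath]:
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(name)


# Demo layout: the script lives outside the folder holding the pipeline file.
root = tempfile.mkdtemp()
dag_folder = os.path.join(root, 'dags')
scripts = os.path.join(root, 'scripts')
os.makedirs(dag_folder)
os.makedirs(scripts)
with open(os.path.join(scripts, 'test.sh'), 'w') as f:
    f.write('echo hello world\n')

# Not found when only the pipeline folder is searched...
try:
    find_template('test.sh', dag_folder)
    found_without_searchpath = True
except FileNotFoundError:
    found_without_searchpath = False

# ...but found once the extra folder is on the search path.
resolved = find_template('test.sh', dag_folder, template_searchpath=[scripts])
```

In a real pipeline the equivalent step would be passing the extra folder as template_searchpath when creating the DAG object and then referring to the script by its relative name.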

The magic takes place at some point between the __init__ and the execute methods of the operator, and is handled by the framework.
