Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix using .json template extension in GMP operators #9566

Merged
merged 5 commits into from
Jul 2, 2020

Conversation

turbaszek
Copy link
Member

@turbaszek turbaszek commented Jun 29, 2020

Closes #9541


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the provider:google Google (including GCP) related issues label Jun 29, 2020
@turbaszek
Copy link
Member Author

turbaszek commented Jun 29, 2020

@olchas @efolgar would you mind taking a look?

@@ -298,6 +299,12 @@ def __init__(
self.gcp_conn_id = gcp_conn_id
self.delegate_to = delegate_to

def prepare_template(self) -> None:
# If .json is passed then we have to read the file
if isinstance(self.report, str) and self.report.endswith('.json'):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would work for templates passed in with a *.json extension, but doesn't cover the case where report is passed as a JSON string template. For example:

create_report = GoogleCampaignManagerInsertReportOperator(
        task_id="create_report",
        gcp_conn_id=self.config.get("gcp_connection_id"),
        profile_id=self.config.get("cm_profile_id"),
        report="""
            {
               'name': '{{ params.title }}',
               'accountId': '{{ params.account }}',
               'fileName': '{{ params.file_name }}',
               'type': 'STANDARD',
               'criteria': {
               ...""",
        params=report_params,
        dag=self.dag)

Safest approach may be to parse the json string within the execute method.

Copy link
Member Author

@turbaszek turbaszek Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to documentation user should provide a dictionary so I would say that passing str doesn't have to be supported.

Copy link
Member

@mik-laj mik-laj Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create a task as below to create a task based on a JSON file from another source e.g. variable.

create_report = GoogleCampaignManagerInsertReportOperator(
        task_id="create_report",
        gcp_conn_id=self.config.get("gcp_connection_id"),
        profile_id=self.config.get("cm_profile_id"),
        report=json.loads("""
            {
               'name': '{{ params.title }}',
               'accountId': '{{ params.account }}',
               'fileName': '{{ params.file_name }}',
               'type': 'STANDARD',
               'criteria': {
               ..."""),
        params=report_params,
        dag=self.dag)

Otherwise we have ambiguity. The str type means the path to the file in one case and the contents of the file in other case.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, although you can't take advantage of Airflow's built-in macros like task_instance. Also deviates from the behavior of other templated operators (e.g. BashOperator).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can because the templates are rendered recursively.

def render_template( # pylint: disable=too-many-return-statements
self, content: Any, context: Dict, jinja_env: Optional[jinja2.Environment] = None,
seen_oids: Optional[Set] = None
) -> Any:
"""
Render a templated string. The content can be a collection holding multiple templated strings and will
be templated recursively.
:param content: Content to template. Only strings can be templated (may be inside collection).
:type content: Any
:param context: Dict with values to apply on templated content
:type context: dict
:param jinja_env: Jinja environment. Can be provided to avoid re-creating Jinja environments during
recursion.
:type jinja_env: jinja2.Environment
:param seen_oids: template fields already rendered (to avoid RecursionError on circular dependencies)
:type seen_oids: set
:return: Templated content
"""
if not jinja_env:
jinja_env = self.get_template_env()
# Imported here to avoid ciruclar dependency
from airflow.models.xcom_arg import XComArg
if isinstance(content, str):
if any(content.endswith(ext) for ext in self.template_ext):
# Content contains a filepath
return jinja_env.get_template(content).render(**context)
else:
return jinja_env.from_string(content).render(**context)
elif isinstance(content, XComArg):
return content.resolve(context)
if isinstance(content, tuple):
if type(content) is not tuple: # pylint: disable=unidiomatic-typecheck
# Special case for named tuples
return content.__class__(
*(self.render_template(element, context, jinja_env) for element in content)
)
else:
return tuple(self.render_template(element, context, jinja_env) for element in content)
elif isinstance(content, list):
return [self.render_template(element, context, jinja_env) for element in content]
elif isinstance(content, dict):
return {key: self.render_template(value, context, jinja_env) for key, value in content.items()}
elif isinstance(content, set):
return {self.render_template(element, context, jinja_env) for element in content}
else:
if seen_oids is None:
seen_oids = set()
self._render_nested_template_fields(content, context, jinja_env, seen_oids)
return content

The problem only occurs when you use Jinja templates to generate new structures e.g. create new array elements.

@mik-laj
Copy link
Member

mik-laj commented Jun 30, 2020

Can you add unit test and some documentation?

@turbaszek turbaszek requested a review from mik-laj July 1, 2020 11:21
@turbaszek turbaszek merged commit cd3d9d9 into apache:master Jul 2, 2020
@turbaszek turbaszek deleted the read-json-in-gmp-operators branch July 2, 2020 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GoogleCampaignManagerInsertReportOperator Fails when using Templated Reports
4 participants