Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable specifying dictionary paths in template_fields_renderers #17321

Merged
merged 3 commits into from
Aug 2, 2021

Conversation

nathadfield
Copy link
Collaborator

@nathadfield nathadfield commented Jul 29, 2021

Closes: #17032
Related to: #17239
*Recreated from #17252

BigQueryInsertJobOperator requires the specification of a configuration dict but, at present, this results in poor SQL rendering; especially if the queries are big or complex.

Current Example

127119136-3d6f1f57-c941-4b06-b3f3-c6094b31f821

This PR attempts to implement the capability for specific elements of a dictionary provided to an operator to be rendered individually with the appropriate renderer.

For example, when the following configuration is submitted to BigQueryInsertJobOperator then, in addition to the JSON rendering of configuration the provided SQL should be separately rendered.

sql = '''
INSERT INTO `target_project.target_dataset.target_table`
(col1, col2, col3, col4)
SELECT col1, col2, col3, col4
FROM `source_project.source_dataset.source_table`
'''

test = BigQueryInsertJobOperator(
    task_id='test',
    gcp_conn_id='my_gcp_connection',
    configuration={"query": {"query": sql,
                             "destinationTable": {"projectId": 'my_project',
                                                  "datasetId": 'my_dataset',
                                                  "tableId": 'my_table'},
                             "writeDisposition": 'WRITE_TRUNCATE',
                             "useLegacySql": False}
                   }
)

To achieve it, this PR enables any operator to receive a dot-separated path from template_field_renderers along with the renderer type. In this case we want BigQueryInsertJobOperator to extract the value from configuration dict where the key path is query.query. This is fully represented as follows:

template_fields_renderers = {"configuration": "json", "configuration.query.query": "sql"}

When building the rendered page, we check if the content is a dict - e.g. configuration - then, if so, return a generator object that yields every possible path of keys (get_key_paths) in the form key1.key2.key3...keyX.

Using this we can then check to see if any of these paths match with what we have in template_field_renderers and, if so, what the renderer is. If it is a valid renderer then we can use that path to retrieve the value from the configuration (get_value_from_path).

if isinstance(content, dict):
    for dict_keys in get_key_paths(content):
        template_path = '.'.join((template_field, dict_keys))
        renderer = task.template_fields_renderers.get(template_path, template_path)
        if renderer in renderers:
            content_value = get_value_from_path(dict_keys, content)
            html_dict[template_path] = renderers[renderer](content_value)

The result of this is a rendered page which looks like the following:

Screenshot 2021-07-26 at 14 51 48

In addition, when decorated tasks or Python operators are used, op_kwargs is parsed and each individual element is extracted and rendered. Such that:

127146694-eb6334b6-be41-4099-9f14-4b1a5093db7b

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@nathadfield
Copy link
Collaborator Author

The failing Sqlite Py3.6 test seems to be unrelated to the changes.

@mehmax
Copy link
Contributor

mehmax commented Jul 30, 2021

Hi @nathadfield, checking and rendering op_kwargs individually would totally solve the Issue I faced.
Looks good to me now!

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ IT

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jul 30, 2021
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Copy link
Member

potiuk commented Jul 30, 2021

Can you please rebase @nathadfield ? We have fixed some failures in main , and it would be great to re-run that one through all tests?

…les information contained in dictionaries to be unpacked and rendered appropriately.
@nathadfield
Copy link
Collaborator Author

@potiuk Rebased and all tests have ran successfully!

@potiuk
Copy link
Member

potiuk commented Aug 2, 2021

Yeah. Seems that thanks to some fixes by @ephraimbuddy we are gone a bit closer to the "green one" on the PRS.

@potiuk
Copy link
Member

potiuk commented Aug 2, 2021

One more small request @nathadfield - Could you please also explain this new feature in https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html?highlight=custom#templating - there template_rendering is mentioned, but it would be nice to add an explanation and example as well.

@nathadfield
Copy link
Collaborator Author

@potiuk Ah, yeah! I sure can!

@potiuk
Copy link
Member

potiuk commented Aug 2, 2021

at the same time - the list of lexers got longer, so would be nice to refresh it :)

@nathadfield nathadfield requested a review from kaxil as a code owner August 2, 2021 11:38
@nathadfield
Copy link
Collaborator Author

@potiuk Done! Happy with the explanation?

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

@potiuk
Copy link
Member

potiuk commented Aug 2, 2021

Some errors with doc building though :).

./breeze build-docs -- --package-filter apache-airflow

Should allow you to build it quickly locally

@nathadfield
Copy link
Collaborator Author

Ah! My bad. Should be ready once everything runs through again.

@nathadfield
Copy link
Collaborator Author

@potiuk Well, that's annoying. Unrelated test failures again.

@potiuk potiuk merged commit 67cbb0f into apache:main Aug 2, 2021
@potiuk
Copy link
Member

potiuk commented Aug 2, 2021

No worries. we can merge it anyway :)

@nathadfield
Copy link
Collaborator Author

Awesome! Thanks very much and for your help!

@potiuk
Copy link
Member

potiuk commented Aug 2, 2021

Good job on that :)

@nathadfield nathadfield deleted the bq_insertjob_rendering branch August 3, 2021 13:49
@nathadfield
Copy link
Collaborator Author

@potiuk Might we expect the core Airflow aspect of this to appear in 2.2? I guess there would also need to be a Google provider version release afterwards too.

@potiuk
Copy link
Member

potiuk commented Aug 19, 2021

@potiuk Might we expect the core Airflow aspect of this to appear in 2.2? I guess there would also need to be a Google provider version release afterwards too.

Yeah. We have not yet cut 2.2 branch, so everything merged in main is "in".

BTW. That made me realise that we should make sure that that if we specify the new "nested" renderers in the in a provider, theold airflow will still work with them (and not break the UI or anything). Maybe what you should do really - is to add those new field renderers now - so that we can release ithem without airflow yet supporting them (and we make sure that they work without problems in current airflow) ?

Could you please add such a change to e provider and check it maybe ? (just reverting your change in main should show how old airlfow will behave in this case.

@nathadfield
Copy link
Collaborator Author

@potiuk Good point! Let me take a look at that.

@nathadfield
Copy link
Collaborator Author

nathadfield commented Aug 20, 2021

@potiuk There doesn't appear to be any UI issues if you specify a key path in a provider but the version of core Airflow doesn't have the rendering capability yet.

The existing code in views.py looks up values from template_field_renderers against the operator class but these key paths will never match with anything.

@potiuk
Copy link
Member

potiuk commented Aug 20, 2021

Great ! thanks @nathadfield - would you make some pr to add some example operators with useful paths so that we can release it in the new batch of providers (I plan to cut them shortly - maybe early next week), and then they will be ready when we release 2.2 :)

@nathadfield
Copy link
Collaborator Author

@potiuk I'm more than happy to I'm not sure what you had in mind exactly or where to put it. At the moment, the only operator that has a path which can be rendered is BigQueryInsertJobOperator.

@potiuk
Copy link
Member

potiuk commented Aug 23, 2021

@potiuk I'm more than happy to I'm not sure what you had in mind exactly or where to put it. At the moment, the only operator that has a path which can be rendered is BigQueryInsertJobOperator.

That's enough :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers area:webserver Webserver related Issues okay to merge It's ok to merge this PR as it does not require more tests provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improved SQL rendering within BigQueryInsertJobOperator
3 participants