Member

@turbaszek turbaszek commented Nov 29, 2020

This commit unifies the mechanism for rendering tabular output.
Users can now either display a tabular representation of the data
or render it as a valid JSON or YAML payload.

Closes: #12699

Instead of using tabulate to render command output, we now use rich. As a result, the --output argument no longer accepts tabulate table formats. Instead it accepts:

  • table - renders the output as a predefined table
  • json - renders the output as JSON
  • yaml - renders the output as YAML

This makes the commands more consistent and lets users process the output programmatically (when using json or yaml).
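
The dispatch on --output described above can be pictured with a rough sketch. It is not the code from this PR: the function name `render` is hypothetical, the table branch uses plain string formatting as a stand-in for rich, and the yaml branch assumes PyYAML is installed.

```python
import json


def render(rows, output="table"):
    """Return `rows` (a list of flat dicts) as text in the requested format.

    Hypothetical helper for illustration only.
    """
    if output == "json":
        return json.dumps(rows)
    if output == "yaml":
        import yaml  # PyYAML, imported lazily; assumed to be available

        return yaml.dump(rows)
    if output == "table":
        if not rows:
            return ""
        headers = list(rows[0])
        # Each column is as wide as its longest cell (header included).
        widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]

        def fmt(cells):
            line = " | ".join(str(c).ljust(w) for c, w in zip(cells, widths))
            return line.rstrip()

        lines = [fmt(headers), "=+=".join("=" * w for w in widths)]
        lines += [fmt([r[h] for h in headers]) for r in rows]
        return "\n".join(lines)
    raise ValueError(f"Unknown output format: {output}")


rows = [
    {"task_id": "numbers", "state": "success"},
    {"task_id": "show", "state": "success"},
]
print(render(rows, "table"))
print(render(rows, "json"))
```

Keeping the data as a list of dicts until the last step is what lets one command serve all three formats.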

Affected commands:

  • airflow dags list
  • airflow dags report
  • airflow dags list-runs
  • airflow dags list-jobs
  • airflow connections list
  • airflow connections get
  • airflow pools list
  • airflow pools get
  • airflow pools set
  • airflow pools delete
  • airflow pools export
  • airflow role list
  • airflow providers list
  • airflow providers get
  • airflow tasks states-for-dag-run
  • airflow users list
  • airflow variables list

Example:

Table:

root@e794bcc2d698:/opt/airflow# airflow tasks states-for-dag-run tasks_are_awesome 2020-11-13T00:00:00+00:00
dag_id            | execution_date            | task_id | state   | start_date                       | end_date
==================+===========================+=========+=========+==================================+=================================
tasks_are_awesome | 2020-11-13T00:00:00+00:00 | numbers | success | 2020-11-29T14:53:46.811030+00:00 | 2020-11-29T14:53:46.974545+00:00
tasks_are_awesome | 2020-11-13T00:00:00+00:00 | show__2 | success | 2020-11-29T14:53:56.926441+00:00 | 2020-11-29T14:53:57.118781+00:00
tasks_are_awesome | 2020-11-13T00:00:00+00:00 | show    | success | 2020-11-29T14:53:56.915802+00:00 | 2020-11-29T14:53:57.125230+00:00
tasks_are_awesome | 2020-11-13T00:00:00+00:00 | show__1 | success | 2020-11-29T14:53:56.922131+00:00 | 2020-11-29T14:53:57.129091+00:00
tasks_are_awesome | 2020-11-13T00:00:00+00:00 | show__3 | success | 2020-11-29T14:53:56.931243+00:00 | 2020-11-29T14:53:57.126306+00:00

JSON:

root@e794bcc2d698:/opt/airflow# airflow tasks states-for-dag-run tasks_are_awesome 2020-11-13T00:00:00+00:00 --output json
[{"dag_id": "tasks_are_awesome", "execution_date": "2020-11-13T00:00:00+00:00", "task_id": "numbers", "state": "success", "start_date": "2020-11-29T14:53:46.811030+00:00", "end_date": "2020-11-29T14:53:46.974545+00:00"}, {"dag_id": "tasks_are_awesome", "execution_date": "2020-11-13T00:00:00+00:00", "task_id": "show__2", "state": "success", "start_date": "2020-11-29T14:53:56.926441+00:00", "end_date": "2020-11-29T14:53:57.118781+00:00"}, {"dag_id": "tasks_are_awesome", "execution_date": "2020-11-13T00:00:00+00:00", "task_id": "show", "state": "success", "start_date": "2020-11-29T14:53:56.915802+00:00", "end_date": "2020-11-29T14:53:57.125230+00:00"}, {"dag_id": "tasks_are_awesome", "execution_date": "2020-11-13T00:00:00+00:00", "task_id": "show__1", "state": "success", "start_date": "2020-11-29T14:53:56.922131+00:00", "end_date": "2020-11-29T14:53:57.129091+00:00"}, {"dag_id": "tasks_are_awesome", "execution_date": "2020-11-13T00:00:00+00:00", "task_id": "show__3", "state": "success", "start_date": "2020-11-29T14:53:56.931243+00:00", "end_date": "2020-11-29T14:53:57.126306+00:00"}]

YAML:

root@e794bcc2d698:/opt/airflow# airflow tasks states-for-dag-run tasks_are_awesome 2020-11-13T00:00:00+00:00 --output yaml
- dag_id: tasks_are_awesome
  end_date: '2020-11-29T14:53:46.974545+00:00'
  execution_date: '2020-11-13T00:00:00+00:00'
  start_date: '2020-11-29T14:53:46.811030+00:00'
  state: success
  task_id: numbers
- dag_id: tasks_are_awesome
  end_date: '2020-11-29T14:53:57.118781+00:00'
  execution_date: '2020-11-13T00:00:00+00:00'
  start_date: '2020-11-29T14:53:56.926441+00:00'
  state: success
  task_id: show__2
- dag_id: tasks_are_awesome
  end_date: '2020-11-29T14:53:57.125230+00:00'
  execution_date: '2020-11-13T00:00:00+00:00'
  start_date: '2020-11-29T14:53:56.915802+00:00'
  state: success
  task_id: show
- dag_id: tasks_are_awesome
  end_date: '2020-11-29T14:53:57.129091+00:00'
  execution_date: '2020-11-13T00:00:00+00:00'
  start_date: '2020-11-29T14:53:56.922131+00:00'
  state: success
  task_id: show__1
- dag_id: tasks_are_awesome
  end_date: '2020-11-29T14:53:57.126306+00:00'
  execution_date: '2020-11-13T00:00:00+00:00'
  start_date: '2020-11-29T14:53:56.931243+00:00'
  state: success
  task_id: show__3

Using jq

root@e794bcc2d698:/opt/airflow# airflow tasks states-for-dag-run tasks_are_awesome 2020-11-13T00:00:00+00:00 --output json | jq ".[] | .task_id"
"numbers"
"show__2"
"show"
"show__1"
"show__3"
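
The same selection can be done from a script instead of jq. This is a minimal sketch, assuming the JSON payload has already been captured as a string; `task_ids` is a hypothetical helper and the sample is a shortened copy of the output shown above.

```python
import json


def task_ids(payload):
    """Return the task_id of every record in a JSON array payload."""
    return [record["task_id"] for record in json.loads(payload)]


# Shortened copy of the `--output json` payload above (two records, three keys).
sample = (
    '[{"dag_id": "tasks_are_awesome", "task_id": "numbers", "state": "success"}, '
    '{"dag_id": "tasks_are_awesome", "task_id": "show__2", "state": "success"}]'
)
print(task_ids(sample))
```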

Using yq

root@e794bcc2d698:/opt/airflow# airflow tasks states-for-dag-run tasks_are_awesome 2020-11-13T00:00:00+00:00 --output yaml | yq ".[] | {sd: .start_date, ed: .end_date}"
{
  "sd": "2020-11-29T14:53:46.811030+00:00",
  "ed": "2020-11-29T14:53:46.974545+00:00"
}
{
  "sd": "2020-11-29T14:53:56.926441+00:00",
  "ed": "2020-11-29T14:53:57.118781+00:00"
}
{
  "sd": "2020-11-29T14:53:56.915802+00:00",
  "ed": "2020-11-29T14:53:57.125230+00:00"
}
{
  "sd": "2020-11-29T14:53:56.922131+00:00",
  "ed": "2020-11-29T14:53:57.129091+00:00"
}
{
  "sd": "2020-11-29T14:53:56.931243+00:00",
  "ed": "2020-11-29T14:53:57.126306+00:00"
}
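
Once start_date and end_date are available as structured fields, a script can go one step further than the projection above and compute run times. The sketch below is illustrative only: `run_durations` is a hypothetical helper, and it reads the JSON form of the payload so that it needs nothing beyond the standard library (Python 3.7+ for `datetime.fromisoformat`).

```python
import json
from datetime import datetime


def run_durations(payload):
    """Map task_id -> run time in seconds, from ISO 8601 start/end dates."""
    durations = {}
    for record in json.loads(payload):
        start = datetime.fromisoformat(record["start_date"])
        end = datetime.fromisoformat(record["end_date"])
        durations[record["task_id"]] = (end - start).total_seconds()
    return durations


# One record copied from the output above.
sample = json.dumps(
    [
        {
            "task_id": "numbers",
            "start_date": "2020-11-29T14:53:46.811030+00:00",
            "end_date": "2020-11-29T14:53:46.974545+00:00",
        }
    ]
)
print(run_durations(sample))
```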

Member Author

turbaszek commented Nov 29, 2020

The only thing I'm wondering about is suppressing or removing log records when using json/yaml, because otherwise the output is not valid. For example:

root@e794bcc2d698:/opt/airflow# airflow dags list --output yaml | yq "."
/opt/airflow/airflow/providers/cncf/kubernetes/backcompat/backwards_compat_converters.py:26 DeprecationWarning: This module is deprecated. Please use `kubernetes.client.models.V1Volume`.
/opt/airflow/airflow/providers/cncf/kubernetes/backcompat/backwards_compat_converters.py:27 DeprecationWarning: This module is deprecated. Please use `kubernetes.client.models.V1VolumeMount`.
yq: Error running jq: ParserError: expected '<document start>', but found '{'
  in "<stdin>", line 1, column 27.
[
  "2020-11-29T15:41:59",
  435
]

or

root@e794bcc2d698:/opt/airflow# airflow providers hooks --output yaml | yq ".[]"
/usr/local/lib/python3.6/site-packages/snowflake/connector/options.py:39 UserWarning: You have an incompatible version of 'pyarrow' installed, please install a version that adheres to: 'pyarrow<0.18.0,>=0.17.0; extra == "pandas"'
yq: Error running jq: ParserError: while parsing a block mapping
  in "<stdin>", line 1, column 1
expected <block end>, but found '{'
  in "<stdin>", line 1, column 27.

It can also appear inline with the output:

Config info
executor             | LocalExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn     | postgresql+psycopg2://postgres:airflow@postgres/airflow
dags_folder          | /files/dags
plugins_folder       | /root/airflow/plugins
base_log_folder      | /root/airflow/logs

[2020-11-29 15:46:57,661] {providers_manager.py:207} WARNING - Exception when importing 'airflow.providers.google.cloud.hooks.compute_ssh.ComputeEngineSSHHook' from 'apache-airflow-providers-google' package: No module named 'google.cloud.oslogin_v1'
/usr/local/lib/python3.6/site-packages/snowflake/connector/options.py:39 UserWarning: You have an incompatible version of 'pyarrow' installed, please install a version that adheres to: 'pyarrow<0.18.0,>=0.17.0; extra == "pandas"'
Providers info
apache-airflow-providers-amazon           | 1.0.0b2
apache-airflow-providers-apache-cassandra | 1.0.0b2
apache-airflow-providers-apache-druid     | 1.0.0b2

I've never seen this in any other application, so I lean toward suppressing warnings and logs.
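
One possible shape for such suppression is sketched below. It is not taken from this PR: `suppress_logs_and_warnings` is a hypothetical helper name, and it relies only on the standard `warnings` and `logging` modules. Warnings raised at import time, before the block is entered, would still be printed.

```python
import contextlib
import logging
import warnings


@contextlib.contextmanager
def suppress_logs_and_warnings():
    """Silence Python warnings and log records while the block runs.

    Hypothetical helper for illustration only.
    """
    # Remember the current global threshold so it can be restored afterwards.
    previous = logging.root.manager.disable
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        logging.disable(logging.CRITICAL)
        try:
            yield
        finally:
            logging.disable(previous)


with suppress_logs_and_warnings():
    warnings.warn("not shown")
    logging.getLogger(__name__).warning("not shown either")
    print('{"ok": true}')  # only the payload reaches the terminal
```

Another option would be to leave logging on and make sure log records never go to stdout, so that pipes into jq or yq only ever see the payload.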

Member

XD-DENG commented Nov 29, 2020

The only thing I'm wondering about is suppressing or removing log records when using json/yaml, because otherwise the output is not valid. For example:

... ...
I've never seen this in any other application, so I lean toward suppressing warnings and logs.

Same preference here.

I think we should suppress logs for table output as well, not only for json/yaml.

Member

@XD-DENG XD-DENG left a comment

nit

@turbaszek turbaszek force-pushed the refactor-list-commands branch 2 times, most recently from 780d3da to 02a5ff2 Compare November 30, 2020 08:13
@turbaszek turbaszek force-pushed the refactor-list-commands branch from 02a5ff2 to 48416ed Compare November 30, 2020 14:51
@turbaszek turbaszek requested a review from XD-DENG November 30, 2020 18:10
Member Author

@potiuk @mik-laj @kaxil @ashb thoughts?

UPDATING.md Outdated
Member

Should airflow pools import and airflow providers hooks be in this list as well?

Member Author

Done in caf81ff

Member

@turbaszek after your latest change (some commands don't print a table anymore), maybe we need to update this doc further as well.

Member Author

Should we just print "Pool deleted" instead of printing info about it?

Member Author

Same here: most CLIs I know would show "Pool created" instead of showing info. We could consider a verbose flag.

Member Author

We are exporting pools to a file, so I think we shouldn't print them after that.

Member

Personally I agree with the three points you raised (including this one).

Member

Maybe add try-except similar to what you added in pool_delete?

Member Author

I would like to address this in a follow-up PR, probably by creating a new decorator that handles known errors and shows only a message instead of lines of traceback, to make it more user-friendly.
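
A decorator along those lines might look like the sketch below. It is a guess at the idea, not the follow-up PR's implementation: `exit_on_known_errors`, `PoolNotFound` and this `pool_delete` are hypothetical names, and the pools are a plain dict rather than Airflow's models.

```python
import functools


class PoolNotFound(Exception):
    """Hypothetical error type standing in for a known CLI failure."""


def exit_on_known_errors(*known_errors):
    """Turn the listed exception types into a short SystemExit message."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except known_errors as err:
                # `from None` hides the original traceback from the user.
                raise SystemExit(str(err)) from None

        return wrapper

    return decorator


@exit_on_known_errors(PoolNotFound)
def pool_delete(name, pools):
    """Delete `name` from the `pools` dict, or fail with a known error."""
    if name not in pools:
        raise PoolNotFound(f"Pool {name} does not exist")
    del pools[name]
    print("Pool deleted")


pool_delete("demo", {"demo": 1})
```

Unexpected exceptions are not listed in the decorator, so they still surface with a full traceback.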

@turbaszek turbaszek force-pushed the refactor-list-commands branch from 922abe5 to 87533da Compare December 1, 2020 09:49

github-actions bot commented Dec 1, 2020

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

This commit unifies the mechanism for rendering tabular output.
Users can now either display a tabular representation of the data
or render it as a valid JSON or YAML payload.

Closes: apache#12699
@turbaszek turbaszek force-pushed the refactor-list-commands branch from 87533da to 0006363 Compare December 1, 2020 11:38
@turbaszek turbaszek requested a review from XD-DENG December 1, 2020 11:38
Member

@XD-DENG XD-DENG left a comment

Maybe other folks will have further suggestions. From my point of view, all my questions/concerns have been addressed well.

Nice feature indeed. Thanks @turbaszek


github-actions bot commented Dec 1, 2020

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the "full tests needed" label Dec 1, 2020
@turbaszek turbaszek force-pushed the refactor-list-commands branch from 0006363 to d12f31a Compare December 1, 2020 14:19
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
@turbaszek turbaszek requested a review from kaxil December 1, 2020 23:26
@turbaszek turbaszek merged commit cba8d62 into apache:master Dec 2, 2020
@turbaszek turbaszek deleted the refactor-list-commands branch December 2, 2020 09:20
turbaszek added a commit to PolideaInternal/airflow that referenced this pull request Dec 2, 2020
This PR is a follow-up to apache#12375 and apache#12704. It improves handling
of some errors in CLI commands to avoid showing users too much traceback
and uses SystemExit consistently.
turbaszek added a commit that referenced this pull request Dec 4, 2020
This PR is a follow-up to #12375 and #12704. It improves handling
of some errors in CLI commands to avoid showing users too much traceback
and uses SystemExit consistently.
Labels

area:CLI, full tests needed

Development

Successfully merging this pull request may close these issues.

Add yaml/json output format support for many CLI commands