[AIRFLOW-3153] send dag last_run to statsd #3997

Closed
feng-tao wants to merge 1 commit into apache:master from feng-tao:last_run_statsd

Conversation

@feng-tao
Member

@feng-tao feng-tao commented Oct 3, 2018

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Lyft has been running with this PR for over a year, and the stats have helped detect numerous production issues (e.g. by setting a PagerDuty alert when the time since the last run exceeds a certain threshold).

This PR adds StatsD logging for the DAG generation in Airflow, recording:

  • the time spent processing each file; and
  • the last time it was processed (both as a Unix timestamp and as an interval in seconds).

Stats.gauge('last_runtime.example_bash_operator', 1.622376)
Stats.gauge('last_run.unixtime.example_bash_operator', 1512670855)
Stats.gauge('last_run.seconds_ago.example_bash_operator', 0.641343)
Stats.gauge('last_runtime.example_bash_operator', 1.629494)
Stats.gauge('last_run.unixtime.example_bash_operator', 1512670886)
Stats.gauge('last_run.seconds_ago.example_bash_operator', 0.526443)
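As a rough illustration of how the three gauges above relate to each other, here is a minimal sketch. It is not the code in this PR: `FakeStats` and `emit_file_stats` are hypothetical names, and `FakeStats` only stands in for a StatsD client by recording values in memory. It also assumes Python 3, since it uses `datetime.timestamp()`.

```python
from datetime import datetime, timedelta, timezone


class FakeStats:
    """Hypothetical stand-in for a StatsD client; keeps the last value per gauge."""

    def __init__(self):
        self.gauges = {}

    def gauge(self, name, value):
        self.gauges[name] = value


def emit_file_stats(stats, dag_name, runtime_seconds, last_run, now=None):
    """Emit the processing time, plus the last run as a timestamp and as an age."""
    now = now or datetime.now(timezone.utc)
    stats.gauge('last_runtime.{}'.format(dag_name), runtime_seconds)
    stats.gauge('last_run.unixtime.{}'.format(dag_name), int(last_run.timestamp()))
    stats.gauge('last_run.seconds_ago.{}'.format(dag_name),
                (now - last_run).total_seconds())


stats = FakeStats()
# Timestamp taken from the sample output above; "now" is assumed to be 1 s later.
last_run = datetime.fromtimestamp(1512670855, tz=timezone.utc)
now = last_run + timedelta(seconds=1)
emit_file_stats(stats, 'example_bash_operator', 1.622376, last_run, now)
```

An alert on `last_run.seconds_ago` exceeding a threshold would then flag a file that has stopped being processed.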

Credit to the original PR owner (@betodealmeida) at Lyft.

This PR also fixes some flake8 errors.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    This PR only adds stats, so no tests are needed.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@feng-tao
Member Author

feng-tao commented Oct 4, 2018

PTAL @kaxil

Member

@ashb ashb left a comment

Stats are good, so I'm not going to complain about more stats, but doesn't Airflow's built in SLA do some of this already?

last_run = processor_manager.get_last_finish_time(file_path)

file_name = file_path[len(dags_folder) + 1:]
dag_name = os.path.splitext(file_name)[0].replace(os.sep, '.')
Member

A dag file could contain multiple DAGs - is there a reason to not use dag_id here?
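To make the concern concrete, the sketch below reuses the two lines from the diff to derive the metric name from the file path. The folder layout and the DAG ids are hypothetical, chosen only for illustration. Because the name depends on the file alone, every DAG defined in that file would share one metric.

```python
import os


def dag_name_from_path(file_path, dags_folder):
    """Mirror the diff: path relative to dags_folder, extension dropped, separators as dots."""
    file_name = file_path[len(dags_folder) + 1:]
    return os.path.splitext(file_name)[0].replace(os.sep, '.')


# Hypothetical layout: one file under a team sub-folder.
dags_folder = os.path.join(os.sep, 'opt', 'airflow', 'dags')
file_path = os.path.join(dags_folder, 'team_a', 'etl.py')
name = dag_name_from_path(file_path, dags_folder)

# Hypothetical DAG ids defined in that single file.
dag_ids_in_file = ['etl_hourly', 'etl_daily']

# The metric name never looks at the dag_id, so both DAGs map to the same name.
metric_names = {'last_run.unixtime.{}'.format(name) for _ in dag_ids_in_file}
```

Here `name` is the dotted relative path of the file, so the stat describes when the file was last processed rather than when any particular DAG in it was.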

unixtime = last_run.strftime("%s")
seconds_ago = (timezone.utcnow() - last_run).total_seconds()
Stats.gauge('last_run.unixtime.{}'.format(dag_name), unixtime)
Stats.gauge('last_run.seconds_ago.{}'.format(dag_name), seconds_ago)
Member

How often does this stat get updated? (From reviewing the PR, I'm not sure where we are in the scheduler.) If we were using Prometheus I would be tempted to say that the last_run.unixtime stat is the only one we should have, but I honestly don't remember how StatsD works anymore.
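One way to read the suggestion is to emit only the timestamp and derive the age wherever the metric is consumed. The sketch below shows that idea under stated assumptions: `to_unixtime` and `seconds_ago` are hypothetical helpers, not part of the PR. It uses `calendar.timegm` rather than `strftime("%s")`, because `%s` is not a documented Python directive; where the platform C library supports it, it is interpreted in the local time zone.

```python
import calendar
from datetime import datetime, timedelta, timezone


def to_unixtime(dt):
    """Convert an aware datetime to whole seconds since the epoch, in UTC."""
    return calendar.timegm(dt.utctimetuple())


def seconds_ago(unixtime, now_unixtime):
    """Derive the age on the consumer side from two timestamps."""
    return now_unixtime - unixtime


# Hypothetical sample instant, expressed in two different time zones.
last_run = datetime(2018, 10, 3, 12, 0, 0, tzinfo=timezone.utc)
same_instant_other_zone = last_run.astimezone(timezone(timedelta(hours=-7)))

unixtime = to_unixtime(last_run)
age = seconds_ago(unixtime, to_unixtime(last_run + timedelta(seconds=30)))
```

Both representations of the instant yield the same timestamp, which is the property a dashboard or alert needs when it computes the age itself.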

@Fokko
Contributor

Fokko commented Oct 12, 2018

Tests seem to be failing :'(

@ashb
Member

ashb commented Oct 12, 2018

pkg_resources.ContextualVersionConflict: (Flask-Login 0.2.11 (/app/.tox/py35-backend_mysql-env_docker/lib/python3.5/site-packages), Requirement.parse('Flask-Login<0.5,>=0.3'), {'flask-appbuilder'})

Rebasing on to latest master should fix that.

@feng-tao
Member Author

@ashb, @Fokko, sorry for letting this PR slip through; I will update it.

@kaxil
Member

kaxil commented Oct 30, 2018

Can we please add this to the docs as well, listing all the stats and what they mean?

@stale

stale bot commented Dec 14, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 14, 2018
@stale stale bot closed this Dec 22, 2018
@feng-tao feng-tao reopened this Jan 3, 2019
@stale stale bot removed the stale label Jan 3, 2019
@feng-tao feng-tao closed this Feb 21, 2019
@feng-tao feng-tao deleted the last_run_statsd branch February 21, 2019 19:58
@andrewhharmon
Contributor

Curious why this was implemented as a gauge and not a timer.
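The thread does not answer this question. For context only, the sketch below formats the two metric types in the plain-text StatsD line protocol (`name:value|type`). `statsd_line` is a hypothetical helper, not an Airflow or statsd-client function. A gauge (`g`) keeps the last reported value until it is overwritten, while a timer (`ms`) carries a duration in milliseconds that the server aggregates per flush interval.

```python
def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD line protocol: name:value|type."""
    return '{}:{}|{}'.format(name, value, metric_type)


# A point-in-time value such as "seconds since the last run" fits a gauge.
gauge_line = statsd_line('last_run.seconds_ago.example_bash_operator', 0.641343, 'g')

# A duration such as the file processing time could instead be sent as a timer,
# converted to whole milliseconds (1.622376 s is about 1622 ms).
timer_line = statsd_line('last_runtime.example_bash_operator', 1622, 'ms')
```

Under this reading, `last_runtime` is the one stat in the PR that measures a duration, so it is the natural candidate for a timer; the two `last_run.*` stats describe a state and would stay gauges.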
