
dag_processing.last_duration metric has random holes #17513

Closed
suhanovv opened this issue Aug 9, 2021 · 1 comment
Labels
affected_version:2.1 Issues Reported for 2.1 area:metrics kind:bug This is a clearly a bug

Comments

@suhanovv
Contributor

suhanovv commented Aug 9, 2021

Apache Airflow version: 2.1.2

Apache Airflow Provider versions:

apache-airflow-providers-apache-hive==2.0.0
apache-airflow-providers-celery==2.0.0
apache-airflow-providers-cncf-kubernetes==1.2.0
apache-airflow-providers-docker==2.0.0
apache-airflow-providers-elasticsearch==1.0.4
apache-airflow-providers-ftp==2.0.0
apache-airflow-providers-imap==2.0.0
apache-airflow-providers-jdbc==2.0.0
apache-airflow-providers-microsoft-mssql==2.0.0
apache-airflow-providers-mysql==2.0.0
apache-airflow-providers-oracle==2.0.0
apache-airflow-providers-postgres==2.0.0
apache-airflow-providers-sqlite==2.0.0
apache-airflow-providers-ssh==2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.17

Environment:

What happened:

We are using a statsd_exporter to store statsd metrics in Prometheus, and we came across strange behavior: the metric dag_processing.last_duration.<dag_file> is drawn with holes at random intervals for different DAGs.
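For context, a statsd_exporter mapping along these lines (the metric prefix and label name here are illustrative, not copied from our deployment) is what turns the per-file statsd metric into a labelled Prometheus series, one series per DAG file:

```yaml
# Illustrative statsd_exporter mapping config; adjust the prefix to your setup.
mappings:
  - match: "airflow.dag_processing.last_duration.*"
    name: "airflow_dag_processing_last_duration"
    labels:
      dag_file: "$1"
```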

[three screenshots: graphs of dag_processing.last_duration.<dag_file> showing gaps at random intervals]

What you expected to happen:

Metrics should be sent at the frequency specified by the AIRFLOW__SCHEDULER__PRINT_STATS_INTERVAL config parameter (default 30 seconds), and this happens in the _log_file_processing_stats method. The problem is that the start time is taken from the get_start_time function, which only looks at currently active processors. Some DAG files finish processing within the 30-second interval and are removed from self._processors[file_path], so for those files no metric is sent to statsd at all. The log output, by contrast, uses the stored last run time, which holds information about the most recent processing run, and it would be better to send the metrics to statsd from there as well.
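The mechanism described above can be sketched as follows. This is not the actual Airflow code, just a minimal, self-contained model of the reported behavior: stats derived only from *active* processors miss any file whose processing finished within the stats interval, while stats derived from the last recorded run time do not.

```python
# Minimal sketch (not Airflow source) of the reported bug: metrics emitted
# only for files with an active processor develop "holes" for fast files.

class FileProcessingStats:
    def __init__(self):
        self._processors = {}    # file_path -> start time, only while running
        self._last_runtime = {}  # file_path -> duration of the last finished run

    def start(self, file_path, start_time):
        self._processors[file_path] = start_time

    def finish(self, file_path, end_time):
        # Once a file finishes, it is removed from the active-processor map.
        start = self._processors.pop(file_path)
        self._last_runtime[file_path] = end_time - start

    def emit_metrics_buggy(self, now):
        """Like using get_start_time(): silently skips finished files."""
        return {path: now - start for path, start in self._processors.items()}

    def emit_metrics_fixed(self, now):
        """Like using the last run time: finished files still report a value."""
        metrics = dict(self._last_runtime)
        for path, start in self._processors.items():
            metrics[path] = now - start
        return metrics


stats = FileProcessingStats()
stats.start("fast_dag.py", 0.0)
stats.start("slow_dag.py", 0.0)
stats.finish("fast_dag.py", 5.0)  # finished well within the 30s interval

print(stats.emit_metrics_buggy(30.0))  # fast_dag.py is missing: a "hole"
print(stats.emit_metrics_fixed(30.0))  # both files report a duration
```

Running the sketch shows the buggy variant dropping fast_dag.py from the emitted metrics even though it was processed during the interval, which matches the random gaps we see in Prometheus.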

@eladkal
Contributor

eladkal commented Sep 21, 2021

fixed in #17513

@eladkal eladkal closed this as completed Sep 21, 2021