[AIRFLOW-5274] dag loading duration metric name too long #5890
cc @milton0825
airflow/models/dagbag.py
Outdated
elif file_stat.dag_num > 1:
# if the file has multiple dags, use dag.loading-duration.file for metric
Stats.timing('dag.loading-duration.{}'.
format(file_stat.file),
Do we need to replace slashes or other special characters?
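To make the question concrete: in the StatsD line protocol a metric is sent as name:value|type, so characters such as : and | inside a name can confuse a parser, and / is not handled the same way by every backend. Below is a minimal sketch of one possible sanitizer. The helper name sanitize_metric_part and the allow-list are assumptions for illustration, not something this PR adds.

```python
import re

# Hypothetical allow-list (an assumption, not Airflow's rule):
# letters, digits, underscore, dot and hyphen are kept as-is.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.\-]")


def sanitize_metric_part(raw: str) -> str:
    """Replace every character outside the allow-list with an underscore."""
    return _UNSAFE_CHARS.sub("_", raw)


# A path-like value with a slash and a space becomes a single safe token.
metric = "dag.loading-duration.{}".format(
    sanitize_metric_part("/subdir/dag name.py")
)
```

Whether dots should also be replaced depends on the backend, since many of them treat a dot as a hierarchy separator.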
We should add a note to UPDATING.md about the changed metric name
airflow/models/dagbag.py
Outdated
Stats.timing('dag.loading-duration.{}'.
format(dag_names),
format(dag_ids[0]),
file_stat.duration)
I think for consistency we should always use the filename - otherwise any monitoring dashboards of load time will be "mixed"
I agree
Makes sense, will do.
@kaxil @ashb @milton0825 PR updated.
airflow/models/dagbag.py
Outdated
format(dag_names),
file_stat.duration)
Stats.timing('dag.loading-duration.{}'.
format(file_stat.file),
Question: Is file_stat.file just the file name, or is it the file path that contains / ?
Good point. I double-checked, and the file name looks something like /subdag/dagname.py. Updated the PR.
format(dag_names),
file_stat.duration)
# file_stat.file similar format: /subdir/dag_name.py
filename = file_stat.file.split('/')[-1].replace('.py', '')
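One detail worth noting about the line above: str.replace removes every occurrence of '.py', not only a trailing extension. The sketch below shows an alternative using os.path from the standard library. The helper name metric_name_for is hypothetical, and the example path is made up to show the difference.

```python
import os


def metric_name_for(file_path: str) -> str:
    """Build the metric name from the file's base name without its extension."""
    base = os.path.basename(file_path)      # '/subdir/dag_name.py' -> 'dag_name.py'
    stem, _ext = os.path.splitext(base)     # 'dag_name.py' -> ('dag_name', '.py')
    return "dag.loading-duration.{}".format(stem)


# A made-up file name that contains '.py' in the middle as well as at the end.
tricky = "/subdir/my.pyspark_dag.py"

# The replace-based expression drops both occurrences of '.py'.
via_replace = tricky.split("/")[-1].replace(".py", "")

# splitext only strips the final extension.
via_splitext = metric_name_for(tricky)
```

For ordinary names such as /subdir/dag_name.py both approaches give the same result, so this only matters for unusual file names.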
It's possible to have the same file name in different folders. Could we join the split parts instead of taking just the file name? E.g.
filename = '.'.join(file_stat.file.split('/')).replace('.py', '')
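For reference, a small sketch of what the suggested expression produces for a path with a leading slash, next to a variant that drops empty path components first. The function name dotted_name is hypothetical and is not part of the PR.

```python
def dotted_name(file_path: str) -> str:
    """Join the non-empty path components with dots and drop a trailing '.py'."""
    parts = [p for p in file_path.split("/") if p]   # skip the empty leading part
    if parts and parts[-1].endswith(".py"):
        parts[-1] = parts[-1][: -len(".py")]
    return ".".join(parts)


path = "/subdir/dag_name.py"

# The expression from the comment above keeps the empty first component,
# so the result starts with a dot.
as_suggested = ".".join(path.split("/")).replace(".py", "")

# The variant yields a name without the leading dot.
without_leading_dot = dotted_name(path)
```

With either form, two files that share a base name but live in different folders get different metric names, at the cost of longer names for deep directories.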
I think it would be bad practice to have the same file name for different DAGs. If you think this would be an issue on your side, feel free to raise a PR.
Besides, if we join all the subdirectories, the stats become unreadable when the DAG file sits in a very deep directory. Personally, I am not in favor of this. I'm not sure what the other committers think.
I was imagining a file structure like the following, for example. It doesn't seem too unreasonable a structure?
dag_dir
+->dag_module_1
| +->dag_definition.py
| |->callbacks.py
| |->functions.py
| |->config.json
+->dag_module_2
  +->dag_definition.py
  |->callbacks.py
  |->functions.py
  |->config.json
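A rough sketch of how that layout interacts with a base-name-only metric: both dag_definition.py files map to the same name. The helper names basename_metric and find_collisions are hypothetical and only illustrate the concern.

```python
import os
from collections import defaultdict

# The two DAG definition files from the layout above.
paths = [
    "dag_dir/dag_module_1/dag_definition.py",
    "dag_dir/dag_module_2/dag_definition.py",
]


def basename_metric(path: str) -> str:
    """Metric suffix when only the file's base name (minus extension) is used."""
    return os.path.splitext(os.path.basename(path))[0]


def find_collisions(file_paths):
    """Group paths by metric suffix and keep suffixes shared by several files."""
    by_name = defaultdict(list)
    for p in file_paths:
        by_name[basename_metric(p)].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


collisions = find_collisions(paths)
```

In this sketch, collisions holds a single entry keyed by dag_definition that lists both paths, meaning their load times would be reported under one metric.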
But yeah, we don't use that dir structure, so I'm happy with the merge :)
(cherry picked from commit 45176c8)
Make sure you have checked all steps below.
Jira
Description
Shorten the DAG loading duration metric name when a DAG file contains multiple DAGs.
Tests
Commits
Documentation
Code Quality
flake8