-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-6047] Simplify the logging configuration template #6644
Conversation
f25509b
to
ead59b4
Compare
This is important to me because I want to add new integration that will require a lot of more configuration options |
Codecov Report
@@ Coverage Diff @@
## master #6644 +/- ##
==========================================
- Coverage 83.82% 83.49% -0.33%
==========================================
Files 672 672
Lines 37594 37600 +6
==========================================
- Hits 31512 31395 -117
- Misses 6082 6205 +123
Continue to review full report at Codecov.
|
Hi @mik-laj , sure I will take a look. Meanwhile curious what new integration you're planning? If you already have clear idea, may be useful for this review as well. Cheers. |
I am working on direct integration with Google Stackdriver Logging. WIP version is available on my company fork. In the end, I did not provide all possible options in the Airflow configuration options, but more advanced users have access through the |
ead59b4
to
fee1d0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach - getting the configurations where they are actually used makes perfect sense.
However maybe we have an opportunity to simplify the whole logging configuration here @ashb @XD-DENG @mik-laj ? (It might be different PR but I wanted to discuss it here since it touches it).
I think airflow_local_setttings used to have different purpose. You were supposed to be able to define your own configuration here and treat is as "template".
For example in some places in the code I found:
To define a pod mutation hook, add a ``airflow_local_settings`` module
to your PYTHONPATH that defines this ``pod_mutation_hook`` function.
It receives a ``Pod`` object and can alter it where needed.
The SETTINGS_FILE_POLICY_WITH_DUNDER_ALL and SETTINGS_FILE_POLICY and SETTINGS_FILE_POD_MUTATION_HOOK which are not documented other than some not-very-good code comments.
But the module documentation says "Airflow logging settings" (and it seems to be true, looking at the file).
The settings are loaded via "import_local_settings" method and theorethically you can modify it and configure yourself. However pretty much all the code in the airflow_local_settinngs is already configurable via conf. So it's a weird mixture of "conf-driven" settings and flexible python code that might be changed in your own copy of local_settings. Most of the VARIABLES in airflow_local_setttings do not need to be overrideable as they are already configurable..
I think much better solution might be this:
- turn "airflow local settings" in "logging_settins" and have it simply imported as everything else - statically rather than dynamically.
- have another "local_settings" importable dynamically from a specified folder.
- have a 'closed' list of settings that are overridable in local configuration file. But then this file should not contain the whole configuration, just the overridden VARIABLES.
WDYT?
I would dream that the whole configuration was expressed as Python code.
And the airflow.settings file would contain default value definitions that could be overwritten in settings.py.
This would allow much greater freedom and create more complex objects, e.g. dictionaries. This is how Django is configured. But all in all, if we delete the import from the first code and transfer it to the core, we will have your idea. so i agree. We can move this configuration file to the core and introduce another way to overwrite the settings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the more future-oriented discussion aside, please find my 2 cents and let me if it makes sense to you?
REMOTE_LOGGING = conf.getboolean('core', 'remote_logging') | ||
|
||
if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'): | ||
S3_REMOTE_HANDLERS = { | ||
'task': { | ||
'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler', | ||
'formatter': 'airflow', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatter
, base_log_folder
, and filename_template
are shared and are always the same in different handlers.
Wonder if it's a good idea to have a task template
(inside formatter
, base_log_folder
, and filename_template
are defined), and create your *_REMOTE_HANDLERS
using this template.
This can avoid duplicated lines, and make potential future updating on formatter
/base_log_folder
/filename_template
easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not common to all handlers, so it will be problematic. My Stackdriver handler contains the following configurations:
https://github.com/PolideaInternal/airflow/blob/e2511a74bfdd3824845ae037e4a50de127c223d6/airflow/config_templates/airflow_local_settings.py
gcp_conn_id = conf.get('core', 'REMOTE_LOG_CONN_ID', fallback=None)
# stackdriver:///airflow-tasks => airflow-tasks
REMOTE_BASE_LOG_FOLDER = urlparse(REMOTE_BASE_LOG_FOLDER).path[1:]
STACKDRIVER_REMOTE_HANDLERS = {
'task': {
'class': 'airflow.utils.log.stackdriver_task_handler.StackdriverTaskHandler',
'formatter': 'airflow',
'name': REMOTE_BASE_LOG_FOLDER,
'gcp_conn_id': gcp_conn_id
}
}
DEFAULT_LOGGING_CONFIG['handlers'].update(STACKDRIVER_REMOTE_HANDLERS)
I'm also afraid that pulling out only part of the configuration to a separate variable will make it difficult to understand. This is not a classic code that must follow DRY rules to avoid problems. This is a configuration file where each code has a different purpose. They look similar, but each has its own separate role. First of all, this file should be easy to understand and adapt to the specific case of our users .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I think it's ok.
DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['wasb']) | ||
elif REMOTE_LOGGING and ELASTICSEARCH_HOST: | ||
DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['elasticsearch']) | ||
DEFAULT_LOGGING_CONFIG['handlers'].update(ELASTIC_REMOTE_HANDLERS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worthwhile to add a else:
at the end? Inside it we can handle cases like user enters something wrong when they configure (e.g. user makes a typo in REMOTE_BASE_LOG_FOLDER and it starts with something like upper case "S3:" or "hs://")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added else statement at the end. Is it looks good for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quite minor thing and no difference in terms of logic: but why not we write in the way below
if REMOTE_LOGGING:
if REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
...
elif REMOTE_BASE_LOG_FOLDER.startswith('gs://'):
...
...
else:
raise AirflowException("...")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to increase the level of indentation, but It will be clearer to understand, so I make a change.
I like the approach of the changes. Would it make sense to add logging messages (to root logger) about the configuration of the logger? This way one could "verify" that their settings are being applied. |
@serkef Airflow runs a lot of processes. Each process initiates the logger from the beginning, so displaying the full configuration would be problematic. |
fee1d0f
to
241a408
Compare
eda5704
to
6450e9e
Compare
@potiuk Do you have any other comments related to scope of this PR? |
No more comments. It looks good. Just wanted to see if we can think about some future changes. |
(cherry picked from commit e82008d)
(cherry picked from commit e82008d)
(cherry picked from commit e82008d)
There are several problems in this file:
I also created a section only regarding the remote logging configuration.
Make sure you have checked all steps below.
Jira
Description
Tests
Commits
Documentation