-
Notifications
You must be signed in to change notification settings - Fork 16.4k
CloudwatchTaskHandler params to create log group/stream #19022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3b5aa1f to
b2880f1
Compare
ephraimbuddy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments you added are the default args and doesn't solve the issue you referenced. The issue you referenced is a permission issue. It was not able to create a log group because the user doesn't have permission to create it.
You should use the docs to set the creation of log group to False.
|
thanks @ephraimbuddy for reviewing the PR. This is something that I had a question around as well (point 1 under open questions) . I added those as arguments so that as a user we could override the config using the docs you linked. Kept the default value as true to retain existing behavior of creating log group implicitly. Should I also add conf overridable values similar to what was done for google_key_path? Something along the lines of create_cloudwatch_log_group = conf.get('logging', 'CREATE_CLOUDWATCH_LOG_GROUP', fallback=True)
create_cloudwatch_log_stream = conf.get('logging', 'CREATE_CLOUDWATCH_LOG_STREAM', fallback=True)
CLOUDWATCH_REMOTE_HANDLERS: Dict[str, Dict[str, str]] = {
'task': {
'class': 'airflow.providers.amazon.aws.log.cloudwatch_task_handler.CloudwatchTaskHandler',
'formatter': 'airflow',
'base_log_folder': str(os.path.expanduser(BASE_LOG_FOLDER)),
'log_group_arn': urlparse(REMOTE_BASE_LOG_FOLDER).netloc,
'filename_template': FILENAME_TEMPLATE,
'create_log_group': create_cloudwatch_log_group,
'create_log_stream': create_cloudwatch_log_stream,
},
} |
|
cc: @mik-laj |
|
@ephraimbuddy / @mik-laj let me know if I should do the changes mentioned in #19022 (comment) |
b2880f1 to
6cdb3ea
Compare
|
Another way to do this would be to simply use query parameters like how we handle database URLs. The user can set Which is (fortunately) possible because |
|
I'll add a commit with the airflow_local_settings.py change. I like the idea that @uranusjr suggested, let me know if I should do that instead. |
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
| DEFAULT_LOGGING_CONFIG['handlers'].update(S3_REMOTE_HANDLERS) | ||
| elif REMOTE_BASE_LOG_FOLDER.startswith('cloudwatch://'): | ||
| create_log_group = conf.getboolean('logging', 'CREATE_CLOUDWATCH_LOG_GROUP', fallback=True) | ||
| create_log_stream = conf.getboolean('logging', 'CREATE_CLOUDWATCH_LOG_STREAM', fallback=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should allow this setting to be configured. Setting stream creation to anything but True will catastrophically break cloudwatch logging.
In cloudwatch Log Groups are like folders and streams are like files. It is possible to pre-create the log group "folder" but the stream log "files" are named on the fly after the task instances themselves (just like the default Airflow local filesystem logging). There's no (reasonable) way you could pre-create those stream names and thus watchtower, if not allowed to create the streams itself, will fail to log any task logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the log group name is already configurable through the arn. And it should be harmless for watchtower to attempt to re-create it (it correctly handles the resource already existing exception). Your original issue was an authentication issue, not an idempotentcy issue.
The only reason I can think of that you'd ever want to restrict log group creation is if the aws role/user you are using does not, and can not, have that policy permission allowed. However the role still needs stream creation permissions (see above) so I don't see why one would ever restrict log group creation but not stream creation. It's a very rare case that I don't think is worth the extra complexity to handle IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the original issue, I assumed that the issue was around a restrictive role since the log group was already created via Terraform. Based on what you've said, The bits around log stream definitely make sense. I'll remove that as an option.
If the option to not create the log group isn't required, I can close the PR and we can create a new one when required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, great discussion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if this is the wrong place, but here feels appropriate - feel free to let me know otherwise.
I'd like to make the case for being able to disable creation of the log group (but not the log stream).
We recently noticed a lot of errors (multiple times a minute in some cases) causing our tasks to fail to start, along the lines of "Failed to execute task An error occurred (ThrottlingException) when calling the CreateLogGroup operation (reached max retries: 4): Rate exceeded." which led us to find the AWS service quota for "CreateLogGroup throttle limit in transactions per second" which is 5 per second by default, and we were frequently in breach of that (at peak times we run a lot of tasks!).
If my understanding is correct, every time a task runs, watchtower will make a CreateLogGroup request and pass on a ResourceAlreadyExistsException if it exists. If the log group never changes, this part shouldn't be necessary - though as @o-nikolas mentioned, it needs to be able to create a unique log stream for each task. FYI the default AWS quota for that is 50 per second (called "CreateLogStream throttle limit in transactions per second"), which allows for a lot more headroom.
It would be nice to have a way to prevent these thousands of requests to create a log group that already exists, or even better, perhaps it could attempt to create the log group on startup, then each task wouldn't need to make the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik-davis
Hey Nik, This is a good callout and it's something we ran into while developing MWAA (which leverages Cloudawatch for all log types and thus creates a LOT of log groups and log stream requests). The correct solution here is to actually not call create log group until you're sure it doesn't exist. Which involves first calling describe_log_groups (which has a much higher quota) and if the log group in question doesn't exist, then create it. This is what we do in MWAA and this is what the Watchtower package should be doing as well. And in fact, it is what they do here since v2.0.0. And I recently upgraded Airflow to use v2.0.1 here. So this problem should already be solved!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing, thank you for the detailed response. We're a bit behind latest and have Airflow v2.1.2 with Watchtower v1.0.6 installed, so sounds like we just need to upgrade. Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries :)
Let me know if you spot any other possible issues!
|
Closing this one based on the conversation here #19022 (comment) |
closes: #18549
Todo:
__init__params for CloudwatchTaskHandler which are passed towatchtower.CloudWatchLogHandlerto control whether a log group/stream are to be createdBased on the approach mentioned in the following comment
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.