Skip to content

Conversation

@anaynayak
Copy link
Contributor

@anaynayak anaynayak commented Oct 16, 2021

closes: #18549

Todo:

  • Add __init__ params for CloudwatchTaskHandler which are passed to watchtower.CloudWatchLogHandler to control whether a log group/stream are to be created
  • Add this to airflow_local_settings.py with values being read from the config.

Based on the approach mentioned in the following comment


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@anaynayak anaynayak force-pushed the cloudwatch-log-group-stream branch from 3b5aa1f to b2880f1 Compare October 17, 2021 07:10
Copy link
Contributor

@ephraimbuddy ephraimbuddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arguments you added are the default args and doesn't solve the issue you referenced. The issue you referenced is a permission issue. It was not able to create a log group because the user doesn't have permission to create it.

You should use the docs to set the creation of log group to False.

@anaynayak
Copy link
Contributor Author

thanks @ephraimbuddy for reviewing the PR. This is something that I had a question around as well (point 1 under open questions) .

I added those as arguments so that as a user we could override the config using the docs you linked. Kept the default value as true to retain existing behavior of creating log group implicitly.

Should I also add conf overridable values similar to what was done for google_key_path?

Something along the lines of

        create_cloudwatch_log_group = conf.get('logging', 'CREATE_CLOUDWATCH_LOG_GROUP', fallback=True)
        create_cloudwatch_log_stream = conf.get('logging', 'CREATE_CLOUDWATCH_LOG_STREAM', fallback=True)
        CLOUDWATCH_REMOTE_HANDLERS: Dict[str, Dict[str, str]] = {
            'task': {
                'class': 'airflow.providers.amazon.aws.log.cloudwatch_task_handler.CloudwatchTaskHandler',
                'formatter': 'airflow',
                'base_log_folder': str(os.path.expanduser(BASE_LOG_FOLDER)),
                'log_group_arn': urlparse(REMOTE_BASE_LOG_FOLDER).netloc,
                'filename_template': FILENAME_TEMPLATE,
                'create_log_group': create_cloudwatch_log_group,
                'create_log_stream': create_cloudwatch_log_stream,
            },
        }

@ephraimbuddy
Copy link
Contributor

cc: @mik-laj

@anaynayak
Copy link
Contributor Author

@ephraimbuddy / @mik-laj let me know if I should do the changes mentioned in #19022 (comment)

@anaynayak anaynayak force-pushed the cloudwatch-log-group-stream branch from b2880f1 to 6cdb3ea Compare October 24, 2021 08:19
@uranusjr
Copy link
Member

uranusjr commented Oct 25, 2021

Another way to do this would be to simply use query parameters like how we handle database URLs. The user can set REMOTE_BASE_LOG_FOLDER to (say)

cloudwatch://my.cloudwatch.server/?create_log_group=false&create_log_stream=false

Which is (fortunately) possible because log_group_arn only uses the netloc part. But I think using standalone configurations (like your snippet above) is more idiomatic to Airflow, similar to configs for Elastic etc.

@anaynayak
Copy link
Contributor Author

I'll add a commit with the airflow_local_settings.py change. I like the idea that @uranusjr suggested, let me know if I should do that instead.

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
@uranusjr uranusjr requested a review from ephraimbuddy October 27, 2021 14:00
DEFAULT_LOGGING_CONFIG['handlers'].update(S3_REMOTE_HANDLERS)
elif REMOTE_BASE_LOG_FOLDER.startswith('cloudwatch://'):
create_log_group = conf.getboolean('logging', 'CREATE_CLOUDWATCH_LOG_GROUP', fallback=True)
create_log_stream = conf.getboolean('logging', 'CREATE_CLOUDWATCH_LOG_STREAM', fallback=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should allow this setting to be configured. Setting stream creation to anything but True will catastrophically break cloudwatch logging.
In cloudwatch Log Groups are like folders and streams are like files. It is possible to pre-create the log group "folder" but the stream log "files" are named on the fly after the task instances themselves (just like the default Airflow local filesystem logging). There's no (reasonable) way you could pre-create those stream names and thus watchtower, if not allowed to create the streams itself, will fail to log any task logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the log group name is already configurable through the arn. And it should be harmless for watchtower to attempt to re-create it (it correctly handles the resource already existing exception). Your original issue was an authentication issue, not an idempotentcy issue.

The only reason I can think of that you'd ever want to restrict log group creation is if the aws role/user you are using does not, and can not, have that policy permission allowed. However the role still needs stream creation permissions (see above) so I don't see why one would ever restrict log group creation but not stream creation. It's a very rare case that I don't think is worth the extra complexity to handle IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the original issue, I assumed that the issue was around a restrictive role since the log group was already created via Terraform. Based on what you've said, The bits around log stream definitely make sense. I'll remove that as an option.

If the option to not create the log group isn't required, I can close the PR and we can create a new one when required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, great discussion :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if this is the wrong place, but here feels appropriate - feel free to let me know otherwise.

I'd like to make the case for being able to disable creation of the log group (but not the log stream).

We recently noticed a lot of errors (multiple times a minute in some cases) causing our tasks to fail to start, along the lines of "Failed to execute task An error occurred (ThrottlingException) when calling the CreateLogGroup operation (reached max retries: 4): Rate exceeded." which led us to find the AWS service quota for "CreateLogGroup throttle limit in transactions per second" which is 5 per second by default, and we were frequently in breach of that (at peak times we run a lot of tasks!).

If my understanding is correct, every time a task runs, watchtower will make a CreateLogGroup request and pass on a ResourceAlreadyExistsException if it exists. If the log group never changes, this part shouldn't be necessary - though as @o-nikolas mentioned, it needs to be able to create a unique log stream for each task. FYI the default AWS quota for that is 50 per second (called "CreateLogStream throttle limit in transactions per second"), which allows for a lot more headroom.

It would be nice to have a way to prevent these thousands of requests to create a log group that already exists, or even better, perhaps it could attempt to create the log group on startup, then each task wouldn't need to make the request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik-davis
Hey Nik, This is a good callout and it's something we ran into while developing MWAA (which leverages Cloudawatch for all log types and thus creates a LOT of log groups and log stream requests). The correct solution here is to actually not call create log group until you're sure it doesn't exist. Which involves first calling describe_log_groups (which has a much higher quota) and if the log group in question doesn't exist, then create it. This is what we do in MWAA and this is what the Watchtower package should be doing as well. And in fact, it is what they do here since v2.0.0. And I recently upgraded Airflow to use v2.0.1 here. So this problem should already be solved!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thank you for the detailed response. We're a bit behind latest and have Airflow v2.1.2 with Watchtower v1.0.6 installed, so sounds like we just need to upgrade. Thanks again!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries :)
Let me know if you spot any other possible issues!

@anaynayak
Copy link
Contributor Author

Closing this one based on the conversation here #19022 (comment)

@anaynayak anaynayak closed this Oct 30, 2021
@anaynayak anaynayak deleted the cloudwatch-log-group-stream branch November 4, 2021 05:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CloudWatch CreateLogGroup error on already existing LogGroup

5 participants