WIP: ECS Fluent Bit Daemon Collector #499

Draft · wants to merge 16 commits into base: mainline
Conversation

PettitWesley (Contributor):
Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@PettitWesley PettitWesley requested a review from a team as a code owner December 15, 2022 00:02
@PettitWesley PettitWesley marked this pull request as draft December 15, 2022 00:02
region ${REGION_TASK}
upload_timeout 2m
total_file_size 10M
s3_key_format /aws/ecs/${CLUSTER_NAME}/$TAG[0]/$TAG[1]/$TAG[2]/$TAG[3]/$TAG[4]/%Y/%m/%d/%H/%M/%S
PettitWesley:

Matt: do we really want pseudo-directories with / in the timestamp? Maybe only do it down to the day, not below that.
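If we trim the granularity to the day, the key format might look like this (just a sketch; the time-of-day could move into the object name instead of sub-directories):

```
s3_key_format /aws/ecs/${CLUSTER_NAME}/$TAG[0]/$TAG[1]/$TAG[2]/$TAG[3]/$TAG[4]/%Y/%m/%d/log-%H-%M-%S
```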

region ${REGION_TASK}
upload_timeout 2m
total_file_size 10M
s3_key_format /aws/ecs/${CLUSTER_NAME}/task/failed-to-retrieve-metadata-for/$TAG/%Y/%m/%d/%H/%M/%S
PettitWesley:

If I have two rewrite_tag filters, one for the fallback logs, I could probably still add the instance ID to them, making it at least somewhat possible to figure out where the failed logs came from.
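A sketch of what that fallback rewrite_tag filter could look like (the app-fallback match pattern and the INSTANCE_ID env var are assumptions, not part of this PR):

```
[FILTER]
    Name    rewrite_tag
    Match   app-fallback.*
    # Embed the instance ID in the new tag so failed logs stay traceable
    Rule    $log .* fallback.${INSTANCE_ID}.$TAG false
```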

PettitWesley:

Need to update the timestamp here as well.

log_group_name /aws/ecs/${CLUSTER_NAME}/application/fallback-group-for-failed-metadata-templating
# fallback stream to log_stream_prefix + tag (log file name) if templating fails
log_stream_prefix json-log-driver-file-
region ${REGION_TASK}
PettitWesley:

bingfeng: So this enables cross-region logging, but what if customers want cross-account logging?

wesley: That would require a different output file.

PettitWesley:

Cross-account delivery requires adding a new parameter to the config:

log_stream_prefix      json-log-driver-file-
region                 ${REGION_TASK}
role_arn              ${ROLE_ARN_TASK}

which then can't be empty or it will fail validation.

PettitWesley:

We can use the EKS Helm chart as inspiration:

log_stream_prefix      json-log-driver-file-
region                 ${REGION_TASK}
${EXTRA_CLOUDWATCH_TASK}

It's safe for EXTRA_CLOUDWATCH_TASK to be left empty.

Need to check if this really works:

set EXTRA_CLOUDWATCH_TASK="role_arn my-role"
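The reason an empty EXTRA_CLOUDWATCH_TASK can be safe is that an unset variable would just expand to an empty line. A minimal Python sketch of that ${VAR}-style expansion (a simplified model of the substitution, not Fluent Bit's actual implementation):

```python
import os
import re

def expand(text):
    # Replace ${VAR} with the environment value, or "" when the variable is
    # unset, which is why an optional ${EXTRA_CLOUDWATCH_TASK} line is safe.
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["REGION_TASK"] = "us-west-2"
os.environ.pop("EXTRA_CLOUDWATCH_TASK", None)  # customer did not set it

conf = """log_stream_prefix json-log-driver-file-
region ${REGION_TASK}
${EXTRA_CLOUDWATCH_TASK}"""
print(expand(conf))
```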

PettitWesley:

need to think about the experience

PettitWesley:

log_stream_prefix      json-log-driver-file-
region                 ${REGION_TASK}
${OPTIONAL_ROLE_ARN}
${OPTIONAL_LOG_KEY}
${OPTIONAL_LOG_RETENTION}

All of these can, I THINK (but need to check), be empty without breaking anything.

PettitWesley:

need to check.

PettitWesley:

Basically, init-style templating of the config.

Contributor:

Templating Documentation

One option discussed in today's meeting which I can put here for documentation is as follows:
Have a single environment variable called "init-config" which can be set to a JSON string. JSON is great because any JSON can be reduced to a one-line string, and thus to one environment variable.

It may take some more thinking to settle on a good JSON schema, but the following would work:

Configuration JSON

{
    /* These are global configuration key value pairs */
    "config": {
        "param1": "val"
    },
    "options": [
         /* Here is a list of options that the customer can configure separately. Name describes the option they are opting into, config is the option's setting */
         {
            "name": "host",
            /* Each config has a default setting, so config is completely optional */
            "config": {
                "destinations": ["s3", "cloudwatch"]
            }
         },
         {
            "name": "host",
            "config": {
                "destination": "cloudwatch",
                "region": "auto" /* Each config gets translated into a template model which gets translated to the template. Non trivial config options should be considered */
            }
         }
    ]
}
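Parsing that one-line JSON from a single environment variable would be straightforward; a minimal sketch (the INIT_CONFIG variable name is an assumption; the proposal just calls it "init-config"):

```python
import json
import os

# Hypothetical env var carrying the whole configuration as one-line JSON.
os.environ["INIT_CONFIG"] = (
    '{"config": {"param1": "val"}, '
    '"options": [{"name": "host", "config": {"destinations": ["s3", "cloudwatch"]}}]}'
)

init_config = json.loads(os.environ["INIT_CONFIG"])
global_config = init_config.get("config", {})
# Map option name -> its per-option config, falling back to {} so defaults apply.
enabled = {opt["name"]: opt.get("config", {}) for opt in init_config.get("options", [])}
print(enabled["host"]["destinations"])
```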

Fluent Bit Config Generators

Instead of having the init process stitch together raw config files, we introduce the concept of data models and config-file templates.

A process on startup parses the JSON config and generates a set of configuration files which are used to start up Fluent Bit (similar to the init process, but with the added step of generating the config files from templates).

Templates
We can use a widely used templating engine to generate our Fluent Bit config files adaptively from our config file.
I recommend Mustache: https://mustache.github.io/
It's great: super flexible, and it allows for conditional and looped blocks of template content, which will no doubt come in handy for some future features. See the demo: https://mustache.github.io/#demo
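As a toy illustration of why conditional sections help here, a minimal Mustache-like renderer (an approximation for illustration, not the real Mustache library) that drops an optional role_arn line whenever the data model omits it:

```python
import re

def render(template, model):
    """Toy Mustache-style renderer: {{var}} substitution plus
    {{#section}}...{{/section}} blocks kept only when model[name] is truthy."""
    def section(m):
        name, body = m.group(1), m.group(2)
        # Render the body recursively if the key is truthy, else drop it.
        return render(body, model) if model.get(name) else ""
    template = re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", section, template, flags=re.S)
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(model.get(m.group(1), "")), template)

tpl = """[OUTPUT]
    Name   cloudwatch_logs
    region {{region}}
{{#role_arn}}    role_arn {{role_arn}}
{{/role_arn}}"""
print(render(tpl, {"region": "us-west-2"}))
```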

Data Models
A Mustache template needs a data model to render into its result. Data models are just objects. We can generate a data model for each template.

Generator Structure
This could be how we store the template rendering logic and organize our generator scripts

{
    "options": [
        /* template providers for the options we can enable */
        {
            "name": "<OptionName>" /* this option name corresponds to the options that can be enabled */
            "synthesizeDataModel": function(globalConfig, optionConfig) => someObject, /* a function that makes our template data model */
            "templates": [
                /* A list of templates that get processed with our data model and sent to the output destination conf file */
                {
                    "path": "/anyPathToOurTemplate",
                    "destination": "someDestinationConfigFile" /* append to destination file, maybe /parser.conf or /fluent-bit.conf */
                },
               ...
            ]
        }
    ]
}
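A minimal Python sketch of that generator loop (the names here, such as synthesize and OPTIONS, are hypothetical, and templates are inlined with simple format substitution instead of read from paths and rendered with a real engine like Mustache):

```python
def synthesize_host_model(global_config, option_config):
    # Hypothetical data-model builder for the "host" option.
    return {"region": option_config.get("region") or global_config.get("region", "us-east-1")}

OPTIONS = {
    "host": {
        "synthesize": synthesize_host_model,
        # (template text, destination config file) pairs.
        "templates": [
            ("[OUTPUT]\n    Name   cloudwatch_logs\n    region {region}\n",
             "fluent-bit.conf"),
        ],
    },
}

def generate(global_config, enabled_options):
    outputs = {}  # destination file -> accumulated config text
    for opt in enabled_options:
        provider = OPTIONS[opt["name"]]
        model = provider["synthesize"](global_config, opt.get("config", {}))
        for template, destination in provider["templates"]:
            # Append each rendered template to its destination file's buffer.
            outputs[destination] = outputs.get(destination, "") + template.format(**model)
    return outputs

files = generate({"region": "us-west-2"}, [{"name": "host"}])
print(files["fluent-bit.conf"])
```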

(I highly recommend TypeScript, because I love TS and could code this efficiently, but adding Node to our image could introduce security-scan vulnerabilities that we may have to patch, which is unfortunate.)

Note: I actually don't like the proposed schemas and structures; they're just a general sketch of some initial thoughts.

PettitWesley:

Just having the log_key option, either via this or just another config file, for the task logs, is something I kind of want to have even just for the first launch 🤔

PettitWesley:

I think the log_key approach is what a lot of users want for their container logs. They don't want them as JSON.

@@ -0,0 +1,12 @@
[INPUT]
PettitWesley:

I keep forgetting my own guidance here: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#tail-input-duplicates-logs-because-of-rotation

I need to double check and consider all of these. For each, there's a trade-off:

  1. If we match rotated log files, then we will get duplicates
  2. If we do not match all of the existing rotated log files at the time when Fluent Bit starts, then we won't get those old logs. That's probably okay, but it would be cool if there were a solution for getting the old logs too. The ECS log collector script will grab all old logs, IIRC: https://github.com/aws/amazon-ecs-logs-collector
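A sketch of the second trade-off's config, matching only the live agent log file so rotated files never produce duplicates (the exact rotation suffix is an assumption):

```
[INPUT]
    Name  tail
    Tag   ecs.*
    # Live file only; rotated files such as ecs-agent.log.<timestamp> are skipped
    Path  /var/log/ecs/ecs-agent.log
```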

PettitWesley:

I think this is something to just note in the debugging/DIY part of the guide

Name tail
Tag host.*
Path /var/log/dmesg*
Parser syslog
PettitWesley:

I think for this daemon service template, putting the dmesg log inside a key called "log" is all we want... so just remove the parser here.

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
@PettitWesley PettitWesley force-pushed the ecs-fluent-bit-daemon-collector branch from af78f68 to 4901138 Compare January 4, 2023 01:49
Signed-off-by: Wesley Pettit <wppttt@amazon.com>
Comment on lines 5 to 8
Docker_Mode On
Docker_Mode_Flush 5
Docker_Mode_Parser container_firstline
Parser docker
PettitWesley:

As I note here, this is the old parser config; we can/should use the new one which supports Docker: aws-samples/amazon-cloudwatch-container-insights#115 (comment)
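For reference, the new-style input could look roughly like this (a sketch based on the linked comment, using the built-in `multiline.parser docker`):

```
[INPUT]
    Name              tail
    Tag               app.*
    Path              /var/lib/docker/containers/*/*.log
    multiline.parser  docker
```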

[INPUT]
Name tail
Tag ecs.*
Path /var/log/ecs/ecs-agent.log*
PettitWesley:

As noted elsewhere, need to not match rotated agent log files

…uplicates

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
Name tail
Tag host.*
Path /var/log/messages
Parser syslog
PettitWesley:

TODO: Check if this parser is correct and actually most ideal for all of these logs

aws-samples/amazon-cloudwatch-container-insights#108 (comment)

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
@@ -0,0 +1,23 @@
[PARSER]
PettitWesley:

TODO: I think we can just use the built-in docker parser. Only syslog may be needed here. Testing this.

"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/aws/ecs/{INSERT CLUSTER_NAME}/collector",
PettitWesley:

and in all other templates

(Assuming ECS accepts this option and doesn't validate it against an outdated list of options that doesn't include it...)

the new `multiline.parsers docker` does everything we need
I checked that it handles logs split by runtime at the 16KB limit
and also that it parses timestamps correctly from the docker log file

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
{
"sourceVolume": "fluent-bit-buffer-directory",
"containerPath": "/var/fluent-bit",
"readOnly": true
PettitWesley:

Need to fix this everywhere: the buffer and state dirs should not be read-only.

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
"awslogs-group": "/aws/ecs/{INSERT CLUSTER_NAME}/collector",
"awslogs-region": "{INSERT REGION}",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "fluent-bit-stdout-stderr-"
PettitWesley:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html

prefix-name/container-name/ecs-task-id

So I need to remove the dash here

PettitWesley:

and maybe just make the prefix stdout-stderr, since the container name is fluent-bit-daemon-collector

"options": {
"awslogs-group": "/aws/ecs/{INSERT CLUSTER_NAME}/collector",
"awslogs-region": "{INSERT REGION}",
"awslogs-create-group": "true",
PettitWesley:

I should remove this / set it to false. Log groups are normally managed by infrastructure-as-code outside of tasks so that they can be easily cleaned up.

Also, the AmazonECSTaskExecutionRolePolicy does not include create-group permissions, so if I include this, customers would need extra permissions on their execution or instance role.

Instead, the log group can be created with the AWS CLI or in the CFN template.
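For example, the group could be pre-created as a CloudFormation resource (the logical name and retention value here are assumptions):

```yaml
CollectorLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: !Sub /aws/ecs/${ClusterName}/collector
    RetentionInDays: 30
```

Or with the CLI: `aws logs create-log-group --log-group-name /aws/ecs/<cluster>/collector`.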

"awslogs-stream-prefix": "fluent-bit-stdout-stderr-"
}
},
"environment": [
PettitWesley:

Need to add an env var here for IMDS version used by AWS Filter.

PettitWesley:

Actually, since we use host mode, it's always safe to use V2 with the default hop limit of one, so I'll just use that. No need to worry about this; V2 is always enabled by default now.

[FILTER]
Name aws
Match task.var.lib.docker.containers.*
imds_version v1
PettitWesley:

In all config files, the IMDS version used by the AWS filters should be configured by the same single env var setting, which defaults to v1.

PettitWesley:

Actually, since we use host mode, it's always safe to use V2 with the default hop limit of one, so I'll just use that. No need to worry about this; V2 is always enabled by default now.

log_group_name /aws/ecs/${CLUSTER_NAME}/application/fallback-group-for-failed-metadata-templating
# fallback stream to log_stream_prefix + tag (log file name) if templating fails
log_stream_prefix json-log-driver-file-
region ${REGION_TASK}
PettitWesley:

I think these should be LOG_REGION_X, or all just LOG_REGION.

[INPUT]
Name tail
Tag app.*
Path /var/lib/docker/containers/*/*.log
PettitWesley:

I've tested that adding:

    Exclude_Path        *container-cached*

ignores the container cache log files, so we only pick up containers that explicitly configure the json-file log driver.

PettitWesley:

Except I saw on my instance that the ECS Agent seems to somehow have non-cache log files; I'm not sure how this happened:

(screenshot: 2023-01-21)

PettitWesley:

This shows that it does indeed work for real cache files; see the excluded debug message:
(screenshot: 2023-01-21)

PettitWesley:

I did a simplified test where I scanned one container sub-directory on my test instance ^

PettitWesley:

I checked the agent thing on another instance; it seems the agent typically has json-file log driver files. This is strange and unexpected.

I should look into this / ask the ECS Agent team whether it's on purpose. I don't think it is.


[FILTER]
Name aws
Match ecs.*
imds_version v1
PettitWesley:

TODO: IMDS version needs to be an env var everywhere.

PettitWesley:

Actually, since we use host mode, it's always safe to use V2 with the default hop limit of one, so I'll just use that. No need to worry about this; V2 is always enabled by default now.
