Description
Hi, and thank you for this project
I'm spending quite a bit of time trying to understand how ExcludeAtMatch and IncludeAtMatch are intended to work. Despite numerous attempts using different regex combinations — with and without anchors (^) — I haven't been able to successfully exclude the logs I want.
What I'm trying to achieve
I want to exclude, at the source, all ELB and CloudFront logs with status codes in the 2xx or 3xx range (they are stored in an S3 bucket, if that matters).
What I’ve tried
I've written a unit test suite to verify my regexes against actual log lines (redacted), but they do not seem to work as expected when deployed.
import os
import re
import unittest


class TestFilterLogs(unittest.TestCase):
    # Read regexp from file content
    with open(os.path.join(os.path.dirname(__file__), "cloudfront_exclude_regex.txt")) as f:
        cloudfront_exclude_regex_str = f.read().rstrip()
    # Regexp to match logs that should be excluded in CloudFront
    cloudfront_exclude_regex = re.compile(cloudfront_exclude_regex_str)

    # Regexp to match logs that should be excluded in ELB
    # Read regexp from file content
    with open(os.path.join(os.path.dirname(__file__), "elb_exclude_regex.txt")) as f:
        elb_exclude_regex_str = f.read().rstrip()
    elb_exclude_regex = re.compile(elb_exclude_regex_str)

    # Union of both regexes, as it would be set in the Lambda configuration
    joined_regex_str = f"({cloudfront_exclude_regex_str})|({elb_exclude_regex_str})"
    global_regex = re.compile(joined_regex_str)

    # Globally excluded logs
    excluded_logs = [
        '2025-07-16 10:44:58 MXP63-P3 65451 1.2.3.102 GET xxxxxxxxxxxxx.cloudfront.net /pro.html 200 <redacted>',
        '2025-07-16 10:51:57 ARN56-P1 1092 1.2.3.187 POST yyyyyyyyyyyyyyy.cloudfront.net /blog/wp-cron.php 307 - <redacted>',
        '2025-07-16 10:54:56 MRS52-P1 61878 1.2.3.102 GET xxxxxxxxxxxxx.cloudfront.net /tape.html 200 <redacted>',
        '2025-07-16 11:11:56 MXP63-P3 7362 1.2.3.55 GET yyyyyyyyyyyyyyy.cloudfront.net /products/yyyy/related_products.turbo_stream 200 <redacted>',
        'https 2025-07-16T11:48:45.741692Z app/my-alb/edc98xxxx 1.2.3.141:57720 1.2.3.132:3000 0.001 0.021 0.000 200 200 3970 2071 "GET https://<redacted>',
    ]

    def test_joined_regex(self):
        for log in self.excluded_logs:
            self.assertTrue(re.match(self.global_regex, log), f"Log did not match: {log}")

    def test_exclude_elb(self):
        example_logs = [
            "This is not a REPORT log",
            # Kept
            'https 2025-07-15T11:59:59.059080Z app/my-alb/edc98xxxx 1.2.3.88:53962 1.2.3.115:3000 0.001 0.039 0.000 404 404 397 297 <redacted>',
            # Excluded 200
            'https 2025-07-16T08:45:01.556923Z app/my-alb/edc98xxxx 1.2.3.141:45808 1.2.3.102:3000 0.002 0.035 0.000 200 200 4216 2682 <redacted>',
            # Excluded 304 with https protocol
            'https 2025-07-16T10:05:59.805417Z app/my-alb/edc98xxxx 1.2.3.69:19324 1.2.3.82:3000 0.002 0.069 0.000 304 304 1849 791 <redacted>',
            # Excluded 200 with h2 protocol
            'h2 2025-07-16T10:02:02.308637Z app/my-alb/edc98xxxx 1.2.3.15:13228 1.2.3.179:8080 0.002 0.013 0.000 200 200 3249 363 <redacted>',
        ]
        self.assertFalse(re.match(self.elb_exclude_regex, example_logs[1]))  # The 404
        self.assertTrue(re.match(self.elb_exclude_regex, example_logs[2]))  # The 200
        self.assertTrue(re.match(self.elb_exclude_regex, example_logs[3]))  # The 304
        self.assertTrue(re.match(self.elb_exclude_regex, example_logs[4]))  # The 200 with h2 protocol

    def test_exclude_cloudfront(self):
        example_logs = [
            "This is not a REPORT log",
            '2025-07-16 09:16:53 FRA53-C1 9193 1.2.3.206 GET yyyyyyyyyyyyyyy.cloudfront.net /products/yyyy 200 <redacted>',
            '2025-07-16 09:21:54 MXP63-P3 2747 1.2.3.56 GET yyyyyyyyyyyyyyy.cloudfront.net /products/track_product_view.turbo_stream 200 <redacted>',
            '2025-07-16 09:16:45 ARN56-P1 1096 1.2.3.187 POST yyyyyyyyyyyyyyy.cloudfront.net /blog/wp-cron.php 307 - <redacted>',
            '2025-07-16 09:19:54 MXP63-P3 51886 1.2.3.163 GET xxxxxxxxxxxxx.cloudfront.net /apple-touch-icon-120x120.png 404 - <redacted>',
            '2025-07-15 16:12:57 MXP63-P3 1433 1.2.3.170 GET xxxxxxxxxxxxx.cloudfront.net /blog/some-post/ 503 <redacted>',
        ]
        self.assertTrue(re.match(self.cloudfront_exclude_regex, example_logs[1]))  # The 200
        self.assertTrue(re.match(self.cloudfront_exclude_regex, example_logs[2]))  # The 200
        self.assertTrue(re.match(self.cloudfront_exclude_regex, example_logs[3]))  # The 307 (index fixed, was [2])
        self.assertFalse(re.match(self.cloudfront_exclude_regex, example_logs[4]))  # The 404
        self.assertFalse(re.match(self.cloudfront_exclude_regex, example_logs[5]))  # The 503


if __name__ == "__main__":
    unittest.main()

The regex files contain:
# ELB
^(h2|https?)\s(?P<elb_date_access>\d{4}-\d{2}-\d{2}\S+)\s(?P<elb_name>app/my-alb/\w+)\s(?P<client_address>\S+:\d+)\s(?P<destination_address>\S+:\d+)\s(?P<request_time>[\d.]+)\s(?P<backend_time>[\d.]+)\s(?P<response_time>[\d.]+)\s(?P<elb_status_code>(2|3)\d{2})\s
# CloudFront
^(?P<cf_date_access>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(?P<edge_location>\S+)\s(?P<bytes>\d+)\s(?P<source_ip>\d+\.\d+\.\d+\.\d+)\s(?P<method>GET|POST|PUT|DELETE|HEAD|OPTIONS)\s(?P<domain>\S+)\s(?P<uri>\S+)\s(?P<cf_status_code>(2|3)\d{2})\s
The named capture groups are there only for readability — could they be interfering?
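To rule this out on the Python side, here is a minimal check that a pattern with named groups and the same pattern with non-capturing groups accept and reject the same lines. The patterns and log lines are shortened, illustrative stand-ins for the ones above, and the check only demonstrates the behavior of the standard `re` module; it assumes nothing about how the forwarder compiles or applies the pattern.

```python
import re

# Shortened, illustrative patterns (not the full regexes from this issue).
named = re.compile(r"^(?P<proto>h2|https?)\s\S+\s(?P<status>(2|3)\d{2})\s")
unnamed = re.compile(r"^(?:h2|https?)\s\S+\s(?:[23]\d{2})\s")

# Hypothetical, heavily truncated ELB-like lines.
lines = [
    "https 2025-07-16T08:45:01Z 200 rest",
    "h2 2025-07-16T10:02:02Z 304 rest",
    "https 2025-07-15T11:59:59Z 404 rest",
]

# Whether a line matches does not depend on the groups being named.
named_hits = [bool(named.search(line)) for line in lines]
unnamed_hits = [bool(unnamed.search(line)) for line in lines]
print(named_hits, unnamed_hits)
```

If both lists agree, the named groups themselves are unlikely to be the cause, at least as far as Python's regex engine is concerned.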
I’ve tried:
- Each regex individually
- A combined regex string using joined_regex_str = f"({cloudfront_exclude_regex_str})|({elb_exclude_regex_str})"
- Variants with and without the ^ anchor
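The anchor matters because my unit tests use `re.match` on the bare log line, which may not be what the Lambda does. The sketch below shows how the same pattern behaves if the string being tested is not the raw line. The JSON-like wrapping is purely hypothetical; I do not know what the forwarder actually passes to the regex, which is exactly my question.

```python
import re

# Shortened, illustrative patterns (not the full regexes from this issue).
anchored = re.compile(r"^https\s\S+\s[23]\d{2}\s")
unanchored = re.compile(r"https\s\S+\s[23]\d{2}\s")

raw = "https 2025-07-16T08:45:01Z 200 rest"
# Hypothetical: the same line embedded in a larger string.
wrapped = '{"message": "' + raw + '"}'

results = {
    "anchored search on raw": bool(anchored.search(raw)),
    "anchored search on wrapped": bool(anchored.search(wrapped)),
    "unanchored search on wrapped": bool(unanchored.search(wrapped)),
    # re.match always starts at position 0, even without '^'.
    "unanchored match on wrapped": bool(unanchored.match(wrapped)),
}
for name, hit in results.items():
    print(f"{name}: {hit}")
```

So a pattern that passes my tests could still fail in production if the tested string has any prefix, regardless of whether `^` is present when `match` is used.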
Questions
- What exactly is passed to the regex in filter_logs? Is it the raw log line or a parsed message?
- Are named groups supported or should I remove them?
- Could you provide more documentation or examples on how ExcludeAtMatch/IncludeAtMatch work behind the scenes?
Side note: it would be very nice to support multiple regexes in the Lambda configuration, to avoid unions and other complex regexes.
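To illustrate what I mean, here is a rough sketch of the kind of behavior I have in mind: a list of patterns, where a line is excluded if any of them matches. The function names `compile_patterns` and `matches_any` are hypothetical and not part of the forwarder; the patterns are shortened stand-ins.

```python
import re
from typing import Iterable, List, Pattern


def compile_patterns(patterns: Iterable[str]) -> List[Pattern[str]]:
    """Compile each pattern separately so a typo is easy to trace to one entry."""
    return [re.compile(p) for p in patterns]


def matches_any(compiled: List[Pattern[str]], line: str) -> bool:
    """Return True if at least one pattern is found anywhere in the line."""
    return any(p.search(line) for p in compiled)


# Shortened, illustrative exclusion patterns (one per log source).
exclude = compile_patterns([
    r"^(?:h2|https?)\s\S+\s[23]\d{2}\s",   # ELB-like line
    r"^\d{4}-\d{2}-\d{2}\s\S+\s[23]\d{2}\s",  # CloudFront-like line
])

elb_ok = matches_any(exclude, "h2 2025-07-16T10:02:02Z 200 rest")
cf_ok = matches_any(exclude, "2025-07-16 09:16:53 307 rest")
kept = matches_any(exclude, "https 2025-07-15T11:59:59Z 404 rest")
print(elb_ok, cf_ok, kept)
```

Keeping the patterns separate would make each one testable on its own, without the alternation and extra parentheses of a joined regex.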
Run the Lambda locally?
I would love to test the Lambda locally with real log lines (but without sending anything to Datadog).
Is there a quick and recommended way to do that?
I tried invoking the handler manually but couldn’t figure out the right structure for the input event or context.
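In the meantime, the closest thing I have is a standalone harness like the sketch below, which reads a log payload and applies an include and an exclude pattern without touching Datadog or the forwarder code. Everything in it is an assumption on my side: the function names `read_lines` and `apply_filters` are hypothetical, the use of `re.search`, the "exclude wins over include" rule, and the skipping of `#` header lines are guesses, and the payload is made up.

```python
import gzip
import re
from typing import Iterable, Iterator, List, Optional


def read_lines(raw: bytes) -> Iterator[str]:
    """Yield text lines from a gzip-compressed or plain payload."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    for line in raw.decode("utf-8", errors="replace").splitlines():
        # Assumption: skip empty lines and '#'-prefixed header lines.
        if line and not line.startswith("#"):
            yield line


def apply_filters(
    lines: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Assumed semantics: drop lines matching `exclude`; if `include` is set,
    keep only lines matching it. Exclusion takes precedence."""
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    kept = []
    for line in lines:
        if exc and exc.search(line):
            continue
        if inc and not inc.search(line):
            continue
        kept.append(line)
    return kept


# Made-up payload standing in for a log file downloaded from the bucket.
payload = gzip.compress(b"#Version: 1.0\nA 200 x\nB 404 x\nC 304 x\n")
kept = apply_filters(read_lines(payload), exclude=r"\s[23]\d{2}\s")
print(kept)
```

This tells me whether my regexes behave as intended on real files, but not whether the Lambda sees the same strings, so a supported way to run the real handler locally would still be very helpful.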
Thanks in advance for your help!