Replies: 4 comments
-
I am not sure. The solution is based on Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis. It is possible that subscribing Firehose directly wasn't available in 2018. It's also possible
-
Hi Amir (you really are quick in answering), thanks for the straight answer. At the moment I also like the idea of having it in front of Firehose. There are not many solutions to start with, and I stumbled upon Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis. (AWS always has those 60–70% solutions in their blogs; they give you nice ideas, but when I implement them I always find loads of duct-tape engineering and loads of `iam:*`.) I chose to get inspiration from your solution instead.

The bit I am currently trying to figure out is how to partition the data in a centralised S3 bucket. At the moment this solution gives you a file with the name of the log destination (zipped if you do not use the processing Lambda). I think a prefix on the S3 destination is the way to go, but at the moment I can't seem to get it to work. I tried to add metadata in the processing Lambda:

```python
partition_keys = {
    "owner": data["owner"],
    "logGroup": data["logGroup"],
}
[....]
yield {
    "recordId": record_id,
    "metadata": {"partitionKeys": partition_keys},
    "data": data,
    "result": "Ok",
}
```

as mentioned here: https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html. I also frankensteined your Lambda to get reingestion (plus a sprinkle of https://docs.powertools.aws.dev/lambda/python/latest for logging, metrics and tracing). But I can't seem to get the hang of the partitioning yet.
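To make the idea above concrete, here is a minimal self-contained sketch of such a processing Lambda, assuming the standard CloudWatch Logs subscription payload shape (base64 + gzip + JSON with `owner`, `logGroup`, `logEvents`); the function name and the newline-joining of messages are illustrative choices, not this repo's actual implementation:

```python
import base64
import gzip
import json


def handler(event, context):
    """Firehose processing Lambda sketch: decompress CloudWatch Logs
    payloads and attach partition keys for dynamic partitioning."""
    output = []
    for record in event["records"]:
        payload = gzip.decompress(base64.b64decode(record["data"]))
        data = json.loads(payload)

        if data.get("messageType") != "DATA_MESSAGE":
            # Control messages (e.g. the subscription test) carry no log events.
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        # One output line per log event, newline-delimited for Athena.
        lines = "".join(e["message"] + "\n" for e in data["logEvents"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("utf-8"),
            # These keys are what !{partitionKeyFromLambda:...} resolves against.
            "metadata": {
                "partitionKeys": {
                    "owner": data["owner"],
                    "logGroup": data["logGroup"],
                }
            },
        })
    return {"records": output}
```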
Well, currently I am not. Maybe for others stumbling upon this: daisy-chaining Kinesis processors sadly does not work for CloudWatch subscriptions. I naively thought: Decompression (GZIP) -> RecordDeAggregation (JSON) -> MetadataExtraction (JQ), but it looks like the Lambda is the way. If it is of interest to you, I can update this thread should I come up with a solution.
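For anyone wanting to reproduce the attempt: expressed as a Firehose `ProcessingConfiguration`, the chain would look roughly like this. The processor types are real Firehose API values, but treat the parameter names and the JQ query as my unverified reading of the docs; again, this chain does not work for CloudWatch subscription payloads.

```python
# The processor chain I attempted, as a Firehose ProcessingConfiguration
# (boto3/CloudFormation shape). Shown only to document the attempt --
# it does NOT work for CloudWatch Logs subscription payloads.
# The JQ query is a hypothetical example.
ATTEMPTED_PROCESSORS = {
    "Enabled": True,
    "Processors": [
        {"Type": "Decompression",
         "Parameters": [{"ParameterName": "CompressionFormat",
                         "ParameterValue": "GZIP"}]},
        {"Type": "RecordDeAggregation",
         "Parameters": [{"ParameterName": "SubRecordType",
                         "ParameterValue": "JSON"}]},
        {"Type": "MetadataExtraction",
         "Parameters": [
             {"ParameterName": "MetadataExtractionQuery",
              "ParameterValue": "{owner: .owner, logGroup: .logGroup}"},
             {"ParameterName": "JsonParsingEngine",
              "ParameterValue": "JQ-1.6"},
         ]},
    ],
}
```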
-
Pretty much exactly why I created this 😅
Hard to tell, as it seems the currently pushed code doesn't have it enabled right now. The only thing I can see is that the partition key metadata is defined in the reingestion record and not in the yielded record, as in your code above. I vaguely recall there was some Kinesis log I used to debug processor-function issues; maybe that will have something useful for you?
I think other people may end up here just like you did and may have similar requests. It will be good to share, for sure.
-
Ok, I am a bit stupid, or just at the end of my productivity cap for today. It is stated in the documentation: https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html

Processor prefix:

```
prefix = "owner=!{partitionKeyFromQuery:owner}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
```

Lambda prefix:

```
prefix = "owner=!{partitionKeyFromLambda:owner}/logGroup=!{partitionKeyFromLambda:logGroup}/"
```

Processing Lambda -> `partitionKeyFromLambda`, duh. Well, these are the moving pieces, in place (ok, plus the time thing `!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}` and some prepared Athena queries). I will clean up my attempts and update here. I will check if I can revert to the processing-Lambda version of this repo and only add the patch for metadata; then I can also answer whether it needs reingestion or not. Right now I was just looking for a quick and dirty understanding of how this could work.
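Putting those pieces together, the S3 destination side would look roughly like this; a sketch of the `ExtendedS3DestinationConfiguration` dict you could pass to boto3's `firehose.create_delivery_stream`, assuming the ARNs are placeholders and the prefix layout is the one discussed above:

```python
def s3_destination_config(bucket_arn, processor_lambda_arn):
    """Sketch of an ExtendedS3DestinationConfiguration using
    partitionKeyFromLambda keys plus timestamp partitions.
    ARNs are placeholders supplied by the caller."""
    prefix = (
        "owner=!{partitionKeyFromLambda:owner}/"
        "logGroup=!{partitionKeyFromLambda:logGroup}/"
        "year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
    )
    return {
        "BucketARN": bucket_arn,
        "Prefix": prefix,
        # Firehose requires an error prefix when namespaced prefixes are used.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        # Must be enabled for any partitionKeyFrom* expression to resolve.
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [{
                    "ParameterName": "LambdaArn",
                    "ParameterValue": processor_lambda_arn,
                }],
            }],
        },
    }
```

The dict is plain data, so it can be unit-tested without touching AWS and then fed to `create_delivery_stream(ExtendedS3DestinationConfiguration=...)`.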
-
Hi,
First of all, cool solution!

You use a Kinesis Stream, `LogStream`, as a source in the Kinesis Firehose `DeliveryStream`. `LogProcessorFunction` is on Firehose, and the S3 destination is on Firehose. What is the advantage of adding the Kinesis Stream `LogStream` and using it as an `AWS::Logs::Destination`, instead of pointing at the Kinesis Firehose directly? From the documentation: CrossAccountSubscriptions

The documentation there does not go into when you should use which. I would like to know if you had a reason for the combination of both. If so, it makes sense to put it in the README.md.
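For context on the question: as far as I can tell, the cross-account destination itself is agnostic about its target, since `put_destination` accepts either a Kinesis stream ARN or a Firehose ARN as `targetArn`. A minimal sketch (all names and ARNs are placeholders), assuming boto3:

```python
def logs_destination_args(name, target_arn, role_arn):
    """Build kwargs for logs.put_destination(). The destination target
    may be a Kinesis stream OR a Firehose delivery stream ARN.
    All values are caller-supplied placeholders."""
    return {
        "destinationName": name,
        "targetArn": target_arn,  # Kinesis stream or Firehose ARN
        "roleArn": role_arn,      # role CloudWatch Logs assumes to write
    }


kinesis_args = logs_destination_args(
    "central-logs",
    "arn:aws:kinesis:eu-west-1:111111111111:stream/LogStream",
    "arn:aws:iam::111111111111:role/CWLtoKinesisRole",
)
# boto3.client("logs").put_destination(**kinesis_args)  # not run here
```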