Replies: 4 comments
-
I am not sure. The solution is based on Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis. It is possible that subscribing Firehose directly wasn't available in 2018. It's also possible
-
Hi Amir (you really are quick in answering), thanks for the straight answer. At the moment I also like the idea of having it in front of Firehose. There are not many solutions to start with, and I stumbled upon Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis. (AWS always has those 60–70% solutions in their blogs; they give you nice ideas, but when I implement them I always find loads of duct-tape engineering and loads of `iam:*`.) I chose to get inspiration from your solution instead.

The bit I am currently trying to figure out is how to partition the data in a centralised S3 bucket. At the moment this solution gives you a file with the name of the log destination (zipped if you do not use the processing Lambda). I think a prefix on the S3 destination is the way to go, but at the moment I can't seem to get it to work. I tried to add metadata in the processing Lambda:

```python
partition_keys = {
    "owner": data["owner"],
    "logGroup": data["logGroup"],
}
[....]
yield {
    "recordId": record_id,
    "metadata": {"partitionKeys": partition_keys},
    "data": data,
    "result": "Ok",
}
```

as mentioned here: https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html. I also frankensteined your Lambda to get reingestion (plus a sprinkle of https://docs.powertools.aws.dev/lambda/python/latest for logging, metrics and tracing). But I can't seem to get the hang of the partitioning yet.
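To make the idea above concrete, here is a minimal self-contained sketch of such a processing Lambda, assuming the standard CloudWatch Logs subscription payload shape (base64 + gzip + JSON with `owner`, `logGroup`, `logEvents`); the function name and the newline-joining of messages are illustrative choices, not this repo's actual implementation:

```python
import base64
import gzip
import json


def handler(event, context):
    """Firehose processing Lambda sketch: decompress CloudWatch Logs
    payloads and attach partition keys for dynamic partitioning."""
    output = []
    for record in event["records"]:
        payload = gzip.decompress(base64.b64decode(record["data"]))
        data = json.loads(payload)

        if data.get("messageType") != "DATA_MESSAGE":
            # Control messages (e.g. the subscription test) carry no log events.
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        # One output line per log event, newline-delimited for Athena.
        lines = "".join(e["message"] + "\n" for e in data["logEvents"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("utf-8"),
            # These keys are what !{partitionKeyFromLambda:...} resolves against.
            "metadata": {
                "partitionKeys": {
                    "owner": data["owner"],
                    "logGroup": data["logGroup"],
                }
            },
        })
    return {"records": output}
```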
Well, currently I am not. Maybe for others stumbling upon this: daisy-chaining Kinesis processors sadly does not work for CloudWatch subscriptions. I naively thought: Decompression (GZIP) -> RecordDeAggregation (JSON) -> MetadataExtraction (JQ), but it looks like the Lambda is the way. If it is of interest to you, I can update this thread should I come up with a solution.
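For anyone wanting to reproduce the attempt: expressed as a Firehose `ProcessingConfiguration`, the chain would look roughly like this. The processor types are real Firehose API values, but treat the parameter names and the JQ query as my unverified reading of the docs; again, this chain does not work for CloudWatch subscription payloads.

```python
# The processor chain I attempted, as a Firehose ProcessingConfiguration
# (boto3/CloudFormation shape). Shown only to document the attempt --
# it does NOT work for CloudWatch Logs subscription payloads.
# The JQ query is a hypothetical example.
ATTEMPTED_PROCESSORS = {
    "Enabled": True,
    "Processors": [
        {"Type": "Decompression",
         "Parameters": [{"ParameterName": "CompressionFormat",
                         "ParameterValue": "GZIP"}]},
        {"Type": "RecordDeAggregation",
         "Parameters": [{"ParameterName": "SubRecordType",
                         "ParameterValue": "JSON"}]},
        {"Type": "MetadataExtraction",
         "Parameters": [
             {"ParameterName": "MetadataExtractionQuery",
              "ParameterValue": "{owner: .owner, logGroup: .logGroup}"},
             {"ParameterName": "JsonParsingEngine",
              "ParameterValue": "JQ-1.6"},
         ]},
    ],
}
```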
-
Pretty much exactly why I created this 😅
Hard to tell, as it seems the currently pushed code doesn't have it enabled right now. The only thing I can see is that the partition key metadata is defined in the reingestion record and not in the yielded record, as in your code above. I vaguely recall there was some Kinesis log I used to debug processor-function issues; maybe that will have something useful for you?
I think other people may end up here just like you did and may have similar requests. It will be good to share, for sure.
-
Ok, I am a bit stupid, or just at the end of my productivity cap for today. It is stated in the documentation: https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html

Processor prefix:

```
prefix = "owner=!{partitionKeyFromQuery:owner}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
```

Lambda prefix:

```
prefix = "owner=!{partitionKeyFromLambda:owner}/logGroup=!{partitionKeyFromLambda:logGroup}/"
```

Processing Lambda -> `partitionKeyFromLambda`, duh. Well, these are the moving pieces, in place (ok, plus the time thing `!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}` and some prepared Athena queries). I will clean up my attempts and update here. I will check if I can revert to the processing-Lambda version of this repo and only add the patch for metadata; then I can also answer whether it needs reingestion or not. Right now I was just looking for a quick and dirty understanding of how this could work.
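Putting those pieces together, the S3 destination side would look roughly like this; a sketch of the `ExtendedS3DestinationConfiguration` dict you could pass to boto3's `firehose.create_delivery_stream`, assuming the ARNs are placeholders and the prefix layout is the one discussed above:

```python
def s3_destination_config(bucket_arn, processor_lambda_arn):
    """Sketch of an ExtendedS3DestinationConfiguration using
    partitionKeyFromLambda keys plus timestamp partitions.
    ARNs are placeholders supplied by the caller."""
    prefix = (
        "owner=!{partitionKeyFromLambda:owner}/"
        "logGroup=!{partitionKeyFromLambda:logGroup}/"
        "year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
    )
    return {
        "BucketARN": bucket_arn,
        "Prefix": prefix,
        # Firehose requires an error prefix when namespaced prefixes are used.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        # Must be enabled for any partitionKeyFrom* expression to resolve.
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [{
                    "ParameterName": "LambdaArn",
                    "ParameterValue": processor_lambda_arn,
                }],
            }],
        },
    }
```

The dict is plain data, so it can be unit-tested without touching AWS and then fed to `create_delivery_stream(ExtendedS3DestinationConfiguration=...)`.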
-
Hi,
First of all, cool solution!

You use a Kinesis Stream, `LogStream`, as a source in the Kinesis Firehose `DeliveryStream`. `LogProcessorFunction` is on Firehose, and the S3 destination is on Firehose. What is the advantage of adding the Kinesis Stream `LogStream` and using it as an `AWS::Logs::Destination`, instead of pointing at the Kinesis Firehose directly? From the documentation: CrossAccountSubscriptions

The documentation there does not go into when you should use which. I would like to know if you had a reason for the combination of both. If so, it makes sense to put it in the README.md.
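For context on the question: as far as I can tell, the cross-account destination itself is agnostic about its target, since `put_destination` accepts either a Kinesis stream ARN or a Firehose ARN as `targetArn`. A minimal sketch (all names and ARNs are placeholders), assuming boto3:

```python
def logs_destination_args(name, target_arn, role_arn):
    """Build kwargs for logs.put_destination(). The destination target
    may be a Kinesis stream OR a Firehose delivery stream ARN.
    All values are caller-supplied placeholders."""
    return {
        "destinationName": name,
        "targetArn": target_arn,  # Kinesis stream or Firehose ARN
        "roleArn": role_arn,      # role CloudWatch Logs assumes to write
    }


kinesis_args = logs_destination_args(
    "central-logs",
    "arn:aws:kinesis:eu-west-1:111111111111:stream/LogStream",
    "arn:aws:iam::111111111111:role/CWLtoKinesisRole",
)
# boto3.client("logs").put_destination(**kinesis_args)  # not run here
```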