
feat(parser): add KinesisFirehoseModel #1556

Merged · 7 commits · Oct 10, 2022

Conversation

@ran-isenberg (Contributor) commented Sep 29, 2022

Issue number: #1555

Summary

Add Kinesis Data Firehose event envelope and parser model
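
A minimal usage sketch of the new model and envelope (class names follow this PR; the `Order` payload model and handler names are illustrative, not part of the change):

```python
from typing import List, Optional

from pydantic import BaseModel

from aws_lambda_powertools.utilities.parser import envelopes, event_parser, parse
from aws_lambda_powertools.utilities.parser.models import KinesisFirehoseModel


class Order(BaseModel):
    # Hypothetical business payload carried (base64-encoded) in each record's "data" field
    user_id: str
    score: int


@event_parser(model=KinesisFirehoseModel)
def lambda_handler(event: KinesisFirehoseModel, context):
    # Full event parsed into the new model; each record keeps its Firehose metadata
    for record in event.records:
        print(record.recordId)


def lambda_handler_with_envelope(event: dict, context):
    # Envelope extracts and decodes each record's data, then parses it into Order
    orders: List[Optional[Order]] = parse(event=event, model=Order, envelope=envelopes.KinesisFirehoseEnvelope)
    return len(orders)
```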

Checklist

If any items don't apply to your change, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.


View rendered docs/utilities/parser.md

@ran-isenberg requested a review from a team as a code owner September 29, 2022 16:26
@ran-isenberg requested review from leandrodamascena and removed request for a team September 29, 2022 16:26
@boring-cyborg bot added the area/commons, documentation, and tests labels Sep 29, 2022
@pull-request-size bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Sep 29, 2022
@heitorlessa (Contributor)

@leandrodamascena we might want to do an E2E test for this. Due to the parser's nature, it has the highest risk of causing runtime disruption.

@codecov-commenter commented Oct 3, 2022

Codecov Report

Base: 99.73% // Head: 99.77% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (2512ab9) compared to base (22459e7).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1556      +/-   ##
===========================================
+ Coverage    99.73%   99.77%   +0.03%     
===========================================
  Files          124      126       +2     
  Lines         5718     5757      +39     
  Branches       651      656       +5     
===========================================
+ Hits          5703     5744      +41     
+ Misses           8        6       -2     
  Partials         7        7              
Impacted Files Coverage Δ
...a_powertools/utilities/parser/envelopes/kinesis.py 100.00% <ø> (ø)
aws_lambda_powertools/logging/formatter.py 100.00% <100.00%> (ø)
aws_lambda_powertools/shared/constants.py 100.00% <100.00%> (ø)
aws_lambda_powertools/shared/functions.py 92.59% <100.00%> (+7.40%) ⬆️
..._powertools/utilities/parser/envelopes/__init__.py 100.00% <100.00%> (ø)
...ols/utilities/parser/envelopes/kinesis_firehose.py 100.00% <100.00%> (ø)
...bda_powertools/utilities/parser/models/__init__.py 100.00% <100.00%> (ø)
...mbda_powertools/utilities/parser/models/kinesis.py 100.00% <100.00%> (ø)
...rtools/utilities/parser/models/kinesis_firehose.py 100.00% <100.00%> (ø)


@leandrodamascena (Contributor)

Hii @ran-isenberg, thanks for one more PR 😃. I'm starting to work on it now.

@heitorlessa, adding e2e tests would indeed be a big improvement in the parser area. I just need to check how much data we need to generate in Kinesis Firehose with a Direct PUT source and with a Kinesis Stream source to trigger an event in Lambda. I think we should open a new PR to include e2e tests in the v2 branch while it is not yet the official branch.

@ran-isenberg (Contributor, Author)

+1 for E2E tests, but on the flip side, it's going to make adding new services way harder for newcomers.

@leandrodamascena (Contributor)

> +1 for E2E tests, but on the flip side, it's going to make adding new services way harder for newcomers.

Yeah Ran, it makes sense and I understand your concern. But for now we can write the e2e tests when someone adds a new service or feature, or extends an existing one, and make life easier for new contributors.

@ran-isenberg (Contributor, Author)

> Yeah Ran, it makes sense and I understand your concern. But for now we can write the e2e tests when someone adds a new service or feature, or extends an existing one, and make life easier for new contributors.

You use CDK, right?

@leandrodamascena (Contributor)

> You use CDK, right?

Yeah, we use CDK for e2e testing. But I have some considerations to share about e2e testing before we think about any implementation in the Parser.

  • e2e tests will only work in v2, so in this PR we should not write end-to-end tests for the parser.
  • We have a release date for v2 later this month and we are working hard for it.
  • We believe e2e tests alone are not enough in all cases. For example, if the parser fails because of any change in the payload, we run the risk of breaking the function, since it is a runtime error. We need to write an RFC to think of a solution where we can refresh all Lambda event source payloads every X days/hours, because a payload can change on the service/Lambda side, and if the project isn't updated it can be a problem. Maybe a separate project should deal with this; I'm not sure yet, but we'll start that discussion soon.

@leandrodamascena (Contributor) left a comment

We need to update the code to account for differences between payloads.

Payload using Kinesis Firehose with a Kinesis Stream as source:

{
   "invocationId":"77c0024f-9869-4af0-951a-d83c102e8a63",
   "sourceKinesisStreamArn":"arn:aws:kinesis:us-east-1:200984112386:stream/kinesis-evento",
   "deliveryStreamArn":"arn:aws:firehose:us-east-1:200984112386:deliverystream/KDS-S3-10POr",
   "region":"us-east-1",
   "records":[
      {
         "recordId":"49633602406469139358835205474851822974501572780333465650000000",
         "approximateArrivalTimestamp":1664028793294,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg==",
         "kinesisRecordMetadata":{
            "sequenceNumber":"49633602406469139358835205474851822974501572780333465650",
            "subsequenceNumber":0,
            "partitionKey":"0",
            "shardId":"shardId-000000000003",
            "approximateArrivalTimestamp":1664028793294
         }
      },
      {
         "recordId":"49633602406469139358835205474853031900321189264934043698000000",
         "approximateArrivalTimestamp":1664028820148,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg==",
         "kinesisRecordMetadata":{
            "sequenceNumber":"49633602406469139358835205474853031900321189264934043698",
            "subsequenceNumber":0,
            "partitionKey":"0",
            "shardId":"shardId-000000000003",
            "approximateArrivalTimestamp":1664028820148
         }
      }
   ]
}

Payload using Kinesis Firehose with Direct PUT:

{
   "invocationId":"25326478-5c08-45ad-a34d-7342edbc942d",
   "deliveryStreamArn":"arn:aws:firehose:us-east-1:200984112386:deliverystream/PUT-S3-PCYyW",
   "region":"us-east-1",
   "records":[
      {
         "recordId":"49633602747871247603140515133584305740288993709252411394000000",
         "approximateArrivalTimestamp":1664029185290,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg=="
      },
      {
         "recordId":"49633602747871247603140515133585514666108608544585547778000000",
         "approximateArrivalTimestamp":1664029186945,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg=="
      }
   ]
}

We can go in the same direction as #1499 and create separate classes for each type of payload.
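
A rough illustration of that split (class names are hypothetical, following the pattern of #1499); as discussed below, the PR ultimately keeps a single record model with optional metadata instead:

```python
from pydantic import BaseModel


class KinesisFirehoseRecordMetadata(BaseModel):
    # Present only when the delivery stream is fed by a Kinesis Data Stream
    sequenceNumber: str
    subsequenceNumber: int
    partitionKey: str
    shardId: str
    approximateArrivalTimestamp: int


class KinesisFirehoseDirectPutRecord(BaseModel):
    # Direct PUT records carry no per-record Kinesis metadata
    recordId: str
    approximateArrivalTimestamp: int
    data: bytes  # base64-encoded payload


class KinesisFirehoseKinesisStreamRecord(KinesisFirehoseDirectPutRecord):
    # Kinesis Stream-sourced records additionally carry shard/sequence metadata
    kinesisRecordMetadata: KinesisFirehoseRecordMetadata
```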

@ran-isenberg (Contributor, Author)

> Yeah, we use CDK for e2e testing. But I have some considerations to share about e2e testing before we think about any implementation in the Parser. [...]

The way I'd handle schemas is to put the models in a new repo and maintain proper versions, BUT it should be managed and updated by the AWS services, not the community. They should be aware of how we use and model the events, since sometimes the docs are not good enough.
If there are any unit/e2e tests, I'd expect the services to run them each time they make a change, as a sort of contract test. The E2E tests here will run per PR, but they probably won't catch upstream payload changes.

@ran-isenberg (Contributor, Author)

> We need to update the code to account for differences between payloads. [...] We can go in the same direction as #1499 and create separate classes for each type of payload.

I've modeled the metadata as optional so it works for both cases. The only reason to split them would be if it's officially documented as such. Do you think that's the case? I didn't find any mention that the metadata params are present only for Kinesis Stream source events. If we split them ourselves, it's more likely to break in the future.
WDYT?
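
A condensed sketch of that approach (field names mirror the payloads above; the actual merged model may differ in details such as decoding the base64 data via a validator):

```python
from typing import List, Optional

from pydantic import BaseModel


class KinesisFirehoseRecordMetadata(BaseModel):
    sequenceNumber: str
    subsequenceNumber: int
    partitionKey: str
    shardId: str
    approximateArrivalTimestamp: int


class KinesisFirehoseRecord(BaseModel):
    recordId: str
    approximateArrivalTimestamp: int
    data: bytes  # base64-encoded payload
    # Absent for Direct PUT sources, so it is optional rather than split into two classes
    kinesisRecordMetadata: Optional[KinesisFirehoseRecordMetadata] = None


class KinesisFirehoseModel(BaseModel):
    invocationId: str
    deliveryStreamArn: str
    region: str
    # Only present when the source is a Kinesis Data Stream
    sourceKinesisStreamArn: Optional[str] = None
    records: List[KinesisFirehoseRecord]
```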

@leandrodamascena (Contributor)

> The way I'd handle schemas is to put the models in a new repo and maintain proper versions, BUT it should be managed and updated by the AWS services, not the community. [...]

I'll open an issue and move this discussion there, so we can keep the focus here on this PR, ok?

@ran-isenberg (Contributor, Author)

> I'll open an issue and move this discussion there, so we can keep the focus here on this PR, ok?

No problem.

@leandrodamascena (Contributor) left a comment

@ran-isenberg could you please check these changes?

@leandrodamascena (Contributor)

> I've modeled the metadata as optional so it works for both cases. The only reason to split them would be if it's officially documented as such. [...]

I totally agree with you! 💯

@ran-isenberg (Contributor, Author)

@leandrodamascena added my small fix and resolved the develop conflict

@leandrodamascena (Contributor)

> @leandrodamascena added my small fix and resolved the develop conflict

Heyy, good morning and happy Monday @ran-isenberg!
If you don't have any other considerations, I think we are good to merge!
Thanks for addressing all the feedback so quickly. ⏩

@ran-isenberg (Contributor, Author)

LGTM

@leandrodamascena changed the title from "Feature request: Kinesis Data Firehose event envelope and parser model" to "feat(parser): add KinesisFirehoseModel" Oct 10, 2022
@github-actions bot added the feature label (new feature or functionality) Oct 10, 2022
@leandrodamascena merged commit 81afadb into aws-powertools:develop Oct 10, 2022
@ran-isenberg deleted the firehose branch October 23, 2022 19:30