
feat(parser): add KinesisFirehoseModel #1556

Merged · 7 commits · Oct 10, 2022

Conversation

@ran-isenberg (Contributor) commented Sep 29, 2022

Issue number: #1555

Summary

Add Kinesis Data Firehose event envelope and parser model
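
A minimal usage sketch of the new model and envelope (class names follow this PR; the `Order` payload model and handler names are illustrative, not part of the change):

```python
from typing import List, Optional

from pydantic import BaseModel

from aws_lambda_powertools.utilities.parser import envelopes, event_parser, parse
from aws_lambda_powertools.utilities.parser.models import KinesisFirehoseModel


class Order(BaseModel):
    # Hypothetical business payload carried (base64-encoded) in each record's "data" field
    user_id: str
    score: int


@event_parser(model=KinesisFirehoseModel)
def lambda_handler(event: KinesisFirehoseModel, context):
    # Full event parsed into the new model; each record keeps its Firehose metadata
    for record in event.records:
        print(record.recordId)


def lambda_handler_with_envelope(event: dict, context):
    # Envelope extracts and decodes each record's data, then parses it into Order
    orders: List[Optional[Order]] = parse(event=event, model=Order, envelope=envelopes.KinesisFirehoseEnvelope)
    return len(orders)
```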

Checklist

If any items don't apply to your change, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.


View rendered docs/utilities/parser.md

@ran-isenberg requested a review from a team as a code owner September 29, 2022 16:26
@ran-isenberg requested review from leandrodamascena and removed request for a team September 29, 2022 16:26
@boring-cyborg bot added the area/commons, documentation, and tests labels Sep 29, 2022
@pull-request-size bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Sep 29, 2022
@heitorlessa (Contributor)

@leandrodamascena we might want to do an E2E test for this. Due to the parser's nature, it has the highest risk of causing runtime disruption.

@codecov-commenter commented Oct 3, 2022

Codecov Report

Base: 99.73% // Head: 99.77% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (2512ab9) compared to base (22459e7).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1556      +/-   ##
===========================================
+ Coverage    99.73%   99.77%   +0.03%     
===========================================
  Files          124      126       +2     
  Lines         5718     5757      +39     
  Branches       651      656       +5     
===========================================
+ Hits          5703     5744      +41     
+ Misses           8        6       -2     
  Partials         7        7              
Impacted Files Coverage Δ
...a_powertools/utilities/parser/envelopes/kinesis.py 100.00% <ø> (ø)
aws_lambda_powertools/logging/formatter.py 100.00% <100.00%> (ø)
aws_lambda_powertools/shared/constants.py 100.00% <100.00%> (ø)
aws_lambda_powertools/shared/functions.py 92.59% <100.00%> (+7.40%) ⬆️
..._powertools/utilities/parser/envelopes/__init__.py 100.00% <100.00%> (ø)
...ols/utilities/parser/envelopes/kinesis_firehose.py 100.00% <100.00%> (ø)
...bda_powertools/utilities/parser/models/__init__.py 100.00% <100.00%> (ø)
...mbda_powertools/utilities/parser/models/kinesis.py 100.00% <100.00%> (ø)
...rtools/utilities/parser/models/kinesis_firehose.py 100.00% <100.00%> (ø)


@leandrodamascena (Contributor)

Hii @ran-isenberg, thanks for one more PR 😃. I'm starting to work on it now.

@heitorlessa, adding e2e tests would indeed be a big improvement in the parser area. I just need to check how much data we need to generate in Kinesis Firehose with a Direct PUT source and with a Kinesis Stream source to trigger an event in Lambda. I think we should open a new PR to include e2e tests in the v2 branch while it is not yet the official branch.

@ran-isenberg (Contributor, Author)

+1 for E2E tests, but on the flip side, it's going to make adding new services way harder for newcomers.

@leandrodamascena (Contributor)

> +1 for E2E tests, but on the flip side, it's going to make adding new services way harder for newcomers.

Yeah Ran, it makes sense and I understand your concern. But for now we can write the e2e tests when someone adds a new service or feature, or extends an existing one, and make life easier for new contributors.

@ran-isenberg (Contributor, Author)

> Yeah Ran, it makes sense and I understand your concern. But for now we can write the e2e tests when someone adds a new service or feature, or extends an existing one, and make life easier for new contributors.

You use CDK, right?

@leandrodamascena (Contributor)

> You use CDK, right?

Yeah, we use CDK for e2e testing. But I have some considerations to share about e2e testing before we think about any implementation in the Parser.

  • e2e tests will only work in v2, so in this PR we should not write end-to-end tests for the parser.
  • We have a release date for v2 later this month and we are working hard for it.
  • We believe e2e tests alone are not enough in all cases. For example, if the parser fails because of any change in the payload, we run the risk of breaking the function, since it is a runtime error. We need to write an RFC to think of a solution where we can refresh all Lambda event source payloads every X days/hours, because a payload can change on the service/Lambda side, and if the project isn't updated it can be a problem. Maybe a separate project should deal with this; I'm not sure yet, but we'll start that discussion soon.

@leandrodamascena (Contributor) left a comment

We need to update the code to account for differences between payloads.

Payload using Kinesis Firehose with a Kinesis Stream as source:

{
   "invocationId":"77c0024f-9869-4af0-951a-d83c102e8a63",
   "sourceKinesisStreamArn":"arn:aws:kinesis:us-east-1:200984112386:stream/kinesis-evento",
   "deliveryStreamArn":"arn:aws:firehose:us-east-1:200984112386:deliverystream/KDS-S3-10POr",
   "region":"us-east-1",
   "records":[
      {
         "recordId":"49633602406469139358835205474851822974501572780333465650000000",
         "approximateArrivalTimestamp":1664028793294,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg==",
         "kinesisRecordMetadata":{
            "sequenceNumber":"49633602406469139358835205474851822974501572780333465650",
            "subsequenceNumber":0,
            "partitionKey":"0",
            "shardId":"shardId-000000000003",
            "approximateArrivalTimestamp":1664028793294
         }
      },
      {
         "recordId":"49633602406469139358835205474853031900321189264934043698000000",
         "approximateArrivalTimestamp":1664028820148,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg==",
         "kinesisRecordMetadata":{
            "sequenceNumber":"49633602406469139358835205474853031900321189264934043698",
            "subsequenceNumber":0,
            "partitionKey":"0",
            "shardId":"shardId-000000000003",
            "approximateArrivalTimestamp":1664028820148
         }
      }
   ]
}

Payload using Kinesis Firehose with Direct PUT:

{
   "invocationId":"25326478-5c08-45ad-a34d-7342edbc942d",
   "deliveryStreamArn":"arn:aws:firehose:us-east-1:200984112386:deliverystream/PUT-S3-PCYyW",
   "region":"us-east-1",
   "records":[
      {
         "recordId":"49633602747871247603140515133584305740288993709252411394000000",
         "approximateArrivalTimestamp":1664029185290,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg=="
      },
      {
         "recordId":"49633602747871247603140515133585514666108608544585547778000000",
         "approximateArrivalTimestamp":1664029186945,
         "data":"eyJ1c2VyX2lkIjoidXNlcjEiLCAic2NvcmUiOiAxMDB9Cg=="
      }
   ]
}

We can go in the same direction as #1499 and create separate classes for each type of payload.
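
A rough illustration of that split (class names are hypothetical, following the pattern of #1499); as discussed below, the PR ultimately keeps a single record model with optional metadata instead:

```python
from pydantic import BaseModel


class KinesisFirehoseRecordMetadata(BaseModel):
    # Present only when the delivery stream is fed by a Kinesis Data Stream
    sequenceNumber: str
    subsequenceNumber: int
    partitionKey: str
    shardId: str
    approximateArrivalTimestamp: int


class KinesisFirehoseDirectPutRecord(BaseModel):
    # Direct PUT records carry no per-record Kinesis metadata
    recordId: str
    approximateArrivalTimestamp: int
    data: bytes  # base64-encoded payload


class KinesisFirehoseKinesisStreamRecord(KinesisFirehoseDirectPutRecord):
    # Kinesis Stream-sourced records additionally carry shard/sequence metadata
    kinesisRecordMetadata: KinesisFirehoseRecordMetadata
```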

@ran-isenberg (Contributor, Author)

> Yeah, we use CDK for e2e testing. But I have some considerations to share about e2e testing before we think about any implementation in the Parser. [...]

The way I'd handle schemas is to put the models in a new repo and maintain proper versions, BUT it should be managed and updated by the AWS services, not the community. They should be aware of how we use and model the events, since sometimes the docs are not good enough.
If there are any unit/e2e tests, I'd expect the services to run them each time they make a change, as a sort of contract test. The E2E tests here will run per PR, but they probably won't catch upstream payload changes.

@ran-isenberg (Contributor, Author)

> We need to update the code to account for differences between payloads. [...] We can go in the same direction as #1499 and create separate classes for each type of payload.

I've modeled the metadata as optional so it works for both cases. The only reason to split them would be if it's officially documented as such. Do you think that's the case? I didn't find any mention that the metadata params are present only for Kinesis Stream source events. If we split them ourselves, it's more likely to break in the future.
WDYT?
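
A condensed sketch of that approach (field names mirror the payloads above; the actual merged model may differ in details such as decoding the base64 data via a validator):

```python
from typing import List, Optional

from pydantic import BaseModel


class KinesisFirehoseRecordMetadata(BaseModel):
    sequenceNumber: str
    subsequenceNumber: int
    partitionKey: str
    shardId: str
    approximateArrivalTimestamp: int


class KinesisFirehoseRecord(BaseModel):
    recordId: str
    approximateArrivalTimestamp: int
    data: bytes  # base64-encoded payload
    # Absent for Direct PUT sources, so it is optional rather than split into two classes
    kinesisRecordMetadata: Optional[KinesisFirehoseRecordMetadata] = None


class KinesisFirehoseModel(BaseModel):
    invocationId: str
    deliveryStreamArn: str
    region: str
    # Only present when the source is a Kinesis Data Stream
    sourceKinesisStreamArn: Optional[str] = None
    records: List[KinesisFirehoseRecord]
```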

@leandrodamascena (Contributor)

> The way I'd handle schemas is to put the models in a new repo and maintain proper versions, BUT it should be managed and updated by the AWS services, not the community. [...]

I'll open an issue and move this discussion there, so we can keep the focus here on this PR, ok?

@ran-isenberg (Contributor, Author)

> I'll open an issue and move this discussion there, so we can keep the focus here on this PR, ok?

No problem.

@leandrodamascena (Contributor) left a comment

@ran-isenberg could you please check these changes?

@leandrodamascena (Contributor)

> I've modeled the metadata as optional so it works for both cases. The only reason to split them would be if it's officially documented as such. [...]

I totally agree with you! 💯

@ran-isenberg (Contributor, Author)

@leandrodamascena added my small fix and resolved the develop conflict

@leandrodamascena (Contributor)

> @leandrodamascena added my small fix and resolved the develop conflict

Heyy, good morning and happy Monday @ran-isenberg!
If you don't have any other considerations, I think we are good to merge!
Thanks for addressing all the feedback so quickly. ⏩

@ran-isenberg (Contributor, Author)

LGTM

@leandrodamascena changed the title from "Feature request: Kinesis Data Firehose event envelope and parser model" to "feat(parser): add KinesisFirehoseModel" Oct 10, 2022
@github-actions bot added the feature label (new feature or functionality) Oct 10, 2022
@leandrodamascena merged commit 81afadb into aws-powertools:develop Oct 10, 2022
@ran-isenberg deleted the firehose branch October 23, 2022 19:30