Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor sdlf-ddk-lightweight #15

Merged
merged 4 commits into from
Nov 2, 2022
Merged

refactor sdlf-ddk-lightweight #15

merged 4 commits into from
Nov 2, 2022

Conversation

RLashofRegas
Copy link

Description of changes:

  • refactor sdlf-ddk-lightweight example to better support custom pipelines (individual changes summarized below)
  • renamed various files to make purpose more clear, e.g. sdlf_pipeline_stack -> sdlf_base_stack, sdlf_dataset_stack -> standard_dataset_stack
  • re-organized files to make clear association with each pipeline.
    • moved code defining "standard pipeline" out of sdlf_pipeline_stack into a subfolder pipelines/standard_pipeline/standard_pipeline.py also moved sdlf_dataset_stack.py into this folder and renamed as mentioned above.
    • moved sdlf_heavy_transform.py and sdlf_light_transform.py to pipelines/common_stages/...
    • moved foundations stack to foundations folder
  • refactored light and heavy transform stages for better code re-use
  • added more type hints and fixed linting errors on the files that I refactored (i.e. not everywhere)
  • Made standard pipeline no longer dependent on dataset (each pipeline should support more than 1 dataset). Included removing glue crawler from pipeline and replacing with code in each glue job to update catalog on write
  • Added CustomPipeline and CustomDataset as an example of how customers can add custom pipelines.
  • added SDLFPipeline protocol (interface)
  • updated README and architecture diagrams to reflect the changes
  • moved routing function within pipeline definition (because it is specific to the standard pipeline) and updated S3 object created EventBridge rules to only pick up events for the specific datasets registered to that pipeline.
  • explicitly specified stack names so that they are easier to understand from the CloudFormation UI
  • changed parameters.json to have a config element for pipeline specific configuration to support custom pipelines.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Contributor

@malachi-constant malachi-constant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, thanks for the cleanup and additions here! Haven't tested a deployment yet but will try that now.

sdlf-ddk-lightweight/README.md Show resolved Hide resolved
sdlf-ddk-lightweight/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@kukushking kukushking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thank you very much for the refactoring. Added a few minor comments & clarifications

Copy link

@LeonLuttenberger LeonLuttenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of nitpicky comments but overall this looks great! Awesome work

Robin Zimmerman added 2 commits November 2, 2022 08:23
- use _environment_id from parent class
- only grant start execution to our state machine
@RLashofRegas
Copy link
Author

I have tested the most recent version end-to-end and everything works.

@malachi-constant malachi-constant merged commit 90d1fc9 into aws-samples:main Nov 2, 2022
@RLashofRegas RLashofRegas deleted the refactor-sdlf-ddk-lightweight branch November 2, 2022 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants