Send EventBridge Events for MIT Harvests #61
Why these changes are being introduced:
Covered in more detail in docstrings, MIT harvests are expected to send EventBridge events for information known about MIT resources harvested, e.g. deleted or restricted status. These events are handled by a StepFunction concerned with copying and deleting files. These changes implement an MITHarvester specific harvest step to send out EventBridge events for items processed.

How this addresses that need:
* Adds new EventBridgeClient class for sending messages
* Completes MITHarvester.send_eventbridge_event harvest step

Side effects of this change:
* EventBridge events will now get published for MIT Harvests, which may invoke any listening AWS assets

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-87
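For readers unfamiliar with publishing to EventBridge, here is a minimal sketch of what a client wrapper of this kind might look like. It is not the EventBridgeClient added in this PR: the constructor arguments, the `send_event` method name, and the default `event_bus` and `source` values are all assumptions made for illustration. The only real API it relies on is boto3's `put_events` call on the "events" client.

```python
import json
from typing import Any


class EventBridgeClient:
    """Hypothetical thin wrapper around the boto3 EventBridge ("events") client."""

    def __init__(
        self,
        event_bus: str = "default",  # placeholder, not the project's real bus name
        source: str = "geo-harvester",  # placeholder event source
        client: Any = None,  # optional pre-built client, handy for tests
    ) -> None:
        self.event_bus = event_bus
        self.source = source
        self._client = client

    @property
    def client(self) -> Any:
        if self._client is None:
            # Imported lazily so the class can be exercised with a stub client.
            import boto3

            self._client = boto3.client("events")
        return self._client

    def send_event(self, detail_type: str, detail: dict) -> str:
        """Publish one event and return the EventId assigned by EventBridge."""
        response = self.client.put_events(
            Entries=[
                {
                    "Source": self.source,
                    "DetailType": detail_type,
                    "Detail": json.dumps(detail),  # Detail must be a JSON string
                    "EventBusName": self.event_bus,
                }
            ]
        )
        # put_events can succeed as a call while individual entries fail,
        # so the failure count has to be checked explicitly.
        if response.get("FailedEntryCount", 0) > 0:
            raise RuntimeError(f"EventBridge rejected the event: {response['Entries']}")
        return response["Entries"][0]["EventId"]
```

Accepting an optional client in the constructor is one way to keep unit tests free of AWS calls; the real class may handle this differently.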
Hi @ghukill! I think this looks good! Just one question came to mind:
- Reading about the 3 paths the StepFunction "geo-upload--shapefile-handler" can take, is the following statement true: nothing gets deleted from the "Restricted" S3 bucket? Is this true even if the GIS team were to delete a restricted record?
Ah, nice catch!! Yes, a commit has been added that adds "AND Restricted":
Apologies for the review delay!
@@ -97,20 +99,76 @@ def harvester_specific_steps(self, records: Iterator[Record]) -> Iterator[Record
        records = self.filter_failed_records(self.delete_sqs_messages(records))
        yield from records

    def send_eventbridge_event(self, records: Iterator[Record]) -> Iterator[Record]:
        """Method to send EventBridge event indicating access restrictions.
Great detail here
Purpose and background context
This PR allows the GeoHarvester to send EventBridge (EB) events when processing MIT harvests.
From the send_eventbridge_event() method:
NOTE: It is possible that OGM harvests may also need to send EventBridge events; for example, if we determine that an OGM resource is deleted, then sending an EB event is the mechanism by which the metadata is deleted from the Public CDN bucket. However, OGM harvests have been deprioritized until MIT harvests are fully formed, so I'd propose keeping this logic as an MIT "harvester specific step" until the OGM work is started. Perhaps the two will duplicate some logic; perhaps it will be refactored as a shared step.
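To illustrate the shape of a "harvester specific step" like this one, the following is a rough sketch of a generator that takes an iterator of records, sends one event per record, and yields the records onward. The `Record` fields, the event detail keys, and the injected `send` callable are all hypothetical; the actual method in this PR lives on MITHarvester and may differ in every detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Record:
    """Hypothetical stand-in for the harvester's Record type."""

    identifier: str
    restricted: bool = False
    deleted: bool = False
    errors: list[str] = field(default_factory=list)


def send_eventbridge_event(
    records: Iterator[Record],
    send: Callable[[dict], None],  # e.g. a bound method of an EventBridge client
) -> Iterator[Record]:
    """Send one event per record describing its deleted/restricted status."""
    for record in records:
        detail = {
            "identifier": record.identifier,
            "restricted": record.restricted,
            "deleted": record.deleted,
        }
        try:
            send(detail)
        except Exception as exc:
            # Record the failure instead of aborting the whole harvest, so a
            # later step can filter out failed records.
            record.errors.append(str(exc))
        yield record
```

Because the step is a generator, nothing is sent until a downstream step consumes the records, which keeps it composable with the other steps in the pipeline.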
How can a reviewer manually see the effects of these changes?
1- Set env vars:
2- Ensure AWS credentials are set for TimdexManagers in Dev1
3- Note the contents of the Public CDN S3 bucket. The easiest way is likely to note the timestamps of files, so you can observe that they were updated.
4- Run local harvest of 4 sample zip files in fixtures:
After the harvest, note the following:
- Note that the zip file SDE_DATA_AE_A8GNS_2003.zip was created or modified in the Public CDN bucket; this is a result of the StepFunction handling the EventBridge event and copying the zip file from the Restricted CDN to the Public CDN, because the normalization of metadata indicated it is NOT a restricted resource.
- Observe the executions for the associated StepFunction; it should have 4 new invocations from the 4 records processed, each sending an EventBridge event.
- [...] DeleteDataFromPublic, as those resources were Restricted; therefore we need to remove them from the Public bucket (gracefully handling the case where the file was not Public previously, so more like an attempt).
Includes new or updated dependencies?
YES
Changes expectations for external applications?
YES - StepFunction may be invoked from MIT harvests
What are the relevant tickets?
Developer
Code Reviewer(s)