Documents decision to process all messages every run
Why are these changes being introduced:

- we needed to decide if the application would handle the message polling from SQS or if we'd have an external process
  send messages to process into the application

Relevant ticket(s):

- https://mitlibraries.atlassian.net/browse/ETD-424

How does this address that need:

- Outlines various scenarios we considered and documents which one we will be implementing
Showing 1 changed file with 113 additions and 0 deletions.
docs/architecture-decisions/0005-process-all-documents-in-the-queue-every-run.md
@@ -0,0 +1,113 @@
# 5. Process all documents in the queue every run

Date: 2021-09-02

## Status

Accepted

## Context

There is more than one solution to how to trigger this application to run, where it runs, and how much work the
application will do in a single run. These options affect not only the infrastructure, but also how the application is
designed.

They don't affect how applications using this service will interact with it, which remains directly writing to and
reading from SQS queues. This is solely about when and how to process the queues.

### Lambda

AWS Lambda can trigger for each message that lands in SQS. In this model, this application would process the single SQS
message that triggered the Lambda.

Pros:

- Fully automated
- Solves "when to run" problem cleanly

Cons:

- Lambda can scale much faster than our DSpace instance, and extreme care would be needed to ensure we don't DoS our
  DSpace during our normal submission work
- We don't have any experience working with Lambda, SQS, or containers in this design pattern. Our initial
  exploration showed it would require us to design the application to run in Lambda (even if we containerize) rather
  than just making an application that works and deploying it as a container (for example in Fargate), which leads to
  more complex development environments
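
To make the Lambda model concrete, here is a minimal sketch of what a Lambda-shaped entry point might look like. It assumes the standard SQS event shape, in which messages arrive in a `Records` list and each record carries its payload in `body`. The `process_submission` function and the `id` field are hypothetical stand-ins, not part of this application.

```python
import json


def process_submission(body: dict) -> str:
    """Hypothetical stand-in for the real work (for example, a DSpace submission)."""
    return f"processed {body['id']}"


def handler(event, context):
    """Lambda entry point: handle only the SQS records delivered with this invocation."""
    results = []
    for record in event["Records"]:
        # Each SQS record carries the message payload as a string in "body".
        body = json.loads(record["body"])
        results.append(process_submission(body))
    return results
```

The application never sees the queue itself here, only the records handed to one invocation, which is why the code would have to be designed around Lambda rather than run as an ordinary program.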

### Manual Trigger Fargate

Pros:

- We can develop the application to run locally and containerize it without taking any AWS Lambda-specific
  requirements into account, which will result in much more efficient development and ongoing maintenance
- The main message loop happens in the Python app, which means one message is processed at a time, so we can be sure
  not to flood DSpace (we could still scale out the number of Fargate tasks to run more concurrent submissions if
  needed in the future)

Cons:

- Someone has to click "run". Long term this is not acceptable, but initially this is not as bad as it sounds, as we
  only run 3 times a year for ETD and 12 times a year for Wiley.
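
The "main message loop in the Python app" could be sketched roughly as follows. This is an illustration under stated assumptions, not the application's actual code: `process_all` and `InMemoryQueue` are hypothetical names, and the only real API surface assumed is the boto3 SQS client's `receive_message` and `delete_message` calls with their keyword arguments. The in-memory class mimics just enough of those two calls to run the loop without AWS.

```python
class InMemoryQueue:
    """Hypothetical stand-in that mimics the two boto3 SQS client calls used below."""

    def __init__(self, bodies):
        self._messages = [
            {"MessageId": str(i), "ReceiptHandle": f"rh-{i}", "Body": body}
            for i, body in enumerate(bodies)
        ]

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        batch = self._messages[:MaxNumberOfMessages]
        # Like SQS, omit the "Messages" key when nothing is available.
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._messages = [
            m for m in self._messages if m["ReceiptHandle"] != ReceiptHandle
        ]


def process_all(client, queue_url, handle):
    """Process messages one at a time until a receive call returns none."""
    processed = 0
    while True:
        response = client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=0
        )
        messages = response.get("Messages", [])
        if not messages:
            return processed
        for message in messages:
            handle(message["Body"])
            # Delete only after handling succeeds, so a failure leaves the message queued.
            client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            processed += 1
```

Because the loop pulls one message per request and finishes it before asking for the next, the rate at which DSpace is hit is bounded by the application itself. Against a real SQS queue, a single empty response does not guarantee the queue is empty, so a non-zero `WaitTimeSeconds` (long polling) would likely be preferable.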

### Clock Trigger Fargate

This could be a future addition to the Manual Trigger Fargate solution above.

Pros:

- Same as Manual Trigger Fargate
- Solves the "manual click" annoyance

Cons:

- Still doesn't run as soon as data is added to the queue. For instance, if data is submitted at 8am and our clock runs
  daily at 11pm, we'd have 15 hours of unnecessary delay. We can run more frequently (such as hourly) to reduce the
  delay, but never really eliminate it.
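
The delay in the 8am / 11pm example can be worked out with a few lines of standard-library Python. The function name `delay_until_next_run` is hypothetical and the sketch assumes a single daily run at a fixed time.

```python
from datetime import datetime, time, timedelta


def delay_until_next_run(submitted: datetime, run_at: time) -> timedelta:
    """Time a message waits in the queue before the next daily run at `run_at`."""
    next_run = datetime.combine(submitted.date(), run_at)
    if next_run < submitted:
        # Today's run has already happened, so the message waits for tomorrow's.
        next_run += timedelta(days=1)
    return next_run - submitted


# Submitted at 8am with a daily 11pm run.
print(delay_until_next_run(datetime(2021, 9, 2, 8, 0), time(23, 0)))  # 15:00:00
```

A message submitted just after the run waits almost a full day, which is the worst case for any fixed schedule: the maximum delay always approaches the interval between runs.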

### Special 'go now' message

This service could define a specific SQS message that a Lambda listens for, signaling that all messages in a batch
have been sent and processing can start.

Pros:

- Same as Manual Trigger Fargate
- Processing can start immediately

Cons:

- Adds complexity to the applications using this service, as they'd need to send not only the data but also the
  "please do work now" message
- Adds complexity to our infrastructure (having both a Lambda and a Fargate task)
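
As a rough illustration of the extra burden on client applications, the sketch below shows a sender that must follow its batch with a sentinel, and the check a listener would need. The sentinel format `{"type": "go-now"}` and the function names are assumptions made for this example; no such message format is defined here.

```python
import json

# Hypothetical sentinel format; an assumption for illustration only.
GO_NOW = {"type": "go-now"}


def send_batch(send, submissions):
    """Send every submission, then the sentinel that signals the batch is complete."""
    for submission in submissions:
        send(json.dumps(submission))
    send(json.dumps(GO_NOW))


def is_go_now(body: str) -> bool:
    """Return True when a message body is the sentinel rather than a submission."""
    try:
        return json.loads(body) == GO_NOW
    except json.JSONDecodeError:
        return False
```

Every client would need its own equivalent of `send_batch`, and a client that forgets the final message leaves its batch unprocessed until something else triggers a run.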

### Airflow

Pros:

- Airflow can handle the SQS polling to know when to run
- Airflow can handle the error queues for applications that prefer not to handle them themselves (i.e. those that
  might not have a strong need to store state locally, such as Wiley)

Cons:

- Our current Airflow instance is out of date and a bit fragile, so we'd want to upgrade it or move it to a vended
  service before considering this option

## Decision

We will build out the Manual Trigger Fargate option for the initial release and process all documents in the queue on
every run.

We will then evaluate which method of automation to implement, whether one documented above or something we have not
yet considered. Leaving it as a manual-only process is not intended.

## Consequences

Allowing development to proceed with no special knowledge of Lambda outweighs any other cons.

Looping over the entire queue on each run also allows for a simpler local development environment (i.e. we don’t need to
trigger the local application via Lambda; we can just run it and it’ll loop over the queue it is connected to, which can
be either a moto queue or a real AWS SQS queue depending on the needs at the time).

As we gain more knowledge of both the requirements and the options in AWS, we can evaluate which method of automation
best fits the problem and our team.