-
Notifications
You must be signed in to change notification settings - Fork 0
Add ECSClient module #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2579b74 to
2eee405
Compare
2eee405 to
4637d0a
Compare
tests/conftest.py
Outdated
| def ecs_run_task_response_success(): | ||
| with open("tests/fixtures/ecs_run_task_response_success.json") as file: | ||
| return json.loads(file.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried to use the boto3.ECS.client.run_task mocked response from moto, but a lot of the standard response parameters were excluded from the mocked response. I opted to create a fixture by creating a JSON file of the de-identified output from a test run.
10a9e0c to
e7776c7
Compare
9124648 to
d5aee02
Compare
9422868 to
298b284
Compare
webapp/utils/aws/ecs.py
Outdated
| def monitor_task(self, task: str) -> dict: | ||
| """Polls ECS for task status updates. | ||
| This method will periodically (every 5 seconds) check for the status | ||
| of the running ECS task that matches the provided task ARN. | ||
| When the task run is complete, details of the run are returned. | ||
| """ | ||
| while True: | ||
| response = self.client.describe_tasks(cluster=self.cluster, tasks=[task]) | ||
| task_run_details = self.parse_run_details(response) # type: ignore[arg-type] | ||
| task_status = task_run_details["status"] | ||
|
|
||
| message = f"Status for task {task}: {task_status}" | ||
| logger.info(message) | ||
|
|
||
| if task_status == "STOPPED": | ||
| logger.info("Task run has completed.") | ||
| break | ||
| time.sleep(5) | ||
| return task_run_details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function will likely be updated once we have a clearer idea of how to display running tasks on the front end (i.e., loading / status updates). For now, this function demonstrates how boto3.ECS.Client.describe_task can be used to check on the status of a running ECS task.
298b284 to
813d48a
Compare
813d48a to
326c6e4
Compare
| @pytest.fixture | ||
| def mock_ecs_task_state_transitions( | ||
| mock_ecs_cluster, mock_ecs_task_definition, mock_ecs_network_config | ||
| ): | ||
| """See https://github.com/getmoto/moto/blob/master/tests/test_ecs/test_ecs_boto3.py.""" | ||
| state_manager.set_transition( | ||
| model_name="ecs::task", transition={"progression": "manual", "times": 1} | ||
| ) | ||
|
|
||
| with mock_aws(): | ||
| ecs = boto3.client("ecs", region_name=AWS_DEFAULT_REGION) | ||
| response = ecs.run_task( | ||
| cluster="mock-sapinvoices-ecs-test", | ||
| launchType="FARGATE", | ||
| networkConfiguration=mock_ecs_network_config, | ||
| overrides={}, | ||
| taskDefinition="mock-sapinvoices-ecs-test:1", | ||
| ) | ||
| yield response["tasks"][0]["taskArn"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In implementing this fixture, I referenced the following moto test example.
326c6e4 to
79e256a
Compare
| def _getenv(self, name: str) -> Any: | ||
| if value := os.getenv(name): | ||
| return value | ||
| if name in self.REQUIRED_ENV_VARS: | ||
| message = f"'{name}' is a required environment variable." | ||
| raise AttributeError(message) | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used by the @property env access methods. This differs from __getattr__, which is only called for attributes that are not explicitly set for the class.
The purpose of _getenv() is to implement some error handling so that the @property env access methods can have a return type of str, which circumvents the following mypy error when ECSClient is instantiated:
error: Incompatible types in assignment (expression has type "str | None", variable has type "str") [assignment]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions but this is nice and clean!
| class ECSClient: | ||
| """ECS Client for running and monitoring task runs.""" | ||
|
|
||
| cluster: str = field(factory=lambda: Config().ALMA_SAP_INVOICES_ECS_CLUSTER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious since I haven't used it before, what does the field(factory=lambda: pattern do here? Is this related to the @property decorator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can weigh in here as a proposer of this pattern. Using factory=lambda... allows each instance of this class to get fresh, up-to-date values from Config.
This allows for easier testing where monkeypatching values in conftest will successfully show up in any instance of this class. Otherwise, whenever the class is first imported -- and this is tricky to control, kind of a whack-a-mole -- whatever the env vars are at that moment will be the class defaults for those properties forever after, regardless of monkeypatching.
I think it comes down to our team's convention of using attrs, which I do like for the most part.
If we were not using attrs, I would envision this more like this in vanilla python:
class ECSClient:
def __init__(self):
config = Config()
self.cluster = config.ALMA_SAP_INVOICES_ECS_CLUSTER
self.task_definition = config.ALMA_SAP_INVOICES_ECS_TASK_DEFINITION
self.network_configuration = config.ALMA_SAP_INVOICES_ECS_NETWORK_CONFIG
self.container = config.ALMA_SAP_INVOICES_ECR_IMAGE_NAMEThat said, we could kind of split the difference using attrs by using __attrs_post_init__:
@define
class ECSClient:
cluster: str = field(default=None)
task_definition: str = field(default=None)
network_configuration: str = field(default=None)
container: str = field(default=None)
def __attrs_post_init__(self) -> None:
config = Config()
if self.cluster is None:
self.cluster = config.ALMA_SAP_INVOICES_ECS_CLUSTER
if self.task_definition is None:
self.task_definition = config.ALMA_SAP_INVOICES_ECS_TASK_DEFINITION
if self.network_configuration is None:
self.network_configuration = config.ALMA_SAP_INVOICES_ECS_NETWORK_CONFIG
if self.container is None:
self.container = config.ALMA_SAP_INVOICES_ECR_IMAGE_NAMEApologies @jonavellecuerdo for making this a discussion point in this PR, but would be nice to find convention for us all to use.
FWIW, I've used it more sparingly in other projects: HRQB QBClient and DWClient, etc.
I think one argument for the factory pattern would be minimizing use of __attrs_post_init__ which some of attrs functionality kind of avoids needing.
In summary: I'm in support of this pattern, but curious what the group thinks, as I think it will continue to come up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent some time reading documentation on attrs.Factory but I think @ghukill described it best above in regards to "why field(default=Config().<ENV> doesn't work for our unit tests":
whenever the class (
webapp.config.Configis first imported / whatever the env vars are at that moment will be the class defaults for those properties forever after, regardless of monkeypatching)
I prefer the current approach over using __attrs_post_init__ (as shown in @ghukill 's example above); the latter, as the doc describes,
runs after attrs is done initializing your instance. This is useful if you want to derive some attribute from others or perform some kind of validation over the whole instance.
I interpret retrieval of the environment
variables as part of / the initialization step. Conceptually, the use of __attrs_post_init__ might be confusing. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonavellecuerdo I agree that it is a part of initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, think this is looking real good, and the logic and structure make sense to me.
Requested some changes (e.g. timeout for while loop in monitoring), but most other comments are discussion or optional.
Given that this ECS client is not wired into the app now, I'm feeling good about the structure and functionality at this point. Maybe when all the pieces come together, there will be some slight tweaks, but this seems like a good base.
| class ECSClient: | ||
| """ECS Client for running and monitoring task runs.""" | ||
|
|
||
| cluster: str = field(factory=lambda: Config().ALMA_SAP_INVOICES_ECS_CLUSTER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can weigh in here as a proposer of this pattern. Using factory=lambda... allows each instance of this class to get fresh, up-to-date values from Config.
This allows for easier testing where monkeypatching values in conftest will successfully show up in any instance of this class. Otherwise, whenever the class is first imported -- and this is tricky to control, kind of a whack-a-mole -- whatever the env vars are at that moment will be the class defaults for those properties forever after, regardless of monkeypatching.
I think it comes down to our team's convention of using attrs, which I do like for the most part.
If we were not using attrs, I would envision this more like this in vanilla python:
class ECSClient:
def __init__(self):
config = Config()
self.cluster = config.ALMA_SAP_INVOICES_ECS_CLUSTER
self.task_definition = config.ALMA_SAP_INVOICES_ECS_TASK_DEFINITION
self.network_configuration = config.ALMA_SAP_INVOICES_ECS_NETWORK_CONFIG
self.container = config.ALMA_SAP_INVOICES_ECR_IMAGE_NAMEThat said, we could kind of split the difference using attrs by using __attrs_post_init__:
@define
class ECSClient:
cluster: str = field(default=None)
task_definition: str = field(default=None)
network_configuration: str = field(default=None)
container: str = field(default=None)
def __attrs_post_init__(self) -> None:
config = Config()
if self.cluster is None:
self.cluster = config.ALMA_SAP_INVOICES_ECS_CLUSTER
if self.task_definition is None:
self.task_definition = config.ALMA_SAP_INVOICES_ECS_TASK_DEFINITION
if self.network_configuration is None:
self.network_configuration = config.ALMA_SAP_INVOICES_ECS_NETWORK_CONFIG
if self.container is None:
self.container = config.ALMA_SAP_INVOICES_ECR_IMAGE_NAMEApologies @jonavellecuerdo for making this a discussion point in this PR, but would be nice to find convention for us all to use.
FWIW, I've used it more sparingly in other projects: HRQB QBClient and DWClient, etc.
I think one argument for the factory pattern would be minimizing use of __attrs_post_init__ which some of attrs functionality kind of avoids needing.
In summary: I'm in support of this pattern, but curious what the group thinks, as I think it will continue to come up.
webapp/utils/aws/ecs.py
Outdated
| ) | ||
| valid_tasks = [task.split("/")[-1] for task in response["taskDefinitionArns"]] | ||
|
|
||
| task_family_revision = self.task_definition.split("/")[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this logic in a couple of places for task_family_revision; maybe that could be a method or property as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added another @property-decorated method called task_family_revision.
4c067f1 to
46ab269
Compare
46ab269 to
a67789c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but I would remove [wip] from the commit label before merging
| class ECSClient: | ||
| """ECS Client for running and monitoring task runs.""" | ||
|
|
||
| cluster: str = field(factory=lambda: Config().ALMA_SAP_INVOICES_ECS_CLUSTER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonavellecuerdo I agree that it is a part of initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for the updates and discussions. Looking forward to seeing this wired up in the flask app.
Why these changes are being introduced: * Provide a client for running and monitoring ECS tasks How this addresses that need: * Add ECSClient module with key commands: * execute_review_run * execute_final_run * monitor_task * Create custom exceptions * ECSTaskDefinitionDoesNotExistError * ECSTaskRuntimeExceededTimeoutError * Add fixtures and unit tests for ECSClient commands Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1015
a67789c to
84ada58
Compare
Purpose and background context
Add an
ECSClientmodule that can run and monitor ECS tasks.Integrating
webutils.aws.ecs.ECSClientwith the Flask application is out of scope for this PR. There will be a separate effort for workflow orchestration. That said, the orchestrator (TBD) is likely to use the client like so:Note: The PR only appears to have a lot of changes due to the updated dependencies. The main items for review are:
How can a reviewer manually see the effects of these changes?
make testand verify all tests are passing.Note: For now, running the preliminary set of unit tests is the best way to confirm that the implemented
webapp.utils.aws.ecs.ECSClientmodule is working as expected.Includes new or updated dependencies?
YES - Installed libraries: [
attrs,boto3-stubs,moto]Changes expectations for external applications?
NO
What are the relevant tickets?
https://mitlibraries.atlassian.net/browse/IN-1015
Developer
Code Reviewer(s)