
Allow sharing task definitions amongst a worker process in order to conserve memory#219

Merged
coderabhigupta merged 4 commits into conductor-oss:1.0.73-release-branch from matteius:prototype-shared-tasks
Nov 14, 2023

Conversation

@matteius (Contributor) commented Oct 31, 2023

Background: When a conductor worker serves many tasks, some of which only have work to do infrequently, and each worker process loads a Django monolith to use the ORM, the workers end up consuming a lot of memory. With this optimization we can group tasks into lists that are shared and round-robined within a single worker process, conserving memory.

Running an example set of worker tasks on a large Django monolith today:
[memory-usage screenshot]

Running the same worker tasks all within one shared worker using this optimization:
[memory-usage screenshot]

…t, the task is shared for the worker process and the worker process round robins between tasks.
Comment thread src/conductor/client/worker/worker_interface.py Outdated
@coderabhigupta (Contributor) left a comment

I am not sure this will work, because get_task_definition_name is called in 3 places: inside __poll_task, __execute_task, and __update_task. In run_once, when multiple task definitions are configured, we will end up operating on three different task definitions within one iteration, which will produce erroneous results. Here is the aforementioned method:

    def run_once(self) -> None:
        task = self.__poll_task()
        if task != None and task.task_id != None:
            task_result = self.__execute_task(task)
            self.__update_task(task_result)
        self.__wait_for_polling_interval()

I appreciate the optimization attempt, but we need to implement something more robust here. Also, I am curious whether your tests checked for accuracy and whether you were able to confirm functional parity.
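For illustration, a minimal model of the problem being described (a stand-in, not the actual worker code): if get_task_definition_name() advances a round-robin index on every call, the three call sites in one run_once iteration each see a different task definition.

```python
# Sketch of the concern: three calls per iteration desynchronize the
# polled / executed / updated task definition names.
class Worker:
    def __init__(self, names):
        self.names = names
        self.i = 0

    def get_task_definition_name(self):
        # Advances the round-robin index on EVERY call -- the bug.
        name = self.names[self.i]
        self.i = (self.i + 1) % len(self.names)
        return name

w = Worker(["a", "b", "c"])
# One run_once iteration makes three calls:
polled = w.get_task_definition_name()    # task polled from queue "a"...
executed = w.get_task_definition_name()  # ...but logged as executing "b"...
updated = w.get_task_definition_name()   # ...and updated as "c"
print(polled, executed, updated)
```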

Comment thread src/conductor/client/worker/worker.py Outdated
Comment thread src/conductor/client/worker/worker_interface.py Outdated
@matteius (Contributor Author) commented Nov 1, 2023

I think you are right. I've pushed a refactor commit; by leveraging cached_property and clearing it at the start of run_once, we can be sure the name is consistent across those operations.
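The cached_property idea can be sketched as follows (a minimal stand-in for the worker, not the actual conductor-python code):

```python
from functools import cached_property

class Worker:
    """Sketch: cache the round-robin pick for the duration of one
    run_once iteration, so poll/execute/update all see the same name."""

    def __init__(self, names):
        self.names = names
        self.next_task_index = 0

    @cached_property
    def current_task_definition_name(self):
        name = self.names[self.next_task_index]
        self.next_task_index = (self.next_task_index + 1) % len(self.names)
        return name

    def run_once(self):
        # Invalidate the cache so this iteration picks the next name...
        self.__dict__.pop("current_task_definition_name", None)
        # ...then every access within the iteration returns the same value.
        return (self.current_task_definition_name,
                self.current_task_definition_name,
                self.current_task_definition_name)

w = Worker(["a", "b"])
print(w.run_once())  # all three accesses agree
print(w.run_once())  # next iteration advances to the next name
```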

@matteius (Contributor Author) commented Nov 1, 2023

Actually -- I am testing the cached_property change and it is not quite working yet; let me take another pass.

@matteius (Contributor Author) commented Nov 1, 2023

OK -- with the latest commit I pushed, I tested a worker serving 13 simple tasks; the 3 tasks I triggered executed successfully and within a short time.

@coderabhigupta (Contributor) left a comment

Results might be accurate now, but we will still end up referencing the wrong task_definition_name for logging and metrics. See __execute_task and __update_task; they both have task_definition_name = self.worker.get_task_definition_name(). I think we might need to pass the task_definition_name from run_once into those two methods.

    def compute_task_definition_name(self):
        if isinstance(self.task_definition_name, list):
            task_definition_name = self.task_definition_name[self.next_task_index]
            self.next_task_index = (self.next_task_index + 1) % len(self.task_definition_name)
Contributor

We should probably check for emptiness so we don't run into a modulo-by-zero when len(self.task_definition_name) == 0.

Contributor Author

If someone instantiates a worker with an empty task list, would it be better to raise an error during initialization?
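The init-time validation being suggested could look roughly like this (a hypothetical worker __init__, not the actual conductor-python signature): fail fast on an empty list rather than hitting the modulo-by-zero later in the polling loop.

```python
from typing import List, Union

class Worker:
    """Sketch of init-time validation for a shared task list."""

    def __init__(self, task_definition_name: Union[str, List[str]]):
        # Reject an empty list up front, so compute_task_definition_name
        # can never divide by len([]) == 0.
        if isinstance(task_definition_name, list) and not task_definition_name:
            raise ValueError("task_definition_name list must not be empty")
        self.task_definition_name = task_definition_name
        self.next_task_index = 0

try:
    Worker([])
except ValueError as exc:
    print(exc)
```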

@matteius (Contributor Author) commented Nov 1, 2023

> Results might be accurate now but we still will end up referencing wrong task_definition_name for logging and metrics. See __execute_task and __update_task, they have this - task_definition_name = self.worker.get_task_definition_name(). I think we might need to pass the task_definition_name from run_once and use it inside the two methods mentioned above.

I think those only get executed from within run_once? That is why I added the caching, so it remains consistent for each iteration of the loop -- but maybe I am not seeing the issue yet.

@coderabhigupta (Contributor)

> Results might be accurate now but we still will end up referencing wrong task_definition_name for logging and metrics. See __execute_task and __update_task, they have this - task_definition_name = self.worker.get_task_definition_name(). I think we might need to pass the task_definition_name from run_once and use it inside the two methods mentioned above.

> I think those only get executed from within run_once? That is why I added the caching, so it remains consistent for each iteration of the loop -- but maybe I am not seeing the issue yet.

Can you turn on debug and check the logs to confirm which task_definition_name is getting logged?

@matteius (Contributor Author) commented Nov 3, 2023

@coderabhigupta It wasn't quite working, as I found when writing a unit test, but I simplified how I was doing the caching and now it works.

The logs were consistent with this change, but since I am integration-testing against a private worker, I should not disclose what those logs look like.
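A unit test for the property being discussed might look roughly like this (the Worker here is a self-contained stand-in with a plain-attribute cache, not the real conductor-python class): the name must stay stable across repeated lookups within one iteration and advance between iterations.

```python
class Worker:
    """Stand-in worker with a simple manually-cleared name cache."""

    def __init__(self, names):
        self.names = names
        self.next_task_index = 0
        self._cached_name = None

    def clear_task_definition_name_cache(self):
        self._cached_name = None

    def get_task_definition_name(self):
        # Compute once per iteration, return the cached value thereafter.
        if self._cached_name is None:
            self._cached_name = self.names[self.next_task_index]
            self.next_task_index = (self.next_task_index + 1) % len(self.names)
        return self._cached_name

def test_name_consistent_within_iteration():
    w = Worker(["a", "b"])
    w.clear_task_definition_name_cache()
    assert {w.get_task_definition_name() for _ in range(3)} == {"a"}
    w.clear_task_definition_name_cache()
    assert w.get_task_definition_name() == "b"

test_name_consistent_within_iteration()
```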


  # Append task with the right shift operator `>>`
- def __rshift__(self, task: TaskInterface | List[TaskInterface] | List[List[TaskInterface]]) -> Self:
+ def __rshift__(self, task: Union[TaskInterface, List[TaskInterface], List[List[TaskInterface]]]) -> Self:
Contributor Author

pytest couldn't collect these tests because the type hint is wrong here.
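For context (my note, not part of the original comment): `X | Y` in annotations is PEP 604 syntax, evaluated eagerly at function-definition time here, so on Python < 3.10 the module raises TypeError on import and pytest cannot collect it. typing.Union works on all supported versions:

```python
# Why the Union form fixes collection on Python < 3.10: PEP 604
# "X | Y" unions need Python 3.10+ (or a module-level
# `from __future__ import annotations`), while typing.Union is fine
# on every supported version. `Task` and `append_ok` are illustrative.
from typing import List, Union

class Task:
    pass

def append_ok(task: Union[Task, List[Task], List[List[Task]]]) -> None:
    # Annotation evaluates without error on Python 3.7+.
    pass

append_ok(Task())
```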

Contributor

Thanks for fixing this.

@@ -1,13 +1,11 @@
-from conductor.client.http.models.workflow_def import WorkflowDef
-from conductor.client.http.models.workflow_task import WorkflowTask
Contributor Author

Removed unused imports.

@matteius (Contributor Author) commented Nov 3, 2023

Also, FWIW, I think the integration tests failed on the last Actions run because my fork settings lack some of the required env variables.



@matteius (Contributor Author) commented Nov 7, 2023

@coderabhigupta Thanks for your approval -- is anything else needed from me to see this change make it into a release?

@coderabhigupta (Contributor) commented Nov 8, 2023

> @coderabhigupta Thanks for your approval -- is anything else needed from me to see this change make it into a release?

@matteius I will merge and push this as part of our next release which should happen sometime next week.

@coderabhigupta coderabhigupta changed the base branch from main to 1.0.73-release-branch November 14, 2023 22:08
@coderabhigupta coderabhigupta merged commit 84ab28b into conductor-oss:1.0.73-release-branch Nov 14, 2023