Status of Parent task not depending on progress vs.total #91

Open
Skrattoune opened this issue Mar 22, 2022 · 11 comments

@Skrattoune
Contributor

Hi,
My tasks are organised as a set of sub-tasks
image
Completing each sub-task adds 1 to the progress of the parent task.

However, if you look at the first snapshot, which shows only the columns visible without scrolling, one could think that all activities are complete and that it would be OK to switch off the server.

But if you scroll right to see the progression data, the situation is very different:
image

At the moment, the status of the parent task is not related at all to the task progression (TaskModel.progress_info[0] vs. TaskModel.total).

In my use case, it would be necessary to take task progression into account when a total is specified.

How could I ensure that the "signal_complete" is not sent before each sub-task is complete?
and that it is sent once the last sub-task has been completed?

@jedie
Collaborator

jedie commented Mar 24, 2022

How could I ensure that the "signal_complete" is not sent before each sub-task is complete?
and that it is sent once the last sub-task has been completed?

That's on the Huey side: if your main task just fires the sub-tasks, then it is completed and Huey sends this signal. I don't know if there is something in Huey to recognize this.

I think you have to "wait" in your main task until all sub-tasks are finished... But if you wait, then you may run into a deadlock if some of the sub-tasks have a problem?!?

Quick idea: maybe the main task can "wait" until the last signal of every sub-task is one of these:

ENDED_HUEY_SIGNALS = (
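
The "wait" idea could look roughly like the sketch below. It is only an illustration: `ASSUMED_ENDED_SIGNALS` is a hypothetical set of signal names (the real `ENDED_HUEY_SIGNALS` tuple may contain different entries), and `get_last_signals` is a hypothetical callable standing in for whatever query returns the last signal of each sub-task. The timeout is there for the deadlock concern: the main task gives up instead of waiting forever on a stuck sub-task.

```python
import time

# Assumption: signal names treated as "ended" in this sketch only.
# The real ENDED_HUEY_SIGNALS tuple may differ.
ASSUMED_ENDED_SIGNALS = frozenset(
    {"complete", "error", "canceled", "revoked", "expired", "interrupted"}
)


def wait_for_sub_tasks(get_last_signals, timeout=60.0, poll_interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every sub-task's last signal is an "ended" one.

    get_last_signals is a hypothetical callable returning a list with the
    last signal name of each sub-task. Returns True when all sub-tasks have
    ended, False when the timeout is reached first.
    """
    deadline = clock() + timeout
    while True:
        signals = get_last_signals()
        if signals and all(s in ASSUMED_ENDED_SIGNALS for s in signals):
            return True
        if clock() >= deadline:
            # Give up rather than block the worker forever.
            return False
        sleep(poll_interval)
```

Note that a main task waiting like this keeps one Huey worker busy, so there must be enough workers left to run the sub-tasks.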

@Skrattoune
Contributor Author

Thanks Jens,

Exploring the v0.5.0 code further, I had another idea that requires fewer database accesses.
(By the way: great simplification, congrats!)

I have been testing this idea locally since yesterday.
I'll push it between today and the weekend.

I'll also propose some things I have been using that could prove useful to other users.

Skrattoune added a commit to Skrattoune/django-huey-monitor that referenced this issue Mar 24, 2022
Used for displaying parent_task progression (cf boxine#91)
Skrattoune added a commit to Skrattoune/django-huey-monitor that referenced this issue Mar 24, 2022
cf boxine#91:
displaying progression for parent_tasks if execution is on-going
(format: f'{obj.progress_count}/{obj.total}')
or last_signal
@Skrattoune
Contributor Author

You now have the different PRs.
Don't hesitate to adapt them to your own style,
or to ask for clarification if some things are not self-explanatory.

I also saw that one of the automatic tests didn't succeed, but I have no clue why.
I'll need your input there.
Have fun!

@Skrattoune
Contributor Author

Skrattoune commented Mar 25, 2022

I just realized this morning that in the admin, the parent_task whose sub-task was updated most recently only appeared in 4th position.
This is not very logical, and it makes it difficult to quickly assess where we are in the execution when a number of tasks have been launched.

  • I therefore propose to update update_dt of the parent task when a signal is received from a sub-task
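
To illustrate the proposal, here is a minimal sketch using plain Python objects rather than the real models. `TaskRecord` and `record_signal` are hypothetical names invented for this example and are not part of huey-monitor; only the idea (touch the parent's update_dt whenever a sub-task receives a signal) comes from the proposal above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TaskRecord:
    """Hypothetical in-memory stand-in for a task row."""
    name: str
    parent: Optional["TaskRecord"] = None
    update_dt: Optional[datetime] = None
    last_signal: Optional[str] = None


def record_signal(task, signal_name, now=None):
    """Store a signal on a task and refresh the parent's update_dt."""
    now = now or datetime.now(timezone.utc)
    task.last_signal = signal_name
    task.update_dt = now
    if task.parent is not None:
        # Touch the parent so it sorts next to its most recent sub-task.
        task.parent.update_dt = now
```

With this, ordering a task list by update_dt would place a parent task at the position of its most recently updated sub-task.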

Skrattoune added a commit to Skrattoune/django-huey-monitor that referenced this issue Mar 25, 2022
cf boxine#91: update update_dt of the parent task when a signal is received from a sub-task
jedie added a commit that referenced this issue Mar 30, 2022
#91 display progression for on-going parent_tasks
@jedie
Collaborator

jedie commented Mar 30, 2022

I therefore propose to update update_dt of the parent task when a signal is received from a sub-task

Sounds like a good idea!

jedie added a commit that referenced this issue Mar 30, 2022
Revert "#91 display progression for on-going parent_tasks"
@Skrattoune
Contributor Author

I saw you merged, then reverted.
Did it not work for you?

@Skrattoune
Contributor Author

  • We also need to check that the duration of the parent_task is updated according to the last update_dt (which is not the case in the version of huey-monitor currently running on my prod system)

I currently have a task that has been running for 10-15 hours and shows a duration of 30.5 seconds :-)

@jedie
Copy link
Collaborator

jedie commented Apr 11, 2022

I saw you merged, then reverted.
Did it not work for you?

Yes, see: #96 (comment)

I think it's all a little complicated.
First of all, we need to distinguish two things:

  1. The status of a task/sub-task (e.g.: "executing", "complete" etc.) -> This is the current problem here, isn't it?
  2. The progress values and total/percentage/throughput etc. -> For this we have cumulate2parents=True/False

Then we have two kinds of "task/sub-task" usage:

  1. The main task divides the work into chunks and starts a sub-task for every chunk -> cumulate2parents=True
  2. There are different jobs to do: a main task starts "independent" sub-tasks. Every sub-task has its own/different total count -> cumulate2parents=False

The first one is easy and is the currently supported implementation: the progress of all sub-tasks is cumulated to the main task. The test project contains the parallel_task() example here:

@task(context=True)
def parallel_sub_task(task, parent_task_id, item_chunk, **info_kwargs):
    """
    Useless example: Just calculate the SHA256 hash from all files
    """
    # Save relationship between the main and sub tasks:
    TaskModel.objects.set_parent_task(
        main_task_id=parent_task_id,
        sub_task_id=task.id
    )

    total_items = len(item_chunk)

    # Init progress information of this sub task:
    process_info = ProcessInfo(
        task, total=total_items, parent_task_id=parent_task_id, **info_kwargs
    )

    for entry in item_chunk:
        # ...do something with >entry< ...
        logger.info('process %s', entry)
        time.sleep(1)

        # Update sub and main task progress:
        process_info.update(n=1)

    logger.info('Chunk finish: %s', process_info)


@task(context=True)
def parallel_task(task, total=2000, task_num=3, **info_kwargs):
    """
    Example of a parallel processing task.
    """
    # Fill main task instance:
    ProcessInfo(task, total=total, **info_kwargs)

    # Generate some "data" to "process" in parallel Huey tasks
    process_data = list(range(total))

    # Split the file list into chunks and fire Huey tasks for every chunk:
    chunk_size = math.ceil(total / task_num)
    for no, chunk in enumerate(chunk_iterable(process_data, chunk_size), 1):
        # Start sub tasks
        logger.info('Start sub task no. %i', no)
        time.sleep(5)
        parallel_sub_task(parent_task_id=task.id, item_chunk=chunk, **info_kwargs)

For the second (your) case: we don't have an example in the test project. We should add an example to test everything.

About the main-/sub-task status: we have to implement some logic here for all use cases to display the "effective status" of the main task, e.g.:

  • If one of the sub-tasks is "executing" -> main task == "executing"
  • If all sub-tasks are "complete" -> main task "complete"
  • If all sub-tasks are "complete" and at least one sub-task is "failed" -> main task "failed"

I think this can be done independently in a separate PR.
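
One possible reading of the rules above, as a minimal sketch. The function name `effective_status` is hypothetical, "failed" is mapped to the signal name "error" as an assumption, and the precedence (a running sub-task wins over a failed one) is a choice made for this example, not something decided in this thread.

```python
def effective_status(sub_task_signals):
    """Derive a main-task status from the last signals of its sub-tasks.

    Returns None when there are no sub-tasks or when no rule applies.
    """
    signals = list(sub_task_signals)
    if not signals:
        return None
    if "executing" in signals:
        # Something is still running, so the main task is not finished.
        return "executing"
    if "error" in signals:
        # Nothing is running any more and at least one sub-task failed.
        return "error"
    if all(signal == "complete" for signal in signals):
        return "complete"
    return None
```

For example, `effective_status(["complete", "executing"])` yields "executing", while `effective_status(["complete", "error"])` yields "error".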

@Skrattoune
Contributor Author

Hi Jedie,

It took me some time to go through this. I must say that I don't fully follow your manipulations with PRs, reverts and all.
I tried to split the things I had implemented on my system into 3 PRs which could be self-sufficient.

I did integrate the test project as part of the testing files in PR #97.
I tried to mimic the way you are doing it as much as possible.

I agree with your analysis. For the last part (main-/sub-task status), I'm less comfortable with the implementation.

@Skrattoune
Contributor Author

Could you give me some feedback on this?

@Skrattoune
Contributor Author

Skrattoune commented Nov 28, 2022

About the main-/sub-task status: we have to implement some logic here for all use cases to display the "effective status" of the main task, e.g.:

  • If one of the sub-tasks is "executing" -> main task == "executing"
  • If all sub-tasks are "complete" -> main task "complete"
  • If all sub-tasks are "complete" and at least one sub-task is "failed" -> main task "failed"

I think this can be done independently in a separate PR.

Hi @jedie,

I think I found an "easy" solution for this:
Signals are associated with task models using tasks.store_signals.
I propose adding the following lines to tasks.store_signals:

        if task_model_instance.parent_task:
            update_task_instance(
                instance=task_model_instance.parent_task,
                last_signal=last_signal,
            )

That way, the last_signal will be associated with the parent task in addition to the sub-task:

  • If all sub-tasks are "complete" -> main task "complete" (because the last signal will be "complete")
  • If one of the sub-tasks sends an "executing" signal -> main task == "executing"
  • If the last signal from a sub-task is "complete": in most cases another sub-task is launched and the status turns to "executing" again. If not, the status is "complete", which is already the case today
  • If one of the sub-tasks sends an "error" signal -> main task == "error".
    With this solution, the "error" status only stays if no new signal is associated with the parent task

We could easily prevent this "error" status from being overwritten, for instance by checking at the beginning of tasks.update_task_instance:

if instance.state.signal_name == 'error':
    return

The main drawback for me would be that we lose the information that there is still a task running,
and as a consequence that it is unwise to kill the Huey server, for instance to fix a bug.

So I would prefer the situation where the main task is always associated with the latest signal of the sub-tasks.
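
The difference between the two options can be sketched with plain strings, independent of the real models. `propagate_signal` and `replay` are hypothetical helpers written for this comparison only; they do not exist in huey-monitor.

```python
def propagate_signal(parent_signal, sub_task_signal, sticky_error=False):
    """Return the parent's new last signal after a sub-task signal arrives."""
    if sticky_error and parent_signal == "error":
        # Option "protect error": once in error, the parent stays in error.
        return parent_signal
    # Option "latest signal wins": the parent mirrors the newest signal.
    return sub_task_signal


def replay(sub_task_signals, sticky_error=False):
    """Apply a sequence of sub-task signals to a parent, return its status."""
    parent_signal = None
    for signal in sub_task_signals:
        parent_signal = propagate_signal(parent_signal, signal, sticky_error)
    return parent_signal
```

Replaying "executing", "error", "executing" leaves the parent at "executing" when the latest signal wins (the running sub-task stays visible), and at "error" when the error is protected (the failure stays visible, but the running sub-task does not).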

What do you think?
