Houdini PDG Afanasy Scheduler #514

Open
timurhai opened this issue Jul 6, 2021 · 9 comments

@timurhai
Member

timurhai commented Jul 6, 2021

Hi everybody!
This issue is for discussing the PDG Afanasy scheduler implementation.
Since work on the scheduler has started, I decided to create a new issue for a more concrete discussion.

@timurhai timurhai self-assigned this Jul 6, 2021
timurhai added a commit that referenced this issue Jul 6, 2021
@timurhai
Member Author

timurhai commented Jul 6, 2021

Here is the first commit.

It uses a dynamic method: when a work item is scheduled, a new block/task is appended to an existing job.

The work items of each TOP node are grouped into one block. In the future we should be able to set up a TOP node's Afanasy task parameters via block parameters (capacity, service, parser and so on). It also helps to visualize the job structure in GUIs.

For now, the "control" job is just an empty task: an open Houdini scene with a running graph is needed.
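
To illustrate the block/task structure described above, here is a minimal sketch in plain Python. It does not use the Afanasy API; `JobSketch`, `append_task` and the node and work item names are hypothetical and only model the idea of one block per TOP node, with a task appended each time a work item is scheduled.

```python
from collections import OrderedDict


class JobSketch:
    """Hypothetical in-memory stand-in for an Afanasy job (not the real API)."""

    def __init__(self, name):
        self.name = name
        # One block per TOP node, kept in the order the nodes first appear.
        self.blocks = OrderedDict()

    def append_task(self, top_node_name, work_item_name):
        """Append a task to the node's block, creating the block on first use.

        Returns (block_index, task_index) of the new task.
        """
        if top_node_name not in self.blocks:
            self.blocks[top_node_name] = []
        tasks = self.blocks[top_node_name]
        tasks.append(work_item_name)
        block_index = list(self.blocks).index(top_node_name)
        return block_index, len(tasks) - 1


# Simulate work items being scheduled one by one (names are made up).
job = JobSketch("pdg_cook")
job.append_task("ropfetch1", "ropfetch1_0")
job.append_task("ropfetch1", "ropfetch1_1")
job.append_task("ffmpegencodevideo1", "ffmpegencodevideo1_0")
for node_name, tasks in job.blocks.items():
    print(node_name, tasks)
```

Keeping the block index stable per TOP node is what would later allow per-node parameters (capacity, service, parser) to be set on the block rather than on each task.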

@timurhai
Member Author

timurhai commented Jul 6, 2021

There is a lot of work to do.
Only a few features/callbacks are implemented, and there are no checks for errors that can happen.
For now it is a minimal version that works only if everything goes OK.

timurhai added a commit that referenced this issue Jul 6, 2021
timurhai added a commit that referenced this issue Jul 7, 2021
…n each cook start.

Before this, the scheduler worked just once, then a Houdini restart was needed.

Comments added on virtual functions (callbacks).

References #514.
@lithorus
Member

lithorus commented Jul 7, 2021

Would it perhaps be an idea to create the scheduler 100% in Python and replace the .hda?

This way it's easier to subclass it to make customizations.

@timurhai
Member Author

timurhai commented Jul 8, 2021

If it is possible, that would be better.
Is it possible? (Maybe I missed something.)

timurhai added a commit that referenced this issue Jul 8, 2021
Now a new Afanasy job will be created on the first item onSchedule.

Job, block and task creation are in separate functions.

References #514.
@lithorus
Member

lithorus commented Jul 8, 2021

Yes, look at the other schedulers, in the templateBody class method.

I really hope they extend this beyond just TOPs.

@timurhai
Member Author

timurhai commented Jul 8, 2021

It seems that layout is not supported by templateBody:
https://www.sidefx.com/forum/topic/74776/?page=1#post-318968

@lithorus
Member

lithorus commented Jul 8, 2021

Hmmm.. I will try and see if something can be done through "on creation" callbacks..

timurhai added a commit that referenced this issue Jul 13, 2021
References: #514.

(documentation just started - header added)
@timurhai
Member Author

"Submit Job As Graph" sends a job with 1 block and 1 task to cook the TOP network.
This job will create another, separate job and dynamically append job/tasks to it.
It works the same as cooking from a Houdini session (the task command does the same thing).
So you can re-cook without opening Houdini if you delete the work items job and restart the graph job.

timurhai added a commit that referenced this issue Jul 14, 2021
On a farm, an artist's machine may not be reachable by name.

References: #514.
@timurhai
Member Author

By default, workItemResultServerAddr() returns the local host name and port.
This address is used to notify PDG (in an open Houdini session) that an item is done.
This is needed because an Afanasy task may still be running when an item in a batch is done.
This way PDG can start rendering as soon as the first frames of a simulation are finished, without waiting for the entire simulation task.

But on our farm, an artist's machine is not reachable by name, only by IP.

The approach for finding the local IP address is taken from:
https://stackoverflow.com/questions/166506/finding-local-ip-addresses-using-pythons-stdlib?page=1&tab=votes#tab-top

It may be better to create an option (checkbox) for this on the scheduler node.
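
For reference, one commonly used standard-library way to find the local IP address is the UDP "connect" trick sketched below. This is an illustration and not necessarily the exact code used in the commit; the function name `get_local_ip` and the probe address are assumptions. No packet is actually sent: connecting a UDP socket only makes the OS pick the outgoing interface.

```python
import socket


def get_local_ip(probe_addr=("10.255.255.255", 1)):
    """Return the IP of the interface the OS would use to reach probe_addr.

    probe_addr is an arbitrary (assumed) address; it does not need to be
    reachable, because a UDP connect() sends no data.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(probe_addr)
        ip = s.getsockname()[0]
    except OSError:
        # No usable route (e.g. no network at all): fall back to loopback.
        ip = "127.0.0.1"
    finally:
        s.close()
    return ip


print(get_local_ip())
```

On a machine with several network interfaces the result depends on the routing table, which is one more reason to make this behavior optional on the scheduler node.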

timurhai added a commit that referenced this issue Jul 20, 2021
Now you can specify the graph job capacity and hosts mask to run it on some slow machine,
or to run it on the same host and at the same time as the work item tasks.

References #514.
timurhai added a commit that referenced this issue Jul 26, 2021
timurhai added a commit that referenced this issue Jul 26, 2021
This can help to assign the cook job to special host(s) and use a special ticket (license).

References #514.
timurhai added a commit that referenced this issue Jul 27, 2021
The default service is hbatch and the default parser is mantra.
If it is a ropfetch with mantra, the service is hbatch_mantra.
If it is an ffmpegencodevideo, the service and parser are ffmpeg.
Those are all the checks for now.
This function will be improved in the future.

References #514.
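
The rules in this commit message can be sketched as a small function. This is only an illustration of the described checks, not the code from the commit: the function name `guess_service_and_parser` and its arguments are hypothetical, and how the scheduler actually detects that a ropfetch points to a mantra ROP is not shown here.

```python
def guess_service_and_parser(top_node_type, rop_type=None):
    """Pick an Afanasy service and parser for a work item.

    top_node_type: TOP node type name, e.g. "ropfetch" (assumed input).
    rop_type: type of the fetched ROP, e.g. "mantra" (assumed input).
    """
    # Defaults named in the commit message.
    service, parser = "hbatch", "mantra"

    if top_node_type == "ropfetch" and rop_type == "mantra":
        # A ropfetch with mantra gets a dedicated service; parser stays default.
        service = "hbatch_mantra"
    elif top_node_type == "ffmpegencodevideo":
        # Both the service and the parser are ffmpeg.
        service = parser = "ffmpeg"

    return service, parser


print(guess_service_and_parser("genericgenerator"))
print(guess_service_and_parser("ropfetch", "mantra"))
print(guess_service_and_parser("ffmpegencodevideo"))
```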
timurhai added a commit that referenced this issue Jul 27, 2021
This is a default parameter on scheduler nodes.
And it behaves as on other schedulers.

References: #514.
timurhai added a commit that referenced this issue Jul 28, 2021
Report work item fail on error.
Block on failed work items.
Use IP address as working item result server.
Tick Period and Max Items Per Tick.

References: #514.
timurhai added a commit that referenced this issue Aug 3, 2021
Scheduler node screenshot for documentation added.

References: #514.
timurhai added a commit that referenced this issue Aug 4, 2021
timurhai added a commit that referenced this issue Aug 6, 2021
timurhai added a commit that referenced this issue Aug 10, 2021
References: #514.
timurhai added a commit that referenced this issue Aug 12, 2021
timurhai added a commit that referenced this issue Aug 13, 2021
Task environment variables are passed to parser.

References: #514.
timurhai added a commit that referenced this issue Aug 16, 2021

3 participants