Using PFIO server in new History3G design (stage, done, wait) #2792

Open
bena-nasa opened this issue Apr 25, 2024 · 12 comments
Labels
📈 MAPL3 MAPL 3 Related ❓ Question Further information is requested

Comments

@bena-nasa
Collaborator

bena-nasa commented Apr 25, 2024

Currently, the PFIO server of any variety uses the collective_stage_data method to stage data to be sent to the output server. Then, after all staging has been done, the client must call

      call o_Clients%done_collective_stage(_RC)
      call o_Clients%post_wait()

This presents a problem for the proposed History3G design. The individual collection components would call collective_stage_data. However, the logical place for the done and wait calls is in the main History component, after all the children have run, not in the collection components.

However, we also have no need to call done and wait if no component actually wrote anything. But the parent has no way to know this; that information is the province of the child...

Just putting this here so we don't forget about it.

@bena-nasa bena-nasa added ❓ Question Further information is requested 📈 MAPL3 MAPL 3 Related labels Apr 25, 2024
@tclune
Collaborator

tclune commented Apr 30, 2024

Interesting. Any idea if we lose any performance in practice by always posting done/wait?

Presumably old History has some check to see if any collections are active in a time step. It would not be unreasonable to have such logic in 3G. The gridcomp is awfully thin as is. Almost begging for a bit more responsibility.

@bena-nasa
Collaborator Author

bena-nasa commented May 3, 2024

@tclune
Right, in the old History we only call done/wait if SOME collection wrote that timestep; otherwise it is not called. I would think we would want this in the new History. It's just that the logical place is in HistoryGridComp, NOT HistoryCollectionGridComp, so how would HistoryGridComp know if any of its children actually wrote? I could parse the same file and make the same alarms, but that seems like it's just duplicating the same logic...

As for performance, the question is: if we call done/wait all the time in HistoryGridComp, and the front end of the server is somehow still transmitting data to the backend, would that cause the front end to be unresponsive and block on the client side?

@weiyuan-jiang do you know the answer to that offhand?
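
For illustration, the old-History behavior described above (issue done/wait only if some collection wrote in that time step) can be sketched as follows. This is a minimal, self-contained mock and not MAPL or PFIO code: the `wrote` array, the `finish_step` routine, and the call counter are hypothetical names, and the real `o_Clients` calls appear only in a comment.

```fortran
! Hypothetical sketch: none of these names are MAPL or PFIO API.
program any_collection_wrote
   implicit none
   logical :: wrote(3)        ! one flag per collection (assumed bookkeeping)
   integer :: n_done_calls    ! counts how often done/wait would be issued

   n_done_calls = 0

   ! Time step 1: no collection staged anything.
   wrote = .false.
   call finish_step(wrote, n_done_calls)

   ! Time step 2: the second collection staged data.
   wrote = [.false., .true., .false.]
   call finish_step(wrote, n_done_calls)

   print *, 'done/wait issued', n_done_calls, 'time(s)'
   if (n_done_calls /= 1) error stop 'expected exactly one done/wait'

contains

   subroutine finish_step(wrote, n_done_calls)
      logical, intent(in) :: wrote(:)
      integer, intent(inout) :: n_done_calls
      if (any(wrote)) then
         ! Stand-in for the real calls:
         !   call o_Clients%done_collective_stage(_RC)
         !   call o_Clients%post_wait()
         n_done_calls = n_done_calls + 1
      end if
   end subroutine finish_step

end program any_collection_wrote
```

The open question in this thread is where the `wrote` information would come from in History3G, since only the collection components know it.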

@tclune
Collaborator

tclune commented May 3, 2024

> I could parse the same file and make the same alarms, but that seems like it's just duplicating the same logic...

Ideally we will find something a bit more clever, but it may well come down to something like this.

@tclune
Collaborator

tclune commented May 3, 2024

> As for performance, the question is: if we call done/wait all the time in HistoryGridComp, and the front end of the server is somehow still transmitting data to the backend, would that cause the front end to be unresponsive and block on the client side?

Yes - probably cannot take that risk.

@tclune
Collaborator

tclune commented May 3, 2024

Maybe the client itself can decide that it has nothing in that iteration? Not good probably, but just trying to get more ideas on the table.

@bena-nasa
Collaborator Author

bena-nasa commented May 3, 2024

> Maybe the client itself can decide that it has nothing in that iteration? Not good probably, but just trying to get more ideas on the table.

That seems like a good idea. One would think the client has to know whether it has something to do, and it could just not communicate with the server if it doesn't.

Actually, didn't we make this change on the "Input" side because of the spurious time issues that were confusing us with UFS?

@weiyuan-jiang
Contributor

The 'wait' probably would do no harm because there is nothing to wait for. But I need to check the 'done' message.

@bena-nasa
Collaborator Author

Shoot, it looks like we made a change in ExtData not to make this call, rather than at the client level :( Still, it seems like the client has to be aware of what it is going to do, in a way that it could say, don't do anything?

@weiyuan-jiang
Contributor

Yes. I think we did that before. The same logic should apply. Where is the code?

@weiyuan-jiang
Contributor

I remember that we used the size of IObundle to decide whether iclients calls done or not.

@bena-nasa
Collaborator Author

bena-nasa commented May 3, 2024

Yep, just found that. We made a change in ExtData not to make a call to collective_prefetch, rather than at the client level (unless you've subsequently done some more work, but I didn't see anything in the git history that would imply that in pfio). So that doesn't help us here. We either have to be able to make the client smarter, so it knows: you called done_collective_stage but there's nothing to do, so I don't need to talk to the server. I don't know how much abstraction this is buried under.

Otherwise back to the issue as I originally framed it.

@bena-nasa
Collaborator Author

bena-nasa commented May 3, 2024

@weiyuan-jiang thinks he can modify the client for both I and O so that done/wait simply do not communicate with the server if there is nothing in the queue to do, which could solve the issue.
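
A minimal sketch of that client-side idea, under stated assumptions: `MockClient` below is a hypothetical stand-in, not the PFIO client. Its method names mirror the calls discussed above, but the argument lists, the counters, and the `awaiting` flag are invented for illustration. The point is only that done and wait become no-ops when nothing was staged, so the parent History component could call them unconditionally.

```fortran
! Hypothetical mock: NOT the real PFIO client type or its interfaces.
module mock_client_mod
   implicit none

   type :: MockClient
      integer :: n_staged = 0        ! requests staged since the last done
      integer :: n_server_msgs = 0   ! messages that would reach the server
      logical :: awaiting = .false.  ! true only if a done was actually sent
   contains
      procedure :: collective_stage_data
      procedure :: done_collective_stage
      procedure :: post_wait
   end type MockClient

contains

   subroutine collective_stage_data(this)
      class(MockClient), intent(inout) :: this
      this%n_staged = this%n_staged + 1
   end subroutine collective_stage_data

   subroutine done_collective_stage(this)
      class(MockClient), intent(inout) :: this
      if (this%n_staged == 0) return   ! nothing queued: do not contact server
      this%n_server_msgs = this%n_server_msgs + 1
      this%n_staged = 0
      this%awaiting = .true.
   end subroutine done_collective_stage

   subroutine post_wait(this)
      class(MockClient), intent(inout) :: this
      if (.not. this%awaiting) return  ! no done was sent: nothing to wait on
      this%n_server_msgs = this%n_server_msgs + 1
      this%awaiting = .false.
   end subroutine post_wait

end module mock_client_mod

program demo_client_guard
   use mock_client_mod, only: MockClient
   implicit none
   type(MockClient) :: client

   ! Idle time step: the parent calls done/wait, but no child staged data.
   call client%done_collective_stage()
   call client%post_wait()
   if (client%n_server_msgs /= 0) error stop 'idle step contacted the server'

   ! Active time step: one child staged data before the parent's done/wait.
   call client%collective_stage_data()
   call client%done_collective_stage()
   call client%post_wait()
   if (client%n_server_msgs /= 2) error stop 'expected one done and one wait'

   print *, 'server messages sent:', client%n_server_msgs
end program demo_client_guard
```

With a guard like this inside the client, the parent would not need to know whether any child wrote, which is the information it lacks in the issue as originally framed.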
