This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

What happens if a job is killed while calling BB_StartTransfer()? #931

Closed
tonyhutter opened this issue Jun 10, 2020 · 2 comments

Comments

@tonyhutter

Question:

What happens if a job is killed while calling BB_StartTransfer()? Is it possible for a transfer to start on behalf of a job after that job has been killed? For example:

  1. job1 calls BB_StartTransfer().
  2. BB_StartTransfer() sends a message to bbserver saying "start the transfer".
  3. job1 dies before BB_StartTransfer() returns.
  4. job2 gets launched and is assigned job1's node and SSD.
  5. bbserver is really slow; it finally receives job1's "start the transfer" message and starts the transfer.

@tgooding
Contributor

I suspect you are referring to job steps rather than jobs here, but the answer is the same for a job containing a single step. The second job on the same node is orthogonal, as it uses a separate LV and a separate connection to bbProxy.

BBAPI calls are bundled and sent to bbProxy, which carries out the operation and sends a single response back to BBAPI when it's done (i.e., there are not multiple volleys).

If app death occurs before bbProxy receives the message, bbProxy will detect the socket close and clean up. No BB_StartTransfer will have occurred.

If app death occurs after bbProxy receives the message but before bbProxy has completed all the steps of the operation, the operation will still complete (i.e., it is not asynchronously terminated), but sending the response back to BBAPI will fail. bbProxy will then clean up its connections. In the case of a StartTransfer, assuming the parameters and files were OK, the transfer should start and complete as expected.
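
To make the two cases concrete, here is a minimal sketch in Python that imitates the single-request, single-response pattern over a local socket pair. It is not bbProxy or BBAPI code: `proxy_handle`, `simulate`, the outcome strings, and the `b"StartTransfer"` payload are all hypothetical names made up for illustration, and the real wire protocol is not shown in this thread. The sketch assumes a Unix-like system, where writing to a socket whose peer has closed raises an `OSError`.

```python
import socket


def proxy_handle(conn, started):
    """Hypothetical stand-in for the proxy side of one app connection."""
    request = conn.recv(1024)
    if not request:
        # The app closed its socket before any request arrived:
        # clean up, and no transfer is started.
        conn.close()
        return "cleaned-up-no-request"

    # The whole request arrived in one message, so the operation is
    # carried out to completion even if the app has since died.
    started.append(request.decode())

    try:
        conn.sendall(b"OK")  # the single response back to the app
        outcome = "response-sent"
    except OSError:
        # The app is gone; only the reply is lost, not the operation.
        outcome = "response-failed"
    finally:
        conn.close()
    return outcome


def simulate(app_sends_request, app_dies_before_response):
    """Run one app/proxy exchange and report what the proxy did."""
    app, proxy = socket.socketpair()
    started = []
    if app_sends_request:
        app.sendall(b"StartTransfer")
    if app_dies_before_response:
        app.close()  # models the job step being killed
    outcome = proxy_handle(proxy, started)
    if not app_dies_before_response:
        app.close()
    return outcome, started


if __name__ == "__main__":
    print(simulate(app_sends_request=False, app_dies_before_response=True))
    print(simulate(app_sends_request=True, app_dies_before_response=True))
    print(simulate(app_sends_request=True, app_dies_before_response=False))
```

In this model the transfer either never starts (the request never arrived) or starts exactly as requested (the request arrived, and only the reply may be lost), which matches the behavior described above.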

@tonyhutter
Author

@tgooding thanks for the write-up. It sounds like there's no possibility of a race condition then. Closing the issue.
