Strange start/end time issue with quick-running dependent jobs #324

Open
sb10 opened this issue Jul 1, 2020 · 0 comments
sb10 commented Jul 1, 2020

The following 2 jobs claim they started and ended in the same millisecond, and their output files had the same timestamp (though only to one-second accuracy). wr notes that the jobs took over 0.3s of walltime, so the reported start and end times are clearly bogus. The concern is whether they actually ran sequentially according to the dependency (one depends on the other).

[
  {
    "Key": "e248a45513c553563647f88a4490150f",
    "RepGroup": "28204-run_archival_in_progress",
    "DepGroups": [
      "run_archival_in_progress-28204-3489770802"
    ],
    "Dependencies": [
      "pipeline_start-28204-3486915445"
    ],
    "Cmd": "( npg_status2file --id_run 28204 --status \"archival in progress\" --dir_out /redacted ) 2>&1 | tee -a \"/redacted\"",
    "State": "complete",
    "Cwd": "/wr_cwd/e/2/4/8a45513c553563647f88a4490150f994389581/cwd",
    "CwdBase": "/tmp",
    "HomeChanged": false,
    "Behaviours": "{\"on_exit\":[{\"cleanup\":true}]}",
    "Mounts": "",
    "MonitorDocker": "",
    "ExpectedRAM": 2000,
    "ExpectedTime": 3600,
    "RequestedDisk": 0,
    "OtherRequests": [
      "rtimeout:1"
    ],
    "Cores": 0,
    "PeakRAM": 58,
    "PeakDisk": 0,
    "Exited": true,
    "Exitcode": 0,
    "FailReason": "",
    "Pid": 17523,
    "Host": "esa-analysis-20190114-server-ra6hafsqyxln",
    "HostID": "",
    "HostIP": "192.168.0.107",
    "Walltime": 0.384074345,
    "CPUtime": 0.366763,
    "Started": 1550143766,
    "Ended": 1550143766,
    "StdErr": "",
    "StdOut": "",
    "Env": null,
    "Attempts": 1,
    "Similar": 0
  },
  {
    "Key": "0921d0c35f7a38f4b61225c2aa2056cc",
    "RepGroup": "28204-run_qc_complete",
    "DepGroups": [
      "run_qc_complete-28204-3678272585"
    ],
    "Dependencies": [
      "run_run_archived-28204-3232308204"
    ],
    "Cmd": "( npg_status2file --id_run 28204 --status \"qc complete\" --dir_out /redacted ) 2>&1 | tee -a \"/redacted\"",
    "State": "complete",
    "Cwd": "/wr_cwd/0/9/2/1d0c35f7a38f4b61225c2aa2056cc159060972/cwd",
    "CwdBase": "/tmp",
    "HomeChanged": false,
    "Behaviours": "{\"on_exit\":[{\"cleanup\":true}]}",
    "Mounts": "",
    "MonitorDocker": "",
    "ExpectedRAM": 2000,
    "ExpectedTime": 3600,
    "RequestedDisk": 0,
    "OtherRequests": [
      "rtimeout:1"
    ],
    "Cores": 0,
    "PeakRAM": 59,
    "PeakDisk": 0,
    "Exited": true,
    "Exitcode": 0,
    "FailReason": "",
    "Pid": 13251,
    "Host": "esa-analysis-20190114-server-ra6hafsqyxln",
    "HostID": "",
    "HostIP": "192.168.0.107",
    "Walltime": 0.363033561,
    "CPUtime": 0.361349,
    "Started": 1550169077,
    "Ended": 1550169077,
    "StdErr": "",
    "StdOut": "",
    "Env": null,
    "Attempts": 1,
    "Similar": 0
  },
  {
    "Key": "851165b3bd805bfa2785dea1084affa5",
    "RepGroup": "28204-run_run_archived",
    "DepGroups": [
      "run_run_archived-28204-3232308204"
    ],
    "Dependencies": [
      "upload_fastqcheck_to_qc_database-28204-2674049854",
      "update_ml_warehouse-28204-1565627731"
    ],
    "Cmd": "( npg_status2file --id_run 28204 --status \"run archived\" --dir_out /redacted ) 2>&1 | tee -a \"/redacted\"",
    "State": "complete",
    "Cwd": "/wr_cwd/8/5/1/165b3bd805bfa2785dea1084affa5712177281/cwd",
    "CwdBase": "/tmp",
    "HomeChanged": false,
    "Behaviours": "{\"on_exit\":[{\"cleanup\":true}]}",
    "Mounts": "",
    "MonitorDocker": "",
    "ExpectedRAM": 2000,
    "ExpectedTime": 3600,
    "RequestedDisk": 0,
    "OtherRequests": [
      "rtimeout:1"
    ],
    "Cores": 0,
    "PeakRAM": 58,
    "PeakDisk": 0,
    "Exited": true,
    "Exitcode": 0,
    "FailReason": "",
    "Pid": 13245,
    "Host": "esa-analysis-20190114-server-ra6hafsqyxln",
    "HostID": "",
    "HostIP": "192.168.0.107",
    "Walltime": 0.393142191,
    "CPUtime": 0.363732,
    "Started": 1550169077,
    "Ended": 1550169077,
    "StdErr": "",
    "StdOut": "",
    "Env": null,
    "Attempts": 1,
    "Similar": 0
  }
]
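
One way to look at the ordering question is to compare each job's `Started` against the `Ended` of the jobs in the dep groups it depends on. The following is only a rough sketch, not part of wr: `check_dependency_order` is a hypothetical helper, the sample data is trimmed from the output above, and it assumes `Started` and `Ended` are integer Unix epoch seconds as they appear there. Equal values are reported as "ambiguous", because at that resolution they cannot show which job ran first.

```python
def check_dependency_order(jobs):
    """Compare each job's Started with the Ended of the jobs it depends on.

    Returns (dependent_key, dependency_key, verdict) tuples, where verdict is
    "ordered", "ambiguous" (timestamps equal) or "violated".
    Hypothetical helper for illustration; not part of wr.
    """
    # Map each dep group name to the jobs that belong to it.
    by_group = {}
    for job in jobs:
        for group in job.get("DepGroups") or []:
            by_group.setdefault(group, []).append(job)

    results = []
    for job in jobs:
        for dep in job.get("Dependencies") or []:
            # Dependencies on groups not present in the input are skipped.
            for parent in by_group.get(dep, []):
                if job["Started"] > parent["Ended"]:
                    verdict = "ordered"
                elif job["Started"] == parent["Ended"]:
                    verdict = "ambiguous"
                else:
                    verdict = "violated"
                results.append((job["Key"], parent["Key"], verdict))
    return results


# Fields trimmed from the job output above.
jobs = [
    {
        "Key": "e248a45513c553563647f88a4490150f",
        "DepGroups": ["run_archival_in_progress-28204-3489770802"],
        "Dependencies": ["pipeline_start-28204-3486915445"],
        "Started": 1550143766,
        "Ended": 1550143766,
    },
    {
        "Key": "0921d0c35f7a38f4b61225c2aa2056cc",
        "DepGroups": ["run_qc_complete-28204-3678272585"],
        "Dependencies": ["run_run_archived-28204-3232308204"],
        "Started": 1550169077,
        "Ended": 1550169077,
    },
    {
        "Key": "851165b3bd805bfa2785dea1084affa5",
        "DepGroups": ["run_run_archived-28204-3232308204"],
        "Dependencies": [
            "upload_fastqcheck_to_qc_database-28204-2674049854",
            "update_ml_warehouse-28204-1565627731",
        ],
        "Started": 1550169077,
        "Ended": 1550169077,
    },
]

results = check_dependency_order(jobs)
for dependent, dependency, verdict in results:
    print(dependent, "after", dependency, "->", verdict)
```

For the data above this reports the `run_qc_complete` job against the `run_run_archived` job as "ambiguous", which restates the concern: timestamps at this resolution cannot confirm the dependency order on their own.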
sb10 added the bug label Jul 1, 2020
sb10 added this to To Reconfirm in wr Jul 1, 2020