
Stream log to stdout? #599

Open
vsoch opened this issue Jun 17, 2023 · 14 comments

@vsoch
Contributor

vsoch commented Jun 17, 2023

Hiya! I'm wondering if it might be possible to do the equivalent of a flux run: instead of submitting and detaching, we hang in the terminal until the job is running and then stream its output to the terminal. The reason is that in a Kubernetes context we often run one-off jobs and don't have access to the filesystem (and with hq we couldn't know which worker was writing the log file), so we'd maybe want to do:

hq submit --wait --log <some directive for stdout?>

I was specifically looking here:

      --log <LOG>
          Stream the output of tasks into this log file

So I tried this:

$ hq submit --wait --log /dev/stdout lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite

but no cigar! I found hq job cat, but it looks like it's having issues without a shared filesystem:

$ hq job cat 4 stdout
2023-06-17T18:56:09Z WARN File `/opt/lammps/examples/reaxff/HNS/job-4/0.stdout` cannot be opened: Os { code: 2, kind: NotFound, message: "No such file or directory" }

It looks like --progress can show me a waiting bar, but that's akin to --wait:

# hq submit --nodes 2 --progress lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
Job submitted successfully, job ID: 6
2023-06-17T19:00:14Z INFO Waiting for 1 job with 1 task
[########################################] 0/1 jobs, 0/1 tasks (1 RUNNING)

I was able to shell into different workers and find the output there, although that's not ideal. But adding --log got it to write specifically to the server, and then I could use --wait so I know it's finished and "cat" the log at the end!

(screenshot of the terminal session)
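
For reference, a minimal sketch of that workaround (the log path here is a placeholder; the real commands are in the screenshot above):

$ hq submit --wait --log /tmp/job.log lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
$ cat /tmp/job.log    # only readable after the job has finished; the file lives on the server node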

It's not streaming (so we can't watch it while it runs), but it's definitely good for now! I think this headless use case (getting logs on stdout in real time) would be really fantastic, if you want to chat about it!

@spirali
Collaborator

spirali commented Jun 18, 2023

The log file is created by the server (workers stream to the server and it writes the data).

@Kobzol
Collaborator

Kobzol commented Jun 18, 2023

Hi, this is an interesting use-case. It's definitely something that could be useful, although it would be quite complex to implement, and we would have to think about various edge-cases (what should happen when the client disconnects?). I expect that this could take several weeks or months to implement, if we decide to do it.

Btw, how do you envision handling the output of multiple tasks? If you do something like hq job submit-stream, the output of that command would receive the outputs of all tasks (interleaved in pretty much random order). Would that be OK for your use case?

@vsoch
Contributor Author

vsoch commented Jun 18, 2023

At least for Flux, when there are multiple output streams, we still direct users to look at different log files.

@Kobzol
Collaborator

Kobzol commented Jun 18, 2023

Interesting. Let me ask something else then: what's the added value of this feature vs hq submit --wait && hq cat?

@vsoch
Contributor Author

vsoch commented Jun 18, 2023

This feature would show the log streaming in real time, whereas the wait-and-cat approach requires the job to finish completely before showing the log.

@Kobzol
Collaborator

Kobzol commented Jun 19, 2023

I see. I suppose that for your use-case it wouldn't be enough to read from the logfile in a streaming fashion instead, and you need the output to be streamed via the network connection between the server and the client?

@vsoch
Contributor Author

vsoch commented Jun 19, 2023

Yes, but please don’t consider this urgent or a priority, because the alternative of wait and cat is acceptable! For the upcoming experiments I’d like to run, #595 is most important.

@Kobzol
Collaborator

Kobzol commented Jun 19, 2023

I'm just trying to grasp the use-case :) Because you mentioned that if you have multiple tasks, and/or multiple streams (stdout/stderr), you probably want to look at separate files anyway. And it's quite unusual to use HQ to execute only a single task with a single stream (although it could be useful for debugging).

@vsoch
Contributor Author

vsoch commented Jun 19, 2023

> I'm just trying to grasp the use-case :) Because you mentioned that if you have multiple tasks, and/or multiple streams (stdout/stderr), you probably want to look at separate files anyway. And it's quite unusual to use HQ to execute only a single task with a single stream (although it could be useful for debugging).

I think for a lot of cases, you'd just be interested in the output streamed from the main runner, although it might be using others for the work. E.g., think of LAMMPS, or a workflow engine tool.

@Kobzol
Collaborator

Kobzol commented Jun 19, 2023

I can imagine that I would filter the streams so that I would only read the output of one "main" task. But I wonder how that main task would produce the output - in HQ, all tasks are independent and HQ is responsible for scheduling and executing them. The only way that one "special" task could produce output about the state of the computation is that it would have to read some external state (filesystem/DB) and run until it finds out that the other tasks are completed (?). It doesn't sound very compatible with the design of HQ, that's why I'm wondering how it would be used.
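
To make that concrete, a hypothetical "main" watcher task could look roughly like this (just a sketch; the shared directory and the sentinel-file convention are assumptions, not anything HQ provides):

# hypothetical watcher task: polls a shared directory until the other tasks have
# each written a "done" sentinel file, echoing the latest progress line meanwhile
EXPECTED=8
while [ "$(ls /shared/results/*.done 2>/dev/null | wc -l)" -lt "$EXPECTED" ]; do
    tail -n 1 /shared/results/progress.log 2>/dev/null
    sleep 5
done
echo "all $EXPECTED tasks finished"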

@vsoch
Contributor Author

vsoch commented Jun 19, 2023

So MPI would not work? E.g., running LAMMPS?

@Kobzol
Collaborator

Kobzol commented Jun 19, 2023

There are several approaches to combining MPI with HQ (a rough sketch follows the list):

  • You can do it in a way that is opaque to HQ, by just creating a task that will start MPI processes internally. This would work with the "main task streaming". But in this case you somehow need to also manage the other MPI nodes (if the MPI computation is distributed), and in that case it's arguable whether you even need HQ at all.
  • You can use MPI in a way that is visible to HQ, by creating multi-node tasks. In that case HQ starts tasks on multiple nodes and it is then your responsibility to execute the MPI computation on them. Again, this could in theory work with worker <-> client streaming, but if you create only one such task, then there's again probably no need for HQ. And if you create many of them, then it's not clear which output should be streamed.
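
A rough sketch of the two approaches (commands, paths, and counts are illustrative; the multi-node variant assumes HQ's HQ_NODE_FILE environment variable, which lists the hostnames allocated to the task):

# (1) opaque to HQ: a single task that starts the MPI ranks itself
$ hq submit --wait --log /tmp/job.log mpirun -np 4 lmp -in in.reaxc.hns

# (2) visible to HQ: a multi-node task; HQ reserves the nodes, and the task
#     launches MPI across them
$ hq submit --nodes 2 --wait --log /tmp/job.log \
    bash -c 'mpirun --hostfile "$HQ_NODE_FILE" -np 2 lmp -in in.reaxc.hns'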

@vsoch
Contributor Author

vsoch commented Jun 21, 2023

> You can do it in a way that is opaque to HQ, by just creating a task that will start MPI processes internally. This would work with the "main task streaming". But in this case you somehow need to also manage the other MPI nodes (if the MPI computation is distributed), and in that case it's arguable whether you even need HQ at all.

So to ask a very dumb question - if I give an mpirun command that hits an application, that seems to work using two nodes? Or am I just reading this wrong? E.g., here I'm looking at the logs - we submit the job (mpirun) to hq and ask for 2 nodes, and it reports 2 tasks (one thread each), which is what I would expect:

$ kubectl logs -n hyperqueue-operator hyperqueue-sample-server-0-0-jzj9m -f
Hello, I am a server with hyperqueue-sample-server-0-0
Found extra command mpirun -np 2 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
2023-06-21T01:08:08Z INFO No online server found, starting a new server
2023-06-21T01:08:08Z INFO Storing access file as '/root/.hq-server/001/access.json'
+------------------+-------------------------------------------------------------------------------+
| Server directory | /root/.hq-server                                                              |
| Server UID       | lLfkCy                                                                        |
| Client host      | hyperqueue-sample-server-0-0.hq-service.hyperqueue-operator.svc.cluster.local |
| Client port      | 6789                                                                          |
| Worker host      | hyperqueue-sample-server-0-0.hq-service.hyperqueue-operator.svc.cluster.local |
| Worker port      | 1234                                                                          |
| Version          | 0.15.0-dev                                                                    |
| Pid              | 19                                                                            |
| Start date       | 2023-06-21 01:08:08 UTC                                                       |
+------------------+-------------------------------------------------------------------------------+
hq submit --wait --name lammps --nodes 2 --log log.out mpirun -np 2 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
Job submitted successfully, job ID: 1
2023-06-21T01:08:13Z INFO Worker 1 registered from 10.244.0.27:47442
2023-06-21T01:08:19Z INFO Worker 2 registered from 10.244.0.25:58466
Wait finished in 19s 635ms 180us 48ns: 1 job finished
LAMMPS (29 Sep 2021 - Update 2)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  triclinic box = (0.0000000 0.0000000 0.0000000) to (22.326000 11.141200 13.778966) with tilt (0.0000000 -5.0260300 0.0000000)
  2 by 1 by 1 MPI processor grid
  reading atoms ...
  304 atoms
  reading velocities ...
  304 velocities
   read_data CPU = 0.003 seconds
Replicating atoms ...
  triclinic box = (0.0000000 0.0000000 0.0000000) to (44.652000 22.282400 27.557932) with tilt (0.0000000 -10.052060 0.0000000)
  2 by 1 by 1 MPI processor grid
  bounding box image = (0 -1 -1) to (0 1 1)
  bounding box extra memory = 0.03 MB
  average # of replicas added to proc = 5.00 out of 8 (62.50%)
  2432 atoms
  replicate CPU = 0.001 seconds
Neighbor list info ...
  update every 20 steps, delay 0 steps, check no
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 11
  ghost atom cutoff = 11
  binsize = 5.5, bins = 10 5 6
  2 neighbor lists, perpetual/occasional/extra = 2 0 0
  (1) pair reax/c, perpetual
      attributes: half, newton off, ghost
      pair build: half/bin/newtoff/ghost
      stencil: full/ghost/bin/3d
      bin: standard
  (2) fix qeq/reax, perpetual, copy from (1)
      attributes: half, newton off, ghost
      pair build: copy
      stencil: none
      bin: none
Setting up Verlet run ...
  Unit style    : real
  Current step  : 0
  Time step     : 0.1
Per MPI rank memory allocation (min/avg/max) = 143.9 | 143.9 | 143.9 Mbytes
Step Temp PotEng Press E_vdwl E_coul Volume 
       0          300   -113.27833    437.52118   -111.57687   -1.7014647    27418.867 
      10    299.38517   -113.27631    1439.2824   -111.57492   -1.7013813    27418.867 
      20    300.27107   -113.27884     3764.342   -111.57762   -1.7012247    27418.867 
      30    302.21063   -113.28428    7007.6629   -111.58335   -1.7009363    27418.867 
      40    303.52265   -113.28799    9844.8245   -111.58747   -1.7005186    27418.867 
      50    301.87059   -113.28324    9663.0973   -111.58318   -1.7000523    27418.867 
      60    296.67807   -113.26777    7273.8119   -111.56815   -1.6996137    27418.867 
      70    292.19999   -113.25435    5533.5522   -111.55514   -1.6992158    27418.867 
      80    293.58677   -113.25831    5993.4438   -111.55946   -1.6988533    27418.867 
      90    300.62635   -113.27925    7202.8369   -111.58069   -1.6985592    27418.867 
     100    305.38276   -113.29357    10085.805   -111.59518   -1.6983874    27418.867 
Loop time of 11.1452 on 2 procs for 100 steps with 2432 atoms

Performance: 0.078 ns/day, 309.589 hours/ns, 8.972 timesteps/s
99.9% CPU use with 2 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 7.8546     | 8.1361     | 8.4177     |   9.9 | 73.00
Neigh   | 0.22941    | 0.2298     | 0.23019    |   0.1 |  2.06
Comm    | 0.011889   | 0.29354    | 0.57519    |  52.0 |  2.63
Output  | 0.00022833 | 0.00026634 | 0.00030434 |   0.0 |  0.00
Modify  | 2.4842     | 2.4848     | 2.4853     |   0.0 | 22.29
Other   |            | 0.0007025  |            |       |  0.01

Nlocal:        1216.00 ave        1216 max        1216 min
Histogram: 2 0 0 0 0 0 0 0 0 0
Nghost:        7591.50 ave        7597 max        7586 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:        432912.0 ave      432942 max      432882 min
Histogram: 1 0 0 0 0 0 0 0 0 1

Total # of neighbors = 865824
Ave neighs/atom = 356.01316
Neighbor list builds = 5
Dangerous builds not checked
Total wall time: 0:00:11

Does all that seem OK / properly working to you? If so - this is really great! And my next question is - do you have a plan for a next release? I'm currently installing from the nightly release from today (the 20th).

@Kobzol
Collaborator

Kobzol commented Jun 21, 2023

> So to ask a very dumb question - if I give an mpirun command that hits an application, that seems to work using two nodes? Or am I just reading this wrong? E.g., here I'm looking at the logs - we submit the job (mpirun) to hq and ask for 2 nodes, and it reports 2 tasks (one thread each), which is what I would expect

It's not a dumb question ;) The case you posted is one where HQ is aware of MPI (or rather, it is aware of a multi-node task), and it looks reasonable. If you only want to execute a single job like this at a time, then streaming output to a client makes sense, although I'm not sure HQ helps much in this case - you could just run qsub or sbatch and mostly achieve the same result. The strength of HQ lies in giving you the ability to run a lot of tasks (at once), but that situation also kind of breaks the premise of "streaming the output of a single task to a client". In any case, we plan to refactor the streaming part of HQ, and while doing that we can also try to implement client streaming. But it's probably not something that will come soon-ish.

I think that we can put out a new release in the coming week(s), as there are more new features. CC @spirali
