
jobs unhack #112

Closed
mralusw opened this issue Oct 29, 2021 · 4 comments

Comments

@mralusw
Contributor

mralusw commented Oct 29, 2021

Not an issue, just some ideas here. While working on my copy of kak-bundle, I've come up with another approach to keeping track of jobs and avoiding complicated hacks.

The trick is to communicate via the filesystem (code from vvc() in that repo):

        > "$tmp_file.running"; >"$tmp_file".log
        ( ( "$@" ); rm -f "$tmp_file.running" ) >"$tmp_file".log 2>&1 3>&- &

For each job, you create a .running file in a tmpdir, and when the job ("$@") sub-subshell exits, the subshell removes it. With this, there's no need for wait() or for counting processes. You can simply count the number of *.running glob matches:

set -- "$tmp_dir"/*.running
[ $# != 1 ] || [ -e "$1" ] || set --  # glob failure
running=$#

When all jobs are done, $running = 0.
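
Putting the two pieces together, a minimal self-contained sketch of the pattern (the start_job / count_running names and the polling loop are made up for illustration; this is not the actual kak-bundle code):

#!/bin/sh
# Illustrative sketch of the marker-file approach; names are made up.
tmp_dir=$(mktemp -d) || exit 1

start_job() {                               # start_job <name> <cmd...>
    name=$1; shift
    tmp_file=$tmp_dir/$name
    > "$tmp_file".running                   # marker: job is alive
    ( ( "$@" ); rm -f "$tmp_file".running ) > "$tmp_file".log 2>&1 &
}

count_running() {
    set -- "$tmp_dir"/*.running
    [ $# != 1 ] || [ -e "$1" ] || set --    # glob failure => nothing running
    echo $#
}

start_job slow sleep 2
start_job fast sleep 1

while [ "$(count_running)" -gt 0 ]; do      # poll until every marker is gone
    sleep 1
done
echo "all jobs done; logs are in $tmp_dir"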

@andreyorst
Owner

Wouldn't this use a lot of file descriptors if we install an insane number of plugins in parallel? Not that this would be an issue, as there aren't as many plugins as there usually are available file descriptors, but still...

I don't think that the current implementation is particularly ugly or hacky. It's a bit verbose, though. The only thing that is really bothering me (although I haven't looked into it yet) is why I need to multiply the number of processes by 5 in the case of plug_update.

If this trickery could be moved into a function of some sort, I would not mind this change of algorithm. But it's a bit cryptic to me 😅

@mralusw
Contributor Author

mralusw commented Oct 29, 2021

Just to be clear, I'm not proposing that you change anything in plug. This is just food for thought / a reference for the future (you said at some point that you're contemplating a plug.kak rewrite, IIRC).

Wouldn't this use a lot of file descriptors if we install an insane number of plugins in parallel? Not that this would be an issue, as there aren't as many plugins as there usually are available file descriptors, but still...

Are you referring to the >job.log fd's? But you also have an unlimited number of background jobs which write log files; am I missing something obvious?

In kak-bundle, I've implemented a limit on the number of parallel jobs (see vvc()). Once that limit is reached, the code simply calls wait $! to let the current job (and possibly others) finish.
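
A rough illustration of that throttle, reusing start_job / count_running from the sketch above (fetch_one and max_jobs are stand-in names, not the literal vvc() code):

# Rough sketch of the throttle; fetch_one and max_jobs are stand-ins.
fetch_one() { sleep 1; }                       # placeholder for the real per-plugin work
max_jobs=4
for plugin in "$@"; do
    start_job "$plugin" fetch_one "$plugin"
    if [ "$(count_running)" -ge "$max_jobs" ]; then
        wait $!                                # limit reached: let the newest job finish first
    fi
done
wait                                           # drain whatever is still running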

I don't think that the current implementation is particularly ugly or hacky. It's a bit verbose, though. The only thing that is really bothering me (although I haven't looked into it yet) is why I need to multiply the number of processes by 5 in the case of plug_update.

That, and your comment "this is a hacky way to measure amount of active jobs", plus the linked Debian bug. The 5 probably comes from the way you background jobs inside backgrounded jobs. It's a good thing that it's stable.

I've been thinking about a better architecture for a while. wait is quite limited — it lacks a non-blocking mode. jobs suffers from the bug you linked to. Calling plug_load | kak -p relies on kak being in the path, under that exact name.

In kak-bundle, I let the background processes communicate back to kak (after the parent %sh{} exits) via the filesystem. And I've implemented a "periodic hook" for the *status* buffer which, well, periodically updates the displayed logs.

I dunno, I just thought you might be interested.

@mralusw mralusw closed this as completed Oct 29, 2021
@andreyorst
Owner

But you also have an unlimited number of background jobs which write log files; am I missing something obvious?

Yes, jobs are writing log files, but that's what we have to do, so that's not an issue. What I'm saying is that in the case of the current implementation, we only create 1 additional file for jobs, and write active jobs to it. In the case of your code, if I understand it correctly, each job opens a file descriptor for a .running file and holds on to it until the subshell is done.

That, and your comment "this is a hacky way to measure amount of active jobs", plus the linked Debian bug.

Yeah, I mean it's hackier than just calling jobs directly in the loop, as it involves writing to a file.
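
For reference, the single-jobs-file counting being discussed boils down to something like this (a rough sketch of the idea, not the literal plug.kak implementation):

# Rough sketch of counting active jobs via one shared file; not the
# literal plug.kak code.
jobs_file=$(mktemp)
jobs > "$jobs_file"                  # dump the shell's job table to one file
running=$(wc -l < "$jobs_file")      # one job per line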

Calling plug_load | kak -p relies on kak being in the path, under that exact name.

I believe there are tons of scripts that rely on this, so it should not be an issue. I mean, if someone renames kak or moves it out of the PATH, almost everything will break, and there's no way to work around it AFAIK.

Ideally I would like to move away from creating too many files, but I see it as a low priority for now.

@mralusw
Contributor Author

mralusw commented Oct 30, 2021

Yes, jobs are writing log files, but that's what we have to do, so that's not an issue. What I'm saying is that in the case of the current implementation, we only create 1 additional file for jobs, and write active jobs to it. In the case of your code, if I understand it correctly, each job opens a file descriptor for a .running file and holds on to it until the subshell is done.

No holding on to fd's (other than the log fd) in my trick. The file is created before the sub-subshell starts, and removed after the sub-subshell finishes. All in tmpfs storage, hopefully (though that's not available on termux, WSL, or non-optimized Linux).

Calling plug_load | kak -p relies on kak being in the path, under that exact name.

I believe there are tons of scripts that rely on this, so it should not be an issue. I mean, if someone renames kak or moves it out of the PATH, almost everything will break, and there's no way to work around it AFAIK.

I know. Practically speaking, it's true, but it seems misguided. I have a couple of kak binaries in /opt/kak/bin (debug build, clang / gcc, etc.). /usr/local/bin/kak is a script that sources %val{config}/kak.env (they're not expecting me to type KAKOUNE_POSIX_SHELL=... every time I start kak, right?).
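
For illustration, such a wrapper might look roughly like this (the kak.env location and the real binary's path are assumptions, not the actual script):

#!/bin/sh
# Hypothetical /usr/local/bin/kak wrapper along the lines described above;
# the env-file location and the real binary path are assumptions.
env_file=${XDG_CONFIG_HOME:-$HOME/.config}/kak/kak.env
[ -r "$env_file" ] && . "$env_file"      # e.g. export KAKOUNE_POSIX_SHELL=...
exec /opt/kak/bin/kak "$@"               # hand off to the real binary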

Given the widespread usage, kak should provide a %val{server_env_KAKOUNE_BIN}. But alas, that's another problem: there's no %val{server_env_*} and you cannot access env vars in kakrc (since there's no client, there's no %val{client_env_*} either). You have to issue a %sh{echo $envvar} in kakrc just to read the environment. Time for another issue.

Ideally I would like to move away from creating too many files, but I see it as a low priority for now.

You're only creating one jobs file; in practice it's robust, but in theory I'm not sure I could prove that it is, or that the 5x factor is reliable :)
