
jobs unhack #112

Closed
mralusw opened this issue Oct 29, 2021 · 4 comments

Comments

@mralusw
Contributor

mralusw commented Oct 29, 2021

Not an issue, just some ideas here. While working on my copy of kak-bundle, I've come up with another approach to keeping track of jobs and avoiding complicated hacks.

The trick is to communicate via the filesystem (code from vvc() in that repo):

        > "$tmp_file.running"; >"$tmp_file".log
        ( ( "$@" ); rm -f "$tmp_file.running" ) >"$tmp_file".log 2>&1 3>&- &

For each job, you create a .running file in a tmpdir, and when the job ("$@") sub-subshell exits, the subshell removes it. With this, there's no need for wait() or for counting processes. You can simply count the number of *.running glob matches:

set -- "$tmp_dir"/*.running
[ $# != 1 ] || [ -e "$1" ] || set --  # glob failure
running=$#

When all jobs are done, $running = 0.
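
Putting the two pieces together, a minimal self-contained sketch of the pattern (the start_job / count_running names and the polling loop are made up for illustration; this is not the actual kak-bundle code):

#!/bin/sh
# Illustrative sketch of the marker-file approach; names are made up.
tmp_dir=$(mktemp -d) || exit 1

start_job() {                               # start_job <name> <cmd...>
    name=$1; shift
    tmp_file=$tmp_dir/$name
    > "$tmp_file".running                   # marker: job is alive
    ( ( "$@" ); rm -f "$tmp_file".running ) > "$tmp_file".log 2>&1 &
}

count_running() {
    set -- "$tmp_dir"/*.running
    [ $# != 1 ] || [ -e "$1" ] || set --    # glob failure => nothing running
    echo $#
}

start_job slow sleep 2
start_job fast sleep 1

while [ "$(count_running)" -gt 0 ]; do      # poll until every marker is gone
    sleep 1
done
echo "all jobs done; logs are in $tmp_dir"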

@andreyorst
Owner

Wouldn't this use a lot of file descriptors if we install an insane number of plugins in parallel? Not that this would be an issue, as there aren't as many plugins as there usually are available file descriptors, but still...

I don't think that the current implementation is particularly ugly or hacky. It's a bit verbose, though. The only thing that is really bothering me (although I haven't looked into it yet) is why I need to multiply the number of processes by 5 in the case of plug_update.

If this trickery could be moved into a function of some sort, I would not mind this change of algorithm. But it's a bit cryptic to me 😅

@mralusw
Contributor Author

mralusw commented Oct 29, 2021

Just to be clear, I'm not proposing that you change anything in plug. This is just food for thought / a reference for the future (you said at some point that you're contemplating a plug.kak rewrite, IIRC).

Wouldn't this use a lot of file descriptors if we install an insane number of plugins in parallel? Not that this would be an issue, as there aren't as many plugins as there usually are available file descriptors, but still...

Are you referring to the >job.log fd's? But you also have an unlimited number of background jobs which write log files; am I missing something obvious?

In kak-bundle, I've implemented a limit on the number of parallel jobs (see vvc()). Once that limit is reached, the code simply calls wait $! to let the current job (and possibly others) finish.
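
A rough illustration of that throttle, reusing start_job / count_running from the sketch above (fetch_one and max_jobs are stand-in names, not the literal vvc() code):

# Rough sketch of the throttle; fetch_one and max_jobs are stand-ins.
fetch_one() { sleep 1; }                       # placeholder for the real per-plugin work
max_jobs=4
for plugin in "$@"; do
    start_job "$plugin" fetch_one "$plugin"
    if [ "$(count_running)" -ge "$max_jobs" ]; then
        wait $!                                # limit reached: let the newest job finish first
    fi
done
wait                                           # drain whatever is still running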

I don't think that the current implementation is particularly ugly or hacky. It's a bit verbose, though. The only thing that is really bothering me (although I haven't looked into it yet) is why I need to multiply the number of processes by 5 in the case of plug_update.

That, and your comment "this is a hacky way to measure amount of active jobs", plus the linked Debian bug. The 5 probably comes from the way you background jobs inside backgrounded jobs. It's a good thing that it's stable.

I've been thinking about a better architecture for a while. wait is quite limited — it lacks a non-blocking mode. jobs suffers from the bug you linked to. Calling plug_load | kak -p relies on kak being in the path, under that exact name.

In kak-bundle, I let the background processes communicate back to kak (after the parent %sh{} exits) via the filesystem. And I've implemented a "periodic hook" for the *status* buffer which, well, periodically updates the displayed logs.

I dunno, I just thought you might be interested.

@mralusw mralusw closed this as completed Oct 29, 2021
@andreyorst
Owner

But you also have an unlimited number of background jobs which write log files; am I missing something obvious?

Yes, jobs are writing log files, but that's what we have to do, so that's not an issue. What I'm saying is that in the case of the current implementation, we only create 1 additional file for jobs, and write active jobs to it. In the case of your code, if I understand it correctly, each job opens a file descriptor for a .running file and holds on to it until the subshell is done.

That, and your comment "this is a hacky way to measure amount of active jobs", plus the linked Debian bug.

Yeah, I mean it's hackier than just calling jobs directly in the loop, as it involves writing to a file.
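
For reference, the single-jobs-file counting being discussed boils down to something like this (a rough sketch of the idea, not the literal plug.kak implementation):

# Rough sketch of counting active jobs via one shared file; not the
# literal plug.kak code.
jobs_file=$(mktemp)
jobs > "$jobs_file"                  # dump the shell's job table to one file
running=$(wc -l < "$jobs_file")      # one job per line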

Calling plug_load | kak -p relies on kak being in the path, under that exact name.

I believe there are tons of scripts that rely on this, so it should not be an issue. I mean, if someone renames kak or moves it out of the PATH, almost everything will break, and there's no way to work around it AFAIK.

Ideally I would like to move away from creating too many files, but I see it as a low priority for now.

@mralusw
Contributor Author

mralusw commented Oct 30, 2021

Yes, jobs are writing log files, but that's what we have to do, so that's not an issue. What I'm saying is that in the case of the current implementation, we only create 1 additional file for jobs, and write active jobs to it. In the case of your code, if I understand it correctly, each job opens a file descriptor for a .running file and holds on to it until the subshell is done.

No holding on to fd's (other than the log fd) in my trick. The file is created before the sub-subshell starts, and removed after the sub-subshell finishes. All in tmpfs storage, hopefully (though that's not available on termux, WSL, or non-optimized Linux).

Calling plug_load | kak -p relies on kak being in the path, under that exact name.

I believe there are tons of scripts that rely on this, so it should not be an issue. I mean, if someone renames kak or moves it out of the PATH, almost everything will break, and there's no way to work around it AFAIK.

I know. Practically speaking, it's true, but it seems misguided. I have a couple of kak binaries in /opt/kak/bin (debug build, clang / gcc, etc.). /usr/local/bin/kak is a script that sources %val{config}/kak.env (they're not expecting me to type KAKOUNE_POSIX_SHELL=... every time I start kak, right?).
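
For illustration, such a wrapper might look roughly like this (the kak.env location and the real binary's path are assumptions, not the actual script):

#!/bin/sh
# Hypothetical /usr/local/bin/kak wrapper along the lines described above;
# the env-file location and the real binary path are assumptions.
env_file=${XDG_CONFIG_HOME:-$HOME/.config}/kak/kak.env
[ -r "$env_file" ] && . "$env_file"      # e.g. export KAKOUNE_POSIX_SHELL=...
exec /opt/kak/bin/kak "$@"               # hand off to the real binary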

Given the widespread usage, kak should provide a %val{server_env_KAKOUNE_BIN}. But alas, that's another problem: there's no %val{server_env_*} and you cannot access env vars in kakrc (since there's no client, there's no %val{client_env_*} either). You have to issue a %sh{echo $envvar} in kakrc just to read the environment. Time for another issue.

Ideally I would like to move away from creating too many files, but I see it as a low priority for now.

You're only creating one jobs file; in practice it's robust, but in theory I'm not sure I could prove that it is, or that the 5x factor is reliable :)
