process substitution / anonymous named pipes #66

Closed
warpfork opened this issue Jul 7, 2012 · 9 comments

@warpfork

warpfork commented Jul 7, 2012

I have a situation where I want to call one program which can only accept a certain kind of input via a file which must be named in the arguments, but I want to feed it content generated from another program.

In other words, I have a situation that would be expressed in bash with a process substitution like this:

 tail -f <(echo "generated")

(Relevant: https://en.wikipedia.org/wiki/Process_substitution )

In python, I can solve this with a tempfile fairly easily.

A step better: I can also solve it fairly easily with a named pipe created by a mkfifo call, which keeps the data in memory instead of needlessly writing it to the filesystem.

However, that still leaves something to be desired; I have to pick a name for my fifo, and I have to remove it again when I'm done. If I get SIGKILL, I leave a dangling fifo hanging around on my filesystem. What would really be excellent is if I could tap into the magic stuff in the /proc/$pid/fd and /dev/fd/$fd areas common in a linux world... that would give me a system where the kernel itself is functioning as my cleanup.
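For reference, the named-pipe approach described above might look roughly like the following. This is only a sketch: `cat_via_fifo` and the file name `input.fifo` are made up for illustration, and `cat` stands in for whatever program insists on a filename argument. It shows the bookkeeping being complained about: a name has to be chosen and the cleanup in `finally` never runs if the process gets SIGKILL.

```python
import os
import shutil
import subprocess
import tempfile
import threading


def cat_via_fifo(data: bytes) -> bytes:
    """Feed `data` to `cat <fifo>` through a named pipe and return cat's output."""
    tmpdir = tempfile.mkdtemp()                 # private directory for the fifo
    fifo_path = os.path.join(tmpdir, "input.fifo")
    os.mkfifo(fifo_path)
    try:
        def writer():
            # Opening a fifo for writing blocks until a reader opens it,
            # so the write has to happen off the main thread.
            with open(fifo_path, "wb") as w:
                w.write(data)

        t = threading.Thread(target=writer, daemon=True)
        t.start()
        result = subprocess.run(["cat", fifo_path],
                                stdout=subprocess.PIPE, check=True)
        t.join()
        return result.stdout
    finally:
        # Manual cleanup; skipped entirely if this process is killed with SIGKILL.
        shutil.rmtree(tmpdir)
```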

That example of process substitution in bash up above does something clever like that. If you run that example and then look at what actually happened with ps, you'll see something like this:

tail -f /dev/fd/63

Bash created a fifo somewhere where I don't have to worry about it (I think it's somewhere under /proc/ so it just goes away when the processes die?); stdout of the echo writes into the fifo and the reading end of the fifo is made into file descriptor 63 for tail. And then the "/dev/fd/63" part is magic that happens to be a name for the fifo that is fd 63 to the current process.

What would really be excellent is if I could tap into the same level of magic up in the python world.

In the course of writing this, I ended up realizing that I can use "/dev/fd/0" as an argument to get a program to read its own standard in as a file, and since I don't happen to be using stdin already in my current case, this solves my immediate problem. A more general solution would still be excellent, though, and for that we would need the ability to pass arbitrarily numbered file descriptors into child processes, instead of being limited to stdin/stdout/stderr aka 0/1/2.

Also, I'm not sure how portable the "/dev/fd/$fd" stuff is; I feel a little uncomfortable hardcoding that in, and bash takes care of it for me, but I have no idea how I'd go about finding out in a cross platform way what the location is for the magic filenames-to-selfprocess-file-descriptors.
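On the portability question, one cautious option is to probe for the directory at runtime rather than hardcode it. The helper below is a hypothetical sketch (`fd_path` is not part of pbs or the standard library); it assumes that a platform offering this feature exposes it as either `/dev/fd` or `/proc/self/fd`, which holds on Linux and macOS but is not guaranteed everywhere.

```python
import os


def fd_path(fd: int) -> str:
    """Return a filename that names descriptor `fd` of whichever process opens it."""
    # Assumption: only these two well-known locations are checked.
    for base in ("/dev/fd", "/proc/self/fd"):
        if os.path.isdir(base):
            return "%s/%d" % (base, fd)
    raise OSError("no /dev/fd or /proc/self/fd on this platform")
```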

@amoffat
Owner

amoffat commented Jul 10, 2012

I kind of follow what you're getting at. Could you write up a few pbs example use-cases here of how you envision it? It will become more clear to me then.

Are you thinking of something like this?

import pbs
pbs.tail(pbs.cat("/tmp/test", _out="fifo"))

@warpfork
Author

Okay, so suppose for example I want to do something like this in bash (this was my original use case):

git config -f <(curl http://raw.github.com/heavenlyhash/projectWhatever/master/.gitmodules) -l

Now I want to do that in PBS, and it's a little tough, but what I ended up hacking into being was this:

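import urllib  # Python 2; on Python 3 use urllib.request.urlopen
from contextlib import closing
from pbs import git

# githubRawUrl (defined elsewhere) is the raw URL prefix of the repository
with closing(urllib.urlopen(githubRawUrl + "/.gitmodules")) as f:
    remoteModulesStr = f.read()
git.config("-f", "/dev/fd/0", "-l", _in=remoteModulesStr)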

And that works, because /dev/fd/0 is already a magic file in my system that is a fifo that will read from standard in of that process.

In the more general case though, what if stdin is already used by that process for something special? Or I want to do

diff <(curl http://thingy.com/resource1)  <(curl http://thingy.com/resource2)

Now that trick with stdin won't work; I need other channels, or several of them.

To see what bash is doing here, you can do something like this:

diff <(tail -f /dev/null) <(tail -f /dev/null) &
ps -f | grep diff

...and you'll see something like "diff /dev/fd/63 /dev/fd/62". Possibly exactly that.

So, the most direct way to expose this from pbs might look like this:

pbs.diff("/dev/fd/63", "/dev/fd/62", __63=inMemStrA, __62=inMemStrB)

That's a little ugly. Cooler would be maybe more like...

pbs.diff(pbs.stream(inMemStrA), pbs.stream(inMemStrB));

Actually, coolest might be somewhere in the middle. Gimme a syntax to pass arbitrarily numbered channels in and out (and then _in, _out, and _err become mere special cases of that system and are synonymous to __0, __1, and __2), just in case I'm interfacing with some crazy program that uses the higher numbers. Then also have a wrapper object that makes the pbs command invocation step aware that there's something here that should be shunted via an anonymous pipe, and there hide all the numbers (and more importantly, the /dev/ shenanigans) from the library user.
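Outside of pbs, the wrapper-object idea can be approximated with the standard library alone. The following is a rough sketch, not a pbs API: `Stream` and `run_with_streams` are hypothetical names, and it assumes Python 3.2+ (for `pass_fds`) and a Linux-style `/dev/fd`. Each `Stream` argument gets its own anonymous pipe, the read end is handed to the child under its own number, and the argument is replaced by the matching `/dev/fd/N` path.

```python
import os
import subprocess
import threading


class Stream:
    """Marks an argument whose bytes should reach the child via an anonymous pipe."""

    def __init__(self, data: bytes):
        self.data = data


def _feed(fd: int, data: bytes) -> None:
    # Write everything, then close the write end so the child sees EOF.
    try:
        with os.fdopen(fd, "wb") as w:
            w.write(data)
    except BrokenPipeError:
        pass  # the child exited without reading all of the data


def run_with_streams(program: str, *args) -> subprocess.CompletedProcess:
    argv, read_fds, feeders = [program], [], []
    for arg in args:
        if isinstance(arg, Stream):
            r, w = os.pipe()
            read_fds.append(r)
            feeders.append(threading.Thread(target=_feed, args=(w, arg.data),
                                            daemon=True))
            argv.append("/dev/fd/%d" % r)   # assumes /dev/fd exists
        else:
            argv.append(arg)
    # pass_fds keeps the read ends open in the child under the same numbers;
    # every other descriptor, including the write ends, is closed there.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, pass_fds=read_fds)
    for r in read_fds:
        os.close(r)                         # the parent no longer needs them
    for t in feeders:
        t.start()
    out, _ = proc.communicate()
    for t in feeders:
        t.join()
    return subprocess.CompletedProcess(argv, proc.returncode, stdout=out)
```

With this in place, something like `run_with_streams("diff", Stream(inMemStrA), Stream(inMemStrB))` would play the role of the two-way process substitution above, with the descriptor numbers and the /dev/ paths hidden from the caller.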

@amoffat
Owner

amoffat commented Jul 17, 2012

I think I'm going to hold off on this one for right now, but it will go on the roadmap. The dev branch desperately needs to get finished up and merged to master, and it has a complete rewrite of subprocess.Popen in it. So doing this feature might be easier on the dev branch.

@brentp

brentp commented Apr 17, 2013

is there a newer way to do process substitution since this was originally issued?

@amoffat
Owner

amoffat commented Apr 17, 2013

@brentp negative

@StyXman

StyXman commented Jul 28, 2013

for the record, what bash does (as per what strace -ff shows):

[...]
pipe([3, 4])                            = 0
fcntl(63, F_GETFD)                      = -1 EBADF (Bad file descriptor)
dup2(3, 63)                             = 63
close(3)                                = 0
[...]
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe4425f59d0) = 16235
[...]
execve("/usr/bin/tail", ["tail", "-f", "/dev/fd/63"], [/* 40 vars */]) = 0

and on the echo side:

dup2(4, 1)                              = 1
[...]
write(1, "generated\n", 10)             = 10

So it's the same as setting up a pipe (|), but then using an arbitrary fd (63) to dup2() the reading end of it, and using /dev/fd/63 (/dev/fd points to /proc/self/fd, and /proc/self points to the subdirectory of /proc for the current process) as the input file for tail. This should not be very difficult to reproduce in sh.
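The sequence in that trace can be mirrored almost call for call from Python. This is a sketch under a few assumptions: `free_high_fd` and `substitute` are hypothetical helpers, `cat` replaces `tail -f` so that the child terminates, `/dev/fd` is assumed to exist, and the data is assumed small enough to fit in the pipe buffer (otherwise the write would need its own thread).

```python
import fcntl
import os
import subprocess


def free_high_fd(start: int = 63, stop: int = 10) -> int:
    """Walk downward from `start` until a descriptor number is unused."""
    for fd in range(start, stop, -1):
        try:
            fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError:
            return fd                     # EBADF: nothing is open at this number
    raise RuntimeError("no free descriptor found")


def substitute(data: bytes) -> bytes:
    r, w = os.pipe()                      # pipe([3, 4])
    fd = free_high_fd()                   # fcntl(63, F_GETFD) = -1 EBADF
    os.dup2(r, fd)                        # dup2(3, 63)
    os.close(r)                           # close(3)
    proc = subprocess.Popen(              # clone(...) followed by execve(...)
        ["cat", "/dev/fd/%d" % fd],
        stdout=subprocess.PIPE,
        pass_fds=(fd,),                   # keep the read end open in the child
    )
    os.close(fd)                          # the parent keeps only the write end
    os.write(w, data)                     # the "echo" side: write(1, "generated\n", 10)
    os.close(w)                           # EOF for the reader
    out, _ = proc.communicate()
    return out
```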

@ecederstrand
Collaborator

ecederstrand commented May 24, 2021

@amoffat This is a really old suggestion and hasn't seen any support from others in many years. Maybe we should just close it without fixing?

@fracai

fracai commented May 24, 2021

I for one would still be interested in this. Granted, my use case is currently handled by just using sh to call a script that handles the named pipes, but in the interest of feature completeness I think it'd be useful to at least keep this on the roadmap.

@amoffat
Owner

amoffat commented May 25, 2021

@ecederstrand I think I tend to agree with you. It is a cool idea, and it seems like when people need it, it would be very, very convenient, but it also seems like people don't need it very often. I'll close it and we can re-open if more momentum builds behind it.

@amoffat amoffat closed this as completed May 25, 2021