
[Feature Request] Tee #16

Closed
marceloboeira opened this issue Jul 10, 2019 · 14 comments

Comments

@marceloboeira

marceloboeira commented Jul 10, 2019

It would be cool to have a way to write a buffer to multiple sinks, in cases where you want to intercept parts of the incoming data but still pipe the rest for other matches...

script.Stdin().Match("error").Tee(script.File("all_errors")).Match("foo").Stdout()
cat logs | grep error | tee all_errors | grep foo 

In this scenario, Tee writes to the file while still piping the content to the next stage in the pipe, Match, which eventually pipes to stdout or any other sink.

@bitfield WDYT?

Something like:

func (p *Pipe) Tee(w io.Writer) *Pipe { ... }
// or
func (p *Pipe) Tee(q *Pipe) *Pipe { ... }
@bitfield
Owner

Yes, great idea! I had a similar thought, but I couldn't work out exactly what the API should be.

Maybe Tee returns two pipes?

@marceloboeira
Author

Yeah, I've been struggling with the API a bit now that I'm looking at the code...

I think I see what you mean, so something like:

func (p *Pipe) Tee() (*Pipe, *Pipe) { ... }

It would be less intuitive syntax-wise, but it makes more sense. I need to set aside some time to think about it and understand the code a bit better.

@bitfield
Owner

One problem with that is you can't then chain anything on to a Tee() call. You could do, for example:

p, q := script.Exec("journalctl").Tee()
p.AppendFile("log.txt")
q.Stdout()

...but it's not ideal.

@Xaelias

Xaelias commented Jul 12, 2019

I'm not going to pretend I can try to write the function definition :-D
But tee doesn't try to create two stdouts. The purpose is to still have a stdout you can keep chaining stuff on, and send the second one to something else.
I definitely think the base syntax should basically be command().tee(q).keep_doing_stuff() and then you can do q.match(...).AppendFile("...").
That's actually pretty close to what the shell tee syntax looks like, and from a user's perspective it makes sense to me (feel free to disagree :-D).

@marceloboeira
Author

marceloboeira commented Jul 12, 2019

@Xaelias Yes, you're correct. That's what I was trying to grasp with the first definition, but I think an io.Writer is not generic enough. So this would work, theoretically, as you've described:

// Definition
func (p *Pipe) Tee(tp *Pipe) *Pipe { ... }

// Usage
script.Stdin().Match("error").Tee(script.Match("bar").AppendFile("bar")).Match("foo").Stdout()

Where tp would be the reference to the pipe to tee content to, and the returned pipe would allow you to continue working with it.

However, the issue might be that Tee itself would take over execution: the data won't be piped to both pipes simultaneously. It would first run .Match("bar").Append..., and only afterwards run .Match("foo")....

To actually do it simultaneously, we'd have to spawn a goroutine with a reader, and the main pipe would have to send the data through channels.

In theory, the result would (or could, depending on the implementation) be the same in both cases, but the two branches would not run simultaneously, which might be misleading.

That's why I haven't attempted a PR yet; I'm marinating the problem in my head.

@bitfield
Owner

> But tee doesn't try to create two stdouts. The purpose is to still have a stdout you can keep chaining stuff on, and send the second one to something else.

Fair point. So maybe something like this:

script.Exec("journalctl").Tee("/tmp/log.txt").Match("error").Stdout()

(Don't worry about the implementation; let's get the design right first. If it turns out to be impossible to implement, then we'll have to think again!)

@Xaelias

Xaelias commented Jul 12, 2019

That's one use for it, but it's already too restrictive. For instance, you can do ... | tee /tmp/log.txt and ... | tee -a /tmp/log.txt :-)
But in theory you can also have way more complex examples:

- Create a directory called "example", count the number of characters in "example" and write "example" to the terminal:
    echo "example" | tee >(xargs mkdir) >(wc -c)

Which means Tee() should probably be something like Tee(pipe1, pipe2, ..., pipen) if we want to keep feature parity.

@bitfield
Owner

"In theory" is exactly the point, isn't it? We can dream up any number of wild and crazy things to do with Tee(). I'm looking for just one practical, real-life example where somebody would use this to do some kind of devops or sysadmin task. Can you think of one?

@Xaelias

Xaelias commented Jul 12, 2019

I can try.
I'll be honest, I use tee mostly to log stuff and still see it / do some other stuff with it on the CLI.
If I have to re-use it to do multiple things, I usually end up doing something like:

output=$(...)
do_first_thing_with "${output}"
do_second_thing_with "${output}"
...

But that's because I'm a noob and have never actually tried to use tee instead :-D

If this is just going to be tee as a facility to log to a file, it shouldn't be called tee. It should be something like duplicate_to_file or something.

I'm willing to accept you don't want to replicate tee. But I also think that if it's called Tee, it should be able to do (at least most of) what tee can do :-)

@marceloboeira
Author

marceloboeira commented Jul 12, 2019

tbh, I don't get the Tee(pipe1, pipe2, pipe3 ...).

I would grep all errors; you don't need an arbitrary number of pipes there, because the whole point is that the output pipe lets you continue your script.

If tee's argument is itself a pipe, you can grep inside it, or tee it again.

e.g.:

cat foo.logs | grep error | tee error_logs | grep bar > bar_error_logs
                     |                                |
                     |> grep -> main pipe             | ---> grep -> stdout --> bar_error_logs file
                     |________________________________|
                                    | (tee's pipe)
                                    | (more greps or whatnot ...)
                                    |_______ error_logs file

script.Stdin().Match("error").Tee(script.AppendFile("error_logs")).Match("bar").AppendFile("bar_error_logs")
                   |                                               |
                   |> grep -> main pipe                            | ---> grep -> stdout --> bar_error_logs file
                   |_______________________________________________|
                                    | (tee's pipe)
                                    | (more greps or whatnot ...)
                                    |_______ error_logs file

func (p *Pipe) Tee(tp *Pipe) *Pipe { ... }
                   |         |_______ Main pipe
                   |___ tee's pipe

Or even:

script.Stdin().Match("error").Tee(script.Match("foo").AppendFile("foo_error_logs")).Match("bar").AppendFile("bar_error_logs")

This way you would grep for error, then tee: tee's pipe greps for foo and appends to foo_error_logs, while the main pipe continues from the first grep to grep for bar and append to bar_error_logs.

@Xaelias

Xaelias commented Jul 12, 2019

> - Create a directory called "example", count the number of characters in "example" and write "example" to the terminal:
>     echo "example" | tee >(xargs mkdir) >(wc -c)

That's an example with 2 pipes plus stdout.
It's simple, I just stole it from tldr.

If you only have one pipe, you can't replicate this behavior easily. I guess in theory you could Tee a Tee? But that becomes messier IMO.

I don't have a real example at hand... Let's say you parse Apache logs. You match all the 4xx/5xx responses and put them in a log file; from these errors, you grep all the IPs you can find and try a reverse DNS lookup on them, while you also count the frequency of each endpoint to try to find patterns, and count the frequency of each IP.
That's 4 things you do with a single initial stdout:

  • filter on 4xx/5xx HTTP code
  • extract IP and reverse DNS lookup
  • extract IP and count frequency
  • extract endpoint and count frequency

That's one filter with a tee that does 3 things with the output, plus sends the result to a log file: 3 output pipes plus stdout.

I guess the question we need to ask is: what are we trying to achieve here? If the goal is only to be able to log to a file while still doing pipe stuff, then I don't think what we're trying to do is really Tee, is all. If we're saying we want to take one stream and send it to multiple pipes (which is what tee does), then it's a whole different story, and logging to a file is just a matter of sending one of the resulting pipes to that file.

I'm not sure I'm being super clear, sorry ><

@bitfield
Owner

> extract IP and reverse DNS lookup

This gives me a tangentially related idea. Something like xargs, where each successive value can be interpolated into a command line. For example:

script.Stdin().Column(1).ExecWithEach("dig +short -x").Stdout()

does a reverse lookup on each line in the pipe. Needs a better name (not Xargs())...

@bitfield
Owner

To address the substantive issue: a lot of the uses of tee I'm hearing are basically about doing interactive work at a command line. It's tremendously useful for that, of course, but there's a limited overlap with the type of thing you'd want to do with script programs.

Doing anything with Tee() other than teeing the pipe to a file threatens a complicated and hard-to-follow code layout. For example:

script.Stdin().Match("error").Tee(script.Match("foo").AppendFile("foo_error_logs")).Match("bar").AppendFile("bar_error_logs")

It's really hard to see what's going on here. Pipelines, by their very nature, are convenient to read left to right in a straight line. The only possible way I can see of making this readable is to have Tee() return two pipes, and then continue each of those on a separate line. Even that's confusing.

My take on this proposal as it stands:

  1. Lacks a readily plausible use case for real-world script programs.
  2. Lacks a simple, obvious, convenient, flexible design.

I can't see us progressing it at the moment. Maybe something will come up.

@bitfield
Owner

Closing this pending more compelling use cases, or solutions to the design problem. Thanks for the issue!
