Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust how command output is returned #1

Closed
DiscoRiver opened this issue Dec 6, 2020 · 4 comments
Closed

Adjust how command output is returned #1

DiscoRiver opened this issue Dec 6, 2020 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@DiscoRiver
Copy link
Owner

Currently, command output is only returned after every host has finished executing;

for r := 0; r < len(c.Hosts); r++ {
		res = append(res, <-results)
	}

return res

After implementing this package into a production app, for an environment of around 200 servers, it became clear that this can cause an almost invisible bottleneck, especially if the worker pool is low compared to the number of hosts being touched. I wanted to refer to it as a perceived delay, but it can actually hang up the caller because it is waiting for every host to finish the work and return something.

There are two requirements I have to improve this;

  • Be able to track how many hosts haven't returned anything, and give some indication on what work is left.
  • Have the caller be able to process command output as a host finishes executing, rather than waiting.
@DiscoRiver DiscoRiver added the enhancement New feature or request label Dec 6, 2020
@DiscoRiver DiscoRiver changed the title Adjust return options Adjust how command output is returned Dec 6, 2020
@DiscoRiver DiscoRiver self-assigned this Jan 12, 2021
@DiscoRiver
Copy link
Owner Author

I've decided to go down the more complicated route of doing proper streaming output. I've managed to add single-channel streaming that holds output from all hosts with no metadata such as hostname;

d5a69c1

See session_test.go for my example.

The current challenge is making a system where a channel of type Result is returned with stdout and stderr channels in the struct. Basically, I need to be able to stream the output but also know what host it's for, to feed into sorting algorithms for omnivore. I've tried some methods, but I think I need to re-configure the worker function to accommodate both returning []Result in bulk after all commands exit, and streaming a channel of Result which doesn't wait. Currently I have to send a blank Result to the results channel so that the run function returns correctly;

https://github.com/DiscoRiver/massh/blob/add-streaming-output/session.go#L133

Without this the program hangs forever so I need to re-design this.

@DiscoRiver
Copy link
Owner Author

So, I've been messing around on the playground, and I think I can achieve what I want with something like this;

package main

import (
	"fmt"
	"sync"
)

func main() {
	ch := make(chan int)	
	done := make(chan bool)
	
	count := 100
	
	var wg sync.WaitGroup
	
	for i := 0; i < count; i++ {
		wg.Add(1)
	}
	
	go in(count, ch, done, &wg)
	
	
	var intSlice []int
	for {
		select {
		case d := <-ch:
			intSlice = append(intSlice, d)
			fmt.Printf("%d\n", d)
			wg.Done()
		case <-done:
			fmt.Println("Length of intslice: ", len(intSlice))
			return
		}
	}
}

func in(count int, ch chan<- int, done chan<- bool, wg *sync.WaitGroup) {
	for i := 0; i < count; i++ {
		go func(j int) {
			ch <- j
		}(i)
	}
	wg.Wait()
	done <- true
}

I'm fortunate that the tool will always know how many receives from the channel it needs to perform, so calling wg.Done() only when we've received one value is a good way to track that, and by the time we do done <- true, we know for sure that we've received everything from ch that we're expecting.

@DiscoRiver
Copy link
Owner Author

Managed to work everything out here: 7cd9f66

Major cleanup and restructuring is necessary, but the logic is there. The issue I was having is that I was inadvertently returning a nil Result, I think due to trying to code this in the early hours, but things seem to be working. I need to write some generalised functions to reduce the duplicate code I've got right now.

@DiscoRiver
Copy link
Owner Author

Coming together nicely: ca2a2ba

Things left;

  • Look at better definition for Returned
  • Implement a timeout feature so that tasks can be closed after a period of inactivity (and no exit) on stdout and stderr. This is because it's possible for a command to hang forever, and given this is a generic package for ssh commands, who knows what people will use it for. It's nice to know for sure that things will eventually exit :)
  • Document examples

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant