Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

splunk_hec generator does participate fully in shutdown, hangs shutdown #155

Closed
blt opened this issue Mar 23, 2022 · 2 comments
Closed

splunk_hec generator does participate fully in shutdown, hangs shutdown #155

blt opened this issue Mar 23, 2022 · 2 comments

Comments

@blt
Copy link
Collaborator

blt commented Mar 23, 2022

The splunk_hec generator spawns its own sub-tasks which stay active even while the rest of lading shut down. This leaves background tasks still running, which can hang the program on controlled shutdown. We do try to work around this by using a timed shutdown by this does not appear to work in all cases, as seen here. The top-level process must signal to its spawned tasks that the time has come to shutdown.

@blt
Copy link
Collaborator Author

blt commented Mar 24, 2022

This is somewhat less pressing with #156, although clean shutdown is still desirable.

blt added a commit that referenced this issue Mar 29, 2022
This commit sets up a perf record sub-process that monitors the target process
if the user gives the `--target-perf-data-path` flag. I haven't done anything to
ensure this will actually work, which it won't depending on your Linux sysctl
settings and will never work on OS X or Windows. Still. I have had to undo some
of #156 so that perf can shutdown correctly, pull in Nix to avoid SIGKILL
signals to both the target and perf. This means the PR is blocked until #155 is
resolved.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue Mar 29, 2022
This commit sets up a perf record sub-process that monitors the target process
if the user gives the `--target-perf-data-path` flag. I haven't done anything to
ensure this will actually work, which it won't depending on your Linux sysctl
settings and will never work on OS X or Windows. Still. I have had to undo some
of #156 so that perf can shutdown correctly, pull in Nix to avoid SIGKILL
signals to both the target and perf. This means the PR is blocked until #155 is
resolved.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue Apr 13, 2022
This commit sets up a perf record sub-process that monitors the target process
if the user gives the `--target-perf-data-path` flag. I haven't done anything to
ensure this will actually work, which it won't depending on your Linux sysctl
settings and will never work on OS X or Windows. Still. I have had to undo some
of #156 so that perf can shutdown correctly, pull in Nix to avoid SIGKILL
signals to both the target and perf. This means the PR is blocked until #155 is
resolved.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue Apr 15, 2022
* Collect perf report data from target process

This commit sets up a perf record sub-process that monitors the target process
if the user gives the `--target-perf-data-path` flag. I haven't done anything to
ensure this will actually work, which it won't depending on your Linux sysctl
settings and will never work on OS X or Windows. Still. I have had to undo some
of #156 so that perf can shutdown correctly, pull in Nix to avoid SIGKILL
signals to both the target and perf. This means the PR is blocked until #155 is
resolved.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Fiddle with shutdown mechanism some more

In this commit I've made `Shutdown` hide its interior mechanism and implemented
Clone on it. This should, hopefully, make it easier to thread through splunk et
al. I have also slightly changed the diagnostic data on shutdown, especially
with regard to controlled versus not.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Run an 'inspector' sub-process

This commit adjusts the perf run away from a specialized sub-process to a more
general 'inspector' sub-process, allowing the user to run whatever they
want. Consider this config:

```
inspector:
  command: sh
  arguments:
    - "-c"
    - "perf record --verbose --compression-level=22 --call-graph=dwarf --pid=`pidof vector` --output=/tmp/vector.perf.data"
  environment_variables: {}
  output:
    stderr: "/tmp/captures/inspector.stderr.log"
    stdout: "/tmp/captures/inspector.stdout.log"
```

Lading'll manage the sh, the sh'll manage perf and perf'll record captures. This
allows perf to correctly shut down, while giving the user way more
freedom. Neat.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* collapse target Cmd enum

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue May 21, 2022
In vectordotdev/vector#12776 we've seen that
the splunkhec generator doesn't quite work right with regard to
shutdown. This has been a long-running problem, see #155, and this
commit is intended to refactor the generator to be a little more
direct but, more importantly, have timeouts and definite loop breaks
in key places.

Whether this actually functions as expected will depend on Vector
where reproduction is the most feasible.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
blt added a commit that referenced this issue May 21, 2022
In vectordotdev/vector#12776 we've seen that
the splunkhec generator doesn't quite work right with regard to
shutdown. This has been a long-running problem, see #155, and this
commit is intended to refactor the generator to be a little more
direct but, more importantly, have timeouts and definite loop breaks
in key places.

Whether this actually functions as expected will depend on Vector
where reproduction is the most feasible.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt
Copy link
Collaborator Author

blt commented May 21, 2022

It's possible that this is resolved by #203 but considering that we don't have tests in place to prove this I'll leave this ticket open.

blt added a commit that referenced this issue May 24, 2022
* Resolve SplunkHec generator freeze bug

In vectordotdev/vector#12776 we've seen that
the splunkhec generator doesn't quite work right with regard to
shutdown. This has been a long-running problem, see #155, and this
commit is intended to refactor the generator to be a little more
direct but, more importantly, have timeouts and definite loop breaks
in key places.

Whether this actually functions as expected will depend on Vector
where reproduction is the most feasible.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* PR feedback

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* correct clippy dings

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt blt closed this as completed Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant