Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix events plugin #48

Merged
merged 5 commits into from
Jun 19, 2019
Merged

Fix events plugin #48

merged 5 commits into from
Jun 19, 2019

Conversation

maimaisie
Copy link
Contributor

After discovering the issue with k8s watch and list API regarding the resource version parameter, we redesigned the plugin flow. Now, the main thread acts as a monitor that loops forever (until the plugin is told to shut down) to update configmap and preemptively recreate the watch thread periodically. In more detail, it mainly does these things:

  1. Pull latest resource version from the List API and store in configmap every configmap_update_interval_seconds (10s by default).
  2. Check if the watch thread is running. If it's not, create the watcher thread using the resource version from the the configmap to resume from an earlier point in time if
    1. this is the first iteration (when the plugin just started and the watch thread does not exist)
    2. the watcher thread somehow died (thread is not alive)
  3. Check if it has been watch_interval_seconds since the last time we recreated the watch thread. If so, first pull latest resource version, close the watch thread by calling kubeclient method to close the http connection, then create a new watch stream using the RV we just pulled

We listen for SIGTERM so on graceful shutdown we can break out of the while loop in start_monitor (apparently if the while loop keeps running, the shutdown sequence won't get triggered)

FYI @rvmiller89 @lei-sumo @frankreno

def close
@watcher.each &:finish
def stop
log.debug "Clean up before stopping completely"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be sure to document in the README how to enable debug logs for a fluentd plugin (is there a way to only enable debug logs for a specific plugin?) so that the customer/support team can enable these debug logs as needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the README Sam put in troubleshooting steps that includes turning on debug logs for metrics. We'll make that generic to all three data types afterwards

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've just updated the plugin with steps to enable debug logs for a specific plugin

Copy link
Contributor

@samjsong samjsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, one nit. Question though, what is the benefit of periodically recreating the watcher thread?

@maimaisie
Copy link
Contributor Author

@samjsong Good question. We found that if we don't specify the timeout parameter on the watch stream, it will be closed by the API server anyway after around 50 minutes even if events are continuously being generated. So now we specify a timeout on the watch stream and preemptively close it before the timeout so we can pull the latest RV before we close it. If we only rely on the timeout and let the API server close it, we wound't have a way to do anything before it gets closed. (we can keep a timer ourselves but we don't know if it will be in sync with the k8s api server)

@maimaisie maimaisie merged commit d32a2cb into master Jun 19, 2019
@maimaisie maimaisie deleted the maisie-fixes branch June 19, 2019 22:42
psaia pushed a commit to psaia/sumologic-kubernetes-collection that referenced this pull request May 25, 2021
* make a true cli

* update messages

* rename env vars

* remove codified defaults

* remove debug line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants