
High CPU Usage with 0.60.0 #1465

Closed

iNoahNothing (Contributor) opened this issue Apr 25, 2019 · 8 comments
There appears to be an issue with CPU spikes with Ambassador 0.60.0.

Reports from different users follow:

Anyone noticing high CPU and OOM restarts on Ambassador 0.60.0? Note: we are also trying endpoint-based instead of service-based load balancing.

2019-04-24 20:56:07 diagd 0.60.0 [P60TAmbassadorEventWatcher] ERROR: couldn't save Kubernetes resources: [Errno 12] Out of memory
2019/04/24 20:56:08 aggregator: watch hook failed: signal: killed
2019/04/24 20:56:08 aggregator: found 0 kubernetes watches
2019/04/24 20:56:08 aggregator: found 0 consul watches
2019/04/24 20:56:08 kubewatchman: processing 0 kubernetes watch specs
2019/04/24 20:56:08 consulwatchman: processing 0 consul watches
2019-04-24 20:56:08 diagd 0.60.0 [P60TAmbassadorEventWatcher] ERROR: could not reconfigure: a string or stream input is required
2019-04-24 20:56:08 diagd 0.60.0 [P60TAmbassadorEventWatcher] ERROR: a string or stream input is required
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ambassador-0.0.0.dev0-py3.6.egg/ambassador_diag/diagd.py", line 590, in run
    self.load_config_watt(rqueue, url)
  File "/usr/lib/python3.6/site-packages/ambassador-0.0.0.dev0-py3.6.egg/ambassador_diag/diagd.py", line 696, in load_config_watt
    fetcher.parse_watt(serialization)
  File "/usr/lib/python3.6/site-packages/ambassador-0.0.0.dev0-py3.6.egg/ambassador/config/resourcefetcher.py", line 125, in parse_watt
    watt_dict = parse_yaml(serialization)[0]
  File "/usr/lib/python3.6/site-packages/ambassador-0.0.0.dev0-py3.6.egg/ambassador/utils.py", line 67, in parse_yaml
    return list(yaml.load_all(serialization, Loader=yaml_loader))
  File "/usr/lib/python3.6/site-packages/yaml/__init__.py", line 81, in load_all
    loader = Loader(stream)
  File "/usr/lib/python3.6/site-packages/yaml/cyaml.py", line 24, in __init__
    CParser.__init__(self, stream)
  File "ext/_yaml.pyx", line 303, in _yaml.CParser.__init__
TypeError: a string or stream input is required

I am seeing a very high CPU spike whenever a new instance comes up; it triggers autoscaling, which brings up more pods and more CPU usage. After some time (5 minutes or so) the issue subsides and things go back to normal in 0.60.

Can someone tell me what this snapshot is?
time="2019-04-25T12:00:28Z" level=info msg="Loaded file /ambassador/envoy/envoy.json"
time="2019-04-25T12:00:28Z" level=info msg="Pushing snapshot v119"
It goes up to snapshot 120 before starting.

@kflynn kflynn self-assigned this Apr 25, 2019
@kflynn kflynn closed this as completed Apr 25, 2019
kflynn (Member) commented Apr 25, 2019

Hopefully tackled as of #1462 and #1466.

@kflynn kflynn reopened this Apr 25, 2019
kflynn (Member) commented Apr 25, 2019

Whoops, didn't mean to close yet.

vaibhavrtk commented Apr 26, 2019

It seems to create multiple snapshots at startup (the previous version had around 15 as soon as I updated). What I did and observed:

  1. Changed the deployment ports 443 -> 8443 and 80 -> 8080.
  2. Updated the Service, the Ambassador Module (service port), and TLS (redirect) for 443 -> 8443 and 80 -> 8080.
  3. Ambassador crashed (because of the new requirement to watch endpoints).
  4. Updated the ClusterRole to add endpoints (see the sketch after this list).
  5. Ambassador started working (it says I have 83 endpoints).
  6. It starts updating snapshots one by one, reaching snapshot 196; meanwhile the very high CPU usage triggers autoscaling.
  7. CPU then goes back down from 300% to 150% over about 15 minutes.
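
For reference, step 4 above is just granting Ambassador permission to watch Endpoints. A minimal sketch of that ClusterRole change, assuming the stock role name ambassador (the full resource and verb lists in your install may differ):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ambassador
rules:
- apiGroups: [""]
  # "endpoints" is the newly watched resource in 0.60;
  # services and secrets mirror a typical Ambassador ClusterRole
  resources: ["services", "secrets", "endpoints"]
  verbs: ["get", "list", "watch"]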

flands (Contributor) commented Apr 26, 2019

Two things caused this for me:

  1. Not setting the AMBASSADOR_SINGLE_NAMESPACE environment variable -- I suspect polling the k8s API across namespaces was a heavy operation, and since all Ambassador annotations were set in a single namespace there was no need for that overhead.
  2. redirect_cleartext_from set to 80 instead of 8080.

With both fixed, the CPU/memory issue went away. A sketch of both changes is below.
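
A minimal sketch of both changes, assuming a stock Ambassador Deployment and the annotation-style TLS Module from the 0.x docs (the container name and the AMBASSADOR_NAMESPACE downward-API field are illustrative and may differ in your setup):

# Deployment fragment: restrict Ambassador to watching its own namespace
spec:
  template:
    spec:
      containers:
      - name: ambassador
        env:
        - name: AMBASSADOR_SINGLE_NAMESPACE
          value: "true"
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

# TLS Module (e.g. in a getambassador.io/config annotation):
# redirect cleartext from the container port 8080, not 80
apiVersion: ambassador/v1
kind: Module
name: tls
config:
  server:
    enabled: True
    redirect_cleartext_from: 8080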

pjediny commented Apr 26, 2019

I'm observing a similar issue with 0.60.1: OOM crashes and the log being spammed with snapshot messages, reaching snapshot 244 just before the crash.

kflynn (Member) commented Apr 26, 2019

We may have identified the culprit for the memory usage.

Beyond that, @pjediny, are you on the Slack channel? Are you willing to exec into the pod and run the grab-snapshots.py script you'll find? I'm wondering why you're getting so many updates...

pjediny commented Apr 29, 2019

Just a note for others: it looks like 0.60.2-rc1 fixed the issue for me.

richarddli (Contributor) commented
Thanks!
