OMSAgent fills the OS disk and wreaks havoc on the system #366
Comments
Where can we send the logs? They're 24 GB ;)
I needed to recover my service, so I deleted the log, but here is a `tail -n 200` of the omsagent log:
Pending an official fix, I've published a quick workaround on my blog using logrotate to avoid this: http://etienne.deneuve.xyz/2017/02/02/agent-oms-sur-linux-dans-azure/
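For reference, a minimal logrotate rule along those lines; the log path is the agent's default location, and the size cap and rotation count here are illustrative assumptions, not the blog's exact values:

```
# /etc/logrotate.d/omsagent  (hypothetical example)
/var/opt/microsoft/omsagent/log/omsagent.log {
    size 50M          # rotate as soon as the log exceeds 50 MB
    rotate 5          # keep at most 5 rotated copies
    compress
    missingok
    notifempty
    copytruncate      # truncate in place so the running agent keeps its file handle
}
```

`copytruncate` matters here: without it, the agent would keep writing to the rotated (renamed) file through its open descriptor.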
Getting the same issue. Same error message. Relevant lines from our logs below. There's nothing relevant for hours before this happens. After, the last line below is logged about 2000 times per second until omsagent is restarted.
Getting this issue also on Ubuntu 16.04, with multiple OMS agents connecting to multiple dashboards.
I'm getting this issue on CentOS release 6.8 (Final). It creates 20GB of logs on each machine :O
Any update on this? We're not the only ones affected!
+1 to having this issue
@robbiezhang @lagalbra @agup006 @NarineM - anyone able to take a look at this for us?
We have been fighting this issue too. It has filled the disk on several of our prod servers; thankfully, our Platform Admins have prevented outages. Here is what we are seeing in our logs: Then: The last two entries repeat until the disk fills.

Etienne's fix does not work for us, as we already had similar settings in our logrotate. The only sure way to prevent this that I know of is to put the logs on a separate partition. We are at the point of shutting down the agent on all Linux prod systems. We had already disabled centralized configuration due to another issue that was consuming RAM on our systems; clearly that wasn't enough to prevent updates from impacting prod.
Update: We are setting our logging level to "fatal" to avoid filling up the disk. If anyone knows of a way to change the location of the logs, that would give us better options.
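In case it helps others, here is a sketch of where that setting lives; the conf path and the `<system>` block follow fluentd 0.12 conventions, so verify against your installed agent:

```
# /etc/opt/microsoft/omsagent/conf/omsagent.conf (path is an assumption)
<system>
  # only fatal errors are written to omsagent.log
  log_level fatal
</system>
```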
Sorry @hbrother that my fix doesn't work on your side. Feel free to give my contact info to your admins; maybe we can find a better solution together :)
@EtienneDeneuve No worries. It may be that one of our other settings interfered with your working solution. Our settings were: "{
The root issue, an error-causing conf file, has been mitigated. Our team is currently working on improving our logging strategy to avoid generating large log files.
Thanks @lagalbra. I notice the installation instructions refer to version 1.3.0, yet that version isn't in the releases. Will that be released soon? |
I can confirm that my machine did still have the
@webash Our goal is to release OMSAgent version 1.3.0 this month.
@lagalbra can you please advise how we should deal with the OMS Agent Extension object? It will still be present on the machines even though the OMS agent will be installed in Step 4 of your instructions, and ideally we'd like to use this method to reinstall rather than transferring files to every affected Linux machine in our enterprise. Thanks,
Unfortunately this issue hasn't been resolved by reinstalling the agent. |
@webash Could you copy the contents of
We too had the security_baseline.conf file. Just wanted to confirm that the purge uninstall/reinstall fixed the issue for us. Thanks, @lagalbra
Apologies for the delay @lagalbra; here are the contents of
@webash Your We should have a new configuration file for this issue deployed within the week.
It's a personal lab environment where I'm seeing the most profound effects from this, so I've simply stopped the OMS service on the affected machines for now. Will you be able to update us here when the new conf file has been deployed?
The new conf file has been deployed, but is not yet publicly available, for the sake of a safer preview period. If you re-onboard your machines to the OMS service now, you will not receive any version of the
Two different causes, same effect: omsagent.log grows out of control, filling the OS disk and disrupting the monitored system. This condition must be avoided at all costs; the agent must never disrupt the system it monitors.
During a solution upgrade pushed from the cloud, we got the following entry filling the log and the disk of several VMs:
```
[error]: exec failed to run or shutdown child process error="No such file or directory - /opt/microsoft/omsagent/plugin/omsbaseline" err$
```
The in_tail plugin doesn't handle access-denied errors appropriately, filling the log and the disk with:
```
2017-02-02 10:21:49 +0100 [error]: /opt/microsoft/omsagent/ruby/lib/ruby/gems/2.2.0/gems/fluentd-0.12.24/lib/fluent/plugin/in_tail.rb:484:in `on_timer'
2017-02-02 10:21:49 +0100 [error]: /opt/microsoft/omsagent/ruby/lib/ruby/gems/2.2.0/gems/cool.io-1.4.4/lib/cool.io/loop.rb:88:in `run_once'
2017-02-02 10:21:49 +0100 [error]: /opt/microsoft/omsagent/ruby/lib/ruby/gems/2.2.0/gems/cool.io-1.4.4/lib/cool.io/loop.rb:88:in `run'
2017-02-02 10:21:49 +0100 [error]: /opt/microsoft/omsagent/ruby/lib/ruby/gems/2.2.0/gems/fluentd-0.12.24/lib/fluent/plugin/in_tail.rb:253:in `run'
2017-02-02 10:21:50 +0100 [error]: Permission denied @ rb_file_s_stat - /var/log/php-fpm/error.log
```
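Until the agent handles these errors itself, a blunt stopgap (my own sketch, not an official tool; the log path and 512 MB cap are assumptions) is a small script run from cron that truncates the log once it passes a size cap:

```shell
#!/bin/sh
# Hypothetical emergency workaround: truncate omsagent.log when it
# grows past a size cap. Run from cron, e.g. every 5 minutes.

trim_log() {
    log="$1"       # log file to police
    max_kb="$2"    # size cap in KB
    [ -f "$log" ] || return 0
    size_kb=$(du -k "$log" | cut -f1)
    if [ "$size_kb" -gt "$max_kb" ]; then
        : > "$log"   # truncate in place; the agent's open handle stays valid
    fi
}

trim_log /var/opt/microsoft/omsagent/log/omsagent.log 524288   # 512 MB cap
```

Truncating in place (`: > file`) is deliberate: deleting the file instead would leave the space allocated until the agent restarts, because the agent still holds an open descriptor to it.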