Agent fails and isn't recoverable if an extension's log directory is not present anymore #812

Closed
sriramsa opened this issue Jul 24, 2017 · 0 comments

If users delete an extension's log directory, WALinuxAgent fails and exits. Some customers clean up the /var/log directory to reclaim space and inadvertently delete the /var/log/azure folder. The agent should handle this gracefully.

This is the error thrown:

2017/07/17 10:18:56.439857 INFO WALinuxAgent-2.2.10 did not terminate cleanly
2017/07/17 10:18:56.439962 ERROR Event: name=WALinuxAgent, op=Restart, message=WALinuxAgent-2.2.10 did not terminate cleanly
2017/07/17 10:18:56.510020 WARNING Agent WALinuxAgent-2.2.12 is blacklisted - skipping download
2017/07/17 10:18:56.510858 WARNING Agent WALinuxAgent-2.2.13 is blacklisted - skipping download
2017/07/17 10:18:56.520270 WARNING Agent WALinuxAgent-2.2.10 failed with exception: [Errno 20] Not a directory: '/var/log/azure/Microsoft.OSTCExtensions.VMAccessForLinux'
2017/07/17 10:18:56.524184 WARNING Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/azurelinuxagent/ga/update.py", line 257, in run
    exthandlers_handler.run()
  File "build/bdist.linux-x86_64/egg/azurelinuxagent/ga/exthandlers.py", line 187, in run
    self.handle_ext_handlers(etag)
  File "build/bdist.linux-x86_64/egg/azurelinuxagent/ga/exthandlers.py", line 210, in handle_ext_handlers
    self.handle_ext_handler(ext_handler, etag)
  File "build/bdist.linux-x86_64/egg/azurelinuxagent/ga/exthandlers.py", line 213, in handle_ext_handler
    ext_handler_i = ExtHandlerInstance(ext_handler, self.protocol)
  File "build/bdist.linux-x86_64/egg/azurelinuxagent/ga/exthandlers.py", line 345, in __init__
    fileutil.mkdir(self.get_log_dir(), mode=0o755)
  File "build/bdist.linux-x86_64/egg/azurelinuxagent/common/utils/fileutil.py", line 94, in mkdir
    os.makedirs(dirpath)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 20] Not a directory: '/var/log/azure/Microsoft.OSTCExtensions.VMAccessForLinux'

In the above instance, a customer had deleted the /var/log/azure directory, causing waagent to fail repeatedly and never start up.
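For reference, errno 20 (ENOTDIR) in the traceback is what `os.makedirs` raises on Linux when a component of the requested path exists but is not a directory. The sketch below only demonstrates that errno in a temporary directory. The function name `reproduce_enotdir` and the file layout are assumptions made for illustration, and it does not claim to reproduce the exact sequence of events on the customer's machine.

```python
import errno
import os
import shutil
import tempfile


def reproduce_enotdir():
    """Return the errno raised by os.makedirs when a parent path is a regular file.

    Hypothetical helper for illustration only; not part of WALinuxAgent.
    """
    root = tempfile.mkdtemp()
    try:
        # Stand-in for /var/log/azure existing as a plain file, not a directory.
        parent = os.path.join(root, "azure")
        with open(parent, "w") as f:
            f.write("not a directory\n")

        # Stand-in for the extension's log directory underneath it.
        target = os.path.join(parent, "Microsoft.OSTCExtensions.VMAccessForLinux")
        try:
            os.makedirs(target)
        except OSError as e:
            return e.errno
        return None
    finally:
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print(reproduce_enotdir() == errno.ENOTDIR)
```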

Ideally, waagent should be resilient to this scenario and log an error or warning instead of failing. This is standard behavior for services that cannot find their log directory. For example, these other services all keep running, but without logging:

Jul 18 02:18:43 riot auditd: Could not open dir /var/log/audit (Not a directory)
Jul 18 02:18:44 riot sa1: Cannot open /var/log/sa/sa18: Not a directory
Jul 18 02:18:48 riot nginx: nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (20: Not a directory)
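
One possible shape for the graceful handling requested here, as a minimal sketch only: the helper `try_create_log_dir` and the logger name are hypothetical, and this is not the agent's actual `fileutil.mkdir` nor the fix the maintainers adopted. It catches the `OSError` from `os.makedirs`, logs a warning, and lets the caller continue without the log directory.

```python
import logging
import os

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("ext-handler-sketch")


def try_create_log_dir(dirpath, mode=0o755):
    """Try to create dirpath; return True if it is usable as a directory.

    Hypothetical helper, not part of WALinuxAgent. On failure it logs a
    warning and returns False instead of letting OSError propagate.
    """
    try:
        os.makedirs(dirpath, mode)
    except OSError as e:
        # An existing directory is fine (makedirs raises EEXIST for it).
        if os.path.isdir(dirpath):
            return True
        logger.warning("Could not create log directory %s: %s; "
                       "continuing without it", dirpath, e)
        return False
    return True
```

A caller could then skip file logging for that extension when the function returns False, rather than aborting the whole handler loop.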