WaitForBootCompletion(self) does not retry #982

Closed
tedhtchang opened this issue May 2, 2016 · 7 comments

@tedhtchang (Contributor) commented May 2, 2016

The decorator @vm_util.Retry(log_errors=False, poll_interval=1) was removed from https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/providers/openstack/os_virtual_machine.py#L180 in v1.4.0, for reasons I don't know. Without it, the code exits after the first hostname attempt fails while the VM is still booting up.

Command:

./pkb.py --cloud=OpenStack --benchmarks=cluster_boot --machine_type=m1.small --image=ubuntu-14.04  --openstack_public_network=public --openstack_private_network=private --metadata=mytag:teddevstackvm
2016-04-29 17:17:44,087 a367d867 MainThread INFO     Verbose logging to: /tmp/perfkitbenchmarker/run_a367d867/pkb.log
2016-04-29 17:17:44,088 a367d867 MainThread INFO     PerfKitBenchmarker version: v1.4.0
2016-04-29 17:17:44,177 a367d867 MainThread INFO     Flag values:
--machine_type=m1.small
--metadata=mytag:teddevstackvm
--image=ubuntu-14.04
--cloud=OpenStack
--benchmarks=cluster_boot
--openstack_private_network=private
--openstack_public_network=public
2016-04-29 17:17:44,230 a367d867 MainThread cluster_boot(1/1) INFO     Provisioning resources for benchmark cluster_boot
2016-04-29 17:17:46,977 a367d867 Thread-1 cluster_boot(1/1) INFO     Running: cat /tmp/perfkitbenchmarker/run_a367d867/perfkitbenchmarker_keyfile.pub
2016-04-29 17:18:06,117 a367d867 Thread-1 cluster_boot(1/1) INFO     floating-ip associated: 192.168.49.212
2016-04-29 17:18:06,804 a367d867 Thread-1 cluster_boot(1/1) INFO     VM: 192.168.49.212
2016-04-29 17:18:06,804 a367d867 Thread-1 cluster_boot(1/1) INFO     Waiting for boot completion.
2016-04-29 17:18:21,818 a367d867 Thread-1 cluster_boot(1/1) INFO     Running: ssh -A -p 22 ubuntu@192.168.49.212 -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /tmp/perfkitbenchmarker/run_a367d867/perfkitbenchmarker_keyfile hostname
2016-04-29 17:18:26,823 a367d867 Thread-1 cluster_boot(1/1) INFO     Ran ssh -A -p 22 ubuntu@192.168.49.212 -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /tmp/perfkitbenchmarker/run_a367d867/perfkitbenchmarker_keyfile hostname. Got return code (255).
STDOUT: 
STDERR: ssh: connect to host 192.168.49.212 port 22: Connection timed out

2016-04-29 17:18:26,825 a367d867 MainThread cluster_boot(1/1) ERROR    Exception occurred while calling PrepareVm(192.168.49.212):
Traceback (most recent call last):
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/vm_util.py", line 245, in _ExecuteThreadCall
    queue.put(ThreadCallResult(call_id, target(*args, **kwargs), None))
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 286, in PrepareVm
    vm.WaitForBootCompletion()
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/providers/openstack/os_virtual_machine.py", line 187, in WaitForBootCompletion
    resp, _ = self.RemoteCommand('hostname', retries=1)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/linux_virtual_machine.py", line 313, in RemoteCommand
    suppress_warning, timeout)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/linux_virtual_machine.py", line 374, in RemoteHostCommand
    raise errors.VirtualMachine.RemoteCommandError(error_text)
RemoteCommandError: Got non-zero return code (255) executing hostname
Full command: ssh -A -p 22 ubuntu@192.168.49.212 -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /tmp/perfkitbenchmarker/run_a367d867/perfkitbenchmarker_keyfile hostname
STDOUT: STDERR: ssh: connect to host 192.168.49.212 port 22: Connection timed out


2016-04-29 17:18:27,869 a367d867 MainThread cluster_boot(1/1) ERROR    Error during benchmark cluster_boot
Traceback (most recent call last):
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 374, in RunBenchmark
    DoProvisionPhase(benchmark_name, spec, detailed_timer)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 224, in DoProvisionPhase
    spec.Provision()
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 214, in Provision
    vm_util.RunThreaded(self.PrepareVm, self.vms)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/vm_util.py", line 365, in RunThreaded
    max_concurrency=max_concurrent_threads)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/vm_util.py", line 308, in RunParallelThreads
    '{0}{1}'.format(os.linesep, os.linesep.join(error_strings)))
ThreadException: The following exceptions occurred during threaded execution:
Exception occurred while calling PrepareVm(192.168.49.212):
Traceback (most recent call last):
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/vm_util.py", line 245, in _ExecuteThreadCall
    queue.put(ThreadCallResult(call_id, target(*args, **kwargs), None))
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 286, in PrepareVm
    vm.WaitForBootCompletion()
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/providers/openstack/os_virtual_machine.py", line 187, in WaitForBootCompletion
    resp, _ = self.RemoteCommand('hostname', retries=1)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/linux_virtual_machine.py", line 313, in RemoteCommand
    suppress_warning, timeout)
  File "/home/tedchang/PerfKitBenchmarker/perfkitbenchmarker/linux_virtual_machine.py", line 374, in RemoteHostCommand
    raise errors.VirtualMachine.RemoteCommandError(error_text)
RemoteCommandError: Got non-zero return code (255) executing hostname
Full command: ssh -A -p 22 ubuntu@192.168.49.212 -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /tmp/perfkitbenchmarker/run_a367d867/perfkitbenchmarker_keyfile hostname
STDOUT: STDERR: ssh: connect to host 192.168.49.212 port 22: Connection timed out


2016-04-29 17:18:33,013 a367d867 Thread-4 cluster_boot(1/1) INFO     Instance not found, may have been already deleted
2016-04-29 17:18:35,926 a367d867 MainThread cluster_boot(1/1) ERROR    Benchmark 1/1 cluster_boot (UID: cluster_boot0) failed. Execution will continue.
2016-04-29 17:18:35,926 a367d867 MainThread cluster_boot(1/1) INFO     Benchmark run statuses:
-----------------------------------
Name          UID            Status
-----------------------------------
cluster_boot  cluster_boot0  FAILED
-----------------------------------
Success rate: 0.00% (0/1)
2016-04-29 17:18:35,926 a367d867 MainThread cluster_boot(1/1) INFO     Complete logs can be found at: /tmp/perfkitbenchmarker/run_a367d867/pkb.log

@wangxf1987 commented May 10, 2016

Hi,
I have the same problem. How can I resolve it?

@tedhtchang (Contributor, Author) commented May 10, 2016

Add "@vm_util.Retry(log_errors=False, poll_interval=1)" directly above "def WaitForBootCompletion(self):" in:
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/providers/openstack/os_virtual_machine.py#L181
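To illustrate what the suggested decorator restores, here is a minimal sketch, assuming a retry decorator that re-invokes the wrapped function whenever it raises, sleeping a fixed poll interval between attempts and giving up after an overall timeout. The names `retry`, `RemoteCommandError`, and `FakeVm` are hypothetical stand-ins written for this example; this is not PerfKitBenchmarker's actual vm_util.Retry implementation, whose parameters and defaults may differ.

```python
import functools
import time


def retry(poll_interval=1.0, timeout=300.0, retryable=(Exception,)):
    """Hypothetical stand-in for a retry decorator such as vm_util.Retry.

    Re-invokes the wrapped function until it returns without raising one of
    `retryable`, sleeping `poll_interval` seconds between attempts. If the
    next attempt would start after `timeout` seconds, the last exception is
    re-raised. Exceptions not listed in `retryable` propagate immediately.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                try:
                    return func(*args, **kwargs)
                except retryable:
                    if time.monotonic() + poll_interval > deadline:
                        raise
                    time.sleep(poll_interval)
        return wrapper
    return decorator


class RemoteCommandError(Exception):
    """Hypothetical error raised when the remote command fails."""


class FakeVm:
    """Hypothetical VM whose SSH daemon only answers after a few attempts."""

    def __init__(self, failures_before_ready):
        self.failures_before_ready = failures_before_ready
        self.attempts = 0

    def remote_command(self, cmd):
        # Simulate "Connection timed out" until the VM has finished booting.
        self.attempts += 1
        if self.attempts <= self.failures_before_ready:
            raise RemoteCommandError('ssh: connect to host port 22: Connection timed out')
        return 'vm-hostname\n', ''

    # Without this decorator, the first RemoteCommandError would abort the run,
    # which is the behavior reported in this issue.
    @retry(poll_interval=0.01, timeout=5, retryable=(RemoteCommandError,))
    def wait_for_boot_completion(self):
        resp, _ = self.remote_command('hostname')
        return resp.strip()
```

For example, `FakeVm(failures_before_ready=3).wait_for_boot_completion()` fails three times, then succeeds on the fourth attempt and returns `'vm-hostname'`. The short poll interval here only keeps the demo fast; the decorator in the issue uses poll_interval=1.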

@hildrum (Contributor) commented May 11, 2016

@kivio, @meteorfox, is there a reason the @vm_util.Retry decorator isn't on WaitForBootCompletion for the OpenStack virtual machine?

@meteorfox (Collaborator) commented Jun 13, 2016

@hildrum @tedhtchang

Hmm, that's interesting. I don't think it's necessary to override WaitForBootCompletion(self) at all. The BaseLinuxMixin class already provides an implementation that polls every second, and _PostCreate(self) already waits until the VM reaches ACTIVE. The easy fix is to delete WaitForBootCompletion(self) altogether from os_virtual_machine.py.

I also have PR #942 in code review that should address this and other issues.

@tedhtchang (Contributor, Author) commented Jun 14, 2016

@meteorfox
Please run a benchmark in an OpenStack environment with v1.4.0. If you hit the same issue, we should fix it soon: since v1.4.0, I have not been able to run a benchmark on OpenStack.
Look at these two lines from my output. It waited 15 seconds for the VM to boot, polled once, and then quit before the VM was ready.
2016-04-29 17:18:06,804 a367d867 Thread-1 cluster_boot(1/1) INFO Waiting for boot completion.
2016-04-29 17:18:21,818 a367d867 Thread-1 cluster_boot(1/1) INFO Running: ssh -A -p 22 ubuntu@192.168.49.212 -2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=5 -o GSSAPIAuthentication=no -o ServerAliveInterval=30 -o ServerAliveCountMax=10 -i /tmp/perfkitbenchmarker/run_a367d867/perfkitbenchmarker_keyfile hostname
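The timing can be checked directly from the timestamps in the log. This sketch only parses values copied from the log lines in this issue; the variable names are illustrative. The roughly 5-second ssh attempt lines up with the ConnectTimeout=5 option on the ssh command line.

```python
from datetime import datetime

# Log timestamp format used by PerfKitBenchmarker in this issue, e.g. "2016-04-29 17:18:06,804".
FMT = '%Y-%m-%d %H:%M:%S,%f'

# Values copied from the log output above.
waiting = datetime.strptime('2016-04-29 17:18:06,804', FMT)     # "Waiting for boot completion."
ssh_start = datetime.strptime('2016-04-29 17:18:21,818', FMT)   # "Running: ssh ... hostname"
ssh_failed = datetime.strptime('2016-04-29 17:18:26,823', FMT)  # "Got return code (255)."

boot_wait = (ssh_start - waiting).total_seconds()
ssh_attempt = (ssh_failed - ssh_start).total_seconds()
total = (ssh_failed - waiting).total_seconds()

print(f'wait before the only ssh attempt: {boot_wait:.3f} s')
print(f'duration of that ssh attempt:     {ssh_attempt:.3f} s')
print(f'total before giving up:           {total:.3f} s')
```

This gives about 15.0 s of waiting, one ssh attempt of about 5.0 s, and about 20.0 s in total before the run gave up, consistent with a single, unretried poll.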

@meteorfox (Collaborator) commented Sep 9, 2016

This issue should already be solved in master now that PR #942 has been merged. Can you please try with the latest version? Thanks!

@voellm added the bug and P1 labels on Sep 22, 2016
@voellm (Collaborator) commented Sep 22, 2016

Please reopen if this was not fixed.

@voellm closed this on Sep 22, 2016