New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transparent OpenStack provisioning for teuthology-suite #592

Merged
merged 13 commits into from Sep 2, 2015

Conversation

Projects
None yet
5 participants
@ghost
Copy link

ghost commented Aug 5, 2015

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 5, 2015

@zmc I think I addressed your comments, except the transition from testtool to py.test. How does that look ?

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 7, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1754/
Test FAILed.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 7, 2015

@zmc the files were reorganized to be in the openstack subpackage and the ubuntu user made configurable. The transition from testtools to py.test is not done yet. How does that look ?

README.rst Outdated

* Run the dummy suite as a test::

$ teuthology-openstack --key-name myself --suite dummy

This comment has been minimized.

@zmc

zmc Aug 7, 2015

Member

Might want to clarify that 'myself' is "your openstack username"

This comment has been minimized.

@ghost

ghost Aug 7, 2015

Author

It's not the OpenStack user name, it's one of the keypair you registered for your user.

This comment has been minimized.

@zmc

zmc Aug 7, 2015

Member

Actually maybe it's not (OS1 internal):
2015-08-07 15:38:54,990.990 ERROR:teuthology.misc:openstack server create --image 'teuthology-ubuntu-14.04' --flavor 'm1.small' --key-name zcerza --user-data /tmp/tmpPySLGY --security-group teuthology --wait teuthology error ERROR: openstack Invalid key_name provided. (HTTP 400) (Request-ID: req-dc994e8c-c85d-4a47-a12f-5dbfc5ff34ea)

Edit: user error; my keypair is called 'Zack' (it is old and I forgot) :)

This comment has been minimized.

@zmc

zmc Aug 7, 2015

Member

Hrm, I'm seeing a lot of:
cloud_init_wait AuthenticationException Authentication failed.

This comment has been minimized.

@ghost

ghost Aug 8, 2015

Author

@zmc I'm guessing the troubles you have with authentication via paramiko are related to paramiko/paramiko#387

A simple test is http://paste2.org/ZagzFYZX

Maybe orchestra would work but I would have to add support for the key_filename option and figure out how to catch the exceptions properly.

" grep '" + self.up_string + "' " +
"/var/log/cloud-init*.log")
while proceed():
client = paramiko.SSHClient()

This comment has been minimized.

@zmc

zmc Aug 7, 2015

Member

Curious why you're not using Remote here - is it just inappropriate?

It does seem like this while block might want to be split out as a psuedo-private method.

This comment has been minimized.

@ghost

ghost Aug 8, 2015

Author

The part that tracks the progress of the tail until a given string is found could probably be using Remote.

@@ -195,6 +200,168 @@ def __del__(self):
self.remove_config()


class ProvisionOpenStack(OpenStack):

This comment has been minimized.

@zmc

zmc Aug 7, 2015

Member

Do you think it would be feasible/desirable to implement an abstract base Provision class that Downburst and ProvisionOpenStack (should maybe just be called OpenStack) could inherit from?

Just to formalize a bit more that they intend to provide a similar interface: create(), destroy(), build_config(), remove_config() - though the last two are currently implemented by differently-named methods in ProvisionOpenStack

This comment has been minimized.

@ghost

ghost Aug 7, 2015

Author

create() and destroy(), definitely. The build_config / remove_config are not shared, I think.

This comment has been minimized.

@zmc

zmc Aug 19, 2015

Member

Well, init_user_data() is "building a config" and __del__() is "removing a config"

@@ -0,0 +1,548 @@
#!/bin/bash

This comment has been minimized.

@zmc

zmc Aug 7, 2015

Member

I think you mentioned yesterday on the phone that you intend to (or intend to help) obsolete this with ansible or even just python; perhaps a comment at the top noting that would be helpful in case we both disappear :)

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Aug 7, 2015

@dachary The reorganization looks very good - thanks!

The user appears to be mostly configurable, save for teuthology/openstack/openstack-teuthology.init and teuthology/openstack/test/user-data-test1.txt; I don't intend to block this PR on that though.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 8, 2015

@zmc I'm guessing the troubles you have with authentication via paramiko are related to paramiko/paramiko#387

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 8, 2015

teuthology/openstack/openstack-teuthology.init and teuthology/openstack/test/user-data-test1.txt

I modified teuthology/openstack/openstack-teuthology.init but did not changed teuthology/openstack/test/user-data-test1.txt because I assume we don't care since it's a test file

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 8, 2015

I can't make sense of this error:

import file mismatch:
imported module 'teuthology.test.task.test_selinux' has this __file__ attribute:
  /home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/py27/local/lib/python2.7/site-packages/teuthology/test/task/test_selinux.py
which is not the same as the test file we want to collect:
  /home/jenkins-build/build/workspace/teuthology-pull-requests/teuthology/test/task/test_selinux.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

what is it about ?

Update for the record: the solution is that the openstack/test directory did not have a init.py file. Adding the init.py file was enough to resolve the problem.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 8, 2015

@zmc you can change the name of the instance created to run the teuthology cluster with --name. I only used it for test purposes and was not sure if it was usable. I tried it and all is well. Note that all instances created by a given teuthology cluster have property=IP-of-the-teuthology-instance. When the cluster is teardown, it does so by destroying the instances that have that property. It allows multiple cluster to co-exist within the same tenant.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 8, 2015

@zmc testtools has been replaced with py.test

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 10, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1776/
Test PASSed.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 10, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1777/
Test PASSed.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 10, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1778/
Test PASSed.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 10, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1779/
Test PASSed.

* `create an account <https://www.ovh.com/fr/support/new_nic.xml>`_
* get $HOME/openrc.sh from `the horizon dashboard <https://horizon.cloud.ovh.net/project/access_and_security/?tab=access_security_tabs__api_access_tab>`_

Setup

This comment has been minimized.

@andrewschoen

andrewschoen Aug 10, 2015

Contributor

It's not entirely clear to me if these steps are happening on your host or the vm you just created.

This comment has been minimized.

@ghost

ghost Aug 10, 2015

Author

All the steps happen on your host. Only integration testing in the chapter below happen on the vm that was created. This is indeed confusing and I'm assuming the person willing to run the integration tests will have enough motivation to sort that out.

Any idea how to clarify that there is no need to ssh to something for the steps below ?

This comment has been minimized.

@andrewschoen

andrewschoen Aug 10, 2015

Contributor

I'm working through getting this setup now. After I get it work I most likely will have some suggestions, I hope at least.

I'm also trying a different openstack provider than OVH, so there might be some differences there as well.

This comment has been minimized.

@zmc

zmc Aug 18, 2015

Member

Maybe just put something like "On your own workstation:" just above the steps?

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 10, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1782/
Test PASSed.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 11, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1783/
Test PASSed.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 14, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1787/
Test PASSed.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 18, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1793/
Test PASSed.

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Aug 18, 2015

@dachary I'm still seeing the same authentication issues I saw last time I tried to use this - so I can't get it to work.

If this is merged to master as-is, there may be many people that have the same issue as me; since I don't know how to fix it myself, I'd have no idea what to tell them.

It seems like teuthology.openstack.OpenStack.cloud_init_wait() should be using teuthology.orchestra.connection

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 18, 2015

@zmc --verbose will now also set the paramiko debug flag and give us more information about the error you're seeing. Could you please upload the output somewhere for me to look at ?

Although orchestra would most certainly fix the specific problem you currently have, it has other issues such as not exposing the key_filename argument and maybe others. If it was just a matter of switching, I would happily do it. But it's more a matter of patching orchestra to fit the needs of the OpenStack backend and I feel the cons currently outweight the pros.

To be entirely honest I have to admit that part of my resistance is that switching to orchestra would require two or three days of work on my part.


class TeuthologyOpenStack(OpenStack):

def __init__(self, args, config, argv):

This comment has been minimized.

@zmc

zmc Aug 18, 2015

Member

What is the difference between args and argv here? A docstring might be useful for this method.

This comment has been minimized.

@ghost

ghost Aug 18, 2015

Author

argv is the orginal arv list so that it can be displayed as is should an error occur. Using args is more confusing to the reader.

This comment has been minimized.

@zmc

zmc Sep 2, 2015

Member

Sorry if I'm being thick here, but I don't quite understand. Could we just get a docstring here please?

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Aug 18, 2015

@dachary it looks like it's using the wrong SSH key; it's ignoring my ~/.ssh/config, which tells everything else which one to use.

I think it would be worthwhile to enable this PR to use orchestra; I might be willing to do the work myself if I knew what all was needed. It looks like exposing key_filename would be trivial, for example.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 18, 2015

@zmc the other thing that's non trivial (at least to me) is handling the errors in the right way at the various stages of the virtual machine boot (which may involve a reboot). It looks like orchestra is not transparent when it comes to bubbling the errors from lower levels. I did not investigate much though.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 18, 2015

@zmc it occurs to me that there may be a need to explicitly ask paramiko to handle the ~/.ssh/config file ? I though that was implicit but maybe it's not ?

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 27, 2015

I'll attempt to address some of these, but I'll admit I'm having a very hard time navigating the codebase. A lot of them would be solved by simply not attempting to set up an entirely separate teuthology cluster.

I agree. But how to run integration tests if not by setting up an entire cluster ? Things would be infinitely better if the cluster deployment was done via ansible instead of a shell script. Do you think we would be better off without the integration tests and we should wait for ansible to be able to deploy a full cluster instead ?

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 28, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1807/
Test PASSed.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 28, 2015

Rebased. The git clone command is now configurable. The configuration options are documented and sensible defaults were added.

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 28, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1811/
Test PASSed.

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Aug 28, 2015

@dachary Thanks for the config change. I had a similar work in progress in a branch of mine, but yours is more complete.

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Aug 28, 2015

@dachary You may want to cherry-pick zmc@97e9b39 . Let me know if PRs work better for this.

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Aug 28, 2015

@dachary RE: testing:

While I like that there are "integration" tests here (my quotes will make sense in a second), I'd like to get some unit tests into this branch.

In looking at how we might bring the existing and future tests in line with the way the rest of teuthology tries to stay organized, I noticed that test_openstack.py looks more like an integration test than a unit test. This is because it hits the network, creates VMs, etc. It also doesn't want to pass for me locally.

What we're currently calling "integration" tests look more like system tests. I think we should organize tests to reflect this. If these seem like meaningless words, the main point is that we should have a set of tests that are able to execute very quickly (sub-second runtimes) that we can expand on as we make changes.

Regarding the concept of always setting up a teuthology cluster inside of OpenStack, it's not that I oppose that functionality; I just want to make sure we aren't tied to it.

As with most of the changes I'm suggesting, I'm willing to do most or all of the work, assuming I'm able :)

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 28, 2015

cherry-picked 810cce8 (a PR works fine too)

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 28, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1813/
Test PASSed.

@ghost

This comment has been minimized.

Copy link
Author

ghost commented Aug 28, 2015

@zmc I understand your concern regarding integration tests being mandatory.

parser.add_argument(
'--archive-upload',
help='rsync destination to upload archives',
default='ubuntu@teuthology-logs.public.ceph.com:./',

This comment has been minimized.

@ghost

ghost Aug 31, 2015

Author

Updated with the ceph.com URL

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Aug 31, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1814/
Test PASSed.

ldachary and others added some commits Jul 19, 2015

transparent OpenStack provisioning for teuthology-suite
The teuthology-openstack command is a wrapper around teuthology-suite
that transparently creates the teuthology cluster using OpenStack
virtual machine.

For machines of machine_type == openstack in paddles, when locking a
machine, an instance is created in the matching OpenStack cluster with:

   openstack server create redhat

And renamed into redhat042010 if assigned the IP x.x.42.10/16. It is then
locked in paddles which has been prepare with one slot for each
available IP in the range.

An OpenStack cluster is defined in the .teuthology.yaml file as follows:

openstack:
  user-data: teuthology/openstack/openstack-{os_type}-{os_version}-user-data.txt
  nameserver: 167.114.252.136
  machine:
    disk: 10 # GB
    ram: 7000 # MB
    cpus: 1
  volumes:
    count: 0
    size: 1 # GB
  flavor-select-regexp: ^vps-ssd
  subnet: 167.114.224.0/19

When the machine is unlocked, it is destroyed with

   openstack server delete redhat042010

The python-openstackclient command line is used instead of the
corresponding API because it is well maintained and documented.

Integration tests require an OpenStack tenant.

http://tracker.ceph.com/issues/6502 Fixes: #6502

Signed-off-by: Loic Dachary <loic@dachary.org>
upload teuthology archive on completion
Signed-off-by: Loic Dachary <loic@dachary.org>
openstack: centos-7 image must be recent
Using the CentOS 7 image available at

  http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2

ansible will think it runs

  RedHat 7.1.1503 (Core)

and with a more recent image

  http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1503.qcow2

it will think that it runs

  CentOS Linux release 7.1.1503 (Core)

http://tracker.ceph.com/issues/12725 Fixes: #12725

Signed-off-by: Loic Dachary <loic@dachary.org>

tmp
openstack: teuthology.init needs credentials to nuke
And it cannot assume it is running from the home directory of the user
although it's the case most of the time.

Signed-off-by: Loic Dachary <loic@dachary.org>
connect(): Make retries optional
Signed-off-by: Zack Cerza <zack@redhat.com>
connect(): Optionally override key_filename
Signed-off-by: Zack Cerza <zack@redhat.com>
OpenStack(): Don't connect via paramiko directly
Instead, use teuthology.orchestra.connection.connect() - to take
advantage of its parsing of the ssh config. This way users won't run
into surprises when using OpenStack vs. other machine types.

Signed-off-by: Zack Cerza <zack@redhat.com>
openstack: remove debug line that does not print
It misses the IP that was set previous to 20bdef4

Signed-off-by: Loic Dachary <loic@dachary.org>
Add unit tests for OpenStack config defaults
Signed-off-by: Zack Cerza <zack@redhat.com>
openstack: set config default and documentation
Add documentation for the ~/.teuthology.yaml OpenStack configuration
values and set reasonable defaults.

Signed-off-by: Loic Dachary <loic@dachary.org>
openstack: make git clone configurable
When teuthology-openstack clone theuthology for the purpose of creating
the cluster, use the clone configuration value instead of a hardcoded value.

Signed-off-by: Loic Dachary <loic@dachary.org>
openstack: --upload defaults to teuthology-logs.public.ceph.com
Instead of a non-ceph domain.

Signed-off-by: Loic Dachary <loic@dachary.org>
openstack: explains args vs argv
Signed-off-by: Loic Dachary <loic@dachary.org>
@ghost

This comment has been minimized.

Copy link
Author

ghost commented Sep 2, 2015

@zmc commented on args + argv and rebased and repushed

@ceph-jenkins

This comment has been minimized.

Copy link

ceph-jenkins commented Sep 2, 2015

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/teuthology-pull-requests/1821/
Test PASSed.

@zmc

This comment has been minimized.

Copy link
Member

zmc commented Sep 2, 2015

@dachary Thanks, that's a bit more clear.

To repeat what I've said to you privately, I don't wait to wait until this particular PR feels "perfect" to get it into master - that could drag on endlessly and not help anyone. I think the major issues are taken care of, so I'll merge it now.

Sometime in the near future I'll look at adapting the feature to be usable seamlessly with the current workflow we use in our labs. In the meantime this is hugely useful for external contributors! Thanks so much for all your work and patience :)

zmc added a commit that referenced this pull request Sep 2, 2015

Merge pull request #592 from dachary/wip-6502-openstack-v3
transparent OpenStack provisioning for teuthology-suite

@zmc zmc merged commit 36d148e into ceph:master Sep 2, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
default Merged build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment