mgr/ansible: Ansible orchestrator module #24445

Merged
merged 1 commit into from
Dec 3, 2018

jmolmo
Member

@jmolmo jmolmo commented Oct 5, 2018

A Ceph Manager Orchestrator that uses an external REST API service to execute Ansible playbooks.
Signed-off-by: Juan Miguel Olmo Martínez jolmomar@redhat.com

A first running version of this orchestrator manager module.

There is still a lot to do, but this makes it possible to start getting feedback from the community.

Only manual tests have been run:

enable/disable module and check logs
test command line (get inventory)

Details

[root@ceph build]# ./bin/ceph mgr module ls
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2018-10-04 13:09:12.430 7fa64be26700 -1 WARNING: all dangerous and experimental features are enabled.
2018-10-04 13:09:12.499 7fa64be26700 -1 WARNING: all dangerous and experimental features are enabled.
{
    "enabled_modules": [
        "balancer",
        "dashboard",
        "devicehealth",
        "iostat",
        "prometheus",
        "restful",
        "status"
    ],
    "disabled_modules": [
        {
            "name": "ansible_orchestrator",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "diskprediction",
            ...
------------------------------------------------------------------------------------------------------------
[root@ceph build]# ./bin/ceph mgr module enable ansible_orchestrator
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2018-10-04 13:10:34.121 7f91d0c71700 -1 WARNING: all dangerous and experimental features are enabled.
2018-10-04 13:10:34.237 7f91d0c71700 -1 WARNING: all dangerous and experimental features are enabled.

------------------------------------------------------------------------------------------------------------
Ceph Mgr logs:
2018-10-04 13:10:34.638 7ff6ecc646c0  1 mgr[py] Loading python module 'ansible_orchestrator'
2018-10-04 13:10:34.663 7ff6ecc646c0  4 mgr[py] load_subclass_of: found class: 'ansible_orchestrator.Module'
2018-10-04 13:10:34.663 7ff6ecc646c0  4 mgr[py] Standby mode not provided by module 'ansible_orchestrator'
2018-10-04 13:10:35.608 7ff6c869b700  4 mgr[py] Starting ansible_orchestrator
2018-10-04 13:10:35.608 7ff6c869b700  1 mgr load Constructed class from module: ansible_orchestrator
2018-10-04 13:10:35.608 7ff6c869b700  4 mgr start_one Starting thread for ansible_orchestrator
2018-10-04 13:10:35.609 7ff6c6697700  4 mgr entry Entering thread for ansible_orchestrator
2018-10-04 13:10:35.609 7ff6c6697700  4 mgr[ansible_orchestrator] Starting Ansible Orchestrator module ...
2018-10-04 13:10:35.609 7ff6c6697700  4 mgr[ansible_orchestrator] No pending operations
2018-10-04 13:10:45.609 7ff6c6697700  4 mgr[ansible_orchestrator] No pending operations
....

------------------------------------------------------------------------------------------------------------

[root@ceph build]# ./bin/ceph  inventory
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2018-10-04 13:11:36.733 7f7247aeb700 -1 WARNING: all dangerous and experimental features are enabled.
2018-10-04 13:11:36.794 7f7247aeb700 -1 WARNING: all dangerous and experimental features are enabled.
Textual result of the playbook execution
<<<Textual result of the playbook execution>>>

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@jcsp
Contributor

jcsp commented Oct 11, 2018

With this model of each operation being an individual playbook execution, I'm having trouble seeing how it will re-use ceph-ansible itself. In ceph-ansible users edit their hosts file to define services + optionally devices for OSDs, and then run site.yaml -- i.e. there is one big ansible execution that updates everything in the system to the desired state, rather than lots of little playbooks being run. @leseb can jump in if I'm reading the docs wrong on this.

One of the purposes of the wait() function is to enable this kind of use case: individual operations can update the hosts file and return completions, but the actual playbook execution wouldn't happen until someone called wait on those completions.

So I guess the key question is: is this module intending to re-use ceph-ansible, or is it intended to write some other set of playbooks? And if it will re-use ceph-ansible, then how will that work?

@jmolmo
Member Author

jmolmo commented Oct 11, 2018

Thank you for your comments John, your help with the "orientation" and "concepts" will be very much appreciated. :-)

With this model of each operation being an individual playbook execution, I'm having trouble seeing how it will re-use ceph-ansible itself. In ceph-ansible users edit their hosts file to define services + optionally devices for OSDs, and then run site.yaml -- i.e. there is one big ansible execution that updates everything in the system to the desired state, rather than lots of little playbooks being run. @leseb can jump in if I'm reading the docs wrong on this.

I think that the model follows what the Ceph Mgr Orchestrator dictates.
The Orchestrator will have a set of operations, reachable through the Orchestrator API. These operations will be carried out by executing Ansible playbooks (using the Ansible Runner service as the provider of these operations).

The Ansible Runner service is just a back end that eases the execution of playbooks over a set of hosts previously provisioned in that service. The user has to provision the hosts and host groups beforehand. Once they are provisioned in the service, the user can execute playbooks over these hosts or groups of hosts.
So the "magnitude" of the operation depends on which tasks are implemented in the playbook(s) called by the Orchestrator API endpoint.

Example:
If the Orchestrator has a "get_inventory" method, then we will use an "inventory" playbook over the hosts provisioned.
If the Orchestrator has a "build site" method, then we will use a "site.yaml" playbook over the hosts provisioned.

Therefore, it is really each Orchestrator API endpoint that defines the "magnitude" of each task (it will even be possible to implement one Orchestrator API endpoint using several different playbook executions).

One of the purposes of the wait() function is to enable this kind of use case: individual operations can update the hosts file and return completions, but the actual playbook execution wouldn't happen until someone called wait on those completions.

Maybe I do not understand the documentation well, and I have implemented this the wrong way. What we have in the documentation is:
"All methods that read or modify the state of the system can potentially be long running. To handle that, all such methods return a completion object (a ReadCompletion or a WriteCompletion). Orchestrator modules must implement the wait method: this takes a list of completions, and is responsible for checking if they’re finished, and advancing the underlying operations as needed."

So what I understood (and I have driven my implementation in this direction):

  • Orchestrator methods launch operations and return completion objects through which we can check the operation status (so playbook execution starts here).
  • The wait method checks for finished operations (and cleans them up) and advances operations. In our case the completion objects basically represent playbook executions, so once the playbook is launched the "wait" method can do nothing except check whether the execution has finished. The completion object is responsible for checking the operation status and updating this information (so the "wait" method is only able to clean up finished operations).

About "individual operations can update the hosts file"...

I think we are not aligned on this. Let me explain how the Ansible Runner Service works.

In the Ansible Runner Service it is the User who provides the "inventory" of hosts to the Service. Although it is possible to manage the "inventory", our assumption is that, for the moment, only the User can say which hosts are in the cluster and what the function of each one is.

So... although we can add/remove hosts (groups) from the Orchestrator, I think that it will be difficult to know what we have to do... Can you explain your idea/assumption in more detail?

So I guess the key question is: is this module intending to re-use ceph-ansible, or is it intended to write some other set of playbooks?

The module is intended to execute Ansible playbooks. Most of the functionality we need is in the ceph-ansible playbooks, so we will try to reuse it.
For example...

  • for the "get_inventory" method we need a completely new playbook to obtain the information (this is not available in ceph-ansible; in general, "discovery" playbooks are not present)
  • for "create_osd" we will use the playbooks available in the ceph-ansible repo.

And if it will re-use ceph-ansible, then how will that work?

An Orchestrator method will be called; this will create a new completion object that is responsible for the execution of one or more playbooks, and this completion object will be returned to the caller. The caller will use the "status" and "result" attributes of the completion object to get the required information and to know whether the operation has been executed successfully.

I expect a high degree of ceph-ansible reuse, because I think that most of the operations needed in the Orchestrator are already covered by the current ceph-ansible playbooks.

I think that the big challenge here is to create an Orchestrator API that provides a very easy way to manage all Ceph cluster operations.

@jcsp
Contributor

jcsp commented Oct 12, 2018

If the Orchestrator has a "build site" method, then we will use a "site.yaml" playbook over the hosts provisioned.

If you do a bunch of operations, and then call wait(), then the wait() method is essentially your "build site" method. If I can use a Star Trek analogy... think of all the normal operations (like creating an OSD) as Captain Picard giving his orders, and then the wait() method as him saying "Make it so!".

I am not trying to say that every module (or even this module) has to work that way, just that the interface is designed so that it's a possibility (i.e. having a wait() implementation that essentially updates the Ansible inventory/config, and runs site.yaml).

Orchestrator methods launch operations and return completion objects through which we can check the operation status (so playbook execution starts here).

The "playbook execution starts here" part is not a requirement of the interface. This is a key point. Nothing requires or promises that operations will begin at the point the completion object is constructed -- they don't have to advance at all until someone calls wait().
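To make the deferred model concrete, here is a minimal sketch. It is not code from this PR: the class names, the `create_osd` signature and the injected `run_playbook` callable are all hypothetical, and the real orchestrator interface may differ. Operations only record the desired state, and a single playbook run happens when wait() is called.

```python
class PendingCompletion:
    """Hypothetical completion: records a requested change, nothing runs yet."""

    def __init__(self, description):
        self.description = description
        self.is_complete = False
        self.result = None


class DeferredOrchestrator:
    """Sketch of an orchestrator that batches work until wait() is called."""

    def __init__(self, run_playbook):
        # run_playbook is an injected callable standing in for "run site.yaml".
        self._run_playbook = run_playbook
        self._inventory = {}  # group name -> list of (host, devices)

    def create_osd(self, host, devices):
        # Only update the desired state; do not execute anything here.
        self._inventory.setdefault("osds", []).append((host, list(devices)))
        return PendingCompletion("create OSDs on %s" % host)

    def wait(self, completions):
        # "Make it so": one playbook run applies every queued change.
        pending = [c for c in completions if not c.is_complete]
        if pending:
            outcome = self._run_playbook(self._inventory)
            for c in pending:
                c.result = outcome
                c.is_complete = True
        return all(c.is_complete for c in completions)


calls = []


def fake_site_playbook(inventory):
    # Stand-in for a real playbook execution; just records that it was called.
    calls.append(inventory)
    return "ok"


orch = DeferredOrchestrator(fake_site_playbook)
c1 = orch.create_osd("osd0", ["sdc"])
c2 = orch.create_osd("osd1", ["sdb"])
runs_before_wait = len(calls)   # no playbook has been executed at this point
orch.wait([c1, c2])
runs_after_wait = len(calls)    # both operations were covered by one run
```

A module is free to start work eagerly instead; the sketch only shows that the interface permits batching.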

So... although we can add/remove hosts (groups) from the Orchestrator, I think that it will be difficult to know what we have to do... Can you explain your idea/assumption in more detail?

I'm looking at the current workflow in ceph-ansible, where the way to create an OSD is to put a host in the [osd] section with a devices= line (I hope we have all read https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/installation_guide_for_red_hat_enterprise_linux/deploying-red-hat-ceph-storage)

@jmolmo
Member Author

jmolmo commented Oct 15, 2018

If I can use a Star Trek analogy... think of all the normal operations (like creating an OSD) as Captain Picard giving his orders, and then the wait() method as him saying "Make it so!".

Thanks for the analogy! Now I better understand the aim of the design (the "wait" method name did not help much, although the documentation clearly defines what you point out). OK, I will modify my implementation so that the wait method is the engine that makes the operations in completion objects progress.

I'm looking at the current workflow in ceph-ansible, where the way to create an OSD is to put a host in the [osd] section with a devices= line (I hope we have all read https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/installation_guide_for_red_hat_enterprise_linux/deploying-red-hat-ceph-storage)

With the Ansible Runner Service the User does not directly have an inventory file, although the User can manage the groups and the hosts in each group through the REST service.
The User can execute any loaded playbook over any of the hosts using the right parameters.

The basic flow of operations in Ansible Runner Service is:

  1. The User must add the cluster "hosts" to the Ansible Runner Service. The User should install the public key of the Ansible Runner Service on each of these hosts in order to allow passwordless SSH access.

  2. The User executes a playbook over one of the hosts or host groups, providing the right parameters to the playbook.

Taking this into account, we can use the orchestrator to provide the User with a set of higher-level, easier operations, for example OSD management such as creating or replacing OSDs.

Maybe these differences from the "Ceph Ansible way of working" can clarify the behavior of the Ansible Orchestrator:

  1. Ansible Runner Service has an internal list of nodes/groups of nodes, while in Ceph Ansible the inventory of nodes is kept in a file.

  2. In Ceph Ansible you can have a cluster playbook where you define the composition/features of your cluster. With the Ansible Runner Service you don't have this kind of file; what you have is the possibility of executing any of the role playbooks over the provisioned set of hosts.
    (this does not imply that we can implement this feature)

  3. I think that the real power of the Orchestrator is as a day 2 or day 3 tool. It is not intended to install the whole cluster (although it can do so); it is aimed at easing any kind of management operation on an installed cluster.

@jmolmo
Member Author

jmolmo commented Oct 15, 2018

jenkins retest this please

@jcsp
Contributor

jcsp commented Oct 15, 2018

I don't think any limitations of ansible-runner-service are important -- it's brand new unreleased code, so it can be changed however is necessary. If it needs an extension to its API to define inventories in the ceph-ansible style, then that shouldn't be hard.

I think that the real power of the Orchestrator is more like a day 2 or 3 tool, it is not intended to install the whole cluster

The orchestrator interface absolutely is intended for installation of all the Ceph services apart from the initial mon and mgr services. Mons and managers require little or no configuration or decision making (it's easy to set them up with a simple CLI tool), whereas OSDs require a guided process to select how devices should be used (a GUI is strongly preferred), so it makes sense to ensure that the OSD installation part of the process happens in the Ceph dashboard.

There is no meaningful separation between "day 1" and "day 2" when it comes to Ceph OSDs, because part of the ongoing lifecycle of a Ceph cluster is adding new OSDs (as the cluster grows, as drives fail).

@jmolmo
Member Author

jmolmo commented Oct 17, 2018

Manual test

Test lab used:

  • Three Vagrant VMs (mon0, mgr0, osd0) with:
    ceph version 14.0.0-4023-gd03a830 (d03a830) nautilus (dev)
  • A container with the latest version of the Ansible Runner REST Service

Operations

  1. Disable modules
[root@mon0 ~]# ceph mgr module disable ansible_orchestrator
[root@mon0 ~]# ceph mgr module disable orchestrator_cli
  2. Enable modules
[root@mon0 ~]# ceph mgr module enable ansible_orchestrator
[root@mon0 ~]# ceph mgr module enable orchestrator_cli
  3. Set ansible_orchestrator as backend of orchestrator-cli
[root@mon0 ~]# ceph orchestrator set backend ansible_orchestrator
[root@mon0 ~]#  ceph orchestrator status
Backend: ansible_orchestrator
Available: True
  4. Get cluster nodes and free devices
[root@mon0 ~]# ceph orchestrator device ls
192.168.121.245:
192.168.121.61:
192.168.121.254:
  sdc (hdd, 53687091200b)
  5. Check device availability on osd0 (192.168.121.254):
[vagrant@osd0 ~]$ lsblk
NAME                                                                                                               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                                                                                  8:0    0   50G  0 disk 
├─ceph--filestore--d251cce4--e04f--4ea2--ba9e--b1afacc3797e-osd--data--c1ee62a2--f7da--49c7--bc99--a5b249402adc    253:4    0   47G  0 lvm  /var/lib/ceph/osd/ceph-0
└─ceph--filestore--d251cce4--e04f--4ea2--ba9e--b1afacc3797e-osd--journal--af4f2e06--d670--49bc--8d11--803916bbc858 253:5    0    2G  0 lvm  
sdb                                                                                                                  8:16   0   50G  0 disk 
├─ceph--filestore--c36f9d1a--c0e8--4e03--b27f--86049dc895c4-osd--data--8b2643e6--e8f9--4783--96eb--0ec241b5df70    253:2    0   47G  0 lvm  /var/lib/ceph/osd/ceph-1
└─ceph--filestore--c36f9d1a--c0e8--4e03--b27f--86049dc895c4-osd--journal--c90b5bd0--33fe--4b88--ac2a--5788be9de990 253:3    0    2G  0 lvm  
sdc                                                                                                                  8:32   0   50G  0 disk 
vda                                                                                                                252:0    0   41G  0 disk 
├─vda1                                                                                                             252:1    0    1M  0 part 
├─vda2                                                                                                             252:2    0    1G  0 part /boot
└─vda3                                                                                                             252:3    0   39G  0 part 
  ├─VolGroup00-LogVol00                                                                                            253:0    0 37.5G  0 lvm  /
  └─VolGroup00-LogVol01                                                                                            253:1    0  1.5G  0 lvm  [SWAP]

Manager logs during the operation

Note: Once the orchestrator modules are enabled, the log level must be raised on the manager node in order to get this kind of log output.

[root@mgr0 ansible_orchestrator]# sudo ceph daemon mgr.mgr0 config set debug_mgr 20/5


[root@mgr0 ceph]# tail -f ceph-mgr.mgr0.log  | grep ansible_orchestrator

2018-10-17 17:45:19.695 7f525093b700 10 ceph_config_get orchestrator found: ansible_orchestrator
2018-10-17 17:45:19.695 7f525093b700 20 mgr dispatch_remote Calling ansible_orchestrator.get_inventory...
2018-10-17 17:45:19.704 7f525093b700  0 mgr[ansible_orchestrator] http POST https://192.168.121.1:5001/api/v1/playbooks/probe-disks.yml [{}] <--> (202 - ACCEPTED)
2018-10-17 17:45:19.704 7f525093b700  4 mgr[ansible_orchestrator] Playbook execution launched succesfuly
2018-10-17 17:45:19.704 7f525093b700 10 ceph_config_get orchestrator found: ansible_orchestrator
2018-10-17 17:45:19.704 7f525093b700 20 mgr dispatch_remote Calling ansible_orchestrator.wait...
2018-10-17 17:45:19.715 7f525093b700  4 mgr[ansible_orchestrator] http GET https://192.168.121.1:5001/api/v1/playbooks/6a388e60-d234-11e8-a922-2016b900e38f <--> (200 - {
2018-10-17 17:45:19.715 7f525093b700  4 mgr[ansible_orchestrator] Requested playbook execution status is: 2
2018-10-17 17:45:19.715 7f525093b700  4 mgr[ansible_orchestrator] playbook <probe-disks.yml> status:2
2018-10-17 17:45:19.715 7f525093b700  4 mgr[ansible_orchestrator] Operations pending: 1
2018-10-17 17:45:24.720 7f525093b700 10 ceph_config_get orchestrator found: ansible_orchestrator
2018-10-17 17:45:24.720 7f525093b700 20 mgr dispatch_remote Calling ansible_orchestrator.wait...
2018-10-17 17:45:24.735 7f525093b700  4 mgr[ansible_orchestrator] http GET https://192.168.121.1:5001/api/v1/playbooks/6a388e60-d234-11e8-a922-2016b900e38f <--> (200 - {
2018-10-17 17:45:24.735 7f525093b700  4 mgr[ansible_orchestrator] Requested playbook execution status is: 2
2018-10-17 17:45:24.735 7f525093b700  4 mgr[ansible_orchestrator] playbook <probe-disks.yml> status:2
2018-10-17 17:45:24.735 7f525093b700  4 mgr[ansible_orchestrator] Operations pending: 1
2018-10-17 17:45:29.741 7f525093b700 10 ceph_config_get orchestrator found: ansible_orchestrator
2018-10-17 17:45:29.741 7f525093b700 20 mgr dispatch_remote Calling ansible_orchestrator.wait...
2018-10-17 17:45:29.748 7f525093b700  4 mgr[ansible_orchestrator] http GET https://192.168.121.1:5001/api/v1/playbooks/6a388e60-d234-11e8-a922-2016b900e38f <--> (200 - {
2018-10-17 17:45:29.748 7f525093b700  4 mgr[ansible_orchestrator] Requested playbook execution status is: 2
2018-10-17 17:45:29.748 7f525093b700  4 mgr[ansible_orchestrator] playbook <probe-disks.yml> status:2
2018-10-17 17:45:29.748 7f525093b700  4 mgr[ansible_orchestrator] Operations pending: 1
2018-10-17 17:45:34.754 7f525093b700 10 ceph_config_get orchestrator found: ansible_orchestrator
2018-10-17 17:45:34.754 7f525093b700 20 mgr dispatch_remote Calling ansible_orchestrator.wait...
2018-10-17 17:45:34.767 7f525093b700  4 mgr[ansible_orchestrator] http GET https://192.168.121.1:5001/api/v1/playbooks/6a388e60-d234-11e8-a922-2016b900e38f <--> (200 - {
2018-10-17 17:45:34.767 7f525093b700  4 mgr[ansible_orchestrator] Requested playbook execution status is: 0
2018-10-17 17:45:34.767 7f525093b700  4 mgr[ansible_orchestrator] playbook <probe-disks.yml> status:0
2018-10-17 17:45:34.793 7f525093b700  4 mgr[ansible_orchestrator] http GET https://192.168.121.1:5001/api/v1/jobs/6a388e60-d234-11e8-a922-2016b900e38f/events <--> (200 - {
2018-10-17 17:45:34.794 7f525093b700  4 mgr[ansible_orchestrator] Requested playbook result is: {"37-63977577-38d7-4a3a-ad59-b451ae59a56b": {"host": "192.168.121.254", "task": "RESULTS", "event": "runner_on_ok"}}
2018-10-17 17:45:34.800 7f525093b700  4 mgr[ansible_orchestrator] http GET https://192.168.121.1:5001/api/v1/jobs/6a388e60-d234-11e8-a922-2016b900e38f/events/37-63977577-38d7-4a3a-ad59-b451ae59a56b <--> (200 - {
2018-10-17 17:45:34.800 7f525093b700  4 mgr[ansible_orchestrator] Operations pending: 0
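The request sequence in this log (launch, poll status, fetch events) can be sketched as follows. This is an illustration only: the endpoint paths are copied from the log lines above, but the `PlaybookOperation` and `FakeClient` names, the client's `post`/`get` return values, and the reading of status 0 as "finished" are assumptions, since the real response payloads are not shown here.

```python
API = "/api/v1"


class PlaybookOperation:
    """Hypothetical wrapper around one playbook execution."""

    def __init__(self, client, playbook):
        self.client = client      # any object offering post(path) and get(path)
        self.playbook = playbook
        self.uuid = None
        self.status = None
        self.events = None

    def launch(self):
        # POST .../playbooks/<name> starts the run and yields its identifier.
        self.uuid = self.client.post("%s/playbooks/%s" % (API, self.playbook))

    def update(self):
        # GET .../playbooks/<uuid> reports the execution status.
        self.status = self.client.get("%s/playbooks/%s" % (API, self.uuid))
        if self.status == 0 and self.events is None:
            # Assumed from the log: 0 means finished, then events are fetched.
            self.events = self.client.get("%s/jobs/%s/events" % (API, self.uuid))
        return self.status == 0


class FakeClient:
    """Stand-in transport that replays the status sequence seen in the log."""

    def __init__(self):
        self._statuses = [2, 2, 2, 0]
        self.requests = []

    def post(self, path):
        self.requests.append(("POST", path))
        return "6a388e60-d234-11e8-a922-2016b900e38f"

    def get(self, path):
        self.requests.append(("GET", path))
        if path.endswith("/events"):
            return {"37-63977577-38d7-4a3a-ad59-b451ae59a56b": {
                "host": "192.168.121.254", "task": "RESULTS",
                "event": "runner_on_ok"}}
        return self._statuses.pop(0)


op = PlaybookOperation(FakeClient(), "probe-disks.yml")
op.launch()
polls = 1
while not op.update():   # each loop iteration corresponds to one wait() call
    polls += 1
```

In the module itself the loop would be driven by repeated wait() calls rather than a busy loop.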


response = r

except Exception as ex:
Contributor

Catching Exception is a bit broad. Do you want to catch more than requests.exceptions.RequestException here?

Member Author

No special action will be taken for any kind of error here, so in my opinion differentiating the errors does not add much value.
In any case, following your advice, I changed the log method to "exception" in order to have more information about the context of the error.

Contributor

The problem with just logging exceptions is that the ceph-mgr log is not visible to users unless they go and poke around on the node where it's running. To actually make an error visible to users, it's better to let the exception surface.

In other words, if there's no special action to take in the case of the exception, then don't catch it. In the case of login(), it makes sense to have the caller catch exceptions, rather than to catch them inside and then have the caller check is_operable -- that way the caller can see the actual exception, and surface it to the user (e.g. via a health check) if they choose to.
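A minimal sketch of that arrangement, with assumptions stated: `RunnerClient`, `serve` and `report_health` are hypothetical names, and the built-in `ConnectionError` stands in for `requests.exceptions.RequestException` so that the example runs without the requests library installed.

```python
class RunnerClient:
    """Hypothetical client; connect is an injected callable doing the login."""

    def __init__(self, connect):
        self._connect = connect
        self.token = None

    def login(self):
        # No try/except here: a failure propagates to whoever called login().
        self.token = self._connect()


def serve(client, report_health):
    """The caller decides what to do with the failure."""
    try:
        client.login()
    except ConnectionError as ex:
        # The caller sees the real exception and can surface it to the user,
        # for example through a health check, instead of only logging it.
        report_health("Ansible Runner Service not available: %s" % ex)
        return False
    return True


def refused():
    # Simulates the service being down.
    raise ConnectionError("Connection refused")


messages = []
ok = serve(RunnerClient(refused), messages.append)
```

With this shape there is no need for an `is_operable` flag on the client: the caller learns about the failure from the exception itself.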

Member Author

I have checked the behavior of the module with the login implementation with and without error management.
In this case (login), as John points out, it seems sensible to move the error management to the caller
(not much difference, but the error message/stack trace is clearer because the explanation appears first).

In the case of the rest of the HTTP methods, I think it is better to leave the error management in the method. If I remove it, a generic error handler will probably have to be implemented at the caller level to avoid repeating the error management for each HTTP call.

Details:

I stopped the Ansible Runner Service to check the error:

When I try to log in again (error management in the login method):

2018-10-24 14:38:40.301 7fa08c337700  0 mgr[ansible_orchestrator] Ansible runner service - Unexpected error
Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/ansible_orchestrator/ansible_runner_svc.py", line 171, in login
    verify = self.certificate)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 68, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))
2018-10-24 14:38:40.301 7fa08c337700  0 mgr[ansible_orchestrator] Ansible Runner Service not available. Check external server status or connection options. If configuration options changed try to disable/enable the module.

When I try to log in again (login method without error management; it is implemented in the caller). This is the current version:

2018-10-24 15:05:31.908 7f1056d41700  0 mgr[ansible_orchestrator] Ansible Runner Service not available. Check external server status or connection options. If configuration options changed try to disable/enable the module.
Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/ansible_orchestrator/module.py", line 250, in serve
    logger = self.log)
  File "/usr/lib64/ceph/mgr/ansible_orchestrator/ansible_runner_svc.py", line 159, in __init__
    self.login()
  File "/usr/lib64/ceph/mgr/ansible_orchestrator/ansible_runner_svc.py", line 171, in login
    verify = self.certificate)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 68, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))

response = r

except Exception as ex:
self.log.error("Ansible runner service - Unexpected error: %s", ex)
Contributor

If you call log.exception, it will also print the stack trace.

self.log.exception("Ansible runner service")
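The difference can be seen with the standard library's logging module. This sketch assumes the module's `self.log` behaves like a standard `logging.Logger`; the logger name and the captured stream are only for illustration.

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("ansible_orchestrator_example")
log.setLevel(logging.DEBUG)
log.propagate = False                       # keep the output in our stream only
log.addHandler(logging.StreamHandler(stream))

try:
    raise ValueError("connection refused")
except ValueError as ex:
    # error() logs only the formatted message.
    log.error("Ansible runner service - Unexpected error: %s", ex)
    error_output = stream.getvalue()
    # exception() logs at ERROR level and appends the current traceback.
    log.exception("Ansible runner service")
    exception_output = stream.getvalue()[len(error_output):]
```

Note that `exception()` is meant to be called from inside an `except` block, where there is an active exception to report.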

Member Author

Done!

@returns: A requests object
"""
# TODO
pass
Contributor

Suggested change
pass
raise NotImplementedError("TODO")

?
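A short illustration of why the suggestion helps (the `Client` class and method names are hypothetical): a stub that only contains `pass` silently returns `None`, which contradicts a docstring promising a requests object, while raising makes the gap obvious at the call site.

```python
class Client:
    def silent_stub(self):
        # TODO
        pass

    def loud_stub(self):
        raise NotImplementedError("TODO")


client = Client()
result = client.silent_stub()   # the caller cannot tell this is unimplemented
try:
    client.loud_stub()
    raised = False
except NotImplementedError:
    raised = True
```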

Member Author

Thanks!

inventory_nodes = []

# Loop over the result events and request the event data
for event_key, data in inventory_events.iteritems():
Contributor

iteritems is not supported in Python 3
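One portable alternative is `dict.items()`, which exists on both Python 2 and Python 3. The sketch below uses the event payload from the manager log earlier in this thread as sample data; the `hosts` list is only for illustration.

```python
inventory_events = {
    "37-63977577-38d7-4a3a-ad59-b451ae59a56b": {
        "host": "192.168.121.254", "task": "RESULTS", "event": "runner_on_ok"},
}

hosts = []
# items() works on Python 2 and 3; iteritems() exists only on Python 2.
for event_key, data in inventory_events.items():
    hosts.append(data["host"])
```

On Python 2 `items()` builds a list instead of an iterator, which is harmless for small dictionaries such as this one.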

Member Author

Removed! Thanks!

event_data = json.loads(event_response.text)["data"]["event_data"]

free_disks = event_data["res"]["free_disks"]
for item, data in free_disks.iteritems():
Contributor

iteritems see above

Member Author

done

if item not in [host.name for host in inventory_nodes]:

devs = []
for dev_key, dev_data in data.iteritems():
Contributor

iteritems see above

Member Author

done
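For reference, the nested loops reviewed above walk a `free_disks` dictionary of the form {host: {disk: metadata}}. The following sketch is not the module's implementation: `Device`, `Node` and `nodes_from_free_disks` are hypothetical names, the sample data is the example output from the header of the probe-disks.yml playbook quoted later in this conversation, and mapping `rotational` to "hdd"/"ssd" is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-ins for the orchestrator's inventory types.
Device = namedtuple("Device", ["name", "kind", "size_bytes"])
Node = namedtuple("Node", ["name", "devices"])


def nodes_from_free_disks(free_disks):
    """Convert {host: {disk: metadata}} into a sorted list of Node tuples."""
    nodes = []
    for host, disks in sorted(free_disks.items()):
        devices = []
        for disk, meta in sorted(disks.items()):
            # Assumption: rotational devices are reported as "hdd".
            kind = "hdd" if meta["rotational"] else "ssd"
            # Size in bytes is the sector count times the sector size.
            size = meta["sectors"] * meta["sectorsize"]
            devices.append(Device(disk, kind, size))
        nodes.append(Node(host, devices))
    return nodes


free_disks = {
    "con-1": {
        "vdd": {
            "rotational": True,
            "sectors": 41943040,
            "sectorsize": 512,
            "size_bytes": 21474836480,
            "size_txt": "20.00 GB",
        }
    }
}
nodes = nodes_from_free_disks(free_disks)
```

Building each host's device list in one pass also avoids the repeated `item not in [host.name for host in inventory_nodes]` membership scan.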

# Auxiliary functions
#==============================================================================

def process_inventary_json(inventory_events, ar_client, playbook_uuid):
Member

Suggested change
def process_inventary_json(inventory_events, ar_client, playbook_uuid):
def process_inventory_json(inventory_events, ar_client, playbook_uuid):

Member Author

Fixed! Thanks!

params = "{}")

# Assing the process_output function
ansible_operation.process_output = process_inventary_json
Member

Suggested change
ansible_operation.process_output = process_inventary_json
ansible_operation.process_output = process_inventory_json

Member Author

Fixed



# List of playbooks names used
GET_INVENTORY_PLAYBOOK = "probe-disks.yml"
Member

can you share this file as an example?

Member Author

The playbook I'm using at the moment only provides "free disks". In any case, it was good enough to serve as a base for implementing the "get_inventory" method. I'm modifying it in order to get a list of all the devices (if you know of another playbook with this function, it would be welcome).

[jolmomar@localhost tmp]$ cat probe-disks.yml
---
#
# Playbook to scan a set of hosts and return a dict indexed by host containing
# a list of disks that are unused. Each disk is represented by a dict with the
# following fields;
#
# size_txt (str) e.g 10.0GB
# size_bytes (int) e.g. 21474836480
# sectorsize (int) e.g. 512
# sectors (int) e.g 41943040
#
# example output;
# ok: [con-1 -> 127.0.0.1] => {
#    "free_disks": {
#        "con-1": {
#            "vdd": {
#                "rotational": true,
#                "sectors": 41943040,
#                "sectorsize": 512,
#                "size_bytes": 21474836480,
#                "size_txt": "20.00 GB"
#            }
#        },

- name: probe hosts for free disks
  hosts:
    - osds
    - mgrs
    - mons
  vars:
    free_disks: |
      {%- set disk_table = dict() %}
      {%- for host in play_hosts %}
        {%- set _x = disk_table.__setitem__(host, {}) %}
        {%- set _devdata = dict() %}
        {%- for disk in hostvars[host].host_disk %}
            {%- set _meta = hostvars[host]['ansible_devices'][disk] %}
            {%- set _x = _devdata.__setitem__(disk, dict(size_txt=_meta['size'],
                                                         rotational=_meta['rotational']|bool,
                                                         sectors=_meta['sectors']|int,
                                                         sectorsize=_meta['sectorsize']|int,
                                                         size_bytes=_meta['sectors']|int * _meta['sectorsize']|int)) %}
        {%- endfor %}
        {%- set _x = disk_table.__setitem__(host, _devdata) %}
      {%- endfor %}
      {{ disk_table }}

  gather_facts: true
  tasks:
    - name: setup
      set_fact:
          host_disk: []
    - name: Get a list of block devices (excludes loop and child devices)
      command: lsblk -n --o NAME --nodeps --exclude 7
      register: lsblk_out
    - name: check if disk {{ item }} is free
      command: pvcreate --test /dev/{{ item }}
      ignore_errors: true
      register: pv_status
      with_items: "{{lsblk_out.stdout_lines}}"
    - name: Update hosts freedisk list
      set_fact:
        host_disk: "{{host_disk + [item.item]}}"
      ignore_errors: true
      when: item.rc == 0
      with_items: "{{ pv_status.results}}"
    - name: RESULTS
      debug:
        var: free_disks
      delegate_to: 127.0.0.1
      run_once: True
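
For illustration only, here is a minimal Python sketch of how a consumer could flatten the "free_disks" dict produced by this playbook into one record per device. The sample data mirrors the example output in the playbook header; the function name `flatten_free_disks` and the record layout are assumptions made for this sketch, not the code in this PR.

```python
# Sample shaped like the playbook's "free_disks" result (values taken from
# the example output in the playbook header).
free_disks = {
    "con-1": {
        "vdd": {
            "rotational": True,
            "sectors": 41943040,
            "sectorsize": 512,
            "size_bytes": 21474836480,
            "size_txt": "20.00 GB",
        }
    }
}


def flatten_free_disks(disk_table):
    """Return one record per (host, disk) pair.

    Hypothetical helper: it also checks that size_bytes equals
    sectors * sectorsize, which is how the playbook computes it.
    """
    records = []
    for host, disks in sorted(disk_table.items()):
        for name, meta in sorted(disks.items()):
            expected = meta["sectors"] * meta["sectorsize"]
            if meta["size_bytes"] != expected:
                raise ValueError("inconsistent size for %s:%s" % (host, name))
            records.append({
                "host": host,
                "device": "/dev/" + name,
                "size_bytes": meta["size_bytes"],
                "rotational": meta["rotational"],
            })
    return records


inventory = flatten_free_disks(free_disks)
```

The consistency check is optional; it is included only to make the relationship between the three size fields explicit.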

Member Author

Changed the name of the playbook to "host-disks.yml"



# List of playbooks names used
GET_INVENTORY_PLAYBOOK = "probe-disks.yml"
Member

Also the name is weird, you call it GET_INVENTORY_PLAYBOOK but is this the playbook or the inventory? It's confusing. Based on https://github.com/ceph/ceph/pull/24445/files#diff-5940840b32ed5f2781084ee6e9a7d408R270 we would think it's the playbook but https://github.com/ceph/ceph/pull/24445/files#diff-5940840b32ed5f2781084ee6e9a7d408R261 indicates an inventory...

Member Author

This constant must contain the name of the playbook used to retrieve the list of storage devices present on the host affected by the playbook run.

As you pointed out, the name of the playbook is weird.
I will change it. I think "get_storage_devices.yml" is easier to understand.

Member Author

Finally changed to "host-disks.yml".


# Create a new read completion object to execute the playbook
ansible_operation = AnsibleReadOperation(client = self.ar_client,
playbook = GET_INVENTORY_PLAYBOOK,
Member

playbook is confusing if what we are actually calling is an inventory

Member Author

I would leave the name of the constant unchanged, but I will add the following comment above its definition:
Name of the playbook used in the "get_inventory" method. This playbook is expected to provide a list of storage devices in the host where the playbook is executed.

Member Author

done

.. _ansible-orchestrator-module:

====================
Ansible Orchestrator
Contributor

@jcsp jcsp Oct 22, 2018

I'd suggest sticking with plain ansible as the name. If we find it becomes necessary to highlight/identify which modules are orchestrator modules, we should do that programmatically rather than with long names.

(I mean the actual python module name)

Member Author

Done!
Now it follows the same pattern as the other Orchestrator modules and it is more elegant in commands. Thanks!

"""

OPTIONS = [
{'name': 'server_addr'},
Contributor

I'd suggest merging the addr+port settings into a single URL setting.

Member Author

Why?

Contributor

So that the server's URL is set atomically (i.e. port and hostname together) -- if changing the server's address and port, you don't want to go through an intermediate stage where it's trying to talk to the wrong port on the right hostname, or vice versa.

Member Author

@jmolmo jmolmo Oct 24, 2018

OK, but in the orchestrator these values do not take effect when you change them, only when you disable and re-enable the module (unless we change the implementation to allow "hot" changes of config variables).

Contributor

Yes, I'm assuming that you would at some point want to improve the connection/authentication stuff so that an authentication error or a config change didn't require a ceph-mgr restart.

Member Author

A ceph mgr restart is not needed at all to refresh/change configuration values.

The sequence is:

  • Disable module
  • Change configuration values as required
  • Enable module

When the ansible module is enabled, it reads all the configuration values, and once they have been read and validated the module can start to use them. So you can regard the reading of the configuration values as a "transactional" operation.

Contributor

Disabling or enabling a module restarts ceph-mgr.

Member Author

@jmolmo jmolmo Oct 26, 2018

8-| ... I didn't realize that. In fact the service is not restarted (the PID of the binary stays the same), but internally, as you say, a restart is executed.
It does not seem very healthy that changing a setting in one module and making it effective forces the restart of the other modules. Now I understand your comment:

you would at some point want to improve the connection/authentication stuff so that an authentication error or a config change didn't require a ceph-mgr restart

But this is a problem that affects all the modules.
So I think what we need is basically a method that refreshes the configuration and can be called from the CLI.

I propose:

  • To add a "refresh_config" method to the MgrModule base class, to be overridden by modules:
    In this method the module must read all the settings and apply the changes detected.
  • To implement this method in the Ansible Orchestrator.
  • To add a new command to the Orchestrator_CLI that calls "refresh_config" in the backend orchestrator.

If you agree, I can do this. I think this really adds value, more than merging or not merging two different settings.
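
To make the proposal concrete, a rough Python sketch of the idea follows. Everything in it is hypothetical: `MgrModuleSketch` is a stand-in written for this example and not the real MgrModule class, and the option names and the return value of `refresh_config` are assumptions.

```python
class MgrModuleSketch(object):
    """Hypothetical stand-in for the MgrModule base class."""

    def __init__(self, store):
        self._store = store  # dict-like object holding the settings

    def get_config(self, key):
        return self._store.get(key)

    def refresh_config(self):
        """Hook to be overridden by modules; the default does nothing."""
        return []


class AnsibleSketch(MgrModuleSketch):
    """Hypothetical orchestrator module that re-reads its settings on demand."""

    OPTION_NAMES = ("server_url", "username")

    def __init__(self, store):
        super().__init__(store)
        self.active = {}
        self.refresh_config()

    def refresh_config(self):
        # Read every setting and apply only the ones that changed.
        changed = []
        for name in self.OPTION_NAMES:
            value = self.get_config(name)
            if self.active.get(name) != value:
                self.active[name] = value
                changed.append(name)
        return changed


store = {"server_url": "https://runner.example:5001", "username": "admin"}
module = AnsibleSketch(store)
store["server_url"] = "https://other.example:5001"
changed = module.refresh_config()  # what a CLI command would trigger
```

In this sketch the CLI command would only need to call `refresh_config()` on the active backend; no module is disabled or enabled.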

Contributor

Firstly, I still think that you should store your URL as a URL. Your web browser doesn't have different text boxes for the hostname vs. port, and neither should your settings. Trust me on this. The way to specify the destination for an HTTP connection is to have a URL setting, this isn't controversial.

While some modules would benefit from a notification on changes (and that could be implemented pretty easily from PyModuleRegistry::handle_config calling through to ActivePyModules::notify_all), you don't need it here. Because you're a client rather than a server, you can just look at the configured URL each time you make a request. If it's different from your established client session, just throw away your session and open a new one.
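
A minimal sketch of that last idea, in Python, under stated assumptions: `RunnerClient` is a hypothetical stand-in for a logged-in client session, and `get_config` is any callable that returns the current value of a setting. It is not the code in this PR.

```python
class RunnerClient(object):
    """Hypothetical stand-in for an authenticated REST client session."""

    def __init__(self, server_url):
        self.server_url = server_url


class ClientCache(object):
    """Hand out a client, rebuilding it whenever the configured URL changes."""

    def __init__(self, get_config):
        self._get_config = get_config  # callable: setting name -> value
        self._client = None

    def client(self):
        url = self._get_config("server_url")
        if self._client is None or self._client.server_url != url:
            # The URL differs from the established session: drop it and
            # open a new one.
            self._client = RunnerClient(url)
        return self._client


settings = {"server_url": "https://runner.example:5001"}
cache = ClientCache(settings.get)
first = cache.client()
same = cache.client()       # unchanged URL: the session is reused
settings["server_url"] = "https://other.example:5001"
second = cache.client()     # changed URL: a new session is created
```

Because the URL is read on every request, no change notification and no module restart are needed for a new address to take effect.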

Member Author

Done. server + port is now one setting: server_url

{'name': 'server_port'},
{'name': 'username'},
{'name': 'password'},
{'name': 'certificate'} # Ansible runner https server certificate file
Contributor

@jcsp jcsp Oct 22, 2018

Is this a filename? Manager modules should not depend on files on local filesystem, rather the certificate should be stored like the username/password (see how this is done for server side certs in dashboard)

Member Author

OK. I will check the dashboard implementation and follow your advice.
But this is not part of this PR, so I prefer to implement it later. I wouldn't like to pile feature upon feature in an endless PR.

OPTIONS = [
{'name': 'server_addr'},
{'name': 'server_port'},
{'name': 'username'},
Contributor

Does ansible-runner-service actually have/need multi-tenancy (multiple user accounts)? It feels like an unnecessary complexity when the service is just for a single cluster.

Member Author

This is how Ansible runner service works. Even in a single cluster, multiple users with different privileges usually exist.

Contributor

Some of the following may need input from @pcuzner on the intended security model:

I don't think that ARS has users with different privileges. From reading https://github.com/pcuzner/ansible-runner-service/blob/master/runner_service/controllers/login.py#L46 it seems like they just have a password stored in plain text in their config file, and once you're logged in it doesn't matter what user you are.

Even if ARS did expand its user account concept beyond a dict in the config file, all the ansible playbooks are being run as root out on the cluster nodes, so would it be any meaningful security isolation?

BTW, I also notice that ansible_runner_service has a default crypto secret of "secret" and nothing in the installation instructions about how to set it to something unique per-installation, so hopefully there is a plan for resolving that.

I also don't see any mechanism for revoking JWT tokens, so it seems like even if a user account was removed from the ARS configuration file, login sessions would continue until expiry (default 24 hours).

The username/password handling seems quite superficial, so I'm left wondering why we don't just use the client TLS certificates for authentication. The user/pass stuff seems like it's liable to give a false sense of security -- if the certs are handled properly, they should be enough security.

Member Author

There aren't users with different privileges in Ansible Runner Service, but users will probably want a certain level of security. (What I mean is that not all the users who can access the servers have the same permissions to do things.)

Ansible Runner Service is quite new (like me), so we need a little time for it to be completely functional. :-)
For the moment what we have is the "user login" and the use of tokens, and as you said, we have several ways to improve this point.

# Once authenticated this token will be used in all the requests
self.token = ""

self.server_url = "https://{0}:{1}".format(self.server, self.port)
Contributor

This may not be IPv6-safe. For simplicity, I'd suggest not assembling URLs by hand.
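
To illustrate the concern, a small Python 3 sketch: formatting a bare IPv6 literal into the string produces an ambiguous URL, while a URL stored whole (with the literal in brackets) can be split reliably with the standard library. The helper name `parse_server_url` and the example address are assumptions for this sketch.

```python
from urllib.parse import urlsplit

# Hand-assembled URL: with an IPv6 literal the result is ambiguous, because
# the address itself contains colons and is not wrapped in brackets.
naive = "https://{0}:{1}".format("fe80::1", 5001)


def parse_server_url(server_url):
    """Split a complete URL and return (hostname, port).

    Hypothetical helper: it accepts only https URLs, matching the
    hard-coded scheme in the code under review.
    """
    parts = urlsplit(server_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("expected an https URL, got %r" % server_url)
    return parts.hostname, parts.port


# A URL stored as one setting carries the brackets IPv6 needs.
host, port = parse_server_url("https://[fe80::1]:5001")
```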

Member Author

Changed, thanks!
The server and port settings are now a single setting: server_url


# Used to verify or not https server identity
if not certificate:
self.certificate = False
Contributor

@sebastian-philipp sebastian-philipp Nov 5, 2018

Can we set the default to True? Disabling HTTPS validation is questionable. This is related to https://github.com/requests/requests/blob/master/requests/api.py#L41-L43

Member Author

There is a config setting ("certificate") to specify the path to the CA bundle to use for verification.
If it is not provided, we assume that we cannot verify the Ansible server identity (this would be like a "dev" mode).

  • BTW, the name of the config setting is probably not the best one.

In any case, there is a change pending in the Ansible Runner Service to use client TLS certificates:
pcuzner/ansible-runner-service#74
This will probably imply changes in the login method of the orchestrator.

Contributor

There is a setting ( "certificate") config to specify the path to the CA Bundle to use for verification. If this is not provided then is assumed that we cannot verify the Ansible Server Identity, (this would be like a "dev" mode.)

Instead of assuming by default that we cannot verify the identity, would it be possible to let the user explicitly disable verification?
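
A possible shape for this, sketched in Python. The `verify` argument of the requests library accepts either a boolean or a path to a CA bundle, so a safe-by-default mapping could look like the following; the function and the `verify_server` setting name are hypothetical.

```python
def tls_verify_option(ca_bundle="", verify_server=True):
    """Return a value suitable for the `verify` argument of requests.

    Hypothetical mapping: verification stays on unless the user
    explicitly turns it off with `verify_server=False`.
    """
    if not verify_server:
        return False            # explicit opt-out only
    return ca_bundle or True    # custom CA bundle, else the default trust store


default_mode = tls_verify_option()
custom_ca = tls_verify_option(ca_bundle="/etc/ceph/runner-ca.crt")
dev_mode = tls_verify_option(verify_server=False)
```

With this shape an empty "certificate" setting no longer disables verification; it simply falls back to the default trust store.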

Member Author

Ok ... safe by default... i got it. Sure!!. I will change it asap. thx!

@jmolmo
Member Author

jmolmo commented Nov 5, 2018

jenkins retest this please

@jmolmo
Member Author

jmolmo commented Nov 6, 2018

jenkins retest this please

@sebastian-philipp
Contributor

@jmolmo any progress with your virtualenv here? Maybe @noahdesu has a clue?

@dotnwat
Contributor

dotnwat commented Nov 9, 2018

@jmolmo what seems to be the issue? i struggled to get the tox tests for the insights plugin to work, but i think you should be able to effectively copy that over for the ansible case.

@dotnwat
Contributor

dotnwat commented Nov 9, 2018

i've gotta say that failure is a bit baffling to me. the error seems to be complaining about a bad symbol in the path of insights plugin test, but I cannot even see the word insights appear in your patch!

@dotnwat
Contributor

dotnwat commented Nov 9, 2018

jenkins retest this please

@jmolmo
Member Author

jmolmo commented Nov 12, 2018

i've gotta say that failure is a bit baffling to me. the error seems to be complaining about a bad symbol in the path of insights plugin test, but I cannot even see the word insights appear in your patch!

Thanks for your help @noahdesu. It seems that the problem resides in some kind of weird dependency between the virtualenvs of insights and ansible. I will continue investigating.

@dotnwat
Contributor

dotnwat commented Nov 13, 2018

@jmolmo i don't think my fix will work. but kefu has some really helpful tips in that PR that might help fix this issue! #25065

@jmolmo
Member Author

jmolmo commented Nov 15, 2018

@noahdesu, @tchaikov Thank you very much for your help with this! It finally seems solved.
As Kefu pointed out in #25065, the problem was the reuse of tox settings across the different virtual environments (same tox workdir).
To avoid problems in the future, it would probably also be good to change the tox workdir in "insights" and in the "dashboard". What do you think?

@tchaikov
Contributor

tchaikov commented Nov 16, 2018

Probably and to avoid problems in the future it would be also good to change the tox work dir in "insights" and in the "dashboard". What do you think?

yeah, that'd be simpler and better than what we have now.

Member

@leseb leseb left a comment

Minor nits, thanks @jmolmo

@jmolmo
Member Author

jmolmo commented Nov 22, 2018

jenkins retest this please

Member

@leseb leseb left a comment

Thanks! Good initial iteration :)

@sebastian-philipp
Contributor

Jenkins says:

The following tests FAILED:
	  8 - run-tox-mgr-ansible (Failed)

@jmolmo
Member Author

jmolmo commented Nov 23, 2018

jenkins retest this please

envlist = py27,py3
skipsdist = true
toxworkdir = {env:CEPH_BUILD_DIR}/ansible
minversion = 2.8.1
Contributor

@sebastian-philipp sebastian-philipp Nov 23, 2018

1.4.2

Member Author

I will change it to see what happens. No clue why it was working before (o_o)

@jmolmo
Member Author

jmolmo commented Nov 26, 2018

jenkins retest this please

@sebastian-philipp
Contributor

The arm jenkins failed:

No uninstalled build requires
New python executable in /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/install-deps-python2.7_tmp/bin/python
Installing Setuptools..............................................................................................................................................................................................................................done.
Installing Pip.....................................................................................................................................................................................................................................................................................................................................done.
Downloading/unpacking virtualenv
  Running setup.py egg_info for package virtualenv
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    error in virtualenv setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
    Complete output from command python setup.py egg_info:
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'

  warnings.warn(msg)

error in virtualenv setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/install-deps-python2.7_tmp/build/virtualenv
Storing complete log in /home/jenkins-build/.pip/pip.log
./install-deps.sh: line 383: /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/install-deps-python2.7_tmp/bin/virtualenv: No such file or directory
./install-deps.sh: line 386: /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/install-deps-python2.7/bin/activate: No such file or directory
Requirement already satisfied (use --upgrade to upgrade): setuptools<36,>=0.8 in /usr/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): pip>=7.0 in /usr/lib/python2.7/site-packages
Collecting wheel>=0.24
  Downloading https://files.pythonhosted.org/packages/ff/47/1dfa4795e24fd6f93d5d58602dd716c3f101cfd5a77cd9acbe519b44a0a9/wheel-0.32.3-py2.py3-none-any.whl
Installing collected packages: wheel
Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/site-packages/pip/commands/install.py", line 326, in run
    strip_file_prefix=options.strip_file_prefix,
  File "/usr/lib/python2.7/site-packages/pip/req/req_set.py", line 742, in install
    **kwargs
  File "/usr/lib/python2.7/site-packages/pip/req/req_install.py", line 834, in install
    strip_file_prefix=strip_file_prefix
  File "/usr/lib/python2.7/site-packages/pip/req/req_install.py", line 1037, in move_wheel_files
    strip_file_prefix=strip_file_prefix,
  File "/usr/lib/python2.7/site-packages/pip/wheel.py", line 346, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/usr/lib/python2.7/site-packages/pip/wheel.py", line 317, in clobber
    ensure_dir(destdir)
  File "/usr/lib/python2.7/site-packages/pip/utils/__init__.py", line 83, in ensure_dir
    os.makedirs(path)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/usr/lib/python2.7/site-packages/wheel'

@jmolmo
Member Author

jmolmo commented Nov 26, 2018

jenkins retest this please

@sebastian-philipp
Contributor

Test project /home/jenkins-build/build/workspace/ceph-pull-requests/build
        Start   1: run-rbd-unit-tests.sh
        Start   2: run-cli-tests
        Start   3: test_objectstore_memstore.sh
        Start   4: smoke.sh
        Start   5: unittest_bufferlist.sh
        Start   6: run-tox-mgr-dashboard
        Start   7: run-tox-mgr-insights
        Start   8: run-tox-mgr-ansible
  1/163 Test   #8: run-tox-mgr-ansible .....................***Failed    0.51 sec
Traceback (most recent call last):
  File "/usr/bin/tox", line 9, in <module>
    load_entry_point('tox==1.4.2', 'console_scripts', 'tox')()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 25, in main
    retcode = Session(config).runcommand()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 273, in runcommand
    return self.subcommand_test()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 353, in subcommand_test
    sdist_path = self.sdist()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 339, in sdist
    sdist_path = self._makesdist()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 291, in _makesdist
    raise tox.exception.MissingFile(setup)
tox.MissingFile: MissingFile: /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/ansible/setup.py

@dotnwat
Contributor

dotnwat commented Nov 26, 2018

@sebastian-philipp @jmolmo the failure on ARM was fixed in 9538675#diff-47a21b3706c13e08943e223c12323aa1

Looks like you can resolve that by rebasing onto master.

@tchaikov
Contributor

retest this please.

@sebastian-philipp
Contributor

@sebastian-philipp @jmolmo the failure on ARM was fixed in 9538675#diff-47a21b3706c13e08943e223c12323aa1

Looks like you can resolve that by rebasing onto master.

One last thing. Arm64 fails with

  3/160 Test   #8: run-tox-mgr-ansible .....................***Failed    0.82 sec
ERROR: tox version is 1.4.2, required is at least 2.3.1

@sebastian-philipp
Contributor

Argh. Something went horribly wrong with your git branch.

@jmolmo
Member Author

jmolmo commented Nov 28, 2018

After rebasing onto master, I checked that the build environment and the test execution are OK locally:

[jolmomar@juanmipc build]$ sudo make mgr-ansible-test-venv
-- NSS_LIBRARIES: /usr/lib64/libssl3.so;/usr/lib64/libsmime3.so;/usr/lib64/libnss3.so;/usr/lib64/libnssutil3.so
-- NSS_INCLUDE_DIRS: /usr/include/nss3
...
Installing collected packages: virtualenv, toml, pluggy, py, filelock, tox
Successfully installed filelock-3.0.10 pluggy-0.8.0 py-1.7.0 toml-0.10.0 tox-3.5.3 virtualenv-16.1.0
Built target mgr-ansible-test-venv

[jolmomar@juanmipc build]$ sudo ctest -R run-tox-mgr-ansible -V
UpdateCTestConfiguration  from :/home/jolmomar/Code/ceph/build/DartConfiguration.tcl
...
8: ========================== 10 passed in 0.07 seconds ===========================
8: ___________________________________ summary ____________________________________
8:   py27: commands succeeded
8:   congratulations :)
1/1 Test #8: run-tox-mgr-ansible ..............   Passed    1.03 sec

The following tests passed:
	run-tox-mgr-ansible

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   1.04 sec
[jolmomar@juanmipc build]$


but unfortunately I now get a different failure. @tchaikov, @noahdesu, can you give me any advice or clues?

4/160 Test   #8: run-tox-mgr-ansible .....................***Failed    1.12 sec
Traceback (most recent call last):
  File "/usr/bin/tox", line 9, in <module>
    load_entry_point('tox==1.4.2', 'console_scripts', 'tox')()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 25, in main
    retcode = Session(config).runcommand()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 273, in runcommand
    return self.subcommand_test()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 353, in subcommand_test
    sdist_path = self.sdist()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 339, in sdist
    sdist_path = self._makesdist()
  File "/usr/lib/python2.7/site-packages/tox/_cmdline.py", line 291, in _makesdist
    raise tox.exception.MissingFile(setup)
tox.MissingFile: MissingFile: /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/pybind/mgr/ansible/setup.py

@dotnwat
Contributor

dotnwat commented Nov 28, 2018

@jmolmo that error is caused by an older version of tox (1.4.2) being used. that older version is installed on the system, but it looks like you commented out the virtualenv path, which should contain 2.9.1 (at least that's what it looks like from the log).

diff --git a/src/pybind/mgr/ansible/run-tox.sh b/src/pybind/mgr/ansible/run-tox.sh
index d14065e197..951ea23150 100644
--- a/src/pybind/mgr/ansible/run-tox.sh
+++ b/src/pybind/mgr/ansible/run-tox.sh
@@ -17,7 +17,7 @@ fi
 unset PYTHONPATH
 export CEPH_BUILD_DIR=$CEPH_BUILD_DIR
 
-# source ${MGR_ANSIBLE_VIRTUALENV}/bin/activate
+source ${MGR_ANSIBLE_VIRTUALENV}/bin/activate
 
 if [ "$WITH_PYTHON2" = "ON" ]; then
   ENV_LIST+="py27"

A Ceph Manager Orchestrator that uses an external REST API service to execute Ansible playbooks.

get_inventory implementation

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>

Document how to use CLI through Orchestrator CLI

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
@jmolmo
Member Author

jmolmo commented Nov 29, 2018

@noahdesu, @sebastian-philipp thanks for your help with this. When I finished yesterday with this last error I was completely frustrated. I didn't remember that I had commented out the "venv" activate command! It seems that everything is working OK now. I have squashed all the commits into one. I hope no new issues arise!

@sebastian-philipp
Contributor

jenkins retest this please

@sebastian-philipp
Contributor

jenkins retest this please

@sebastian-philipp
Contributor

I'm going to run Jenkins one last time, to be really sure we're not running into this tox version dependency issue again.

@sebastian-philipp
Contributor

jenkins retest this please

@sebastian-philipp
Contributor

jenkins retest this please

@sebastian-philipp sebastian-philipp merged commit ce28976 into ceph:master Dec 3, 2018