mgr/ansible: Ansible orchestrator module #24445
Conversation
With this model of each operation being an individual playbook execution, I'm having trouble seeing how it will re-use ceph-ansible itself. In ceph-ansible users edit their hosts file to define services + optionally devices for OSDs, and then run site.yml -- i.e. there is one big ansible execution that updates everything in the system to the desired state, rather than lots of little playbooks being run. @leseb can jump in if I'm reading the docs wrong on this. One of the purposes of the wait() function is to enable this kind of use case: individual operations can update the hosts file and return completions, but the actual playbook execution wouldn't happen until someone called wait(). So I guess the key question is: is this module intending to re-use ceph-ansible, or is it intended to write some other set of playbooks? And if it will re-use ceph-ansible, then how will that work?
Thank you for your comments John, your help with the "orientation" and "concepts" will be very much appreciated. :-)
I think that the model follows what the Ceph Mgr Orchestrator dictates. The Ansible Runner Service is just a back-end to ease the execution of playbooks over a set of hosts previously provisioned in the same service. The user has to provision the hosts and groups of hosts beforehand. Once provisioned in the service, the user can execute playbooks over these hosts/groups of hosts. Example: ... Therefore, it is really each of the Orchestrator API endpoints that defines the "magnitude" of each task (it will even be possible to implement one orchestrator API endpoint using several different playbook executions).
Maybe I do not understand the documentation well, and I have implemented this in the wrong way. What we have in the documentation is: ... So what I understood (and I have driven my implementation in this direction):
I think we are not aligned on this. Let me explain how the Ansible Runner Service works. In the Ansible Runner Service it is the user who provides the "inventory" of hosts to the service. Although it is possible to manage the "inventory", our assumption is that, for the moment, the only one who can say what hosts are in the cluster and what the function of each one is, is the user. So... although we can add/remove hosts (groups) from the Orchestrator, I think that it will be difficult to know what we have to do... Can you explain your idea/assumption in more detail?
The module is intending to execute ansible playbooks; most of the functionality we need is in the ceph-ansible playbooks, so we will try to reuse them.
One Orchestrator method will be called; this will launch a new completion object that will be responsible for the execution of one or more playbooks, and this completion object will be returned to the caller. The caller will use the "status" and "result" attributes of the completion object to get the information required and to know whether the operation has been executed successfully. I expect a high degree of ceph-ansible functionality reuse, because I think that most of the operations needed in the Orchestrator are things that are already covered by the current ceph-ansible playbooks. I think that the big challenge here is to create an Orchestrator API that provides a very easy way to manage all Ceph cluster operations.
If you do a bunch of operations, and then call wait(), then the wait() method is essentially your "build site" method. If I can use a Star Trek analogy... think of all the normal operations (like creating an OSD) as Captain Picard giving his orders, and then the ... I am not trying to say that every module (or even this module) has to work that way, just that the interface is designed so that it's a possibility (i.e. having a ...)
The "playbook execution starts here" part is not a requirement of the interface. This is a key point. Nothing requires or promises that operations will begin at the point the completion object is constructed -- they don't have to advance at all until someone calls wait().
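To make the deferred-execution contract concrete, here is a minimal sketch (all names hypothetical, not the actual orchestrator interface) of a completion object that records the requested operation but runs nothing until wait() is called:

```python
# Hypothetical sketch: a completion records the requested operation but
# does not execute anything until wait() advances it.
class DeferredCompletion:
    def __init__(self, operation):
        self._operation = operation  # a callable describing the work
        self.result = None
        self.is_complete = False

    def execute(self):
        # Only ever called from wait() -- nothing runs before that point.
        self.result = self._operation()
        self.is_complete = True


def wait(completions):
    """Advance all pending completions; this is where execution starts."""
    for c in completions:
        if not c.is_complete:
            c.execute()


# Usage: queue two operations; nothing happens until wait() is called.
pending = [DeferredCompletion(lambda: "osd created"),
           DeferredCompletion(lambda: "mds created")]
wait(pending)
```

The same pattern would let an implementation batch every queued change into one big playbook run at wait() time, which is the "build site" behavior described above.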
I'm looking at the current workflow in ceph-ansible, where the way to create an OSD is to put a host in the [osd] section with a ...
Thanks for the analogy! Now I understand the aim of the design better... (the "wait" method name did not help too much, although the documentation clearly defines what you point out) ... OK, I will modify my implementation so that the wait method is the engine that makes operations in completion objects progress/advance.
Using the Ansible Runner Service, the user does not directly have an inventory file, although the user can manage the groups and the hosts in each group using the REST service. The basic flow of operations in the Ansible Runner Service is:
Taking this into account, we can use the orchestrator to provide the user with a set of higher-level and easier operations, for example OSD management like creating or replacing OSDs. Maybe these differences with the "Ceph Ansible way of working" can clarify the behavior of the Ansible Orchestrator:
jenkins retest this please
I don't think any limitations of ansible-runner-service are important -- it's brand new unreleased code, so it can be changed however is necessary. If it needs an extension to its API to define inventories in the ceph-ansible style, then that shouldn't be hard.
The orchestrator interface absolutely is intended for installation of all the Ceph services apart from the initial mon and mgr services. Mons and managers require little or no configuration or decision making (it's easy to set them up with a simple CLI tool), whereas OSDs require a guided process to select how devices should be used (a GUI is strongly preferred), so it makes sense to ensure that the OSD installation part of the process happens in the Ceph dashboard. There is no meaningful separation between "day 1" and "day 2" when it comes to Ceph OSDs, because part of the ongoing lifecycle of a Ceph cluster is adding new OSDs (as the cluster grows, as drives fail).
Manual test lab used:
Operations
Manager logs during the operation. Note: once the orchestrator modules are enabled, the log level needs to be raised on the manager node in order to get this kind of log event output.
response = r
except Exception as ex:
Catching Exception is a bit broad. Do you want to catch more than requests.exceptions.RequestException here?
No special action will be taken in case of any kind of error here, so in my opinion differentiating the errors does not add much value.
In any case, following your advice, I changed the log method to "exception" in order to have more information about the context of the error.
The problem with just logging exceptions is that the ceph-mgr log is not visible to users unless they go and poke around on the node where it's running. To actually make an error visible to users, it's better to let the exception surface.
In other words, if there's no special action to take in the case of the exception, then don't catch it. In the case of login(), it makes sense to have the caller catch exceptions, rather than to catch them inside and then have the caller check is_operable -- that way the caller can see the actual exception, and surface it to the user (e.g. via a health check) if they choose to.
I have checked the behavior of the module with the login implementation with/without error management.
And in this case (login), as John points out, it seems sensible to move the error management to the caller
(not too much difference, but the error message/stack trace is clearer because the explanation appears first).
In the case of the rest of the HTTP methods, I think it is better to leave the error management in the method; if I remove it, then a generic error handler will probably have to be implemented at the caller level to avoid repeating the error management in each HTTP call.
Details:
I stopped the Ansible Runner Service to check the error:
When I try to log in again (error management in the login method):
2018-10-24 14:38:40.301 7fa08c337700 0 mgr[ansible_orchestrator] Ansible runner service - Unexpected error
Traceback (most recent call last):
File "/usr/lib64/ceph/mgr/ansible_orchestrator/ansible_runner_svc.py", line 171, in login
verify = self.certificate)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 68, in get
return request('get', url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))
2018-10-24 14:38:40.301 7fa08c337700 0 mgr[ansible_orchestrator] Ansible Runner Service not available. Check external server status or connection options. If configuration options changed try to disable/enable the module.
When I try to log in again (login method without error management; it is implemented in the caller. This is the current version):
2018-10-24 15:05:31.908 7f1056d41700 0 mgr[ansible_orchestrator] Ansible Runner Service not available. Check external server status or connection options. If configuration options changed try to disable/enable the module.
Traceback (most recent call last):
File "/usr/lib64/ceph/mgr/ansible_orchestrator/module.py", line 250, in serve
logger = self.log)
File "/usr/lib64/ceph/mgr/ansible_orchestrator/ansible_runner_svc.py", line 159, in __init__
self.login()
File "/usr/lib64/ceph/mgr/ansible_orchestrator/ansible_runner_svc.py", line 171, in login
verify = self.certificate)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 68, in get
return request('get', url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))
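The difference between the two variants above can be sketched in a few lines (hypothetical names, not the module's actual code): login() does no error handling itself, so the caller sees the original exception and decides how to surface it:

```python
import logging

logging.basicConfig()
log = logging.getLogger("ansible_orchestrator")


class Client:
    def login(self):
        # No try/except here: connection errors propagate to the caller,
        # so the original exception and stack trace stay visible.
        raise ConnectionError("Connection refused")


def serve():
    client = Client()
    try:
        client.login()
    except ConnectionError:
        # The caller decides how to surface the error (log it, raise a
        # health check, ...). log.exception also prints the stack trace.
        log.exception("Ansible Runner Service not available")
        return False
    return True
```

This matches the second log output above: the explanatory message first, then the full traceback from the caller's except block.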
response = r
except Exception as ex:
    self.log.error("Ansible runner service - Unexpected error: %s", ex)
If you call log.exception, it will also print the stack trace: self.log.exception("Ansible runner service")
Done!
@returns: A requests object
"""
# TODO
pass
pass
raise NotImplementedError("TODO")
?
Thanks!
inventory_nodes = []
# Loop over the result events and request the event data
for event_key, data in inventory_events.iteritems():
iteritems is not supported in Python 3
Removed! Thanks!
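The portable idiom is a one-line change: dict.items() works in both Python 2 and Python 3, while iteritems() is Python 2 only. A sketch with hypothetical sample data:

```python
# dict.iteritems() exists only in Python 2; dict.items() works in both
# Python 2 and Python 3 (it returns a list in 2 and a view in 3).
inventory_events = {"1-abc": {"host": "osd0"},
                    "2-def": {"host": "osd1"}}

hosts = []
for event_key, data in inventory_events.items():  # not .iteritems()
    hosts.append(data["host"])
```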
event_data = json.loads(event_response.text)["data"]["event_data"]
free_disks = event_data["res"]["free_disks"]
for item, data in free_disks.iteritems():
iteritems: see above
done
if item not in [host.name for host in inventory_nodes]:
devs = []
for dev_key, dev_data in data.iteritems():
iteritems: see above
done
# Auxiliary functions
#==============================================================================
def process_inventary_json(inventory_events, ar_client, playbook_uuid):
def process_inventary_json(inventory_events, ar_client, playbook_uuid):
def process_inventory_json(inventory_events, ar_client, playbook_uuid):
Fixed! Thanks!
params = "{}")
# Assign the process_output function
ansible_operation.process_output = process_inventary_json
ansible_operation.process_output = process_inventary_json
ansible_operation.process_output = process_inventory_json
Fixed
# List of playbook names used
GET_INVENTORY_PLAYBOOK = "probe-disks.yml"
can you share this file as an example?
This is the playbook I'm using at the moment; it only provides "free disks". In any case, it was good enough to be used as a base for implementing the "get_inventory" method. I'm modifying it in order to get a list of all the devices. (In any case, if you know another playbook with this function, it will be welcome.)
[jolmomar@localhost tmp]$ cat probe-disks.yml
---
#
# Playbook to scan a set of hosts and return a dict indexed by host containing
# a list of disks that are unused. Each disk is represented by a dict with the
# following fields;
#
# size_txt (str) e.g 10.0GB
# size_bytes (int) e.g. 21474836480
# sectorsize (int) e.g. 512
# sectors (int) e.g 41943040
#
# example output;
# ok: [con-1 -> 127.0.0.1] => {
# "free_disks": {
# "con-1": {
# "vdd": {
# "rotational": true,
# "sectors": 41943040,
# "sectorsize": 512,
# "size_bytes": 21474836480,
# "size_txt": "20.00 GB"
# }
# },
- name: probe hosts for free disks
hosts:
- osds
- mgrs
- mons
vars:
free_disks: |
{%- set disk_table = dict() %}
{%- for host in play_hosts %}
{%- set _x = disk_table.__setitem__(host, {}) %}
{%- set _devdata = dict() %}
{%- for disk in hostvars[host].host_disk %}
{%- set _meta = hostvars[host]['ansible_devices'][disk] %}
{%- set _x = _devdata.__setitem__(disk, dict(size_txt=_meta['size'],
rotational=_meta['rotational']|bool,
sectors=_meta['sectors']|int,
sectorsize=_meta['sectorsize']|int,
size_bytes=_meta['sectors']|int * _meta['sectorsize']|int)) %}
{%- endfor %}
{%- set _x = disk_table.__setitem__(host, _devdata) %}
{%- endfor %}
{{ disk_table }}
gather_facts: true
tasks:
- name: setup
set_fact:
host_disk: []
- name: Get a list of block devices (excludes loop and child devices)
command: lsblk -n --o NAME --nodeps --exclude 7
register: lsblk_out
- name: check if disk {{ item }} is free
command: pvcreate --test /dev/{{ item }}
ignore_errors: true
register: pv_status
with_items: "{{lsblk_out.stdout_lines}}"
- name: Update hosts freedisk list
set_fact:
host_disk: "{{host_disk + [item.item]}}"
ignore_errors: true
when: item.rc == 0
with_items: "{{ pv_status.results}}"
- name: RESULTS
debug:
var: free_disks
delegate_to: 127.0.0.1
run_once: True
Changed the name of the playbook to "host-disks.yml"
# List of playbook names used
GET_INVENTORY_PLAYBOOK = "probe-disks.yml"
Also the name is weird: you call it GET_INVENTORY_PLAYBOOK, but is this the playbook or the inventory? It's confusing. Based on https://github.com/ceph/ceph/pull/24445/files#diff-5940840b32ed5f2781084ee6e9a7d408R270 we would think it's the playbook, but https://github.com/ceph/ceph/pull/24445/files#diff-5940840b32ed5f2781084ee6e9a7d408R261 indicates an inventory...
This constant must contain the name of the playbook used to retrieve the list of storage devices present in the hosts affected by the playbook run.
As you pointed out, the name of the playbook is weird...
I will change it. I think that "get_storage_devices.yml" is more understandable.
Finally changed to "host-disks.yml".
# Create a new read completion object to execute the playbook
ansible_operation = AnsibleReadOperation(client = self.ar_client,
                                         playbook = GET_INVENTORY_PLAYBOOK,
playbook is confusing if what we are actually calling is an inventory
I will leave the name of the constant unchanged, although I will add the following comment above the definition of the constant:
Name of the playbook used in the "get_inventory" method. This playbook is expected to provide a list of storage devices in the host where the playbook is executed.
done
doc/mgr/ansible_orchestrator.rst
Outdated
.. _ansible-orchestrator-module:
====================
Ansible Orchestrator
I'd suggest sticking with plain ansible as the name. If we find it becomes necessary to highlight/identify which modules are orchestrator modules, we should do that programmatically rather than with long names.
(I mean the actual python module name)
Done!
Now it follows the same pattern as the other Orchestrator modules and is more elegant in commands. Thanks!
"""
OPTIONS = [
    {'name': 'server_addr'},
I'd suggest merging the addr+port settings into a single URL setting.
Why?
So that the server's URL is set atomically (i.e. port and hostname together) -- if changing the server's address and port, you don't want to go through an intermediate stage where it's trying to talk to the wrong port on the right hostname, or vice versa.
OK... but in the orchestrator these values are not effective when you change them, only when you disable/enable the module (unless we change the implementation to allow "hot" changes of config variables).
Yes, I'm assuming that you would at some point want to improve the connection/authentication stuff so that an authentication error or a config change didn't require a ceph-mgr restart.
A ceph-mgr restart is not needed at all to refresh/change configuration values.
The sequence is:
- Disable module
- Change configuration values as required
- Enable module
When the ansible module is enabled, it reads all the configuration values, and once they are read and validated the module can start to use them. So you can regard reading the configuration values as a "transactional" operation.
Disabling or enabling a module restarts ceph-mgr.
8-| ... I didn't realize that... in fact the service is not restarted, the PID of the binary stays the same... but internally, as you say, a restart is executed.
It does not seem very healthy that changing and applying a setting in one module forces the restart of the other modules... now I understand your comment:
you would at some point want to improve the connection/authentication stuff so that an authentication error or a config change didn't require a ceph-mgr restart
But this is a problem that affects all the modules...
So what I think we need is basically a method that refreshes configuration and can be called from the CLI.
I propose:
- To add a "refresh_config" method in the MgrModule base class, to be overridden by modules: in this method the module must read all the settings and apply the changes detected.
- To implement this method in the Ansible Orchestrator.
- To add a new command in the Orchestrator_CLI to call "refresh_config" in the backend orchestrator.
If you agree... I can do this... I think this is really "adding value"... more than joining or not joining two different settings.
Firstly, I still think that you should store your URL as a URL. Your web browser doesn't have different text boxes for the hostname vs. port, and neither should your settings. Trust me on this. The way to specify the destination for an HTTP connection is to have a URL setting, this isn't controversial.
While some modules would benefit from a notification on changes (and that could be implemented pretty easily from PyModuleRegistry::handle_config calling through to ActivePyModules::notify_all), you don't need it here. Because you're a client rather than a server, you can just look at the configured URL each time you make a request. If it's different from your established client session, just throw away your session and open a new one.
Done. server + port is now one setting: server_url
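The "look at the configured URL on each request" idea above can be sketched like this (all names hypothetical; get_config stands in for the mgr module's config accessor):

```python
# Hypothetical sketch: because the module is an HTTP client, it can simply
# compare the configured URL against the one its current session was built
# for, and rebuild the session when they differ -- no restart needed.
class RunnerClient:
    def __init__(self, get_config):
        self._get_config = get_config     # callable returning current settings
        self._session_url = None
        self.token = None

    def _ensure_session(self):
        url = self._get_config("server_url")
        if url != self._session_url:
            # Configuration changed: throw away the old session state and
            # force a re-login against the new URL.
            self._session_url = url
            self.token = None

    def request(self, path):
        self._ensure_session()
        return "%s/%s" % (self._session_url, path)


# Usage: a config change takes effect on the very next request.
config = {"server_url": "https://ars.example:5001"}
client = RunnerClient(lambda key: config[key])
first = client.request("api/v1/hosts")
config["server_url"] = "https://ars2.example:5001"
second = client.request("api/v1/hosts")
```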
{'name': 'server_port'},
{'name': 'username'},
{'name': 'password'},
{'name': 'certificate'}  # Ansible runner https server certificate file
Is this a filename? Manager modules should not depend on files on local filesystem, rather the certificate should be stored like the username/password (see how this is done for server side certs in dashboard)
OK, I will check the dashboard implementation and follow your advice.
But this is not part of this PR, so I prefer to implement it later. I wouldn't like to add features on top of features in an endless PR.
OPTIONS = [
    {'name': 'server_addr'},
    {'name': 'server_port'},
    {'name': 'username'},
Does ansible-runner-service actually have/need multi-tenancy (multiple user accounts)? It feels like an unnecessary complexity when the service is just for a single cluster.
This is how the Ansible Runner Service works. Even in a single cluster, multiple users with different privileges tend to exist.
Some of the following may need input from @pcuzner on the intended security model:
I don't think that ARS has users with different privileges. From reading https://github.com/pcuzner/ansible-runner-service/blob/master/runner_service/controllers/login.py#L46 it seems like they just have a password stored in plain text in their config file, and once you're logged in it doesn't matter what user you are.
Even if ARS did expand its user account concept beyond a dict in the config file, all the ansible playbooks are being run as root out on the cluster nodes, so would it be any meaningful security isolation?
BTW, I also notice that ansible_runner_service has a default crypto secret of "secret" and nothing in the installation instructions about how to set it to something unique per-installation, so hopefully there is a plan for resolving that.
I also don't see any mechanism for revoking JWT tokens, so it seems like even if a user account was removed from the ARS configuration file, login sessions would continue until expiry (default 24 hours).
The username/password handling seems quite superficial, so I'm left wondering why we don't just use the client TLS certificates for authentication. The user/pass stuff seems like it's liable to give a false sense of security -- if the certs are handled properly, they should be enough security.
There aren't users with different privileges in the Ansible Runner Service... but users will probably want a certain level of security (what I mean is that not all the users that can access/use the servers have the same possibilities of doing things).
The Ansible Runner Service is quite new (like me), so we need a little time to be completely functional. :-)
For the moment what we have is the "user login" and the use of tokens... and as you said... we have several ways to improve this point.
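As a sketch of the certificate-based alternative discussed above (all paths hypothetical), the standard library's ssl module can build a context that verifies the server's identity and could also present a client certificate for mutual TLS, which would make a separate username/password layer largely redundant:

```python
import ssl

# A TLS context that verifies the server against a CA bundle and, with the
# commented lines enabled, would authenticate the client via its own cert.
ctx = ssl.create_default_context()
ctx.check_hostname = True                 # verify the server's hostname
ctx.verify_mode = ssl.CERT_REQUIRED       # require a valid server cert

# Hypothetical paths -- enable for a real mutual-TLS deployment:
# ctx.load_verify_locations("/etc/ceph/ars-ca.pem")
# ctx.load_cert_chain("/etc/ceph/ars-client.crt", "/etc/ceph/ars-client.key")
```

With requests, the equivalent knobs are the verify= (CA bundle) and cert= (client cert/key pair) arguments.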
# Once authenticated this token will be used in all the requests
self.token = ""
self.server_url = "https://{0}:{1}".format(self.server, self.port)
This may not be IPv6 safe. For simplicity reasons, I'd suggest not assembling URLs by hand.
Changed, thanks! The server and port settings have been merged into a single setting: server_url.
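A short illustration of why manual "https://{host}:{port}" formatting is fragile: IPv6 literals must be bracketed inside URLs, which a single server_url setting parsed with the standard library handles for free (urllib.parse is the Python 3 name; Python 2 has the urlparse module):

```python
from urllib.parse import urlsplit

# An IPv6 server address must be written with brackets in a URL; naive
# "https://{0}:{1}".format(host, port) would produce an invalid URL here.
url = "https://[2001:db8::1]:5001"
parts = urlsplit(url)  # .hostname strips the brackets, .port is an int
```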
# Used to verify or not https server identity
if not certificate:
    self.certificate = False
Can we set the default to True? Disabling HTTPS validation is questionable. This is related to https://github.com/requests/requests/blob/master/requests/api.py#L41-L43
There is a config setting ("certificate") to specify the path to the CA bundle to use for verification.
If this is not provided, then it is assumed that we cannot verify the Ansible Server identity (this would be like a "dev" mode).
- BTW, probably the name of the config setting is not the best one... -
In any case, in the Ansible Runner Service there is a change to be implemented to use client TLS certificates:
pcuzner/ansible-runner-service#74
And this will probably imply changes in the login method of the orchestrator.
There is a setting ( "certificate") config to specify the path to the CA Bundle to use for verification. If this is not provided then is assumed that we cannot verify the Ansible Server Identity, (this would be like a "dev" mode.)
Instead of assuming by default that we cannot verify the identity, would it be possible to let the user explicitly disable verification?
OK ... safe by default ... I got it. Sure!! I will change it ASAP. Thx!
jenkins retest this please |
1 similar comment
jenkins retest this please |
@jmolmo what seems to be the issue? I struggled to get the tox tests for the insights plugin to work, but I think you should be able to effectively copy that over for the ansible case. |
I've gotta say that failure is a bit baffling to me. The error seems to be complaining about a bad symbol in the path of the insights plugin test, but I cannot even see the word insights appear in your patch! |
jenkins retest this please |
Thanks for your help @noahdesu. It seems that the problem resides in some kind of weird dependency between the venvs of insights and ansible... I'll continue investigating... |
@noahdesu , @tchaikov Thank you very much for your help with this!! Finally it seems solved! |
yeah, that'd be simpler and better than what we have now. |
Minor nits, thanks @jmolmo
jenkins retest this please |
Thanks! Good initial iteration :)
Jenkins says:
|
jenkins retest this please |
envlist = py27,py3
skipsdist = true
toxworkdir = {env:CEPH_BUILD_DIR}/ansible
minversion = 2.8.1 |
1.4.2
I will change it to see what happens ... no clue about why it was working before (o_o)
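For context, tox's `minversion` option makes tox refuse to run when the installed tox is older than the stated version, turning a confusing failure into an explicit error. A sketch of the relevant `tox.ini` section (the version value here is illustrative):

```ini
[tox]
envlist = py27,py3
skipsdist = true
# tox aborts with an explicit error if the installed tox is older than this
minversion = 2.5
```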
jenkins retest this please |
The ARM jenkins build failed:
|
jenkins retest this please |
|
@sebastian-philipp @jmolmo the failure on ARM was fixed in 9538675#diff-47a21b3706c13e08943e223c12323aa1 Looks like you can resolve that by rebasing onto master. |
retest this please. |
One last thing. Arm64 fails with
|
Argh. Something went horribly wrong with your git branch. |
After rebasing onto master, I checked that the build environment/test execution is OK locally:
but unfortunately I get a different failure. @tchaikov, @noahdesu can you give me any advice or clues?
|
@jmolmo that error is caused by an older version of tox (1.4.2) being used. That older version is installed on the system, but it looks like you commented out the virtualenv path, which should contain 2.9.1 (at least that's what it looks like from the log).

diff --git a/src/pybind/mgr/ansible/run-tox.sh b/src/pybind/mgr/ansible/run-tox.sh
index d14065e197..951ea23150 100644
--- a/src/pybind/mgr/ansible/run-tox.sh
+++ b/src/pybind/mgr/ansible/run-tox.sh
@@ -17,7 +17,7 @@ fi
 unset PYTHONPATH
 export CEPH_BUILD_DIR=$CEPH_BUILD_DIR
-# source ${MGR_ANSIBLE_VIRTUALENV}/bin/activate
+source ${MGR_ANSIBLE_VIRTUALENV}/bin/activate
 if [ "$WITH_PYTHON2" = "ON" ]; then
 ENV_LIST+="py27" |
A Ceph Manager Orchestrator that uses an external REST API service to execute Ansible playbooks.
get_inventory implementation
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Document how to use the CLI through the Orchestrator CLI
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
@noahdesu, @sebastian-philipp thanks for your help with this. When I finished yesterday with this last error I was completely frustrated. I didn't remember that I had commented out the "venv" activate command!!! It seems that now everything is working OK. I have squashed all the commits into only one. I hope no new issues arise!!! |
jenkins retest this please |
1 similar comment
jenkins retest this please |
I'm going to run Jenkins again for the last time, to be really sure, we're not running again into this tox version dependency issue. |
jenkins retest this please |
1 similar comment
jenkins retest this please |
A Ceph Manager Orchestrator that uses an external REST API service to execute Ansible playbooks.
Signed-off-by: Juan Miguel Olmo Martínez jolmomar@redhat.com
A first running version of this orchestrator manager module.
Still lots of things to do, but this allows us to start getting feedback from the community.
Only manual tests were run:
enable/disable the module and check logs
test the command line (get inventory)
Details