Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

igw add support for ceph-iscsi package #3977

Merged
merged 10 commits into from Jul 3, 2019
Merged

igw add support for ceph-iscsi package #3977

merged 10 commits into from Jul 3, 2019

Conversation

mikechristie
Copy link

The ceph-iscsi-config and ceph-iscsi-cli packages were merged into ceph-iscsi, and the ceph-iscsi-config API was completely changed, but the ceph-ansible iscsi code/tasks will try to use ceph-iscsi with the old API calls and crash.

For older setups that are still using ceph-iscsi-config/cli we will still allow iscsigws.yml to define iscsi objects, but to handle this issue for new and updated installs using ceph-iscsi, this patchset syncs the non-container setup to work like container based setup where ansible is only used to setup the daemons and install rpms. gwcli or dashboard will then be used to setup the iscsi objects like the target, LUNs and clients.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1709518
Resolves: #3870

infrastructure-playbooks/purge-iscsi-gateways.yml Outdated Show resolved Hide resolved
infrastructure-playbooks/purge-iscsi-gateways.yml Outdated Show resolved Hide resolved
infrastructure-playbooks/purge-iscsi-gateways.yml Outdated Show resolved Hide resolved
infrastructure-playbooks/purge-iscsi-gateways.yml Outdated Show resolved Hide resolved
@mikechristie
Copy link
Author

Hey @guits

I have never seen the web based suggested fix and commit interface before.

I clicked the button to add the suggested commits. Do you prefer I do that on the web interface, or that I redo my patches and then re-push them?

@guits
Copy link
Collaborator

guits commented May 16, 2019

@mikechristie indeed, the "suggested fix" feature from github looks nice at first glance but it end up with 1 commit by suggested change which is not that nice in the end 😄
I would prefer you amend your commits instead, sorry for the inconvenience it generated.

@mikechristie
Copy link
Author

No problem guits. Repushed with your requested changes. Thanks.

@guits
Copy link
Collaborator

guits commented May 17, 2019

@mikechristie looks like there's some valid failures :

TASK [ceph-iscsi-gw : make sure gateway_iqn is configured] *********************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-centos-non_container-all_daemons/roles/ceph-iscsi-gw/tasks/common.yml:9
Friday 17 May 2019  03:27:30 +0000 (0:00:00.027)       0:20:59.075 ************ 
fatal: [iscsi-gw0]: FAILED! => changed=false 
  msg: you must set a iqn for the iSCSI target

@mikechristie
Copy link
Author

Hey,

2 questions.

How can I see the iscsigws.yml being used for the tests that failed? Or, how do I see/modify a test like:

https://2.jenkins.ceph.com/job/ceph-ansible-prs-dev-centos-container-all_daemons/14/consoleText

They are not in ceph-ansible/tests right?

For the container cases, before my patches we currently ignore the iscsi variables like gateway_iqn, gateway_ip_list, etc if they are set in iscsigws.yml. Was that a mistake? Do we want to continue that behavior, or do you prefer we fail the task and warn the user those values are not supported when this is detected?

roles/ceph-iscsi-gw/tasks/common.yml Outdated Show resolved Hide resolved
roles/ceph-iscsi-gw/tasks/common.yml Outdated Show resolved Hide resolved
@mikechristie
Copy link
Author

Just adding myself a note. The current test failures like this:

 no repository is ready for: ceph-iscsi/master

were due to me breaking the ceph-iscsi build, so it is not on the shamen repos right now. It will be fixed here:

ceph/ceph-iscsi#93

@guits
Copy link
Collaborator

guits commented Jun 5, 2019

@mikechristie I think we are almost good, could you resolve the conflicting file please so we can trigger the CI?

Thanks!

@mikechristie
Copy link
Author

@guits

In this repush, I fixed the test file conflict and I also fixed the warnings:

[WARNING]: conditional statements should not include jinja2 templating

@guits
Copy link
Collaborator

guits commented Jun 6, 2019

looks like there is a valid failure:

_ TestiSCSIs.test_iscsi_service_enabled_and_running[ansible://iscsi-gw0-rbd-target-api] _
[gw1] linux2 -- Python 2.7.5 /tmp/venv.Fu9SFo0ujI/dev-centos-non_container-all_daemons/bin/python

self = <tests.iscsi.test_iscsi.TestiSCSIs object at 0x31e7410>
node = {'ceph_release_num': {'dev': 99, 'jewel': 10, 'kraken': 11, 'luminous': 12, ...}, 'ceph_stable_release': 'nautilus', 'docker': None, 'radosgw_num_instances': 1, ...}
host = <testinfra.host.Host object at 0x28094d0>, svc = 'rbd-target-api'

    @pytest.mark.parametrize('svc', [
        'rbd-target-api',
        'rbd-target-gw',
        'tcmu-runner'
    ])
    def test_iscsi_service_enabled_and_running(self, node, host, svc):
        s = host.service(svc)
        assert s.is_enabled
>       assert s.is_running
E       assert False
E        +  where False = <service rbd-target-api>.is_running

@dsavineau
Copy link
Contributor

@mikechristie
Copy link
Author

@guits

Ignore the repush I did today. It is going to fail. It is going to need this fix

ceph/ceph-iscsi#102

to pass. I will ping you when we are ready on the ceph-iscsi side.

@dsavineau
Copy link
Contributor

@mikechristie we merged ceph/ceph-container#1393 last week but we're wondering if we should have wait this PR to be merged before merging the changes in ceph-container. any thought ?
Because we saw some CI failure on rbd-targer-api/gw with the new container (ceph-iscsi 3.x) but maybe it's only related to ceph-iscsi bits.

@mikechristie
Copy link
Author

@dsavineau

How does ceph-container bring in package dependencies? Does it depend on yum/rpm handling it or do you have to add some explicit call/code?

The failure is probably the same as the one we are still hitting in this PR

def test_iscsi_service_enabled_and_running(self, node, host, svc):
    s = host.service(svc)
    assert s.is_enabled
  assert s.is_running

E assert False
E + where False = .is_running

and requires the patch in this ceph-iscsi PR:

(this patch was just merged today)
ceph/ceph-iscsi#102

which is just fixing up the spec to bring in pyOpenSSL/python-pyOpenSSL.

Without that patch rbd-target-api was failing to start on the tests because it was trying to do a "import ssl" and the package was not installed.

@dsavineau
Copy link
Contributor

How does ceph-container bring in package dependencies? Does it depend on yum/rpm handling it or do you have to add some explicit call/code?

ceph-container uses rpm/yum commands so it should install all required dependencies.

and requires the patch in this ceph-iscsi PR:

(this patch was just merged today)
ceph/ceph-iscsi#102

We have rebuild the ceph container image this morning and we have now ceph-iscsi-3.0-59.g4b372fa.el7.noarch (was ceph-iscsi-3.0-57 before). I think it includes your patch.

@mikechristie
Copy link
Author

Just FYI.

It looks like there is another bug with rbd-target-gw too.

@mikechristie
Copy link
Author

Just FYI.

It looks like there is another bug with rbd-target-gw too.

Ok. This is fixed in this ceph-iscsi PR:

ceph/ceph-iscsi#110

@dsavineau
Copy link
Contributor

    @pytest.mark.parametrize('svc', [
        'rbd-target-api',
        'rbd-target-gw',
        'tcmu-runner'
    ])
    def test_iscsi_service_enabled_and_running(self, node, host, svc):
        s = host.service(svc)
        assert s.is_enabled
>       assert s.is_running
E       assert False
E        +  where False = <service rbd-target-api>.is_running

I tried to debug this and there's some changes on the logging side for api/gw.
The testinfra CI sometimes fails but in fact the container is restarting forever. Because we are using custom systemd unit scripts to start the container, the status of the service is sometimes loaded active running then loaded activating auto-restart because the underlying ceph iscsi process is failing at the end.

So it's failing because the /var/log/rbd-target-gw and/or /var/log/rbd-target-api directories don't exist in the container image [1].

Jun 10 19:21:17 iscsi-gw0 docker[6908]: exec: PID 101: spawning /usr/bin/rbd-target-gw
Jun 10 19:21:17 iscsi-gw0 docker[6908]: exec: Waiting 101 to quit
Jun 10 19:21:17 iscsi-gw0 docker[6908]: Traceback (most recent call last):
Jun 10 19:21:17 iscsi-gw0 docker[6908]: File "/usr/bin/rbd-target-gw", line 81, in <module>
Jun 10 19:21:17 iscsi-gw0 docker[6908]: backupCount=7)
Jun 10 19:21:17 iscsi-gw0 docker[6908]: File "/usr/lib64/python2.7/logging/handlers.py", line 117, in __init__
Jun 10 19:21:17 iscsi-gw0 docker[6908]: BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
Jun 10 19:21:17 iscsi-gw0 docker[6908]: File "/usr/lib64/python2.7/logging/handlers.py", line 64, in __init__
Jun 10 19:21:17 iscsi-gw0 docker[6908]: logging.FileHandler.__init__(self, filename, mode, encoding, delay)
Jun 10 19:21:17 iscsi-gw0 docker[6908]: File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__
Jun 10 19:21:17 iscsi-gw0 docker[6908]: StreamHandler.__init__(self, self._open())
Jun 10 19:21:17 iscsi-gw0 docker[6908]: File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
Jun 10 19:21:17 iscsi-gw0 docker[6908]: stream = open(self.baseFilename, self.mode)
Jun 10 19:21:17 iscsi-gw0 docker[6908]: IOError: [Errno 2] No such file or directory: '/var/log/rbd-target-gw/rbd-target-gw.log'

I don't think it was true for ceph-iscsi 2.x. For 3.x we probably need to create the directories on the host and bind mount them into the container (to avoid to do the logging in the container). @guits thoughts ?

Finally I didn't see any changes in the iscsi-gateway.cfg file template in this PR. I only took a quick look but the keyring variable is different between 2.x and 3.x

  • 2.x : gateway_keyring
  • 3.x : cluster_client_name

[1] https://github.com/ceph/ceph-iscsi-config/blob/master/rbd-target-gw.py#L412
[2] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-iscsi-gw/templates/iscsi-gateway.cfg.j2

@mikechristie
Copy link
Author

mikechristie commented Jun 12, 2019

@guits

We have merged the necessary ceph-iscsi patches to upstream so all non-container tests are now passing. I am not sure what the GH command is to restart the tests in this PR.

@dsavineau

The /var/log/rbd-target-* dir issue is a little messy, but maybe we do not need to support everything?

  • ceph-iscsi-config/ceph-iscsi-cli - added the /var/log/rbd-target-* dirs in the upstream master branch but we never made a upstream or downstream release with it, so maybe that helps.

However, there were shaman builds with it added because it is tracking upstream commit.s

  • ceph-iscsi - all versions have had the /var/log/rbd-target-/rbd-target-.log.

For iscsi-gateway.cfg, here are the changes:

  • as you mentioned gateway_keyring was dropped and we have cluster_client_name in 3.0
  • in 3.0 we have the new settings: pool and logger_level.

Here is a list of the iscsi-gateway.cfg settings in 3.0:

[config]
cluster_name
pool
cluster_client_name
time_out
api_host
api_port
api_secure
api_ssl_verify
loop_delay
trusted_ip_list
api_user
api_password
ceph_user
debug
minimum_gateways
ceph_config_dir
priv_key
pub_key
prometheus_exporter
prometheus_port
prometheus_host
logger_level

[target]
osd_op_timeout
dataout_timeout
nopin_response_timeout
nopin_timeout
qfull_timeout
cmdsn_depth
immediate_data
initial_r2t
max_outstanding_r2t
first_burst_length
max_burst_length
max_recv_data_segment_length
max_xmit_data_segment_length
max_data_area_mb
alua_failover_type
hw_max_sectors

@hoerup
Copy link

hoerup commented Jun 28, 2019

what is the status of this PR ?

@mikechristie
Copy link
Author

@hoerup

I think guits is waiting on the tests passing. I am having issues making them pass on GH, but though (sometimes a tests works and sometimes it doesn't). Locally they run ok for me. I sent a mail to guits with some details about the issue. Will update the PR when I figure out if it is a bug in my code or the test setup.

roles/ceph-validate/tasks/check_iscsi.yml Outdated Show resolved Hide resolved
roles/ceph-validate/tasks/check_iscsi.yml Outdated Show resolved Hide resolved
roles/ceph-validate/tasks/check_iscsi.yml Outdated Show resolved Hide resolved
@guits
Copy link
Collaborator

guits commented Jul 2, 2019

jenkins test pipeline

@guits
Copy link
Collaborator

guits commented Jul 2, 2019

jenkins test pipeline

Mike Christie and others added 9 commits July 3, 2019 09:50
The ceph-iscsi-config and ceph-iscsi-cli packages were combined into
ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to
support the new API and old one.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
The gateway_ip_list is not used in container setups, so drop it
for that case.

Signed-off-by: Mike Christie <mchristi@redhat.com>
This adds support for the ceph-iscsi package during install. ceph-iscsi
does not support setting up targets/gws, luns and clients with the
current library/igw_* code. Going forward those tasks should be done with
gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi
objects setup so we do not break existing setups.

The next patch will update the iscsigws.yml.sample to document that
users must not setup any iscsi object if they want to use the new
package and tools.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Update iscsigws.yml.sample to document that we cannot use ansible to
setup iSCSI objects and use the new ceph-iscsi package.

Signed-off-by: Mike Christie <mchristi@redhat.com>
gateway_ip_list is depreciated and is only used when using the old
ceph-iscsi-config/cli packages that are no longer being developed
(GH repos are archived). Because ceph-iscsi-config/cli is no longer
being worked on, this modifies the tests to stress the ceph-iscsi
based installs.

Signed-off-by: Mike Christie <mchristi@redhat.com>
If the user has manually installed ceph-iscsi but is trying to setup a
iscsi object in iscsigws.yml you will just a python crash. This patch
adds a check and more user friendly error message for the case.

Signed-off-by: Mike Christie <mchristi@redhat.com>
If the user is still using the older packages and does not setup
the target iqn you will just get a vague error message later on.
This adds a check during the validate task, so it is clear to the
user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
This commit moves some old variables into ceph-defaults so we can move
the `use_new_ceph_iscsi` fact in ceph-facts role in order.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
@guits
Copy link
Collaborator

guits commented Jul 3, 2019

jenkins test pipeline

3 similar comments
@guits
Copy link
Collaborator

guits commented Jul 3, 2019

jenkins test pipeline

@guits
Copy link
Collaborator

guits commented Jul 3, 2019

jenkins test pipeline

@guits
Copy link
Collaborator

guits commented Jul 3, 2019

jenkins test pipeline

@guits guits merged commit a781ce8 into ceph:master Jul 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crash in igw_gateway (tgt) | configure iscsi target (gateway)
4 participants