
Implement CIDR blocklisting #44151

Merged: 24 commits into ceph:master on Apr 21, 2022

Conversation

@gregsfortytwo (Member) commented Nov 30, 2021

This PR lets users specify a CIDR range when blocklisting, and propagates
that range throughout the OSDMap blocklisting functionality.

To facilitate this, it also cleans up some of the blocklist handling in CephFS, and
implements blocklisting unit tests.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@gregsfortytwo (Member Author)

ceph test api

@gregsfortytwo (Member Author)

@batrick @neha-ojha @vshankar Obviously later than I'd planned, but I haven't done this bit twiddling in a while and my first working solution was too ugly to share. o_0 Please take a look!

There are a few teuthology tests that invoke blocklisting so I'm gonna go over those and see if any are possible to extend to test the ranges, but it's gonna be rough in the lab with the limited machines. And documentation incoming shortly.

@gregsfortytwo (Member Author)

https://pulpito.ceph.com/gregf-2021-12-02_21:15:39-rados-wip-cidr-blocklist-distro-basic-smithi for a few runs of existing rados blocklist tests

Passed

https://pulpito.ceph.com/gregf-2021-12-02_21:16:13-fs-wip-cidr-blocklist-distro-basic-smithi for the fs tests

One failed.

2021-12-02T21:56:02.651 DEBUG:teuthology.orchestra.run.smithi186:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 900 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.b.asok --format=json session evict 9300
2021-12-02T21:56:02.919 DEBUG:tasks.cephfs.filesystem:_json_asok output empty
2021-12-02T21:56:14.753 DEBUG:teuthology.orchestra.run.smithi018:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2021-12-02T21:56:14.759 DEBUG:teuthology.orchestra.run.smithi186:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2021-12-02T21:56:23.229 INFO:teuthology.orchestra.run.smithi018.stderr:Traceback (most recent call last):
2021-12-02T21:56:23.230 INFO:teuthology.orchestra.run.smithi018.stderr: File "", line 44, in
2021-12-02T21:56:23.230 INFO:teuthology.orchestra.run.smithi018.stderr:RuntimeError: write() failed to raise error
2021-12-02T21:56:23.235 DEBUG:teuthology.orchestra.run:got remote process result: 1

This has popped up at least once before; I think it's just a network blip. But I scheduled another run: http://pulpito.front.sepia.ceph.com:80/gregf-2021-12-06_17:17:45-fs-wip-cidr-blocklist-distro-basic-smithi/

@gregsfortytwo (Member Author) commented Dec 8, 2021

Ah, this is actually busting mds client eviction as it stands. The MDS has client addresses that look like 172.21.15.18:0/1308057869 and so the monitor parses it as a CIDR range with an invalid bitmask.

It's probably simplest to make the UI a "blocklist add range" instead of overloading "blocklist add"; I'll try that out.
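To illustrate the collision (a plain-Python sketch, not the monitor's actual parsing code; the helper name is hypothetical): Ceph entity addresses end in `/<nonce>`, so a parser that treats everything after the slash as a CIDR prefix length misreads the nonce as a wildly out-of-range mask.

```python
def plausible_ipv4_prefix(addr_spec: str) -> bool:
    """Treat the text after '/' as a CIDR prefix length.

    Hypothetical helper: an entity address like '172.21.15.18:0/1308057869'
    ends in a nonce, which this misreads as an (invalid) mask length.
    """
    _, _, suffix = addr_spec.partition("/")
    return suffix.isdigit() and int(suffix) <= 32  # IPv4 prefixes are 0..32

print(plausible_ipv4_prefix("192.168.0.1/24"))             # True: real CIDR range
print(plausible_ipv4_prefix("172.21.15.18:0/1308057869"))  # False: nonce, not a mask
```

A separate "blocklist range add" keyword sidesteps the ambiguity entirely, since plain "blocklist add" then never has to guess which form it was given.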

@gregsfortytwo gregsfortytwo force-pushed the wip-cidr-blocklist branch 3 times, most recently from 6eecf75 to 36b3ea2 Compare December 8, 2021 22:01
@gregsfortytwo (Member Author)

Finally got shaman to build without weird system failures, so scheduled tests again:
fs http://pulpito.front.sepia.ceph.com:80/gregf-2021-12-14_19:49:56-fs-wip-cidr-blocklist-1214-2-distro-default-smithi/
rados http://pulpito.front.sepia.ceph.com:80/gregf-2021-12-14_19:51:30-rados-wip-cidr-blocklist-1214-2-distro-default-smithi

These passed! There is a failure in one of the fs tests due to "2021-12-14T20:24:53.732478+0000 mon.a (mon.0) 1960 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)" in the cluster log, but that is a persistent issue in master as well.

@jdurgin (Member) left a comment


a few minor comments, looks good overall though! has it been through upgrade testing?

src/osd/OSDMap.h (outdated; resolved)
src/osd/OSDMap.h (resolved)
src/osd/OSDMap.cc (resolved)
src/test/osd/TestOSDMap.cc (outdated; resolved)
@@ -584,8 +584,31 @@ class OSDMap {
std::shared_ptr< mempool::osdmap::vector<uuid_d> > osd_uuid;
mempool::osdmap::vector<osd_xinfo_t> osd_xinfo;

class range_bits {
Member:

looks like boost has a network range type we could use here instead - address_v[4|6]_range. This implementation looks correct to me, though

@batrick (Member) commented Dec 15, 2021

jenkins test make check arm64

@batrick (Member) left a comment


my review so far; will finish tomorrow

The commit organization makes this a little more difficult to review. There is an is_cidr method which is not in the final diff, and some commits look like they could have been squashed.

src/msg/msg_types.h (resolved)
src/msg/msg_types.h (resolved)
doc/rados/operations/control.rst (outdated; resolved)
@batrick (Member) left a comment


Otherwise LGTM

src/osd/OSDMap.h (resolved)
src/mon/MonCommands.h (resolved)
src/osd/OSDMap.cc (resolved)
src/test/osd/TestOSDMap.cc (resolved)
I'm not sure if the blocklist events tracking in Client.cc was ever
the simplest way to track that state, but it definitely isn't now. We
can just hand our addr_vec to the OSDMap and ask it -- it handles
version compatibility issues and, happily, means the Client doesn't
need to learn to deal with ranges directly.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
…ss format

I discovered in testing with CephFS that this tends to interpret client IPs
(which don't have ports, but do have nonces) as invalid ranges. So give it
a separate input keyword that has to be applied first.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
…ocklists

Providing a non-range-aware blocklist accessor would just be
asking for trouble, so don't.

The ugly part of this is how the Objecter is currently just
throwing the range blocklist on the end of its own list. The in-tree
callers are okay with this, and I'd like to look at removing the
blocklist events API from librados entirely -- it exposes "OSD-only"
state to clients and, as evidenced by this patch series, is not
particularly stable.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
… ranges

Carry a parallel map from cidr addresses to a new
range_bits class (stored entirely as ephemeral state) so that we
don't need to re-compute masks and bit mappings too often, and to
separate out the unpleasant ipv6 bit mapping logic. Then check
against those with range_bits::matches() the same way we check
for equality on specific-entity matches. Nice and simple loops!

Fixes: https://tracker.ceph.com/issues/53050

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
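The mask-and-compare idea described in this commit message can be sketched in a few lines (illustrative Python using the standard ipaddress module, not the actual C++ range_bits code; the function and variable names here are mine):

```python
import ipaddress

def range_matches(cidr: str, addr: str) -> bool:
    """Does addr fall inside the CIDR range?

    Same idea as checking range_bits::matches(): build the prefix mask
    once, then compare the masked range and client addresses.
    The caller must pass two addresses of the same family (v4 or v6).
    """
    net_text, prefix_text = cidr.split("/")
    net = ipaddress.ip_address(net_text)
    prefix = int(prefix_text)
    width = 32 if net.version == 4 else 128
    mask = ((1 << prefix) - 1) << (width - prefix)  # top `prefix` bits set
    return (int(net) & mask) == (int(ipaddress.ip_address(addr)) & mask)

print(range_matches("192.168.0.1/24", "192.168.0.200"))   # True
print(range_matches("192.168.0.1/24", "192.168.1.200"))   # False
print(range_matches("2001:db8::1/64", "2001:db8::ffff"))  # True
```

Caching the computed mask per blocklist entry, as the ephemeral range_bits state does, avoids redoing the shift work on every lookup.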
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
These tests are supposed to be validating we don't accept invalid IPs,
but they left out the "add" subcommand so they're all failing on that!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
...and make the OSDMap handle it.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo gregsfortytwo dismissed batrick’s stale review April 21, 2022 13:58

Addressed changes but he's on long-term leave

@gregsfortytwo (Member Author)

Jenkins test api

@gregsfortytwo gregsfortytwo merged commit 6536d0c into ceph:master Apr 21, 2022
11 checks passed
@ljflores (Contributor)

@gregsfortytwo looks like there is a regression related to this PR in the rados suite. I created a Tracker issue for it; can you take a look? https://tracker.ceph.com/issues/55419

http://pulpito.front.sepia.ceph.com/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-smithi/6800292/

2022-04-22T14:23:30.757 INFO:tasks.workunit.client.0.smithi085.stderr:listed 5 entries
2022-04-22T14:23:30.772 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  return 0
2022-04-22T14:23:30.773 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1429: test_mon_osd:  expect_false 'ceph osd blocklist add 192.168.0.1/-1'
2022-04-22T14:23:30.773 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:35: expect_false:  set -x
2022-04-22T14:23:30.773 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  'ceph osd blocklist add 192.168.0.1/-1'
2022-04-22T14:23:30.773 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh: line 36: ceph osd blocklist add 192.168.0.1/-1: No such file or directory
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  return 0
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1430: test_mon_osd:  expect_false 'ceph osd blocklist add 192.168.0.1/foo'
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:35: expect_false:  set -x
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  'ceph osd blocklist add 192.168.0.1/foo'
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh: line 36: ceph osd blocklist add 192.168.0.1/foo: No such file or directory
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  return 0
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1433: test_mon_osd:  expect_false 'ceph osd blocklist add 1234.56.78.90/100'
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:35: expect_false:  set -x
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  'ceph osd blocklist add 1234.56.78.90/100'
2022-04-22T14:23:30.774 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh: line 36: ceph osd blocklist add 1234.56.78.90/100: No such file or directory
2022-04-22T14:23:30.775 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:36: expect_false:  return 0
2022-04-22T14:23:30.775 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1436: test_mon_osd:  bl=192.168.0.1/24
2022-04-22T14:23:30.775 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1437: test_mon_osd:  ceph osd blocklist range add 192.168.0.1/24
2022-04-22T14:23:32.991 INFO:tasks.workunit.client.0.smithi085.stderr:blocklisting cidr:192.168.0.1:0/24 until 2022-04-22T15:23:32.109570+0000 (3600 sec)
2022-04-22T14:23:33.004 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1438: test_mon_osd:  ceph osd blocklist ls
2022-04-22T14:23:33.004 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1438: test_mon_osd:  grep 192.168.0.1/24
2022-04-22T14:23:33.427 INFO:tasks.workunit.client.0.smithi085.stderr:listed 6 entries
2022-04-22T14:23:33.436 INFO:tasks.workunit.client.0.smithi085.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_mon_osd:  rm -fr /tmp/cephtool.dfI
2022-04-22T14:23:33.437 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-04-22T14:23:33.438 INFO:tasks.workunit:Stopping ['cephtool'] on client.0...
2022-04-22T14:23:33.438 DEBUG:teuthology.orchestra.run.smithi085:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2022-04-22T14:23:33.680 ERROR:teuthology.run_tasks:Saw exception from tasks.

@ljflores (Contributor)

Also, @gregsfortytwo, was there a final teuthology run done for this PR?

@gregsfortytwo gregsfortytwo assigned batrick and unassigned batrick Apr 22, 2022
@gregsfortytwo (Member Author) commented Apr 22, 2022

...my apologies, I'm looking at the test history and realizing I ran it through less of the suite than I thought I had. Since the only recent changes were in the unit test code I didn't spin up another test, but I definitely should have given what it's actually run through. :/ I'm looking into this.

@gregsfortytwo gregsfortytwo mentioned this pull request Apr 23, 2022
14 tasks
@neha-ojha (Member)

@gregsfortytwo I am marking https://tracker.ceph.com/issues/55419 as resolved for now. Is there a tracker ticket to track the backport of this feature? We should not forget to include the fix.

@gregsfortytwo (Member Author)

> @gregsfortytwo I am marking https://tracker.ceph.com/issues/55419 as resolved for now. Is there a tracker ticket to track the backport of this feature? We should not forget to include the fix.

https://tracker.ceph.com/issues/53050 is Pending Backport; I was waiting on the test fix to make sure there weren't any other issues before I generate backport PRs. Will get to them in the next day or two.

Projects: none yet · 5 participants