Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon, osd: misc fixes #16283

Merged
merged 9 commits into from Jul 18, 2017

Conversation

Projects
None yet
2 participants
@xiexingguo
Copy link
Member

xiexingguo commented Jul 12, 2017

No description provided.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch 3 times, most recently from db893ca to 0ea8d62 Jul 12, 2017

if (!(r.second != CRUSH_ITEM_NONE &&
r.second < max_osd &&
osd_weight[r.second] == 0)) {
i = r.second;

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 12, 2017

Member

it looks like the [1,2],[2,1] applied here would change both 1 and 2 to 1 because the original 1 would be updated twice. also, we can rearrange primaries already with the primary affinity...

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 12, 2017

Author Member

it looks like the [1,2],[2,1] applied here would change both 1 and 2 to 1 because the original 1 would be updated twice

negative

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 12, 2017

Author Member

There is a follow-up break below

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 12, 2017

Member

@idryomov one more upmap semantic change!

find_roots(roots);
for (auto& p: roots) {
if (shadow_item(p)) {
roots.erase(p);

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 12, 2017

Member

this will invalidate the iterator inside the loop, right?

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 12, 2017

Author Member

Fixed!

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch from 0ea8d62 to e959bc3 Jul 12, 2017

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jul 12, 2017

we can rearrange primaries already with the primary affinity

But this is the certain (and easy) way to do this. Besides I think we don't have to perform the limitations here, we could instead do the necessary validations on the API.

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch from e959bc3 to 181e710 Jul 12, 2017

@liewegas liewegas added the needs-qa label Jul 12, 2017

@@ -447,7 +447,8 @@ COMMAND("osd dump " \
"print summary of OSD map", "osd", "r", "cli,rest")
COMMAND("osd tree " \
"name=epoch,type=CephInt,range=0,req=false " \
"name=states,type=CephChoices,strings=up|down|in|out,n=N,req=false", \
"name=states,type=CephChoices,strings=up|down|in|out,n=N,req=false " \
"name=shadow,type=CephChoices,strings=--show-shadow,req=false", \

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 12, 2017

Member

Can you add a qa/workunits/cephtool/test.sh case that verifies this works? (includes ~class with flag, doesn't include it without?) thanks!

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 13, 2017

Member

Actually, thinking about this a bit more, I think we should leave the 'osd tree' commadn alone, and instead add this informaiton to the 'osd crush tree' command (which is also going to show the weight-set values for each pool)

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 13, 2017

Author Member

okay, will do

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 13, 2017

Author Member

It is already in the crush tree...

./bin/ceph osd crush tree
[
    {
        "id": -4,
        "device_class": "hdd",
        "name": "default~hdd",
        "type": "root",
        "type_id": 10,
        "items": [
            {
                "id": -3,
                "device_class": "hdd",
                "name": "gitbuilder-ceph-rpm-centos7-amd64-basic~hdd",
                "type": "host",
                "type_id": 1,
                "items": [
                    {
                        "id": 0,
                        "device_class": "hdd",
                        "name": "osd.0",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 1.000000,
                        "depth": 2
                    }
                ]
            }
        ]
    },

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 13, 2017

Author Member

@liewegas Can we keep this change? This is not the default behaviour of "osd tree" after all.
IMHO, the "crush tree" output is not very intuitive...

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch from 181e710 to e92271c Jul 12, 2017

# make sure we reached luminous to enable crush class feature
ceph osd set-require-min-compat-client luminous
ceph osd crush class create hdd
ceph osd crush set-device-class osd.0 hdd

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 12, 2017

Author Member

@liewegas Add the test cases here but do not test this together with #16014, in which
I changed the set-device-class cmd slightly to support multiple osd-ids.
I'll update these correspondingly once #16014 gets merged.

@liewegas liewegas added this to the luminous milestone Jul 12, 2017


peering_wq.clear();
osd_lock.Lock(); // make locker happy
return r;

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 12, 2017

Member

On second thought, this is a bit excessive.. and is going to immediately break the next time we change any part of the shutdown procedure. Is it possible to factor this into a helper shared with shutdown(), or should we instead just exit(0) or assert out?

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 12, 2017

Author Member

Print an error message and then exit(0) would be good enough.

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jul 13, 2017

xiexingguo added some commits Jul 12, 2017

mon/OSDMonitor: another dedup case for pg-upmap-items
./bin/ceph osd pg-upmap-items 1.0 1 2 1 2
osd.1 -> osd.2 already exists, set 1.0 pg_upmap_items mapping to [1->2]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
osd/OSDMap: allow bidirectional swap of pg-upmap-items
This is useful when we also want an even distribution of pg primaries across osds.
For example:
Was:
[0 1 2]

By applying bidirectional swap of pg-upmap-items mapping [[0,1],[1,0]], now:
[1 0 2]

Thus we successfully decrease the number of primaries of osd.0 by 1 without
affecting the current (even) distribution of global pgs.

Real exmaple:
./bin/ceph pg ls-by-pool rbd
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE        STATE_STAMP                VERSION REPORTED UP      UP_PRIMARY ACTING  ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
3.0           0                  0        0         0       0     0   0        0 active+clean 2017-07-12 15:14:45.083441     0'0    29:13 [0,1,3]          0 [0,1,3]              0        0'0 2017-07-12 15:14:14.515989             0'0 2017-07-12 15:14:14.515989

./bin/ceph osd pg-upmap-items 3.0 0 1 1 0 3 5
set 3.0 pg_upmap_items mapping to [0->1,1->0,3->5]

./bin/ceph pg ls-by-pool rbd
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE        STATE_STAMP                VERSION REPORTED UP      UP_PRIMARY ACTING  ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
3.0           0                  0        0         0       0     0   0        0 active+clean 2017-07-12 15:16:22.648424     0'0    33:13 [1,0,5]          1 [1,0,5]              1        0'0 2017-07-12 15:14:14.515989             0'0 2017-07-12 15:14:14.515989

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch 2 times, most recently from 55cec94 to 97f029a Jul 15, 2017

xiexingguo added some commits Jul 11, 2017

osd/OSDMap: kill dead structure "struct qi"
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/MonCommand: drop unnecessary write permission
since "log last" does not ask for it.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch from 095fcfd to eb1e64c Jul 15, 2017

@xiexingguo

This comment has been minimized.

Copy link
Member Author

xiexingguo commented Jul 15, 2017

i'm fixing 'osd crush tree' to have a plaintext output right now (and also show the weight sets). let's set this patch aside until that is ready adn then decide

Moved that change into a separate PR #16354, so we can continue to test this one. @liewegas Can you recheck( as I append some more trivial fixes)

@@ -2629,8 +2636,7 @@ int OSD::init()

return 0;
monout:
mgrc.shutdown();
monc->shutdown();
exit(0);

This comment has been minimized.

Copy link
@liewegas

liewegas Jul 17, 2017

Member

exit(1)?

This comment has been minimized.

Copy link
@xiexingguo

xiexingguo Jul 17, 2017

Author Member

@liewegas Okay, done

@xiexingguo xiexingguo force-pushed the xiexingguo:wip-osd-segfault branch from eb1e64c to 963c1cd Jul 17, 2017

xiexingguo added some commits Jul 13, 2017

osd/OSD: gracefully shutdown on error exit during init
This can avoid crashes as below:

  0> 2017-07-12 09:34:47.427438 7f320ce61b80 -1 /home/xxg/build/ceph-dev/src/common/HeartbeatMap.cc: In function 'ceph::HeartbeatMap::~HeartbeatMap()'
thread 7f320ce61b80 time 2017-07-12 09:34:47.422986
/home/xxg/build/ceph-dev/src/common/HeartbeatMap.cc: 39: FAILED assert(m_workers.empty())

 ceph version 12.1.0-702-gc5b99af (c5b99af) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f320d8ba7f0]
 2: (ceph::HeartbeatMap::~HeartbeatMap()+0xf8) [0x7f320d9be0a8]
 3: (CephContext::~CephContext()+0x40c) [0x7f320d9a648c]
 4: (CephContext::put()+0xe6) [0x7f320d9a6776]
 5: (main()+0xad3) [0x7f320d282953]
 6: (__libc_start_main()+0xf5) [0x7f32094cfb15]
 7: (()+0x4964c9) [0x7f320d31f4c9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/OSDMonitor: cleanup last_osd_report if osd does not exist
In case we might want to reuse the same slot later.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
osd/OSD: filter out deprecated meta for bluestore
Journal path is filestore related...

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/OSDMonitor: drop unnecessary write permission for "crush get-tuna…
…ble" command

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mon/OSDMonitor: fix "ceph osd pool get rbd all --format=json-pretty"
Two problems:
(1) MIN_WRITE_RECENCY_FOR_PROMOTE is a tier-only option.
(2) should automatically filter out unset pool options, otherwise it will
    keep outputing rubbish:

{
    "pool": "rbd",
    "pool_id": 3,
    "min_write_recency_for_promote": 0
}
{
    "pool": "rbd",
    "pool_id": 3,
    "fast_read": 0
}
{
    "pool": "rbd",
    "pool_id": 3
}
{
    "pool": "rbd",
    "pool_id": 3
}
{
    "pool": "rbd",
    "pool_id": 3
}
{
    "pool": "rbd",
    "pool_id": 3,
    "csum_type": "???"
}

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

@liewegas liewegas merged commit e9820d6 into ceph:master Jul 18, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@xiexingguo xiexingguo deleted the xiexingguo:wip-osd-segfault branch Jul 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.