New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd,mon: misc full fixes and cleanups #13968

Merged
merged 24 commits into from Apr 17, 2017

Conversation

Projects
None yet
5 participants
@dzafman
Member

dzafman commented Mar 15, 2017

Checkpoint replicated out of space handling seems to work

@dzafman

This comment has been minimized.

Member

dzafman commented Mar 31, 2017

@jcsp @jtlayton @liewegas 1e5cb49 changes the way the OSD handles MDS writes after getting to failsafe full. It drops those writes because we can't allow the OSD to become unusable because it can't replay its journal. By that time full had already been passed and should have caused MDS to stop certain kinds of writes. The OSD continues to allow MDS to write between full and failsafe full so that it can perform removes.

const char *get_full_state_name(s_names s) {
switch (s) {
case NONE: return "none";
case NEARFULL: return "nearfull";
case BACKFILL: return "backfill_full";

This comment has been minimized.

@liewegas

liewegas Mar 31, 2017

Member

maybe BACKFILLFULL just for consistency?

This comment has been minimized.

@dzafman

dzafman Mar 31, 2017

Member

I'll change all variables, config values to use "backfillfull" in all cases, for example: new_backfillfull_ratio, backfillfull_ratio, CEPH_OSD_BACKFILLFULL, mon_osd_backfillfull_ratio...

@@ -116,6 +116,7 @@ struct ceph_eversion {
#define CEPH_OSD_NEW (1<<3) /* osd is new, never marked in */
#define CEPH_OSD_FULL (1<<4) /* osd is at or above full threshold */
#define CEPH_OSD_NEARFULL (1<<5) /* osd is at or above nearfull threshold */
#define CEPH_OSD_BACKFILL_FULL (1<<6) /* osd is at or above backfill_full threshold */

This comment has been minimized.

@liewegas

liewegas Mar 31, 2017

Member

do we need this state flag in teh osdmap? if we're only looking at it locally i think we don't?

This comment has been minimized.

@dzafman

dzafman Mar 31, 2017

Member

This is used for to get the health warning and to notify the mon about the local OSDs state.

This comment has been minimized.

@dzafman

dzafman Apr 14, 2017

Member

Now I understand you meant having a backfill CEPH_OSDMAP flag.

We don't need a CEPH_OSDMAP_BACKFILLFULL because there is no client behavior change if at least 1 node is past backfillfull ratio. Also, setting that as a global flag wouldn't mean anything.

@@ -154,6 +154,7 @@ void OSDMonitor::create_initial()
newmap.set_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS);
newmap.full_ratio = g_conf->mon_osd_full_ratio;
newmap.backfill_full_ratio = g_conf->mon_osd_backfill_full_ratio;

This comment has been minimized.

@liewegas

liewegas Mar 31, 2017

Member

mainly i'm hoping to get this in OSDMap along with the mon commands so that it is easier to manage fullness settings across the cluster

This comment has been minimized.

@dzafman

dzafman Mar 31, 2017

Member

I'm just copying the initialization of these values for what I assume is the first boot.

@liewegas

This comment has been minimized.

Member

liewegas commented Mar 31, 2017

@jcsp

This comment has been minimized.

Contributor

jcsp commented Apr 3, 2017

The 1e5cb49 change to the MDS handling on the OSD looks good to me.

<< ", full_ratio " << full_ratio
<< ", failsafe_ratio " << failsafe_ratio
<< ", new state " << get_full_state_name(new_state)
<< inject

This comment has been minimized.

@jcsp

jcsp Apr 4, 2017

Contributor

s/inject/inject.str/ (I know this is DNM I was just pulling it onto a test branch for fun)

case FULL: return "full";
case FAILSAFE: return "failsafe";
default: return "???";
}
}
enum s_names get_full_state(string type) {

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

could make this a static method, and pass type by const reference. also no need to delcare the s_names with enum s_names, unlike C, C++ take s_names as a qualified typename.

double cur_ratio; ///< current utilization
int64_t injectfull;
enum s_names injectfull_state;

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

nit, s_names would suffice

double cur_ratio; ///< current utilization
int64_t injectfull;

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

i'd recommend initialize the member variables in-class. like:

int64_t injectfull = 0;
float get_failsafe_full_ratio();
void check_full_status(const osd_stat_t &stat);
public:
bool check_failsafe_full();
bool is_nearfull();
bool _check_full(enum s_names type, ostream &ss);

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

this method should be private, i think?

@@ -471,6 +471,9 @@ void OSDMap::Incremental::encode(bufferlist& bl, uint64_t features) const
::encode(new_nearfull_ratio, bl);
::encode(new_full_ratio, bl);
}
if (target_v >= 4) {

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

the target_v is always greater or eq to 4 here.

enum s_names new_state;
if (ratio > failsafe_ratio) {
if (injectfull_state > NONE && injectfull) {

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

instead of injectfull_state > NONE && injectfull, maybe we should put injectfull_state > NONE && injectfull > 0? otherwise, we might want to make it an unsigned int, and use 0 as the default value when handling "injectfull` command and parsing the "injectfull" parameter.

This comment has been minimized.

@dzafman

dzafman Apr 13, 2017

Member

If you do an injectfull without a count, we inject indefinitely using a value of -1. So when non-zero we will inject, but only when > 0 do we decrement.

Mutex full_status_lock;
enum s_names { NONE, NEARFULL, FULL, FAILSAFE } cur_state; // ascending
enum s_names { INVALID = -1, NONE, NEARFULL, BACKFILLFULL, FULL, FAILSAFE } cur_state; // ascending

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

can we take this chance to make full_status_lock a mutable member variable? so we can mark the getter methods using it const?

*nearfull = 0;
for (int i = 0; i < max_osd; ++i) {
if (exists(i) && is_up(i) && is_in(i)) {
if (osd_state[i] & CEPH_OSD_FULL)
++(*full);
else if (osd_state[i] & CEPH_OSD_BACKFILLFULL)

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

are these states mutual exclusive? or we just want to ensure that the total number of full / backfull / nearfull less than the total number of osd?

This comment has been minimized.

@dzafman

dzafman Apr 12, 2017

Member

Yes, they are mutually exclusive.

@@ -2174,6 +2189,9 @@ void OSDMap::encode(bufferlist& bl, uint64_t features) const
::encode(nearfull_ratio, bl);
::encode(full_ratio, bl);
}
if (target_v >= 3) {

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

target_v is always greater or equal to 3 here.

@@ -655,6 +655,14 @@ static int update_pgmap_meta(MonitorDBStore& st)
t->put(prefix, "full_ratio", bl);
}
{
auto backfillfull_ratio = g_ceph_context->_conf->mon_osd_backfillfull_ratio;
if (backfillfull_ratio > 1.0)
backfillfull_ratio /= 100.0;

This comment has been minimized.

@tchaikov

tchaikov Apr 12, 2017

Contributor

shall we normalize the ratio in create_initial() and create_pending() if ratio > 1.0 and spit an error message? as the document reads:

The percentage of disk space used before an OSD is considered too full to backfill.

so in theory, any value greater than 1 is invalid.

This comment has been minimized.

@dzafman

dzafman Apr 12, 2017

Member

Yes, I did this like was done elsewhere. If someone puts in 80 instead .80, it works just as well.

This comment has been minimized.

@dzafman

dzafman Apr 14, 2017

Member

I added a commit a04c738 which fixes every full ratio percentage configured as a whole number.

@dzafman

This comment has been minimized.

Member

dzafman commented Apr 14, 2017

@tchaikov I believe I've address all your code review comments.

@liewegas This is ready for you to review. I just have to squash some commits before merging. I've added a commit which handles the detailed health output for full OSDs as documented. The docs are modified where I made changes to the format.

I'll see about adding a unit test and doing a rados run, Friday.

@dzafman

This comment has been minimized.

Member

dzafman commented Apr 14, 2017

@tchaikov @liewegas I found a bug in the "ceph df" MAX AVAIL output which didn't account for changes to full ratio. It just used mon_osd_full_ratio config instead of the current value set with ceph osd set-full-ratio command. Fixed in cc76412.

@liewegas liewegas changed the title from DNM: Wip 15912 followon to osd,mon: misc full fixes and cleanups Apr 14, 2017

// handles the improper setting of these values.
if (br < nr) {
ostringstream ss;
ss << "Backfill ratio (" << br << ") < nearfull ratio (" << nr << "), increased";

This comment has been minimized.

@liewegas

liewegas Apr 14, 2017

Member

can we make these errors use teh literal name? "full_ratio", "backfillfull_ratio", etc.?

@@ -3632,8 +3632,6 @@ void OSDMonitor::get_health(list<pair<health_status_t,string> >& summary,
ss << osdmap.get_flag_string(osdmap.get_flags() & warn_flags)
<< " flag(s) set";
summary.push_back(make_pair(HEALTH_WARN, ss.str()));
if (detail)
detail->push_back(make_pair(HEALTH_WARN, ss.str()));

This comment has been minimized.

@liewegas

liewegas Apr 14, 2017

Member

i think in general we should keep these.. so that the detail summary is an exhaustive list and you don't need to try to figure out the unique set of things from the union of the summary and detail lists

This comment has been minimized.

@dzafman

dzafman Apr 14, 2017

Member

I'll remove that commit.

@liewegas

This comment has been minimized.

Member

liewegas commented Apr 14, 2017

Looks great! A couple minor nits, but otherwise lgtm

@dzafman

This comment has been minimized.

Member

dzafman commented Apr 17, 2017

dzafman added some commits Feb 28, 2017

osd: Fix log message
Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Remove unused argument to clear_queued_recovery
Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Increase osd_backfill_retry_interval to 30 seconds
Signed-off-by: David Zafman <dzafman@redhat.com>
common: Remove unused config option osd_recovery_threads
Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Fail-safe full is a hard stop even for mds
We can't allow OSD to become non-startable even if mds
could be writing as part of file removals.

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: too_full_for_backfill() returns ostream for reason
Signed-off-by: David Zafman <dzafman@redhat.com>

dzafman added some commits Mar 30, 2017

osd: For testing full disks add injectfull socket command
Signed-off-by: David Zafman <dzafman@redhat.com>
ceph-objectstore-tool: cleanup comment
Signed-off-by: David Zafman <dzafman@redhat.com>
common: Bump ratio for backfillfull from 85% to 90%
Signed-off-by: David Zafman <dzafman@redhat.com>
test: Switch from pg to osd for set-*-ratio commands
Testing of 6422e0a

Signed-off-by: David Zafman <dzafman@redhat.com>
test: Fix intended test flow and restore nearfull-ratio
This is inconsequential but seems to have always been wrong since original
commit 6cafb0e

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Add check_osdmap_full() to check for shard OSD fullness
Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Rename backfill_request_* to recovery_request_*
To be used by both recovery and backfill

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Handle backfillfull_ratio just like nearfull and full
Add BACKFILLFULL as a local OSD cur_state
Notify monitor of this new fullness state

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Revamp injectfull op to support all full states
Use check_* for injectable full checks
Use is_* to just test simple cur_state

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Check failsafe full and crash on push/pull
Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Add PG state and flag for too full for recovery
New state machine state NotRecovering
New PG state PG_STATE_RECOVERY_TOOFULL

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: Check whether any OSD is full before starting recovery
Add event RecoveryTooFull to move to NotRecovering state

Signed-off-by: David Zafman <dzafman@redhat.com>
osd: check_full_status() remove bogus comment and use equivalent comp…
…utation

We actually compute kb_used as the kb - kb_avail.  We don't have the
statfs() system call issue of non-privileged f_bavail vs f_bfree.  It
was assumed that used was really like (blocks - f_bfree).  It is not.

Signed-off-by: David Zafman <dzafman@redhat.com>
mon: Use currently configure full ratio to determine available space
This is a bug that would not adjust available space based on the
currently configured full ratio, but rather the mon_osd_full_ratio
default initial value.

Signed-off-by: David Zafman <dzafman@redhat.com>
mon: Always fix-up full ratios when specified incorrectly in config
Signed-off-by: David Zafman <dzafman@redhat.com>
mon: Issue warning or error if a full ratio out of order
The full ratios should be in this order: nearfull, backfillfull, full, failsafe full

Signed-off-by: David Zafman <dzafman@redhat.com>
mon, osd: Add detailed full information for now in the mon
Show ceph health doc output in the correct order

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman

This comment has been minimized.

Member

dzafman commented Apr 17, 2017

Rebased on master due to conflicts in src/mon/OSDMonitor.cc related to addition of mon_debug_no_require_luminous code.

@dzafman

This comment has been minimized.

Member

dzafman commented Apr 17, 2017

retest this please

test: Test health check output for full ratios
Test out of order ratios summary and details
Test various full osd conditions summary and details

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman

This comment has been minimized.

Member

dzafman commented Apr 17, 2017

@liewegas @jdurgin This is ready to merge with 3 reviewers none of whom officially marked it reviewed.

@liewegas liewegas merged commit 37e9a87 into ceph:master Apr 17, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@dzafman dzafman deleted the dzafman:wip-15912-followon branch Apr 23, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment