Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr: skip first non-zero incremental in PGMap::apply_incremental() #18347

Merged
merged 1 commit into from Nov 1, 2017

Conversation

agutikov
Copy link

After initialization of PGMap instance PGMap::stamp is zero
and this cause huge first delta.
Also after mgr restart first non-zero value coming to PGMap::apply_incremental()
is current pg_sum value so it produces unreasonably huge pg_sum_delta.
This patch introduces a workaround to save pg_sum and not update pg_sum_delta
by first non-zero incremental.

Signed-off-by: Aleksei Gutikov aleksey.gutikov@synesis.ru
Fixes: http://tracker.ceph.com/issues/21773

@jcsp
Copy link
Contributor

jcsp commented Oct 26, 2017

retest this please (jenkins)

@jcsp
Copy link
Contributor

jcsp commented Oct 26, 2017

Looks sane at a glance but I'd like a second opinion (@tchaikov)

@xiexingguo
Copy link
Member

This patch is very welcome, as we are probably suffering the same pain.

src/mon/PGMap.cc Outdated
pg_sum_delta.stats.add(d.stats);
// if mgr restarts first update of pg_sum is previous pg_sum
// so skip this huge value
if (!(pg_sum_old.stats.sum == object_stat_sum_t())) {
Copy link
Contributor

@tchaikov tchaikov Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agutikov instead, i think it'd be simpler if we just skip this interval if stamp.is_zero() or pg_sum_old.stats.sum.is_zero(). when a ceph-mgr boots, the first interval's stamp is zero, and the so is the pg_sum_old.stats.sum. but the pg stats takes time to pupulate, so the next interval is very likely have a non-zero time stamp, but its pg_sum_old.stats.sum is still zero.

in other words, i'd suggest

  1. check both stamp and pg_sum_old.stats.sum at the same place
  2. move the delta_t calculation down to the if block,
  3. and update the stamp after checking it

i.e.

  if (!stamp.is_zero() && !pg_sum_old.stats.sum.is_zero()) {
    utime_t delta_t;
    delta_t = inc.stamp;
    delta_t -= stamp;
    // calculate a delta, and average over the last 2 deltas.
    pool_stat_t d = pg_sum;
    d.stats.sub(pg_sum_old.stats);
    pg_sum_deltas.push_back(make_pair(d, delta_t));
    stamp_delta += delta_t;
    pg_sum_delta.stats.add(d.stats);
    unsigned smooth_intervals = (unsigned)MAX(1, cct ? cct->_conf->mon_stat_smooth_intervals : 1);
    if (pg_sum_deltas.size() > smooth_intervals) {
      pg_sum_delta.stats.sub(pg_sum_deltas.front().first.stats);
      stamp_delta -= pg_sum_deltas.front().second;
      pg_sum_deltas.pop_front();
    }
  }
  stamp = inc.stamp;

what do you think?

@agutikov
Copy link
Author

@tchaikov
Ok, I agree it's simpler. Seems working after changes.

@tchaikov
Copy link
Contributor

probably will need rebase and resolve the conflict after #18647

After initialization of PGMap instance PGMap::stamp is zero
and this cause huge first delta.
Also after mgr restart first non-zero value coming to PGMap::apply_incremental()
is current pg_sum value so it produces unreasonably huge pg_sum_delta.
This patch introduces a workaround to save pg_sum and not update pg_sum_delta
by first non-zero incremental.

Signed-off-by: Aleksei Gutikov <aleksey.gutikov@synesis.ru>
Fixes: http://tracker.ceph.com/issues/21773
@ceph-jenkins
Copy link
Collaborator

all commits in this PR are signed

@ceph-jenkins
Copy link
Collaborator

submodules for project are unmodified

@ceph-jenkins
Copy link
Collaborator

OK - docs built

@ceph-jenkins
Copy link
Collaborator

make check succeeded

1 similar comment
@ceph-jenkins
Copy link
Collaborator

make check succeeded

@liewegas liewegas merged commit 5ba89bf into ceph:master Nov 1, 2017
@liewegas
Copy link
Member

liewegas commented Nov 1, 2017

Thanks!

@agutikov agutikov deleted the fix-21773 branch December 13, 2017 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants