jewel: mon: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking #11679

Merged
4 commits merged into from Nov 7, 2016

Conversation


@ghost ghost commented Oct 28, 2016

@ghost ghost self-assigned this Oct 28, 2016
@ghost ghost added this to the jewel milestone Oct 28, 2016
@ghost ghost added bug-fix core labels Oct 28, 2016
@@ -3061,6 +3061,25 @@ void OSDMonitor::get_health(list<pair<health_status_t,string> >& summary,
}
}

// warn about upgrade flags that can be set but are not.
if ((osdmap.get_up_osd_features() & CEPH_FEATURE_SERVER_KRAKEN) &&

We should remove the code referencing KRAKEN.

@ghost
Author

ghost commented Oct 28, 2016

@tchaikov I removed the references to kraken in "mon/OSDMonitor: health warn if require_{jewel,kraken} flags aren't set" and "mon/OSDMonitor: encode canonical full osdmap based on osdmap flags".

@ghost ghost assigned tchaikov Oct 28, 2016
@ghost
Author

ghost commented Oct 28, 2016

upgrade/jewel-x

teuthology-suite -k distro --verbose --suite upgrade/jewel-x --suite-branch jewel --ceph wip-17734-jewel --machine-type vps --priority 101 machine_types/vps.yaml

rados

teuthology-suite -k distro --priority 101 --suite rados --subset $(expr $RANDOM % 2000)/2000 --suite-branch jewel --email loic@dachary.org --ceph wip-17734-jewel --machine-type smithi

wip-17734-jewel ceph branch & jewel ceph-qa-suite branch

wip-17734-jewel ceph branch & wip-17734-jewel ceph-qa-suite branch

jewel branch

upgrade/hammer-x

teuthology-suite -k distro --verbose --suite upgrade/hammer-x --suite-branch jewel --ceph wip-17734-jewel --machine-type vps --priority 101 machine_types/vps.yaml

wip-17734-jewel ceph branch & jewel ceph-qa-suite branch

wip-17734-jewel ceph branch & wip-17734-jewel ceph-qa-suite branch

jewel branch

@ghost
Author

ghost commented Oct 31, 2016

I'm concerned by http://pulpito.ceph.com/loic-2016-10-31_09:24:48-rados-wip-17734-jewel-distro-basic-smithi/505937/ which looks like another form of http://tracker.ceph.com/issues/17728#note-6 . The rest of the failures above are either fixed or known.

liewegas and others added 4 commits November 4, 2016 15:35
If the JEWEL or KRAKEN flags aren't set, encode the full map without
those features.  This ensures that older OSDs in the cluster will be able
to correctly encode the full map with a matching CRC.  At least, that is
true as long as the encoding changes are guarded by those feature bits.
That appears to be true currently, and we plan to ensure that it is true
in the future as well.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 5e0daf6)

Conflicts:
   src/mon/OSDMonitor.cc: removed reference to kraken

    if (!tmp.test_flag(CEPH_OSDMAP_REQUIRE_KRAKEN)) {
      dout(10) << __func__ << " encoding without feature SERVER_KRAKEN" << dendl;
      features &= ~CEPH_FEATURE_SERVER_KRAKEN;
    }
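
The masking described in this commit message can be sketched in isolation. This is a minimal illustration, not the code from src/mon/OSDMonitor.cc: the constants below are hypothetical stand-ins with made-up values for the jewel feature bit and osdmap flag, and `canonical_encode_features` is a name invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins with made-up values. The real feature bits and
// osdmap flags are defined in the Ceph source tree.
constexpr uint64_t FEATURE_SERVER_JEWEL = 1ull << 0;  // models the jewel server feature bit
constexpr uint64_t FEATURE_OTHER = 1ull << 1;         // any unrelated feature bit
constexpr uint32_t FLAG_REQUIRE_JEWEL = 1u << 0;      // models the require_jewel_osds flag

// Return the feature mask used to encode the full map. While the
// "require" flag is unset, the matching feature bit is masked out so
// that older OSDs can produce the same bytes and therefore the same CRC.
uint64_t canonical_encode_features(uint64_t features, uint32_t osdmap_flags) {
  if (!(osdmap_flags & FLAG_REQUIRE_JEWEL)) {
    features &= ~FEATURE_SERVER_JEWEL;
  }
  return features;
}
```

With the flag unset, `canonical_encode_features(FEATURE_SERVER_JEWEL | FEATURE_OTHER, 0)` returns only `FEATURE_OTHER`; once the flag is set, the mask is returned unchanged.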
We want to prompt users to set these flags as soon as their
upgrades complete.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 12e5083)

Conflicts:
   src/mon/OSDMonitor.cc: remove references to kraken

    if ((osdmap.get_up_osd_features() & CEPH_FEATURE_SERVER_KRAKEN) &&
	!osdmap.test_flag(CEPH_OSDMAP_REQUIRE_KRAKEN)) {
      string msg = "all OSDs are running kraken or later but the"
	" 'require_kraken_osds' osdmap flag is not set";
      summary.push_back(make_pair(HEALTH_WARN, msg));
      if (detail) {
	detail->push_back(make_pair(HEALTH_WARN, msg));
      }
    } else
The Incremental encode stashes encode_features, which is
what we use later to reencode the updated OSDMap.  Use
the same features so that the encoding will match!

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 916ca6a)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 83ffc2b)
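
Why the Incremental and the re-encoded full map must share one feature mask can be shown with a toy model. Everything here is an assumption made for illustration: `ToyMap`, `ToyIncremental`, the string encoding, and the FNV-1a checksum (standing in for the real CRC) are not Ceph types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint64_t FEATURE_NEW_FIELD = 1ull << 0;  // hypothetical feature bit

struct ToyMap {
  uint32_t epoch = 0;
  uint32_t new_field = 0;  // only encoded when FEATURE_NEW_FIELD is set
};

// Encode the map. The byte layout depends on the feature mask.
std::string encode(const ToyMap& m, uint64_t features) {
  std::string out = "e" + std::to_string(m.epoch);
  if (features & FEATURE_NEW_FIELD) {
    out += ";n" + std::to_string(m.new_field);
  }
  return out;
}

// 32-bit FNV-1a hash, used here only as a stand-in for a CRC.
uint32_t checksum(const std::string& bytes) {
  uint32_t h = 2166136261u;
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

struct ToyIncremental {
  uint32_t new_epoch;
  uint64_t encode_features;  // stashed mask, reused when re-encoding the full map
  uint32_t full_crc;         // checksum of the full map encoded with that mask
};

// Producer side: compute the checksum with the same mask that is stashed.
ToyIncremental make_incremental(const ToyMap& updated, uint64_t features) {
  return ToyIncremental{updated.epoch, features,
                        checksum(encode(updated, features))};
}

// Receiver side: re-encode with the stashed mask and compare checksums.
bool receiver_crc_matches(const ToyMap& updated, const ToyIncremental& inc) {
  return checksum(encode(updated, inc.encode_features)) == inc.full_crc;
}
```

Because `encode(m, 0)` and `encode(m, FEATURE_NEW_FIELD)` produce different bytes, a checksum computed under one mask cannot be relied on to match a re-encode under the other. Stashing the mask that was actually used avoids that.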
@ghost
Author

ghost commented Nov 7, 2016

http://pulpito.ceph.com/loic-2016-11-07_06:19:54-rados-wip-17734-jewel-distro-basic-smithi/ has a clean run. It is still racy, but no more so than master, which shows that the commits in this pull request do the right thing. http://tracker.ceph.com/issues/17808 was opened to track the race.

@ghost
Author

ghost commented Nov 7, 2016

@tchaikov although some upgrade tests are still missing because VPS runs have had issues since last Thursday, I think there is enough proof that these commits do the right thing and that we're in no danger of a regression. What say you?

@tchaikov
Contributor

tchaikov commented Nov 7, 2016

@dachary cool, let's merge it!

@ghost ghost merged commit 0c38c46 into ceph:jewel Nov 7, 2016
@smithfarm
Contributor

It appears that, as of 10.2.4, cluster admins will have to do "ceph osd set require_jewel_osds", otherwise the MONs will complain: "all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set". Does this deserve a mention in the 10.2.4 release notes?

@smithfarm
Contributor

(answering my own question based on clarification provided by @liewegas and @athanatos on IRC)

When the last hammer OSD in a cluster containing jewel MONs is upgraded to jewel, as of 10.2.4 the jewel MONs will issue this warning: "all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set" and change the cluster health status to HEALTH_WARN.

This is a signal for the admin to do "ceph osd set require_jewel_osds" - by doing this, the admin acknowledges that there is no downgrade path.

(I propose that we add this text, or one like it, to the 10.2.4 release notes.)
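
The condition behind this warning can be sketched as a standalone function. This is a simplified model, not the code in OSDMonitor::get_health: the constants are hypothetical stand-ins with made-up values, and `check_require_jewel` is a name invented for this example. Only the warning text is taken from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins. The real definitions live in the Ceph source tree.
enum health_status { HEALTH_OK, HEALTH_WARN };
constexpr uint64_t FEATURE_SERVER_JEWEL = 1ull << 0;  // models the jewel server feature bit
constexpr uint32_t FLAG_REQUIRE_JEWEL = 1u << 0;      // models the require_jewel_osds flag

// Return a warning when every up OSD supports jewel but the admin has
// not yet set the flag that rules out a downgrade.
std::vector<std::pair<health_status, std::string>> check_require_jewel(
    uint64_t up_osd_features, uint32_t osdmap_flags) {
  std::vector<std::pair<health_status, std::string>> summary;
  if ((up_osd_features & FEATURE_SERVER_JEWEL) &&
      !(osdmap_flags & FLAG_REQUIRE_JEWEL)) {
    summary.push_back(std::make_pair(
        HEALTH_WARN,
        std::string("all OSDs are running jewel or later but the"
                    " 'require_jewel_osds' osdmap flag is not set")));
  }
  return summary;
}
```

In this model the warning disappears as soon as the flag bit is set, which corresponds to the admin running "ceph osd set require_jewel_osds".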

@liewegas
Member

liewegas commented Nov 10, 2016 via email

@theanalyst theanalyst changed the title jewel: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking "jewel: mon: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking" Nov 17, 2016
@theanalyst theanalyst changed the title "jewel: mon: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking" jewel: mon: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking Nov 17, 2016
asheplyakov pushed a commit to asheplyakov/pkg-ceph that referenced this pull request Dec 5, 2016
* Upgrading 0.94.6 -> 0.94.9 saturating mon node networking,
  http://tracker.ceph.com/issues/17694
  ceph/ceph#11679
  patches:
  - mon-OSDMonitor-encode-canonical-full-osdmap-based-on.patch
  - mon-OSDMonitor-health-warn-if-require_-jewel-kraken-.patch
  - mon-OSDMonitor-encode-OSDMap-Incremental-with-same-f.patch
  - messages-MForward-fix-encoding-features.patch
  - messages-MForward-reencode-forwarded-message-if-targ.patch
  - msg-Message-fix-set_middle-vs-throttler.patch
  - msg-adjust-byte_throttler-from-Message-encode.patch
  - all-add-const-to-operator-param.patch
* mon: health does not report pgs stuck in more than one state,
  http://tracker.ceph.com/issues/17601
  ceph/ceph#11660
  patches:
  - mon-PGMap-PGs-can-be-stuck-more-than-one-thing.patch

(cherry picked from commit f871303)
This pull request was closed.