New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon: fix wrongly delete routed pgstats op #12784

Merged
merged 1 commit into from Jan 15, 2017

Conversation

Projects
None yet
3 participants
@LiumxNL
Contributor

LiumxNL commented Jan 4, 2017

a peon may forward the pgstats to leader, and record it locally, but leader will check if
osd has the latest map before process, if not, will use a route op to indicate poen to send it,
then poen will delete routed op when fininaly send out which make poen cannot send pgstatack
when leader has processed the pgstat update. so osd will always track this op until reach a
threshold block pgstats sending, at worst, reopen mon session.

Signed-off-by: Mingxin Liu mingxin@xsky.com

@LiumxNL

This comment has been minimized.

Contributor

LiumxNL commented Jan 4, 2017

@tchaikov @liewegas pls review, thanks!

@tchaikov tchaikov self-assigned this Jan 4, 2017

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Jan 5, 2017

both leader and peon check the epoch of the pgstats. an alternative fix is to let the leader skip the "mon->osdmon()->send_latest_now_nodelete(op, stats->epoch+1);" step if the connection of the MonSession is a proxied one.

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Jan 5, 2017

@LiumxNL could you file a ticket on tracker? i think we would want to backport it.

@tchaikov tchaikov requested a review from jecluis Jan 5, 2017

@LiumxNL

This comment has been minimized.

Contributor

LiumxNL commented Jan 9, 2017

@LiumxNL

This comment has been minimized.

Contributor

LiumxNL commented Jan 9, 2017

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Jan 9, 2017

@LiumxNL

looks good.

but you might want to revise the commit message a little bit: in addition to the symptom, we could put what change we made in this commit to fix it. for example, the title could be specific:

mon: do not send duplicated osdmap msg to not sync'ed osd

prior to this change:
a peon may forward the pgstats to leader, and record it locally, but leader will
check if osd has the latest map before process, if not, will use a route op to
indicate peon to send it, then poen will delete routed op when fininaly send
out which make peon cannot send pgstatack when leader has processed the
pgstat update. so osd will always track it util reach a threshold block pgstats
sending, at worst, reopen mon session.
also, both leader and peon will send out the osdmap message to the osd.

after this change:
only the peon will send out the osdmap message. and the pgstatack message
will be routed to the osd as expected. so the osd will not keep track of the 
"acked" pg stats in its queue forever before times out.

@tchaikov tchaikov added the needs-qa label Jan 9, 2017

@yuriw

This comment has been minimized.

Contributor

yuriw commented Jan 9, 2017

@LiumxNL @tchaikov make check failed with this PR, pls fix

[ 31%] Built target rbd_internal
/home/yuriw/wip/src/mon/PGMonitor.cc: In member function 'bool PGMonitor::preprocess_pg_stats(MonOpRequestRef)':
/home/yuriw/wip/src/mon/PGMonitor.cc:709:27: error: expected primary-expression before ')' token
!session->proxy_con))
^
Scanning dependencies of target common
[ 31%] Building C object src/CMakeFiles/common.dir/ceph_ver.c.o
[ 31%] Building CXX object src/CMakeFiles/common.dir/common/PluginRegistry.cc.o
[ 31%] Building CXX object src/CMakeFiles/common.dir/common/version.cc.o
src/mon/CMakeFiles/mon.dir/build.make:422: recipe for target 'src/mon/CMakeFiles/mon.dir/PGMonitor.cc.o' failed
make[3]: *** [src/mon/CMakeFiles/mon.dir/PGMonitor.cc.o] Error 1
CMakeFiles/Makefile2:4344: recipe for target 'src/mon/CMakeFiles/mon.dir/all' failed
make[2]: *** [src/mon/CMakeFiles/mon.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
[ 31%] Linking CXX static library ../lib/libcommon.a
[ 45%] Built target common
[ 45%] Linking CXX static library ../../lib/libosd.a
[ 46%] Built target osd
CMakeFiles/Makefile2:245: recipe for target 'CMakeFiles/check.dir/rule' failed
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
Makefile:186: recipe for target 'check' failed
make: *** [check] Error 2

mon: do not send duplicated osdmap msg to not sync'ed osd
prior to this change:
a peon may forward the pgstats to leader, and record it locally, but leader will
check if osd has the latest map before process, if not, will use a route op to
indicate peon to send it, then poen will delete routed op when fininaly send
out which make peon cannot send pgstatack when leader has processed the
pgstat update. so osd will always track it util reach a threshold block pgstats
sending, at worst, reopen mon session.
also, both leader and peon will send out the osdmap message to the osd.

after this change:
only the peon will send out the osdmap message. and the pgstatack message
will be routed to the osd as expected. so the osd will not keep track of the
"acked" pg stats in its queue forever before times out.

Fixes: http://tracker.ceph.com/issues/18458
Signed-off-by: Mingxin Liu <mingxin@xsky.com>
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Jan 11, 2017

@tchaikov tchaikov merged commit 5c98ac8 into ceph:master Jan 15, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment