
mgr/progress & mgr/pg_autoscaler: Added Pg Autoscaler Event #29035

Merged

Conversation

kamoltat
Member

Creates a progress event in the progress module
when a pool needs pg_num adjusted to match the
target pg_num. Also made a small change to
the logic for triggering pg adjustment.

Signed-off-by: Kamoltat (Junior) Sirivadhna ksirivad@redhat.com

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@kamoltat kamoltat changed the title mgr/progress: Added Pg Autoscaler Event mgr/progress & mgr/pg_autoscaler: Added Pg Autoscaler Event Jul 14, 2019
@batrick batrick requested a review from liewegas July 15, 2019 15:03
refs=[("pool", int(pool_id))])

else:
new_progress = p['pg_num_target']/p['pg_num_final']
Member

I think this path is only taken once, when we set pg_num_target. Isn't there an update method of some sort to refresh progress (that would need to compare pg_num with pg_num_target and the initial pg_num)?

Member Author

Hi Sage,
So from my understanding (which could be wrong), p['pg_num_final'] is the current pg_num in the pool and p['pg_num_target'] is the target PG count the pool is trying to reach. As for how the progress refreshes: I think _maybe_adjust gets called every minute, as specified by sleep_interval, and it calls get_osdmap() to fetch a new map each time, then uses it to call get_pools_by_name for fresh information about the pools. So I feel it is already updating at that point.
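The refresh cycle described here can be sketched roughly as follows. This is a simplified stand-in, not the module's actual code: `AutoscalerLoopSketch`, `get_osdmap`, and `adjust` are hypothetical names, and the real module's `serve()` loop has no `max_passes` parameter (it is added here only so the sketch terminates).

```python
import threading

class AutoscalerLoopSketch:
    """Minimal sketch of the periodic refresh cycle described above:
    every sleep_interval seconds, re-read the osdmap and recompute
    pool state, so any progress figures are refreshed each pass."""

    def __init__(self, get_osdmap, adjust, sleep_interval=60):
        self._shutdown = threading.Event()
        self._get_osdmap = get_osdmap
        self._adjust = adjust
        self.sleep_interval = sleep_interval

    def serve(self, max_passes=None):
        passes = 0
        while not self._shutdown.is_set():
            osdmap = self._get_osdmap()   # fetch a fresh map each iteration
            self._adjust(osdmap)          # recompute pool adjustments from it
            passes += 1
            if max_passes is not None and passes >= max_passes:
                break
            self._shutdown.wait(self.sleep_interval)
```

The key point of the discussion is that nothing outside this loop pushes updates; any progress event is only as fresh as the last pass.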

Member

pg_num_target is what ceph currently wants pg_num to be.
pg_num is the current, real pg count for the pool. The mgr will move pg_num toward pg_num_target for you... this is what should update the progress.
pg_num_final is what the pg_autoscaler module decided it wants pg_num_target to be.

If I'm reading the code correctly, would_adjust will be true when we first make the adjustment to pg_num_target. But early in _maybe_adjust() you have

            if not p['would_adjust']:
                continue

So you want to create the progress event there, but that's not the right place to update progress.

Instead, I think you want a separate helper like _update_progress_events() or something like that. Loop over the progress event objects you have in the active list, and compare the latest pg_num to pg_num_target and the starting pg_num value. I think to do this you need to create a class like PgAdjustmentProgress that keeps the initial and target pg_num values...
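The suggestion above can be sketched like this. This is an illustrative outline only, not the merged implementation: the dict shapes and the `report` callback (standing in for `self.remote('progress', 'update', ...)`) are assumptions.

```python
import uuid

class PgAdjustmentProgress:
    """Sketch of the suggested helper class: remembers the pg_num the
    pool started from and the pg_num_target it is moving toward."""

    def __init__(self, initial_pg_num, pg_num_target):
        self.ev_id = str(uuid.uuid4())
        self.initial_pg_num = initial_pg_num
        self.pg_num_target = pg_num_target

def update_progress_events(events, pools, report):
    """Sketch of the suggested _update_progress_events() helper.

    events maps pool_id -> PgAdjustmentProgress, pools maps
    pool_id -> {'pg_num': ...} (a stand-in for osdmap data), and
    report(ev_id, progress) stands in for the remote progress call.
    """
    for pool_id, ev in list(events.items()):
        pg_num = pools[pool_id]['pg_num']
        span = ev.pg_num_target - ev.initial_pg_num
        # fraction of the latest adjustment that has been applied
        progress = (pg_num - ev.initial_pg_num) / span if span else 1.0
        report(ev.ev_id, progress)
        if pg_num == ev.pg_num_target:   # adjustment finished
            del events[pool_id]
```

Looping over the active events and recomputing progress from the latest pg_num is the separation the reviewer is asking for: event creation happens where would_adjust fires, but progress updates happen here.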

self._event[pool_id] = str(uuid.uuid4())
if p['pg_num_target'] < p['pg_num_final']:
self.remote('progress', 'update', self._event[pool_id],
ev_msg=" PgAutoscaler growing PGs in pool: {0}".format(
Member

PG autoscaler increasing pool %s PGs from %d to %d

del self._event[pool_id]
elif new_progress > 1.0:
new_progress = p['pg_num_final']/p['pg_num_target']
self.remote('progress', 'update', self._event[pool_id],
Member Author

ev_id=self._event[pool_id] just so it is consistent

@@ -73,6 +72,7 @@ class PgAutoscaler(MgrModule):
def __init__(self, *args, **kwargs):
super(PgAutoscaler, self).__init__(*args, **kwargs)
self._shutdown = threading.Event()
self._event = {}
Contributor

Events within the progress module are stored in the persistent key value store of the progress module. self._event = {} is not stored permanently.

Is there a possibility this can lead to progress events without references from the autoscaler module due to MGR restart?

Member Author

@kamoltat kamoltat Jul 18, 2019

@sebastian-philipp
Hi,
The purpose of this self._event = {} is just a local reference to track whether there is already an autoscaler event for a pool, so we don't need to create a new one but can update it instead. But yes, basically when we create an event locally, we also create one in the progress module. There is an alternative strategy where we could reference everything through the progress module's events, i.e. check the events in progress and see which pool each one references, so we wouldn't need another self._event in pg_autoscaler.

Contributor

> @sebastian-philipp
> Hi,
> so the purpose of this self._event = {} is just for local references to track if there is an autoscaler event at a pool already, so we don't need to create a new one, but rather update it. But yes basically when we create an event locally, we also create one in the progress module as well.

Just run a ceph mgr fail $(ceph mgr dump | jq -r '.active_name') while a pg_autoscaler event is there and unfinished. The progress module should then reload all events from the store.

Member

I guess we'd need to stash and repopulate this module's events too - could that metadata (initial_pg_num/pg_num_target) be added to the remote event and handled by the existing progress module persistence?

@@ -66,13 +75,14 @@ class PgAutoscaler(MgrModule):
MODULE_OPTIONS = [
{
'name': 'sleep_interval',
'default': str(60),
'default': str(10),
Member

was this just changed for testing?

@@ -240,9 +251,9 @@ def _get_pool_status(
self,
osdmap,
pools,
threshold=3.0,
threshold=1.0,
Member

just for testing?

refs=[("pool", int(pool_id))])


# if p['pg_num_target'] < p['pg_num_final']:
Member

no longer needed?

src/pybind/mgr/pg_autoscaler/module.py
self.remote('progress', 'update', ev._ev_id,
ev_msg=" PG autoscaler increasing pool %s PGs from %d to %d" %
(pool_id, pg_num, pg_num_target),
ev_progress=pg_num/pg_num_target,
Member

I'm thinking the progress should be measured by current pg_num compared to distance between the last initial_pg_num when the module made a change, and target_pg_num

i.e. for increasing, (pg_num - initial_pg_num)/(target_pg_num - initial_pg_num)

Each time the autoscaler chose a new target, it could reset the initial_pg_num and target_pg_num for an existing event. Then the progress is measuring number of pgs changed in the latest decision, which makes sense since earlier history of shrinking/growing pg_num becomes irrelevant when the autoscaler changes things again.

Then in this function, the PgAdjustmentProgress wouldn't change, only the remote ev_progress would be updated.
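The progress formula proposed above can be shown as a small worked example. The helper name `adjustment_progress` is hypothetical; only the formula itself comes from the review.

```python
def adjustment_progress(pg_num, initial_pg_num, pg_num_target):
    """Progress of the latest autoscaler decision, per the formula above:
    the fraction of the distance from initial_pg_num to pg_num_target
    that pg_num has covered. Works for both growing and shrinking pools,
    since numerator and denominator share the same sign."""
    span = pg_num_target - initial_pg_num
    if span == 0:
        return 1.0  # nothing to do; treat as complete
    return (pg_num - initial_pg_num) / span

# Growing: a pool moving from 32 toward 128 that has reached 80 PGs
# is halfway: adjustment_progress(80, 32, 128) == 0.5
# Shrinking: from 64 toward 32, currently at 48, is also halfway:
# adjustment_progress(48, 64, 32) == 0.5
```

Resetting initial_pg_num whenever the autoscaler picks a new target, as suggested, makes this fraction measure only the latest decision.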

@kamoltat kamoltat force-pushed the wip-mgr-progress-add-pg-auto-scaling-event branch 3 times, most recently from 502a397 to 5540892 Compare August 8, 2019 19:29
Member

@jdurgin jdurgin left a comment

Looking good, just a couple minor comments

src/pybind/mgr/pg_autoscaler/module.py Outdated
elif pg_num == initial_pg_num:
continue

elif pg_num_target > pg_num:
Member

These two cases only differ by one word now; you could make 'increasing'/'decreasing' a variable in the message to avoid repetition.
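The suggestion amounts to collapsing the two branches into one format string. A minimal sketch (the helper name and signature are illustrative, not the merged code):

```python
def adjustment_message(pool_id, pg_num, pg_num_target):
    """Build one progress message for both directions by computing the
    'increasing'/'decreasing' word once, instead of duplicating the
    whole format string in two branches."""
    direction = 'increasing' if pg_num_target > pg_num else 'decreasing'
    return "PG autoscaler %s pool %s PGs from %d to %d" % (
        direction, pool_id, pg_num, pg_num_target)
```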

src/pybind/mgr/pg_autoscaler/module.py
Creates a progress event in the progress module
when a pool needs pg_num adjusted to match the
target pg_num. Also made a small change to
the logic for triggering pg adjustment.

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
change back the threshold value and
delete white space, also added more
comments

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
make threshold value back to the original

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
Just getting rid of a whitespace

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
create a new class called PgAdjustmentProgress
to keep track of initial pg_num and pg_num_target;
also create a helper function called _update_progress_events
to update the progress module using the self.remote function

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
…st initial_pg_num

changed the progress calculations to:
(pg_num - initial_pg_num)/(target_pg_num - initial_pg_num)

Also changed the threshold and sleep interval back to
default value.

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-mgr-progress-add-pg-auto-scaling-event branch from 5540892 to 39de999 Compare August 13, 2019 01:22
basically got rid of the if/else statement and passed in
a variable for increase/decrease instead

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-mgr-progress-add-pg-auto-scaling-event branch from 39de999 to 3163b68 Compare August 14, 2019 15:55
pools = osdmap.get_pools()
for pool_id in list(self._event):
ev = self._event[pool_id]
pool_data = pools[int(pool_id)]
Contributor

2019-08-21T21:03:28.521+0000 7f9b3b5fe700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 175, in serve
    self._update_progress_events()
  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 350, in _update_progress_events
    pool_data = pools[int(pool_id)]
KeyError: (1,)

I don't really understand how int() could end up producing a tuple key here..

see http://pulpito.ceph.com/kchai-2019-08-21_12:58:47-rados-wip-kefu-testing-2019-08-21-1445-distro-basic-mira/4236725/
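One way to see why the logged key is puzzling (this is an illustration of the symptom, not a confirmed diagnosis from the thread; the stale-tuple-key scenario is an assumption): indexing a dict with a tuple directly reproduces the `KeyError: (1,)` from the log, whereas `int()` on a tuple raises TypeError, so the `int(pool_id)` conversion on the quoted line could not by itself have yielded that key.

```python
pools = {1: {'pg_num': 32}}
pool_id = (1,)  # hypothetical stale key shaped like a ("pool", id) ref remnant

# Indexing with the tuple directly reproduces the key shown in the log:
try:
    pools[pool_id]
except KeyError as e:
    print(repr(e))  # KeyError((1,))

# ...whereas int() on a tuple raises TypeError, not KeyError, so the
# int(pool_id) call on the traceback's line is not what produced (1,):
try:
    pools[int(pool_id)]
except TypeError as e:
    print(type(e).__name__)  # TypeError
```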

Contributor

@tchaikov tchaikov Aug 29, 2019

2019-08-29T07:52:47.255+0000 7f691c344700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 175, in serve
    self._update_progress_events()
  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 353, in _update_progress_events
    pool_data = pools[int(pool_id)]
KeyError: (1,)

i still ran into this issue even my branch contains the fix for https://tracker.ceph.com/issues/41386

http://pulpito.ceph.com/kchai-2019-08-29_03:14:53-rados-wip-kefu-testing-2019-08-27-1807-distro-basic-mira/4260378/

@tchaikov
Contributor

tchaikov commented Sep 2, 2019

liewegas added a commit that referenced this pull request Oct 5, 2019
* refs/pull/29035/head:
	mgr/pg_autoscaler: changes made reflect jdurgin's request
	mgr/pg_autoscaler: current pg_num compared to distance between the last initial_pg_num
	mgr/progress & mgr/pg_autoscaler: changes reflect liewegas' comment
	mgr/pg_autoscaler: get rid of white space
	mgr/progress: change threshold value to origin
	mgr/progress: cleaning up for pg_autoscaler
	mgr/progress: Added Pg Autoscaler Event

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
@liewegas liewegas merged commit 3163b68 into ceph:master Oct 5, 2019
6 participants