quincy: pybind/mgr/pg_autoscaler: noautoscale flag retains individual pool configs #53677

Merged
merged 2 commits into from Oct 5, 2023

@kamoltat kamoltat commented Sep 26, 2023

Problem:

The pg_autoscaler noautoscale flag doesn't retain the individual pools' autoscale mode settings. For example, if you turn the flag ON and then OFF again, all pools end up with autoscale mode on. This is inconvenient because users sometimes just want to temporarily disable the autoscaler on all pools and re-enable it later while retaining each pool's autoscale mode.

Solution:

We store the noautoscale flag in the OSDMAP so that it is persistent. We then remove the noautoscale MODULE OPTION from the pg_autoscaler module since it is no longer needed. Every time we set, unset, or get the flag, we look it up in the OSDMAP. We do this to avoid inconsistency in the state of the noautoscale flag, since the flag can easily be set directly with ceph osd set noautoscale.
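The idea can be illustrated with a minimal sketch. This is not the code from this PR: `ClusterMap` and `Autoscaler` are hypothetical stand-ins for the OSDMap and the pg_autoscaler module, and the method bodies are assumptions. It only shows why keeping the global flag in one place, separate from the per-pool settings, lets those settings survive a set/unset cycle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ClusterMap:
    """Hypothetical stand-in for the OSDMap: the single home of the flag."""
    flags: Set[str] = field(default_factory=set)
    # Per-pool autoscale mode ("on", "off" or "warn"), stored independently.
    pool_autoscale_mode: Dict[str, str] = field(default_factory=dict)


class Autoscaler:
    """Hypothetical, simplified autoscaler that keeps no copy of the flag."""
    FLAG = "noautoscale"

    def __init__(self, cluster_map: ClusterMap) -> None:
        self.cluster_map = cluster_map

    def set_noautoscale(self) -> None:
        # Only the global flag changes; per-pool modes are not rewritten.
        self.cluster_map.flags.add(self.FLAG)

    def unset_noautoscale(self) -> None:
        self.cluster_map.flags.discard(self.FLAG)

    def get_noautoscale(self) -> bool:
        # Always consult the map, so a flag set elsewhere is seen too.
        return self.FLAG in self.cluster_map.flags

    def pools_to_adjust(self) -> List[str]:
        # With the flag set, nothing is adjusted regardless of pool mode.
        if self.get_noautoscale():
            return []
        modes = self.cluster_map.pool_autoscale_mode
        return sorted(name for name, mode in modes.items() if mode == "on")


cluster = ClusterMap(
    pool_autoscale_mode={"rbd": "on", "cephfs_data": "off", "logs": "warn"}
)
scaler = Autoscaler(cluster)

scaler.set_noautoscale()
paused = scaler.pools_to_adjust()      # no pool is adjusted while paused
scaler.unset_noautoscale()
resumed = scaler.pools_to_adjust()     # only pools whose own mode is "on"
```

Because the flag is read from the map on every call, setting it through another path (here, adding it to `cluster.flags` directly) has the same effect as calling `set_noautoscale()`.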

Fixes: https://tracker.ceph.com/issues/61922

Backporting relevant commits from main PRs:

#52442

Backport tracker: https://tracker.ceph.com/issues/62978

Signed-off-by: Kamoltat <ksirivad@redhat.com>

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@kamoltat kamoltat added this to the quincy milestone Sep 26, 2023
@kamoltat kamoltat requested a review from a team as a code owner September 26, 2023 15:51
@kamoltat kamoltat self-assigned this Sep 26, 2023
@kamoltat
Member Author

kamoltat commented Oct 2, 2023

make check failed because:

mypy: commands[0] /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr> mypy --config-file=../../mypy.ini -m alerts -m balancer -m cephadm -m crash -m dashboard -m devicehealth -m diskprediction_local -m hello -m influx -m iostat -m localpool -m mds_autoscaler -m mgr_module -m mgr_util -m mirroring -m nfs -m orchestrator -m pg_autoscaler -m progress -m prometheus -m rbd_support -m rgw -m rook -m snap_schedule -m selftest -m stats -m status -m telegraf -m telemetry -m test_orchestrator -m volumes -m zabbix
pg_autoscaler/__init__.py:6: note: In module imported here:
pg_autoscaler/module.py: note: In member "_maybe_adjust" of class "PgAutoscaler":
pg_autoscaler/module.py:682: error: "PgAutoscaler" has no attribute "noautoscale"; maybe "set_noautoscale", "get_noautoscale", or "unset_noautoscale"?
Found 1 error in 1 file (checked 32 source files)
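The error above comes from a leftover read of an attribute that no longer exists on the class. A minimal sketch of the pattern follows, assuming the check in `_maybe_adjust` is switched to the getter that mypy suggests. `PgAutoscalerSketch`, its constructor argument, and the method bodies are hypothetical and are not taken from the real module.

```python
from typing import List, Set


class PgAutoscalerSketch:
    """Hypothetical, heavily simplified stand-in for the module class."""

    def __init__(self, osdmap_flags: Set[str]) -> None:
        # There is deliberately no cached `self.noautoscale` attribute;
        # the flag lives only in the (simulated) map flags.
        self._osdmap_flags = osdmap_flags

    def get_noautoscale(self) -> bool:
        # Assumed behaviour: look the flag up each time it is needed.
        return "noautoscale" in self._osdmap_flags

    def maybe_adjust(self, pools: List[str]) -> List[str]:
        # Reading `self.noautoscale` here is what mypy rejects once the
        # attribute is gone; calling the getter type-checks.
        if self.get_noautoscale():
            return []
        return list(pools)


flags: Set[str] = set()
module = PgAutoscalerSketch(flags)
adjusted_when_unset = module.maybe_adjust(["rbd"])
flags.add("noautoscale")
adjusted_when_set = module.maybe_adjust(["rbd"])
```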

…nfigs

Problem:

The pg_autoscaler `noautoscale flag` doesn't retain the individual pools'
`autoscale mode` settings. For example, if you turn the flag `ON` and then
`OFF` again, all pools end up with `autoscale mode on`. This is
inconvenient because users sometimes just want to temporarily disable the
autoscaler on all pools and re-enable it later while retaining each
pool's `autoscale mode`.

Solution:

We store the noautoscale flag in the OSDMAP so that it is
persistent. We then remove the noautoscale MODULE OPTION
from the pg_autoscaler module since it is no longer needed.
Every time we set, unset, or get the flag, we look it up in
the OSDMAP. We do this to avoid inconsistency in the state of
the `noautoscale flag`, since the flag can easily be set
directly with `ceph osd set noautoscale`.

Fixes: https://tracker.ceph.com/issues/61922

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 0bfdf63)

Conflicts:
	src/pybind/mgr/pg_autoscaler/module.py - trivial fix
@kamoltat kamoltat force-pushed the wip-ksirivad-quincy-backport-52442 branch from 3955e9c to 2371e19 Compare October 2, 2023 15:13
@kamoltat
Member Author

kamoltat commented Oct 2, 2023

Pushed new change, should be good now!

@sseshasa
Contributor

sseshasa commented Oct 3, 2023

Teuthology Test Result:

@kamoltat There's a related failure shown below:
https://pulpito.ceph.com/yuriw-2023-10-02_19:19:00-rados-wip-yuri4-testing-2023-10-02-0826-quincy-distro-default-smithi/7408689

Please take a look.

@kamoltat
Member Author

kamoltat commented Oct 3, 2023

My mistake, I didn't cherry-pick all the commits ...

modified:

`qa/workunits/mon/test_noautoscale_flag.sh`
`qa/workunits/cephtool/test.sh`

adding test coverage to files mentioned above

Fixes: https://tracker.ceph.com/issues/61922

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 0972dbf)
@github-actions github-actions bot added the tests label Oct 3, 2023
@kamoltat
Member Author

kamoltat commented Oct 3, 2023

Pushed the rest of the commits ...

@sseshasa
Contributor

sseshasa commented Oct 5, 2023

Teuthology Test Result:
https://pulpito.ceph.com/yuriw-2023-10-02_19:19:00-rados-wip-yuri4-testing-2023-10-02-0826-quincy-distro-default-smithi/

Re-run of the autoscale-flag test after including the missed commits:
https://pulpito.ceph.com/yuriw-2023-10-04_20:41:04-rados:singleton:all-wip-yuri4-testing-2023-10-02-0826-quincy-distro-default-smithi/

Failures, unrelated:

  1. https://tracker.ceph.com/issues/58560
  2. https://tracker.ceph.com/issues/61786
  3. https://tracker.ceph.com/issues/58476

Details:

  1. test_envlibrados_for_rocksdb.sh failed to subscribe to repo
  2. test_dashboard_e2e.sh: Can't run because no spec files were found; couldn't determine Mocha version
  3. test_non_existent_cluster: cluster does not exist

@yuriw yuriw merged commit 6ac08c7 into ceph:quincy Oct 5, 2023
10 checks passed