
mgr/autoscaler: Introduce noautoscale flag #43716

Merged
merged 3 commits into from Jan 4, 2022

Conversation

kamoltat
Member

@kamoltat kamoltat commented Oct 28, 2021

The noautoscale flag lets the user flip
the switch between turning autoscale on
and off for all pools with a single
command.

osd pool set noautoscale turns autoscale
mode off for all pools.

osd pool unset noautoscale turns autoscale
mode on for all pools.
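In practice the toggle is a single CLI call. A minimal session might look like the following (requires a running Ceph cluster; the `get` form is how the flag is queried in current Ceph, shown here for completeness):

```shell
# Disable the autoscaler on every pool with one command
ceph osd pool set noautoscale

# Every pool should now report AUTOSCALE "off"
ceph osd pool autoscale-status

# Query the current state of the flag
ceph osd pool get noautoscale

# Re-enable the autoscaler on every pool
ceph osd pool unset noautoscale
```

These commands act on the cluster-wide flag rather than per-pool `pg_autoscale_mode`, which is the point of the feature.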

Address: https://tracker.ceph.com/issues/51213

Signed-off-by: Kamoltat ksirivad@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch 2 times, most recently from 3da3c90 to 97bcb47 Compare November 1, 2021 20:41
Member

@neha-ojha neha-ojha left a comment


@kamoltat I noticed that if we run ceph osd pool set noautoscale and then create a new pool, the new pool has AUTOSCALE on. I don't think this is the behavior we want. Can you please add a test case to cover this and other scenarios?

@kamoltat
Member Author

kamoltat commented Nov 5, 2021

> @kamoltat I noticed that if we run ceph osd pool set noautoscale and then create a new pool, the new pool has AUTOSCALE on. I don't think this is the behavior we want. Can you please add a test case to cover this and other scenarios?

@neha-ojha Agreed, I will make the changes so that newly created pools also have AUTOSCALE off, and will add a test case covering this and other scenarios.

@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch from 97bcb47 to 9267781 Compare November 9, 2021 04:57
@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch from 9267781 to 243e2ed Compare November 9, 2021 14:28
@kamoltat
Member Author

kamoltat commented Nov 9, 2021

2021-11-09T20:08:42.170 INFO:tasks.workunit.client.0.smithi149.stderr:+ ceph osd pool set noautoscale
2021-11-09T20:08:42.537 INFO:tasks.workunit.client.0.smithi149.stderr:noautoscale is already set!
2021-11-09T20:08:42.546 INFO:tasks.workunit.client.0.smithi149.stderr:+ ceph osd pool create a
2021-11-09T20:08:43.544 INFO:tasks.workunit.client.0.smithi149.stderr:pool 'a' already exists
2021-11-09T20:08:43.557 INFO:tasks.workunit.client.0.smithi149.stderr:+ sleep 2
2021-11-09T20:08:45.561 INFO:tasks.workunit.client.0.smithi149.stderr:++ ceph osd pool autoscale-status
2021-11-09T20:08:45.562 INFO:tasks.workunit.client.0.smithi149.stderr:++ grep -oe off
2021-11-09T20:08:45.562 INFO:tasks.workunit.client.0.smithi149.stderr:++ wc -l
2021-11-09T20:08:45.950 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-11-09T20:08:45.951 INFO:tasks.workunit.client.0.smithi149.stderr:+ RESULT1=1
2021-11-09T20:08:45.951 INFO:tasks.workunit.client.0.smithi149.stderr:+ test 1 -eq 2
2021-11-09T20:08:45.952 INFO:tasks.workunit:Stopping ['mon/test_noautoscale_flag.sh'] on client.0...
2021-11-09T20:08:45.952 DEBUG:teuthology.orchestra.run.smithi149:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2021-11-09T20:08:46.206 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
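The failure above comes from the workunit counting occurrences of "off" in the autoscale-status output: `RESULT1=1` against an expected 2 means only one pool had autoscaling off, i.e. the freshly created pool still had it on. A rough sketch of that counting logic, with the cluster command mocked out so it runs standalone (`mock_autoscale_status` and its sample output are illustrative, not real command output):

```shell
# Stand-in for `ceph osd pool autoscale-status`, which needs a live
# cluster. Two pools, both honoring the noautoscale flag.
mock_autoscale_status() {
    cat <<'EOF'
POOL                   SIZE  RATE  RAW CAPACITY  RATIO  BIAS  PG_NUM  AUTOSCALE
device_health_metrics  0     3.0   100G          0.0    1.0   1       off
a                      0     3.0   100G          0.0    1.0   1       off
EOF
}

# Same pipeline as in the log: count the "off" entries.
NUM_OFF=$(mock_autoscale_status | grep -oe off | wc -l)
echo "pools with autoscale off: $NUM_OFF"

# The workunit then asserts the count equals the number of pools.
test "$NUM_OFF" -eq 2 && echo PASS
```

In the failing run the equivalent count came back 1, so `test 1 -eq 2` returned nonzero and teuthology aborted the task.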

@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch 10 times, most recently from 73c9d02 to f4e73c5 Compare November 12, 2021 15:11
@neha-ojha
Member

@kamoltat could you please put the test changes in a different commit and provide a link to a test run that is executing the new test?

@neha-ojha
Member

> @kamoltat could you please put the test changes in a different commit and provide a link to a test run that is executing the new test?

@kamoltat ping, let's try to get it merged before quincy dev freeze

Member

@neha-ojha neha-ojha left a comment


@kamoltat code changes look good, could you update https://docs.ceph.com/en/latest/rados/operations/placement-groups/ to reflect the new flag and make a note about this in the release notes

@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch 5 times, most recently from 4721bfd to a58c925 Compare December 15, 2021 06:38
@kamoltat
Member Author

kamoltat commented Dec 17, 2021

Work unit passed 1/1

However, I had to comment out the qa/tasks configuration that sets osd_pool_default_pg_autoscale_mode to off. Somehow this interferes with the value stored in the monitor: even after ceph config set global osd_pool_default_pg_autoscale_mode on, the value doesn't actually get set to on.

Working on a way around this, since we don't want the qa workunit to have autoscale on by default.
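For reference, the option involved is an ordinary mon-level config. The interaction described above involves a suite-level default plus a runtime override that was not taking effect; a paraphrased sketch (the yaml fragment is illustrative of how teuthology overrides are shaped, not copied from the suite):

```shell
# The rados suite normally sets the default via teuthology overrides,
# roughly (paraphrased yaml, shown as a comment):
#   overrides:
#     ceph:
#       conf:
#         global:
#           osd_pool_default_pg_autoscale_mode: off
#
# The runtime override that, per the comment above, did not stick:
ceph config set global osd_pool_default_pg_autoscale_mode on

# Inspect what the monitor actually stores for the option:
ceph config get mon osd_pool_default_pg_autoscale_mode
```

Both `ceph config set` and `ceph config get` require a running cluster, so this is a reference sketch rather than a runnable reproducer.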

https://github.com/ceph/ceph/pull/43716/files#diff-dbf878757438981fb23ce86d3b84d3d4f1fa7d3ee627b5476b1a4edb8afda7c2R20

https://pulpito.ceph.com/ksirivad-2021-12-16_22:08:49-rados:singleton:all:test-noautoscale-flag.yaml-wip-ksirivad-autoscale-global-flag-distro-basic-smithi/

@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch 2 times, most recently from 91292a2 to 37c5fe9 Compare December 21, 2021 19:11
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch 2 times, most recently from 8f93b15 to 2ce1aab Compare December 22, 2021 20:08
The `noautoscale` flag lets the user flip
the switch between turning autoscale `on`
and `off` for all pools with a single
command.

`osd pool set noautoscale` turns autoscale
mode `off` for all pools.

`osd pool unset noautoscale` turns autoscale
mode `on` for all pools.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-ksirivad-autoscale-global-flag branch from 2ce1aab to 24c9262 Compare December 22, 2021 21:37
Set and unset the noautoscale flag and
check that the results are as expected,
including that the flag is applied
correctly when new pools are created.
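The test commit above corresponds to roughly the following flow (a paraphrase of the workunit's steps against a live cluster, not the exact script; pool names and the sleep are illustrative):

```shell
# Paraphrased shape of the noautoscale workunit; needs a running cluster.
ceph osd pool create a            # pool created before the flag is set
ceph osd pool set noautoscale     # flip autoscaling off everywhere
ceph osd pool create b            # pool created while the flag is set
sleep 2                           # give the autoscaler a tick to react

# Every pool, pre-existing and new, should now report AUTOSCALE "off".
NUM_POOLS=$(ceph osd pool ls | wc -l)
NUM_OFF=$(ceph osd pool autoscale-status | grep -oe off | wc -l)
test "$NUM_OFF" -eq "$NUM_POOLS"

# Unset the flag and expect every pool to flip back to "on".
ceph osd pool unset noautoscale
```

The key assertion is the middle one: a pool created while the flag is set must come up with autoscaling off, which is exactly the gap the earlier review comment caught.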

Signed-off-by: Kamoltat <ksirivad@redhat.com>
Updated the docs in
https://docs.ceph.com/en/latest/rados/operations/placement-groups/
and updated the release notes to reflect the new noautoscale flag.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
@kamoltat
Member Author

jenkins test make check

@ljflores
Contributor

ljflores commented Jan 4, 2022

http://pulpito.front.sepia.ceph.com/yuriw-2021-12-23_16:50:03-rados-wip-yuri6-testing-2021-12-22-1410-distro-default-smithi/

Failures, unrelated:
https://tracker.ceph.com/issues/53499
https://tracker.ceph.com/issues/52124
https://tracker.ceph.com/issues/52652
https://tracker.ceph.com/issues/53422
https://tracker.ceph.com/issues/51945
https://tracker.ceph.com/issues/53424
https://tracker.ceph.com/issues/53394
https://tracker.ceph.com/issues/53766
https://tracker.ceph.com/issues/53767

Details:
Bug_#53499: test_dashboard_e2e.sh Failure: orchestrator/02-hosts-inventory.e2e failed. - Ceph - Mgr - Dashboard
Bug_#52124: Invalid read of size 8 in handle_recovery_delete() - Ceph - RADOS
Bug_#52652: ERROR: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest) - Ceph - Mgr
Bug_#53422: tasks.cephfs.test_nfs.TestNFS.test_export_create_with_non_existing_fsname: AssertionError: NFS Ganesha cluster deployment failed - Ceph - Orchestrator
Bug_#51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0 - Ceph - RADOS
Bug_#53424: CEPHADM_DAEMON_PLACE_FAIL in orch:cephadm/mgr-nfs-upgrade/ - Ceph - Orchestrator
Bug_#53394: cephadm: can infer config from mon from different cluster causing file not found error - Ceph - Orchestrator
Bug_#53766: ceph orch ls: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found - Ceph - Orchestrator
Bug_#53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout - Ceph - RADOS

@yuriw yuriw merged commit 3f21194 into ceph:master Jan 4, 2022
7 of 8 checks passed