create rbd pool in autoscaling mode fails with Error ERANGE pgp_num #7353

Closed · insatomcat opened this issue Nov 19, 2022 · 3 comments

@insatomcat (Contributor)

Bug Report

What happened:

ceph-ansible (stable-7.0) is not able to create an rbd pool with autoscale_mode=on.
I get the error:

Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal than 'pg_num', which in this case is 1

What you expected to happen:

The rbd pool should be created with no error in autoscaling mode.

How to reproduce it (minimal and precise):

Using a clients.yml vars file to create an rbd pool:

user_config: true
rbd:
  name: "rbd"
  application: "rbd"
  pg_autoscale_mode: on
  target_size_ratio: 1
pools:
  - "{{ rbd }}"

ceph-ansible fails:

2022-11-19 08:57:07,293 p=24 u=virtu n=ansible | failed: [rhel9-1] (item={'name': 'rbd', 'application': 'rbd', 'pg_autoscale_mode': True, 'target_size_ratio': 1}) => changed=false
  ansible_loop_var: item
  cmd:
  - ceph
  - -n
  - client.admin
  - -k
  - /etc/ceph/ceph.client.admin.keyring
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - rbd
  - replicated
  - --target_size_ratio
  - '1'
  - replicated_rule
  - --expected_num_objects
  - '0'
  - --autoscale-mode
  - 'on'
  delta: '0:00:01.698545'
  end: '2022-11-19 09:57:07.221218'
  item:
    application: rbd
    name: rbd
    pg_autoscale_mode: true
    target_size_ratio: 1
  rc: 2
  start: '2022-11-19 09:57:05.522673'
  stderr: 'Error ERANGE: ''pgp_num'' must be greater than 0 and lower or equal than ''pg_num'', which in this case is 1'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Maybe it's a Ceph bug?

[root@rhel9-1 ~]# ceph -n client.admin -k /etc/ceph/ceph.client.admin.keyring --cluster ceph osd pool create test --target-size-ratio 0.2 --autoscale-mode=on
Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal than 'pg_num', which in this case is 1

Also note that creating the rbd pool with autoscale_mode=warn and then setting it to "on" seems to work:

[root@rhel9-1 ~]# ceph -n client.admin -k /etc/ceph/ceph.client.admin.keyring --cluster ceph osd pool create test --autoscale-mode=warn
pool 'test' created
[root@rhel9-1 ~]# ceph osd pool set test pg_autoscale_mode on
set pool 7 pg_autoscale_mode to on
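
For reference, the same two-step workaround applied to the rbd pool defined in clients.yml might look like this (a sketch, I have not re-run these exact commands; the target_size_ratio and application enable steps are added here to match the intended pool definition):

[root@rhel9-1 ~]# ceph osd pool create rbd --autoscale-mode=warn
[root@rhel9-1 ~]# ceph osd pool set rbd pg_autoscale_mode on
[root@rhel9-1 ~]# ceph osd pool set rbd target_size_ratio 1
[root@rhel9-1 ~]# ceph osd pool application enable rbd rbd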

Share your group_vars files, inventory and full ceph-ansible log

Environment:

  • OS (e.g. from /etc/os-release): RHEL9.1
  • Kernel (e.g. uname -a): 5.14.0-70
  • Docker version if applicable (e.g. docker version): N/A
  • Ansible version (e.g. ansible-playbook --version): 2.12.0
  • ceph-ansible version (e.g. git head or tag or stable branch): stable-7.0
  • Ceph version (e.g. ceph -v): 17.2.5

Thanks!

all.yml.gz
osds.yml.gz
clients.yml.gz
ansible.log.gz

@asm0deuz (Collaborator) commented Nov 25, 2022

Hi,

Actually, it doesn't seem to be a Ceph issue, as doing the same in my test env works:

[root@mon0 /]# ceph osd pool create test --target-size-ratio 0.2 --autoscale-mode=on
pool 'test' created
[root@mon0 /]# ceph osd pool create test1 --target-size-ratio 1 --autoscale-mode=on
pool 'test1' created

[root@mon0 /]# ceph osd pool ls detail  
pool 1 'device_health_metrics' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 240 pgp_num 230 pg_num_target 8 pgp_num_target 8 pg_num_pending 239 autoscale_mode on last_change 111 lfor 0/111/111 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'rbd' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 116 pgp_num 114 pg_num_target 32 pgp_num_target 32 pg_num_pending 115 autoscale_mode on last_change 111 lfor 0/111/111 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'test' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 250 pg_num_target 32 pgp_num_target 32 pg_num_pending 255 autoscale_mode on last_change 111 lfor 0/111/111 flags hashpspool stripe_width 0 target_size_ratio 0.2
pool 4 'test1' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 106 lfor 0/0/98 flags hashpspool stripe_width 0 target_size_ratio 1

For that error to happen, pgp_num must be greater than pg_num (https://github.com/ceph/ceph/blob/f2f5ca51509ee8bd6b66772a02f0f57c68862fd7/src/mon/OSDMonitor.cc#L8012). With autoscale-mode set to on, pg_num starts at 1, so in your case pgp_num must have been set to something greater than 1. Have you changed the default value of osd_pool_default_pgp_num in your ceph.conf file?
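
You can check what value the monitor is actually running with, for example (a sketch; the mon id assumes ceph-ansible's usual naming, adjust to your deployment):

[root@rhel9-1 ~]# grep -i pgp /etc/ceph/ceph.conf
[root@rhel9-1 ~]# ceph daemon mon.$(hostname -s) config get osd_pool_default_pgp_num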

@asm0deuz (Collaborator) commented:

@insatomcat

From the all.yml file you shared:

    global:
        osd_pool_default_size: "{{ osd_pool_default_size }}"
        osd_pool_default_min_size: "{{ osd_pool_default_min_size }}"
        osd_pool_default_pg_num: 128
        osd_pool_default_pgp_num: 128
        osd_crush_chooseleaf_type: 1
        mon_osd_min_down_reporters: 1
    mon:
        auth_allow_insecure_global_id_reclaim: false
    osd:
        osd_min_pg_log_entries: 500
        osd_max_pg_log_entries: 500
        osd memory target: "{{ osd_memory_target }}"

You are forcing pgp_num to 128, but pgp_num cannot be higher than pg_num; that is why you get the error. When using autoscale_mode=on, pools are created with pg_num = 1 (https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/#confval-osd_pool_default_pg_autoscale_mode).
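
If the intent is to let the autoscaler manage PG counts, a minimal fix (a sketch based on the all.yml you shared) is to drop the two overrides so pools can be created with pg_num = 1 and grow from there:

    global:
        osd_pool_default_size: "{{ osd_pool_default_size }}"
        osd_pool_default_min_size: "{{ osd_pool_default_min_size }}"
        # osd_pool_default_pg_num and osd_pool_default_pgp_num removed:
        # let the pg_autoscaler pick and grow the values
        osd_crush_chooseleaf_type: 1
        mon_osd_min_down_reporters: 1
    mon:
        auth_allow_insecure_global_id_reclaim: false
    osd:
        osd_min_pg_log_entries: 500
        osd_max_pg_log_entries: 500
        osd memory target: "{{ osd_memory_target }}"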

@insatomcat (Contributor, Author) commented:

Thanks a lot.
