create rbd pool in autoscaling mode fails with Error ERANGE pgp_num #7353

Closed · insatomcat opened this issue Nov 19, 2022 · 3 comments

@insatomcat (Contributor)

Bug Report

What happened:

ceph-ansible (stable-7.0) is not able to create an rbd pool with autoscale_mode=on.
I get the error:

Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal than 'pg_num', which in this case is 1

What you expected to happen:

The rbd pool should be created with no error in autoscaling mode.

How to reproduce it (minimal and precise):

Using a clients.yml vars file to create an rbd pool:

user_config: true
rbd:
  name: "rbd"
  application: "rbd"
  pg_autoscale_mode: on
  target_size_ratio: 1
pools:
  - "{{ rbd }}"

ceph-ansible fails:

2022-11-19 08:57:07,293 p=24 u=virtu n=ansible | failed: [rhel9-1] (item={'name': 'rbd', 'application': 'rbd', 'pg_autoscale_mode': True, 'target_size_ratio': 1}) => changed=false
  ansible_loop_var: item
  cmd:
  - ceph
  - -n
  - client.admin
  - -k
  - /etc/ceph/ceph.client.admin.keyring
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - rbd
  - replicated
  - --target_size_ratio
  - '1'
  - replicated_rule
  - --expected_num_objects
  - '0'
  - --autoscale-mode
  - 'on'
  delta: '0:00:01.698545'
  end: '2022-11-19 09:57:07.221218'
  item:
    application: rbd
    name: rbd
    pg_autoscale_mode: true
    target_size_ratio: 1
  rc: 2
  start: '2022-11-19 09:57:05.522673'
  stderr: 'Error ERANGE: ''pgp_num'' must be greater than 0 and lower or equal than ''pg_num'', which in this case is 1'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Maybe it's a Ceph bug?

[root@rhel9-1 ~]# ceph -n client.admin -k /etc/ceph/ceph.client.admin.keyring --cluster ceph osd pool create test --target-size-ratio 0.2 --autoscale-mode=on
Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal than 'pg_num', which in this case is 1

Also note that creating the rbd pool with autoscale_mode=warn and then setting it to "on" seems to work:

[root@rhel9-1 ~]# ceph -n client.admin -k /etc/ceph/ceph.client.admin.keyring --cluster ceph osd pool create test --autoscale-mode=warn
pool 'test' created
[root@rhel9-1 ~]# ceph osd pool set test pg_autoscale_mode on
set pool 7 pg_autoscale_mode to on
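
For reference, the same two-step workaround applied to the rbd pool defined in clients.yml might look like this (a sketch, I have not re-run these exact commands; the target_size_ratio and application enable steps are added here to match the intended pool definition):

[root@rhel9-1 ~]# ceph osd pool create rbd --autoscale-mode=warn
[root@rhel9-1 ~]# ceph osd pool set rbd pg_autoscale_mode on
[root@rhel9-1 ~]# ceph osd pool set rbd target_size_ratio 1
[root@rhel9-1 ~]# ceph osd pool application enable rbd rbd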

Share your group_vars files, inventory and full ceph-ansible log

Environment:

  • OS (e.g. from /etc/os-release): RHEL9.1
  • Kernel (e.g. uname -a): 5.14.0-70
  • Docker version if applicable (e.g. docker version): N/A
  • Ansible version (e.g. ansible-playbook --version): 2.12.0
  • ceph-ansible version (e.g. git head or tag or stable branch): stable-7.0
  • Ceph version (e.g. ceph -v): 17.2.5

Thanks!

all.yml.gz
osds.yml.gz
clients.yml.gz
ansible.log.gz

@asm0deuz (Collaborator) commented Nov 25, 2022

Hi,

Actually, it doesn't seem to be a Ceph issue, as doing the same in my test env works:

[root@mon0 /]# ceph osd pool create test --target-size-ratio 0.2 --autoscale-mode=on
pool 'test' created
[root@mon0 /]# ceph osd pool create test1 --target-size-ratio 1 --autoscale-mode=on
pool 'test1' created

[root@mon0 /]# ceph osd pool ls detail  
pool 1 'device_health_metrics' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 240 pgp_num 230 pg_num_target 8 pgp_num_target 8 pg_num_pending 239 autoscale_mode on last_change 111 lfor 0/111/111 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'rbd' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 116 pgp_num 114 pg_num_target 32 pgp_num_target 32 pg_num_pending 115 autoscale_mode on last_change 111 lfor 0/111/111 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'test' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 250 pg_num_target 32 pgp_num_target 32 pg_num_pending 255 autoscale_mode on last_change 111 lfor 0/111/111 flags hashpspool stripe_width 0 target_size_ratio 0.2
pool 4 'test1' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 106 lfor 0/0/98 flags hashpspool stripe_width 0 target_size_ratio 1

For that error to happen, pgp_num must be greater than pg_num (https://github.com/ceph/ceph/blob/f2f5ca51509ee8bd6b66772a02f0f57c68862fd7/src/mon/OSDMonitor.cc#L8012). With autoscale-mode set to on, pg_num starts at 1, so in your case pgp_num must have been set to something greater than 1. Have you changed the default value of osd_pool_default_pgp_num in your ceph.conf file?
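
You can check what value the monitor is actually running with, for example (a sketch; the mon id assumes ceph-ansible's usual naming, adjust to your deployment):

[root@rhel9-1 ~]# grep -i pgp /etc/ceph/ceph.conf
[root@rhel9-1 ~]# ceph daemon mon.$(hostname -s) config get osd_pool_default_pgp_num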

@asm0deuz (Collaborator) commented:

@insatomcat

From the all.yml file you shared:

    global:
        osd_pool_default_size: "{{ osd_pool_default_size }}"
        osd_pool_default_min_size: "{{ osd_pool_default_min_size }}"
        osd_pool_default_pg_num: 128
        osd_pool_default_pgp_num: 128
        osd_crush_chooseleaf_type: 1
        mon_osd_min_down_reporters: 1
    mon:
        auth_allow_insecure_global_id_reclaim: false
    osd:
        osd_min_pg_log_entries: 500
        osd_max_pg_log_entries: 500
        osd memory target: "{{ osd_memory_target }}"

You are forcing pgp_num to 128, but pgp_num cannot be higher than pg_num; that is why you get the error. When using autoscale_mode=on, pools are created with pg_num = 1 (https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/#confval-osd_pool_default_pg_autoscale_mode).
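
If the intent is to let the autoscaler manage PG counts, a minimal fix (a sketch based on the all.yml you shared) is to drop the two overrides so pools can be created with pg_num = 1 and grow from there:

    global:
        osd_pool_default_size: "{{ osd_pool_default_size }}"
        osd_pool_default_min_size: "{{ osd_pool_default_min_size }}"
        # osd_pool_default_pg_num and osd_pool_default_pgp_num removed:
        # let the pg_autoscaler pick and grow the values
        osd_crush_chooseleaf_type: 1
        mon_osd_min_down_reporters: 1
    mon:
        auth_allow_insecure_global_id_reclaim: false
    osd:
        osd_min_pg_log_entries: 500
        osd_max_pg_log_entries: 500
        osd memory target: "{{ osd_memory_target }}"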

@insatomcat (Contributor, Author) commented:

Thanks a lot.
