
set osd max memory based on host memory #3113

Merged: 2 commits into ceph:master, Sep 18, 2018

Conversation

neha-ojha (Member):

This PR addresses https://bugzilla.redhat.com/show_bug.cgi?id=1595003 by adding an osd_memory_target option to ceph-ansible.

Signed-off-by: Neha Ojha nojha@redhat.com

set_fact:
  num_osds: "{{ devices | length }}"
when:
  - devices|length > 0 and (osd_scenario == 'collocated' or osd_scenario == 'non-collocated')
Contributor:

(osd_scenario == 'collocated' or osd_scenario == 'non-collocated') should be on a new line

when:
  - devices|length > 0
  - (osd_scenario == 'collocated' or osd_scenario == 'non-collocated')

neha-ojha (Member Author), Sep 12, 2018:

I used `and` based on neha-ojha/ceph-ansible@9e203db#r30481853. I'll put them on two lines.

set_fact:
  num_osds: "{{ lvm_volumes | length }}"
when:
  - lvm_volumes|length > 0 and osd_scenario == 'lvm'
Contributor:

Same here, two lines are needed in the when condition.

set_fact:
  num_osds: "{{ (ceph-volume lvm batch --report devices | from_json()).osds | length }}"
when:
  - devices|length > 0 and osd_scenario == 'lvm-batch'
Contributor:

Same here, two lines are needed. Also, there is no lvm-batch scenario, this should still check for the lvm scenario.


- name: count number of osds for lvm-batch scenario
  set_fact:
    num_osds: "{{ (ceph-volume lvm batch --report devices | from_json()).osds | length }}"
Contributor:

the ceph-volume lvm batch --report will need to be run in a separate task using the command module, with the output from the command being registered into a variable. Then that variable can be used here to set the num_osds fact.


- name: count number of osds for lvm-batch scenario
  set_fact:
    num_osds: "{{ (ceph-volume lvm batch --report devices | from_json()).osds | length }}"
  when:
    - devices|length > 0
    - osd_scenario == 'lvm-batch'
Contributor:

This needs to be - osd_scenario == 'lvm'

Member Author:

yeah, updating it in my next commit with the lvm-batch changes.


- name: count number of osds for lvm-batch scenario
  set_fact:
    num_osds: "{{ (ceph-volume lvm batch --report devices | from_json()).osds | length }}"
Collaborator:

Adding this here would mean it'd be run on all hosts; I think it must be run on an OSD node.

By the way, you have to do it in two steps:

- name: get number of osds for lvm-batch scenario
  command: "{{ docker_exec_cmd }} ceph-volume lvm batch --report devices"
  register: lvm_batch_devices
  when:
    - devices|length > 0 and osd_scenario == 'lvm-batch'
- name: set_fact num_osds
  set_fact:
    num_osds: "{{ (lvm_batch_devices | from_json).osds | length }}"
  when:
    - devices|length > 0 and osd_scenario == 'lvm-batch'

Contributor:

@guits would we just use delegate_to and let the first OSD host take care of it? We'd probably only need to run this once, I'd imagine, not on every OSD host.

Collaborator:

@andrewschoen that's correct.
I think we must use something other than docker_exec_cmd in that case.

(roles/ceph-defaults/tasks/facts.yml:25:    docker_exec_cmd: "docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }}")

{% if osd_objectstore == 'bluestore' %}
[osd]
osd memory target = {{ osd_memory_target }}
{% if is_hci is defined %} # is_hci should be defined in group_vars/all.yml
Collaborator:

you have to add this variable in ceph-defaults/defaults/main.yml



- name: count number of osds for lvm-batch scenario
  set_fact:
    num_osds: "{{ (ceph-volume lvm batch --report devices | from_json()).osds | length }}"
Contributor:

When #3111 merges this command will also need the --osds-per-device osds_per_device flag added.

Contributor:

You're also missing --format=json here.

neha-ojha (Member Author):

@guits @andrewschoen I've updated the PR; could you take another look, please?

command: "ceph-volume lvm batch --report --format=json {{ devices| join(' ') }}"
register: lvm_batch_devices
run_once: true
delegate_to: "{{ groups[osd_group_name][0]"
Contributor:

Missing the closing }} here.

Member Author:

Fixing it. I see that #3111 got merged; I'll add --osds-per-device osds_per_device as well.

guits (Collaborator) commented Sep 12, 2018:

@neha-ojha please, could you squash your commits? thanks!

- osd_scenario == 'lvm'

- name: get number of osds for lvm-batch scenario
command: "ceph-volume lvm batch --report --format=json --osds-per-device osds_per_device {{ devices| join(' ') }}"
guits (Collaborator), Sep 12, 2018:

@andrewschoen is this supposed to be run only in non containerized deployment?
(see: #3113 (comment))

Contributor:

I don't know, but I suspect it should work on containerized deployments as well. docker_exec_cmd is expecting a mon at this point, right?

guits (Collaborator), Sep 13, 2018:

This is exactly why I'm raising this.
It means we would need to set a new fact like docker_exec_cmd_osd.

Collaborator:

@neha-ojha you have to add this task before L26

- name: set_fact docker_exec_cmd_osd
  set_fact:
    docker_exec_cmd_osd: "docker exec ceph-osd-{{ hostvars[groups[osd_group_name][0]]['ansible_hostname'] }}"
  when:
    - containerized_deployment
    - groups.get(osd_group_name, []) | length > 0

then

command: "{{ docker_exec_cmd_osd | default('') }} ceph-volume lvm batch --report --format=json --osds-per-device osds_per_device {{ devices| join(' ') }}"

Member Author:

@guits will this command: be added as part of a new task for containerized_deployment, or will it replace https://github.com/ceph/ceph-ansible/pull/3113/files#diff-408c9554f34e961d163f68b645ee6da3R27?

Member Author:

@guits updated it, let me know if it looks alright. Managed to get 5d9f269#diff-2857b72f37476e1b5d16e7e81783ecbaL241 as a by-product of that change.

neha-ojha (Member Author):

@guits squashed.

guits (Collaborator) commented Sep 13, 2018:

jenkins test pipeline

{% if is_hci is defined %} # is_hci should be defined in group_vars/all.yml
{% if num_osds > 0 and ansible_memtotal_mb * hci_safety_factor / num_osds > osd_memory_target %} # hci_safety_factor is the safety factor for HCI deployments
osd memory target = {{ ansible_memtotal_mb * hci_safety_factor / num_osds }}
{% endif %}
Collaborator:

@neha-ojha something is wrong here, I think you want a {% elif %}?

Collaborator:

don't pay attention to my last comment, I think you are just missing a {% endif %}

Member Author:

yeah, I noticed it when I wrote a pythonic version of these ifs :) fixing it

Collaborator:

@neha-ojha perhaps I'm missing something, but as it is: if osd_objectstore == 'bluestore' and is_hci is defined, you are going to add osd memory target twice in the config, L156 plus either L159 or L163.

Member Author:

I need the default value of osd memory target to be set to 4000000000. If those conditions are met, we will override this value; otherwise not.

guits (Collaborator), Sep 14, 2018:

@neha-ojha Isn't the following what you want to achieve ?

{% if osd_objectstore == 'bluestore' %}
[osd]
{% if is_hci and num_osds > 0 %}
{% if ansible_memtotal_mb * hci_safety_factor / num_osds  > osd_memory_target %}
# hci_safety_factor is the safety factor for HCI deployments
{% set _osd_memory_target = (ansible_memtotal_mb * hci_safety_factor / num_osds) %}
{% elif ansible_memtotal_mb * non_hci_safety_factor / num_osds  > osd_memory_target %}
# non_hci_safety_factor is the safety factor for dedicated nodes
{% set _osd_memory_target = (ansible_memtotal_mb * non_hci_safety_factor / num_osds) %}
{% endif %}
{% endif %}
osd memory target = {{ _osd_memory_target | default(osd_memory_target) }}
{% endif %}
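For readers, the selection logic in the template above can be sketched in Python (an illustrative rendering only, not project code: the function name and default factor values are assumptions, it mirrors the template exactly as quoted — including the non-HCI branch sitting inside the is_hci guard — and it ignores the MB-vs-bytes units of the real Ansible variables):

```python
def effective_osd_memory_target(memtotal, num_osds, is_hci,
                                osd_memory_target,
                                hci_safety_factor=0.2,
                                non_hci_safety_factor=0.5):
    """Return the value the quoted template would render for 'osd memory target'."""
    if is_hci and num_osds > 0:
        # Override the default only when the computed per-OSD share is larger.
        if memtotal * hci_safety_factor / num_osds > osd_memory_target:
            return memtotal * hci_safety_factor / num_osds
        elif memtotal * non_hci_safety_factor / num_osds > osd_memory_target:
            return memtotal * non_hci_safety_factor / num_osds
    return osd_memory_target
```

With abstract numbers: memtotal=100, num_osds=2, is_hci=True, default target 5 gives 10 (the HCI share wins); with is_hci=False the default 5 is kept unchanged.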

jdurgin (Member) commented Sep 13, 2018:

@markhpc do you think the default of using 0.5 of the node memory is too conservative? In your earlier graphs it looked like RSS could be ~10% over the target

markhpc (Member) commented Sep 13, 2018:

@jdurgin

One thing I noticed is that the behavior is very different on my ubuntu dev box and on incerta running centos7. On my dev box, I saw variations in the 10-20% range. On CentOS the mapped usage tracked RSS very tightly; maybe more like 1%. I'm not sure why that is, other than perhaps the CentOS kernel was set up to reclaim memory much more aggressively. Is it possible to get memory usage from QE runs to see how it's been doing in master under different OSes?

markhpc (Member) commented Sep 13, 2018:

To more directly answer your question: I don't think you'll have any problem with 0.7 unless there's a bunch of other stuff running on it (hyper-converged, or co-located rgw/mon/mgr/etc). The one case where that might not hold up is if the node has a very small amount of memory (say 4GB), in which case you might want a constant+ratio for the OS (maybe 1GB + 20% of remaining, or something).
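A rough illustration of the two sizing approaches in this comment (function names and example numbers are mine, not from the PR):

```python
def available_flat(total_gb, factor=0.7):
    """Flat safety factor: OSDs may use factor * total host memory."""
    return total_gb * factor

def available_constant_ratio(total_gb, os_base_gb=1.0, os_ratio=0.2):
    """markhpc's constant+ratio idea: reserve a fixed base for the OS
    plus a fraction of the remainder; OSDs get what is left."""
    reserved = os_base_gb + os_ratio * (total_gb - os_base_gb)
    return total_gb - reserved
```

On a 4 GB node the flat factor leaves 2.8 GB for OSDs while constant+ratio leaves 2.4 GB, keeping more headroom for the OS on small machines; on large nodes the constant+ratio scheme is the more generous of the two.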

{% if osd_objectstore == 'bluestore' %}
[osd]
osd memory target = {{ osd_memory_target }}
{% if is_hci is defined %} # is_hci should be defined in group_vars/all.yml
Collaborator:

if is_hci is defined will always return true; if is_hci is enough here, I guess.


guits force-pushed the wip-trial branch 3 times, most recently from 8e8fba6 to 1090009, on September 14, 2018 at 22:04.
neha-ojha (Member Author):

ok, this is passing tests now.

guits force-pushed the wip-trial branch 4 times, most recently from f3059ee to 59305fb, on September 17, 2018 at 21:10.
guits (Collaborator) commented Sep 17, 2018:

jenkins test pipeline

guits previously approved these changes Sep 18, 2018
BlueStore's cache is sized conservatively by default, so that it does
not overwhelm under-provisioned servers. The default is 1G for HDD, and
3G for SSD.

To replace the page cache, as much memory as possible should be given to
BlueStore. This is required for good performance. Since ceph-ansible
knows how much memory a host has, it can set

`bluestore cache size = max(total host memory / num OSDs on this host * safety
factor, 1G)`

Due to fragmentation and other memory use not included in bluestore's
cache, a safety factor of 0.5 for dedicated nodes and 0.2 for
hyperconverged nodes is recommended.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1595003

Signed-off-by: Neha Ojha <nojha@redhat.com>
Co-Authored-by: Guillaume Abrioux <gabrioux@redhat.com>
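The sizing rule described in the commit message can be sketched as follows (an illustrative Python rendering, not ceph-ansible's actual implementation; the factor values 0.5 and 0.2 come from the message above):

```python
ONE_GB = 1024 ** 3

def osd_memory_target(total_host_memory_bytes, num_osds, safety_factor):
    """max(total host memory / num OSDs on this host * safety factor, 1G).

    safety_factor: 0.5 for dedicated nodes, 0.2 for hyperconverged nodes.
    """
    return max(total_host_memory_bytes / num_osds * safety_factor, ONE_GB)
```

For example, a dedicated 64 GB node with 4 OSDs yields 8 GB per OSD, while a tiny or heavily subdivided node falls back to the 1 GB floor.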
mergify bot merged commit 27027a1 into ceph:master on Sep 18, 2018.
alfredodeza (Contributor):

This PR is now breaking ceph-volume tests. Ideally, we would like to see some tests pass with devel (ceph-volume, ceph master, etc...):

TASK [ceph-config : count number of osds for non-lvm scenario] *****************
task path: /tmp/tox.fjM8P3TRe0/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-config/tasks/main.yml:13
skipping: [mon0] => {
    "changed": false, 
    "skip_reason": "Conditional result was False"
}
fatal: [osd0]: FAILED! => {}

MSG:

The conditional check 'devices | length > 0' failed. The error was: error while evaluating conditional (devices | length > 0): 'devices' is undefined

The error appears to have been in '/tmp/tox.fjM8P3TRe0/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-config/tasks/main.yml': line 13, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  - block:
    - name: count number of osds for non-lvm scenario
      ^ here
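One possible guard against this failure, sketched here only as an illustration (not necessarily the fix ceph-ansible adopted), is to require that devices is defined before testing its length:

```yaml
- name: count number of osds for non-lvm scenario
  set_fact:
    num_osds: "{{ devices | length }}"
  when:
    - devices is defined
    - devices | length > 0
    - osd_scenario in ['collocated', 'non-collocated']
```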
