
Failing to upgrade ES from 7.17.10 to 8.9.1: incorrect validation of custom write thread pool size #101206

Open
gustavosci opened this issue Oct 23, 2023 · 1 comment
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed Meta label for distributed team

Comments

@gustavosci

Elasticsearch Version

7.17.10

Installed Plugins

No response

Java Version

openjdk 20.0.1 2023-04-18

OS Version

Linux es-es-data-1-0 5.4.231-137.341.amzn2.aarch64 #1 SMP Tue Feb 14 21:50:56 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

Problem Description

Background
We found an issue during the upgrade from ES v7.17.10 to v8.9.1.
Initially, it seemed to be caused by an eck-operator validation defect, but after discussing it with the eck-operator team, they indicated it is an ES problem (elastic/cloud-on-k8s#7173).

Issue
We cannot set a custom thread pool write size for our data nodes.
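For illustration, this is the kind of setting we are applying to the data nodes (an illustrative elasticsearch.yml fragment; the exact layout of our config may differ):

```yaml
# elasticsearch.yml on data nodes only (illustrative fragment)
node.roles: [ data ]
thread_pool.write.size: 6
```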

Master nodes config:

          resources:
            limits:
              cpu: "4"
              memory: 16Gi
            requests:
              cpu: "4"
              memory: 16Gi

Data nodes config:

          resources:
            limits:
              cpu: "13"
              memory: 57Gi
            requests:
              cpu: "13"
              memory: 57Gi

The issue only happens when we try to apply thread_pool.write.size=6 for our data nodes.
As per the ES documentation, the write thread pool is fixed with a size of # of allocated processors, and the maximum size for this pool is 1 + # of allocated processors. This means that, in our case, setting the pool size to 6 only for data nodes should work, since the data nodes have 13 CPU cores allocated.
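The documented per-node bound can be sketched as follows (an illustrative Python sketch of the rule from the docs, not Elasticsearch's actual validation code; function names are ours):

```python
# Documented bound for the fixed "write" thread pool: default size is the
# number of allocated processors, maximum is 1 + allocated processors.

def max_write_pool_size(allocated_processors: int) -> int:
    """Upper bound for thread_pool.write.size on a single node."""
    return 1 + allocated_processors

def is_valid_write_size(requested: int, allocated_processors: int) -> bool:
    """True if the requested pool size is within the documented per-node bound."""
    return 1 <= requested <= max_write_pool_size(allocated_processors)

# A data node with 13 allocated CPUs should accept size 6 (max is 14):
print(is_valid_write_size(6, 13))  # True
# A master node with 4 allocated CPUs would reject 6 (max is 5):
print(is_valid_write_size(6, 4))   # False
```

By this rule, validating size 6 against the data nodes alone should succeed.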

However, ES appears to validate against the node with the fewest CPUs allocated in the cluster, regardless of node type, which is not correct.
We ran some tests: for example, thread_pool.write.size=5 already works, and increasing the master nodes' CPUs from 4 to 5 also makes thread_pool.write.size=6 work.

So, to summarize, it seems the node type should be taken into consideration for this validation. Can you please take a look at this?

Steps to Reproduce

  • Set up a cluster with 3 master nodes and 3 data nodes
    • ES version v7.17.10.
    • Each master node has 4 CPUs
    • Each data node has 13 CPUs
    • Set thread_pool.write.size=6 for data nodes only.
  • Try to upgrade the cluster to v8.9.1 using eck-operator

Logs (if relevant)

No response

@gustavosci gustavosci added >bug needs:triage Requires assignment of a team area label labels Oct 23, 2023
@JVerwolf JVerwolf added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. and removed needs:triage Requires assignment of a team area label labels Oct 27, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Oct 27, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)
