
Fix HostResolver behavior on fail #62652

Merged
merged 7 commits into ClickHouse:master on May 17, 2024

Conversation


@ianton-ru ianton-ru commented Apr 15, 2024

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

HostResolver keeps each IP address several times.
If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of a forbidden IP is marked as failed, so on every retry these IPs still have a chance to be chosen (and to fail again).
Even with that fixed, the DNS cache is dropped every 120 seconds, and the failed IPs can be chosen again.

This fix

  1. Allows only one record per IP in HostResolver
  2. Adds exponential timeouts between retries of failed IPs (sketched just below)
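
For illustration, a minimal sketch of the intended backoff; the names here are assumed rather than taken from the PR, and the 120-second base is the DEFAULT_RESOLVE_TIME_HISTORY_SECONDS discussed below:

/// Illustrative only: a record that has failed fail_count times in a row
/// (fail_count >= 1) is retried only after base_seconds * 2^(fail_count - 1) seconds.
/// With base_seconds = 120 this gives 2, 4, 8, ... minutes between retries.
Poco::Timespan retryTimeout(uint64_t base_seconds, size_t fail_count)
{
    return Poco::Timespan(base_seconds * (1ull << (fail_count - 1)), 0);
}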

Include tests (required builds will be added automatically):

  • Performance tests

@yariks5s yariks5s added the can be tested Allows running workflows for external contributors label Apr 15, 2024
@yariks5s yariks5s self-assigned this Apr 15, 2024
@robot-ch-test-poll robot-ch-test-poll added the pr-bugfix Pull request with bugfix, not backported by default label Apr 15, 2024

robot-ch-test-poll commented Apr 15, 2024

This is an automated comment for commit 8a24515 with a description of existing statuses. It's updated for the latest CI run.

❌ Failing checks

Check name | Description | Status
AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | ❌ failure
CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ⏳ pending
Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ error
Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ error
Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ❌ failure
Successful checks
Check name | Description | Status
A Sync | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table | ✅ success
ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process | ✅ success
Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success
Docker keeper image | The check to build and optionally push the mentioned image to docker hub | ✅ success
Docker server image | The check to build and optionally push the mentioned image to docker hub | ✅ success
Docs check | Builds and tests the documentation | ✅ success
Fast test | Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success
Flaky tests | Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc | ✅ success
Install packages | Checks that the built packages are installable in a clear environment | ✅ success
Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ✅ success
Mergeable Check | Checks if all other necessary checks are successful | ✅ success
PR Check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Performance Comparison | Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success
Style check | Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report | ✅ success
Unit tests | Runs the unit tests for different release types | ✅ success
Upgrade check | Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts | ✅ success

@yariks5s (Member) left a comment

Also, can you please add tests emulating the situation that will trigger the new behaviour?

Comment on lines 163 to 170
while (it != records.end() && it->address == address)
{
it->failed = true;
it->fail_time = now;
if (it->fail_count < RECORD_FAIL_COUNT_LIMIT)
++it->fail_count;
++it;
}
Member

What's the point of the while loop here? Do we assume that we may have the same address there several times?

Contributor Author

Yes, you are right. With a single record per address this loop is useless, so I removed it.

@alexey-milovidov alexey-milovidov removed the pr-bugfix Pull request with bugfix, not backported by default label Apr 16, 2024
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-performance Pull request with some performance improvements label Apr 19, 2024
@ianton-ru (Contributor Author)

Also, can you please add tests emulating the situation that will trigger the new behaviour?

I did not manage to write a good test. The result of the fix only shows up after running for a long time, as a reduced number of timeouts. With a short test it's too random: selectBest chooses a random IP, so it may pick an alive one.
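
For context, selectBest-style selection is a weighted random pick along these lines (a generic sketch, not the actual ClickHouse code), which is why a short run can keep landing on an alive IP:

#include <random>
#include <vector>

/// Picks an index with probability proportional to weights[i];
/// any alive IP can win a single draw, so short runs are nondeterministic.
size_t pickWeighted(const std::vector<size_t> & weights, std::mt19937 & rng)
{
    std::discrete_distribution<size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}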

Comment on lines 230 to 233
if (merged.empty() || merged.back().address != *it_next)
merged.push_back(Record(*it_next, now));
else
merged.back().resolve_time = now;
Member

I do not understand these lines.
This is the case when a new address is discovered. Why do you do the additional check if (merged.empty() || merged.back().address != *it_next) here? This condition is always true here.

Contributor Author

On my host getaddrinfo returns 3 records per address, for example:

(gdb) frame
#0  Poco::Net::DNS::hostByName (hostname=..., hintFlags=<optimized out>) at ./build-clang-17/./base/poco/Net/src/DNS.cpp:80
80		if (rc == 0)
(gdb) l
75		struct addrinfo* pAI;
76		struct addrinfo hints;
77		std::memset(&hints, 0, sizeof(hints));
78		hints.ai_flags = hintFlags;
79		int rc = getaddrinfo(hostname.c_str(), NULL, &hints, &pAI);
80		if (rc == 0)
81		{
82			HostEntry result(pAI);
83			freeaddrinfo(pAI);
84			return result;
(gdb) p *pAI
$14 = {ai_flags = 32, ai_family = 2, ai_socktype = 1, ai_protocol = 6, ai_addrlen = 16, ai_addr = 0x76bbf2316170, ai_canonname = 0x0, ai_next = 0x76bbf2316180}
(gdb) p *pAI->ai_next
$15 = {ai_flags = 32, ai_family = 2, ai_socktype = 2, ai_protocol = 17, ai_addrlen = 16, ai_addr = 0x76bbf23161b0, ai_canonname = 0x0, ai_next = 0x76bbf23161c0}
(gdb) p *pAI->ai_next->ai_next
$16 = {ai_flags = 32, ai_family = 2, ai_socktype = 3, ai_protocol = 0, ai_addrlen = 16, ai_addr = 0x76bbf23161f0, ai_canonname = 0x0, ai_next = 0x0}
(gdb) p *(sockaddr_in*)pAI->ai_addr
$19 = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 16777343}, sin_zero = "\000\000\000\000\000\000\000"}
(gdb) p *(sockaddr_in*)pAI->ai_next->ai_addr
$20 = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 16777343}, sin_zero = "\000\000\000\000\000\000\000"}
(gdb) p *(sockaddr_in*)pAI->ai_next->ai_next->ai_addr
$21 = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 16777343}, sin_zero = "\000\000\000\000\000\000\000"}

(16777343 ==> '127.0.0.1')
So each IP has three copies in the next_gen parameter of the HostResolver::updateImpl method.
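
The three entries come from getaddrinfo itself: with hints.ai_socktype left at 0 it returns one addrinfo per supported socket type, which is the ai_socktype = 1/2/3 (SOCK_STREAM, SOCK_DGRAM, SOCK_RAW) visible above. A sketch of how a call site could avoid the duplicates (illustrative; Poco's DNS::hostByName does not currently do this):

struct addrinfo hints;
std::memset(&hints, 0, sizeof(hints));
hints.ai_flags = hintFlags;
hints.ai_socktype = SOCK_STREAM; /// restricting to one socket type yields one entry per address
int rc = getaddrinfo(hostname.c_str(), NULL, &hints, &pAI);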

Contributor Author

By the way, the randomization in HostResolver does not respect the address priority from /etc/gai.conf.

Member

I did not see any code in ClickHouse which cares about IP order, so I did not support it here either. If you wish, you could try to add it!

Member

So each IP has three copies in the next_gen parameter of the HostResolver::updateImpl method.

I thought that DNSResolver eliminates duplicates. But if it does not, then someone should do it. Let's do it in HostResolver, because it expects unique records right now. Thanks for that info.

@CheSema (Member) commented Apr 30, 2024

BTW, you reminded me that it should be like this here:

, resolve_function([](const String & host_to_resolve) { return DNSResolver::instance().resolveHostAll(host_to_resolve); })

- , resolve_function([](const String & host_to_resolve) { return DNSResolver::instance().resolveHostAll(host_to_resolve); })
+ , resolve_function([](const String & host_to_resolve) { return DNSResolver::instance().resolveHostAllInOriginOrder(host_to_resolve); })  

Could you fix that as well?

Contributor Author

The second source of randomness is in choosing IPs by weights, so ordering here does not fix it.

@@ -237,7 +244,7 @@ void HostResolver::updateImpl(Poco::Timestamp now, std::vector<Poco::Net::IPAddr
 }
 
     for (auto & rec : merged)
-        if (rec.failed && rec.fail_time < last_effective_resolve)
+        if (rec.failed && rec.fail_time < now - Poco::Timespan(history.totalSeconds() * (1ull << (rec.fail_count - 1)), 0))
@CheSema (Member) commented Apr 30, 2024

What does that expression mean: now - Poco::Timespan(history.totalSeconds() * (1ull << (rec.fail_count - 1))?
Make it a variable or function; the name will give us a hint about what it is.

Contributor Author

Moved it into a separate method, HostResolver::Record::cleanTimeoutedFailedFlag.
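
Roughly, assuming Record keeps the failed / fail_time / fail_count fields shown later in this review (a sketch, not the verbatim PR code):

/// Clears the failed flag once the exponential retry timeout has elapsed.
/// Returns true if the flag was actually cleared, so the caller can adjust metrics.
bool cleanTimeoutedFailedFlag(const Poco::Timestamp & now, const Poco::Timespan & history)
{
    if (!failed)
        return false;

    /// the retry timeout doubles with each consecutive failure
    Poco::Timespan retry_timeout(history.totalSeconds() * (1ull << (fail_count - 1)), 0);
    if (fail_time >= now - retry_timeout)
        return false;

    failed = false;
    return true;
}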

@@ -141,6 +142,7 @@ class HostResolver : public std::enable_shared_from_this<HostResolver>
     size_t usage = 0;
     bool failed = false;
     Poco::Timestamp fail_time = 0;
+    size_t fail_count = 0;
@CheSema (Member) commented Apr 30, 2024

If you want to count failures, then you do not need bool failed = false; at all.

Member

uint8_t fail_count = 0;

plus a static_assert that (1 << sizeof(fail_count)) <= RECORD_FAIL_COUNT_LIMIT, or something like that.
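
A sketch of that intent; the concrete limit value is assumed here, and the bound is stated slightly differently from the suggestion above, since the goal is to keep the shift 1ull << (fail_count - 1) from overflowing:

#include <cstdint>

static constexpr size_t RECORD_FAIL_COUNT_LIMIT = 10; /// assumed value for illustration

struct Record
{
    uint8_t fail_count = 0;
    /// fail_count feeds the shift 1ull << (fail_count - 1), so the cap
    /// must stay below the 64-bit width of the shifted operand
    static_assert(RECORD_FAIL_COUNT_LIMIT < 64, "retry backoff shift would overflow");
};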

Contributor Author

It's different. The failed flag is cleaned from time to time (HostResolverPool.cpp, line 248), and without fail_count it would be cleaned every 2 minutes (DEFAULT_RESOLVE_TIME_HISTORY_SECONDS). It is possible that after some time the IP becomes accessible again. failed is cleared so the address is checked again, but fail_count is not reset to zero, which makes the pauses between checks grow: 2, 4, 8, ..., 64 minutes. fail_count is set to zero only after a successful check.

Member

I understood that too after some time :) Thanks.
Maybe a better name for fail_count would be consecutive_fail_count, since you apply a bigger penalty for consecutive failures.


CheSema commented Apr 30, 2024

Actually, the idea I see in the code is good. I like it.
But the description did not help me understand that idea quickly. Please describe what you did in more detail in the PR description.

It is possible to write good tests here. See src/Common/tests/gtest_resolve_pool.cpp. There is a ResolvePoolMock which takes the result of resolving from any lambda.
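
A rough skeleton of such a test; ResolvePoolMock is named above, but toIPs and the exact fixture API are assumptions, not verified against gtest_resolve_pool.cpp:

TEST(ResolvePool, FailedAddressGetsExponentialPause)
{
    std::multiset<String> addresses{"127.0.0.1", "127.0.0.2", "127.0.0.3"};

    /// the resolve function is any lambda; a tiny history makes
    /// the retry timeouts observable within a unit test
    auto resolver = std::make_shared<ResolvePoolMock>(
        [&](const String &) { return toIPs(addresses); }, /// toIPs: assumed helper
        "some_host",
        Poco::Timespan(1, 0));

    /// fail one address repeatedly, then assert it is not handed out again
    /// until base * 2^(fail_count - 1) seconds have passed
}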


CheSema commented Apr 30, 2024

Without tests it won't be merged. Please try to write them. It is better to check your good ideas with some tests than on real clusters.

Usually an IP is not completely bad. It is just overloaded, and connections to it fail. So there should not be a big penalty just for occasional failures. If an IP has really expired, then it has to be removed from DNS, and ClickHouse forgets it as well.
You are trying to improve detection of a bad IP in order to understand whether there is some IP which fails on every connect. Such an IP has to be banned more severely for each consecutive failure.
You purposely do not reset fail_count when all IPs are marked as faulty and we have to reset their status. That is good. But comments and tests are required. This is also why you need the separate bool failed = false;.


CheSema commented Apr 30, 2024

Also, can you please add tests emulating the situation that will trigger the new behaviour?

I did not manage to write a good test. The result of the fix only shows up after running for a long time, as a reduced number of timeouts. With a short test it's too random: selectBest chooses a random IP, so it may pick an alive one.

The history time could be set to a small value in tests.

@ianton-ru (Contributor Author)

Also, can you please add tests emulating the situation that will trigger the new behaviour?

I did not manage to write a good test. The result of the fix only shows up after running for a long time, as a reduced number of timeouts. With a short test it's too random: selectBest chooses a random IP, so it may pick an alive one.

The history time could be set to a small value in tests.

It's DEFAULT_RESOLVE_TIME_HISTORY_SECONDS and can't be changed with a setting right now.


CheSema commented Apr 30, 2024

Also, can you please add tests emulating the situation that will trigger the new behaviour?

I did not manage to write a good test. The result of the fix only shows up after running for a long time, as a reduced number of timeouts. With a short test it's too random: selectBest chooses a random IP, so it may pick an alive one.

The history time could be set to a small value in tests.

It's DEFAULT_RESOLVE_TIME_HISTORY_SECONDS and can't be changed with a setting right now.

You do not have to make a test for HostResolversPool. You need a test for HostResolver.
It could be created with special args. Check this:

HostResolver::HostResolver(
ResolveFunction && resolve_function_, String host_, Poco::Timespan history_)
: host(std::move(host_)), history(history_), resolve_function(std::move(resolve_function_))
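
So a test can pass a tiny history directly, along these lines (illustrative; HostResolverMock stands in for whatever subclass or factory the test uses to reach this constructor):

auto resolver = std::make_shared<HostResolverMock>( /// HostResolverMock: hypothetical test subclass
    [](const String & host) { return DNSResolver::instance().resolveHostAll(host); },
    "example_host",
    Poco::Timespan(1, 0)); /// 1 s history instead of DEFAULT_RESOLVE_TIME_HISTORY_SECONDS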

@ianton-ru (Contributor Author)

I tried to add a test for this.

@ianton-ru (Contributor Author)

@CheSema Is there any chance to review the tests next week?

/// there could be duplicates in the next_gen vector
if (merged.empty() || merged.back().address != *it_next)
{
CurrentMetrics::add(metrics.active_count, 1);
Member

Here we do not update metrics for duplicates.

@@ -237,10 +254,22 @@ void HostResolver::updateImpl(Poco::Timestamp now, std::vector<Poco::Net::IPAddr
 }
 
     for (auto & rec : merged)
-        if (rec.failed && rec.fail_time < last_effective_resolve)
-            rec.failed = false;
+    {
Member

Here I adjust the new counter banned_count. The Record class is unaware of metrics; as a result, that code does not belong in a Record method.

@@ -149,6 +152,11 @@ class HostResolver : public std::enable_shared_from_this<HostResolver>
         return address < r.address;
     }
 
+    bool operator ==(const Record & r) const
Member

Needed for the is_unique check under chassert.

@@ -166,6 +174,28 @@ class HostResolver : public std::enable_shared_from_this<HostResolver>
             return 8;
         return 10;
     }
 
+    bool setFail(const Poco::Timestamp & now)
Member

Returns true if the status has changed. Needed for adjusting the metrics.
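
The described contract, sketched with the Record fields shown earlier (illustrative, not the verbatim diff):

/// Marks the record as failed and bumps the consecutive-failure counter.
/// Returns true only when the record was not failed before, i.e. the status
/// changed, so the caller knows to adjust the banned_count metric.
bool setFail(const Poco::Timestamp & now)
{
    fail_time = now;
    if (fail_count < RECORD_FAIL_COUNT_LIMIT)
        ++fail_count;

    if (failed)
        return false;

    failed = true;
    return true;
}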

@@ -188,7 +219,7 @@ class HostResolver : public std::enable_shared_from_this<HostResolver>
 
     std::mutex mutex;
 
-    Poco::Timestamp last_resolve_time TSA_GUARDED_BY(mutex);
+    Poco::Timestamp last_resolve_time TSA_GUARDED_BY(mutex) = Poco::Timestamp::TIMEVAL_MIN;
Member

Just to be sure that HostResolver::update is called in the c-tor even if history is 0.


-    addresses = std::set<String>{"127.0.0.1", "127.0.0.2", "127.0.0.3"};
+    addresses = std::multiset<String>{"127.0.0.1", "127.0.0.2", "127.0.0.3"};
Member

For testing duplicates in the resolve result.

@CheSema CheSema added this pull request to the merge queue May 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks May 16, 2024
@CheSema CheSema added this pull request to the merge queue May 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks May 16, 2024
@CheSema CheSema added this pull request to the merge queue May 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks May 17, 2024

yariks5s commented May 17, 2024

[screenshot]

This failed due to the skipped black check in the CI:

[screenshot]


maxknv commented May 17, 2024

@Felixoid the CI in this pull request skipped the black check for a new .py file.

@Felixoid Felixoid added this pull request to the merge queue May 17, 2024
Merged via the queue into ClickHouse:master with commit 38787c4 May 17, 2024
137 of 208 checks passed
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 17, 2024
@ianton-ru (Contributor Author)

@CheSema Is there any chance to backport this to the 24.3 LTS?


CheSema commented May 21, 2024

@CheSema Is there any chance to backport this to the 24.3 LTS?

This is a new feature. Normally we do not backport such changes. I will check if there are conflicts.

@Algunenano (Member)

This is new feature. Normally we do not back port such changes. I will check if there are conflicts.

Please don't backport features. This would introduce unnecessary risk on minor upgrades for stable releases, where we should only backport important bug fixes.

@ianton-ru (Contributor Author)

HostResolver was changed only in 24.3, so backports to 24.2 and 23.8 are not required.

@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label May 21, 2024
Labels

  • can be tested: Allows running workflows for external contributors
  • pr-backports-created: Backport PRs are successfully created, it won't be processed by CI script anymore
  • pr-backports-created-cloud
  • pr-performance: Pull request with some performance improvements
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • v24.3-must-backport
  • v24.4-must-backport