#54767 Remove extraneous volumes in Keeper image #61683

Conversation

Tristan971
Contributor

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Remove from the Keeper Docker image the volumes at /etc/clickhouse-keeper and /var/log/clickhouse-keeper

Documentation entry for user-facing changes

Motivation: Fixes #54767.

In summary (more details in the issue), VOLUME directives in Docker images cannot be overridden by users, neither by extending the image nor at runtime (see the short sketch after the list below). Therefore, volumes that are not strictly necessary are:

  • a mild security risk (they bypass runtime configurations, like read-only filesystem settings)
  • a mild availability risk (/var/log/clickhouse-keeper could get filled upon misconfiguration and impact the host negatively)
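
For illustration, a minimal sketch of the behaviour (the image tag is arbitrary and not part of this PR; only the docker CLI calls matter):

```sh
# The volumes baked into the published image are visible in its metadata and
# cannot be removed by a child Dockerfile or by a runtime flag.
docker image inspect --format '{{ json .Config.Volumes }}' clickhouse/clickhouse-keeper:latest

# Even with a read-only root filesystem, Docker still mounts anonymous,
# writable volumes for every path declared via VOLUME in the image.
docker run --rm --read-only clickhouse/clickhouse-keeper:latest
```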

@CLAassistant

CLAassistant commented Mar 21, 2024

CLA assistant check
All committers have signed the CLA.

@Tristan971
Contributor Author

Tristan971 commented Mar 21, 2024

Funnily enough, when I build & start the container locally:

2024.03.21 04:34:42.741411 [ 1 ] {} <Error> Application: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in create_directories: Permission denied ["/var/lib/clickhouse-keeper"], Stack trace (when copying this message, always include the lines below):

0. std::system_error::system_error(std::error_code, String const&) @ 0x000000000167daf5
1. std::__fs::filesystem::filesystem_error::filesystem_error[abi:v15000](String const&, std::__fs::filesystem::path const&, std::error_code) @ 0x00000000009dbfc1
2. void std::__fs::filesystem::__throw_filesystem_error[abi:v15000]<String&, std::__fs::filesystem::path const&, std::error_code const&>(String&, std::__fs::filesystem::path const&, std::error_code const&) @ 0x0000000001645bd7
3. std::__fs::filesystem::detail::(anonymous namespace)::ErrorHandler<bool>::report(std::error_code const&) const (.llvm.5385572262389653489) @ 0x0000000001649156
4. std::__fs::filesystem::__create_directories(std::__fs::filesystem::path const&, std::error_code*) @ 0x0000000001649953
5. std::__fs::filesystem::__create_directories(std::__fs::filesystem::path const&, std::error_code*) @ 0x00000000016499f5
6. DB::ConfigProcessor::savePreprocessedConfig(DB::ConfigProcessor::LoadedConfig&, String) @ 0x0000000000e25c7e
7. DB::ConfigReloader::reloadIfNewer(bool, bool, bool, bool) @ 0x0000000000e39ef6
8. DB::ConfigReloader::ConfigReloader(std::basic_string_view<char, std::char_traits<char>>, std::vector<String, std::allocator<String>> const&, String const&, zkutil::ZooKeeperNodeCache&&, std::shared_ptr<Poco::Event> const&, std::function<void (Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool)>&&, bool) @ 0x0000000000e38ccd
9. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x0000000000bba745
10. Poco::Util::Application::run() @ 0x0000000001040046
11. DB::Keeper::run() @ 0x0000000000bb60fd
12. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000001049219
13. mainEntryClickHouseKeeper(int, char**) @ 0x0000000000bb50b8
14. main @ 0x0000000000bc5e9d

Cannot print extra info for Poco::Exception (version 24.2.2.71 (official build))
2024.03.21 04:34:42.741481 [ 1 ] {} <Information> Application: shutting down

So it seems there is already a mismatch between the Docker image (which prepares /var/lib/clickhouse) and the default keeper config (which tries to use /var/lib/clickhouse-keeper).

This issue is also present in the current public clickhouse/clickhouse-keeper image.

Which reinforces my point in the issue that nobody is actually using the bare image as-is, without explicitly mounted volumes.
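
For reference, a rough way to check this from the published image (the tag and the availability of busybox `ls` inside the image are assumptions on my part):

```sh
# /var/lib/clickhouse is created and chowned by the Dockerfile, but
# /var/lib/clickhouse-keeper is not, and /var/lib itself is root-owned,
# so the unprivileged keeper process cannot create it at startup.
docker run --rm --entrypoint ls clickhouse/clickhouse-keeper:latest \
    -ld /var/lib /var/lib/clickhouse
```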

@Felixoid Felixoid self-assigned this Mar 21, 2024
@Felixoid
Member

Felixoid commented Mar 21, 2024

It's a pretty good catch, actually!

Would you like to fix it as well? We definitely should use /var/lib/clickhouse-keeper

> git grep -Pn '/var/lib/clickhouse\b' docker/keeper/
docker/keeper/Dockerfile:75:    && adduser -S -h /var/lib/clickhouse -s /bin/bash -G clickhouse -g "ClickHouse keeper" -u 101 clickhouse \
docker/keeper/Dockerfile:76:    && mkdir -p /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper \
docker/keeper/Dockerfile:77:    && chown clickhouse:clickhouse /var/lib/clickhouse \
docker/keeper/Dockerfile:83:    && chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper
docker/keeper/Dockerfile:88:VOLUME /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper
docker/keeper/entrypoint.sh:41:DATA_DIR="${CLICKHOUSE_DATA_DIR:-/var/lib/clickhouse}"
docker/keeper/entrypoint.sh:83:    cd /var/lib/clickhouse

The last one should be replaced by "$DATA_DIR" as well
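
In other words, something along these lines in docker/keeper/entrypoint.sh (a sketch of the suggestion, not the final committed patch):

```sh
# Keep the default data directory in one place, pointing at the keeper-specific path...
DATA_DIR="${CLICKHOUSE_DATA_DIR:-/var/lib/clickhouse-keeper}"

# ...and reuse it where the script currently hard-codes `cd /var/lib/clickhouse`.
cd "$DATA_DIR" || exit 1
```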

@Tristan971
Contributor Author

Sure, I’ll do that shortly. I refrained as I expected you’d prefer it fixed in a separate PR

@Felixoid
Member

No, let's fix every image's flaw at once and backport it

@Tristan971
Contributor Author

Tristan971 commented Mar 22, 2024

OK, so the situation is a little worse than it sounds. The commit above at least makes the Docker image more readable (IMO), but alas I can't entirely change the entrypoint...

The problem is that Keeper, with the default setup, expects both /var/lib/clickhouse and /var/lib/clickhouse-keeper to exist and be writable...

If I put /var/lib/clickhouse on both sides, i.e. edit the Dockerfile to use /var/lib/clickhouse (instead of -keeper):

slow crash
Processing configuration file '/etc/clickhouse-keeper/keeper_config.xml'.
Logging trace to /var/log/clickhouse-keeper/clickhouse-keeper.log
Logging errors to /var/log/clickhouse-keeper/clickhouse-keeper.err.log
2024.03.22 03:46:44.296961 [ 1 ] {} <Trace> Pipe: Pipe capacity is 1.00 MiB
2024.03.22 03:46:44.300957 [ 1 ] {} <Information> Application: Starting ClickHouse Keeper 24.2.2.71 (revision: 54483, git hash: 9293d361e72be9f6ccfd444d504e2137b2e837cf, build id: 015C412FA457DCCDD6802012AD9AA854ED82096C), PID 1
2024.03.22 03:46:44.300992 [ 1 ] {} <Information> Application: starting up
2024.03.22 03:46:44.301010 [ 1 ] {} <Information> Application: OS Name = Linux, OS Version = 6.7.9-200.fc39.x86_64, OS Architecture = x86_64
2024.03.22 03:46:44.301195 [ 1 ] {} <Information> Application: keeper_server.max_memory_usage_soft_limit is set to 56.44 GiB
2024.03.22 03:46:44.307355 [ 1 ] {} <Debug> Application: Initializing DateLUT.
2024.03.22 03:46:44.307375 [ 1 ] {} <Trace> Application: Initialized DateLUT with time zone 'UTC'.
2024.03.22 03:46:44.308528 [ 1 ] {} <Trace> AsynchronousMetrics: Scanning /sys/class/thermal
2024.03.22 03:46:44.309441 [ 1 ] {} <Trace> AsynchronousMetrics: Scanning /sys/block
2024.03.22 03:46:44.309584 [ 1 ] {} <Trace> AsynchronousMetrics: Scanning /sys/devices/system/edac
2024.03.22 03:46:44.309618 [ 1 ] {} <Trace> AsynchronousMetrics: Scanning /sys/class/hwmon
2024.03.22 03:46:44.322422 [ 1 ] {} <Debug> KeeperDispatcher: Initializing storage dispatcher
2024.03.22 03:46:44.322485 [ 1 ] {} <Warning> CloudPlacementInfo: Placement info has not been loaded
2024.03.22 03:46:44.322539 [ 1 ] {} <Information> KeeperContext: Keeper feature flag FILTERED_LIST: enabled
2024.03.22 03:46:44.322559 [ 1 ] {} <Information> KeeperContext: Keeper feature flag MULTI_READ: enabled
2024.03.22 03:46:44.322573 [ 1 ] {} <Information> KeeperContext: Keeper feature flag CHECK_NOT_EXISTS: disabled
2024.03.22 03:46:44.322586 [ 1 ] {} <Information> KeeperContext: Keeper feature flag CREATE_IF_NOT_EXISTS: disabled
2024.03.22 03:46:44.322850 [ 1 ] {} <Trace> KeeperSnapshotManager: Reading from disk LocalSnapshotDisk
2024.03.22 03:46:44.322900 [ 1 ] {} <Trace> KeeperSnapshotManager: No snapshots were found on LocalSnapshotDisk
2024.03.22 03:46:44.323016 [ 1 ] {} <Trace> KeeperLogStore: Reading from disk LocalLogDisk
2024.03.22 03:46:44.323068 [ 1 ] {} <Warning> KeeperLogStore: No logs exists in /var/lib/clickhouse/coordination/logs. It's Ok if it's the first run of clickhouse-keeper.
2024.03.22 03:46:44.323246 [ 1 ] {} <Information> KeeperLogStore: force_sync enabled
2024.03.22 03:46:44.323273 [ 1 ] {} <Debug> KeeperDispatcher: Waiting server to initialize
2024.03.22 03:46:44.323297 [ 1 ] {} <Debug> KeeperStateMachine: Totally have 0 snapshots
2024.03.22 03:46:44.323314 [ 1 ] {} <Debug> KeeperStateMachine: No existing snapshots, last committed log index 0
2024.03.22 03:46:44.323355 [ 1 ] {} <Warning> KeeperLogStore: Removing all changelogs
2024.03.22 03:46:44.323383 [ 1 ] {} <Trace> Changelog: Starting new changelog changelog_1_100000.bin
2024.03.22 03:46:44.323446 [ 1 ] {} <Trace> KeeperServer: Last local log idx 0
2024.03.22 03:46:44.323467 [ 1 ] {} <Information> KeeperServer: No config in log store and snapshot, probably it's initial run. Will use config from .xml on disk
2024.03.22 03:46:44.324489 [ 1 ] {} <Information> RaftInstance: Raft ASIO listener initiated on :::9234, unsecured
2024.03.22 03:46:44.324537 [ 1 ] {} <Information> RaftInstance: parameters: election timeout range 1000 - 2000, heartbeat 500, leadership expiry 10000, max batch 100, backoff 50, snapshot distance 100000, enable randomized snapshot creation NO, log sync stop gap 99999, reserved logs 100000, client timeout 10000, auto forwarding on, API call type async, custom commit quorum size 0, custom election quorum size 0, snapshot receiver included, leadership transfer wait time 0, grace period of lagging state machine 0, snapshot IO: blocking, parallel log appending: on
2024.03.22 03:46:44.324555 [ 1 ] {} <Information> RaftInstance: new election timeout range: 1000 - 2000
2024.03.22 03:46:44.324573 [ 1 ] {} <Information> RaftInstance:    === INIT RAFT SERVER ===
commit index 0
term 0
election timer allowed
log store start 1, end 0
config log idx 0, prev log idx 0
2024.03.22 03:46:44.324595 [ 1 ] {} <Information> RaftInstance: peer 1: DC ID 0, localhost:9234, voting member, 1
my id: 1, voting_member
num peers: 0
2024.03.22 03:46:44.324620 [ 1 ] {} <Information> RaftInstance: global manager does not exist. will use local thread for commit and append
2024.03.22 03:46:44.324712 [ 1 ] {} <Information> RaftInstance: wait for HB, for 50 + [1000, 2000] ms
2024.03.22 03:46:44.324797 [ 71 ] {} <Information> RaftInstance: bg append_entries thread initiated
2024.03.22 03:46:44.374871 [ 1 ] {} <Debug> KeeperDispatcher: Server initialized, waiting for quorum
2024.03.22 03:46:46.000196 [ 38 ] {} <Warning> RaftInstance: Election timeout, initiate leader election
2024.03.22 03:46:46.000274 [ 38 ] {} <Information> RaftInstance: [PRIORITY] decay, target 1 -> 1, mine 1
2024.03.22 03:46:46.000312 [ 38 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: follower, log last term 0, state term 0, target p 1, my p 1, hb dead, pre-vote NOT done
2024.03.22 03:46:46.015066 [ 38 ] {} <Information> RaftInstance: [VOTE INIT] my id 1, my role candidate, term 1, log idx 0, log term 0, priority (target 1 / mine 1)
2024.03.22 03:46:46.015102 [ 38 ] {} <Information> RaftInstance: number of pending commit elements: 0
2024.03.22 03:46:46.015127 [ 38 ] {} <Information> RaftInstance: state machine commit index 0, precommit index 0, last log index 0
2024.03.22 03:46:46.015175 [ 38 ] {} <Information> RaftInstance: [BECOME LEADER] appended new config at 1
2024.03.22 03:46:46.021585 [ 70 ] {} <Information> RaftInstance: config at index 1 is committed, prev config log idx 0
2024.03.22 03:46:46.021627 [ 70 ] {} <Information> RaftInstance: new config log idx 1, prev log idx 0, cur config log idx 0, prev log idx 0
2024.03.22 03:46:46.021650 [ 70 ] {} <Information> RaftInstance: new configuration: log idx 1, prev log idx 0
peer 1, DC ID 0, localhost:9234, voting member, 1
my id: 1, leader: 1, term: 1
2024.03.22 03:46:46.021704 [ 1 ] {} <Debug> KeeperDispatcher: Quorum initialized
2024.03.22 03:46:46.021851 [ 1 ] {} <Debug> KeeperDispatcher: Dispatcher initialized
2024.03.22 03:46:46.022143 [ 1 ] {} <Warning> Application: Listen [::1]:9181 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:9181 (version 24.2.2.71 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2024.03.22 03:46:46.022277 [ 1 ] {} <Information> Application: Listening for Keeper (tcp): 127.0.0.1:9181
2024.03.22 03:46:46.022826 [ 1 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 59.61 KiB, peak 1.04 MiB, free memory in arenas 2.27 MiB, will set to 26.25 MiB (RSS), difference: 26.19 MiB
2024.03.22 03:46:46.029235 [ 1 ] {} <Debug> ConfigReloader: Loading config '/etc/clickhouse-keeper/keeper_config.xml'
2024.03.22 03:46:46.029264 [ 1 ] {} <Debug> ConfigProcessor: Processing configuration file '/etc/clickhouse-keeper/keeper_config.xml'.
2024.03.22 03:46:46.272716 [ 1 ] {} <Debug> KeeperDispatcher: Shutting down storage dispatcher
2024.03.22 03:46:46.522157 [ 1 ] {} <Information> RaftInstance: shutting down raft core
2024.03.22 03:46:46.522217 [ 1 ] {} <Information> RaftInstance: sent stop signal to the commit thread.
2024.03.22 03:46:46.522239 [ 1 ] {} <Information> RaftInstance: cancelled all schedulers.
2024.03.22 03:46:46.522257 [ 1 ] {} <Information> RaftInstance: commit thread stopped.
2024.03.22 03:46:46.522281 [ 1 ] {} <Information> RaftInstance: all pending commit elements dropped.
2024.03.22 03:46:46.522298 [ 1 ] {} <Information> RaftInstance: reset all pointers.
2024.03.22 03:46:46.522341 [ 1 ] {} <Information> RaftInstance: joined terminated commit thread.
2024.03.22 03:46:46.522370 [ 1 ] {} <Information> RaftInstance: sent stop signal to background append thread.
2024.03.22 03:46:46.522376 [ 71 ] {} <Information> RaftInstance: bg append_entries thread terminated
2024.03.22 03:46:46.522465 [ 1 ] {} <Information> RaftInstance: clean up auto-forwarding queue: 0 elems
2024.03.22 03:46:46.522486 [ 1 ] {} <Information> RaftInstance: clean up auto-forwarding clients
2024.03.22 03:46:46.522503 [ 1 ] {} <Information> RaftInstance: raft_server shutdown completed.
2024.03.22 03:46:46.522519 [ 1 ] {} <Information> RaftInstance: manually create a snapshot on 1
2024.03.22 03:46:46.522538 [ 1 ] {} <Information> RaftInstance: creating a snapshot for index 1
2024.03.22 03:46:46.522556 [ 1 ] {} <Information> RaftInstance: create snapshot idx 1 log_term 1
2024.03.22 03:46:46.522573 [ 1 ] {} <Debug> KeeperStateMachine: Creating snapshot 1
2024.03.22 03:46:46.522594 [ 1 ] {} <Information> KeeperStateMachine: Creating a snapshot during shutdown because 'create_snapshot_on_exit' is enabled.
2024.03.22 03:46:46.537952 [ 1 ] {} <Debug> KeeperStateMachine: Created persistent snapshot 1 with path snapshot_1.bin.zstd
2024.03.22 03:46:46.537980 [ 1 ] {} <Trace> KeeperStateMachine: Clearing garbage after snapshot
2024.03.22 03:46:46.538000 [ 1 ] {} <Trace> KeeperStateMachine: Cleared garbage after snapshot
2024.03.22 03:46:46.538023 [ 1 ] {} <Information> RaftInstance: snapshot idx 1 log_term 1 created, compact the log store if needed
2024.03.22 03:46:46.538041 [ 1 ] {} <Information> RaftInstance: create snapshot idx 1 log_term 1 done: 15468 us elapsed
2024.03.22 03:46:46.538120 [ 42 ] {} <Error> RaftInstance: failed to accept a rpc connection due to error 125, Operation canceled
2024.03.22 03:46:46.538154 [ 41 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 30
2024.03.22 03:46:46.538166 [ 45 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 28
2024.03.22 03:46:46.538237 [ 52 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 21
2024.03.22 03:46:46.538235 [ 44 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 30
2024.03.22 03:46:46.538295 [ 57 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 16
2024.03.22 03:46:46.538321 [ 60 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 13
2024.03.22 03:46:46.538346 [ 62 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 11
2024.03.22 03:46:46.538374 [ 38 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 1
2024.03.22 03:46:46.538337 [ 47 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 26
2024.03.22 03:46:46.538339 [ 59 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 14
2024.03.22 03:46:46.538402 [ 64 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 9
2024.03.22 03:46:46.538427 [ 56 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 17
2024.03.22 03:46:46.538438 [ 68 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 5
2024.03.22 03:46:46.538454 [ 39 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 3
2024.03.22 03:46:46.538449 [ 69 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 4
2024.03.22 03:46:46.538444 [ 40 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 2
2024.03.22 03:46:46.538471 [ 65 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 8
2024.03.22 03:46:46.538527 [ 48 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 25
2024.03.22 03:46:46.538591 [ 49 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 24
2024.03.22 03:46:46.538643 [ 50 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 23
2024.03.22 03:46:46.538689 [ 54 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 19
2024.03.22 03:46:46.538694 [ 53 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 20
2024.03.22 03:46:46.538701 [ 51 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 22
2024.03.22 03:46:46.538731 [ 46 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 27
2024.03.22 03:46:46.538720 [ 55 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 18
2024.03.22 03:46:46.538729 [ 58 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 15
2024.03.22 03:46:46.538802 [ 63 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 10
2024.03.22 03:46:46.538805 [ 66 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 7
2024.03.22 03:46:46.538815 [ 43 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 29
2024.03.22 03:46:46.538805 [ 61 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 12
2024.03.22 03:46:46.538817 [ 67 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 6
2024.03.22 03:46:46.548273 [ 42 ] {} <Information> RaftInstance: end of asio worker thread, remaining threads: 0
2024.03.22 03:46:46.548571 [ 1 ] {} <Debug> KeeperLogStore: Shutting down Changelog
2024.03.22 03:46:46.548622 [ 37 ] {} <Information> KeeperLogStore: Raft server is not set in LogStore.
2024.03.22 03:46:46.548726 [ 1 ] {} <Debug> KeeperSnapshotManagerS3: Shutting down KeeperSnapshotManagerS3
2024.03.22 03:46:46.548787 [ 1 ] {} <Information> KeeperSnapshotManagerS3: KeeperSnapshotManagerS3 shut down
2024.03.22 03:46:46.548826 [ 1 ] {} <Debug> KeeperDispatcher: Dispatcher shut down
2024.03.22 03:46:46.548873 [ 1 ] {} <Information> KeeperLogStore: Changelog is shut down
2024.03.22 03:46:46.548893 [ 1 ] {} <Debug> KeeperLogStore: Shutting down Changelog
2024.03.22 03:46:46.549108 [ 1 ] {} <Information> Application: Waiting for background threads
2024.03.22 03:46:46.549531 [ 1 ] {} <Information> Application: Background threads finished in 0 ms
2024.03.22 03:46:46.549845 [ 1 ] {} <Error> Application: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in create_directories: Permission denied ["/var/lib/clickhouse-keeper"], Stack trace (when copying this message, always include the lines below):

0. std::system_error::system_error(std::error_code, String const&) @ 0x000000000167daf5
1. std::__fs::filesystem::filesystem_error::filesystem_error[abi:v15000](String const&, std::__fs::filesystem::path const&, std::error_code) @ 0x00000000009dbfc1
2. void std::__fs::filesystem::__throw_filesystem_error[abi:v15000]<String&, std::__fs::filesystem::path const&, std::error_code const&>(String&, std::__fs::filesystem::path const&, std::error_code const&) @ 0x0000000001645bd7
3. std::__fs::filesystem::detail::(anonymous namespace)::ErrorHandler<bool>::report(std::error_code const&) const (.llvm.5385572262389653489) @ 0x0000000001649156
4. std::__fs::filesystem::__create_directories(std::__fs::filesystem::path const&, std::error_code*) @ 0x0000000001649953
5. std::__fs::filesystem::__create_directories(std::__fs::filesystem::path const&, std::error_code*) @ 0x00000000016499f5
6. DB::ConfigProcessor::savePreprocessedConfig(DB::ConfigProcessor::LoadedConfig&, String) @ 0x0000000000e25c7e
7. DB::ConfigReloader::reloadIfNewer(bool, bool, bool, bool) @ 0x0000000000e39ef6
8. DB::ConfigReloader::ConfigReloader(std::basic_string_view<char, std::char_traits<char>>, std::vector<String, std::allocator<String>> const&, String const&, zkutil::ZooKeeperNodeCache&&, std::shared_ptr<Poco::Event> const&, std::function<void (Poco::AutoPtr<Poco::Util::AbstractConfiguration>, bool)>&&, bool) @ 0x0000000000e38ccd
9. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x0000000000bba745
10. Poco::Util::Application::run() @ 0x0000000001040046
11. DB::Keeper::run() @ 0x0000000000bb60fd
12. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000001049219
13. mainEntryClickHouseKeeper(int, char**) @ 0x0000000000bb50b8
14. main @ 0x0000000000bc5e9d

Cannot print extra info for Poco::Exception (version 24.2.2.71 (official build))
2024.03.22 03:46:46.549957 [ 1 ] {} <Information> Application: shutting down
2024.03.22 03:46:46.549979 [ 1 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2024.03.22 03:46:46.550042 [ 26 ] {} <Trace> BaseDaemon: Received signal -2
2024.03.22 03:46:46.550091 [ 26 ] {} <Information> BaseDaemon: Stop SignalListener thread

Meanwhile, if I put /var/lib/clickhouse-keeper on both sides, i.e. edit entrypoint.sh:41 to use -keeper:

fast crash
Processing configuration file '/etc/clickhouse-keeper/keeper_config.xml'.
Logging trace to /var/log/clickhouse-keeper/clickhouse-keeper.log
Logging errors to /var/log/clickhouse-keeper/clickhouse-keeper.err.log
2024.03.22 03:47:48.722002 [ 1 ] {} <Trace> Pipe: Pipe capacity is 1.00 MiB
2024.03.22 03:47:48.724713 [ 1 ] {} <Information> Application: Starting ClickHouse Keeper 24.2.2.71 (revision: 54483, git hash: 9293d361e72be9f6ccfd444d504e2137b2e837cf, build id: 015C412FA457DCCDD6802012AD9AA854ED82096C), PID 1
2024.03.22 03:47:48.724739 [ 1 ] {} <Information> Application: starting up
2024.03.22 03:47:48.724751 [ 1 ] {} <Information> Application: OS Name = Linux, OS Version = 6.7.9-200.fc39.x86_64, OS Architecture = x86_64
2024.03.22 03:47:48.724846 [ 1 ] {} <Information> Application: keeper_server.max_memory_usage_soft_limit is set to 56.44 GiB
2024.03.22 03:47:48.725082 [ 1 ] {} <Error> Application: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in create_directories: Permission denied ["/var/lib/clickhouse"], Stack trace (when copying this message, always include the lines below):

0. std::system_error::system_error(std::error_code, String const&) @ 0x000000000167daf5
1. std::__fs::filesystem::filesystem_error::filesystem_error[abi:v15000](String const&, std::__fs::filesystem::path const&, std::error_code) @ 0x00000000009dbfc1
2. void std::__fs::filesystem::__throw_filesystem_error[abi:v15000]<String&, std::__fs::filesystem::path const&, std::error_code const&>(String&, std::__fs::filesystem::path const&, std::error_code const&) @ 0x0000000001645bd7
3. std::__fs::filesystem::detail::(anonymous namespace)::ErrorHandler<bool>::report(std::error_code const&) const (.llvm.5385572262389653489) @ 0x0000000001649156
4. std::__fs::filesystem::__create_directories(std::__fs::filesystem::path const&, std::error_code*) @ 0x0000000001649953
5. std::__fs::filesystem::__create_directories(std::__fs::filesystem::path const&, std::error_code*) @ 0x00000000016499f5
6. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x0000000000bb7e4c
7. Poco::Util::Application::run() @ 0x0000000001040046
8. DB::Keeper::run() @ 0x0000000000bb60fd
9. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000001049219
10. mainEntryClickHouseKeeper(int, char**) @ 0x0000000000bb50b8
11. main @ 0x0000000000bc5e9d

Cannot print extra info for Poco::Exception (version 24.2.2.71 (official build))
2024.03.22 03:47:48.725124 [ 1 ] {} <Information> Application: shutting down
2024.03.22 03:47:48.725163 [ 1 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2024.03.22 03:47:48.725252 [ 26 ] {} <Trace> BaseDaemon: Received signal -2
2024.03.22 03:47:48.725297 [ 26 ] {} <Information> BaseDaemon: Stop SignalListener thread

The problem is that the default keeper config references /var/lib/clickhouse for coordination logs/snapshots, which I think is what prevents us from using only /var/lib/clickhouse-keeper here.

And if we change that, we get into serious backwards-incompatibility territory, so personally I'd make that change separately (and not backport it).
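
(If someone wants to double-check which paths the bundled config actually references, something like this should do; the tag and the presence of busybox `grep` in the image are assumptions:)

```sh
# Lists every /var/lib/clickhouse* reference in the bundled keeper config,
# e.g. the coordination log/snapshot paths seen in the startup logs above.
docker run --rm --entrypoint grep clickhouse/clickhouse-keeper:latest \
    -rn '/var/lib/clickhouse' /etc/clickhouse-keeper/
```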

@Tristan971 Tristan971 force-pushed the 54767-remove-extraneous-keeper-docker-volumes branch from 66975d3 to 788fe1a on March 24, 2024 05:03
@Felixoid Felixoid added the can be tested Allows running workflows for external contributors label Mar 24, 2024
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-build Pull request with build/testing/packaging improvement label Mar 24, 2024
@robot-ch-test-poll1
Contributor

robot-ch-test-poll1 commented Mar 24, 2024

This is an automated comment for commit 38fb8b3 with a description of existing statuses. It is updated for the latest CI run.

❌ Click here to open a full report in a separate page

Check name | Description | Status
A Sync | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure
CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ⏳ pending
Mergeable Check | Checks if all other necessary checks are successful | ❌ failure
Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure

Successful checks

Check name | Description | Status
AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success
ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table | ✅ success
ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process | ✅ success
Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success
Docker keeper image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success
Docker server image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success
Docs check | Builds and tests the documentation | ✅ success
Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success
Flaky tests | Checks if newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc | ✅ success
Install packages | Checks that the built packages are installable in a clean environment | ✅ success
Integration tests | The integration tests report. In parentheses the package type is given, and in square brackets are the optional part/total tests | ✅ success
PR Check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success
Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success
Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success
Style check | Runs a set of checks to keep the code style clean. If some of the tests failed, see the related log from the report | ✅ success
Unit tests | Runs the unit tests for different release types | ✅ success
Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can successfully start up without any errors, crashes or sanitizer asserts | ✅ success

@Tristan971 Tristan971 force-pushed the 54767-remove-extraneous-keeper-docker-volumes branch from 6a5a97d to 38fb8b3 on March 24, 2024 11:20
@clickhouse-ci clickhouse-ci bot added the manual approve Manual approve required to run CI label Mar 24, 2024
@Tristan971
Contributor Author

(Rebased, since I presume that's what made those 2 tests fail.)

@Felixoid
Member

It's good from my PoV! Let's wait for another opinion from the team.

@Felixoid Felixoid merged commit fc96a57 into ClickHouse:master Mar 27, 2024
133 of 136 checks passed
@Felixoid
Member

Thanks for raising it up and fixing it!

robot-ch-test-poll3 added a commit that referenced this pull request Mar 27, 2024
…dc912178f06032423f04f104189d1

Cherry pick #61683 to 24.1: #54767 Remove extraneous volumes in Keeper image
robot-ch-test-poll3 added a commit that referenced this pull request Mar 27, 2024
…dc912178f06032423f04f104189d1

Cherry pick #61683 to 24.2: #54767 Remove extraneous volumes in Keeper image
robot-ch-test-poll3 added a commit that referenced this pull request Mar 27, 2024
…dc912178f06032423f04f104189d1

Cherry pick #61683 to 24.3: #54767 Remove extraneous volumes in Keeper image
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 27, 2024
robot-ch-test-poll2 added a commit that referenced this pull request Mar 27, 2024
Backport #61683 to 24.1: #54767 Remove extraneous volumes in Keeper image
robot-ch-test-poll3 added a commit that referenced this pull request Mar 27, 2024
Backport #61683 to 24.2: #54767 Remove extraneous volumes in Keeper image
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 27, 2024
Felixoid added a commit that referenced this pull request Apr 10, 2024
Backport #61683 to 24.3: #54767 Remove extraneous volumes in Keeper image
Felixoid added a commit that referenced this pull request Apr 10, 2024
Backport #61683 to 23.8: #54767 Remove extraneous volumes in Keeper image
@Tristan971 Tristan971 deleted the 54767-remove-extraneous-keeper-docker-volumes branch May 29, 2024 18:52
Labels

  • can be tested: Allows running workflows for external contributors
  • manual approve: Manual approve required to run CI
  • pr-backports-created: Backport PRs are successfully created, it won't be processed by CI script anymore
  • pr-backports-created-cloud
  • pr-build: Pull request with build/testing/packaging improvement
  • pr-synced-to-cloud: The PR is synced to the cloud repo
  • v23.8-must-backport, v24.1-must-backport, v24.2-must-backport, v24.3-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues:

  • ClickHouse Keeper docker image and VOLUME directive

6 participants