Asynchronous loading of tables #49351

Merged
alexey-milovidov merged 180 commits into master from async-loader-integration on Dec 4, 2023

Conversation

serxa
Member

@serxa serxa commented May 1, 2023

Should be merged after #48923
Fixes #43424

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added server setting async_load_databases for asynchronous loading of databases and tables. Speeds up the server start time. Applies to databases with Ordinary, Atomic and Replicated engines. Their tables load metadata asynchronously. Query to a table increases the priority of the load job and waits for it to be done. Added table system.async_loader.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@serxa
Member Author

serxa commented Nov 7, 2023

test_delayed_replica_failover — flaky. Other tests are green, finally!
Now let's trigger stress tests with async_load_databases = true to check if that works too

@serxa serxa marked this pull request as ready for review November 7, 2023 10:22
@alexey-milovidov
Member

@serxa, something interesting around DB::DatabaseAtomic::assertDetachedTableNotInUse

@serxa
Member Author

serxa commented Nov 23, 2023

something interesting around DB::DatabaseAtomic::assertDetachedTableNotInUse

I'm still trying to figure out what went wrong. I found a backtrace with variables in gdb.log of the stress test and according to table_id in one of the frames the table in question is test_ghlfanhj.dict2:

(comment: 01160_table_dependencies.sh) CREATE DICTIONARY dict2 (`n` int DEFAULT 0, `m` int DEFAULT 2) PRIMARY KEY n SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() USER 'default' TABLE 'join' PASSWORD '[HIDDEN]' DB 'test_ghlfanhj')) LIFETIME(MIN 1 MAX 10) LAYOUT(FLAT())

This test is very good at checking issues with dependencies. It had already found a deadlock (fixed) in this PR.
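
To make the dependency chain in this test concrete, here is a minimal sketch (not ClickHouse code) that derives a valid load order with Python's standard `graphlib`. The two edges come from statements quoted in this thread: `dict2` uses table `join` as its SOURCE, and the ATTACH query for `join` has a DEFAULT that calls `dictGet` on `dict1`. The helper name `load_order` is hypothetical.

```python
from graphlib import TopologicalSorter

# object -> set of objects it depends on (edges taken from the quoted queries)
depends_on = {
    "dict2": {"join"},   # dict2 SOURCE(... TABLE 'join' ...)
    "join": {"dict1"},   # join.m DEFAULT dictGet('test_ghlfanhj.dict1', ...)
}

def load_order(deps):
    """Return an order in which every object comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = load_order(depends_on)
print(order)
```

If `join` is attached before `dict1` is available, its DEFAULT expression cannot be resolved, which matches the shape of the "external dictionary ... not found" error in the logs.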

More logs:

2023.11.07 16:45:51.420939 [ 69191 ] {} <Debug> test_ghlfanhj.dict_src (99221ed3-cedb-4cd2-b4be-24c6e85c033f): Loading data parts
2023.11.07 16:45:51.428359 [ 69191 ] {} <Debug> test_ghlfanhj.dict_src (99221ed3-cedb-4cd2-b4be-24c6e85c033f): There are no data parts
2023.11.07 16:45:51.439908 [ 69191 ] {} <Trace> ExternalDictionariesLoader: Loading config file 'c9b83bc7-f6c2-435d-ba6f-e00986f24d24'.
2023.11.07 16:45:51.469836 [ 69191 ] {} <Trace> ExternalDictionariesLoader: Will load the object 'c9b83bc7-f6c2-435d-ba6f-e00986f24d24' in background, force = false, loading_id = 58
2023.11.07 16:45:51.794237 [ 69191 ] {} <Error> void DB::AsyncLoader::worker(Pool &): Code: 36. DB::Exception: external dictionary 'c9b83bc7-f6c2-435d-ba6f-e00986f24d24' not found: While processing _CAST(dictGet('test_ghlfanhj.dict1', 'm', CAST('42', 'UInt64')) AS m_tmp_alter4017070811409268861, 'Int32') AS m: default expression and column type are incompatible.: Cannot attach table `test_ghlfanhj`.`join` from metadata file /var/lib/clickhouse/store/664/66466e39-2fc4-4c3d-8f28-fe2cc9c50e60/join.sql from query ATTACH TABLE test_ghlfanhj.join UUID 'f7c92ee9-072d-4cf3-8875-89b038924a74' (`n` Int32, `m` Int32 DEFAULT dictGet('test_ghlfanhj.dict1', 'm', CAST('42', 'UInt64'))) ENGINE = Join(any, left, n). (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):

@serxa
Member Author

serxa commented Nov 25, 2023

It looks like the detach/attach database cycle in the 01160_table_dependencies test led to some problems and eventually to the crash:
https://pastila.nl/?00008e27/2952cd9a4a402f3e6643a26281147283#4oSjapDCJRZzxZbRtTPBLQ==

The detach was started before the previous attach had finished:

1403001:2023.11.07 16:45:50.778259 [ 28960 ] {7fc89bc2-b9cf-4304-8d43-a307c4af185b} <Debug> executeQuery: (from [::1]:46152) (comment: 01160_table_dependencies.sh) detach database test_ghlfanhj; (stage: Complete)
1403690:2023.11.07 16:45:51.194902 [ 13604 ] {5a95dc61-ca0f-4bef-867f-fbfa09c5820c} <Debug> executeQuery: (from [::1]:46168) (comment: 01160_table_dependencies.sh) attach database test_ghlfanhj; (stage: Complete)
1404248:2023.11.07 16:45:51.420939 [ 69191 ] {} <Debug> test_ghlfanhj.dict_src (99221ed3-cedb-4cd2-b4be-24c6e85c033f): Loading data parts
1404254:2023.11.07 16:45:51.428359 [ 69191 ] {} <Debug> test_ghlfanhj.dict_src (99221ed3-cedb-4cd2-b4be-24c6e85c033f): There are no data parts
1404265:2023.11.07 16:45:51.439908 [ 69191 ] {} <Trace> ExternalDictionariesLoader: Loading config file 'c9b83bc7-f6c2-435d-ba6f-e00986f24d24'.
1404273:2023.11.07 16:45:51.446823 [ 28928 ] {6f9b310e-f0a3-4170-8541-b94ded3fc65b} <Debug> executeQuery: (from [::1]:46206) (comment: 01160_table_dependencies.sh) detach database test_ghlfanhj; (stage: Complete)
1404291:2023.11.07 16:45:51.469836 [ 69191 ] {} <Trace> ExternalDictionariesLoader: Will load the object 'c9b83bc7-f6c2-435d-ba6f-e00986f24d24' in background, force = false, loading_id = 58
1404676:2023.11.07 16:45:51.794237 [ 69191 ] {} <Error> void DB::AsyncLoader::worker(Pool &): Code: 36. DB::Exception: external dictionary 'c9b83bc7-f6c2-435d-ba6f-e00986f24d24' not found: While processing _CAST(dictGet('test_ghlfanhj.dict1', 'm', CAST('42', 'UInt64')) AS m_tmp_alter4017070811409268861, 'Int32') AS m: default expression and column type are incompatible.: Cannot attach table `test_ghlfanhj`.`join` from metadata file /var/lib/clickhouse/store/664/66466e39-2fc4-4c3d-8f28-fe2cc9c50e60/join.sql from query ATTACH TABLE test_ghlfanhj.join UUID 'f7c92ee9-072d-4cf3-8875-89b038924a74' (`n` Int32, `m` Int32 DEFAULT dictGet('test_ghlfanhj.dict1', 'm', CAST('42', 'UInt64'))) ENGINE = Join(any, left, n). (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):

@serxa
Member Author

serxa commented Nov 26, 2023

After the fix, the MaterializedMySQL database became broken:
https://pastila.nl/?0064b1cc/eb259a2b9f6f477c667dae73b144a135#3gTFbPYFyCp0IkNYFY60hQ==

@alexey-milovidov
Member

Just in case, MaterializedMySQL is a second-class feature - if needed, you can make a shortcut so that it is loaded non-lazily.

@serxa
Member Author

serxa commented Nov 26, 2023

you can make a shortcut so that it is loaded non-lazily.

MaterializedMySQL inherits from DatabaseAtomic, which is now reworked to be async-only. So it might be even harder than figuring out what went wrong... Checking...

@serxa
Member Author

serxa commented Nov 26, 2023

The problem is here:

    clickhouse_node.query(f"ALTER NAMED COLLECTION {db} SET port=9999")
    clickhouse_node.query_with_retry(f"DETACH DATABASE {db} SYNC")
    mysql_node.query(f"INSERT INTO {db}.t1 VALUES (3, 'c', 3)")
    assert "ConnectionFailed:" in clickhouse_node.query_and_get_error(
        f"ATTACH DATABASE {db}"
    )
    clickhouse_node.query(f"ALTER NAMED COLLECTION {db} SET port=3306")
    clickhouse_node.query(f"ATTACH DATABASE {db}")

ATTACH DATABASE {db} throws its exception later and from a different place, because the database load is now async. This leaves some trash behind in DatabaseCatalog. Fixing...

UPD. It turned out to be a deeper problem. I added waitDatabaseStarted() to all IDatabase::shutdown() implementations, but it rethrows any error that occurred during load/startup. According to the DatabaseCatalog logic, shutdown is called if the attach failed, so the same error was thrown again from the cleanup path. I fixed this by not throwing during shutdown.
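
A toy Python model of this fix, for illustration only. It is not the ClickHouse implementation: the class `ToyDatabase`, the `no_throw` flag, and the error text are assumptions of this sketch. The point is that waiting on an async startup job rethrows its error, so the shutdown path (which also runs after a failed attach) has to wait without rethrowing.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyDatabase:
    """Hypothetical stand-in for a database whose startup runs asynchronously."""

    def __init__(self, executor, startup):
        # Startup runs in the background; its error surfaces only on wait.
        self._startup_job = executor.submit(startup)

    def wait_started(self, no_throw=False):
        try:
            self._startup_job.result()  # re-raises the startup exception, if any
        except Exception:
            if not no_throw:
                raise

    def shutdown(self):
        # Shutdown is also the cleanup path after a failed attach, so it waits
        # for startup to finish but must not rethrow the startup error.
        self.wait_started(no_throw=True)
        return "shut down"

def failing_startup():
    raise ConnectionError("ConnectionFailed: cannot reach MySQL")

with ThreadPoolExecutor(max_workers=1) as pool:
    db = ToyDatabase(pool, failing_startup)
    try:
        db.wait_started()           # the attach path sees the error
        attach_error = None
    except ConnectionError as e:
        attach_error = str(e)
    result = db.shutdown()          # the cleanup path does not throw again

print(attach_error, "|", result)
```

With `no_throw=False` inside `shutdown()`, the cleanup after a failed attach would itself fail, which is the situation described above.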

@serxa serxa requested a review from tavplubix November 28, 2023 18:23
@serxa
Member Author

serxa commented Nov 30, 2023

CI is green now.
01104_distributed_numbers_test - Flaky
test_postgres_dictionaries_custom_query_full_load - Leak in PQmakeEmptyPGresult - unrelated

@alexey-milovidov alexey-milovidov merged commit 02439ee into master Dec 4, 2023
345 of 347 checks passed
@alexey-milovidov alexey-milovidov deleted the async-loader-integration branch December 4, 2023 16:16
@danthegoodman1

danthegoodman1 commented Dec 4, 2023

This doesn’t actually speed up start to query time for those tables though right? Just for getting a sql session/health checks?

@serxa
Copy link
Member Author

serxa commented Dec 4, 2023

This doesn’t actually speed up start to query time for those tables though right? Just for getting a sql session/health checks?

No. If the server has a lot of tables to load at startup, this feature also improves the time to the first query result, thanks to prioritization: tables that queries are waiting for are loaded first, ahead of the rest of the background load.
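
The prioritization idea can be illustrated with a small single-worker toy in Python. This is a sketch under stated assumptions, not how AsyncLoader is written: the class `ToyLoader`, its two priority levels, and the table names are all hypothetical.

```python
import heapq
import itertools

class ToyLoader:
    """Hypothetical single-worker loader; a lower number means higher priority."""

    BACKGROUND, FOREGROUND = 1, 0

    def __init__(self, tables):
        self._counter = itertools.count()  # tie-breaker keeps FIFO order
        self._priority = {t: self.BACKGROUND for t in tables}
        self._heap = [(self.BACKGROUND, next(self._counter), t) for t in tables]
        heapq.heapify(self._heap)
        self.loaded = []

    def prioritize(self, table):
        # A query touched this table: re-queue it ahead of background jobs.
        if table not in self.loaded and self._priority[table] != self.FOREGROUND:
            self._priority[table] = self.FOREGROUND
            heapq.heappush(self._heap, (self.FOREGROUND, next(self._counter), table))

    def run_one(self):
        while self._heap:
            prio, _, table = heapq.heappop(self._heap)
            # Skip stale entries left by prioritize() and already loaded tables.
            if table in self.loaded or prio != self._priority[table]:
                continue
            self.loaded.append(table)
            return table
        return None

loader = ToyLoader([f"t{i}" for i in range(5)])
loader.run_one()                 # background loading starts with t0
loader.prioritize("t4")          # a query arrives for the last table in the queue
next_loaded = loader.run_one()   # t4 jumps ahead of t1..t3
print(next_loaded)
```

Without the `prioritize` call, the query on `t4` would wait until `t1`, `t2`, and `t3` had finished loading.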

Labels
pr-feature Pull request with new product feature pr-status-❌ PR with some error/faliure statuses
Development

Successfully merging this pull request may close these issues.

Lazy loading of primary key and/or data parts.
10 participants