Contributor

@saurabhojha saurabhojha commented Mar 23, 2025

Resolves #3

This pull request introduces benchmarking scripts tailored for FerretDB.

Key Changes

  1. Benchmarking Scripts for FerretDB:

    • Added new scripts designed to benchmark FerretDB's performance under various conditions. These scripts aim to evaluate query execution time, index usage, and data retrieval efficiency.
  2. PostgreSQL Backend Considerations:

    • FerretDB uses PostgreSQL as its backend with the DocumentDB extension for MongoDB compatibility. Consequently, certain MongoDB configurations — such as enabling covered index scans — are not directly available.

    Example: Attempting to set internalQueryPlannerGenerateCoveredWholeIndexScans results in an error:

    Setting internalQueryPlannerGenerateCoveredWholeIndexScans to true...
    MongoServerError: no such command: 'setParameter'
  3. Query Planner Outputs:

    • Since FerretDB leverages PostgreSQL, its query planner output differs significantly from MongoDB's native format. As a result, the index_usage.sh script prints the entire query plan so that all execution details are visible.
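
The setParameter failure shown in item 2 can be detected rather than left to abort a script. A minimal sketch, assuming the mongosh output has been captured into a variable; the function name `set_parameter_supported` is hypothetical and is not part of this PR's scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 unless the captured mongosh output contains
# the "no such command: 'setParameter'" error quoted in the PR description.
set_parameter_supported() {
    local output="$1"
    case "$output" in
        *"no such command: 'setParameter'"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Sample output copied from the error message in the PR description.
sample="MongoServerError: no such command: 'setParameter'"
if set_parameter_supported "$sample"; then
    echo "setParameter accepted"
else
    echo "setParameter unsupported; skipping the covered index scan setting"
fi
```

Matching on the error text keeps the check independent of the exit code of mongosh, at the cost of breaking if the server ever rewords the message.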

@saurabhojha saurabhojha mentioned this pull request Mar 23, 2025
@rschu1ze rschu1ze changed the title Add ferret DB benchmarking Add FerretDB Mar 23, 2025
@rschu1ze rschu1ze self-assigned this Mar 23, 2025
@rschu1ze rschu1ze force-pushed the saurabh/jsonbench/ferret-db-2 branch 2 times, most recently from 0305a08 to 473c82d Compare March 23, 2025 09:52
@rschu1ze rschu1ze force-pushed the saurabh/jsonbench/ferret-db-2 branch from 473c82d to 4b92649 Compare March 23, 2025 09:53

sudo snap install docker

sudo apt-get install gnupg curl
Member

Question: Why do we install MongoDB here? Is this a leftover from the corresponding MongoDB benchmark scripts? I am asking because 1. the uninstall script doesn't remove MongoDB (which supports the theory that it's a leftover) and 2. all other scripts in this PR call mongosh. I suspect that we are really benchmarking MongoDB and not FerretDB.

Member

Oh, I see that this PR does not start the MongoDB daemon (unlike the MongoDB scripts).

Is there any way to check we are really running queries against FerretDB?

Member

Since FerretDB is supposed to be a drop-in replacement for MongoDB and since almost all of the scripts are 100% copy-paste of the MongoDB scripts, it would be nice if you could delete them and create symbolic links to the MongoDB scripts instead.
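
The symlink approach suggested in this comment could look roughly like the following. This is a sketch only: the directory names `mongodb` and `ferretdb`, the script name `run_queries.sh`, and the helper `link_shared_scripts` are assumptions for illustration, and the demo runs in a temporary directory so it touches nothing in a real checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each named script, create (or replace) a relative
# symlink in the target directory pointing at the same-named script in the
# sibling source directory. Must be called from the common parent directory.
link_shared_scripts() {
    local src_dir="$1" dst_dir="$2"
    shift 2
    local script
    for script in "$@"; do
        ln -sfn "../${src_dir}/${script}" "${dst_dir}/${script}"
    done
}

# Demo in a scratch directory (the layout is assumed, not taken from the repository).
workdir="$(mktemp -d)"
mkdir -p "${workdir}/mongodb" "${workdir}/ferretdb"
printf '#!/bin/bash\necho "shared script"\n' > "${workdir}/mongodb/run_queries.sh"

(cd "${workdir}" && link_shared_scripts mongodb ferretdb run_queries.sh)

# Running the script through the link executes the MongoDB copy.
bash "${workdir}/ferretdb/run_queries.sh"
```

Relative link targets keep the links valid wherever the repository is cloned. Scripts that locate sibling files through their own path may still need care, since the link and its target live in different directories.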

Contributor Author

I installed MongoDB because the benchmark scripts use mongosh and some other MongoDB-specific imports that require it. I do not actually start a MongoDB container; I installed it primarily to use mongosh to communicate with the FerretDB Docker container. Good call, I will add uninstallation to uninstall.sh as well.

Is there any way to check we are really running queries against FerretDB?

The FerretDB Docker container is mapped to MongoDB's default port, 27017.
One way to confirm that queries are not running on MongoDB is the query planner output in the index_usage logs. It differs from MongoDB's because FerretDB uses PostgreSQL, and the output includes cost models.

While FerretDB is supposed to be a 100% drop-in replacement, a few configurations are not yet supported. For example, enabling covering indexes on FerretDB always fails. For this reason, I have enabled index-only scans on the PostgreSQL containers.

Contributor Author

@rschu1ze I ran the benchmark again, and here are the logs from the PostgreSQL container.

PostgreSQL init process complete; ready for start up.

2025-03-23 14:46:29.397 UTC [1] LOG:  Initialized documentdb_core extension
2025-03-23 14:46:29.398 UTC [1] LOG:  Initialized pg_documentdb extension
2025-03-23 14:46:29.406 UTC [1] LOG:  starting PostgreSQL 17.4 (Debian 17.4-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2025-03-23 14:46:29.406 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-03-23 14:46:29.406 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2025-03-23 14:46:29.412 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-03-23 14:46:29.419 UTC [78] LOG:  database system was shut down at 2025-03-23 14:46:29 UTC
2025-03-23 14:46:29.426 UTC [1] LOG:  database system is ready to accept connections
2025-03-23 14:46:29.430 UTC [81] LOG:  pg_cron scheduler started
2025-03-23 14:46:30.220 UTC [84] LOG:  database bluesky_1m_snappy has collections: false
2025-03-23 14:46:30.220 UTC [84] CONTEXT:  SQL statement "SELECT documentdb_api.create_collection($1, $2)"
2025-03-23 14:46:30.220 UTC [84] STATEMENT:  SELECT p_result::bytea, p_success FROM documentdb_api.insert($1, $2::bytea, $3::bytea)
2025-03-23 14:46:30.232 UTC [84] LOG:  Creating and returning documentdb_data.documents_1 for the sentinel database bluesky_1m_snappy
2025-03-23 14:46:30.232 UTC [84] CONTEXT:  SQL statement "SELECT documentdb_api.create_collection($1, $2)"
2025-03-23 14:46:30.232 UTC [84] STATEMENT:  SELECT p_result::bytea, p_success FROM documentdb_api.insert($1, $2::bytea, $3::bytea)
2025-03-23 14:47:00.007 UTC [81] LOG:  cron job 1 starting: CALL documentdb_api_internal.delete_expired_rows();
2025-03-23 14:47:00.027 UTC [81] LOG:  cron job 1 COMMAND completed: CALL 
2025-03-23 14:47:23.015 UTC [76] LOG:  checkpoint starting: wal
2025-03-23 14:48:00.010 UTC [81] LOG:  cron job 1 starting: CALL documentdb_api_internal.delete_expired_rows();
2025-03-23 14:48:00.026 UTC [81] LOG:  cron job 1 COMMAND completed: CALL 
2025-03-23 14:48:12.845 UTC [195] LOG:  database bluesky_1m_zstd has collections: false
2025-03-23 14:48:12.845 UTC [195] STATEMENT:  SELECT create_collection FROM documentdb_api.create_collection($1, $2)
2025-03-23 14:48:12.852 UTC [195] LOG:  Creating and returning documentdb_data.documents_3 for the sentinel database bluesky_1m_zstd
2025-03-23 14:48:12.852 UTC [195] STATEMENT:  SELECT create_collection FROM documentdb_api.create_collection($1, $2)
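
The log above is itself evidence that the writes reached PostgreSQL through the DocumentDB extension. A rough way to automate that check is to count the log lines that mention `documentdb_api`. This is a sketch: the function name `count_documentdb_calls` is hypothetical, and the sample input is copied from the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a PostgreSQL log on stdin and print how many lines
# reference the documentdb_api schema (this also matches documentdb_api_internal).
# grep -c exits non-zero when the count is 0, so the status is neutralized.
count_documentdb_calls() {
    grep -c 'documentdb_api' || true
}

# Two matching lines copied from the log above, plus one unrelated line.
count_documentdb_calls <<'EOF'
2025-03-23 14:46:29.426 UTC [1] LOG:  database system is ready to accept connections
2025-03-23 14:46:30.220 UTC [84] CONTEXT:  SQL statement "SELECT documentdb_api.create_collection($1, $2)"
2025-03-23 14:46:30.220 UTC [84] STATEMENT:  SELECT p_result::bytea, p_success FROM documentdb_api.insert($1, $2::bytea, $3::bytea)
EOF
```

Against a live setup, the input would come from `docker logs` on the PostgreSQL container with stderr redirected into the pipe; the container name depends on how it was started. A count of zero after a benchmark run would suggest the queries went somewhere else.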

Just to be sure mongod wasn't running during benchmarking:

systemctl status mongod
○ mongod.service - MongoDB Database Server
     Loaded: loaded (/usr/lib/systemd/system/mongod.service; disabled; preset: enabled)
     Active: inactive (dead)
       Docs: https://docs.mongodb.org/manual

@saurabhojha saurabhojha requested a review from rschu1ze March 23, 2025 15:02
@saurabhojha saurabhojha requested a review from rschu1ze March 24, 2025 10:32
@rschu1ze rschu1ze force-pushed the saurabh/jsonbench/ferret-db-2 branch 2 times, most recently from fe58120 to c057854 Compare March 27, 2025 07:57
@rschu1ze rschu1ze force-pushed the saurabh/jsonbench/ferret-db-2 branch from c057854 to 6f06e70 Compare March 27, 2025 07:59
@rschu1ze rschu1ze force-pushed the saurabh/jsonbench/ferret-db-2 branch from 153249b to ce36cd2 Compare March 27, 2025 08:04
@rschu1ze
Member

@saurabhojha Sorry that this took a bit longer. The benchmark ran for over two days, after which I stopped it. For that reason, I also omitted the 1000m results JSON file (this case didn't complete).

@rschu1ze rschu1ze merged commit ff7e52d into ClickHouse:main Mar 27, 2025
Development

Successfully merging this pull request may close these issues.

Add FerretDB

2 participants