Minified version with no table engines #53

Open
danthegoodman1 opened this issue Jul 3, 2023 · 8 comments

@danthegoodman1

A lot of the binary size comes from table engines that may not be relevant for in-process use cases, such as the MergeTree engines, Log engines, etc.

It would be great to have either an easy way to compile with engines omitted, or a build that is effectively engine-free (except for some basics like url, s3, file), for a far smaller binary. The expectation is that custom engines would be built on top of the url/s3 engines in #52, as aliases of sorts.

@auxten
Member

auxten commented Jul 4, 2023

I did some analysis with bloaty; the AggregateFunctions take up a lot of space.
We can work with the ClickHouse team to add more compile flags so that some of them can be stripped.

compileunits vmsize(bytes) filesize(bytes)
[section .debug_loc] 0 208584312
[section .text] 151282685 151282685
[section .strtab] 0 59946963
[section .debug_ranges] 0 56290240
[section .rodata] 48926306 48926306
./src/AggregateFunctions/AggregateFunctionSumMap.cpp 7787534 48880276
./src/AggregateFunctions/AggregateFunctionMin.cpp 6657137 45979726
./src/AggregateFunctions/AggregateFunctionMax.cpp 6640377 45884458
[section .dynstr] 38266753 38266753
./src/Interpreters/HashJoin.cpp 3760601 35714954
./src/Interpreters/Aggregator.cpp 3730680 34616722
./src/AggregateFunctions/AggregateFunctionUniqCombined.cpp 5393994 32101843
./src/AggregateFunctions/AggregateFunctionAvgWeighted.cpp 5165493 31640382
./src/AggregateFunctions/AggregateFunctionStatisticsSimple.cpp 6239088 30171734
./src/AggregateFunctions/AggregateFunctionUniq.cpp 3541725 28680035
./src/AggregateFunctions/AggregateFunctionStatistics.cpp 5169543 26427110
./src/Interpreters/ActionsDAG.cpp 219076 23502928
./src/Storages/MergeTree/KeyCondition.cpp 126008 19541606
./src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.cpp 2681548 19281574
./src/Planner/Planner.cpp 59051 16236022
./src/Planner/PlannerJoins.cpp 40060 16084904
./src/Planner/PlannerJoinTree.cpp 51178 16039921
./contrib/hive-metastore/ThriftHiveMetastore.cpp 2556915 15995512
[section .eh_frame] 15656064 15656064
./src/AggregateFunctions/AggregateFunctionSparkbar.cpp 1801900 15486531
./src/Interpreters/castColumn.cpp 27915 15357640
./src/Storages/StorageReplicatedMergeTree.cpp 1145437 13526552
./src/Columns/ColumnVector.cpp 1582603 13425874
./src/Core/Settings.cpp 1331377 11051219
./src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.cpp 1463071 10388820
./src/Formats/ProtobufSerializer.cpp 571179 9525325
[section .gcc_except_table] 9466872 9466872
./src/Storages/MergeTree/MergeTreeData.cpp 670858 9271290
[section .symtab] 0 8849736
./src/AggregateFunctions/AggregateFunctionQuantile.cpp 1094793 8697684
./contrib/hive-metastore/hive_metastore_types.cpp 1262919 8545635
./src/Core/SettingsEnums.cpp 307120 8250957
./src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp 939254 8015627
./contrib/NuRaft/src/asio_service.cxx 680502 7699920
./src/AggregateFunctions/AggregateFunctionAny.cpp 1000265 7280299
./src/Columns/ColumnDecimal.cpp 806743 7277378
./src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp 897465 7270971
./src/AggregateFunctions/AggregateFunctionGroupArrayMoving.cpp 1169946 7142681
./src/Interpreters/InterpreterSelectQuery.cpp 322195 7088294
./src/Interpreters/Context.cpp 400123 6891480
./contrib/libunwind/src/libunwind.cpp 42849 6649674
./src/AggregateFunctions/AggregateFunctionSum.cpp 935631 6490650
./src/AggregateFunctions/AggregateFunctionSequenceMatch.cpp 464328 6480659
./src/AggregateFunctions/AggregateFunctionTopK.cpp 583102 6290272

Full table: clickhouse.sizeinfo.csv
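
To see how much each source directory contributes, the per-file rows can be grouped by directory prefix. The following is a minimal sketch, assuming the CSV has columns named `compileunits`, `vmsize` and `filesize`; `size_by_directory` and its `depth` parameter are hypothetical names, and the sample rows are copied from the listing in this comment.

```python
import csv
import io
from collections import defaultdict


def size_by_directory(csv_text, depth=2):
    """Sum vmsize and filesize per directory prefix of each compile unit.

    Rows that are not file paths (e.g. "[section .text]") are kept
    under their own name.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        unit = row["compileunits"]
        if unit.startswith("["):
            key = unit
        else:
            parts = [p for p in unit.split("/") if p not in ("", ".")]
            # Keep at most `depth` leading directories, never the file name.
            key = "/".join(parts[:-1][:depth])
        totals[key][0] += int(row["vmsize"])
        totals[key][1] += int(row["filesize"])
    # Largest file size first.
    return sorted(
        ((key, vm, fs) for key, (vm, fs) in totals.items()),
        key=lambda item: item[2],
        reverse=True,
    )


# Sample rows taken from the listing (assumed CSV layout).
SAMPLE = """compileunits,vmsize,filesize
[section .text],151282685,151282685
./src/AggregateFunctions/AggregateFunctionSumMap.cpp,7787534,48880276
./src/AggregateFunctions/AggregateFunctionMin.cpp,6657137,45979726
./src/Interpreters/HashJoin.cpp,3760601,35714954
./src/Storages/MergeTree/KeyCondition.cpp,126008,19541606
"""

for name, vmsize, filesize in size_by_directory(SAMPLE):
    print(f"{name:30} {vmsize:>12} {filesize:>12}")
```

Running the same function over the full CSV would give one total per directory, which is easier to compare than individual files.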

@lmangani
Contributor

lmangani commented Jul 4, 2023

@alexey-milovidov any input or comments on this intent and related opportunities from the ClickHouse team?

@alexey-milovidov

We can easily disable particular storages during the build. First try doing it manually and check the difference in binary size; then we can introduce some flags for it.

Disabling functions is as easy as removing a .cpp file.

We can also benefit from removing the dynamic symbol table: ClickHouse/ClickHouse#47475
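
Before removing files by hand, a rough estimate of the possible savings can be read off the bloaty numbers posted earlier in this thread. The following is a minimal sketch, assuming per-compile-unit sizes in bytes; `estimate_savings` and the glob patterns are hypothetical and purely illustrative. Whether the matched files can actually be dropped without breaking the build is not established here, and the real reduction may differ because code shared with other units can remain in the binary.

```python
from fnmatch import fnmatchcase

# (vmsize, filesize) in bytes, copied from the bloaty listing in this thread.
SIZES = {
    "./src/Storages/MergeTree/KeyCondition.cpp": (126008, 19541606),
    "./src/Storages/StorageReplicatedMergeTree.cpp": (1145437, 13526552),
    "./src/Storages/MergeTree/MergeTreeData.cpp": (670858, 9271290),
    "./src/AggregateFunctions/AggregateFunctionSparkbar.cpp": (1801900, 15486531),
    "./src/Interpreters/HashJoin.cpp": (3760601, 35714954),
}


def estimate_savings(sizes, patterns):
    """Return the matched compile units and their summed vmsize/filesize."""
    matched = sorted(
        unit for unit in sizes
        if any(fnmatchcase(unit, pattern) for pattern in patterns)
    )
    vmsize = sum(sizes[unit][0] for unit in matched)
    filesize = sum(sizes[unit][1] for unit in matched)
    return matched, vmsize, filesize


# Hypothetical candidates: MergeTree-related storages and one aggregate function.
matched, vmsize, filesize = estimate_savings(
    SIZES, ["*/Storages/*MergeTree*", "*Sparkbar*"]
)
print(f"{len(matched)} units, vmsize {vmsize} bytes, filesize {filesize} bytes")
```

The result is only a starting point for choosing what to try disabling first; the measured difference in the rebuilt binary is what counts.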

@zhanglistar

Simply running strip can reduce the file size by about 100MB.

@lmangani
Contributor

@zhanglistar true, but we already strip libchdb.so, which brings it to about 380MB uncompressed and about 100MB compressed.
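
To compare figures like these across builds, it helps to measure both sizes the same way each time. The following is a minimal sketch, assuming zlib (gzip-style) compression as a stand-in, since the compressor behind the compressed figure is not stated in this thread; `size_report` is a hypothetical helper.

```python
import os
import sys
import zlib


def size_report(path, chunk_size=1 << 20, level=6):
    """Return (uncompressed_bytes, zlib_compressed_bytes) for a file.

    The file is streamed in chunks so large shared libraries do not
    have to fit in memory.
    """
    compressor = zlib.compressobj(level)
    compressed = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            compressed += len(compressor.compress(chunk))
    compressed += len(compressor.flush())
    return os.path.getsize(path), compressed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python size_report.py libchdb.so
    raw, packed = size_report(sys.argv[1])
    print(f"uncompressed: {raw / 1e6:.1f} MB, compressed: {packed / 1e6:.1f} MB")
```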

@zhanglistar

Also, disabling some third-party libraries, such as Hive or MySQL, can reduce the size.

@auxten
Member

auxten commented Jul 13, 2023

> Also, disabling some third party libraries can reduce size, like hive or mysql etc.

Thanks, I have disabled the unnecessary libs for chDB. Hive and MySQL might be useful for chDB users.

@auxten
Member

auxten commented Mar 11, 2024

@nmreadelf is working on that

@auxten added the Minify label Mar 11, 2024