-
Notifications
You must be signed in to change notification settings - Fork 153
Support cloudberry #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: madlib2-master
Are you sure you want to change the base?
Support cloudberry #627
Conversation
zhangwenchao-123
commented
Oct 21, 2025
- Add the module name, JIRA# to PR/commit and description.
- Add tests for the change.
The following two operator delete functions doesn't lookup in madlib library. Because it's not added in the library script file. void operator delete (void *ptr, std::size_t sz) noexcept; void operator delete[](void *ptr, std::size_t sz) noexcept; The two functions are missing previously.
| set(_PG_CONFIG_VERSION_MACRO "GP_VERSION") | ||
| set(_SEARCH_PATH_HINTS | ||
| "/usr/local/cloudberry-db-devel/bin" | ||
| "/usr/local/cloudberry-db/bin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to add /usr/local/cbdb/bin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we used this path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added path /usr/local/cloudberry/bin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the line just after this "$ENV{GPHOME}/bin" will help catch most scenarios. Users will be sourcing cloudberry-env.sh (Cloudberry 3+) or greenplum_path.sh (Cloudberry 2 ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool idea!
| if(_PG_CONFIG_HEADER_CONTENTS MATCHES "#define SERVERLESS 1") | ||
| message("-- Detected Hashdata Cloud (Cloudberry Serverless)") | ||
| set(CLOUDBERRY_SERVERLESS TRUE PARENT_SCOPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still no Cloudberry 3.0 yet. So can remove this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache MADlib should be able to build against both the REL_2_STABLE and main (3.0.0) branches. I believe it is better to keep support for Cloudberry 3.0. As main (3.0) has not released, maybe support for 3.0 can be labelled as experimental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with ed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to add the standard Apache license header to the new files, including FindCloudberry.cmake, and FindCloudberry_1.cmake and other files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
edespino
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ What's Missing (Critical Issues)
1. No main CMakeLists.txt for Cloudberry:
- src/ports/cloudberry/CMakeLists.txt doesn't exist
- This file should mirror the structure of
src/ports/greenplum/CMakeLists.txt (13KB, ~300 lines)
- Should define: port configuration, source files, SQL handling,
build functions, and version management
2. Not integrated into the build system:
- src/ports/CMakeLists.txt only contains:
add_subdirectory(postgres)
add_subdirectory(greenplum)
- Missing: add_subdirectory(cloudberry)
3. No CloudberryUtils.cmake:
- Greenplum has GreenplumUtils.cmake with utility functions
- May need similar utilities for Cloudberry-specific features
🔍 Current State
CMake configuration completed but:
- Cloudberry was NOT detected (the FindCloudberry code was never executed)
- Only PostgreSQL and Greenplum detection ran
- Build directory shows only postgres/ and greenplum/ subdirectories
However, there IS a Cloudberry installation:
- Location: /usr/local/cloudberry/
- Version: Based on PostgreSQL 14.4 with GP_VERSION_NUM 30000 (Cloudberry v3.0.0)
- This matches the src/ports/cloudberry/3/ directory structure
📊 Summary
The Cloudberry port is partially implemented. The detection logic and
version-specific configs exist, but they're not wired into the build
system.
To complete the implementation, you would need:
1. Create src/ports/cloudberry/CMakeLists.txt (modeled after Greenplum's)
2. Add add_subdirectory(cloudberry) to src/ports/CMakeLists.txt
3. Potentially create CloudberryUtils.cmake for Cloudberry-specific features
4. Test the full build process with Cloudberry detection
|
Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well. |
|
As @tuhaihe mentioned about ASF headers, when I ran the Apache Release Audit Tool (RAT), the following is seen (run the following in the root of the MADlib source: mvn apache-rat:check): |
No, have not. Should we update this website? |
Yes, we should update the related description on the website. I’d like to help with this. |
Nice! |
00d24da to
e6d3c42
Compare
Have fixed all mentioned problems and license lose |
|
PR Review: Cloudberry MADlib Build Issues CMake Configuration Command CMake Configuration Error Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h CMake Error at src/CMakeLists.txt:202 (add_library): CMake Generate step failed. Build files cannot be regenerated correctly. Location: Referenced in src/ports/cloudberry/CMakeLists.txt:61 Observation: The directory /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/ does not exist, while the equivalent Greenplum
Additional Build Errors (After Manual Directory Creation) After manually creating the missing directory and copying files from Greenplum, cmake succeeded but compilation fails with multiple errors in
These errors indicate API differences between Greenplum's PostgreSQL base and Cloudberry's PostgreSQL base. |
|
@zhangwenchao-123 - Unless absolutely necessary, there is no need to force push additional PR commits. This will allow us to view the PR history easily. |
00de02c to
1aad3dd
Compare
Fix SEGFAULT memory bugs There're weird SEGFAULT bug due to custom allocation erroneously paired with std::free (should be custom free) and we're unable to solve them. This is a workaround.
6b73eaa to
c725ce5
Compare
Yeah, there are some other commits not picked, I will continue to complete this PR and test it. |
edespino
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few changes to consider. I have more testing of this PR to perform.
Please do not force push changes to this PR. I want to be able to follow the history of this work. Force pushing is not helping.
src/config/Ports.yml
Outdated
| name: Greenplum DB | ||
|
|
||
| cloudberry: | ||
| name: Cloudberry DB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloudberry DB should be Apache Cloudberry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| ) | ||
| set(${PKG_NAME}_ADDITIONAL_INCLUDE_DIRS | ||
| "${${PKG_NAME}_ADDITIONAL_INCLUDE_DIRS}/internal") | ||
| message("-- Detected Cloudberry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message("-- Detected Cloudberry") should be message("-- Detected Apache Cloudberry")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
src/madpack/madpack.py
Outdated
| # only need the first two digits for <= 4.3.4 | ||
| dbver = '.'.join(map(str, dbver_split[:2])) | ||
| elif portid == 'cloudberry': | ||
| # Assume Cloudberry will stick to semantic versioning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume Cloudberry will stick to semantic versioning should be Assume Apache Cloudberry will stick to semantic versioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this symlink needed? I believe we should only be providing support for the Apache Cloudberry 2 & 3 (future) releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/madpack/utilities.py
Outdated
| # 4.3.5+ from versions < 4.3.5 | ||
| match = re.search("Greenplum[a-zA-Z\s]*(\d+\.\d+\.\d+)", versionStr) | ||
| elif portid == 'cloudberry': | ||
| match = re.search("Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)", versionStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" should be "Apache Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" ?
I am not entirely sure about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloudberry is enough to achieve our goal, while Apache Cloudberry is more accurate that maybe is better.
requirements.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this empty file needed?
I noticed this when I ran the Apache Release Audit tool (mvn apache-rat:check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In apache cloudberry, it's not needed. I will remove it.
| # implying we only need 1 folder for same major versions | ||
| set(VERSION ${${PORT_UC}_VERSION_MAJOR}) | ||
| elseif(${PORT_UC} STREQUAL "CLOUDBERRY") | ||
| # Assumes CBDB always follows semantic versioning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Assumes CBDB always follows semantic versioning | |
| # Assumes Apache Cloudberry always follows semantic versioning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fiexd
src/madpack/madpack.py
Outdated
| libdir = libdir.decode() | ||
|
|
||
| libdir = libdir.strip()+'/postgresql' | ||
| libdir = str(libdir.strip(), encoding='utf-8')+'/postgresql' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing Note: Encountering TypeError: decoding str is not supported when installing on PostgreSQL 14.19.
Root Cause:
For Postgres 13+ (line 1347), libdir is already decoded to a string via .decode(), but line 1349 attempts to decode it again with
str(libdir.strip(), encoding='utf-8'), which fails because you cannot decode a string that's already been decoded.
Recommended Solution:
Ensure libdir is always decoded to a string before line 1349, then simply strip and append the path:
libdir = subprocess.check_output(['pg_config','--libdir'])
if ((portid == 'greenplum' and is_rev_gte(dbver_split, get_rev_num('7.0'))) or
(portid == 'postgres' and is_rev_gte(dbver_split, get_rev_num('13.0')))):
libdir = libdir.decode()
else:
libdir = libdir.decode('utf-8')
libdir = libdir.strip() + '/postgresql'
This ensures libdir is consistently a string for all code paths (older and newer versions), eliminating the type inconsistency that causes the
error.
Request for Review: Please validate this fix works correctly for both:
- Older versions (Postgres <13, Greenplum <7) where subprocess.check_output() returns bytes
- Newer versions (Postgres 13+, Greenplum 7+) where explicit decoding is neededThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool!
|
All mentioned comments have been fixed and have tested it in cloudberry 3.0. |
|
Hi @zhangwenchao-123 could you rebase your commits on the latest |
It's the NOTICE file check failed, I have fixed and test whether it can pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed these two files, FindCloudberry_2.cmake & FindCloudberry_3.cmake, are all symbolic links to FindCloudberry.cmake. Should we create them like GP / PG as ASCII text files? FYI.
|
Based on the new codebase, I can build and deploy the MADlib into the Cloudberry 2.0 + 3.0 (main) gpdemo database:
If something wrong, please help correct me. Thanks! |