Skip to content

Conversation

@zhangwenchao-123
Copy link

  • Add the module name, JIRA# to PR/commit and description.
  • Add tests for the change.

The following two operator delete functions doesn't lookup in madlib
library. Because it's not added in the library script file.

void operator delete  (void *ptr, std::size_t sz) noexcept;
void operator delete[](void *ptr, std::size_t sz) noexcept;

The two functions are missing previously.
set(_PG_CONFIG_VERSION_MACRO "GP_VERSION")
set(_SEARCH_PATH_HINTS
"/usr/local/cloudberry-db-devel/bin"
"/usr/local/cloudberry-db/bin"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to add /usr/local/cbdb/bin?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we used this path?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added path /usr/local/cloudberry/bin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the line just after this "$ENV{GPHOME}/bin" will help catch most scenarios. Users will be sourcing cloudberry-env.sh (Cloudberry 3+) or greenplum_path.sh (Cloudberry 2 ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool idea!

Comment on lines 27 to 29
if(_PG_CONFIG_HEADER_CONTENTS MATCHES "#define SERVERLESS 1")
message("-- Detected Hashdata Cloud (Cloudberry Serverless)")
set(CLOUDBERRY_SERVERLESS TRUE PARENT_SCOPE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these lines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still no Cloudberry 3.0 yet. So can remove this file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache MADlib should be able to build against both the REL_2_STABLE and main (3.0.0) branches. I believe it is better to keep support for Cloudberry 3.0. As main (3.0) has not released, maybe support for 3.0 can be labelled as experimental.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with ed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to add the standard Apache license header to the new files, including FindCloudberry.cmake, and FindCloudberry_1.cmake and other files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@edespino edespino self-requested a review October 22, 2025 01:23
Copy link
Contributor

@edespino edespino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ What's Missing (Critical Issues)

1. No main CMakeLists.txt for Cloudberry:
   - src/ports/cloudberry/CMakeLists.txt doesn't exist
   - This file should mirror the structure of
     src/ports/greenplum/CMakeLists.txt (13KB, ~300 lines)
   - Should define: port configuration, source files, SQL handling,
     build functions, and version management

2. Not integrated into the build system:
   - src/ports/CMakeLists.txt only contains:
       add_subdirectory(postgres)
       add_subdirectory(greenplum)
   - Missing: add_subdirectory(cloudberry)
3. No CloudberryUtils.cmake:
   - Greenplum has GreenplumUtils.cmake with utility functions
   - May need similar utilities for Cloudberry-specific features

🔍 Current State

CMake configuration completed but:
- Cloudberry was NOT detected (the FindCloudberry code was never executed)
- Only PostgreSQL and Greenplum detection ran
- Build directory shows only postgres/ and greenplum/ subdirectories

However, there IS a Cloudberry installation:
- Location: /usr/local/cloudberry/
- Version: Based on PostgreSQL 14.4 with GP_VERSION_NUM 30000 (Cloudberry v3.0.0)
- This matches the src/ports/cloudberry/3/ directory structure

📊 Summary

The Cloudberry port is partially implemented. The detection logic and
version-specific configs exist, but they're not wired into the build
system.

To complete the implementation, you would need:

1. Create src/ports/cloudberry/CMakeLists.txt (modeled after Greenplum's)
2. Add add_subdirectory(cloudberry) to src/ports/CMakeLists.txt
3. Potentially create CloudberryUtils.cmake for Cloudberry-specific features
4. Test the full build process with Cloudberry detection

@edespino
Copy link
Contributor

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

@edespino
Copy link
Contributor

As @tuhaihe mentioned about ASF headers, when I ran the Apache Release Audit Tool (RAT), the following is seen (run the following in the root of the MADlib source: mvn apache-rat:check):

❯ head -30 target/rat.txt

*****************************************************
Summary
-------
Generated at: 2025-10-21T18:46:08-07:00
Notes: 4
Binaries: 5
Archives: 0
Standards: 311

Apache Licensed: 307
Generated Documents: 0

JavaDocs are generated and so license header is optional
Generated files do not required license headers

4 Unknown Licenses

*******************************

Unapproved licenses:

  src/ports/cloudberry/cmake/FindCloudberry.cmake
  src/ports/cloudberry/cmake/FindCloudberry_1.cmake
  src/ports/cloudberry/cmake/FindCloudberry_2.cmake
  src/ports/cloudberry/cmake/FindCloudberry_3.cmake

*******************************

@zhangwenchao-123
Copy link
Author

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

No, have not. Should we update this website?

@tuhaihe
Copy link
Member

tuhaihe commented Oct 22, 2025

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

No, have not. Should we update this website?

Yes, we should update the related description on the website. I’d like to help with this.

@zhangwenchao-123
Copy link
Author

Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) other source documentation files? We will need to review these as well.

No, have not. Should we update this website?

Yes, we should update the related description on the website. I’d like to help with this.

Nice!

@zhangwenchao-123
Copy link
Author

❌ What's Missing (Critical Issues)

1. No main CMakeLists.txt for Cloudberry:
   - src/ports/cloudberry/CMakeLists.txt doesn't exist
   - This file should mirror the structure of
     src/ports/greenplum/CMakeLists.txt (13KB, ~300 lines)
   - Should define: port configuration, source files, SQL handling,
     build functions, and version management

2. Not integrated into the build system:
   - src/ports/CMakeLists.txt only contains:
       add_subdirectory(postgres)
       add_subdirectory(greenplum)
   - Missing: add_subdirectory(cloudberry)
3. No CloudberryUtils.cmake:
   - Greenplum has GreenplumUtils.cmake with utility functions
   - May need similar utilities for Cloudberry-specific features

🔍 Current State

CMake configuration completed but:
- Cloudberry was NOT detected (the FindCloudberry code was never executed)
- Only PostgreSQL and Greenplum detection ran
- Build directory shows only postgres/ and greenplum/ subdirectories

However, there IS a Cloudberry installation:
- Location: /usr/local/cloudberry/
- Version: Based on PostgreSQL 14.4 with GP_VERSION_NUM 30000 (Cloudberry v3.0.0)
- This matches the src/ports/cloudberry/3/ directory structure

📊 Summary

The Cloudberry port is partially implemented. The detection logic and
version-specific configs exist, but they're not wired into the build
system.

To complete the implementation, you would need:

1. Create src/ports/cloudberry/CMakeLists.txt (modeled after Greenplum's)
2. Add add_subdirectory(cloudberry) to src/ports/CMakeLists.txt
3. Potentially create CloudberryUtils.cmake for Cloudberry-specific features
4. Test the full build process with Cloudberry detection

Have fixed all mentioned problems and license lose

@edespino
Copy link
Contributor

PR Review: Cloudberry MADlib Build Issues

CMake Configuration Command

cmake \
    -DCLOUDBERRY_3_PG_CONFIG=/usr/local/cloudberry/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_3_EXECUTABLE=/usr/local/cloudberry/bin/postgres \
    ..

CMake Configuration Error

Error:
CMake Error at src/CMakeLists.txt:202 (add_library):
  Cannot find source file:

    /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/Compatibility.hpp

Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
.hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc

CMake Error at src/CMakeLists.txt:202 (add_library):
No SOURCES given to target: madlib_cloudberry_3

CMake Generate step failed. Build files cannot be regenerated correctly.

Location: Referenced in src/ports/cloudberry/CMakeLists.txt:61

Observation: The directory /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/ does not exist, while the equivalent Greenplum
directory does exist at /home/cbadmin/bom-parts/madlib/src/ports/greenplum/dbconnector/ containing:

  • Compatibility.hpp
  • dbconnector.hpp

Additional Build Errors (After Manual Directory Creation)

After manually creating the missing directory and copying files from Greenplum, cmake succeeded but compilation fails with multiple errors in
Compatibility.hpp:

  1. AggState API change: aggcontext member doesn't exist (suggests aggcontexts)
  2. WindowState renamed: T_WindowState not declared (suggests T_WindowAggState)
  3. Missing function: format_procedure not declared
  4. Function conflict: Ambiguous AggCheckCallContext - both the compatibility shim and PostgreSQL's native version exist

These errors indicate API differences between Greenplum's PostgreSQL base and Cloudberry's PostgreSQL base.

@edespino
Copy link
Contributor

@zhangwenchao-123 - Unless absolutely necessary, there is no need to force push additional PR commits. This will allow us to view the PR history easily.

@zhangwenchao-123 zhangwenchao-123 force-pushed the support_cloudberry branch 2 times, most recently from 00de02c to 1aad3dd Compare October 22, 2025 06:57
Fix SEGFAULT memory bugs

There're weird SEGFAULT bug due to custom allocation erroneously paired with std::free (should be custom free) and we're unable to solve them. This is a workaround.
@zhangwenchao-123
Copy link
Author

zhangwenchao-123 commented Oct 23, 2025

PR Review: Cloudberry MADlib Build Issues

CMake Configuration Command

cmake \
    -DCLOUDBERRY_3_PG_CONFIG=/usr/local/cloudberry/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_3_EXECUTABLE=/usr/local/cloudberry/bin/postgres \
    ..

CMake Configuration Error

Error:
CMake Error at src/CMakeLists.txt:202 (add_library):
  Cannot find source file:

    /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/Compatibility.hpp

Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc

CMake Error at src/CMakeLists.txt:202 (add_library): No SOURCES given to target: madlib_cloudberry_3

CMake Generate step failed. Build files cannot be regenerated correctly.

Location: Referenced in src/ports/cloudberry/CMakeLists.txt:61

Observation: The directory /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/ does not exist, while the equivalent Greenplum directory does exist at /home/cbadmin/bom-parts/madlib/src/ports/greenplum/dbconnector/ containing:

  • Compatibility.hpp
  • dbconnector.hpp

Additional Build Errors (After Manual Directory Creation)

After manually creating the missing directory and copying files from Greenplum, cmake succeeded but compilation fails with multiple errors in Compatibility.hpp:

  1. AggState API change: aggcontext member doesn't exist (suggests aggcontexts)
  2. WindowState renamed: T_WindowState not declared (suggests T_WindowAggState)
  3. Missing function: format_procedure not declared
  4. Function conflict: Ambiguous AggCheckCallContext - both the compatibility shim and PostgreSQL's native version exist

These errors indicate API differences between Greenplum's PostgreSQL base and Cloudberry's PostgreSQL base.

Yeah, there are some other commits not picked, I will continue to complete this PR and test it.

Copy link
Contributor

@edespino edespino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few changes to consider. I have more testing of this PR to perform.

Please do not force push changes to this PR. I want to be able to follow the history of this work. Force pushing is not helping.

name: Greenplum DB

cloudberry:
name: Cloudberry DB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloudberry DB should be Apache Cloudberry

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

)
set(${PKG_NAME}_ADDITIONAL_INCLUDE_DIRS
"${${PKG_NAME}_ADDITIONAL_INCLUDE_DIRS}/internal")
message("-- Detected Cloudberry")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message("-- Detected Cloudberry") should be message("-- Detected Apache Cloudberry")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

# only need the first two digits for <= 4.3.4
dbver = '.'.join(map(str, dbver_split[:2]))
elif portid == 'cloudberry':
# Assume Cloudberry will stick to semantic versioning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume Cloudberry will stick to semantic versioning should be Assume Apache Cloudberry will stick to semantic versioning

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this symlink needed? I believe we should only be providing support for the Apache Cloudberry 2 & 3 (future) releases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# 4.3.5+ from versions < 4.3.5
match = re.search("Greenplum[a-zA-Z\s]*(\d+\.\d+\.\d+)", versionStr)
elif portid == 'cloudberry':
match = re.search("Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)", versionStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" should be "Apache Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" ?

I am not entirely sure about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloudberry is enough to achieve our goal, while Apache Cloudberry is more accurate that maybe is better.

requirements.txt Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this empty file needed?

I noticed this when I ran the Apache Release Audit tool (mvn apache-rat:check).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In apache cloudberry, it's not needed. I will remove it.

# implying we only need 1 folder for same major versions
set(VERSION ${${PORT_UC}_VERSION_MAJOR})
elseif(${PORT_UC} STREQUAL "CLOUDBERRY")
# Assumes CBDB always follows semantic versioning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Assumes CBDB always follows semantic versioning
# Assumes Apache Cloudberry always follows semantic versioning

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fiexd

libdir = libdir.decode()

libdir = libdir.strip()+'/postgresql'
libdir = str(libdir.strip(), encoding='utf-8')+'/postgresql'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing Note: Encountering TypeError: decoding str is not supported when installing on PostgreSQL 14.19.

Root Cause:
For Postgres 13+ (line 1347), libdir is already decoded to a string via .decode(), but line 1349 attempts to decode it again with
str(libdir.strip(), encoding='utf-8'), which fails because you cannot decode a string that's already been decoded.

Recommended Solution:
Ensure libdir is always decoded to a string before line 1349, then simply strip and append the path:

libdir = subprocess.check_output(['pg_config','--libdir'])
if ((portid == 'greenplum' and is_rev_gte(dbver_split, get_rev_num('7.0'))) or
    (portid == 'postgres' and is_rev_gte(dbver_split, get_rev_num('13.0')))):
    libdir = libdir.decode()
else:
    libdir = libdir.decode('utf-8')

libdir = libdir.strip() + '/postgresql'

This ensures libdir is consistently a string for all code paths (older and newer versions), eliminating the type inconsistency that causes the
error.

Request for Review: Please validate this fix works correctly for both:
- Older versions (Postgres <13, Greenplum <7) where subprocess.check_output() returns bytes
- Newer versions (Postgres 13+, Greenplum 7+) where explicit decoding is needed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

@zhangwenchao-123
Copy link
Author

All mentioned comments have been fixed and have tested it in cloudberry 3.0.

@tuhaihe
Copy link
Member

tuhaihe commented Oct 29, 2025

Hi @zhangwenchao-123 could you rebase your commits on the latest madlib2-master? Let's see if the CI can pass successfully. Thanks!

@zhangwenchao-123
Copy link
Author

Hi @zhangwenchao-123 could you rebase your commits on the latest madlib2-master? Let's see if the CI can pass successfully. Thanks!

It's the NOTICE file check failed, I have fixed and test whether it can pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed these two files, FindCloudberry_2.cmake & FindCloudberry_3.cmake, are all symbolic links to FindCloudberry.cmake. Should we create them like GP / PG as ASCII text files? FYI.

@tuhaihe
Copy link
Member

tuhaihe commented Oct 31, 2025

Based on the new codebase, I can build and deploy the MADlib into the Cloudberry 2.0 + 3.0 (main) gpdemo database:

  1. Build the Cloudberry gpdemo env following the docs

  2. Build and deploy the MADlib

## Download this PR change
git clone https://github.com/apache/madlib.git
cd madlib
git fetch origin pull/627/head:zhangwenchao-123/support_cloudberry
git switch zhangwenchao-123/support_cloudberry


## Set Python env
sudo alternatives --install /usr/bin/python python /usr/bin/python3 1

## Install required depencies to the Cloudberry Dev container
sudo dnf install boost-devel -y
sudo dnf install -y graphviz # for docs
sudo dnf install --enablerepo=crb doxygen -y # for docs

## 
cd ~/madlib
mkdir build ; cd build

## for Cloudberry 3.0
cmake \
    -DCLOUDBERRY_3_PG_CONFIG=$GPHOME/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_3_EXECUTABLE=$GPHOME/bin/postgres \
    ..

## for Cloudberry 2.0
cmake \
    -DCLOUDBERRY_2_PG_CONFIG=$GPHOME/bin/pg_config \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
    -DCLOUDBERRY_2_EXECUTABLE=$GPHOME/bin/postgres \
    ..

## Make, deploy, and run test
make -j$(nproc)
./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install
./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install-check

If something wrong, please help correct me. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants