# HypergraphPercol Colab build

This notebook reproduces the multi-stage Docker build pipeline inside a Google Colab runtime so that the `HypergraphPercol` package and its CGAL helpers are available directly from a Colab session.

> **Execution order**
> Run the cells sequentially from top to bottom in a fresh Colab runtime. The build relies on system packages, so restarting midway may require rerunning the earlier cells.

## 1. Install system dependencies

The Dockerfile installs a series of Ubuntu packages that provide CGAL, Boost, Eigen and a modern build toolchain. We replicate the same setup here.

In [1]:
%%bash
# set -euo pipefail
# apt-get update
# DEBIAN_FRONTEND=noninteractive apt-get install -y     build-essential     cmake     git     libtbb-dev     libcgal-dev     libboost-all-dev     libeigen3-dev

# Colab: system packages for CGAL/TBB/CMake
apt-get update -qq
apt-get install -y -qq build-essential cmake libcgal-dev libtbb-dev libtbbmalloc2 \
    libgmp-dev libmpfr-dev libeigen3-dev

Selecting previously unselected package libgmpxx4ldbl:amd64.
(Reading database ... 126281 files and directories currently installed.)
Preparing to unpack .../libgmpxx4ldbl_2%3a6.2.1+dfsg-3ubuntu1_amd64.deb ...
Unpacking libgmpxx4ldbl:amd64 (2:6.2.1+dfsg-3ubuntu1) ...
Selecting previously unselected package libgmp-dev:amd64.
Preparing to unpack .../libgmp-dev_2%3a6.2.1+dfsg-3ubuntu1_amd64.deb ...
Unpacking libgmp-dev:amd64 (2:6.2.1+dfsg-3ubuntu1) .

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
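
As a quick sanity check after the install, the sketch below looks up the build tools on `PATH`. The helper name `missing_tools` and the list of tools checked are illustrative assumptions, not part of the Docker workflow.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools that the later clone and build cells invoke; adjust as needed.
    missing = missing_tools(["cmake", "g++", "make", "git"])
    print("Missing tools:", missing or "none")
```

An empty result only shows that the executables are reachable; it does not verify that the CGAL, GMP or MPFR headers are installed.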


## 2. Upgrade `pip` and core Python build tooling

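The compiled extensions require an up-to-date Python build stack, including Cython. We also pre-install `cmake` through `pip` to match the Docker workflow. The original cell additionally force-reinstalled NumPy 2.1.3 to avoid mixed-ABI issues on Colab; in the run recorded here that step is commented out, as is an alternative pin set built around NumPy 2.0.x, so the NumPy already present in the runtime is left in place.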

In [2]:
# %%bash
# set -euo pipefail
# python3 -m pip install --upgrade pip setuptools wheel "Cython>=3.0" cmake jedi
# python3 -m pip install --no-cache-dir --force-reinstall "numpy==2.1.3"
!pip install -q --upgrade pip setuptools wheel Cython cmake jedi
# Stable choice: NumPy 2.0.x to stay compatible with numba 0.60 and hdbscan
# pip install -q "numpy==2.0.2" "scikit-learn==1.7.2" "numba==0.60.*" "hdbscan==0.8.40" "gudhi==3.11.0" joblib threadpoolctl




In [3]:
# import os, signal
# os.kill(os.getpid(), signal.SIGKILL)


## 3. Clone the required repositories

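We fetch both the main HypergraphPercol sources and the `cyminiball` dependency from the repositories used by the Docker build. The cell clones the default branch of each repository, or fast-forwards an existing checkout, so no specific revision is pinned.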

In [4]:
%%bash
set -euo pipefail
WORKDIR="${HGP_WORKDIR:-/content}"
mkdir -p "${WORKDIR}"
cd "${WORKDIR}"
if [ -d HypergraphPercol ]; then
    git -C HypergraphPercol pull --ff-only
else
    git clone https://github.com/Ludwig-H/HypergraphPercol.git
fi
if [ -d cyminiball ]; then
    git -C cyminiball pull --ff-only
else
    git clone https://github.com/Ludwig-H/cyminiball.git
fi


Cloning into 'HypergraphPercol'...
Cloning into 'cyminiball'...
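
Because the clone step follows the default branch, it can be useful to record which commit each checkout ended up on. The following is a minimal sketch under that assumption; `head_commit` is a hypothetical helper, and the paths assume the default `/content` working directory.

```python
import subprocess


def head_commit(repo_dir):
    """Return the commit hash checked out in `repo_dir`, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The git executable itself is not installed.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    for repo in ("/content/HypergraphPercol", "/content/cyminiball"):
        print(repo, "->", head_commit(repo))
```

Saving these hashes next to any results makes it possible to rebuild the same sources later with `git checkout <hash>`.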


## 4. Build and install `cyminiball`

The Docker image creates a wheel from source and installs it without build isolation. We mirror that approach so the same binary is present inside Colab.

In [5]:
%%bash
set -euo pipefail
WORKDIR="${HGP_WORKDIR:-/content}"
mkdir -p "${WORKDIR}/wheels"
cd "${WORKDIR}/cyminiball"
python3 -m pip wheel --no-build-isolation --no-deps --wheel-dir="${WORKDIR}/wheels" .
python3 -m pip install --force-reinstall --no-deps --no-index --find-links="${WORKDIR}/wheels" cyminiball
# python3 -m pip install --no-cache-dir --force-reinstall "numpy==2.1.3"


Processing /content/cyminiball
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: cyminiball
  Building wheel for cyminiball (pyproject.toml): started
  Building wheel for cyminiball (pyproject.toml): finished with status 'done'
  Created wheel for cyminiball: filename=cyminiball-2.1.1-cp311-cp311-linux_x86_64.whl size=736473 sha256=c54093fa1f985669856e1a10367af9db82f4e6e5d2183a5fdebdb303f257d8ea
  Stored in directory: /tmp/pip-ephem-wheel-cache-m9ztpsoa/wheels/21/27/f8/6e45a319e6b22be890a545c6e52d90f59618cbc5b7e9261c97
Successfully built cyminiball
Looking in links: /content/wheels
Processing /content/wheels/cyminiball-2.1.1-cp311-cp311-linux_x86_64.whl
Installing collected packages: cyminiball
Successfully installed cyminiball-2.1.1
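
The wheel filename in the log above encodes the interpreter and platform it was built for. The sketch below splits such a filename into its fields and compares the Python tag with the running interpreter; both function names are illustrative, and the comparison is a simple equality check that ignores compressed tag sets such as `py2.py3`.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five fields, or six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_running_interpreter(filename):
    """True if the wheel's Python tag equals this interpreter's CPython tag."""
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return parse_wheel_filename(filename)["python"] == tag


if __name__ == "__main__":
    name = "cyminiball-2.1.1-cp311-cp311-linux_x86_64.whl"
    print(parse_wheel_filename(name))
    print("Matches this interpreter:", matches_running_interpreter(name))
```

A mismatch here usually means the wheel was built with a different Python version than the one running the notebook and should be rebuilt.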


## 5. Download the CGAL helper projects

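The helper script `scripts/setup_cgal.py` clones the six CGAL-based helper projects required by HypergraphPercol into `CGALDelaunay/`: plain and weighted Delaunay variants in 2D, 3D and ND.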

In [6]:
%%bash
set -euo pipefail
WORKDIR="${HGP_WORKDIR:-/content}"
cd "${WORKDIR}/HypergraphPercol"
python3 scripts/setup_cgal.py


[clone] https://github.com/Ludwig-H/EdgesCGALWeightedDelaunay3D.git -> /content/HypergraphPercol/CGALDelaunay/EdgesCGALWeightedDelaunay3D
[clone] https://github.com/Ludwig-H/EdgesCGALWeightedDelaunay2D.git -> /content/HypergraphPercol/CGALDelaunay/EdgesCGALWeightedDelaunay2D
[clone] https://github.com/Ludwig-H/EdgesCGALWeightedDelaunayND.git -> /content/HypergraphPercol/CGALDelaunay/EdgesCGALWeightedDelaunayND
[clone] https://github.com/Ludwig-H/EdgesCGALDelaunay3D.git -> /content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunay3D
[clone] https://github.com/Ludwig-H/EdgesCGALDelaunay2D.git -> /content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunay2D
[clone] https://github.com/Ludwig-H/EdgesCGALDelaunayND.git -> /content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunayND


Cloning into '/content/HypergraphPercol/CGALDelaunay/EdgesCGALWeightedDelaunay3D'...
Cloning into '/content/HypergraphPercol/CGALDelaunay/EdgesCGALWeightedDelaunay2D'...
Cloning into '/content/HypergraphPercol/CGALDelaunay/EdgesCGALWeightedDelaunayND'...
Cloning into '/content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunay3D'...
Cloning into '/content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunay2D'...
Cloning into '/content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunayND'...
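
Before moving on to the build, one can confirm that all six checkouts are present. This is a small sketch using the project names from the log above; the helper `missing_projects` is an assumption and is not provided by the repository.

```python
import os

# Project directory names as they appear in the clone log.
HELPER_PROJECTS = (
    "EdgesCGALDelaunay2D",
    "EdgesCGALDelaunay3D",
    "EdgesCGALDelaunayND",
    "EdgesCGALWeightedDelaunay2D",
    "EdgesCGALWeightedDelaunay3D",
    "EdgesCGALWeightedDelaunayND",
)


def missing_projects(cgal_root):
    """Return the helper projects that have no directory under `cgal_root`."""
    return [
        name
        for name in HELPER_PROJECTS
        if not os.path.isdir(os.path.join(cgal_root, name))
    ]


if __name__ == "__main__":
    workdir = os.environ.get("HGP_WORKDIR", "/content")
    root = os.path.join(workdir, "HypergraphPercol", "CGALDelaunay")
    print("Missing helper projects:", missing_projects(root) or "none")
```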


## 6. Build the CGAL helper executables

Each of the six helper projects is configured with CMake in `Release` mode, built, and installed with the HypergraphPercol checkout as the install prefix.

In [7]:
%%bash
set -euo pipefail
WORKDIR="${HGP_WORKDIR:-/content}"
cd "${WORKDIR}/HypergraphPercol/CGALDelaunay"

projects=(
    EdgesCGALDelaunay2D
    EdgesCGALDelaunay3D
    EdgesCGALDelaunayND
    EdgesCGALWeightedDelaunay2D
    EdgesCGALWeightedDelaunay3D
    EdgesCGALWeightedDelaunayND
)

for project in "${projects[@]}"; do
    cmake -S "${project}" -B "${project}/build" -DCMAKE_BUILD_TYPE=Release
    cmake --build "${project}/build" --config Release
    cmake --install "${project}/build" --prefix "${WORKDIR}/HypergraphPercol"
done


-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using header-only CGAL
-- Targetting Unix Makefiles
-- Using /usr/bin/c++ compiler.
-- Found GMP: /usr/lib/x86_64-linux-gnu/libgmp.so
-- Found MPFR: /usr/lib/x86_64-linux-gnu/libmpfr.so
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found suitable version "1.74.0", minimum required is "1.48")
-- Boost include dirs: /usr/include
-- Boost libraries:    
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Using gcc version 4 or later. Adding -frounding-math
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /content/HypergraphPercol/CGALDelaunay/EdgesCGALDelaunay2D/build
[ 50%] Building CXX o

  CGAL_DATA_DIR cannot be deduced, set the variable CGAL_DATA_DIR to set the
  default value of CGAL::data_file_path()
Call Stack (most recent call first):
  CMakeLists.txt:6 (find_package)


  Policy CMP0167 is not set: The FindBoost module is removed.  Run "cmake
  --help-policy CMP0167" for policy details.  Use the cmake_policy command to

Call Stack (most recent call first):
  /usr/lib/x86_64-linux-gnu/cmake/CGAL/CGAL_SetupCGALDependencies.cmake:47 (include)
  /usr/lib/x86_64-linux-gnu/cmake/CGAL/CGALConfig.cmake:153 (include)
  CMakeLists.txt:6 (find_package)

  CGAL_DATA_DIR cannot be deduced, set the variable CGAL_DATA_DIR to set the
  default value of CGAL::data_file_path()
Call Stack (most recent call first):
  CMakeLists.txt:14 (find_package)


  CGAL_DATA_DIR cannot be deduced, set the variable CGAL_DATA_DIR to set the
  default value of CGAL::data_file_path()
Call Stack (most recent call first):
  CMakeLists.txt:12 (find_package)


  Policy CMP0167 is not set: The FindBoost
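
The configure, build and install sequence from the cell above can also be expressed in Python, which makes it easier to build a single project or to add options. The sketch below is an assumed equivalent of the bash loop, not code from the repository; the `jobs` parameter, which maps to CMake's `--parallel` option, is an addition that the original cell does not use.

```python
import subprocess


def cmake_commands(project, prefix, jobs=None):
    """Return the configure, build and install commands for one project."""
    build_dir = f"{project}/build"
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    if jobs:
        # Optional parallel build; not part of the original cell.
        build += ["--parallel", str(jobs)]
    return [
        ["cmake", "-S", project, "-B", build_dir, "-DCMAKE_BUILD_TYPE=Release"],
        build,
        ["cmake", "--install", build_dir, "--prefix", prefix],
    ]


def build_project(cgal_root, project, prefix, jobs=None):
    """Run the three commands from inside `cgal_root`, stopping on failure."""
    for command in cmake_commands(project, prefix, jobs=jobs):
        subprocess.run(command, cwd=cgal_root, check=True)
```

For example, `build_project("/content/HypergraphPercol/CGALDelaunay", "EdgesCGALDelaunay2D", "/content/HypergraphPercol", jobs=2)` would rebuild only the 2D helper.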

## 7. Install Python runtime dependencies

HypergraphPercol depends on scientific Python packages such as scikit-learn, HDBSCAN and GUDHI. Installing them before the package itself means the later `pip install --no-deps` step has nothing left to resolve, so the locally built `cyminiball` wheel is kept rather than rebuilt. Some of these packages may pull in a different NumPy, which is why the original cell reinstalls NumPy 2.1.3 right afterwards to keep the ABI aligned with the compiled extensions. In the run recorded here the cell is commented out, so the run relies on the packages already present in the runtime.

In [8]:
# %%bash
# set -euo pipefail
# python3 -m pip install --upgrade scikit-learn hdbscan gudhi joblib threadpoolctl
# python3 -m pip install --no-cache-dir --force-reinstall "numpy==2.1.3"
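
Whether or not the pins are applied, it helps to see which installed versions differ from the intended ones before building against them. The sketch below compares installed distributions with a dictionary of pins; `pin_mismatches` is a hypothetical helper, and the pin shown is the NumPy version named in the text above.

```python
from importlib import metadata


def pin_mismatches(pins):
    """Map each package that deviates from its pin to (expected, installed)."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print(pin_mismatches({"numpy": "2.1.3"}))
```

An empty dictionary means every pinned package is installed at exactly the requested version.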


## 8. Install HypergraphPercol from source

Finally, install the package so that it becomes importable inside the notebook runtime. Using `--no-deps` keeps the `cyminiball` wheel we built earlier instead of asking `pip` to recompile it.

In [9]:
%%bash
set -euo pipefail
WORKDIR="${HGP_WORKDIR:-/content}"
cd "${WORKDIR}/HypergraphPercol"
python3 -m pip install --no-deps --force-reinstall .


Processing /content/HypergraphPercol
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: hypergraphpercol
  Building wheel for hypergraphpercol (pyproject.toml): started
  Building wheel for hypergraphpercol (pyproject.toml): finished with status 'done'
  Created wheel for hypergraphpercol: filename=hypergraphpercol-0.1.0-cp311-cp311-linux_x86_64.whl size=682076 sha256=951c3e952afb8f76a28918b2c9a9a0a78df693c2ea8ade57af91a31fe22461f7
  Stored in directory: /tmp/pip-ephem-wheel-cache-f80i2je0/wheels/ce/3f/57/e100c6af9792b9d522941569c843abefaa87ccb546c540954f
Successfully built hypergraphpercol
Installing collected packages: hypergraphpercol
Successfully installed hypergraphpe
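
A lightweight way to confirm that the install step worked is to ask the import system whether the modules can be located, without actually importing them. This is a minimal sketch; `is_importable` is an illustrative helper name.

```python
import importlib.util


def is_importable(module_name):
    """True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("hypergraphpercol", "cyminiball"):
        print(f"{name}: {'found' if is_importable(name) else 'NOT found'}")
```

Locating a module does not prove that its compiled extension loads correctly; the validation cell in the next section covers that.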

## 9. Configure the runtime (optional) and validate the installation

The package defaults to `/content/HypergraphPercol/CGALDelaunay` when `cgal_root` is not provided. The following cell sets the `CGALDELAUNAY_ROOT` environment variable explicitly, passes the same path as `cgal_root`, and runs a small clustering job on two well-separated Gaussian blobs in 3D to confirm that everything works.

In [10]:
import os
import numpy as np

workdir = os.environ.get("HGP_WORKDIR", "/content")
repo_root = os.path.join(workdir, "HypergraphPercol")
os.environ["CGALDELAUNAY_ROOT"] = os.path.join(repo_root, "CGALDelaunay")

from hypergraphpercol import HypergraphPercol

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=-2.0, scale=0.4, size=(40, 3)),
    rng.normal(loc=2.0, scale=0.4, size=(40, 3)),
])
labels = HypergraphPercol(
    data,
    K=2,
    min_cluster_size=20,
    min_samples=None,
    metric="euclidean",
    complex_chosen="auto",
    expZ=2,
    precision="safe",
    verbeux=True,
    cgal_root=os.environ["CGALDELAUNAY_ROOT"],
)
print("Unique labels:", np.unique(labels))


orderk_delaunay k = 1
Computed weighted barycentres 492
orderk_delaunay k = 2
Simplexes sans filtration : 2539
N_CPU_dispo utilisés :  -1
2-simplices=2539
Faces uniques: 796 (compression 7617→796)
Arêtes uniques (u<v): 2470 (avant dédup 5078)
MST faces: 795 arêtes, composantes estimées: 1
[COMP-F:0] components=1, faces=796, points=80, RSS=247.6 MiB
[COMP-F:1] comp 1/1 | faces=796, edges=795, Z_est≈24.8 KiB, RSS=247.7 MiB
[PRUNE] leaves=796, classes=104, Z_pruned shape=(103, 4)
[COMP-F:2] comp 1/1 done | cumul points labellisés=78 | conflits=2 | RSS=256.5 MiB
Clusters finaux : 2 | bruit : 2 | RSS=256.5 MiB
Unique labels: [-1  0  1]
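
To go beyond the list of unique labels, the sketch below counts the points per cluster. It assumes that `-1` marks noise, which is consistent with the output above (two noise points and a `-1` label) but is not stated explicitly by the source; `summarize_labels` is an illustrative helper.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Count clusters, noise points and per-cluster sizes in a label vector."""
    counts = Counter(int(label) for label in labels)
    # Assumption: `noise_label` marks unclustered points.
    noise = counts.pop(noise_label, 0)
    return {
        "clusters": len(counts),
        "noise": noise,
        "sizes": dict(sorted(counts.items())),
    }


if __name__ == "__main__":
    print(summarize_labels([0, 0, 1, -1, 1, -1]))
```

In the notebook, the same function can be called directly on the `labels` array returned by the validation cell.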
