feat: Windows native backend support + cross-platform daemon locking fixes #263

Merged
andylizf merged 50 commits into main from feat/windows-native-backends-support
Mar 7, 2026

Conversation

@andylizf
Collaborator

@andylizf andylizf commented Feb 21, 2026

Summary

Add full Windows build/test support for native HNSW (FAISS) and DiskANN backends, and fix daemon startup race conditions on all platforms (Linux, macOS, Windows).

Validated on real Windows hardware

In addition to CI (GitHub Actions windows-2022 runners), this branch has been end-to-end validated on a physical Windows machine via SSH:

  • Full native build: Both leann-backend-hnsw (FAISS/SWIG) and leann-backend-diskann (pybind11) compiled successfully with MSVC 19.44 + vcpkg dependencies
  • Test results (CI mode): 191 passed, 33 skipped, 0 failed — all tests that run in CI pass on real Windows
  • Test results (full mode): Ran the 33 previously skipped tests; all pass when run individually. One test (test_llm_config_simulated[diskann]) fails only in sequential runs due to a daemon embedding server reuse bug (tracked in #281, "Daemon embedding server reuse causes DiskANN search failures across tests"). test_large_index times out due to CPU-only torch inference (expected on Windows without GPU).
  • DiskANN file handle fix: Added DiskannSearcher.close() and updated LeannSearcher.cleanup() to release C++ memory-mapped file handles, fixing PermissionError: [WinError 32] during temp directory cleanup on Windows
  • Import validation: import leann, import leann_backend_hnsw, import leann_backend_diskann all succeed
  • DLL loading: vcpkg DLL discovery via os.add_dll_directory() works correctly for both source installs and CI scenarios

Known issues discovered during Windows validation

CI / Build

  • Add Windows matrix builds (windows-2022, Python 3.11–3.14) to reusable build workflow
  • Run all build job scripts with shell: bash for consistent cross-OS behavior
  • Install Windows build prerequisites via Chocolatey (swig, pkgconfiglite) and vcpkg (zeromq, openblas, lapack, boost-program-options, protobuf)
  • Add delvewheel repair step for Windows wheels (was installed but never invoked — Linux/macOS had repair steps, Windows did not)
  • Add retry wrappers around all network operations (uv python install, uv pip install, vcpkg install) to tolerate transient CI failures
  • Remove hard-coded -j8 / CMAKE_BUILD_PARALLEL_LEVEL=8 that broke MSBuild and let CMake auto-detect parallelism
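The retry wrappers mentioned above live in the CI workflow itself; the following is only a minimal Python sketch of the same idea, not the workflow's actual code. The names `retry` and `run_command`, the attempt count, and the linear backoff are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence


def retry(action: Callable[[], None], attempts: int = 3, delay: float = 0.0) -> int:
    """Run `action` until it succeeds; return how many attempts were used.

    Hypothetical helper: re-raises the last error once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            action()
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            # Linear backoff between tries (an assumption for this sketch).
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")


def run_command(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)
```

A call such as `retry(lambda: run_command(["vcpkg", "install", "zeromq"]), attempts=3, delay=5.0)` would then tolerate two transient failures before giving up.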

Documentation

  • Add Windows development installation instructions to README.md (VS Build Tools, vcpkg dependencies, environment variables)
  • Update platform badge to include Windows
  • Update CLAUDE.md with Windows build commands

Daemon locking (all platforms)

  • Fix thread-level race in daemon startup: fcntl.flock is process-granularity and does not block concurrent threads within the same process. Added per-registry-key threading.Lock so that multiple threads calling search() on the same or different LeannSearcher instances don't spawn duplicate daemon processes. This bug existed on Linux and macOS too, not just Windows.
  • Fix cross-process locking on Windows: fcntl does not exist on Windows; the old code silently fell back to no locking. Added msvcrt.locking fallback so Windows gets real cross-process file locks (with timeout + retry).
  • Extracted _flock_acquire / _flock_release helpers for clean cross-platform file locking.

Native backend DLL loading (Windows-only)

  • Both HNSW and DiskANN __init__.py now call os.add_dll_directory to register vcpkg DLL directories before importing the native extension. This is the standard Python 3.8+ approach for resolving DLL dependencies on Windows (cf. numpy/numpy wiki).
  • Only affects CI and source-install scenarios; pre-built wheels bundle DLLs via delvewheel.
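As a rough illustration of the DLL registration step, the sketch below looks for a vcpkg bin directory and registers it before a native extension would be imported. The `VCPKG_ROOT` environment variable, the `x64-windows` triplet, and both function names are assumptions for this example; the actual discovery logic in the backends' `__init__.py` may differ.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def candidate_dll_dirs(env: dict) -> list:
    """Return existing vcpkg 'bin' directories that may hold runtime DLLs.

    VCPKG_ROOT and the x64-windows triplet are assumptions of this sketch.
    """
    root = env.get("VCPKG_ROOT")
    if not root:
        return []
    bin_dir = Path(root) / "installed" / "x64-windows" / "bin"
    return [bin_dir] if bin_dir.is_dir() else []


def register_dll_dirs(env: Optional[dict] = None) -> list:
    """Register DLL directories; call this before importing the extension."""
    dirs = candidate_dll_dirs(dict(os.environ) if env is None else env)
    if sys.platform == "win32":
        for d in dirs:
            # os.add_dll_directory exists only on Windows (Python 3.8+).
            os.add_dll_directory(str(d))
    return dirs
```

On non-Windows platforms the function only reports what it found, so the same `__init__.py` can run unchanged everywhere.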

Windows file handle cleanup

  • Added DiskannSearcher.close() to explicitly release C++ index objects and their memory-mapped file handles
  • Updated LeannSearcher.cleanup() to call backend close(), ensuring file handles are released before temp directory cleanup
  • Updated all tests to use tempfile.TemporaryDirectory(ignore_cleanup_errors=True) and context manager patterns for robust cleanup on Windows

FAISS submodule updates

  • Replace POSIX-only APIs with portable equivalents (unistd.h → io.h, pread → win_pread shim, ssize_t → SSIZE_T, st_blksize → hardcoded 4096 on Windows)
  • Add Windows-specific OpenBLAS/LAPACK linking (vcpkg FindBLAS is unreliable on MSVC)
  • Change BLAS/LAPACK/MKL linkage from PRIVATE to PUBLIC so consumers (Python SWIG extension) correctly inherit link dependencies when faiss is built as a static library

DiskANN submodule updates

  • Guard optional diskann_s target and protobuf target usage
  • Add three-level protoc discovery chain (CMake target → find_program → vcpkg tools path)
  • Change OpenMP loop variables from unsigned to int64_t for MSVC OpenMP 2.0 compatibility (safe on GCC/Clang too — values are always non-negative)
  • Suppress MSVC-specific warnings (C4661) and disable CMAKE_COMPILE_WARNING_AS_ERROR on MSVC
  • Link Abseil targets required by protobuf 4.x on Windows
  • Skip partitioner CLI build under PYBIND to avoid unnecessary Boost linkage
  • Bundle runtime DLLs into wheel packages and editable installs

Other fixes

  • Remove emoji and non-ASCII text from document_rag.py to avoid cp1252 encoding errors on Windows consoles
  • Use tempfile.gettempdir() and os.path.join in tests instead of hard-coded /tmp
  • Disable pytest-timeout for long-running AST integration subprocess tests
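Two of the fixes above are easy to reproduce in isolation. The sketch below checks whether a string survives encoding to cp1252 (the code page named in the first bullet) and builds a scratch path without hard-coding /tmp. The function names and the file name in the usage note are made up for this example.

```python
import os
import tempfile


def encodable(text: str, encoding: str = "cp1252") -> bool:
    """Return True if `text` can be written to a console using `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def scratch_path(name: str) -> str:
    """Build a temp-file path that is valid on POSIX and Windows alike."""
    return os.path.join(tempfile.gettempdir(), name)
```

For instance, `encodable("\u2705 done")` is False because the check-mark emoji has no cp1252 mapping, while `scratch_path("example.index")` resolves under the platform's own temp directory.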

Breaking changes

None for end-users. Internal behavioral changes:

  • Daemon startup is now serialised per config key within the same process (via threading.Lock). Previously, concurrent threads could silently spawn duplicate daemons, leaking resources. This is a bugfix, not a regression.
  • FAISS BLAS/LAPACK CMake linkage changed from PRIVATE to PUBLIC. This is strictly more correct for static library consumers and has no observable effect on existing Linux/macOS builds.

Test plan

  • CI passes on all matrix entries (Ubuntu, macOS ARM/Intel, Windows, Arch smoke)
  • Verified on real Windows hardware: 191/191 CI-equivalent tests pass
  • All 33 previously-skipped tests pass individually on Windows
  • DiskANN PermissionError: [WinError 32] fix verified — test_backend_basic[diskann] now passes
  • DLL loading works via vcpkg path discovery (os.add_dll_directory)
  • HNSW backend: build, import, index, search all verified on Windows
  • DiskANN backend: build, import, index, search all verified on Windows
  • delvewheel repair bundles correct DLLs into Windows wheels (CI verification pending)

@andylizf andylizf changed the title from "ci: enable Windows native backend build path (HNSW + DiskANN)" to "feat: Windows native backend support + cross-platform daemon locking fixes" Feb 26, 2026
- Add msvcrt.locking fallback for cross-process file locking on Windows
  (fcntl does not exist; the old code silently proceeded with no lock).
- Extract _flock_acquire/_flock_release helpers for clean POSIX/Windows
  file locking with proper docstrings explaining the two-layer design
  (threading.Lock for intra-process, file lock for inter-process).
- Add Python 3.13 and 3.14 to Windows CI matrix.
- Remove unnecessary `from __future__ import annotations` in backend
  __init__.py (project requires 3.10+).
- Add detailed docstrings to DLL search functions explaining they only
  affect CI/source-install scenarios, not pip end-users.

Made-with: Cursor
ty runs on Linux where msvcrt typeshed stubs gate locking/LK_NBLCK
behind sys.platform == "win32".  Replace try/except ImportError with
an explicit platform check that type checkers understand as narrowing.

Made-with: Cursor
The existing pattern only matched api.star-history.com but the README
also links to www.star-history.com which returns 403 to bots.

Made-with: Cursor
@andylizf andylizf marked this pull request as ready for review February 28, 2026 00:19
@andylizf andylizf mentioned this pull request Feb 28, 2026
Keep main's multi-line args format with --max-retries, --retry-wait-time,
and fail: false.

Made-with: Cursor
- Add missing "Repair wheels (Windows)" step using delvewheel in CI
  workflow. Linux (auditwheel) and macOS (delocate) had repair steps
  but Windows was missing, meaning shipped wheels lacked bundled DLLs.
- Add Windows development installation instructions to README.md
  (VS Build Tools, vcpkg deps, env vars) alongside existing
  macOS/Ubuntu/Arch/RHEL sections.
- Update CLAUDE.md with Windows build commands.
- Update platform badge to include Windows.

Made-with: Cursor
@andylizf andylizf force-pushed the feat/windows-native-backends-support branch from 23a9e4c to ec9d1f1 Compare March 6, 2026 18:12
- Fix DiskANN wheel repair: delvewheel can't find diskann.dll and
  libiomp5md.dll even though CMake install bundles them inside the wheel.
  Extract the wheel to a temp dir with native Windows paths (Python's
  tempfile.mkdtemp, not Git Bash mktemp which produces /tmp/... paths
  that Windows executables can't resolve) and pass via --add-path.
- Download Intel OpenMP NuGet package to a persistent location so
  delvewheel can find libiomp5md.dll after uv build cleans up the
  CMake temp build directory.
- Add retry wrappers around Chocolatey install commands (swig,
  nuget.commandline, pkgconfiglite) to handle transient package
  repository failures.

Verified on real Windows hardware via SSH.

Made-with: Cursor
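The wheel-repair fix described in this commit can be sketched as follows: unpack the wheel with Python's tempfile (so the path is a native one), collect the directories that contain DLLs, and hand them to delvewheel via --add-path. This is an illustration under assumptions, not the CI script; the function names are hypothetical, and the command is only constructed, not executed.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def extract_wheel(wheel: str) -> Path:
    """Unpack a wheel (a zip archive) into a native temp directory."""
    dest = Path(tempfile.mkdtemp(prefix="wheel-"))
    with zipfile.ZipFile(wheel) as zf:
        zf.extractall(dest)
    return dest


def dll_dirs(root: Path) -> list:
    """Directories under `root` that contain at least one .dll file."""
    return sorted({str(p.parent) for p in root.rglob("*.dll")})


def repair_command(wheel: str, out_dir: str, extra_dirs: list) -> list:
    """Build a `delvewheel repair` invocation with extra DLL search paths."""
    cmd = ["delvewheel", "repair", "-w", out_dir]
    if extra_dirs:
        # --add-path takes a list of directories joined by os.pathsep.
        cmd += ["--add-path", os.pathsep.join(extra_dirs)]
    return cmd + [wheel]
```

Building the path with `tempfile.mkdtemp` rather than Git Bash `mktemp` matters because the resulting string is passed to a native Windows executable.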
@andylizf andylizf force-pushed the feat/windows-native-backends-support branch from ec9d1f1 to c2955e7 Compare March 6, 2026 19:00
andylizf added 2 commits March 7, 2026 03:16
- Add DiskannSearcher.close() to explicitly release C++ index objects
  and memory-mapped file handles, preventing PermissionError on Windows
  during temporary directory cleanup.
- Update LeannSearcher.cleanup() to call close() on backend searchers.
- Migrate tests to use context managers and ignore_cleanup_errors=True
  for robust Windows temporary directory handling.
- Skip DiskANN portion of test_backend_options in CI to avoid MKL
  parameter errors with small datasets (#280) and pytest-timeout
  thread-kill segfaults on Windows.
- Re-add Python 3.14 to Windows CI matrix (stable release, not alpha).

Made-with: Cursor
Take main's version of api.py and re-apply the close() call in
LeannSearcher.cleanup() for Windows file handle release.

Made-with: Cursor
@andylizf andylizf merged commit 1ceb5d5 into main Mar 7, 2026
30 checks passed