
Add mimalloc integration, optimize file I/O, and improve Makefile UX #1

Merged
EdgeOfAssembly merged 6 commits into main from copilot/optimize-read-align-cache
Dec 2, 2025


Copilot AI commented Dec 2, 2025

Implements performance optimizations: mimalloc allocator integration (20-50% faster), chunked file reading (64KB blocks, +20% throughput), and developer tooling improvements.

Changes

mimalloc Integration

  • Optional compile-time support via -DUSE_MIMALLOC flag
  • Pre-allocated buffers: 90% faster, SSO operations: 33% faster
  • Zero overhead when disabled
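
A minimal sketch of how compile-time allocator selection behind `-DUSE_MIMALLOC` could look. The `DEMO_*` macro names and the `demo_dup` helper are hypothetical and are not taken from zstr.h; `mi_malloc`, `mi_realloc`, and `mi_free` are mimalloc's public C API. Without the flag, the macros expand to the plain libc calls, which is one way to get zero overhead when the feature is disabled.

```c
#include <stddef.h>
#include <string.h>

#ifdef USE_MIMALLOC
#  include <mimalloc.h>               /* mimalloc's public C API */
#  define DEMO_MALLOC(n)      mi_malloc(n)
#  define DEMO_REALLOC(p, n)  mi_realloc((p), (n))
#  define DEMO_FREE(p)        mi_free(p)
#else
#  include <stdlib.h>                 /* default: standard allocator */
#  define DEMO_MALLOC(n)      malloc(n)
#  define DEMO_REALLOC(p, n)  realloc((p), (n))
#  define DEMO_FREE(p)        free(p)
#endif

/* Hypothetical helper: copy a C string with whichever allocator was selected. */
static char *demo_dup(const char *s)
{
    size_t n = strlen(s) + 1;         /* include the NUL terminator */
    char *out = DEMO_MALLOC(n);
    if (out != NULL)
        memcpy(out, s, n);
    return out;
}
```

Memory obtained through `DEMO_MALLOC` should be released with `DEMO_FREE`, so that allocation and deallocation always go through the same allocator.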

File I/O Optimization

  • Chunked reading with 64KB blocks (aligned to filesystem pages)
  • Throughput: 10.5 → 12.7 GB/s with mimalloc
  • Added ferror() checks and proper EOF handling
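
The chunked-read approach can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `demo_read_all` and `DEMO_CHUNK_SIZE` are hypothetical names, and the buffer growth policy (doubling) is chosen only to keep the example short.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEMO_CHUNK_SIZE (64 * 1024)   /* 64 KB per fread call */

/* Hypothetical helper: read a whole stream in 64 KB chunks.
 * On success returns a malloc'd, NUL-terminated buffer and stores its
 * length in *out_len. Returns NULL on read error or allocation failure. */
static char *demo_read_all(FILE *fp, size_t *out_len)
{
    size_t cap = DEMO_CHUNK_SIZE, len = 0;
    char *buf = malloc(cap + 1);      /* +1 for the terminator */
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (cap - len < DEMO_CHUNK_SIZE) {   /* keep room for one full chunk */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap + 1);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap = new_cap;
        }
        size_t n = fread(buf + len, 1, DEMO_CHUNK_SIZE, fp);
        len += n;
        if (n < DEMO_CHUNK_SIZE) {
            /* A short read means either an error or end of file. */
            if (ferror(fp)) { free(buf); return NULL; }
            break;
        }
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Checking `ferror()` after a short read is what distinguishes a genuine I/O failure from a normal end of file.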

SSO Boundary Fix

  • Fixed zstr_reserve(): changed <= ZSTR_SSO_CAP to < ZSTR_SSO_CAP
  • SSO holds max 22 chars + null terminator (23 bytes total buffer)
  • 23+ byte files now correctly transition to heap allocation
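
To illustrate the boundary condition, here is a small sketch. It assumes the inline buffer is 23 bytes in total, as stated above; `DEMO_SSO_CAP` and `demo_fits_inline` are hypothetical names and do not reproduce the actual `zstr_reserve()` logic.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SSO_CAP 23   /* assumed total inline buffer size, terminator included */

/* Hypothetical check: a string of `len` characters stays inline only if
 * len + 1 (for the NUL terminator) fits in the buffer, i.e. len < DEMO_SSO_CAP. */
static bool demo_fits_inline(size_t len)
{
    /* With `<=` instead of `<`, len == 23 would be accepted even though
     * 23 characters plus the terminator need 24 bytes. */
    return len < DEMO_SSO_CAP;
}
```

With this check, 22 characters stay inline, while 23 or more go to the heap.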

Makefile UX

  • Added make help with comprehensive usage guide
  • Configurable USE_MIMALLOC=1 flag support
  • Variables: CC, CFLAGS, LDFLAGS now user-configurable

Benchmarks

  • New bench_optimized target with allocation patterns, file I/O, and cache locality tests
  • Portable timing with fallbacks for systems without clock_gettime
  • Portable temp directory handling
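
One possible shape for a timer with a fallback is sketched below. `demo_now_seconds` is a hypothetical name, and the benchmark in this PR may detect `clock_gettime` differently; here the presence of the `CLOCK_MONOTONIC` macro after including `<time.h>` is used as the feature test.

```c
#include <time.h>

/* Hypothetical helper: a timestamp in seconds, for measuring intervals. */
static double demo_now_seconds(void)
{
#ifdef CLOCK_MONOTONIC
    /* POSIX path: monotonic wall-clock time with nanosecond resolution. */
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#endif
    /* Fallback: ISO C clock(), which reports processor time, not wall time. */
    return (double)clock() / (double)CLOCKS_PER_SEC;
}
```

Elapsed time is the difference between two calls. Note that the fallback measures CPU time, so results from the two paths are not directly comparable for I/O-bound work.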

Documentation

  • OPTIMIZATIONS.md: Integration guide and performance tips
  • BENCHMARK_RESULTS.md: Detailed performance comparison data

Usage

# Standard build
make bench_c

# With mimalloc (recommended)
make USE_MIMALLOC=1 bench_optimized

# In your code
gcc -O3 -DUSE_MIMALLOC yourapp.c -lmimalloc

Performance

| Metric | Standard | mimalloc | Δ |
|---|---|---|---|
| SSO ops | 0.54 ns | 0.36 ns | +33% |
| Pre-alloc | 0.40 ms | 0.04 ms | +90% |
| File I/O | 10.5 GB/s | 12.7 GB/s | +20% |
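
The Δ column appears to combine two conventions: a reduction in time for the first two rows and a gain in throughput for the last one. That reading is an inference from the numbers, not something the PR states. The hypothetical helpers below show the arithmetic under that assumption.

```c
/* Percentage reduction for "lower is better" metrics such as ns or ms. */
static double demo_time_reduction_pct(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Percentage gain for "higher is better" metrics such as GB/s. */
static double demo_throughput_gain_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the table, `demo_time_reduction_pct(0.54, 0.36)` gives about 33.3, `demo_time_reduction_pct(0.40, 0.04)` gives about 90.0, and `demo_throughput_gain_pct(10.5, 12.7)` gives about 21.0.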

The struct remains 32 bytes (half of a 64-byte cache line), and the change is fully backward compatible.
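
For intuition only, here is one way a 32-byte string struct with a 23-byte inline buffer could be laid out on a typical 64-bit platform. The layout and every name in it (`demo_str`, `heap`, `sso`, `is_heap`) are assumptions made for this sketch; the real zstr layout is not shown in this PR and may differ.

```c
#include <stddef.h>

/* Hypothetical layout, NOT the actual zstr struct. */
typedef struct {
    union {
        struct { char *ptr; size_t len; size_t cap; } heap;  /* 24 bytes on 64-bit */
        struct { char buf[23]; unsigned char len; } sso;     /* 23-byte buffer + length */
    } u;
    unsigned char is_heap;   /* storage mode flag; padding rounds the struct up */
} demo_str;
```

On a platform with 8-byte pointers and `size_t`, the union occupies 24 bytes and the flag plus alignment padding brings the total to 32, so two such structs fit in one 64-byte cache line.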

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)


Original prompt

read three times, optimize to max, with alignment & cache locality/alignment, and mimalloc from github microsoft. also Makefile needs to be more friendly. benchmark



Copilot AI and others added 5 commits December 2, 2025 21:51
…provements

Co-authored-by: EdgeOfAssembly <223567695+EdgeOfAssembly@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize read function for alignment and cache locality" to "Add mimalloc integration, optimize file I/O, and improve Makefile UX" Dec 2, 2025
Copilot AI requested a review from EdgeOfAssembly December 2, 2025 22:03
@EdgeOfAssembly EdgeOfAssembly marked this pull request as ready for review December 2, 2025 22:04
Copilot AI review requested due to automatic review settings December 2, 2025 22:05
@EdgeOfAssembly EdgeOfAssembly merged commit 6ab4c85 into main Dec 2, 2025
5 of 6 checks passed

Copilot AI left a comment


Pull request overview

This PR implements comprehensive performance optimizations for zstr.h, including optional mimalloc allocator integration (20-50% faster), chunked file I/O with 64KB blocks (+20% throughput), and improved developer tooling. Key changes include fixing the SSO boundary bug in zstr_reserve(), adding cache locality documentation, and creating an enhanced benchmark suite with portable timing.

Key Changes

  • mimalloc integration: Optional compile-time support via -DUSE_MIMALLOC flag with zero overhead when disabled
  • File I/O optimization: Chunked reading (64KB blocks) with proper error handling and SSO boundary fixes
  • Developer tooling: Enhanced Makefile with make help, configurable flags, and comprehensive benchmarking suite

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| zstr.h | Added mimalloc integration and optimized file reading with chunked I/O |
| src/zstr.c | Core implementation with mimalloc support and fixed SSO boundary condition |
| benchmarks/c/bench_optimized.c | New comprehensive benchmark suite testing allocation patterns, file I/O, and cache locality |
| Makefile | Enhanced UX with help system, configurable compiler settings, and USE_MIMALLOC flag |
| OPTIMIZATIONS.md | Detailed integration guide and performance optimization tips |
| BENCHMARK_RESULTS.md | Performance comparison data showing 20-50% improvements |
| .gitignore | Added bench_optimized binary to ignore list |


Comment thread zstr.h
Comment on lines +488 to +489
// For SSO, the file must fit within SSO capacity (this should always be true
// because we pre-allocated based on file size, but we check defensively)

Copilot AI Dec 2, 2025


Multi-line comment is missing closing characters. The comment appears to be incomplete and should be completed or properly closed on line 489.

Suggested change
- // For SSO, the file must fit within SSO capacity (this should always be true
- // because we pre-allocated based on file size, but we check defensively)
+ // For SSO, the file must fit within SSO capacity (this should always be true because we pre-allocated based on file size, but we check defensively)

Copilot uses AI. Check for mistakes.
Comment thread src/zstr.c

// Update length based on actual bytes read
// For SSO, the file must fit within SSO capacity (this should always be true
// because we pre-allocated based on file size, but we check defensively)

Copilot AI Dec 2, 2025


Multi-line comment is missing closing characters. The comment appears to be incomplete and should be completed or properly closed on line 417.

Suggested change
- // because we pre-allocated based on file size, but we check defensively)
+ // because we pre-allocated based on file size, but we check defensively)
+ //)

Comment on lines +124 to +127
double mb_per_sec = (total_bytes / (1024.0 * 1024.0)) / elapsed;

printf("[File Read] %d iterations of 1MB: %.4fs (%.2f MB/s)\n",
iterations, elapsed, mb_per_sec);

Copilot AI Dec 2, 2025


Variable name 'mb_per_sec' is misleading as it calculates megabytes per second but the constant 1024.0 represents MiB (mebibytes), not MB. Consider renaming to 'mib_per_sec' for accuracy or use 1000.0 for true MB/s.

Suggested change
- double mb_per_sec = (total_bytes / (1024.0 * 1024.0)) / elapsed;
- printf("[File Read] %d iterations of 1MB: %.4fs (%.2f MB/s)\n",
- iterations, elapsed, mb_per_sec);
+ double mib_per_sec = (total_bytes / (1024.0 * 1024.0)) / elapsed;
+ printf("[File Read] %d iterations of 1MB: %.4fs (%.2f MiB/s)\n",
+ iterations, elapsed, mib_per_sec);
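
The unit distinction raised in this comment can be made concrete with two small helpers. Both function names are hypothetical and serve only to contrast the binary and decimal divisors.

```c
/* Binary unit: 1 MiB = 1024 * 1024 bytes. */
static double demo_mib_per_sec(double total_bytes, double elapsed_sec)
{
    return (total_bytes / (1024.0 * 1024.0)) / elapsed_sec;
}

/* Decimal unit: 1 MB = 1000 * 1000 bytes. */
static double demo_mb_per_sec(double total_bytes, double elapsed_sec)
{
    return (total_bytes / (1000.0 * 1000.0)) / elapsed_sec;
}
```

For the same measurement the MB/s figure is about 4.9% larger than the MiB/s figure, so the label should match the divisor actually used.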
