Skip to content

Fabio3rs/my-cpp-repl-study

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

C++ REPL v1.5-alpha - Interactive C++ Development Environment

C++ Clang License Platform Lines of Code Build Status Version Tests

Alpha Release: Interactive C++ development through dynamic compilation and crash-safe execution

Current Features: Dynamic compilation • Signal-to-exception translation • Parallel compilation pipeline • Simple autocompletion • Real-time function replacement • Assembly-level debugging • AST analysis


Abstract

This project presents a C++ REPL (Read-Eval-Print Loop) v1.5-alpha - an interactive programming environment where users can write, test, and execute C++ code line by line, similar to Python's interactive shell. Unlike existing solutions like clang-repl that interpret code through virtual machines, this implementation compiles each input directly to native machine code with optimizations.

Core Architecture (v1.5-alpha): The system employs a parallel compilation pipeline that:

  1. Compiles user input into native executable code (shared libraries) using multi-core processing
  2. Analyzes code semantically through AST parsing for basic completion
  3. Loads code dynamically using the operating system's library loading mechanisms
  4. Handles crashes gracefully by converting hardware signals into manageable C++ exceptions
  5. Provides comprehensive debugging with automatic crash analysis and assembly-level introspection

Performance Snapshot (Ubuntu 24.04, Clang 18):

  • Single-source compile: ~90–96 ms per evaluation (parallel pipeline + PCH)
  • Load/execution overhead: ~1μs–48ms per executable block (highly variable load time)
  • Script via STDIN (example below): total observed time ~2.6–3.0 s (cpprepl) vs ~0.21–0.22 s (clang-repl-18)
  • Simple completion with basic symbol and keyword matching
  • Note: In clang-repl-18, the cold start (initialization/JIT) typically takes ~100–120 ms; the remainder is I/O and snippet execution.

Alpha Status: With 7,500+ lines of tested code and 118/118 tests passing (100% success rate), the system demonstrates stable core functionality while some advanced features remain in development for v2.0.

⚠️ Performance Disclaimer: All performance measurements are system-dependent and may vary significantly based on hardware specifications, compiler versions, system load, and configuration. The numbers presented reflect specific test conditions on Ubuntu 24.04 with Clang 18 and should be used as relative comparisons rather than absolute benchmarks.

Working Features (v1.5-alpha)

Core REPL Functionality

  • Interactive C++ execution - Line-by-line compilation and execution
  • Variable persistence - Variables maintain state across REPL sessions
  • Function definitions - Define and call functions interactively
  • Signal handling (-s flag) - Graceful recovery from SIGSEGV, SIGFPE, SIGINT
  • Batch processing (-r flag) - Execute C++ command files
  • Error recovery - Continue REPL operation after compilation or runtime errors

Compilation System

  • Native compilation - Direct to machine code using Clang
  • Parallel pipeline - AST analysis and compilation run concurrently
  • Dynamic linking - Runtime loading of compiled shared libraries
  • Include support - #include directive functionality
  • Library linking - Link with system and custom libraries

Development Tools

  • Clang code completion - Full semantic completion with libclang integration
  • AST introspection - Code analysis and structure understanding
  • Plugin system - #loadprebuilt command for loading pre-built libraries
  • Comprehensive testing - Extensive test suites covering all functionality (118 tests)
  • Verbose logging - Multiple verbosity levels for debugging

🚀 Future Enhancements (v2.1+ Roadmap)

Advanced Completion Features

  • 🚧 Enhanced LSP integration - Full JSON-RPC protocol support with clangd
  • 🚧 Real-time diagnostics - Live syntax and semantic error highlighting
  • 🚧 Symbol documentation - Hover information and signature help

Platform & Performance

  • 🔧 Cross-platform support - Windows and macOS compatibility
  • 🔧 Performance optimization - Enhanced caching for large codebases
  • 🔧 Memory optimization - Improved handling of long REPL sessions

Development Experience

  • 📝 IDE integration - VS Code and other editor plugins
  • 📝 Session persistence - Save/restore REPL state between sessions
  • 📝 Remote execution - Network-based REPL server capabilities

🎯 Future Plans (v2.0+)

  • Advanced LSP Integration - Full clangd server integration
  • Multi-file Projects - Support for complex project structures
  • Debugging Integration - GDB integration for interactive debugging
  • Cross-platform Support - Windows and macOS compatibility
  • Performance Profiling - Built-in profiling tools
  • Package Management - Integration with C++ package managers

Table of Contents

Introduction and Motivation

The initial concept emerged from exploring the feasibility of compiling code from standard input as a library and loading it via dlopen to execute its functions. Upon demonstrating this approach's viability, the research evolved toward developing a more streamlined method for extracting symbols and their types from compiled code, leveraging Clang's AST-dump JSON output capabilities.

Project Structure

cpprepl/
├── main.cpp                     # Entry point with robust CLI and signal handling
├── repl.cpp                     # Core REPL implementation (1,503 lines, optimized)
├── repl.hpp                     # REPL interface definitions
├── stdafx.hpp                   # Precompiled header definitions
├── CMakeLists.txt               # Modern CMake build configuration
├── include/                     # Public headers (1,342 lines)
│   ├── analysis/                # AST analysis and context management
│   ├── compiler/                # Compilation service interfaces
│   ├── completion/              # LSP/clangd completion system
│   ├── execution/               # Execution engine and symbol resolution
│   └── utility/                 # RAII utilities and helpers
├── src/                         # Modular implementation (1,896 lines)
│   ├── compiler/                # CompilerService with parallel compilation
│   ├── execution/               # ExecutionEngine and SymbolResolver
│   ├── completion/              # LSP integration and readline binding
│   └── analysis/                # AST context and clang adapter
├── tests/                       # Comprehensive test suite (2,143 lines)
│   ├── integration/             # End-to-end REPL testing
│   ├── unit/                    # Component-specific tests
│   └── completion/              # LSP completion testing
├── segvcatch/                   # Signal-to-exception library
│   └── lib/
│       ├── segvcatch.h          # Signal handling interface
│       └── exceptdefs.h         # Exception type definitions
├── utility/                     # Core utilities
│   ├── assembly_info.hpp        # Assembly analysis and debugging
│   ├── backtraced_exceptions.cpp # Exception backtrace system
│   └── quote.hpp                # String utilities
└── build/                       # Build artifacts and generated files

Key Components

  • REPL Engine (repl.cpp): Streamlined core logic (29% size reduction from refactoring)
  • CompilerService (src/compiler/): Parallel compilation pipeline with optimized performance
  • ExecutionEngine (src/execution/): Thread-safe symbol resolution and execution management
  • LSP Integration (src/completion/): clangd-based semantic completion with readline integration
  • Signal Handler (segvcatch/): Hardware exception to C++ exception translation
  • AST Analyzer (src/analysis/): Clang AST parsing and symbol extraction with context management
  • Testing Framework (tests/): 95%+ coverage with 7 specialized test suites
  • Debugging Tools (utility/assembly_info.hpp): Crash analysis and source correlation

🧩 Comparison with State of the Art

  • In case of any imprecise information, please open an issue or PR to fix it.
System Compilation Model Performance (single eval) Performance (script via pipe*) Error Handling Hot Reload Backtrace Quality Completion Open Source Platform
cpprepl (this) Native, shared lib ~90–96 ms (compilation) + ~1μs–48ms (loading) ~2.6–3.0 s (example script) Signal→Exception + full backtrace Per function OS-level, source-mapped (basic; LSP in roadmap) Alpha Linux
clang-repl-18 LLVM JIT IR ~100–120 ms cold start; short evals <200 ms ~0.21–0.22 s (same script) Managed (JIT abort) No IR-level Basic Yes Multi
cling JIT/Interpreter ~80–150 ms depends on script Managed (soft error) No Partial Basic Yes Multi
Visual Studio HR Compiler-level ~200ms n/a Patch revert / rollback Per instruction Compiler map IntelliSense No Windows
Python REPL Bytecode ~5ms ~50-100ms Exception-based Per function High (source) Advanced Yes Multi

* Example script: see "How We Measure" section below.

🔄 Performance Trade-offs: This comparison highlights fundamental architectural differences:

  • Native Compilation (cpprepl): Higher per-evaluation overhead (~90-144ms total) but full native execution speed and complete debugging capabilities
  • JIT Systems (clang-repl, cling): Lower per-evaluation overhead but interpreted/JIT execution with limited debugging
  • Managed Languages (Python): Minimal evaluation overhead but bytecode execution performance

Choose based on your priorities: native performance + debugging vs fast iteration vs simple deployment.

Core Architecture and Techniques

1. Advanced Signal Handling and Exception Translation

A sophisticated signal handling system converts hardware exceptions into C++ exceptions:

Signal-to-Exception Architecture:

namespace segvcatch {
    struct hardware_exception_info {
        int signo;
        void *addr;
    };

    class hardware_exception : public std::runtime_error {
        public:
            hardware_exception_info info;
    };

    class segmentation_fault : public hardware_exception;
    class floating_point_error : public hardware_exception;
    class interrupted_by_the_user : public hardware_exception;
}

Key Features:

  • Signal Interception: Converts SIGSEGV, SIGFPE, and SIGINT into typed C++ exceptions
  • Address Preservation: Maintains fault addresses for debugging purposes
  • Graceful Error Handling: Prevents REPL crashes from user code errors
  • Typed Exception Hierarchy: Provides specific exception types for different hardware faults

Custom Exception Memory Management:

The system implements a novel approach to exception backtrace capture by intercepting the C++ exception allocation mechanism:

extern "C" void *__cxa_allocate_exception(size_t size) noexcept {
    std::array<void *, 1024> exceptionsbt{};
    int bt_size = backtrace(exceptionsbt.data(), exceptionsbt.size());

    // Embed backtrace directly in exception memory layout
    size_t nsize = (size + 0xFULL) & (~0xFULL);  // 16-byte alignment
    nsize += sizeof(uintptr_t) + sizeof(uintptr_t) + sizeof(size_t);
    nsize += sizeof(void *) * bt_size;

    void *ptr = cxa_allocate_exception_o(nsize);
    // Embed magic number and backtrace data
}

Technical Innovations:

  • ABI Interception: Hooks into __cxa_allocate_exception to capture stack traces
  • Memory Layout Management: Embeds backtrace data within exception objects using careful alignment
  • Magic Number System: Uses 0xFADED0DEBAC57ACELL for reliable backtrace identification
  • Zero-Overhead Integration: Backtrace capture occurs during exception allocation without runtime overhead

2. Assembly-Level Debugging and Introspection

Advanced debugging capabilities provide detailed crash analysis:

namespace assembly_info {
    std::string analyzeAddress(const std::string &binaryPath, uintptr_t address);
    std::string getInstructionAndSource(pid_t pid, uintptr_t address);
}

Capabilities:

  • Instruction Disassembly: Uses GDB integration to disassemble crash addresses
  • Source Line Mapping: Employs addr2line for crash-to-source correlation
  • Memory Layout Analysis: Examines maps for memory segment information
  • Real-time Debugging: Provides immediate crash analysis without external debuggers

3. Dynamic Compilation Strategy

The system employs a compile-then-link approach rather than interpretation:

  • Source-to-Shared-Library Pipeline: Each user input is wrapped in C++ code and compiled into position-independent shared libraries (.so files) using Clang/GCC with -shared -fPIC flags
  • Precompiled Headers: Uses precompiled headers (precompiledheader.hpp.pch) to accelerate compilation times and reduce AST dump file sizes
  • Parallel Compilation Architecture: ✅ IMPLEMENTED - Optimized dual-level parallelization
    • Inter-file Parallelism: Multiple source files compiled simultaneously
    • Intra-file Parallelism: AST analysis and object compilation run in parallel using std::async
    • Multi-core Scaling: Linear performance scaling with available CPU cores
    • Thread Configuration: Auto-detects hardware_concurrency() with configurable thread limits

Performance Results:

  • Single File Compilation: 90-96ms average (measured in Ubuntu 24.04 environment)
  • Multi-file Projects: Linear scaling with number of CPU cores
  • Zero Breaking Changes: Full backward compatibility maintained

📊 Benchmark Environment: All measurements performed on Ubuntu 24.04 with Clang 18. Performance may vary significantly on different systems, architectures, and configurations. Use these numbers for relative comparison only.

Example compilation command:

auto cmd = compiler + " -std=" + std + " -fPIC -Xclang -ast-dump=json " +
           includePrecompiledHeader + getIncludeDirectoriesStr() + " " +
           getPreprocessorDefinitionsStr() + " -fsyntax-only " + name + ext +
           " -o lib" + name + ".so > " + name + ".json";

4. Clang Code Completion System (v2.0.0)

The current version implements full Clang-based semantic completion with native libclang integration:

Current Completion (Working):

namespace completion {
    class SimpleReadlineCompletion {
        // Basic symbol and keyword completion
        std::vector<std::string> getBasicCompletions(const std::string& text);
        void setupReadlineCompletion();
        
        // Simple completion matching
        bool matchesPrefix(const std::string& symbol, const std::string& prefix);
    };
}

Current Features (v2.0.0):

  • Semantic completion: Full Clang-based code completion with type awareness
  • Context-aware suggestions: Completions based on current scope and AST analysis
  • Symbol resolution: Real-time symbol lookup and type inference
  • Performance optimized: Sub-100ms completion response for most contexts
  • Readline integration: Standard readline completion interface

🚧 Enhanced LSP Features (v2.1+ Target):

Advanced clangd JSON-RPC integration is planned for future releases:

// TODO: Complete implementation for v2.0
namespace completion {
    class LspClangdService {
        // LSP client with full JSON-RPC protocol support
        bool start(const std::string& clangdPath);          // TODO: Implement
        std::vector<CompletionItem> getCompletions(...);    // TODO: Implement
        
        // Event loop with timeout handling
        bool pumpUntil(std::function<bool(const json&)>...; // TODO: Implement
    };

    class LspReadlineIntegration {
        // RAII scope management for REPL integration
        struct Scope {
            void updateReplContext(const ReplState& repl);   // TODO: Implement
            std::vector<CompletionItem> complete(...);       // TODO: Implement
        };
    };
}

Planned Features (v2.0):

  • 🚧 Context-aware completion: Semantic understanding of current scope
  • 🚧 Error diagnostics: Real-time syntax and semantic error highlighting
  • 🚧 Intelligent suggestions: Function signatures, member completion
  • 🚧 Performance target: Sub-100ms completion latency with background processing

Architecture Benefits (Ready for v2.0):

  • Stable Interface: JSON-RPC LSP protocol (version-independent)
  • Isolation: Separate clangd process prevents REPL crashes
  • Scalability: Foundation for large codebase handling

5. Abstract Syntax Tree (AST) Analysis and Export

A sophisticated AST analysis system extracts symbol information while addressing the challenges of global scope execution:

  • Clang AST Export: Utilizes Clang's -Xclang -ast-dump=json functionality to export complete AST information in JSON format
  • Symbol Extraction: Parses AST to identify function declarations (FunctionDecl), variable declarations (VarDecl), and C++ method declarations (CXXMethodDecl)
  • Mangled Name Resolution: Extracts both demangled and mangled symbol names for proper linking
  • Type Information Preservation: Maintains complete type qualifiers and storage class information
  • Header Optimization: Precompiled headers significantly reduce AST dump file sizes by excluding unnecessary header content

5. Global Scope Challenges and Solutions

Unlike conventional REPL behavior that permits code and declarations in any sequence, C++ global scope execution presents unique challenges:

  • Function Execution Problem: Invoking functions within global scope without variable assignment proved cumbersome
  • Direct Declarations Support: Global variable and function declarations work directly without special commands
  • #eval Command Innovation: Introduced specifically for expressions that need to be executed (not for declarations)
  • File Integration: Extended #eval to accept filenames, incorporating file contents for compilation
  • Automatic Function Invocation: If compiled code contains a function named _Z4execv, it's automatically executed

6. Dynamic Linking and Symbol Resolution

The system implements advanced dynamic linking techniques using POSIX APIs:

  • Runtime Library Loading: Uses dlopen() with RTLD_NOW | RTLD_GLOBAL flags for immediate symbol resolution and global symbol availability
  • Symbol Address Calculation: Implements memory mapping analysis through maps to calculate actual symbol addresses in virtual memory
  • Offset-Based Resolution: Calculates symbol offsets within shared libraries using nm command parsing
  • First-Symbol Priority: Leverages dlsym's behavior of returning the first encountered symbol occurrence

7. Function Wrapper Generation and Dynamic Replacement

A novel approach to handle forward declarations and enable real-time function replacement:

Function Wrapper Architecture:

extern "C" void *foo_ptr = nullptr;

void __attribute__((naked)) foo() {
    __asm__ __volatile__ (
        "jmp *%0\n"
        :
        : "r" (foo_ptr)
    );
}
  • Naked Function Wrappers: Generates assembly-level function wrappers using __attribute__((naked)) to create trampolines
  • Lazy Symbol Resolution: Implements deferred symbol loading where function calls are initially directed to resolver functions
  • Dynamic Function Replacement: Enables hot-swapping of function implementations without unloading old code
  • Assembly-Level Interception: Uses inline assembly to preserve register state during symbol resolution

8. Error Recovery and Debugging Integration

The system provides comprehensive error handling and debugging capabilities:

Exception Handling in REPL Loop:

try {
    execv();  // Execute user code
} catch (const segvcatch::hardware_exception &e) {
    std::cerr << "Hardware exception: " << e.what() << std::endl;
    std::cerr << assembly_info::getInstructionAndSource(getpid(),
        reinterpret_cast<uintptr_t>(e.info.addr)) << std::endl;

    auto [btrace, size] = backtraced_exceptions::get_backtrace_for(e);
    if (btrace != nullptr && size > 0) {
        backtrace_symbols_fd(btrace, size, 2);
    }
}

Features:

  • Fault Address Analysis: Provides instruction-level analysis of crash locations
  • Automatic Backtrace: Embedded backtrace provides complete call stack information
  • Signal Type Differentiation: Handles segmentation faults, floating-point errors, and user interrupts separately
  • Graceful Recovery: REPL continues operation after user code errors

9. Memory Management and Variable State Preservation

Advanced memory management and variable tracking:

  • Session State Maintenance: Maintains variable declarations during the current REPL session through decl_amalgama.hpp
  • Type-Aware Printing: Generates type-specific printing functions for variable inspection
  • Cross-Library Variable Access: Enables access to variables defined in previously loaded libraries within the same session
  • #return Command: Combines expression evaluation with automatic result printing
  • Temporary State: Variable state is not persisted between REPL sessions (future enhancement)

10. Advanced REPL Commands and Features

The system provides several sophisticated commands:

  • #eval: Wraps expressions in functions for execution (not needed for declarations)
  • #return: Evaluates expressions and prints results automatically
  • #includedir: Adds include directories
  • #compilerdefine: Adds preprocessor definitions
  • #lib: Links additional libraries
  • #loadprebuilt: Loads pre-compiled object files (.a, .o, .so) with automatic wrapper generation
  • #batch_eval: Processes multiple files simultaneously

Intelligent Command Caching

The system implements automatic caching of compiled commands to avoid redundant compilation:

// Internal cache mechanism
std::unordered_map<std::string, EvalResult> evalResults;

// When executing identical commands:
if (auto rerun = evalResults.find(line); rerun != evalResults.end()) {
    std::cout << "Rerunning compiled command" << std::endl;
    rerun->second.exec();  // Direct execution, no recompilation
    return true;
}

Cache Behavior:

  • First execution: Compiles and caches the result
  • Subsequent identical inputs: Directly executes cached compiled code
  • Performance impact: Eliminates ~100-500ms compilation overhead for repeated commands
  • Memory efficient: Stores function pointers, not duplicated libraries

Example demonstrating cache:

>>> #return factorial(5)  // First time: compilation + execution
build time: 120ms
exec time: 15us
Type name: int value: 120

>>> #return factorial(5)  // Subsequent times: cached execution only
Rerunning compiled command
exec time: 12us
Type name: int value: 120

11. Bootstrap Extension Mechanism

Advanced extensibility through function pointer bootstrapping:

extern int (*bootstrapProgram)(int argc, char **argv);

This mechanism enables users to:

  • Assign custom main-like functions to the REPL
  • Transfer control from REPL to user-defined programs
  • Create self-modifying applications

12. Pre-built Library Integration (#loadprebuilt)

The #loadprebuilt command implements sophisticated integration of external compiled libraries into the REPL environment:

Symbol Extraction and Analysis

std::vector<VarDecl> vars = getBuiltFileDecls(path);
// Executes: nm <file> | grep ' T ' to extract function symbols
// Creates VarDecl entries for each discovered function

The system uses the nm utility to extract all text symbols (functions) from the target library, creating internal representations that enable dynamic linking.

Automatic Format Conversion

if (filename.ends_with(".a") || filename.ends_with(".o")) {
    std::string cmd = "g++ -Wl,--whole-archive " + path +
                      " -Wl,--no-whole-archive -shared -o " + library;
    system(cmd.c_str());
}

Static Library Processing: Automatically converts .a and .o files to shared libraries:

  • --whole-archive: Forces inclusion of ALL symbols, not just referenced ones
  • --no-whole-archive: Restores normal linking behavior
  • Ensures all functions become available for dynamic loading

Function Wrapper Generation

prepareFunctionWrapper(filename, vars, functions);
// Generates naked assembly trampolines for each function

For each discovered function, the system generates assembly-level wrappers:

// Generated wrapper example:
extern "C" void *function_name_ptr = nullptr;

void __attribute__((naked)) function_name() {
    __asm__ __volatile__("jmp *%0" : : "r"(function_name_ptr));
}

Symbol Offset Resolution

resolveSymbolOffsetsFromLibraryFile(functions);
// Maps function names to their memory offsets within the library

Uses nm -D --defined-only <library> to create a mapping between symbol names and their positions, enabling precise memory address calculation.

Dynamic Loading and Linking

void *handle = dlopen(library.c_str(), RTLD_NOW | RTLD_GLOBAL);
fillWrapperPtrs(functions, handlewp, handle);

Final Integration:

  • Loads the library using dlopen with immediate symbol resolution
  • Updates all wrapper function pointers to reference actual library functions
  • Makes functions callable as if they were defined within the REPL session

Integration Benefits

  1. Seamless Interface: External functions appear as native REPL functions
  2. Hot-Swapping Support: Maintains ability to replace implementations dynamically
  3. Performance: Minimal overhead through naked function trampolines
  4. Compatibility: Works with static archives, object files, and shared libraries
  5. Error Recovery: Maintains REPL crash safety even with external code

Important: Function Declarations Required

Critical Note: Loading a library with #loadprebuilt makes functions available for execution, but to call them directly in the REPL, you must still provide their declarations. This can be done in two ways:

Include Headers

>>> #loadprebuilt libmath.a
>>> #include <cmath>  // Provides declarations for sqrt, sin, cos, etc.
>>> double result = sqrt(16.0);  // Now callable

Real-Time Code Editing Application

Self-Editing Text Editor

The project demonstrates practical applications through a self-editing text editor implementation:

  • Real-Time Compilation: F8 key binding triggers dynamic compilation and loading of current file
  • Live Function Replacement: Modified functions immediately replace their predecessors in memory
  • Continuous Evolution: Editor code evolves dynamically as users interact with it
  • Integration with External Projects: Successfully integrates with existing C-based editor projects

Technical Innovations vs. clang-repl

Fundamental Architectural Differences:

  1. Compilation vs. Interpretation:

    • This Implementation: Compiles to native machine code in shared libraries
    • clang-repl: Interprets LLVM IR using LLVM's JIT infrastructure
  2. Error Handling Strategy:

    • This Implementation: Signal-to-exception translation with embedded backtraces
    • clang-repl: LLVM-based error handling within managed environment
  3. Symbol Resolution Strategy:

    • This Implementation: Uses OS-level dynamic linking with custom symbol resolution
    • clang-repl: Relies on LLVM's symbol resolution within the JIT environment
  4. Memory Model:

    • This Implementation: Uses process virtual memory space with shared library segments
    • clang-repl: Operates within LLVM's managed memory environment
  5. Debugging Capabilities:

    • This Implementation: Assembly-level debugging with automatic crash analysis
    • clang-repl: LLVM-based debugging infrastructure
  6. Function Replacement Capability:

    • This Implementation: Supports hot-swapping of functions through wrapper indirection
    • clang-repl: Limited function replacement capabilities

Safety and Security Considerations

Memory Safety Features

  1. Signal Interception: Hardware exceptions converted to manageable C++ exceptions
  2. Stack Trace Preservation: Automatic backtrace embedding in exception objects
  3. Controlled Crash Recovery: REPL survives user code crashes
  4. Address Space Analysis: Real-time memory layout inspection capabilities

Security Implications

  1. Code Execution: Direct compilation and execution of user input (inherent risk)
  2. System Access: Full system access through compiled code
  3. Memory Inspection: Ability to examine arbitrary memory addresses
  4. Signal Handling: Potential interference with system signal handling

Recommended Usage Patterns

  • Development Environment: Ideal for controlled development scenarios
  • Educational Settings: Excellent for learning systems programming concepts
  • Sandboxed Environments: Consider containerization for untrusted code
  • Research Applications: Valuable for compiler and runtime system research

Limitations and Responsible Usage

  • No sandboxing: Do not use with untrusted code.
  • Not designed for production deployment (research/educational only).
  • Memory for all loaded libraries remains allocated (see “Future Work”).

Enhanced Safety Through Signal Handling:

  1. Controlled Crash Recovery: Signal-to-exception conversion prevents REPL termination from user code crashes
  2. Detailed Error Information: Hardware exception objects preserve fault addresses and signal information
  3. Automatic Debugging: Embedded backtraces and assembly analysis provide immediate crash diagnosis

Remaining Limitations:

  1. Memory Leaks: Old libraries remain loaded in memory even after function replacement
  2. Symbol Pollution: Global symbol namespace can become cluttered with repeated compilations
  3. Compilation Overhead: Each evaluation requires full compilation cycle
  4. Complex Signal Interactions: Advanced signal handling may interfere with user code signal usage
  5. Session Persistence: Variable state is lost when REPL session ends

Advanced Assembly and Linking Techniques

The project demonstrates several low-level systems programming concepts:

  • ELF Binary Analysis: Uses tools like nm to extract symbol tables from compiled objects
  • Relocatable Code Generation: Ensures all generated code can be loaded at runtime-determined addresses
  • Cross-Module Symbol Resolution: Implements custom symbol resolution across dynamically loaded libraries
  • Hardware Exception Handling: Integrates with signal handling for debugging information including assembly instruction analysis
  • Register State Preservation: Careful assembly programming to maintain calling conventions
  • ABI Interception: Hooks into C++ runtime for backtrace embedding

How We Measure

The numbers above come from executions like:

# Cold start minimum (initialization/JIT):
echo -e '#include <iostream>\n' | time clang-repl-18
# Typical: ~0.21–0.22 s elapsed for this minimal snippet,
# of which ~100–120 ms corresponds to initialization; the rest is I/O and finalization.
echo -e '#include <iostream>\nstd::cout << "Hello world!\\n";\nint a = 10;\na+=1;\n++a;\na++;\n' | time --verbose ./cpprepl
echo -e '#include <iostream>\nstd::cout << "Hello world!\\n";\nint a = 10;\na+=1;\n++a;\na++;\n' | time --verbose clang-repl-18

Note: cpprepl recompiles/analyzes and loads each block, and the first interaction may regenerate the PCH. This explains why the "time per evaluation" (~90–96 ms compilation + ~1μs–48ms loading) appears reasonable, but the total script time via pipe is significantly higher than clang-repl-18. Load times are highly variable: cached executions complete in microseconds, while fresh library loads with complex symbol tables can take up to 48ms.

For clang-repl-18, the difference between cold start (~100–120 ms) and end-to-end time (~0.21–0.22 s) is mostly I/O (stdin/stdout) and process teardown.

⚠️ System Dependencies: These benchmarks reflect specific hardware and software configurations. Actual performance will vary based on:

  • CPU Architecture: x86_64, ARM64, core count, and clock speeds
  • Memory: Available RAM, swap configuration, and memory bandwidth
  • Storage: SSD vs HDD, filesystem type, and I/O load
  • Compiler Versions: Different Clang/GCC versions may show significant variance
  • System Load: Background processes and resource contention
  • OS Configuration: Kernel settings, security features, and driver versions

Building and Usage

Prerequisites

Required Dependencies:

  • Clang/LLVM: Core compilation and AST analysis (required)
  • CMake 3.10+: Modern build system with dependency management
  • readline: Interactive REPL interface with history support
  • TBB: Parallel execution support for compilation pipeline
  • nlohmann_json: JSON parsing for LSP communication

Optional Dependencies:

  • libclang: Enhanced semantic completion (auto-detected)
  • clangd: LSP-based completion service (runtime dependency)
  • libnotify: Desktop notifications for build status
  • GTest: Unit testing framework (development)
  • GDB + addr2line: Assembly-level debugging support

Build Instructions

# Clone with submodules (includes segvcatch)
git clone --recursive <repository-url>
cd cpprepl

# Configure build with automatic dependency detection
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release

# Build with Ninja (recommended) or Make
ninja -j$(nproc)
# OR: make -j$(nproc)

# Run tests (if GTest available)
ctest --parallel $(nproc) --output-on-failure

Build Options

# Enable all optional features
cmake .. -DENABLE_NOTIFICATIONS=ON \
         -DENABLE_ICONS=ON \
         -DENABLE_SANITIZERS=ON \
         -DCMAKE_BUILD_TYPE=Debug

# Minimal build (core features only)
cmake .. -DENABLE_NOTIFICATIONS=OFF \
         -DENABLE_ICONS=OFF

Advanced Usage

Interactive Mode (Default):

./cpprepl                    # Quiet mode (errors only)
./cpprepl -v                 # Basic verbosity
./cpprepl -vvv               # High verbosity with timing info
./cpprepl -s -v              # Safe mode with signal handlers

Batch Processing:

./cpprepl -q -r script.cpp   # Execute script in quiet mode
./cpprepl -r batch.repl      # Process REPL commands from file

Performance Monitoring:

./cpprepl -vv                # Show compilation times and cache info
./cpprepl -vvv               # Show detailed threading information

REPL Commands Reference

Command Description Example
#eval <code> Execute C++ expressions (not declarations) #eval std::cout << "Hello\n";
#eval <file> Compile and execute file #eval mycode.cpp
#return <expr> Evaluate and print expression #return x + 10
#includedir <path> Add include directory #includedir /usr/local/include
#lib <library> Link library #lib pthread
#compilerdefine <def> Add preprocessor definition #compilerdefine DEBUG=1
#loadprebuilt <file> Load precompiled object/library #loadprebuilt mylib.a
#loadprebuilt <file> Load shared library #loadprebuilt mylib.so
#batch_eval <files...> Compile multiple files #batch_eval file1.cpp file2.cpp
#lazyeval <code> Lazy evaluation (deferred) #lazyeval expensive_computation();
printall Print all variables printall
evalall Execute lazy evaluations evalall
exit Exit REPL exit

Example Session

>>> int factorial(int n) { return n <= 1 ? 1 : n * factorial(n-1); }
>>> #return factorial(5)
Type name: int value: 120

>>> std::vector<int> numbers = {1, 2, 3, 4, 5};
>>> numbers
std::vector<int> numbers = [1, 2, 3, 4, 5]

>>> #eval for(auto& n : numbers) n *= 2;
>>> printall

Troubleshooting

Common Issues

  1. Compilation Errors

    Error: command failed with exit code 1
    
    • Check syntax of input code
    • Verify all includes are available
    • Use #includedir to add missing paths
  2. Symbol Resolution Failures

    Cannot load symbol 'function_name'
    
    • Ensure function is declared with correct linkage
    • Check if precompiled headers need rebuilding
    • Verify library dependencies with #lib
  3. Segmentation Faults

    Hardware exception: SEGV at: 0x...
    
    • Run with -s flag for detailed crash analysis
    • Check memory access patterns in user code
    • Review assembly output for debugging

Environment Setup

# Ensure required tools are available
which clang++     # Should point to Clang compiler
which addr2line   # For debugging support
which nm          # For symbol analysis
which gdb         # For instruction analysis

# Set up include paths if needed
export CPLUS_INCLUDE_PATH="/usr/local/include:$CPLUS_INCLUDE_PATH"

Performance Characteristics (v1.5-alpha)

⚠️ Performance Variability: All performance metrics are highly system-dependent. Results will vary significantly based on hardware configuration, compiler versions, system load, and environmental factors. Use these measurements as relative performance indicators rather than absolute benchmarks.

Compilation Performance ✅ VERIFIED FUNCTIONALITY

  • Single File Compilation: 90-96ms average (measured in Ubuntu 24.04 environment)
  • Dual-Level Parallelism:
    • Inter-file: Multiple source files compiled simultaneously
    • Intra-file: AST analysis + object compilation run in parallel using std::async
  • Multi-core Scaling: Linear performance improvement with available CPU cores
  • Cold Start: Initial compilation ~200-500ms (includes PCH generation)
  • Warm Execution: Subsequent compilations benefit from parallel processing + PCH
  • Cached Commands: Identical inputs bypass compilation entirely (cached execution ~1-15μs)
  • Thread Configuration: Auto-detects hardware_concurrency(), configurable limits
  • LLVM Linker Optimization: Automatic detection and configuration of ld.lld for faster linking
    • Auto-discovery: Detects compatible LLVM linker versions in PATH (ld.lld-XX variants)
    • Version matching: Prioritizes linker version compatible with installed Clang
    • Performance boost: Significantly faster linking compared to default system linker
  • Memory Usage: ~150MB peak during complex compilations

Completion System Performance ✅ CLANG INTEGRATION

  • Current (v2.0): Full Clang-based code completion with semantic analysis
  • Response Time: Sub-100ms completion latency for most contexts
  • Features: Context-aware suggestions, symbol completion, type inference
  • Architecture: Native libclang integration with readline interface

Runtime Performance ✅ NATIVE EXECUTION

  • Native Speed: Compiled code runs at full native performance (no interpretation)
  • Symbol Resolution: ~1-10μs per function call through optimized assembly trampolines
  • Library Loading: Highly variable ~1μs–48ms depending on library size, symbol count, and system state
  • Startup Time: ~0.8s (fast initialization)
  • Memory Layout: Standard process memory model with shared library segments
  • Peak Memory: ~150MB during complex compilations with includes
  • Smart Caching: Automatic detection and reuse of identical command patterns

Use Cases and Applications

Educational Applications

  • Systems Programming Learning: Demonstrates low-level concepts interactively
  • Compiler Technology Education: Shows AST analysis and symbol resolution
  • Operating Systems Concepts: Illustrates dynamic linking and memory management

Development Workflows

  • Rapid Prototyping: Quick C++ code experimentation
  • Algorithm Testing: Interactive algorithm development and testing
  • Library Integration: Testing library APIs interactively
  • Debugging Aid: Understanding complex code behavior step-by-step

Research Applications

  • Dynamic Compilation Research: Foundation for advanced JIT techniques
  • Error Handling Studies: Novel approach to signal-to-exception translation
  • Memory Management Analysis: Real-time memory layout exploration

Future Work and Extensions

Potential Improvements

  1. Memory Management: Implement library unloading to reduce memory consumption
  2. Session Persistence: Add ability to save and restore variable state between REPL sessions
  3. Symbol Versioning: Add support for function versioning and rollback
  4. Distributed Execution: Extend to support remote compilation and execution
  5. IDE Integration: Develop plugins for popular development environments
  6. Language Extensions: Support for experimental C++ features

Research Directions

  1. Incremental Linking: Optimize linking performance for large codebases
  2. Smart Caching: Implement intelligent compilation result caching beyond simple string matching
  3. Cross-Platform Support: Extend to Windows and macOS
  4. Security Sandboxing: Add isolation mechanisms for untrusted code
  5. Performance Profiling: Integrate real-time performance analysis

Contributing

This project welcomes contributions in several areas:

Code Contributions

  • Signal handling improvements
  • Cross-platform compatibility
  • Performance optimizations
  • Additional REPL commands

Documentation

  • Usage examples and tutorials
  • Technical deep-dives
  • Performance benchmarks
  • Comparison studies

Testing

  • Edge case identification
  • Platform-specific testing
  • Performance regression testing
  • Memory leak detection

License and Acknowledgments

Third-Party Components

  • segvcatch: Signal handling library (LGPL)
  • simdjson: High-performance JSON parsing
  • readline: Command-line interface
  • Clang/LLVM: Compiler infrastructure

Academic References

This work builds upon concepts from:

  • Dynamic linking and symbol resolution literature
  • Signal handling and exception safety research
  • Compiler construction and AST analysis techniques
  • Interactive development environment design

Conclusion

This implementation represents a groundbreaking approach to C++ REPL design that combines native code execution with sophisticated error handling and debugging capabilities. The system demonstrates advanced knowledge of:

  • Operating system signal handling and exception mechanisms
  • Assembly-level programming and register management
  • Compiler toolchain integration and AST analysis
  • Virtual memory management and symbol resolution
  • Binary format analysis and manipulation
  • Real-time code modification techniques
  • C++ runtime ABI manipulation
  • Hardware exception analysis and recovery

The innovative signal-to-exception translation system, combined with embedded backtrace generation, represents a significant advancement in interactive development tool safety and debugging capabilities. The architecture provides a foundation for understanding how robust dynamic compilation systems can be built from first principles, offering insights into the trade-offs between compilation time, execution performance, and error recovery in interactive development environments.

The project's practical applications, demonstrated through the self-editing text editor with comprehensive crash recovery, showcase the potential for creating development environments where code modification and execution happen seamlessly in real-time while maintaining system stability even in the presence of user code errors. This represents a significant advancement in interactive C++ development tools, providing both the performance benefits of native compilation and the safety features typically associated with managed environments.

🎯 Key Trade-offs Summary: This REPL prioritizes native execution performance and comprehensive debugging over fast iteration. While evaluation takes longer than JIT-based solutions (~90-144ms vs ~100-120ms), users benefit from full-speed native code execution, complete debugging capabilities, and crash recovery. Load time variability (microseconds for cached vs ~48ms for complex fresh loads) reflects the dynamic library approach. The architecture choice reflects the principle that compilation overhead is acceptable when balanced against the advantages of native performance and debugging fidelity.

📈 Performance Context: The performance characteristics documented here represent measurements on specific hardware and software configurations. Real-world performance will vary based on system specifications, workload characteristics, and environmental factors. Users should benchmark on their target systems for production planning.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages