+- name: "CompilerResearchCon 2025 (day 2)"
+  date: 2025-11-13 15:00:00 +0200
+  time_cest: "15:00"
+  connect: "[Link to zoom](https://princeton.zoom.us/j/94431046845?pwd=D5i77Qb0PgfwwIubvbo2viEunne7eQ.1)"
+  label: gsoc2025_wrapup_2
+  agenda:
+    - title: "Implementing Debugging Support for xeus-cpp"
+      speaker:
+        name: "Abhinav Kumar"
+      time_cest: "15:00 - 15:20"
+      description: |
+        This proposal outlines integrating debugging support into the xeus-cpp
+        kernel for Jupyter using LLDB and its Debug Adapter Protocol
+        implementation (lldb-dap). Modeled after xeus-python, it leverages
+        LLDB’s Clang and JIT debugging support to enable breakpoints, variable
+        inspection, and step-through execution. The modular design ensures
+        compatibility with Jupyter’s frontend, enhancing interactive C++
+        development in notebooks.
+
+        This project achieved DAP integration with xeus-cpp. Users can use
+        JupyterLab’s debugger panel to debug C++ JIT code. Setting and hitting
+        breakpoints and stepping into and out of functions are supported in
+        xeus-cpp. Additionally, during this project I refactored the
+        out-of-process JIT execution, which was a major part of implementing
+        the debugger.
+
+
+      # slides: /assets/presentations/...
+
+    - title: "Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
+      speaker:
+        name: "Maksym Andriichuk"
+      time_cest: "15:20 - 15:40"
+      description: |
+        Clad is a Clang plugin designed to provide automatic differentiation
+        (AD) for C++ mathematical functions. It generates derivative code by
+        transforming the Abstract Syntax Tree (AST) using Clang and LLVM
+        compiler infrastructure. Because it has access to a rich program
+        representation, the Clang AST, it can perform advanced program
+        optimization by implementing more sophisticated analyses.
+
+        The project optimized generated code that contains potential data-race
+        conditions, significantly speeding up execution. It implements a
+        thread-safety analysis, a static analysis that detects possible data
+        races and thereby enables Clad to reduce atomic operations in the code
+        it produces.
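+
+        For illustration, here is a minimal sketch of Clad’s reverse-mode API
+        on a plain CPU function (not code from the project; the function f is
+        made up, and the CUDA kernel path adds launch configuration on top of
+        this pattern):
+
+        ```cpp
+        #include "clad/Differentiator/Differentiator.h"
+
+        double f(double x, double y) { return x * x * y; }
+
+        int main() {
+          // Reverse-mode AD: Clad generates the gradient function at compile time.
+          auto df = clad::gradient(f);
+          double dx = 0, dy = 0;
+          df.execute(3.0, 4.0, &dx, &dy); // dx == 2*3*4 == 24, dy == 3*3 == 9
+        }
+        ```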
+
+      # slides: /assets/presentations/...
+
+    - title: "Enable automatic differentiation of OpenMP programs with Clad"
+      speaker:
+        name: "Jiayang Li"
+      time_cest: "15:40 - 16:00"
+      description: |
+        This project extends Clad, a Clang-based automatic differentiation
+        tool for C++, to support OpenMP programs. It enables Clad to parse and
+        differentiate functions with OpenMP directives, thereby enabling
+        gradient computation in multi-threaded environments.
+
+        The project added Clad support for both forward and reverse mode
+        differentiation of common OpenMP directives (parallel, parallel for)
+        and clauses (private, firstprivate, lastprivate, shared, atomic,
+        reduction) by implementing OpenMP-related AST parsing and designing
+        corresponding differentiation strategies. Additional contributions
+        include example applications and comprehensive tests.
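+
+        As a minimal sketch (illustrative only: dot is a made-up example using
+        the reduction clause, and the call assumes the pointer-adjoint calling
+        convention of recent Clad releases):
+
+        ```cpp
+        #include "clad/Differentiator/Differentiator.h"
+
+        // Parallel dot product; reduction(+:sum) is one of the supported clauses.
+        double dot(double* x, double* y, int n) {
+          double sum = 0.0;
+          #pragma omp parallel for reduction(+ : sum)
+          for (int i = 0; i < n; ++i)
+            sum += x[i] * y[i];
+          return sum;
+        }
+
+        int main() {
+          double x[] = {1, 2, 3}, y[] = {4, 5, 6};
+          double gx[3] = {}, gy[3] = {};
+          auto grad = clad::gradient(dot, "x, y"); // differentiate w.r.t. x and y
+          grad.execute(x, y, 3, gx, gy);           // gx == y, gy == x
+        }
+        ```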
+
+
+      # slides: /assets/presentations/...
+
+    - title: "Using ROOT in the field of Genome Sequencing"
+      speaker:
+        name: "Aditya Pandey"
+      time_cest: "16:00 - 16:20"
+      description: |
+        The project extends ROOT, CERN's petabyte-scale data processing
+        framework, to address the critical challenge of managing genomic data,
+        which can reach up to 200 GB per human genome. By leveraging ROOT's
+        big-data expertise and introducing the next-generation RNTuple columnar
+        storage format, optimized here specifically for genomic sequences, the
+        project eliminates the traditional trade-off between compression
+        efficiency and access speed in bioinformatics.
+
+        The project achieved comprehensive genomic data support: it validated
+        GeneROOT baseline performance benchmarks against the BAM/SAM formats;
+        implemented an RNTuple-based RAM (ROOT Alignment Maps) format with full
+        SAM/BAM field support and smart reference management, demonstrating
+        23.5% smaller files than CRAM while delivering 1.9x faster large-region
+        queries and 3.2x faster full-chromosome scans; and optimized FASTQ
+        compression from 14.2 GB to 6.8 GB. We also developed chromosome-based
+        file splitting for large genome files so that per-chromosome data can
+        be extracted.
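+
+        As an illustration of the storage layer, here is a sketch of writing
+        read records with RNTuple (not the project’s actual schema: the field
+        names below are hypothetical, and depending on the ROOT version the
+        RNTuple classes live in ROOT::Experimental or ROOT):
+
+        ```cpp
+        #include <ROOT/RNTupleModel.hxx>
+        #include <ROOT/RNTupleWriter.hxx>
+        #include <cstdint>
+        #include <string>
+
+        using ROOT::Experimental::RNTupleModel;
+        using ROOT::Experimental::RNTupleWriter;
+
+        int main() {
+          // Columnar model with a few SAM-like fields (names are illustrative).
+          auto model = RNTupleModel::Create();
+          auto name = model->MakeField<std::string>("QNAME");
+          auto pos  = model->MakeField<std::int64_t>("POS");
+          auto seq  = model->MakeField<std::string>("SEQ");
+
+          auto writer = RNTupleWriter::Recreate(std::move(model), "reads",
+                                                "alignments.root");
+          *name = "read_0001"; *pos = 10468; *seq = "ACGTACGT";
+          writer->Fill(); // one entry per aligned read
+        }
+        ```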
+
+
+      # slides: /assets/presentations/...
+
+- name: "CompilerResearchCon 2025 (day 1)"
+  date: 2025-10-30 15:00:00 +0200
+  time_cest: "15:00"
+  connect: "[Link to zoom](https://princeton.zoom.us/j/94431046845?pwd=D5i77Qb0PgfwwIubvbo2viEunne7eQ.1)"
+  label: gsoc2025_wrapup_1
+  agenda:
+    - title: "CARTopiaX: an Agent-Based Simulation of CAR T-Cell Therapy built on BioDynaMo"
+      speaker:
+        name: "Salvador de la Torre Gonzalez"
+      time_cest: "15:00 - 15:20"
+      description: |
+        CAR T-cell therapy is a form of cancer immunotherapy that engineers a
+        patient’s T cells to recognize and eliminate malignant cells. Although
+        highly effective in leukemias and other hematological cancers, this
+        therapy faces significant challenges in solid tumors due to the complex
+        and heterogeneous tumor microenvironment. CARTopiaX is an advanced
+        agent-based model developed to address this challenge, using the
+        mathematical framework proposed in the Nature paper “In silico study of
+        heterogeneous tumour-derived organoid response to CAR T-cell therapy”
+        and successfully replicating its core results. Built on BioDynaMo, a
+        high-performance, open-source platform for large-scale and modular
+        biological modeling, CARTopiaX enables detailed exploration of complex
+        biological interactions, hypothesis testing, and data-driven discovery
+        within solid tumor microenvironments.
+
+        The project achieved major milestones, including simulations that run
+        more than twice as fast as the previous model, allowing rapid scenario
+        exploration and robust hypothesis validation; high-quality,
+        well-structured, and maintainable C++ code developed following modern
+        software engineering principles; and a scalable, modular, and
+        extensible architecture that fosters collaboration, customization, and
+        the continuous evolution of an open-source ecosystem. Altogether, this
+        work represents a meaningful advancement in computational biology,
+        providing researchers with a powerful tool to investigate CAR T-cell
+        dynamics in solid tumors and accelerating scientific discovery while
+        reducing the time and cost associated with experimental wet-lab
+        research.
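+
+        For context, the skeleton of a BioDynaMo simulation looks roughly like
+        this (a generic sketch of the platform’s API, not CARTopiaX code; the
+        agent type and step count are placeholders):
+
+        ```cpp
+        #include "biodynamo.h"
+
+        int main(int argc, const char** argv) {
+          bdm::Simulation simulation(argc, argv);
+
+          // Register agents (CARTopiaX defines tumor and CAR T-cell agents).
+          auto* rm = simulation.GetResourceManager();
+          rm->AddAgent(new bdm::Cell(/*diameter=*/10));
+
+          // Advance the agent-based model for a number of timesteps.
+          simulation.GetScheduler()->Simulate(500);
+          return 0;
+        }
+        ```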
+
+      # slides: /assets/presentations/...
+
+    - title: "Efficient LLM Training in C++ via Compiler-Level Autodiff with Clad"
+      speaker:
+        name: "Rohan Timmaraju"
+      time_cest: "15:20 - 15:40"
+      description: |
+        The computational demands of Large Language Model (LLM) training are
+        often constrained by the performance of Python frameworks. This project
+        tackles these bottlenecks by developing a high-performance LLM training
+        pipeline in C++ using Clad, a Clang plugin for compiler-level automatic
+        differentiation. The core of this work involved creating cladtorch, a new
+        C++ tensor library with a PyTorch-style API designed for compatibility
+        with Clad's differentiation capabilities. This library provides a more
+        user-friendly interface for building and training neural networks while
+        enabling Clad to automatically generate gradient computations for
+        backpropagation.
+
+        Throughout the project, I successfully developed two distinct LLM
+        training implementations. The first, using the cladtorch library,
+        established a functional and flexible framework for Clad-driven AD. To
+        further push performance boundaries, I then developed a second, highly
+        optimized implementation inspired by llm.c, which utilizes
+        pre-allocated memory buffers and custom kernels. This optimized C-style
+        approach, when benchmarked for GPT-2 training on a multithreaded CPU,
+        outperformed the equivalent PyTorch implementation. This work
+        successfully demonstrates the viability and performance benefits of
+        compiler-based AD for deep learning in C++ and provides a strong
+        foundation for future hardware acceleration, such as porting the
+        implementation to CUDA.
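+
+        The underlying pattern is Clad generating the backward pass for a loss
+        written in plain C++. A toy sketch (not cladtorch’s API, which is the
+        project’s own; mse and the parameter names are made up):
+
+        ```cpp
+        #include "clad/Differentiator/Differentiator.h"
+
+        // Scalar loss of a one-parameter linear model.
+        double mse(double w, double b, double x, double y) {
+          double err = (w * x + b) - y;
+          return err * err;
+        }
+
+        int main() {
+          // Differentiate w.r.t. the trainable parameters only.
+          auto grad = clad::gradient(mse, "w, b");
+          double dw = 0, db = 0;
+          grad.execute(/*w=*/0.5, /*b=*/0.1, /*x=*/2.0, /*y=*/1.5, &dw, &db);
+          // A gradient-descent update would follow: w -= lr * dw; b -= lr * db;
+        }
+        ```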
+
+      # slides: /assets/presentations/...
+
+    - title: "Implement and improve an efficient, layered tape with prefetching capabilities"
+      speaker:
+        name: "Aditi Milind Joshi"
+      time_cest: "15:40 - 16:00"
+      description: |
+        Clad relies on a tape data structure to store intermediate values during reverse
+        mode differentiation. This project focuses on enhancing the core tape implementation
+        in Clad to make it more efficient and scalable. Key deliverables include replacing
+        the existing dynamic array-based tape with a slab allocation approach and small
+        buffer optimization, enabling multilayer storage, and introducing thread safety to
+        support concurrent access.
+
+        The current implementation replaces the dynamic array with a slab-based structure
+        and a small static buffer, eliminating costly reallocations. Thread-safe access
+        functions have been added through a mutex locking mechanism, ensuring safe parallel
+        tape operations. Ongoing work includes developing a multilayer tape system with
+        offloading capabilities, which will allow only the most recent slabs to remain in
+        memory.
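+
+        A hypothetical, much-simplified sketch of this design (not Clad’s
+        actual tape code): a small static buffer holds the first entries, then
+        fixed-size heap slabs take over, so growth never reallocates existing
+        entries, and a mutex guards concurrent access.
+
+        ```cpp
+        #include <array>
+        #include <cstddef>
+        #include <memory>
+        #include <mutex>
+        #include <vector>
+
+        template <typename T, std::size_t SmallN = 64, std::size_t SlabN = 1024>
+        class SlabTape {
+        public:
+          void push(const T& value) {
+            std::lock_guard<std::mutex> lock(mutex_); // thread-safe access
+            if (size_ < SmallN) {                     // small buffer optimization
+              small_[size_++] = value;
+              return;
+            }
+            std::size_t idx = size_ - SmallN;
+            if (idx % SlabN == 0)                     // current slab full: add one
+              slabs_.push_back(std::make_unique<T[]>(SlabN));
+            slabs_[idx / SlabN][idx % SlabN] = value;
+            ++size_;
+          }
+
+          T pop() {
+            std::lock_guard<std::mutex> lock(mutex_);
+            --size_;
+            if (size_ < SmallN) return small_[size_];
+            std::size_t idx = size_ - SmallN;
+            return slabs_[idx / SlabN][idx % SlabN];
+          }
+
+        private:
+          std::array<T, SmallN> small_{};           // in-object static buffer
+          std::vector<std::unique_ptr<T[]>> slabs_; // slabs are never reallocated
+          std::size_t size_ = 0;
+          std::mutex mutex_;
+        };
+        ```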
+
+
+      # slides: /assets/presentations/...
+
+    - title: "Support usage of Thrust API in Clad"
+      speaker:
+        name: "Abdelrhman Elrawy"
+      time_cest: "16:00 - 16:20"
+      description: |
+        This project integrates NVIDIA's Thrust library into Clad, a Clang-based automatic
+        differentiation tool for C++. By extending Clad's source-to-source transformation
+        engine to recognize and differentiate Thrust parallel algorithms, the project
+        enables automatic gradient generation for GPU-accelerated scientific computing
+        and machine learning applications.
+
+        The project added Thrust support to Clad by implementing custom
+        derivatives for core algorithms, including thrust::reduce,
+        thrust::transform, thrust::transform_reduce, thrust::inner_product,
+        thrust::copy, scan operations (inclusive/exclusive),
+        thrust::adjacent_difference, and sorting primitives. Additional
+        contributions include support for Thrust data containers such as
+        thrust::device_vector, generic functor handling for transformations,
+        demonstration applications, and comprehensive unit tests.
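+
+        An illustrative sketch of the usage pattern this enables (dot is a
+        made-up example; whether this exact signature is supported depends on
+        the project’s custom derivatives, so treat it as an assumption):
+
+        ```cpp
+        #include <thrust/device_vector.h>
+        #include <thrust/inner_product.h>
+        #include "clad/Differentiator/Differentiator.h"
+
+        // Dot product expressed with a Thrust parallel algorithm.
+        double dot(double* x, double* y, int n) {
+          thrust::device_vector<double> dx(x, x + n), dy(y, y + n);
+          return thrust::inner_product(dx.begin(), dx.end(), dy.begin(), 0.0);
+        }
+
+        int main() {
+          double x[] = {1, 2, 3}, y[] = {4, 5, 6};
+          double gx[3] = {}, gy[3] = {};
+          auto grad = clad::gradient(dot, "x, y"); // uses Clad's Thrust derivatives
+          grad.execute(x, y, 3, gx, gy);           // gx == y, gy == x
+        }
+        ```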
+
+      # slides: /assets/presentations/...