| 1 | +title: "Results from CERN Summer School 2025: Supporting Automatic |
| 2 | +Differentiation in CMS Combine profile likelihood scans" |
| 3 | +layout: post |
| 4 | +excerpt: "A CERN Summer Student 2025 project to support
| 5 | +automatic differentiation (AD) in likelihood scans in the CMS Combine
| 6 | +tool, accelerating statistical inference by leveraging RooFit's AD
| 7 | +support and LLVM-based gradient generation."
| 8 | +sitemap: false |
| 9 | +author: Galin Bistrev |
| 10 | +permalink: blogs/2025_galin_bistrev_results_blog/ |
| 11 | +banner_image: /images/blog/banner-cern.jpg |
| 12 | +date: 2025-09-25 |
| 13 | +tags: cern cms root combine c++ RooFit automatic-differentiation |
| 14 | +--- |
| 15 | + |
| 16 | +### **Introduction** |
| 17 | +Greetings! I’m Galin Bistrev, a fourth-year student specializing in
| 18 | +Nuclear and Particle Physics at the University of Sofia "St. Kliment Ohridski."
| 19 | +As part of the CERN Summer Student Programme 2025, I worked on a
| 20 | +project that aimed to add support for Automatic Differentiation
| 21 | +(AD) to profile likelihood scans in the CMS Combine tool.
| 22 | + |
| 23 | +Mentors: Jonas Rembser, Vassil Vassilev, David Lange
| 24 | + |
| 25 | +### **Description of the Project** |
| 26 | + |
| 27 | +This project aims to enhance support for Automatic Differentiation (AD) |
| 28 | +in likelihood scans within the CMS Combine framework, the primary |
| 29 | +statistical analysis tool of the CMS experiment at CERN. Combine is |
| 30 | +built on top of RooFit, which has recently introduced AD to improve |
| 31 | +minimization techniques. By providing computationally efficient |
| 32 | +gradients through AD, RooFit achieves substantial performance |
| 33 | +improvements. RooFit translates its internal likelihood
| 34 | +representation into standalone C++ code, from which Clad generates
| 35 | +the gradient routines. This strategy not only speeds up the
| 36 | +fitting process but also increases the portability and shareability |
| 37 | +of likelihood models, making them usable even by those without |
| 38 | +detailed knowledge of RooFit or Combine internals. |
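
As a rough illustration of what AD provides, the sketch below differentiates a toy Gaussian negative log-likelihood with forward-mode dual numbers and compares the result with a finite-difference estimate. It is written in Python for brevity and does not reproduce Clad, which transforms C++ source code. The `Dual` class, the `nll` function, and the data values are hypothetical and chosen only for this example.

```python
class Dual:
    """A value together with its derivative (forward-mode AD)."""

    def __init__(self, val, der=0.0):
        self.val = float(val)
        self.der = float(der)

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative is zero.
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def nll(mu, data):
    """Toy Gaussian NLL with unit width, constants dropped."""
    total = Dual(0.0)
    for x in data:
        residual = x - mu
        total = total + 0.5 * residual * residual
    return total


data = [1.0, 2.0, 4.0]  # made-up observations

# Seed d(mu)/d(mu) = 1 to obtain the derivative alongside the value.
ad_result = nll(Dual(2.0, 1.0), data)

# Central finite difference for comparison: two extra evaluations per parameter.
h = 1e-6
fd_gradient = (nll(2.0 + h, data).val - nll(2.0 - h, data).val) / (2 * h)

print(ad_result.val, ad_result.der, fd_gradient)
```

The AD derivative comes out of a single evaluation and does not depend on a step size, whereas the finite-difference estimate needs extra likelihood evaluations for every parameter.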
| 39 | + |
| 40 | +### **Brief overview of the CMS Combine engine** |
| 41 | +Combine is a statistical analysis framework that compares models of |
| 42 | +expected observations with real data. It is widely used for tasks such |
| 43 | +as searching for new particles or processes, setting limits on |
| 44 | +potential new physics, and measuring physical quantities like cross-sections. |
| 45 | +Although developed with High Energy Physics (HEP) |
| 46 | +applications in mind, Combine contains no intrinsic physics assumptions, |
| 47 | +making it fully general and independent of any specific analysis. |
| 48 | +This flexibility allows it to be applied across a broad range of |
| 49 | +statistical problems. |
| 50 | + |
| 51 | +Roughly, Combine performs three main functions: |
| 52 | + |
| 53 | +- Builds a statistical model of expected observations. |
| 54 | +- Runs statistical tests comparing the model with observed data. |
| 55 | +- Provides tools for validating, inspecting, and understanding both the |
| 56 | +model and the results of the statistical tests. |
| 57 | + |
| 58 | +### **Project goals** |
| 59 | + |
| 60 | +In order for AD to be supported in Combine likelihood scans, a number of goals needed to be achieved: |
| 61 | + |
| 62 | +- Refactor some of Combine's logic into RooFit, so that Combine can
| 63 | +reuse the AD-enabled minimization algorithm already present there. |
| 64 | +- Integrate gradient computation into likelihood scans, ensuring that |
| 65 | +derivatives are correctly propagated for efficient and accurate minimization. |
| 66 | +- Validate correctness and performance, confirming that the AD-based |
| 67 | +scans produce results consistent with traditional methods while |
| 68 | +offering improved performance. |
| 69 | + |
| 70 | +## **Overview of Completed Work** |
| 71 | +Over the course of the project, several major tasks were completed to achieve the stated objectives: |
| 72 | + |
| 73 | +- Imported the `RooMultiPdf` class from Combine into RooFit. It enables
| 74 | +switching between multiple PDFs, applies statistical penalties,
| 75 | +and supports code generation for AD.
| 76 | + |
| 77 | +- Added support for the new class to `codegen` in RooFit
| 78 | +by adding a new function to `MathFunc.h` and
| 79 | +extending `CodegenImpl.cxx` to generate code for models that use it.
| 80 | + |
| 81 | +- Imported three pieces of code from Combine, which handle the
| 82 | +minimization procedures there, into RooFit's `RooMinimizer.cxx`.
| 83 | +The first is a class imported by Jonas Rembser |
| 84 | +called `FreezeDisconnectedParametersRAII`, which automatically |
| 85 | +freezes and unfreezes parameters disconnected from the likelihood graph. |
| 86 | +The second is the function `generateOrthogonalCombinations`, which |
| 87 | +generates a list of index combinations by initializing a base |
| 88 | +configuration with all indices set to zero and then varying one category at a time. |
| 89 | +The third and final piece of code is a function called `reorderCombinations`, |
| 90 | +which takes the set of indices produced by `generateOrthogonalCombinations` |
| 91 | +and adjusts each combination by adding the corresponding base values |
| 92 | +modulo the maximum allowed index, effectively shifting the combinations |
| 93 | +relative to the current best indices. |
| 94 | + |
| 95 | +- Using the functions above, imported the discrete profiling algorithm,
| 96 | +which is the main minimization algorithm in Combine,
| 97 | +into `RooMinimizer.cxx`.
| 98 | +- A [tutorial](https://root.cern/doc/master/rf619__discrete__profiling_8py.html) |
| 99 | +was created along with a [benchmark](https://github.com/vgvassilev/clad/issues/1521), |
| 100 | +made by Jonas Rembser, demonstrating discrete profiling with RooMultiPdf objects |
| 101 | +and evaluating the performance of AD in the likelihood scans. |
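
The two helper functions described in the list above can be sketched as follows. This is a minimal Python sketch written from the prose description only, not the code in `RooMinimizer.cxx`. The snake_case names, the argument layout, and the reading of "maximum allowed index" as the number of states in each category are assumptions.

```python
def generate_orthogonal_combinations(n_states):
    """Start from all indices at zero, then vary one category at a time.

    n_states[i] is assumed to be the number of states of category i.
    """
    base = [0] * len(n_states)
    combos = [list(base)]
    for cat, n in enumerate(n_states):
        for value in range(1, n):
            combo = list(base)
            combo[cat] = value
            combos.append(combo)
    return combos


def reorder_combinations(combos, n_states, best):
    """Shift every combination relative to the current best indices.

    Each index is offset by the matching best index and wrapped around
    with a modulo, so the first combination becomes the best one itself.
    """
    return [
        [(c + b) % n for c, b, n in zip(combo, best, n_states)]
        for combo in combos
    ]


# Two hypothetical categories with 3 and 2 states.
n_states = [3, 2]
combos = generate_orthogonal_combinations(n_states)
shifted = reorder_combinations(combos, n_states, best=[1, 1])

print(combos)   # [[0, 0], [1, 0], [2, 0], [0, 1]]
print(shifted)  # [[1, 1], [2, 1], [0, 1], [1, 0]]
```

Under these assumptions every shifted combination differs from the current best in at most one category, so this example visits 4 combinations instead of all 3 × 2 = 6.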
| 102 | + |
| 103 | +## **Results** |
| 104 | +With those objectives accomplished, RooFit now provides AD support for |
| 105 | +discrete profiling. However, the developed benchmark indicates that AD |
| 106 | +does not currently improve efficiency, as the gradient code generated by |
| 107 | +Clad introduces overhead. Further optimization in Clad is needed to achieve |
| 108 | +the potential performance gains for RooFit likelihood scans. More information |
| 109 | +regarding the issue can be found at [#1521](https://github.com/vgvassilev/clad/issues/1521). |
| 110 | + |
| 111 | +## **Conclusions** |
| 112 | +Thanks to this project, RooFit now supports AD for the discrete profiling used in Combine.
| 113 | +Once the current overhead in Clad is addressed, this would allow for
| 114 | +significantly faster and more efficient likelihood scans while maintaining
| 115 | +accurate optimization of both discrete and continuous parameters.
| 116 | + |
| 117 | +## **Future work** |
| 118 | +- Further benchmarking is required to quantify the potential performance |
| 119 | +gains from automatic differentiation. |
| 120 | +- Additional optimization of Clad is needed to eliminate unnecessary |
| 121 | +overhead in gradient generation. |
| 122 | +- The discrete profiling logic implemented in RooMinimizer should be |
| 123 | +tested across different models to evaluate the minimizer’s behavior and |
| 124 | +robustness. |
| 125 | +- The Doxygen documentation of `RooMinimizer` should be extended to describe the treatment of discrete
| 126 | +parameters.
| 127 | +- The discrete profiling implementation should also be tested inside CMS Combine,
| 128 | +replacing its implementation in `CascadeMinimizer.cxx`.
| 129 | + |
| 130 | +## **Acknowledgements** |
| 131 | +I would like to express my sincere gratitude to the CERN Summer School |
| 132 | +for the opportunity to participate in such an inspiring project. |
| 133 | +I extend special thanks to Jonas Rembser, Vassil Vassilev, and David Lange for |
| 134 | +their invaluable guidance and for providing continuous learning opportunities throughout this journey. |
| 135 | +I am also grateful to the ROOT team for welcoming me and supporting me throughout my stay at CERN. |
| 136 | + |
| 137 | +## **Related Links** |
| 138 | +- [CMS Combine documentation](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/)
| 139 | +- [ROOT official repository](https://github.com/root-project/root) |
| 140 | +- [My GitHub profile](https://github.com/GalinBistrev2) |
| 141 | +- [Presentation](/assets/presentations/CaaS_Weekly_25_09_2025_Galin_Bistrev_AD_in_CMS_Combine.pdf) |
| 142 | + |
| 143 | + |
| 144 | + |