Merged
11 changes: 11 additions & 0 deletions _gsocorgs/2025/umanchester.md
@@ -0,0 +1,11 @@
---
title: "University of Manchester"
author: "Caterina Doglioni"
layout: default
organization: UManchester
logo: UofM-logo.png
description: |
The [University of Manchester](<https://www.manchester.ac.uk>) is a leading UK research university. We have a large particle physics group with contributions to LHC experiments, dark matter, flavour, neutrino and muon experiments. We also carry out research into new detector technologies and new data acquisition strategies for future experiments. We are also involved in distributed computing for LHC experiments, hosting one of the largest and most successful Tier-2 distributed computing centres in the UK.
---

{% include gsoc_proposal.ext %}
69 changes: 69 additions & 0 deletions _gsocproposals/2025/proposal_SMARTHEP_GreenSoftware.md
@@ -0,0 +1,69 @@
---
title: Estimating the energy cost of ML scientific software
layout: gsoc_proposal
project: SMARTHEP
year: 2025
organization:
- UManchester
- CERN
difficulty: medium
duration: 350
mentor_avail: June-October (with 2-3 weeks of mentor vacation, during which the student will work independently with minimal guidance)
---
# Description

At a time when “energy crisis” is a phrase we hear daily,
we can’t help but wonder whether our research software can be made more sustainable,
and more efficient as a byproduct.
In particular, this question arises for ML scientific software used in high-throughput scientific
computing, where large datasets composed of many similar chunks are analysed with similar operations
on each chunk of data.
Moreover, CPU/GPU-efficient software algorithms are crucial for the real-time data selection (trigger)
systems in LHC experiments,
as the initial data analysis necessary to select interesting collision events
is executed on a computing farm located at CERN that has finite CPU resources.

The questions we want to start answering in this work are:
* what is the trade-off between the performance of an ML algorithm and its energy efficiency?
* can small efficiency improvements in ML algorithms running on Large Hadron Collider data
have a sizable impact on energy consumption?
* how do these energy efficiency improvements vary
when using different computing architectures (1) and/or job submission systems (2)?

## Task ideas

The student in this project will use metrics from the [Green Software Foundation](<https://greensoftware.foundation>)
and from other selected resources to estimate the energy efficiency of machine learning software from LHC experiments
(namely, top tagging using ATLAS Open data) and from machine learning algorithms for data compression
(there is another GSoC project developing this code, called Baler).
This work will build on previous GSoC / Master's thesis work, and will extend those results to GPU architectures.
If time allows, the student will then have the chance to make small changes to the code
to make it more efficient, and evaluate possible savings.
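
As a rough illustration of the kind of metric involved, the Green Software Foundation's Software Carbon Intensity (SCI) score combines operational energy, grid carbon intensity, and embodied emissions per functional unit: `SCI = ((E * I) + M) per R`. The sketch below is a minimal Python version of that formula. The function name `sci_score` and every numerical value are hypothetical placeholders for illustration, not measurements or code from this project.

```python
def sci_score(energy_kwh, carbon_intensity_g_per_kwh, embodied_g, functional_units):
    """Return the SCI score in gCO2eq per functional unit.

    energy_kwh                 -- E: energy consumed by the software (kWh)
    carbon_intensity_g_per_kwh -- I: grid carbon intensity (gCO2eq/kWh)
    embodied_g                 -- M: embodied hardware emissions attributed to the run (gCO2eq)
    functional_units           -- R: e.g. number of events processed or inferences made
    """
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * carbon_intensity_g_per_kwh
    return (operational_g + embodied_g) / functional_units


# Hypothetical inputs: 2 kWh consumed, 300 gCO2eq/kWh grid,
# 100 gCO2eq embodied emissions, 1000 inferences as the functional unit.
score = sci_score(2.0, 300.0, 100.0, 1000)
print(f"SCI: {score:.3f} gCO2eq per inference")
```

Choosing the functional unit (per event, per inference, per compressed file) is a design decision in its own right, since it determines which algorithms can be compared on equal footing.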

## Expected results and milestones

* Understand and summarise the metrics for software energy consumption, focusing on computing resources at CERN;
* Become familiar with running and debugging the selected software frameworks and algorithms;
* Set up tests and visualisation for applying metrics to the selected software;
* Run tests and visualise results (preferably using a Jupyter notebook);
* Vary platforms and job submission systems;
* Identify possible improvements, apply them, and run tests again
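
The "set up tests, run, compare" milestones above could start from something as simple as the sketch below: time a workload and convert the duration to an energy estimate. This is only a sketch under a strong assumption, namely that the device draws a constant average power (`avg_power_w`, a made-up value here). A real measurement would read hardware counters (for example RAPL on CPUs or NVML on GPUs) instead. All names in the block are hypothetical.

```python
import time
from contextlib import contextmanager

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_kwh(avg_power_w, duration_s):
    """Convert an average power draw (W) over a duration (s) to energy in kWh."""
    return avg_power_w * duration_s / JOULES_PER_KWH


@contextmanager
def energy_estimate(label, avg_power_w, results):
    """Time the enclosed block and store a constant-power energy estimate."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_s = time.perf_counter() - start
        results[label] = {
            "duration_s": duration_s,
            "energy_kwh": energy_kwh(avg_power_w, duration_s),
        }


results = {}

# Stand-in workloads; in practice these would be two variants of the ML code.
with energy_estimate("baseline", avg_power_w=150.0, results=results):
    sum(i * i for i in range(200_000))

with energy_estimate("optimised", avg_power_w=150.0, results=results):
    sum(i * i for i in range(100_000))

for label, entry in results.items():
    print(f"{label}: {entry['duration_s']:.4f} s, {entry['energy_kwh']:.3e} kWh")
```

The `results` dictionary can be loaded directly into a Jupyter notebook for plotting, and the same wrapper can be reused across platforms or job submission systems by changing only the power source.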

## Requirements

* Python
* git
* Jupyter notebooks
* PyTorch or equivalent ML toolkit
* Desirable: code profiling experience

## Mentors

* **[Caterina Doglioni](mailto:caterina.doglioni@cern.ch)**
Contributor


Suggested change
* **[Caterina Doglioni](mailto:caterina.doglioni@cern.ch)**
* **[Caterina Doglioni](mailto:caterina.doglioni@cern.ch)**

If possible add a second mentor, even just as backup

Contributor Author


Thanks, done

* **[Tobias Fitschen](mailto:tobias.fitschen@cern.ch)** as backup mentor
* **[James Smith](mailto:james.smith-7@manchester.ac.uk)** as backup mentor

## Links

* (1) [Green Software Foundation course](<https://learn.greensoftware.foundation/>)
* (2) [Code by the previous GSoC student](<https://summerofcode.withgoogle.com/archive/2023/projects/Nks9akq7>)
3 changes: 3 additions & 0 deletions gsoc/2025/mentors.md
@@ -11,7 +11,9 @@ layout: plain
* Lukas Breitwieser [lukas.johannes.breitwieser@cern.ch](mailto:lukas.johannes.breitwieser@cern.ch) CERN
* Andy Buckley [andy.buckley@gla.ac.uk](mailto:andy.buckley@gla.ac.uk) UofGlasgow
* Vipul Cariappa [vipulcariappa@gmail.com](mailto:vipulcariappa@gmail.com) CompRes
* Caterina Doglioni [caterina.doglioni@cern.ch](mailto:caterina.doglioni@cern.ch) UManchester
* Mateusz Fila [mateusz.jakub.fila@cern.ch](mailto:mateusz.jakub.fila@cern.ch) CERN
* Tobias Fitschen [tobias.fitschen@cern.ch](mailto:tobias.fitschen@cern.ch) UManchester
* Chris Gutschow [chris.g@cern.ch](mailto:chris.g@cern.ch) UCLondon
* Aaron Jomy [aaron.jomy@cern.ch](mailto:aaron.jomy@cern.ch) CERN/CompRes
* Christina Koutsou [christinakoutsou22@gmail.com](mailto:@christinakoutsou22@gmail.com) CompRes
@@ -22,6 +24,7 @@ layout: plain
* Felice Pantaleo [felice.pantaleo@cern.ch](mailto:felice.pantaleo@cern.ch) CERN
* Giacomo Parolini [giacomo.parolini@cern.ch](mailto:giacomo.parolini@cern.ch) CERN
* Alexander Penev [alexander.p.penev@gmail.com](mailto:alexander.p.penev@gmail.com) CompRes/University of Plovdiv, BG
* James Smith [james.smith-7@manchester.ac.uk](mailto:james.smith-7@manchester.ac.uk) UManchester
* Mayank Sharma [mayank.sharma@cern.ch](mailto:mayank.sharma@cern.ch) UMich
* Simon Spannagel [simon.spannagel@desy.de](mailto:simon.spannagel@desy.de) DESY
* Graeme Stewart [graeme.andrew.stewart@cern.ch](mailto:graeme.andrew.stewart@cern.ch) CERN