# Module 5: HPC Workload and Energy Analysis in Fusion Research


## 🎯 Learning Outcomes

By the end of this project, you will be able to:

- Investigate HPC resource utilization patterns in fusion energy research using Python.
- Interpret performance data and scaling behaviors across fusion workloads at NERSC.
- Collaborate on proposing optimizations and insights to improve the efficiency of fusion simulations.


In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic dataset mimicking HPC fusion workloads
df = pd.DataFrame({
    'JobID': range(1, 11),
    'Project': ['DIII-D', 'NSTX-U', 'NIF', 'DIII-D', 'NIF', 'NSTX-U', 'DIII-D', 'NIF', 'NSTX-U', 'DIII-D'],
    'Facility': ['NERSC', 'Summit', 'NERSC', 'Frontier', 'Frontier', 'Summit', 'NERSC', 'Summit', 'Frontier', 'NERSC'],
    'Cores': [512, 1024, 2048, 1024, 4096, 2048, 512, 1024, 2048, 1024],
    'RuntimeMinutes': [120, 180, 300, 90, 400, 150, 110, 190, 210, 130],
    'JobType': ['Simulation', 'ML', 'Post-processing', 'Simulation', 'ML', 'Simulation', 'Simulation', 'ML', 'Post-processing', 'ML'],
    'Energy_kWh': [150, 320, 600, 130, 950, 500, 145, 315, 525, 330]
})


## 🌟 Introduction


Welcome to the **Fusion Energy & High-Performance Computing (HPC)** module — an exciting journey into the technologies powering the **future of clean energy** and scientific discovery! 

Fusion energy — the process that powers the Sun — promises a virtually limitless and sustainable source of energy for our planet. Researchers across the globe, including at leading **U.S. Department of Energy (DOE)** laboratories, are working to bring this stellar energy source to Earth. 

However, harnessing fusion isn't easy. It involves **extreme conditions**, **complex physics**, and **massive data volumes**. That's where **High-Performance Computing (HPC)** comes in. DOE labs like [LLNL](https://www.llnl.gov/), [ORNL](https://www.ornl.gov/), and [PPPL](https://www.pppl.gov/) rely on some of the **fastest supercomputers in the world** to simulate plasma behavior, optimize reactor designs, and process experimental data.

---

### 🌐 Why Fusion + HPC?

- 🔬 **Modeling extreme plasma physics**, where core temperatures exceed 100 million degrees Celsius.
- 🧠 **Training AI/ML models** to predict reactor behavior.
- ⚙️ **Designing and testing reactor materials** under intense conditions.
- 📈 **Analyzing massive datasets** from experiments and simulations.

---

### 🖼️ Visual: Fusion & Supercomputing

![Fusion Energy](https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/NIF_target_chamber.jpg/800px-NIF_target_chamber.jpg)
*Image: Target chamber of the National Ignition Facility (NIF), a fusion experiment at LLNL.*  
*Source: Wikimedia Commons*

---

By the end of this module, you'll gain insight into how **HPC accelerates breakthroughs** in fusion research — and how your future work can contribute to this critical mission.

Let’s ignite the stars here on Earth — together. 🌟🌍


## 💡 Goals of this Module

This 3-day project module is designed to help you explore how **fusion energy research** and **high-performance computing** intersect in real-world DOE applications.

By the end of this module, you will be able to:

🔹 **Explain** the fundamentals of fusion energy and how it differs from fission.  
🔹 **Understand** why HPC systems are essential for modeling fusion reactions.  
🔹 **Analyze** synthetic fusion workload data using Python and common scientific libraries.  
🔹 **Visualize** patterns in computational demand across DOE fusion experiments.  
🔹 **Present findings** in a short group report simulating a DOE project debrief.

This project combines **scientific computing**, **data analysis**, and **real-world DOE mission contexts**, giving you a unique opportunity to contribute to one of the most important scientific challenges of our time.



## 🖥️ HPC Systems Used for Fusion Research

To unlock the secrets of fusion, DOE researchers rely on some of the world’s most advanced supercomputers. These systems simulate plasma dynamics, materials stress, and energy confinement in complex fusion devices like **tokamaks** and **laser-based inertial confinement reactors**.

Here are some notable HPC systems and how they contribute to fusion:

### 🔷 NERSC — Perlmutter (LBNL)
- Used for plasma simulation, turbulence modeling, and data analysis.
- Houses both CPU and GPU nodes, ideal for parallel workloads.

### 🔷 Summit (ORNL)
- Powered deep learning efforts in fusion forecasting and control.
- Supported research into magnetic confinement and MHD (magnetohydrodynamic) stability.

### 🔷 Frontier (ORNL)
- The first supercomputer to exceed one exaflop on the TOP500 benchmark (June 2022).
- Enables multi-scale modeling of fusion materials and magnetic fields.

### 🔷 Cori (LBNL) *(Retired but historically important)*
- Served fusion researchers developing scalable codes like **XGC** and **GTC**.

---

### 🧑‍🔬 Why HPC is Essential for Fusion:
- Fusion involves **nonlinear, multi-physics phenomena**.
- Requires **high spatial and temporal resolution** to simulate plasma behavior.
- Experiments can stream **terabytes of real-time data per run**.
- HPC allows researchers to test **thousands of reactor configurations** virtually.

---

📎 *Further Reading:*
- [Fusion Energy Sciences at DOE](https://science.osti.gov/fes)
- [NERSC Science Highlights](https://www.nersc.gov/news-publications/science-highlights/)




## 🇺🇸 National Strategy: The DOE Fusion Energy Vision

The **U.S. Department of Energy (DOE)** has unveiled an ambitious roadmap for commercializing fusion energy, driven by the **Fusion Energy Strategy 2024** and the **Fusion Energy Sciences (FES) Building Bridges Vision**. These strategic blueprints are guiding the U.S. toward realizing commercially viable fusion systems within the next decade.

Key pillars include:

- **Commercial Deployment by the 2030s**: The DOE envisions rapid deployment of pilot plants through public-private partnerships.
- **Closing Critical Technology Gaps**: From materials to reactor design, the strategy identifies shortfalls in confinement, tritium handling, and superconducting magnets.
- **Integrated Research Infrastructure (IRI)**: By linking experimental facilities with HPC centers and high-speed networks such as NERSC and ESnet for real-time analysis, DOE aims to create an interconnected ecosystem.
- **Workforce and Equity Goals**: Building a diverse, capable workforce through national training programs and inclusive partnerships is central to the plan.

📘 [DOE Fusion Energy Sciences](https://www.energy.gov/science/fes/fusion-energy-sciences)  
📘 [Fusion Energy Strategy 2024 PDF](https://science.osti.gov/-/media/fes/pdf/program-documents/FES-Fusion-Energy-Strategy-2024.pdf)  
📗 White Paper: "Building Bridges: FES Vision" (2023)  
📚 Sweeney, M. A., et al. (2023). *The U.S. Fusion Strategy: From Science to Commercialization*. Fusion Sci. & Tech.



## 🔬 DOE Facilities and Fusion Experiments

The U.S. fusion ecosystem is anchored by world-class experimental facilities operating under the DOE Office of Science. These serve as both testbeds and data generators for fusion science and HPC modeling:

- **DIII-D National Fusion Facility (GA, San Diego)**: The largest operational U.S. tokamak, contributing real-time profile diagnostics, disruption prediction data, and simulation validation.
- **NSTX-U (Princeton Plasma Physics Lab)**: Focused on spherical tokamak physics, currently being repaired and recommissioned after a magnet coil failure in 2016.
- **NIF (Lawrence Livermore National Laboratory)**: Inertial confinement facility where the first fusion ignition shot occurred in December 2022, yielding about 3.15 MJ of fusion energy from 2.05 MJ of laser energy.

These experiments are deeply integrated with computational workflows. DIII-D, for example, sends real-time diagnostic data via **ESnet** to **NERSC’s Perlmutter**, where plasma reconstructions using equilibrium and transport codes (e.g., EFIT++, TRANSP) guide experimental decisions within minutes.

📚 References:
- [DIII-D and HPC Integration](https://www.ga.com/diii-d-national-fusion-facility-nersc-and-esnet-collaboration-speeds-nuclear-fusion-research)
- Hittinger, J. A., et al. (2023). *Accelerating Fusion Science with Integrated Infrastructure*. J. Comp. Plasma Physics



## 🔥 Hands-On Activity: Fusion Facility Utilization Heatmap

Using the synthetic dataset, let’s analyze HPC usage patterns across different DOE fusion experiments and facilities.

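We’ll create a heatmap showing average runtime (in minutes) for each combination of fusion project and computing facility. Cells are left blank where the sample contains no jobs for that combination.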

This helps reveal which fusion centers are most HPC-intensive — useful for resource planning and infrastructure scaling.


In [None]:

# Create a pivot table showing average runtime by Facility and Project
heatmap_data = df.pivot_table(index='Project', columns='Facility', values='RuntimeMinutes', aggfunc='mean')

# Plot the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(heatmap_data, annot=True, cmap="YlGnBu", fmt=".1f")
plt.title("Average Job Runtime by Facility and Project")
plt.ylabel("Fusion Project")
plt.xlabel("Facility")
plt.tight_layout()
plt.show()



## 🤖 AI & Machine Learning in Fusion Research

Artificial Intelligence (AI) and Machine Learning (ML) are emerging as powerful tools in fusion energy R&D. These techniques are being used to:

- **Predict Disruptions**: ML models trained on DIII-D plasma shots predict disruptions with >90% accuracy (Kates-Harbeck et al., 2019).
- **Guide Magnetic Control**: Reinforcement learning (RL) agents are being tested to control plasma shape in real time.
- **Accelerate Simulation Workflows**: Surrogate models reduce compute time for gyrokinetic codes by orders of magnitude.

### 🧪 SciDAC Initiatives
The DOE’s **Scientific Discovery through Advanced Computing (SciDAC)** program supports AI/ML infusion in MHD modeling, turbulence prediction, and materials degradation analysis.

Discussion Prompt:
> *How could AI be deployed in fusion reactors to make control decisions under uncertainty? What tradeoffs exist between accuracy, speed, and interpretability?*

🧠 [Learn more about SciDAC](https://www.scidac.gov/about.html)  
📚 Kates-Harbeck, J., et al. (2019). *Predicting disruptive instabilities in fusion plasmas using deep learning*. Nature.



## 🏛️ Policy, Startups & Commercialization

Fusion is no longer a distant ambition—it's an active area of commercialization with strong federal backing.

### 🔹 DOE Support
- **Milestone-based grants** (an initial $46 million awarded to eight companies in 2023) support U.S. fusion startups advancing toward pilot plants by the 2030s.
- **Clean energy tax credit eligibility** (post-IRA) supports demo plant financing.
- **TINEX** (Tokamak Innovation Network for Experimentation) is driving U.S. tokamak engineering innovation.

### 🔹 Leading Private Companies
- **Commonwealth Fusion Systems (CFS)** – MIT spinoff targeting net energy by 2026.
- **Helion Energy** – Field-reversed configuration with private PPAs.
- **Zap Energy, TAE, General Fusion** – Each pursuing alternate confinement approaches.

### 🔹 Workforce Initiatives
- DOE and NSF fund workforce pipelines like the **FIRE Collaboratives** and **SCGSR Fellowship**.
- Broader inclusion goals target MSIs and underrepresented groups.

Discussion Prompt:
> *Should the government continue subsidizing fusion if early commercial plants prove viable? What role does regulation play in ensuring safety and public trust?*

📗 [CATF Analysis on U.S. Fusion Policy](https://www.catf.us/2025/06/fusion-energy-opportunities-federal-action-support-energy-innovation-commercialization/)
📊 [DOE FOA 2025](https://science.osti.gov/grants/FOAs/-/media/grants/pdf/foas/2025/DE-FOA-0003516-000002.pdf)



## 🧑‍🔬 Capstone Option: DOE HPC Advisory Project

This project gives you the opportunity to act as a strategic advisor to a national fusion initiative.

### 📝 Instructions:
Choose one DOE-funded fusion project (e.g., DIII-D, NSTX-U, FIRE, Helion) and:

1. **Research its experimental goals and reactor design.**
2. **Analyze its synthetic or real HPC workload profile.**
3. **Identify key performance challenges (latency, data volume, scalability).**
4. **Recommend enhancements using DOE HPC resources (NERSC, Summit, Frontier) and ML techniques.**

📈 Considerations:
- How would AI-based diagnostics reduce HPC load?
- Can real-time edge computing assist in-loop analysis?
- Are Superfacility tools (ESnet + NERSC) sufficient?

📌 Deliverables:
- One infographic or 1-slide advisory report.
- 2–3 paragraph memo to the DOE Office of Fusion Energy Sciences.

🌐 [Explore Projects and Collaborations](https://usfusionenergy.org)
📚 Case Study: NERSC-DIII-D Workflow [GA Article](https://www.ga.com/diii-d-national-fusion-facility-nersc-and-esnet-collaboration-speeds-nuclear-fusion-research)



## 🤖 Mini ML Activity: Predicting Runtime from HPC Inputs

Fusion workloads vary based on the number of cores used and project complexity. In this hands-on activity, we’ll use a **linear regression model** to predict job runtime using core count and project type.

This gives a basic example of how machine learning can support intelligent job scheduling or forecasting.


In [None]:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Prepare the dataset
X = df[['Cores', 'Project']]
y = df['RuntimeMinutes']

# One-hot encode project types
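# `sparse_output` replaced the `sparse` argument in scikit-learn 1.2; use sparse=False on older releases
encoder = OneHotEncoder(sparse_output=False)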
X_encoded = encoder.fit_transform(X[['Project']])
X_full = pd.concat([pd.DataFrame(X_encoded, columns=encoder.get_feature_names_out()), X['Cores'].reset_index(drop=True)], axis=1)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_full, y, test_size=0.2, random_state=42)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.2f}")
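With only ten jobs, a single 80/20 split leaves two test rows, so the reported MSE depends heavily on which rows happen to be held out. The sketch below is one possible alternative, not part of the original activity: it uses leave-one-out cross-validation and compares the linear model against a predict-the-mean baseline. The `jobs` table re-creates the relevant columns of `df` so the block runs on its own; the pipeline layout is an illustrative choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Same synthetic values as the module's `df` (only the columns the model needs)
jobs = pd.DataFrame({
    'Project': ['DIII-D', 'NSTX-U', 'NIF', 'DIII-D', 'NIF',
                'NSTX-U', 'DIII-D', 'NIF', 'NSTX-U', 'DIII-D'],
    'Cores': [512, 1024, 2048, 1024, 4096, 2048, 512, 1024, 2048, 1024],
    'RuntimeMinutes': [120, 180, 300, 90, 400, 150, 110, 190, 210, 130],
})
X = jobs[['Cores', 'Project']]
y = jobs['RuntimeMinutes']

# One-hot encode Project inside the pipeline; Cores passes through unchanged
preprocess = ColumnTransformer(
    [('project', OneHotEncoder(handle_unknown='ignore'), ['Project'])],
    remainder='passthrough',
)
model = make_pipeline(preprocess, LinearRegression())

# Leave-one-out: each job is predicted by a model trained on the other nine
loo = LeaveOneOut()
model_scores = cross_val_score(model, X, y, cv=loo,
                               scoring='neg_mean_squared_error')
model_mse = -model_scores.mean()

# Baseline: always predict the mean runtime of the training fold
baseline_scores = cross_val_score(DummyRegressor(strategy='mean'),
                                  X[['Cores']], y, cv=loo,
                                  scoring='neg_mean_squared_error')
baseline_mse = -baseline_scores.mean()

print(f"Leave-one-out MSE, linear model:  {model_mse:.1f}")
print(f"Leave-one-out MSE, mean baseline: {baseline_mse:.1f}")
```

If the linear model does not clearly beat the baseline, that is a sign the dataset is too small or the features too weak to support forecasting, which is itself a useful finding for the group report.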


## 📊 Dataset Overview

In this notebook, we’ll use a **synthetic dataset** that mimics real-world fusion HPC workloads submitted to systems like NERSC’s **Perlmutter**. This data simulates what a workload trace might look like over time across multiple DOE experiments.

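### 📁 Dataset Columns:
The sample `df` defined at the top of this notebook contains:
- `JobID`: Unique identifier for each submitted job.
- `Project`: Fusion project or facility (e.g., DIII-D, NSTX-U, NIF).
- `Facility`: Computing center or system used.
- `Cores`: Number of CPU cores allocated to the job.
- `RuntimeMinutes`: Duration of the job.
- `JobType`: Categorized workload (e.g., simulation, post-processing, ML).
- `Energy_kWh`: Energy consumed by the job, in kilowatt-hours.

A fuller scheduler trace would typically also record the following fields, which the sample omits:
- `SubmissionTime`: Timestamp of job submission.
- `StartTime`, `EndTime`: Run time markers.
- `NodesRequested`: Number of nodes requested.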

---

### 🧪 Project Use:
You will use this dataset to:
- Analyze **resource usage trends**.
- Investigate **which projects use HPC most intensively**.
- Compare **job types and facilities**.
- Explore **optimization strategies** and implications for scheduling.

📌 *Note:* The data is fully synthetic. It is meant to resemble plausible usage patterns and contains no real job records.
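As a starting point for the "which projects use HPC most intensively" question, the sketch below ranks projects by total core-hours and total energy. It re-creates the relevant columns of the synthetic `df` so it runs on its own; `CoreHours` and the output column names are illustrative additions, not fields of the original dataset.

```python
import pandas as pd

# Same synthetic values as the module's `df` (subset of columns)
jobs = pd.DataFrame({
    'Project': ['DIII-D', 'NSTX-U', 'NIF', 'DIII-D', 'NIF',
                'NSTX-U', 'DIII-D', 'NIF', 'NSTX-U', 'DIII-D'],
    'Cores': [512, 1024, 2048, 1024, 4096, 2048, 512, 1024, 2048, 1024],
    'RuntimeMinutes': [120, 180, 300, 90, 400, 150, 110, 190, 210, 130],
    'Energy_kWh': [150, 320, 600, 130, 950, 500, 145, 315, 525, 330],
})

# Core-hours = cores x runtime in hours (derived column, not in the original data)
jobs['CoreHours'] = jobs['Cores'] * jobs['RuntimeMinutes'] / 60

# Aggregate per project and rank by total compute consumed
usage = (
    jobs.groupby('Project')
        .agg(Jobs=('CoreHours', 'size'),
             TotalCoreHours=('CoreHours', 'sum'),
             TotalEnergy_kWh=('Energy_kWh', 'sum'))
        .sort_values('TotalCoreHours', ascending=False)
)
print(usage.round(1))
```

Swapping `'Project'` for `'JobType'` or `'Facility'` in the `groupby` call gives the other comparisons listed above.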



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data sample
data = pd.DataFrame({
    'Application': ['GTC', 'XGC', 'TRANSP', 'GTC', 'XGC', 'GENE'],
    'Cores': [512, 1024, 768, 2048, 1536, 1024],
    'Runtime_hours': [5.2, 3.1, 6.0, 10.5, 7.2, 4.4],
    'Power_kW': [130, 210, 180, 400, 320, 150]
})
data['Energy_kWh'] = data['Runtime_hours'] * data['Power_kW']
data

## 📈 Energy Consumption per Application

Energy usage is a critical factor in HPC operations — especially when considering the **cost and carbon footprint** of running large-scale fusion simulations.

In this section, we will explore:
- How different fusion applications consume energy,
- The computational cost per job type,
- Potential opportunities for **energy-efficient computing**.

---

### 🔬 Simulated Applications
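Here are some illustrative fusion-related workload categories you might see (the sample `data` table above uses production code names such as GTC, XGC, TRANSP, and GENE instead):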
- `PlasmaSim`: Models plasma turbulence and instabilities.
- `FusionAI`: Trains ML models on reactor sensor data.
- `ReactMat`: Simulates material degradation under neutron flux.
- `BeamOpt`: Optimizes beam configurations for inertial confinement.
- `DataPost`: Post-processing of experiment output.

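Each application has an associated **energy footprint**. For a per-node job trace, it is estimated using:
```
Energy (kWh) = RuntimeMinutes × PowerPerNode × NodesRequested ÷ 60
```

The `PowerPerNode` values (in kW) are simulated. The sample `data` table above instead stores each job's total average power draw in `Power_kW`, so its energy is simply `Runtime_hours × Power_kW`. Either way, the goal is to compute energy for each job and **visualize usage per application**.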

📊 *Your Task:* Group jobs by application and compute total and average energy consumption. Then, identify the **most and least energy-efficient apps**.
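To make the per-node formula concrete, here is a minimal sketch that applies it to a small, hypothetical job trace and then groups by application. Every value in `trace`, including the `PowerPerNode` figures, is made up for illustration; only the formula itself comes from the text above.

```python
import pandas as pd

# Hypothetical per-node job trace; all numbers are illustrative assumptions
trace = pd.DataFrame({
    'Application': ['PlasmaSim', 'FusionAI', 'PlasmaSim', 'DataPost'],
    'NodesRequested': [8, 4, 16, 2],
    'RuntimeMinutes': [120, 90, 60, 30],
    'PowerPerNode': [0.5, 1.2, 0.5, 0.4],  # kW per node (assumed)
})

# Energy (kWh) = RuntimeMinutes x PowerPerNode x NodesRequested / 60
trace['Energy_kWh'] = (
    trace['RuntimeMinutes'] * trace['PowerPerNode'] * trace['NodesRequested'] / 60
)

# Total and average energy per application, largest consumer first
summary = (
    trace.groupby('Application')['Energy_kWh']
         .agg(total_kWh='sum', mean_kWh='mean')
         .sort_values('total_kWh', ascending=False)
)
print(summary)
```

For the first `PlasmaSim` job this works out to 120 × 0.5 × 8 ÷ 60 = 8 kWh, which is a quick way to check the units.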


In [None]:
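# Sum energy per application first; sns.barplot would otherwise plot the per-job mean
energy_totals = data.groupby('Application', as_index=False)['Energy_kWh'].sum()
sns.barplot(x='Application', y='Energy_kWh', data=energy_totals)
plt.title('Energy Usage by Fusion Application')
plt.ylabel('Total Energy (kWh)')
plt.xlabel('Application')
plt.grid(True)
plt.show()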

## 🧠 Questions for Reflection

As you analyze the fusion HPC workloads, use these questions to guide your thinking and shape your findings:

### 🔎 Scientific Insight
- What types of fusion workloads dominate HPC usage?
- Which experiments appear to be the most resource-intensive?

### ⚡ Energy Awareness
- Which applications consume the most energy per job?
- Are there apps that could benefit from GPU acceleration or code optimization?

### 💡 Efficiency Strategies
- If you were designing a job scheduler, how would you prioritize jobs?
- What trade-offs might exist between runtime, energy, and scientific output?

### 🌱 Sustainability Lens
- How can HPC centers reduce energy usage without compromising science?
- What role might **renewable energy** and **green computing** play in DOE labs of the future?

Use these questions to prepare your **final group reflections** or reports at the end of Day 3.


## 🔍 Efficiency Metrics

Understanding how well compute resources are used is central to optimizing both **cost** and **scientific throughput**.

We’ll define and calculate several key metrics:

---

### 🔧 Runtime Efficiency
Measures how effectively the allocated nodes are used:
```
RuntimeEfficiency = (EndTime - StartTime) / (AllocatedTime)
```
High efficiency means minimal idle time.
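The sample datasets in this notebook do not include timestamps, so the sketch below uses a small hypothetical scheduler log to show how the ratio could be computed. It assumes `AllocatedTime` means the walltime requested at submission, stored here as `AllocatedMinutes`; all values are invented.

```python
import pandas as pd

# Hypothetical scheduler records; timestamps and allocations are made up
sched = pd.DataFrame({
    'JobID': [1, 2, 3],
    'StartTime': pd.to_datetime(['2024-01-01 08:00',
                                 '2024-01-01 09:00',
                                 '2024-01-01 10:30']),
    'EndTime': pd.to_datetime(['2024-01-01 10:00',
                               '2024-01-01 12:00',
                               '2024-01-01 11:00']),
    'AllocatedMinutes': [180, 180, 60],  # requested walltime (assumed meaning)
})

# Elapsed wall-clock time in minutes
elapsed_min = (sched['EndTime'] - sched['StartTime']).dt.total_seconds() / 60

# RuntimeEfficiency = (EndTime - StartTime) / AllocatedTime
sched['RuntimeEfficiency'] = elapsed_min / sched['AllocatedMinutes']
print(sched[['JobID', 'RuntimeEfficiency']])
```

Under this reading, job 2 uses its full allocation (efficiency 1.0), while job 3 finishes in half the time it requested (0.5).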

---

### ⚡ Energy Efficiency (kWh per core-hour)
```
EnergyPerCoreHour = EnergyConsumed / (RuntimeMinutes × TotalCores / 60)
```
Here `TotalCores = NodesRequested × CoresPerNode`. Lower values indicate better utilization of compute energy.

---

### 🧠 ML Efficiency (optional)
If your dataset includes `FusionAI` jobs, you may compare:
- **Accuracy per Watt**
- **Training Time per kWh**

---

📌 *Action:* Compute these metrics across job types and identify which workloads show:
- The **best resource utilization**, and
- The **highest energy efficiency**.

Use this to make **data-driven recommendations** to a fictional HPC director on Day 3.
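One way to approach this action item with the synthetic `df` is sketched below: it computes kWh per core-hour for each job type by dividing summed energy by summed core-hours. The `jobs` table re-creates the relevant columns so the block is self-contained; pooling the totals before dividing, rather than averaging per-job ratios, is a design choice made here, not something the module prescribes.

```python
import pandas as pd

# Same synthetic values as the module's `df` (subset of columns)
jobs = pd.DataFrame({
    'JobType': ['Simulation', 'ML', 'Post-processing', 'Simulation', 'ML',
                'Simulation', 'Simulation', 'ML', 'Post-processing', 'ML'],
    'Cores': [512, 1024, 2048, 1024, 4096, 2048, 512, 1024, 2048, 1024],
    'RuntimeMinutes': [120, 180, 300, 90, 400, 150, 110, 190, 210, 130],
    'Energy_kWh': [150, 320, 600, 130, 950, 500, 145, 315, 525, 330],
})

# Core-hours per job: total cores x runtime in hours
jobs['CoreHours'] = jobs['Cores'] * jobs['RuntimeMinutes'] / 60

# Pool energy and core-hours per job type, then take the ratio
by_type = jobs.groupby('JobType')[['Energy_kWh', 'CoreHours']].sum()
by_type['kWh_per_CoreHour'] = by_type['Energy_kWh'] / by_type['CoreHours']

print(by_type.sort_values('kWh_per_CoreHour').round(4))
print("Lowest energy per core-hour:", by_type['kWh_per_CoreHour'].idxmin())
```

Pooling weights large jobs more heavily; averaging the per-job ratios instead would treat every job equally and can change the ranking.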


In [None]:
data['Core_Hours'] = data['Cores'] * data['Runtime_hours']
data['kWh_per_CoreHour'] = data['Energy_kWh'] / data['Core_Hours']
data[['Application', 'Core_Hours', 'kWh_per_CoreHour']]

## 📉 Visualization: Energy per Core-Hour

In [None]:
sns.barplot(x='Application', y='kWh_per_CoreHour', data=data)
plt.title('Energy per Core-Hour by Application')
plt.ylabel('kWh / Core-Hour')
plt.xlabel('Application')
plt.grid(True)
plt.show()

## 📌 Takeaways and Implications
- Some applications scale more efficiently than others.
- High energy cost per core-hour may indicate poor scaling or I/O bottlenecks.
- These insights help optimize batch submission strategies and resource allocation.

## 🧪 Challenge Exercise
Use this data to:
1. Identify the most efficient application by core-hour.
2. Suggest how runtime or core count could be adjusted for better energy efficiency.

## 📚 Further Resources
- [NERSC Fusion Workloads](https://www.nersc.gov/science/fusion-energy/)
- [Energy-Aware Scheduling Survey (Kocot et al., 2023)](https://www.mdpi.com/1996-1073/16/2/890)
- [Top500 and Green500 Rankings](https://www.top500.org/lists/green500/2023/06/)


## 🧪 Interactive Exercise: DOE Project Scenario Analyzer

As part of your capstone, choose a fusion project and analyze its synthetic workload profile.

Here’s an example: Use the filter below to focus on jobs from the **DIII-D** project and generate basic statistics.


In [None]:

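from IPython.display import display  # built into Jupyter; imported so the cell also runs as a script

# Choose a project to simulate (e.g., DIII-D)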
project_filter = 'DIII-D'
project_data = df[df['Project'] == project_filter]

# Summary stats
print(f"Summary for project: {project_filter}")
display(project_data.describe(include='all'))

# Plot job types
plt.figure(figsize=(8, 4))
sns.countplot(data=project_data, x='JobType')
plt.title(f"Job Types for {project_filter}")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()



## 👥 Collaborative Group Project: Performance & Energy Strategy for Fusion Science at NERSC

### Project Brief

You are part of a research computing team tasked with analyzing and improving the efficiency of fusion energy simulations at NERSC. Based on workload analysis data, your team will produce actionable insights that improve how HPC systems are used in fusion science, balancing performance and sustainability.

### Project Goals

Your team will:

1. **Analyze Fusion HPC Workloads**:
   - Examine metrics such as CPU hours, GPU utilization, job scaling, memory usage, and I/O performance.
   - Identify inefficiencies, bottlenecks, or underutilized resources.

2. **Model Energy Usage and Carbon Impact**:
   - Estimate energy consumption and emissions using job characteristics and sustainability calculators.
   - Compare resource needs across different workloads or user groups.

3. **Recommend Performance Optimizations**:
   - Suggest ways to improve job throughput, scaling efficiency, or data handling.
   - Explore tools like job profiling, containerization, or using newer architectures.

4. **Design a Sustainability Strategy**:
   - Draft a policy or workflow recommendation that supports energy-efficient computing in fusion science.
   - Include tradeoffs between precision, performance, and energy use.
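For the energy and carbon modeling goal above, a first-order estimate can be sketched as energy × facility overhead × grid carbon intensity. Both constants below are placeholder assumptions rather than measured values for any DOE site; replace them with figures for the facility and grid you are studying. The `jobs` table reuses the `Project` and `Energy_kWh` columns of the synthetic `df`.

```python
import pandas as pd

# Placeholder assumptions; NOT measured values for any real facility
GRID_INTENSITY_KG_PER_KWH = 0.4  # kg CO2e emitted per kWh of electricity
PUE = 1.1                        # power usage effectiveness (cooling and overhead)

# Same synthetic values as the module's `df` (subset of columns)
jobs = pd.DataFrame({
    'Project': ['DIII-D', 'NSTX-U', 'NIF', 'DIII-D', 'NIF',
                'NSTX-U', 'DIII-D', 'NIF', 'NSTX-U', 'DIII-D'],
    'Energy_kWh': [150, 320, 600, 130, 950, 500, 145, 315, 525, 330],
})

# Scale IT energy up to facility energy, then convert to emissions
jobs['FacilityEnergy_kWh'] = jobs['Energy_kWh'] * PUE
jobs['CO2e_kg'] = jobs['FacilityEnergy_kWh'] * GRID_INTENSITY_KG_PER_KWH

# Total estimated emissions per project, largest first
emissions = jobs.groupby('Project')['CO2e_kg'].sum().sort_values(ascending=False)
print(emissions.round(1))
```

Because the result scales linearly with both constants, it is worth reporting a range (for example, a low-carbon and a high-carbon grid scenario) rather than a single number.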

### Deliverables

- A 5–7 slide group presentation that includes:
  - Your analysis findings and visualizations
  - A proposed optimization strategy
  - Reflections on the balance between scientific needs and sustainability
- Code snippets, charts, or pseudocode demonstrating your data handling and analysis
- Optionally: a written 1-page summary or infographic

Use Python tools like `pandas`, `matplotlib`, `plotly`, `numpy`, or `dask`. External libraries like `carbonai`, `pyJoules`, or `line_profiler` are also encouraged if relevant.
