# Module: Powering the Future — HPC Workload and Performance Analysis in Fusion Energy Research

## 🌟 Introduction

Welcome to the **Fusion Energy & High-Performance Computing (HPC)** module — an exciting journey into the technologies powering the **future of clean energy** and scientific discovery! ⚛️💻

Fusion energy — the process that powers the Sun — promises a virtually limitless and sustainable source of energy for our planet. Researchers across the globe, including at leading **U.S. Department of Energy (DOE)** laboratories, are working to bring this stellar energy source to Earth. 

However, harnessing fusion isn't easy. It involves **extreme conditions**, **complex physics**, and **massive data volumes**. That's where **High-Performance Computing (HPC)** comes in. DOE labs like [LLNL](https://www.llnl.gov/), [ORNL](https://www.ornl.gov/), and [PPPL](https://www.pppl.gov/) rely on some of the **fastest supercomputers in the world** to simulate plasma behavior, optimize reactor designs, and process experimental data.

---

### 🌐 Why Fusion + HPC?

- 🔬 **Modeling extreme plasma physics**, where plasma temperatures exceed 100 million degrees Celsius.
- 🧠 **Training AI/ML models** to predict reactor behavior.
- ⚙️ **Designing and testing reactor materials** under intense conditions.
- 📈 **Analyzing massive datasets** from experiments and simulations.

---

### 🖼️ Visual: Fusion & Supercomputing

![Fusion Energy](https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/NIF_target_chamber.jpg/800px-NIF_target_chamber.jpg)
*Image: Target chamber of the National Ignition Facility (NIF), a fusion experiment at LLNL.*  
*Source: Wikimedia Commons*

---

By the end of this module, you'll gain insight into how **HPC accelerates breakthroughs** in fusion research — and how your future work can contribute to this critical mission.

Let’s ignite the stars here on Earth — together. 🌟🌍


## 💡 Goals of this Module

This 3-day project module is designed to help you explore how **fusion energy research** and **high-performance computing** intersect in real-world DOE applications.

By the end of this module, you will be able to:

🔹 **Explain** the fundamentals of fusion energy and how it differs from fission.  
🔹 **Understand** why HPC systems are essential for modeling fusion reactions.  
🔹 **Analyze** synthetic fusion workload data using Python and common scientific libraries.  
🔹 **Visualize** patterns in computational demand across DOE fusion experiments.  
🔹 **Present findings** in a short group report simulating a DOE project debrief.

This project combines **scientific computing**, **data analysis**, and **real-world DOE mission contexts**, giving you a unique opportunity to contribute to one of the most important scientific challenges of our time.



## 🖥️ HPC Systems Used for Fusion Research

To unlock the secrets of fusion, DOE researchers rely on some of the world’s most advanced supercomputers. These systems simulate plasma dynamics, material stress, and energy confinement in complex fusion devices such as **tokamaks** and **laser-driven inertial confinement experiments**.

Here are some notable HPC systems and how they contribute to fusion:

### 🔷 NERSC — Perlmutter (LBNL)
- Used for plasma simulation, turbulence modeling, and data analysis.
- Houses both CPU and GPU nodes, ideal for parallel workloads.

### 🔷 Summit (ORNL)
- Powered deep learning efforts in fusion forecasting and control.
- Supported research into magnetic confinement and MHD (magnetohydrodynamic) stability.

### 🔷 Frontier (ORNL)
- The world's first exascale supercomputer, which debuted at the top of the TOP500 list in 2022.
- Enables multi-scale modeling of fusion materials and magnetic fields.

### 🔷 Cori (LBNL) *(Retired but historically important)*
- Served fusion researchers developing scalable codes like **XGC** and **GTC**.

---

### 🧑‍🔬 Why HPC is Essential for Fusion:
- Fusion involves **nonlinear, multi-physics phenomena**.
- Requires **high spatial and temporal resolution** to simulate plasma behavior.
- Real-time data streams from experiments can reach **terabytes per run**.
- HPC allows researchers to test **thousands of reactor configurations** virtually.

---

📎 *Further Reading:*
- [Fusion Energy Sciences at DOE](https://science.osti.gov/fes)
- [NERSC Science Highlights](https://www.nersc.gov/news-publications/science-highlights/)



## 📊 Dataset Overview

In this notebook, we’ll use a **synthetic dataset** that mimics real-world fusion HPC workloads submitted to systems like NERSC’s **Perlmutter**. This data simulates what a workload trace might look like over time across multiple DOE experiments.

### 📁 Dataset Columns:
- `JobID`: Unique identifier for each submitted job.
- `Project`: Fusion project or facility (e.g., DIII-D, NSTX-U, NIF).
- `SubmissionTime`: Timestamp of job submission.
- `StartTime`, `EndTime`: Timestamps marking when the job began and finished running.
- `NodesRequested`: Number of nodes requested.
- `RuntimeMinutes`: Duration of the job in minutes (`EndTime` minus `StartTime`).
- `JobType`: Categorized workload (e.g., simulation, post-processing, ML).
- `Facility`: Computing system used.
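
To make the column list concrete, here is a minimal sketch of a table with this schema. Every value below (job IDs, timestamps, node counts, facility label) is invented for illustration and is not taken from the module's dataset. `QueueWaitMinutes` is a hypothetical helper column added here, not one of the listed columns.

```python
import pandas as pd

# Hypothetical rows following the column list above; all values are made up.
jobs = pd.DataFrame({
    "JobID": [1001, 1002, 1003],
    "Project": ["DIII-D", "NSTX-U", "NIF"],
    "SubmissionTime": ["2024-01-05 08:00", "2024-01-05 09:15", "2024-01-05 10:30"],
    "StartTime": ["2024-01-05 08:10", "2024-01-05 09:45", "2024-01-05 10:40"],
    "EndTime": ["2024-01-05 10:10", "2024-01-05 10:15", "2024-01-05 11:25"],
    "NodesRequested": [64, 16, 128],
    "JobType": ["simulation", "post-processing", "ML"],
    "Facility": ["Perlmutter", "Perlmutter", "Perlmutter"],
})

# Parse the three timestamp columns so they support date arithmetic.
time_cols = ["SubmissionTime", "StartTime", "EndTime"]
jobs[time_cols] = jobs[time_cols].apply(pd.to_datetime)

# RuntimeMinutes is the gap between start and end; queue wait is the gap
# between submission and start.
jobs["RuntimeMinutes"] = (jobs["EndTime"] - jobs["StartTime"]).dt.total_seconds() / 60
jobs["QueueWaitMinutes"] = (jobs["StartTime"] - jobs["SubmissionTime"]).dt.total_seconds() / 60

print(jobs[["JobID", "Project", "RuntimeMinutes", "QueueWaitMinutes"]])
```

Deriving `RuntimeMinutes` from the timestamps, rather than typing it in, keeps the two from drifting apart if a row is edited later.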

---

### 🧪 Project Use:
You will use this dataset to:
- Analyze **resource usage trends**.
- Investigate **which projects use HPC most intensively**.
- Compare **job types and facilities**.
- Explore **optimization strategies** and implications for scheduling.
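
One possible starting point for the first three questions is to aggregate node-hours (nodes × runtime) by project and by job type. The sketch below uses a tiny hand-made table with the columns described earlier. The numbers are invented, and `NodeHours` is a helper column introduced here, not part of the dataset.

```python
import pandas as pd

# Invented example jobs; swap in the real dataset once it is loaded.
jobs = pd.DataFrame({
    "Project": ["DIII-D", "DIII-D", "NSTX-U", "NIF", "NIF"],
    "JobType": ["simulation", "ML", "simulation", "simulation", "post-processing"],
    "NodesRequested": [64, 8, 32, 128, 4],
    "RuntimeMinutes": [120, 60, 90, 240, 30],
})

# Node-hours combine job size and duration into a single usage measure.
jobs["NodeHours"] = jobs["NodesRequested"] * jobs["RuntimeMinutes"] / 60

# Total node-hours per project, largest consumer first.
by_project = jobs.groupby("Project")["NodeHours"].sum().sort_values(ascending=False)

# Share of all node-hours used by each job type.
by_type_share = jobs.groupby("JobType")["NodeHours"].sum() / jobs["NodeHours"].sum()

print(by_project)
print(by_type_share.round(3))
```

Node-hours are only one way to rank projects. Job counts or queue wait times may tell a different story, so it is worth comparing more than one measure.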

📌 *Note:* The data is synthetic: it was generated to protect institutional privacy while still reflecting realistic usage patterns.



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

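# Small synthetic sample, one row per job. Note that it uses a simplified
# schema (application, cores, hours, power) rather than the full column list above.
data = pd.DataFrame({
    'Application': ['GTC', 'XGC', 'TRANSP', 'GTC', 'XGC', 'GENE'],  # fusion code that ran
    'Cores': [512, 1024, 768, 2048, 1536, 1024],                    # CPU cores used
    'Runtime_hours': [5.2, 3.1, 6.0, 10.5, 7.2, 4.4],               # wall-clock hours
    'Power_kW': [130, 210, 180, 400, 320, 150]                      # assumed average power draw
})
# Energy (kWh) = average power (kW) x runtime (h)
data['Energy_kWh'] = data['Runtime_hours'] * data['Power_kW']
data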

## 📈 Energy Consumption per Application

Energy usage is a critical factor in HPC operations — especially when considering the **cost and carbon footprint** of running large-scale fusion simulations.

In this section, we will explore:
- How different fusion applications consume energy,
- The computational cost per job type,
- Potential opportunities for **energy-efficient computing**.

---

### 🔬 Simulated Applications
Here are some fusion-related workloads you might see:
- `PlasmaSim`: Models plasma turbulence and instabilities.
- `FusionAI`: Trains ML models on reactor sensor data.
- `ReactMat`: Simulates material degradation under neutron flux.
- `BeamOpt`: Optimizes beam configurations for inertial confinement.
- `DataPost`: Post-processing of experiment output.

Each application has an associated **energy footprint**, estimated using:
```
Energy (kWh) = RuntimeMinutes × PowerPerNode × NodesRequested ÷ 60
```

We’ll use simulated `PowerPerNode` values (in kW) to compute energy for each job and **visualize usage per application**.
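
As a worked instance of the formula above, the sketch below computes energy for a few jobs and totals it per application. The job tuples and the `PowerPerNode` values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from collections import defaultdict

def job_energy_kwh(runtime_minutes, power_per_node_kw, nodes_requested):
    """Energy (kWh) = RuntimeMinutes x PowerPerNode x NodesRequested / 60."""
    return runtime_minutes * power_per_node_kw * nodes_requested / 60

# Hypothetical jobs: (application, runtime in minutes, kW per node, nodes).
example_jobs = [
    ("PlasmaSim", 120, 0.5, 64),
    ("PlasmaSim", 60, 0.5, 32),
    ("FusionAI", 90, 2.0, 8),
    ("DataPost", 30, 0.5, 2),
]

# Accumulate total energy per application.
energy_by_app = defaultdict(float)
for app, minutes, kw_per_node, nodes in example_jobs:
    energy_by_app[app] += job_energy_kwh(minutes, kw_per_node, nodes)

# Report applications from highest to lowest total energy.
for app, kwh in sorted(energy_by_app.items(), key=lambda item: item[1], reverse=True):
    print(f"{app}: {kwh:.1f} kWh")
```

For the first job, 120 min × 0.5 kW × 64 nodes ÷ 60 gives 64 kWh. Dividing by 60 converts minutes to hours so the result comes out in kWh.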

📊 *Your Task:* Group jobs by application and compute total and average energy consumption. Then, identify the **most and least energy-efficient apps**.


In [None]:
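# Sum energy per application first: sns.barplot plots the mean by default,
# which would not match the "Total Energy" axis label.
total_energy = data.groupby('Application', as_index=False)['Energy_kWh'].sum()
sns.barplot(x='Application', y='Energy_kWh', data=total_energy)
plt.title('Energy Usage by Fusion Application')
plt.ylabel('Total Energy (kWh)')
plt.xlabel('Application')
plt.grid(True)
plt.show()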

## 🧠 Questions for Reflection

As you analyze the fusion HPC workloads, use these questions to guide your thinking and shape your findings:

### 🔎 Scientific Insight
- What types of fusion workloads dominate HPC usage?
- Which experiments appear to be the most resource-intensive?

### ⚡ Energy Awareness
- Which applications consume the most energy per job?
- Are there apps that could benefit from GPU acceleration or code optimization?

### 💡 Efficiency Strategies
- If you were designing a job scheduler, how would you prioritize jobs?
- What trade-offs might exist between runtime, energy, and scientific output?

### 🌱 Sustainability Lens
- How can HPC centers reduce energy usage without compromising science?
- What role might **renewable energy** and **green computing** play in DOE labs of the future?

Use these questions to prepare your **final group reflections** or reports at the end of Day 3.


## 🔍 Efficiency Metrics

Understanding how well compute resources are used is central to optimizing both **cost** and **scientific throughput**.

We’ll define and calculate several key metrics:

---

### 🔧 Runtime Efficiency
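Measures what fraction of the allocated wall-clock time a job actually used:
```
RuntimeEfficiency = (EndTime - StartTime) / (AllocatedTime)
```
Here `AllocatedTime` is the wall-clock time reserved for the job. It is not one of the dataset columns listed earlier, so treat it as an assumed extra field. A value close to 1 means little of the reservation went unused.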
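
The ratio can be sketched with Python's standard `datetime` module. The function name, the `allocated_minutes` parameter, and the example job are assumptions made for illustration; in particular, the allocated time is taken to be the requested wall-clock limit in minutes.

```python
from datetime import datetime

def runtime_efficiency(start_time, end_time, allocated_minutes):
    """Fraction of the allocated wall-clock time that the job actually ran."""
    if allocated_minutes <= 0:
        raise ValueError("allocated_minutes must be positive")
    used_minutes = (end_time - start_time).total_seconds() / 60
    return used_minutes / allocated_minutes

# Hypothetical job: ran 08:00-09:30 inside a 120-minute allocation.
start = datetime(2024, 1, 5, 8, 0)
end = datetime(2024, 1, 5, 9, 30)
efficiency = runtime_efficiency(start, end, allocated_minutes=120)
print(f"RuntimeEfficiency = {efficiency:.2f}")
```

Here the job ran for 90 of its 120 allocated minutes, so the efficiency is 0.75.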

---

### ⚡ Energy Efficiency (kWh per core-hour)
```
EnergyPerCoreHour = EnergyConsumed / (RuntimeMinutes × NodesRequested × CoresPerNode / 60)
```
Lower values indicate better utilization of compute energy.
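
A minimal sketch of this metric follows, taking core-hours as runtime in hours × nodes × cores per node. The function name and the example figures (including 128 cores per node) are assumptions for illustration, not values from the dataset.

```python
def energy_per_core_hour(energy_kwh, runtime_minutes, nodes_requested, cores_per_node):
    """kWh consumed per core-hour of compute delivered."""
    core_hours = runtime_minutes / 60 * nodes_requested * cores_per_node
    if core_hours <= 0:
        raise ValueError("job must have positive core-hours")
    return energy_kwh / core_hours

# Hypothetical job: 64 kWh over 120 minutes on 64 nodes with 128 cores each.
value = energy_per_core_hour(
    64.0, runtime_minutes=120, nodes_requested=64, cores_per_node=128
)
print(f"{value * 1000:.2f} Wh per core-hour")
```

The job delivers 2 h × 64 nodes × 128 cores = 16,384 core-hours, so 64 kWh works out to roughly 3.9 Wh per core-hour.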

---

### 🧠 ML Efficiency (optional)
If your dataset includes `FusionAI` jobs, you may compare:
- **Accuracy per Watt**
- **Training Time per kWh**

---

📌 *Action:* Compute these metrics across job types and identify which workloads show:
- The **best resource utilization**, and
- The **highest energy efficiency**.

Use this to make **data-driven recommendations** to a fictional HPC director on Day 3.


In [None]:
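# Core-hours = cores used x wall-clock hours
data['Core_Hours'] = data['Cores'] * data['Runtime_hours']
# Energy intensity: kWh consumed per core-hour (lower is better)
data['kWh_per_CoreHour'] = data['Energy_kWh'] / data['Core_Hours']
data[['Application', 'Core_Hours', 'kWh_per_CoreHour']]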

## 📉 Visualization: Energy per Core-Hour

In [None]:
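# Each bar is the mean across that application's jobs (seaborn's default);
# error bars appear where an application has more than one job.
sns.barplot(x='Application', y='kWh_per_CoreHour', data=data)
plt.title('Mean Energy per Core-Hour by Application')
plt.ylabel('kWh / Core-Hour')
plt.xlabel('Application')
plt.grid(True)
plt.show()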

## 📌 Takeaways and Implications
- Some applications scale more efficiently than others.
- High energy cost per core-hour may indicate poor scaling or I/O bottlenecks.
- These insights help optimize batch submission strategies and resource allocation.

## 🧪 Challenge Exercise
Use this data to:
1. Identify the most efficient application, measured by energy per core-hour (kWh / core-hour).
2. Suggest how runtime or core count could be adjusted for better energy efficiency.
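
One possible approach to the first item is sketched below. It rebuilds the notebook's synthetic sample so it runs on its own, then divides each application's total energy by its total core-hours. This weights long jobs more heavily than a simple mean of per-job ratios would, which is a design choice rather than the only valid definition.

```python
import pandas as pd

# Same synthetic sample as in the notebook's first code cell.
data = pd.DataFrame({
    "Application": ["GTC", "XGC", "TRANSP", "GTC", "XGC", "GENE"],
    "Cores": [512, 1024, 768, 2048, 1536, 1024],
    "Runtime_hours": [5.2, 3.1, 6.0, 10.5, 7.2, 4.4],
    "Power_kW": [130, 210, 180, 400, 320, 150],
})
data["Energy_kWh"] = data["Runtime_hours"] * data["Power_kW"]
data["Core_Hours"] = data["Cores"] * data["Runtime_hours"]

# Aggregate per application, then divide the totals so that longer jobs
# carry proportionally more weight than short ones.
totals = data.groupby("Application")[["Energy_kWh", "Core_Hours"]].sum()
totals["kWh_per_CoreHour"] = totals["Energy_kWh"] / totals["Core_Hours"]

# Lowest kWh per core-hour first, i.e. most efficient at the top.
ranking = totals.sort_values("kWh_per_CoreHour")
print(ranking.round(4))
```

Because energy here is simply power × runtime, each job's ratio reduces to `Power_kW / Cores`. For the second item, that means runtime changes alone will not move this metric in the sample data; only the power drawn per core does.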

## 📚 Further Resources
- [NERSC Fusion Workloads](https://www.nersc.gov/science/fusion-energy/)
- [Energy-Aware Scheduling Survey (Kocot et al., 2023)](https://www.mdpi.com/1996-1073/16/2/890)
- [Top500 and Green500 Rankings](https://www.top500.org/lists/green500/2023/06/)