<img src=https://www.nersc.gov/_resources/themes/nersc/images/NERSC_logo_no_spacing.svg width="500">

### National Energy Research Scientific Computing Center
#### Introduction to High Performance Computing Bootcamp 2025

# Module 5 — Project 2: Designing a Sustainable & Equitable HPC Future

## 🎯 Learning Outcomes

By the end of this project, you will be able to:
- Interpret energy efficiency and sustainability metrics (Perf/W, PUE, kWh, CO₂e).
- Reason about HPC design trade-offs without running code — using only theory, assumptions, and logic.
- Connect DOE’s sustainability goals to concrete HPC center design and policy decisions.
- Propose equitable, cost-conscious, and environmentally responsible strategies for future supercomputing facilities.
---
## Project Overview

In **Module 5 – Project 2**, you will build on concepts from **Modules 1–4** to create a **theoretical, decision-driven plan** for a high-performance computing environment that is both world-class and environmentally responsible.  
This is a **thinking challenge**, not a coding exercise. You will work from provided data, realistic assumptions, and your own research to make evidence-based recommendations. Without running any experiments, you'll reason about **power efficiency (Perf/W), PUE, carbon intensity, and equity implications** using clear assumptions and simple calculations.

---
### **Option A — HPC Facility Design for Performance & Sustainability**
Your role: act as the **lead architect** for a new TechVille HPC center that must be:
- **Top 20 on the Top500**
- **Top 10 on the Green500**
- A leader in **energy justice and carbon-conscious computing**

You will:
- Determine the **hardware architecture** (CPU, GPU, APU balance, interconnect, memory, storage).
- Plan **facility infrastructure** (cooling type, renewable integration, PUE targets, land requirements).
- Recommend vendors, technologies, and design features that align with sustainability and equity goals.

---

### **Option B — HPC Energy Optimization & Sustainability Policy**
Your role: act as the **chief strategist** for TechVille HPC’s Emerging Technology Division, crafting a **software + hardware optimization roadmap** that:
- Stays within strict **MW power caps**
- Reduces **carbon footprint**
- Preserves **fair access for all users**

You will:
- Identify **software optimizations** (algorithmic efficiency, scaling strategies, I/O reduction) that save energy.
- Recommend **hardware & facility upgrades** to improve performance-per-watt.
- Propose **job scheduling and allocation policies** that account for time-of-use energy costs, grid carbon intensity, and equitable access.

---

## What You’ll Practice
- Translating **technical efficiency metrics** into procurement and policy decisions.
- Connecting **grid realities** to HPC scheduling, cost, and carbon goals.
- Balancing **performance, equity, and environmental impact** under real-world constraints.

---

💡 **Tip:** This is a *strategic design exercise* — your creativity, logical reasoning, and ability to justify trade-offs will be as important as the numbers you present.


## Data & Assumptions (Examples for Use)

### 🌱 Why This Matters in HPC

High-Performance Computing (HPC) is the engine of discovery — but it’s also an energy-hungry beast.  
Every simulation, training run, or data processing job you launch draws from a complex ecosystem of power infrastructure, cooling systems, and the electrical grid.  

In 2025, a **top-tier supercomputer can consume as much electricity as a small town**. That energy footprint is tied not just to **cost**, but also to **carbon emissions**, **grid stability**, and **community impact**.  
With DOE’s push toward **energy justice**, **sustainability**, and **science per kilowatt-hour**, the next generation of HPC professionals must think beyond FLOPS — and design systems that deliver maximum scientific impact with minimal environmental cost.

This is where **data-driven thinking** comes in.

In this exercise, you’ll be handed a realistic set of **facility parameters, job classes, and energy metrics**. Your mission is to reason through trade-offs:
- **When** should jobs run to minimize both **dollars** and **CO₂**?
- How do **cooling choices** and **PUE** shift the sustainability profile?
- Which workloads give you the most **science per unit of energy**?
- If the grid is dirty at noon but clean at midnight, can your scheduler take advantage?

This is exactly the kind of decision-making HPC center directors, facility managers, and DOE program leads face every day.

The assumptions provided here give you a starting point — but you’re encouraged to tweak them to explore **what-if scenarios**.  
In the real world, a small shift in **PUE**, **energy price**, or **job mix** can mean millions of dollars saved and thousands of tons of CO₂ avoided.

In short: **this is not just about running code faster — it’s about running HPC smarter.**

### 🏢 Facility & Power
| Parameter | Value | Notes |
|-----------|-------|-------|
| **IT Load (compute only)** | up to **10 MW** | Peak electrical demand from compute nodes only |
| **PUE (Power Usage Effectiveness)** | Baseline **1.25** | Hot/cold aisle + free cooling |
|  | Alt **1.15** | Advanced liquid cooling in temperate climate |
|  | Alt **1.35** | Hot climate with higher cooling demand |
| **Electricity Price** | \$0.10/kWh (flat) | Alternatively, use Time-of-Use: Peak \$0.16, Off-peak \$0.06 |

---

### 🌍 Grid Carbon Intensity (illustrative)
| Time Period | Carbon Intensity (gCO₂e/kWh) | Notes |
|-------------|------------------------------|-------|
| **Daytime** | 350 | Higher fossil-fuel mix during peak demand |
| **Night / Off-peak** | 220 | Cleaner mix with more baseload & renewables |
| **With Renewables PPA** | 80–120 | For contracted renewable energy blocks |

---

### ⚙️ Job Classes *(derived from Modules 2–3 analysis)*
| Class | Nodes | kW/node | Runtime (h) | Preemptible? | Science units/job *(define)* |
|-------|------:|--------:|------------:|--------------|-----------------------------:|
| **S1: Large Simulation** | 512 | 6.5 | 12 | No | 1 |
| **S2: Diagnostics Batch** | 64 | 4.2 | 4 | Yes | 50 shots |
| **S3: ML Training** | 128 | 7.0 | 6 | Yes | 200 epochs |

💡 *Science units/job* is a **custom metric** you define to represent completed scientific output — e.g., simulations, processed datasets, or training epochs.

---

### 📐 Core Metrics & Formulas


Let:

- $T$ = runtime (hours)
- $N$ = number of nodes
- $P_{node}$ = average power per node (kW)
- $U$ = average node utilization (0–1, default 1)
- $PUE$ = Power Usage Effectiveness (≥ 1)
- $I_{kgkWh}$ = grid carbon intensity (kg CO₂e per kWh)
- $I_{gkWh}$ = grid carbon intensity (g CO₂e per kWh)
- $S$ = science units per job

**IT Energy (kWh)**
$$
E_{IT} = T \times N \times P_{node} \times U
$$

**Facility Energy (kWh)**
$$
E_{facility} = E_{IT} \times PUE
$$

**Carbon Emissions (kg CO₂e)** (choose one form)
$$
\text{if intensity in kg/kWh:}\quad CO_2 = E_{facility} \times I_{kgkWh}
$$
$$
\text{if intensity in g/kWh:}\quad CO_2 = E_{facility} \times \frac{I_{gkWh}}{1000}
$$

**Science per kWh**
$$
\eta_{science} = \frac{S}{E_{facility}}
$$

*Notes:*  
- If power per node is in watts, convert: $P_{node}\,[\mathrm{kW}] = P_{node}\,[\mathrm{W}]/1000$.  
- When you write your own formulas in this notebook, leave a **blank line before and after each `$$` block**.  
- Put the math in a **Markdown** cell, not a Code cell, and do **not** wrap it in triple backticks.


---

✅ **Your Task:**  
1. Start with the baseline parameters above.  
2. Adjust *one or more* variables (PUE, prices, grid mix, job class runtimes) to explore trade-offs.  
3. Document your reasoning and how your changes affect **energy use, carbon footprint, and science output**.
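To explore these trade-offs, the core formulas can be sketched in a few lines of Python. This is a minimal sketch applied to the three job classes at the baseline PUE of 1.25 and the illustrative daytime intensity of 350 gCO₂e/kWh; the `JobClass` container and the function names are illustrative, not part of any provided tool. Change `PUE`, the intensity, or the runtimes to run your own what-if scenarios.

```python
from dataclasses import dataclass


@dataclass
class JobClass:
    """One row of the job-class table (illustrative container, not a provided API)."""
    name: str
    nodes: int                # N
    kw_per_node: float        # P_node in kW
    runtime_h: float          # T in hours
    science_units: float      # S, as you define it for the class
    utilization: float = 1.0  # U, default 1 as in the formulas


def it_energy_kwh(job: JobClass) -> float:
    """E_IT = T x N x P_node x U."""
    return job.runtime_h * job.nodes * job.kw_per_node * job.utilization


def facility_energy_kwh(job: JobClass, pue: float) -> float:
    """E_facility = E_IT x PUE."""
    return it_energy_kwh(job) * pue


def co2_kg(facility_kwh: float, intensity_g_per_kwh: float) -> float:
    """CO2 (kg) = E_facility x I_gkWh / 1000."""
    return facility_kwh * intensity_g_per_kwh / 1000.0


def science_per_kwh(job: JobClass, pue: float) -> float:
    """eta_science = S / E_facility."""
    return job.science_units / facility_energy_kwh(job, pue)


JOB_CLASSES = [
    JobClass("S1: Large Simulation", 512, 6.5, 12, 1),
    JobClass("S2: Diagnostics Batch", 64, 4.2, 4, 50),
    JobClass("S3: ML Training", 128, 7.0, 6, 200),
]

if __name__ == "__main__":
    PUE = 1.25         # baseline from the Facility & Power table
    INTENSITY_G = 350  # illustrative daytime grid intensity, gCO2e/kWh
    for job in JOB_CLASSES:
        e_fac = facility_energy_kwh(job, PUE)
        print(f"{job.name}: E_facility = {e_fac:,.1f} kWh, "
              f"CO2 = {co2_kg(e_fac, INTENSITY_G):,.1f} kg, "
              f"science/kWh = {science_per_kwh(job, PUE):.2e}")
```

At PUE 1.25, one S1 job uses 49,920 kWh of facility energy, while S2 and S3 use 1,344 kWh and 6,720 kWh. Because each class counts science units differently (one simulation, 50 shots, 200 epochs), compare Science/kWh across classes only after you define a common unit.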

## 📋 Introduction — Energy, Efficiency, and Strategy in HPC

High-performance computing (HPC) is no longer just about *how fast* we can process data.  
In 2025, *how efficiently* we run workloads — and the infrastructure decisions behind them — are equally important.  
With growing pressure from **energy costs**, **carbon reduction targets**, and **sustainability mandates**, HPC professionals must think beyond performance alone.

In this set of worksheets, you’ll explore the **energy, cost, and carbon trade-offs** faced by real-world HPC facilities.  
You’ll work with realistic, theoretical scenarios to understand how design choices affect:

- **PUE (Power Usage Effectiveness)** and cooling strategies
- **Carbon footprint** under different grid and scheduling patterns
- **Performance-per-watt** for modern CPU, GPU, and hybrid systems

By the end, you’ll see that decisions made at the facility and scheduling level can:
- Reduce operational costs
- Minimize carbon emissions
- Align with DOE and institutional sustainability goals
- Support equitable access to computing resources

These activities are designed to help you *think like an HPC architect and sustainability planner* — connecting the concepts from **Modules 1–4** into practical, big-picture reasoning.

---

### 🛠 The Worksheets You’ll Complete

**Worksheet A1 — Cooling & PUE Scenarios**  
- Compare three facility PUE scenarios for the same IT energy load.  
- See how cooling technology choice impacts overall energy use and costs.  
- Discuss trade-offs in capital expenditure (capex) vs. operational efficiency.

**Worksheet A2 — Time-of-Use Carbon vs Cost**  
- Simulate three scheduling policies with different peak/off-peak usage mixes.  
- Calculate total CO₂ emissions and energy costs.  
- Explore how carbon-aware scheduling can save money *and* reduce footprint.

**Worksheet A3 — Perf/W & Vendor/Node Choice (theoretical)**  
- Compare CPU, GPU, and hybrid node types for different job classes.  
- Reason about performance-per-watt for 2025’s top architectures  
  *(Frontier, Aurora, El Capitan, Perlmutter, Jupiter)*.  
- Justify choices based on algorithm fit, memory bandwidth, and energy efficiency.

💡 **Remember:**  
You are not running experiments — this is a *thinking and reasoning* exercise.  
Document your **assumptions** clearly, use the formulas provided, and connect your conclusions back to the real-world challenges HPC centers face.


## Worksheet A1 — Cooling & PUE Scenarios
Compare three PUE scenarios for the same weekly IT energy:

| Scenario | PUE | Cooling approach | Pros | Cons |
|---|---:|---|---|---|
| A | 1.35 | Traditional air in warm climate | Low capex | High energy & cost |
| B | 1.25 | Optimized air + free cooling | Balanced | Seasonal sensitivity |
| C | 1.15 | Liquid cooling | Best efficiency | Higher capex, facility mods |

**Compute Facility Energy** for each scenario using the same IT load to see the overhead impact.
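The comparison can be sketched in Python. This assumes the full 10 MW IT load runs for all 168 hours of the week (an upper bound) and uses the flat \$0.10/kWh tariff; the constants and the `compare_scenarios` helper are illustrative.

```python
# Assumption: the full 10 MW IT load runs for all 168 hours of the week (upper bound).
IT_LOAD_KW = 10_000
HOURS_PER_WEEK = 168
PRICE_PER_KWH = 0.10  # flat tariff from the Facility & Power table

SCENARIOS = {
    "A: air, warm climate": 1.35,
    "B: optimized air + free cooling": 1.25,
    "C: liquid cooling": 1.15,
}


def compare_scenarios(it_kwh: float, scenarios: dict[str, float], price: float):
    """Return (name, facility kWh, overhead kWh, cost $) for each PUE scenario."""
    rows = []
    for name, pue in scenarios.items():
        facility_kwh = it_kwh * pue
        overhead_kwh = facility_kwh - it_kwh  # cooling, power conversion, other non-IT loads
        rows.append((name, facility_kwh, overhead_kwh, facility_kwh * price))
    return rows


if __name__ == "__main__":
    weekly_it_kwh = IT_LOAD_KW * HOURS_PER_WEEK  # 1,680,000 kWh
    for name, fac, over, cost in compare_scenarios(weekly_it_kwh, SCENARIOS, PRICE_PER_KWH):
        print(f"{name}: facility {fac:,.0f} kWh, overhead {over:,.0f} kWh, cost ${cost:,.0f}")
```

Under this full-load assumption, moving from Scenario A to Scenario C removes 336,000 kWh of overhead per week (about \$33,600 at the flat rate), which you can weigh against the extra capex of liquid cooling.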

---

## Worksheet A2 — Time-of-Use Carbon vs Cost
Define two daily blocks (peak/off-peak). Compute **CO₂** and **\$** under three scheduling policies:

| Policy | Description | Peak blocks used | Off-peak blocks used | CO₂ total (kg) | Energy \$ | Notes |
|---|---|---|---|---:|---:|---|
| 1 | Throughput-first FIFO | | | | | |
| 2 | Carbon-aware (off-peak heavy) | | | | | |
| 3 | Blended (priority + carbon + cost) | | | | | |
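A minimal sketch for filling in the table, assuming peak blocks pair the \$0.16/kWh Time-of-Use price with the 350 gCO₂e/kWh daytime intensity, and off-peak blocks pair \$0.06/kWh with 220 gCO₂e/kWh. The weekly energy (10 MW IT × 168 h × PUE 1.25) and each policy's peak fraction are hypothetical placeholders; replace them with the block counts from your own schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    price_per_kwh: float        # $/kWh
    intensity_g_per_kwh: float  # gCO2e/kWh


PEAK = Block("peak", 0.16, 350)          # ToU peak price, daytime grid intensity
OFF_PEAK = Block("off-peak", 0.06, 220)  # ToU off-peak price, night grid intensity


def policy_totals(facility_kwh: float, peak_fraction: float) -> tuple[float, float]:
    """Return (CO2 kg, energy cost $) when `peak_fraction` of the energy is drawn in peak blocks."""
    if not 0.0 <= peak_fraction <= 1.0:
        raise ValueError("peak_fraction must be between 0 and 1")
    peak_kwh = facility_kwh * peak_fraction
    off_kwh = facility_kwh - peak_kwh
    co2 = (peak_kwh * PEAK.intensity_g_per_kwh + off_kwh * OFF_PEAK.intensity_g_per_kwh) / 1000.0
    cost = peak_kwh * PEAK.price_per_kwh + off_kwh * OFF_PEAK.price_per_kwh
    return co2, cost


# Hypothetical peak-energy fractions for the three policies in the table.
POLICIES = {
    "1 Throughput-first FIFO": 0.6,
    "2 Carbon-aware": 0.2,
    "3 Blended": 0.4,
}

if __name__ == "__main__":
    weekly_kwh = 2_100_000  # assumption: 10 MW IT x 168 h x PUE 1.25
    for name, frac in POLICIES.items():
        co2, cost = policy_totals(weekly_kwh, frac)
        print(f"{name}: CO2 = {co2:,.0f} kg, cost = ${cost:,.0f}")
```

Every policy here consumes the same 2,100,000 kWh; only *when* the energy is drawn changes CO₂ and cost. With these placeholders, the carbon-aware policy cuts emissions from about 625,800 kg to 516,600 kg and cost from \$252,000 to \$168,000.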

---

## Worksheet A3 — Perf/W & Vendor/Node Choice (theoretical)
For each job class, argue **CPU vs GPU vs hybrid** based on 2025 architectures (Frontier/Aurora/El Capitan/Perlmutter/Jupiter).  
Summarize where **Perf/W** is best and why (tensor cores, memory BW, comm limits, algorithmic fit).
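One way to reason about the Perf/W crossover is to compare energy to solution for a fixed amount of work. The sketch below assumes node power stays roughly constant while a job waits on communication, so effective Perf/W is peak Perf/W multiplied by parallel efficiency. The Perf/W and efficiency numbers are hypothetical placeholders, not measurements of any of the systems named above.

```python
# Hypothetical (Perf/W in GFLOPS/W, parallel efficiency at scale) per node type.
# Placeholder values only; substitute numbers from your own research.
PROFILES = {
    "compute-bound": {"CPU": (10.0, 0.90), "GPU": (50.0, 0.85)},
    "communication-bound": {"CPU": (10.0, 0.80), "GPU": (50.0, 0.15)},
}

WORK_GFLOP = 1e12  # hypothetical job size: 10^12 GFLOP (one zettaFLOP)


def energy_to_solution_kwh(work_gflop: float, perf_per_watt: float, parallel_eff: float) -> float:
    """GFLOPS/W equals GFLOP per joule, so work / effective Perf/W gives joules."""
    effective_perf_per_watt = perf_per_watt * parallel_eff  # GFLOP per joule actually delivered
    joules = work_gflop / effective_perf_per_watt
    return joules / 3.6e6  # 3.6e6 J per kWh


def best_node_type(profiles: dict[str, tuple[float, float]], work_gflop: float = WORK_GFLOP) -> str:
    """Return the node type with the lowest energy to solution."""
    return min(profiles, key=lambda node: energy_to_solution_kwh(work_gflop, *profiles[node]))


if __name__ == "__main__":
    for regime, profiles in PROFILES.items():
        for node, (pw, eff) in profiles.items():
            kwh = energy_to_solution_kwh(WORK_GFLOP, pw, eff)
            print(f"{regime:>20} {node}: {kwh:,.0f} kWh")
        print(f"  -> lowest energy: {best_node_type(profiles)}")
```

With these placeholders, the GPU node's 5× Perf/W advantage disappears once its parallel efficiency falls below one fifth of the CPU node's, which is the "comms erase gains" case in the analysis prompts.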


## Analysis Prompts

1. **PUE inflection:** At what point does improving PUE from 1.25 → 1.15 beat any scheduling trick you tried? Support with kWh math.  
2. **Carbon time-shifting:** Which job classes are best moved off-peak? Which cannot be moved?  
3. **Perf/W crossover:** For which class does GPU Perf/W dominate? Where do comms erase gains?  
4. **Procurement implications:** Would you trade 10% peak perf for 20% better PUE? Why?
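For the PUE inflection prompt, a minimal sketch of the kWh math, assuming the full 10 MW IT load for one week, the Time-of-Use prices, and the illustrative daytime/night grid intensities from the tables above. The 60% peak share for an unmanaged schedule and the 20% share for a carbon-aware one are hypothetical placeholders, and `breakeven_shift` is an illustrative helper.

```python
# Assumptions (illustrative): full 10 MW IT load for one week, ToU prices, and the
# daytime/night grid intensities from the tables above.
WEEKLY_IT_KWH = 10_000 * 168
PEAK_G, OFF_G = 350, 220            # gCO2e/kWh
PEAK_PRICE, OFF_PRICE = 0.16, 0.06  # $/kWh


def weekly_totals(pue: float, peak_fraction: float) -> tuple[float, float, float]:
    """Return (facility kWh, CO2 kg, cost $) for one week."""
    kwh = WEEKLY_IT_KWH * pue
    co2_kg = kwh * (peak_fraction * PEAK_G + (1 - peak_fraction) * OFF_G) / 1000
    cost = kwh * (peak_fraction * PEAK_PRICE + (1 - peak_fraction) * OFF_PRICE)
    return kwh, co2_kg, cost


def breakeven_shift(pue_old: float, pue_new: float, peak_fraction: float) -> float:
    """Share of weekly energy that must move from peak to off-peak (at pue_old)
    to avoid as much CO2 as the PUE upgrade avoids on the same schedule."""
    _, co2_old, _ = weekly_totals(pue_old, peak_fraction)
    _, co2_new, _ = weekly_totals(pue_new, peak_fraction)
    co2_per_unit_shift = WEEKLY_IT_KWH * pue_old * (PEAK_G - OFF_G) / 1000
    return (co2_old - co2_new) / co2_per_unit_shift


if __name__ == "__main__":
    base = weekly_totals(1.25, 0.6)  # hypothetical unmanaged schedule: 60% of energy at peak
    options = {"PUE 1.25 -> 1.15": (1.15, 0.6), "Shift 60% -> 20% peak": (1.25, 0.2)}
    for label, (pue, frac) in options.items():
        kwh, co2, cost = weekly_totals(pue, frac)
        print(f"{label}: saves {base[0] - kwh:,.0f} kWh, "
              f"{base[1] - co2:,.0f} kg CO2, ${base[2] - cost:,.0f}")
    print(f"Break-even shift: {breakeven_shift(1.25, 1.15, 0.6):.1%} of weekly energy")
```

With these placeholders, the PUE upgrade saves 168,000 kWh per week while time-shifting saves none, yet shifting about 18% of the week's energy off-peak already matches the upgrade's CO₂ savings. On kWh the PUE upgrade always wins; on CO₂ it wins only when your schedule cannot shift that much, and the two approaches can be combined.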


## 📋 Introduction — Balancing Performance, Fairness, and Energy Limits in HPC

In large-scale HPC environments, the challenge isn’t just *what* to run — it’s *when* and *how* to run it so that  
performance, fairness, and sustainability goals are all met **simultaneously**.  

As systems grow more powerful, so do their **energy footprints** and **user demand pressures**.  
Schedulers must juggle:
- **Energy caps** (MW limits) to avoid exceeding facility or grid constraints
- **Fair access** across different job classes and scientific priorities
- **Efficiency metrics** such as **Science per kWh** to ensure maximum research output for every unit of energy

These worksheets will put you in the seat of an **HPC scheduling strategist** — where you must think not just about performance, but about *energy justice* and *operational balance*.

---

### 🛠 The Worksheets You’ll Complete

**Worksheet B1 — Job Class & Access Constraints**  
- You’ll work with three distinct job classes (S1, S2, S3) each with:
  - Specific **node counts**, **power per node**, and **runtime**
  - Whether they are **preemptible** or not
  - Minimum guaranteed share and maximum wait times
- Using the provided formulas, calculate:
  - **Per-job facility energy consumption**
  - **Science per kWh** for each job class
- Interpret how different workloads impact total energy usage and fairness.

**Worksheet B2 — Policy Designs**  
- Define three different scheduling approaches:
  1. **FIFO + Backfill** — simple, throughput-first
  2. **Energy-Aware + Fairness** — prioritize by Science/kWh while meeting minimum shares
  3. **Priority Blending** — a weighted score combining priority, efficiency, power draw, and job age
- Create a **weekly coarse-block schedule**:
  - Respect the **MW cap** at all times
  - Ensure **minimum shares** for each job class
  - Balance throughput, fairness, and sustainability

---

💡 **Why This Matters**  
This activity links the core lessons from **Modules 1–4** — power measurement, system performance, and sustainability — into real scheduling and policy decisions.  
It mirrors the complexity faced by **facility managers at DOE, NERSC, and other HPC centers**, where  
technical optimization must also align with:
- Energy cost control
- Carbon reduction commitments
- Equitable access for diverse scientific communities

By completing these worksheets, you’ll gain a practical understanding of **how to design policies that serve both science and sustainability** — a key skill for the future of HPC leadership.


## Worksheet B1 — Job Class & Access Constraints

This worksheet extends the job classes with minimum guaranteed shares and maximum wait times.

| Class | Nodes | kW/node | Runtime (h) | Preemptible? | Min share | Max wait (h) | Science units/job |
|---|---:|---:|---:|---|---:|---:|---:|
| S1 | 512 | 6.5 | 12 | No | 25% | 24 | 1 |
| S2 | 64 | 4.2 | 4 | Yes | 35% | 12 | 50 |
| S3 | 128 | 7.0 | 6 | Yes | 40% | 18 | 200 |

**Per-job energy**  
$$
E_\mathrm{facility} = (\text{Runtime}\times\text{Nodes}\times\text{kW/node})\times\mathrm{PUE}
$$

**Science/kWh**  
$$
\frac{\text{Science units/job}}{E_\mathrm{facility}}
$$
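The table does not say what the minimum share is measured in. The sketch below assumes it is a share of delivered node-hours over the week (energy or job counts are other options) and checks a weekly plan of completed jobs against it. Because the three minimum shares sum to 100%, on a single basis they act as fixed quotas rather than floors. The plan counts and helper names are hypothetical.

```python
# Job-class parameters from the B1 table: (nodes, runtime_h, min_share).
CLASSES = {
    "S1": (512, 12, 0.25),
    "S2": (64, 4, 0.35),
    "S3": (128, 6, 0.40),
}


def node_hour_shares(jobs_completed: dict[str, int]) -> dict[str, float]:
    """Share of total node-hours delivered to each class (assumed share basis)."""
    node_hours = {c: jobs_completed.get(c, 0) * nodes * runtime
                  for c, (nodes, runtime, _) in CLASSES.items()}
    total = sum(node_hours.values())
    if total == 0:
        raise ValueError("plan delivers no node-hours")
    return {c: nh / total for c, nh in node_hours.items()}


def min_share_violations(jobs_completed: dict[str, int], tol: float = 1e-9) -> dict[str, float]:
    """Classes whose delivered share falls below their guaranteed minimum."""
    shares = node_hour_shares(jobs_completed)
    return {c: s for c, s in shares.items() if s < CLASSES[c][2] - tol}


if __name__ == "__main__":
    balanced = {"S1": 10, "S2": 336, "S3": 128}  # hypothetical weekly plan
    s1_heavy = {"S1": 20, "S2": 336, "S3": 128}  # hypothetical plan favoring S1
    for name, plan in {"balanced": balanced, "S1-heavy": s1_heavy}.items():
        shares = {c: round(s, 3) for c, s in node_hour_shares(plan).items()}
        print(name, shares, "violations:", min_share_violations(plan))
```

Doubling the S1 jobs pushes S1 to 40% of node-hours and drops S2 and S3 to 28% and 32%, below their guarantees.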

---

## Worksheet B2 — Policy Designs
Define three schedulers:

1. **FIFO + backfill** (baseline)  
2. **Energy-aware + fairness** (order by Science/kWh but enforce min shares & age boosts)  
3. **Priority blending**  
   $$\text{Score} = \alpha\cdot \text{Priority} + \beta\cdot \text{Science/kWh} - \gamma\cdot \text{MW draw} + \delta\cdot \text{Age}$$
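The blended score mixes quantities with very different units (Science/kWh for these classes ranges from about 0.00002 to 0.037, while MW draw is in the single digits), so each term generally needs rescaling before the weights mean anything. The sketch below uses min-max normalization across the queue, which is one choice among several. The priorities, ages, and equal weights are hypothetical; Science/kWh and MW draw follow from the B1 table at PUE 1.25, treating MW draw as facility power.

```python
PUE = 1.25


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def blended_scores(queue, alpha, beta, gamma, delta):
    """Score = alpha*Priority + beta*Science/kWh - gamma*MW draw + delta*Age, on normalized terms."""
    keys = ("priority", "science_per_kwh", "mw_draw", "age_h")
    norm = {k: min_max([job[k] for job in queue]) for k in keys}
    return [
        alpha * norm["priority"][i]
        + beta * norm["science_per_kwh"][i]
        - gamma * norm["mw_draw"][i]
        + delta * norm["age_h"][i]
        for i in range(len(queue))
    ]


def job(name, nodes, kw_per_node, runtime_h, science_units, priority, age_h):
    """Build one queue entry from the B1 table plus hypothetical priority and age."""
    facility_kwh = runtime_h * nodes * kw_per_node * PUE
    return {
        "name": name,
        "priority": priority,  # hypothetical
        "science_per_kwh": science_units / facility_kwh,
        "mw_draw": nodes * kw_per_node * PUE / 1000,
        "age_h": age_h,        # hypothetical hours waited
    }


QUEUE = [
    job("S1", 512, 6.5, 12, 1, priority=3, age_h=20),
    job("S2", 64, 4.2, 4, 50, priority=1, age_h=2),
    job("S3", 128, 7.0, 6, 200, priority=2, age_h=10),
]

if __name__ == "__main__":
    scores = blended_scores(QUEUE, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0)
    for j, s in sorted(zip(QUEUE, scores), key=lambda pair: pair[1], reverse=True):
        print(f"{j['name']}: score = {s:.3f}")
```

With equal weights, S3 ranks first (about 1.54), while S1's high priority and age are offset by its large MW draw and lowest Science/kWh. Raising γ penalizes high-draw jobs more strongly.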

Sketch a **weekly plan** (coarse blocks) and show that the **MW cap** is never exceeded.

| Block | Available MW | Scheduled classes | Est. MW draw | Meets min shares? | Notes |
|---|---:|---|---:|---|---|
| Mon AM | 12 |  |  |  |  |
| Mon PM | 12 |  |  |  |  |
| … |  |  |  |  |  |
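A quick way to check a draft plan against the cap, assuming the table's Available MW is facility power (so each job's IT draw is multiplied by PUE 1.25) and that the counts are concurrent jobs per block. The block names and job counts below are hypothetical, and the sketch does not check total node counts, which the worksheet does not specify.

```python
PUE = 1.25
# IT power per job in kW, from the B1 table (nodes x kW/node).
CLASS_IT_KW = {"S1": 512 * 6.5, "S2": 64 * 4.2, "S3": 128 * 7.0}


def block_draw_mw(concurrent_jobs: dict[str, int], pue: float = PUE) -> float:
    """Facility MW drawn by the jobs running concurrently in one block."""
    it_kw = sum(count * CLASS_IT_KW[c] for c, count in concurrent_jobs.items())
    return it_kw * pue / 1000


def over_cap_blocks(plan):
    """Return (block, draw MW, cap MW) for every block that exceeds its cap."""
    return [(name, round(block_draw_mw(jobs), 3), cap)
            for name, cap, jobs in plan if block_draw_mw(jobs) > cap]


# Hypothetical weekly plan: (block, available MW, concurrent jobs per class).
PLAN = [
    ("Mon AM", 12, {"S1": 2, "S3": 3}),
    ("Mon PM", 12, {"S2": 10, "S3": 6}),
    ("Tue AM", 12, {"S1": 3}),
]

if __name__ == "__main__":
    for name, cap, jobs in PLAN:
        print(f"{name}: {block_draw_mw(jobs):.2f} MW of {cap} MW")
    print("Over cap:", over_cap_blocks(PLAN))
```

Here Mon AM (11.68 MW) and Mon PM (10.08 MW) fit under 12 MW, but three concurrent S1 jobs in Tue AM would draw 12.48 MW, so that block needs to be reshaped.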


# 🏗️ Project 2A — HPC Design Challenge: Designing an Energy-Efficient HPC Center for TechVille HPC

## Objective
Your team has been commissioned by **TechVille HPC**, a forward-thinking start-up, to design a **next-generation HPC data center** that is both **world-class in performance** and a leader in **energy justice and sustainability**.  
The center must rank in the **top 20 of the Top500 list** and in the **top 10 of the Green500 list**, while supporting the **Department of Energy’s science mission**.

**Special constraint:** Fusion energy applications will account for **20% of total compute capacity**. The remaining 80% must accommodate a balanced mix of other DOE science workloads (ASCR, BES, BER, HEP, NP, ARDAP).

You will use the **provided HPC workload datasets** to make data-driven decisions about:
- **System architecture**
- **Energy efficiency**
- **Scheduling policies**
- **Vendor & technology choices**

---

## Instructions

### 1. HPC Center Size & Facility Design
- Determine the **infrastructure requirements** to house a system meeting the performance and efficiency goals.
- Use dataset fields such as `Nodes`, `Power_per_Node_kW`, and `Runtime_h` to estimate:
  - **Total power draw**
  - **Thermal load**
  - **Required floor space**
- Consider **eco-friendly cooling**, renewable power integration, and **community engagement spaces**.
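A rough sizing sketch using the dataset columns named above. It assumes every listed workload runs concurrently (a peak-load upper bound), that essentially all IT power ends up as heat the cooling plant must remove, and it uses hypothetical values for rack power density and floor area per rack. The three example rows reuse the S1–S3 job classes rather than the real dataset; with the provided CSV you could build `rows` with `csv.DictReader`, converting the numeric fields with `float()`.

```python
import math

KW_TO_BTU_PER_H = 3412.14   # 1 kW of heat is about 3,412 BTU/h
KW_PER_COOLING_TON = 3.517  # 1 ton of refrigeration is about 3.517 kW


def size_facility(rows, pue, rack_kw, m2_per_rack):
    """Peak power, heat, energy, and floor-space estimates from dataset rows.

    rows: dicts with 'Nodes', 'Power_per_Node_kW', and 'Runtime_h'.
    rack_kw and m2_per_rack are planning assumptions, not dataset fields.
    """
    it_kw = sum(r["Nodes"] * r["Power_per_Node_kW"] for r in rows)
    it_kwh_per_run = sum(r["Nodes"] * r["Power_per_Node_kW"] * r["Runtime_h"] for r in rows)
    racks = math.ceil(it_kw / rack_kw)
    return {
        "it_kw": it_kw,
        "facility_kw": it_kw * pue,
        "facility_kwh_per_run": it_kwh_per_run * pue,
        "heat_btu_per_h": it_kw * KW_TO_BTU_PER_H,
        "cooling_tons": it_kw / KW_PER_COOLING_TON,
        "racks": racks,
        "floor_m2": racks * m2_per_rack,
    }


ROWS = [  # example rows shaped like the dataset, using the S1-S3 job classes
    {"Nodes": 512, "Power_per_Node_kW": 6.5, "Runtime_h": 12},
    {"Nodes": 64, "Power_per_Node_kW": 4.2, "Runtime_h": 4},
    {"Nodes": 128, "Power_per_Node_kW": 7.0, "Runtime_h": 6},
]

if __name__ == "__main__":
    # Hypothetical planning inputs: 100 kW racks, 3 m^2 per rack including aisles.
    est = size_facility(ROWS, pue=1.25, rack_kw=100.0, m2_per_rack=3.0)
    for key, value in est.items():
        print(f"{key}: {value:,.1f}")
```

These three classes alone peak at about 4.5 MW of IT load (5.6 MW at PUE 1.25), so under the same assumptions a system sized for the full 10 MW IT budget would need a bit more than twice the racks and floor space.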

---

### 2. Machine Specifications & Architecture Mix
- Identify the **node types** (CPU, GPU, Hybrid, emerging architectures like Quantum/FPGA) and their proportions.
- Use dataset columns such as:
  - `Node_Type` → architecture mix decisions
  - `Min_Mem_per_Node_GB` → memory configuration
  - `Interconnect_Sensitivity` & `Bottleneck` → topology design
- Ensure that **20% capacity** is optimized for fusion workloads, and the remaining **80%** is tuned for other science areas.

---

### 3. Application & Software Ecosystem
- Analyze dataset fields such as:
  - `Workload` → identify primary science areas to support
  - `Programming_Language`, `Algorithm_Library`, `Parallel_Framework` → determine software stack and optimization targets
- Ensure your design supports:
  - Fusion workloads (simulation, diagnostics, ML disruption prediction)
  - Non-fusion workloads across DOE science areas
- Highlight energy-efficient programming tools and compiler/runtime optimizations.

---

### 4. Scheduling & Policy Integration
- Define a **scheduling policy** that:
  - Reserves 20% of cycles for fusion workloads
  - Maximizes **Science/kWh** for all workloads
  - Balances fairness across science areas
- Use dataset performance data (`Perf_W`, `Work_Units`, `Runtime_h`) to simulate potential scheduling outcomes.
- Decide on preemption, backfill, and priority handling for different workload classes.

---

### 5. Vendor & Technology Analysis
- Map dataset node types and architecture requirements to real-world HPC vendors (e.g., NVIDIA, AMD, Intel, HPE, Lenovo, Atos).
- Research green technology offerings:
  - Energy-efficient GPUs/CPUs
  - Advanced cooling solutions
  - Carbon-aware workload management
- Choose vendors and technologies that best align with energy justice and sustainability goals.

---

### 6. Recommendations & Deliverables
- Summarize your **HPC system design**, including:
  - Architecture mix
  - Node count & configuration
  - Interconnect topology
  - Cooling and power systems
- Present your **scheduling policy** and how it integrates with the system design.
- Justify your vendor and technology choices with data from the provided datasets.
- Include metrics to monitor post-deployment:
  - **MW utilization**
  - **Science/kWh**
  - **Fusion allocation compliance**
  - **Carbon footprint**

---

## Analysis Prompts
When working with the datasets, address:
1. Which workloads have the highest **energy efficiency** and how does this affect architecture design?
2. Which workloads are most sensitive to **interconnect performance**?
3. How does fusion workload scheduling affect **non-fusion throughput**?
4. Which programming languages and frameworks dominate high-efficiency workloads?
5. How should preemptible workloads be used to backfill unused capacity?

---

## Presentation Requirements
Prepare a **10–15 minute presentation** including:
1. **Facility layout & sustainability features**
2. **Architecture design decisions** and justification
3. **Application/workload support strategy**
4. **Scheduling policy overview**
5. **Vendor and technology selection**
6. **Projected rankings** on the Top500 & Green500 lists

---

> **Tip:** Use visualizations (charts, heatmaps, architecture diagrams) generated from the datasets to support your recommendations.


## Section 1: Geographical Distribution and Correlation with Climate Change Indicators

In this section, you will consider factors related to geography and location. Specifically, explore the geographical distribution of the Green500 and Top500 HPC systems and their correlation with climate change indicators.

<img src=https://miro.medium.com/v2/resize:fit:1400/format:webp/1*6LkA4Zq-rUQHF59C7pK46Q.jpeg width="500">

### Research Questions

1. What is the geographical distribution of the Green500 and Top500 HPC systems? Are there regions that have a higher concentration of these systems?
2. How do the locations of these HPC systems correlate with climate change indicators such as temperature and precipitation patterns?
3. Are there specific geographical locations that offer natural cooling advantages for HPC centers?

### Recommended Readings:

1. **Zwan, J.** (Oct 2022). Climate Change Threatens Supercomputers. *Science*. https://www.science.org/content/article/climate-change-threatens-supercomputers

2. **Hourdin, F., et al.** (2023). Toward machine-assisted tuning avoiding the underestimation of uncertainty in climate change projections. *Science Advances*, 9, eadf2758. DOI: 10.1126/sciadv.adf2758

3. **Dongarra, J., Deelman, E., Hey, T., Matsuoka, S., Sarkar, V., Bell, G., ... & Yelick, K.** (2023). Can the United States Maintain Its Leadership in High-Performance Computing? A report from the ASCAC Subcommittee on American Competitiveness and Innovation to the ASCR Office. USDOE Office of Science (SC), United States. https://www.osti.gov/servlets/purl/1989107

These readings examine how climate change threatens supercomputing facilities, how machine-assisted tuning can improve the uncertainty estimates of climate projections, and whether the United States can maintain its HPC leadership. Together they offer a perspective on how HPC siting, climate, and national computing strategy interact.

## Section 2: Correlation with Low-Income Energy Affordability Data and Energy Efficiency Trends

In this section, we will explore how the locations of these HPC systems correlate with low-income energy affordability data and the trend in their energy efficiency.
<img src=https://www.ase.org/sites/ase.org/files/061521_energy_burden_2.png width="500">


### Research Questions to Consider

4. How do the locations of these HPC systems correlate with low-income energy affordability data? Are these systems located in regions where energy is less affordable?

5. What is the trend in energy efficiency of the Green500 and Top500 HPC systems over the years? Is there a significant improvement in energy efficiency?
6. How do energy efficiency trends correlate with advancements in HPC hardware and software technologies?

### Recommended Readings:

1. **Ghezlane Halhoul Merabet, M. Essaaidi, M. Haddou, Basheer Qolomany, Junaid Qadir, M. Anan, A. Al-Fuqaha, M. Abid, & D. Benhaddou** (2021). Intelligent Building Control Systems for Thermal Comfort and Energy-Efficiency: A Systematic Review of Artificial Intelligence-Assisted Techniques. *Renewable and Sustainable Energy Reviews*. [PDF Link](https://arxiv.org/pdf/2104.02214)

2. **C. Koronen, Max Åhman, & L. Nilsson** (2019). Data centres in future European energy systems—energy efficiency, integration and policy. *Energy Efficiency*. [PDF Link](https://link.springer.com/content/pdf/10.1007/s12053-019-09833-8.pdf)

3. **Min Li, Michael Yao-Ping Peng, Raima Nazar, Bosede Ngozi Adeleye, Meng Shang, & Muhammad Waqas** (2022). [How Does Energy Efficiency Mitigate Carbon Emissions Without Reducing Economic Growth in Post COVID-19 Era](https://www.frontiersin.org/articles/10.3389/fenrg.2022.832189/pdf). *Frontiers in Energy Research*. [PDF Link](https://www.frontiersin.org/articles/10.3389/fenrg.2022.832189/pdf)

These articles cover energy efficiency in buildings, in data centres, and across the wider economy; use them to reason about how the same principles apply to HPC systems.

## Section 3: Social Justice Implications and Strategies for Improving Energy Sustainability

In this section, we will explore the social justice implications of the energy consumption of these HPC systems and potential strategies for improving their energy sustainability.
<img src=https://www.mdpi.com/energies/energies-16-06698/article_deploy/html/images/energies-16-06698-g001.png width="500">


### Research Questions to Consider

7. What are the social justice implications of the energy consumption of these HPC systems? Are they contributing to energy inequality?

8. What are the potential impacts of these HPC systems on local and global climate patterns?

9. What strategies can be implemented to improve the energy sustainability of these HPC systems?
10. How can HPC centers contribute to community engagement and education on energy sustainability?


### Recommended Readings:

1. **Vallarta-Serrano, S. I., Santoyo-Castelazo, E., Santoyo, E., García-Mandujano, E. O., & Vázquez-Sánchez, H.** (2023). [Integrated Sustainability Assessment Framework of Industry 4.0 from an Energy Systems Thinking Perspective: Bibliometric Analysis and Systematic Literature Review](https://dx.doi.org/10.3390/en16145440). *Energies, 16*(14).
   - [PDF Link](https://www.mdpi.com/1996-1073/16/14/5440/pdf?version=1689653126)

2. **Davydenko, L., Davydenko, N., Deja, A., Wiśnicki, B., & Dzhuguryan, T.** (2023). [Efficient Energy Management for the Smart Sustainable City Multifloor Manufacturing Clusters: A Formalization of the Water Supply System Operation Conditions Based on Monitoring Water Consumption Profiles](https://dx.doi.org/10.3390/en16114519). *Energies, 16*(11).
   - [PDF Link](https://www.mdpi.com/1996-1073/16/11/4519/pdf?version=1686043641)

3. **De Giovanni, P.** (2023). [Sustainability of the Metaverse: A Transition to Industry 5.0](https://dx.doi.org/10.3390/su15076079). *Sustainability, 15*(7).
   - [PDF Link](https://www.mdpi.com/2071-1050/15/7/6079/pdf?version=1680265695)

These articles examine sustainability assessment and energy management in industrial systems, smart-city infrastructure, and virtual environments. Use them as a lens for reasoning about energy inequality, climate impacts, and sustainability strategies for HPC systems.

# 🚀 Project 2B — Energy Optimization Strategies for TechVille HPC's Emerging Technology Division

## Objective
TechVille HPC's Emerging Technology Division is exploring **next-generation HPC energy optimization strategies** in both **software** and **hardware**.  
Your team's challenge is to design a set of **data-driven recommendations** that enable TechVille HPC to remain a leader in computational capability while being a pioneer in **energy efficiency** and **energy justice**.

**Special constraint:** Fusion energy workloads account for **40% of total compute capacity**, with the remaining 60% supporting other DOE science workloads (ASCR, BES, BER, HEP, NP, ARDAP).  
Your solutions should ensure both fusion and non-fusion workloads run at maximum **science per kWh**.

You will use the **provided HPC workload datasets** to analyze:
- **Software optimization opportunities**
- **Hardware infrastructure improvements**
- **Scheduling and operational policies**
- **Social and environmental impacts**

---

## Guiding Instructions

### **Section 1: Software-Based Optimization for Energy Efficiency**

1. **Understanding Code Efficiency**
   - Use dataset fields such as `Workload`, `Programming_Language`, `Algorithm_Library`, and `Parallel_Framework` to assess where inefficient coding practices may be affecting performance and energy use.
   - Analyze the relationship between **precision modes** (`Precision` column) and energy efficiency.
   - Quantify the potential **energy cost of inefficiency** using `Perf_W` (performance per watt) and `Runtime_h`.

2. **Strategies for Code Efficiency**
   - Identify programming languages and libraries in the datasets associated with the **highest Perf/W** values.
   - Explore:
     - **Mixed precision computing** for ML and simulation workloads
     - **Kernel fusion and vectorization**
     - **Memory access optimization** for memory-bound workloads
   - Recommend optimization strategies for fusion vs. non-fusion workloads, backed by dataset insights.
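To quantify the energy cost of inefficiency, note that for a fixed amount of work, energy scales inversely with Perf/W. The sketch below assumes a job's current energy is Nodes × kW/node × Runtime × PUE and that an optimization changes only Perf/W, so the `Perf_W` units cancel. The 1.5× Perf/W gain (for example from mixed precision) is a hypothetical placeholder, not a measured result.

```python
def energy_kwh(nodes: int, kw_per_node: float, runtime_h: float, pue: float = 1.25) -> float:
    """Facility energy of the job as it runs today."""
    return nodes * kw_per_node * runtime_h * pue


def wasted_energy_kwh(nodes, kw_per_node, runtime_h, perf_w_actual, perf_w_target, pue=1.25):
    """kWh that would be avoided if the same work ran at perf_w_target.

    For fixed work W, E = W / (Perf/W), so E_target = E_actual * actual / target.
    """
    if perf_w_actual <= 0 or perf_w_target <= 0:
        raise ValueError("Perf/W values must be positive")
    e_actual = energy_kwh(nodes, kw_per_node, runtime_h, pue)
    e_target = e_actual * perf_w_actual / perf_w_target
    return e_actual - e_target


if __name__ == "__main__":
    PRICE = 0.10  # flat $/kWh from the Facility & Power table
    # S3 ML training job; hypothetical 1.5x Perf/W gain from an optimization.
    waste = wasted_energy_kwh(128, 7.0, 6, perf_w_actual=1.0, perf_w_target=1.5)
    print(f"Avoidable energy per job: {waste:,.0f} kWh (${waste * PRICE:,.0f})")
```

For one S3 job this is 2,240 kWh (about \$224) per run; multiply by how often the job runs to estimate the yearly stake.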

---

### **Section 2: Hardware & Power Infrastructure Efficiency**

3. **Measuring Energy Efficiency**
   - From dataset fields like `Nodes`, `Power_per_Node_kW`, and `kWh`, calculate the **energy profile** for different workload classes.
   - Identify workloads and architectures (`Node_Type`) that achieve the **highest Science/kWh**.
   - Compare GPU-, CPU-, and Hybrid-based designs for fusion-heavy workloads vs. others.

4. **Strategies for Infrastructure Efficiency**
   - Based on dataset findings (`Bottleneck`, `Interconnect_Sensitivity`), propose:
     - Network topology adjustments (e.g., high-bandwidth low-latency for comm-bound fusion simulations)
     - Dynamic cooling strategies (liquid vs. air)
     - Peak-load shifting strategies
   - Consider the **impact of architecture mix** on sustainability and performance.

---

### **Section 3: Social Justice & Sustainability Implications**

5. **Understanding Social Justice Implications**
   - Using dataset scheduling metrics (`Priority`, `Suitability`), explore how resource allocation affects different science areas.
   - Evaluate:
     - How prioritizing fusion (40%) impacts other science workloads
     - Equity of access for smaller-scale projects
   - Discuss the environmental and community impacts of your energy plan.

6. **Recommendations for Energy Sustainability**
   - Synthesize your findings into a **balanced strategy** covering:
     - Software optimizations (e.g., code refactoring, precision tuning)
     - Hardware upgrades (e.g., GPU refresh cycles, hybrid systems)
     - Facility-level improvements (e.g., renewable integration, carbon-aware scheduling)
   - Ensure your recommendations are **aligned with DOE mission goals** and TechVille HPC’s sustainability commitments.

---

## Analysis Prompts
When working with the datasets, address:
- Which workloads show the **best Perf/W ratios** and why?
- Which programming languages and frameworks are most associated with **high-efficiency jobs**?
- How does the **fusion workload’s energy profile** differ from other science workloads?
- Which bottlenecks are most common, and how could they be mitigated?
- How does interconnect sensitivity shape hardware and topology design?
- How would a **carbon-aware scheduling policy** affect workload placement?

---

## Deliverables
Prepare a **10–15 minute presentation** that includes:
1. **Software efficiency analysis** and recommendations
2. **Hardware and infrastructure efficiency proposals**
3. **Scheduling policy adjustments** for fusion and non-fusion workloads
4. **Social justice and sustainability considerations**
5. **Projected energy and cost savings**

> **Tip:** Use visualizations from the datasets (e.g., energy efficiency by workload, programming language usage trends, bottleneck distributions) to support your arguments.
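
For the projected savings, a simple annual model is often enough: facility energy is IT energy multiplied by PUE, cost is energy multiplied by the electricity price, and emissions are energy multiplied by the grid's carbon intensity. The Python sketch below applies that model and assumes a constant IT load all year, which is a simplification. The 20 MW IT load, PUE values, 10% software savings, price, and carbon intensity are all hypothetical inputs to replace with your own cited assumptions.

```python
HOURS_PER_YEAR = 8760


def annual_facility_kwh(it_power_mw, pue):
    """Facility energy for a constant IT load over one year: IT energy x PUE."""
    return it_power_mw * 1000 * HOURS_PER_YEAR * pue


def project_savings(it_power_mw, pue_before, pue_after, it_reduction,
                    usd_per_kwh, kg_co2e_per_kwh):
    """Annual savings from a software-driven IT-energy reduction plus a PUE change."""
    before = annual_facility_kwh(it_power_mw, pue_before)
    after = annual_facility_kwh(it_power_mw * (1 - it_reduction), pue_after)
    saved_kwh = before - after
    return {
        "kwh": saved_kwh,
        "usd": saved_kwh * usd_per_kwh,
        "t_co2e": saved_kwh * kg_co2e_per_kwh / 1000,  # kg -> tonnes
    }


if __name__ == "__main__":
    # Hypothetical inputs: 20 MW IT load, PUE 1.4 -> 1.1, 10% less IT energy
    # from code optimization, $0.08/kWh, 0.4 kg CO2e/kWh.
    s = project_savings(20, 1.4, 1.1, 0.10, 0.08, 0.4)
    print(f"{s['kwh']:,.0f} kWh  ${s['usd']:,.0f}  {s['t_co2e']:,.0f} t CO2e per year")
```

Running the model with one change at a time (only the PUE improvement, then only the software savings) shows which recommendation drives most of the projected savings.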


## Section 1: Code Optimization for Energy Efficiency

<img src=https://developer-blogs.nvidia.com/wp-content/uploads/2024/05/genai-multi-modal-rag-featured.jpg width="500">

This section focuses on optimizing code for parallel processing so that system resources are used efficiently. The topic matters because the energy a job consumes is its average power multiplied by its runtime: code that runs longer than necessary, or that leaves allocated cores and GPUs idle while they still draw power, consumes more energy than the computation requires. That waste raises operating costs and increases the environmental impact of HPC systems.

Improving code efficiency reduces the energy needed for each result, which supports a more sustainable use of computing resources and aligns with the principles of energy justice.

Writing efficient parallel code is also a core HPC skill: it lets you get more performance out of the same hardware. Understanding best practices for efficient parallel code is therefore essential for anyone working with HPC systems.
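
A minimal way to see why poorly parallelized code wastes energy is to combine Amdahl's law with the assumption that every allocated core draws the same power for the whole run, even while it waits on serial sections. That constant-power assumption is a simplification (real idle cores draw less than busy ones), and the 1,000 core-hour job and 10 W per core below are hypothetical.

```python
def amdahl_runtime_h(serial_runtime_h, serial_fraction, n_cores):
    """Amdahl's law: runtime on n_cores for a job taking serial_runtime_h on one core."""
    return serial_runtime_h * (serial_fraction + (1 - serial_fraction) / n_cores)


def job_energy_kwh(serial_runtime_h, serial_fraction, n_cores, watts_per_core):
    """Energy to solution, assuming every allocated core draws watts_per_core
    for the entire run (including time spent idle during serial phases)."""
    runtime_h = amdahl_runtime_h(serial_runtime_h, serial_fraction, n_cores)
    return n_cores * watts_per_core * runtime_h / 1000  # Wh -> kWh


if __name__ == "__main__":
    for f in (0.0, 0.01, 0.05):
        row = [job_energy_kwh(1000, f, n, 10) for n in (64, 256, 1024)]
        print(f"serial fraction {f:.2f}: " + ", ".join(f"{e:6.1f} kWh" for e in row))
```

With no serial fraction, energy stays flat at 10 kWh however many cores are used; with just 5% serial work, running on 1,024 cores consumes roughly 50 times more energy for the same answer. Reducing serial bottlenecks, or requesting only as many cores as the code can use, is therefore an energy decision as well as a performance one.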



### Research Questions

1. How does inefficient code lead to excessive energy usage in HPC systems?
2. How can we improve code efficiency to reduce energy consumption?
3. What are some of the best practices for writing efficient code for parallel processing?

### Recommended Readings


1. **Tipu, A. J. S., Ó Conbhuí, P., & Howley, E.** (2022). [Artificial neural networks based predictions towards the auto-tuning and optimization of parallel IO bandwidth in HPC system](https://dx.doi.org/10.1007/s10586-022-03814-w). *Cluster Computing*. [PDF Link](https://link.springer.com/content/pdf/10.1007/s10586-022-03814-w.pdf)
   
 
2. **Younesi, S., Ahmadi, B., Ceylan, O., & Ozdemir, A.** (2022). [Optimum Parallel Processing Schemes to Improve the Computation Speed for Renewable Energy Allocation and Sizing Problems](https://dx.doi.org/10.3390/en15249301). *Energies*. [PDF Link](https://www.mdpi.com/1996-1073/15/24/9301/pdf?version=1670493597)
   
  

3. **Wei, L., Ning, Z., Quan, L., Wang, A., & Gao, Y.** (2022). [Research on Parameter Matching of the Asymmetric Pump Potential Energy Recovery System Based on Multi-Core Parallel Optimization Method](https://dx.doi.org/10.3390/pr10112298). *Processes*. [PDF Link](https://www.mdpi.com/2227-9717/10/11/2298/pdf?version=1668420648)
 

## Section 2: Energy Efficiency in Power Infrastructure

<img src="https://www.mdpi.com/sustainability/sustainability-15-09487/article_deploy/html/images/sustainability-15-09487-g003.png" width =500>

This section explores how energy efficiency is measured and improved in the power infrastructure that supports HPC systems. The topic matters because HPC systems consume substantial resources: electricity to run the machines, plus the cooling and power-delivery infrastructure needed to keep them operating. These resources are not evenly distributed globally, and their availability can be limited in certain regions.

Measuring infrastructure efficiency shows where energy is lost to overhead such as cooling and power conversion, and therefore where improvements will pay off. Reducing that overhead lowers operating costs and promotes a more sustainable use of resources.

Improving infrastructure efficiency also contributes to energy justice by ensuring that electricity and cooling capacity, which are limited in some regions, are used fairly and sustainably. Understanding these strategies is therefore crucial for anyone working with HPC systems.
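
The most common facility-level metric is Power Usage Effectiveness (PUE): total facility energy divided by the energy delivered to IT equipment over the same period, so a value of 1.0 would mean zero overhead. The short Python sketch below computes PUE from two metered totals and the overhead energy it implies; the monthly meter readings are hypothetical.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("facility total cannot be less than the IT load it includes")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(it_equipment_kwh, pue_value):
    """Energy spent on everything except IT equipment (cooling, power conversion, lighting)."""
    return it_equipment_kwh * (pue_value - 1)


if __name__ == "__main__":
    # Hypothetical one-month meter readings.
    it_kwh, facility_kwh = 10_000_000, 12_500_000
    current = pue(facility_kwh, it_kwh)
    print(f"PUE = {current:.2f}, overhead = {overhead_kwh(it_kwh, current):,.0f} kWh")
    print(f"overhead at PUE 1.10 = {overhead_kwh(it_kwh, 1.10):,.0f} kWh")
```

PUE captures only facility overhead; it says nothing about how much useful work the IT load does, which is why it is usually paired with a Perf/W metric such as the GFLOPS per watt reported on the Green500.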




### Research Questions

4. How is energy efficiency measured in our power infrastructure?

5. What are some of the strategies for improving energy efficiency in power infrastructure?

6. How does improving energy efficiency in power infrastructure contribute to energy justice and reduce the environmental impact of HPC systems?

### Recommended Readings

1. **Schöne, R., Ilsche, T., Bielert, M., Gocht-Zech, A., & Hackenberg, D.** (2019). [Energy Efficiency Features of the Intel Skylake-SP Processor and Their Impact on Performance](https://dx.doi.org/10.1109/HPCS48598.2019.9188239). *2019 International Conference on High Performance Computing & Simulation (HPCS)*. [PDF Link](http://arxiv.org/pdf/1905.12468)
   
  
2. **Sadiq, M., Ali, S. W., Terriche, Y., Mutarraf, M. U., Hassan, M., Hamid, K., Ali, Z., Sze, J., Su, C., & Guerrero, J.** (2021). [Future Greener Seaports: A Review of New Infrastructure, Challenges, and Energy Efficiency Measures](https://dx.doi.org/10.1109/ACCESS.2021.3081430). *IEEE Access*. [PDF Link](https://ieeexplore.ieee.org/ielx7/6287639/9312710/09433559.pdf)
   
 
3. **Borghesi, A., Bartolini, A., Milano, M., & Benini, L.** (2018). [Pricing schemes for energy-efficient HPC systems: Design and exploration](https://dx.doi.org/10.1177/1094342018814593). *The International Journal of High Performance Computing Applications*. [PDF Link](https://arxiv.org/pdf/1806.05135)
   
Together, these articles examine energy efficiency at several scales: processor-level power-management features and their impact on performance (Schöne et al.), infrastructure-level efficiency measures in another energy-intensive sector, seaports (Sadiq et al.), and pricing schemes that encourage energy-efficient use of HPC systems (Borghesi et al.).

## Section 3: Social Justice Implications and Strategies for Improving Energy Sustainability


<img src="https://blog.routledge.com/wp-content/uploads/2024/12/BlogPageBody_Image_showing_energy_crisis_with_fossil_fuels_becoming_scarce_and_rising_CO2_emissions.webp" width=500>

This section examines the social justice implications of HPC energy consumption and strategies for improving the energy sustainability of HPC systems. The topic matters because a large facility's energy demand can contribute to energy inequality, particularly in regions where energy resources are already limited.

Understanding these implications helps identify ways to promote energy justice and reduce the environmental impact of HPC, so that computing resources are used sustainably.

Exploring concrete strategies for improving energy sustainability then turns that understanding into the design of more sustainable HPC systems, which is why these strategies are essential knowledge for anyone working with HPC.



### Research Questions

7. What are the social justice implications of the energy consumption of these HPC systems? Are they contributing to energy inequality?

8. What are the potential impacts of these HPC systems on local and global climate patterns?

9. What strategies can be implemented to improve the energy sustainability of these HPC systems?

### Recommended Readings


1. **Givens, J. E., Padowski, J., Guzman, C., Malek, K., Witinok-Huber, R., Cosens, B., Briscoe, M. D., Boll, J., & Adam, J.** (2018). [Incorporating Social System Dynamics in the Columbia River Basin: Food-Energy-Water Resilience and Sustainability Modeling in the Yakima River Basin](https://dx.doi.org/10.3389/fenvs.2018.00104). *Frontiers in Environmental Science*. [PDF Link](https://www.frontiersin.org/articles/10.3389/fenvs.2018.00104/pdf)
   
 

2. **Romero-Lankao, P., & Gnatz, D. M.** (2019). [Risk Inequality and the Food-Energy-Water (FEW) Nexus: A Study of 43 City Adaptation Plans](https://dx.doi.org/10.3389/fsoc.2019.00031). *Frontiers in Sociology*. [PDF Link](https://www.frontiersin.org/articles/10.3389/fsoc.2019.00031/pdf)
   
  

3. **Axon, S., & Morrissey, J.** (2020). [Just energy transitions? Social inequities, vulnerabilities and unintended consequences](https://dx.doi.org/10.5334/bc.14). *Buildings and Cities*. [PDF Link](http://journal-buildingscities.org/articles/10.5334/bc.14/galley/46/download/)
 

These articles do not study HPC directly. They examine social justice and sustainability in energy systems more broadly, including food-energy-water interactions and social dynamics in a river basin, risk inequality across city adaptation plans, and the unintended consequences of energy transitions. Their frameworks can be applied when assessing how an HPC facility's energy use affects surrounding communities.