A grid-aware AI orchestrator that refuses to act on telemetry it cannot trust.
Team name: Zama
Track: Electric Grid Optimization
Repository: https://github.com/AYUSHMIT/gridshift-safe-mode
- Ayush Pandey
- Arash Rezaee
- Zahra Sharifi Soltani
- Mehran Sasaninia
GridShift is a grid-aware AI workload orchestrator for AI/data-center demand under grid stress. It verifies whether telemetry can be trusted before acting. When a controller lies, firmware attestation fails, or reported load diverges from observed load, GridShift enters safe mode and unwinds workloads off the untrusted node instead of freezing them in place.
- Simulated Boston-area grid load model
- Simulated AI/data-center workload traces
- Simulated controller attestation fields: signature, PCR status, nonce freshness
- Simulated reported-vs-observed load mismatch
- Optional Anthropic API or OpenAI API for operator briefings
- Deterministic offline narrator fallback when no API key is configured
In this prototype, every simulated data-center controller emits attestation-style fields and signed-style telemetry. The orchestrator cross-checks two independent signals: cryptographic attestation and behavioral consistency (reported vs. observed load). If either fails, the system enters safe mode, which UNWINDS workloads off the untrusted node rather than freezing them in place. An LLM-powered incident narrator turns each tick's structured state into a plain-language operator briefing.
GridShift's AI component is honest about what it does and does not do:
- The AI does not make control decisions. All decisions (run / delay / migrate / block) are made by a deterministic safety layer. Putting an LLM in the control loop of a power grid would be reckless; we don't do that.
- The AI generates operator briefings. After each tick, the structured trust state and decision list are passed to an LLM (Claude or GPT, configurable) which produces a 3–5 sentence operator-log-style briefing: what happened, what GridShift did, and what the operator should inspect.
- The AI falls back cleanly offline. If no API key is configured or the network is down, a rule-based fallback generates an operator briefing deterministically. The demo works with or without Wi-Fi.
This separation (AI for explanation, not decision) is itself part of the pitch. It's the right architectural pattern for AI in critical infrastructure.
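As a minimal sketch of what the deterministic fallback narrator could look like (the function name and `tick_state` schema here are illustrative, not the actual `core/ai_narrator.py` API):

```python
def fallback_briefing(tick_state: dict) -> str:
    """Rule-based operator briefing used when no LLM API key is configured.

    `tick_state` is assumed to carry the per-node trust state and the tick's
    decision list (hypothetical schema for illustration only).
    """
    untrusted = [n for n, t in tick_state["trust"].items() if t != "trusted"]
    decisions = tick_state["decisions"]  # e.g. [("migrate", "job-7", "BOS-1", "BOS-2")]

    lines = []
    if untrusted:
        lines.append(f"Trust failure on {', '.join(untrusted)}; safe mode is ON.")
    else:
        lines.append("All controllers attested and consistent; normal operation.")
    for action, job, src, dst in decisions:
        if action == "migrate":
            lines.append(f"Unwound {job} OFF {src} to {dst}.")
        elif action == "block":
            lines.append(f"Blocked placement of {job} onto {src}.")
    if untrusted:
        lines.append(f"Operator should inspect attestation logs on {untrusted[0]}.")
    return " ".join(lines)
```

Because the fallback consumes the same structured state as the LLM path, swapping between them is a one-line configuration change rather than a different code path.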
An early version of the design blocked all migrations involving an untrusted node. A hardware-security supervisor pointed out this could be weaponized: an adversary with a workload on a DC could trigger a false attestation failure, and then inflate the load on that DC, using the migration block to trap their own inflated workload in place and create a DoS against the grid. The refined design below addresses this directly.
Refined safety policy:
- Migration INTO an untrusted node → blocked (never place new work on a dubious node).
- Migration OUT OF an untrusted node → allowed and preferred (reduce exposure; unwind).
- Grid-side observed load is always authoritative for hard safety limits, independent of trust state. If observed utilization on any node exceeds 75%, an unwind-migration is emitted regardless of what the controller reports.
- Safe mode is an investigate-and-unwind state, not a freeze.
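A sketch of how that directional policy could be enforced; the 75% threshold comes from the bullets above, but the function shape and names are illustrative, not the actual `core/safety.py`:

```python
HARD_UTIL_LIMIT = 0.75  # observed-load override threshold from the policy above

def apply_safety_policy(planned, trust, observed_util):
    """Filter the planner's migrations by the directional trust policy.

    planned:       list of (job_id, src_node, dst_node) migrations
    trust:         node -> bool (True = attested AND behaviorally consistent)
    observed_util: node -> observed utilization in [0, 1]; always authoritative
    """
    allowed = []
    for job, src, dst in planned:
        if not trust.get(dst, False):
            continue  # never place new work on a dubious node (block INTO)
        allowed.append((job, src, dst))  # moves OUT of untrusted nodes pass through

    # Hard safety override: observed load trumps whatever controllers report.
    for node, util in observed_util.items():
        if util > HARD_UTIL_LIMIT:
            # Emit an unwind-migration regardless of trust state; destination
            # selection is left to the planner (None = "any trusted node").
            allowed.append(("unwind", node, None))
    return allowed
```

Note that the override loop never consults `trust`: even a fully attested node gets unwound if its observed utilization crosses the hard limit.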
```
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
```

Important: launch with `streamlit run app.py`, not `python app.py`. The latter runs the script in "bare mode": Streamlit's UI functions become no-ops, nothing renders, and you'll see `missing ScriptRunContext!` warnings in the terminal. After running the correct command, open the printed Local URL (typically `http://localhost:8501`) in a browser.
Without an API key, the AI panel runs a deterministic rule-based fallback. To use an actual LLM:
```
export ANTHROPIC_API_KEY=sk-ant-...   # preferred
# or
export OPENAI_API_KEY=sk-...
streamlit run app.py
```

See `.env.example` for configurable model IDs. At a venue with unreliable Wi-Fi, force offline mode with `export GRIDSHIFT_FORCE_FALLBACK=1`.
```
python -m core.orchestrator
```

This runs the supervisor-scenario attack end-to-end and prints the unwind behavior.
```
python -m core.grid_model
python -m core.dc_simulator
python -m core.verifier
```

The dashboard now has a 🎬 Guided demo with a single ▶ Next demo step button that walks through five steps, plus a 🏆 Load winning scenario shortcut for time-pressured demos. Manual controls are tucked into an expander.
- Step 1: Heatwave begins. Baseline grid load rises.
- Step 2: AI job burst. A wave of AI jobs lands across the three DCs.
- Step 3: Behavioral lie. BOS-1 under-reports by 16 MW. The behavioral monitor catches the mismatch; trust flips to compromised; safe mode ON.
- Step 4: Firmware tamper. The PCR no longer matches the known-good value. Attestation catches it even when reported and observed loads agree.
- Step 5: Load spike + unwind (the winning moment). BOS-1 is untrusted AND its real load is inflated. The refined safety layer migrates jobs OFF BOS-1 instead of freezing them; the supervisor-scenario DoS is defeated.
- Active attack panel: narrates exactly what is being injected at each moment.
- Reported vs Observed metrics (the heart of the story): shown as a 4-up metric row.
- Load history chart: the 900 MW threshold is a bold red dashed line; safe-mode periods are shaded.
- Trust legend: explains what `sig`, `pcr`, `nonce`, and mismatch mean.
- "Why this happened" box: plain-English narration of every decision.
- Directional safe mode: explicit "Blocked: INTO BOS-1 / Allowed: OUT OF BOS-1" panel.
- Naive vs GridShift comparison table: shows the value of the directional unwind at a glance.
- Reset full demo and Load winning scenario buttons: recovery if the demo goes sideways.
- Takeaway banner: GridShift optimizes when trust holds, and safely unwinds when trust breaks.
ACCEPT ⟺ signature_valid ∧ pcr_matches_known_good ∧ nonce_fresh
TRUST ⟺ ACCEPT ∧ |reported − observed| < ε (per-node)
DECIDE ⟺ planner, filtered by directional trust policy
+ observed-load override for hard safety limits
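The ACCEPT and TRUST predicates above translate almost directly to code. A minimal sketch (field names and the ε constant are illustrative; the real tolerance lives in the verifier's configuration):

```python
EPSILON_MW = 5.0  # per-node mismatch tolerance ε (illustrative value)

def accept(msg: dict) -> bool:
    # ACCEPT ⟺ signature_valid ∧ pcr_matches_known_good ∧ nonce_fresh
    return (msg["signature_valid"]
            and msg["pcr_matches_known_good"]
            and msg["nonce_fresh"])

def trust(msg: dict, observed_mw: float) -> bool:
    # TRUST ⟺ ACCEPT ∧ |reported − observed| < ε
    return accept(msg) and abs(msg["reported_mw"] - observed_mw) < EPSILON_MW
```

The split matters: a message can pass ACCEPT (cryptographically valid) yet fail TRUST (behaviorally inconsistent), which is exactly the Step 3 "behavioral lie" case.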
gridshift/
├── app.py # Streamlit UI
├── core/
│ ├── state.py # shared types [everyone]
│ ├── grid_model.py # [Smart Grids]
│ ├── dc_simulator.py # [Optical / DC]
│ ├── attestation.py # [HW Security] crypto primitives
│ ├── prover.py # [HW Security] controller side
│ ├── verifier.py # [HW Security] orchestrator side
│ ├── behavior_monitor.py # [System Security]
│ ├── safety.py # [System Security] decision + directional safety
│ ├── orchestrator.py # [System Security] main tick loop
│ └── ai_narrator.py # [System Security] LLM incident briefings
├── data/
│ └── sample_jobs.json
├── .env.example
├── requirements.txt
└── README.md
| Attack | Behavioral check | Attestation check | Safety policy |
|---|---|---|---|
| Controller lies about load | ✔ caught | passes | block-into; unwind-out |
| Firmware tampered | may pass | ✔ caught | block-into; unwind-out |
| Replay of a captured valid message | may pass | ✔ caught (nonce) | block-into; unwind-out |
| Stolen / spoofed controller identity | may pass | ✔ caught (sig) | block-into; unwind-out |
| DoS via safe-mode weaponization (supervisor scenario) | attacker inflates load | attacker fakes a failure | ✔ defeated: unwind + observed-load override |
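The first three table rows can be sketched as a single verification gate. This is an illustrative stand-in using an HMAC over the payload, not the actual scheme in `core/attestation.py` (message fields and return strings are hypothetical):

```python
import hashlib
import hmac

def verify_telemetry(msg: dict, key: bytes, seen_nonces: set) -> str:
    """Classify one telemetry message against the sig / nonce / PCR checks.

    Returns the name of the check that rejected the message, or "accept".
    """
    payload = f'{msg["node"]}|{msg["reported_mw"]}|{msg["nonce"]}'.encode()
    mac = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, msg["mac"]):
        return "reject: signature"   # stolen / spoofed controller identity
    if msg["nonce"] in seen_nonces:
        return "reject: nonce"       # replay of a captured valid message
    seen_nonces.add(msg["nonce"])
    if msg["pcr"] != msg["known_good_pcr"]:
        return "reject: pcr"         # tampered firmware
    return "accept"
```

Ordering is deliberate: the signature is checked before the nonce is recorded, so an attacker cannot burn fresh nonces with forged messages.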