Your machine trains a neural net on HPC job runtime data and exposes it as a prediction server. Remote users submit job configs and get back a predicted runtime + scheduling verdict — without needing TensorFlow or the training data.
```
Remote user A ─┐
Remote user B ─┼──► POST /predict ──► YOUR SERVER ──► model.predict()
Remote user C ─┘                           │
                                           └──► AI scheduler ──► SLURM/PBS
```
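Under the hood, each prediction is one JSON POST to `/predict`. A sketch of the request body (key names are assumptions mirroring the `JobConfig` fields used by the Python client below; `GET /schema` is the authoritative reference):

```json
{
  "machine": "summit",
  "app": "miniVite",
  "ranks": 64,
  "nodes": 4,
  "threads_per_rank": 4,
  "graph_scale_M": 20,
  "avg_edges_per_vertex": 16,
  "base_runtime_s": 142.0,
  "job_id": "my-job-001"
}
```

The response carries at least `verdict` and `est_wall_time_s` (the fields the client prints), plus the predicted slowdown interval shown in the sample output below.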
```bash
pip install -r requirements.txt

# 1. Train the model and save all artifacts
python train_and_save.py

# 2. Start the prediction server
uvicorn prediction_server:app --host 0.0.0.0 --port 8000
```

The server is now live at http://YOUR_IP:8000.
Check it's running:

```bash
curl http://localhost:8000/health
# → {"status":"ok","model":"miniVite_pass2"}
```

See valid machines/apps:

```bash
curl http://localhost:8000/schema
```

Remote users install the client dependency:

```bash
# Only needs requests — no TensorFlow required
pip install requests
```
```bash
# Single job
python ai_scheduler.py \
  --server http://YOUR_SERVER_IP:8000 \
  --machine summit \
  --app miniVite \
  --ranks 64 \
  --nodes 4 \
  --threads 4 \
  --scale 20 \
  --avg-degree 16 \
  --base-time 142
```
Output:

```
────────────────────────────────────────────────────────
Job           : (no id)
Machine / App : summit / miniVite
Ranks / Nodes : 64 / 4
Graph scale   : 20M vertices
────────────────────────────────────────────────────────
Predicted Δ   : 1.32× (1.122–1.518)
Est. wall time: 187.4s
Inference     : 12.3ms
Verdict       : SCHEDULE_NOW
Reason        : Predicted cost is low — safe to run immediately.
────────────────────────────────────────────────────────
```
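The estimated wall time is consistent with scaling the supplied base runtime by the midpoint of the predicted slowdown; the sample numbers above check out as:

```python
# Assumed relationship, matching the sample output above:
# est. wall time = base runtime × predicted slowdown midpoint
base_runtime_s = 142.0
predicted_slowdown = 1.32
print(round(base_runtime_s * predicted_slowdown, 1))  # → 187.4
```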
Batch of jobs:

```bash
python ai_scheduler.py \
  --server http://YOUR_SERVER_IP:8000 \
  --batch example_jobs.json
```

Dry run (predict only, don't submit):

```bash
python ai_scheduler.py --server http://YOUR_SERVER_IP:8000 \
  --machine frontier --ranks 256 --scale 80 --dry-run
```

From Python:

```python
from ai_scheduler import AIScheduler, JobConfig

scheduler = AIScheduler("http://YOUR_SERVER_IP:8000")

job = JobConfig(
    machine="summit", app="miniVite",
    ranks=64, nodes=4, threads_per_rank=4,
    graph_scale_M=20, avg_edges_per_vertex=16,
    base_runtime_s=142.0,
    job_id="my-job-001",
)

result = scheduler.submit(job)
print(result["verdict"])            # "SCHEDULE_NOW"
print(result["est_wall_time_s"])    # 187.4
```

| File | Where it runs | Purpose |
|---|---|---|
| `train_and_save.py` | your machine | trains pass-1 + pass-2 model, saves all artifacts |
| `prediction_server.py` | your machine | FastAPI server; loads model, serves `/predict` |
| `ai_scheduler.py` | remote user | client; calls `/predict`, submits to SLURM |
| `example_jobs.json` | remote user | sample batch input |
| `model_output/` | your machine | `miniVite_pass2.keras` + `preprocessor.pkl` + `feature_names.json` |
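`example_jobs.json` holds a list of job configs for `--batch`. A plausible shape (keys assumed to mirror the `JobConfig` fields; the runtimes here are illustrative — defer to the shipped sample file):

```json
[
  {
    "machine": "summit",
    "app": "miniVite",
    "ranks": 64,
    "nodes": 4,
    "threads_per_rank": 4,
    "graph_scale_M": 20,
    "avg_edges_per_vertex": 16,
    "base_runtime_s": 142.0,
    "job_id": "job-001"
  },
  {
    "machine": "frontier",
    "app": "miniVite",
    "ranks": 256,
    "nodes": 16,
    "threads_per_rank": 4,
    "graph_scale_M": 80,
    "avg_edges_per_vertex": 16,
    "base_runtime_s": 410.0,
    "job_id": "job-002"
  }
]
```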
In `ai_scheduler.py`, replace the `_queue_job` method body with your actual SLURM/PBS call:

```python
def _queue_job(self, config: JobConfig, priority: str):
    import subprocess

    # Map the job config onto an sbatch command line
    cmd = [
        "sbatch",
        f"--job-name={config.app}",
        f"--nodes={config.nodes}",
        f"--ntasks={config.ranks}",
        f"--qos={priority}",
        "run_job.sh",
    ]
    subprocess.run(cmd, check=True)
```
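For PBS, the same hook can shell out to `qsub` instead. A sketch, assuming `ranks` is divisible by `nodes` and that `priority` maps to a queue name; `build_qsub_cmd` and `run_job.sh` here are illustrative names, not part of the shipped client:

```python
import subprocess

def build_qsub_cmd(app: str, nodes: int, ranks: int, priority: str) -> list:
    # PBS expresses the node/rank layout as a select statement:
    # <nodes> chunks, each running ranks // nodes MPI processes.
    return [
        "qsub",
        "-N", app,
        "-l", f"select={nodes}:mpiprocs={ranks // nodes}",
        "-q", priority,
        "run_job.sh",
    ]

def _queue_job(self, config, priority: str):
    subprocess.run(
        build_qsub_cmd(config.app, config.nodes, config.ranks, priority),
        check=True,
    )
```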