
# AI Infrastructure Advisor

**Enterprise TCO Calculator for GPU Cloud Decisions**

Compare DGX Cloud, On-Premises, and Hyperscaler GPU options through workload-driven analysis.
Uncover hidden costs. Make data-driven infrastructure decisions.

Live Demo · Features · Quick Start · TCO Model · Docs


## Why This Exists

**The Problem:** CTOs and infrastructure leaders face a critical decision when scaling AI workloads. The choice between DGX Cloud, on-premises hardware, and hyperscaler GPU instances involves hidden costs that typical calculators miss:

- **GPU Utilization Waste** — On-prem GPUs often run at ~40% utilization, but you pay for 100%
- **Engineer Opportunity Cost** — ML engineers spend 50%+ of their time on infrastructure instead of models
- **Time-to-Production Gap** — An 8-month deployment delay for on-prem hardware can translate to millions in delayed value

**The Solution:** AI Infrastructure Advisor takes a workload-first approach that surfaces these "aha moments" alongside traditional TCO metrics, enabling truly informed decisions.


## ✨ Features

### Workload-First Analysis

Start with your AI workloads, not infrastructure specs. The calculator understands:

- **LLM Fine-tuning** — Training data, epochs, model sizes
- **RAG & Retrieval** — Vector DB sizing, query patterns
- **Inference at Scale** — Throughput requirements, latency SLAs
- **Agent Workloads** — Multi-model orchestration needs
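As an illustrative sketch, the workload categories above could be modeled as a discriminated union with a rough demand estimate per kind. The field names and heuristic constants below are assumptions for illustration, not the project's actual `src/lib/workloads/types.ts`:

```typescript
// Hypothetical workload model; the real definitions live in
// src/lib/workloads/types.ts and may differ.
type Workload =
  | { kind: "fine-tuning"; modelParamsB: number; epochs: number; trainingTokensB: number }
  | { kind: "rag"; vectorDbGB: number; queriesPerDay: number }
  | { kind: "inference"; requestsPerSec: number; latencySloMs: number }
  | { kind: "agents"; modelsOrchestrated: number; callsPerTask: number };

// Toy heuristic: translate a workload into monthly GPU-hours so every
// deployment scenario can be priced on a common unit.
function estimateGpuHoursPerMonth(w: Workload): number {
  switch (w.kind) {
    case "fine-tuning":
      return w.modelParamsB * w.trainingTokensB * w.epochs * 0.5;
    case "rag":
      return (w.queriesPerDay * 30) / 10_000;
    case "inference":
      return w.requestsPerSec * 0.2 * 730; // ~730 hours per month
    case "agents":
      return w.modelsOrchestrated * w.callsPerTask * 2;
  }
}
```

A discriminated union lets a TCO engine `switch` on `kind` exhaustively, so adding a new workload type becomes a compile-time-checked change.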

### Industry-Specific Presets

Pre-configured scenarios for:

- 🏥 **Healthcare** — HIPAA compliance, medical imaging
- 🏦 **Financial Services** — Risk modeling, fraud detection
- 🏛️ **Public Sector** — FedRAMP requirements, sovereign data
- 🏭 **Manufacturing** — Edge inference, predictive maintenance

### 3-Tier TCO Model

Comprehensive cost calculation across:

| Tier | Components | Weight |
|------|------------|--------|
| Infrastructure | Compute, Storage, Networking | 40-60% |
| Platform | Software, Support, Security | 20-30% |
| Operations | Labor, Training, Opportunity Cost | 20-35% |
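In code, the three-tier roll-up reduces to a sum plus per-tier weights. A minimal sketch, assuming the engine tracks one figure per tier (the real implementation lives in `src/lib/calculations/tco-engine.ts` and may differ):

```typescript
// Illustrative three-tier roll-up; field names are assumptions.
interface TierCosts {
  infrastructure: number; // compute + storage + networking
  platform: number;       // software + support + security
  operations: number;     // labor + training + opportunity cost
}

function totalTco(t: TierCosts): number {
  return t.infrastructure + t.platform + t.operations;
}

// Each tier's share of the total, e.g. to sanity-check it lands in the
// 40-60% / 20-30% / 20-35% bands from the table above.
function tierWeights(t: TierCosts): { [K in keyof TierCosts]: number } {
  const total = totalTco(t);
  return {
    infrastructure: t.infrastructure / total,
    platform: t.platform / total,
    operations: t.operations / total,
  };
}
```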

"Aha Moment" Insights

Surface hidden costs that change the decision:

💡 GPU Idle Time Waste
   $847K/year
   On-premises GPUs typically run at 40% utilization.
   You're paying for 100% but using less than half.

👨‍💻 Engineer Time on Infrastructure
   $375K/year
   Your 3 ML engineers spend ~50% of their time on
   infrastructure, not building models.

⏱️ Time-to-Production Delay Cost
   $1.2M
   On-prem deployment takes 12 months vs 4 months for
   DGX Cloud. That 8-month delay costs $1.2M in delayed value.
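The three insights above reduce to simple formulas. A hedged sketch that reproduces the README's figures; the helper names and the $150K/month value-of-delay figure are assumptions, not the app's exact code:

```typescript
// Hypothetical formulas behind the "aha moment" cards.

// Waste = what you pay for capacity that sits idle
const gpuIdleWaste = (annualGpuSpend: number, utilization: number): number =>
  annualGpuSpend * (1 - utilization);

// Opportunity cost of engineers doing infrastructure work instead of models
const engineerInfraCost = (
  engineers: number,
  fullyLoadedCost: number,
  infraTimeShare: number,
): number => engineers * fullyLoadedCost * infraTimeShare;

// Value lost while waiting for deployment to reach production
const delayCost = (delayMonths: number, valuePerMonth: number): number =>
  delayMonths * valuePerMonth;

// Reproducing the card figures:
engineerInfraCost(3, 250_000, 0.5); // 375_000 → "$375K/year"
delayCost(12 - 4, 150_000);         // 1_200_000 → "$1.2M", assuming $150K/mo of delivered value
```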

### Scenario Comparison

Side-by-side analysis of four deployment options:

1. **NVIDIA DGX Cloud** — Managed, turnkey solution
2. **On-Premises (DGX)** — Maximum control, self-managed
3. **Hyperscaler** — AWS/Azure/GCP GPU instances
4. **Current State** — Your existing infrastructure baseline
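A minimal sketch of the side-by-side view, assuming each scenario resolves to a single total-TCO figure (scenario names come from the list above; the helper and the cost numbers are placeholders):

```typescript
// Scenario names mirror the README; the comparison helper is illustrative.
type Scenario =
  | "NVIDIA DGX Cloud"
  | "On-Premises (DGX)"
  | "Hyperscaler"
  | "Current State";

// Order scenarios cheapest-first for the comparison table
function rankByTco(costs: Record<Scenario, number>): Scenario[] {
  return (Object.keys(costs) as Scenario[]).sort((a, b) => costs[a] - costs[b]);
}
```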

## 🚀 Live Demo

**Try it now:** [ai-infra-advisor.qbitloop.com](https://ai-infra-advisor.qbitloop.com)


## 📦 Quick Start

### Prerequisites

- Node.js 18+
- npm or pnpm

### Installation

```bash
# Clone the repository
git clone https://github.com/QbitLoop/ai-infra-advisor.git
cd ai-infra-advisor

# Install dependencies
npm install

# Start development server
npm run dev
```

Open http://localhost:3000 to see the application.

### Build for Production

```bash
npm run build
npm run start
```

## 📊 TCO Model

### Infrastructure Costs (40-60% of TCO)

| Component | DGX Cloud | On-Premises | Hyperscaler |
|-----------|-----------|-------------|-------------|
| Compute | $236K/node/yr | $150K/node/yr* | $523K/node/yr |
| Storage | $0.10/GB/mo | $0.05/GB/mo | $0.12/GB/mo |
| Networking | $0.05/GB egress | $0 (internal) | $0.09/GB egress |

\*Amortized over 4 years + data center costs
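Using the table's unit rates, the annual infrastructure line for one scenario could be computed like this. The helper and its parameter names are an illustrative assumption, not the project's `tco-engine.ts`:

```typescript
// Illustrative annual infrastructure cost from the table's unit rates.
interface InfraInputs {
  nodes: number;
  nodeCostPerYear: number;   // e.g. 236_000 for DGX Cloud compute
  storageGB: number;
  storagePerGBMonth: number; // e.g. 0.10 for DGX Cloud
  egressGBPerMonth: number;
  egressPerGB: number;       // e.g. 0.05 for DGX Cloud
}

function infraAnnualCost(i: InfraInputs): number {
  const compute = i.nodes * i.nodeCostPerYear;
  const storage = i.storageGB * i.storagePerGBMonth * 12; // monthly rate → annual
  const egress = i.egressGBPerMonth * i.egressPerGB * 12;
  return compute + storage + egress;
}
```

For example, two DGX Cloud nodes with 100 TB of storage and 10 TB of monthly egress come to roughly $598K/yr under these rates.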

### Platform Costs (20-30% of TCO)

| Component | DGX Cloud | On-Premises | Hyperscaler |
|-----------|-----------|-------------|-------------|
| Software | Included | $4,500/GPU/yr + MLOps | $30K/yr |
| Support | $2K/GPU | $3K/GPU | $25K/yr |
| Compliance | $25K | $75K | $50K |

### Operations Costs (20-35% of TCO)

| Role | Fully Loaded Cost |
|------|-------------------|
| ML Engineer | $250,000/yr |
| MLOps Engineer | $220,000/yr |
| DevOps Engineer | $180,000/yr |

**Infrastructure Time Allocation:**

- DGX Cloud: 20% (managed)
- On-Premises: 55% (heavy burden)
- Hyperscaler: 40% (medium)
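Combining the role rates with the time-allocation shares gives the operations labor line. A hedged sketch; the function shape is an assumption, though the shares and rates come from the tables above:

```typescript
// Infrastructure time shares from the list above.
const infraTimeShare = {
  "dgx-cloud": 0.2,   // managed
  "on-prem": 0.55,    // heavy burden
  "hyperscaler": 0.4, // medium
} as const;

type Deployment = keyof typeof infraTimeShare;

interface TeamMember {
  role: string;
  fullyLoadedCost: number; // $/yr, from the rates table
}

// Annual labor cost attributable to infrastructure work
function opsLaborCost(team: TeamMember[], deployment: Deployment): number {
  const teamCost = team.reduce((sum, m) => sum + m.fullyLoadedCost, 0);
  return teamCost * infraTimeShare[deployment];
}
```

For example, three ML engineers plus one MLOps engineer running on-premises spend 55% of a $970K payroll on infrastructure, about $533.5K/yr.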

## 📁 Project Structure

```
ai-infra-advisor/
├── src/
│   ├── app/                    # Next.js App Router
│   │   ├── page.tsx            # Home (wizard)
│   │   ├── results/page.tsx    # Results dashboard
│   │   └── layout.tsx          # Root layout
│   │
│   ├── components/
│   │   ├── ui/                 # shadcn/ui components
│   │   └── wizard/             # Workload wizard steps
│   │       ├── WorkloadStep.tsx
│   │       ├── ScaleStep.tsx
│   │       ├── ConstraintsStep.tsx
│   │       └── PreviewStep.tsx
│   │
│   └── lib/
│       ├── calculations/       # TCO engine
│       │   ├── types.ts        # Type definitions
│       │   └── tco-engine.ts   # Core calculations
│       └── workloads/          # Workload definitions
│           └── types.ts
│
├── docs/
│   ├── COST-MODEL.md           # Detailed methodology
│   └── guides/
│       └── aiops-101.md        # Educational content
│
└── public/                     # Static assets
```

## 📚 Documentation

| Document | Description |
|----------|-------------|
| [Cost Model Deep Dive](docs/COST-MODEL.md) | Detailed methodology and data sources |
| [AIOps 101 Guide](docs/guides/aiops-101.md) | Educational primer on AI infrastructure |
| Architecture Overview | Technical architecture decisions |

## 🎨 Design System

Built following Anthropic Brand Guidelines:

### Colors

| Token | Hex | Usage |
|-------|-----|-------|
| `--foreground` | `#141413` | Primary text |
| `--background` | `#faf9f5` | Light backgrounds |
| `--primary` | `#d97757` | CTAs, highlights |
| `--accent` | `#6a9bcc` | Links, info states |
| `--success` | `#788c5d` | Positive indicators |

### Typography

- **Headings:** Poppins (24pt+)
- **Body:** System fonts with Lora fallback

## 🔧 Tech Stack

| Layer | Technology |
|-------|------------|
| Framework | Next.js 14 (App Router) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 4 |
| Components | shadcn/ui |
| Deployment | GitHub Pages / Vercel |

## 🤝 Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

```bash
# Fork the repo, then:
git checkout -b feature/your-feature
npm run lint
npm run build
git commit -m "feat: add your feature"
git push origin feature/your-feature
```

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.


## 🏗️ Built With

Built with Claude by QbitLoop.

Designed for NVIDIA GSI Developer Relations use cases.

⭐ **Star this repo if you find it useful!**
