Behind every AI capability is a stack of silicon. GPUs, TPUs, and custom AI accelerators are the physical substrate that makes training and inference possible. Understanding the economics of this infrastructure—who builds it, who controls supply, what it costs—is essential for understanding AI's trajectory.



## The Chip Supply Chain

The production of AI chips involves one of the most complex supply chains humans have ever built:

**Design**: Companies like NVIDIA, AMD, and Google design chip architectures without owning fabrication facilities ("fabless"). NVIDIA designs; someone else manufactures. This requires massive R&D investment and years of development per generation.

**Fabrication**: Taiwan Semiconductor Manufacturing Company (TSMC) produces ~90% of the world's most advanced chips. Samsung is a distant second. Intel is trying to catch up. This concentration creates extreme supply chain risk.

**Equipment**: The machines that make chips are even more concentrated. ASML (Netherlands) is the sole supplier of EUV lithography machines, the tools used to pattern the most advanced process nodes (roughly 7nm and below). Without ASML, no one makes cutting-edge chips.

**Packaging and assembly**: Advanced packaging (stacking dies, chiplet architectures) happens at TSMC and specialized facilities. This is increasingly the integration point for AI systems.

**End systems**: Cloud providers (AWS, Google Cloud, Azure) and AI labs (OpenAI, Anthropic, DeepMind) assemble chips into datacenters and provide services.

The result: a small number of chokepoints control global AI compute capacity.



## GPU Scarcity

In 2023-2024, NVIDIA's H100 GPUs became famously difficult to obtain. Waiting lists stretched months. Prices spiked. Why?

**Demand surge**: ChatGPT demonstrated that LLMs were ready for prime time. Everyone wanted to train their own. Demand for training compute spiked.

**Supply constraints**: Fabrication capacity is limited. TSMC can only produce so many wafers. NVIDIA prioritized its most profitable customers.

**Hoarding**: Companies bought more than they immediately needed to secure future supply. This amplified scarcity.

**Lead times**: Building fab capacity takes 3-5 years. Supply can't respond quickly to demand spikes.

The scarcity was real, not artificial. NVIDIA's market cap reflects genuine pricing power when supply is constrained.

Implications for AI development:
- **Rich get richer**: Well-funded labs train larger models. Compute-poor researchers fall behind.
- **Cloud concentration**: Access to compute flows through a few large cloud providers.
- **Innovation bottlenecks**: Ideas that require large-scale experiments wait in line for resources.



## Training Cost Scaling

How much does it cost to train a frontier model?

**Rough estimates**:
- GPT-4-class models: $50-100 million in compute
- Gemini Ultra: likely similar or higher
- Claude 3.5: undisclosed but comparable

These costs are dominated by GPU-hours. A training run might use 10,000+ H100s for months.
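As a rough illustration of how GPU-hours turn into dollars, the sketch below multiplies GPU count, run length, and an hourly price. The $2.50 per GPU-hour rate and the 90-day duration are assumptions chosen for the example, not quoted prices, and the function ignores failed runs, idle time, storage, and networking.

```python
def training_rental_cost(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Cost of renting `num_gpus` for `days`, assuming every GPU is billed 24h/day."""
    gpu_hours = num_gpus * days * 24
    return gpu_hours * usd_per_gpu_hour


# Hypothetical run: 10,000 GPUs for 90 days at an assumed $2.50 per GPU-hour.
cost = training_rental_cost(num_gpus=10_000, days=90, usd_per_gpu_hour=2.50)
print(f"${cost / 1e6:.0f}M")  # prints "$54M"
```

Under these assumptions a single run lands in the same range as the estimates above, before counting the experiments and failed runs that precede it.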

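**Chinchilla scaling laws**: For a given compute budget, there is an optimal split between model size and training tokens. Earlier models were over-parameterized for the amount of data they saw; compute-optimal training uses smaller models trained on more tokens. This changes how a budget is spent, and it lowers inference cost because the resulting model is smaller, but it does not reduce the total training compute.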
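The split can be sketched with two widely used approximations: training compute is about 6 FLOPs per parameter per token, and the compute-optimal ratio is about 20 tokens per parameter. Both constants are rules of thumb assumed here for illustration; the fitted values in the scaling-law literature vary with the dataset and setup.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # assumed: C ~ 6 * N * D for dense transformers
TOKENS_PER_PARAM = 20       # assumed: Chinchilla-style rule of thumb, D ~ 20 * N


def compute_optimal_split(flop_budget: float) -> tuple[float, float]:
    """Return (parameters, tokens) that spend `flop_budget` at the assumed ratio."""
    # C = 6 * N * D and D = 20 * N  =>  C = 120 * N^2
    n_params = math.sqrt(flop_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


params, tokens = compute_optimal_split(5.76e23)
print(f"{params / 1e9:.0f}B parameters, {tokens / 1e12:.2f}T tokens")
```

With a budget of 5.76 × 10²³ FLOPs this gives roughly 69B parameters and 1.4T tokens, close to the configuration of the Chinchilla model itself.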

**FLOP budgets**: Training compute is often measured in total floating-point operations (FLOPs). GPT-3 used roughly 3.1 × 10²³ FLOPs. GPT-4 is estimated at 10²⁵ FLOPs or higher. Recent generations have each required roughly 10-100× more compute than the last.
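A FLOP budget can be translated into GPU time once a per-GPU throughput is assumed. The sketch below uses a round peak of 10¹⁵ FLOP/s per GPU and 40% sustained utilization; both figures are assumptions for illustration, and real utilization depends heavily on the model, parallelism strategy, and interconnect.

```python
def gpu_hours_for_budget(
    total_flops: float,
    peak_flops_per_gpu: float = 1e15,  # assumed round number, not a spec-sheet value
    utilization: float = 0.4,          # assumed fraction of peak actually sustained
) -> float:
    """GPU-hours needed to execute `total_flops` at the assumed sustained rate."""
    sustained_flops_per_second = peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_second / 3600


hours = gpu_hours_for_budget(1e25)
days_on_10k = hours / 10_000 / 24
print(f"{hours:,.0f} GPU-hours, about {days_on_10k:.0f} days on 10,000 GPUs")
```

Under these assumptions, a 10²⁵ FLOP run needs roughly 7 million GPU-hours, or about a month on a 10,000-GPU cluster.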

At some point, training costs become prohibitive for most organizations. Only well-funded labs can afford to train frontier models from scratch.



## Inference Economics

Training happens once; inference happens every time someone uses the model. As models deploy to millions of users, cumulative inference cost can exceed the one-time cost of training.

**Cost per token**: Depends on model size, hardware, and optimization. At launch, GPT-4 was priced at $0.03 per 1K input tokens and $0.06 per 1K output tokens (API pricing, which includes margin). Open-source models on self-hosted hardware can be 10-100× cheaper per token, provided the hardware is kept busy.
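The two pricing models can be compared with a small calculation. The API defaults below are the per-1K-token prices quoted above; the self-hosted figures ($2.00 per GPU-hour, 1,000 tokens per second of aggregate throughput) are assumptions for illustration and ignore engineering and idle-time costs.

```python
def api_request_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float = 0.03,
    usd_per_1k_output: float = 0.06,
) -> float:
    """Cost of one API request under per-1K-token pricing."""
    return (input_tokens / 1000) * usd_per_1k_input + (output_tokens / 1000) * usd_per_1k_output


def self_hosted_usd_per_1k_tokens(usd_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Hardware cost per 1K generated tokens, assuming the GPU is fully utilized."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_gpu_hour / tokens_per_hour * 1000


print(f"API request (2,000 in / 500 out): ${api_request_cost(2000, 500):.2f}")
print(f"Self-hosted: ${self_hosted_usd_per_1k_tokens(2.00, 1000):.5f} per 1K tokens")
```

The self-hosted number only holds at full utilization; a GPU that sits idle half the time doubles the effective cost per token.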

**Throughput vs. latency**: Batching multiple requests together is efficient (high throughput) but adds latency. Real-time chat needs low latency and sacrifices efficiency.
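The trade-off can be illustrated with a toy model in which each decoding step has a fixed overhead plus a small cost per sequence in the batch. The linear form and both constants are assumptions made up for this sketch, not measurements of any real system.

```python
FIXED_MS = 20.0    # assumed per-step overhead (e.g. reading weights from memory)
PER_SEQ_MS = 0.5   # assumed extra time per additional sequence in the batch


def decode_step_ms(batch_size: int) -> float:
    """Time for one decoding step, which every request in the batch waits for."""
    return FIXED_MS + PER_SEQ_MS * batch_size


def tokens_per_second(batch_size: int) -> float:
    """Aggregate throughput: one token per sequence per step."""
    return batch_size / decode_step_ms(batch_size) * 1000


for b in (1, 8, 64):
    print(f"batch={b:3d}  step={decode_step_ms(b):5.1f} ms  "
          f"throughput={tokens_per_second(b):7.1f} tok/s")
```

In this model, going from a batch of 1 to 64 raises throughput by more than 20× while each user waits about 2.5× longer per token, which is the shape of the trade-off even though real curves are not linear.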

**Quantization and optimization**: Reducing precision (FP16 → INT8 → INT4) shrinks model size and speeds inference. Distillation creates smaller, faster models. These techniques trade some quality for major efficiency gains.
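The memory effect of lower precision is simple arithmetic: weight storage is the parameter count times the bytes per parameter. The sketch below uses a hypothetical 70B-parameter model and counts weights only, ignoring activations, the KV cache, and the small overhead of quantization scales.

```python
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weights_gb(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given number format."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9


# Hypothetical 70B-parameter model.
for fmt in ("FP16", "INT8", "INT4"):
    print(f"{fmt}: {weights_gb(70e9, fmt):.0f} GB")
```

Halving the precision halves the memory needed for weights, which often determines how many GPUs a model needs to be served at all.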

**Inference chips**: NVIDIA's flagship GPUs are built for training and are not necessarily the most cost-effective choice for inference. Inference-oriented hardware (NVIDIA's own T4, AWS Inferentia, other custom ASICs) optimizes for cost per query instead.

The inference cost structure matters for business models. Products that use expensive models on long contexts need to charge accordingly or find efficiency gains.



## Cloud vs. On-Prem

Where does AI compute run?

**Hyperscalers (AWS, Azure, GCP)**: Most training and inference runs on cloud infrastructure. Benefits: scale on demand, no capex, managed services. Costs: higher unit prices, vendor lock-in, data egress fees.

**Specialized AI clouds (Lambda Labs, CoreWeave, Together)**: Focus on GPU availability and AI-specific services. Often more competitive pricing for pure compute.

**On-prem and self-hosted**: Large enterprises and research institutions increasingly build their own GPU clusters. Benefits: control, customization, data sovereignty. Costs: capex, operations, utilization risk.
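Utilization risk can be made concrete with a break-even calculation: owned hardware costs money every hour whether it is used or not, while cloud hardware is billed only when rented. All four inputs below (purchase price, lifetime, operating cost, cloud rate) are assumptions for illustration, and treating operating cost as constant is a simplification.

```python
HOURS_PER_YEAR = 8760


def owned_usd_per_hour(capex_usd: float, lifetime_years: float, opex_usd_per_hour: float) -> float:
    """Amortized hourly cost of an owned GPU, paid whether or not it is busy."""
    return capex_usd / (lifetime_years * HOURS_PER_YEAR) + opex_usd_per_hour


def break_even_utilization(
    capex_usd: float,
    lifetime_years: float,
    opex_usd_per_hour: float,
    cloud_usd_per_hour: float,
) -> float:
    """Fraction of hours the owned GPU must be busy to match renting on demand."""
    return owned_usd_per_hour(capex_usd, lifetime_years, opex_usd_per_hour) / cloud_usd_per_hour


# Assumed: $30,000 per GPU over 4 years, $0.40/hour to operate, $2.50/hour in the cloud.
u = break_even_utilization(30_000, 4, 0.40, 2.50)
print(f"Break-even utilization: {u:.0%}")
```

With these inputs the owned cluster pays off only if it is busy about half the time or more; below that, renting is cheaper.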

**Edge deployment**: Running small models on devices (phones, laptops, robots). Latency-sensitive or offline applications. Severely constrained by device thermal and power limits.

The trend is bifurcation: massive training runs on hyperscale clouds; inference increasingly distributed to specialized providers and edge devices.



## Energy Constraints

AI datacenters use enormous amounts of power, and the problem is getting worse:

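**Power density**: A single 8-GPU H100 server can draw around 10kW, so a rack holding several of them can exceed 40kW, well beyond what traditional air-cooled racks were designed for. A large AI cluster might use hundreds of megawatts, comparable to the output of a small power plant.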
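The scale follows from simple multiplication. The sketch below assumes 8 GPUs per server and about 10kW per server, and it counts only the power drawn by the servers themselves, not cooling or other facility overhead.

```python
import math


def cluster_it_power_mw(
    num_gpus: int,
    gpus_per_server: int = 8,     # assumed server configuration
    kw_per_server: float = 10.0,  # assumed draw per fully loaded server
) -> float:
    """Server power for a cluster in megawatts, excluding cooling and overhead."""
    servers = math.ceil(num_gpus / gpus_per_server)
    return servers * kw_per_server / 1000


print(f"{cluster_it_power_mw(100_000):.0f} MW")  # prints "125 MW"
```

A hypothetical 100,000-GPU cluster therefore needs on the order of 125MW for the servers alone under these assumptions.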

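**Cooling**: Cooling is a major overhead on top of compute, though not the majority of energy use in a modern facility. Efficient hyperscale datacenters report a power usage effectiveness (PUE, total facility power divided by IT power) near 1.1, while older facilities often run at 1.5 or higher. Liquid cooling is increasingly necessary for high-density GPU racks.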

**Location matters**: Cheap power and cool climates attract AI infrastructure. Nordic countries, the Pacific Northwest, and locations near hydropower are popular.

**Carbon implications**: Unless powered by renewables, AI training has a significant carbon footprint. Large training runs can emit as much CO2 as hundreds of individual passenger flights.
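Emissions depend on two things: how much energy a run consumes and how carbon-intensive the grid is. The sketch below uses an assumed 2MW average draw for 30 days and two illustrative grid intensities; none of these figures describe a specific training run or region.

```python
def training_emissions_tonnes(
    avg_power_mw: float,
    days: float,
    grid_kg_co2_per_kwh: float,
) -> float:
    """Tonnes of CO2 for a run, given average power draw and grid carbon intensity."""
    energy_mwh = avg_power_mw * days * 24
    # kg per kWh is numerically equal to tonnes per MWh.
    return energy_mwh * grid_kg_co2_per_kwh


# Assumed intensities: 0.4 kg/kWh for a fossil-heavy grid, 0.02 kg/kWh for hydro-like supply.
print(f"Fossil-heavy grid: {training_emissions_tonnes(2, 30, 0.4):.0f} t CO2")
print(f"Low-carbon grid:   {training_emissions_tonnes(2, 30, 0.02):.0f} t CO2")
```

The same run differs by a factor of 20 in emissions between the two assumed grids, which is why siting decisions matter as much as hardware efficiency.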

Power availability is becoming a constraint on datacenter construction. In some regions, there simply isn't enough grid capacity for proposed AI facilities.



## Geopolitics

The concentration of chip supply chains creates geopolitical vulnerabilities:

**Taiwan risk**: TSMC's fabrication facilities are in Taiwan. A Chinese invasion or blockade would devastate global chip supply. The US, Japan, and Europe are investing heavily to diversify, but new fabs take years.

**Export controls**: The US restricts export of advanced chips and chip-making equipment to China. This limits China's ability to train frontier models and slows their AI development—at least for now.

**CHIPS Act**: The US is investing $52B to rebuild domestic semiconductor manufacturing. Intel, TSMC, and Samsung are building fabs on US soil. But these take 3-5 years to complete.

**China's response**: China is investing heavily in domestic alternatives, including Huawei's Ascend chips and SMIC's fabrication capabilities. Both remain behind the leading edge, but they are working to close the gap.

The AI compute supply chain is one of the most strategically significant industrial bases in the world. Governments treat it accordingly.



## Future Compute

Several trends may change the compute landscape:

**Algorithmic efficiency**: Moore's Law has slowed, but algorithmic improvements continue. Better architectures, training methods, and representations can achieve the same capability with less compute. This is "software eating hardware."

**Sparsity**: In many large networks, only a fraction of the weights contribute meaningfully to any given input. Sparse models activate only the relevant subset, cutting the compute spent per token. Mixture-of-experts, which routes each token to a few expert sub-networks, is one form of this.
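For mixture-of-experts, the saving shows up as a gap between total and active parameters. The sketch below counts both for a hypothetical model; the layer sizes, expert count, and top-2 routing are assumptions chosen for illustration rather than the configuration of any particular released model.

```python
def moe_param_counts(
    shared_params: float,
    params_per_expert: float,
    num_experts: int,
    top_k: int,
) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a simple MoE model."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical model: 2B shared parameters, 8 experts of 5.6B each, 2 experts per token.
total, active = moe_param_counts(2e9, 5.6e9, num_experts=8, top_k=2)
print(f"total={total / 1e9:.1f}B  active={active / 1e9:.1f}B  "
      f"fraction={active / total:.0%}")
```

All parameters still have to be stored in memory, so sparsity of this kind reduces compute per token much more than it reduces hardware footprint.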

**Alternative hardware**: Photonic computing, neuromorphic chips, and analog computing offer potential for more efficient inference. Still nascent but under active development.

**Quantum computing**: Theoretical advantages for some problems, but no near-term impact on the main workloads (transformers, diffusion). Quantum machine learning is mostly hype so far.

The most likely scenario: continued GPU/TPU improvement + algorithmic efficiency gains, with specialized accelerators for specific applications.



## Implications

Understanding AI infrastructure economics matters for:

**Researchers**: Access to compute shapes what's possible to study. Know the economics of your cloud provider.

**Startups**: Compute costs are a major expense. Efficiency is a competitive advantage. Don't assume you can just throw money at the problem.

**Investors**: The "picks and shovels" of AI (NVIDIA, TSMC, datacenter REITs) have been fantastic investments. The question is whether this persists.

**Policymakers**: Compute supply chains are strategic. Export controls, subsidies, and supply chain resilience are legitimate policy concerns.

**Everyone**: The physical infrastructure underlying AI determines who can build powerful systems. It's not just about software.

