
---

# **CHAPTER 29: AI SYSTEM DESIGN & ARCHITECTURE**

*Architecting Planet-Scale Intelligent Systems*

## **Chapter Overview**

This chapter transitions from implementation details to strategic architecture decisions required by staff+ engineers and architects building AI systems that serve billions of users. We examine real-world systems from Google, Meta, Netflix, and OpenAI, distilling patterns for recommendation, search ranking, autonomous systems, and large language model serving. The focus is on navigating fundamental tensions—latency versus accuracy, cost versus coverage, consistency versus availability—while maintaining the velocity to iterate in production.

**Estimated Time:** 35-45 hours (3 weeks)  
**Prerequisites:** Chapter 19 (ML System Design fundamentals), Chapter 22 (Deployment), Chapter 25 (Transformers), industry experience or equivalent study

---

## **29.0 Learning Objectives**

By the end of this chapter, you will be able to:
1. Lead system design interviews for ML infrastructure, articulating trade-offs between batch and real-time processing
2. Architect feature platforms handling petabyte-scale data with sub-10ms serving latency
3. Design training infrastructure supporting trillion-parameter models with fault tolerance and spot instance optimization
4. Engineer serving systems for heterogeneous workloads (CV, NLP, Tabular) with unified observability
5. Evaluate architectural decisions using total cost of ownership (TCO) and business impact metrics
6. Decompose complex AI products (autonomous vehicles, LLM APIs) into verifiable subsystems

---

## **29.1 The System Design Framework**

#### **29.1.1 Requirements Engineering for ML**

Before writing architecture documents, translate business requirements into technical constraints using the **RAIL** framework:

**R - Requirements (Functional):**
- **Prediction Volume:** Queries per second (QPS), peak vs. steady state, burst patterns (Black Friday, viral content)
- **Latency Constraints:** P50/P99/P99.9 targets (user-facing <100ms, batch acceptable in minutes)
- **Freshness:** Feature staleness tolerance (real-time fraud vs. daily recommendations)
- **Accuracy Threshold:** Business metric elasticity (1% accuracy improvement = $X revenue)

**A - Assets (Data & Models):**
- **Data Volume:** TB/day ingested, retention policies, regulatory constraints (GDPR deletion)
- **Model Complexity:** Parameter count, memory footprint, precision requirements (FP16 vs. INT8)
- **Intellectual Property:** Proprietary data sensitivity, model encryption requirements

**I - Infrastructure Constraints:**
- **Compute Budget:** Capex vs. OpEx preferences, GPU availability, carbon neutrality targets
- **Latency Budget:** End-to-end allocation (feature fetch: 20ms, inference: 50ms, post-processing: 10ms)
- **Availability SLA:** 99.9% (8.76h downtime/year) vs. 99.99% (52m/year), multi-region requirements

**L - Legal & Compliance:**
- **Explainability:** Right to explanation (GDPR), adverse action notices (credit decisions)
- **Bias Auditability:** Demographic parity logging, fairness metric dashboards
- **Data Residency:** EU data stays in EU, healthcare data HIPAA isolation
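
The availability and latency-budget figures in the Infrastructure Constraints list can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names and the 100ms end-to-end target are illustrative assumptions, not part of the RAIL framework itself.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Maximum yearly downtime (in hours) permitted by an availability SLA."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def budget_headroom_ms(stage_budgets_ms: dict, end_to_end_target_ms: float) -> float:
    """Headroom left after summing the per-stage latency allocations."""
    return end_to_end_target_ms - sum(stage_budgets_ms.values())


# Per-stage allocations taken from the latency budget example above
stages = {"feature_fetch": 20, "inference": 50, "post_processing": 10}

print(round(allowed_downtime_hours(99.9), 2))        # 8.76 hours/year
print(round(allowed_downtime_hours(99.99) * 60, 1))  # 52.6 minutes/year
print(budget_headroom_ms(stages, 100))               # 20 ms left for network and queuing
```

Writing the budget down this way makes it obvious when a new stage (for example, a second feature lookup) would push the request past its target.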

#### **29.1.2 Architectural Decision Records (ADRs)**

Record significant, hard-to-reverse decisions in ADRs so that the context and trade-offs remain available to future maintainers:

```markdown
## ADR 023: Feature Store Selection (Redis vs. DynamoDB)

**Context:** Need <5ms p99 latency for 100k TPS user features  
**Decision:** Redis Cluster with RedisJSON for nested features  
**Consequences:** 
- (+) Sub-millisecond latency, rich data structures
- (-) Operational complexity (sharding, failover, memory sizing)
- (-) 3x cost vs. DynamoDB at scale
**Alternatives Considered:** DynamoDB with DAX (higher latency, AWS vendor lock-in), Aerospike (smaller ecosystem)
**Revisit When:** Cross-region replication requirements exceed Redis limitations
```

---

## **29.2 Feature Platform Architecture**

#### **29.2.1 The Feature Platform Triangle**

All feature platforms balance three forces:
1. **Freshness:** How recently was the feature computed? (Real-time streaming vs. hourly batch)
2. **Fidelity:** How accurate is the feature? (Exact aggregation vs. approximate sketches)
3. **Footprint:** Storage and compute cost (Materialized views vs. on-demand computation)

**Pattern 1: Lambda Architecture (Batch + Speed Layer)**
```
Raw Events → [Kafka] → [Spark Streaming] → Real-time Features (Redis) → Serving
                     → [Spark Batch]     → Historical Features (S3/Delta) → Training
```
*Use when:* Exact historical consistency required, but real-time approximation acceptable (recommendations).

**Pattern 2: Kappa Architecture (Streaming Only)**
```
Raw Events → [Kafka] → [Flink] → Feature Store (Online/Offline unified)
```
*Use when:* Event sourcing possible, strong consistency between training and serving required (fraud detection).

**Pattern 3: Feature Store 2.0 (Materialized Views)**
```
Offline Store (BigQuery) ←→ Feature Registry ←→ Online Store (Redis)
                                ↑
                         Transformation Logic (dbt/Shared Libs)
```
*Use when:* Feature reuse across teams > 50%, strict versioning requirements.

#### **29.2.2 Case Study: Netflix Recommendation Feature Platform**

Netflix serves 450M+ devices with personalized artwork and rankings.

**Architecture Highlights:**
- **Feature Sources:** Playback events (Kafka), Content metadata (Cassandra), Device signals (real-time)
- **Compute:** Spark for batch (user taste clusters), Flink for real-time (continue-watching position)
- **Storage:** EVCache (Netflix's Memcached-based distributed cache) for online, S3 for offline, Iceberg for time-travel
- **Consistency:** **Point-in-time correctness** ensures training uses only features available at prediction time (preventing leakage from future ratings)

**Key Insight:** They separate **Contextual features** (device, time, location - injected at API layer) from **Entity features** (user history, movie attributes - pre-materialized).
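
Point-in-time correctness, listed in the highlights above, amounts to an "as-of" lookup: each training example may only see the latest feature value whose timestamp is at or before the prediction time. The sketch below illustrates the idea in plain Python; the feature log, timestamps, and function name are hypothetical and do not describe Netflix's implementation.

```python
from bisect import bisect_right


def point_in_time_value(feature_log, prediction_ts):
    """Return the latest feature value with timestamp <= prediction_ts.

    feature_log: list of (timestamp, value) tuples sorted by timestamp.
    Returns None when no value existed yet at prediction time.
    """
    timestamps = [ts for ts, _ in feature_log]
    idx = bisect_right(timestamps, prediction_ts)
    return feature_log[idx - 1][1] if idx > 0 else None


# Hypothetical history of one user's average rating: (event_time, value)
avg_rating_log = [(100, 3.0), (200, 3.5), (300, 4.2)]

# Training examples: (prediction timestamp, label)
examples = [(50, 0), (150, 1), (200, 1), (250, 0)]

training_rows = [
    (ts, point_in_time_value(avg_rating_log, ts), label)
    for ts, label in examples
]
print(training_rows)
# [(50, None, 0), (150, 3.0, 1), (200, 3.5, 1), (250, 3.5, 0)]
```

No row ever receives the value 4.2, because it was written at time 300, after every prediction in the set. A naive join on user ID alone would have leaked it into all four rows.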

#### **29.2.3 Advanced Feature Engineering Patterns**

**Approximate Aggregation for High Cardinality:**
```python
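# HyperLogLog sketch for distinct-count features (cardinality estimation)
# Use case: "Number of unique genres watched by user in last 30 days"
# Relative standard error is about 1.04 / sqrt(2^lg_k), i.e. ~1.6% for lg_k=12.
# The sketch occupies a few KB, whereas an exact set grows with cardinality.

from datasketches import hll_sketch

# Illustrative input; in production this would come from the event stream
user_watch_history = ["drama", "comedy", "drama", "sci-fi", "documentary"]

sketch = hll_sketch(12)  # lg_k = 12 -> 2^12 registers
for genre in user_watch_history:
    sketch.update(genre)

estimated_distinct = sketch.get_estimate()

# Store the serialized sketch in the feature store; sketches from different
# time buckets can be merged at query time with hll_union
feature_value = sketch.serialize_compact()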
```

**Windowed Aggregation with Watermarks:**
For late-arriving events (mobile offline viewing syncing hours later):
- **Allowed Lateness:** 24 hours for watch events
- **Retroactive Updates:** When a late event arrives within the allowed lateness, re-emit the affected window's feature value and record both `processing_time` and `event_time`
- **Backfill Trigger:** Rewind stream by 7 days if late data exceeds 5% of window
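
The interaction between watermarks and allowed lateness can be sketched with a toy tumbling-window counter. This is a simplified illustration: the watermark is assumed to be the highest event time seen so far (stream processors typically subtract an out-of-orderness bound), and the class and field names are hypothetical.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time window while tolerating late arrivals."""

    def __init__(self, window_s: int, allowed_lateness_s: int):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.watermark = 0              # highest event time observed so far
        self.counts = defaultdict(int)  # window start -> event count
        self.dropped = []               # events too late to merge

    def add(self, event_time: int) -> bool:
        self.watermark = max(self.watermark, event_time)
        window_start = (event_time // self.window_s) * self.window_s
        window_end = window_start + self.window_s
        if window_end + self.allowed_lateness_s <= self.watermark:
            self.dropped.append(event_time)  # candidate for a backfill job
            return False
        self.counts[window_start] += 1       # on-time or retroactive update
        return True


HOUR = 3600
counter = TumblingWindowCounter(window_s=HOUR, allowed_lateness_s=24 * HOUR)

# Event times in arrival order: 50 arrives late but within 24h, 40 arrives too late
for event_time in [10, 20, 5 * HOUR, 50, 30 * HOUR, 40]:
    counter.add(event_time)

print(dict(counter.counts))  # {0: 3, 18000: 1, 108000: 1}
print(counter.dropped)       # [40]
```

The ratio of dropped events to window size is the kind of signal a backfill trigger like the one above would monitor.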

---

## **29.3 Training Infrastructure at Scale**

#### **29.3.1 The Training Orchestration Stack**

**Layer 1: Experiment Management (Control Plane)**
- **Metaflow (Netflix):** DAG abstraction for ML workflows
- **Kubeflow Pipelines:** Kubernetes-native orchestration
- **Ray Train:** Distributed training with auto-scaling

**Layer 2: Distributed Execution (Data Plane)**
```
Scheduler (Airflow/Dagster) 
    → Kubernetes Job 
        → PyTorch DDP / FSDP / DeepSpeed
            → GPU Cluster (A100/H100)
                → Checkpoint Storage (S3/GCS; both are strongly consistent, so S3Guard is no longer needed)
```

**Layer 3: Resource Optimization**
- **Spot Instances:** 70% cost reduction with checkpointing every 15 minutes
- **Precision and Communication:** FP8 mixed precision on H100s, gradient compression (1-bit Adam)
- **Elastic Training:** Auto-scale workers based on queue depth (Horovod Elastic)
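
How often to checkpoint on preemptible capacity is a trade-off between time spent writing checkpoints and work lost at each interruption. One common rule of thumb is Young's approximation, sketched below. The one-minute checkpoint cost and two-hour mean time between preemptions are assumed values for illustration, not measurements.

```python
import math


def young_checkpoint_interval(checkpoint_cost_min: float, mtbf_min: float) -> float:
    """Young's approximation: interval ~= sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_min * mtbf_min)


def expected_overhead_fraction(interval_min: float,
                               checkpoint_cost_min: float,
                               mtbf_min: float) -> float:
    """First-order overhead: checkpoint writes plus expected rework.

    On average, half an interval of work is lost per interruption.
    """
    return checkpoint_cost_min / interval_min + (interval_min / 2) / mtbf_min


# Assumed: 1 minute to write a checkpoint, one preemption every 2 hours on average
interval = young_checkpoint_interval(checkpoint_cost_min=1.0, mtbf_min=120.0)
overhead = expected_overhead_fraction(interval, 1.0, 120.0)

print(round(interval, 1))  # 15.5 minutes
print(round(overhead, 3))  # 0.129, i.e. roughly 13% of wall-clock time
```

Under these assumptions the result lands close to the 15-minute interval quoted above; a slower checkpoint or rarer preemptions would push the interval longer.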

#### **29.3.2 Case Study: Meta's Ads Ranking Infrastructure**

Meta trains models on billions of examples daily for 3M+ advertisers.

**Scale Metrics:**
- **Data:** 100+ PB training data
- **Models:** 10k+ models trained daily (many small for niche audiences)
- **Latency:** New model deployed globally within 2 hours of training completion

**Architecture:**
- **Data Warehouse:** Hive tables partitioned by (date, advertiser_id)
- **Feature Engineering:** Presto for SQL transformations → TensorFlow Transform for consistency
- **Training:** CPU-heavy data preprocessing → GPU training (A100 pods)
- **Continuous Training:** Online learning updates (Follow-the-Regularized-Leader) between batch retrains

**Fault Tolerance Pattern:**
- **Immutable Training Sets:** Snapshots of features at training start, ensuring reproducibility even if source data changes during 6-hour training job
- **Speculative Execution:** Train same model on two different clusters, take result that finishes first (straggler mitigation)

#### **29.3.3 Large Model Training (LLM Infrastructure)**

For 100B+ parameter models:

**Parallelism Strategy:**
```
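Pipeline Parallel (inter-layer) → Splits consecutive layers into stages across GPUs
Data Parallel                   → Splits the batch across replicas (FSDP/DeepSpeed ZeRO-3 also shard model state)
Tensor Parallel (intra-layer)   → Splits weight matrices and attention heads within a layer (Megatron-LM)
Sequence Parallel               → Splits activations along the sequence dimension (used alongside tensor parallelism)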
```

**Memory Optimization Checklist:**
- **Activation Checkpointing:** Trade 30% compute for 70% memory (recompute activations in backward pass)
- **CPU Offloading:** Optimizer states in RAM, computation on GPU (DeepSpeed ZeRO-Offload)
- **Mixed Precision:** BF16 forward/backward passes with FP32 master weights and optimizer states
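
To see why these optimizations matter, it helps to estimate the memory taken by model state alone. The sketch below uses the commonly cited accounting of 16 bytes per parameter for mixed-precision Adam and assumes ideal ZeRO-3/FSDP-style sharding across GPUs. Activations, buffers, and fragmentation are deliberately ignored, and the function name is hypothetical.

```python
# Bytes per parameter for mixed-precision training with Adam
BYTES_PER_PARAM = {
    "bf16_weights": 2,
    "bf16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def model_state_gb(num_params: float, num_gpus: int = 1, sharded: bool = False) -> float:
    """Model-state memory per GPU in GB (activations not included)."""
    total_bytes = num_params * sum(BYTES_PER_PARAM.values())
    if sharded:
        total_bytes /= num_gpus  # ideal even partitioning across GPUs
    return total_bytes / 1e9


print(model_state_gb(100e9))                             # 1600.0 GB unsharded
print(model_state_gb(100e9, num_gpus=64, sharded=True))  # 25.0 GB per GPU
```

A 100B-parameter model therefore cannot be trained on any single GPU, and even with 64-way sharding a large share of an 80GB device is taken before a single activation is stored.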

**Checkpoint Strategy for 1TB+ Models:**
- **Sharded Checkpoints:** Each GPU saves its slice (FSDP)
- **Asynchronous Checkpointing:** Copy to storage in background while training continues
- **Incremental:** Only save changed parameters (LoRA adapters) between base model iterations

---

## **29.4 Serving Architecture Patterns**

#### **29.4.1 The Serving Taxonomy**

| Pattern | Latency | Throughput | Use Case | Example |
|---------|---------|------------|----------|---------|
| **Online (Sync)** | <50ms | 1-10k QPS | User-facing recommendations | Product ranking |
| **Batch (Async)** | Minutes-hours | Millions | Overnight risk scoring | Credit decisions |
| **Streaming** | <100ms | 100k+ events/s | Real-time anomaly detection | Fraud, IoT |
| **Edge** | <20ms | Device-limited | Mobile inference, autonomous cars | Tesla Autopilot |
| **Hybrid** | Variable | Mixed | Complex workflows | LLM Agents |

#### **29.4.2 Case Study: Google Search Ranking**

Google processes 8.5B searches daily with <200ms p95 latency.

**Two-Phase Retrieval:**
1. **Candidate Generation (Recall):** 
   - Dual-encoder neural networks (query/doc embeddings)
   - Approximate nearest neighbor (ANN) search with a library such as ScaNN or FAISS over an index of billions of docs
   - Returns ~1000 candidates per query

2. **Ranking (Precision):**
   - Heavy ranker: 500+ features, deep neural net with cross-attention between query and document
   - Re-ranks 1000 → 10 results
   - Multi-objective: Click-through rate, dwell time, authority, diversity
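
The two phases can be sketched on toy data as follows. This is an illustration of the general retrieve-then-rank pattern, not Google's system: exact dot-product search stands in for an ANN index, and the embeddings, document features, and scoring weights are invented for the example.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(query_vec, doc_vecs, k):
    """Stage 1 (recall): top-k docs by embedding similarity.

    Exact search here; a production system would query an ANN index.
    """
    return heapq.nlargest(k, doc_vecs, key=lambda doc_id: dot(query_vec, doc_vecs[doc_id]))


def rank(query_vec, candidates, doc_vecs, doc_features, top_n):
    """Stage 2 (precision): re-score only the candidates with a costlier function."""
    def score(doc_id):
        similarity = dot(query_vec, doc_vecs[doc_id])
        features = doc_features[doc_id]
        # Hypothetical multi-objective blend of relevance, authority, freshness
        return 0.6 * similarity + 0.3 * features["authority"] + 0.1 * features["freshness"]
    return sorted(candidates, key=score, reverse=True)[:top_n]


doc_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
doc_features = {
    "a": {"authority": 0.1, "freshness": 0.1},
    "b": {"authority": 0.9, "freshness": 0.5},
    "c": {"authority": 1.0, "freshness": 1.0},
    "d": {"authority": 0.5, "freshness": 0.9},
}
query = [1.0, 0.0]

candidates = generate_candidates(query, doc_vecs, k=3)
top_results = rank(query, candidates, doc_vecs, doc_features, top_n=2)
print(candidates)   # ['a', 'b', 'd']
print(top_results)  # ['b', 'd']
```

Document `c` has the best authority and freshness but never reaches the ranker, which shows why recall in the first stage bounds the quality of the final list.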

**Caching Strategy:**
- **Query Cache:** 30% of traffic served from cache (popular queries)
- **Model Cache:** Embedding lookup table sharded across TPU pods
- **Ranker Fallback:** If the ranking model fails or times out, serve results in the candidate generator's order (graceful degradation)

#### **29.4.3 Case Study: Tesla Autopilot (Edge + Cloud Hybrid)**

**Edge Inference (In-Car):**
- **Hardware:** Custom FSD computer (two FSD chips, ~144 TOPS combined, ~72W power envelope)
- **Model:** HydraNet (multi-task CNN) for object detection, lane prediction, depth estimation
- **Constraints:** <10ms latency for emergency braking (hard real-time)

**Cloud Training:**
- **Data Collection:** Fleet learning from 4M+ vehicles (shadow mode triggers on uncertainty)
- **Trigger:** Disengagement (human took over), new road type detected, or model disagreement
- **Training:** Dojo supercomputer (exaflop-scale) trains on video clips with auto-labeling (3D reconstruction from multiple cameras)

**OTA Updates:**
- Differential compression for model weights (only changed parameters)
- A/B testing in shadow mode before activation

---

## **29.5 Case Study: LLM Serving Infrastructure (ChatGPT Scale)**

#### **29.5.1 The Inference Challenge**

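Serving GPT-4 class models (reportedly around 1.8T parameters with a Mixture-of-Experts design; OpenAI has not confirmed these figures) requires:
- **Memory:** About 3.6 TB for the weights alone in FP16 (1.8T × 2 bytes), i.e. at least 45 A100 (80GB) GPUs before any KV cache or activations
- **Compute:** Thousands of GPUs for 100M+ daily active users
- **Bottleneck:** Memory bandwidth (weights are re-read from GPU memory for every generated token), not compute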
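
The memory and bandwidth constraints can be estimated with back-of-envelope arithmetic. The sketch below uses a dense 70B-parameter model as a smaller worked example and assumes roughly 2,000 GB/s of memory bandwidth per A100 80GB. Both figures are illustrative assumptions, and the bound ignores KV cache reads, communication, and batching.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, bytes_per_param: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


def max_decode_tokens_per_s(num_params: float, bytes_per_param: float,
                            aggregate_bandwidth_gb_s: float) -> float:
    """Upper bound at batch size 1: each generated token reads every weight once."""
    return aggregate_bandwidth_gb_s / weight_memory_gb(num_params, bytes_per_param)


params = 70e9  # dense model, FP16 (2 bytes per parameter)
print(weight_memory_gb(params, 2))                           # 140.0 GB
print(min_gpus_for_weights(params, 2, gpu_memory_gb=80))     # 2
print(round(max_decode_tokens_per_s(params, 2, 8 * 2000)))   # ~114 tokens/s on 8 GPUs
```

Because the bound depends on bytes read rather than FLOPs, batching more requests per weight read (and quantizing weights) raises throughput far more than adding raw compute.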

#### **29.5.2 Architecture Patterns**

**Continuous Batching (In-flight Batching):**
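Unlike static batching, which waits for a full batch (e.g., 32 requests) and holds every slot until the longest request finishes, engines such as Orca and vLLM add and remove requests from the running batch at each decoding iteration. For variable-length generation, reported throughput gains over static batching reach an order of magnitude or more, depending on the workload.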
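
A small simulation shows where the gain comes from. It is a deliberately simplified model and not how vLLM or Orca are implemented: every in-flight request produces one token per decode step, prefill is ignored, and step time is assumed independent of batch occupancy.

```python
from collections import deque


def static_batching_steps(output_lens, batch_size):
    """Each batch runs until its longest request finishes."""
    return sum(
        max(output_lens[i:i + batch_size])
        for i in range(0, len(output_lens), batch_size)
    )


def continuous_batching_steps(output_lens, batch_size):
    """Finished requests free their slot immediately for the next waiting request."""
    queue = deque(output_lens)
    active = []  # remaining tokens for each in-flight request
    steps = 0
    while queue or active:
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1                                 # one decode iteration
        active = [r - 1 for r in active if r > 1]  # drop requests that just finished
    return steps


# Two long generations mixed with short ones (output lengths in tokens)
output_lens = [100, 5, 5, 5, 100, 5, 5, 5]

print(static_batching_steps(output_lens, batch_size=4))      # 200
print(continuous_batching_steps(output_lens, batch_size=4))  # 105
```

With static batching the short requests sit idle behind each long one; with continuous batching the second long request starts after five steps, so the whole workload finishes in roughly half the time.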

**PagedAttention (vLLM):**
Manage KV cache memory like OS virtual memory (non-contiguous blocks). Reduces memory waste from 60% to <4%, enabling 2-4x higher throughput.
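
The paging idea can be illustrated with a toy block allocator. This sketch only tracks bookkeeping (which sequence owns which fixed-size block) and is not vLLM's implementation; the class name, block size, and the 2,048-token preallocation used for comparison are assumptions.

```python
class PagedKVCache:
    """Toy allocator: each sequence owns a list of fixed-size, non-contiguous blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id: str) -> None:
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # first token, or current block is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks; a scheduler would preempt here")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

    def wasted_fraction(self) -> float:
        allocated = sum(len(b) for b in self.block_tables.values()) * self.block_size
        used = sum(self.seq_lens.values())
        return 0.0 if allocated == 0 else 1 - used / allocated


cache = PagedKVCache(num_blocks=64, block_size=16)
for _ in range(100):
    cache.append_token("a")
for _ in range(33):
    cache.append_token("b")

paged_waste = cache.wasted_fraction()
# Contiguous baseline: reserve an assumed 2,048-token maximum per sequence up front
contiguous_waste = 1 - (100 + 33) / (2 * 2048)
print(round(paged_waste, 3), round(contiguous_waste, 3))
```

Waste under paging is limited to the unfilled tail of each sequence's last block, whereas up-front reservation wastes everything a sequence never grows into.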

**Speculative Decoding:**
- Small draft model (7B) generates 5 tokens ahead
- Large target model (70B) verifies all 5 in parallel (single forward pass)
- Accept rate ~80% → 2-3x speedup with identical output distribution
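
The quoted speedup can be sanity-checked with the standard expected-value analysis of speculative decoding. The sketch assumes each draft token is accepted independently with the same probability, and that one draft step costs about 10% of a target step. Both are simplifying assumptions, and the function names are illustrative.

```python
def expected_tokens_per_verification(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target forward pass.

    Assumes each draft token is accepted independently with probability
    accept_rate; the target always contributes one token of its own.
    """
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


def expected_speedup(accept_rate: float, draft_len: int, draft_cost_ratio: float) -> float:
    """Speedup over plain decoding.

    draft_cost_ratio = time of one draft step / time of one target step.
    """
    tokens = expected_tokens_per_verification(accept_rate, draft_len)
    cost = draft_len * draft_cost_ratio + 1  # draft steps plus one target pass
    return tokens / cost


tokens = expected_tokens_per_verification(accept_rate=0.8, draft_len=5)
speedup = expected_speedup(accept_rate=0.8, draft_len=5, draft_cost_ratio=0.1)
print(round(tokens, 2))   # 3.69 tokens per target pass
print(round(speedup, 2))  # 2.46x
```

Under these assumptions the estimate falls inside the 2-3x range stated above; a lower acceptance rate or a slower draft model erodes the gain quickly.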

**Multi-Query Attention (MQA) / Grouped-Query (GQA):**
Reduce KV cache memory by sharing key/value heads (discussed in Chapter 25). Critical for long-context (100k+ tokens).
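
The saving is easy to quantify, because KV cache size scales linearly with the number of key/value heads. The configuration below (80 layers, 64 query heads, head dimension 128, FP16) is a hypothetical 70B-class model chosen for illustration.

```python
def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_value: int = 2, batch_size: int = 1) -> float:
    """KV cache size: keys and values for every layer, KV head, and token position."""
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bytes_per_value / 1e9


# Hypothetical 70B-class configuration at a 100k-token context, one sequence
config = dict(num_layers=80, head_dim=128, seq_len=100_000)

mha = kv_cache_gb(num_kv_heads=64, **config)  # one KV head per query head
gqa = kv_cache_gb(num_kv_heads=8, **config)   # 8 query heads share each KV head
mqa = kv_cache_gb(num_kv_heads=1, **config)   # all query heads share one KV head

print(round(mha, 1), round(gqa, 1), round(mqa, 1))  # 262.1 32.8 4.1
```

At this context length, full multi-head attention would need more KV cache for a single sequence than three 80GB GPUs can hold, while the grouped variant fits on one.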

#### **29.5.3 Deployment Topology**

```
User Request → Load Balancer (Least-Connections)
    → API Gateway (Rate limiting, API key validation)
        → Routing Logic:
            - Short prompt (<1k tokens) → A100 cluster (fast)
            - Long context (>32k) → H100 cluster (high memory)
            - Code generation → Specialized code-optimized model variant
        → Kubernetes Pod (inference engine such as vLLM or TensorRT-LLM)
            → GPU (optimized attention and GEMM kernels)
                → Model Shards (TP=8 for 70B models)
```

**Cost Optimization:**
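- **Spot Capacity:** Run latency-tolerant batch inference on spot instances and fall back to on-demand only when spot capacity is unavailable; keep interactive traffic on on-demand or reserved capacity
- **Model Quantization:** AWQ/GPTQ 4-bit quantization reduces weight memory roughly 4x relative to FP16, usually with small quality loss
- **Prefix Caching:** Keep the KV cache for shared prefixes (such as a common system prompt) and compute attention state only for the new tokens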

---

## **29.6 Advanced Design Patterns**

#### **29.6.1 The Strangler Fig Pattern**

Migrate from legacy ML system to new architecture incrementally:
1. **Shadow Mode:** New model runs in parallel with the old one; its outputs are logged but not served
2. **Canary:** Route 1% of traffic to the new system and monitor for errors
3. **Circuit Breaker:** If new system error rate > 0.1%, automatic fallback to legacy
4. **Dark Launch:** New features computed but not used, validating latency/cost
5. **Cutover:** Shift 100% traffic, keep legacy on standby for 30 days
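
The circuit breaker in step 3 can be sketched as a thin wrapper around the two systems. This is a minimal illustration with hypothetical names: it trips once and stays open, whereas a production breaker would usually add a sliding window and a half-open state for recovery.

```python
class CircuitBreaker:
    """Routes to the legacy system once the new system's error rate exceeds a threshold."""

    def __init__(self, error_threshold: float = 0.001, min_requests: int = 1000):
        self.error_threshold = error_threshold
        self.min_requests = min_requests  # avoid tripping on a tiny sample
        self.requests = 0
        self.errors = 0
        self.open = False                 # open = new system is bypassed

    def call(self, new_system, legacy_system, request):
        if self.open:
            return legacy_system(request)
        self.requests += 1
        try:
            return new_system(request)
        except Exception:
            self.errors += 1
            if (self.requests >= self.min_requests
                    and self.errors / self.requests > self.error_threshold):
                self.open = True
            return legacy_system(request)  # fall back for this request as well


# Hypothetical systems: the new one fails on every 100th request (1% error rate)
def flaky_new_system(request_id):
    if request_id % 100 == 0:
        raise RuntimeError("model server timeout")
    return ("new", request_id)


def legacy_system(request_id):
    return ("legacy", request_id)


breaker = CircuitBreaker(error_threshold=0.001, min_requests=1000)
results = [breaker.call(flaky_new_system, legacy_system, i) for i in range(1, 1501)]

print(breaker.open)  # True: 1% errors exceeds the 0.1% threshold
print(results[-1])   # ('legacy', 1500)
```

The `min_requests` guard is a design choice: without it, a single failure among the first few requests would trip the breaker on statistical noise.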

#### **29.6.2 Multi-Model Ensembles**

**Cascade Architecture:**
```
Light Model (1M params) → Confidence ≥ 0.9? → Serve
                        → Confidence < 0.9? → Heavy Model (1B params)
```
*Use case:* When roughly 90% of queries are easy, they are handled cheaply by the light model, and only the remaining 10% of hard queries pay for heavy compute.
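
A cascade reduces to a confidence check plus a cost calculation. In the sketch below the two models are toy stand-ins (hypothetical callables returning a label and a confidence), and the relative costs are assumed for illustration.

```python
def cascade_predict(x, light_model, heavy_model, threshold=0.9):
    """Return (label, model_used); escalate when the light model is unsure."""
    label, confidence = light_model(x)
    if confidence >= threshold:
        return label, "light"
    return heavy_model(x)[0], "heavy"


def expected_cost_per_query(light_cost, heavy_cost, escalation_rate):
    """Every query pays for the light model; escalated ones also pay for the heavy model."""
    return light_cost + escalation_rate * heavy_cost


# Toy stand-ins: the light model is confident only on short inputs
def light_model(text):
    return ("positive", 0.95) if len(text) < 20 else ("positive", 0.55)


def heavy_model(text):
    return ("negative", 0.99)


print(cascade_predict("great movie", light_model, heavy_model))
# ('positive', 'light')
print(cascade_predict("well, it started fine but then...", light_model, heavy_model))
# ('negative', 'heavy')

# Assumed relative costs: heavy model is 1000x the light model, 10% escalation
cost = expected_cost_per_query(light_cost=1, heavy_cost=1000, escalation_rate=0.1)
print(cost)  # about 101 cost units, versus 1000 if every query used the heavy model
```

The cascade only pays off if the light model's confidence is well calibrated; an overconfident light model silently serves wrong answers instead of escalating.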

**Mixture of Experts (MoE) Routing:**
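Train a router to send each input to a specialized model (e.g., code, math, or creative-writing models). Within a single MoE network the same idea is applied per token: a learned gating function activates only a few experts (for example, the top 2 of 8), so compute per token stays well below what the total parameter count suggests.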

#### **29.6.3 Event-Driven ML (Kappa for Inference)**
```
User Action → Kafka → Feature Enrichment (Flink) → Model Inference (KServe) 
    → Action taken → Feedback loop (reward signal) → Online Learning
```
Enables reinforcement learning loops with <100ms end-to-end latency.

---

## **29.7 Workbook Labs**

### **Lab 1: Design YouTube Recommendation System**
Design for 2B users, 1B hours watched daily:

**Requirements:**
- P99 latency < 100ms
- Handle viral videos (10x traffic spike in 5 minutes)
- Diversity: Don't show same channel repeatedly
- Freshness: New videos (<1 hour) should be recommendable

**Deliverables:**
- Data flow diagram (candidate generation → ranking → re-ranking)
- Feature list with online/offline categorization
- Scaling calculation: QPS, storage, cost estimate
- Failure mode analysis: What if candidate generator returns empty?

### **Lab 2: LLM Serving Cost Analysis**
Compare serving strategies for 10k QPS LLM API:

1. **Option A:** Dedicated A100 cluster (always-on)
2. **Option B:** Serverless (AWS SageMaker) with auto-scaling
3. **Option C:** Mixed (spot instances for batch, on-demand for real-time)

Calculate TCO for 1 year, including:
- GPU costs (reserved vs. on-demand)
- Networking (egress charges)
- Engineering maintenance overhead

**Deliverable:** Cost comparison spreadsheet with breakeven analysis.

### **Lab 3: Feature Platform Migration**
Design migration from batch-only to real-time features for fraud detection:

**Current State:** Daily batch features, 24-hour delay in fraud detection  
**Target State:** <5 minute feature freshness  
**Constraints:** Cannot lose historical data, must maintain training-serving consistency

**Deliverables:**
- Migration plan with rollback strategy
- Dual-write pattern (write to old and new simultaneously during transition)
- Validation strategy (compare batch vs. real-time feature values for 30 days)

### **Lab 4: Multi-Region AI Architecture**
Design for EU data residency requirements with US model training:

**Challenge:**
- Training data (EU users) cannot leave EU
- But models trained on global data perform 15% better
- Inference must be <50ms globally

**Deliverables:**
- Data flow diagram showing GDPR-compliant pipeline
- Model sharding strategy (EU-specific vs. Global models)
- Consistency mechanism for user traveling EU→US (feature synchronization)

---

## **29.8 Common Pitfalls**

1. **Over-Engineering for Scale:** Building for 1B users when you have 10k. **Solution:** "Solve for 10x, design for 100x, worry about 1000x later." Start with monolith, split when team size forces it.

2. **Ignoring the Long Tail:** Optimizing P50 latency while P99 suffers (bad for user experience). **Solution:** Always measure P99/P99.9, use tail latency hedging (send request to two backends, use first response).

3. **Training-Serving Skew Redux:** Even with feature stores, subtle differences between training and serving pipelines cause skew (mismatched library versions, float32 vs. float64 casts, different default values for missing data). **Solution:** Integration tests on production data samples, schema validation with strict types.

4. **Cold Start Neglect:** New users/items have no features. **Solution:** Content-based features for cold items, onboarding flows for cold users, exploration-exploitation trade-offs (multi-armed bandits).

5. **Monitoring Blind Spots:** Tracking infrastructure metrics (CPU/GPU) but not business metrics (revenue per prediction). **Solution:** Unified dashboard with both technical and business KPIs.
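
Tail latency hedging from pitfall 2 can be sketched with `asyncio`. The version below sends the backup request only after a short delay, a common variant that limits extra load; setting the delay to zero approximates sending to both backends at once. The backends and their latencies are hypothetical.

```python
import asyncio


async def hedged_call(backends, request, hedge_delay_s):
    """Send to the first backend; if it has not replied within hedge_delay_s,
    also send to the second and return whichever finishes first."""
    primary = asyncio.create_task(backends[0](request))
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay_s)
    if done:
        return primary.result()
    secondary = asyncio.create_task(backends[1](request))
    done, pending = await asyncio.wait(
        {primary, secondary}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the slower request to free backend capacity
    return done.pop().result()


# Hypothetical backends: one replica is experiencing a latency spike
async def slow_backend(request):
    await asyncio.sleep(0.5)
    return f"slow:{request}"


async def fast_backend(request):
    await asyncio.sleep(0.01)
    return f"fast:{request}"


result = asyncio.run(
    hedged_call([slow_backend, fast_backend], "q1", hedge_delay_s=0.05)
)
print(result)  # fast:q1
```

Choosing the hedge delay near the P95 latency keeps the extra request volume to a few percent while still cutting off the worst of the tail.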

---

## **29.9 Interview Questions**

**Q1:** Design a Twitter feed ranking system (home timeline) for 500M DAU.
*A: Hybrid push-pull model. For normal users: Fanout-on-write (push), where each new tweet ID is written into followers' materialized timelines. For celebrities (>1M followers): Fanout-on-read (pull), where their tweets are merged in at request time to avoid millions of timeline writes per tweet. Candidate generation: Follow graph + interest graph embeddings (approximate nearest neighbors). Ranking: Heavy GBDT/NN with real-time features (recency, engagement likelihood). Write path: Tweet → Kafka → Compute embeddings → Update follower timelines (async, skipped for celebrity accounts). Read path: Fetch timeline IDs → Merge recent tweets from followed celebrities → Multi-get from Redis → Rank → Return. Handle spikes: Rate limiting, eventual consistency acceptable (5s delay OK).*

**Q2:** How would you design a system to detect AI-generated text at internet scale?
*A: Architecture: Distributed inference on text streams. Models: Lightweight logistic regression on stylometric features (fast filter) → Transformer classifier (heavy verification) for suspicious content. Features: Perplexity (but modern LLMs have low perplexity), burstiness (human writing has variance in sentence length), watermark detection (if model provider embedded signals). Scale: Spark Streaming for batch processing of archived content, edge deployment for real-time chat filtering. Challenges: Adversarial attacks (paraphrasing breaks detection), false positives (penalizing non-native speakers), evasion via iterative refinement. Mitigation: Ensemble of detectors, human review queue for borderline cases.*

**Q3:** Compare microservices vs. monoliths for ML serving infrastructure.
*A: Monolith: Single deployable unit containing feature engineering, inference, post-processing. Pros: Easy testing, no network latency between stages, simple debugging. Cons: Tech lock-in, cannot scale components independently (if feature store needs 10x but model doesn't). Microservices: Separate services for Features, Inference, Business Logic. Pros: Independent scaling, team autonomy, technology heterogeneity (Python for ML, Go for API). Cons: Network overhead, distributed tracing complexity, cascading failures. Hybrid: "Modular monolith"—logically separate but same process, async communication via queues for heavy tasks. Choose microservices when teams > 50 engineers, otherwise monolith.*

**Q4:** Design data infrastructure for autonomous vehicles (Tesla scale).
*A: Ingestion: 8 cameras × 30fps × 4M cars = massive bandwidth. Edge filtering: Only upload "interesting" clips (disengagements, uncertainty triggers, new scenarios) via LTE/WiFi, not raw stream. Cloud: Object storage (S3) for video, Parquet for CAN bus telemetry. Labeling: Automated (3D reconstruction from fleet data) + Human-in-loop for edge cases. Training: Distributed training on video clips, curriculum learning (easy scenarios first). Simulation: Parallel simulation of safety-critical scenarios (rare events) to augment real data. Validation: Shadow mode in production fleet (model runs but doesn't act), compare to human driver. Rollout: A/B test by geographic region, weather condition.*

**Q5:** How do you handle model updates without downtime in a real-time bidding (RTB) system processing 1M QPS?
*A: Blue-green deployment with traffic mirroring. New model (green) deployed to separate pod pool. Shadow traffic: Duplicate 1% of production traffic to green, compare predictions (log differences). Canary: Shift 1% traffic to green, monitor business metrics (CTR, revenue) for 30 minutes. If metrics neutral or positive, gradual shift 10% → 50% → 100%. Rollback: Instant via load balancer if error rate > threshold. Data consistency: Feature store versioned, new model consumes v2 features while old consumes v1 during transition. State management: Model weights in shared memory (mmap) for zero-copy hot-swapping in some frameworks.*

---

## **29.10 Further Reading**

**Books:**
- *Designing Data-Intensive Applications* (Kleppmann) - Stream processing, consistency models
- *The Data Warehouse Toolkit* (Kimball) - Dimensional modeling for ML features

**Papers:**
- "Monolith: Real Time Recommendation System With Collisionless Embedding Table" (ByteDance, 2022) - End-to-end unified system
- "GPU Cluster Scheduling for Deep Learning" (Microsoft Philly)
- "Efficient Large Scale Language Model Training on GPU Clusters Using Megatron-LM" (NVIDIA)

**Architecture References:**
- **Netflix Tech Blog:** "How Netflix Scales Its ML Infrastructure"
- **Uber Engineering:** "Michelangelo" (ML Platform case study)
- **OpenAI:** "Scaling Kubernetes to 7,500 Nodes" (LLM training infrastructure)

---

## **29.11 Checkpoint Project: Planet-Scale System Design**

Design a complete AI platform for a global food delivery company (DoorDash/UberEats scale).

**Context:**
- 50M daily active users, 10M restaurants, 5M drivers
- Real-time ETA prediction (food preparation + travel time)
- Dynamic pricing (surge during demand spikes)
- Fraud detection (stolen cards, fake restaurants)

**Requirements:**

1. **Feature Platform:**
   - Real-time: Driver GPS (5s freshness), Restaurant busy status
   - Batch: User taste profiles, neighborhood demand patterns
   - Consistency: Training data must reflect features available at order time (point-in-time correctness)

2. **Model Portfolio:**
   - ETA: GBDT (lightweight, explainable) for base, Neural net for refinement
   - Pricing: Contextual bandit (explore-exploit) for surge multiplier
   - Fraud: Deep neural net with embeddings for transaction sequences

3. **Serving:**
   - ETA: <50ms p99 (user waiting for quote)
   - Pricing: <10ms (blocking order confirmation)
   - Throughput: 100k orders/minute peak

4. **Reliability:**
   - Multi-region (US-East, US-West, EU)
   - Automatic failover if ETA model fails (fallback to historical averages)
   - Circuit breakers for external APIs (restaurant POS systems)

5. **Observability:**
   - Business metrics: Orders per hour, customer satisfaction, driver utilization
   - ML metrics: ETA prediction error (MAE), fraud catch rate, false positive rate
   - Cost tracking: Cost per prediction, total inference spend vs. revenue

**Deliverables:**
- Architecture diagram (C4 model: Context, Containers, Components, Code)
- Technology selection matrix with 3 alternatives per component
- Capacity planning spreadsheet (QPS → GPU/CPU count → Cost)
- Runbook: "Incident Response: ETA Service Outage During Peak Dinner"
- 30-minute presentation to "VP of Engineering" justifying architectural trade-offs

**Evaluation Criteria:**
- Defensible technology choices (Redis vs. Memcached, Spark vs. Flink)
- Clear failure modes and mitigation strategies
- Cost efficiency (<5% of order value spent on ML infrastructure)
- Compliance: PCI-DSS for payment fraud, GDPR for EU data

---

**End of Chapter 29**

*You have mastered AI system architecture at scale. Chapter 30 covers Building Production AI Portfolios—end-to-end projects that demonstrate your capabilities.*

