# 📘 FILE 1-D – GPU, Debugging & Exercises

## 🎯 Objectives

After this lesson you will understand:
- **CPU vs GPU** - When to use which
- How to check for and use a GPU
- **Device placement** - Assigning operations to a specific device
- **Common errors** - Mistakes beginners often make
- **Debugging tips** - Effective debugging techniques
- **Best practices** - Rules for writing good code

---

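## 📌 Why learn about GPUs?

- Deep Learning requires **enormous amounts of computation**
- For large matrix operations a GPU is often **10-100x** faster than a CPU (the exact gain depends on the hardware and the problem size)
- Training production models **in practice requires** a GPU (or another accelerator)
- Understanding GPUs → less time spent waiting on training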

---

In [None]:
import tensorflow as tf
import numpy as np
import time
import matplotlib.pyplot as plt

print(f"TensorFlow version: {tf.__version__}")

---

## 1️⃣ CPU vs GPU

### 🔹 CPU (Central Processing Unit)

**Characteristics:**
- 4-16 powerful cores (typical desktop CPU)
- High clock speed (3-5 GHz)
- Optimized for **sequential tasks**

**When to use:**
- Small models (< 1M parameters)
- Data preprocessing
- Inference with small batch sizes

---

### 🔹 GPU (Graphics Processing Unit)

**Characteristics:**
- 1000+ small cores
- Lower clock speed (~1-2 GHz)
- Optimized for **parallel tasks**

**When to use:**
- Large models (> 10M parameters)
- Matrix operations (convolution, matmul)
- Training with large batch sizes

---

### 🔹 Visual comparison

```
CPU: [====================] 1 task

GPU: [=][=][=][=][=][=][=][=][=][=]
     [=][=][=][=][=][=][=][=][=][=]  1000+ tasks parallel
     [=][=][=][=][=][=][=][=][=][=]
```
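The picture above is easiest to see with matrix multiplication. The NumPy sketch below runs on the CPU only (no GPU is involved); it computes `a @ b` one output element at a time, the way a single sequential worker would. Every `out[i, j]` depends only on row `i` of `a` and column `j` of `b`, so all elements are independent and could be computed simultaneously, which is the kind of work a GPU's many cores are built for. The helper name `matmul_loops` is illustrative.

```python
import numpy as np

def matmul_loops(a, b):
    """Compute a @ b one output element at a time, like a single sequential worker."""
    m, n = a.shape
    n_b, p = b.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {n} vs {n_b}")
    out = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            # out[i, j] needs only row i of a and column j of b, so every
            # (i, j) pair is independent and could run on its own core
            out[i, j] = np.dot(a[i, :], b[:, j])
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 32))
b = rng.normal(size=(32, 16))

# One vectorized call computes the same result
print(np.allclose(matmul_loops(a, b), a @ b))  # True
```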

---

## 2️⃣ Checking for a GPU

### 🔹 Ways to check

In [None]:
# Method 1: list all physical devices
print("=" * 60)
print("ALL PHYSICAL DEVICES")
print("=" * 60)
devices = tf.config.list_physical_devices()
for device in devices:
    print(f"  {device.device_type:5s}: {device.name}")

# Method 2: GPUs only
print("\n" + "=" * 60)
print("GPU DEVICES")
print("=" * 60)
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    for i, gpu in enumerate(gpus):
        print(f"  GPU {i}: {gpu.name}")
else:
    print("  No GPU found")

# Method 3: built-in helpers
print("\n" + "=" * 60)
print("TF BUILT-IN TEST")
print("=" * 60)
print(f"  Built with CUDA: {tf.test.is_built_with_cuda()}")  # build flag, not GPU availability
print(f"  GPU available:   {bool(gpus)}")
print(f"  GPU device name: {tf.test.gpu_device_name() or '(none)'}")

In [None]:
# Detailed GPU information (if a GPU is available)
if gpus:
    gpu_details = tf.config.experimental.get_device_details(gpus[0])
    print("GPU Details:")
    for key, value in gpu_details.items():
        print(f"  {key}: {value}")
else:
    print("No GPU available - running on CPU")
    print("Don't worry! The code still runs normally, just more slowly.")
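A small helper that summarizes devices as counts can be handy at the top of a notebook. This is a sketch that uses only `tf.config.list_physical_devices` and `tf.config.list_logical_devices`; the name `describe_devices` is hypothetical. One caveat: listing *logical* devices initializes the TensorFlow runtime, after which memory growth and memory limits can no longer be changed, so configure those first.

```python
import tensorflow as tf

def describe_devices():
    """Count devices per type, e.g. {'physical': {'CPU': 1, 'GPU': 1}, 'logical': {...}}."""
    summary = {}
    for kind, devices in [
        ("physical", tf.config.list_physical_devices()),
        ("logical", tf.config.list_logical_devices()),  # note: initializes the runtime
    ]:
        counts = {}
        for device in devices:
            counts[device.device_type] = counts.get(device.device_type, 0) + 1
        summary[kind] = counts
    return summary

info = describe_devices()
print(info)
print("GPU present:", info["physical"].get("GPU", 0) > 0)
```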

---

## 3️⃣ GPU Memory Management

### 🔹 Problem: running out of GPU memory

**By default:** TensorFlow maps nearly **all** the memory of every GPU visible to the process

**Consequences:**
- Other processes cannot run on the same GPU
- Out of Memory (OOM) errors

### 🔹 Solution: Memory Growth

In [None]:
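# Enable memory growth.
# This must run before any GPU has been initialized (ideally right after importing
# TensorFlow); once a GPU is in use, TensorFlow raises a RuntimeError.
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("Memory growth enabled")
        print("TensorFlow will now allocate GPU memory only as needed")
    except RuntimeError as e:
        print(f"Error: {e}")
else:
    print("No GPU available")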
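Memory growth can also be switched on without changing the code, through the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable described in the TensorFlow GPU guide (the guide notes that this option is platform specific). A minimal sketch, assuming the variable is set before TensorFlow initializes the GPU, which is why it comes before the import here. From a shell the equivalent is `TF_FORCE_GPU_ALLOW_GROWTH=true python train.py`, where `train.py` is a placeholder.

```python
import os

# Must be set before TensorFlow touches the GPU, so set it before the import
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # noqa: E402  (imported after the env var on purpose)

print("GPUs:", tf.config.list_physical_devices("GPU"))
```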

In [None]:
# Cap GPU memory at a fixed amount (e.g., 2 GB)
if gpus:
    try:
        # Uncomment to try it (like memory growth, this must run before the GPU is initialized)
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=2048)]  # MB
        # )
        print("Memory limit: 2 GB (only after uncommenting the lines above)")
    except RuntimeError as e:
        print(f"Error: {e}")
else:
    print("No GPU available")

---

## 4️⃣ Device Placement

### 🔹 Automatic vs. manual

**Default:** TensorFlow picks the device automatically
- GPU if one is available
- CPU otherwise

**Manual:** use `tf.device()`

In [None]:
# Check the default device
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(f"Default device: {x.device}")

In [None]:
# Force CPU
with tf.device('/CPU:0'):
    x_cpu = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y_cpu = tf.matmul(x_cpu, x_cpu)

print(f"x_cpu device: {x_cpu.device}")
print(f"y_cpu device: {y_cpu.device}")

In [None]:
# Force GPU (if available)
if gpus:
    with tf.device('/GPU:0'):
        x_gpu = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        y_gpu = tf.matmul(x_gpu, x_gpu)
    
    print(f"x_gpu device: {x_gpu.device}")
    print(f"y_gpu device: {y_gpu.device}")
else:
    print("No GPU available - skipping GPU test")
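Rather than hard-coding `'/GPU:0'`, a common pattern is to choose the device string once, based on what is available, and reuse it. A minimal sketch (`pick_device` is a hypothetical helper, not a TensorFlow API); in most code you can simply rely on automatic placement, and this is for when you want the choice to be explicit. To see where each operation actually runs, `tf.debugging.set_log_device_placement(True)` makes TensorFlow log every placement decision.

```python
import tensorflow as tf

def pick_device():
    """Hypothetical helper: prefer the first GPU, otherwise fall back to the CPU."""
    return "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

device = pick_device()
with tf.device(device):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)

print(f"Requested {device}, result lives on {y.device}")
print(y.numpy())  # x @ x = [[7, 10], [15, 22]]
```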

### 🔹 Benchmark: CPU vs GPU

In [None]:
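# Helper to benchmark matrix multiplication
def benchmark_matmul(device_name, matrix_size, num_iterations=100):
    """Return the average time (ms) of one matmul on the given device."""
    with tf.device(device_name):
        # Create random matrices
        a = tf.random.normal([matrix_size, matrix_size])
        b = tf.random.normal([matrix_size, matrix_size])
        
        # Warm up (the first call includes one-time setup costs)
        _ = tf.matmul(a, b).numpy()
        
        # Benchmark. GPU ops run asynchronously, so call .numpy() on the last
        # result to wait until the work has actually finished before stopping the clock.
        start = time.perf_counter()
        for _ in range(num_iterations):
            c = tf.matmul(a, b)
        _ = c.numpy()
        end = time.perf_counter()
        
    avg_time = (end - start) / num_iterations * 1000  # ms
    return avg_time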

# Test with several matrix sizes
sizes = [100, 500, 1000, 2000]
cpu_times = []
gpu_times = []

print("Benchmarking CPU...")
for size in sizes:
    cpu_time = benchmark_matmul('/CPU:0', size, num_iterations=10)
    cpu_times.append(cpu_time)
    print(f"  Size {size:4d}: {cpu_time:6.2f} ms")

if gpus:
    print("\nBenchmarking GPU...")
    for size in sizes:
        gpu_time = benchmark_matmul('/GPU:0', size, num_iterations=10)
        gpu_times.append(gpu_time)
        print(f"  Size {size:4d}: {gpu_time:6.2f} ms")
else:
    print("\nNo GPU available - skipping GPU benchmark")
    gpu_times = [0] * len(sizes)

In [None]:
# Visualization
if gpus and any(gpu_times):
    plt.figure(figsize=(12, 5))
    
    plt.subplot(1, 2, 1)
    x = np.arange(len(sizes))
    width = 0.35
    plt.bar(x - width/2, cpu_times, width, label='CPU', alpha=0.8)
    plt.bar(x + width/2, gpu_times, width, label='GPU', alpha=0.8)
    plt.xlabel('Matrix Size')
    plt.ylabel('Time (ms)')
    plt.title('CPU vs GPU Performance')
    plt.xticks(x, sizes)
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    plt.subplot(1, 2, 2)
    speedup = [cpu / gpu if gpu > 0 else 0 for cpu, gpu in zip(cpu_times, gpu_times)]
    plt.plot(sizes, speedup, 'go-', linewidth=2, markersize=8)
    plt.xlabel('Matrix Size')
    plt.ylabel('Speedup (CPU time / GPU time)')
    plt.title('GPU Speedup')
    plt.grid(True, alpha=0.3)
    plt.axhline(y=1, color='r', linestyle='--', alpha=0.5, label='No speedup')
    plt.legend()
    
    plt.tight_layout()
    plt.show()
    
    print("\nSpeedup (CPU time / GPU time):")
    for size, speed in zip(sizes, speedup):
        print(f"  Size {size:4d}: {speed:.2f}x")
else:
    print("No GPU available - skipping visualization")
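Raw milliseconds are hard to compare across sizes. Multiplying two n×n matrices produces n² outputs, each needing n multiplications and n-1 additions, so roughly 2n³ floating-point operations; dividing by the measured time gives throughput in GFLOP/s. A small sketch of that conversion (`matmul_gflops` is an illustrative name):

```python
def matmul_gflops(n, avg_ms):
    """GFLOP/s achieved by one (n x n) @ (n x n) matmul that took avg_ms milliseconds."""
    flops = 2 * n ** 3          # n^2 outputs x (n multiplies + about n adds)
    seconds = avg_ms / 1000.0
    return flops / seconds / 1e9

print(matmul_gflops(1000, 2.0))  # about 1000 GFLOP/s
```

Applied to the benchmark above, `[matmul_gflops(n, t) for n, t in zip(sizes, cpu_times)]` gives CPU throughput per size (and likewise for `gpu_times` when a GPU was found). Throughput often rises with matrix size until the hardware is saturated.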

---

## 5️⃣ Common Errors & Solutions

### ❌ Error 1: Shape Mismatch

In [None]:
# WRONG
try:
    a = tf.constant([[1, 2, 3]])
    b = tf.constant([[1], [2]])
    c = tf.matmul(a, b)
except Exception as e:
    print(f"Error: {e}")
    print(f"\nCause: a.shape={a.shape}, b.shape={b.shape}")
    print("matmul requires: (m, n) @ (n, p) = (m, p)")

In [None]:
# CORRECT
a = tf.constant([[1, 2, 3]])
b = tf.constant([[1], [2], [3]])
c = tf.matmul(a, b)

print(f"a.shape: {a.shape}")
print(f"b.shape: {b.shape}")
print(f"c.shape: {c.shape}")
print(f"Result: {c.numpy()}")
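Instead of waiting for `tf.matmul` to fail deep inside a model, you can state the expected shapes up front with `tf.debugging.assert_shapes`, which accepts symbolic dimension names such as `'N'` and checks that they agree across tensors. A minimal sketch (the wrapper name `safe_matmul` is hypothetical); with eager tensors the mismatch is detected statically and reported as a `ValueError`, and the sketch also catches `InvalidArgumentError` to be safe:

```python
import tensorflow as tf

def safe_matmul(a, b):
    """Check (M, N) @ (N, P) before multiplying; raise a readable error otherwise."""
    tf.debugging.assert_shapes([
        (a, ("M", "N")),
        (b, ("N", "P")),
    ])
    return tf.matmul(a, b)

a = tf.constant([[1, 2, 3]])
ok = safe_matmul(a, tf.constant([[1], [2], [3]]))
print(ok.numpy())  # 1*1 + 2*2 + 3*3 = 14

try:
    safe_matmul(a, tf.constant([[1], [2]]))
    caught = False
except (ValueError, tf.errors.InvalidArgumentError) as e:
    caught = True
    print("Shape check failed:", type(e).__name__)
```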

### ❌ Error 2: Dtype Mismatch

In [None]:
# WRONG
try:
    a = tf.constant([1, 2, 3], dtype=tf.int32)
    b = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
    c = a + b
except Exception as e:
    print(f"Error: {e}")
    print(f"\nCause: a.dtype={a.dtype}, b.dtype={b.dtype}")

In [None]:
# CORRECT - Option 1: cast
a = tf.constant([1, 2, 3], dtype=tf.int32)
b = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
c = tf.cast(a, tf.float32) + b
print(f"Result: {c.numpy()}")

# CORRECT - Option 2: use the same dtype from the start
a = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
b = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
c = a + b
print(f"Result: {c.numpy()}")
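One more dtype pitfall: `tf.cast` from float to int does not round; it drops the fractional part (truncation toward zero). If you want rounding, round first, keeping in mind that `tf.round` sends halves to the nearest even number. A quick check:

```python
import tensorflow as tf

x = tf.constant([1.7, -1.7, 2.5])

truncated = tf.cast(x, tf.int32)          # fractional part dropped, toward zero
rounded = tf.cast(tf.round(x), tf.int32)  # round first (halves go to the even neighbor)

print(truncated.numpy().tolist())  # [1, -1, 2]
print(rounded.numpy().tolist())    # [2, -2, 2]
```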

### ❌ Error 3: Gradient is None

In [None]:
# WRONG
x = tf.constant(2.0)  # a plain tensor (not a Variable)

with tf.GradientTape() as tape:
    # FORGOT tape.watch(x)
    y = x ** 2

grad = tape.gradient(y, x)
print(f"Gradient: {grad}")  # None!
print("\nCause: the tape does not watch plain tensors unless told to")

In [None]:
# CORRECT - Option 1: watch the tensor explicitly
x = tf.constant(2.0)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2

grad = tape.gradient(y, x)
print(f"Gradient: {grad.numpy()}")

# CORRECT - Option 2: use a Variable (trainable Variables are watched automatically)
x = tf.Variable(2.0)

with tf.GradientTape() as tape:
    y = x ** 2

grad = tape.gradient(y, x)
print(f"Gradient: {grad.numpy()}")
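Another frequent cause of a `None` gradient is leaving TensorFlow in the middle of the computation. `GradientTape` only records TensorFlow operations, so a value that passes through NumPy (here via `.numpy()` and `np.square`) is disconnected from `x`, even when `x` is a `Variable`. A minimal sketch:

```python
import numpy as np
import tensorflow as tf

x = tf.Variable(2.0)

# WRONG: the square is computed in NumPy, outside the tape's view
with tf.GradientTape() as tape:
    y_bad = tf.constant(np.square(x.numpy()))
grad_bad = tape.gradient(y_bad, x)

# CORRECT: stay with TensorFlow ops inside the tape
with tf.GradientTape() as tape:
    y_good = tf.square(x)
grad_good = tape.gradient(y_good, x)

print("NumPy path:", grad_bad)             # None
print("TF path:   ", grad_good.numpy())    # 4.0 (d/dx x^2 = 2x at x = 2)
```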

### ❌ Error 4: OOM (Out of Memory)

In [None]:
# EXAMPLE that can cause OOM (only printed here, not executed)
print("Example code that can cause OOM:")
print("""
# WRONG - batch size too large
model.fit(X, y, batch_size=10000, epochs=100)

# CORRECT - smaller batch size
model.fit(X, y, batch_size=32, epochs=100)
""")

print("\nHow to fix OOM:")
print("1. Reduce batch_size")
print("2. Reduce the model size")
print("3. Enable memory growth (shown above; helps when several processes share a GPU)")
print("4. Use mixed precision training (covered later)")
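For scripts that must run on GPUs of different sizes, one pragmatic pattern is to catch `tf.errors.ResourceExhaustedError` (the error TensorFlow raises when it runs out of memory) and retry with a smaller batch. The sketch below is a hypothetical helper, not a Keras feature; it assumes the OOM surfaces as that exception, and it keeps whatever weight updates happened before the failure.

```python
import numpy as np
import tensorflow as tf

def fit_with_batch_fallback(model, X, y, batch_size, min_batch_size=8, **fit_kwargs):
    """Hypothetical helper: call model.fit, halving batch_size after each OOM error.

    Returns (history, batch_size_that_worked). Weights updated before an OOM are
    kept, so re-create or reload the model if you need a clean start.
    """
    while batch_size >= min_batch_size:
        try:
            history = model.fit(X, y, batch_size=batch_size, **fit_kwargs)
            return history, batch_size
        except tf.errors.ResourceExhaustedError:
            print(f"OOM at batch_size={batch_size}; retrying with {batch_size // 2}")
            batch_size //= 2
    raise RuntimeError(f"Out of memory even at the minimum batch size ({min_batch_size})")

# Usage on a tiny model (this will not actually hit OOM)
model = tf.keras.Sequential([tf.keras.Input(shape=(5,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
X = np.random.randn(64, 5).astype(np.float32)
y = np.random.randn(64, 1).astype(np.float32)
history, used = fit_with_batch_fallback(model, X, y, batch_size=32, epochs=1, verbose=0)
print("batch_size used:", used)
```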

### ❌ Error 5: Model not compiled

In [None]:
# WRONG
try:
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    # FORGOT to compile
    X = np.random.randn(10, 5).astype(np.float32)
    y = np.random.randn(10, 1).astype(np.float32)
    model.fit(X, y, epochs=1, verbose=0)
except Exception as e:
    print(f"Error: {e}")
    print("\nCause: the model was never compiled")

In [None]:
# CORRECT
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')  # REQUIRED before fit()!

X = np.random.randn(10, 5).astype(np.float32)
y = np.random.randn(10, 1).astype(np.float32)
model.fit(X, y, epochs=1, verbose=0)

print("Model trained successfully!")

---

## 6️⃣ Debugging Tips

### 🔹 Tip 1: Print shapes often

In [None]:
# BEST PRACTICE
def debug_shapes(model, sample_input):
    """Print the output shape of each layer"""
    print("Layer Shapes:")
    print("=" * 60)
    x = sample_input
    for i, layer in enumerate(model.layers):
        x = layer(x)
        print(f"  Layer {i} ({layer.name:15s}): {x.shape}")
    print("=" * 60)

# Test
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])

sample = tf.random.normal([5, 10])  # batch_size=5
debug_shapes(model, sample)
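Keras can report the same per-layer shapes, plus parameter counts, with `model.summary()`. For a `Dense` layer the count is inputs × units (kernel) + units (bias), so the model from the cell above has (10·64 + 64) + (64·32 + 32) + (32·1 + 1) = 704 + 2,080 + 33 = 2,817 parameters. The sketch declares the input with `tf.keras.Input` rather than `input_shape=`, since newer Keras versions warn about the latter:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

model.summary()  # per-layer output shapes and parameter counts

# Dense params = inputs * units (kernel) + units (bias)
expected = (10 * 64 + 64) + (64 * 32 + 32) + (32 * 1 + 1)
print(expected, model.count_params())  # 2817 2817
```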

### 🔹 Tip 2: Check for NaN/Inf

In [None]:
def check_nan_inf(tensor, name="tensor"):
    """Check a tensor for NaN and Inf values"""
    has_nan = tf.reduce_any(tf.math.is_nan(tensor))
    has_inf = tf.reduce_any(tf.math.is_inf(tensor))
    
    if has_nan:
        print(f"⚠️  WARNING: {name} contains NaN!")
    if has_inf:
        print(f"⚠️  WARNING: {name} contains Inf!")
    if not has_nan and not has_inf:
        print(f"✓ {name} is clean (no NaN/Inf)")

# Test
good_tensor = tf.constant([1.0, 2.0, 3.0])
bad_tensor = tf.constant([1.0, float('nan'), 3.0])
inf_tensor = tf.constant([1.0, float('inf'), 3.0])

check_nan_inf(good_tensor, "good_tensor")
check_nan_inf(bad_tensor, "bad_tensor")
check_nan_inf(inf_tensor, "inf_tensor")
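TensorFlow also ships a built-in check: `tf.debugging.check_numerics` raises `InvalidArgumentError` as soon as a tensor contains NaN or Inf (and `tf.debugging.enable_check_numerics()` turns such checks on globally). The sketch below triggers it with a classic source of `-inf`, `log(0)`, and then applies the usual fix of clipping the input away from zero; the epsilon `1e-7` is an arbitrary illustrative choice.

```python
import tensorflow as tf

probs = tf.constant([0.5, 0.0, 1.0])
log_probs = tf.math.log(probs)  # log(0.0) evaluates to -inf

try:
    tf.debugging.check_numerics(log_probs, message="log_probs")
    found_bad = False
except tf.errors.InvalidArgumentError as e:
    found_bad = True
    print("check_numerics raised:", type(e).__name__)

# Common fix: keep the argument of log away from 0
eps = 1e-7
safe_log_probs = tf.math.log(tf.clip_by_value(probs, eps, 1.0))
tf.debugging.check_numerics(safe_log_probs, message="safe_log_probs")  # passes
print(safe_log_probs.numpy())
```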

### 🔹 Tip 3: Visualize gradients

In [None]:
def plot_gradients(model, X, y):
    """Plot the gradient norm of each trainable variable"""
    with tf.GradientTape() as tape:
        predictions = model(X)
        # Reduce to a scalar loss (mse alone returns one value per sample)
        loss = tf.reduce_mean(tf.keras.losses.mse(y, predictions))
    
    grads = tape.gradient(loss, model.trainable_variables)
    
    # Keep names and norms aligned: skip variables whose gradient is None
    pairs = [(v.name, tf.norm(g).numpy())
             for v, g in zip(model.trainable_variables, grads) if g is not None]
    grad_norms = [norm for _, norm in pairs]
    
    plt.figure(figsize=(12, 4))
    plt.bar(range(len(grad_norms)), grad_norms)
    plt.xlabel('Trainable variable (kernel and bias of each layer)')
    plt.ylabel('Gradient Norm')
    plt.title('Gradient Magnitudes per Variable')
    plt.xticks(range(len(grad_norms)), [f"V{i}" for i in range(len(grad_norms))], rotation=45)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    print("Gradient norms:")
    for i, (name, norm) in enumerate(pairs):
        print(f"  V{i} {name:30s}: {norm:.6f}")

# Test
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

X_sample = tf.random.normal([32, 10])
y_sample = tf.random.normal([32, 1])

plot_gradients(model, X_sample, y_sample)
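If the bars above are very large (exploding gradients), a standard remedy is gradient clipping. `tf.clip_by_global_norm` rescales a list of gradients so that their combined norm is at most `clip_norm`, preserving their direction. A toy example with hand-picked values; in a custom loop you would clip `grads` before `optimizer.apply_gradients(zip(grads, model.trainable_variables))`, and Keras optimizers also accept `clipnorm`/`global_clipnorm` arguments for the same purpose:

```python
import tensorflow as tf

# Toy "gradients" for two variables; global norm = sqrt(3^2 + 4^2 + 0^2) = 5
grads = [tf.constant([3.0, 4.0]), tf.constant([0.0])]

clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)

print("global norm before:", float(global_norm))                     # 5.0
print("global norm after: ", float(tf.linalg.global_norm(clipped)))  # ~1.0
print("clipped first grad:", clipped[0].numpy())                     # ~[0.6, 0.8]
```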

---

## 7️⃣ Best Practices

### ✅ 1. Always normalize input data

In [None]:
from sklearn.preprocessing import StandardScaler

# BAD
X_bad = np.random.randn(100, 10) * 1000  # Large scale

# GOOD
scaler = StandardScaler()
X_good = scaler.fit_transform(X_bad)

print(f"Before normalization: mean={X_bad.mean():.2f}, std={X_bad.std():.2f}")
print(f"After normalization:  mean={X_good.mean():.2f}, std={X_good.std():.2f}")
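The cell above fits the scaler on all the data for brevity. With real train/test splits, fit it on the training split only and reuse those statistics for validation and test data; otherwise information from the test set leaks into training. A minimal sketch with synthetic data (the sizes and seed are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=1000.0, size=(200, 10))  # large-scale features

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)  # statistics from the training split only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # reuse the training mean/std; never refit here

print("train mean/std:", X_train_s.mean().round(3), X_train_s.std().round(3))
print("test  mean/std:", X_test_s.mean().round(3), X_test_s.std().round(3))
```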

### ✅ 2. Use appropriate activation functions

In [None]:
print("""
CHEAT SHEET:

Task                        Hidden Layers    Output Layer
================================================================
Regression                  ReLU             None (linear)
Binary Classification       ReLU             Sigmoid
Multi-class Classification  ReLU             Softmax
""")
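The cheat sheet maps directly onto code. This sketch pairs each output activation with the loss usually used alongside it; the feature count (4), hidden size (16), and number of classes (3) are arbitrary, and `sparse_categorical_crossentropy` assumes integer class labels (use `categorical_crossentropy` for one-hot labels):

```python
import tensorflow as tf

# (units, output activation, matching loss) per task, mirroring the cheat sheet
OUTPUT_HEADS = {
    "regression": (1, None, "mse"),
    "binary": (1, "sigmoid", "binary_crossentropy"),
    "multiclass": (3, "softmax", "sparse_categorical_crossentropy"),  # 3 classes
}

def make_model(task, num_features=4):
    units, activation, loss = OUTPUT_HEADS[task]
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(16, activation="relu"),         # hidden layer: ReLU
        tf.keras.layers.Dense(units, activation=activation),  # output layer per task
    ])
    model.compile(optimizer="adam", loss=loss)
    return model

x = tf.random.normal([5, 4])
binary_out = make_model("binary")(x)      # values in [0, 1]
multi_out = make_model("multiclass")(x)   # each row sums to 1
print(binary_out.numpy().ravel())
print(tf.reduce_sum(multi_out, axis=1).numpy())
```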

### ✅ 3. Monitor training

In [None]:
# GOOD - use a validation split
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

X = np.random.randn(1000, 10).astype(np.float32)
y = np.random.randn(1000, 1).astype(np.float32)

history = model.fit(
    X, y,
    epochs=20,
    batch_size=32,
    validation_split=0.2,  # IMPORTANT: hold out 20% of the data for validation
    verbose=0
)

# Plot
plt.figure(figsize=(8, 5))
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Monitoring')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

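# Note: X and y above are random noise, so val loss is not expected to improve in this demo
print("What to check:")
print("- Train loss decreasing → the model is learning")
print("- Val loss decreasing → the model generalizes")
print("- Val loss rising while train loss keeps falling → overfitting!")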
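Watching the curves by hand works, but Keras can also act on them: the `EarlyStopping` callback stops training when `val_loss` stops improving and can restore the best weights. A sketch on synthetic data that, unlike the random targets above, contains a learnable linear signal; `patience=5` and the 30-epoch cap are illustrative values:

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data with a real (linear) signal plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)).astype(np.float32)
true_w = rng.normal(size=(10, 1)).astype(np.float32)
y = X @ true_w + 0.1 * rng.normal(size=(1000, 1)).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=5,                  # stop after 5 epochs without improvement
    restore_best_weights=True,   # roll back to the best epoch's weights
)

history = model.fit(
    X, y, epochs=30, batch_size=32, validation_split=0.2,
    callbacks=[early_stop], verbose=0,
)
print(f"Ran {len(history.history['loss'])} of 30 epochs")
```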

### ✅ 4. Use tf.function for performance

In [None]:
# Slow - Eager mode
def slow_function(x):
    for i in range(10):
        x = x + 1
    return x

# Fast - Graph mode
@tf.function
def fast_function(x):
    for i in range(10):
        x = x + 1
    return x

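# Benchmark
x = tf.random.normal([1000, 1000])

# Warm up: the first call to a tf.function traces the Python code into a graph,
# which is slow and should not be counted
_ = fast_function(x)

# Slow
start = time.perf_counter()
for _ in range(10):
    _ = slow_function(x).numpy()  # .numpy() waits for the result
slow_time = time.perf_counter() - start

# Fast
start = time.perf_counter()
for _ in range(10):
    _ = fast_function(x).numpy()
fast_time = time.perf_counter() - start

print(f"Slow (eager): {slow_time:.4f}s")
print(f"Fast (graph): {fast_time:.4f}s")
# The gain is largest for many small ops; for big element-wise ops it can be modest
print(f"Speedup: {slow_time/fast_time:.2f}x")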
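A `tf.function` is traced (converted into a graph) once per new input signature. Tensors with the same dtype and shape reuse one trace, but each distinct Python scalar value triggers a new one, which can quietly erase the speedup. The sketch counts traces with a plain Python side effect, which runs only while tracing:

```python
import tensorflow as tf

traces = []  # plain Python list, appended to only while a graph is being traced

@tf.function
def add_one(x):
    traces.append("trace")  # Python side effect: skipped when a cached graph is reused
    return tf.add(x, 1)

for v in [1, 2, 3]:
    add_one(v)                 # Python ints: one trace per distinct value
python_traces = len(traces)

traces.clear()
for v in [1, 2, 3]:
    add_one(tf.constant(v))    # int32 scalar tensors: one trace, then reused
tensor_traces = len(traces)

print("traces with Python ints:", python_traces)  # 3
print("traces with tensors:    ", tensor_traces)  # 1
```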

---

## 8️⃣ Final Exercises

### 📝 Exercise 1: Debug Broken Model

The code below has several bugs. Find and fix them!

In [None]:
# BROKEN CODE - FIX ME!
"""
# Data
X = np.random.randn(100, 10)
y = np.random.randn(100, 1)

# Model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64),  # Bug 1: No activation
    tf.keras.layers.Dense(32),  # Bug 2: No activation
    tf.keras.layers.Dense(1, activation='relu')  # Bug 3: Wrong activation for regression
])

# Train (Bug 4: Not compiled!)
model.fit(X, y, epochs=10)
"""

# YOUR FIXED CODE HERE
# TODO: Fix all bugs

### 📝 Exercise 2: Gradient Debugging

Write a function that:
1. Checks whether any gradient is None
2. Checks whether any gradient contains NaN/Inf
3. Prints gradient statistics (min, max, mean)

In [None]:
# YOUR CODE HERE
def debug_gradient(model, X, y, loss_fn):
    """
    Debug the gradients of a model
    
    Args:
        model: Keras model
        X: Input data
        y: Target data
        loss_fn: Loss function
    """
    # TODO: Implement
    pass

# Test your function

### 📝 Exercise 3: Build Complete Pipeline

Build an end-to-end pipeline:
1. Load data (breast cancer)
2. Split train/val/test
3. Normalize (fit the scaler on the training split only)
4. Build model
5. Train with monitoring
6. Evaluate
7. Save model

**Requirements:**
- Accuracy > 95%
- No overfitting
- Clean code with comments

In [None]:
# YOUR CODE HERE
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# TODO: Complete pipeline

### 📝 Exercise 4: Custom Training Loop (BONUS)

Implement a custom training loop with:
- Progress bar
- Learning rate scheduling
- Early stopping
- Gradient clipping

In [None]:
# YOUR CODE HERE
# TODO: Implement custom training loop

---

## 🎯 Foundation Summary

### ✅ What You Have Learned (Files 1A-1D)

**File 1-A: TensorFlow & Tensor Basics**
- Tensor operations
- Shape, dtype, slicing
- NumPy interop

**File 1-B: Eager Execution & GradientTape**
- Eager execution
- Automatic differentiation
- Manual training loop
- Linear regression from scratch

**File 1-C: tf.keras & First Neural Network**
- Sequential API
- Dense layers
- Activation functions
- model.compile/fit/evaluate/predict

**File 1-D: GPU, Debugging & Exercises**
- CPU vs GPU
- Device placement
- Common errors
- Debugging tips
- Best practices

---

### 🎓 Key Skills Acquired

1. ✅ Understand how TensorFlow works under the hood
2. ✅ Compute gradients manually
3. ✅ Build & train neural networks
4. ✅ Debug & optimize code
5. ✅ GPU awareness

### 📚 Ready for Next Step

**PART 2 - INTERMEDIATE** (Files 2A-2D):
- tf.data pipeline
- Advanced optimizers
- Regularization
- Callbacks & custom training

---

## 📖 References

- [TensorFlow GPU Guide](https://www.tensorflow.org/guide/gpu)
- [TensorFlow Performance Guide](https://www.tensorflow.org/guide/profiler)
- [Debugging TensorFlow](https://www.tensorflow.org/guide/debugging)

---

## 🎉 Congratulations!

You have completed **FOUNDATION (BEGINNER)**!

**What matters most:**
- Don't memorize; aim to **understand the fundamentals**
- Practice, practice, practice!
- Come back and review whenever you need to

**Next steps:**
1. Complete all the exercises
2. Build 1-2 small projects
3. Move on to PART 2

---

**Ch√∫c b·∫°n h·ªçc t·ªët! üöÄ**