# 📙 FILE 3-C: MODEL DEPLOYMENT & PRODUCTION

**Part:** ADVANCED & PROFESSIONAL (Production-Ready) - FINAL

**Objectives:**
- ✅ A professional inference pipeline
- ✅ Saving and loading models correctly
- ✅ Exporting to the SavedModel format
- ✅ Performance optimization
- ✅ Production best practices
- ✅ Common anti-patterns

**Duration:** 2-3 weeks

---

## 📚 Table of Contents

### PART 1: SAVE & LOAD MODELS
1. Model Formats in TensorFlow
2. Keras Format (.keras)
3. SavedModel Format
4. Checkpoints
5. Weights Only
6. Best Practices

### PART 2: INFERENCE PIPELINE
1. What Is an Inference Pipeline?
2. Preprocessing for Inference
3. Batch Inference
4. Real-time Inference
5. Post-processing
6. Error Handling

### PART 3: PERFORMANCE OPTIMIZATION
1. Model Optimization Techniques
2. Quantization
3. Pruning
4. TensorFlow Lite
5. ONNX Export

### PART 4: PRODUCTION BEST PRACTICES
1. Model Versioning
2. Monitoring & Logging
3. A/B Testing
4. Rollback Strategy
5. Common Anti-patterns

---

In [None]:
# Import the required libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import json
import time
import pickle
from datetime import datetime
import shutil

print(f"✅ TensorFlow version: {tf.__version__}")
print(f"✅ Keras version: {tf.keras.__version__}")

---

# PART 1: SAVE & LOAD MODELS

## 1.1 Model Formats in TensorFlow

### Main formats

| Format | Extension | When to use | Advantages | Disadvantages |
|--------|-----------|-------------|------------|---------------|
| **Keras Format** | .keras | Training, evaluation | Simple, complete | Keras only |
| **SavedModel** | / (directory) | Production, serving | Universal, TF Serving | More complex |
| **Checkpoint** | .ckpt | Training checkpoints | Weights only (small) | Requires the architecture code |
| **HDF5** | .h5 | Legacy (TF 1.x) | Backward compatible | Legacy, not recommended |

### Recommendations

- 🎯 **Training/Development**: Keras format (.keras)
- 🚀 **Production/Serving**: SavedModel format
- 💾 **Checkpoints**: During training
- ❌ **Avoid**: HDF5 (.h5) in TF 2.x

### SavedModel vs Keras comparison

#### Keras Format (.keras)
```python
model.save('my_model.keras')      # Save
model = keras.models.load_model('my_model.keras')  # Load
```
- ✅ The simplest option
- ✅ Saves architecture + weights + optimizer state
- ❌ Keras only

#### SavedModel Format
```python
model.export('my_model')                    # Save (directory); Keras 2 also accepts model.save('my_model')
reloaded = tf.saved_model.load('my_model')  # Load for inference
```
- ✅ Universal (TF Serving, TF Lite, TF.js)
- ✅ Language-agnostic
- ✅ Includes signatures for serving
- ❌ Slightly more complex

## 1.2 Keras Format - The simplest approach

In [None]:
# Create a simple model for the demo
def create_simple_model():
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=(10,)),
        layers.Dropout(0.2),
        layers.Dense(32, activation='relu'),
        layers.Dense(3, activation='softmax')
    ])
    
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    return model

# Create and train model
model = create_simple_model()

# Fake training data
X_train = np.random.rand(100, 10)
y_train = np.random.randint(0, 3, 100)

# Train
print("🚀 Training model...")
history = model.fit(X_train, y_train, epochs=5, verbose=0)
print("✅ Training completed!")

model.summary()

In [None]:
# SAVE - Keras Format
model_path = 'saved_models/my_model.keras'
Path('saved_models').mkdir(exist_ok=True)

print("💾 Saving model...")
model.save(model_path)
print(f"✅ Model saved to {model_path}")

# Check file size
file_size = Path(model_path).stat().st_size / (1024 * 1024)  # MB
print(f"   File size: {file_size:.2f} MB")

In [None]:
# LOAD - Keras Format
print("📂 Loading model...")
loaded_model = keras.models.load_model(model_path)
print("✅ Model loaded successfully!")

# Verify
print("\n🔍 Verification:")

# Test prediction
X_test = np.random.rand(5, 10)
original_pred = model.predict(X_test, verbose=0)
loaded_pred = loaded_model.predict(X_test, verbose=0)

# Check if predictions are identical
are_equal = np.allclose(original_pred, loaded_pred)
print(f"   Predictions match: {are_equal}")

if are_equal:
    print("   ✅ Model loaded correctly!")
else:
    print("   ❌ Something went wrong!")

## 1.3 SavedModel Format - For production

In [None]:
# SAVE - SavedModel Format
savedmodel_path = 'saved_models/my_savedmodel'

print("💾 Saving as SavedModel...")
# No extension! Keras 3 (TF 2.16+) rejects model.save() without an extension,
# so export() is used here; on Keras 2, model.save(savedmodel_path) also works.
model.export(savedmodel_path)
print(f"✅ SavedModel saved to {savedmodel_path}")

# Check directory structure
print("\n📁 SavedModel directory structure:")
for path in Path(savedmodel_path).rglob('*'):
    if path.is_file():
        size = path.stat().st_size / 1024  # KB
        print(f"   {path.relative_to(savedmodel_path)}: {size:.2f} KB")

In [None]:
# LOAD - SavedModel Format
print("📂 Loading SavedModel...")
# tf.saved_model.load returns an inference-only object, not a Keras model;
# export() exposes the forward pass as the `serve` endpoint.
loaded_savedmodel = tf.saved_model.load(savedmodel_path)
print("✅ SavedModel loaded successfully!")

# Verify
savedmodel_pred = loaded_savedmodel.serve(tf.constant(X_test, dtype=tf.float32)).numpy()
are_equal = np.allclose(original_pred, savedmodel_pred, atol=1e-6)
print(f"\n🔍 Predictions match: {are_equal}")

## 1.4 Checkpoints - Save during training

In [None]:
# Setup checkpoint callback
checkpoint_dir = 'checkpoints'
Path(checkpoint_dir).mkdir(exist_ok=True)

# Checkpoint callback
checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath=f'{checkpoint_dir}/model_epoch_{{epoch:02d}}_val_acc_{{val_accuracy:.4f}}.keras',
    save_best_only=False,  # Save every epoch
    save_weights_only=False,  # Save full model
    monitor='val_accuracy',
    verbose=1
)

# Train with checkpoints
print("🚀 Training with checkpoints...\n")

model_new = create_simple_model()
history = model_new.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=5,
    callbacks=[checkpoint_callback],
    verbose=0
)

print("\n✅ Training completed!")

# List checkpoints
print("\n📁 Saved checkpoints:")
for ckpt in sorted(Path(checkpoint_dir).glob('*.keras')):
    print(f"   {ckpt.name}")

In [None]:
# Load from checkpoint
checkpoints = sorted(Path(checkpoint_dir).glob('*.keras'))
if checkpoints:
    latest_checkpoint = checkpoints[-1]
    print(f"📂 Loading checkpoint: {latest_checkpoint.name}")
    
    restored_model = keras.models.load_model(latest_checkpoint)
    print("✅ Model restored from checkpoint!")
    
    # Continue training
    print("\n🔄 Continue training...")
    restored_model.fit(X_train, y_train, epochs=2, verbose=0)
    print("✅ Training continued!")
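
The cell above restores the lexicographically last file, which is the latest epoch rather than the best one. As a minimal sketch, assuming checkpoints follow the `model_epoch_XX_val_acc_Y.YYYY.keras` naming pattern from the callback above, the best checkpoint can be picked by parsing the file names. The helper name `pick_best_checkpoint` and the example file names are hypothetical.

```python
import re

# Matches names produced by the ModelCheckpoint filepath template above.
CKPT_PATTERN = re.compile(r"model_epoch_(\d+)_val_acc_(\d+\.\d+)\.keras$")

def pick_best_checkpoint(filenames):
    """Return (filename, epoch, val_acc) with the highest val_acc, or None."""
    best = None
    for name in filenames:
        match = CKPT_PATTERN.search(name)
        if match is None:
            continue  # Skip files that do not follow the naming pattern
        epoch, val_acc = int(match.group(1)), float(match.group(2))
        # Prefer higher accuracy; break ties with the later epoch
        if best is None or (val_acc, epoch) > (best[2], best[1]):
            best = (name, epoch, val_acc)
    return best

# Hypothetical file names for illustration only
example_files = [
    "model_epoch_01_val_acc_0.3000.keras",
    "model_epoch_02_val_acc_0.4500.keras",
    "model_epoch_03_val_acc_0.4000.keras",
    "notes.txt",
]
print(pick_best_checkpoint(example_files))
```

With real checkpoints, the input would be `[p.name for p in Path(checkpoint_dir).glob('*.keras')]`.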

## 1.5 Save Weights Only

In [None]:
# Save weights only (smaller, faster)
weights_path = 'saved_models/my_weights.weights.h5'

print("💾 Saving weights only...")
model.save_weights(weights_path)
print(f"✅ Weights saved to {weights_path}")

# Compare sizes
full_model_size = Path(model_path).stat().st_size / (1024 * 1024)
weights_size = Path(weights_path).stat().st_size / (1024 * 1024)

print(f"\n📊 Size comparison:")
print(f"   Full model (.keras): {full_model_size:.2f} MB")
print(f"   Weights only (.weights.h5): {weights_size:.2f} MB")
print(f"   Savings: {(1 - weights_size/full_model_size) * 100:.1f}%")

In [None]:
# Load weights only
# ⚠️ You MUST first create a model with the same architecture!

print("📂 Loading weights...")

# Create a model with the same architecture
new_model = create_simple_model()

# Load weights
new_model.load_weights(weights_path)
print("✅ Weights loaded!")

# Verify
new_pred = new_model.predict(X_test, verbose=0)
are_equal = np.allclose(original_pred, new_pred)
print(f"\n🔍 Predictions match: {are_equal}")

print("\n💡 Notes:")
print("   - Weights only: smaller and faster to save")
print("   - But you MUST have the architecture code")
print("   - Full model: larger but self-contained")

## 1.6 Best Practices for Save/Load

### ✅ DO

#### 1. Versioning
```python
# ✅ GOOD: Version in the file name
model.save(f'models/model_v{version}_{timestamp}.keras')

# ✅ GOOD: Structured directory
# models/
# ├── v1.0/
# │   ├── model.keras
# │   ├── config.json
# │   └── metrics.json
# └── v1.1/
```

#### 2. Save metadata
```python
# ✅ GOOD: Save metadata alongside the model
metadata = {
    'version': '1.0',
    'timestamp': datetime.now().isoformat(),
    'accuracy': 0.95,
    'config': config_dict
}
with open('models/metadata.json', 'w') as f:
    json.dump(metadata, f)
```

#### 3. Test after loading
```python
# ✅ GOOD: Always verify
loaded_model = keras.models.load_model('model.keras')
pred = loaded_model.predict(X_test)
assert pred.shape == expected_shape
```

### ❌ DON'T

#### 1. Overwrite models
```python
# ❌ BAD: Overwrite model.keras every time
model.save('model.keras')  # The old model is lost!

# ✅ GOOD: Version or timestamp
model.save(f'model_{timestamp}.keras')
```

#### 2. Not saving the config
```python
# ❌ BAD: Save only the model
model.save('model.keras')

# ✅ GOOD: Save the config alongside it
model.save('model.keras')
with open('config.json', 'w') as f:
    json.dump(config, f)
```

#### 3. Using HDF5 in TF 2.x
```python
# ❌ BAD: Legacy format
model.save('model.h5')

# ✅ GOOD: Keras format
model.save('model.keras')
```

In [None]:
# Example: Complete save with metadata

def save_model_with_metadata(model, model_dir, version, config, metrics):
    """
    Save a model together with its full metadata
    
    Args:
        model: Keras model
        model_dir: Directory to save
        version: Model version
        config: Configuration dict
        metrics: Metrics dict
    """
    # Create directory
    model_path = Path(model_dir) / f'v{version}'
    model_path.mkdir(parents=True, exist_ok=True)
    
    # Save model
    model.save(model_path / 'model.keras')
    
    # Save config
    with open(model_path / 'config.json', 'w') as f:
        json.dump(config, f, indent=2)
    
    # Save metrics
    with open(model_path / 'metrics.json', 'w') as f:
        json.dump(metrics, f, indent=2)
    
    # Save metadata
    metadata = {
        'version': version,
        'timestamp': datetime.now().isoformat(),
        'tensorflow_version': tf.__version__,
        'metrics': metrics
    }
    with open(model_path / 'metadata.json', 'w') as f:
        json.dump(metadata, f, indent=2)
    
    print(f"✅ Model saved to {model_path}")
    print("   Files:")
    for file in model_path.iterdir():
        print(f"     - {file.name}")

# Save
save_model_with_metadata(
    model=model,
    model_dir='production_models',
    version='1.0',
    config={'batch_size': 32, 'learning_rate': 0.001},
    metrics={'accuracy': 0.95, 'loss': 0.15}
)
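
Metadata is also a natural place to record a checksum, so that a corrupted or swapped model file is caught before it is loaded. The following is a minimal sketch under assumptions: the `sha256` metadata key and the helper names are hypothetical additions, not part of the function above, and a small stand-in file replaces a real `model.keras`.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_checksum(model_file, metadata_file):
    """Record the artifact's SHA-256 in the metadata JSON (hypothetical 'sha256' key)."""
    metadata_file = Path(metadata_file)
    metadata = json.loads(metadata_file.read_text()) if metadata_file.exists() else {}
    metadata["sha256"] = sha256_of_file(model_file)
    metadata_file.write_text(json.dumps(metadata, indent=2))

def verify_checksum(model_file, metadata_file):
    """Return True if the file on disk still matches the recorded hash."""
    metadata = json.loads(Path(metadata_file).read_text())
    return metadata.get("sha256") == sha256_of_file(model_file)

# Demo with a stand-in file instead of a real model.keras
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.keras"
    metadata_file = Path(tmp) / "metadata.json"
    model_file.write_bytes(b"fake model bytes")
    write_checksum(model_file, metadata_file)
    ok_before = verify_checksum(model_file, metadata_file)
    model_file.write_bytes(b"corrupted")  # Simulate a damaged artifact
    ok_after = verify_checksum(model_file, metadata_file)

print(ok_before, ok_after)
```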

---

# PART 2: INFERENCE PIPELINE

## 2.1 What Is an Inference Pipeline?

### Definition

**Inference Pipeline** = the process that turns raw input into predictions

```
Raw Input → Preprocessing → Model Inference → Post-processing → Final Output
```

### Components

1. **Preprocessing**: Prepare the input (resize, normalize) with the same transforms used in training
2. **Model Inference**: Run the model
3. **Post-processing**: Process the output (threshold, NMS, format)
4. **Error Handling**: Handle edge cases
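
The four components above can be sketched as plain function composition. This is only an illustration: `fake_model` is a hypothetical stand-in for a real `model.predict` call, and the scaling by 255 and the two labels are assumptions chosen for the demo.

```python
def preprocess(raw):
    """Scale raw pixel-like values from [0, 255] to [0, 1]."""
    return [value / 255.0 for value in raw]

def fake_model(features):
    """Stand-in for model.predict: returns two class scores that sum to 1."""
    score = sum(features) / len(features)
    return [1.0 - score, score]

def postprocess(scores, labels=("negative", "positive")):
    """Turn raw scores into a label plus confidence."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"label": labels[best], "confidence": scores[best]}

def run_pipeline(raw):
    """Raw input -> preprocessing -> model -> post-processing, with error handling."""
    try:
        if not raw:
            raise ValueError("empty input")
        return postprocess(fake_model(preprocess(raw)))
    except (TypeError, ValueError) as exc:
        # Return a structured error instead of crashing the caller
        return {"error": str(exc)}

result_ok = run_pipeline([255, 255, 0, 255])
result_bad = run_pipeline([])
print(result_ok)
print(result_bad)
```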

### Batch vs Real-time Inference

| Batch Inference | Real-time Inference |
|-----------------|---------------------|
| Processes many samples at once | Processes one sample at a time |
| High throughput | Low latency |
| Offline processing | Online serving |
| Example: processing 1M images | Example: an API endpoint |
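
The difference between the two modes comes down to how many samples go into each model call. A minimal sketch, assuming a hypothetical `fake_predict` in place of a real model and an arbitrary batch size of 4:

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def fake_predict(batch):
    """Stand-in for a model call: one output per input sample."""
    return [x * 2 for x in batch]

samples = list(range(10))

# Batch inference: few large calls (high throughput)
batch_outputs = []
num_calls = 0
for batch in iter_batches(samples, batch_size=4):
    batch_outputs.extend(fake_predict(batch))
    num_calls += 1

# Real-time inference: one call per sample (low per-request latency)
single_outputs = [fake_predict([x])[0] for x in samples]

print(num_calls, batch_outputs == single_outputs)
```

Both modes produce the same predictions; only the number of calls and the waiting time per request differ.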

## 2.2 Inference Pipeline Class

In [None]:
class InferencePipeline:
    """Production-ready inference pipeline"""
    
    def __init__(self, model_path, config=None):
        """
        Args:
            model_path: Path to saved model
            config: Configuration dict
        """
        self.model_path = model_path
        self.config = config or {}
        self.model = None
        self.metadata = {}
        
        # Load model and metadata
        self._load_model()
        self._load_metadata()
        
        print("✅ InferencePipeline initialized")
    
    def _load_model(self):
        """Load model"""
        try:
            self.model = keras.models.load_model(self.model_path)
            print(f"   Model loaded from {self.model_path}")
        except Exception as e:
            raise RuntimeError(f"Failed to load model: {e}") from e
    
    def _load_metadata(self):
        """Load metadata if available"""
        model_dir = Path(self.model_path).parent
        metadata_path = model_dir / 'metadata.json'
        
        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                self.metadata = json.load(f)
            print(f"   Metadata loaded")
    
    def preprocess(self, inputs):
        """
        Preprocess inputs
        
        Args:
            inputs: Raw inputs
        
        Returns:
            Preprocessed inputs
        """
        # Example: Normalize
        if isinstance(inputs, np.ndarray):
            inputs = inputs.astype('float32')
            # Add normalization logic here
        
        return inputs
    
    def predict(self, inputs, batch_size=32):
        """
        Batch inference
        
        Args:
            inputs: Input data
            batch_size: Batch size for inference
        
        Returns:
            Predictions
        """
        # Preprocess
        inputs = self.preprocess(inputs)
        
        # Predict
        predictions = self.model.predict(inputs, batch_size=batch_size, verbose=0)
        
        # Post-process
        predictions = self.postprocess(predictions)
        
        return predictions
    
    def predict_single(self, input_sample):
        """
        Single sample inference (for real-time)
        
        Args:
            input_sample: Single input sample
        
        Returns:
            Prediction
        """
        # Add batch dimension
        if len(input_sample.shape) == len(self.model.input_shape) - 1:
            input_sample = np.expand_dims(input_sample, axis=0)
        
        # Predict
        prediction = self.predict(input_sample, batch_size=1)
        
        # Remove batch dimension
        return prediction[0]
    
    def postprocess(self, predictions):
        """
        Post-process predictions
        
        Args:
            predictions: Raw predictions
        
        Returns:
            Post-processed predictions
        """
        # Example: Apply threshold for binary classification
        # Or get class with highest probability
        return predictions
    
    def get_info(self):
        """Get pipeline info"""
        return {
            'model_path': str(self.model_path),
            'input_shape': self.model.input_shape,
            'output_shape': self.model.output_shape,
            'metadata': self.metadata
        }

print("✅ InferencePipeline class defined!")

In [None]:
# Example usage
pipeline = InferencePipeline('production_models/v1.0/model.keras')

# Pipeline info
print("\n📋 Pipeline Info:")
info = pipeline.get_info()
for key, value in info.items():
    print(f"   {key}: {value}")

# Batch inference
X_batch = np.random.rand(10, 10)
print("\n🔮 Batch inference...")
predictions = pipeline.predict(X_batch)
print(f"   Predictions shape: {predictions.shape}")

# Single inference
X_single = np.random.rand(10)
print("\n🔮 Single inference...")
prediction = pipeline.predict_single(X_single)
print(f"   Prediction shape: {prediction.shape}")
print(f"   Prediction: {prediction}")
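
The `postprocess` method above returns predictions unchanged. For a softmax classifier like the demo model, one possible post-processing step is to pick the most likely class and fall back to "unknown" when confidence is low. This is a sketch under assumptions: the class names, the 0.5 threshold, and the function name are all hypothetical.

```python
def postprocess_softmax(probabilities, class_names, min_confidence=0.5):
    """Map one row of softmax outputs to a label, or 'unknown' below the threshold."""
    if len(probabilities) != len(class_names):
        raise ValueError("one class name is required per output")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = float(probabilities[best_index])
    label = class_names[best_index] if confidence >= min_confidence else "unknown"
    return {"label": label, "class_index": best_index, "confidence": confidence}

class_names = ["class_a", "class_b", "class_c"]  # Hypothetical names
confident = postprocess_softmax([0.1, 0.7, 0.2], class_names)
uncertain = postprocess_softmax([0.34, 0.33, 0.33], class_names)
print(confident)
print(uncertain)
```

The same function accepts a single row of a NumPy prediction array, since it only indexes into the row.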

## 2.3 Performance Measurement

In [None]:
def benchmark_inference(pipeline, input_shape, num_samples=1000, batch_sizes=(1, 8, 32, 64)):
    """
    Benchmark inference performance
    
    Args:
        pipeline: InferencePipeline instance
        input_shape: Input shape (without batch dim)
        num_samples: Number of samples to test
        batch_sizes: List of batch sizes to test
    
    Returns:
        results: Dictionary of benchmark results
    """
    results = {}
    
    # Generate test data
    X_test = np.random.rand(num_samples, *input_shape)
    
    print(f"📊 Benchmarking inference with {num_samples} samples...\n")
    
    for batch_size in batch_sizes:
        print(f"Batch size: {batch_size}")
        
        # Warmup
        _ = pipeline.predict(X_test[:batch_size], batch_size=batch_size)
        
        # Benchmark (perf_counter is monotonic and has higher resolution than time.time)
        start_time = time.perf_counter()
        predictions = pipeline.predict(X_test, batch_size=batch_size)
        elapsed_time = time.perf_counter() - start_time
        
        # Calculate metrics
        throughput = num_samples / elapsed_time  # samples/second
        latency = (elapsed_time / num_samples) * 1000  # average ms per sample (amortized over batches)
        
        results[batch_size] = {
            'elapsed_time': elapsed_time,
            'throughput': throughput,
            'latency': latency
        }
        
        print(f"  Elapsed time: {elapsed_time:.2f}s")
        print(f"  Throughput: {throughput:.2f} samples/sec")
        print(f"  Latency: {latency:.2f} ms/sample\n")
    
    return results

# Benchmark
results = benchmark_inference(
    pipeline=pipeline,
    input_shape=(10,),
    num_samples=1000,
    batch_sizes=[1, 8, 32, 64]
)

In [None]:
# Visualize benchmark results
batch_sizes = list(results.keys())
throughputs = [results[bs]['throughput'] for bs in batch_sizes]
latencies = [results[bs]['latency'] for bs in batch_sizes]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Throughput
ax1.plot(batch_sizes, throughputs, marker='o', linewidth=2, markersize=8)
ax1.set_xlabel('Batch Size', fontsize=12)
ax1.set_ylabel('Throughput (samples/sec)', fontsize=12)
ax1.set_title('Throughput vs Batch Size', fontsize=13, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.set_xscale('log', base=2)

# Latency
ax2.plot(batch_sizes, latencies, marker='o', linewidth=2, markersize=8, color='orange')
ax2.set_xlabel('Batch Size', fontsize=12)
ax2.set_ylabel('Latency (ms/sample)', fontsize=12)
ax2.set_title('Latency vs Batch Size', fontsize=13, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.set_xscale('log', base=2)

plt.tight_layout()
plt.show()

print("💡 Insights:")
print("   - Larger batch size → higher throughput (more efficient)")
print("   - Smaller batch size → lower latency (faster response)")
print("   - There is a trade-off between throughput and latency")
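
A single timed run can be noisy, and the time per sample averaged over a batch is not the same thing as the time a caller waits for one batch. A minimal sketch of a more robust measurement, assuming a hypothetical `fake_predict` in place of `pipeline.predict` and arbitrary warmup and repeat counts:

```python
import statistics
import time

def time_call(fn, *args, warmup=2, repeats=10):
    """Return per-call durations in milliseconds, excluding warmup runs."""
    for _ in range(warmup):
        fn(*args)  # Warmup: the first calls often include one-off setup cost
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()  # Monotonic, high-resolution clock
        fn(*args)
        durations_ms.append((time.perf_counter() - start) * 1000)
    return durations_ms

def summarize(durations_ms, batch_size):
    """Separate per-batch latency from amortized per-sample time."""
    median_ms = statistics.median(durations_ms)
    return {
        "batch_latency_ms": median_ms,
        "per_sample_ms": median_ms / batch_size,
        "throughput_per_sec": batch_size / (median_ms / 1000) if median_ms > 0 else float("inf"),
    }

def fake_predict(batch):
    """Stand-in for pipeline.predict."""
    return [sum(row) for row in batch]

batch = [[0.1] * 10 for _ in range(32)]
summary = summarize(time_call(fake_predict, batch), batch_size=32)
print(summary)
```

The median is used instead of the mean so that one slow outlier run does not dominate the result.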

---

# PART 3: PERFORMANCE OPTIMIZATION

## 3.1 Model Optimization Techniques

### Optimization techniques

| Technique | Size reduction | Speedup | Accuracy loss | When to use |
|-----------|----------------|---------|---------------|-------------|
| **Quantization** | ✅✅✅✅ (4x) | ✅✅✅ | Minimal | Production, mobile |
| **Pruning** | ✅✅✅ | ✅✅ | Minimal | Model compression |
| **Knowledge Distillation** | ✅✅✅✅ | ✅✅✅ | Small | Mobile, edge |
| **TensorFlow Lite** | ✅✅✅ | ✅✅✅✅ | Minimal | Mobile deployment |
| **ONNX** | ✅✅ | ✅✅ | None | Cross-platform |

### What is quantization?

**Quantization** = reducing the numeric precision of weights and activations

- **Float32** → **Int8**: weights take 4x less space (32 bits down to 8) and inference is often faster
- **Post-training quantization**: no retraining required
- **Quantization-aware training**: train with quantization simulated (better accuracy)
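
To make the Float32 → Int8 mapping concrete, here is a worked sketch of affine (scale and zero point) quantization in plain Python. It illustrates the general idea only and is not a reproduction of what the TensorFlow Lite converter does internally; the weight values and function names are hypothetical.

```python
def quantize_params(min_val, max_val, qmin=-128, qmax=127):
    """Compute scale and zero point for affine int8 quantization."""
    # The representable range must include 0.0 so that zero maps exactly
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, 0
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """float -> int8: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """int8 -> float: x ~= scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]

weights = [-1.0, -0.25, 0.0, 0.4, 1.55]  # Hypothetical float32 weights
scale, zero_point = quantize_params(min(weights), max(weights))
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, round(max_error, 5))
```

Each weight is stored as one byte instead of four, and the round-trip error for values inside the range stays within about half a quantization step (`scale / 2`).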

## 3.2 TensorFlow Lite Conversion

In [None]:
# Convert to TensorFlow Lite

def convert_to_tflite(model, optimization='default'):
    """
    Convert Keras model to TensorFlow Lite
    
    Args:
        model: Keras model
        optimization: 'default', 'float16', 'int8'
    
    Returns:
        tflite_model: Converted model (bytes)
    """
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    
    if optimization == 'default':
        # No optimization
        pass
    
    elif optimization == 'float16':
        # Float16 quantization
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_types = [tf.float16]
    
    elif optimization == 'int8':
        # Int8 quantization (requires a representative dataset)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        
        # Representative dataset for calibration (random data for this demo;
        # use real input samples in practice)
        def representative_dataset():
            for _ in range(100):
                yield [np.random.rand(1, *model.input_shape[1:]).astype(np.float32)]
        
        converter.representative_dataset = representative_dataset
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.int8
        converter.inference_output_type = tf.int8
    
    # Convert
    tflite_model = converter.convert()
    
    return tflite_model

# Convert with each optimization level
print("🔄 Converting to TensorFlow Lite...\n")

optimizations = ['default', 'float16', 'int8']
tflite_models = {}

for opt in optimizations:
    print(f"Converting with {opt} optimization...")
    try:
        tflite_model = convert_to_tflite(model, optimization=opt)
        tflite_models[opt] = tflite_model
        
        # Save
        tflite_path = f'saved_models/model_{opt}.tflite'
        with open(tflite_path, 'wb') as f:
            f.write(tflite_model)
        
        size_mb = len(tflite_model) / (1024 * 1024)
        print(f"  ✅ Saved to {tflite_path} ({size_mb:.2f} MB)\n")
    
    except Exception as e:
        print(f"  ❌ Failed: {e}\n")

print("✅ Conversion completed!")

In [None]:
# Compare model sizes

print("📊 MODEL SIZE COMPARISON:")
print("=" * 60)

# Original model
original_size = Path('production_models/v1.0/model.keras').stat().st_size / (1024 * 1024)
print(f"Original (.keras):      {original_size:>8.2f} MB (100%)")

# TFLite models
for opt in optimizations:
    tflite_path = f'saved_models/model_{opt}.tflite'
    if Path(tflite_path).exists():
        size = Path(tflite_path).stat().st_size / (1024 * 1024)
        reduction = (1 - size / original_size) * 100
        print(f"TFLite ({opt:8s}): {size:>8.2f} MB ({100-reduction:.1f}%, -{reduction:.1f}%)")

print("=" * 60)

print("\n💡 Int8 quantization can shrink weights by about 4x, usually with minimal accuracy loss!")
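
The 4x figure applies to the raw weights. As a back-of-the-envelope check for the demo model from `create_simple_model()` (layer sizes 10 → 64 → 32 → 3), the parameter count and raw weight size per data type can be computed by hand. Files on disk also contain graph structure and metadata, and a `.keras` file additionally stores optimizer state, so for a model this small the measured ratios will differ from 4x.

```python
def dense_params(inputs, units):
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

# Layer sizes of create_simple_model(): 10 -> 64 -> 32 -> 3 (Dropout has no parameters)
layer_sizes = [10, 64, 32, 3]
total_params = sum(dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:]))

bytes_per_param = {"float32": 4, "float16": 2, "int8": 1}
weight_bytes = {dtype: total_params * size for dtype, size in bytes_per_param.items()}

print(f"Parameters: {total_params}")
for dtype, size in weight_bytes.items():
    print(f"{dtype:8s}: {size / 1024:.2f} KB of raw weights")
```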

## 3.3 ONNX Export

In [None]:
# Convert to ONNX format
# ONNX = Open Neural Network Exchange (cross-platform format)

try:
    import tf2onnx
    
    print("🔄 Converting to ONNX...")
    
    # Convert
    onnx_model, _ = tf2onnx.convert.from_keras(model)
    
    # Save
    onnx_path = 'saved_models/model.onnx'
    with open(onnx_path, 'wb') as f:
        f.write(onnx_model.SerializeToString())
    
    size_mb = Path(onnx_path).stat().st_size / (1024 * 1024)
    print(f"✅ ONNX model saved to {onnx_path} ({size_mb:.2f} MB)")
    
    print("\n💡 The ONNX format allows you to:")
    print("   - Deploy on many runtimes (ONNX Runtime, TensorRT, OpenVINO, etc.)")
    print("   - Optimize with ONNX Runtime")
    print("   - Exchange models across frameworks")

except ImportError:
    print("⚠️  tf2onnx not installed")
    print("   Install: pip install tf2onnx")

---

# PART 4: PRODUCTION BEST PRACTICES

## 4.1 Model Versioning Strategy

### Semantic Versioning

```
v{MAJOR}.{MINOR}.{PATCH}
```

- **MAJOR**: Breaking changes (architecture change, incompatible API)
- **MINOR**: New features (backward compatible)
- **PATCH**: Bug fixes, small improvements

Examples:
- `v1.0.0` → Initial release
- `v1.1.0` → New features added
- `v1.1.1` → Bug fix
- `v2.0.0` → Architecture change
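
Version strings should be compared numerically, because plain string sorting puts `v1.10.0` before `v1.2.0`. A minimal sketch of parsing, sorting, and bumping versions; the helper names `parse_version` and `bump` are hypothetical.

```python
def parse_version(text):
    """Parse 'v1.2.3' or '1.2.3' into a tuple of ints (MAJOR, MINOR, PATCH)."""
    parts = text.lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(p) for p in parts)

def bump(text, level):
    """Return the next version string for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_version(text)
    if level == "major":
        return f"v{major + 1}.0.0"
    if level == "minor":
        return f"v{major}.{minor + 1}.0"
    if level == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown level: {level!r}")

versions = ["v1.2.0", "v1.10.0", "v1.9.3", "v2.0.0"]
print(sorted(versions))                     # String order: v1.10.0 sorts before v1.2.0
print(sorted(versions, key=parse_version))  # Numeric order
print(bump("v1.1.1", "minor"))
```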

### Model Registry Structure

```
models/
├── production/
│   └── v1.2.1/          # Current production model
├── staging/
│   └── v1.3.0/          # Testing before production
├── archive/
│   ├── v1.0.0/
│   ├── v1.1.0/
│   └── v1.2.0/
└── experiments/
    └── exp_001/
```

In [None]:
class ModelRegistry:
    """Simple model registry for versioning"""
    
    def __init__(self, base_dir='model_registry'):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(exist_ok=True)
        
        # Create subdirectories
        for subdir in ['production', 'staging', 'archive', 'experiments']:
            (self.base_dir / subdir).mkdir(exist_ok=True)
    
    def register_model(self, model, version, stage='staging', metadata=None):
        """
        Register a model
        
        Args:
            model: Keras model
            version: Version string (e.g., '1.2.0')
            stage: 'staging', 'production', or 'archive'
            metadata: Optional metadata dict
        """
        # Create version directory
        model_dir = self.base_dir / stage / f'v{version}'
        model_dir.mkdir(parents=True, exist_ok=True)
        
        # Save model
        model.save(model_dir / 'model.keras')
        
        # Save metadata (copy first so the caller's dict is not mutated)
        metadata = dict(metadata or {})
        
        metadata.update({
            'version': version,
            'stage': stage,
            'registered_at': datetime.now().isoformat(),
            'tensorflow_version': tf.__version__
        })
        
        with open(model_dir / 'metadata.json', 'w') as f:
            json.dump(metadata, f, indent=2)
        
        print(f"✅ Model v{version} registered to {stage}")
        return model_dir
    
    def promote_to_production(self, version):
        """
        Promote staging model to production
        
        Args:
            version: Version to promote
        """
        staging_path = self.base_dir / 'staging' / f'v{version}'
        production_path = self.base_dir / 'production' / f'v{version}'
        
        if not staging_path.exists():
            raise ValueError(f"Version {version} not found in staging")
        
        # Archive current production model if exists
        current_prod = list((self.base_dir / 'production').iterdir())
        for old_model in current_prod:
            archive_path = self.base_dir / 'archive' / old_model.name
            if archive_path.exists():
                # shutil.move would nest the directory inside an existing target
                shutil.rmtree(archive_path)
            shutil.move(str(old_model), str(archive_path))
            print(f"   Archived {old_model.name}")
        
        # Copy to production
        shutil.copytree(staging_path, production_path)
        
        print(f"✅ Model v{version} promoted to production!")
    
    def load_model(self, version=None, stage='production'):
        """
        Load model from registry
        
        Args:
            version: Version to load (None = latest)
            stage: Stage to load from
        
        Returns:
            model, metadata
        """
        stage_dir = self.base_dir / stage
        
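        if version is None:
            # Load latest; compare versions numerically so v1.10.0 sorts after v1.2.0
            # (assumes directory names of the form vMAJOR.MINOR.PATCH)
            versions = sorted(
                stage_dir.iterdir(),
                key=lambda p: tuple(int(part) for part in p.name.lstrip('v').split('.'))
            )
            if not versions:
                raise ValueError(f"No models found in {stage}")
            model_dir = versions[-1]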
        else:
            model_dir = stage_dir / f'v{version}'
        
        # Load model
        model = keras.models.load_model(model_dir / 'model.keras')
        
        # Load metadata
        with open(model_dir / 'metadata.json', 'r') as f:
            metadata = json.load(f)
        
        print(f"✅ Loaded model v{metadata['version']} from {stage}")
        return model, metadata
    
    def list_models(self, stage=None):
        """List all models"""
        if stage:
            stages = [stage]
        else:
            stages = ['production', 'staging', 'archive']
        
        for stage in stages:
            stage_dir = self.base_dir / stage
            versions = sorted(stage_dir.iterdir())
            
            print(f"\n{stage.upper()}:")
            if not versions:
                print("  (empty)")
            else:
                for v in versions:
                    print(f"  - {v.name}")

print("✅ ModelRegistry class defined!")

In [None]:
# Example usage
registry = ModelRegistry()

# Register model to staging
registry.register_model(
    model=model,
    version='1.0.0',
    stage='staging',
    metadata={'accuracy': 0.95, 'description': 'Initial release'}
)

# Promote to production
print("\n🚀 Promoting to production...")
registry.promote_to_production('1.0.0')

# List models
registry.list_models()

# Load from production
print("\n📂 Loading from production...")
prod_model, metadata = registry.load_model(stage='production')
print(f"   Metadata: {metadata}")
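
Because promotion archives the previous production model, a rollback can be done by swapping directories back. The following is one possible sketch, assuming the same `production/` and `archive/` layout as the registry above. The `rollback` function is hypothetical and not part of `ModelRegistry`, and empty folders stand in for real model directories in the demo.

```python
import shutil
import tempfile
from pathlib import Path

def rollback(base_dir, to_version):
    """Swap the current production model for an archived version."""
    base_dir = Path(base_dir)
    production, archive = base_dir / "production", base_dir / "archive"
    if (production / f"v{to_version}").exists():
        return  # Already live, nothing to do
    target = archive / f"v{to_version}"
    if not target.exists():
        raise ValueError(f"Version {to_version} not found in archive")
    # Move whatever is live back to the archive first
    for current in list(production.iterdir()):
        destination = archive / current.name
        if destination.exists():
            shutil.rmtree(destination)
        shutil.move(str(current), str(destination))
    shutil.move(str(target), str(production / target.name))

# Demo on a throwaway directory tree with empty version folders
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "production" / "v1.1.0").mkdir(parents=True)
    (root / "archive" / "v1.0.0").mkdir(parents=True)
    rollback(root, "1.0.0")
    live = sorted(p.name for p in (root / "production").iterdir())
    archived = sorted(p.name for p in (root / "archive").iterdir())

print(live, archived)
```

In a real service the swap would also need to trigger a model reload in the serving process, which is outside the scope of this sketch.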

## 4.2 Monitoring & Logging

### Metrics to monitor

#### Model Performance Metrics
- Accuracy, Precision, Recall, F1
- Distribution of predictions
- Confidence scores

#### System Metrics
- Latency (p50, p95, p99)
- Throughput (requests/sec)
- Error rate
- CPU/Memory usage

#### Data Quality Metrics
- Input distribution shift
- Missing values
- Outliers

In [None]:
import logging
from collections import defaultdict

class ModelMonitor:
    """Monitor model performance in production"""
    
    def __init__(self, log_file='model_monitor.log'):
        self.log_file = log_file
        self.metrics = defaultdict(list)
        
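        # Setup logging with a dedicated logger; logging.basicConfig() does nothing
        # when the root logger already has handlers (common in notebooks)
        self.logger = logging.getLogger(f'{__name__}.ModelMonitor')
        self.logger.setLevel(logging.INFO)
        if not self.logger.handlers:
            handler = logging.FileHandler(log_file)
            handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
            self.logger.addHandler(handler)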
    
    def log_prediction(self, input_data, prediction, confidence, latency):
        """
        Log a prediction
        
        Args:
            input_data: Input features
            prediction: Model prediction
            confidence: Confidence score
            latency: Inference latency (ms)
        """
        # Store metrics
        self.metrics['latency'].append(latency)
        self.metrics['confidence'].append(confidence)
        
        # Log
        self.logger.info(f"Prediction: {prediction}, Confidence: {confidence:.4f}, Latency: {latency:.2f}ms")
    
    def log_error(self, error_type, error_message):
        """Log an error"""
        self.logger.error(f"{error_type}: {error_message}")
        self.metrics['errors'].append(error_type)
    
    def get_statistics(self):
        """Get monitoring statistics"""
        stats = {}
        
        if self.metrics['latency']:
            latencies = np.array(self.metrics['latency'])
            stats['latency'] = {
                'mean': float(np.mean(latencies)),
                'p50': float(np.percentile(latencies, 50)),
                'p95': float(np.percentile(latencies, 95)),
                'p99': float(np.percentile(latencies, 99)),
                'max': float(np.max(latencies))
            }
        
        if self.metrics['confidence']:
            confidences = np.array(self.metrics['confidence'])
            stats['confidence'] = {
                'mean': float(np.mean(confidences)),
                'min': float(np.min(confidences)),
                'max': float(np.max(confidences))
            }
        
        stats['total_predictions'] = len(self.metrics['latency'])
        stats['total_errors'] = len(self.metrics['errors'])
        
        return stats
    
    def print_report(self):
        """Print monitoring report"""
        stats = self.get_statistics()
        
        print("📊 MONITORING REPORT")
        print("=" * 60)
        print(f"Total Predictions: {stats['total_predictions']}")
        print(f"Total Errors: {stats['total_errors']}")
        
        if 'latency' in stats:
            print("\nLatency Statistics (ms):")
            for key, value in stats['latency'].items():
                print(f"  {key}: {value:.2f}")
        
        if 'confidence' in stats:
            print("\nConfidence Statistics:")
            for key, value in stats['confidence'].items():
                print(f"  {key}: {value:.4f}")
        
        print("=" * 60)

# Example usage
monitor = ModelMonitor()

# Simulate predictions
for i in range(100):
    input_data = np.random.rand(10)
    
    start_time = time.time()
    prediction = pipeline.predict_single(input_data)
    latency = (time.time() - start_time) * 1000  # ms
    
    confidence = float(np.max(prediction))
    
    monitor.log_prediction(input_data, prediction, confidence, latency)

# Print report
monitor.print_report()

## 4.3 Common Anti-patterns

### ❌ ANTI-PATTERN 1: Not versioning models

```python
# ❌ BAD: Overwrite model.keras
model.save('model.keras')  # Loses track of older models!

# ✅ GOOD: Version models
model.save(f'models/v{version}/model.keras')
```
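
One way to avoid choosing version numbers by hand is to derive the next one from the directories already on disk. The sketch below is only an illustration: `next_version_dir` is a hypothetical helper, and it assumes plain integer versions (`v1`, `v2`, ...) rather than full semantic versions.

```python
import re
import tempfile
from pathlib import Path


def next_version_dir(root):
    """Create and return root/v<N+1>, where N is the highest existing version."""
    root_path = Path(root)
    root_path.mkdir(parents=True, exist_ok=True)
    versions = []
    for child in root_path.iterdir():
        match = re.fullmatch(r"v(\d+)", child.name)
        if child.is_dir() and match:
            versions.append(int(match.group(1)))
    new_dir = root_path / f"v{max(versions, default=0) + 1}"
    new_dir.mkdir()
    return new_dir


# Demo in a temporary directory; a real project would pass a path such as "models".
with tempfile.TemporaryDirectory() as tmp:
    first = next_version_dir(tmp).name
    second = next_version_dir(tmp).name

print(first, second)
```

With a real model, the returned directory would be the target of the save call, for example `model.save(str(next_version_dir('models') / 'model.keras'))`.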

### ❌ ANTI-PATTERN 2: Training and inference preprocessing differ

```python
# ❌ BAD: Different preprocessing
# Training
X_train = X_train / 255.0

# Inference
X_test = (X_test - mean) / std  # DIFFERENT!

# ✅ GOOD: Same preprocessing
def preprocess(X):
    return X / 255.0

X_train = preprocess(X_train)
X_test = preprocess(X_test)
```
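
When preprocessing depends on fitted statistics (for example a mean and a standard deviation), sharing the function is not enough: the statistics computed on the training set also have to travel with the model. A minimal sketch of that idea, assuming the parameters are stored in a JSON file next to the model; `fit_scaler`, `apply_scaler`, and the file name `preprocess.json` are hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def fit_scaler(values):
    """Compute mean/std on TRAINING data only."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # guard against division by zero
    return {"mean": mean, "std": std}


def apply_scaler(values, params):
    """Apply the stored statistics; used for both training and inference."""
    return [(v - params["mean"]) / params["std"] for v in values]


train = [0.0, 2.0, 4.0]
params = fit_scaler(train)

# Persist the parameters at training time, reload them at inference time.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preprocess.json"
    path.write_text(json.dumps(params))
    loaded = json.loads(path.read_text())

train_scaled = apply_scaler(train, loaded)
new_scaled = apply_scaler([2.0], loaded)  # an inference-time sample
print(new_scaled)
```

The inference code never recomputes statistics from incoming data; it only reads what training wrote.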

### ❌ ANTI-PATTERN 3: Not handling errors

```python
# ❌ BAD: No error handling
prediction = model.predict(input_data)

# ✅ GOOD: Handle errors
try:
    prediction = model.predict(input_data)
except Exception as e:
    logger.error(f"Prediction failed: {e}")
    prediction = default_prediction  # fall back to a safe default
```
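
The same pattern can be wrapped in a function that also validates the input before calling the model. This is a sketch under assumptions: `safe_predict` and `toy_model` are hypothetical names, and `toy_model` only stands in for a real `model.predict` call.

```python
import logging

logger = logging.getLogger("inference")


def safe_predict(predict_fn, input_data, expected_len, default_prediction=None):
    """Validate the input, call predict_fn, and fall back on any failure.

    Returns (prediction, ok) so callers can tell a real result from a fallback.
    """
    try:
        if len(input_data) != expected_len:
            raise ValueError(
                f"expected {expected_len} features, got {len(input_data)}"
            )
        return predict_fn(input_data), True
    except Exception as exc:
        logger.error("Prediction failed: %s", exc)
        return default_prediction, False


def toy_model(features):
    """Hypothetical stand-in for a real model: returns the feature mean."""
    return sum(features) / len(features)


ok_result = safe_predict(toy_model, [1.0, 2.0, 3.0], expected_len=3)
bad_result = safe_predict(toy_model, [1.0], expected_len=3, default_prediction=-1.0)
print(ok_result, bad_result)
```

Returning an explicit `ok` flag keeps fallbacks visible to monitoring instead of silently mixing them with real predictions.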

### ❌ ANTI-PATTERN 4: Not monitoring production

```python
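# ❌ BAD: Deploy and forget
model.predict(X)

# ✅ GOOD: Monitor everything
start = time.time()
prediction = model.predict(X)
latency = (time.time() - start) * 1000  # ms, matching the monitor's report
confidence = float(np.max(prediction))
monitor.log_prediction(X, prediction, confidence, latency)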
```

### ❌ ANTI-PATTERN 5: Hardcoding config in code

```python
# ❌ BAD: Magic numbers
model = Model(hidden_size=128, dropout=0.2, lr=0.001)

# ✅ GOOD: Config file
config = load_config('config.yaml')
model = Model(**config['model'])
```
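
A possible shape for `load_config` is sketched below. To stay within the standard library it reads JSON instead of YAML (a YAML version would need a third-party parser such as PyYAML); the required keys are an assumption taken from the example above.

```python
import json
import tempfile
from pathlib import Path

# Assumed from the example above; adjust to the real model's settings.
REQUIRED_MODEL_KEYS = {"hidden_size", "dropout", "lr"}


def load_config(path):
    """Load a JSON config and fail fast if model settings are missing."""
    config = json.loads(Path(path).read_text())
    missing = REQUIRED_MODEL_KEYS - set(config.get("model", {}))
    if missing:
        raise KeyError(f"missing model settings: {sorted(missing)}")
    return config


# Demo: write a config to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(
        json.dumps({"model": {"hidden_size": 128, "dropout": 0.2, "lr": 0.001}})
    )
    config = load_config(cfg_path)

print(config["model"])
```

Validating at load time surfaces a broken config at startup rather than in the middle of serving.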

### ❌ ANTI-PATTERN 6: Not testing the model after loading

```python
# ❌ BAD: Load and use
model = load_model('model.keras')
predictions = model.predict(X)

# ✅ GOOD: Verify after load
model = load_model('model.keras')
test_input = np.random.rand(1, *input_shape)
test_output = model.predict(test_input)
assert test_output.shape == expected_shape
```
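
A shape check catches a wrong architecture but not wrong weights. A stricter variant, sketched here, replays a few "golden" inputs whose outputs were recorded when the model was exported. `verify_loaded_model`, `reference_model`, and `corrupted_model` are hypothetical; the two toy functions stand in for a correctly loaded and a corrupted model.

```python
import math


def verify_loaded_model(predict_fn, golden_inputs, golden_outputs, tol=1e-5):
    """Return True if predict_fn reproduces every recorded output within tol."""
    for x, expected in zip(golden_inputs, golden_outputs):
        actual = predict_fn(x)
        if len(actual) != len(expected):
            return False
        if not all(
            math.isclose(a, e, abs_tol=tol) for a, e in zip(actual, expected)
        ):
            return False
    return True


def reference_model(x):
    """Stand-in for the model as it was at export time."""
    return [2.0 * v for v in x]


def corrupted_model(x):
    """Stand-in for a model whose weights were loaded incorrectly."""
    return [2.0 * v + 0.1 for v in x]


golden_inputs = [[1.0, 2.0], [0.5, -1.0]]
golden_outputs = [reference_model(x) for x in golden_inputs]  # recorded at export

good_check = verify_loaded_model(reference_model, golden_inputs, golden_outputs)
bad_check = verify_loaded_model(corrupted_model, golden_inputs, golden_outputs)
print(good_check, bad_check)
```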

### ❌ ANTI-PATTERN 7: Optimizing too early

```python
# ❌ BAD: Optimize right from the start
# - Quantize model
# - Prune layers
# - Complex serving setup
# → You don't yet know where the bottleneck is!

# ✅ GOOD: Optimize when needed
# 1. Deploy simple version
# 2. Measure performance
# 3. Identify bottlenecks
# 4. Optimize targeted areas
```
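
Steps 2 and 3 can be as simple as timing each pipeline stage separately. The sketch below assumes a three-stage pipeline; `profile_stages` and the toy stage functions are hypothetical, and the `time.sleep` call only simulates an expensive model call.

```python
import time


def profile_stages(stages, payload, runs=5):
    """Run the pipeline `runs` times and return the mean latency (ms) per stage."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(runs):
        data = payload
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += (time.perf_counter() - start) * 1000.0
    return {name: total / runs for name, total in totals.items()}


def preprocess(x):
    return [v / 255.0 for v in x]


def inference(x):
    time.sleep(0.002)  # simulated model call
    return [sum(x)]


def postprocess(x):
    return round(x[0], 4)


timings = profile_stages(
    [("preprocess", preprocess), ("inference", inference), ("postprocess", postprocess)],
    [10, 20, 30],
)
bottleneck = max(timings, key=timings.get)
print(timings, "-> slowest stage:", bottleneck)
```

Only the stage that dominates the measured time is worth optimizing; if preprocessing were the slow part, quantizing the model would not help.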

## 4.4 Production Checklist

### ✅ Pre-deployment

- [ ] Model achieves target metrics
- [ ] Model versioned properly
- [ ] Config file created
- [ ] Preprocessing code tested
- [ ] Inference pipeline tested
- [ ] Error handling implemented
- [ ] Logging configured
- [ ] Documentation written

### ✅ Deployment

- [ ] Model saved in production format
- [ ] A/B testing setup (if needed)
- [ ] Monitoring enabled
- [ ] Alerts configured
- [ ] Rollback plan ready
- [ ] Load testing completed
- [ ] Gradual rollout plan

### ✅ Post-deployment

- [ ] Monitor metrics daily
- [ ] Check for data drift
- [ ] Review error logs
- [ ] Analyze prediction distribution
- [ ] Gather user feedback
- [ ] Plan next iteration
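
For the data drift item, one commonly used measure is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against a reference sample from training. The sketch below is a simplified, standard-library version; `population_stability_index` is a hypothetical helper, and the bin count and smoothing constant are assumptions.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        eps = 1e-6  # smoothing so empty bins do not produce log(0)
        return [(c + eps) / (len(values) + bins * eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]       # training-time feature values
shifted = [0.5 + i / 200 for i in range(100)]   # production values, shifted upward

no_shift = population_stability_index(reference, reference)
with_shift = population_stability_index(reference, shifted)
print(no_shift, with_shift)
```

A common rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and should be calibrated on real data.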

### 🚨 Red Flags

- ⚠️ Accuracy drop > 5%
- ⚠️ Latency increase > 50%
- ⚠️ Error rate > 1%
- ⚠️ Memory leak
- ⚠️ Input distribution shift
- ⚠️ Unusual prediction patterns

→ **ROLLBACK IMMEDIATELY!**
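
The three numeric red flags can be checked automatically against a baseline recorded at deployment time. This is a minimal sketch: `check_red_flags` and the metric names are hypothetical, and it assumes "accuracy drop > 5%" means five absolute percentage points and "latency increase > 50%" is relative to the baseline.

```python
def check_red_flags(baseline, current):
    """Return the list of numeric red flags triggered by the current metrics."""
    flags = []
    # Assumption: accuracy drop is measured in absolute points (0.05 = 5 points).
    if baseline["accuracy"] - current["accuracy"] > 0.05:
        flags.append("accuracy_drop")
    # Assumption: latency increase is relative to the baseline latency.
    if current["latency_ms"] > baseline["latency_ms"] * 1.5:
        flags.append("latency_increase")
    if current["error_rate"] > 0.01:
        flags.append("error_rate")
    return flags


baseline = {"accuracy": 0.92, "latency_ms": 20.0}

healthy = check_red_flags(
    baseline, {"accuracy": 0.91, "latency_ms": 22.0, "error_rate": 0.001}
)
degraded = check_red_flags(
    baseline, {"accuracy": 0.85, "latency_ms": 35.0, "error_rate": 0.002}
)
print(healthy, degraded)
```

A non-empty result would be the trigger for an alert or for the rollback plan from the deployment checklist.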

---

# 🎓 Summary of FILE 3-C & THE ENTIRE SERIES

## ✅ FILE 3-C: Model Deployment & Production

### 1. Save & Load Models
- **Formats**: Keras (.keras), SavedModel, Checkpoints
- **Best practices**: Versioning, metadata, verification
- **Recommendation**: .keras for development, SavedModel for production

### 2. Inference Pipeline
- **Components**: Preprocessing → Inference → Post-processing
- **Batch vs Real-time**: Trade-off between throughput and latency
- **Performance**: Benchmark and optimize

### 3. Performance Optimization
- **Quantization**: Float32 → Int8 (4x smaller)
- **TensorFlow Lite**: Mobile deployment
- **ONNX**: Cross-platform compatibility

### 4. Production Best Practices
- **Versioning**: Semantic versioning (v1.2.3)
- **Monitoring**: Metrics, logs, alerts
- **Anti-patterns**: Common mistakes to avoid
- **Checklist**: Pre/during/post deployment

---

## 🎉 ENTIRE SERIES COMPLETED!

### FILE 3-A: Transfer Learning & Mixed Precision
- ✅ Transfer Learning (MobileNetV2, ResNet50)
- ✅ Feature Extraction vs Fine-tuning
- ✅ Mixed Precision Training (2-3x speedup)

### FILE 3-B: Clean ML Pipeline & Evaluation
- ✅ Clean ML Pipeline (config-driven, modular)
- ✅ Reproducibility (seeds, versioning)
- ✅ Model Evaluation (metrics, cross-validation)

### FILE 3-C: Model Deployment & Production
- ✅ Save/Load strategies
- ✅ Inference pipeline
- ✅ Performance optimization
- ✅ Production best practices

---

## 🚀 Next Steps - You are now ready for:

### 1. Production ML Projects
- Build end-to-end ML pipelines
- Deploy models to production
- Monitor and maintain models

### 2. Advanced Topics
- TensorFlow Serving
- MLOps with MLflow/Kubeflow
- Distributed training
- Model compression techniques

### 3. Specialized Domains
- Computer Vision (object detection, segmentation)
- NLP (transformers, BERT)
- Time Series forecasting
- Recommendation systems

---

## 💡 Key Takeaways - Top 10

1. **Transfer Learning** is a must-have for CV
2. **Mixed Precision** = up to 2-3x speedup on supported hardware, at little extra cost
3. **Clean pipeline** = easy to maintain and scale
4. **Reproducibility** = set seeds + version everything
5. **Right metrics** > high accuracy
6. **SavedModel** for production
7. **Monitor everything** in production
8. **Version models** properly (semantic versioning)
9. **Optimize when needed**, not prematurely
10. **Production checklist** before deployment

---

## 📚 References

- **TensorFlow Official Docs**: https://www.tensorflow.org/guide
- **TensorFlow Model Optimization**: https://www.tensorflow.org/model_optimization
- **MLOps Best Practices**: https://ml-ops.org/
- **Production ML Systems**: https://developers.google.com/machine-learning/crash-course/production-ml-systems

---

**🎉 Congratulations on completing the entire TensorFlow course, from Beginner to Professional! 🎉**

**You are now ready to build and deploy production ML systems! 🚀**