# TensorFlow Serving for Model Deployment
## AIAT 122 - Deep Learning

## Learning Objectives

- Understand TensorFlow Serving architecture
- Deploy models with TensorFlow Serving
- Create REST API endpoints
- Test production model serving

## Real-World Context

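A trained model is only useful once applications can query it. TensorFlow Serving exposes a SavedModel over REST and gRPC so that production systems can request predictions at scale.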

**Industry Impact**: TensorFlow Serving is used by Google, Airbnb, and many other companies for production ML serving.

In [None]:
%pip install tensorflow tensorflow-serving-api -q
import tensorflow as tf
import numpy as np
print(f'TensorFlow version: {tf.__version__}')
print('✅ Setup complete!')

## Part 1: Save Model in SavedModel Format

In [None]:
# Create a simple model for demonstration
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

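# Save in SavedModel format. TensorFlow Serving expects a numeric version
# subdirectory under the model base path, so export to ./saved_model/1.
model_base_path = './saved_model'
model_path = f'{model_base_path}/1'
if hasattr(model, 'export'):
    model.export(model_path)  # Keras 3 / TF 2.13+: inference-only SavedModel
else:
    model.save(model_path, save_format='tf')  # older tf.keras releases
print(f'✅ Model saved to {model_path}')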
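
TensorFlow Serving loads a model from a base directory that holds one numeric subdirectory per version (for example `1/`, `2/`) and, by default, serves the highest version it finds. The following is a minimal sketch of how the next version directory could be chosen before exporting. The helper name `next_version_dir` is hypothetical, and the demonstration uses a temporary directory rather than a real model.

```python
from pathlib import Path
import tempfile


def next_version_dir(base_dir):
    """Return the next numeric version directory under base_dir.

    Hypothetical helper: TensorFlow Serving expects the layout
    <base_dir>/<N>/saved_model.pb, where N is an integer version.
    """
    base = Path(base_dir)
    existing = []
    if base.exists():
        # Collect the version numbers of all purely numeric subdirectories.
        existing = [
            int(p.name) for p in base.iterdir()
            if p.is_dir() and p.name.isdecimal()
        ]
    return base / str(max(existing, default=0) + 1)


# Demonstration on a throwaway directory (no model is saved here).
with tempfile.TemporaryDirectory() as tmp:
    first = next_version_dir(tmp)    # no versions exist yet
    first.mkdir()
    second = next_version_dir(tmp)   # one version now exists
    print(first.name, second.name)
```

Exporting each retrained model to a fresh version directory lets the server pick up the new version without overwriting the one currently being served.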

## Part 2: TensorFlow Serving Setup

**Note**: TensorFlow Serving is most easily run from its official Docker image. The cell below only prints the commands; run them in a terminal on a machine with Docker installed.

In [None]:
print('📦 TensorFlow Serving Setup:')
print('\n1. Install TensorFlow Serving:')
print('   docker pull tensorflow/serving')
print('\n2. Start serving container (source must be the absolute path of the directory that holds the version subfolders, e.g. 1/):')
print('   docker run -p 8501:8501 --mount type=bind,source=/path/to/model,target=/models/model -e MODEL_NAME=model tensorflow/serving')
print('\n3. Test REST API:')
print('   curl -d \'{"instances": [[1,2,3,...]]}\' -X POST http://localhost:8501/v1/models/model:predict')
print('\n✅ Serving setup understood!')
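
Before sending prediction requests, it can help to confirm that the container has finished loading the model. The REST API exposes a model status endpoint at `GET /v1/models/<name>`. The sketch below shows one way to interpret that response. The helper names `is_model_available` and `fetch_model_status` are hypothetical, and `sample_status` is a hand-written illustration of the response shape, not captured server output.

```python
import json
import urllib.request


def is_model_available(status):
    """Return True if any version in a model-status response is AVAILABLE."""
    versions = status.get('model_version_status', [])
    return any(v.get('state') == 'AVAILABLE' for v in versions)


def fetch_model_status(model_name='model', host='localhost', port=8501, timeout=5):
    """Query the status endpoint; needs a running TensorFlow Serving container."""
    url = f'http://{host}:{port}/v1/models/{model_name}'
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Illustrative response in the shape the status endpoint returns.
sample_status = {
    'model_version_status': [
        {'version': '1', 'state': 'AVAILABLE',
         'status': {'error_code': 'OK', 'error_message': ''}}
    ]
}
print(is_model_available(sample_status))
```

With the container running, `is_model_available(fetch_model_status())` performs the same check against the live server.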

## Part 3: REST API Client Example

In [None]:
import requests
import json

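# Example REST API call (requires a running TensorFlow Serving container)
def predict_rest_api(data, model_name='model', port=8501, timeout=10):
    """
    Request predictions from TensorFlow Serving's REST API.

    data: NumPy array or nested list of shape (batch_size, 10).
    Returns the parsed JSON response, e.g. {'predictions': [...]}.
    """
    url = f'http://localhost:{port}/v1/models/{model_name}:predict'
    # JSON cannot encode NumPy arrays, so convert them to nested lists first
    payload = {'instances': data.tolist() if isinstance(data, np.ndarray) else data}

    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # raise on HTTP errors instead of returning an error body
    return response.json()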

print('✅ REST API client ready!')
print('\nReal-world: production services call this HTTP endpoint to request predictions')
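
The predict endpoint accepts two request layouts: the row format with an `instances` key (answered with `predictions`) and the columnar format with an `inputs` key (answered with `outputs`). A request can also be pinned to one model version through `/versions/<N>` in the URL. The sketch below builds such a URL and unpacks either response shape. The helper names `build_predict_url` and `extract_predictions` are hypothetical, and the response body is a hand-written sample.

```python
import json


def build_predict_url(model_name='model', host='localhost', port=8501, version=None):
    """Build the predict URL, optionally pinned to a specific model version."""
    base = f'http://{host}:{port}/v1/models/{model_name}'
    if version is not None:
        base += f'/versions/{version}'
    return f'{base}:predict'


def extract_predictions(response_body):
    """Pull results out of a predict response, or raise on a server error."""
    if 'error' in response_body:
        raise RuntimeError(f"Serving error: {response_body['error']}")
    # Row-format requests are answered with 'predictions',
    # columnar requests with 'outputs'.
    if 'predictions' in response_body:
        return response_body['predictions']
    return response_body['outputs']


# One row of 10 features, matching the demo model's input shape.
row_payload = json.dumps({'instances': [[0.0] * 10]})
print(build_predict_url(version=1))
print(extract_predictions({'predictions': [[0.5]]}))  # hand-written sample body
```

Pinning a version is useful when comparing a newly exported model against the one already in production.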

## Real-World Applications

Production model serving underpins workloads such as:

- **Google**: Serves billions of predictions daily
- **Airbnb**: Dynamic pricing models
- **Uber**: ETA predictions
- **Netflix**: Recommendation systems
