🛠️ It’s time to go **full engineer mode** — your model is now a **web service.**  
Let’s wrap it in Flask, containerize it with Docker, and serve it like a **real backend**.

---

# 🧪 `08_lab_dockerize_and_test_flask_model_server.ipynb`  
### 📁 `06_deployment_and_scaling`  
> Create a **Flask API** for a PyTorch/ONNX model  
> Package it into a **Docker container**  
> Test it with live POST requests — as if you’re building an **ML-powered microservice**

---

## 🎯 Learning Goals

- Build a **Flask app** that serves predictions  
- Accept **image uploads via POST**  
- Serve using **Docker containerization**  
- Test using **curl or Python requests**  
- Build awareness of **latency, batching, I/O handling**

---

## 💻 Runtime Design

| Component      | Spec                                  |
|----------------|---------------------------------------|
| Model          | `mlp_mnist.onnx` or PyTorch ✅         |
| API Server     | Flask ✅                               |
| Container      | Dockerfile ✅                          |
| Testing Tool   | requests / curl ✅                     |
| Platform       | Local / Colab + ngrok (if needed) ✅   |

---

## 🧠 Section 1: Flask App `app.py`

```python
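from flask import Flask, request, jsonify
import onnxruntime as ort
import numpy as np
from PIL import Image
import io
import torchvision.transforms as transforms

app = Flask(__name__)

# Load the ONNX model once at startup and reuse the session for every request
session = ort.InferenceSession("mlp_mnist.onnx")
input_name = session.get_inputs()[0].name

# Preprocess to a float32 array of shape (1, 1, 28, 28); adjust this if your
# exported model expects a different input shape (e.g. a flattened (1, 784))
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.unsqueeze(0).numpy().astype(np.float32))
])

@app.route("/predict", methods=["POST"])
def predict():
    # Reject requests that do not carry a multipart field named "file"
    if 'file' not in request.files:
        return jsonify({"error": "No image uploaded"}), 400
    img = Image.open(request.files['file']).convert("RGB")
    img_tensor = transform(img)
    out = session.run(None, {input_name: img_tensor})
    pred = int(np.argmax(out[0]))
    return jsonify({"prediction": pred})

if __name__ == "__main__":
    # Without this, `python app.py` exits immediately and the container stops.
    # Bind to 0.0.0.0 so the server is reachable through Docker's port mapping.
    app.run(host="0.0.0.0", port=5000)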
```
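
The endpoint above only checks that a `file` field exists, so an empty or non-image upload would surface as a server error from `Image.open`. Here is a minimal sketch of a pre-check you could run on the raw bytes first. The helper names (`sniff_image_type`, `validate_upload`) and the 1 MiB cap are hypothetical choices for illustration, not part of the lab, and the sketch only recognizes PNG and JPEG.

```python
from typing import Optional

# Leading "magic" bytes for the two formats this sketch accepts
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"
MAX_UPLOAD_BYTES = 1 * 1024 * 1024  # hypothetical 1 MiB cap, not from the lab


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image format from its first bytes, or return None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def validate_upload(data: bytes) -> Optional[str]:
    """Return an error message for a bad upload, or None if it looks acceptable."""
    if not data:
        return "Empty file"
    if len(data) > MAX_UPLOAD_BYTES:
        return "File too large"
    if sniff_image_type(data) is None:
        return "Unsupported image format"
    return None
```

One way to wire this in, assuming you are happy to buffer the upload in memory: read `data = request.files['file'].read()`, return a 400 with the message when `validate_upload(data)` is not `None`, and otherwise open the image with `Image.open(io.BytesIO(data))`. A matching signature does not guarantee the file decodes, so a `try/except` around `Image.open` is still worth having.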

---

## 📁 Section 2: Dockerfile

```dockerfile
FROM python:3.10

WORKDIR /app
COPY . /app

RUN pip install flask onnxruntime pillow torchvision

EXPOSE 5000

CMD ["python", "app.py"]
```

---

## 🛠️ Section 3: Build & Run Docker

### ✅ Build
```bash
docker build -t mnist-flask-app .
```

### ✅ Run
```bash
docker run -p 5000:5000 mnist-flask-app
```
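
Right after `docker run`, the server may need a moment before it accepts requests. As a rough sketch, a test script could poll the published port before sending anything. The helper name `wait_for_port` and its default timings are assumptions made for this example.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For this lab you would call something like `wait_for_port("localhost", 5000)` before the first request. Keep in mind that a successful connect only shows the port is accepting connections; depending on how Docker publishes the port, that can happen slightly before Flask itself is ready, so treat it as a coarse check rather than a full health probe.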

---

## 🚀 Section 4: Test the API

### 📤 Test with Python

```python
import requests

# Open the image in a context manager so the file handle is always closed
with open('digit_sample.png', 'rb') as f:
    res = requests.post("http://localhost:5000/predict", files={'file': f})
print(res.json())
```

Or use `curl`:
```bash
curl -X POST -F "file=@digit_sample.png" http://localhost:5000/predict
```
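
The learning goals mention latency, and the test requests above are a natural place to start measuring it. Below is a small sketch of a timing helper that runs any callable repeatedly and summarizes the results in milliseconds. The name `measure_latency`, the run counts, and the simple percentile index are assumptions for illustration; the demo at the bottom times a stand-in workload rather than the real endpoint.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], runs: int = 20, warmup: int = 2) -> Dict[str, float]:
    """Time repeated invocations of `call` and summarize latency in milliseconds."""
    for _ in range(warmup):
        call()  # warm-up calls are not timed (first requests often pay one-off costs)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Simple rank-based index for the 95th percentile of the sorted samples
    p95_index = min(len(samples) - 1, int(round(0.95 * (len(samples) - 1))))
    return {
        "runs": float(len(samples)),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; in the lab you would pass a function that POSTs to /predict
    print(measure_latency(lambda: sum(range(10_000)), runs=50))
```

To time the actual service, you could wrap the `requests.post(...)` call from the Python test in a small function and pass that function as `call`. The numbers then include network and upload overhead, not just model inference.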

---

## ✅ Wrap-Up Summary

| Feature                           | ✅ |
|------------------------------------|----|
| Flask API from model              | ✅ |
| ONNX inference inside endpoint     | ✅ |
| Docker container builds cleanly    | ✅ |
| RESTful POST request support       | ✅ |
| Local + Colab-compatible (ngrok)   | ✅ |

---

## 🧠 What You Learned

- Models ≠ products — this is how you **turn ML into APIs**  
- Docker ensures **portability and reproducibility**  
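- You now know how to **ship** an inference pipeline as a containerized service (for real production traffic, swap Flask's built-in development server for a WSGI server such as Gunicorn)  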

---

Wanna go beast mode and hit `09_lab_k8s_microservice_mock_deploy.ipynb` next?  
It’s time to throw this container into a **Kubernetes cluster**, mock a deployment, and simulate **real microservice orchestration**.  

Ready, Captain?