# Lesson 1: Model Serialization

**Module 5: Model Deployment**  
**Estimated Time**: 1 hour  
**Difficulty**: Beginner

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand how to save Python objects to disk (Pickle/Joblib)  
✅ Learn why **Pickle is insecure** (Remote Code Execution risk)  
✅ Explore safer alternatives like **Safetensors** and **ONNX**  
✅ Answer interview questions on model format security  

---

## 📚 Table of Contents

1. [The Basics: Pickle & Joblib](#1-pickle)
2. [The Danger: Why Pickle is Unsafe](#2-danger)
3. [The Solution: Safetensors & ONNX](#3-secure)
4. [Hands-On: Saving & Loading](#4-hands-on)
5. [Interview Preparation](#5-interview-questions)

---

## 1. The Basics: Pickle & Joblib

A trained model exists only in memory. To reuse it after the Python process exits, you serialize it to disk and load it back later.

- **Pickle**: Python's standard serialization format. Handles almost any Python object.
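- **Joblib**: Built on top of Pickle and optimized for objects that hold large NumPy arrays (the scikit-learn documentation recommends it for model persistence). Because it uses Pickle underneath, it carries the same security risk.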

**Usage**:
```python
import joblib
joblib.dump(model, 'model.pkl')
loaded_model = joblib.load('model.pkl')
```
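
For comparison, here is a minimal sketch of the same round trip with the standard-library `pickle` module. The `params` dictionary is a hypothetical stand-in for a trained model, and the file is written to a temporary directory so nothing is left behind in the working folder.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: any picklable Python object works.
params = {"weights": [0.5, -1.2, 3.0], "bias": 0.1}

path = os.path.join(tempfile.mkdtemp(), "params.pkl")

# Pickle files are binary, so open them in "wb" / "rb" mode.
with open(path, "wb") as f:
    pickle.dump(params, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == params)
```

Unlike `joblib.dump`, which accepts a file path directly, `pickle.dump` expects an open file object.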

## 2. The Danger: Why Pickle is Unsafe

**Scenario**: You download a model `cool_bert.pkl` from the internet.  
**Attack**: The file was created from an object whose `__reduce__` method tells Pickle to call `os.system('rm -rf /')`. Pickle runs that call as part of loading the file, before you ever touch the "model".

**Rule**: NEVER unpickle data from an untrusted source.
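
If a pickle from a semi-trusted source must be read anyway, the Python documentation describes one mitigation: subclass `pickle.Unpickler` and override `find_class` so that only allow-listed globals can be resolved. The sketch below assumes a hypothetical allow-list (`ALLOWED_GLOBALS`) and a hypothetical helper name (`restricted_loads`). It reduces the attack surface but is not a guarantee, so the rule above still applies.

```python
import io
import pickle

# Hypothetical allow-list: the only (module, name) pairs this loader will resolve.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class or function).
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Payload:
    def __reduce__(self):
        # Asks pickle to call print(...) at load time; harmless stand-in for an exploit.
        return (print, ("this should never run",))

blob = pickle.dumps(Payload())

try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(f"Blocked: {exc}")

# Plain data (lists, dicts, numbers, strings) needs no globals and still loads.
plain = restricted_loads(pickle.dumps({"weights": [1, 2, 3]}))
```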

## 3. The Solution: Safetensors & ONNX

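1. **ONNX** (Lesson 4.7): Stores the model as a computation graph of standardized operators. Safer than Pickle because loading the file does not run arbitrary Python code; the runtime only evaluates the operators in the graph.
2. **Safetensors** (by Hugging Face): A format that stores tensors only: a JSON header describing names, dtypes, and shapes, followed by the raw tensor bytes. Loading it does not execute code, and the layout allows memory-mapped (zero-copy) reads.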
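
To make the "data only" idea concrete, here is a toy sketch of a tensor file layout loosely modeled on Safetensors: an 8-byte header length, a JSON header, then raw bytes. This is an illustration written with the standard library, not the `safetensors` library API; the function names `save_tensors` and `load_tensors` are hypothetical, and it handles only flat lists of 64-bit floats in the machine's native byte order.

```python
import json
import struct
from array import array

def save_tensors(tensors: dict) -> bytes:
    """Pack {name: list of floats} as: header length, JSON header, raw bytes."""
    header, payload = {}, b""
    for name, values in tensors.items():
        raw = array("d", values).tobytes()  # float64, native byte order
        header[name] = {
            "dtype": "F64",
            "shape": [len(values)],
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # "<Q" = unsigned 64-bit little-endian integer holding the header size.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload

def load_tensors(blob: bytes) -> dict:
    """Inverse of save_tensors. Only parses JSON and copies bytes; nothing is executed."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    out = {}
    for name, meta in header.items():
        start, end = meta["data_offsets"]
        values = array("d")
        values.frombytes(data[start:end])
        out[name] = values.tolist()
    return out

blob = save_tensors({"weights": [1.0, 2.5, -3.0], "bias": [0.25]})
restored = load_tensors(blob)
print(restored)
```

The loader never resolves a class or calls a function named in the file, which is the property that separates this kind of format from Pickle.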

## 4. Hands-On: Saving & Loading

The code below first simulates the risk, then shows the standard scikit-learn save/load workflow with Joblib.

```python
import pickle
import os

# --- 1. The Vulnerability Exploit Demo ---
class MaliciousModel:
    def __reduce__(self):
        # Pickle stores this (callable, args) pair and calls it
        # IMMEDIATELY when the file is unpickled.
        return (print, ("⚠️ HACKED! I just ran code on your machine!",))

# Attacker creates the file
malicious = MaliciousModel()
with open("malicious.pkl", "wb") as f:
    pickle.dump(malicious, f)

print("Suppose you download 'malicious.pkl'...")
print("Loading it now...")

# Victim loads the file; the print call above runs here
with open("malicious.pkl", "rb") as f:
    pickle.load(f)

# Cleanup
os.remove("malicious.pkl")
```

```python
import os

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# --- 2. The Standard Scikit-Learn Way (Joblib) ---
# Toy data generated from y = 1*x0 + 2*x1 + 3
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
model = LinearRegression().fit(X, y)

print("Saving model with Joblib...")
joblib.dump(model, "model.joblib")

print("Loading model...")
loaded = joblib.load("model.joblib")
print(f"Prediction: {loaded.predict(np.array([[3, 5]]))}")

# Cleanup
os.remove("model.joblib")
```

## 5. Interview Preparation

### Common Questions

#### Q1: "How do you share a PyTorch model safely?"
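**Answer**: "I avoid sharing a pickled model object. I prefer exporting to **ONNX** or sharing only the weights. A `state_dict` saved with `torch.save` is still a pickle file, so I load it with `torch.load(..., weights_only=True)`, or better, I store the weights in the **Safetensors** format, which holds only tensor data and a JSON header, so loading it does not execute code."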

#### Q2: "Difference between `pickle` and `json`?"
**Answer**: "JSON is text-based, human-readable, language-agnostic, and secure (it's just data). Pickle is binary, Python-specific, and insecure (it can execute code). Use JSON for configs/metadata, and Pickle/Joblib only for trusted model artifacts."
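
A short sketch of that difference, using a hypothetical `config` dictionary and a hypothetical `CustomModel` class. JSON yields text and accepts only basic data types, while Pickle yields bytes.

```python
import json
import pickle

# Hypothetical metadata for a saved model.
config = {"model_name": "linear_regression", "n_features": 2, "fit_intercept": True}

as_json = json.dumps(config)      # text (str), readable in any language
as_pickle = pickle.dumps(config)  # opaque bytes, Python-specific

print(type(as_json).__name__)
print(type(as_pickle).__name__)
print(json.loads(as_json) == config)

class CustomModel:
    pass

# JSON refuses arbitrary objects: it can only describe data, never behavior.
try:
    json.dumps(CustomModel())
    json_accepts_objects = True
except TypeError:
    json_accepts_objects = False

print(json_accepts_objects)
```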