# Day 38: Feature Extraction vs End-to-End Learning

## Introduction

When working with pre-trained models for transfer learning, one of the most important decisions you'll face is choosing between **feature extraction** and **end-to-end learning**. This choice significantly impacts your model's performance, training time, computational requirements, and ability to generalize to new tasks.

**Feature extraction** treats the pre-trained model as a fixed feature extractor, freezing its weights and only training a new classifier on top. This approach is fast, computationally efficient, and works well when your target task is similar to the original task the model was trained on.

**End-to-end learning** (also called fine-tuning) updates the weights throughout the entire network, allowing the model to adapt its learned features to your specific task. This approach typically achieves better performance but requires more data, computational resources, and careful tuning to avoid overfitting.

Understanding when to use each approach—or even combine them—is crucial for building effective deep learning solutions in practice. In this lesson, we'll explore both strategies, compare their trade-offs, and implement them using real examples.

### Learning Objectives

By the end of this lesson, you will be able to:

- Understand the conceptual differences between feature extraction and end-to-end learning
- Implement both approaches in a small NumPy network, using ideas that carry over to PyTorch and TensorFlow/Keras
- Analyze the computational and performance trade-offs between the two methods
- Make informed decisions about which approach to use for a given problem
- Apply progressive unfreezing strategies to combine both approaches

## Theory

### What is Feature Extraction?

In **feature extraction**, we use a pre-trained model as a fixed feature extractor. The key idea is to:

1. Load a model pre-trained on a large dataset (e.g., ImageNet, COCO)
2. **Freeze all weights** in the base model (in PyTorch, set `requires_grad=False` on its parameters)
3. Remove the original classifier head
4. Add a new classifier for your specific task
5. Train **only** the new classifier while keeping the base model fixed

Mathematically, if we denote the pre-trained model as $f_{\theta_{base}}$ and the new classifier as $g_{\phi}$, we optimize:

$$
\min_{\phi} \mathcal{L}(g_{\phi}(f_{\theta_{base}}(x)), y)
$$

where $\theta_{base}$ remains fixed and only $\phi$ is updated during training.
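
As a minimal sketch of this objective, the snippet below uses synthetic data and a hypothetical fixed map `f_base` (a random linear layer plus ReLU) standing in for a real pre-trained model. Because $\theta_{base}$ never changes, the features can be computed once and cached, and gradient descent runs on $\phi$ alone. All names and sizes here are illustrative assumptions, not part of the lesson's later code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: 200 samples, 20 inputs, 3 classes.
X = rng.normal(size=(200, 20))
y = np.argmax(X[:, :3], axis=1)
Y = np.eye(3)[y]  # one-hot labels

# Stand-in for the frozen base model: fixed random weights (an assumption).
theta_base = rng.normal(size=(20, 16)) * 0.2
theta_before = theta_base.copy()

def f_base(X):
    """Frozen feature extractor: linear map followed by ReLU."""
    return np.maximum(0, X @ theta_base)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(P, y):
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

# theta_base is fixed, so the features only need to be computed once.
features = f_base(X)

# New classifier g_phi: a single softmax layer, the only trainable part.
phi = np.zeros((16, 3))

loss_start = cross_entropy(softmax(features @ phi), y)
for _ in range(200):
    P = softmax(features @ phi)
    grad_phi = features.T @ (P - Y) / len(y)  # gradient w.r.t. phi only
    phi -= 0.1 * grad_phi
loss_end = cross_entropy(softmax(features @ phi), y)

print(f"loss: {loss_start:.4f} -> {loss_end:.4f}")
print("theta_base unchanged:", np.array_equal(theta_base, theta_before))
```

Caching the features is what makes feature extraction cheap: every epoch after the first touches only the small classifier.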

### What is End-to-End Learning (Fine-tuning)?

In **end-to-end learning**, we update weights throughout the entire network:

1. Load a pre-trained model
2. **Unfreeze some or all layers** (in PyTorch, set `requires_grad=True` on their parameters)
3. Replace the final classifier
4. Train the entire network (or selected layers) with a typically lower learning rate

Mathematically, we now optimize:

$$
\min_{\theta_{base}, \phi} \mathcal{L}(g_{\phi}(f_{\theta_{base}}(x)), y)
$$

where both $\theta_{base}$ and $\phi$ are updated.
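
For contrast, here is a hedged sketch of the joint objective on the same kind of synthetic data. The gradient now flows through the classifier into the base weights, so both `theta` and `phi` move at every step. The network shape, learning rate, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: 200 samples, 20 inputs, 3 classes.
X = rng.normal(size=(200, 20))
y = np.argmax(X[:, :3], axis=1)
Y = np.eye(3)[y]

theta = rng.normal(size=(20, 16)) * 0.2  # stand-in "pre-trained" base weights
phi = rng.normal(size=(16, 3)) * 0.1     # new classifier weights

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss_fn(theta, phi):
    H = np.maximum(0, X @ theta)
    P = softmax(H @ phi)
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

def gradients(theta, phi):
    """Backpropagate through the classifier and the base layer."""
    Z1 = X @ theta
    H = np.maximum(0, Z1)
    P = softmax(H @ phi)
    dZ2 = (P - Y) / len(y)
    grad_phi = H.T @ dZ2
    dH = dZ2 @ phi.T                    # gradient reaching the base features
    grad_theta = X.T @ (dH * (Z1 > 0))  # ReLU passes gradient where Z1 > 0
    return grad_theta, grad_phi

theta_start, phi_start = theta.copy(), phi.copy()
loss_start = loss_fn(theta, phi)

lr = 0.05  # a deliberately modest rate, in the spirit of fine-tuning
for _ in range(100):
    g_theta, g_phi = gradients(theta, phi)
    theta -= lr * g_theta
    phi -= lr * g_phi

loss_end = loss_fn(theta, phi)
print(f"loss: {loss_start:.4f} -> {loss_end:.4f}")
print("theta changed:", not np.allclose(theta, theta_start))
print("phi changed:", not np.allclose(phi, phi_start))
```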

### Key Differences

| Aspect | Feature Extraction | End-to-End Learning |
|--------|-------------------|---------------------|
| **Trainable Parameters** | Only final layers | All or most layers |
| **Training Time** | Fast (fewer parameters) | Slower (more parameters) |
| **Data Requirements** | Works with small datasets | Requires more data |
| **Computational Cost** | Low (no backprop through backbone) | High (full backpropagation) |
| **Performance** | Good for similar domains | Better for dissimilar domains |
| **Overfitting Risk** | Lower | Higher (needs regularization) |
| **Learning Rate** | Standard (e.g., 0.001) | Lower (e.g., 0.0001) |
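
To make the "Trainable Parameters" row concrete, the sketch below counts parameters for a two-layer network with 64 inputs, 128 hidden units, and 10 outputs. The helper `count_trainable` is a hypothetical name introduced only for this illustration.

```python
def count_trainable(input_dim, hidden_dim, output_dim, freeze_backbone):
    """Count trainable weights and biases in a 2-layer network."""
    backbone = input_dim * hidden_dim + hidden_dim  # first layer: W1 and b1
    head = hidden_dim * output_dim + output_dim     # classifier: W2 and b2
    return head if freeze_backbone else backbone + head

frozen = count_trainable(64, 128, 10, freeze_backbone=True)
full = count_trainable(64, 128, 10, freeze_backbone=False)

print(f"Feature extraction: {frozen} trainable parameters")
print(f"End-to-end:         {full} trainable parameters")
```

With these sizes the frozen setup trains 1,290 parameters and the unfrozen one trains 9,610. In a real pre-trained backbone the gap is usually far larger, since the backbone holds most of the model's parameters.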

### When to Use Which Approach?

**Use Feature Extraction when:**
- You have a **small dataset** (as a rough guide, fewer than 10,000 samples)
- Your task is **similar** to the original task (e.g., ImageNet → other image classification)
- You have **limited computational resources**
- You need **fast experimentation and prototyping**

**Use End-to-End Learning when:**
- You have a **large dataset** (as a rough guide, more than 10,000 samples)
- Your task is **different** from the original task (e.g., ImageNet → medical imaging)
- You need **maximum performance**
- You have sufficient **computational resources**

**Progressive Unfreezing (Hybrid Approach):**

A common best practice combines both approaches:
1. Start with feature extraction to quickly train the classifier
2. Gradually unfreeze layers from the top (closest to the output) down toward the input
3. Fine-tune with progressively lower learning rates

This approach often yields the best results while maintaining training stability.
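
One way to write the three steps above down as a plan is a small schedule generator. This is only a sketch: `unfreezing_schedule`, its parameters, and the halving of the learning rate at each stage are assumptions made for illustration, not a prescribed recipe.

```python
def unfreezing_schedule(n_backbone_layers, head_lr=1e-3, decay=0.5):
    """Build a list of training stages for progressive unfreezing.

    Backbone layers are indexed from the input (0) to the output
    (n_backbone_layers - 1). Stage 0 trains only the classifier head.
    """
    stages = [{"stage": 0, "trainable_backbone_layers": [], "lr": head_lr}]
    lr = head_lr
    for k in range(1, n_backbone_layers + 1):
        lr *= decay  # lower the learning rate each time more layers open up
        # Unfreeze the k layers closest to the output.
        trainable = list(range(n_backbone_layers - k, n_backbone_layers))
        stages.append({"stage": k, "trainable_backbone_layers": trainable, "lr": lr})
    return stages

schedule = unfreezing_schedule(n_backbone_layers=3)
for stage in schedule:
    print(stage)
```

For a three-layer backbone this yields four stages: head only, then layer 2, then layers 1 and 2, then all three layers, with the learning rate halved at each step.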

### Mathematical Intuition: Why Does This Work?

Deep neural networks learn **hierarchical features**:
- **Lower layers**: Generic features (edges, textures, basic shapes)
- **Middle layers**: Mid-level features (parts, patterns)
- **Upper layers**: Task-specific features (object-specific patterns)

When we freeze lower layers, we preserve these generic features that transfer well across tasks. When we fine-tune upper layers, we adapt the task-specific representations to our new problem.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import time
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("All libraries imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## Implementation: Feature Extraction vs End-to-End Learning

In this section, we'll implement both approaches using a simplified neural network. We'll demonstrate the concepts using NumPy and scikit-learn to show the fundamental differences between the two approaches.

### Scenario Setup

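We'll work with scikit-learn's digits dataset (1,797 8x8 images of handwritten digits, a much smaller relative of MNIST) and simulate a transfer learning scenario:
1. First, we'll set up a stand-in "pre-trained" feature extractor (a randomly initialized first layer that plays the role of the backbone)
2. Then we'll compare feature extraction vs end-to-end learning for a classification task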
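
The `SimpleNeuralNetwork` class later in this lesson starts from random weights. If you want a backbone that has actually been trained on something, one possible approach is sketched below: pre-train a small scikit-learn `MLPClassifier` on a source task (digits 0-4), then reuse its first layer as a frozen feature extractor for a target task (digits 5-9). The source/target split, the hidden size of 32, and the `extract_features` helper are all assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Source task: digits 0-4. Target task: digits 5-9.
source, target = y < 5, y >= 5

# "Pre-train" a small network on the source task.
pretrained = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
pretrained.fit(X[source], y[source])

# Reuse the learned first layer as a frozen feature extractor.
W1, b1 = pretrained.coefs_[0], pretrained.intercepts_[0]

def extract_features(X):
    # MLPClassifier uses ReLU as its default hidden activation.
    return np.maximum(0, X @ W1 + b1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X[target], y[target], test_size=0.3, random_state=0, stratify=y[target]
)

# Train only a new classifier head on top of the frozen features.
head = LogisticRegression(max_iter=2000)
head.fit(extract_features(X_tr), y_tr)
target_acc = head.score(extract_features(X_te), y_te)

print(f"Frozen-feature accuracy on digits 5-9: {target_acc:.4f}")
```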

In [None]:
# Load and prepare the dataset (load_digits was imported in the first cell)

# Load digits dataset (8x8 images of digits 0-9)
digits = load_digits()
X, y = digits.data, digits.target

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Sample image shape: {digits.images[0].shape}")

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Normalize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"\nTraining samples: {X_train_scaled.shape[0]}")
print(f"Test samples: {X_test_scaled.shape[0]}")

# Visualize some samples
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.set_title(f'Label: {digits.target[i]}')
    ax.axis('off')
plt.suptitle('Sample Digit Images', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

### Simple Neural Network Implementation

Let's create a simple neural network class that allows us to freeze/unfreeze layers:

In [None]:
class SimpleNeuralNetwork:
    """
    A simple 2-layer neural network with freeze/unfreeze capability.
    The first layer plays the role of a pre-trained backbone; the second
    layer is the new classifier.
    """
    def __init__(self, input_dim, hidden_dim, output_dim, freeze_features=False):
        self.freeze_features = freeze_features
        
        # Initialize backbone weights. They are random here and only stand in
        # for weights that would normally come from pre-training.
        self.W1 = np.random.randn(input_dim, hidden_dim) * 0.01
        self.b1 = np.zeros((1, hidden_dim))
        
        # Classifier weights (always trainable)
        self.W2 = np.random.randn(hidden_dim, output_dim) * 0.01
        self.b2 = np.zeros((1, output_dim))
        
        # Store original weights if frozen
        if freeze_features:
            self.W1_frozen = self.W1.copy()
            self.b1_frozen = self.b1.copy()
    
    def relu(self, Z):
        return np.maximum(0, Z)
    
    def softmax(self, Z):
        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    
    def forward(self, X):
        # Feature extraction layer (can be frozen)
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = self.relu(self.Z1)
        
        # Classifier layer (always trainable)
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        self.A2 = self.softmax(self.Z2)
        
        return self.A2
    
    def backward(self, X, y, learning_rate=0.01):
        m = X.shape[0]
        
        # Convert labels to one-hot
        y_onehot = np.zeros((m, self.W2.shape[1]))
        y_onehot[np.arange(m), y] = 1
        
        # Gradients for the classifier (always trainable)
        dZ2 = self.A2 - y_onehot
        dW2 = np.dot(self.A1.T, dZ2) / m
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m
        
        # Gradients for the feature extractor (only if not frozen).
        # They must use W2 as it was during the forward pass, so they are
        # computed before any weights are updated.
        if not self.freeze_features:
            dA1 = np.dot(dZ2, self.W2.T)
            dZ1 = dA1 * (self.Z1 > 0)  # ReLU derivative
            dW1 = np.dot(X.T, dZ1) / m
            db1 = np.sum(dZ1, axis=0, keepdims=True) / m
            
            # Update feature extractor weights
            self.W1 -= learning_rate * dW1
            self.b1 -= learning_rate * db1
        # When frozen, W1 and b1 are simply left untouched
        
        # Update classifier weights
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
    
    def compute_loss(self, y_pred, y_true):
        m = y_true.shape[0]
        log_likelihood = -np.log(y_pred[range(m), y_true] + 1e-8)
        loss = np.sum(log_likelihood) / m
        return loss
    
    def train(self, X_train, y_train, X_val, y_val, epochs=100, learning_rate=0.01, verbose=True):
        train_losses = []
        val_losses = []
        train_accs = []
        val_accs = []
        
        start_time = time.time()
        
        for epoch in range(epochs):
            # Forward pass
            y_pred = self.forward(X_train)
            
            # Compute loss
            train_loss = self.compute_loss(y_pred, y_train)
            
            # Backward pass and update
            self.backward(X_train, y_train, learning_rate)
            
            # Validation
            y_val_pred = self.forward(X_val)
            val_loss = self.compute_loss(y_val_pred, y_val)
            
            # Compute accuracies
            train_acc = accuracy_score(y_train, np.argmax(y_pred, axis=1))
            val_acc = accuracy_score(y_val, np.argmax(y_val_pred, axis=1))
            
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accs.append(train_acc)
            val_accs.append(val_acc)
            
            if verbose and (epoch + 1) % 20 == 0:
                print(f"Epoch {epoch+1}/{epochs} - Loss: {train_loss:.4f} - Acc: {train_acc:.4f} - Val Loss: {val_loss:.4f} - Val Acc: {val_acc:.4f}")
        
        training_time = time.time() - start_time
        
        return {
            'train_losses': train_losses,
            'val_losses': val_losses,
            'train_accs': train_accs,
            'val_accs': val_accs,
            'training_time': training_time
        }
    
    def predict(self, X):
        y_pred = self.forward(X)
        return np.argmax(y_pred, axis=1)

print("SimpleNeuralNetwork class defined successfully!")
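
The backward pass above relies on the identity that, for softmax followed by cross-entropy, the gradient of the loss with respect to the logits is the predicted probabilities minus the one-hot labels (divided by the batch size for a mean loss). The standalone sketch below checks that identity against central finite differences on random logits. It does not use the class itself, and the array sizes are arbitrary.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(Z, y):
    """Mean cross-entropy of softmax(Z) against integer labels y."""
    P = softmax(Z)
    return -np.mean(np.log(P[np.arange(len(y)), y]))

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 4))      # 5 samples, 4 classes
y = rng.integers(0, 4, size=5)

# Analytic gradient: (softmax - one_hot) / batch size.
analytic = (softmax(Z) - np.eye(4)[y]) / len(y)

# Numerical gradient by central differences, one logit at a time.
numeric = np.zeros_like(Z)
eps = 1e-6
for i in range(Z.shape[0]):
    for j in range(Z.shape[1]):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, j] += eps
        Z_minus[i, j] -= eps
        numeric[i, j] = (cross_entropy(Z_plus, y) - cross_entropy(Z_minus, y)) / (2 * eps)

max_err = np.max(np.abs(analytic - numeric))
print(f"Max difference between analytic and numeric gradient: {max_err:.2e}")
```

A gradient check like this is a cheap way to catch sign or ordering mistakes in a hand-written backward pass.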

### Approach 1: Feature Extraction (Frozen Backbone)

Let's train a model with frozen feature extraction layers:

In [None]:
# Approach 1: Feature Extraction (Frozen Backbone)
print("=" * 60)
print("APPROACH 1: Feature Extraction (Frozen Backbone)")
print("=" * 60)

# Create model with frozen features
model_frozen = SimpleNeuralNetwork(
    input_dim=X_train_scaled.shape[1],
    hidden_dim=128,
    output_dim=10,
    freeze_features=True
)

# Train the model
history_frozen = model_frozen.train(
    X_train_scaled, y_train,
    X_test_scaled, y_test,
    epochs=100,
    learning_rate=0.1,
    verbose=True
)

# Evaluate on test set
y_pred_frozen = model_frozen.predict(X_test_scaled)
test_acc_frozen = accuracy_score(y_test, y_pred_frozen)

print(f"\n{'='*60}")
print(f"Feature Extraction Results:")
print(f"{'='*60}")
print(f"Training Time: {history_frozen['training_time']:.2f} seconds")
print(f"Final Test Accuracy: {test_acc_frozen:.4f}")
print(f"Trainable Parameters: Only classifier layer (W2, b2)")
print(f"{'='*60}")

### Approach 2: End-to-End Learning (Unfrozen Backbone)

Now let's train a model with all layers unfrozen:

In [None]:
# Approach 2: End-to-End Learning (Unfrozen Backbone)
print("=" * 60)
print("APPROACH 2: End-to-End Learning (Unfrozen Backbone)")
print("=" * 60)

# Create model with unfrozen features
model_unfrozen = SimpleNeuralNetwork(
    input_dim=X_train_scaled.shape[1],
    hidden_dim=128,
    output_dim=10,
    freeze_features=False  # All layers are trainable
)

# Train the model
history_unfrozen = model_unfrozen.train(
    X_train_scaled, y_train,
    X_test_scaled, y_test,
    epochs=100,
    learning_rate=0.1,
    verbose=True
)

# Evaluate on test set
y_pred_unfrozen = model_unfrozen.predict(X_test_scaled)
test_acc_unfrozen = accuracy_score(y_test, y_pred_unfrozen)

print(f"\n{'='*60}")
print(f"End-to-End Learning Results:")
print(f"{'='*60}")
print(f"Training Time: {history_unfrozen['training_time']:.2f} seconds")
print(f"Final Test Accuracy: {test_acc_unfrozen:.4f}")
print(f"Trainable Parameters: All layers (W1, b1, W2, b2)")
print(f"{'='*60}")

### Comparison and Visualization

Let's visualize the training dynamics and compare both approaches:

In [None]:
# Create comprehensive comparison visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Training Loss Comparison
axes[0, 0].plot(history_frozen['train_losses'], label='Feature Extraction', linewidth=2, alpha=0.8)
axes[0, 0].plot(history_unfrozen['train_losses'], label='End-to-End', linewidth=2, alpha=0.8)
axes[0, 0].set_xlabel('Epoch', fontsize=11)
axes[0, 0].set_ylabel('Training Loss', fontsize=11)
axes[0, 0].set_title('Training Loss Comparison', fontsize=12, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Validation Loss Comparison
axes[0, 1].plot(history_frozen['val_losses'], label='Feature Extraction', linewidth=2, alpha=0.8)
axes[0, 1].plot(history_unfrozen['val_losses'], label='End-to-End', linewidth=2, alpha=0.8)
axes[0, 1].set_xlabel('Epoch', fontsize=11)
axes[0, 1].set_ylabel('Validation Loss', fontsize=11)
axes[0, 1].set_title('Validation Loss Comparison', fontsize=12, fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Training Accuracy Comparison
axes[1, 0].plot(history_frozen['train_accs'], label='Feature Extraction', linewidth=2, alpha=0.8)
axes[1, 0].plot(history_unfrozen['train_accs'], label='End-to-End', linewidth=2, alpha=0.8)
axes[1, 0].set_xlabel('Epoch', fontsize=11)
axes[1, 0].set_ylabel('Training Accuracy', fontsize=11)
axes[1, 0].set_title('Training Accuracy Comparison', fontsize=12, fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Validation Accuracy Comparison
axes[1, 1].plot(history_frozen['val_accs'], label='Feature Extraction', linewidth=2, alpha=0.8)
axes[1, 1].plot(history_unfrozen['val_accs'], label='End-to-End', linewidth=2, alpha=0.8)
axes[1, 1].set_xlabel('Epoch', fontsize=11)
axes[1, 1].set_ylabel('Validation Accuracy', fontsize=11)
axes[1, 1].set_title('Validation Accuracy Comparison', fontsize=12, fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary comparison table
comparison_data = {
    'Metric': ['Training Time (s)', 'Final Test Accuracy', 'Final Train Accuracy', 'Trainable Layers'],
    'Feature Extraction': [
        f"{history_frozen['training_time']:.2f}",
        f"{test_acc_frozen:.4f}",
        f"{history_frozen['train_accs'][-1]:.4f}",
        "Classifier only"
    ],
    'End-to-End': [
        f"{history_unfrozen['training_time']:.2f}",
        f"{test_acc_unfrozen:.4f}",
        f"{history_unfrozen['train_accs'][-1]:.4f}",
        "All layers"
    ]
}

comparison_df = pd.DataFrame(comparison_data)
print("\n" + "="*80)
print("COMPREHENSIVE COMPARISON")
print("="*80)
print(comparison_df.to_string(index=False))
print("="*80)

# Calculate speedup
speedup = history_unfrozen['training_time'] / history_frozen['training_time']
print(f"\nSpeedup (Feature Extraction vs End-to-End): {speedup:.2f}x faster")

## Hands-On Activity

### Challenge: Progressive Unfreezing

In this activity, you'll implement **progressive unfreezing**, a hybrid approach that combines the benefits of both feature extraction and end-to-end learning.

**Task:**
1. Start with feature extraction (frozen features) and train for 50 epochs
2. Unfreeze all layers and continue training for 50 more epochs with a lower learning rate
3. Compare the results with both pure approaches

**Why Progressive Unfreezing?**
- Reduces the risk of catastrophic forgetting of pre-trained features
- Allows the classifier to stabilize before fine-tuning the backbone
- Can achieve better performance than either pure approach

**Instructions:**
Run the reference implementation below, then experiment with the phase lengths and learning rates:

In [None]:
# Progressive Unfreezing Implementation
print("=" * 60)
print("PROGRESSIVE UNFREEZING APPROACH")
print("=" * 60)

# Phase 1: Feature Extraction (50 epochs)
print("\nPhase 1: Training with frozen features (50 epochs)...")
model_progressive = SimpleNeuralNetwork(
    input_dim=X_train_scaled.shape[1],
    hidden_dim=128,
    output_dim=10,
    freeze_features=True
)

history_phase1 = model_progressive.train(
    X_train_scaled, y_train,
    X_test_scaled, y_test,
    epochs=50,
    learning_rate=0.1,
    verbose=False
)

print(f"Phase 1 completed - Val Accuracy: {history_phase1['val_accs'][-1]:.4f}")

# Phase 2: Unfreeze and continue training
print("\nPhase 2: Unfreezing all layers and fine-tuning (50 epochs)...")
model_progressive.freeze_features = False  # Unfreeze the backbone

history_phase2 = model_progressive.train(
    X_train_scaled, y_train,
    X_test_scaled, y_test,
    epochs=50,
    learning_rate=0.01,  # Lower learning rate for fine-tuning
    verbose=False
)

print(f"Phase 2 completed - Val Accuracy: {history_phase2['val_accs'][-1]:.4f}")

# Combine histories
history_progressive = {
    'train_losses': history_phase1['train_losses'] + history_phase2['train_losses'],
    'val_losses': history_phase1['val_losses'] + history_phase2['val_losses'],
    'train_accs': history_phase1['train_accs'] + history_phase2['train_accs'],
    'val_accs': history_phase1['val_accs'] + history_phase2['val_accs'],
    'training_time': history_phase1['training_time'] + history_phase2['training_time']
}

# Evaluate final performance
y_pred_progressive = model_progressive.predict(X_test_scaled)
test_acc_progressive = accuracy_score(y_test, y_pred_progressive)

print(f"\n{'='*60}")
print(f"Progressive Unfreezing Results:")
print(f"{'='*60}")
print(f"Total Training Time: {history_progressive['training_time']:.2f} seconds")
print(f"Final Test Accuracy: {test_acc_progressive:.4f}")
print(f"{'='*60}")

# Visualize progressive unfreezing
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot validation accuracy with phase transition
axes[0].plot(history_progressive['val_accs'], linewidth=2, label='Progressive Unfreezing')
axes[0].axvline(x=50, color='red', linestyle='--', linewidth=2, alpha=0.7, label='Unfreeze Point')
axes[0].set_xlabel('Epoch', fontsize=11)
axes[0].set_ylabel('Validation Accuracy', fontsize=11)
axes[0].set_title('Progressive Unfreezing: Validation Accuracy', fontsize=12, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Compare all three approaches
axes[1].plot(history_frozen['val_accs'], label='Feature Extraction', linewidth=2, alpha=0.8)
axes[1].plot(history_unfrozen['val_accs'], label='End-to-End', linewidth=2, alpha=0.8)
axes[1].plot(history_progressive['val_accs'], label='Progressive Unfreezing', linewidth=2, alpha=0.8)
axes[1].set_xlabel('Epoch', fontsize=11)
axes[1].set_ylabel('Validation Accuracy', fontsize=11)
axes[1].set_title('All Approaches Compared', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Final comparison of all three approaches
final_comparison = {
    'Approach': ['Feature Extraction', 'End-to-End', 'Progressive Unfreezing'],
    'Test Accuracy': [f"{test_acc_frozen:.4f}", f"{test_acc_unfrozen:.4f}", f"{test_acc_progressive:.4f}"],
    'Training Time (s)': [
        f"{history_frozen['training_time']:.2f}",
        f"{history_unfrozen['training_time']:.2f}",
        f"{history_progressive['training_time']:.2f}"
    ],
    'Best For': [
        'Limited data, similar domains',
        'Large data, dissimilar domains',
        'Balance of performance & stability'
    ]
}

final_df = pd.DataFrame(final_comparison)
print("\n" + "="*100)
print("FINAL COMPARISON: ALL THREE APPROACHES")
print("="*100)
print(final_df.to_string(index=False))
print("="*100)

## Key Takeaways

- **Feature Extraction** freezes pre-trained layers and only trains the classifier, making it fast and efficient for small datasets or similar domains. It's ideal when computational resources are limited.

- **End-to-End Learning** (fine-tuning) updates all network weights, achieving better performance when you have sufficient data and the target domain differs from the pre-training domain. It requires more computational resources and careful tuning.

- **Progressive Unfreezing** combines both approaches: start with frozen features to stabilize the classifier, then gradually unfreeze layers with lower learning rates. This hybrid strategy often provides the best balance between performance and stability.

- **Trade-offs to Consider**: Feature extraction trains faster but may underperform on dissimilar domains. End-to-end learning achieves higher accuracy but risks overfitting with limited data. Choose based on your dataset size, domain similarity, and computational budget.

- **Learning Rate Strategy**: Use standard learning rates (e.g., 0.001-0.1) for feature extraction. For end-to-end learning, use lower rates (e.g., 0.0001-0.001) to prevent catastrophic forgetting of pre-trained features.

- **Practical Rule of Thumb**: 
  - Dataset < 1,000 samples → Feature extraction only
  - Dataset 1,000-10,000 samples → Progressive unfreezing
  - Dataset > 10,000 samples → End-to-end learning or progressive unfreezing
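
The rule of thumb above can be written as a tiny helper. This is a sketch only: `suggest_strategy` is a hypothetical function, the thresholds are the rough ones listed above, and dataset size is just one input alongside domain similarity and compute budget.

```python
def suggest_strategy(n_samples):
    """Map a dataset size to the rough rule of thumb from this lesson."""
    if n_samples < 1_000:
        return "feature extraction"
    if n_samples <= 10_000:
        return "progressive unfreezing"
    return "end-to-end or progressive unfreezing"

# The scikit-learn digits dataset used in this lesson has 1,797 samples.
for n in (500, 1_797, 50_000):
    print(f"{n:>6} samples -> {suggest_strategy(n)}")
```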

## Further Resources

### Academic Papers
- [Deep Residual Learning for Image Recognition (ResNet)](https://arxiv.org/abs/1512.03385) - Seminal paper on deep learning architectures commonly used for transfer learning
- [How transferable are features in deep neural networks?](https://arxiv.org/abs/1411.1792) - Yosinski et al.'s analysis of feature transferability across layers

### Tutorials and Guides
- [PyTorch Transfer Learning Tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) - Official PyTorch guide with practical examples
- [Fast.ai Course - Transfer Learning](https://course.fast.ai/) - Practical deep learning course emphasizing transfer learning best practices
- [TensorFlow Transfer Learning Guide](https://www.tensorflow.org/tutorials/images/transfer_learning) - Comprehensive TensorFlow/Keras implementation

### Blog Posts
- [Feature Extraction vs Fine-Tuning](https://cs231n.github.io/transfer-learning/) - Stanford CS231n notes on transfer learning strategies
- [A Comprehensive Guide to Transfer Learning](https://builtin.com/data-science/transfer-learning) - Practical guide with real-world examples

### Tools and Libraries
- [torchvision.models](https://pytorch.org/vision/stable/models.html) - Pre-trained models in PyTorch
- [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) - Pre-trained models for NLP and computer vision
- [TensorFlow Hub](https://www.tensorflow.org/hub) - Repository of pre-trained models

### Practice Datasets
- [ImageNet](https://www.image-net.org/) - Large-scale image classification benchmark
- [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) - Small image classification datasets perfect for experimentation
- [Stanford Dogs Dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/) - Fine-grained classification task ideal for transfer learning